Paper Sharing | DeepCluster

Clustering is a classic unsupervised learning method, but few works have combined it with deep learning. This paper proposes a new method, DeepCluster, which couples end-to-end learning with clustering: it alternates between clustering the features output by the network and learning the network's parameters. The authors successfully apply DeepCluster to large-scale datasets and several transfer tasks, outperforming the current state-of-the-art unsupervised work. This shows that, combined with a simple clustering algorithm, unsupervised methods can also learn good features.

Background

Pre-trained convolutional models have played a great role in many tasks, such as object detection and semantic segmentation. These pre-trained models extract very good general features that can be applied to different tasks, and ImageNet has been central to this process. However, although ImageNet contains over 1 million images, in practice this number is still small, and its diversity is limited. Handling larger-scale unlabeled data calls for an effective unsupervised learning method.

Method

Framework

This paper proposes a method that combines clustering with deep learning and can learn useful general features. The framework is shown in the figure below. The whole process alternates two steps: cluster the features, then use the cluster assignments as pseudo-labels and update the network's parameters so that it predicts these pseudo-labels. The procedure looks simple, but it achieves better performance than previous unsupervised methods.

[Figure: the DeepCluster pipeline — deep features are clustered with k-means, and the cluster assignments are used as pseudo-labels to retrain the convnet.]

Expressed mathematically, the whole process is the following two formulas. The first generates pseudo-labels by clustering the features with k-means; the second computes the loss on those pseudo-labels and updates the network parameters:

$$\min_{C \in \mathbb{R}^{d \times k}} \frac{1}{N} \sum_{n=1}^{N} \; \min_{y_n \in \{0,1\}^{k},\; y_n^{\top} 1_k = 1} \big\| f_{\theta}(x_n) - C\, y_n \big\|_2^2$$

$$\min_{\theta, W} \frac{1}{N} \sum_{n=1}^{N} \ell\big( g_W(f_{\theta}(x_n)),\, y_n \big)$$

Here f_θ is the convnet, C is the d × k matrix of cluster centroids, y_n is the pseudo-label (cluster assignment) of image x_n, g_W is a classifier head on top of the features, and ℓ is the multinomial logistic loss.
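To make the alternation concrete, here is a minimal, self-contained sketch in PyTorch with scikit-learn's k-means. The toy data, tiny backbone, and hyperparameters are illustrative stand-ins, not the paper's actual pipeline.

    import torch
    import torch.nn as nn
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    torch.manual_seed(0)
    images = torch.randn(512, 1, 32, 32)           # toy stand-in for the dataset
    backbone = nn.Sequential(                      # f_theta: image -> 32-d feature
        nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())
    k = 10                                         # number of clusters
    classifier = nn.Linear(32, k)                  # g_W (the paper re-initializes it
                                                   # whenever the clusters change)
    params = list(backbone.parameters()) + list(classifier.parameters())
    opt = torch.optim.SGD(params, lr=0.05)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(3):
        # Step 1: cluster the current features to get pseudo-labels (first formula).
        with torch.no_grad():
            feats = backbone(images).numpy()
        feats = PCA(n_components=16).fit_transform(feats)  # PCA for clustering only
        assign = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
        pseudo = torch.from_numpy(assign).long()

        # Step 2: train the network to predict the pseudo-labels (second formula).
        for i in range(0, len(images), 64):
            opt.zero_grad()
            loss = loss_fn(classifier(backbone(images[i:i+64])), pseudo[i:i+64])
            loss.backward()
            opt.step()
        print(f"epoch {epoch}: last-batch loss {loss.item():.3f}")

Note that the PCA-reduced features are used only for clustering; the network itself is trained on its own raw features.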

Avoiding trivial solutions

Alternating between clustering and model updates in this way makes it easy for the network to find shortcuts that yield degenerate, meaningless solutions.

Empty clusters

Specifically, training the model on its own pseudo-labels can cause the features it produces to collapse around a few cluster centers, leaving other centers with no samples; nothing in the objective forbids a cluster from ending up empty. One solution is to enforce a minimum number of samples per cluster, but that requires computation over the entire dataset and is too expensive. The alternative used here is: when a cluster becomes empty, randomly select a non-empty cluster center, add a small perturbation to it to form the new center, and then reassign the points of that non-empty cluster between the two resulting centers.
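Below is a rough NumPy sketch of this reassignment trick, meant to be called after each k-means assignment step. The function name, perturbation scale, and reassignment rule are illustrative assumptions, not the paper's exact code.

    import numpy as np

    def fix_empty_clusters(X, centroids, labels, rng=None):
        # X: (N, d) features; centroids: (k, d); labels: (N,) cluster assignments.
        if rng is None:
            rng = np.random.default_rng(0)
        k = len(centroids)
        counts = np.bincount(labels, minlength=k)
        for j in np.where(counts == 0)[0]:
            donor = rng.choice(np.where(counts > 0)[0])   # random non-empty cluster
            # perturb the donor centroid slightly to create the new centroid
            centroids[j] = centroids[donor] + 1e-4 * rng.standard_normal(X.shape[1])
            members = np.where(labels == donor)[0]
            # split the donor's points between the old and the new centroid
            d_old = np.linalg.norm(X[members] - centroids[donor], axis=1)
            d_new = np.linalg.norm(X[members] - centroids[j], axis=1)
            labels[members[d_new < d_old]] = j
            counts = np.bincount(labels, minlength=k)
        return centroids, labels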

Trivial parametrization

Another problem is that most of the data gets clustered into a small number of clusters; in the extreme case everything collapses into a single cluster, and the network can then produce the same output for every input. The solution is to sample images uniformly over the clusters (i.e., over the pseudo-labels), which is equivalent to weighting each sample's contribution to the loss by the inverse of its cluster size.
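In PyTorch this uniform-over-clusters sampling can be approximated with a WeightedRandomSampler, as in the sketch below; the toy pseudo-labels are made up for illustration.

    import torch
    from torch.utils.data import WeightedRandomSampler

    # toy pseudo-labels: three very uneven clusters (900 / 60 / 40 samples)
    pseudo = torch.tensor([0] * 900 + [1] * 60 + [2] * 40)
    counts = torch.bincount(pseudo).float()
    weights = 1.0 / counts[pseudo]       # each sample weighted by 1 / its cluster size
    sampler = WeightedRandomSampler(weights, num_samples=len(pseudo), replacement=True)
    # pass it to the training loader, e.g.:
    # loader = DataLoader(dataset, batch_size=256, sampler=sampler)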

Implementation details

Architecture: AlexNet with batch normalization in place of LRN; also VGG-16 with BN.
Training data: ImageNet; the images are passed through a Sobel-based filter to remove color information (see the sketch after this list).
Optimization: clustering uses the features of the central crop of each image, while training uses data augmentation (horizontal flips and crops of random size and aspect ratio); other training settings follow common practice. In addition, the features are reduced to 256 dimensions with PCA before clustering.
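As an illustration of the Sobel preprocessing (a plausible PyTorch version, not necessarily the authors' exact code): the image is first reduced to grayscale, then convolved with the two Sobel kernels, so the network sees edges rather than color.

    import torch
    import torch.nn.functional as F

    def sobel(gray):
        # gray: (N, 1, H, W) grayscale batch -> (N, 2, H, W) gradient channels
        gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        kernel = torch.stack([gx, gx.t()]).unsqueeze(1)   # (2, 1, 3, 3)
        return F.conv2d(gray, kernel, padding=1)

    rgb = torch.rand(4, 3, 224, 224)                      # dummy batch
    out = sobel(rgb.mean(dim=1, keepdim=True))            # grayscale, then Sobel
    print(out.shape)                                      # torch.Size([4, 2, 224, 224])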

Experiments

Preliminary study

The experiments first examine how DeepCluster evolves as training progresses. NMI (Normalized Mutual Information) is used to measure the interdependence of two random variables: when two variables are completely independent, so that one provides no information for inferring the other, the NMI is 0.
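For reference, the NMI between two assignments A and B is defined in the paper as

$$\mathrm{NMI}(A;B) = \frac{I(A;B)}{\sqrt{H(A)\,H(B)}}$$

where I is the mutual information and H the entropy; it is 0 for independent variables and reaches 1 when either assignment can be deterministically predicted from the other.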

First, look at the relationship between the cluster assignments and the images' true labels (Fig. 2(a)). The dependence between them grows steadily during training, indicating that the features gradually come to contain information about the image categories.

Next, look at the relationship between the cluster assignments at epoch t−1 and at epoch t (Fig. 2(b)). The NMI increases gradually, indicating that the clusters are stabilizing. However, it saturates below 0.8, meaning that at every epoch a batch of samples is still switching clusters.

Finally, look at the impact of the number of clusters k on accuracy (Fig. 2(c)): the best performance is obtained with k = 10,000, an order of magnitude more clusters than ImageNet's 1,000 classes, suggesting that some over-segmentation is beneficial.

Activation-Based Linear Classification

A linear classifier is trained on the features of each convolutional layer, with experiments on both the ImageNet and Places datasets; the results are in the table below. On ImageNet, DeepCluster outperforms the other methods on layers conv2 through conv5 by varying margins.
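A linear probe of this kind is simple to set up. The sketch below uses random stand-in features and labels, and is not the paper's exact protocol (which trains logistic regression on each conv layer's pooled activations).

    import torch
    import torch.nn as nn

    feats = torch.randn(1000, 256)          # frozen conv-layer features (precomputed)
    labels = torch.randint(0, 10, (1000,))  # stand-in class labels
    probe = nn.Linear(256, 10)              # the only trainable part
    opt = torch.optim.SGD(probe.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for step in range(100):
        opt.zero_grad()
        loss = loss_fn(probe(feats), labels)
        loss.backward()
        opt.step()

Because the backbone is frozen, probe accuracy directly reflects how linearly separable the learned features are.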

Experiments on Pascal VOC 2007

The features extracted by DeepCluster are transferred to Pascal VOC 2007 and compared across different tasks: image classification, object detection, and semantic segmentation. The experimental results show that DeepCluster brings improvements of varying degrees on all three tasks.

Discussion

The experiments above use the ImageNet dataset and the AlexNet architecture. Let's compare the results with a different dataset and a different architecture.

ImageNet versus YFCC100M

ImageNet is an object-centric classification dataset whose categories are fairly evenly distributed, a situation that suits DeepCluster well, and the number of clusters was matched to ImageNet's number of categories. To measure the impact of the data distribution, 1 million images were randomly sampled from YFCC100M for pre-training; statistics based on hashtags show that this data is unbalanced. Comparing features pre-trained on ImageNet and on YFCC100M across different tasks shows that DeepCluster is robust to the data distribution and still learns good general features.

AlexNet versus VGG

In supervised learning, deeper networks tend to perform better, and we expect DeepCluster to behave similarly. Using features trained on ImageNet for Pascal VOC 2007 object detection, VGG-16 achieves better performance than AlexNet.

Summary

This paper proposes a simple and effective unsupervised method. Such unsupervised pre-training can also learn good general features, and the performance of these features on transfer tasks is getting closer to that of supervised learning.

References:

1. Caron, Mathilde, et al. "Deep Clustering for Unsupervised Learning of Visual Features." arXiv preprint arXiv:1807.05520 (2018).

 

Welcome to follow our WeChat public account: geetest_jy
