[Deep Learning] Contrastive Learning

1. What is contrastive learning?

        Some papers call contrastive learning self-supervised learning, while others call it unsupervised learning. Self-supervised learning is a form of unsupervised learning, and the existing literature draws no formal distinction between the two, so either name can be used. The main idea is that the model learns an encoder whose features pull similar samples as close together as possible and push positive and negative samples apart. This can be understood as making the boundaries between clusters more obvious.

        Self-supervised learning avoids the need for large amounts of labels on the dataset. It uses self-defined pseudo labels as training signals and then applies the learned representations to downstream tasks. The goal is to learn an encoder that encodes similar data similarly and makes the encodings of different classes of data as different as possible. Through pretext tasks it introduces additional external information and thereby obtains a more general representation.

        First, we must distinguish supervised learning from unsupervised learning. The figure below gives an example. The training data of supervised learning is labeled: the goal is to determine that Figures 1 and 2 are dogs and Figure 3 is a cat. The training data of unsupervised learning has no labels: it only needs to determine that Figures 1 and 2 belong to the same category while Figure 3 does not. What the pictures actually depict does not matter to unsupervised learning!

        Contrastive learning does not need to know the real label of each image; it only needs to know which samples are similar to each other and which are not. Suppose the three pictures pass through a network and yield the features f1, f2, and f3. We hope that contrastive learning pulls f1 and f2 close together in the feature space and pushes them away from f3. In other words, the goal of contrastive learning is that all similar objects lie in adjacent regions of the feature space while dissimilar objects lie in non-adjacent regions.
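
        To make this concrete, here is a minimal, hypothetical sketch (the feature vectors f1, f2, f3 are made up; in practice they would come from an encoder) showing how cosine similarity can measure "closeness" in the feature space:

```python
import torch
import torch.nn.functional as F

# Toy features for three images: f1 and f2 come from the same class, f3 does not.
f1 = torch.tensor([0.9, 0.1, 0.0])
f2 = torch.tensor([0.8, 0.2, 0.1])
f3 = torch.tensor([0.0, 0.1, 0.9])

# Cosine similarity is one common measure of "closeness" in the feature space.
print(F.cosine_similarity(f1, f2, dim=0))  # high -> f1 and f2 should be pulled together
print(F.cosine_similarity(f1, f3, dim=0))  # low  -> f3 should be pushed away
```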

        Contrastive learning requires knowing which samples are similar and which are not. Does this imply that contrastive learning, like supervised learning, also needs label information? The reason contrastive learning is considered an unsupervised training method is that pretext tasks (proxy tasks) can be used to define which samples are similar and which are not. Pretext tasks are usually artificially defined rules that specify which pictures are similar to which others, thereby providing a supervision signal to train the model; this is the so-called self-supervision. Data augmentation is a common way to implement pretext tasks.

2. Paradigm of contrastive learning

        The typical paradigm of contrastive learning is: pretext task + objective function. The pretext task and the objective function are also the biggest differences between contrastive learning and supervised learning.

        In supervised learning, an input x is fed through the model to obtain an output y, and the objective function computes a loss between y and the real label (ground truth) to train the model.

        For unsupervised or self-supervised learning there is no ground truth, so what should we do? The pretext task solves exactly this problem: it is used to define the positive and negative samples of contrastive learning. Once the model has an output y and a pseudo label, it still needs an objective function to compute the loss between the two and guide the learning direction. How do pretext tasks and objective functions work in contrastive learning? The following explanation uses the contrastive learning framework proposed by SimCLR.

(1) Pretext task stage. For the same sample x, two augmented samples $\widetilde{x}_i$ and $\widetilde{x}_j$ are generated through two pretext-task transformations. SimCLR is a computer vision paper and uses data augmentation as the pretext task, e.g., random cropping, random color distortion, and random Gaussian blur; $\widetilde{x}_i$ and $\widetilde{x}_j$ are called a positive sample pair.
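
        As an illustration, below is a sketch of a SimCLR-style augmentation pipeline using torchvision. The transforms match the ones named above (random cropping, color distortion, Gaussian blur), but the exact parameters are illustrative assumptions, not the paper's settings:

```python
import torchvision.transforms as T

# Sketch of a SimCLR-style augmentation pipeline; parameter values are illustrative.
simclr_augment = T.Compose([
    T.RandomResizedCrop(224),                                    # random cropping + resize
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),   # random color distortion
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=23),                              # random Gaussian blur
    T.ToTensor(),
])

def two_views(x):
    """Apply the same stochastic pipeline twice to one PIL image x -> a positive pair."""
    return simclr_augment(x), simclr_augment(x)
```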

(2) Feature extraction encoder. $f(\cdot)$ is an encoder, and there is no restriction on which encoder is used (SimCLR uses ResNet). Passing $\widetilde{x}_i$ and $\widetilde{x}_j$ through $f(\cdot)$ yields $h_i$ and $h_j$, which can be understood as embedding vectors.

(3) MLP layer. After feature extraction comes an MLP layer (projection head). SimCLR emphasizes that adding this MLP layer works better than leaving it out. The output of the MLP layer is where the objective function of contrastive learning acts: $h_i$ and $h_j$ are passed through the MLP layer to produce $z_i$ and $z_j$.
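
        The encoder plus projection head can be sketched as follows. This is an assumed PyTorch/torchvision implementation (ResNet-50 backbone, two-layer MLP head); the dimensions are illustrative rather than taken from the paper:

```python
import torch.nn as nn
import torchvision.models as models

class SimCLREncoder(nn.Module):
    """Encoder f(.) plus MLP projection head g(.), as a sketch."""
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = models.resnet50(weights=None)    # randomly initialized ResNet-50
        feat_dim = backbone.fc.in_features          # 2048 for ResNet-50
        backbone.fc = nn.Identity()                 # drop the classification layer
        self.f = backbone                           # h = f(x~), kept for downstream tasks
        self.g = nn.Sequential(                     # z = g(h), used only by the contrastive loss
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.f(x)
        z = self.g(h)
        return h, z
```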

(4) Objective function stage. The loss function in contrastive learning is generally the InfoNCE loss; for $z_i$ and $z_j$ it is defined as follows:

$$\ell_{i,j} = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)}$$

        Here N is the number of samples in a batch: the N samples yield N positive pairs through data augmentation, giving 2N samples in total. What are the negative samples? The approach in SimCLR is that, for a given positive pair, the remaining 2(N-1) samples are all negatives; that is, the negatives are all generated from the data of the same batch. In the formula above, $\mathrm{sim}(z_i, z_j)$ is the cosine similarity, $\mathbb{1}_{[k \neq i]}$ is an indicator that equals 1 when k ≠ i and 0 otherwise, and $\tau$ is the temperature coefficient.
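
        A common PyTorch implementation pattern for this loss is sketched below (the function name and the default τ = 0.5 are assumptions for illustration). For each of the 2N projections it treats the other view of the same image as the positive and the remaining 2(N-1) samples as negatives, exactly as described above:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z_i, z_j, tau=0.5):
    """InfoNCE / NT-Xent loss for a batch of N positive pairs (sketch of the formula above)."""
    N = z_i.size(0)
    z = torch.cat([z_i, z_j], dim=0)              # 2N projections
    z = F.normalize(z, dim=1)                     # so dot products become cosine similarities
    sim = z @ z.t() / tau                         # sim(z_k, z_l) / tau for all pairs
    sim.fill_diagonal_(float('-inf'))             # implements the 1[k != i] indicator
    # For row k, the positive sample sits N positions away in the concatenated batch.
    targets = torch.cat([torch.arange(N, 2 * N), torch.arange(0, N)])
    return F.cross_entropy(sim, targets)          # -log(positive / sum over the rest)
```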

        As the formula shows, only the similarity of the positive pair appears in the numerator, while the negative samples appear only in the denominator of the contrastive loss. The smaller the distance within a positive pair and the larger the distance to the negative samples, the smaller the loss.
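
        Putting the pieces together, one hypothetical training step could look like the following sketch; it reuses the names two_views, SimCLREncoder, and nt_xent_loss from the sketches above and assumes batch_of_images is a list of PIL images:

```python
import torch

model = SimCLREncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

def train_step(batch_of_images):
    # Pretext task: two augmented views of every image form the positive pairs.
    views = [two_views(img) for img in batch_of_images]
    x_i = torch.stack([v[0] for v in views])
    x_j = torch.stack([v[1] for v in views])

    _, z_i = model(x_i)              # projections of the first views
    _, z_j = model(x_j)              # projections of the second views

    loss = nt_xent_loss(z_i, z_j)    # negatives come from the rest of the batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```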

Reference:

        1.《Self-supervised Learning for Large-scale Item Recommendations》

        2.《Momentum Contrast for Unsupervised Visual Representation Learning》

        3. Contrastive Learning: must-know essentials - Zhihu

Source: blog.csdn.net/weixin_44750512/article/details/132302847