Supervised -> Self-supervised

In computer vision (CV), given data and a neural network, we usually train the network in either a supervised or an unsupervised way. The goal of both is to give the trained network better feature-representation capabilities for tasks such as classification, object detection, and semantic segmentation. The main difference between the two approaches is whether the data comes with labels.

Supervised:

Take a simple classification task as an example:

Suppose we have n images of cats, all labeled 1, and n images of dogs, all labeled 0.

To train in a supervised way, we feed these samples into the network, build a loss from the data and their labels, and update the weights by gradient back-propagation. The ultimate goal is to make the feature representations of samples from the same class as similar as possible.
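For illustration, here is a minimal PyTorch sketch of that supervised recipe (the model choice, optimizer settings, and the `loader` variable are assumptions for this example, not part of the original post):

```python
import torch
import torch.nn as nn
import torchvision

# Minimal supervised training sketch: a classifier is fit to (image, label)
# pairs, and cross-entropy plus back-propagation pulls features of the same
# class together. `loader` is a hypothetical DataLoader yielding the cat/dog
# batches described above (labels: 1 = cat, 0 = dog).
model = torchvision.models.resnet18(num_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:
        logits = model(images)             # forward pass
        loss = criterion(logits, labels)   # loss built from data + labels
        optimizer.zero_grad()
        loss.backward()                    # gradient back-propagation
        optimizer.step()                   # weight update
```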

The main reason this works is that samples sharing the same class label have similar feature distributions.

However, in real life, data in many fields often lack labels, for example medical diagnostic images. How to use such data reasonably and effectively, and to mine useful information from it that can assist and simplify our lives, therefore becomes extremely important.

Self-supervised:

In self-supervised learning, contrastive learning is often used: a pretext task is constructed to generate pseudo labels for model training.

Data augmentation is mainly used to construct positive and negative sample pairs, with the goal of maximizing the similarity between positive pairs and minimizing the similarity between negative pairs.
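As a sketch of how such pairs turn into a training signal, below is one common formulation, an NT-Xent (SimCLR-style) loss; the function name and temperature value are illustrative assumptions, not taken from this post:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss sketch.

    z1, z2: (N, D) embeddings of two augmented views of the same N images.
    Each sample's positive is its other view; the remaining 2N - 2
    embeddings in the batch are treated as negatives.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit norm
    sim = z @ z.t() / temperature                        # cosine similarities
    n = z1.size(0)
    sim.fill_diagonal_(float('-inf'))                    # never contrast a sample with itself
    # Index of each sample's positive: row i pairs with row i + n (and vice versa).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```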

Defect: in the methods above, only the corresponding augmented view is treated as the positive for a given sample, and all remaining samples in the batch are treated as its negatives. This ignores the fact that some of those other samples may belong to the same class; if they do, they should not be counted as negatives.

This is where statistics comes in. As long as there are enough negatives in the batch, in other words as long as same-class "false negatives" are relatively rare among them, their impact is very small, and treating them as negatives is acceptable in a statistical sense. This is why contrastive methods such as SimCLR and MoCo demand very large numbers of negatives: the more negatives there are, the smaller the proportion and influence of same-class samples mistakenly treated as negatives, and the better the finally learned network performs.
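A rough back-of-envelope calculation illustrates why false negatives stay a small fraction of a large negative pool (the numbers below are assumptions for illustration; MoCo's default queue size is 65536 and ImageNet has 1000 classes):

```python
# With C roughly balanced classes, a randomly drawn negative belongs to the
# anchor's class with probability about 1 / C, so false negatives are only a
# small share of a large negative pool.
def false_negative_stats(num_negatives: int, num_classes: int):
    expected_false = num_negatives / num_classes   # expected count of false negatives
    fraction = 1.0 / num_classes                   # their share of all negatives
    return expected_false, fraction

# e.g. a MoCo-sized queue of 65536 negatives over 1000 classes:
print(false_negative_stats(65536, 1000))   # (~65.5, 0.001) -> roughly 0.1% of negatives
```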

Refer to an interpretation on Zhihu: https://www.zhihu.com/question/402452508/answer/1352959115

By augmenting the dataset, the network iteratively learns the distribution of the samples in feature space. The underlying reason this works is that the network does not strictly learn the features of each individual sample, but rather the characteristics of the distribution the samples come from.
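As one possible concrete form of that augmentation step (the specific transforms below are SimCLR-style choices assumed for illustration, not taken from this post), each image can be randomly transformed twice so that the two views form a positive pair:

```python
import torchvision.transforms as T

# Two-view augmentation pipeline sketch: the same image is augmented twice,
# and the resulting pair of views is used as a positive pair.
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=23),
    T.ToTensor(),
])

class TwoViews:
    """Wraps a transform so the dataset returns two augmented views per image."""
    def __init__(self, transform):
        self.transform = transform
    def __call__(self, image):
        return self.transform(image), self.transform(image)
```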

Origin: https://blog.csdn.net/weixin_41807182/article/details/114273729