What is self-supervised learning?

Self-supervised learning has become a popular research area over the past two years. For unlabeled data, it designs auxiliary tasks that mine features of the data itself as supervisory signals, thereby improving the feature-extraction ability of the model.

Self-supervised learning is a form of unsupervised learning. It relies on auxiliary tasks, known as pretext tasks, to mine supervisory information from large-scale unlabeled data, and it trains the network with this constructed supervision so that the network learns representations that are valuable for downstream tasks.

The advantage of self-supervised learning is that training can be completed on unlabeled data, whereas supervised learning requires large amounts of labeled data and reinforcement learning requires a large number of interactions with the environment. In an era when data is king, this property is why many regard self-supervised learning as a promising direction for artificial intelligence.

Self-supervised methods can be divided into three main categories:

● Context Based

Many tasks can be constructed from the context information of the data itself. An important example in NLP is the Word2vec algorithm, which mainly exploits word order within sentences: CBOW predicts the center word from its surrounding words, while Skip-Gram predicts the surrounding words from the center word.
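As a minimal sketch of how word order alone supplies supervision for Skip-Gram (the toy corpus, window size, and variable names below are invented for illustration and are not Word2vec's actual implementation):

```python
# Turn raw sentences into (center, context) training pairs: the "labels" come
# purely from word order, with no human annotation.
from itertools import chain

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["dogs", "chase", "cats"]]
window = 2  # how many words on each side count as "context"

vocab = {w: i for i, w in enumerate(sorted(set(chain.from_iterable(corpus))))}

pairs = []
for sentence in corpus:
    for pos, center in enumerate(sentence):
        # every word within the window becomes a prediction target for the center word
        for offset in range(-window, window + 1):
            ctx = pos + offset
            if offset != 0 and 0 <= ctx < len(sentence):
                pairs.append((vocab[center], vocab[sentence[ctx]]))

print(pairs[:5])  # (center_id, context_id) pairs used as self-supervised labels
```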

In the image domain, researchers construct pretext tasks with a Jigsaw-style puzzle: divide a picture into nine patches and compute a loss from predicting the relative positions of those patches. For example, given the patch containing a kitten's eyes and the patch containing its right ear, the model should learn that the right ear lies to the upper right of the eyes. If the model can solve this task well, the representation it has learned can be considered to carry semantic information.
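A hedged sketch of this relative-position pretext task: cut an image into a 3x3 grid of patches and emit (anchor patch, neighbor patch, label), where the label encodes where the neighbor sits relative to the anchor. The shapes and helper names here are illustrative, not taken from any particular paper's code.

```python
import numpy as np

def make_relative_position_sample(image, rng=np.random):
    """image: H x W x C array with H and W divisible by 3."""
    h, w = image.shape[0] // 3, image.shape[1] // 3
    patches = [image[r*h:(r+1)*h, c*w:(c+1)*w] for r in range(3) for c in range(3)]

    anchor = 4                      # the center patch (e.g. the kitten's eyes)
    neighbor = rng.choice([i for i in range(9) if i != anchor])
    # label in 0..7: which of the 8 surrounding cells the neighbor came from
    label = neighbor if neighbor < anchor else neighbor - 1
    return patches[anchor], patches[neighbor], label

img = np.random.rand(96, 96, 3)     # stand-in for a real photo
a, b, y = make_relative_position_sample(img)
print(a.shape, b.shape, y)          # (32, 32, 3) (32, 32, 3) and a label such as 5
```

A small classifier trained on such (patch pair, label) samples never sees a human label, yet to succeed it must learn where object parts sit relative to one another.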


● Temporal Based

Context-based methods rely mostly on information within each individual sample, but there are also many constraint relationships between samples, so temporal constraints can be used for self-supervised learning. The data type that best reflects temporal structure is video.

In the video domain, research can build on the similarity between frames: frames that are close in time tend to have similar features, while frames that are far apart tend to have low similarity. Self-supervised constraints are then formed by constructing such similar (positive) and dissimilar (negative) samples.
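A minimal sketch of this temporal constraint, assuming a decoded clip as a frame array: frames near the anchor in time are treated as positives, frames far away as negatives. The thresholds and array shapes are made up for illustration.

```python
import numpy as np

def sample_triplet(frames, pos_window=2, neg_gap=30, rng=np.random):
    """frames: T x H x W x C array of video frames ordered in time."""
    t = rng.randint(neg_gap, len(frames) - neg_gap)
    anchor = frames[t]
    positive = frames[t + rng.randint(1, pos_window + 1)]   # a nearby frame
    far = t + neg_gap if rng.rand() < 0.5 else t - neg_gap
    negative = frames[far]                                   # a distant frame
    return anchor, positive, negative

video = np.random.rand(120, 64, 64, 3)   # stand-in for a decoded clip
a, p, n = sample_triplet(video)
print(a.shape, p.shape, n.shape)
```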

● Contrastive Based

The third class of self-supervised methods is based on contrastive constraints, which build representations by learning to encode whether two things are similar or dissimilar. The temporal methods introduced above already involve contrastive constraints: they achieve self-supervised learning by constructing positive and negative samples and then measuring the distances between them.
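A hedged sketch of one common contrastive objective (an InfoNCE-style loss, written in plain NumPy with made-up embedding sizes): embeddings of two views of the same sample are pulled together, while the other samples in the batch serve as negatives.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: N x D arrays; z1[i] and z2[i] are two views of the same sample."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature          # N x N cosine similarities
    # row i: the positive is column i, every other column is a negative
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

z_a = np.random.randn(8, 16)   # e.g. encoder outputs for 8 augmented images
z_b = np.random.randn(8, 16)   # outputs for a second augmentation of each image
print(info_nce(z_a, z_b))
```

Minimizing this loss pushes each positive pair's similarity above its similarities to all negatives, which is exactly the "measure the distance between positive and negative samples" idea described above.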

 

Reprinted: Popular Science | What is self-supervised learning? 


Origin: blog.csdn.net/modi000/article/details/132152249