Machine learning 13 - self-supervised learning (unsupervised learning)

1 Why self-supervised learning

Self-supervised learning is a special form of unsupervised learning. As discussed in the unsupervised learning article, labels are very valuable: they generally require manual annotation, which costs a lot of time and labor. Unlabeled data, by contrast, is relatively easy to obtain; we can crawl huge amounts of text, images, audio, product information, and so on from the Internet. How to exploit this unlabeled data has long been an important direction in unsupervised learning, and self-supervised learning offers one solution.

Self-supervised learning uses one part of the data to predict another part, so the data itself provides the supervision signal. With self-supervised learning we can learn text or image representations that benefit downstream tasks. This is the pretrain-finetune paradigm.
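A minimal sketch of the pretrain-finetune idea in PyTorch; the encoder architecture, input size, and number of downstream classes are placeholder assumptions, not any particular model.

```python
import torch.nn as nn

# A toy encoder; in practice this would be a Transformer or CNN.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())

# 1) Pretrain: optimize a self-supervised objective (reconstruction, masked
#    prediction, contrastive loss, ...) on large amounts of unlabeled data.
# 2) Finetune: attach a small task head and train on the (much smaller)
#    labeled downstream dataset, optionally updating the encoder as well.
downstream_model = nn.Sequential(encoder, nn.Linear(256, 10))  # e.g. a 10-class task
```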


 

2 Self-supervised learning implementation schemes

The main implementation schemes of self-supervised learning include:

  1. Use part of the data to reconstruct the whole. This is essentially a denoising auto-encoder. In NLP, the masked language model of the BERT family follows this scheme; in CV, so does image in-painting.
  2. Pretext tasks on images. For example, divide a picture into 9 pieces, shuffle them, and solve the resulting jigsaw puzzle; or rotate the picture by a certain angle and predict the rotation angle.
  3. Contrastive learning, such as word2vec, Contrastive Predictive Coding, and SimCLR (a minimal contrastive-loss sketch follows this list).
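As an illustration of scheme 3, below is a minimal sketch of a SimCLR-style contrastive (NT-Xent / InfoNCE) loss in PyTorch. The encoder, the augmentation functions, and the tensor names are assumptions for the example, not a reference implementation from any paper.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.5):
    """z1, z2: (batch, dim) embeddings of two augmented views of the same examples."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                       # (2B, dim)
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    # mask out self-similarity so an example is never its own negative
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(mask, float('-inf'))
    # the positive for row i is the other view of the same example
    batch = z1.size(0)
    targets = torch.cat([torch.arange(batch, 2 * batch),
                         torch.arange(0, batch)]).to(sim.device)
    return F.cross_entropy(sim, targets)

# usage (encoder and augmentations are placeholders):
# loss = contrastive_loss(encoder(aug1(x)), encoder(aug2(x)))
```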

 

3 Self-supervised learning in the field of NLP

Many NLP pre-training models are built with self-supervised learning, such as ELMo, GPT, BERT, XLNet, ELECTRA, and T5. Broadly, they can be viewed as either denoising auto-encoders or autoregressive language models.


3.1 Auto-Encoder and Auto-regressive LM

These models fall into two types:

  1. Auto-Encoder, e.g. the masked language model (MLM) used by BERT. It masks a fraction of the tokens in the sequence and lets the model predict the tokens at the masked positions. Its advantage is that it can use sentence information from both directions, so it performs well on tasks such as classification, QA, and NER. Its disadvantages are that only the masked positions contribute to the prediction loss, so training efficiency is low, and that the mask appears during pre-training but not during downstream fine-tuning, creating a mismatch between the two stages.
  2. Auto-regressive language model (Auto-regressive LM). Strictly speaking, the MLM is not a language model in the classical sense; autoregressive models such as GPT are. An autoregressive LM uses the preceding context to predict the next token and performs better on generation tasks. Its advantages are that pre-training and fine-tuning are consistent and that every position in the sequence contributes to the prediction loss, so training efficiency is high. Its disadvantage is that it sees only the preceding context and not the following one, i.e. a single direction, which hurts semantic understanding of the whole sentence. A minimal sketch contrasting the two supervision signals follows this list.
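The sketch below contrasts the two kinds of self-supervision under assumed token ids and a 15% mask rate; it is an illustration, not BERT's or GPT's actual code.

```python
import torch

MASK_ID, VOCAB = 103, 30000                      # assumed ids, for illustration only
tokens = torch.randint(5, VOCAB, (1, 10))        # a fake tokenized sentence

# 1) Auto-Encoder / masked LM: corrupt random positions, predict only those.
mlm_input = tokens.clone()
mask_pos = torch.rand(tokens.shape) < 0.15       # roughly 15% of positions
mlm_input[mask_pos] = MASK_ID                    # replace with [MASK]
mlm_labels = torch.where(mask_pos, tokens, torch.full_like(tokens, -100))
# -100 is ignored by cross-entropy, so only masked positions produce a loss.

# 2) Auto-regressive LM: predict token t from tokens < t via a causal mask.
seq_len = tokens.size(1)
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
# position i may attend only to positions j <= i (a single direction),
# but every position contributes a next-token prediction loss.
ar_input, ar_labels = tokens[:, :-1], tokens[:, 1:]   # shifted by one
```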

An Auto-Encoder can obtain context from both directions, which helps semantic understanding.


An Auto-regressive LM, whether it reads from front to back or from back to front, can only use context from one direction.


 

3.2 XLNet and PLM

XLNet combines the advantages of the two by proposing the permutation language model, PLM (Permutation LM). It works in two steps:

  1. Permute: shuffle the order of the tokens in the sequence. In practice the tokens are not physically reordered; the permutation is realized through the attention mask.
  2. Autoregressive prediction: train an autoregressive language model over the permuted order. Because the token order has been permuted, tokens that come later in the original sentence can also be conditioned on during training, which helps the model understand the whole sequence. A minimal sketch of such a permutation attention mask follows this list.
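A minimal sketch of how a permutation can be expressed as an attention mask; this is an assumption-level illustration, not XLNet's actual two-stream implementation.

```python
import torch

seq_len = 5
perm = torch.randperm(seq_len)             # a sampled factorization order, e.g. [3, 0, 4, 1, 2]
rank = torch.empty(seq_len, dtype=torch.long)
rank[perm] = torch.arange(seq_len)         # rank[i] = position of token i in the permutation

# attn_mask[i, j] == True  ->  token i is allowed to attend to token j
attn_mask = rank.unsqueeze(1) > rank.unsqueeze(0)
# Tokens are predicted autoregressively in permutation order, so a token can
# condition on tokens that appear *after* it in the original sentence.
```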


 

4 Self-supervised learning in the field of CV

Self-supervised learning is also easy to implement on CV tasks.

4.1 Predict missing pieces

Cut out some regions of the picture and let the model predict them, so that the output restores the input picture as closely as possible.
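A minimal sketch of this in-painting pretext task, assuming a placeholder encoder-decoder `model`, images scaled to [0, 1], and an arbitrary square patch size.

```python
import torch
import torch.nn.functional as F

def inpainting_step(model, images, patch=32):
    """images: (B, C, H, W) in [0, 1], with H and W larger than `patch`."""
    corrupted = images.clone()
    _, _, H, W = images.shape
    y = torch.randint(0, H - patch, (1,)).item()
    x = torch.randint(0, W - patch, (1,)).item()
    corrupted[:, :, y:y + patch, x:x + patch] = 0.0       # remove a region
    recon = model(corrupted)                              # encoder-decoder output
    # supervise only on the missing region (the rest is already visible)
    return F.mse_loss(recon[:, :, y:y + patch, x:x + patch],
                      images[:, :, y:y + patch, x:x + patch])
```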


 

4.2 Jigsaw Puzzles

Divide the picture into multiple tiles, shuffle them, and let the model recover the original arrangement, much like solving a jigsaw puzzle.
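In practice this pretext task is usually cast as predicting which permutation was applied to the tiles rather than regressing pixels; below is a minimal sketch under that formulation, with the 3x3 grid and the permutation set as assumptions.

```python
import random
import torch

# A small, randomly chosen permutation set; real implementations typically
# pre-select a fixed set of maximally different permutations.
PERMS = [tuple(random.sample(range(9), 9)) for _ in range(100)]

def make_jigsaw_example(image, grid=3):
    """image: (C, H, W) with H and W divisible by `grid`."""
    C, H, W = image.shape
    th, tw = H // grid, W // grid
    tiles = [image[:, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(grid) for j in range(grid)]
    label = random.randrange(len(PERMS))                      # which permutation
    shuffled = torch.stack([tiles[k] for k in PERMS[label]])  # (9, C, th, tw)
    return shuffled, label            # the model is trained to predict `label`
```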


 

4.3 Rotation

Rotate the picture by some angle and let the model predict how many degrees it was rotated. Alternatively, restrict the rotations to the four classes 0, 90, 180, and 270 degrees and let the model predict which rotation class was applied.
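A minimal sketch of the four-way rotation pretext task; the classifier `model` is a placeholder assumption (any image classifier with 4 output classes works).

```python
import torch
import torch.nn.functional as F

def rotation_step(model, images):
    """images: (B, C, H, W). Returns the rotation-prediction loss."""
    labels = torch.randint(0, 4, (images.size(0),))           # 0..3 -> k * 90 degrees
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                           for img, k in zip(images, labels)])
    logits = model(rotated)                                    # (B, 4)
    return F.cross_entropy(logits, labels)
```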



Origin: blog.csdn.net/u013510838/article/details/108553383