SimCSE Contrastive Learning Method

This came up in a fourth-round technical interview at a big tech company, so I am writing it up to fill in what I missed.

Early Approaches to Computing Sentence Similarity

I wrote an earlier post covering the older text similarity methods; if you are interested, see Text Similarity Calculation.

Sentence vectors are used in many NLP tasks, such as text retrieval, coarse-grained text ranking, and semantic matching. There are now many Bert-based methods for obtaining sentence vectors, as well as improved variants such as Bert-flow and Bert-whitening.

  • Use pre-trained Bert directly: take the [CLS] vector, or average the token vectors.
  • Bert-flow, from the paper "On the Sentence Embeddings from Pre-trained Language Models", uses a flow model to calibrate Bert's vector space.
  • Bert-whitening: collect the sentence vectors of all sentences from pre-trained Bert into a matrix, then apply a linear transformation so that the vectors have zero mean and an identity covariance matrix (a minimal sketch follows below).
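
Below is a minimal sketch of the Bert-whitening transformation, assuming the sentence vectors have already been collected into a NumPy matrix; the function names and shapes are illustrative, not the authors' reference code.

```python
import numpy as np

def compute_whitening(embeddings: np.ndarray):
    """embeddings: (num_sentences, dim) matrix of Bert sentence vectors."""
    mu = embeddings.mean(axis=0, keepdims=True)   # (1, dim) mean vector
    cov = np.cov((embeddings - mu).T)             # (dim, dim) covariance
    u, s, _ = np.linalg.svd(cov)                  # cov = U diag(s) U^T
    W = u @ np.diag(1.0 / np.sqrt(s))             # whitening matrix
    return W, mu

def whiten(embeddings: np.ndarray, W: np.ndarray, mu: np.ndarray):
    # After this map, the vectors have zero mean and identity covariance.
    return (embeddings - mu) @ W
```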

SimCSE

SimCSE: Simple Contrastive Learning of Sentence Embeddings

I will not go into the idea of contrastive learning at length. One of its advantages is that it can exploit large amounts of unlabeled data for training, pulling positive samples closer together while pushing negative samples apart. The paper proposes two variants of SimCSE, supervised and unsupervised, both introduced in detail below.

The paper's main contribution is a very simple yet effective unsupervised contrastive learning method for learning sentence similarity.
[Figure: overall structure of SimCSE]
The overall structure of the model is shown in the figure above: it covers both the unsupervised and the supervised variant.

Unsupervised

In the unsupervised model, each sentence is passed through the encoder to obtain a sentence embedding, and the embeddings of the other sentences in the batch serve as negative examples.

Positive examples are built by encoding the same sentence twice with two different dropout masks, producing two different sentence vectors that are treated as a positive pair.

The dropout layer randomly drops some activations during training, so the two passes differ slightly, but the resulting vectors can still be regarded as representations of the same sentence (intuitively, masking a few words might also work). A sketch of this trick follows below.
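
Here is a minimal sketch of generating a positive pair via two dropout-perturbed forward passes with Hugging Face Transformers; the model name and [CLS] pooling are illustrative assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.train()  # keep dropout active so the two passes differ

sentences = ["The cat sits on the mat.", "A quick brown fox."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

# Encode the same batch twice; dropout randomness gives two different views.
h1 = model(**batch).last_hidden_state[:, 0]  # [CLS] vectors, first pass
h2 = model(**batch).last_hidden_state[:, 0]  # [CLS] vectors, second pass
# (h1[i], h2[i]) is a positive pair; (h1[i], h2[j]) with j != i are negatives.
```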

The training objective for sentence $i$ in a batch of size $N$ is:

$$\ell_i = -\log \frac{e^{\operatorname{sim}\left(\mathbf{h}_i^{z_i},\, \mathbf{h}_i^{z_i'}\right)/\tau}}{\sum_{j=1}^{N} e^{\operatorname{sim}\left(\mathbf{h}_i^{z_i},\, \mathbf{h}_j^{z_j'}\right)/\tau}}$$

where $\mathbf{h}_i^{z_i}$ is the embedding of sentence $i$ under dropout mask $z_i$, $\tau$ is a temperature hyperparameter, and $\operatorname{sim}(\cdot,\cdot)$ is cosine similarity.
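
This objective is the standard InfoNCE loss over in-batch negatives. A minimal PyTorch sketch, assuming the two embedding batches from above and an illustrative temperature of 0.05:

```python
import torch
import torch.nn.functional as F

def simcse_unsup_loss(h1: torch.Tensor, h2: torch.Tensor, tau: float = 0.05):
    """h1, h2: (batch, dim) embeddings of the same sentences under two dropout masks."""
    # Cosine similarity matrix: sim[i, j] = cos(h1[i], h2[j]), scaled by 1/tau.
    sim = F.cosine_similarity(h1.unsqueeze(1), h2.unsqueeze(0), dim=-1) / tau
    # Diagonal entries are the positive pairs; off-diagonal entries are negatives.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```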

The authors compared dropout against other data augmentation methods (evaluated with the Spearman correlation coefficient); the results are shown in the table below, where "None" denotes the dropout-only approach used by SimCSE. Generating positive samples via dropout clearly outperforms the other augmentation methods.
[Table: Spearman correlation of dropout vs. other data augmentation methods]

Supervised

SimCSE can also be trained in a supervised fashion, for example on natural language inference datasets (SNLI and MNLI). These datasets contain many sentence pairs, each labeled with one of three relations: entailment, neutral, or contradiction. Entailment pairs are used as positive examples, while contradiction pairs serve as hard negatives during training.

[Figure: the supervised SimCSE setup, with entailment pairs as positives and contradiction pairs as hard negatives]
The training objective extends the unsupervised one with hard negatives:

$$\ell_i = -\log \frac{e^{\operatorname{sim}\left(\mathbf{h}_i,\, \mathbf{h}_i^{+}\right)/\tau}}{\sum_{j=1}^{N} \left( e^{\operatorname{sim}\left(\mathbf{h}_i,\, \mathbf{h}_j^{+}\right)/\tau} + e^{\operatorname{sim}\left(\mathbf{h}_i,\, \mathbf{h}_j^{-}\right)/\tau} \right)}$$

where $(\mathbf{h}_i, \mathbf{h}_i^{+}, \mathbf{h}_i^{-})$ are the embeddings of a premise, its entailment hypothesis, and its contradiction hypothesis.
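
A minimal sketch of this supervised objective with hard negatives, assuming the premise, entailment, and contradiction embeddings are already computed; the names and temperature are illustrative:

```python
import torch
import torch.nn.functional as F

def simcse_sup_loss(h, h_pos, h_neg, tau: float = 0.05):
    """h, h_pos, h_neg: (batch, dim) premise / entailment / contradiction embeddings."""
    sim_pos = F.cosine_similarity(h.unsqueeze(1), h_pos.unsqueeze(0), dim=-1) / tau
    sim_neg = F.cosine_similarity(h.unsqueeze(1), h_neg.unsqueeze(0), dim=-1) / tau
    # For premise i, column i of sim_pos is the positive; all other entailments
    # and all contradictions in the batch act as negatives.
    logits = torch.cat([sim_pos, sim_neg], dim=1)  # (batch, 2 * batch)
    labels = torch.arange(h.size(0), device=h.device)
    return F.cross_entropy(logits, labels)
```
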
The table below compares a number of common sentence embedding methods; SimCSE achieves the best results.

[Table: performance comparison of common sentence embedding methods]

Summary

SimCSE uses a very simple method, yet it works very well and can be applied widely. There are also many open-source reimplementations online that are worth studying.

Those interested can refer to the code on GitHub.
