[Paper Reading Notes 68] Sentence-BERT

1. Basic information

Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Author and unit: Nils Reimers and Iryna Gurevych, Darmstadt University of Technology, Germany
Source and year: EMNLP 2019

Citations: 1791
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. ArXiv, abs/1908.10084.

Paper link: https://aclanthology.org/D19-1410/#

Paper code: https://github.com/UKPLab/sentence-transformers

2. Key points

Research topic: semantic representation (sentence embeddings)
Problem background: BERT is very inefficient when computing sentence similarity
Core method: the model simply adds a pooling layer on top of BERT
Highlights: extensive experiments with good results, which suits industry needs very well
Dataset: STS
Conclusion: in some respects it works better than plain BERT
Paper type: method
Keywords: sentence similarity

3. Model (core content)

3.1 Model


The paper presents two model structures: the classification architecture (left in the paper's figures) and the regression/inference architecture (right). The regression model uses cosine similarity to compute the correlation between the two sentence embeddings.

SBERT obtains a fixed-length sentence embedding by adding a pooling operation on top of BERT. Three pooling strategies are used: CLS-token, MAX, and MEAN.

The paper reports that MEAN works best.
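As a concrete illustration, here is a minimal sketch of MEAN pooling on top of a Hugging Face BERT; the model name and the encode helper are my own choices for the example, not the paper's training code:

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Minimal sketch: MEAN-pool BERT token embeddings into one fixed-size
    # vector per sentence, ignoring padding positions via the attention mask.
    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    bert = AutoModel.from_pretrained('bert-base-uncased')

    def encode(sentences):
        batch = tokenizer(sentences, padding=True, truncation=True,
                          return_tensors='pt')
        with torch.no_grad():
            token_emb = bert(**batch).last_hidden_state  # (batch, seq, dim)
        mask = batch['attention_mask'].unsqueeze(-1).float()
        return (token_emb * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)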

3.2 Loss function

Three training objectives are introduced: a classification (softmax) objective, a regression objective, and a triplet objective.
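Concretely: the classification objective trains softmax(W_t · (u, v, |u − v|)) with cross-entropy; the regression objective minimizes the mean-squared error between cos(u, v) and the gold similarity; the triplet objective minimizes max(‖s_a − s_p‖ − ‖s_a − s_n‖ + ε, 0), with ε = 1 in the paper. A minimal PyTorch sketch (the function names are mine; W_t stands for the paper's trainable weight matrix):

    import torch
    import torch.nn.functional as F

    # Sketch of SBERT's three training objectives; u, v, anchor, positive,
    # negative are pooled sentence embeddings of shape (batch, dim).

    def classification_loss(u, v, labels, W_t):
        # softmax(W_t (u, v, |u - v|)) with cross-entropy;
        # W_t is an nn.Linear(3 * dim, num_labels) trainable layer
        features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
        return F.cross_entropy(W_t(features), labels)

    def regression_loss(u, v, gold_scores):
        # mean-squared error between cosine similarity and the gold score
        return F.mse_loss(F.cosine_similarity(u, v), gold_scores)

    def triplet_loss(anchor, positive, negative, margin=1.0):
        # max(||a - p|| - ||a - n|| + margin, 0); the paper sets margin = 1
        d_pos = F.pairwise_distance(anchor, positive)
        d_neg = F.pairwise_distance(anchor, negative)
        return F.relu(d_pos - d_neg + margin).mean()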

4. Experiment and analysis

4.1 Dataset

STS (the SemEval STS tasks and the STS benchmark), AFS (Argument Facet Similarity), and Wikipedia section triplets

4.2 Ablation studies of the model


Adding the element-wise product u∗v makes SBERT slightly worse;
the |u − v| feature is very important to SBERT;
the MAX strategy is better than MEAN in the BiLSTM setting (InferSent uses a BiLSTM).

4.3 Results on the SentEval platform

SentEval is a toolkit for evaluating sentence embeddings.
SBERT improves the average SentEval score by about 2 points over previous sentence embedding methods (InferSent, Universal Sentence Encoder).
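For reference, SentEval evaluates any encoder through a user-supplied batcher function; here is a sketch, assuming the facebookresearch/SentEval package is installed with its task data under ./data:

    import numpy as np
    import senteval
    from sentence_transformers import SentenceTransformer

    # SentEval repeatedly calls batcher() and trains simple classifiers
    # on the returned embeddings for each transfer task.
    model = SentenceTransformer('all-MiniLM-L6-v2')

    def prepare(params, samples):
        pass  # no task-specific preparation needed for this encoder

    def batcher(params, batch):
        # SentEval passes batches as lists of tokenized sentences
        sentences = [' '.join(tokens) for tokens in batch]
        return np.asarray(model.encode(sentences))

    params = {'task_path': './data', 'usepytorch': True, 'kfold': 5}
    se = senteval.engine.SE(params, batcher, prepare)
    print(se.eval(['MR', 'CR']))  # two of the seven SentEval transfer tasks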

4.4 Unsupervised STS vs. Supervised STS

Unsupervised

"Unsupervised" here means that the STS train and dev sets are not used for training; the model is evaluated directly on the STS test set, by computing the cosine similarity of the two sentence embeddings and reporting the Spearman rank correlation with the gold labels.
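A sketch of that protocol; the sentence pairs and gold scores below are made-up placeholders:

    import numpy as np
    from scipy.stats import spearmanr
    from sentence_transformers import SentenceTransformer

    # Unsupervised STS protocol: embed each test pair, score with cosine
    # similarity, report Spearman correlation against the gold labels.
    model = SentenceTransformer('all-MiniLM-L6-v2')
    pairs = [('A man is playing a guitar.', 'A person plays guitar.'),
             ('A dog runs in the park.', 'The stock market fell today.'),
             ('Two kids play soccer.', 'Children are playing football.')]
    gold = [4.8, 0.2, 4.5]  # hypothetical gold scores on the 0-5 scale

    emb1 = model.encode([a for a, _ in pairs])
    emb2 = model.encode([b for _, b in pairs])
    cos = np.sum(emb1 * emb2, axis=1) / (
        np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1))
    print('Spearman rho:', spearmanr(cos, gold).correlation)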

Supervised

"Supervised" means that SBERT is additionally fine-tuned on the STS benchmark training set before being evaluated on its test set.

5. Code

The accompanying library is indeed friendly to use:

    from sentence_transformers import SentenceTransformer

    # Load a pre-trained SBERT model (downloaded from the model hub on first use)
    model = SentenceTransformer('all-MiniLM-L6-v2')

    sentences = ['This framework generates embeddings for each input sentence',
                 'Sentences are passed as a list of string.',
                 'The quick brown fox jumps over the lazy dog.']

    # encode() returns one fixed-size embedding vector per input sentence
    sentence_embeddings = model.encode(sentences)

    for sentence, embedding in zip(sentences, sentence_embeddings):
        print("Sentence:", sentence)
        print("Embedding:", embedding)
        print("")

6. Summary

This is a very practical paper. The model is not complicated, and from an academic point of view there are not many novel ideas, but it is easy to use. From an engineering point of view I like this paper very much: it is simple and effective, and I like things that are simple and effective.

6.1 Strengths

Code is released and easy to use; it can meet many practical needs.

6.2 Weaknesses

Experiments on the triplet objective are scarce in the paper.
In addition, judged as an academic paper, it offers almost no theoretical innovation.

7. Knowledge collation (knowledge points, literature to be read, extracting the original text)

Sentence embeddings: Skip-Thought (encoder-decoder architecture), InferSent (siamese BiLSTM network), poly-encoders.

8. References

【1】 Ryan Kiros, Yukun Zhu, Ruslan R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-Thought Vectors. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3294–3302. Curran Associates, Inc.
【2】Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, and Antoine Bordes. 2017. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 670–680, Copenhagen, Denmark. Association for Computational Linguistics.

made by happyprince
