1. Basic information
| Topic | Authors and affiliation | Venue | Year |
|---|---|---|---|
| Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | Nils Reimers & Iryna Gurevych, Darmstadt University of Technology, Germany | EMNLP | 2019 |
1791 Citations
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. ArXiv, abs/1908.10084.
Paper link: https://aclanthology.org/D19-1410/#
Paper code: https://github.com/UKPLab/sentence-transformers
2. Key points
| Research topic | Problem background | Core method | Highlights | Datasets | Conclusion | Paper type | Keywords |
|---|---|---|---|---|---|---|---|
| Semantic sentence representation | BERT must encode sentence pairs jointly, which makes pairwise similarity computation extremely inefficient | Adds a pooling layer on top of BERT in a siamese setup | Extensive experiments with strong results, well suited to industrial needs | STS | Outperforms plain BERT embeddings in several settings | Method | sentence similarity |
3. Model (core content)
3.1 Model
There are two model structures: the classification objective on the left, and the regression/inference setup on the right. The regression model computes relatedness with the cosine function.
SBERT obtains a fixed-length sentence embedding by adding a pooling operation on top of BERT. Three pooling strategies are evaluated: CLS-token, MAX, and MEAN.
The paper reports that MEAN works best.
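A minimal sketch of the MEAN pooling step in pure PyTorch, with a toy tensor standing in for BERT's token-level output (padding positions are masked out so they do not dilute the average):

```python
import torch

def mean_pooling(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) BERT output
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).float()          # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)        # sum over real tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)             # number of real tokens
    return summed / counts                               # (batch, dim)

# toy example: 1 sentence, 4 token slots (the last is padding), dim 3
emb = torch.tensor([[[1., 2., 3.], [3., 2., 1.], [2., 2., 2.], [9., 9., 9.]]])
mask = torch.tensor([[1, 1, 1, 0]])
print(mean_pooling(emb, mask))  # -> tensor([[2., 2., 2.]])
```

The padded `[9., 9., 9.]` token is excluded by the mask, so only the three real tokens contribute to the average.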
3.2 Loss function
Three objective functions are introduced: classification (softmax over the concatenation (u, v, |u−v|)), regression (cosine similarity trained with mean-squared error), and a triplet loss.
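The three objectives can be sketched as below. The dimensions and random tensors here are toy placeholders (the paper uses BERT's 768-dimensional embeddings and real labels):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim, num_labels = 4, 3            # toy sizes, not the paper's 768-dim output
u = torch.randn(2, dim)           # sentence A embeddings (batch of 2)
v = torch.randn(2, dim)           # sentence B embeddings

# 1) classification objective: softmax(W [u; v; |u - v|]) with cross-entropy
clf = nn.Linear(3 * dim, num_labels)
logits = clf(torch.cat([u, v, torch.abs(u - v)], dim=1))
loss_cls = nn.CrossEntropyLoss()(logits, torch.tensor([0, 2]))

# 2) regression objective: cosine similarity trained with MSE against gold scores
cos = nn.functional.cosine_similarity(u, v)
loss_reg = nn.MSELoss()(cos, torch.tensor([0.8, 0.1]))

# 3) triplet objective: pull the anchor toward the positive, away from the negative
a, p, n = torch.randn(2, dim), torch.randn(2, dim), torch.randn(2, dim)
loss_tri = nn.TripletMarginLoss(margin=1.0)(a, p, n)

print(loss_cls.item(), loss_reg.item(), loss_tri.item())
```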
4. Experiment and analysis
4.1 Dataset
STS, AFS (Argument Facet Similarity), and a Wikipedia sections dataset
4.2 Strategy research of the model
Adding the element-wise product u∗v makes SBERT slightly worse;
the |u−v| feature is the most important one for SBERT;
the MAX strategy beats MEAN for BiLSTM (InferSent uses a BiLSTM).
4.3 The effect of SentEval platform
SentEval is a toolkit for evaluating sentence embeddings;
SBERT gains about two points on SentEval.
4.4 Unsupervised STS and Supervised STS
Unsupervised
"Unsupervised" here means that the STS train and dev sets are not used for training; the model is evaluated directly on the STS test set.
Supervised
5. Code
The library is indeed easy to use:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

sentences = ['This framework generates embeddings for each input sentence',
             'Sentences are passed as a list of string.',
             'The quick brown fox jumps over the lazy dog.']

sentence_embeddings = model.encode(sentences)

for sentence, embedding in zip(sentences, sentence_embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")
```
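Once embeddings are computed, semantic similarity is just a cosine between vectors. Here is a sketch with NumPy, using toy vectors standing in for `model.encode(...)` output (sentence-transformers also ships a `util.cos_sim` helper for the same purpose):

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy embeddings standing in for model.encode(...) output
e1 = np.array([1.0, 0.0, 1.0])
e2 = np.array([1.0, 0.1, 0.9])   # close in direction to e1
e3 = np.array([-1.0, 0.5, 0.0])  # far from e1

print(cos_sim(e1, e2))  # near 1.0: semantically close
print(cos_sim(e1, e3))  # lower: semantically distant
```

Because SBERT only needs one forward pass per sentence, embeddings can be cached and compared with this cheap cosine operation, which is the source of its large speedup over a BERT cross-encoder.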
6. Summary
This is a very practical paper. The model is not complicated, and from an academic point of view there are not many novel ideas, but from an engineering point of view it is simple, effective, and easy to use. I like things that are simple and effective.
6.1 Strengths
Code is released and easy to use; it can meet many practical needs.
6.2 Weaknesses
No experiments are reported for the triplet samples.
In addition, viewed as an academic paper, it offers almost no theoretical innovation.
7. Knowledge collation (knowledge points, literature to be read, extracting the original text)
Sentence embedding methods: Skip-Thought (encoder-decoder architecture), InferSent (siamese BiLSTM network), poly-encoders
8. References
[1] Ryan Kiros, Yukun Zhu, Ruslan R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-Thought Vectors. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3294–3302. Curran Associates, Inc.
[2] Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, and Antoine Bordes. 2017. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 670–680, Copenhagen, Denmark. Association for Computational Linguistics.
made by happyprince