In-depth understanding of deep learning - BERT (Bidirectional Encoder Representations from Transformers): fine-tuning training - [sentence pair classification]

Category: general catalog of the "In-depth Understanding of Deep Learning" series
Related Articles:
BERT (Bidirectional Encoder Representations from Transformers): Basic Knowledge
BERT (Bidirectional Encoder Representations from Transformers): BERT Structure
BERT (Bidirectional Encoder Representations from Transformers): MLM (Masked Language Model)
BERT (Bidirectional Encoder Representations from Transformers): NSP (Next Sentence Prediction) task
BERT (Bidirectional Encoder Representations from Transformers): input representation
BERT (Bidirectional Encoder Representations from Transformers): fine-tuning training - [sentence pair classification]
BERT (Bidirectional Encoder Representations from Transformers): fine-tuning training - [single sentence classification]
BERT (Bidirectional Encoder Representations from Transformers): fine-tuning training - [text Q&A]
BERT (Bidirectional Encoder Representations from Transformers): fine-tuning training - [single sentence annotation]
BERT (Bidirectional Encoder Representations from Transformers): Model summary and precautions


According to the input and output forms of downstream natural language processing tasks, BERT divides the tasks supported by fine-tuning into four categories: sentence pair classification, single sentence classification, text question answering, and single sentence annotation. This article introduces fine-tuning for sentence pair classification; the other task types are covered in other articles of the "In-depth Understanding of Deep Learning - BERT (Bidirectional Encoder Representations from Transformers)" series.

Given two sentences, tasks that judge the relationship between them are collectively called sentence pair classification. Common tasks include:

  • Multi-Genre Natural Language Inference (MNLI): given a sentence pair, determine whether their relationship is entailment, contradiction, or neutral; a three-class task.
  • Quora Question Pairs (QQP): given a pair of questions, determine whether they are semantically similar; a binary classification task.
  • Question Natural Language Inference (QNLI): given a sentence pair, determine whether the second sentence answers the question posed by the first; a binary classification task.
  • Semantic Textual Similarity Benchmark (STS-B): given a sentence pair, grade their similarity on a five-point scale; treated here as a five-class task.
  • Microsoft Research Paraphrase Corpus (MRPC): given a sentence pair, determine whether they are semantically equivalent; a binary classification task.
  • Recognizing Textual Entailment (RTE): given a sentence pair, determine whether an entailment relationship holds between them; a binary classification task.
  • Situations With Adversarial Generations (SWAG): given a sentence $A$ and four candidate sentences $B$, select the best $B$ according to semantic coherence. The task can be converted into computing a matching score between $A$ and each candidate, so it can be treated as a multi-class task (see the sketch after this list).
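
As an illustration of that conversion, here is a minimal sketch using the Hugging Face transformers library; the library choice, model name, and example sentences are assumptions for illustration, not part of the original article. Each $(A, B_i)$ pair is encoded in the [CLS]/[SEP] format, and one matching score per candidate is produced from its [CLS] vector:

```python
# A hedged sketch of treating a SWAG-style task as matching-score
# classification, using the Hugging Face transformers library (an
# assumption; the article describes the idea, not a specific API).
import torch
from transformers import BertTokenizer, BertForMultipleChoice

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# The multiple-choice head is randomly initialized here, so its scores
# are meaningless until the model is fine-tuned on SWAG-style data.
model = BertForMultipleChoice.from_pretrained("bert-base-uncased")

context = "She opened her umbrella"          # sentence A (illustrative)
choices = [                                  # four candidate sentences B
    "because it started to rain.",
    "because the sun had set.",
    "and ate it for lunch.",
    "and drove it to work.",
]

# Encode each (A, B_i) pair as "[CLS] A [SEP] B_i [SEP]".
enc = tokenizer([context] * len(choices), choices,
                return_tensors="pt", padding=True)
# BertForMultipleChoice expects shape (batch, num_choices, seq_len).
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits          # one matching score per candidate
best = torch.argmax(logits, dim=-1).item()   # index of the best-matching B
print(best)
```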

For sentence pair classification, BERT is well prepared by its pre-training: the NSP objective gives the model the ability to directly capture the semantic relationship between a pair of sentences. For binary classification tasks, BERT therefore needs no change to the structure of its input or output data and can reuse the same input/output structure as NSP. As shown in the figure below, the two sentences are spliced into one input sequence with the [SEP] delimiter and the [CLS] token is prepended; the output value corresponding to [CLS] is taken as the predicted classification label, and the cross entropy between the predicted and true labels is used as the optimization objective for fine-tuning on the task data. For multi-class tasks, a fully connected layer and a Softmax layer are attached after the output feature vector of the [CLS] token so that the output dimension matches the number of categories, and the predicted category is then obtained with an $\arg\max$ operation. An example sentence pair similarity task is given below, focusing on the format of the input and output data:

Task: determine whether the sentence "I like you very much" and the sentence "I really like you" are similar
Input rewriting: "[CLS] I like you very much [SEP] I really like you [SEP]"
Take the output corresponding to the [CLS] token: $[0.02, 0.98]$; the $\arg\max$ operation yields class 1 (category indices start from 0), i.e., the two sentences are similar

[Figure: sentence pair classification]
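
To make the format concrete, here is a minimal sketch of this pipeline using the Hugging Face transformers library (an assumption; the article does not prescribe a framework). The model name is illustrative, and the output probabilities would match the example above only after fine-tuning:

```python
# A minimal sketch of binary sentence pair classification with BERT.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels=2 for the binary similar/dissimilar decision; for a
# multi-class task, set num_labels to the number of categories.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Passing two sentences builds "[CLS] A [SEP] B [SEP]" automatically.
inputs = tokenizer("I like you very much", "I really like you",
                   return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, 2)
probs = torch.softmax(logits, dim=-1)      # e.g. [[0.02, 0.98]] after fine-tuning
pred = torch.argmax(probs, dim=-1).item()  # 1 => similar (indices start at 0)

# During fine-tuning, supplying labels makes the model return the
# cross-entropy loss described above as the optimization objective.
loss = model(**inputs, labels=torch.tensor([1])).loss
```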

