In-Depth Understanding of Deep Learning - BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Single-Sentence Classification]

Category: General catalog of the "In-Depth Understanding of Deep Learning" series
Related articles:
BERT (Bidirectional Encoder Representations from Transformers): Basic Knowledge
BERT (Bidirectional Encoder Representations from Transformers): BERT Structure
BERT (Bidirectional Encoder Representations from Transformers): MLM (Masked Language Model)
BERT (Bidirectional Encoder Representations from Transformers): NSP (Next Sentence Prediction) Task
BERT (Bidirectional Encoder Representations from Transformers): Input Representation
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Sentence-Pair Classification]
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Single-Sentence Classification]
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Text Q&A]
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Single-Sentence Annotation]
BERT (Bidirectional Encoder Representations from Transformers): Model Summary and Precautions


Based on the input and output forms of downstream natural language processing tasks, BERT divides the tasks supported by fine-tuning into four categories: sentence-pair classification, single-sentence classification, text question answering, and single-sentence annotation. This article introduces fine-tuning for single-sentence classification; the other task types are covered in other articles of the "In-Depth Understanding of Deep Learning - BERT (Bidirectional Encoder Representations from Transformers)" series.

Tasks that take a single sentence as input and predict its category are collectively referred to as single-sentence classification. Common examples include:

  • Stanford Sentiment Treebank (SST-2): given a single sentence, predict its sentiment polarity; a binary classification task.
  • Corpus of Linguistic Acceptability (CoLA): given a single sentence, judge whether it is a linguistically acceptable, semantically coherent sentence; also a binary classification task (both datasets are shown in the loading sketch after this list).
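
To make the data format concrete, here is a small sketch (an assumption of this article, not part of the original series) that loads SST-2 and CoLA from the GLUE benchmark with the Hugging Face `datasets` library; both are single-sentence binary classification datasets with a `sentence` field and a 0/1 `label` field.

```python
# Illustrative only: load the two single-sentence classification datasets from GLUE.
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")   # sentiment: label 0 = negative, 1 = positive
cola = load_dataset("glue", "cola")   # acceptability: label 0 = unacceptable, 1 = acceptable

print(sst2["train"][0])   # {'sentence': '...', 'label': 0 or 1, 'idx': ...}
print(cola["train"][0])   # {'sentence': '...', 'label': 0 or 1, 'idx': ...}
```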

Although BERT was not specifically optimized for single-sentence classification during pre-training, the NSP objective taught it to use the [CLS] classification token to capture sentence-pair relationships, and in doing so it also learned to extract and aggregate the semantic information of a single sentence. Therefore, for single-sentence binary classification, no changes to the structure of BERT's input and output data are needed. As shown in the figure below, single-sentence classification takes the output feature of the sentence-head token [CLS] as the classification representation, computes the cross-entropy between the predicted classification label and the true label, uses it as the optimization target, and fine-tunes on the task data. Likewise, for multi-class classification tasks, a fully connected layer and a Softmax layer are appended after the output feature vector of the [CLS] token so that the output dimension matches the number of categories. An example of the semantic coherence judgment task is given below, focusing on the format of the input and output data; a minimal code sketch of the classification head follows the figure:

Task: judge whether the sentence "Haida football star rice and tea" is a semantically coherent sentence.
Input (rewritten): "[CLS] Haida football star rice and tea"
Output corresponding to the "[CLS]" token: $[0.99, 0.01]$; applying $\arg\max$ gives the predicted class 0, i.e., this sentence is not a semantically coherent sentence.

Figure: single-sentence classification fine-tuning
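
The following is a minimal sketch, assuming PyTorch and the Hugging Face `transformers` library (the model name `bert-base-uncased`, the 2-class setup, and the sample sentence are illustrative assumptions, not from the article), showing the structure described above: the output feature of the [CLS] token is fed to a fully connected layer, cross-entropy against the true label is the optimization target, and argmax over the logits yields the predicted class.

```python
# Minimal sketch of a single-sentence classifier on top of BERT (assumptions noted above).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertSingleSentenceClassifier(nn.Module):
    def __init__(self, pretrained="bert-base-uncased", num_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        # Fully connected layer mapping the [CLS] feature to class logits;
        # CrossEntropyLoss applies the Softmax internally, so raw logits are returned.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_feature = outputs.last_hidden_state[:, 0]   # [CLS] is the first token
        return self.classifier(cls_feature)             # logits, shape (batch, num_classes)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertSingleSentenceClassifier()

# The tokenizer prepends [CLS] and appends [SEP] automatically.
batch = tokenizer(["this movie was great"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])

# Cross-entropy between the predicted logits and the true label is the fine-tuning objective.
loss = nn.CrossEntropyLoss()(logits, torch.tensor([1]))
pred = logits.argmax(dim=-1)   # argmax over the class logits gives the predicted class
```

In practice, the same structure is also available as the library's built-in `BertForSequenceClassification`, which likewise attaches a classification head to the [CLS]-based pooled output.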

