In-Depth Understanding of Deep Learning - BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Single Sentence Annotation]

Category: General Catalog of "In-depth Understanding of Deep Learning"
Related Articles:
BERT (Bidirectional Encoder Representations from Transformers): Basic Knowledge
BERT (Bidirectional Encoder Representations from Transformers): BERT Structure
BERT (Bidirectional Encoder Representations from Transformers): MLM (Masked Language Model) Task
BERT (Bidirectional Encoder Representations from Transformers): NSP (Next Sentence Prediction) Task
BERT (Bidirectional Encoder Representations from Transformers): Input Representation
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Sentence Pair Classification]
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Single Sentence Classification]
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Text Q&A]
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Single Sentence Annotation]
BERT (Bidirectional Encoder Representations from Transformers): Model Summary and Precautions


According to the input and output forms of downstream natural language processing tasks, BERT divides the tasks supported by fine-tuning into four categories: sentence pair classification, single sentence classification, text question answering, and single sentence annotation. This article introduces fine-tuning for single sentence annotation; the other task types are covered in other articles of the "In-Depth Understanding of Deep Learning - BERT (Bidirectional Encoder Representations from Transformers)" series.

Given a sentence, assigning a label to every word in it is called single sentence annotation (sequence labeling). A common benchmark is CoNLL 2003: given a sentence, mark the person names, place names, and organization names that appear in it. The single sentence annotation task differs considerably from BERT's pre-training tasks, but it is similar to the text question answering task. To perform single sentence annotation, a fully connected layer is added on top of the final semantic feature vector of each token, converting the semantic features into the features required by the sequence labeling task. Unlike text question answering, single sentence annotation labels every token independently, so no horizontal comparison across positions is needed; that is, no auxiliary vectors are introduced, and a Softmax is applied directly to the output of the fully connected layer to obtain the probability distribution over the label set, as shown in the figure below.
Figure: Single sentence annotation
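To make the structure just described concrete, here is a minimal PyTorch sketch assuming the Hugging Face transformers library. The class name `BertTokenTagger`, the `bert-base-cased` checkpoint, and the 13-label output size are illustrative assumptions rather than details from the original article: a fully connected layer maps each token's final semantic vector to label logits, and a Softmax turns them into a probability distribution.

```python
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class BertTokenTagger(nn.Module):
    """BERT encoder followed by a per-token fully connected layer for sequence labeling (illustrative sketch)."""
    def __init__(self, model_name="bert-base-cased", num_labels=13):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        # The fully connected layer maps each token's final semantic vector to label logits.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        logits = self.classifier(hidden)       # shape: (batch, seq_len, num_labels)
        return logits.softmax(dim=-1)          # per-token probability distribution over labels

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertTokenTagger()
inputs = tokenizer("Einstein gave a speech in Berlin", return_tensors="pt")
probs = model(inputs["input_ids"], inputs["attention_mask"])
```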
The CoNLL 2003 task requires labeling whether each word is a person name (PER, Person), a place name (LOC, Location), or an organization name (ORG, Organization). Since BERT segments the input text, a single word may be split into several subword tokens, so BERT predicts 5 categories (subdivided into 13 concrete labels); a short sketch of the corresponding label set follows the list:

  • O (not part of any person, place, or organization name; O means Other)
  • B-PER/LOC/ORG (first token of a person/place/organization name; B means Begin)
  • I-PER/LOC/ORG (middle token of a person/place/organization name; I means Intermediate)
  • E-PER/LOC/ORG (last token of a person/place/organization name; E means End)
  • S-PER/LOC/ORG (single-token person/place/organization name; S means Single)
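For reference, the 13 labels can be enumerated programmatically. The snippet below is only an illustrative construction of the IOBES label set with an arbitrary id mapping; it is not part of the original article.

```python
# Build the 13-label IOBES tag set for CoNLL 2003: O + {B, I, E, S} x {PER, LOC, ORG}.
ENTITY_TYPES = ["PER", "LOC", "ORG"]
PREFIXES = ["B", "I", "E", "S"]

LABELS = ["O"] + [f"{p}-{t}" for p in PREFIXES for t in ENTITY_TYPES]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

assert len(LABELS) == 13
```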

Combining the first letters of the five categories gives IOBES, the most commonly used labeling scheme for sequence labeling. Besides sequence labeling, BERT can also be applied to tasks such as new word discovery and keyword extraction. An example of a NER task is given below:

Task: Given the sentence "Einstein gave a speech in Berlin", label its NER entities with the IOBES scheme.
Rewrite the sentence as the input "[CLS] Einstein gave a speech in Berlin", take $\arg\max$ of each token's Softmax output, and the final NER labeling result is: "Einstein" is a person name and "Berlin" is a place name.

Figure: BERT Softmax results
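The sketch below illustrates the arg max step and a simple IOBES decoding pass for this example. It reuses the hypothetical `model`, `tokenizer`, and `id2label` from the earlier snippets, and the merging of subword predictions into entities is one simplified way to do it, not the article's own implementation.

```python
import torch

# Assumes `model`, `tokenizer`, and `id2label` from the earlier sketches (untrained weights here,
# so the printed labels are only meaningful after fine-tuning on CoNLL 2003).
sentence = "Einstein gave a speech in Berlin"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    probs = model(inputs["input_ids"], inputs["attention_mask"])   # (1, seq_len, 13)

pred_ids = probs.argmax(dim=-1)[0].tolist()                        # arg max over the Softmax output
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

# Merge subword predictions into entities using the IOBES prefixes (simplified decoding).
entities = []
for token, label_id in zip(tokens, pred_ids):
    label = id2label[label_id]
    if token in ("[CLS]", "[SEP]") or label == "O":
        continue
    prefix, entity_type = label.split("-")                         # e.g. "S", "PER"
    if prefix in ("B", "S") or not entities:
        entities.append([token.lstrip("#"), entity_type])
    else:                                                          # "I" or "E": extend current entity
        entities[-1][0] += token.lstrip("#")

print(entities)   # after fine-tuning, e.g. [['Einstein', 'PER'], ['Berlin', 'LOC']]
```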

