In-depth understanding of deep learning - BERT (Bidirectional Encoder Representations from Transformers): input representation

Category: "In-depth Understanding of Deep Learning" series (see the general catalog)
Related Articles:
BERT (Bidirectional Encoder Representations from Transformers): Basic Knowledge
BERT (Bidirectional Encoder Representations from Transformers): BERT Structure
BERT (Bidirectional Encoder Representations from Transformers): MLM (Masked Language Model)
BERT (Bidirectional Encoder Representations from Transformers): NSP (Next Sentence Prediction) task
BERT (Bidirectional Encoder Representations from Transformers): input representation
BERT (Bidirectional Encoder Representations from Transformers): Fine-tuning training - [Sentence-pair classification]
BERT (Bidirectional Encoder Representations from Transformers): Fine-tuning training - [Single-sentence classification]
BERT (Bidirectional Encoder Representations from Transformers): Fine-tuning training - [Text Q&A]
BERT (Bidirectional Encoder Representations from Transformers): Fine-tuning training - [Single-sentence annotation]
BERT (Bidirectional Encoder Representations from Transformers): Model summary and precautions


BERT used " In-depth understanding of deep learning - BERT (Bidirectional Encoder Representations from Transform): MLM (Masked Language Model)" and " In-depth understanding of deep learning - BERT (Bidirectional Encoder Representations from Transform): NSP (Next The two training methods described in the Sentence Prediction task , in the process of real training, the two methods are mixed together. The Self-attention introduced in " In-depth Understanding of Deep Learning - Attention Mechanism: Self-attention" does not consider the position information of words, so Transformer needs two sets of Embedding operations, one for One -hot vocabulary mapping encoding (marked as Token Embeddings in the figure below), and the other set is position encoding (marked as Position Embeddings in the figure below). At the same time, in the training process of MLM, there are single-sentence input and double-sentence input, so BERT also needs a set of segmentation codes to distinguish input sentences (marked as Segment Embeddings in the figure below). BERT's Embedding process includes three sets of Embedding operations, as shown in the figure below.
BERT's Embedding process
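The following is a minimal sketch (in PyTorch, not the released BERT code) of how the three embedding tables can be combined. The vocabulary size, maximum sequence length, and hidden size follow BERT-base defaults, but the module itself is simplified for illustration.

```python
import torch
import torch.nn as nn

class InputEmbeddings(nn.Module):
    """Simplified sketch of BERT's input embedding: token + segment + position."""
    def __init__(self, vocab_size=30522, max_len=512, hidden=768):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)   # Token Embeddings
        self.segment = nn.Embedding(2, hidden)          # Segment Embeddings (sentence A / B)
        self.position = nn.Embedding(max_len, hidden)   # Position Embeddings (learned)

    def forward(self, token_ids, segment_ids):
        # Position ids are simply 0, 1, 2, ... along the sequence dimension.
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        positions = positions.unsqueeze(0).expand_as(token_ids)
        # The three embeddings are summed element-wise to form the input representation.
        return self.token(token_ids) + self.segment(segment_ids) + self.position(positions)

# Example: one sequence of 11 tokens, the first 6 belonging to sentence A.
emb = InputEmbeddings()
token_ids = torch.randint(0, 30522, (1, 11))
segment_ids = torch.tensor([[0] * 6 + [1] * 5])
print(emb(token_ids, segment_ids).shape)  # torch.Size([1, 11, 768])
```

The released BERT implementation additionally applies layer normalization and dropout to the summed embeddings, which this sketch omits for brevity.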
Taking the sample data in the figure above as the original input, the final BERT input representation can be obtained through the following five steps; a tokenization sketch in code follows the list.

  1. Obtain the original input sentence pair "my dog is cute" and "he likes playing".
  2. Apply WordPiece tokenization to the input sentences, which become "my dog is cute" and "he likes play ##ing".
  3. Concatenate the sentence pair and add the special classification tag and separators, obtaining "[CLS] my dog is cute [SEP] he likes play ##ing [SEP]".
  4. Compute the Position Embeddings, Segment Embeddings, and Token Embeddings for each token, as shown in the gray, green, and yellow areas in the figure above.
  5. Add the three Embedding representations element-wise to obtain the final BERT input representation.
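As a concrete illustration of steps 1 through 3, the sketch below uses the Hugging Face transformers library with the bert-base-uncased checkpoint (an assumption; the original article does not use this library). Note that the exact subword split, for example whether "playing" stays whole or becomes "play ##ing", depends on that checkpoint's WordPiece vocabulary.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Passing two sentences makes the tokenizer insert [CLS]/[SEP] and build segment ids.
enc = tokenizer("my dog is cute", "he likes playing")

# Tokens: [CLS] + sentence-A subwords + [SEP] + sentence-B subwords + [SEP]
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))

# token_type_ids feed the Segment Embeddings: 0 for sentence A (including [CLS]
# and the first [SEP]), 1 for sentence B (including the final [SEP]).
print(enc["token_type_ids"])
```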

It is worth noting that the Transformer generally uses fixed sinusoidal (trigonometric) functions for its position encoding, whereas the position encoding and segment encoding used by BERT are learned during pre-training, which gives them a stronger ability to represent position information.
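To make the contrast concrete, here is a small sketch (PyTorch, purely illustrative) of the two approaches: the fixed sinusoidal encoding of the original Transformer has no trainable parameters, while BERT's position and segment encodings are ordinary embedding tables whose rows are updated by back-propagation during pre-training.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_position_encoding(max_len=512, d_model=768):
    """Fixed encoding from the original Transformer: sin/cos at varying frequencies."""
    position = torch.arange(max_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # no parameters: nothing is updated during training

# BERT instead treats positions like vocabulary entries: a trainable lookup table.
learned_position_embedding = nn.Embedding(512, 768)
```

One consequence of this design choice is that the learned table is tied to a fixed maximum length (512 positions in BERT-base), so BERT cannot directly handle longer sequences, whereas the sinusoidal form can be extended to any length.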

