In-depth understanding of deep learning - BERT (Bidirectional Encoder Representations from Transformers): model summary and precautions

Category: General Catalog of "In-depth Understanding of Deep Learning"
Related Articles:
BERT (Bidirectional Encoder Representations from Transformers): Basic Knowledge
BERT (Bidirectional Encoder Representations from Transformers): BERT Structure
BERT (Bidirectional Encoder Representations from Transformers): MLM (Masked Language Model)
BERT (Bidirectional Encoder Representations from Transformers): NSP (Next Sentence Prediction) Task
BERT (Bidirectional Encoder Representations from Transformers): Input Representation
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Sentence Pair Classification]
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Single Sentence Classification]
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Text Q&A]
BERT (Bidirectional Encoder Representations from Transformers): Fine-Tuning Training - [Single Sentence Annotation]
BERT (Bidirectional Encoder Representations from Transformers): Model Summary and Precautions


Many excellent pre-trained language models emerged in 2018. ELMo and GPT each brought their own surprises to natural language processing, but the most influential model of the year, a true milestone for the field, was BERT. Before BERT, because task requirements differed so much, the best results were usually obtained with models tailored to a specific domain or even a specific task, and these architectures varied widely from one another. BERT ended this free-for-all: with its two-stage approach of pre-training followed by fine-tuning, it can readily match or even surpass state-of-the-art (SOTA) performance across many areas. With BERT, the pre-trained language model formally took center stage in natural language processing.

For an individual, collecting the corpus and securing the computing resources needed to train BERT are extremely difficult. Pre-trained language models were proposed precisely to offer a general-purpose model, not yet tailored to any particular task, that readers can adapt to their own work. Understanding the details of BERT is therefore not about training a BERT from scratch, but about using BERT better. Here are a few considerations for using BERT (a short code sketch follows the list):

  • Keep input sentences short (no more than about 250 words); a single sentence or a small paragraph works best and avoids BERT's weakness on long text.
  • BERT is not suitable for tasks that rely on sociological or world experience; it is suited to tasks that can be solved by analyzing the semantic information of the sentences alone.
  • Avoid generative tasks: BERT's encoder-only structure does not support generation (GPT is far better suited to generative tasks).
  • BERT is well suited to tasks that involve judging the semantic connection between sentences.
  • BERT is well suited to tasks that require deep semantic understanding of the input text.
  • Tasks that can be recast from single-sentence input to sentence-pair input fit BERT especially well, since BERT's training corpus consists mainly of sentence pairs.
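
As a concrete illustration of the sentence-length and sentence-pair points above, here is a minimal sketch of feeding a sentence pair to a BERT classifier with truncation enabled so the input stays within the model's length limit. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; the example sentences and the two-label setup are hypothetical, and the classification head is freshly initialized, so it would still need fine-tuning on labeled data before its predictions are meaningful.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Hypothetical sentence-pair task with two labels (e.g. "related" / "unrelated").
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Passing two strings builds the pair as [CLS] sentence A [SEP] sentence B [SEP];
# truncation keeps the sequence comfortably under BERT's 512-token ceiling.
inputs = tokenizer(
    "The company reported record revenue this quarter.",
    "The company performed well financially.",
    truncation=True,
    max_length=128,
    return_tensors="pt",
)

model.eval()
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_labels)

print(logits.argmax(dim=-1).item())  # predicted label index (arbitrary until fine-tuned)
```

The same call pattern covers single-sentence classification (pass a single string to the tokenizer); the fine-tuning recipes discussed in the earlier articles of this series all start from inputs prepared this way.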

