Paper notes: how BERT works and how it is applied; how ERNIE works and how it evolved

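ref:
A Detailed Look at the Evolution and Application Scenarios of ERNIE-Baidu (详解ERNIE-Baidu进化史及应用场景)
Innovations in the BERT Era: Comparing BERT Application Patterns, and More (Bert时代的创新：Bert应用模式比较及其它)
From Word Embedding to BERT: A History of Pre-training Techniques in Natural Language Processing (从Word Embedding到Bert模型—自然语言处理中的预训练技术发展史)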

1. ELMo: Embeddings from Language Models / Deep Contextualized Word Representations

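Compared with static word embeddings, ELMo brings context into the language model: a forward LSTM language model reads the context before the current word and a backward one reads the context after it. This lets the model infer a word's meaning from its context, which helps resolve ambiguity (polysemy).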
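For reference, the ELMo paper writes this bidirectional language model as one joint objective: the forward LSTM predicts each token $t_k$ from the tokens before it, the backward LSTM predicts it from the tokens after it, and both log-likelihoods are maximized together, with the token-representation parameters $\Theta_x$ and the softmax parameters $\Theta_s$ shared between the two directions:

$$
\sum_{k=1}^{N}\Big(\log p(t_k \mid t_1,\dots,t_{k-1};\,\Theta_x,\overrightarrow{\Theta}_{\mathrm{LSTM}},\Theta_s) + \log p(t_k \mid t_{k+1},\dots,t_N;\,\Theta_x,\overleftarrow{\Theta}_{\mathrm{LSTM}},\Theta_s)\Big)
$$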

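ELMo also uses feature-based pretraining: the outputs of every layer of the pretrained model (one matrix per layer) are taken directly and combined, and the combined result is fed into the downstream task. The advantage is that the downstream network architecture does not have to change to accommodate ELMo; the disadvantage is that this does not work as well as the fine-tuning approach.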
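The "combine the layer outputs" step can be sketched with the weighted sum from the ELMo paper, $\mathrm{ELMo}_k = \gamma \sum_j s_j h_{k,j}$, where the softmax-normalized layer weights $s_j$ and the scale $\gamma$ are learned by the downstream task while the biLM itself stays frozen. This is a minimal NumPy sketch under stated assumptions: the function name `elmo_scalar_mix` and the `(num_layers, seq_len, dim)` array layout are hypothetical, and the biLM that produces the layer outputs is not shown.

```python
import numpy as np


def elmo_scalar_mix(layer_outputs, raw_weights, gamma):
    """Collapse all biLM layers into one vector per token (hypothetical helper).

    layer_outputs: shape (num_layers, seq_len, dim), the h_{k,j} for layers j = 0..L
    raw_weights:   one unnormalized scalar per layer, learned by the downstream task
    gamma:         scalar that rescales the mixed vector, also learned by the task
    """
    h = np.asarray(layer_outputs, dtype=float)
    w = np.asarray(raw_weights, dtype=float)
    # softmax-normalize the layer weights so that they sum to 1
    s = np.exp(w - w.max())
    s /= s.sum()
    # weighted sum over the layer axis -> shape (seq_len, dim)
    return gamma * np.tensordot(s, h, axes=1)


# toy input: 3 layers (token layer + 2 LSTM layers), 4 tokens, 8-dim vectors
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4, 8))
mixed = elmo_scalar_mix(h, raw_weights=[0.0, 0.0, 0.0], gamma=1.0)
print(mixed.shape)  # (4, 8)
```

With equal raw weights the mix is simply the average of the layers; a task that prefers, say, the top LSTM layer can learn a larger weight for it.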

2. GPT: Generative Pre-Training + fine-tuning

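This paper introduced the use of a Transformer as the feature extractor for unidirectional (left-to-right) pretraining, followed by fine-tuning on downstream tasks. The setup is very close to BERT's, but GPT did not become as popular as BERT.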
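"Unidirectional" here means each position may only attend to itself and the positions to its left. A minimal NumPy sketch of that constraint, assuming attention scores have already been computed (the helper names `causal_mask` and `masked_softmax` are hypothetical):

```python
import numpy as np


def causal_mask(seq_len):
    """mask[i, j] is True when position i is allowed to attend to position j.

    In a left-to-right Transformer such as GPT, token i sees only tokens 0..i,
    so the allowed positions form a lower-triangular matrix.
    """
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))


def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight wherever mask is False."""
    scores = np.where(mask, scores, -np.inf)
    # the diagonal is always allowed, so every row max is finite
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


# with all-zero scores, row i puts weight 1/(i+1) on tokens 0..i and 0 on later tokens
weights = masked_softmax(np.zeros((4, 4)), causal_mask(4))
```

Dropping the mask (letting every row see every column) is exactly what turns this into the bidirectional attention BERT uses.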

3. BERT: Bidirectional Encoder Representations from Transformers

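It emphasizes (1) a pretrained base model and (2) fine-tuning on downstream tasks.
Its success comes from (1) using the Transformer and (2) processing the input bidirectionally.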

It is pretrained on two tasks:

masked language model: 15% of the input tokens are masked at random and the model predicts them
next sentence prediction: sentence pairs labeled IsNext/NotNext are used for training
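The BERT paper refines the 15% rule: of the selected positions, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged, so the model cannot rely on `[MASK]` always being present. A minimal sketch of that recipe, assuming whitespace tokens instead of WordPiece ids and selecting each position independently with probability 15% (a simplification; real implementations differ in exactly how they pick the positions). `bert_mask` is a hypothetical name.

```python
import random

MASK = "[MASK]"


def bert_mask(tokens, vocab, mask_prob=0.15, rng=None):
    """Apply BERT-style masking; returns (inputs, labels).

    Each position is selected for prediction with probability mask_prob.
    A selected token becomes [MASK] 80% of the time, a random vocab token 10%
    of the time, and stays unchanged 10% of the time. labels[i] holds the
    original token at selected positions and None elsewhere, since the MLM
    loss is computed only on selected positions.
    """
    rng = rng or random.Random()
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)
        # otherwise (10%): keep the original token
    return inputs, labels


inputs, labels = bert_mask("my dog is hairy".split(), vocab=["apple", "run", "blue"],
                           rng=random.Random(42))
```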

BERT encoder: a multi-layer bidirectional Transformer, with self-attention in both directions.

4. ERNIE:

ERNIE 1.0 (2019): a BERT-based model optimized for Chinese NLP tasks
It improves the masking method:

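basic-level masking, the same as in the original BERT
phrase-level masking, which masks whole local phrases
entity-level masking, which masks whole entities based on entity knowledge
(Notably, this differs from ERNIE-Tsinghua, which feeds knowledge-graph (KG) embeddings directly into the model.)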
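Phrase-level and entity-level masking differ from basic-level masking in the unit that is masked: a whole phrase or entity is hidden at once, so the model must recover it from the surrounding context rather than from its own remaining characters. A minimal sketch, assuming the span boundaries are already given (in practice they would come from a segmenter or entity recognizer, not shown) and that each span is masked independently with a fixed probability; `span_mask` is a hypothetical name, not ERNIE's actual code.

```python
import random

MASK = "[MASK]"


def span_mask(tokens, spans, mask_prob=0.15, rng=None):
    """Mask whole spans instead of single tokens; returns (inputs, labels).

    spans: list of (start, end) index pairs (end exclusive) marking phrases
    or entities. Each span is either masked completely or left intact.
    """
    rng = rng or random.Random()
    inputs, labels = list(tokens), [None] * len(tokens)
    for start, end in spans:
        if rng.random() < mask_prob:
            for i in range(start, end):
                labels[i] = tokens[i]
                inputs[i] = MASK
    return inputs, labels


# "哈尔滨是黑龙江的省会" (Harbin is the capital of Heilongjiang), character-level tokens
tokens = list("哈尔滨是黑龙江的省会")
entities = [(0, 3), (4, 7)]  # 哈尔滨, 黑龙江
inputs, labels = span_mask(tokens, entities, mask_prob=1.0, rng=random.Random(0))
```

With `mask_prob=1.0` both entities are hidden in full, while the characters between and after them stay visible.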

DLM task: dialogue language model task
ERNIE changes BERT's input format to use multi-turn dialogues, and adds a dialogue embedding that encodes the multi-turn dialogue structure.
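One way to picture the dialogue embedding: each token carries a role id (query "Q" or response "R") that indexes a small embedding table, playing the part that the segment id plays in BERT. A minimal sketch of building such an input from a multi-turn dialogue; the function name, the 0/1 role ids and the choice of which role `[CLS]` and `[SEP]` receive are all assumptions for illustration, not ERNIE's exact convention.

```python
def build_dialogue_input(turns):
    """Flatten a multi-turn dialogue into tokens plus per-token role ids.

    turns: list of (role, tokens) with role "Q" (query) or "R" (response).
    Returns (tokens, role_ids); role_ids would index a dialogue-embedding table.
    Assumed convention: [CLS] takes the first turn's role, and each [SEP]
    takes the role of the turn it closes.
    """
    role_to_id = {"Q": 0, "R": 1}
    tokens, role_ids = ["[CLS]"], [role_to_id[turns[0][0]]]
    for role, turn_tokens in turns:
        rid = role_to_id[role]
        tokens.extend(list(turn_tokens) + ["[SEP]"])
        role_ids.extend([rid] * (len(turn_tokens) + 1))
    return tokens, role_ids


# a Q-R-Q dialogue
tokens, role_ids = build_dialogue_input(
    [("Q", ["how", "old"]), ("R", ["eight"]), ("Q", ["really"])]
)
```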

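ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
ERNIE 2.0 introduces the idea of continual (lifelong) learning: by continually adding pretraining tasks it pushes state-of-the-art (SOTA) results on a range of tasks, while ensuring that accuracy on previously learned tasks does not drop.
(The motivation: MT-DNN showed that adding multiple downstream tasks during pretraining and then fine-tuning can directly yield SOTA results.)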

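To avoid forgetting in lifelong learning, ERNIE 2.0 uses a neat approach: whenever a new task is added, the model is initialized from the previously learned parameters and the new task is trained together with all the earlier tasks instead of on its own. This gives good results.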
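The ERNIE 2.0 paper calls this sequential multi-task learning. The control flow can be sketched as below; `train_on` is a hypothetical callback standing in for an actual training run, and the sketch only shows which tasks are active at each stage and that each stage starts from the previous stage's parameters.

```python
def continual_multitask_training(params, tasks, train_on):
    """Sketch of sequential multi-task pretraining.

    params:   model parameters (any object); each stage starts from the previous result
    tasks:    tasks in the order they are introduced
    train_on: hypothetical callback train_on(params, active_tasks) -> new params
    Returns the final parameters and the task list used at each stage.
    """
    stages = []
    for k in range(len(tasks)):
        active = tasks[: k + 1]            # the new task plus every earlier task
        params = train_on(params, active)  # initialized from the previous stage
        stages.append(active)
    return params, stages


# toy run: "parameters" are just a counter, so we can see three stages ran
def fake_train(p, active):
    return p + 1


final, stages = continual_multitask_training(0, ["A", "B", "C"], fake_train)
```

Because earlier tasks stay in the active set, each stage keeps rehearsing them, which is what protects their accuracy.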

Fine-tuning: the same as BERT
Model: adds a task embedding
So ERNIE has four embeddings in total: task embedding, position embedding, segment embedding, and token embedding.
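Assuming, as in BERT, that these embeddings are combined by element-wise addition, the model input can be sketched as follows. The table layout, the sizes and the name `ernie_input_embedding` are hypothetical; a real model learns these tables during pretraining.

```python
import numpy as np


def ernie_input_embedding(token_ids, segment_ids, task_id, tables):
    """Element-wise sum of token, segment, position and task embeddings.

    tables is a dict of embedding matrices (hypothetical layout):
      "token": (vocab_size, d), "segment": (num_segments, d),
      "position": (max_len, d), "task": (num_tasks, d)
    Returns an array of shape (len(token_ids), d).
    """
    token_ids = np.asarray(token_ids)
    positions = np.arange(len(token_ids))
    return (tables["token"][token_ids]
            + tables["segment"][np.asarray(segment_ids)]
            + tables["position"][positions]
            + tables["task"][task_id])  # one task id, broadcast to every position


rng = np.random.default_rng(0)
d = 4
tables = {
    "token": rng.normal(size=(100, d)),
    "segment": rng.normal(size=(2, d)),
    "position": rng.normal(size=(512, d)),
    "task": rng.normal(size=(3, d)),
}
x = ernie_input_embedding([5, 17, 42], segment_ids=[0, 0, 1], task_id=2, tables=tables)
```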
