ELMO、GPT、Transformer、bert

ELMO、GPT、Transformer
1：解决一词多义的模型ELMO，存在的问题是LSTM的特征抽取能力远弱于Transformer 2：使用Transformer提取特征的GPT模型，存在的问题是语言模型是单向的 3：bert-Masked 语言模型和Next Sentence Prediction Transformer一跃成为踢开RNN和CNN传统特征提取器，荣升头牌，大红大紫。你问了：什么是注意力机制？这里再插个广告，对注意力不了解的可以参考鄙人16年出品17年修正的下文：“深度学习中的注意力模型”，补充下相关基础知识，如果不了解注意力机制你肯定会落后时代的发展。而介绍Transformer比较好的文章可以参考以下两篇文章：一个是Jay Alammar可视化地介绍Transformer的博客文章The Illustrated Transformer ，非常容易理解整个机制，建议先从这篇看起；然后可以参考哈佛大学NLP研究组写的“The Annotated Transformer. ”，代码原理双管齐下，讲得非常清楚。我相信上面两个文章足以让你了解Transformer了，所以这里不展开介绍。一类是序列标注，这是最典型的NLP任务，比如中文分词，词性标注，命名实体识别，语义角色标注等都可以归入这一类问题，它的特点是句子中每个单词要求模型根据上下文都要给出一个分类类别。二类是分类任务，比如我们常见的文本分类，情感计算等都可以归入这一类。它的特点是不管文章有多长，总体给出一个分类类别即可。三类任务是句子关系判断，比如Entailment，QA，语义改写，自然语言推理等任务都是这个模式，它的特点是给定两个句子，模型判断出两个句子是否具备某种语义关系。四类是生成式任务，比如机器翻译，文本摘要，写诗造句，看图说话等都属于这一类。它的特点是输入文本内容后，需要自主生成另外一段文字。参考文献：从Word Embedding到Bert模型—自然语言处理中的预训练技术发展史 https://zhuanlan.zhihu.com/p/49271699 Bert时代的创新（应用篇）：Bert在NLP各领域的应用进展 https://zhuanlan.zhihu.com/p/68446772 美团技术团队： https://tech.meituan.com/

ELMO、GPT、Transformer

1：解决一词多义的模型ELMO，存在的问题是LSTM的特征抽取能力远弱于Transformer

2：使用Transformer提取特征的GPT模型，存在的问题是语言模型是单向的

3：bert-Masked 语言模型和Next Sentence Prediction

Transformer一跃成为踢开RNN和CNN传统特征提取器，荣升头牌，大红大紫。你问了：什么是注意力机制？这里再插个广告，对注意力不了解的可以参考鄙人16年出品17年修正的下文：“深度学习中的注意力模型”，补充下相关基础知识，如果不了解注意力机制你肯定会落后时代的发展。而介绍Transformer比较好的文章可以参考以下两篇文章：一个是Jay Alammar可视化地介绍Transformer的博客文章The Illustrated Transformer ，非常容易理解整个机制，建议先从这篇看起；然后可以参考哈佛大学NLP研究组写的“The Annotated Transformer. ”，代码原理双管齐下，讲得非常清楚。我相信上面两个文章足以让你了解Transformer了，所以这里不展开介绍。

一类是序列标注，这是最典型的NLP任务，比如中文分词，词性标注，命名实体识别，语义角色标注等都可以归入这一类问题，它的特点是句子中每个单词要求模型根据上下文都要给出一个分类类别。

二类是分类任务，比如我们常见的文本分类，情感计算等都可以归入这一类。它的特点是不管文章有多长，总体给出一个分类类别即可。

三类任务是句子关系判断，比如Entailment，QA，语义改写，自然语言推理等任务都是这个模式，它的特点是给定两个句子，模型判断出两个句子是否具备某种语义关系。

四类是生成式任务，比如机器翻译，文本摘要，写诗造句，看图说话等都属于这一类。它的特点是输入文本内容后，需要自主生成另外一段文字。

参考文献：

从Word Embedding到Bert模型—自然语言处理中的预训练技术发展史

https://zhuanlan.zhihu.com/p/49271699

Bert时代的创新（应用篇）：Bert在NLP各领域的应用进展

https://zhuanlan.zhihu.com/p/68446772

美团技术团队：

https://tech.meituan.com/

bert

bert
SQuAD https://rajpurkar.github.io/SQuAD-explorer/ Implementing QANet (Question Answering Network) with CNNs and self attentions https://towardsdatascience.com/implementing-question-answering-networks-with-cnns-5ae5f08e312b 机器阅读理解打破人类记录，解读阿里iDST SLQA 技术 https://www.jiqizhixin.com/articles/2018-01-14-4 https://github.com/NLPLearn/QANet A Tensorflow implementation of QANet for machine reading comprehension BERT Dureader阅读理解排名第七 https://github.com/basketballandlearn/Dureader-Bert bert句向量： https://github.com/JerryRoc/bert_utils_gp/blob/5567cfd6536b098ec389845a7e3b166be5db9940/extract_features_gp_nothread.py#L276 https://github.com/google-research/bert/blob/master/extract_features.py `run_classifier.py`和`run_squad.py`用来做`fine-tuning` 1:准备预训练的data python create_pretraining_data.py \ --input_file=./sample_text.txt \ --output_file=/tmp/tf_examples.tfrecord \ --vocab_file=$BERT_BASE_DIR/vocab.txt \ --do_lower_case=True \ --max_seq_length=128 \ --max_predictions_per_seq=20 \ --masked_lm_prob=0.15 \ --random_seed=12345 \ --dupe_factor=5 2:预训练模型 python run_pretraining.py \ --input_file=/tmp/tf_examples.tfrecord \ --output_dir=/tmp/pretraining_output \ --do_train=True \ --do_eval=True \ --bert_config_file=$BERT_BASE_DIR/bert_config.json \ --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \ --train_batch_size=32 \ --max_seq_length=128 \ --max_predictions_per_seq=20 \ --num_train_steps=20 \ --num_warmup_steps=10 \ --learning_rate=2e-5 3：提取特征 python extract_features.py \ --input_file=/tmp/input.txt \ --output_file=/tmp/output.jsonl \ --vocab_file=$BERT_BASE_DIR/vocab.txt \ --bert_config_file=$BERT_BASE_DIR/bert_config.json \ --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \ --layers=-1,-2,-3,-4 \ --max_seq_length=128 \ --batch_size=8 4：阅读理解 python run_squad.py \ --vocab_file=$BERT_LARGE_DIR/vocab.txt \ --bert_config_file=$BERT_LARGE_DIR/bert_config.json \ --init_checkpoint=$BERT_LARGE_DIR/bert_model.ckpt \ --do_train=False \ --train_file=$SQUAD_DIR/train-v2.0.json \ --do_predict=True \ --predict_file=$SQUAD_DIR/dev-v2.0.json \ --train_batch_size=24 \ --learning_rate=3e-5 \ --num_train_epochs=2.0 \ --max_seq_length=384 \ --doc_stride=128 \ --output_dir=gs://some_bucket/squad_large/ \ --use_tpu=True \ --tpu_name=$TPU_NAME \ --version_2_with_negative=True \ --null_score_diff_threshold=$THRESH 5：fine-turn export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12 export GLUE_DIR=/path/to/glue python run_classifier.py \ --task_name=MRPC \ --do_train=true \ --do_eval=true \ --data_dir=$GLUE_DIR/MRPC \ --vocab_file=$BERT_BASE_DIR/vocab.txt \ --bert_config_file=$BERT_BASE_DIR/bert_config.json \ --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \ --max_seq_length=128 \ --train_batch_size=32 \ --learning_rate=2e-5 \ --num_train_epochs=3.0 \ --output_dir=/tmp/mrpc_output/ 6:predict export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12 export GLUE_DIR=/path/to/glue export TRAINED_CLASSIFIER=/path/to/fine/tuned/classifier python run_classifier.py \ --task_name=MRPC \ --do_predict=true \ --data_dir=$GLUE_DIR/MRPC \ --vocab_file=$BERT_BASE_DIR/vocab.txt \ --bert_config_file=$BERT_BASE_DIR/bert_config.json \ --init_checkpoint=$TRAINED_CLASSIFIER \ --max_seq_length=128 \ --output_dir=/tmp/mrpc_output/

SQuAD
https://rajpurkar.github.io/SQuAD-explorer/
Implementing QANet (Question Answering Network) with CNNs and self attentions
https://towardsdatascience.com/implementing-question-answering-networks-with-cnns-5ae5f08e312b
机器阅读理解打破人类记录，解读阿里iDST SLQA 技术
https://www.jiqizhixin.com/articles/2018-01-14-4
https://github.com/NLPLearn/QANet
A Tensorflow implementation of QANet for machine reading comprehension
BERT Dureader阅读理解排名第七
https://github.com/basketballandlearn/Dureader-Bert

bert句向量：

https://github.com/JerryRoc/bert_utils_gp/blob/5567cfd6536b098ec389845a7e3b166be5db9940/extract_features_gp_nothread.py#L276

https://github.com/google-research/bert/blob/master/extract_features.py

run_classifier.py和run_squad.py用来做fine-tuning

1:准备预训练的data

python create_pretraining_data.py \
  --input_file=./sample_text.txt \
  --output_file=/tmp/tf_examples.tfrecord \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5

2:预训练模型

python run_pretraining.py \
  --input_file=/tmp/tf_examples.tfrecord \
  --output_dir=/tmp/pretraining_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=20 \
  --num_warmup_steps=10 \
  --learning_rate=2e-5

3：提取特征

python extract_features.py \
  --input_file=/tmp/input.txt \
  --output_file=/tmp/output.jsonl \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=128 \
  --batch_size=8

4：阅读理解

python run_squad.py \
  --vocab_file=$BERT_LARGE_DIR/vocab.txt \
  --bert_config_file=$BERT_LARGE_DIR/bert_config.json \
  --init_checkpoint=$BERT_LARGE_DIR/bert_model.ckpt \
  --do_train=False \
  --train_file=$SQUAD_DIR/train-v2.0.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v2.0.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://some_bucket/squad_large/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --version_2_with_negative=True \
  --null_score_diff_threshold=$THRESH

5：fine-turn

export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue

python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/tmp/mrpc_output/

6:predict

export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue
export TRAINED_CLASSIFIER=/path/to/fine/tuned/classifier

python run_classifier.py \
  --task_name=MRPC \
  --do_predict=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$TRAINED_CLASSIFIER \
  --max_seq_length=128 \
  --output_dir=/tmp/mrpc_output/

ELMO、GPT、Transformer、bert

猜你喜欢