ELMo, GPT, Transformer, BERT

ELMo, GPT, Transformer

1: ELMo solves the problem of word-sense ambiguity by producing context-dependent embeddings, but its LSTM feature extractor is far weaker than the Transformer.

2: GPT uses the Transformer to extract features, but its language model is unidirectional (left-to-right only).

3: BERT combines a masked language model with next sentence prediction, keeping the Transformer extractor while conditioning on context from both directions.
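As a rough illustration of the masked language model objective, here is a minimal sketch, not BERT's actual preprocessing code; the 15% masking rate and the 80/10/10 replacement split follow the paper:

import random

# A minimal sketch of BERT-style masking: pick ~15% of tokens, then
# replace 80% of them with [MASK], 10% with a random token, 10% unchanged.
# The masked positions become prediction targets for the language model.
def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}                       # position -> original token
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]") or rng.random() > mask_prob:
            continue
        targets[i] = tok
        r = rng.random()
        if r < 0.8:
            masked[i] = "[MASK]"
        elif r < 0.9:
            masked[i] = rng.choice(vocab)
        # else: keep the original token (it is still predicted)
    return masked, targets

tokens = ["[CLS]", "the", "man", "went", "to", "the", "store", "[SEP]"]
print(mask_tokens(tokens, vocab=["dog", "bank", "apple"]))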

The Transformer has displaced RNN and CNN as the default feature extractor. If you are not yet familiar with the attention mechanism, the 2016/2017 article "Attention Model in Deep Learning" covers the necessary background; without it the rest is hard to follow. For the Transformer itself, two articles are recommended: Jay Alammar's blog post "The Illustrated Transformer", a visual walkthrough of the whole mechanism that is the best place to start, and "The Annotated Transformer" from the Harvard NLP group, which works through the paper and its code side by side. Those two are enough to understand the Transformer, so it is not re-introduced here.
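For reference, the core operation those articles explain is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V; a minimal NumPy sketch:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core of the Transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (len_q, len_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
    return weights @ V                                 # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))   # 6 key/value positions
V = rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)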

The first type is sequence labeling, the most typical NLP task. Chinese word segmentation, part-of-speech tagging, named entity recognition, semantic role labeling, and so on all fall into this category. Its characteristic is that the model must assign a category to every word in the sentence according to its context.

The second type is classification tasks, such as text classification and sentiment analysis. Its characteristic is that no matter how long the text is, a single category is produced for the whole input.

The third type is sentence-relation judgment; entailment, QA, semantic paraphrase, natural language inference, and similar tasks all follow this pattern. Its characteristic is that, given two sentences, the model judges whether they stand in a certain semantic relationship (see the input-packing sketch after this list).

The fourth type is generative tasks, such as machine translation, text summarization, poem writing, and image captioning. Its characteristic is that, given some input content, another passage of text must be generated.
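A rough sketch of how the first three task types are packed for BERT, assuming standard [CLS]/[SEP] packing with segment ids (the fourth type, generation, normally uses an encoder-decoder model rather than BERT alone, so it is omitted):

# Sequence labeling: one label per token; classification: one label taken
# from the [CLS] position; sentence-pair tasks: two segments joined by [SEP].
def pack_single(tokens_a):
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    return tokens, segment_ids

def pack_pair(tokens_a, tokens_b):
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids

# sequence labeling / classification use a single segment
print(pack_single(["john", "lives", "in", "beijing"]))
# sentence-relation tasks use a pair of segments
print(pack_pair(["he", "bought", "a", "car"], ["he", "owns", "a", "vehicle"]))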

References:

From Word Embedding to BERT: the history of pre-training techniques in natural language processing
https://zhuanlan.zhihu.com/p/49271699

Innovation in the BERT era (applications): progress of BERT applications across NLP fields
https://zhuanlan.zhihu.com/p/68446772

Meituan technical team blog
https://tech.meituan.com/

BERT

SQuAD
https://rajpurkar.github.io/SQuAD-explorer/
Implementing QANet (Question Answering Network) with CNNs and self-attention
https://towardsdatascience.com/implementing-question-answering-networks-with-cnns-5ae5f08e312b
Machine reading comprehension surpasses the human record: an interpretation of Alibaba iDST's SLQA technique
https://www.jiqizhixin.com/articles/2018-01-14-4
A TensorFlow implementation of QANet for machine reading comprehension
https://github.com/NLPLearn/QANet
BERT on DuReader, ranked seventh in reading comprehension
https://github.com/basketballandlearn/Dureader-Bert


BERT sentence vectors:

https://github.com/JerryRoc/bert_utils_gp/blob/5567cfd6536b098ec389845a7e3b166be5db9940/extract_features_gp_nothread.py#L276

https://github.com/google-research/bert/blob/master/extract_features.py
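A minimal sketch of turning the per-token vectors written by extract_features.py into one sentence vector by mean pooling; this assumes the JSONL format that script emits (one JSON object per input line, each token carrying "layers" with "index" and "values"):

import json
import numpy as np

# Mean-pool the chosen layer's token vectors from the extract_features.py
# output file (e.g. /tmp/output.jsonl) into one vector per sentence.
def sentence_vectors(jsonl_path, layer_index=-1):
    vectors = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            token_vecs = []
            for feat in example["features"]:
                layer = next(l for l in feat["layers"]
                             if l["index"] == layer_index)
                token_vecs.append(layer["values"])
            vectors.append(np.mean(token_vecs, axis=0))   # mean pooling
    return np.vstack(vectors)

# vecs = sentence_vectors("/tmp/output.jsonl")
# print(vecs.shape)   # (num_sentences, hidden_size)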


run_classifier.py and run_squad.py are used for fine-tuning.

1: Prepare the pre-training data

python create_pretraining_data.py \
  --input_file=./sample_text.txt \
  --output_file=/tmp/tf_examples.tfrecord \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5

2: Pre-train the model

python run_pretraining.py \
  --input_file=/tmp/tf_examples.tfrecord \
  --output_dir=/tmp/pretraining_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=20 \
  --num_warmup_steps=10 \
  --learning_rate=2e-5

3: Extract features

python extract_features.py \
  --input_file=/tmp/input.txt \
  --output_file=/tmp/output.jsonl \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=128 \
  --batch_size=8

4: Reading comprehension (SQuAD)

python run_squad.py \
  --vocab_file=$BERT_LARGE_DIR/vocab.txt \
  --bert_config_file=$BERT_LARGE_DIR/bert_config.json \
  --init_checkpoint=$BERT_LARGE_DIR/bert_model.ckpt \
  --do_train=False \
  --train_file=$SQUAD_DIR/train-v2.0.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v2.0.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://some_bucket/squad_large/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --version_2_with_negative=True \
  --null_score_diff_threshold=$THRESH

5: Fine-tune a classifier (GLUE MRPC)

export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue

python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/tmp/mrpc_output/

6: Predict with the fine-tuned classifier

export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue
export TRAINED_CLASSIFIER=/path/to/fine/tuned/classifier

python run_classifier.py \
  --task_name=MRPC \
  --do_predict=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$TRAINED_CLASSIFIER \
  --max_seq_length=128 \
  --output_dir=/tmp/mrpc_output/
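With --do_predict, run_classifier.py writes per-class probabilities to test_results.tsv in the output directory (one row per example, one column per class); a minimal sketch, assuming that tab-separated format, for turning them into predicted labels:

import csv

# Read the tab-separated class probabilities produced by run_classifier.py
# and take the argmax of each row as the predicted class index.
with open("/tmp/mrpc_output/test_results.tsv", encoding="utf-8") as f:
    for i, row in enumerate(csv.reader(f, delimiter="\t")):
        probs = [float(p) for p in row]
        print(i, probs.index(max(probs)), probs)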
