Which is stronger, BERT or ERNIE? A detailed review across four scenarios

BERT and ERNIE are the two most talked-about models in NLP right now, so what happens when they meet? Someone set up a small head-to-head competition, and in a Chinese-language setting the results were a pleasant surprise. What exactly happened? Let's take a look at the review below.

1. Introduction

With the release of ELMo, BERT, and other models in 2018, the NLP field finally entered its own era of "brute force works miracles". A deep model is pre-trained without supervision on a large-scale corpus, then fine-tuned on downstream task data, and good results follow. Problems that once required painstaking hyperparameter tuning and carefully designed task-specific architectures can now be handled simply by using more pre-training data and deeper models.
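To make the pre-train / fine-tune workflow concrete, here is a minimal sketch using the Hugging Face transformers library with the generic bert-base-chinese checkpoint. The binary-sentiment task and label scheme are assumptions chosen only for illustration; this is not the exact setup used in the experiments below.

```python
# Minimal pre-train / fine-tune sketch (illustrative; checkpoint and labels are
# placeholders, not the setup used in this post's experiments).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
# Reuse the pre-trained encoder; only the small classification head is new.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2
)

# One fine-tuning step on a single downstream example.
inputs = tokenizer("这家餐厅的服务很好", return_tensors="pt")
labels = torch.tensor([1])  # hypothetical label scheme: 1 = positive

outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # gradients flow through the whole pre-trained encoder
```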

Then, in the first half of 2019, Baidu's open-source deep learning platform PaddlePaddle released ERNIE, a knowledge-enhanced pre-training model. ERNIE models words, entities, and entity relations from massive data. Compared with BERT, which learns from the raw language signal, ERNIE directly models units of prior semantic knowledge, which strengthens its ability to learn semantic representations.

In simple terms, Baidu's ERNIE applies prior knowledge in the masking mechanism of the Masked Language Model. As the figure below shows, under BERT's random masking, the masked character "Hei" can easily be predicted from its own suffix "longjiang". Once word- and entity-level masking is introduced, "Heilongjiang" is masked out as a whole, so the model has to learn the relation from more distant context (such as "snow culture city").
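To make the masking contrast concrete, below is a toy Python sketch of random character masking versus whole-entity masking. The sample characters and entity spans are hypothetical stand-ins; a real system would obtain them from a tokenizer plus a word/entity lexicon, and neither function is taken from BERT's or ERNIE's actual code.

```python
import random

# Toy sentence "哈尔滨是黑龙江的省会" split into characters, with two
# hypothetical entity spans: "哈尔滨" (Harbin) and "黑龙江" (Heilongjiang).
tokens = ["哈", "尔", "滨", "是", "黑", "龙", "江", "的", "省", "会"]
entity_spans = [(0, 3), (4, 7)]

def random_char_mask(tokens, prob=0.15):
    # BERT-style: each character/subword is masked independently, so "黑" can
    # often be recovered from its own suffix "龙江".
    return [t if random.random() > prob else "[MASK]" for t in tokens]

def entity_mask(tokens, spans, prob=0.5):
    # ERNIE-style knowledge masking: a whole entity disappears at once, so the
    # model must predict it from longer-range context.
    out = list(tokens)
    for start, end in spans:
        if random.random() < prob:
            for i in range(start, end):
                out[i] = "[MASK]"
    return out

print(random_char_mask(tokens))
print(entity_mask(tokens, entity_spans))
```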

In addition, Baidu's ERNIE introduces a DLM (Dialogue Language Model) task, which learns the semantic similarity between different queries that correspond to the same reply. Experiments show that introducing DLM brings a sizable gain on tasks such as LCQMC (text similarity). ERNIE was ultimately trained on multi-source data, with pre-training carried out on the high-performance distributed deep learning platform PaddlePaddle.
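For intuition, the sketch below shows one simple way dialogue data can provide a similarity signal: queries that lead to the same reply are paired as positives, and queries from different dialogues as negatives. The toy dialogues and the pairing rule are my own illustration, not ERNIE's actual DLM implementation.

```python
from itertools import combinations

# Toy reply -> queries mapping (made-up examples).
dialogs = {
    "好的，马上为您查询天气": ["今天天气怎么样", "帮我看看天气"],
    "已为您播放音乐": ["放首歌", "来点音乐"],
}

positive_pairs, negative_pairs = [], []
query_groups = list(dialogs.values())
for queries in query_groups:
    positive_pairs.extend(combinations(queries, 2))  # same reply -> similar
for a, b in combinations(range(len(query_groups)), 2):
    # Queries taken from different dialogues serve as dissimilar pairs.
    negative_pairs.append((query_groups[a][0], query_groups[b][0]))

print(positive_pairs)
print(negative_pairs)
```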

2. Hands-on comparison

Whether the training mechanisms introduced in Baidu's ERNIE actually pay off can only be known by trying them out. So I ran both BERT and ERNIE myself and compared their predictions in the scenarios below.

2.1 Cloze

The cloze task is very similar to the prior-knowledge Masked LM task introduced in ERNIE's pre-training. As the comparison figure below shows, ERNIE models entity words more clearly, and its predictions for entities are more accurate than BERT's. For example, BERT's answer "the Zhou family" blends the similar words "Chow Yun-fat" and "family", so the result is not precise enough; "City Gate Village" is not a known entity; and the word boundary of "Cai Cai" is incomplete. ERNIE's answers, by contrast, hit the missing entities exactly.
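If you want to poke at cloze behaviour yourself, the fill-mask pipeline in the Hugging Face transformers library is a quick way to do so. The snippet uses the generic bert-base-chinese checkpoint as a stand-in; the specific ERNIE and BERT checkpoints compared in this post are not wired up here.

```python
from transformers import pipeline

# Generic Chinese BERT as a stand-in; swap in another checkpoint to compare.
fill = pipeline("fill-mask", model="bert-base-chinese")

# Mask one character inside an entity and inspect the top predictions.
for candidate in fill("哈尔滨是[MASK]龙江的省会"):
    print(candidate["token_str"], round(candidate["score"], 3))
```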

2.2 Named Entity Recognition (NER)

On the NER task, which also operates at the word level, the prior-knowledge Mask LM likewise brings a clear gain. Comparing F1 scores on the MSRA-NER dataset, ERNIE and BERT score 93.8% and 92.6%, respectively. On the PaddleNLP LAC dataset, ERNIE also does better, with a test-set F1 of 92.0%, a 1.7% improvement over BERT's 90.3%. Analyzing both models' predictions on the MSRA-NER test data, you will see:

1) ERNIE understands entities more accurately: "white marble" is not an entity, and ERNIE avoids the type misclassification;

2) ERNIE models entity boundaries more clearly: "the American law" is tagged with an incomplete boundary, while "Peking University" and "Tsinghua" are correctly treated as two separate institutions.

Comparative cases: three sentences were taken from the MSRA-NER test set. B_LOC / I_LOC mark location entities, B_ORG / I_ORG mark organization entities, and O marks characters that belong to no entity. The table below lists the labels that ERNIE and BERT assign to each character.
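For reference, here is a minimal sketch of the entity-level F1 used in such NER comparisons, computed over BIO-style tag sequences. The gold and predicted tags at the bottom are made-up stand-ins, not actual MSRA-NER model outputs.

```python
def extract_entities(tags):
    """Collect (start, end, type) spans from a BIO-style tag sequence."""
    entities, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the final span
        if tag.startswith("B_") or tag == "O":
            if start is not None:
                entities.append((start, i, etype))
                start, etype = None, None
        if tag.startswith("B_"):
            start, etype = i, tag[2:]
    return set(entities)

def entity_f1(gold_tags, pred_tags):
    # An entity counts as correct only if its span and type match exactly.
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = ["B_ORG", "I_ORG", "O", "B_LOC", "I_LOC", "O"]
pred = ["B_ORG", "I_ORG", "O", "B_LOC", "O", "O"]
print(entity_f1(gold, pred))  # 0.5: only one of the two gold entities matches exactly
```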

2.3 Similarity

The DLM task introduced in ERNIE's training effectively strengthens its ability to model text similarity. We therefore compared the two models on the LCQMC text similarity dataset. As the predictions in the table below show, ERNIE learns the complex word-order variations of Chinese. On this task, ERNIE and BERT reached a final prediction accuracy of 87.4% and 87.0%, respectively.
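A sentence-pair setup in the LCQMC style can be sketched with the Hugging Face transformers library as follows. The checkpoint name, the example pair, and the two-label meaning (paraphrase or not) are placeholders, not the fine-tuned models whose accuracy is reported above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2
)

# Both sentences are encoded together, separated by [SEP]; the classifier head
# then decides "paraphrase" vs "not paraphrase".
inputs = tokenizer("这个手机多少钱", "这款手机的价格是多少", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(torch.softmax(logits, dim=-1))
```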

2.4 Sentiment classification

Finally, we compare the models on the most common task, sentiment classification. Sentiment sentences often use rather tactful phrasing, and after pre-training ERNIE is able to capture these subtle semantic differences. The figure below shows how ERNIE and BERT score on the PaddleNLP sentiment classification test set: for the contrastive relation carried by "... not very ..." in a sentence, ERNIE understands the relation well and correctly predicts "negative". After fine-tuning on ChnSentiCorp, ERNIE reaches 95.4% prediction accuracy on the sentiment classification test set, higher than BERT's 94.3%.

From the results above, ERNIE performs well on most tasks. On word-level tasks such as sequence labeling and cloze, its performance is especially strong, and it is by no means inferior to Google's BERT.
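To try the sentiment scenario yourself, a minimal sketch with the transformers text-classification pipeline looks like the following. Note that bert-base-chinese here is only a placeholder with an untrained classification head; a real comparison would load BERT and ERNIE checkpoints fine-tuned on ChnSentiCorp.

```python
from transformers import pipeline

# Placeholder checkpoint; substitute a model fine-tuned on ChnSentiCorp for
# meaningful labels.
classifier = pipeline("text-classification", model="bert-base-chinese")

# Sentences with a "...不是很..." ("not very ...") turn are exactly the tricky
# cases discussed above.
for sentence in ["房间很大，服务也不错", "位置不错，但是隔音不是很好"]:
    print(sentence, classifier(sentence))
```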

Reproduced from: https://juejin.im/post/5d0748c4e51d455cd73ba093

