2020-03-13 12:37:59
Shisan from Aofeisi
QbitAI report | Official account QbitAI
An NLP pre-training model you deserve to have.
It's called ELECTRA, and it comes from Google AI. It not only keeps the advantages of BERT, it is also more efficient.
ELECTRA is a new pre-training method that efficiently learns to tell genuine input tokens from plausible fakes, a task usually called replaced token detection.
How effective?
With only a quarter of the compute of RoBERTa and XLNet, it matches their performance on GLUE, and it sets a new state of the art on SQuAD.
In other words, small size, big impact: it needs only 4 days of training on a single GPU, yet its accuracy exceeds that of OpenAI's GPT.
ELECTRA has been released as an open-source TensorFlow model, including several ready-to-use pre-trained language representation models.
Making pre-training faster
Existing pre-training models fall into two broad categories: language models (Language Model, LM) and masked language models (Masked Language Model, MLM).
GPT is an example of an LM: it processes the input text from left to right, predicting the next word given the preceding context.
BERT, RoBERTa, and ALBERT, by contrast, are MLMs: they predict a small number of masked words in the input. MLMs have the advantage of being bidirectional, since they can "see" the text on both sides of the token being predicted.
But MLMs also have a drawback: instead of predicting every input token, these models predict only a very small subset (the 15% that are masked), which reduces the amount of information learned from each sentence.
ELECTRA uses a new pre-training task called Replaced Token Detection (RTD).
It trains a bidirectional model, like an MLM, while learning from all input positions, like an LM.
Inspired by generative adversarial networks (GANs), ELECTRA trains the model to distinguish "real" from "fake" input data.
BERT corrupts the input by replacing tokens with "[MASK]"; ELECTRA instead corrupts the input by replacing some tokens with incorrect (but somewhat plausible) fake tokens.
For example, in the figure, "cooked" might be replaced with "ate".
First, a generator predicts the masked-out tokens of a sentence; next, the predicted tokens replace the [MASK] tags in the sentence; finally, a discriminator judges, for every token in the sentence, whether it is the original or a replacement.
After pre-training, the discriminator is used for downstream tasks.
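A quick back-of-the-envelope calculation shows why learning from all positions matters. This is only an illustration: the sentence lengths below are made up, and the 15% mask rate is the BERT convention, not a number specific to ELECTRA.

```python
# Count how many token positions contribute a training signal per batch:
# an MLM is only scored on the masked positions (~15%), while a
# replaced-token-detection discriminator is scored on every position.
sentence_lengths = [12, 8, 20, 16]   # toy batch of sentence lengths
mask_rate = 0.15                     # BERT-style masking rate

mlm_positions = sum(round(n * mask_rate) for n in sentence_lengths)
rtd_positions = sum(sentence_lengths)

print(mlm_positions, rtd_positions)  # 8 56
```

With these toy numbers, RTD collects a learning signal from 7x as many positions per batch, which is the intuition behind ELECTRA's sample efficiency.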
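The three-step pipeline above can be sketched in a few lines of Python. This is a toy illustration, not the actual ELECTRA implementation: the real generator is a small masked language model trained jointly with the discriminator, whereas here a random vocabulary draw stands in for it, and the words and mask rate are made up.

```python
import random

random.seed(0)

def make_rtd_example(tokens, vocab, mask_prob=0.15):
    """Build one Replaced Token Detection training example.

    1. Pick ~15% of positions to corrupt (at least one).
    2. A 'generator' proposes a replacement for each picked position
       (a random vocabulary word here, standing in for a small MLM).
    3. Label EVERY position: 1 if the token now differs from the
       original, 0 otherwise. The discriminator is trained to predict
       these labels, so it gets a learning signal at all positions.
       Note: if the generator happens to produce the original word,
       the position is labeled 0, i.e. it still counts as "real".
    """
    n_corrupt = max(1, int(len(tokens) * mask_prob))
    positions = random.sample(range(len(tokens)), n_corrupt)
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = random.choice(vocab)  # generator's sample
    labels = [int(c != o) for c, o in zip(corrupted, tokens)]
    return corrupted, labels

sentence = ["the", "chef", "cooked", "the", "meal"]
vocab = ["the", "chef", "cooked", "ate", "meal", "a", "dog"]
corrupted, labels = make_rtd_example(sentence, vocab)
print(corrupted, labels)
```

The key point the sketch captures is the label vector: it covers the whole sentence, so the discriminator's binary loss is computed at every token, not only at the masked 15%.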
Beating BERT, with the best results on SQuAD 2.0
Comparing ELECTRA with other state-of-the-art NLP models shows:
Given the same compute budget, it is a substantial improvement over previous methods, matching the performance of RoBERTa and XLNet with less than 25% of their compute.
To push efficiency further, the researchers also tried a small ELECTRA model that can be trained on a single GPU in 4 days.
Although it cannot match the accuracy of large models that need many TPUs to train, this small ELECTRA still performs remarkably well, even outperforming GPT (while requiring only 1/30 of the compute).
Finally, to see whether the gains hold at scale, the researchers trained a large ELECTRA with more compute (roughly the same amount as RoBERTa, and about 10% of T5's).
The result: it achieved the best score to date on the SQuAD 2.0 test set.
Moreover, on GLUE it surpassed RoBERTa, XLNet, and ALBERT.
The code is now open source
In fact, the paper was already published in early September last year. The exciting news is that, in recent days, the code has finally been open-sourced!
The ELECTRA release mainly contains the code for the pre-training task and for fine-tuning on downstream tasks. Currently supported tasks include text classification, question answering, and sequence tagging.
The open-source code supports rapidly training a small ELECTRA model on a single GPU.
The ELECTRA models are currently English-only, but the researchers say they hope to release multilingual pre-trained models in the future.
Portal
Google AI blog:
https://ai.googleblog.com/2020/03/more-efficient-nlp-model-pre-training.html
GitHub address:
https://github.com/google-research/electra
Paper address:
https://openreview.net/pdf?id=r1xMH1BtvB
- End -