30 state-of-the-art natural language processing models

Model summary:

  1. T5: A Transformer encoder-decoder that casts every NLP task as text-to-text, combining multi-task fine-tuning with unsupervised pre-training on the large-scale C4 web corpus (a usage sketch follows this list).

  2. GPT-3: A very large decoder-only Transformer trained on a massive corpus; it performs natural language tasks in zero-shot and few-shot settings through prompting alone, without task-specific fine-tuning.

  3. Chinchilla: DeepMind's compute-optimal language model; for a fixed training budget it scales the number of parameters and the amount of training data in roughly equal proportion, matching or outperforming much larger models.

  4. PaLM: Google's Pathways Language Model, a very large decoder-only Transformer (up to 540B parameters) trained with the Pathways system, which achieves strong few-shot results across a wide range of benchmarks.

  5. LLaMA: Meta's family of open foundation language models (7B to 65B parameters) trained on publicly available data; by training smaller models on more tokens, it rivals or exceeds much larger models on many benchmarks.

  6. Alpaca: Stanford's instruction-following model, fine-tuned from LLaMA 7B on about 52K machine-generated instruction-response pairs, making it cheap and quick to adapt to new NLP tasks.

  7. ELECTRA: A pre-training method that learns language representations through replaced token detection: a small generator corrupts some tokens and a discriminator predicts, for every token, whether it was replaced. This is sample-efficient and achieves strong results (see the sketch after this list).

  8. RoBERTa: A robustly optimized BERT pre-training recipe: more training data, longer training, larger batches, removal of next-sentence prediction, and dynamic masking, which together achieve strong results (see the dynamic-masking sketch after this list).

  9. BART: A denoising sequence-to-sequence model that combines a bidirectional (BERT-style) encoder with an autoregressive (GPT-style) decoder, achieving very good results on generation tasks such as summarization and translation.

  10. UniLM: A single Transformer pre-trained with unidirectional, bidirectional, and sequence-to-sequence language modeling objectives (controlled by different self-attention masks), so it unifies language generation and language understanding and can be applied to a variety of NLP tasks.

  11. GShard: Google's approach to scaling Transformers with conditional computation (sparsely activated Mixture-of-Experts layers) and automatic sharding, enabling efficient training across very large numbers of accelerators with strong performance.

  12. LSDSem: A semantic dependency analysis model based on multi-level detection that considers both syntactic and semantic information.

  13. BertRank: A model for conversational search based on a BERT two-tower architecture; it uses multi-task learning and local attention mechanisms and has achieved good results.

  14. BERT-DP: A BERT-based dependency parsing model that combines neural scoring with dynamic programming decoding to achieve high accuracy.

  15. NLR: A natural language reasoning model based on generative adversarial networks (GANs) that uses unsupervised data augmentation and achieves reasonably good results.

  16. MT-DNN: A natural language processing model based on multi-task learning, which improves model performance by jointly training multiple tasks.

  17. ERNIE: A language representation framework that combines knowledge graphs and external entities to support cross-language and cross-domain applications.

  18. XLNet: A generalized autoregressive pre-training method built on Transformer-XL that uses permutation language modeling, allowing the model to capture bidirectional context during pre-training without corrupting the input with mask tokens.

  19. TAPAS: A model for question answering and inference over tables that extends a BERT-style Transformer encoder with table-aware row and column embeddings, selecting cells and aggregation operations rather than generating logical forms.

  20. DeBERTa: Improves on BERT and RoBERTa with disentangled attention, in which each word is represented by separate content and position vectors, and an enhanced mask decoder that adds absolute position information when predicting masked tokens.

  21. FNet: Replaces the Transformer's self-attention sublayers with an unparameterized Fourier Transform mixing step (keeping the real part of a 2D DFT), achieving results comparable to attention-based models while training considerably faster (see the Fourier-mixing sketch after this list).

  22. AdaBERT: An adaptive inference-based natural language processing model that uses two modules to independently learn context representations and task representations.

  23. UniSkip: Uses span information within the sentence to control information flow, so the model pays more attention to the important parts of the input.

  24. Transformer-XH: Extends the Transformer with "eXtra hop" attention that lets information propagate across multiple text pieces, achieving better results on multi-hop reasoning tasks.

  25. Embedding Propagation: Learns an embedding vector for each word and propagates information over a manifold to obtain richer semantic representations.

  26. EAT: A Transformer-based entity-relationship representation model that introduces a self-attention mechanism and global feature attention, achieving good results.

  27. GPT-2: A Transformer-based language model pre-trained with unsupervised learning on large-scale web text; it achieves good results on many tasks, including in zero-shot settings.

  28. ULMFiT: Universal Language Model Fine-tuning: a language model is pre-trained on a general corpus and then fine-tuned on the target task using discriminative fine-tuning, slanted triangular learning rates, and gradual unfreezing, achieving strong results with little labeled data.

  29. BERT-MRC: A BERT-based reading comprehension model that extends binary classification to span extraction, improving accuracy.

  30. ERNIE-Gram: An extension of ERNIE that pre-trains with explicitly n-gram masked language modeling over large-scale data, masking and predicting whole n-grams rather than individual tokens, and achieves good results.

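The item-1 description can be made concrete with a short inference example. This is a minimal sketch, assuming the Hugging Face transformers library (with PyTorch and sentencepiece installed) and the public t5-small checkpoint; the translation prefix and generation settings are only illustrative.

```python
# Minimal text-to-text inference with T5: every task is "text in, text out",
# and the prompt prefix tells the model which task to perform.
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
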
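The replaced token detection objective from item 7 can be probed directly with the released discriminator. A small sketch, assuming the transformers library and the public google/electra-small-discriminator checkpoint; the sentence is a made-up example in which "ate" stands in for a replaced token.

```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

# The discriminator emits one score per token; higher scores mean the token
# looks like it was replaced by the generator during pre-training.
sentence = "The quick brown fox ate over the lazy dog."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits[0]

for token, score in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), scores):
    print(f"{token:>8s}  {score.item():+.2f}")
```
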
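Item 8's dynamic masking means a fresh masked-LM corruption is sampled every time a sequence is seen, rather than fixing the mask once during preprocessing as in the original BERT. Below is a minimal PyTorch sketch of the idea; it follows the standard 80/10/10 replacement recipe but ignores special tokens and padding for brevity, so it is a simplification rather than RoBERTa's actual data pipeline.

```python
import torch

def dynamic_mask(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """Sample a fresh masked-LM corruption on every call (dynamic masking)."""
    input_ids = input_ids.clone()
    labels = input_ids.clone()

    # Pick ~15% of positions to predict; ignore all other positions in the loss.
    masked = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~masked] = -100

    # Of the picked positions: 80% -> [MASK], 10% -> random token, 10% -> unchanged.
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[to_mask] = mask_token_id
    to_random = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                 & masked & ~to_mask)
    input_ids[to_random] = torch.randint(vocab_size, input_ids.shape)[to_random]
    return input_ids, labels

# Calling this inside the data collator gives each epoch a different mask.
```
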
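Item 21's Fourier mixing is simple enough to write out: apply a discrete Fourier Transform along the hidden and sequence dimensions and keep the real part, in place of the self-attention sublayer. A minimal PyTorch sketch; the layer norms and feed-forward sublayers of a full FNet block are omitted.

```python
import torch

def fourier_mixing(x: torch.Tensor) -> torch.Tensor:
    """FNet-style token mixing: 2D DFT over (hidden, sequence), keep the real part.

    x has shape (batch, seq_len, hidden); the output has the same shape.
    The transform has no learned parameters and replaces self-attention.
    """
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

# Example: mix a random batch of embeddings.
x = torch.randn(2, 16, 64)
print(fourier_mixing(x).shape)  # torch.Size([2, 16, 64])
```
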
List of pros and cons:

| Model | Advantages | Disadvantages |
| --- | --- | --- |
| T5 | Combines multi-task fine-tuning with unsupervised pre-training; trained on a large-scale corpus (C4) | Long training time |
| GPT-3 | Enormous training corpus; performs natural language tasks zero-shot or few-shot without fine-tuning | Model weights not fully open |
| Chinchilla | Compute-optimal balance of model size and training data gives strong results for the compute spent | Not suitable for all application scenarios |
| PaLM | Very large decoder-only model trained with the Pathways system; strong few-shot results | Requires very large amounts of compute and data |
| LLaMA | Open, efficient foundation models that rival much larger ones | Performance can be affected by biases in the training data |
| Alpaca | Instruction-tuned from LLaMA at low cost; quickly adaptable to new NLP tasks | Few open-source implementations available |
| ELECTRA | Learns language representations with sample-efficient replaced token detection | Not yet thoroughly tested on all NLP tasks |
| RoBERTa | More training data, longer training, larger batches, and dynamic masking improve on BERT | Training requires more computing resources |
| BART | Bidirectional encoder plus autoregressive decoder; strong at text generation | Some applications require higher precision |
| UniLM | Unifies language generation and understanding; applicable to many NLP tasks | Training on large-scale data can take a long time |
| GShard | Supports very large-scale distributed training with strong performance | High cost of use |
| LSDSem | Considers both syntactic and semantic information | Not applicable to all NLP tasks |
| BertRank | Uses multi-task learning and a local attention mechanism | Risk of overfitting in some application scenarios |
| BERT-DP | Neural scoring with dynamic programming decoding gives high accuracy | Sensitive to noise or errors in the input |
| NLR | Uses unsupervised data augmentation; reasonably good results | Like BERT-DP, sensitive to noise or errors in the input |
| MT-DNN | Joint training across multiple tasks improves performance | High training time and compute requirements |
| ERNIE | Integrates knowledge graphs and external entities; supports cross-language and cross-domain applications | Unsatisfactory results in some application scenarios |
| XLNet | Captures bidirectional context with permutation-based autoregressive pre-training | Training and tuning require more time and compute |
| TAPAS | Table-aware Transformer encoder for reasoning over tables | Unsatisfactory results in some application scenarios |
| DeBERTa | Disentangled attention and an enhanced mask decoder weight content and position separately | Training and tuning require more time and compute |
| FNet | Results comparable to attention-based models with better computational efficiency | Still at the research stage |
| AdaBERT | Learns context and task representations with two independent modules | Needs more training resources and tuning time |
| UniSkip | Focuses attention on the important parts of the input sentence | Training on large-scale data can take a long time |
| Transformer-XH | Propagates evidence across text pieces; better results on multi-hop tasks | More complex underlying mechanism |
| Embedding Propagation | Richer semantic representations via manifold-based propagation of word embeddings | Unsatisfactory results in some application scenarios |
| EAT | Self-attention plus global feature attention gives good results | High compute demands for training and tuning |
| GPT-2 | Unsupervised pre-training on large-scale web text gives good results | Not suitable for all NLP tasks |
| ULMFiT | Discriminative fine-tuning and gradual unfreezing give strong results from little labeled data | Requires more computing resources and time |
| BERT-MRC | Extends binary classification to span extraction, improving accuracy | Not suitable for all reading comprehension tasks |
| ERNIE-Gram | Explicit n-gram masking over large-scale pre-training data gives good results | Unsatisfactory results in some application scenarios |

Source: blog.csdn.net/ChinaLiaoTian/article/details/130252437