Prompt

Quote:

Harbin Institute of Technology's Che Wanxiang: The Paradigm of Natural Language Processing Is Changing

Author: Che Wanxiang

It should be noted that the success of ChatGPT and a series of ultra-large-scale pre-trained language models will bring about a new paradigm shift in natural language processing: from the pre-training + fine-tuning paradigm represented by BERT to the pre-training + prompting paradigm represented by GPT-3 [3]. Prompting means recasting a downstream task as the language modeling task used in the pre-training stage by constructing a natural language prompt. For example, to identify the sentiment of the sentence "I like this movie.", we can append the prompt "It's very" after it; if the pre-trained model predicts the blank as "wonderful", the sentence is very likely positive. The advantage is that, without fine-tuning the entire pre-trained model, the knowledge stored inside the model can be mobilized to complete "arbitrary" natural language processing tasks.
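As a minimal sketch of this idea, the snippet below scores two candidate continuations of such a prompt with a small public language model. It assumes the Hugging Face transformers library and the "gpt2" checkpoint; the prompt wording and candidate words are illustrative choices, not part of the article.

```python
# Minimal sketch of prompt-based sentiment classification, assuming the
# Hugging Face transformers library and the public "gpt2" checkpoint.
# The prompt wording and candidate words are illustrative choices.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

sentence = "I like this movie."
prompt = sentence + " It's very"   # splice the prompt after the input sentence

# Candidate continuations standing in for the two sentiment labels.
pos_id = tokenizer.encode(" wonderful")[0]
neg_id = tokenizer.encode(" terrible")[0]

with torch.no_grad():
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    next_token_logits = model(input_ids).logits[0, -1]   # next-token distribution

label = "positive" if next_token_logits[pos_id] > next_token_logits[neg_id] else "negative"
print(label)
```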

Of course, before the emergence of ChatGPT, the trend of this paradigm shift was not obvious, mainly for two reasons:

First, GPT-3-scale models are basically in the hands of large companies, so academic researchers have mostly used relatively small pre-trained models for pre-training + prompting research. Because the scale is not large enough, pre-training + prompting does not outperform pre-training + fine-tuning, and "intelligence" only emerges when the model is large enough [4]. As a result, many earlier conclusions drawn on small-scale models may not apply to large-scale models.

Second, with pre-training + prompting alone, the gap between the pre-training task and downstream tasks is large: the method handles tasks close to the pre-training objective of continuing a text well, but other tasks poorly. To cope with more tasks, it is therefore necessary to continue pre-training on downstream tasks (also called pre-fine-tuning), and the current trend is to pre-fine-tune large models on many downstream tasks so that they can handle a variety of unseen new tasks [5].
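As a hedged illustration of what pre-fine-tuning data can look like, the sketch below casts several downstream tasks into one prompt/answer text format so a single model can be trained on all of them. The task names and templates are made up for illustration and are not taken from the article.

```python
# Illustrative sketch of pre-fine-tuning data construction: several downstream
# tasks are cast into one prompt/answer text format so that a single model can
# be trained on all of them. Task names and templates are made up.
TEMPLATES = {
    "sentiment": ("Review: {text}\nIs this review positive or negative?",
                  "{label}"),
    "nli": ("Premise: {premise}\nHypothesis: {hypothesis}\n"
            "Does the premise entail the hypothesis?",
            "{label}"),
    "summarization": ("Summarize the following text: {text}",
                      "{summary}"),
}

def to_text_pair(task, **fields):
    """Render one raw example as a (prompt, target) pair of plain strings."""
    prompt_template, target_template = TEMPLATES[task]
    return prompt_template.format(**fields), target_template.format(**fields)

# Mixing examples from many tasks yields a single training set on which a
# large model can be pre-fine-tuned before it is prompted with new tasks.
mixed_training_set = [
    to_text_pair("sentiment", text="I like this movie.", label="positive"),
    to_text_pair("nli", premise="A dog is running.",
                 hypothesis="An animal is moving.", label="entailment"),
]
for prompt, target in mixed_training_set:
    print(prompt, "->", target)
```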

More precisely, then, pre-training + pre-fine-tuning + prompting will become the new paradigm for natural language processing.

Unlike the traditional pre-training + fine-tuning paradigm, in which a natural language processing model was good at handling one specific task, the pre-training + pre-fine-tuning + prompting paradigm uses a single general-purpose model to handle multiple tasks, even tasks it has never seen before. From this perspective, general artificial intelligence may really be coming.

Figure (text fallback, in case the image does not load): the evolution of natural language processing

1950-1990: small-scale expert knowledge

1990-2010: shallow machine learning algorithms

2010-2017: deep learning algorithms

2018-2023?: large-scale pre-trained models

So, how can we further improve the capability of the new pre-training + pre-fine-tuning + prompting paradigm and put it to use in practical applications?

First, explicitly collecting human annotations and feedback remains time-consuming and laborious, so we should try to obtain and use human feedback more naturally. That is, in real application scenarios, natural feedback from real users, such as their replies and actions, is collected and used to improve the system's performance. We call this approach interactive natural language processing. However, feedback from user interaction is relatively sparse, and some users will give malicious feedback; how to overcome this sparsity and guard against malicious feedback is an urgent problem to be solved.
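As an illustrative sketch only (the signal names, scores, and threshold are assumptions, not the author's method), the snippet below turns implicit user actions into a per-response reward while guarding against sparse or malicious feedback:

```python
# Illustrative sketch: turning natural user feedback into a training signal
# for interactive NLP, with simple guards against sparsity and malicious
# feedback. Signal names, scores, and the threshold are assumptions.
from statistics import median

# Implicit feedback signals observed in the application, mapped to scores.
ACTION_SCORES = {
    "copied_answer": 1.0,        # user reused the output -> likely good
    "said_thanks": 1.0,
    "regenerated": -0.5,         # user asked for another answer -> likely poor
    "rephrased_question": -0.5,
    "thumbs_down": -1.0,
}

def response_reward(feedback_events, min_users=3):
    """Aggregate per-user feedback on one model response into a reward.

    feedback_events: list of (user_id, action) pairs.
    Returns None when feedback is too sparse to be trusted.
    """
    per_user = {}
    for user_id, action in feedback_events:
        score = ACTION_SCORES.get(action)
        if score is not None:
            # One averaged score per user, so a single user spamming
            # malicious feedback cannot dominate the signal.
            per_user.setdefault(user_id, []).append(score)

    if len(per_user) < min_users:
        return None  # too sparse: skip rather than train on noise

    user_scores = [sum(s) / len(s) for s in per_user.values()]
    return median(user_scores)  # the median is robust to a few outliers

events = [("u1", "copied_answer"), ("u2", "thumbs_down"),
          ("u3", "said_thanks"), ("u3", "copied_answer"), ("u4", "regenerated")]
print(response_reward(events))
```

Rewards like this could later weight logged interactions when the model is further pre-fine-tuned on them.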

Second, although the natural language text generated under this paradigm is very fluent, it often contains factual errors, that is, it talks nonsense in a serious tone. The interactive natural language processing approach above can mitigate such problems to some extent, but for questions whose answers the users themselves do not know, they cannot give feedback on the results. This brings us back to poor interpretability, the old problem of deep learning models. If the generated results could cite the sources of the relevant information, just as references are inserted when writing a paper, their interpretability would be greatly improved.
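One way to surface such sources, shown here only as a hedged sketch and not as the author's method, is to retrieve supporting passages and build a prompt that asks the model to cite them by number. The toy keyword-overlap retriever and the small in-memory corpus below are assumptions for illustration.

```python
# Illustrative sketch: building a retrieval-augmented prompt so the model can
# cite its sources, in the spirit of inserting references into generated text.
# The retriever is a toy keyword-overlap search over a small in-memory corpus.
def retrieve(query, corpus, top_k=2):
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        ((len(q_words & set(text.lower().split())), doc_id)
         for doc_id, text in corpus.items()),
        reverse=True,
    )
    return [doc_id for overlap, doc_id in scored[:top_k] if overlap > 0]

def build_cited_prompt(query, corpus):
    """Prepend numbered source passages and ask the model to cite them."""
    sources = retrieve(query, corpus)
    numbered = "\n".join(f"[{i+1}] {corpus[d]}" for i, d in enumerate(sources))
    return ("Answer the question using only the sources below, and mark "
            "each claim with its source number like [1].\n"
            f"Sources:\n{numbered}\nQuestion: {query}\nAnswer:")

corpus = {
    "doc_a": "ChatGPT is built on a very large pre-trained language model.",
    "doc_b": "BERT popularized the pre-training plus fine-tuning paradigm.",
}
print(build_cited_prompt("Which paradigm did BERT popularize?", corpus))
```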

Finally, this paradigm relies on very large-scale pre-trained language models, which are currently held by only a few large companies. Even where individual large models are open-sourced, small companies or research groups cannot download and use them because they are too large, so calling them online is currently the main way these models are used. In this mode, how to further pre-fine-tune the model with a user's private data for the different tasks different users face, without affecting the public large model, has become an urgent problem for the practical application of this paradigm. In addition, to improve the running speed of the system, how to obtain an offline small model from the online large model while keeping the large model's ability on certain tasks has also become key to applying the model in practice.
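One common way to obtain such an offline small model is knowledge distillation. The sketch below is only an illustrative assumption, not the author's recipe: it uses PyTorch, toy tensor shapes, a placeholder linear "student", and random stand-in teacher logits where a real system would use the online large model's outputs.

```python
# Illustrative sketch: distilling an online "teacher" model into an offline
# "student" so the small model keeps the large model's behavior on a task.
# Shapes, the placeholder student, and the temperature are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)

# Toy stand-ins: in practice teacher_logits would come from the online large
# model's responses and the student would be a small local language model.
vocab_size, batch, seq = 100, 4, 8
student = nn.Linear(32, vocab_size)          # placeholder student "model"
hidden = torch.randn(batch * seq, 32)
teacher_logits = torch.randn(batch * seq, vocab_size)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
loss = distillation_loss(student(hidden), teacher_logits)
loss.backward()
optimizer.step()
print(float(loss))
```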

 


Source: blog.csdn.net/weixin_43717681/article/details/130067846