Tech Talk | ChatGPT's Technical Evolution and Q&A Applications

Since Sam Altman publicly announced ChatGPT on Twitter on December 1 last year, it has steadily attracted global attention. According to Xinhua News Agency, ChatGPT reached 100 million monthly active users this January, making it the fastest-growing app in history. Research from Stanford University has even argued that it already exhibits a human-like mind.

Although ChatGPT sometimes makes factual errors, its open-domain knowledge, its ability to understand language and follow human instructions, and its code writing, mathematical calculation, and commonsense reasoning have been striking. In open-domain question answering specifically, ChatGPT is completely different from previous mainstream QA technology and brings a new paradigm.


In this issue of Tech Talk, we invited Liu Huiwen, an engineer on Xiaomi's Q&A team, to introduce ChatGPT's technical evolution and Q&A applications, including the work and techniques behind ChatGPT, and to discuss what kind of transformation ChatGPT brings to Xiaoai's open-domain Q&A service.


Hardcore index: ⭐⭐⭐⭐⭐

Fun index: ⭐⭐⭐

Reading time: about 14 minutes

1. Technical background

ChatGPT was launched by OpenAI, a lab founded in 2015 by Silicon Valley figures Reid Hoffman, Elon Musk, and others, initially as a non-profit, with the aim of researching artificial general intelligence (AGI). ChatGPT can be regarded as a staged achievement toward that goal.

Its academic paper has not yet been made public, but OpenAI mentioned in its blog that ChatGPT was developed from its earlier InstructGPT. The relevant work includes the GPT series, as well as IFT (Instruction Fine-Tuning), CoT (Chain-of-Thought), and RLHF (Reinforcement Learning from Human Feedback). In addition, Codex, another OpenAI project, is also believed to be related. In short, ChatGPT did not appear out of nowhere; a large body of earlier research and accumulated technology created the conditions for its emergence.

It is worth noting that much of this work was not pioneered by OpenAI. Yann LeCun, Meta's chief AI scientist, even argued that "ChatGPT is not particularly innovative, but it is well put together. As far as the underlying technology is concerned, besides Google and Meta, several other companies have similar technology." OpenAI, however, stood on the shoulders of its predecessors, borrowed and absorbed these techniques, and ultimately delivered ChatGPT. Below, we first introduce the technical background behind ChatGPT's birth.

 >>>> 1.1 GPT1-3

ChatGPT is believed to be based on a GPT-series model (GPT3.5), trained further with fine-tuning and reinforcement learning from human feedback. GPT (Generative Pre-Training) is a language model. The earliest version, GPT1, was released by OpenAI in June 2018 with roughly 100 million learnable parameters and adopted the pre-training + fine-tuning paradigm common in natural language processing (NLP). Notably, GPT1 was later borrowed from and modified by a Google team, which released BERT in October of the same year. Before ChatGPT, BERT was considered the epoch-making work in NLP.

GPT2 was released in February 2019, after BERT. It has more parameters than GPT1, reaching 1.5 billion, but under the pre-training + fine-tuning paradigm it was still weaker than BERT. Starting with GPT2, however, OpenAI changed its perspective and shifted from pre-training + fine-tuning toward zero-shot learning. The later Prompt and Instruction approaches, and ultimately ChatGPT's ability to interact naturally with users, all originate from this shift. GPT2 may be less famous than the opening work GPT1 or the later GPT3, but it is an important link between the two.

GPT3, built on GPT2, was released in May 2020 with 175 billion parameters. Training at this scale requires enormous computing resources, with training costs reaching millions of dollars, and it triggered a new round of the large-language-model arms race.

GPT3 made two important contributions. First, it proposed a new paradigm, In-Context Learning, which can be seen as related to CoT and IFT. Second, GPT3 began to show the emergent abilities of large language models. In plain terms, emergence means that when a model is relatively small (such as the 100 million or 1.5 billion parameters of GPT1 and GPT2), it lacks certain capabilities or has only weak ones, but once the parameter count grows large enough, these capabilities suddenly appear or become very strong. The GPT3 paper ran experiments on arithmetic: at GPT3's parameter scale, two-digit addition and subtraction becomes much better. Emergence was an unexpected discovery, and the academic community has not yet given a good explanation for the phenomenon.
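The idea behind In-Context Learning can be sketched in a few lines: rather than fine-tuning the model, a few solved examples are placed directly into the prompt and the frozen model completes the pattern. The prompt format below is illustrative, not GPT3's actual evaluation template.

```python
# A minimal sketch of In-Context Learning: put a few solved (question, answer)
# demonstrations in the prompt, then append the new query for the model to complete.
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt from (question, answer) pairs plus a new query."""
    lines = [f"Q: {q}\nA: {a}" for q, a in examples]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

examples = [
    ("What is 23 + 45?", "68"),
    ("What is 17 + 38?", "55"),
]
prompt = build_few_shot_prompt(examples, "What is 52 + 29?")
print(prompt)
```

No gradient update happens anywhere; the demonstrations steer the model's completion at inference time only.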


 >>>> 1.2 IFT

IFT stands for Instruction Fine-Tuning. In plain terms, it means organizing a batch of training data in the form of natural-language instructions or commands and using it to fine-tune a large language model (such as GPT3). A model fine-tuned this way already resembles ChatGPT: it can better "understand" what people say or ask, and then answer on that basis. An earlier work in this direction is FLAN, proposed by Google in October 2021; FLAN is mentioned in the InstructGPT paper and is considered related to ChatGPT's technology.
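To make "organizing training data as instructions" concrete, here is a sketch of how such a record is commonly structured and rendered into a single training string. The field names (`instruction`, `input`, `output`) are illustrative conventions, not a specific dataset's schema.

```python
# A sketch of instruction-tuning data: each record pairs a natural-language
# instruction (plus optional input) with the desired response, then is rendered
# into one training string for the language model.
def render_instruction_example(record):
    """Flatten an instruction record into a single training string."""
    parts = [f"Instruction: {record['instruction']}"]
    if record.get("input"):
        parts.append(f"Input: {record['input']}")
    parts.append(f"Response: {record['output']}")
    return "\n".join(parts)

record = {
    "instruction": "Translate the sentence into French.",
    "input": "Good morning.",
    "output": "Bonjour.",
}
rendered = render_instruction_example(record)
print(rendered)
```

Fine-tuning on many such strings teaches the model to continue from an instruction with a suitable response.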

 >>>> 1.3 CoT

CoT stands for Chain-of-Thought (思维链 in Chinese). It was first published by Google Brain at NeurIPS 2022. Simply put, when using a large language model to answer questions, you give it a few examples, and in those examples you include the reasoning process for the whole question. For a math word problem, for instance, each example contains the question and the answer, and the key point of chain-of-thought is to also include the intermediate steps, just as a student would write them out when solving the problem. In the standard (non-CoT) setting, by contrast, the examples give only the question and the final answer, with no intermediate process. There is also a zero-shot variant that simply encourages the model to generate the intermediate steps ("Let's think step by step") and then produce the answer from its own derivation.
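The contrast between a standard exemplar and a chain-of-thought exemplar, plus the zero-shot trigger, can be shown side by side. The word problem and wording are made up for illustration.

```python
# Standard exemplar: only question and final answer, no intermediate process.
standard = (
    "Q: A farmer has 3 pens with 4 chickens each. How many chickens in total?\n"
    "A: 12"
)

# Chain-of-thought exemplar: same question, but the reasoning steps are written out.
cot = (
    "Q: A farmer has 3 pens with 4 chickens each. How many chickens in total?\n"
    "A: Each pen holds 4 chickens and there are 3 pens, so 3 * 4 = 12. "
    "The answer is 12."
)

# Zero-shot CoT: no exemplar at all, just a trigger phrase appended to the question.
def zero_shot_cot(question):
    return f"Q: {question}\nA: Let's think step by step."

trigger = zero_shot_cot("If a train travels 60 km in 1.5 hours, what is its speed?")
print(trigger)
```

Only the prompt changes between the three settings; the underlying model is identical and untouched.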

Chain-of-thought is considered closely related to ChatGPT's abilities; for example, when ChatGPT answers the classic "chickens and rabbits in the same cage" problem, it gives an intermediate derivation. More importantly, without retraining or fine-tuning the large language model, merely adding descriptions of intermediate steps to the given examples can significantly improve performance on tasks such as mathematical calculation and logical reasoning. Some researchers therefore argue that capabilities like mathematical calculation and logical reasoning are emergent abilities of large language models: they appear naturally once the model size and training corpus reach a certain level, and methods such as chain-of-thought merely unlock (or awaken) them.

 >>>> 1.4 RLHF

RLHF stands for Reinforcement Learning from Human Feedback. Unlike the large-scale unlabeled text used in the pre-training phase, the fine-tuning phase uses manually labeled data. RLHF introduces human intervention into training: annotators compare and rank the model's generated outputs, and this ranking data is then used to guide the model's training and iteration. OpenAI's annotation quality is considered relatively high, and with this human intervention RLHF has reduced harmful and untruthful output to some extent.
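The annotators' rankings are typically turned into a pairwise loss for training a reward model, which then guides the policy. Below is a minimal sketch of that pairwise ranking loss on stand-in scalar scores; the actual RLHF pipeline (reward model architecture, PPO optimization) is much larger than this.

```python
import math

# A sketch of the pairwise ranking loss used to train a reward model in RLHF:
# annotators prefer answer A over answer B, and the loss pushes the reward of
# the preferred answer above the rejected one. Scores here are stand-in floats.
def pairwise_ranking_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)): small when the chosen answer scores higher."""
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# When the reward model already ranks the preferred answer higher, the loss is low.
low = pairwise_ranking_loss(2.0, 0.5)
# When the ranking is inverted, the loss is high, nudging the model to correct it.
high = pairwise_ranking_loss(0.5, 2.0)
print(low < high)  # True
```

Minimizing this loss over many human-ranked pairs yields a scalar reward signal that reinforcement learning can then optimize against.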

 >>>> 1.5 Codex

Besides the IFT, CoT, and RLHF discussed above, Codex, another OpenAI project, may be related to ChatGPT's code-writing and logical-reasoning abilities. Codex is also the technology behind GitHub's AI code-completion tool Copilot. Whereas GPT3 was trained on text selected from the Internet, Codex added GitHub code to the training process and built an evaluation suite. Judging from the paper's results, its coding ability greatly exceeds GPT3's.

From this overview, it is clear that ChatGPT did not appear suddenly: the technology it builds on was also accumulated at other companies, sometimes earlier and in greater quantity. OpenAI managed to integrate these strengths and ultimately produced an epoch-making achievement.

2. Q&A Business Discussion

Open-domain Q&A handles knowledge questions that are not restricted to any particular domain. Here we discuss ChatGPT's impact on the Q&A business. It must be said that ChatGPT can already attempt to answer any question in any field. The following shows how ChatGPT handles several types of knowledge questions.

 >>>>  2.1 Example

2.1.1 Language


Figure 1. Example of ChatGPT answering language questions

2.1.2 Mathematics


Figure 2. Example of ChatGPT answering a math question

2.1.3 Physics


Figure 3. Example of ChatGPT answering physics questions

It can be seen that ChatGPT can handle subject knowledge in language, mathematics, and physics, and on the more complex math and physics problems it not only gives the result but also a full derivation. Beyond subject knowledge, ChatGPT also performs well on long-tail commonsense questions.

2.1.4 Common sense


Figure 4. Example of ChatGPT answering general knowledge questions

>>>>  2.2 Shortcomings

ChatGPT already has strong question-answering ability. In our view, its shortcomings come down to four points:

① It sometimes makes factual errors, and users cannot easily judge whether an answer is right or wrong, as shown in Figure 5 below;

② Unlike a search engine such as Google, it cannot retrieve new information or acquire the latest knowledge;

③ Its results are sometimes unstable: rephrasing a question can change ChatGPT's answer, as shown in Figure 6 below;

④ It lacks interpretability, a problem common to current models.

Of course, we think these problems are minor compared with the changes ChatGPT can bring.


Figure 5. ChatGPT’s answer has a factual error


Figure 6. An example of an unstable answer

>>>>  2.3 Application

Consider how ChatGPT and related technology could be applied in Xiaoai's open-domain question-answering scenario. For now, adopting ChatGPT directly has two problems: first, it sometimes produces factual errors that people cannot distinguish; second, it consumes large amounts of computing resources and is expensive.

Given these characteristics, we believe traditional techniques are still needed to provide stable, reliable service for high-frequency, routine Q&A, while ChatGPT's open-domain ability can handle long-tail, low-frequency questions. Combined with reasonable product design and answers from search engines, this can give users a better experience.
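The hybrid design described above can be sketched as a simple router: frequent, curated questions are served from a traditional FAQ index, and only long-tail queries fall through to a generative model. All names here are illustrative placeholders, not Xiaoai's actual architecture.

```python
# A hypothetical routing sketch: stable curated answers for high-frequency
# questions, a (placeholder) generative fallback for everything else.
FAQ_INDEX = {
    "what time is it in beijing": "curated answer from the traditional QA system",
}

def generative_fallback(question):
    # Placeholder for a call to a large language model; in production the reply
    # would still need product-level safeguards against factual errors.
    return f"[LLM draft answer for: {question}]"

def answer(question):
    key = question.strip().lower().rstrip("?")
    if key in FAQ_INDEX:
        return FAQ_INDEX[key]  # stable, reliable path for routine questions
    return generative_fallback(question)  # long-tail, low-frequency path

print(answer("What time is it in Beijing?"))
print(answer("Why do cats purr when content?"))
```

The curated path keeps cost and latency predictable for the bulk of traffic, while the fallback extends coverage to the long tail.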

Besides direct online use, ChatGPT can also serve as an offline tool for Q&A. Taking data construction as an example, we briefly describe two uses:

● Supplementing training data for slot extraction;

● Supplementing long-tail question-answer pair data.

① Supplementing training data for slot extraction

ChatGPT has strong In-Context Learning ability: given just one example of slot extraction, it can imitate it and automatically extract slots from user questions, as shown in Figure 7 below. Unlabeled text can thus be fed to ChatGPT to extract slot information automatically, and after manual review the results can be used as training data.
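The one-shot prompt construction this relies on can be sketched as follows; the slot schema and wording are made up for illustration and are not Xiaoai's actual format.

```python
# A sketch of a one-shot slot-extraction prompt: one demonstration of
# query -> slots, then the new query for the model to complete.
def build_slot_prompt(example_query, example_slots, new_query):
    """Assemble a one-shot prompt for slot extraction."""
    demo = f"Query: {example_query}\nSlots: {example_slots}"
    return f"{demo}\n\nQuery: {new_query}\nSlots:"

prompt = build_slot_prompt(
    "Set an alarm for 7 a.m. tomorrow",
    '{"intent": "set_alarm", "time": "7 a.m. tomorrow"}',
    "Play some jazz in the living room",
)
print(prompt)
```

The model's completion after the final `Slots:` would then be parsed and manually reviewed before entering the training set.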


Figure 7. Example of slot extraction

② Supplementing question-answer pair data

Collecting Q&A data usually costs a great deal of effort from product, operations, and annotation staff. With ChatGPT, once the questions are prepared, it can be used to generate the answers directly. ChatGPT's replies may contain factual errors and must be manually reviewed before official use, but this is still far more convenient than collecting the data from scratch.
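The offline workflow just described can be sketched as a small pipeline: generate a draft answer for every prepared question, and keep each candidate flagged as unreviewed until a human approves it. `ask_model` is a placeholder; in practice it would call the ChatGPT API.

```python
# A sketch of the offline data-construction workflow: draft answers are generated
# for a prepared question list, and nothing ships until it is manually reviewed.
def ask_model(question):
    # Placeholder for an actual API call to a large language model.
    return f"draft answer to: {question}"

def build_candidate_pairs(questions):
    """Return (question, draft_answer, reviewed) triples; reviewed starts False."""
    return [(q, ask_model(q), False) for q in questions]

pairs = build_candidate_pairs([
    "Why is the sky blue?",
    "How do magnets work?",
])
unreviewed = [p for p in pairs if not p[2]]
print(len(unreviewed))  # 2
```

The explicit `reviewed` flag encodes the rule from the text: model-generated answers may contain factual errors, so manual review gates every pair before official use.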


Figure 8. ChatGPT’s answers about common sense in life can be used to supplement data

3. Conclusion and Outlook

ChatGPT's open-domain question-answering ability is strong; in terms of Q&A technology it is a well-deserved epoch-making achievement with broad prospects. Although we noted some shortcomings, such as factual errors and the inability to acquire new knowledge, these problems may well be mitigated or solved amid the current global ChatGPT boom.

Take factual errors: at the current stage, this seems to be a problem that large language models cannot completely solve. But if the model cites supporting material alongside its results, people can judge their truthfulness themselves, and as far as we know there is related research on attaching reference material to language-model outputs. As for retrieving new information, DeepMind's Sparrow is already doing this, and we have reason to expect future models to gain the capability; that would both ground replies in cited evidence and allow answering relatively recent questions. As for the model's heavy resource consumption and whether it can be miniaturized, now that ChatGPT has drawn global attention, we believe relevant research will follow. We optimistically expect ChatGPT to be applied even better to open-domain question answering in the future.

Still, despite its broad prospects, ChatGPT will not fully replace existing question-answering technology. Its ability in general domains is indeed strong, but in vertical domains that demand extremely rich domain knowledge, ChatGPT is not necessarily suitable.

Figure 9 below shows ChatGPT's response to a question about Chinese words involving pinyin. The reply sounds plausible, but the result is wrong. Such specialized areas require dedicated data construction, and retraining ChatGPT just to answer one niche area is unlikely. Traditional question-answering will therefore retain certain advantages in professional domains such as customer service, e-commerce, and healthcare. In the future, traditional Q&A can coexist with ChatGPT and complement it, just as BERT today has not replaced traditional machine learning methods in all tasks.

Figure 9. ChatGPT makes factual errors on a Chinese word question

References

[1] Improving Language Understanding by Generative Pre-Training

[2] Language Models are Unsupervised Multitask Learners

[3] Language Models are Few-Shot Learners

[4] Finetuned Language Models Are Zero-Shot Learners

[5] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

[6] Large language models are zero-shot reasoners

[7] Training language models to follow instructions with human feedback

[8] Evaluating Large Language Models Trained on Code

[9] Emergent Abilities of Large Language Models

[10] Improving alignment of dialogue agents via targeted human judgements

What other technologies would you like to learn about? Leave a message in the comments, and we will continue inviting engineers to share on topics you care about. For more hardcore knowledge, keep following Xiaomi Tech Talk!



Origin blog.csdn.net/pengzhouzhou/article/details/129311989