ChatGPT: Development and Technical Foundations

1. Development of ChatGPT

【ChatGPT (GPT-3.5)】

Released: November 2022

Type: Large language model for dialogue scenarios

Features: It interacts with users in a far more human-like way, exceeding expectations in understanding human intent, answering questions accurately, and generating fluent responses.

Function: It can answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. It also shows remarkable chain-of-thought reasoning and zero-shot problem-solving ability.

Popularity: According to UBS, ChatGPT surpassed 100 million users within two months of launch, whereas TikTok, the previous breakout app, took nine months to reach the same milestone.

Ability: Strong multi-turn dialogue: ChatGPT can sustain dozens of rounds of continuous conversation, accurately resolve fine-grained language phenomena such as ellipsis and coreference, keep track of conversation history, and maintain the consistency and focus of the dialogue topic.

        Interactive correction: whether the user revises an earlier statement or points out a problem in a ChatGPT reply, the model can capture the intent to modify, identify exactly which part needs to change, and make the correct revision.

【ChatGPT (GPT-4)】

Released: March 2023

Type: Large language model for dialogue scenarios

Features: It is multimodal, supporting text and image input simultaneously.

         The supported input length has been increased to about 32,000 tokens, corresponding to roughly 25,000 words.

Performance :

  1. Enhanced comprehension, reasoning, and multilingual ability.
  2. Image comprehension is significantly enhanced; the model can "talk about pictures".
  3. Compared with GPT-3.5, reliability has improved markedly, by 19%.
  4. A significant decline in improper responses to disallowed and sensitive content.

2. Technical foundations of ChatGPT

ChatGPT is fine-tuned from GPT-3.5, a Generative Pre-trained Transformer (GPT). On top of GPT-3.5, Reinforcement Learning from Human Feedback (RLHF) is introduced to fine-tune the model.

Reference: The Impact of Large Models Represented by ChatGPT on Information Resource Management

The capabilities of ChatGPT come from three ingredients: large-scale pre-training + instruction fine-tuning + reinforcement learning from human feedback.

1. Large-scale pre-training: by letting a model with 175 billion parameters learn a corpus of roughly 300 billion words (tokens), the large model acquires its basic capabilities.
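
For a rough sense of scale, the widely used estimate of about 6 FLOPs per parameter per training token (a rule of thumb from the scaling-law literature, not from the source) puts such a pre-training run at roughly 3×10^23 FLOPs:

```python
# Back-of-envelope pre-training compute via the common "6 * N * D"
# FLOPs rule of thumb (an external estimate, not from the source).
n_params = 175e9   # model parameters
n_tokens = 300e9   # training tokens ("300 billion words" above)
flops = 6 * n_params * n_tokens
print(f"{flops:.2e} FLOPs")  # ~3.15e+23 FLOPs
```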

Basic capabilities of the large model: language generation; in-context learning (following given examples to produce a solution for a new test instance); world knowledge (factual knowledge and common sense); instruction following; and chain-of-thought reasoning (solving problems step by step).
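
A minimal sketch of what two of these capabilities look like in practice; the prompts below are invented for illustration and are not from the source.

```python
# Illustrative prompts (invented examples, not from the source).

# In-context (few-shot) learning: the model infers the task from the
# examples in the prompt alone, with no parameter updates.
few_shot_prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "cheese => fromage\n"
    "peppermint =>"
)

# Chain-of-thought: the prompt invites the model to reason step by step.
cot_prompt = (
    "Q: A farm has 3 pens with 4 sheep each, and 2 sheep are sold. "
    "How many sheep remain?\n"
    "A: Let's think step by step."
)

print(few_shot_prompt)
print(cot_prompt)
```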

2. Instruction tuning helps the large model "unlock" abilities in specific domains, such as following instructions to act as a question-answering chatbot, or generalizing to new task domains.
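
To make this concrete, here is a minimal sketch of how instruction-tuning records are often serialized into supervised training pairs. The Alpaca-style template and the example record are assumptions for illustration; the source does not specify a data format.

```python
# Serializing an instruction-tuning record into a prompt/response pair
# (Alpaca-style template; field names and example are illustrative).
record = {
    "instruction": "Summarize the text in one sentence.",
    "input": "Large language models acquire broad abilities from "
             "web-scale pre-training and can be adapted to many tasks.",
    "output": "LLMs learn general, adaptable abilities from web-scale "
              "pre-training.",
}

template = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

prompt = template.format(**record)
target = record["output"]
# Supervised fine-tuning maximizes log p(target | prompt); the loss is
# typically masked so that only response tokens contribute.
print(prompt + target)
```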

3. Reinforcement Learning from Human Feedback (RLHF) gives the large model the ability to "align" with humans: providing the questioner with detailed and impartial responses, rejecting inappropriate questions, and declining questions outside the scope of its knowledge.

2.1 Large-scale pre-training

ChatGPT performs feature extraction with the Transformer, adopting a decoder-only design, and moves from the earlier two-stage paradigm (pre-train, then fine-tune) toward a single stage: unidirectional language-model pre-training combined with zero-shot / few-shot prompting or instructions.

GPT stands for Generative Pre-trained Transformer.
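
A minimal sketch of the unidirectional (next-word) pre-training objective in PyTorch, with a trivial embedding-plus-linear stand-in for the decoder-only Transformer (toy sizes, not the real model):

```python
import torch
import torch.nn.functional as F

vocab_size, d_model, seq_len = 1000, 64, 16

# Stand-in for a decoder-only Transformer: an embedding plus a linear
# head, so that the training objective itself is the focus.
embed = torch.nn.Embedding(vocab_size, d_model)
head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, seq_len))  # one toy batch
logits = head(embed(tokens))                         # (1, seq_len, vocab)

# Unidirectional LM objective: position t predicts token t + 1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss.item())
```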

The Transformer architecture consists of an encoder and a decoder, and a large language model can be built on the encoder alone or the decoder alone. Three architectural routes have therefore emerged:
  • Decoder-Only - GPT
  • Encoder-Only - Google's BERT, Microsoft's DeBERTa
  • Encoder-Decoder - Google's T5, Meta's BART, Tsinghua's ChatGLM

Decoder-only models such as GPT are pre-trained by "predicting the next word"; specific domain abilities are then activated through instruction fine-tuning.

Encoder-only models such as Google's BERT and Microsoft's DeBERTa are pre-trained with a "cloze" (masked language modeling) objective, then fine-tuned with a small amount of labeled data for the target application domain.

Models using the encoder-decoder architecture include Google's T5, Meta's BART, and Tsinghua University's ChatGLM.
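
A minimal sketch of the three routes using the Hugging Face transformers library; the checkpoint names below are small public examples chosen for illustration, not models discussed in the source.

```python
# Three architectural routes, one auto-class each (Hugging Face
# transformers; checkpoint names are illustrative public models).
from transformers import (
    AutoModelForCausalLM,   # decoder-only: next-token prediction
    AutoModelForMaskedLM,   # encoder-only: cloze / masked LM
    AutoModelForSeq2SeqLM,  # encoder-decoder: sequence-to-sequence
)

decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")
encoder_only = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```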

Large model pre-training: [figure omitted; source: "A Survey of Large Language Models" (Wayne Xin Zhao et al.), Open Source Securities Research Institute]

2.2 Model fine-tuning

Model fine-tuning endows the model with abilities in a specific field. The pre-trained base model is fine-tuned in three steps (a sketch of step 2's ranking loss follows the list):

  • 1. Train the model on manually labeled data (supervised fine-tuning);
  • 2. Train a reward model from human rankings of candidate model answers;
  • 3. Use the reward model to train ChatGPT through reinforcement learning. The latter two steps are collectively called RLHF (Reinforcement Learning from Human Feedback).
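
A minimal sketch of step 2's training signal, the pairwise ranking loss used for InstructGPT-style reward models; the scores below are toy tensors standing in for real reward-model outputs.

```python
import torch
import torch.nn.functional as F

# Scalar scores the reward model assigns to two answers to the same
# prompt: the answer humans ranked higher ("chosen") and the one
# ranked lower ("rejected"). Toy values stand in for model outputs.
score_chosen = torch.tensor([1.3, 0.2, 2.1])
score_rejected = torch.tensor([0.4, -0.5, 1.9])

# Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected), averaged
# over comparisons; it pushes chosen scores above rejected ones.
loss = -F.logsigmoid(score_chosen - score_rejected).mean()
print(loss.item())
```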

During the training of GPT-4, OpenAI further added rule-based reward models (RBRMs) to help the model generate correct answers and reject harmful content. Model fine-tuning is thus crucial to the final quality of a model, and each player's distinctive training and fine-tuning methods give their models distinctive behavior.
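
OpenAI has not published the RBRM rubric, so the following is only a toy illustration of the idea of a rule-based reward: hand-written rules score a response for refusing harmful requests without over-refusing safe ones.

```python
# Toy rule-based reward in the spirit of RBRMs (the actual rules and
# classifiers are not public; everything here is invented).
def rule_based_reward(prompt_is_disallowed: bool, response: str) -> float:
    refused = response.strip().lower().startswith(
        ("i can't", "i cannot", "sorry")
    )
    if prompt_is_disallowed:
        return 1.0 if refused else -1.0  # reward refusing harmful requests
    return -1.0 if refused else 1.0      # penalize over-refusal otherwise

print(rule_based_reward(True, "I can't help with that."))     # 1.0
print(rule_based_reward(False, "Sure, here are the steps."))  # 1.0
```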

2.3 Reinforcement learning from human feedback

Reinforcement learning from human feedback (RLHF) gives the large model the ability to "align" with humans: providing the questioner with detailed and impartial responses, rejecting inappropriate questions, and declining questions outside the scope of its knowledge.
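
A minimal sketch of the per-response objective optimized in InstructGPT-style RLHF: the reward-model score minus a KL penalty that keeps the tuned policy close to the supervised reference model. The numbers are toy stand-ins, and beta is an illustrative coefficient.

```python
import torch

beta = 0.1  # KL penalty coefficient (illustrative value)

# Per-token log-probabilities of one sampled response under the policy
# being tuned and under the frozen reference (SFT) model. Toy values.
logp_policy = torch.tensor([-1.2, -0.7, -2.0])
logp_reference = torch.tensor([-1.0, -0.9, -1.8])

reward_score = torch.tensor(1.5)  # reward model's score for the response

# Sequence-level KL estimate, summed over response tokens.
kl = (logp_policy - logp_reference).sum()
objective = reward_score - beta * kl  # what the RL step (e.g. PPO) maximizes
print(objective.item())
```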

3. The impact of ChatGPT on research thinking

1. Well-resourced laboratories will invest further in the large-model race, focusing in the short term on exploring RLHF in different directions and at different scales.

2. Some subtasks will rapidly disappear or be absorbed. Many existing subtasks and small tasks will be merged into larger tasks, and building a supervised dataset plus fine-tuning will no longer be the first choice for small tasks. Small tasks on which large models still fail to achieve good results will become a research hotspot.

3. Cross-modal knowledge mining and self-supervised learning will become hot new research directions. Many RLHF-based cross-modal knowledge-generation methods will quickly be proposed and tested, with a burst of related publications in the short term. Mainstream hotspots will focus on the quantity and quality of knowledge and on methods for using it.

References:

[1] Zhao Chaoyang, Zhu Guibo, Wang Jinqiao. Enlightenment brought by ChatGPT to large language models and new development ideas for multimodal large models [J]. Institute of Automation, Chinese Academy of Sciences, 2023.

[2] Wayne Xin Zhao et al. A Survey of Large Language Models.

[3] The Impact of Large Models Represented by ChatGPT on Information Resource Management.

This article is for reference and study only. Thank you!
