An Introduction to ChatGPT, How It Differs from BERT, and How It Can Be Used in the Company

Analysis of ChatGPT

Introduction to ChatGPT

ChatGPT is an artificial intelligence chatbot launched by OpenAI. It can converse with people in fluent natural language and respond according to the context of the conversation. Its answers reflect strong natural language understanding, information synthesis, and reasoning capabilities, as well as the rich knowledge embedded in the model. Since its launch, ChatGPT has attracted widespread and enthusiastic attention. Microsoft plans to integrate it into Office, Bing search, and other products. ChatGPT can also be used together with other AIGC models to deliver more impressive and practical features, such as generating living-room design renderings through dialogue, which greatly enhances the ability of AI applications to communicate with customers.
ChatGPT can also complete tasks such as writing emails, drafting copy, translating, writing code, and writing papers. It can even push back on flawed premises: asked about "Napoleon coming to China in 2022", it will point out that Napoleon does not belong to that era and adjust its output accordingly.
The introduction of the Transformer architecture in 2017 pushed deep learning model parameters past 100 million, and large models came into being. ChatGPT is built on the pre-trained large model GPT-3.5.
In 2018, OpenAI released its first model, GPT-1, a pre-trained language model based on the Transformer architecture that can automatically generate text.
In 2019, OpenAI launched GPT-2, an optimization of GPT-1: the word-embedding (feature-vector) dimension was extended from 768 to 1600, the initialization of the residual weights was modified, and the vocabulary was expanded to 50,257 tokens.
GPT-3 was released in 2020, with sharp increases in both training data and model parameters. Its word-embedding size grew from GPT-2's 1600 to 12,288, and its context window grew from GPT-2's 1024 tokens to 2048. GPT-3 was trained on roughly 45 TB of data, has 175 billion parameters, and demonstrated excellent performance on a wide variety of natural language tasks. GPT-3.5 followed in 2022; building on GPT-3, it added reinforcement learning from human feedback (RLHF): human annotators score model outputs to build a reward model, and the model is then iteratively improved against that reward model. Through such training, OpenAI obtained InstructGPT, released in March 2022, a language model that is more truthful, less harmful, and better at following user intent; during the same period it began building ChatGPT, a sister model of InstructGPT.

ChatGPT uses supervised learning and reinforcement learning from human feedback to fine-tune GPT-3.5. This gives it the ability to model dialogue history, makes its output better aligned with human expectations, produces more detailed and balanced responses, and lets it decline inappropriate questions and questions outside its domain of knowledge.
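To make the reward-model step concrete, here is a minimal sketch of the pairwise ranking loss commonly used in RLHF: annotators compare two candidate answers, and the reward model learns to score the preferred one higher. This is an illustrative recipe in PyTorch, not OpenAI's actual implementation; all names and values are hypothetical.

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # r_chosen / r_rejected: scalar scores the reward model assigns to the
    # human-preferred answer and the rejected answer for the same prompt.
    # Minimizing -log(sigmoid(r_chosen - r_rejected)) pushes the preferred
    # answer's score above the rejected one's.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy batch of three annotated answer pairs (hypothetical scores).
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.5, 1.1])
print(reward_ranking_loss(chosen, rejected))  # lower when chosen outscores rejected
```

The fine-tuned model is then optimized (e.g., with PPO) to maximize this learned reward.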

The difference between BERT and GPT

The introduction of the Transformer architecture in 2017 pushed deep learning model parameters past 100 million. With successive upgrades, model parameter counts have kept growing, as has the amount of training data. Such large models are called Foundation Models: they are trained in a self-supervised way to capture knowledge from large amounts of unlabeled data, and by storing that knowledge in a vast number of parameters and then fine-tuning for specific tasks, they greatly expand the model's generalization ability.
After the Transformer emerged, many companies pursued large NLP models based on it, OpenAI and Google foremost among them. In 2018, OpenAI launched GPT-1 with 117 million parameters, and Google launched BERT with about 300 million parameters, beginning a contest in NLP.
Both BERT and GPT choose the Transformer as the underlying feature extractor, and both are trained on large amounts of data. The most important difference between the two is:
The BERT model uses the Transformer's Encoder and belongs to the auto-encoding family (Encoder-AE). BERT is pre-trained with a bidirectional Transformer and can model single sentences or sentence pairs. During training, the model predicts the probability of a masked middle word from the contextual semantics on both sides of it. BERT is therefore better at language-understanding tasks such as text classification, entity extraction, and sentiment judgment.
The GPT model uses the Transformer's Decoder and belongs to the autoregressive family (Decoder-AR). During training, GPT generates predicted words sequentially from left to right and can only see the words before the predicted position, never the words after it. GPT therefore performs better on language-generation tasks such as machine translation and text generation.
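The encoder/decoder difference boils down to the attention mask. A minimal PyTorch sketch (shapes are illustrative): BERT's encoder lets every position attend to the full sequence, while GPT's decoder uses a causal mask so position i only sees positions up to i.

```python
import torch

seq_len = 5

# Encoder-AE (BERT-style): full attention; a masked middle word can be
# predicted from context on both its left and its right.
encoder_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

# Decoder-AR (GPT-style): lower-triangular causal mask; each position
# attends only to itself and earlier positions, matching left-to-right
# generation.
decoder_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

print(decoder_mask.int())
# tensor([[1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])
```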
How do the two perform? The early GPT-1 beat the original Transformer but lost decisively to BERT: on the benchmark leaderboards of the time, reading comprehension was dominated by BERT, which went on to become the most widely used model in NLP. OpenAI nonetheless did not change course and insisted on the "big model" route. The subsequent GPT-2 surpassed BERT in performance, and GPT-3 went further still, able to complete most natural language processing tasks: question search, reading comprehension, semantic inference, machine translation, article generation, automatic question answering, and even automatic code generation from task descriptions. ChatGPT, released later, is far superior to BERT in natural language processing capability.

Why didn't the industry take the GPT route before ChatGPT appeared?

In the early days of pre-trained models, evaluation was carried out mainly on the various NLP sub-tasks. BERT borrowed ideas from GPT, was proposed after it, and performed better on those evaluation tasks. A reasonable explanation is that the Decoder-AR structure and training method are harder to train than Encoder-AE, so more training data is needed before the model's emergent abilities appear. Against that backdrop, people were more optimistic about BERT-style pre-training and chose BERT + fine-tuning to handle the various NLP tasks.
But OpenAI still insisted on the GPT path, and the successively released GPT-2 and GPT-3 models can be applied directly to downstream NLP tasks through prompts, without fine-tuning. This flexible generative approach better matches the logic of human-computer interaction, and by recasting them in a generative form it handles language-understanding problems well too. For example, text classification can be handled by generating the label, truly unifying NLP tasks.
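As a sketch of what "text classification by generating the label" might look like in practice (the `complete` function is a placeholder for any completion-style model call):

```python
def complete(prompt: str) -> str:
    # Placeholder: plug in whichever LLM completion API is available.
    raise NotImplementedError

def classify_sentiment(review: str) -> str:
    # No classifier head or fine-tuning: the task is phrased so that the
    # model simply generates the label as text.
    prompt = (
        "Decide whether the sentiment of the review is positive or negative.\n"
        f"Review: {review}\n"
        "Sentiment:"
    )
    return complete(prompt).strip().lower()
```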

Limitations of ChatGPT

1. No guarantee of answer accuracy
In domains where it has not been trained on a large corpus, ChatGPT lacks "human common sense" and the ability to extrapolate, and may give wrong answers; because its phrasing sounds so natural, the errors are harder for users to spot.

2. High computing cost
Both training and inference for ChatGPT require substantial computing power, and the fixed costs (large chip purchases and data-center construction) and variable costs (electricity and maintenance) are both high.

3. New knowledge cannot be incorporated in time
Retraining the GPT model is unrealistic, and if an online training mode is adopted for new knowledge, the introduction of new data easily causes catastrophic forgetting of previously learned knowledge.

4. Lack of highly specialized domain knowledge
For questions from highly specialized fields such as electric power, natural science, or medicine, ChatGPT may be unable to generate appropriate answers without sufficient corpus training.

With advances in software and hardware and innovation at the application level, these problems will gradually be alleviated or resolved.

5. Incremental iteration cost
The incremental iteration of ChatGPT can be divided into two parts.
The first is parameter iteration based on reinforcement learning. This part involves relatively few parameters, data accumulates naturally as users interact with the system, the cost per iteration is low, and iterations can therefore be frequent.
The second is pre-training of the large language model itself. This part involves a huge number of parameters, requires high-quality data, and is expensive to iterate. Once this training is complete, however, it rarely needs to be repeated: new knowledge can be passed to the model through the prompt at application time, without retraining the model.
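A minimal sketch of that last point: passing new knowledge through the prompt at application time instead of retraining (again, `complete` is a placeholder for an LLM client):

```python
def complete(prompt: str) -> str:
    # Placeholder: plug in whichever LLM completion API is available.
    raise NotImplementedError

def answer_with_fresh_facts(question: str, fresh_facts: list[str]) -> str:
    # Updated knowledge is prepended as context, so the frozen pre-trained
    # model can use it without any further training.
    facts = "\n".join(f"- {fact}" for fact in fresh_facts)
    prompt = (
        f"Answer using only the facts below.\nFacts:\n{facts}\n"
        f"Question: {question}\nAnswer:"
    )
    return complete(prompt)
```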

How the technology can be used in the company

1. Traditional search + ChatGPT

Combining ChatGPT with traditional search engines will inevitably bring a new, disruptive change. The father of Keras, François Chollet, has said that generative AI and search engines are complementary, and that what we need is a new generation of tools combining the advantages of both. In a traditional search, the engine quickly locates web pages containing the keywords in the user's query and returns the highest-ranked results, and the user must then manually filter the returned list.
Before ChatGPT, we were mostly satisfied with this one-shot interaction, even when we had to repeat it many times; after ChatGPT appeared, that pattern changed. The main problems the combination of the two solves are:
1. ChatGPT's powerful natural language processing can understand natural language and accurately identify the user's search intent, then let the search engine return the results. This addresses both the limitation that search engines handle only keyword queries and the risk that ChatGPT alone produces misleading information.
2. ChatGPT can understand a query in the context the user provides and supports deeper, extended interaction; the text ChatGPT generates is handed to the search engine for further searching. Together, the two solve the problem that a search engine usually answers only a single query.
3. ChatGPT's knowledge is limited, not as rich as a search engine's index, and because it must compute and generate its results it may be slower than a traditional engine; the two complement each other to give users a better experience.
In this way, whether in the efficiency of knowledge acquisition or in the depth and extension of interaction, we gain a great deal, and the combined search finally yields the answers we need.
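One plausible way to wire the two together, as a sketch; `search` and `complete` stand in for a real search-engine API and LLM client:

```python
def search(query: str, k: int = 3) -> list[str]:
    # Placeholder: call a search-engine API, return top-k result snippets.
    raise NotImplementedError

def complete(prompt: str) -> str:
    # Placeholder: call an LLM completion API.
    raise NotImplementedError

def answer(user_question: str) -> str:
    # 1. The LLM turns the conversational question into a keyword-style
    #    query that a traditional engine handles well.
    query = complete(f"Rewrite as a concise search query: {user_question}")
    # 2. The engine supplies fresh, ranked results from its larger index.
    snippets = "\n".join(search(query))
    # 3. The LLM synthesizes a direct answer grounded in those results,
    #    instead of leaving the user to filter a list of links.
    return complete(
        f"Using these search results:\n{snippets}\n"
        f"Answer the question: {user_question}"
    )
```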

2. Code generation

ChatGPT has been called a walking code generator: it can understand the intent behind human language and generate corresponding code. In daily work, ChatGPT can generate code from a user's requirements, which is then integrated into the operating environment. For example:
1. Generate SQL statements. When querying data, choose the query tables, restrict the tables and fields the SQL may use, have ChatGPT generate the SQL statement, and finally run the generated SQL against the database (see the sketch after this list).
2. Generate a Python front-end display interface from the requirements. Once it runs successfully, the generated code is integrated into the platform's operating environment, implementing the display function simply and quickly.
3. Fix program bugs. ChatGPT's answers to bug-fixing questions are very thorough: it first confirms the intent of the code, then quickly finds the bug based on that intent, and attaches a detailed explanation of where the problem comes from, what kind of bug it causes, how to fix it, and why it should be fixed that way, preventing vulnerabilities and improving productivity.
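For item 1, a sketch of constraining generated SQL to known tables and fields; the schema and prompt wording are illustrative, and `complete` is again a placeholder LLM call:

```python
def complete(prompt: str) -> str:
    # Placeholder: call an LLM completion API.
    raise NotImplementedError

def generate_sql(question: str) -> str:
    # Supplying an explicit schema limits which tables and fields the
    # generated SQL may reference.
    schema = (
        "Table orders(order_id INT, customer_id INT, amount DECIMAL, created_at DATE)\n"
        "Table customers(customer_id INT, name TEXT, region TEXT)"
    )
    prompt = (
        f"Given only this schema:\n{schema}\n"
        f"Write one SQL query that answers: {question}\n"
        "Do not use tables or columns outside the schema.\nSQL:"
    )
    return complete(prompt)

# The returned SQL string is then executed against the database.
```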
