Google fired the first shot at ChatGPT with Bard. When will Baidu's version of ChatGPT be released?

Baidu | Bard | ChatGPT

Google | RLHF | ERNIE Bot

With the rapid development of deep learning, high-performance computing, data analysis, data mining, large language models (LLMs), PPO, NLP, and related technologies, ChatGPT has advanced rapidly. ChatGPT is a large pre-trained language model developed by OpenAI; a variant of the GPT-3 model, it is trained to generate human-like textual responses in conversation.

To secure a favorable position in the ChatGPT market, giants such as Baidu and Google are likewise strategizing and developing continuously.

As a well-known domestic manufacturer of liquid-cooled servers, Blue Ocean Brain offers a ChatGPT deep-learning all-in-one machine with deeply optimized software-hardware co-design, important breakthroughs in key technologies such as distributed-storage acceleration and intelligent network acceleration, and improved cloud system performance. A dedicated NVMe acceleration engine exploits NVMe's full performance, and a full-stack data channel achieves zero-loss transmission of distributed-storage replicas. An upgraded intelligent network engine adds virtual scheduling across more types of network cards and frees CPU capacity, saving up to 90% of computing resources, multiplying the network forwarding rate, and further improving platform performance.

ChatGPT training process

On the overall technical route, ChatGPT introduces "manually labeled data + reinforcement learning" (RLHF, reinforcement learning from human feedback) to continuously fine-tune the pre-trained language model. The main purpose is to teach the LLM to understand the meaning of human commands (such as writing short text, knowledge-based question answering, brainstorming, and other types of instructions) and to judge, for a given prompt (the user's question), what kind of answer is high quality (informative, rich in content, helpful to the user, harmless, free of discriminatory content, and so on).

Under the "manually labeled data + reinforcement learning" framework, the training process of ChatGPT is divided into the following three stages:

1. The first stage: supervised fine-tuning of the model

Powerful as GPT-3.5 is on its own, it struggles to understand the different intentions behind different types of human instructions, and to judge whether generated content is high quality. So that GPT-3.5 can begin to understand the intent contained in instructions, a batch of prompts (i.e., instructions or questions) submitted by test users is randomly sampled, professional annotators write high-quality answers for the selected instructions, and the annotated data is then used to fine-tune the GPT-3.5 model. Through this process, GPT-3.5 can be assumed to have an initial ability to understand the intent of human commands and to provide relatively high-quality answers based on that intent.

The first task of this stage is to collect data and train a supervised policy model.

  • Data collection: select a list of prompts and ask annotators to write the expected responses. ChatGPT uses prompts from two sources: some are written directly by annotators or researchers, and others are taken from OpenAI's API requests (i.e., from GPT-3 users). Although the whole process is slow and expensive, the end result is a relatively small, high-quality dataset (perhaps 12-15k data points) suitable for tuning the pre-trained language model.

  • Model selection: the ChatGPT developers chose a pre-trained model from the GPT-3.5 series rather than fine-tuning the original GPT-3 model. The base model used is the latest text-davinci-003 (a GPT-3 model further tuned on program code).
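The stage-one objective described above can be made concrete with a sketch. This is a toy illustration, not OpenAI's code: a tiny bigram model stands in for GPT-3.5, and the fine-tuning target is the usual next-token negative log-likelihood over labeler-written (prompt, answer) pairs.

```python
# Toy sketch of the supervised fine-tuning (SFT) objective: minimize the
# next-token negative log-likelihood of annotator-written answers.
# A bigram "model" stands in for the real Transformer.
import math
from collections import defaultdict

def train_bigram(pairs):
    """Count bigrams over concatenated prompt+answer token sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for prompt, answer in pairs:
        tokens = prompt + answer
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    # Normalize counts into conditional probabilities P(next | current).
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

def answer_nll(model, prompt, answer, eps=1e-9):
    """Negative log-likelihood of the answer continuation given the prompt."""
    tokens = prompt + answer
    nll = 0.0
    for a, b in zip(tokens, tokens[1:]):
        nll -= math.log(model.get(a, {}).get(b, eps))
    return nll

demo = [(["what", "is", "ai"], ["ai", "is", "machine", "intelligence"])]
model = train_bigram(demo)
# After "fine-tuning", the demonstrated answer has a low (good) NLL.
print(answer_nll(model, ["what", "is", "ai"],
                 ["ai", "is", "machine", "intelligence"]))
```

Training lowers the NLL of answers that look like the labelers' demonstrations, which is how the model "initially understands" instruction intent.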

2. The second stage: training the reward model

The main goal of this stage is to train the reward model on manually annotated data. Specifically, prompts submitted by users are randomly sampled (mostly the same as in the first stage) and answered with the enhanced cold-start model from the first stage. For each prompt, the cold-start model generates K different answers, so the model produces data <prompt, answer1>, <prompt, answer2> ... <prompt, answerK>. The annotator then sorts these K results according to various criteria (such as relevance, informativeness, harmful content, etc.) and specifies a ranking over them. This ranking is the data manually annotated at this stage.

Next, this ranking data is used to train the reward model, using the commonly used pair-wise learning-to-rank approach. The K ranked results are combined two at a time into training pairs, and ChatGPT trains the reward model (RM) with a pair-wise loss. The RM takes <prompt, answer> as input and outputs a reward score evaluating the quality of the answer. For a training pair in which answer1 is ranked above answer2, the loss function drives the RM to score answer1 higher than answer2.
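The pair-wise loss described above can be written in a few lines. This is a minimal sketch with illustrative names; the real RM is a large Transformer, but the loss has this shape: -log σ(r_preferred − r_rejected).

```python
# Pair-wise ranking loss for the reward model: for a human-labeled pair
# (preferred, rejected), minimizing -log(sigmoid(r_p - r_r)) drives the
# RM to score the preferred answer higher. Illustrative sketch only.
import math

def pairwise_rm_loss(r_preferred, r_rejected):
    """Loss for one comparison, given the RM's two scalar reward scores."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_preferred - r_rejected))))

# The loss shrinks as the reward gap in favor of the preferred answer grows.
print(pairwise_rm_loss(2.0, 0.5))   # modest gap
print(pairwise_rm_loss(5.0, 0.5))   # larger gap -> smaller loss
```

When the two scores are equal the loss is log 2; it approaches 0 as the gap grows, which is exactly the "score answer1 higher than answer2" pressure described in the text.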

To sum up: at this stage, the cold-start supervised policy model first generates K results for each prompt, the results are sorted from high to low quality, and the ranking is used as training data to train the reward model in pair-wise learning-to-rank mode. The learned RM takes <prompt, answer> as input and outputs a quality score; the higher the score, the better the answer. Its working procedure is:

  • Select a prompt list; for each prompt the SFT model generates multiple outputs (any number between 4 and 9);

  • Annotators rank the outputs from best to worst. The result is a newly labeled dataset roughly 10 times the size of the curated dataset used for the SFT model;

  • This new data is used to train the RM model, which takes the SFT model's outputs as input and ranks them in order of preference.
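One reason rankings are collected instead of isolated judgments is data efficiency: a single ordering of K answers expands into K(K−1)/2 pairwise comparisons for the loss above. A minimal sketch (names are illustrative):

```python
# Expanding one human ranking of K answers into all pairwise training
# comparisons. ranked_answers is ordered best-to-worst, so in every
# emitted tuple the "better" answer precedes the "worse" one.
from itertools import combinations

def ranking_to_pairs(prompt, ranked_answers):
    """Emit (prompt, better, worse) tuples from a best-to-worst ranking."""
    return [(prompt, better, worse)
            for better, worse in combinations(ranked_answers, 2)]

pairs = ranking_to_pairs("Explain RLHF", ["A", "B", "C", "D"])
print(len(pairs))  # 4 ranked answers -> 6 comparison pairs
```

For K between 4 and 9, one ranking yields between 6 and 36 comparisons, which is why this labeling scheme scales well.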

3. The third stage: fine-tuning the SFT model with PPO

This stage requires no manual labeling; instead, the RM learned in the previous stage updates the pre-trained model's parameters according to its scores. Specifically, a batch of new instructions is first randomly sampled from user-submitted prompts (new prompts, different from those of the first and second stages), and the PPO model's parameters are initialized from the cold-start model. For each sampled prompt, the PPO model generates an answer, and the RM trained in the previous stage assigns a reward score evaluating the answer's quality; this is the overall reward the RM gives the whole answer. Given this final reward for the word sequence, each word can be treated as a time step and the reward propagated backwards from the last token to the first; the resulting policy gradient updates the PPO model's parameters. This is a standard reinforcement-learning setup whose goal is to train the model to generate high-quality answers that satisfy the RM's criteria.
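The credit-assignment step described above, propagating the single end-of-sequence reward backwards one token (time step) at a time, can be sketched as a discounted return. This is a simplified illustration: the discount factor is an assumption, and real RLHF implementations typically also subtract a per-token KL penalty against the SFT model, which is omitted here.

```python
# Sketch of propagating the RM's single scalar reward backwards over the
# generated tokens. Each token is a time step; the last token sits
# closest to the reward, earlier tokens receive discounted credit.
def token_returns(final_reward, num_tokens, gamma=0.95):
    """Per-token discounted returns for a sequence ending in final_reward."""
    returns = [0.0] * num_tokens
    running = final_reward
    for t in reversed(range(num_tokens)):
        returns[t] = running
        running *= gamma
    return returns

# With gamma=0.5 and reward 1.0 over 3 tokens: [0.25, 0.5, 1.0].
print(token_returns(1.0, 3, gamma=0.5))
```

These per-token returns are what a policy-gradient method such as PPO consumes to update the model's parameters.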

If the second and third stages are repeated, each iteration clearly makes the LLM stronger: in the second stage, manually annotated data strengthens the RM, and in the third stage the strengthened RM more accurately evaluates the answers generated for new prompts, while reinforcement learning encourages the LLM to learn new high-quality content, much like using pseudo-labels to expand the pool of high-quality training data. The second and third stages thus complement each other, which is why the effect of successive iterations keeps growing.

However, the editor believes the stage-three reinforcement learning strategy is not necessarily the main reason the ChatGPT model works so well. Suppose stage three used no reinforcement learning and instead, as in stage two, had the cold-start model generate K answers for each new prompt, scored them with the RM, and took the highest-scoring answer to form a new training pair <prompt, answer> for fine-tuning the LLM. If that method were substituted, the result, while less polished, might not be much less effective. Whichever technique stage three uses, it essentially uses the RM learned in stage two to mine high-quality training data out of the LLM itself.
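The editor's proposed alternative is essentially best-of-K (rejection) sampling, which can be sketched as follows; `scorer` here is a hypothetical stand-in for the trained reward model:

```python
# Best-of-K sampling as an RL-free alternative to PPO fine-tuning:
# generate K candidate answers, score each with the reward model, and
# keep the top one as a new supervised (prompt, answer) training pair.
def best_of_k(prompt, answers, scorer):
    """Pick the highest-RM-scored answer to form new SFT data."""
    best = max(answers, key=scorer)
    return (prompt, best)

# Toy scorer: longer answers score higher (stand-in for a real RM).
pair = best_of_k("Q", ["short", "a much longer answer"], scorer=len)
print(pair)  # ('Q', 'a much longer answer')
```

Fine-tuning on the selected pairs then plays the role the PPO update plays in stage three, at the cost of discarding the K−1 lower-scored samples.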

That is the ChatGPT training process. It is an improved InstructGPT, the improvement lying mainly in how the labeled data is collected; other aspects, including the model structure and training process, basically follow InstructGPT. This Reinforcement Learning from Human Feedback technique will likely soon spread to other content-generation directions; an obvious candidate would be something like "a machine translation model based on Reinforcement Learning from Human Feedback". Personally, though, I think adopting the technique in narrow NLP content-generation subfields is not very important, because ChatGPT itself can handle many different types of tasks and already covers many NLP generation subfields; for such subdivisions, using the technique alone adds little value, and its feasibility can be considered verified by ChatGPT. Applying it to other modalities, such as images, audio, or video, may be a direction worth exploring; we might soon see something like "a XXX diffusion model based on Reinforcement Learning from Human Feedback".

Disadvantages of ChatGPT

Despite the rave reviews and growing commercial adoption of ChatGPT, many drawbacks remain.

1. Responses lack coherence

Because ChatGPT conditions only on the preceding context and has poor memory, it tends to forget important information. Researchers are developing AIs that can attend to both short-term and long-term features when predicting the next token in text; one such strategy is convolution, and a neural network using convolutions can track information long enough to stay on topic.

2. It sometimes exhibits bias

ChatGPT's training dataset is text, which reflects the human worldview and therefore inevitably contains human bias. If businesses use ChatGPT to compose emails, articles, papers, and so on without human review, the legal and reputational risks are significant; a racially biased article, for example, can have serious consequences.

Jerome Pesenti, Facebook's head of AI, used tweets generated by Kumar's GPT-3 tool to show how dangerous the output could be when prompted with words like "Jews, blacks, women, or the Holocaust". Kumar argued that the tweets were handpicked, and Pesenti agreed, but responded that "generating racist and sexist output shouldn't be so easy, especially when it's neutral."

In addition, GPT-3's evaluation of essays is also biased. The style of human-written text varies widely by culture and gender. If GPT-3 grades papers without human proofreading, it might rate some students higher simply because their writing style is more prevalent in the training data.

3. Weak understanding of facts

ChatGPT cannot distinguish factual right from wrong. For example, it might write an engaging story about a unicorn without understanding what a unicorn actually is.

4. Misinformation/fake news

ChatGPT can produce realistic news or opinion articles that bad actors could exploit to generate disinformation: fake stories, fake correspondence, impersonated social media posts, biased or abusive language, spam, phishing, fraudulent academic writing, incitement to extremism, and social-engineering pretexts. ChatGPT could easily become the engine of a powerful propaganda machine.

5. Not suitable for high-risk categories

OpenAI states that the system should not be used in "high-risk categories" such as healthcare. In Nabla's blog post, the authors confirmed that this technology can give questionable medical advice, such as that "suicide is a good idea." It should not be used in high-risk situations because, while it sometimes gives correct results, it sometimes gives wrong answers, and in this field getting things right is a matter of life and death.

6. It sometimes generates useless information

ChatGPT has no way of knowing which of its outputs are correct and which are wrong, and cannot stop itself from spreading inappropriate content. The more content such systems generate, the more pollution accumulates on the internet, and finding genuinely valuable information there becomes ever harder. As language models spew unchecked text, they may degrade the quality of internet content and make it harder for people to acquire valuable knowledge.

Measures taken by Google and Baidu in response to OpenAI

Recently, the ChatGPT chatbot has become popular worldwide and caused a sensation, and AI products of this kind are now the object of competition among many major vendors. On February 7th, according to foreign media reports, Google announced Bard, an artificial-intelligence chatbot tool and ChatGPT competitor, on Monday local time. In addition, Baidu plans to launch an AI chatbot service similar to OpenAI's ChatGPT in March this year.

1. Google launches Bard, an AI chatbot tool

Google CEO Sundar Pichai announced the project in a blog post, describing the tool as an "experimental conversational AI service" powered by LaMDA, a large language model developed by Google, that will answer user questions and engage in conversation.

He also noted that Bard's ability to pull the latest information from the web to provide fresh, high-quality responses means it may be able to answer questions about recent events in a way that ChatGPT cannot.

Pichai said the software will initially be available to trusted testers before a wider public rollout in the coming weeks. It is unclear what features Bard will have, but the chatbot appears to be free to use, like ChatGPT, which is owned by US artificial-intelligence research firm OpenAI.

ChatGPT was launched by OpenAI on November 30, 2022. It can quickly create articles, stories, lyrics, prose, jokes, and even code to user requirements, and answer all kinds of questions. Upon release it set off a storm on the internet and won over users including writers, programmers, marketers, and companies. In response to ChatGPT's popularity, Pichai issued a "red alert" within the company, saying Google would fully reorient its AI work around ChatGPT in 2023. Last week, Pichai said Google would roll out its own ChatGPT-like AI language modeling tool in the coming weeks or months.

2. Baidu officially confirms its ChatGPT-style product: Wenxin Yiyan to complete internal testing in March

Notably, according to foreign media reports, Baidu plans to launch an artificial-intelligence chatbot service similar to OpenAI's ChatGPT in March this year, initially embedded in its search service. Baidu has confirmed that the project is named Wenxin Yiyan, with the English name ERNIE Bot. Internal testing is to be completed in March, after which the service will open to the public; Wenxin Yiyan is now in its final sprint before launch.

Last September, Baidu CEO Robin Li judged that the development of artificial intelligence had "undergone a directional change both at the technical level and at the commercial-application level." It is surmised that Baidu began building Wenxin Yiyan around that time. Given the pace set by Google and Microsoft, Wenxin Yiyan may begin internal testing ahead of schedule.

Baidu possesses ChatGPT-related technologies and has laid out a full stack across a four-layer artificial-intelligence architecture (underlying chips, deep-learning frameworks, large models, and top-level search applications); Wenxin Yiyan sits at the model layer. Baidu has worked deeply in artificial intelligence for decades and has ERNIE, an industrial-grade knowledge-enhanced large model with cross-modal, cross-lingual deep semantic understanding and generation capabilities.

According to industry insiders, especially in the field of natural language processing, no domestic company comes close to Baidu's current level. Some experts point out that ChatGPT is a milestone, even a watershed, for artificial intelligence: AI technology has reached a critical point, and enterprises need to put it into practice as soon as possible.

Origin blog.csdn.net/LANHYGPU/article/details/128940077