ChatGPT: detailed introduction & working principles


Introduction to ChatGPT

The full name of ChatGPT is "Chat Generative Pre-trained Transformer", that is, a generative pre-trained transformer model. It is a chatbot program released by the American company OpenAI on November 30, 2022. As a natural language processing tool driven by artificial intelligence, it can be used for question answering, text summarization, machine translation, classification, code generation, and conversational AI.

ChatGPT shares some features with similar products, such as the conversational ability to answer context-sensitive follow-up questions within the same session. The reason it caught fire worldwide in such a short time, however, is that in the screenshots posted by netizens, ChatGPT could not only hold a fluent conversation with users but also write poems, articles, and code.

ChatGPT also adopts a training method that emphasizes ethics: following pre-designed ethical guidelines, it "says no" to questions and requests made with bad intent. Once it detects that a user's prompt contains malicious intent, including but not limited to violence, discrimination, or crime, it refuses to provide a substantive answer.

Application fields:

1. Customer service automation

2. Smart Assistant

3. Education

4. Healthcare

5. Financial Services

6. Social Media

1. Customer service automation: ChatGPT can be used to build chatbots that automatically answer user questions, provide technical support, and solve problems. This not only reduces labor costs but also offers 24-hour service, giving customers a better experience.

2. Smart Assistants: ChatGPT can be used to build smart assistants that can help people with daily tasks such as schedule management, shopping, booking flight tickets and hotels, etc. These assistants can provide personalized services according to the user's needs and improve people's productivity.

3. Education: ChatGPT can be used to build smart educational applications that can answer students' questions, explain concepts, and provide a better learning experience. This kind of application can provide personalized tutoring according to the student's learning style and level, helping students to better understand and master knowledge.

4. Healthcare: ChatGPT can be used to build healthcare applications that can answer patients' questions and provide health consultation and advice. This kind of application can provide personalized advice based on the patient's condition, helping patients better manage their health.

5. Financial services: ChatGPT can be used to build smart financial applications that give customers personalized investment advice and build portfolios based on their risk preferences and investment goals. Such applications can help customers make better investment decisions and improve their return on investment.

6. Social media: ChatGPT can be used to build intelligent social media applications that can provide personalized content recommendations based on users' interests and preferences. Such applications can help users better discover and share content of interest.

Working principle

1. Data collection

2. Preprocessing

3. Model building

4. Generating text

5. Output control

1. Data collection: ChatGPT collects a large amount of text data, including web pages, news, books, etc. It also analyzes hot topics and popular culture on the Internet to learn the latest language patterns and expressions.

2. Preprocessing: ChatGPT preprocesses the collected data, including word segmentation, removal of stop words, translation, etc. This process can help the model better understand the input text and improve the quality of the generated text.

3. Model building: On top of the preprocessed data, ChatGPT builds a deep learning model. As described below, this model is a Transformer composed of stacked self-attention and feed-forward layers; working together, these layers allow the model to capture the patterns and semantics of language.

4. Generating text: Once the model is built, ChatGPT can generate output text similar to human language. It uses a deep learning architecture called a "Transformer" that learns a mapping from input text to output text.

5. Output control: After ChatGPT generates text, a series of output controls are applied, covering grammar, semantics, sentiment, and so on, to ensure the generated text conforms to human language habits.

The most basic training of ChatGPT's language model involves predicting a word in a word sequence. Two objectives are common. The first is next-token prediction: given some text, predict the next likely word or token. This task is the foundation of language modeling and is used in text generation, machine translation, speech recognition, and so on. The second is masked language modeling: some tokens or words in the input text are masked out, and the model is asked to predict the masked tokens.
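As a rough illustration of these two objectives (a toy Python sketch with made-up scores, not the actual training code), both boil down to scoring every vocabulary word for a given slot and picking the most probable one:

import numpy as np

# Toy illustration of next-token prediction and masked language modeling.
# The "logits" below are made up; a real model would compute them from the context.
vocab = ["I", "love", "reading", "books", "<mask>"]

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Next-token prediction: given "I love", score every word as the continuation.
next_logits = np.array([0.1, 0.2, 2.5, 1.8, -5.0])
next_probs = softmax(next_logits)
print("next token:", vocab[int(next_probs.argmax())])             # -> "reading"

# Masked language modeling: hide one word ("I <mask> reading") and recover it.
mask_logits = np.array([0.0, 3.0, 0.5, 0.2, -5.0])
print("masked word:", vocab[int(softmax(mask_logits).argmax())])  # -> "love"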

Such fill-in-the-blank prediction is an example of a basic sequence modeling technique, typically implemented with a Long Short-Term Memory (LSTM) model, a special type of recurrent neural network (RNN). Given the context, the LSTM fills in the blank with the word that has the highest statistical probability. This sequence modeling structure has two main limitations.

1. This kind of model cannot give higher weight to certain parts of the context. In the example "Jacob hates reading", the model may associate "reading" with "hates" by default. But suppose the data also contains a character named "Jacob" who actually loves reading; when processing the sentence "Jacob hates reading", the model should then pay more attention to the information about "Jacob" rather than simply relying on the relationship between "reading" and "hates" in the surrounding context. If the model relies only on nearby words without fully considering the relationships between entities in the text, it can draw wrong conclusions in practical applications.

2. When an LSTM processes input data, it works sequentially, step by step, rather than processing the entire corpus at once. This means that when training an LSTM, the context window is fixed and can only extend across a few steps of the sequence, not across the entire sequence. This limits the LSTM's ability to capture more complex relationships between words and to derive richer meaning from them.
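For concreteness, here is a minimal PyTorch sketch of the kind of LSTM next-word model described above (hypothetical vocabulary and sizes, untrained weights); note that the sequence is consumed step by step, which is exactly the limitation discussed:

import torch
import torch.nn as nn

# Minimal, untrained LSTM next-word predictor (hypothetical vocabulary and sizes).
vocab_size, embed_dim, hidden_dim = 1000, 64, 128

class LSTMLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)           # (batch, seq_len, embed_dim)
        out, _ = self.lstm(x)               # hidden state updated one step at a time
        return self.head(out[:, -1, :])     # scores for the next token

model = LSTMLanguageModel()
context = torch.randint(0, vocab_size, (1, 5))   # a 5-token context window
print(model(context).shape)                      # torch.Size([1, 1000])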

In response to these problems, a team at Google Brain introduced the Transformer in 2017. Unlike LSTMs, Transformers can process all of the input data simultaneously. They are based on a self-attention mechanism: for each word, self-attention strengthens or weakens that word's representation by computing the strength of its relationship to every other word, capturing semantic information better. The model can thus assign different weights to different parts of the input according to how they relate to any position in the language sequence. This property was a huge step toward infusing meaning into LLMs, and it supports the handling of much larger datasets.

Step 1: Supervised fine-tuning (SFT) model

The first stage of model development involved hiring 40 contractors to fine-tune the GPT-3 model by creating a "supervised training dataset", in which every input has a known output for the model to learn from. The inputs (prompts) were collected from actual user submissions to the OpenAI API. Labelers then wrote an appropriate response to each prompt, creating a known output for each input. This new supervised dataset was then used to fine-tune GPT-3, producing GPT-3.5, also known as the SFT model.

To keep the prompt dataset as diverse as possible, any given user ID was limited to 200 prompts, and prompts sharing a long common prefix were removed. Finally, any prompts containing personally identifiable information (PII) were also removed.

After aggregating prompts from the OpenAI API, the labelers were also asked to write sample prompts to fill out the categories with the least real sample data. The categories of interest include:

  • Plain prompts: Arbitrary inquiries.

  • Few-shot: An instruction that contains multiple query/response pairs.

  • User-based prompts: Equivalent to use-case-specific requests made to the OpenAI API.

When generating a response, the labeler does their best to infer what the user's instruction was. The paper describes three main ways in which prompts request information.

  • Direct: "Tell me about..."

  • Few-shot: Given these two examples of stories, write new stories on the same topic.

  • Continuation: Given the beginning of a story, which the AI will continue.

A total of 13,000 input/output samples, drawn from OpenAI API prompts and hand-written by the labelers, were used to train this supervised model.
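A heavily simplified sketch of what this supervised fine-tuning step looks like in code; GPT-3 itself is not publicly trainable, so the example below uses the small open "gpt2" model and two made-up prompt/response pairs as stand-ins for the 13,000 real samples:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Stand-in model: GPT-3 is not available for local fine-tuning.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Each example is a labeler-written (prompt, response) pair (made up here).
pairs = [
    ("Tell me about the moon.", "The moon is Earth's only natural satellite."),
    ("Explain gravity simply.", "Gravity is the force that pulls objects toward each other."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for prompt, response in pairs:               # in reality: batches over ~13,000 examples
    text = prompt + "\n" + response + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt")
    # Standard causal-LM objective: predict each next token of prompt + response.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()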

Step 2: Reward Model

After the SFT model is trained in step 1, it generates better-aligned responses to user prompts. The next refinement is to train a reward model, whose input is a sequence consisting of a prompt and a response and whose output is a scalar value called a reward. The reward model is needed in order to apply reinforcement learning, in which a model learns to produce outputs that maximize its cumulative reward (see step 3).

To train the reward model, the labeler is shown 4 to 9 outputs from the SFT model for a single prompt and asked to rank them from best to worst. Ranking the outputs in this way yields a set of output-ranking combinations (ranking K responses gives K·(K-1)/2 pairwise comparisons).

Treating each pairwise combination as a separate data point can lead to overfitting (the model performs well on known data but cannot generalize to unseen data). To address this, the reward model is trained using each full set of rankings as a single batch of data points: because each batch then contains multiple ranking combinations, this improves the diversity of what the model learns and its ability to generalize.
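The ranking data is typically turned into a pairwise loss: for every pair in which the labeler ranked one response above another, the reward model is pushed to give the preferred response a higher scalar reward. A toy sketch (the real reward model is a fine-tuned language model, not the tiny network used here, and the features below are random stand-ins):

import torch
import torch.nn as nn

# Toy reward model: maps a (prompt + response) feature vector to one scalar reward.
reward_model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))

def ranking_loss(preferred, rejected):
    # Labelers ranked `preferred` above `rejected`; push their rewards apart.
    return -torch.log(torch.sigmoid(reward_model(preferred) - reward_model(rejected))).mean()

# One batch = all pairwise combinations from one ranked set of responses,
# e.g. 4 ranked responses -> 6 comparison pairs. Features here are random.
preferred = torch.randn(6, 768)
rejected = torch.randn(6, 768)
loss = ranking_loss(preferred, rejected)
loss.backward()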

Step 3: Reinforcement Learning Model

In the final stage, the model receives a random prompt and returns a response. The response is generated using the "policy" (a function for choosing actions) that the model has learned. A policy is the strategy the machine has learned for achieving its goal, in this case maximizing its reward. The reward model trained in step 2 assigns a scalar reward to each prompt/response pair, and this reward is fed back into the model to improve the policy.

In 2017, Schulman et al. introduced Proximal Policy Optimization (PPO), the method used to update the policy as the model generates responses. PPO incorporates a per-token Kullback-Leibler (KL) penalty relative to the SFT model. KL divergence measures the similarity between two probability distributions and penalizes distributions that drift too far apart. Here, the KL penalty limits how far the responses can move away from the outputs of the SFT model trained in step 1, which avoids over-optimizing the reward model and letting the responses deviate too far from the human-intent dataset. Introducing the KL penalty balances the model's accuracy and generalization during training.
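A rough sketch of the per-token KL penalty described above, with made-up numbers; real PPO implementations add clipping, a value function, and advantage estimation on top of this reward shaping:

import torch

def shaped_rewards(policy_logprobs, sft_logprobs, reward_score, kl_coef=0.2):
    # Per-token KL estimate: how far the policy has drifted from the frozen SFT model.
    kl_per_token = policy_logprobs - sft_logprobs
    rewards = -kl_coef * kl_per_token        # penalize drifting too far from SFT outputs
    rewards[-1] += reward_score              # reward model scores the full response once
    return rewards

# Fake log-probabilities for a 5-token response (illustration only).
policy_lp = torch.tensor([-1.0, -0.8, -1.2, -0.5, -0.9])
sft_lp = torch.tensor([-1.1, -1.0, -1.1, -0.7, -1.0])
print(shaped_rewards(policy_lp, sft_lp, reward_score=2.3))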

Steps 2 and 3 of this process can be repeated, but are not widely used in practice.

Model evaluation

When training a new model, a separate set of data (called the "test set") that the model has never seen before is set aside. This is to ensure that the performance of the model is evaluated on data the model has not been exposed to before, thus providing a more accurate estimate of generalization ability.

Helpfulness: the model's ability to infer and follow user instructions. Labelers preferred InstructGPT's outputs over GPT-3's 85 ± 3% of the time.

Truthfulness: the model's tendency to "hallucinate". When evaluated with the TruthfulQA dataset, the PPO model's outputs show a slight increase in truthfulness and informativeness.

Harmlessness: the model's ability to avoid generating inappropriate, derogatory, and disparaging content. The researchers tested harmlessness with the RealToxicityPrompts dataset under three conditions.

  1. When instructed to provide a respectful response: toxic responses decreased significantly.

  2. When instructed to respond, with no respectfulness setting: no significant change in toxicity.

  3. When instructed to provide a toxic/derogatory response: responses were in fact much more toxic than those of the GPT-3 model.

Algorithm structure of the Transformer model

Specific calculation process

For the specific calculation process, take translating the Chinese sentence "我爱你" into "I love you" as an example (a deliberately simple sentence). First, the sentence is vectorized and position information is absorbed, giving an initial group of vectors for the sentence.

(Because sentences in the corpus have different lengths, each sentence is represented as a fixed-size matrix, here 512×512; sentences that are too short are padded with zeros. This way, every sentence can be represented by a matrix of the same size during training, no matter how long it is. The 512 is of course a hyperparameter that can be adjusted before training.)
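As a small illustration of this padding, assuming token IDs have already been produced and 0 is the padding value:

import numpy as np

MAX_LEN = 512  # the sequence-length hyperparameter mentioned above

def pad_to_fixed_length(token_ids, max_len=MAX_LEN, pad_id=0):
    # Truncate or zero-pad so every sentence has the same shape during training.
    token_ids = list(token_ids)[:max_len]
    return np.array(token_ids + [pad_id] * (max_len - len(token_ids)))

print(pad_to_fixed_length([101, 2057, 2293, 2017]).shape)  # (512,)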

Next, the initial vector of each word is multiplied by three randomly initialized matrices W_Q, W_K, W_V to obtain three quantities Q, K, V for that word. Take the word "I" as an example.

Then the attention value of each word is computed. For example, for the word "I", its Q vector is multiplied by the K vectors of the other words in the sentence; mathematically, multiplying the two measures their similarity. The resulting scores are passed through a softmax, which converts them into a weight for each word, and these weights must sum to 1. Each weight is then multiplied by the corresponding V value, and all the products are summed to give the attention value.
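A small NumPy sketch of the computation just described, with three token vectors standing in for "I", "love", "you" and random W_Q, W_K, W_V (toy dimensions; a real model uses far larger matrices and a scaled dot product):

import numpy as np

np.random.seed(0)
d = 4                                  # toy embedding size (512 in the text above)
X = np.random.randn(3, d)              # initial vectors for "I", "love", "you"
W_Q, W_K, W_V = (np.random.randn(d, d) for _ in range(3))
Q, K, V = X @ W_Q, X @ W_K, X @ W_V    # Q, K, V for every word

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Attention value of the word "I" (row 0):
scores = Q[0] @ K.T / np.sqrt(d)       # similarity of "I"'s Q with every word's K (scaled)
weights = softmax(scores)              # weights over the words, summing to 1
attention_I = weights @ V              # weighted sum of the V vectors
print(weights.sum(), attention_I)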

This attention value contains, in addition to the word "I"'s own information and position information, the correlation of "I" with every other word in the sentence.

You may notice that in the computation of all these attention coefficients, the only unknowns are the initial matrices W_Q, W_K, and W_V (the same three matrices are shared by all words). The whole Transformer can therefore be reduced to an equation relating the input, the output, and these W matrices, where X is the input text and Y is the translated text:

Y = f(X; W_Q, W_K, W_V)

The Transformer algorithm is essentially a feed-forward neural network model. Ignoring the complicated hidden layers, its basic training logic is: assume Y = f(x) = wx (the goal is to learn f), randomly set an initial w0 and compute the cost function of y = w0·x, then change w0 into w1 and compute the cost function of y = w1·x, and so on for many candidate w (not literally countless; the process converges), and finally keep whichever w gives the smallest cost function — that w defines the f() we have trained. In the Transformer, the three initial matrices W_Q, W_K, W_V play the role of this w0.
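A toy sketch of this "try different w, keep the one with the smallest cost" idea (in practice the weights are updated by gradient descent rather than enumerated, but the logic of minimizing a cost function is the same):

import numpy as np

# Fit y = w * x by searching for the w with the smallest cost (mean squared error).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                                    # data generated with the "true" w = 2

def cost(w):
    return np.mean((w * x - y) ** 2)

candidates = np.linspace(-5.0, 5.0, 1001)      # w0, w1, w2, ... candidate weights
best_w = min(candidates, key=cost)
print(best_w)                                  # converges to 2.0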

Returning to the Transformer: after the attention values are computed, each word is mapped into a new high-dimensional space according to its semantic relationships. This is self-attention (the self-attention mechanism).

