ChatGPT principles that everyone can understand: ChatGPT and natural language processing

Table of contents

ChatGPT and the Turing Test

Modeling form of ChatGPT

The development history of ChatGPT and NLP

Rule-Based NLP

Statistics-based NLP

NLP based on reinforcement learning

The development of NLP technology

ChatGPT's neural network structure Transformer

Summary


ChatGPT (Chat Generative Pre-trained Transformer) is an AI model in the field of Natural Language Processing (NLP), a branch of artificial intelligence. Natural language refers to the languages people use in daily life, such as English, Chinese, and German. Natural language processing means enabling computers to understand and correctly manipulate natural language in order to complete tasks specified by humans. Common NLP tasks include keyword extraction, text classification, machine translation, and more.

NLP also includes one very difficult task, the dialogue system, commonly called a chatbot. This is exactly what ChatGPT does.

ChatGPT and the Turing Test

Ever since the early days of computers in the 1950s, people have studied how to make computers help humans understand and process natural language. This is also the goal of the NLP field, and its most famous milestone is the Turing test.

In 1950, Alan Turing, often called the father of computer science, proposed a test of whether a machine can think like a human, now known as the Turing test. The test setup closely resembles the way ChatGPT is used today: build a computer dialogue system and let a person converse with the model under test. If the person cannot tell whether the other party is a machine or another human, the model is said to have passed the Turing test, and the computer is considered intelligent.

For a long time, academia regarded the Turing test as a peak that was very hard to climb. For this reason, NLP is also known as the jewel in the crown of artificial intelligence. What ChatGPT can do goes far beyond the scope of a chatbot: it can write articles according to user instructions, answer technical questions, solve math problems, translate foreign languages, play word games, and so on. So, in a way, ChatGPT has claimed the jewel in the crown.

Modeling form of ChatGPT

The way ChatGPT works is very simple. The user asks ChatGPT any question, and the model answers it.

Both the user's input and the model's output are text. One user input together with the corresponding model output is called a round of dialogue. We can abstract the ChatGPT model into the following process:

user input (text) => ChatGPT model => model output (text)

ChatGPT can also answer follow-up questions, that is, hold a multi-round dialogue in which information carries over between rounds. The mechanism is also very simple: when the user enters a second input, the system by default stitches the first input and output in front of it, so that ChatGPT can refer to the previous round.

If the user has too many rounds of dialogue with ChatGPT, the model generally keeps only the most recent rounds, and earlier dialogue is forgotten.
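The stitching and forgetting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_prompt`, the `User:`/`Model:` labels, and the `max_rounds` cutoff are hypothetical, and real systems usually limit context by length in tokens rather than by a fixed number of rounds.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], new_input: str, max_rounds: int = 2) -> str:
    """Stitch the most recent rounds of dialogue in front of the new user input."""
    # Keep only the latest rounds; anything older is "forgotten".
    recent = history[-max_rounds:] if max_rounds > 0 else []
    lines = []
    for user_text, model_text in recent:
        lines.append(f"User: {user_text}")
        lines.append(f"Model: {model_text}")
    lines.append(f"User: {new_input}")
    return "\n".join(lines)


history = [
    ("Hello.", "Hello."),
    ("Are you ChatGPT?", "I am ChatGPT."),
    ("Do you like apples or bananas?", "I like apples."),
]
prompt = build_prompt(history, "Why?", max_rounds=2)
print(prompt)
```

With `max_rounds=2`, the first round ("Hello.") no longer appears in the text the model sees, which is why the model cannot refer back to it.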

After ChatGPT receives the user's question, it does not produce the output text in one go but generates it word by word. This word-by-word way of producing text is what "Generative" refers to, as the example below shows.

Suppose the user enters the question "Do you like apples or bananas?". After receiving it, ChatGPT first generates the word "I". The model then combines the user's question with the already generated "I" to generate the next word, "like". This continues until the complete sentence "I like apples." has been generated.
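The loop behind this word-by-word generation can be sketched as follows. The lookup table `TOY_NEXT_WORD` is a hypothetical stand-in for the neural network, and the `<end>` marker is an assumed stop signal; only the loop structure reflects the process described above.

```python
from typing import Dict, List, Tuple

END = "<end>"  # assumed marker that tells the loop to stop

# Hypothetical stand-in for the model: maps the words generated so far to the next word.
TOY_NEXT_WORD: Dict[Tuple[str, ...], str] = {
    (): "I",
    ("I",): "like",
    ("I", "like"): "apples.",
    ("I", "like", "apples."): END,
}


def next_word(question: str, generated: List[str]) -> str:
    """Return the next word given the question and everything generated so far."""
    # A real model conditions on the question as well; this toy table ignores it.
    return TOY_NEXT_WORD.get(tuple(generated), END)


def generate(question: str, max_words: int = 20) -> str:
    generated: List[str] = []
    for _ in range(max_words):
        word = next_word(question, generated)
        if word == END:
            break
        # The new word becomes part of the input for the next step.
        generated.append(word)
    return " ".join(generated)


answer = generate("Do you like apples or bananas?")
print(answer)
```

Each pass through the loop feeds everything produced so far back into the model, which is why the output appears one word at a time.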

The development history of ChatGPT and NLP

The previous section introduced how ChatGPT is modeled. Now imagine you were asked to implement a ChatGPT model yourself: what ideas and methods would you use? Broadly, there are two strategies, rule-based NLP and statistics-based NLP. With ChatGPT, the field has also entered the era of reinforcement learning, that is, NLP based on reinforcement learning.

Rule-Based NLP

Rule-based NLP refers to the use of human-written rules to process natural language. These rules are usually based on knowledge of syntax, semantics, and pragmatics, and can be used to parse and generate natural language. For example, we can design a dialog system based on the following rules:

Rule 1: When the model receives the user's question, remove the question word from it (in the original Chinese, the particle "吗") and replace "?" with "."

Rule 2: Replace "you" with "I" and "I" with "you"

With these rules we can build a dialogue model and start a conversation.

User: Hello.

Model: Hello.

User: Are you ChatGPT?

Model: I am ChatGPT.

User: Do you like the Nuggets community?

Model: I like the Nuggets community.

User: Have you used the jionlp software toolkit?

Model: I have used the jionlp software toolkit.

The above is a very rough example of a rule-based dialogue system, and readers can easily spot its problems. What if the user's question is too complex? What if the question contains no question mark? We would have to keep writing more rules to cover such special cases. This illustrates several significant disadvantages of the rule-based approach:

  1. No set of rules can completely cover natural language, so the approach is not effective for complex natural language tasks;

  2. The rules are endless, and writing them all by hand would be a huge amount of work;

  3. In essence, the task of natural language processing has not been handed over to the computer; it is still carried out by humans.

This was how NLP worked in its early days: entire model systems were built from rules. At the time, this approach was generally called symbolism.

Statistics-based NLP

Statistics-based NLP uses machine learning algorithms to learn the regularities of natural language from a large corpus. In the early days this approach was also called connectionism. It does not require rules to be written by hand; instead, the rules are held implicitly in the model, which learns the statistical characteristics of the language. In other words, in the rule-based method the rules are explicit and written by hand; in the statistics-based method the rules are invisible, implicit in the model parameters, and learned by the model from the data.

Such models have developed rapidly in recent years, and ChatGPT is one of them. Many other models exist with different structures, but their basic principles are the same. They generally follow this workflow:

Label data => build a model, determine the input and output => train the model => use the trained model to work
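The four steps of this workflow can be illustrated with a deliberately tiny example. The sketch below uses a word-count classifier, which is far simpler than a neural network and is chosen only to keep the example short; the training sentences, labels, and the class name `WordCountClassifier` are all made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Step 1: label data (made-up examples).
labeled_data: List[Tuple[str, str]] = [
    ("i love this movie", "positive"),
    ("what a great film", "positive"),
    ("i hate this movie", "negative"),
    ("what a terrible film", "negative"),
]


# Step 2: build a model; the input is a text, the output is a label.
class WordCountClassifier:
    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    # Step 3: train the model; it only counts words, nobody writes a rule by hand.
    def train(self, data: List[Tuple[str, str]]) -> None:
        for text, label in data:
            self.counts[label].update(text.split())

    # Step 4: use the trained model; pick the label whose training words overlap most.
    def predict(self, text: str) -> str:
        words = text.split()
        scores = {label: sum(c[w] for w in words) for label, c in self.counts.items()}
        return max(scores, key=scores.get)


model = WordCountClassifier()
model.train(labeled_data)
print(model.predict("a great movie"))
print(model.predict("a terrible movie"))
```

The "rules" that separate positive from negative text never appear in the code; they exist only as the counts stored in `model.counts`, which is the sense in which statistical rules are implicit in the parameters.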

ChatGPT relies mainly on pre-training to carry out statistics-based model learning. In NLP, one of the earliest pre-trained models was ELMo (Embeddings from Language Models), and the technique was then widely adopted by later deep neural network models, including ChatGPT.

The key idea is to learn a language model from a large-scale raw corpus. This model does not directly learn how to solve a specific task. Instead, it absorbs information ranging from grammar, morphology, and pragmatics to common sense and world knowledge, all of which is folded into the language model. Intuitively, it is more like memorizing knowledge than applying knowledge to solve practical problems.
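To make "learning a language model from raw text" concrete, here is a minimal sketch that swaps the neural network for a bigram counter: it reads unlabeled sentences and records which word tends to follow which. The corpus and the helper names `most_likely_next` and `next_word_probability` are invented for illustration; actual pre-training uses a deep network and a vastly larger corpus.

```python
from collections import Counter, defaultdict

# Raw, unlabeled text (made-up sentences); no task-specific labels are needed.
corpus = [
    "i like apples",
    "i like bananas",
    "i like apples very much",
    "you like apples",
]

# "Training": count how often each word follows each other word.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[prev][nxt] += 1


def most_likely_next(word):
    """Return the word seen most often after `word`, or None if it was never followed."""
    followers = bigram_counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def next_word_probability(word, candidate):
    """Estimate P(candidate | word) from the counts."""
    followers = bigram_counts.get(word)
    if not followers:
        return 0.0
    return followers[candidate] / sum(followers.values())


print(most_likely_next("like"))
print(next_word_probability("like", "apples"))
```

Nothing here is tied to a particular task such as translation or classification; the model has simply memorized regularities of the text, which matches the "knowledge memory" intuition above.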

The benefits of pre-training are many, and it has become a necessary step for almost all NLP model training. We will expand on it in subsequent chapters.

Statistics-based methods are far more popular than rule-based methods. Their biggest drawback, however, is black-box uncertainty: the rules are invisible and implicit in the parameters. For example, ChatGPT sometimes gives ambiguous or unintelligible results, and we cannot tell from the output why the model gave such an answer.

NLP based on reinforcement learning

The ChatGPT model is statistics-based, but it also uses a newer method, Reinforcement Learning from Human Feedback (RLHF), which has achieved excellent results and brought the development of NLP into a new stage.

In 2017, AlphaGo defeated Ke Jie. This suggests that, under the right conditions, reinforcement learning can thoroughly defeat humans and approach perfect play. We are still in the era of weak artificial intelligence, but within the limited field of Go, AlphaGo is already a strong artificial intelligence, and its core is reinforcement learning.

Reinforcement learning is a machine learning method in which an agent (in NLP, mainly the deep neural network model, that is, the ChatGPT model) learns to make optimal decisions by interacting with an environment.

This approach is like training a dog (the agent) to respond to a whistle (the environment) in order to eat (the learning objective).

The puppy is rewarded with food when the owner whistles; when the owner does not whistle, the puppy goes hungry. Through repeated feeding and hunger, the puppy builds up the corresponding conditioned reflex, which is in effect a form of reinforcement learning.
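The puppy example can be simulated with a very small learning loop. The sketch below is not RLHF; it uses a simple table of action values with epsilon-greedy exploration. The action names, the reward numbers (+1 for food, -1 for a wasted run, 0 for staying put), and all hyperparameters are assumptions chosen for illustration.

```python
import random

STATES = ["whistle", "quiet"]
ACTIONS = ["stay", "run_to_owner"]


def reward(state, action):
    # Assumed reward scheme: food (+1) for running over on a whistle,
    # wasted effort (-1) for running over without one, nothing (0) for staying.
    if action == "run_to_owner":
        return 1.0 if state == "whistle" else -1.0
    return 0.0


def train(episodes=1000, learning_rate=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # The agent's "knowledge": an estimated value for each action in each state.
    value = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for step in range(episodes):
        state = STATES[step % 2]  # the owner alternates between whistling and not
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore: try something at random
        else:
            action = max(ACTIONS, key=lambda a: value[state][a])  # exploit
        r = reward(state, action)
        # Move the estimate a small step toward the reward just observed.
        value[state][action] += learning_rate * (r - value[state][action])
    return value


def policy(value, state):
    """Pick the action with the highest learned value."""
    return max(ACTIONS, key=lambda a: value[state][a])


value = train()
print(policy(value, "whistle"), policy(value, "quiet"))
```

Nobody tells the agent which action is correct. It is only told, after acting, how good the outcome was, and the preferred behavior emerges from repeated rewards.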

In NLP, the environment is much more complicated. The environment for an NLP model is not the real human language environment but an artificially constructed model of it. This is why the emphasis is on reinforcement learning from human feedback.

The statistical method lets the model fit the training data set as closely as possible, whereas reinforcement learning gives the model more freedom, so that it can learn independently and break through the limits of a fixed data set. The ChatGPT model is a fusion of statistical learning and reinforcement learning, and its training process is shown in the following figure:

This part of the training process will be described in Sections 8-11.

The development of NLP technology

In fact, the three approaches based on rules, statistics, and reinforcement learning are not just techniques for processing natural language but ways of thinking. An algorithmic model that solves a given problem is often a fusion of all three.

If we compare a computer to a child, natural language processing is like a human raising that child.

The rule-based approach is like a parent who controls the child 100% and requires the child to follow the parent's instructions and rules, for example fixing the hours of study each day and teaching the child every single question. Throughout the process the emphasis is on hands-on teaching, and both the initiative and the focus lie with the parent. For NLP, the initiative and focus of the whole process lie with the programmers and researchers who write the language rules.

The statistics-based approach is like a parent who only tells the child how to learn instead of teaching each specific question, emphasizing semi-guidance. For NLP, the focus of learning lies with the neural network model, but the initiative is still controlled by the algorithm engineer.

The reinforcement-learning-based approach is like a parent who only sets educational goals for the child, for example scoring 90 points in the exam, without caring how the child learns. Everything is done through self-study, and the child has a very high degree of freedom and initiative. The parent only rewards or punishes the final result and does not take part in the educational process. For NLP, both the focus and the initiative of the whole process lie with the model itself.

The development of NLP has moved steadily toward statistics-based methods, and the method based on reinforcement learning has finally won a complete victory, symbolized by the advent of ChatGPT. The rule-based method has gradually declined into an auxiliary processing technique. From the very beginning, the ChatGPT line of models has developed unswervingly in the direction of letting the model learn by itself.

ChatGPT's neural network structure Transformer

In the previous introduction, in order to facilitate the reader's understanding, the specific internal structure of the ChatGPT model was not mentioned.

ChatGPT is a large neural network whose internal structure consists of several stacked layers of Transformer, a type of neural network structure. Since 2018, the Transformer has become the standard model structure in NLP, and it appears in almost every kind of NLP model.

If ChatGPT is a house, then the Transformer is the brick it is built from.

The core of the Transformer is the self-attention mechanism (Self-Attention). When processing an input text sequence, it lets the model automatically focus on the tokens at other positions that are related to the token at the current position. Self-attention represents each position in the input sequence as a vector, and these vectors can all take part in the calculation at the same time, which makes efficient parallel computation possible. As an example:

In machine translation, when translating the English sentence "I am a good student" into Chinese, a traditional machine translation model may produce a rigid, word-by-word rendering that is not accurate enough. The English article "a", for instance, has to be translated into Chinese in light of its context.

A Transformer model can produce a more accurate and natural translation of the same sentence.

This is because the Transformer can better capture relationships between words that are far apart in a sentence, which solves the problem of long-range dependencies in text. The self-attention mechanism will be introduced in Sections 5-6, and the detailed structure of the Transformer in Sections 6-7.
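As a rough preview of the self-attention mechanism described above, here is a minimal sketch of scaled dot-product self-attention in plain Python. It makes one simplifying assumption: the queries, keys, and values are the input vectors themselves, whereas a real Transformer first multiplies the inputs by learned weight matrices. The three two-dimensional "token" vectors are made up for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections (an assumption)."""
    d = len(x[0])
    outputs = []
    for query in x:
        # Score the current position against every position, including distant ones.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        weights = softmax(scores)
        # The output is a weighted mix of all position vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, x)) for i in range(d)])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(tokens)
for row in out:
    print(row)
```

Each position's output depends only on the input vectors, not on the outputs of other positions, so all positions can be computed at the same time. The distance between two positions plays no role in the score, which is why far-apart words can influence each other directly.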

Summary

  • The NLP field has gradually shifted from hand-written rules and logic that control computer programs to letting the network model adapt to the language environment entirely on its own.
  • ChatGPT is currently the NLP model that comes closest to passing the Turing test, and future models such as GPT-4 and GPT-5 are expected to come closer still.
  • ChatGPT works as a generative dialogue system.
  • The training process of ChatGPT includes language model pre-training and reinforcement learning from human feedback (RLHF).
  • The model structure of ChatGPT is the Transformer, with the self-attention mechanism at its core.

In the following chapters, we will explain these contents one by one.

Origin blog.csdn.net/m0_68036862/article/details/131198304