[Artificial Intelligence] The Training Principle of ChatGPT


Preface

        Not long ago, while I was learning C, I wrote a tic-tac-toe (three-piece chess) program, but the computer opponent I played against did not think at all. You will understand why after reading this code:


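/* Computer's move: keeps picking random cells until it finds an empty one. */
/* Assumes at least one cell is still empty; otherwise the loop never ends. */
void computerMove(char Board[ROW][COL], int row, int col)
{
	while (1)
	{
		/* rand() comes from <stdlib.h>; there is no strategy here, only chance. */
		unsigned int i = rand() % ROW, j = rand() % COL;
		if (Board[i][j] == ' ')
		{
			Board[i][j] = '#';
			break;
		}
	}
}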

        The computer's moves are generated at random, so I wanted to give it some intelligence by writing functions that let it think. However, writing one nested if inside another is very error-prone, and the resulting logic is also very difficult to debug.

        We know that artificial intelligence can play chess, but how is the self-learning of an artificial intelligence actually implemented in source code?

        That made me think of ChatGPT. Although ChatGPT's behavior is not written out by hand as explicit rules in code, its principle is indeed interesting.

Positioning "information"

        The operation of artificial intelligence cannot be separated from information.

        We are all familiar with the word information, but what place does information occupy in time and space? In other words: where is it located?

As shown in the picture: 

        Physically, start from the Earth: the Earth is the largest ecosystem. An ecosystem has three functions:

        energy flow, material circulation, and information transmission.

        Within an ecosystem these three functions are inseparable; they interact with and depend on one another.

We will set this picture aside for now and come back to it later.


 Throughout the development of human history, we have experienced the following historical periods:

        

        Every period is a stage, and every stage is a leap: a qualitative change that follows quantitative change.

        In each period, the new technology usually has advantages over the technology of the previous period and tends to replace it, which is how human history moves forward.

 


Summary 

        In the ecosystem, information is dynamic: it constantly flows and is transmitted, and that is how it plays its role.

        It was not until communication technology and computer technology developed rapidly and came into widespread use, marking humanity's entry into the information age, that we truly began to pay attention to the role of information.

        From the perspective of information positioning:

        1. Information occupies a very important position.

        2. The emergence of computers allows almost all information to be represented as data. Once information can be represented in a computer, it can be computed. Computers can represent information, search for information, and even use existing information to predict new information that will appear.

Predicting information

        What can be done by predicting information? It can be used to forecast the weather, predict stock prices, and even realize artificial intelligence!

 

What is artificial intelligence

Concept

        Artificial intelligence refers to the theories, methods, technologies and application systems that simulate human intelligence through computer technology. AI allows machines to think, understand, judge, learn, reason, plan, make decisions, etc. like humans, so that they can complete various intelligent tasks.

Invention

        In the summer of 1956, a group of far-sighted young scientists led by McCarthy, Minsky, Rochester and Shannon gathered to study and discuss a series of issues in using machines to simulate intelligence. They put forward the term "artificial intelligence" for the first time, which marked the official birth of artificial intelligence as a new discipline.

Why artificial intelligence works

        We call it artificial intelligence because we want to understand the essence of intelligence: by using computers to simulate the way intelligent humans think, we hope to learn what intelligence really is.

Two ways to implement artificial intelligence:

Engineering method:

        It does not matter whether the method is the same as the one human intelligence uses, as long as the desired effect is achieved.

Simulation method:

        The effect alone is not enough; the implementation must also be consistent with the method human intelligence uses.

How to understand it?

        e.g.1

        When we solve a quadratic equation in one variable, we use the root formula, but the computer does not know that such a formula exists. It solves the equation by the exhaustive method: it substitutes candidate values of the variable one by one and finally outputs the value closest to the true root.

        Of course, we can also write a program that tells the machine the root formula exists and that it can use it.

        However, this does not mean that a machine running the program we wrote has intelligence; it is merely an optimization of the algorithm.

        Machines are still far from true intelligence.
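
        The two approaches in e.g.1 can be sketched side by side. This is only an illustration: the function names, the search interval and the step size are assumptions made here, and a small Newton iteration stands in for the library sqrt so that the sketch needs no math library.

```c
#include <assert.h>

static double abs_val(double v) { return v < 0.0 ? -v : v; }

/* Value of a*x^2 + b*x + c. */
static double quad(double a, double b, double c, double x)
{
    return (a * x + b) * x + c;
}

/* Exhaustive method: try x = lo, lo + step, ... up to hi and return the
   candidate whose function value is closest to zero. */
double solve_by_search(double a, double b, double c,
                       double lo, double hi, double step)
{
    assert(step > 0.0 && lo <= hi);
    double best_x = lo;
    double best_err = abs_val(quad(a, b, c, lo));
    for (double x = lo; x <= hi; x += step) {
        double err = abs_val(quad(a, b, c, x));
        if (err < best_err) {
            best_err = err;
            best_x = x;
        }
    }
    return best_x;
}

/* Newton iteration for the square root of v >= 0 (stand-in for sqrt). */
static double newton_sqrt(double v)
{
    if (v <= 0.0) return 0.0;
    double r = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + v / r);
    return r;
}

/* Root formula: stores (-b + sqrt(b^2 - 4ac)) / (2a) in *root.
   Returns 0 when a == 0 or there is no real root, 1 otherwise. */
int solve_by_formula(double a, double b, double c, double *root)
{
    double disc = b * b - 4.0 * a * c;
    if (a == 0.0 || disc < 0.0) return 0;
    *root = (-b + newton_sqrt(disc)) / (2.0 * a);
    return 1;
}
```

        For x^2 - 3x + 2 = 0, solve_by_search(1, -3, 2, 1.5, 3, 0.001) needs about 1500 trials to land near the root 2, while solve_by_formula reaches it in one step. Either way the machine only follows the procedure we gave it.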


        e.g.2

        Looking back at the initial example:

        If I write a program that tells the computer if (condition) {how to move} (which is, of course, very painful for me as the programmer), this may be one way to achieve artificial intelligence, but the computer is not really "thinking": it just executes the code.

        This approach (formulating a set of rules to guide the computer) has major flaws:

        1. Manual programming is cumbersome and the workload is heavy.

        2. It is error-prone.

        3. Once an error occurs, you have to debug, modify the source code, recompile, rerun, and finally release a new version by hand.
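
        To make the rule-based approach concrete, here is a minimal sketch of what such hand-written rules could look like for the tic-tac-toe board from the preface. The helper names and the three rules (win if possible, otherwise block, otherwise take the first empty cell) are assumptions chosen for illustration, as is the use of '*' for the human player's piece in the usage note.

```c
#include <assert.h>

#define ROW 3
#define COL 3

/* Returns 1 if `piece` fills a whole row, column, or diagonal. */
int hasWon(char Board[ROW][COL], char piece)
{
    for (int i = 0; i < 3; i++) {
        if (Board[i][0] == piece && Board[i][1] == piece && Board[i][2] == piece)
            return 1;
        if (Board[0][i] == piece && Board[1][i] == piece && Board[2][i] == piece)
            return 1;
    }
    if (Board[0][0] == piece && Board[1][1] == piece && Board[2][2] == piece)
        return 1;
    if (Board[0][2] == piece && Board[1][1] == piece && Board[2][0] == piece)
        return 1;
    return 0;
}

/* Tries `piece` in every empty cell. If some cell completes a line, stores
   it in (*r, *c) and returns 1. The board is left unchanged. */
static int findWinningCell(char Board[ROW][COL], char piece, int *r, int *c)
{
    for (int i = 0; i < ROW; i++) {
        for (int j = 0; j < COL; j++) {
            if (Board[i][j] != ' ')
                continue;
            Board[i][j] = piece;
            int wins = hasWon(Board, piece);
            Board[i][j] = ' ';
            if (wins) {
                *r = i;
                *c = j;
                return 1;
            }
        }
    }
    return 0;
}

/* Rule 1: win if possible. Rule 2: otherwise block the opponent.
   Rule 3: otherwise take the first empty cell.
   Returns 0 if the board is full, 1 after placing a piece. */
int ruleBasedMove(char Board[ROW][COL], char self, char opponent)
{
    int r, c;
    if (findWinningCell(Board, self, &r, &c) ||
        findWinningCell(Board, opponent, &r, &c)) {
        Board[r][c] = self;
        return 1;
    }
    for (int i = 0; i < ROW; i++) {
        for (int j = 0; j < COL; j++) {
            if (Board[i][j] == ' ') {
                Board[i][j] = self;
                return 1;
            }
        }
    }
    return 0;
}
```

        Calling ruleBasedMove(Board, '#', '*') plays better than the random version, but every new idea (forks, corners, the center) means another hand-written rule, which is exactly the maintenance burden listed above.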

        If instead you write a method that lets the computer learn from its games of tic-tac-toe, keep learning, and eventually make its own decisions from the data it has accumulated, this may also be a way to achieve artificial intelligence.

        In other words, I only need to implement an intelligent system. Although it understands nothing at first, just like a baby, it can learn and gradually adapt to its environment to cope with all kinds of complex situations.


Early development

First, let us introduce an assumption:

        Markov assumption: the probability of a word appearing depends only on the previous word, and not on earlier words or on subsequent words.

        More generally, we can assume that a word depends on the previous (n-1) words (this is the N-gram model). For this to work, n has to stay within a suitable range so that the statistics are reliable: words that should appear with high frequency must appear with high frequency in the sample, and words that should appear with low frequency must appear with low frequency in the sample.
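
        For n = 2 the N-gram model is a bigram model, and the probability of a word given the previous word can be estimated by counting. The sketch below assumes words have already been mapped to integer ids; the Bigram type, the function names and the tiny vocabulary size are hypothetical.

```c
#include <assert.h>

#define VOCAB 4  /* toy vocabulary size, chosen only for this sketch */

typedef struct {
    int pair[VOCAB][VOCAB];  /* pair[a][b]: how often b directly follows a */
    int total[VOCAB];        /* total[a]: how often a is followed by any word */
} Bigram;

/* Counts every adjacent pair of word ids in tokens[0..n-1]. */
void bigram_train(Bigram *m, const int *tokens, int n)
{
    for (int i = 0; i < VOCAB; i++) {
        m->total[i] = 0;
        for (int j = 0; j < VOCAB; j++)
            m->pair[i][j] = 0;
    }
    for (int t = 0; t + 1 < n; t++) {
        int a = tokens[t], b = tokens[t + 1];
        assert(a >= 0 && a < VOCAB && b >= 0 && b < VOCAB);
        m->pair[a][b]++;
        m->total[a]++;
    }
}

/* Estimate of P(next | prev) = count(prev, next) / count(prev).
   Returns 0 when prev was never seen with a following word. */
double bigram_prob(const Bigram *m, int prev, int next)
{
    if (m->total[prev] == 0)
        return 0.0;
    return (double)m->pair[prev][next] / m->total[prev];
}
```

        With the ids 0 = "the", 1 = "cat", 2 = "sat", 3 = "dog" and the sequence {0, 1, 2, 0, 3, 2, 0, 1}, "the" is followed by "cat" twice and by "dog" once, so bigram_prob gives 2/3 and 1/3 respectively.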

        But the value of n is not easy to determine:

        If n is too large, the number of probabilities that must be recorded grows exponentially, so n cannot be arbitrarily large, which means the model cannot use a long context. At the same time, a word often depends on context from much earlier in the text, and covering that context would require a very large n, which makes the model very inefficient.

        If n is too small, the accuracy of the results is difficult to guarantee.
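
        The exponential growth is easy to check with a little arithmetic: a full N-gram table over a vocabulary of V words has V^n entries. The function below is a hypothetical helper written for this illustration, and the vocabulary size of 10000 in the note is an assumed figure.

```c
#include <assert.h>
#include <limits.h>

/* Number of probabilities a full n-gram table must hold: vocab^n.
   Returns 0 if the result does not fit in an unsigned long long. */
unsigned long long ngram_table_entries(unsigned long long vocab, int n)
{
    assert(n >= 1);
    unsigned long long entries = 1;
    for (int i = 0; i < n; i++) {
        if (vocab != 0 && entries > ULLONG_MAX / vocab)
            return 0;  /* overflow: the table is astronomically large */
        entries *= vocab;
    }
    return entries;
}
```

        With 10000 words, n = 2 already needs 10^8 entries and n = 3 needs 10^12; by n = 5 the count no longer fits in a 64-bit integer.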

        These are the limitations of the N-gram model. The later RNN (Recurrent Neural Network) solved some of the N-gram model's problems, but it has a problem of its own: vanishing gradients (because of the activation function, the gradient shrinks at every step of backpropagation, so the contribution of early inputs becomes too small to affect the weight updates).
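
        How quickly the gradient shrinks can be estimated with a simplified model. Assume a scalar recurrent unit with a sigmoid activation and a single recurrent weight: the derivative of the sigmoid is never larger than 0.25, so each step back in time multiplies the gradient by at most |weight| * 0.25. The function name and the scalar setting are assumptions of this sketch, not a description of any particular RNN.

```c
#include <assert.h>

/* Upper bound on the gradient magnitude that survives `steps` steps of
   backpropagation through time in a scalar sigmoid RNN:
   (|weight| * 0.25)^steps, since the sigmoid derivative is at most 0.25. */
double gradient_bound(double weight, int steps)
{
    assert(steps >= 0);
    double w = weight < 0.0 ? -weight : weight;
    double bound = 1.0;
    for (int i = 0; i < steps; i++)
        bound *= w * 0.25;
    return bound;
}
```

        With a weight of 1, ten steps already push the bound below one millionth, which is why words far back in the sentence have almost no influence on training.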

Transformer model

        Then Google proposed a new model in the paper "Attention Is All You Need": the Transformer, the architecture on which the GPT models are built.

(The link to the paper is at the end of the article) 

 

Word vectorization

Why vectorize words?

        At the lowest level a computer works in binary. Any real-world information handed to a computer for processing is converted into numbers.

        Inside the computer, a semantically rich word is just a string of 0s and 1s. How can the computer understand it?

        We are intelligent beings that can make judgments and process information; the early computer was just a locker that helped us store information, much as we store food in a refrigerator. The refrigerator does not recognize the types of food; its only task is to store it, and it is enough that we know the types ourselves. How can we make the refrigerator recognize the types of food?

        Word vectorization is a solution to this problem.

        Word vectorization lets the semantic relationships between words be reflected in a vector space, and vectors can be computed with, which lays the foundation for computers to understand words.

What effect do we want to achieve?

        Let's imagine:

e.g.1

        The king vector minus the man vector plus the woman vector is exactly the queen vector;

e.g.2

        As shown in the figure, king and queen both symbolize royal power, while man and woman symbolize gender. We can therefore roughly think of royal power as lying mostly along one axis and gender mostly along another;

        The two examples above illustrate word vectorization.

        In a suitable vector space, the spatial relationship between words reflects the actual relationship between them.
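
        The king and queen example can be tried out with toy numbers. In the sketch below the two axes (royal power, gender) and all coordinates are hand-picked for illustration; real word vectors are learned from data, have many more dimensions, and satisfy such relations only approximately. The function names are hypothetical.

```c
#include <assert.h>

#define DIM 2
enum { KING, QUEEN, MAN, WOMAN, WORDS };

/* Hand-picked toy vectors: axis 0 = royal power, axis 1 = gender. */
static const double EMBED[WORDS][DIM] = {
    {1.0,  1.0},  /* king  */
    {1.0, -1.0},  /* queen */
    {0.0,  1.0},  /* man   */
    {0.0, -1.0},  /* woman */
};

/* Computes out = vector(a) - vector(b) + vector(c). */
void analogy(int a, int b, int c, double out[DIM])
{
    for (int k = 0; k < DIM; k++)
        out[k] = EMBED[a][k] - EMBED[b][k] + EMBED[c][k];
}

/* Returns the index of the word whose vector is closest to v
   (smallest squared Euclidean distance). */
int nearest_word(const double v[DIM])
{
    int best = 0;
    double best_dist = -1.0;
    for (int w = 0; w < WORDS; w++) {
        double dist = 0.0;
        for (int k = 0; k < DIM; k++) {
            double d = v[k] - EMBED[w][k];
            dist += d * d;
        }
        if (best_dist < 0.0 || dist < best_dist) {
            best_dist = dist;
            best = w;
        }
    }
    return best;
}
```

        Here analogy(KING, MAN, WOMAN, v) produces (1, -1), and nearest_word(v) returns QUEEN.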

        How to achieve this effect?

        Because vectors can be computed with, the difference between the model's result and the correct result can also be computed. This difference can be expressed as a function, called the loss function. Once the problem is expressed as a function, training becomes a computable mathematical procedure: adjust the vectors so that the loss function converges to a minimum.
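
        A minimal sketch of "make the loss function converge" is gradient descent on a squared-error loss. To keep it short, the model here is a single weight w with prediction w * x; the function names, the learning rate and the data in the usage note are assumptions for illustration, and real training adjusts millions of numbers in the same spirit.

```c
#include <assert.h>

/* Squared-error loss between a prediction and the correct result. */
double squared_loss(double pred, double target)
{
    double d = pred - target;
    return d * d;
}

/* Fits pred = w * x to the samples by gradient descent on the mean
   squared loss. Each epoch moves w against the gradient
   d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x). */
double train_weight(const double *xs, const double *ys, int n,
                    double lr, int epochs)
{
    assert(n > 0);
    double w = 0.0;
    for (int e = 0; e < epochs; e++) {
        double grad = 0.0;
        for (int i = 0; i < n; i++)
            grad += 2.0 * (w * xs[i] - ys[i]) * xs[i];
        w -= lr * grad / n;
    }
    return w;
}
```

        For the samples x = {1, 2, 3}, y = {2, 4, 6}, a learning rate of 0.05 and 200 epochs, w converges to 2 and the loss on every sample approaches zero.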

Information compression and feature extraction

        We humans can forget irrelevant information and retain important information through the brain's attention mechanism. We answer a question by processing the main information in it.

        But computers do not have this function of the brain, so we need to find a way to extract the features of language.

        General models such as the N-gram and RNN models have certain limitations. For example:

        Xiao Ming read the blogger's article and liked it very much. He reached out and gave the blogger a ______:

        A: triple like    B: slap in the face

        Obviously, as intelligent beings whose brains extract the important information, we can easily infer the answer: a triple like;

        (In this process, our brains guess the result by picking out words such as "article" and "liked", but computers do not have this ability)

        However, if we rely only on N-gram and RNN models, the model pays most attention to the words closest to the blank, so the computer will most likely give the blogger something other than a triple like. (0_=_0)..

Attention Is All You Need

 

        Put simply, each word vector in the sentence is dot-multiplied with the vectors of all the words in the sentence, and the results are used in training. After training, the computer uses the dot products between one word and the other words in the sentence to estimate how likely each candidate word is to come next, and outputs the next word according to those probabilities.
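
        The dot-product step can be sketched as a heavily simplified self-attention. This version assumes three tokens with four-dimensional vectors, scales the dot products by 1/sqrt(4) = 0.5 as in the paper, and turns them into weights with softmax. It leaves out the learned query, key and value projections, multiple heads and everything else in the real Transformer; exp_series is a small stand-in for the library exp that is adequate only for small toy scores. All names are hypothetical.

```c
#include <assert.h>

#define TOKENS 3
#define DIM 4

/* e^x by Taylor series; fine for the small scores in this toy example. */
static double exp_series(double x)
{
    if (x < 0.0)
        return 1.0 / exp_series(-x);
    double term = 1.0, sum = 1.0;
    for (int k = 1; k <= 40; k++) {
        term *= x / k;
        sum += term;
    }
    return sum;
}

static double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int k = 0; k < DIM; k++)
        s += a[k] * b[k];
    return s;
}

/* weights[i][j] = softmax over j of (x[i] . x[j]) * 0.5
   out[i]        = sum over j of weights[i][j] * x[j]        */
void self_attention(double x[TOKENS][DIM],
                    double weights[TOKENS][TOKENS],
                    double out[TOKENS][DIM])
{
    const double scale = 0.5;  /* 1 / sqrt(DIM) for DIM = 4 */
    for (int i = 0; i < TOKENS; i++) {
        double norm = 0.0;
        for (int j = 0; j < TOKENS; j++) {
            weights[i][j] = exp_series(dot(x[i], x[j]) * scale);
            norm += weights[i][j];
        }
        for (int j = 0; j < TOKENS; j++)
            weights[i][j] /= norm;
        for (int k = 0; k < DIM; k++) {
            out[i][k] = 0.0;
            for (int j = 0; j < TOKENS; j++)
                out[i][k] += weights[i][j] * x[j][k];
        }
    }
}
```

        Each row of weights sums to 1, and a token gives more weight to tokens whose vectors point in a similar direction, no matter how far apart they are in the sentence. That is what lets the model look past the nearest words.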

        With appropriate feature-extraction training, the computer learns how to speak fluently. ChatGPT itself is a language model: it was invented not to solve practical problems but to speak well. What ChatGPT says seems to make sense because it has read a large corpus and undergone a great deal of training.

        ChatGPT guesses the next word from the text so far, appends that word, and continues guessing.
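
        This guess, append, guess again loop can be sketched as follows. The toy_model function is a hypothetical stand-in that looks only at the last word id and always returns one fixed continuation; a real language model scores every word in its vocabulary using the whole preceding text and usually samples from those scores. END_TOKEN and the other names are assumptions of this sketch.

```c
#include <assert.h>

#define END_TOKEN 0

/* A "model" is any function that maps the text so far to the next word id. */
typedef int (*NextTokenFn)(const int *tokens, int len);

/* Toy stand-in for a language model: 1 -> 2, 2 -> 3, 3 -> end. */
int toy_model(const int *tokens, int len)
{
    static const int next[] = {END_TOKEN, 2, 3, END_TOKEN};
    assert(len > 0 && tokens[len - 1] >= 0 && tokens[len - 1] < 4);
    return next[tokens[len - 1]];
}

/* Repeatedly asks the model for the next word and appends it, until the
   model returns END_TOKEN or the buffer is full. Returns the new length. */
int generate(NextTokenFn model, int *tokens, int len, int capacity)
{
    while (len < capacity) {
        int next = model(tokens, len);
        if (next == END_TOKEN)
            break;
        tokens[len++] = next;
    }
    return len;
}
```

        Starting from the single word id 1 in a buffer of 8, generate(toy_model, seq, 1, 8) extends the sequence to 1, 2, 3 and stops when the model signals the end.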


Can ChatGPT replace humans?

        ChatGPT is just a language model. It can give us a useful reference when we solve problems, but it cannot really solve the problems for us. It can indeed replace some people and put them out of work, but if those people make good use of GPT's advantages and turn it into a strength of their own instead of rejecting or belittling it, then the emergence of GPT should not make us anxious. It should, however, make us think.

 


Attention Is All You Need: https://arxiv.org/pdf/1706.03762.pdf


 Finished~

Reprinting without the author's consent is prohibited


Origin blog.csdn.net/2301_79465388/article/details/134608332