How do neural networks work? | JD Cloud Technical Team

As programmers, we are used to understanding the underlying principles of the tools and middleware we rely on. This article aims to help readers with no AI background understand the underlying mechanisms of AI models, so that they can feel more at ease when learning or applying various large models.

1. The relationship between GPT and neural networks

Everyone must be familiar with GPT. When we talk to it, we usually only pay attention to the questions we ask (input) and the answers GPT gives (output), while knowing nothing about how that output is generated. It is like a mysterious black box.

GPT is a natural language processing (NLP) model based on neural networks. A large amount of data is fed into the neural network to train the model until its output meets our expectations to a certain extent. The mature model can then receive user input and give "thought-through" answers based on the key information in the input. To understand how GPT "thinks", perhaps we can start from the neural network.

2. What is a neural network?

So, what exactly is a neural network? In other words, why a neural network at all?

High school biology tells us that the human nervous system is composed of hundreds of millions of interconnected neurons. They are biological cells whose main structures include the cell body, dendrites, and axon. Different neurons connect to one another through synapses at the ends of their dendrites and axons, forming the complex neural network of the human brain.

[Figure: structure of a biological neuron (cell body, dendrites, axon)]

In order to give machines intelligence close to that of humans, artificial intelligence attempts to imitate the thinking process of the human brain, and so created a computing model that mimics the interconnection of neurons in the human brain: the neural network. It consists of multiple layers of neurons, each of which receives input and produces a corresponding output. With this definition, the internal structure of the black box in Figure 1 begins to take shape. Each circle in the figure below represents a neuron; a neuron has computing power and can pass its computed result on to the next neuron.

[Figure: a layered artificial neural network; each circle represents a neuron]

In biology, the simpler the brain's structure, the lower the intelligence; conversely, the more complex the nervous system, the more problems it can handle and the higher the intelligence. The same is true for artificial neural networks: the more complex the network structure, the more powerful its computing ability, which is why deep neural networks were developed. They are called "deep" because they contain multiple hidden layers (the additional vertical layers of neurons in the figure above). Compared with traditional shallow neural networks, deep neural networks have more layers in their structure.

The process of training a deep neural network is called deep learning. After building a deep neural network, we only need to input training data into the neural network, and it will spontaneously learn the features in the data. For example, if we want to train a deep neural network to recognize cats, we only need to input a large number of pictures of cats of different types, postures, and appearances into the neural network and let it learn. After successful training, we input an arbitrary image into the neural network and it will tell us whether there is a cat in it.

3. How does the neural network calculate?

Now that we know what a neural network is and its basic structure, how do the neurons in the neural network calculate the input data?

Before that, we have to answer a question: how is data input into a neural network? The following uses image and text data as examples.

How data is fed into a neural network

1. Image input processing

Imagine a picture: when we enlarge it far enough, we can see small squares one after another. Each small square is called a pixel. The more pixels a picture has, the higher its resolution and the clearer it looks. Each pixel displays a single color. The three primary colors of light are red, green, and blue, and all other colors can be produced by mixing these three in varying proportions. In the RGB model, the intensity of each color is represented by a numerical value, usually between 0 and 255. An intensity value of 0 for red means no red light, and 255 means red light at maximum intensity; the intensity values for green and blue work the same way.

To store an image, the computer stores three separate matrices that correspond to the intensities of the red, green, and blue colors of the image. If the size of the image is 256 * 256 pixels, then three 256 * 256 matrices (two-dimensional arrays) can be used to represent the image in the computer. It can be imagined that the colors represented by the three matrices are overlapped and stacked together to reveal the original appearance of the image.

Now that we know how the image is represented in the computer, how do we feed it into the neural network?

Usually we convert the above three matrices into a vector, which can be understood as an array of 1 * n (a row vector) or n * 1 (a column vector). The total dimension of this vector is then 256 * 256 * 3 = 196608. In the field of artificial intelligence, each value input to the neural network is called a feature, so the image above has 196608 features, and this 196608-dimensional vector is also called a feature vector. The neural network receives the feature vector as input, makes a prediction, and then gives the corresponding result.
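As a concrete illustration, here is a minimal NumPy sketch of this flattening step (the random pixel values are just a stand-in for a real photo):

import numpy as np

# A hypothetical 256 x 256 RGB image: three matrices (channels) of intensities 0-255.
image = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)

# Flatten the three matrices into a single feature vector.
feature_vector = image.reshape(-1)
print(feature_vector.shape)  # (196608,) = 256 * 256 * 3

# In practice the values are usually also scaled to [0, 1] before being fed to a network.
normalized = feature_vector.astype(np.float32) / 255.0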

2. Text input processing

Text is composed of a series of characters. First, the text needs to be divided into meaningful words. This process is called word segmentation. After word segmentation, build a vocabulary consisting of all the words or some high-frequency words that appear (you can also use an existing vocabulary). Each word in the vocabulary is assigned a unique index, which converts the text into a discrete sequence of symbols that the neural network can process. The text's sequence of symbols is typically converted into a dense vector representation before being fed into a neural network.

Take the text "How does neural network works?" as an example:

  • Tokenization (word segmentation): ["how", "does", "neural", "network", "works"]
  • Build vocabulary: {"how": 0, "does": 1, "neural": 2, "network": 3, "works": 4}
  • Serialized text data: ["how", "does", "neural", "network", "works"] -->[0, 1, 2, 3, 4]
  • Vectorization:
# Here, the one-hot vector representation is used as an example:
[[1, 0, 0, 0, 0]
 [0, 1, 0, 0, 0]
 [0, 0, 1, 0, 0]
 [0, 0, 0, 1, 0]
 [0, 0, 0, 0, 1]]

Finally, the vector sequence is used as input to the neural network for training or prediction.
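The steps above can be reproduced with a few lines of Python; this is only a sketch of the idea (real NLP pipelines use much larger vocabularies and dense embeddings rather than one-hot vectors):

text = "how does neural network works"

# 1. Tokenization: split the text into words.
tokens = text.split()  # ['how', 'does', 'neural', 'network', 'works']

# 2. Build a vocabulary that maps each word to a unique index.
vocab = {word: idx for idx, word in enumerate(tokens)}  # {'how': 0, 'does': 1, ...}

# 3. Serialize the text into a sequence of indices.
indices = [vocab[word] for word in tokens]  # [0, 1, 2, 3, 4]

# 4. One-hot vectorization: each word becomes a vector with a single 1.
one_hot = [[1 if i == idx else 0 for i in range(len(vocab))] for idx in indices]
for row in one_hot:
    print(row)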

So far we already know how data is input into the neural network, so how does the neural network train based on this data?

How neural networks make predictions

First, clarify the difference between model training and prediction: training means adjusting the model's parameters using a known data set so that it learns the relationship between inputs and outputs; prediction means using the trained model to make predictions on new input data.

The prediction of the neural network is actually based on a very simple linear transformation formula:

z = dot(w, x) + b

Here, x represents the feature vector, w is the weight of the feature vector, indicating the importance of each input feature, and b represents the threshold (bias), which shifts the prediction result. The dot() function in the formula means the vector (dot) product of w and x. For example, if an input sample has i features, substituting into the formula gives:

z = x1*w1 + x2*w2 + ... + xi*wi + b

How should we understand this formula? Suppose you need to decide whether to go boating in the park on the weekend. You are hesitant, so you want a neural network to help you make the decision. Three factors matter: whether the weather is sunny and warm, whether the location is convenient to reach, and whether your companion is someone you enjoy being with. The actual situation is that the weather on the day of the trip is cloudy with occasional gusts of wind, the location is a remote suburb 20 km away, and your companion is a handsome guy you have long admired. These three factors form the feature vector x = [x1, x2, x3] of the input data. We set the feature values according to each feature's effect on the outcome: "bad weather" and "remote location" affect the result negatively, so we can set them to -1, while "the companion is the handsome guy you have long admired" clearly has a strong positive effect, so we can set it to 1. The feature vector is therefore x = [-1, -1, 1]. Next, you set the weights of the three features according to your preferences, that is, how much each factor influences your final decision. If you don't care about the weather or the location and will go rain or shine as long as the handsome guy comes along, you might set the weights to w = [1, 1, 5]; if you are more of a homebody, you might set them to w = [2, 6, 3]. In short, each weight reflects how important the corresponding feature is to you.

We select the first set of weights w = [1, 1, 5], the feature vector x = [-1, -1, 1], and set the threshold b = 1. Assume that z ≥ 0 means go and z < 0 means don't go. The prediction works out to z = (x1*w1 + x2*w2 + x3*w3) + b = 4 > 0, so the answer given by the neural network is: go boating in the park.
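The calculation above can be written out directly in Python; this sketch simply reproduces the numbers from the example (the feature values, weights, and threshold are the hand-picked values above, not learned parameters):

# Features: [weather, location, companion]; negative values hurt, positive values help.
x = [-1, -1, 1]   # cloudy and windy, remote location, long-admired companion
w = [1, 1, 5]     # weights: the companion matters most
b = 1             # threshold (bias)

# Weighted sum: z = x1*w1 + x2*w2 + x3*w3 + b
z = sum(xi * wi for xi, wi in zip(x, w)) + b
print(z)  # 4
print("go boating" if z >= 0 else "stay home")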

The formula used above is essentially a logistic regression, which maps input data to a binary classification probability output. Logistic regression usually uses a specific activation function, the Sigmoid function, to map the z value into the range [0, 1]; it converts the result of the linear transformation into a probability value through a nonlinear mapping. Generally, probability values greater than or equal to 0.5 are treated as the positive class, and values less than 0.5 as the negative class.

The formula of the Sigmoid function is as follows (its graph is an S-shaped curve that rises smoothly from 0 toward 1):

σ(z) = 1 / (1 + e^(-z))

In addition to constraining the output to the range between 0 and 1, another important role of the Sigmoid function (or any other activation function) is to apply a nonlinear mapping to the result of the linear transformation, so that the neural network can learn and represent more complex nonlinear relationships. Without activation functions, a neural network can only solve simple linear problems; with activation functions and enough layers, it can approximate highly complex functions, which is why activation functions are essential.
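As a small illustration, here is the Sigmoid function applied to the z value from the boating example (a sketch using only the standard library):

import math

def sigmoid(z: float) -> float:
    # Map any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

z = 4                      # the linear output from the boating example
probability = sigmoid(z)   # roughly 0.982
print(probability)
print("positive class" if probability >= 0.5 else "negative class")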

How neural networks learn

After getting the prediction result, the neural network will use the loss function to judge whether the prediction result is accurate. If it is not accurate enough, the neural network will adjust itself. This is the learning process.

The loss function measures the error between the model's predictions and the true labels. By comparing predicted values with true values, the loss function provides a numerical indicator of the model's current predictive performance: a smaller loss value means the predictions are closer to the true labels, while a larger loss value means a larger prediction error. A loss function commonly used in binary classification problems is the logarithmic loss (log loss):

L = -[ y*log(ŷ) + (1 - y)*log(1 - ŷ) ]

where y is the true label (0 or 1) and ŷ is the predicted probability.
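For a single prediction, the log loss can be computed as follows (a sketch; the 0.982 probability is taken from the Sigmoid example above, and the true label is assumed to be 1):

import math

def log_loss(y_true: int, y_pred: float) -> float:
    # Logarithmic (cross-entropy) loss for a single binary prediction.
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

print(log_loss(1, 0.982))  # small loss: the prediction is close to the true label
print(log_loss(1, 0.10))   # large loss: a confident but wrong prediction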

The purpose of neural network learning is to adjust the model's parameters so as to minimize the loss function, thereby improving the model's predictive performance; this process is also called model training. The gradient descent algorithm solves this problem: it finds appropriate values of w (the feature weights) and b (the threshold) by changing them step by step so that the value of the loss function becomes smaller and smaller and the predictions become more and more accurate.

Note that if the learning rate is set too small, many gradient descent steps are needed to reach the lowest point, which wastes computing resources; if it is set too large, a step may overshoot the lowest point and land on the far side of the curve. An appropriate learning rate therefore needs to be chosen based on the actual situation.

There are two main steps in the computation of a neural network: forward propagation and back propagation. Forward propagation computes the outputs of the neurons, that is, the process described above of taking a weighted sum of the input features and applying a nonlinear activation function. Back propagation updates and optimizes the model parameters by computing the gradient of the loss function with respect to those parameters and propagating it from the output layer back to the input layer (back propagation involves a fair amount of mathematics, and interested readers can explore it further).
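To tie forward propagation, the loss, and gradient descent together, here is a toy single-neuron (logistic regression) training loop. It is only a sketch: the data set, learning rate, and iteration count are arbitrary choices for illustration, not anything prescribed by this article.

import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: two features per sample; the label is 1 when their sum is positive.
data = [([0.5, 1.0], 1), ([-1.0, -0.5], 0), ([2.0, -0.5], 1), ([-1.5, 0.5], 0)]

w = [random.uniform(-0.1, 0.1) for _ in range(2)]  # feature weights
b = 0.0                                            # threshold (bias)
learning_rate = 0.1

for epoch in range(1000):
    for x, y in data:
        # Forward propagation: weighted sum plus Sigmoid activation.
        z = sum(xi * wi for xi, wi in zip(x, w)) + b
        y_hat = sigmoid(z)
        # Back propagation for a single neuron: the gradient of the log loss
        # with respect to w is (y_hat - y) * x, and with respect to b is (y_hat - y).
        error = y_hat - y
        w = [wi - learning_rate * error * xi for wi, xi in zip(w, x)]
        b = b - learning_rate * error

# After training, the model can predict the class of a new sample.
x_new = [1.0, 0.5]
print(sigmoid(sum(xi * wi for xi, wi in zip(x_new, w)) + b))  # should be close to 1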

4. Summary

To sum up, the process of neural network training and learning is really a process of continuously optimizing the model parameters to reduce the prediction loss. After sufficient training, the model can learn effective feature representations and weight assignments from the input data, allowing it to make accurate predictions on unseen data. A trained neural network model can be applied to all kinds of practical problems: in image classification tasks, convolutional neural networks can automatically identify objects or patterns based on the features of the input image; in natural language processing tasks, recurrent neural networks can understand and generate text; in recommendation systems, multi-layer perceptrons can make personalized recommendations based on a user's historical behavior.

This article provides a brief, high-level explanation of how neural networks work. If anything is inaccurate, please point it out!

P.S.: Some definitions of technical terms come from GPT-3.5-turbo.

Author: JD Retail Ouyang Zhouyu

Source: JD Cloud Developer Community. Please indicate the source when reprinting.
