Detailed explanation of the principle of neural network classification algorithm

Table of contents

Detailed explanation of the principle of neural network classification algorithm

Neural Network Workflow

Backpropagation algorithm

1) The principle of backpropagation

2) Application example

Summary


Forward propagation

(forward-propagation): computing and storing the model's intermediate variables in order, from the input layer to the output layer.

Backpropagation

(back-propagation): computing and storing the gradients of the objective function with respect to the intermediate variables and parameters of each layer of the network, in order from the output layer back to the input layer, using the chain rule. Backpropagation is the standard method for computing the gradients of neural network parameters.

Detailed explanation of the principle of neural network classification algorithm

Before neural networks became popular, the most prominent algorithm in machine learning was the Support Vector Machine (SVM). Now that neural networks are in the ascendant, you may be curious: why stack so many layers? Why can a single-layer perceptron not solve the XOR problem, while adding one hidden layer solves it? What gives neural networks this seemingly magical power?
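The XOR claim is easy to verify by hand: one hidden layer of two units is enough. Below is a minimal hand-crafted sketch in Python; the weights and thresholds are chosen by hand for illustration, not learned from data:

```python
# A hand-crafted two-layer network that computes XOR, which no
# single-layer perceptron can. The hidden units compute OR and AND;
# the output unit combines them as "OR and not AND".
def step(x):
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2 - 0.5)       # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # OR and not AND = XOR

[xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]  # → [0, 1, 1, 0]
```

The hidden layer remaps the four input points so that the output unit can separate them with a single line, which is exactly what the single-layer perceptron cannot do.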

Generally speaking, the more layers a neural network has, the stronger the model's learning ability and the better it can fit complex data distributions. But this is only an ideal: as the network deepens, other problems arise. Computation becomes more expensive, and the model becomes harder to interpret. Choosing an appropriate number of layers for a given scenario is therefore one of the difficult points of neural network algorithms.

Neural Network Workflow

Let's use a simple example to understand how a neural network works:
 

Figure 1: Artificial neural network model


As shown in Figure 1, A, B, C, and D are four blind people who want to play the game "blind men and the elephant". The dataset contains the following four animals: elephant, wild boar, rhinoceros, and elk. Among the four, A, B, and C are responsible for touching the animals (that is, collecting animal features), and D is responsible for summarizing and analyzing the information A, B, and C send him. At the same time, someone tells D which animal was touched in each round. In addition, it is stipulated that A, B, and C report to D only when they touch one of the following three features:

Feature 1: Like a pillar (leg) 
Feature 2: Like a cattail fan (ear) 
Feature 3: Like a whip (tail)

Note that the game is played under ideal conditions, without considering other external factors. Next, we follow the supervised learning process: first train, then predict. Touching the animals is really the process of collecting the features of each animal's parts. Since there are 4 animals, 4 rounds are needed. Below is the information D collected after the four rounds:

The first time, an elephant:
A: like a pillar (leg)
B: like a cattail fan (ear)
C: like a whip (tail)
The second time, a wild boar:
B: like a cattail fan
C: like a whip
The third time, a rhinoceros:
A: like a cattail fan
C: like a whip
The fourth time, an elk:
C: like a whip

Analyzing this summary, D concludes that C's reports are the least valuable (that is, they carry a small weight), because C reports the same thing whether the animal is an elephant or not. D considers A's and B's reports more valuable (larger weights), although each of them sometimes errs. After studying the records, D finds that by combining A's and B's information, whenever the two touch a pillar and a cattail fan at the same time, the animal is an elephant. In this way, even blind men can "see" the elephant by working together.

In the example above, A, B, C, and D actually form a simple neural network: they are equivalent to four neurons. A, B, and C are responsible for "touching", that is, collecting input data along different dimensions, and they make up the input layer. Once they have data, they report to D, and D's summary analysis produces the final prediction, judging whether the animal is an elephant or not; D is equivalent to the output layer. A neural network can aggregate scattered information and extract the most valuable, authoritative parts. Any single node picked out in isolation gives only a partial, biased view: C, for instance, would conclude that anything with a whip-like tail is an elephant, which is clearly unreasonable. The neural network distinguishes the importance of different information by assigning different weight values to its inputs. During training, the weights of the linear combination are adjusted, increasing the weight of valuable inputs and decreasing the weight of less valuable ones. This is the core idea of weight tuning, and it is how the model's prediction accuracy is improved.
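D's aggregation rule can be sketched as a single weighted-sum neuron in Python. The weights and threshold below are illustrative stand-ins chosen to match the story, not values learned from data:

```python
# D's decision as a weighted-sum neuron: A, B, C each report 1 if they
# felt their feature this round, 0 otherwise. C's report carries no weight,
# because it is the same for every animal.
def is_elephant(a_report, b_report, c_report):
    weights = [0.5, 0.5, 0.0]  # illustrative weights for A, B, C
    threshold = 0.9            # fire only when A and B both report
    score = (weights[0] * a_report +
             weights[1] * b_report +
             weights[2] * c_report)
    return score >= threshold

is_elephant(1, 1, 1)  # → True  (pillar + cattail fan: an elephant)
is_elephant(0, 1, 1)  # → False (the wild boar round)
```

Tuning the weights in a real network amounts to adjusting exactly these numbers until the decisions match the labels.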

The more neuron nodes and layers a network has, the stronger its expressive power, that is, its ability to fit data. This is the fundamental reason neural network algorithms outperform other machine learning algorithms on complex tasks such as image recognition and speech recognition.

Backpropagation algorithm

A neural network model has two important components: the activation function and the backpropagation (BP) algorithm.

We know that an artificial neural network is composed of neuron nodes whose role is to receive and transmit information, much like neurons in the brain receive external stimuli and transmit excitatory signals.

In an artificial neural network model, a signal that starts at the input layer, passes through to the output layer, and finally returns a result is said to undergo "forward propagation" (also called the forward pass or forward operation). Once the input has been passed layer by layer until the output layer produces an output, forward propagation is finished.

Backpropagation is similar to forward propagation, but because the propagation direction is reversed, it is called the backpropagation algorithm (BP algorithm for short). The algorithm first appeared in the 1960s but drew little attention until 1986, when a re-description by Hinton and colleagues brought it back into public view. This algorithm successfully solves the problem of computing the weight parameters of multi-layer neural networks.
 

Figure 2: Schematic diagram of forward operation and backpropagation

1) The principle of backpropagation

The backpropagation (BP) algorithm is a supervised learning algorithm: it learns from labeled training data, and it is one of the most common methods for training artificial neural network models. Simply put, the BP algorithm learns from its mistakes until the error is minimized, thereby improving the model's reliability.

The learning process of the BP algorithm consists of two phases: forward propagation and backpropagation.

During forward propagation, the input passes from the input layer through the hidden layers, is processed layer by layer, and reaches the output layer. If there is an error between the output value and the labeled value, the error is propagated back from the output layer through the hidden layers to the input layer (that is, backpropagation), and during this process the gradient descent algorithm is used to optimize the neurons' weight parameters. When the error reaches its minimum, training of the network model ends, and so does backpropagation.

The flowchart is as follows:
 

Figure 3: Neural Network Model Training


To summarize the process: the input layer receives input data x, and the weight parameters ω are initialized at the same time. After the hidden layer's computation, the output layer produces a result, completing the forward pass. The output is then compared with the labeled value to obtain a deviation, and that deviation is propagated from the output layer back to the input layer (the backpropagation phase). In this phase, the gradient descent algorithm repeatedly optimizes the weight parameters; when the deviation is minimal, a set of optimal weight parameters (ω) is obtained.

2) Application example

Consider the following neural network model. It consists of three layers, namely an input layer, a hidden layer, and an output layer, and it uses the Sigmoid function as its activation function. Let's see how the backpropagation algorithm works and how parameter tuning is achieved.
 

Figure 4: Neural Network Model


First, a brief description of the network model's data:

Input layer: i1 = 0.05, i2 = 0.10
Initial weight parameters: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55
Output layer labeled values (i.e., expected values): o1 = 0.01, o2 = 0.99
Bias term weights: b1 = 0.35, b2 = 0.60

Next, the backpropagation algorithm is used to bring the actual output as close as possible to the labeled values, that is, to minimize the deviation between them. We calculate step by step according to the process above.

Forward pass stage: input layer --> hidden layer --> output layer. First, calculate the weighted sum of neuron H1:
 

net_h1 = w1 * i1 + w2 * i2 + b1

Putting the initialization data into the above formula gives:

net_h1 = 0.15 * 0.05 + 0.20 * 0.10 + 0.35 = 0.3775
The output of hidden neuron H1 is then obtained by mapping this result through the activation function:
 

out_h1 = sigmoid(net_h1) = 1 / (1 + e^(-0.3775)) = 0.593269992

Similarly, the output of neuron H2 can be calculated in the same way:

net_h2 = w3 * i1 + w4 * i2 + b1 = 0.25 * 0.05 + 0.30 * 0.10 + 0.35 = 0.3925
out_h2 = sigmoid(0.3925) = 0.596884378
The actual output of output-layer neuron O1 is calculated as follows:

net_o1 = w5 * out_h1 + w6 * out_h2 + b2
Substituting the data and mapping through the activation function gives the output of O1:

net_o1 = 0.40 * 0.593269992 + 0.45 * 0.596884378 + 0.60 = 1.105905967
out_o1 = sigmoid(1.105905967) = 0.75136507
In the same way, the actual output of O2 can be calculated:

net_o2 = w7 * out_h1 + w8 * out_h2 + b2 = 0.50 * 0.593269992 + 0.55 * 0.596884378 + 0.60 = 1.224921404
out_o2 = sigmoid(1.224921404) = 0.772928465

The calculation shows that the actual output is far from the labeled values: the computed result is [0.75136507, 0.772928465], but the labels are [0.01, 0.99]. Next, the backpropagation algorithm is used to iteratively update the weights and recompute the output.
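Before moving on, the forward-pass numbers above can be verified with a short Python sketch; the variable names follow the example's notation:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Data from the example
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# Hidden layer: weighted sum, then sigmoid
net_h1 = w1 * i1 + w2 * i2 + b1
out_h1 = sigmoid(net_h1)  # ≈ 0.593269992
net_h2 = w3 * i1 + w4 * i2 + b1
out_h2 = sigmoid(net_h2)  # ≈ 0.596884378

# Output layer
net_o1 = w5 * out_h1 + w6 * out_h2 + b2
out_o1 = sigmoid(net_o1)  # ≈ 0.75136507
net_o2 = w7 * out_h1 + w8 * out_h2 + b2
out_o2 = sigmoid(net_o2)  # ≈ 0.772928465

print(out_o1, out_o2)
```

Running this reproduces the hand calculation to nine decimal places.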


In the backpropagation stage (output layer --> hidden layer --> input layer), first use the mean squared error (MSE) formula, here in the form E = 1/2 * (target - output)^2 summed over the output nodes, to calculate the total error:

E_o1 = 1/2 * (0.01 - 0.75136507)^2 = 0.274811083
E_o2 = 1/2 * (0.99 - 0.772928465)^2 = 0.023560026
E_total = E_o1 + E_o2 = 0.298371109

Note: MSE is a convenient way to measure "average error"; it evaluates the degree of variation in the data. The smaller the MSE, the better the prediction model's generalization ability.

The above calculation yields the total error (E_total), a value contributed jointly by all nodes in the neural network. We therefore need to work out how much deviation each neuron node "contributes". This is the core problem the backpropagation algorithm solves, and the solution is simple: take partial derivatives. For example, to know how much loss node A contributes, we compute the partial derivative with respect to that node.
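The total error can be checked directly, with the outputs and labels taken from the worked example:

```python
# Recompute the total error E_total from the forward-pass outputs,
# using E = 1/2 * (target - output)^2 summed over the output nodes.
outputs = [0.75136507, 0.772928465]  # actual outputs of O1, O2
targets = [0.01, 0.99]               # labeled (expected) values
e_total = sum(0.5 * (t - o) ** 2 for t, o in zip(targets, outputs))
print(e_total)  # ≈ 0.298371109
```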

Take w5 as an example. To know how much influence w5 has on the overall error, we apply the chain rule to find its partial derivative:

∂E_total/∂w5 = ∂E_total/∂out_o1 * ∂out_o1/∂net_o1 * ∂net_o1/∂w5
To obtain the partial derivative of w5, each of the three factors must be computed:

∂E_total/∂out_o1 = out_o1 - target_o1 = 0.75136507 - 0.01 = 0.74136507
∂out_o1/∂net_o1 = out_o1 * (1 - out_o1) = 0.186815602
∂net_o1/∂w5 = out_h1 = 0.593269992
Multiplying the three factors gives the partial derivative of w5: 0.74136507 * 0.186815602 * 0.593269992 = 0.082167041. Finally, w5 is updated using the gradient descent rule:

w5_new = w5 - η * ∂E_total/∂w5 = 0.40 - 0.5 * 0.082167041 = 0.35891648
Note: η is the learning rate of the gradient descent algorithm; its value here is 0.5. It was introduced in the earlier explanation of gradient descent; see "Gradient Descent for Extremum".

This completes the update of w5; w6, w7, and w8 can be updated the same way.
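The w5 calculation above can be written out as a few lines of Python, using the values already computed in the forward pass:

```python
# Chain rule for dE_total/dw5, with values from the worked example.
out_h1 = 0.593269992       # output of hidden neuron H1
out_o1 = 0.75136507        # actual output of O1
target_o1 = 0.01           # labeled value for O1
eta = 0.5                  # learning rate
w5 = 0.40

dE_dout = out_o1 - target_o1       # derivative of 1/2*(t - o)^2 w.r.t. o
dout_dnet = out_o1 * (1 - out_o1)  # derivative of the sigmoid
dnet_dw5 = out_h1                  # since net_o1 = w5*out_h1 + w6*out_h2 + b2
dE_dw5 = dE_dout * dout_dnet * dnet_dw5

w5_new = w5 - eta * dE_dw5
print(dE_dw5, w5_new)  # ≈ 0.082167041, 0.35891648
```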

The above process propagates only from the output layer to the hidden layer. Once the updated weights are computed, propagation continues from the hidden layer to the input layer, updating w1, w2, w3, and w4 in turn, which completes the first round of weight updates. After this first round, the total error drops from 0.298371109 to 0.291027924. After 10,000 iterations, the total error is 0.000035085 and the output is [0.015912196, 0.984065734], very close to the expected value [0.01, 0.99].

Summary

The neural network classification algorithm is a supervised learning algorithm. Using it generally involves the following five steps:

  • Initialize the weights of all neuron nodes in the neural network;
  • The input layer receives input and produces output through forward propagation;
  • Compare the predicted output with the actual value to compute the deviation;
  • The output layer receives the deviation, and all neurons update their weights through the backpropagation mechanism;
  • Steps 2 through 4 form one complete training pass; repeat this process until the deviation is minimized.


The neural network algorithm lets all neurons update their weights through the backpropagation mechanism. Iterating the training process above until the deviation is minimal finally yields an optimal network model, one that best fits the data.
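The five steps can be put together into a training loop for the worked example's network. This is an illustrative sketch, not a production implementation: bias weights are kept fixed, as in the example, and gradients are computed with the pre-update weights in each round:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# The worked example's network: 2 inputs, 2 hidden neurons, 2 outputs
i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99              # labeled (expected) values
w = [0.15, 0.20, 0.25, 0.30,     # w1..w4 (input -> hidden)
     0.40, 0.45, 0.50, 0.55]     # w5..w8 (hidden -> output)
b1, b2 = 0.35, 0.60
eta = 0.5                        # learning rate

for _ in range(10000):
    # Step 2: forward propagation
    h1 = sigmoid(w[0] * i1 + w[1] * i2 + b1)
    h2 = sigmoid(w[2] * i1 + w[3] * i2 + b1)
    o1 = sigmoid(w[4] * h1 + w[5] * h2 + b2)
    o2 = sigmoid(w[6] * h1 + w[7] * h2 + b2)

    # Steps 3-4: deviation and backpropagated deltas (chain rule)
    d_o1 = (o1 - t1) * o1 * (1 - o1)
    d_o2 = (o2 - t2) * o2 * (1 - o2)
    d_h1 = (d_o1 * w[4] + d_o2 * w[6]) * h1 * (1 - h1)
    d_h2 = (d_o1 * w[5] + d_o2 * w[7]) * h2 * (1 - h2)

    # Gradient-descent update of all eight weights
    grads = [d_h1 * i1, d_h1 * i2, d_h2 * i1, d_h2 * i2,
             d_o1 * h1, d_o1 * h2, d_o2 * h1, d_o2 * h2]
    w = [wk - eta * g for wk, g in zip(w, grads)]

error = 0.5 * (t1 - o1) ** 2 + 0.5 * (t2 - o2) ** 2
print(o1, o2, error)  # outputs approach [0.01, 0.99]; error falls toward ~3.5e-5
```

After 10,000 rounds the outputs land close to the expected values, matching the behavior described in the worked example.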


Origin blog.csdn.net/qq_38998213/article/details/132297022