Deep Learning: Forward Propagation & Backpropagation

Forward Propagation

For a neuron, the goal of training is to determine the optimal weights w and biases b so that the error between the output value and the true value is minimized.

The data is processed layer by layer from input to output, and a predicted value y is produced at the output layer.

(Understanding: in forward propagation, the inputs flow from the input layer to the hidden layer, where the weights w and biases b are applied, and then from the hidden layer to the output layer.

Finally, the output-layer result is obtained and compared with the true value.

Multiple inputs are forward propagated through their respective weights to obtain the predicted value y; the weights w and biases b are then updated through backpropagation so that the error between the predicted value and the true value is minimized. This step optimizes each weight w and bias b simultaneously.

By training on multiple sets of data, the neural network can continuously approach the true values. Once the weights w and biases b closest to optimal have been obtained through the gradient, they can be used to predict on the next test image that arrives; a minimal sketch of this forward pass is given below.)
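To make the forward pass concrete, here is a minimal NumPy sketch of a 2-2-2 network of the kind used in the rest of this post. The inputs, weights, and biases below are illustrative assumptions, since the post does not list its starting values.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes a weighted sum into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward propagation: input -> hidden -> output.

    W1, b1: input -> hidden weights and biases
    W2, b2: hidden -> output weights and biases
    Returns the hidden activations and the predicted values y.
    """
    out_h = sigmoid(W1 @ x + b1)      # hidden-layer outputs
    out_o = sigmoid(W2 @ out_h + b2)  # output-layer predictions y
    return out_h, out_o

# Illustrative 2-2-2 network (these numbers are assumptions, not taken from the post).
x  = np.array([0.05, 0.10])          # inputs i1, i2
W1 = np.array([[0.15, 0.20],
               [0.25, 0.30]])
b1 = np.array([0.35, 0.35])
W2 = np.array([[0.40, 0.45],
               [0.50, 0.55]])
b2 = np.array([0.60, 0.60])

out_h, out_o = forward(x, W1, b1, W2, b2)
print(out_o)  # predicted values, to be compared with the target values
```

Each layer computes a weighted sum of its inputs plus a bias and passes it through the sigmoid; the output-layer activations are the predicted values y that get compared with the true values.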

Backpropagation

After weight initialization and forward propagation, all parameters and intermediate results in the network are available. We then need to update the weights of the model based on the loss (the error between the prediction and the truth). This process is called backpropagation.

Suppose the forward pass gives the output values [0.75136079, 0.772928465], while the actual (target) values are [0.01, 0.99].

1. Calculate the total error

The error of each output is measured with the squared error. Since there are two outputs, the errors of o1 and o2 are calculated separately, and the total error is the sum of the two:
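The squared-error formula shown as an image in the original post is not reproduced; in the usual notation it is:

$$E_{total} = \sum \tfrac{1}{2}(target - output)^2 = E_{o1} + E_{o2}$$

$$E_{o1} = \tfrac{1}{2}(target_{o1} - out_{o1})^2, \qquad E_{o2} = \tfrac{1}{2}(target_{o2} - out_{o2})^2$$

Plugging in the outputs and targets above gives the total error of roughly 0.298 referred to at the end of the post.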

2. Hidden layer ----> output layer weight update:

Taking the weight parameter w5 as an example: if we want to know how much impact w5 has on the overall error, we take the partial derivative of the overall error with respect to w5 using the chain rule:
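The chain-rule expansion from the original image is not reproduced; written out, with net(o1) denoting the weighted input of o1 and out(o1) its sigmoid output, it is:

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$$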

 

Now let's calculate each of these three factors separately:
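Reconstructed in standard notation, and assuming the usual layout where $net_{o1} = w_5\,out_{h1} + w_6\,out_{h2} + b_2$ and $out_{o1} = \sigma(net_{o1})$, the three factors are:

$$\frac{\partial E_{total}}{\partial out_{o1}} = -(target_{o1} - out_{o1})$$

$$\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}\,(1 - out_{o1})$$

$$\frac{\partial net_{o1}}{\partial w_5} = out_{h1}$$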

 

 

 

(The second factor is simply the derivative of the sigmoid function; the derivation is straightforward and you can work it out yourself.)

The three factors are then multiplied together:
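In the same notation, the resulting product is:

$$\frac{\partial E_{total}}{\partial w_5} = -(target_{o1} - out_{o1}) \cdot out_{o1}(1 - out_{o1}) \cdot out_{h1}$$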

In this way, we calculate the partial derivative of the overall error E(total) with respect to w5.

Looking back at the above formula, we find:

For convenience of notation, δ(o1) is used to represent the error term of the output neuron o1:
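A reconstruction of the definition the original image showed, in the usual notation:

$$\delta_{o1} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} = \frac{\partial E_{total}}{\partial net_{o1}} = -(target_{o1} - out_{o1}) \cdot out_{o1}(1 - out_{o1})$$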

 

Therefore, the partial derivative of the total error E(total) with respect to w5 can be written as:
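With that definition, the gradient becomes:

$$\frac{\partial E_{total}}{\partial w_5} = \delta_{o1} \, out_{h1}$$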

 

If the output-layer error term is instead defined with a negative sign, this can also be written as:
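That is, assuming δ(o1) is defined with the opposite sign, $\delta_{o1} = (target_{o1} - out_{o1})\,out_{o1}(1 - out_{o1})$, the same gradient reads:

$$\frac{\partial E_{total}}{\partial w_5} = -\delta_{o1} \, out_{h1}$$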

 

Finally, we update the value of w5:
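The update rule, reconstructed in the usual gradient-descent form, is:

$$w_5^{+} = w_5 - \eta \, \frac{\partial E_{total}}{\partial w_5}$$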

 

(where η is the learning rate; here we take 0.5)

3. Input layer ----> hidden layer weight update:

The method is similar to the one above, but one thing changes. When calculating the partial derivative of the total error with respect to w5, the chain was out(o1) ----> net(o1) ----> w5. When updating the weights between the input layer and the hidden layer, the chain is out(h1) ----> net(h1) ----> w1, and out(h1) receives error from two places, E(o1) and E(o2), so both of them must be taken into account here (see the reconstruction below).
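Reconstructed in the same notation, and assuming the usual labeling for this example (w5 and w7 are the weights from h1 to o1 and o2, and i1 is the input connected to h1 through w1), the hidden-layer gradient and update are:

$$\frac{\partial E_{total}}{\partial w_1} = \left( \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}} \right) \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1} = (\delta_{o1} w_5 + \delta_{o2} w_7) \cdot out_{h1}(1 - out_{h1}) \cdot i_1$$

$$w_1^{+} = w_1 - \eta \, \frac{\partial E_{total}}{\partial w_1}$$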

 

 

 

 

In this way, one round of error backpropagation is completed. We then redo the computation with the updated weights and keep iterating. In this example, after the first iteration the total error E(total) dropped from 0.298371109 to 0.291027924. After 10,000 iterations, the total error is 0.000035085 and the output is [0.015912196, 0.984065734], which shows that the method works well.
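For completeness, here is a minimal NumPy sketch of the whole procedure described above: forward propagation, backpropagation of the output-layer and hidden-layer error terms, and the gradient-descent weight update with η = 0.5. The inputs and initial weights are illustrative assumptions (the post does not list them), and the biases are kept fixed in this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative 2-2-2 network; inputs and starting weights are assumptions.
x      = np.array([0.05, 0.10])            # inputs i1, i2
target = np.array([0.01, 0.99])            # target values from the post
W1 = np.array([[0.15, 0.20],               # input -> hidden weights
               [0.25, 0.30]])
b1 = np.array([0.35, 0.35])
W2 = np.array([[0.40, 0.45],               # hidden -> output weights
               [0.50, 0.55]])
b2 = np.array([0.60, 0.60])
eta = 0.5                                  # learning rate from the post

for step in range(10000):
    # forward propagation
    out_h = sigmoid(W1 @ x + b1)
    out_o = sigmoid(W2 @ out_h + b2)
    E_total = 0.5 * np.sum((target - out_o) ** 2)

    # backpropagation: output-layer error terms delta(o1), delta(o2)
    delta_o = -(target - out_o) * out_o * (1 - out_o)
    # hidden-layer error terms: each hidden unit collects error from both outputs
    delta_h = (W2.T @ delta_o) * out_h * (1 - out_h)

    # gradient-descent weight updates: w <- w - eta * dE/dw
    W2 -= eta * np.outer(delta_o, out_h)
    W1 -= eta * np.outer(delta_h, x)

print(E_total, out_o)  # total error of the last forward pass and the final prediction
```

With these illustrative starting values the total error falls from roughly 0.30 toward zero over the iterations, matching the behaviour described above; the exact numbers depend on the initial weights.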

 


Reprinted from: blog.csdn.net/weixin_43852823/article/details/127561941