Gradient Descent and Backpropagation Algorithms in Machine Learning

This article does not walk through the detailed mechanics of gradient descent and backpropagation; instead, it cites blogs that cover those details well.

It mainly gives a conceptual explanation of the two and compares them.

 

First, gradient descent. For the specific procedure, please refer to this blog: https://www.cnblogs.com/pinard/p/5970503.html

1. Gradient descent is a method for finding the minimum of a function; in machine learning it is used to find the minimum of the loss function.

This can be compared to finding the minimum of a one-variable function. Assuming a function f(x) of one variable is differentiable, to solve for its minimum we can first find the points where f'(x) = 0 (the extreme points), and then take the smallest value among those extreme points.
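As a quick worked instance of that one-variable procedure (the function is chosen purely for illustration): take f(x) = x^2 - 2x. Then f'(x) = 2x - 2 = 0 gives the single extreme point x = 1, and f(1) = -1 is the minimum.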

The idea of gradient descent is similar, except that instead of directly solving for where the derivatives of the multivariate function vanish, we pick an arbitrary starting point on the function and compute the gradient at that point. The magnitude of the gradient is the rate of change of the function, and its direction is the direction of greatest rate of change among all directions at that point. We then move the current point a small distance in the opposite direction, along the negative gradient, so that the value of the function decreases. Repeating this, when the gradient reaches 0 (analogous to the derivative being 0), we have found a local minimum (analogous to an extreme point).
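Here is a minimal sketch of that loop in Python; the quadratic example function, learning rate, and stopping threshold are illustrative assumptions, not taken from the referenced blog.

```python
import numpy as np

def gradient_descent(grad_f, x0, learning_rate=0.1, tol=1e-6, max_iters=1000):
    """Repeatedly step against the gradient until it is (nearly) zero."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:      # gradient ~ 0: a local minimum
            break
        x = x - learning_rate * g        # move a small distance along the negative gradient
    return x

# Example: f(x, y) = (x - 1)^2 + (y + 2)^2, whose gradient is (2(x - 1), 2(y + 2))
grad = lambda p: np.array([2 * (p[0] - 1), 2 * (p[1] + 2)])
print(gradient_descent(grad, x0=[5.0, 5.0]))   # converges toward (1, -2)
```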

Machine learning uses gradient descent because the ultimate goal is to compute the parameters Θ that minimize the loss function J(Θ), so training becomes a function-minimization problem.
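To tie this back to J(Θ), below is a hedged sketch of minimizing a mean-squared-error loss for linear regression with gradient descent; the toy data and hyperparameters are made up for illustration.

```python
import numpy as np

# Toy data: y = 3 * x + noise
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(-1, 1, 100)]   # bias column plus one feature
y = 3 * X[:, 1] + 0.1 * rng.normal(size=100)

theta = np.zeros(2)
learning_rate = 0.5
for _ in range(500):
    residual = X @ theta - y
    grad = X.T @ residual / len(y)     # gradient of J(theta), the mean squared residual / 2
    theta -= learning_rate * grad

print(theta)   # roughly [0, 3]
```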

 

Backpropagation algorithm: For the specific process, please refer to https://blog.csdn.net/mao_xiao_feng/article/details/53048213 ; for the detailed derivation, please refer to https://blog.csdn.net/u014313009/article/details/51039334 and http://briandolhansky.com/blog/2013/9/27/artificial-neural-networks-backpropagation-part-4

The backpropagation algorithm is the algorithm used in deep learning to minimize the loss function; the underlying optimization method is still gradient descent, with backpropagation supplying the gradients.

1. So why does it get the separate name "backpropagation"?

This is because computing gradients in a multi-layer deep network relies on the chain rule. In principle, the chain rule can be applied either by forward propagation (accumulating gradients from input to output) or by backward propagation (accumulating gradients from output to input); either way, the gradients obtained via the chain rule are what optimize the loss function.

The backpropagation (output-to-input) order is used because its time complexity is lower than that of the forward order when the network has many parameters and a single scalar loss, which matters greatly when training large-scale neural networks.
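As a concrete illustration of applying the chain rule from the output back toward the input, here is a small hand-written two-layer example in Python; the sigmoid activation, quadratic loss, weights, and inputs are assumptions chosen for the sketch, not taken from the linked posts.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass through two layers: a0 -> (z1, a1) -> (z2, a2)
a0 = np.array([0.5, -0.3])
W1 = np.array([[0.1, 0.4], [-0.2, 0.3]])
W2 = np.array([[0.7, -0.5]])
z1 = W1 @ a0; a1 = sigmoid(z1)
z2 = W2 @ a1; a2 = sigmoid(z2)
y = np.array([1.0])
C = 0.5 * np.sum((a2 - y) ** 2)            # quadratic loss

# Backward pass: propagate dC/dz from the output layer toward the input
delta2 = (a2 - y) * a2 * (1 - a2)          # error at the output layer
delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # error at the hidden layer, via the chain rule
dC_dW2 = np.outer(delta2, a1)              # gradient of the loss w.r.t. the output-layer weights
dC_dW1 = np.outer(delta1, a0)              # gradient of the loss w.r.t. the hidden-layer weights
print(dC_dW1, dC_dW2, sep="\n")
```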

 

2. Backpropagation is also known as the error backpropagation algorithm. What does "error" mean here?

In fact, I think the "error" here would be more accurately called an error rate. First, let's look at how the error (rate) is defined:

Define the error δ^l_j as the partial derivative of the loss function C with respect to the intermediate value z^l_j of the j-th neuron in the l-th layer, i.e. δ^l_j = ∂C/∂z^l_j. Note that z^l = Θ^l a^(l-1) here, the value before the activation function is applied.

This is really just a symbol introduced for convenience, so that the backpropagation computation can be written recursively. The intuitive understanding is that the error (rate) of the l-th layer measures how much a deviation of that layer's intermediate value z from its true value would affect the final loss C.

When we have this definition, we can succinctly represent the backpropagation algorithm.
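With this definition of δ, the backpropagation algorithm can be summarized by the usual textbook equations, written here in the same notation as above with σ the activation function and ⊙ the element-wise product; this is the standard formulation rather than something specific to the linked posts.

δ^L = ∇_a C ⊙ σ'(z^L)                      (error at the output layer L)
δ^l = ((Θ^(l+1))^T δ^(l+1)) ⊙ σ'(z^l)      (propagate the error one layer backward)
∂C/∂Θ^l_jk = δ^l_j · a^(l-1)_k             (gradient of the loss with respect to each weight)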

 
