Python deep learning 027: what is gradient, gradient disappearance, gradient explosion and how to solve it

1. The concept of gradient


In machine learning, the gradient of a multivariate function at a point captures both the rate of change of the function at that point and the direction in which it changes fastest.

For a parameterized function, the gradient tells us the direction in which the value of the function increases most rapidly at a particular point.

Gradients are especially important in deep learning because we typically use gradient descent algorithms to update the parameters of neural networks. At this point, the gradient is used to calculate the direction and magnitude that each parameter should move, given the loss function and the current parameter values, in order to minimize the loss function.
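The update rule described above can be sketched in a few lines. This is a minimal, hypothetical example (the loss function, learning rate, and step count are illustrative choices, not from the article): we minimize f(w) = (w − 3)², whose gradient is 2(w − 3), by repeatedly stepping opposite the gradient.

```python
def gradient_descent(w=0.0, lr=0.1, steps=50):
    """Minimize f(w) = (w - 3)^2 by plain gradient descent."""
    for _ in range(steps):
        grad = 2 * (w - 3)   # gradient of the loss at the current parameter value
        w = w - lr * grad    # move opposite the gradient to reduce the loss
    return w

print(gradient_descent())  # converges toward the minimizer w = 3
```

The gradient supplies the direction (its sign) and the magnitude of each step; the learning rate scales that magnitude.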

A gradient is a vector and thus has a direction and a magnitude. Each component of the gradient vector is the partial derivative of the function with respect to one parameter, holding the others fixed. If a partial derivative is positive, the function increases as that parameter increases, so gradient descent moves the parameter in the opposite (negative) direction; if it is negative, the parameter is moved in the positive direction. The magnitude of the gradient indicates how strongly the loss function changes when the parameters move along the gradient direction.
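To make the partial-derivative view concrete, here is a hedged sketch (the function f(x, y) = x² + 3y² and the step size h are illustrative assumptions) that approximates each gradient component with central finite differences; the exact gradient is (2x, 6y), so the result can be checked by hand.

```python
def f(x, y):
    return x**2 + 3 * y**2

def numerical_gradient(fn, x, y, h=1e-5):
    """Approximate the gradient of fn at (x, y) by central differences."""
    # Each partial derivative varies one coordinate while holding the other fixed.
    df_dx = (fn(x + h, y) - fn(x - h, y)) / (2 * h)
    df_dy = (fn(x, y + h) - fn(x, y - h)) / (2 * h)
    return df_dx, df_dy

gx, gy = numerical_gradient(f, 1.0, 2.0)
print(gx, gy)  # close to the exact gradient (2.0, 12.0)
```

In practice, deep learning frameworks compute these partial derivatives by automatic differentiation rather than finite differences, but the vector they produce has exactly this meaning.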

In summary, the gradient tells us the direction and rate of the steepest change of a function, allowing us to optimize function values and update parameters accordingly.


Origin blog.csdn.net/PoGeN1/article/details/131294196