AdaGrad: A Deep Learning Optimization Method Explained Simply

1

Overview

First, let’s take a look at the AdaGrad algorithm
[Figure: the AdaGrad algorithm, with the accumulated squared-gradient step highlighted in yellow]
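For reference, the update in the figure can be written out explicitly. A standard formulation of AdaGrad (symbol names chosen here, not taken from the figure: η is the global learning rate, g is the gradient at the current step, r the accumulated squared gradient, and δ a small constant for numerical stability) is:

$$
r \leftarrow r + g \odot g, \qquad
\theta \leftarrow \theta - \frac{\eta}{\delta + \sqrt{r}} \odot g
$$

where ⊙, the square root, and the division are all element-wise, so each parameter gets its own effective step size.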

We can see that the difference between AdaGrad and the ordinary SGD algorithm lies in the part highlighted in yellow: the accumulated squared gradient.

Simply put, after a global learning rate is set, on each update the global learning rate is divided, parameter by parameter, by the square root of the sum of the squared historical gradients, so every parameter ends up with its own effective learning rate.
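As a minimal sketch of that per-parameter scaling (NumPy; the function name adagrad_update and the stability constant eps are illustrative choices, not from the original post):

```python
import numpy as np

def adagrad_update(theta, grad, r, lr=0.01, eps=1e-8):
    """One AdaGrad step: accumulate squared gradients, then scale each parameter's step."""
    r = r + grad * grad                              # r := r + g ⊙ g (element-wise)
    theta = theta - lr / (np.sqrt(r) + eps) * grad   # per-parameter effective learning rate
    return theta, r

# Example: parameters with very different gradient magnitudes get different step sizes.
theta = np.array([5.0, 1.0])
r = np.zeros(2)
theta, r = adagrad_update(theta, np.array([0.5, 20.0]), r)
```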

2

Effect

So what is its role?

The effect is that larger steps are taken along the flatter directions of parameter space (because the direction is flat, the accumulated sum of squared gradients is small, so the effective learning rate shrinks less), while the steep directions are damped, which speeds up training.

Let's explain this with an example:
Assume the optimizer we are using is the most common mini-batch gradient descent. Its trajectory is shown in blue below:
[Figure: trajectory of mini-batch gradient descent (blue), oscillating along the steep b direction]

Suppose we have only two parameters, w and b. From the figure we can see that the b direction is relatively steep, which slows down optimization.

After switching to AdaGrad, we accumulate the squared gradient, r := r + g ⊙ g, as in the algorithm above.

As the figure shows, the gradient g along the b direction is larger than the gradient along the w direction.

In the subsequent updates, the square root of r appears in the denominator: the larger it is, the smaller the step, and the smaller it is, the larger the step. The trajectory therefore looks like the green curve below, which is clearly better than the blue one.
[Figure: AdaGrad trajectory (green) compared with the mini-batch gradient descent trajectory (blue)]

To recap: larger steps along the flat directions, damped steps along the steep ones, and therefore faster training.

This is the intuitive benefit of the AdaGrad optimization algorithm.
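To make the picture above concrete, here is a toy sketch (my own example, not from the original post): a quadratic loss that is flat along w and steep along b. Plain gradient descent zig-zags along b, while AdaGrad damps that direction and descends smoothly:

```python
import numpy as np

# Toy loss L(w, b) = 0.5*w**2 + 10*b**2: flat along w, steep along b.
def grad(theta):
    w, b = theta
    return np.array([w, 20.0 * b])

def run(use_adagrad, steps=8, lr=0.09, eps=1e-8):
    theta = np.array([5.0, 1.0])            # start at (w, b) = (5, 1)
    r = np.zeros(2)                          # accumulated squared gradients
    b_path = [theta[1]]
    for _ in range(steps):
        g = grad(theta)
        if use_adagrad:
            r = r + g * g
            theta = theta - lr / (np.sqrt(r) + eps) * g
        else:                                # plain (mini-batch style) gradient descent
            theta = theta - lr * g
        b_path.append(theta[1])
    return np.round(b_path, 3)

print("b with plain GD:", run(use_adagrad=False))   # sign flips: the blue zig-zag
print("b with AdaGrad :", run(use_adagrad=True))    # smooth decay: the green curve
```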

References: YBB's post on the AdaGrad deep learning optimization method;
DeepLearning.ai course slides

Recommended reading:

A summary of featured posts from the past six months
Andrew Ng's DeepLearning.ai Course Notes (1-2): Neural Networks and Deep Learning - Neural Network Basics
Andrew Ng's DeepLearning.ai Course Notes (1-3): Neural Networks and Deep Learning - Shallow Neural Networks
