PyTorch: the evolution of neural network training methods

1. Introduction

Today we would like to talk about how to accelerate the neural network training process.
The most common methods are the following:

1. Stochastic Gradient Descent (SGD)
2. Momentum
3. AdaGrad
4. RMSProp
5. Adam

The more complex the neural network and the larger the dataset, the more time we have to spend training it, for the simple reason that the amount of computation is so large. But to solve complex problems, complex structures and large amounts of data often cannot be avoided, so we need to find ways to make the neural network wise up and speed up.
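All five of the methods listed above are available out of the box in torch.optim. As a quick orientation, here is a minimal sketch of how each optimizer is constructed; the toy network and the hyperparameter values are illustrative placeholders, not recommendations.

```python
import torch

# Toy model and learning rate, used only to show the optimizer constructors.
net = torch.nn.Linear(10, 1)
LR = 0.01

opt_sgd      = torch.optim.SGD(net.parameters(), lr=LR)
opt_momentum = torch.optim.SGD(net.parameters(), lr=LR, momentum=0.8)
opt_adagrad  = torch.optim.Adagrad(net.parameters(), lr=LR)
opt_rmsprop  = torch.optim.RMSprop(net.parameters(), lr=LR, alpha=0.9)
opt_adam     = torch.optim.Adam(net.parameters(), lr=LR, betas=(0.9, 0.99))
```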

2. SGD (Stochastic Gradient Descent)

The most basic method is SGD. Given a set of data, an ordinary training method needs to repeatedly feed the whole dataset into the neural network (NN) for training, which consumes a great deal of computing resources.

Think about it another way: if we split the data into small batches and keep feeding those batches into the NN for computation, that is the proper way to use SGD. Each update uses only a batch of the data, which does not reflect the overall dataset, but it greatly accelerates the NN training process without losing much accuracy.
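As a concrete illustration, here is a minimal sketch of mini-batch training with a DataLoader. The toy data, network, and hyperparameters are placeholders chosen only to make the example runnable.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy regression data, split into small batches by the DataLoader.
x = torch.randn(1000, 10)
y = torch.randn(1000, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

net = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

for epoch in range(3):
    for batch_x, batch_y in loader:      # one small batch per update step
        loss = loss_fn(net(batch_x), batch_y)
        optimizer.zero_grad()            # clear old gradients
        loss.backward()                  # backprop on this batch only
        optimizer.step()                 # update the parameters
```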

But what if even SGD is still too slow for your training? No problem. It turns out that SGD is not the fastest training method anyway; of the methods covered here, it takes the longest to reach the goal. We have several other ways to speed up training.

3. Momentum

Most of the other methods work by tinkering with the parameter-update step. The traditional update adds to the parameter W the negative learning rate multiplied by the gradient correction (dx), i.e. W = W - learning_rate * dx. This method can make the learning process very tortuous: it looks like a drunk person walking home, staggering along and taking a lot of detours.
So we move this person from flat ground onto a slope. As long as he walks a little in the downhill direction, inertia carries him further downhill without him even noticing, and the detours become fewer and fewer. This is the Momentum parameter update.
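In code, the momentum idea boils down to keeping a velocity term that accumulates past update directions. Below is a minimal sketch of this textbook form of the rule; the names and hyperparameter values are illustrative, not the internals of any particular library. (In PyTorch itself you would simply pass momentum=0.8 or similar to torch.optim.SGD.)

```python
import torch

def momentum_step(W, dx, m, lr=0.01, beta=0.9):
    """One Momentum update: m is the accumulated velocity ('inertia')."""
    m = beta * m - lr * dx   # keep most of the old velocity, add the new (scaled) gradient
    W = W + m                # move the parameter along the accumulated velocity
    return W, m

# Toy usage with a made-up gradient.
W = torch.zeros(3)
m = torch.zeros(3)
dx = torch.tensor([0.5, -0.2, 0.1])
W, m = momentum_step(W, dx, m)
```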

4. AdaGrad

This approach works on the learning rate instead, so that every parameter gets its own unique learning rate when it is updated. Its role is similar to Momentum, but instead of arranging another downhill slope for the drunk person, it gives him a pair of worn-out shoes that make his feet hurt whenever he staggers. The shoes become a resistance against detours and force him to walk straight ahead. Its mathematical form is sketched below.
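A minimal sketch of the AdaGrad rule as it is commonly written (names and values are illustrative): each parameter divides its step by the square root of its own accumulated squared gradients, which is the "resistance" described above.

```python
import torch

def adagrad_step(W, dx, v, lr=0.01, eps=1e-8):
    """One AdaGrad update: v accumulates squared gradients per parameter."""
    v = v + dx ** 2                       # parameters that moved a lot get a large v ...
    W = W - lr * dx / (v.sqrt() + eps)    # ... and therefore a smaller effective step
    return W, v

# Toy usage with a made-up gradient.
W = torch.zeros(3)
v = torch.zeros(3)
dx = torch.tensor([0.5, -0.2, 0.1])
W, v = adagrad_step(W, dx, v)
```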

5. RMSProp

What if we combined the downhill slope with the worn-out shoes? Wouldn't that be even better? Yes, and that gives us the RMSProp update method.
By merging Momentum's principle of inertia with AdaGrad's resistance to the wrong direction, RMSProp gains some of the advantages of both methods. Careful readers will have noticed, however, that something still seems to be missing: we have not merged Momentum completely, and RMSProp still lacks part of the momentum term. So the Adam method was introduced to make up for this.
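A minimal sketch of the RMSProp rule as commonly formulated (illustrative names): it replaces AdaGrad's ever-growing sum with an exponentially decaying average of squared gradients, but it still has no velocity term like Momentum's.

```python
import torch

def rmsprop_step(W, dx, v, lr=0.01, alpha=0.9, eps=1e-8):
    """One RMSProp update: v is a moving average of squared gradients."""
    v = alpha * v + (1 - alpha) * dx ** 2   # decaying average instead of AdaGrad's full sum
    W = W - lr * dx / (v.sqrt() + eps)      # same per-parameter scaling ('resistance')
    return W, v
```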

6. Adam

Adam computes m, which has Momentum's downhill property, and v, which has AdaGrad's resistance property, and then takes both m and v into account when updating the parameters. Experiments show that in most cases Adam converges quickly and reaches the target faster and better. So when it comes to accelerating neural network training, a downhill slope and a pair of worn-out shoes do most of the work.
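A minimal sketch of the Adam rule as commonly formulated, with the bias-correction terms omitted for brevity; names and hyperparameter values are illustrative.

```python
import torch

def adam_step(W, dx, m, v, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One (simplified) Adam update: m is the Momentum-like part, v the RMSProp-like part."""
    m = b1 * m + (1 - b1) * dx         # first moment: the 'downhill' velocity
    v = b2 * v + (1 - b2) * dx ** 2    # second moment: the 'worn-out shoes' resistance
    W = W - lr * m / (v.sqrt() + eps)  # both are taken into account in the update
    return W, m, v
```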
