06 - Gradient descent

Introduction

In the approach mentioned earlier, the minimum of the objective function is obtained by setting its partial derivatives to zero. Not every objective function can be minimized that way, however, and that is where gradient descent comes in. The machine learning routine is: hand the machine a pile of data, tell it what counts as correct (the objective function), and let it optimize in that direction.
The gradient mentioned here can be understood as the vector of partial derivatives. Given an objective function and a starting point, moving along the negative gradient is the direction of steepest descent, so following it decreases the objective fastest and reaches a solution sooner.
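As a minimal sketch of this idea (the quadratic function and its gradient below are illustrative choices, not from the post): the gradient is the vector of partial derivatives, and stepping against it lowers the objective.

```python
import numpy as np

# Objective: f(w) = w1^2 + 2*w2^2, a simple convex bowl.
def f(w):
    return w[0] ** 2 + 2 * w[1] ** 2

# Analytic gradient: the vector of partial derivatives (df/dw1, df/dw2).
def grad(w):
    return np.array([2 * w[0], 4 * w[1]])

w = np.array([3.0, 2.0])
g = grad(w)
# The negative gradient -g is the direction of steepest descent,
# so a small step along it decreases f.
print(g)                # [6. 8.]
print(f(w))             # 17.0
print(f(w - 0.1 * g))   # 8.64, smaller than f(w)
```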

Gradient descent workflow

1. Find the most suitable direction (the negative gradient).
2. Take a small step; stepping too far may overshoot the minimum.
3. Update the parameters according to the chosen direction and step size.
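The three steps above can be sketched as a loop (the one-dimensional test function here is an assumption for illustration):

```python
import numpy as np

def gradient_descent(grad, w0, lr=0.01, steps=1000):
    """Repeat: (1) evaluate the descent direction (negative gradient),
    (2) take a small step of size lr, (3) update the parameters."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= lr * grad(w)   # step against the gradient
    return w

# Minimize f(w) = (w - 3)^2; its gradient is 2*(w - 3), minimum at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=[0.0], lr=0.01, steps=1000)
print(w_min)  # close to [3.]
```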
Learning rate (step size): it has a huge impact on the result and is generally kept small. A common choice is to start at 0.01; if training does not converge, shrink it further, for example to 0.001 (a rule of thumb).
How to choose it: start small, then make it smaller if needed.
Batch size: 32, 64, or 128 all work; in practice it is a trade-off between memory and efficiency (64 is a common default).
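One common way to form mini-batches of a fixed size is to shuffle the data once per epoch and slice it into chunks; a sketch with toy data (the array shapes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))   # 1000 samples, 5 features (toy data)
batch_size = 64                  # the common default noted above

# One epoch of mini-batches: shuffle indices, then slice fixed-size chunks.
perm = rng.permutation(len(X))
batches = [X[perm[i:i + batch_size]] for i in range(0, len(X), batch_size)]
print(len(batches))        # 16 batches (1000 = 15 * 64 + 40)
print(batches[-1].shape)   # (40, 5): the last batch is smaller
```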
On the learning rate again: a common practice now is to use a larger learning rate at the beginning, so progress is fast, and then gradually lower it; this helps ensure convergence at the end.
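That decaying schedule can be written as a simple function; the step-decay form below (halving every 100 steps) is an illustrative choice, not something the post specifies:

```python
# Step decay: start with a larger rate for fast early progress, then
# shrink it so the final iterates settle near the minimum.
def decayed_lr(step, lr0=0.01, drop=0.5, every=100):
    """Learning rate after `step` updates: lr0 halved every `every` steps."""
    return lr0 * (drop ** (step // every))

print(decayed_lr(0))    # 0.01
print(decayed_lr(250))  # 0.0025 (halved twice)
```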


Origin blog.csdn.net/Escid/article/details/90710283