In stochastic gradient descent (SGD), why is the negative gradient direction the direction in which the function decreases fastest?

To prove that the negative gradient direction is the direction in which the function decreases fastest, it suffices to prove that the positive gradient direction is the direction in which the function increases fastest.

Proof:

Suppose we have a vector x and a function f(x), and we want to make f(x) as small as possible.

Take an arbitrary direction l; note that l has the same dimension as x.

Move along the direction l (we do not yet know whether the function rises or falls along l); this gives the value f(x+l).

Performing a first-order Taylor expansion of f(x+l) gives the following formula:

f(x+l) = f(x) + ∇f(x)·l + o(‖l‖)

Then f(x+l) − f(x) is the change in the value of the function along the direction l.

That is to say, if f(x+l) − f(x) > 0, the function rises along l; if f(x+l) − f(x) < 0, it falls along l.
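As a quick numerical sanity check, here is a minimal sketch (assuming NumPy; the function f(x) = x₀² + 3x₁² is just an illustrative example, not from the original post) comparing the actual change f(x+l) − f(x) with the first-order prediction ∇f(x)·l:

```python
import numpy as np

def f(x):
    # Illustrative smooth function: f(x) = x0^2 + 3*x1^2
    return x[0]**2 + 3 * x[1]**2

def grad_f(x):
    # Its analytic gradient: (2*x0, 6*x1)
    return np.array([2 * x[0], 6 * x[1]])

x = np.array([1.0, -2.0])
l = np.array([1e-3, 2e-3])       # a small step in an arbitrary direction

actual = f(x + l) - f(x)         # true change of f along l
predicted = grad_f(x) @ l        # first-order Taylor prediction

print(actual, predicted)         # the two agree up to a higher-order term
```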

Back to our ultimate question: why is the positive gradient direction the direction in which the function rises fastest?

Looking at f(x+l) − f(x), the right-hand side of the equation is:

∇f(x)·l + o(‖l‖)

Consider the case where the change in the independent variable is particularly small. Then the higher-order term o(‖l‖) is negligible, and the remaining formula is:

f(x+l) − f(x) ≈ ∇f(x)·l

So our ultimate question (why is the positive gradient direction the direction in which the function rises fastest?) becomes: which direction l maximizes the above formula?
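A one-line identity makes the answer explicit. Writing θ for the angle between ∇f(x) and l, the dot product satisfies the standard formula:

```latex
\nabla f(x) \cdot l = \|\nabla f(x)\| \, \|l\| \cos\theta
```

For a fixed step length ‖l‖, this is largest when cos θ = 1 (l parallel to the gradient) and smallest when cos θ = −1 (l antiparallel to it).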

Since the above formula is a dot product, it is largest when the two vectors point in the same direction; that is, the function rises fastest when l points along the gradient. Conversely, it is smallest when the two vectors point in opposite directions; that is, the function falls fastest when l points along the negative gradient, which is exactly why SGD steps in the negative gradient direction.
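To see this empirically, here is a small sketch (reusing NumPy and the illustrative f from the check above, both my own assumptions) that samples random unit directions, confirms that ∇f(x)·l is maximized near +∇f/‖∇f‖ and minimized near −∇f/‖∇f‖, and then takes one small step along the negative gradient, as SGD does:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return x[0]**2 + 3 * x[1]**2      # same illustrative function as above

def grad_f(x):
    return np.array([2 * x[0], 6 * x[1]])

x = np.array([1.0, -2.0])
g = grad_f(x)

# Sample many random unit directions l and measure grad_f(x) . l,
# the first-order rate of change of f along each direction.
dirs = rng.normal(size=(10_000, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
rates = dirs @ g

print(dirs[np.argmax(rates)], g / np.linalg.norm(g))    # ~ steepest ascent
print(dirs[np.argmin(rates)], -g / np.linalg.norm(g))   # ~ steepest descent

# One small step along the negative gradient decreases f,
# which is exactly the update SGD performs.
eta = 0.01
print(f(x), f(x - eta * g))           # the second value is smaller
```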
