Gradient descent algorithm and its theoretical foundation

The gradient descent method, also known as the steepest descent method, is the most commonly used method for unconstrained optimization problems; it is typically applied when a loss function is to be minimized. Gradient descent is an iterative algorithm: select an appropriate initial value x^(0) and keep iterating, updating the value of x so as to decrease the objective function, until convergence. Since the negative gradient direction is the direction in which the function value decreases fastest, at each iteration x is updated along the negative gradient direction, thereby decreasing the function value. Before discussing the gradient descent method further, we first have to introduce the gradient and the directional derivative.
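To make the iteration concrete, here is a minimal Python sketch of the update rule x ← x − η·∇f(x); the function name gradient_descent, the quadratic example, and the learning rate lr are illustrative choices, not part of the original text.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=1000):
    """Iteratively step along the negative gradient until it (nearly) vanishes."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)                   # gradient at the current iterate
        if np.linalg.norm(g) < tol:   # convergence test
            break
        x = x - lr * g                # step in the steepest-descent direction
    return x

# Example: f(x, y) = (x - 1)^2 + 2*(y + 3)^2 has its minimum at (1, -3).
grad_f = lambda v: np.array([2.0 * (v[0] - 1.0), 4.0 * (v[1] + 3.0)])
print(gradient_descent(grad_f, x0=[0.0, 0.0]))  # -> approximately [ 1. -3.]
```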

1. The directional derivative

Suppose the function z = f(x, y) is defined in some neighborhood U(P) of the point P(x, y). Draw a ray l from the point P, and let φ be the angle between the positive x-axis and the ray l. Let P′(x + Δx, y + Δy) be another point on l that also lies in the neighborhood U(P).

Write ρ = √((Δx)² + (Δy)²) for the distance between P and P′, and consider the ratio of the increment of the function to this distance,

$$\frac{f(x+\Delta x,\, y+\Delta y) - f(x, y)}{\rho},$$

as P′ approaches P along l. If this limit exists, it is called the directional derivative of the function f(x, y) at the point P in the direction l, denoted ∂f/∂l, i.e.,

$$\frac{\partial f}{\partial l} = \lim_{\rho \to 0^{+}} \frac{f(x+\Delta x,\, y+\Delta y) - f(x, y)}{\rho}.$$


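To see the limit at work numerically, here is a small sketch; the function f(x, y) = x² + xy, the point P = (1, 2), and the angle φ = π/4 are hypothetical choices for illustration. The difference quotient stabilizes as ρ shrinks.

```python
import numpy as np

f = lambda x, y: x**2 + x * y   # sample function, chosen for illustration
x0, y0 = 1.0, 2.0               # the point P
phi = np.pi / 4                 # direction l at 45 degrees to the x-axis

for rho in [1e-1, 1e-2, 1e-3, 1e-4]:
    dx, dy = rho * np.cos(phi), rho * np.sin(phi)
    quotient = (f(x0 + dx, y0 + dy) - f(x0, y0)) / rho
    print(rho, quotient)
# The quotients tend to (2*x0 + y0)*cos(phi) + x0*sin(phi) ≈ 3.5355
```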
2. The relationship between the directional derivative and the partial derivatives

Theorem: If the function z = f(x, y) is differentiable at the point P(x, y), then the directional derivative of the function at that point exists in every direction l, and

$$\frac{\partial f}{\partial l} = \frac{\partial f}{\partial x}\cos\varphi + \frac{\partial f}{\partial y}\sin\varphi,$$

where φ is the angle from the positive x-axis to the direction l.

Brief proof: Since f is differentiable at P, the increment of f can be written as

$$f(x+\Delta x,\, y+\Delta y) - f(x, y) = \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + o(\rho).$$

Along l we have Δx = ρ cos φ and Δy = ρ sin φ; dividing both sides by ρ and letting ρ → 0⁺ yields the formula above.

The theorem extends to functions of more variables. For a function of three variables u = f(x, y, z), for example, the directional derivative at a point P(x, y, z) in space in a direction l with direction angles α, β, γ (so that the unit vector along l is (cos α, cos β, cos γ)) is defined as

$$\frac{\partial f}{\partial l} = \lim_{\rho \to 0^{+}} \frac{f(x+\Delta x,\, y+\Delta y,\, z+\Delta z) - f(x, y, z)}{\rho},$$

where Δx = ρ cos α, Δy = ρ cos β, Δz = ρ cos γ, and ρ = √((Δx)² + (Δy)² + (Δz)²). Therefore, if f is differentiable at P,

$$\frac{\partial f}{\partial l} = \frac{\partial f}{\partial x}\cos\alpha + \frac{\partial f}{\partial y}\cos\beta + \frac{\partial f}{\partial z}\cos\gamma.$$
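As a quick sanity check of the three-variable formula, the following sketch compares it with a central-difference approximation along l; the function u = x² + yz and the direction cosines (1/3, 2/3, 2/3) are arbitrary choices, not from the original.

```python
import numpy as np

f = lambda p: p[0]**2 + p[1] * p[2]   # u = x^2 + y*z, an illustrative choice
P = np.array([1.0, 2.0, -1.0])
l = np.array([1.0, 2.0, 2.0]) / 3.0   # unit vector: direction cosines (1/3, 2/3, 2/3)

# Analytic partials: u_x = 2x, u_y = z, u_z = y
grad = np.array([2 * P[0], P[2], P[1]])
formula = grad @ l                    # u_x*cos(alpha) + u_y*cos(beta) + u_z*cos(gamma)

h = 1e-6
numeric = (f(P + h * l) - f(P - h * l)) / (2 * h)   # central difference along l
print(formula, numeric)               # both ≈ 2*(1/3) - 1*(2/3) + 2*(2/3) = 4/3
```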

3. The gradient

Suppose the function z = f(x, y) has continuous first-order partial derivatives in a plane region D. Then for any point P(x, y) in D and any direction l,

$$\frac{\partial f}{\partial l} = \frac{\partial f}{\partial x}\cos\varphi + \frac{\partial f}{\partial y}\sin\varphi = \left(\frac{\partial f}{\partial x},\, \frac{\partial f}{\partial y}\right)\cdot(\cos\varphi,\, \sin\varphi).$$

The vector

$$\left(\frac{\partial f}{\partial x},\, \frac{\partial f}{\partial y}\right)$$

is called the gradient of the function f(x, y) at the point P, denoted grad f(x, y).
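For a concrete gradient computation, here is a short SymPy sketch; the example function f = x²y + sin y is an assumption for illustration, not from the original.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)                 # an arbitrary example function
grad_f = [sp.diff(f, x), sp.diff(f, y)]  # (∂f/∂x, ∂f/∂y)
print(grad_f)                            # [2*x*y, x**2 + cos(y)]
```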

4. The gradient and the directional derivative

Let e_l be the unit vector in the direction of l. Then

$$\frac{\partial f}{\partial l} = \operatorname{grad} f \cdot e_l = |\operatorname{grad} f|\,\cos\theta,$$

where θ is the angle between grad f and e_l.
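This identity can be verified numerically; in the sketch below the gradient (2, 3) and the angle θ = 30° are arbitrary example values.

```python
import numpy as np

# Suppose grad f = (2, 3) at some point P.
grad = np.array([2.0, 3.0])
theta = np.deg2rad(30)                      # angle between e_l and the gradient

# Rotate the gradient's unit vector by theta to get the unit vector e_l.
g_unit = grad / np.linalg.norm(grad)
c, s = np.cos(theta), np.sin(theta)
e_l = np.array([c * g_unit[0] - s * g_unit[1], s * g_unit[0] + c * g_unit[1]])

print(grad @ e_l)                           # directional derivative via dot product
print(np.linalg.norm(grad) * np.cos(theta)) # |grad f| * cos(theta): same value
```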

5. Why is the gradient direction the direction in which the function value increases fastest? (And why is the negative gradient direction the direction in which the function value decreases fastest?)

When does the directional derivative attain its maximum? When cos θ = 1, that is, when the unit vector e_l points in the same direction as the gradient vector, the directional derivative is maximal and equals |grad f|; in other words, a unit step in this direction changes the function value fastest. Likewise, when e_l points in the direction opposite to the gradient, cos θ = −1 and the function value decreases fastest.

 

Conclusion: The gradient of a function at a point is a vector whose direction coincides with the direction in which the directional derivative is maximal, and whose modulus equals that maximum value of the directional derivative. Along the gradient direction the directional derivative is positive and the function increases fastest; this is exactly why gradient descent moves in the opposite, negative gradient direction to decrease the function value fastest.
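The conclusion can also be checked numerically by sweeping the unit direction e_l around the full circle and watching where grad f · e_l peaks; the example gradient below is arbitrary.

```python
import numpy as np

grad = np.array([2.0, 3.0])                  # example gradient at some point
angles = np.linspace(0, 2 * np.pi, 3600)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit vectors e_l
dd = dirs @ grad                             # directional derivatives grad · e_l

best = angles[np.argmax(dd)]
grad_angle = np.arctan2(grad[1], grad[0])
print(best, grad_angle)                 # maximizer matches grad's direction (up to grid step)
print(dd.max(), np.linalg.norm(grad))   # the maximum approaches |grad f|
```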


Origin: www.cnblogs.com/lovewhale1997/p/11594471.html