Gradient descent

Gradient descent is an iterative method that can be used to solve least squares problems, both linear and nonlinear. It is one of the most commonly used methods for solving for the model parameters of machine learning algorithms, that is, for unconstrained optimization problems; another commonly used method is the least squares method. When minimizing a loss function, gradient descent can be used to solve the problem iteratively, step by step, yielding the minimized loss function and the corresponding model parameter values. Conversely, if we need to find the maximum of the loss function, we iterate with gradient ascent instead. In machine learning, two variants have been developed from the basic gradient descent method, namely stochastic gradient descent and batch gradient descent.
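As a rough MATLAB sketch of the difference between these two variants (the data, step sizes, and iteration counts below are illustrative assumptions, not from the original): batch gradient descent uses the full gradient of the loss at every step, while stochastic gradient descent uses the gradient of a single randomly chosen sample.

rng(0);                          % fix the random seed (illustrative)
n = 100;                         % number of samples (assumed)
X = [ones(n,1) rand(n,1)];       % design matrix with an intercept column
w_true = [1; 2];                 % "true" parameters (assumed)
y = X*w_true + 0.1*randn(n,1);   % noisy observations

w = zeros(2,1); eta = 0.1;       % batch gradient descent
for k = 1:500
    g = (2/n)*X'*(X*w - y);      % full gradient of the mean squared error
    w = w - eta*g;
end

ws = zeros(2,1); eta = 0.05;     % stochastic gradient descent
for k = 1:5000
    i = randi(n);                % one random sample per step
    gi = 2*X(i,:)'*(X(i,:)*ws - y(i));  % gradient on that single sample
    ws = ws - eta*gi;
end
fprintf('batch: [%f; %f]  sgd: [%f; %f]\n', w, ws);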

Introduction

Gradient: for a differentiable scalar field $f$, the vector field whose components are the partial derivatives of $f$, $\nabla f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$, is called the gradient of $f$. [1] Gradient descent is an optimization algorithm commonly used in machine learning and artificial intelligence to approach the minimum-deviation model iteratively.
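As a concrete illustration of the definition: for the scalar field $f(x, y) = x^2 + y^2$, the gradient is $\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (2x, 2y)$; at the point $(1, 2)$ this gives $\nabla f(1, 2) = (2, 4)$, the direction of steepest ascent there.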

 

Solving process

 

As the name implies, the gradient descent method computes a minimum by moving in the direction of the descending gradient (one can likewise find a maximum by moving in the direction of the ascending gradient, i.e. gradient ascent). Its iteration formula is

$a_{k+1} = a_k + \rho_k \bar{s}^{(k)}$,

where $\bar{s}^{(k)}$ denotes the negative gradient direction at $a_k$, i.e. $\bar{s}^{(k)} = -\nabla f(a_k)$, and $\rho_k$ denotes the search step size along that direction. The gradient direction can be obtained by differentiating the function. Determining the step size is more troublesome: if it is too large, the iteration may diverge; if it is too small, convergence is too slow. The step size is generally determined by a line search, that is, $f(a_{k+1})$ is regarded as a function of $\rho_k$, and the value of $\rho_k$ that minimizes $f(a_{k+1})$ is taken.

In general, the gradient vector is $0$ at an extreme point, where the magnitude of the gradient is also $0$. When the gradient descent algorithm is used for optimization, the termination condition for the iteration is therefore that the magnitude of the gradient vector is close to $0$; a very small constant can be set as the threshold.
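A minimal MATLAB sketch of this loop, using a backtracking line search to pick $\rho_k$ and the gradient-magnitude stopping rule described above (the objective and all values here are illustrative assumptions, not from the original):

f    = @(a) a(1)^2 + 3*a(2)^2;   % illustrative objective (assumed)
grad = @(a) [2*a(1); 6*a(2)];    % its gradient
a = [2; 1];                      % starting point (assumed)
tol = 1e-6;                      % threshold on the gradient magnitude
while norm(grad(a)) > tol
    g = grad(a);
    rho = 1;                     % backtracking: halve rho until f decreases enough
    while f(a - rho*g) > f(a) - 0.5*rho*(g'*g)
        rho = rho/2;
    end
    a = a - rho*g;               % step in the negative gradient direction
end
disp(a)                          % approximately [0; 0]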

Example

Take a very simple example: find the minimum of the function

$f(x) = x^2$.

The steps to solve the problem using the gradient descent method are as follows:

1. Find the gradient: $\nabla f(x) = f'(x) = 2x$.

2. Move in the direction opposite to the gradient: $x_{k+1} = x_k - \gamma \cdot 2x_k$, where $\gamma$ is the step size. If the step size is small enough, the function value is guaranteed to decrease with each iteration, but convergence may be too slow; if the step size is too large, the function value is not guaranteed to decrease with each iteration, nor is convergence guaranteed.

3. Repeat step 2 in a loop until the difference between $f(x_{k+1})$ and $f(x_k)$ over two successive iterations is small enough, say 0.00000001; that is, until $f(x)$ basically stops changing, which means a local minimum has been reached at this point.

4. Output the final $x_{k+1}$; this is the value of $x$ that makes the function $f(x)$ smallest.

The MATLAB code is as follows.

%% Steepest descent method illustration
% The step size is 0.1; f_change is the change in the function value between
% iterations, and only one exit condition is set.
step=0.1; x=2; k=0;              % step size, initial value, iteration counter
f_change=x^2;                    % initialize the change in the function value
f_current=x^2;                   % current function value
ezplot(@(x,f)f-x.^2)             % draw the curve f = x^2
axis([-2,2,-0.2,3])              % fix the coordinate axes
hold on
while f_change>0.000000001       % exit when the change between two successive function values drops below this threshold
    x=x-step*2*x;                % -2*x is the negative gradient direction, step is the step size: steepest descent!
    f_change=f_current-x^2;      % change between the two function values
    f_current=x^2;               % recompute the current function value
    plot(x,f_current,'ro','markersize',7)  % mark the current position
    drawnow; pause(0.2);
    k=k+1;
end
hold off
fprintf('After %d iterations, the minimum of the function is %e, at x = %e\n',k,x^2,x)
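Note that with step size 0.1, each update computes x = x - 0.1*2*x = 0.8*x, so the iterates shrink geometrically (2, 1.6, 1.28, 1.024, ...) and f(x) = x^2 decreases by a factor of 0.64 per step until the stopping threshold is met.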

Defects

  • Convergence slows down near the minimum.
  • Some problems may arise when performing the straight-line (exact line) search.
  • The descent path may "zigzag" (see the sketch below).
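The zigzag behavior can be seen on an ill-conditioned quadratic, where an exact line search makes successive search directions orthogonal. A small MATLAB sketch (the test function and starting point are assumed for illustration, not from the original):

Q = diag([1 25]);                 % f(x) = x'*Q*x, badly conditioned
grad = @(x) 2*Q*x;
x = [5; 1]; path = x';
for k = 1:15
    g = grad(x);
    rho = (g'*g)/(2*(g'*Q*g));    % exact minimizer of f(x - rho*g) along -g
    x = x - rho*g;
    path = [path; x'];            % record the iterate
end
plot(path(:,1), path(:,2), '-o')  % successive steps alternate direction,
title('Zigzag path of steepest descent')  % tracing a zigzag toward the origin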
