Gradient descent is an iterative method that can be used to solve least squares problems, both linear and nonlinear. It is one of the most commonly used methods for solving the model parameters of machine learning algorithms, that is, for unconstrained optimization problems; another commonly used method is the analytic least squares method. To find the minimum of a loss function, gradient descent iterates step by step toward the parameter values that minimize it; conversely, to find the maximum of a function, gradient ascent is used. In machine learning, two variants of the basic gradient descent method are widely used: stochastic gradient descent and batch gradient descent.
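As a rough illustration of the two variants, the Python sketch below contrasts them on a tiny least-squares fit of y = w*x. The data, learning rate, and function names are illustrative, not from the original: batch gradient descent averages the gradient over all samples per update, while stochastic gradient descent updates from one randomly chosen sample at a time.

```python
import random

# Hypothetical toy data for the least-squares problem: fit y = w * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]

def batch_gd(w=0.0, lr=0.01, epochs=100):
    # Batch gradient descent: each update uses the gradient over ALL samples.
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

def stochastic_gd(w=0.0, lr=0.01, epochs=100, seed=0):
    # Stochastic gradient descent: each update uses ONE randomly chosen sample,
    # so the path is noisier but each step is much cheaper on large datasets.
    rng = random.Random(seed)
    for _ in range(epochs):
        for _ in range(len(xs)):
            i = rng.randrange(len(xs))
            grad = 2 * (w * xs[i] - ys[i]) * xs[i]
            w -= lr * grad
    return w

print(batch_gd())       # converges close to the least-squares solution w = 2.03
print(stochastic_gd())  # lands near the same value, with some sampling noise
```

Both functions approach the exact least-squares solution w = sum(x*y)/sum(x*x) = 2.03; the stochastic version fluctuates around it rather than settling exactly.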
Introduction
Gradient: for a differentiable scalar field f, the vector field whose components are the partial derivatives of f, written ∇f = (∂f/∂x1, …, ∂f/∂xn), is called the gradient (or slope) of f. [1] Gradient descent is an optimization algorithm commonly used in machine learning and artificial intelligence to iteratively approach the parameters that minimize a model's error.
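The component-wise definition above can be checked numerically. This Python sketch (the helper name `grad` and the sample function are illustrative) approximates each partial derivative with a central difference:

```python
def grad(f, p, h=1e-6):
    # Numerical gradient of scalar field f at point p via central differences:
    # component i approximates the partial derivative of f with respect to p[i].
    g = []
    for i in range(len(p)):
        plus = list(p); plus[i] += h
        minus = list(p); minus[i] -= h
        g.append((f(plus) - f(minus)) / (2 * h))
    return g

# For f(x, y) = x^2 + 3y, the gradient is (2x, 3), so at (2, 1) it is (4, 3).
f = lambda p: p[0] ** 2 + 3 * p[1]
print(grad(f, [2.0, 1.0]))  # close to [4.0, 3.0]
```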
Solving process
Example
Take a very simple function, f(x) = x^2, and find its minimum. Starting from an initial point x(0), gradient descent moves each iterate against the gradient: x(k+1) = x(k) - a*f'(x(k)) = x(k) - 2*a*x(k), where a is the step size. The MATLAB code below carries out these steps:
%% Steepest descent method illustration
% Step size 0.1; f_change is the change in f between iterations; only one exit condition is set.
syms x; f = x^2;
step = 0.1; x = 2; k = 0;        % step size, initial value, iteration counter
f_change = x^2;                  % initialize the change in f
f_current = x^2;                 % current function value
ezplot(f)                        % draw the graph of f(x) = x^2
axis([-2, 2, -0.2, 3])           % fix the coordinate axes
hold on
while f_change > 1e-9            % exit when successive function values differ by less than 1e-9
    x = x - step*2*x;            % -2*x is the negative gradient direction; step is the step size
    f_change = f_current - x^2;  % change between the two most recent function values
    f_current = x^2;             % recompute the current function value
    plot(x, f_current, 'ro', 'markersize', 7)  % mark the current position
    drawnow; pause(0.2);
    k = k + 1;
end
hold off
fprintf('After %d iterations, the minimum value found is %e at x = %e\n', k, x^2, x)
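For readers without MATLAB, the same iteration (without the plotting) can be sketched in Python; the function name, default arguments, and tolerance mirror the MATLAB script above but are otherwise illustrative:

```python
def minimize_x_squared(x=2.0, step=0.1, tol=1e-9):
    # Gradient descent on f(x) = x^2; since f'(x) = 2x, the update is x <- x - step*2*x.
    k = 0
    f_current = x ** 2
    f_change = f_current
    while f_change > tol:
        x = x - step * 2 * x            # move against the gradient
        f_change = f_current - x ** 2   # change between successive function values
        f_current = x ** 2
        k += 1
    return x, f_current, k

x, fmin, k = minimize_x_squared()
print(k, fmin)  # fmin is very close to the true minimum value 0
```

Each step multiplies x by (1 - 2*step) = 0.8, so the function value shrinks geometrically until the change falls below the tolerance.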
Drawbacks
- Convergence slows down near the minimum.
- Line searches may run into problems.
- The descent path may "zigzag".
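The "zigzag" behavior is easy to reproduce on an ill-conditioned quadratic. In this Python sketch (the function f(x, y) = x^2 + 10*y^2 and the learning rate are chosen purely for illustration), the step size is close to the stability limit along the steep y-direction, so every step overshoots the valley floor and the y-coordinate flips sign while slowly shrinking:

```python
def descend(lr=0.09, steps=6, x=1.0, y=1.0):
    # Gradient descent on the ill-conditioned quadratic f(x, y) = x^2 + 10*y^2.
    # The gradient is (2x, 20y). With lr = 0.09 the y-update factor is
    # 1 - 20*lr = -0.8, so y overshoots the axis each step and the path zigzags.
    path = [(x, y)]
    for _ in range(steps):
        x, y = x - lr * 2 * x, y - lr * 20 * y
        path.append((x, y))
    return path

for px, py in descend():
    print(f"{px:+.3f} {py:+.3f}")  # the sign of y alternates on every step
```

The x-coordinate, by contrast, decays smoothly (factor 0.82 per step); the mismatch between the two directions is exactly what produces the slow, oscillating trajectory.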