Introduction: For unconstrained optimization problems, the steepest descent method is one of the most frequently used algorithms. In machine learning, solving for model parameters is itself an unconstrained optimization problem, and gradient descent is the most commonly used method for it (another common one is the least squares method). When minimizing a loss function, gradient descent iterates step by step toward the minimum of the loss and the corresponding parameter values; conversely, to maximize a function we iterate with gradient ascent instead. On top of basic gradient descent, machine learning has developed two variants: batch gradient descent and stochastic gradient descent. This post introduces the steepest descent method and, at the end, implements it in MATLAB. (PS: reading on a computer is recommended so the formulas display properly.)
1. Principle
This post considers the following optimization model:

min f(x), x ∈ R^n,
where f: R^n → R is a continuous real-valued function, usually assumed to have continuous second-order partial derivatives. Since max f(x) is equivalent to min −f(x), the discussion below covers only the minimization problem.
The steepest descent method only considers the direction of fastest decrease at the current point, not a globally fastest decrease. In solving an unconstrained nonlinear problem, the key is therefore to obtain, at each iteration, a search direction d_k and a step size λ_k. The directional derivative of f at a point x along a direction d measures the rate of change of f at x along d. When f is continuously differentiable, a negative directional derivative means the function value decreases along that direction, and the more negative it is, the faster the decrease. One idea for choosing the search direction is thus to take the direction that minimizes the directional derivative at the current point.
1.1 Determining the search direction
Let d be a unit vector. Starting from the point x_k, moving along d with step size λ > 0 reaches the point x_k + λd, and the Taylor expansion gives:

f(x_k + λd) = f(x_k) + λ∇f(x_k)ᵀd + o(λ).
From this we obtain the rate of change of f at x_k along d:

lim_{λ→0⁺} [f(x_k + λd) − f(x_k)] / λ = ∇f(x_k)ᵀd.
It is easy to see that descending fastest means making this rate of change as negative as possible, i.e. minimizing ∇f(x_k)ᵀd over unit vectors d. By the Cauchy–Schwarz inequality,

∇f(x_k)ᵀd ≥ −‖∇f(x_k)‖ ‖d‖ = −‖∇f(x_k)‖,

and the minimum value −‖∇f(x_k)‖ is attained exactly when

d = −∇f(x_k) / ‖∇f(x_k)‖.
This determines the steepest descent direction d_k = −∇f(x_k), which is the origin of the method's name.
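As a concrete check, the short sketch below (in Python/NumPy rather than this post's MATLAB) samples unit directions and confirms that the normalized negative gradient attains the smallest directional derivative. The gradient formula is that of the example objective used later in this post.

```python
import numpy as np

# Directional-derivative check: among unit vectors d, the rate of change
# grad(f)^T d is smallest along the normalized negative gradient.
# Gradient of the example objective f(x) = x1^2 + 2*x2^2 - 2*x1*x2 - 2*x2.
def grad(x):
    return np.array([2*x[0] - 2*x[1], 4*x[1] - 2*x[0] - 2])

g = grad(np.array([0.0, 0.0]))                 # gradient at the start point (0, 0)

# Sample unit directions one degree apart and compute grad^T d for each.
angles = np.linspace(0, 2*np.pi, 360, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
rates = dirs @ g

d_star = -g / np.linalg.norm(g)                # steepest descent direction
best_rate = g @ d_star                         # equals -||grad f||
```

No sampled direction beats `d_star`: `rates.min()` coincides with `best_rate` up to sampling resolution.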
1.2 Determining the step size
The steepest descent method usually chooses the step size by exact line search, namely:

λ_k = argmin_{λ≥0} f(x_k + λd_k),

generally obtained by setting the derivative of φ(λ) = f(x_k + λd_k) to zero.
Setting φ'(λ_k) = 0 gives ∇f(x_k + λ_k d_k)ᵀd_k = 0, which shows that ∇f(x_{k+1}) and d_k are orthogonal. In the code below I did not use exact search; instead I used a one-dimensional golden-section (0.618) search to find an approximate minimum point. Implementing the one-dimensional search yourself gives a better understanding of the process, and its results agree almost exactly with those of exact search.
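For illustration, here is a minimal golden-section search sketch in Python (this post's own implementation, in MATLAB, appears at the end). It assumes the interval [a, b] is already known to contain the minimizer, as produced by a bracketing step.

```python
# A minimal golden-section (0.618) line search sketch, assuming [a, b]
# already brackets the minimizer of the one-dimensional function phi.
def golden_section(phi, a, b, tol=1e-6):
    r = 0.618
    x1, x2 = a + (1 - r) * (b - a), a + r * (b - a)
    f1, f2 = phi(x1), phi(x2)
    while b - a > tol:
        if f1 > f2:                      # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = phi(x2)
        else:                            # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + (1 - r) * (b - a)
            f1 = phi(x1)
    return 0.5 * (a + b)

# For the example in section 3, phi(lam) = f(x0 + lam*d0) = 8*lam^2 - 4*lam,
# whose exact minimizer is lam = 1/4.
lam = golden_section(lambda t: 8*t*t - 4*t, 0.0, 1.0)
```

Each iteration reuses one interior point, so only one new function evaluation is needed per step.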
2. Algorithm steps
The problem to solve:

min f(x), x ∈ R^n.
The detailed steps of the steepest descent method:
1. Choose an initial point x_0 and a precision ε > 0; set k = 0.
2. Compute ∇f(x_k); if ‖∇f(x_k)‖ < ε, stop; otherwise set d_k = −∇f(x_k).
3. Perform a line search along d_k to obtain λ_k, set x_{k+1} = x_k + λ_k d_k and k = k + 1, then return to step 2.
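The three steps above can be sketched compactly. The sketch below is in Python/NumPy rather than this post's MATLAB, and it uses the fact that the example objective is quadratic, so f can be written as 0.5·xᵀAx − bᵀx and the exact line-search step along d = −g has the closed form λ = (gᵀg)/(gᵀAg); this shortcut holds only for quadratics.

```python
import numpy as np

# Steepest descent for the example f(x) = x1^2 + 2*x2^2 - 2*x1*x2 - 2*x2.
# Its gradient is grad f(x) = A @ x - b with the Hessian A below.
A = np.array([[2.0, -2.0], [-2.0, 4.0]])   # Hessian of f
b = np.array([0.0, 2.0])

def steepest_descent(x, eps=1e-6, max_iter=1000):
    for _ in range(max_iter):
        g = A @ x - b                      # step 2: gradient at x_k
        if np.linalg.norm(g) < eps:        # step 2: stopping test
            break
        lam = (g @ g) / (g @ A @ g)        # step 3: exact step for a quadratic
        x = x - lam * g                    # x_{k+1} = x_k + lam * d_k, d_k = -g
    return x

x_best = steepest_descent(np.array([0.0, 0.0]))
print(x_best)                              # approaches (1, 1)
```

With a general (non-quadratic) objective, the closed-form `lam` would be replaced by a one-dimensional search such as the golden-section method used in the MATLAB code below.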
3. Example
Use the steepest descent method to find the minimum point of the unconstrained nonlinear problem:

min f(x) = x1² + 2x2² − 2x1x2 − 2x2,
where x = (x1, x2)ᵀ, with initial point x_0 = (0, 0)ᵀ and precision ε = 0.01.
Here, for an intuitive understanding, we first visualize the problem, evaluating f on a grid with step size 0.2 and drawing the image below:
Solution: here we first determine the minimum point with exact line search, then verify it with the one-dimensional (golden-section) search.
(1) The gradient is ∇f(x) = (2x1 − 2x2, 4x2 − 2x1 − 2)ᵀ.
(2) At the initial point x_0 = (0, 0)ᵀ: ∇f(x_0) = (0, −2)ᵀ, so ‖∇f(x_0)‖ = 2 > ε.
(3) The search direction is d_0 = −∇f(x_0) = (0, 2)ᵀ.
(4) Use exact search to find the step size: φ(λ) = f(x_0 + λd_0) = f(0, 2λ) = 8λ² − 4λ. Setting φ'(λ) = 16λ − 4 = 0 gives λ_0 = 1/4.
(5) x_1 = x_0 + λ_0 d_0 = (0, 1/2)ᵀ.
Returning to step (2) and repeating in the same way, we iterate until the stopping condition is satisfied; the optimal solution is x* = (1, 1)ᵀ, with f(x*) = −1.
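The arithmetic of the first iteration can be checked with a few lines of Python:

```python
# Verifying the first iteration for f(x) = x1^2 + 2*x2^2 - 2*x1*x2 - 2*x2,
# starting from x0 = (0, 0).
def grad(x1, x2):
    return (2*x1 - 2*x2, 4*x2 - 2*x1 - 2)

g0 = grad(0, 0)                       # (0, -2), so d0 = -g0 = (0, 2)
# phi(lam) = f(0, 2*lam) = 8*lam^2 - 4*lam; phi'(lam) = 16*lam - 4 = 0
lam0 = 4 / 16                         # = 1/4
x_next = (0 - lam0 * g0[0], 0 - lam0 * g0[1])   # x1 = x0 + lam0*d0 = (0, 1/2)
```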
4. Advantages and disadvantages
Advantages:
(1) Each iteration is simple, and the method places few requirements on the initial point.
Disadvantages:
(1) Although each individual step is optimal, the overall convergence rate is not necessarily the fastest.
(2) Because consecutive search directions are orthogonal, the iterates of the steepest descent method follow a right-angled zigzag path, as shown below. The iteration makes rapid progress at first, but the convergence rate slows as it approaches the optimum.
Solving with MATLAB: here I first determine an interval containing the minimum (doubling the step while checking the sign of the derivative), then locate the minimum within it by golden-section search. The code is fairly rough; the final results are as follows:
clear;
xk=[0,0]';                     % initial point x0
t=0.01;                        % precision epsilon
syms x1;
syms x2;
while (1)
    % gradient of f at the current point
    [dfx1_value,dfx2_value]=steepest_gradient(xk);
    deltafx=[dfx1_value,dfx2_value]';
    gredfxabs=sqrt(dfx1_value.^2+dfx2_value.^2);   % ||grad f(xk)||
    if (gredfxabs<t)           % stopping condition
        x_best=xk
        %f=x1-x2+2*x1.^2+2*x1*x2+x2.^2;
        f=x1.^2+2.*x2.^2-2.*x1.*x2-2.*x2;
        m=matlabFunction(f);
        y_best=m(xk(1),xk(2))
        break;
    else
        dk=-deltafx;                   % steepest descent direction
        fx=lamdafunction(dk,xk);       % phi(lamda) = f(xk + lamda*dk)
        lamda=goldensfenge(fx);        % golden-section line search
        xk=xk-lamda*deltafx;           % x_{k+1} = xk + lamda*dk
    end
end
% Gradient of f evaluated at the point xk.
function [dfx1_value,dfx2_value]=steepest_gradient(xk)
    syms x1;
    syms x2;
    %fx=x1.^2-2*x1*x2+4*x2.^2+x1-3*x2;
    %fx=x1-x2+2*x1.^2+2*x1*x2+x2.^2;
    fx=x1.^2+2.*x2.^2-2.*x1.*x2-2.*x2;
    dfx_1=diff(fx,x1);                 % symbolic partial derivatives
    dfx_2=diff(fx,x2);
    dfx1=matlabFunction(dfx_1);
    dfx2=matlabFunction(dfx_2);
    dfx1_value=dfx1(xk(1),xk(2));
    dfx2_value=dfx2(xk(1),xk(2));
end
% Bracket a minimum of fx: starting from x0, move in the descent direction,
% doubling the step until the sign of the derivative flips; return [a,b].
% (The original code left a,b unset when the derivative was exactly zero;
% here that case returns a degenerate interval.)
function [a,b]=region(fx,x0)
    dx=0.1;
    P=fdx(fx,x0);                      % derivative at the start point
    if (P==0)
        a=x0; b=x0;                    % x0 is already stationary
    elseif (P>0)
        % derivative positive: the minimum lies to the left, search downward
        while (1)
            x1=x0-dx;
            dx=dx+dx;                  % double the step each time
            P=fdx(fx,x1);
            if (P==0)
                a=x1; b=x1;
                break;
            elseif (P<0)
                a=x1; b=x0;
                break;
            else
                x0=x1;
            end
        end
    else
        % derivative negative: the minimum lies to the right, search upward
        while (1)
            x1=x0+dx;
            dx=dx+dx;
            P=fdx(fx,x1);
            if (P==0)
                a=x1; b=x1;
                break;
            elseif (P>0)
                a=x0; b=x1;
                break;
            else
                x0=x1;
            end
        end
    end
end
% Build phi(lamda) = f(x_k + lamda*dk) as a symbolic function of lamda.
function fx=lamdafunction(dk,x_k)
    syms lamda;
    x1=x_k(1)+lamda*dk(1);
    x2=x_k(2)+lamda*dk(2);
    %fx=x1.^2-2*x1*x2+4*x2.^2+x1-3*x2;
    %fx=x1-x2+2*x1.^2+2*x1*x2+x2.^2;
    fx=x1.^2+2.*x2.^2-2.*x1.*x2-2.*x2;
end
% Golden-section (0.618) search for the minimum of fx, with precision e.
function x_best=goldensfenge(fx)
    x0=10*rand;                        % random start for the bracketing phase
    e=0.005;
    [a,b]=region(fx,x0);               % interval containing the minimum
    x1=a+0.382*(b-a);
    x2=a+0.618*(b-a);
    f1=fvalue(fx,x1);
    f2=fvalue(fx,x2);
    while (1)
        if (f1>f2)                     % minimum lies in [x1,b]
            a=x1;
            x1=x2; f1=f2;
            x2=a+0.618*(b-a);
            f2=fvalue(fx,x2);
        elseif (f1<f2)                 % minimum lies in [a,x2]
            b=x2;
            x2=x1; f2=f1;
            x1=a+0.382*(b-a);
            f1=fvalue(fx,x1);
        else                           % f1==f2: shrink to [x1,x2]
            a=x1; b=x2;
            x1=a+0.382*(b-a);
            x2=a+0.618*(b-a);
            f1=fvalue(fx,x1);
            f2=fvalue(fx,x2);
        end
        if (abs(b-a)<=e)
            x_best=(a+b)/2;
            break;
        end
    end
end
% Evaluate the symbolic expression fx at the scalar a.
function y_value=fvalue(fx,a)
    f=matlabFunction(fx);
    y_value=f(a);
end

% Evaluate the derivative of fx at the scalar a.
function dy_value=fdx(fx,a)
    dy=diff(fx);                       % symbolic derivative w.r.t. lamda
    sign2fun=matlabFunction(dy);
    dy_value=sign2fun(a);
end
Editor: High Aerospace