Optimization algorithms: the steepest descent method

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. Please include a link to the original source and this statement when reposting.
Original link: https://blog.csdn.net/m0_37570854/article/details/88559619

Introduction: The steepest descent method is one of the most commonly used algorithms for solving unconstrained optimization problems. In machine learning, solving for the model parameters is exactly such an unconstrained optimization problem, and gradient descent (Gradient Descent) is the most commonly used method; another common one is the least squares method. When minimizing a loss function, gradient descent solves the problem iteratively, step by step, yielding the minimized loss and the corresponding model parameter values. Conversely, if we need to maximize the loss function, we iterate with gradient ascent instead. Building on basic gradient descent, two variants have been developed in machine learning: batch gradient descent and stochastic gradient descent. This post mainly introduces the steepest descent method and implements it in MATLAB at the end. (P.S. Best read on a computer so that the formulas display properly.)

1. Principle

This post discusses the following optimization model:

                                                               \min_{x \in \mathbb{R}^{2}} f(x)

where f is a real-valued continuous function of x, usually assumed to have continuous second-order partial derivatives. Since \max f(x) is equivalent to \min(-f(x)), only the minimization problem is discussed below.

       Since the steepest descent method only considers the fastest decrease at the current point rather than the fastest decrease globally, the key when solving a nonlinear unconstrained problem is to obtain, at each iteration, the search direction d^{(k)} and the step length \lambda^{(k)}. Consider the directional derivative of the function f(x) at the point x^{(k)} in the direction d, f_{d}(x^{(k)}) = \nabla f(x^{(k)})^{T} d, which is the rate of change of f(x) at x^{(k)} along d. When f is continuously differentiable, a negative directional derivative means that the function value decreases along that direction, and the smaller (more negative) the directional derivative, the faster the decrease. One idea for determining the search direction d^{(k)} is therefore to take the direction that minimizes the directional derivative of f(x) at x^{(k)}.
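As a quick illustration (added here, using the example function from Section 3, f(x) = x_{1}^{2} + 2x_{2}^{2} - 2x_{1}x_{2} - 2x_{2}, whose gradient at the origin is \nabla f(0,0) = (0,-2)^{T}), the directional derivatives along three unit directions are

                                            \nabla f(0,0)^{T}(1,0)^{T} = 0, \qquad \nabla f(0,0)^{T}(0,1)^{T} = -2, \qquad \nabla f(0,0)^{T}(0,-1)^{T} = 2,

so among these three directions f is locally unchanged along (1,0)^{T}, decreases fastest along (0,1)^{T}, and increases along (0,-1)^{T}.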

1.1 Determining the search direction d^{(k)}

         Let the direction d be a unit vector, \left\| d \right\| = 1. Starting from the point x^{(k)} and searching along d with step length \lambda_{k} gives the new point x^{(k+1)} = x^{(k)} + \lambda_{k} d^{(k)}. The Taylor expansion gives:

                                            f(x^{(k)}+\lambda_{k} d^{(k)}) = f(x^{(k)}) + \lambda_{k}\, \nabla f(x^{(k)})^{T} d^{(k)} + o(\lambda_{k})

From this, the rate of change of f at x^{(k)} along d^{(k)} is:

                            \lim_{\lambda_{k}\rightarrow 0}\frac{f(x^{(k)}+\lambda_{k} d^{(k)})-f(x^{(k)})}{\lambda_{k}} = \lim_{\lambda_{k}\rightarrow 0}\frac{\lambda_{k}\, \nabla f(x^{(k)})^{T} d^{(k)} + o(\lambda_{k})}{\lambda_{k}} = \nabla f(x^{(k)})^{T} d^{(k)}

It is easy to see that the fastest decrease at x^{(k)} corresponds to the most negative rate of change, i.e. to minimizing \nabla f(x^{(k)})^{T} d^{(k)} (with \nabla f(x^{(k)})^{T} d^{(k)} < 0). Since

   \nabla f(x^{(k)})^{T} d^{(k)} = \left\| \nabla f(x^{(k)}) \right\| \cdot \left\| d^{(k)} \right\| \cdot \cos\theta \ge -\left\| \nabla f(x^{(k)}) \right\|,

the minimum is attained when \cos\theta = -1, that is, when

   d^{(k)} = -\frac{\nabla f(x^{(k)})}{\left\| \nabla f(x^{(k)}) \right\|}.

This determines the steepest descent direction -\nabla f(x^{(k)}), which is the origin of the name of the steepest descent method.
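As a quick numeric sanity check (my own sketch, not part of the original post; it uses vecnorm and implicit expansion, so MATLAB R2017b or later is assumed), the most negative directional derivative over many random unit directions is indeed attained close to the normalized negative gradient:

% Numeric check: among random unit directions d, the most negative
% directional derivative g'*d is attained near d = -g/norm(g).
g = [0; -2];                 % gradient of the Section 3 example at (0,0)
D = randn(2, 10000);         % random directions as columns
D = D ./ vecnorm(D);         % normalize each column to unit length
[val, idx] = min(g' * D);    % most negative directional derivative
disp(D(:, idx))              % close to -g/norm(g) = (0, 1)'
disp(val)                    % close to -norm(g) = -2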

1.2 Determining the step length \lambda^{(k)}

In the steepest descent method the step length is usually determined by an exact line search, namely

                                            \lambda_{k} = \arg\min_{\lambda \ge 0} f(x^{(k)} + \lambda d^{(k)}),

i.e. the step is the minimizer of this one-dimensional function. In general, setting the derivative to zero gives

   \frac{d f(x^{(k)}+\lambda d^{(k)})}{d\lambda} = \nabla f(x^{(k)}+\lambda_{k} d^{(k)})^{T} d^{(k)} = \nabla f(x^{(k+1)})^{T} d^{(k)} = 0,

which shows that d^{(k)} and d^{(k+1)} are orthogonal. In the program at the end of this post I did not use this exact approach; instead a one-dimensional search (the golden-section method, ratio 0.618) finds an approximate minimizer. Implementing the one-dimensional search yourself is a good way to understand the process, and its result agrees almost exactly with the exact line search.
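As an added concrete case (my own derivation, not in the original post), suppose f is a quadratic f(x) = \frac{1}{2}x^{T}Qx - b^{T}x, where Q is the Hessian and b collects the linear terms (this notation is introduced here only for illustration). Then the exact step has a closed form:

   \varphi(\lambda) = f(x^{(k)} + \lambda d^{(k)}), \qquad \varphi'(\lambda) = d^{(k)T}\left(Q(x^{(k)} + \lambda d^{(k)}) - b\right) = 0 \;\Longrightarrow\; \lambda_{k} = -\frac{\nabla f(x^{(k)})^{T} d^{(k)}}{d^{(k)T} Q\, d^{(k)}},

and with d^{(k)} = -\nabla f(x^{(k)}) this becomes \lambda_{k} = \left\|\nabla f(x^{(k)})\right\|^{2} / \left(\nabla f(x^{(k)})^{T} Q\, \nabla f(x^{(k)})\right).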

2. Algorithmic process

Problem to solve:       \min_{x \in \mathbb{R}^{2}} f(x)

The detailed steps of the steepest descent method are:

1. Choose an initial point x^{(1)}, set k = 1, and specify an accuracy \varepsilon > 0.

2. Compute \nabla f(x^{(k)}). If \left\| \nabla f(x^{(k)}) \right\| < \varepsilon, stop; otherwise set d^{(k)} = -\nabla f(x^{(k)}).

3. Starting from x^{(k)}, perform a line search along d^{(k)} to obtain x^{(k+1)} = x^{(k)} + \lambda_{k} d^{(k)}; set k = k + 1 and return to step 2. (A minimal sketch of this loop is given below.)
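Here is a minimal MATLAB sketch of steps 1-3 (my own illustration, separate from the complete program at the end of the post). It assumes the quadratic example of Section 3, written as f(x) = \frac{1}{2}x^{T}Qx - b^{T}x, so that the exact step has the closed form derived in Section 1.2; the names Q, b and eps0 are introduced here only for illustration.

% Minimal sketch of steps 1-3 for the Section 3 example,
% f(x) = x1^2 + 2*x2^2 - 2*x1*x2 - 2*x2 = 0.5*x'*Q*x - b'*x
% with Q = [2 -2; -2 4] and b = [0; 2]. For a quadratic, the exact
% line-search step along d = -g is lambda = (g'*g)/(g'*Q*g).
Q = [2 -2; -2 4];
b = [0; 2];
x = [0; 0];                    % step 1: initial point
eps0 = 1e-6;                   % accuracy epsilon
for k = 1:1000
    g = Q*x - b;               % gradient of f at x
    if norm(g) < eps0          % step 2: stopping test
        break;
    end
    d = -g;                    % steepest descent direction
    lambda = (g'*g)/(g'*Q*g);  % step 3: exact line-search step
    x = x + lambda*d;          % move to the next iterate
end
disp(x)                        % converges to (1,1)'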

3. Example

Use the steepest descent method to find the minimum point of the unconstrained nonlinear problem:

                                                         \min f(x) = x_{1}^{2} + 2x_{2}^{2} - 2x_{1}x_{2} - 2x_{2}

where x = (x_{1}, x_{2})^{T} and the initial point is x^{(0)} = (0,0)^{T}.

For an intuitive understanding, let us first visualize the problem: x_{1} and x_{2} are sampled on [-10, 10] with step 0.2, and the surface of f(x_{1}, x_{2}) is drawn below:

[Figure: surface plot of f(x_{1}, x_{2}) over [-10, 10] x [-10, 10]]
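A minimal MATLAB sketch (my own, not from the original post) that reproduces this surface plot:

% Plot f over a grid of x1, x2 in [-10, 10] with step 0.2.
[X1, X2] = meshgrid(-10:0.2:10);
F = X1.^2 + 2*X2.^2 - 2*X1.*X2 - 2*X2;
surf(X1, X2, F, 'EdgeColor', 'none');
xlabel('x_1'); ylabel('x_2'); zlabel('f(x)');
title('f(x) = x_1^2 + 2x_2^2 - 2x_1x_2 - 2x_2');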

Solution: the minimum point is first determined with an exact line search, and then verified with the one-dimensional (golden-section) search.

(1) \nabla f(x) = (2x_{1} - 2x_{2},\ 4x_{2} - 2x_{1} - 2)^{T}

(2) \nabla f(x^{(0)}) = (0, -2)^{T}

(3) d^{(0)} = -\nabla f(x^{(0)}) = (0, 2)^{T}

(4) Determine the step length \lambda_{0} by exact line search:

\lambda_{0} = \arg\min_{\lambda \ge 0} f(x^{(0)} + \lambda d^{(0)}) = \arg\min_{\lambda \ge 0} \left( 8\lambda^{2} - 4\lambda \right), which gives \lambda_{0} = 1/4.

(5) x^{(1)} = x^{(0)} + \lambda_{0} d^{(0)} = (0, 1/2)^{T}

  Similarly, returning to step (2) and iterating until the stopping condition is satisfied gives the optimal solution x^{*} = (1, 1)^{T} with optimal value f(x^{*}) = -1.
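As a check added here, the optimum can also be obtained directly by setting the gradient to zero:

   \nabla f(x) = 0 \;\Longrightarrow\; 2x_{1} - 2x_{2} = 0,\quad 4x_{2} - 2x_{1} - 2 = 0 \;\Longrightarrow\; x_{1} = x_{2} = 1, \qquad f(1,1) = 1 + 2 - 2 - 2 = -1,

which agrees with the iterative result.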

4. Advantages and disadvantages

Advantages:
(1) Each iteration is simple and little is required of the initial point.
Disadvantages:
(1) Although each individual step is optimal, the overall rate of convergence is not necessarily fast.
(2) Because consecutive search directions are orthogonal, the iterates follow a right-angled zigzag path; the early iterations make rapid progress, but the convergence rate slows down as the iterates approach the optimum.

The problem is solved with MATLAB. Here I first bracket an interval that contains the minimizer of the one-dimensional function (by doubling the step until the derivative changes sign), and then locate the minimum with the golden-section method. The code is written in some detail; the complete program is as follows:

% Steepest descent with a golden-section line search for
% f(x) = x1^2 + 2*x2^2 - 2*x1*x2 - 2*x2, starting from x0 = (0,0)'.
clear;
xk=[0,0]';              % current iterate x^(k), initialized to x^(0)
t=0.01;                 % stopping tolerance on the gradient norm
syms x1;
syms x2;
while (1)
   [dfx1_value,dfx2_value]=steepest_gradient(xk);   % gradient components at xk
   deltafx=[dfx1_value,dfx2_value]';                % gradient vector
   gredfxabs=sqrt(dfx1_value.^2+dfx2_value.^2);     % gradient norm
   if (gredfxabs<t)              % stopping test: ||grad f|| < t
      x_best=xk                  % display the minimizer
      %f=x1-x2+2*x1.^2+2*x1*x2+x2.^2;
      f=x1.^2+2.*x2.^2-2.*x1.*x2-2.*x2;
      m=matlabFunction(f);
      y_best=m(xk(1),xk(2))
      break;
   else 
      dk=-deltafx;                  % steepest descent direction d^(k)
      fx=lamdafunction(dk,xk);      % symbolic phi(lamda) = f(xk + lamda*dk)
      lamda=goldensfenge(fx);       % step length from golden-section search
      xk=xk-lamda*deltafx;          % update: xk + lamda*dk
      continue;
   end
end
% Evaluate the two partial derivatives of f at the point xk
% (symbolic differentiation converted to function handles).
function [dfx1_value,dfx2_value]=steepest_gradient(xk)
syms x1;
syms x2;
%fx=x1.^2-2*x1*x2+4*x2.^2+x1-3*x2;
%fx=x1-x2+2*x1.^2+2*x1*x2+x2.^2;
fx=x1.^2+2.*x2.^2-2.*x1.*x2-2.*x2;
dfx_1=diff(fx,x1);
dfx_2=diff(fx,x2);
dfx1=matlabFunction(dfx_1);
dfx2=matlabFunction(dfx_2);
dfx1_value=dfx1(xk(1),xk(2));
dfx2_value=dfx2(xk(1),xk(2));
% Bracket an interval [a,b] that contains a minimizer of the 1-D function fx,
% starting from x0 and doubling the step until the derivative changes sign.
function [a,b]=region(fx,x0)
dx=0.1;
P=fdx(fx,x0);
if (P==0)               % derivative already zero at x0
      a=x0; b=x0;       % degenerate bracket (fix: a,b must be assigned for the caller)
      x_best=x0;
elseif (P>0)
    while (1)
      x1=x0-dx;
      dx=dx+dx;
      P=fdx(fx,x1);
      if(P==0)
          a=x1; b=x1;   % degenerate bracket (fix: assign a,b)
          x_best=x1;
          break;
      elseif (P<0)
          a=x1;
          b=x0;
          break;
      else 
          x0=x1;
      end
    end
else
    while (1)
        x1=x0+dx;
        dx=dx+dx;
        P=fdx(fx,x1);
        if(P==0)
            a=x1; b=x1; % degenerate bracket (fix: assign a,b)
            x_best=x1;
            break;
        elseif(P>0)
            a=x0;
            b=x1;
            break;
        else
            x0=x1;
        end
    end
end
% Build the symbolic 1-D function phi(lamda) = f(x_k + lamda*dk).
function fx=lamdafunction(dk,x_k)
syms lamda;
syms x1;
syms x2;
x1=x_k(1)+lamda*dk(1);
x2=x_k(2)+lamda*dk(2);
%fx=x1.^2-2*x1*x2+4*x2.^2+x1-3*x2;
%fx=x1-x2+2*x1.^2+2*x1*x2+x2.^2;
fx=x1.^2+2.*x2.^2-2.*x1.*x2-2.*x2;
% Golden-section (0.618) search for an approximate minimizer of the 1-D function fx.
function x_best=goldensfenge(fx)
x0=10*rand;             % random starting point for the bracketing phase
e=0.005;                % tolerance on the bracket length
[a,b]=region(fx,x0);
%x0=a+rand*(b-a);
 x1=a+0.382*(b-a);
 x2=a+0.618*(b-a);
 f1=fvalue(fx,x1);
 f2=fvalue(fx,x2);
while(1)
    if (f1>f2)
        a=x1;
        x1=x2;
        f1=f2;
        x2=a+0.618*(b-a);
        f2=fvalue(fx,x2);
        if(abs(b-a)<=e)
            x_best=(a+b)/2;
            break;
        else
            continue;
        end
    elseif(f1<f2)
        b=x2;
        x2=x1;
        f2=f1;
        x1=a+0.382*(b-a);
        f1=fvalue(fx,x1);
        if(abs(b-a)<=e)
            x_best=(a+b)/2;
            break;
        else
            continue;
        end
    else
        a=x1;
        b=x2;
        if(abs(b-a)<=e)
            x_best=(a+b)/2;
            break;
        else
            x1=a+0.382*(b-a);
            x2=a+0.618*(b-a);
            f1=fvalue(fx,x1);
            f2=fvalue(fx,x2);
            continue;
        end
    end
end

% Evaluate the symbolic 1-D function fx at the scalar a.
function y_value=fvalue(fx,a)
syms x;
%y=2*x.^2-x-1;
f=matlabFunction(fx);
y_value=f(a);
% Evaluate the derivative of the symbolic 1-D function fx at the scalar a.
function dy_value=fdx(fx,a)
syms x;
%y=2*x.^2-x-1;
dy=diff(fx);
sign2fun=matlabFunction(dy);
dy_value=sign2fun(a);
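A note on running the code (added here): the Symbolic Math Toolbox is required, and because the script and its local functions live in a single file, MATLAB R2016b or later is needed (on older versions, put each function in its own .m file). With the tolerance t = 0.01, the printed x_best should be close to (1, 1)' and y_best close to -1, matching the analytic solution of Section 3; since the bracketing phase starts from a random point (x0 = 10*rand), the intermediate iterates vary slightly from run to run.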

                                     


                                                                                                       
