[Gradient method, Newton method Matlab examples] Iteratively minimizing the loss function of a logistic regression problem

Consider the following Logistic Regression Problem:

\min_{w\in\mathbb{R}^{n}}\;\frac{1}{m}\sum_{i=1}^{m}\log\left(1+\exp(-b_{i}w^{T}a_{i})\right)+\frac{1}{100m}\left\|w\right\|^{2}

where a_{i}\in\mathbb{R}^{n} and b_{i}\in\{-1,+1\} are the given data,

and b=(b_{1},b_{2},\dots,b_{m})^{T} is the label vector.



Attached are the a9a.test, CINA.test, and ijcnn1.test data sets, together with the libsvmread.mexw64 file used to read them.

If you prefer not to download from CSDN, the same archive (Matlab code.zip, 3.16 MB) is also available via Baidu Netdisk.


1. Mathematical form and its implementation in Matlab

1. The mathematical representation of the Logistic Regression loss function and its gradient: 
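Written out explicitly, with \sigma(z)=\frac{1}{1+e^{-z}} denoting the sigmoid function, the loss and its gradient are:

L(w)=\frac{1}{m}\left(\sum_{i=1}^{m}\log\left(1+\exp(-b_{i}a_{i}^{T}w)\right)+\frac{1}{100}\left\|w\right\|^{2}\right)

\nabla L(w)=\frac{1}{m}\left(-\sum_{i=1}^{m}\left(1-\sigma(b_{i}a_{i}^{T}w)\right)b_{i}a_{i}+\frac{1}{50}w\right)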

 Matlab implementation of the loss function and its gradient: 

function z = Sigmoid(z)
%Sigmoid function
    z = 1./(1 + exp(-z));
end
function object = L(w,m0,A,b)
%Objective (loss) function of the logistic regression problem
    object = m0*(sum(log(1+exp(-b.*(A'*w))))+(1/100)*(w'*w));
end
function grad = Gradient(w,m0,A,b)
%Gradient of the objective function
    grad = m0*(-A*(b.*(1-Sigmoid(b.*(A'*w))))+(1/50)*w);
end
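As a quick sanity check (not part of the original post), the gradient can be compared against central finite differences; the sizes below are made up for illustration, and the functions above are assumed to be saved on the MATLAB path:

% Hypothetical finite-difference check of Gradient (section-1 convention: A is n-by-m, columns are the samples a_i)
n = 20; m = 50; m0 = 1/m;
A = randn(n,m); b = sign(rand(m,1)-0.5); w = randn(n,1);
g = Gradient(w,m0,A,b);
g_fd = zeros(n,1);
h = 1e-6;
for j = 1:n
    e = zeros(n,1); e(j) = h;
    g_fd(j) = (L(w+e,m0,A,b) - L(w-e,m0,A,b)) / (2*h);
end
fprintf('relative error of the gradient: %.2e\n', norm(g-g_fd)/norm(g));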

2. The mathematical representation of the Hessian matrix of the loss function: 
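Differentiating the gradient once more and using b_{i}^{2}=1, the Hessian is:

\nabla^{2}L(w)=\frac{1}{m}\left(\sum_{i=1}^{m}\sigma(b_{i}a_{i}^{T}w)\left(1-\sigma(b_{i}a_{i}^{T}w)\right)a_{i}a_{i}^{T}+\frac{1}{50}I\right)

In matrix form this is \frac{1}{m}\left(A^{T}DA+\frac{1}{50}I\right) with D=\mathrm{diag}\left(\sigma_{i}(1-\sigma_{i})\right), where the rows of A are a_{i}^{T}; this is exactly what the code below assembles.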

Matlab implementation of the Hessian matrix of the loss function: 

function hess = Hessian(w,m0,n,A,b)
%Hessian matrix of the objective function
%Note: here A is m-by-n (its rows are the samples a_i'), as in the data-set scripts of Section 3,
%whereas the gradient snippet above stores A as n-by-m
    sigmoid = Sigmoid(b.*(A*w));
    vector = sigmoid.*(1-sigmoid);
%Strictly this should be b.*sigmoid.*(1-sigmoid).*b, where b is the label vector of the logistic regression problem;
%since the entries of b are ±1, multiplying by b on both sides has no effect, so it can be written as sigmoid.*(1-sigmoid)
    repeat_vector = repmat(vector',n,1);
%Repeat the transpose of vector n times along rows and once along columns, building the n-by-m matrix repeat_vector
%Using repeat_vector.*A'*A instead of A'*diag(vector)*A saves memory when m>n,
%and it replaces a matrix product with an element-wise product, reducing the cost
    hess = m0*(repeat_vector.*A'*A + diag((1/50)*ones(n,1)));
end
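To convince oneself that the repmat trick matches the textbook form A'*diag(vector)*A, a throwaway comparison (not in the original post; random data, functions above assumed to be on the path) can be run:

% Hypothetical check of the repmat-based Hessian against the diag-based form (A is m-by-n here)
m = 200; n = 30; m0 = 1/m;
A = randn(m,n); b = sign(rand(m,1)-0.5); w = randn(n,1);
H1 = Hessian(w,m0,n,A,b);
s = Sigmoid(b.*(A*w)); v = s.*(1-s);
H2 = m0*(A'*diag(v)*A + (1/50)*eye(n));
fprintf('max difference between the two forms: %.2e\n', max(abs(H1(:)-H2(:))));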

3. The mathematical form of the backtracking method:

(Notation: t_{k} is the step size, p_{k} is the descent direction, k indexes the iterations of the outer algorithm, and s indexes the iterations of the backtracking loop.)

    For the gradient method: p_{k}=-\nabla L(x_{k})

    For Newton's method: p_{k}=-\nabla^{2}L(x_{k})^{-1}\nabla L(x_{k})

    α, β and \hat{t} are the parameters of the backtracking method: α is a small positive number, β∈(0,1), and \hat{t} is the initial step size.

    The backtracking method starts from the fairly large step size \hat{t} and multiplies it by β at each iteration, i.e. t_{k}=\hat{t}\beta^{s} at the s-th iteration, until the smallest non-negative integer s is found such that: L(x_{k}+t_{k}p_{k})\leq L(x_{k})+\alpha t_{k}p_{k}^{T}\nabla L(x_{k})

Compared with determining the step size exactly, t_{k}=\mathrm{argmin}_{t\geq 0}L(x_{k}+tp_{k}), the backtracking rule is much easier to implement.

Matlab implementation of backtracking method to determine step size:

function StepSize = Backtracking(w,m0,A,b,object,grad,direction)
%Backtracking line search: find a step size such that
%"new objective value <= old objective value + alpha*StepSize*<direction, gradient>"
%alpha is a small positive number (here alpha = 1e-4), direction is the descent direction, and < , > denotes the inner product
%Start from a fairly large step size (here 1) and multiply by beta each trial, with beta in (0,1) (here beta = 0.5);
%the name "backtracking" presumably comes from this repeated shrinking
%Compared with exact line search (which uses the optimal step size along the descent direction), backtracking is much easier to implement
    alpha = 1e-4;
    beta = 0.5;
    StepSize = 1;
    new_w = w + StepSize*direction;
    while L(new_w,m0,A,b) > (object + alpha * StepSize * (direction' * grad))
        StepSize = StepSize * beta;
        new_w = w + StepSize*direction;
    end
end
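A minimal usage example (illustrative data only, not in the original post) that calls Backtracking once and verifies the condition it enforces:

% Hypothetical usage of Backtracking on random data (section-1 convention: A is n-by-m)
n = 20; m = 50; m0 = 1/m;
A = randn(n,m); b = sign(rand(m,1)-0.5); w = randn(n,1);
object = L(w,m0,A,b);
grad = Gradient(w,m0,A,b);
direction = -grad; %steepest-descent direction
t = Backtracking(w,m0,A,b,object,grad,direction);
assert(L(w + t*direction,m0,A,b) <= object + 1e-4*t*(direction'*grad), 'Armijo condition violated');
fprintf('accepted step size: %g\n', t);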

4. The difference between Exact Newton and Inexact Newton

The Newton descent direction is p_{k}=-\nabla^{2}L(x_{k})^{-1}\nabla L(x_{k}); depending on how this direction is computed, the method is either exact or inexact:

  • The exact method forms the direction directly from the Hessian: solve with (conceptually, invert) \nabla^{2}L(x_{k}), multiply by the gradient \nabla L(x_{k}), and negate the result to obtain p_{k};
  • The inexact method computes the descent direction with the conjugate gradient method (CG), avoiding the cost of a full matrix inversion.

The conjugate gradient method produces an approximate solution of a linear system Ax=b (as opposed to the analytic solution x=A^{-1}b).

Setting A=\nabla^{2}L(x_{k}), x=p_{k} and b=-\nabla L(x_{k}), the direction is obtained by solving the linear system \nabla^{2}L(x_{k})\,p_{k}=-\nabla L(x_{k}).

Denote the residual r_{k}=\nabla^{2}L(x_{k})\,p_{k}+\nabla L(x_{k}).

For convex problems, the termination condition of the conjugate gradient iteration is: \left\|r_{k}\right\|\leq\eta_{k}\left\|\nabla L(x_{k})\right\|

There are different ways to choose \eta_{k}, called inexact rules; the three commonly used ones are:

  • \eta_{k}=\min\{0.5,\left\|\nabla L(x_{k})\right\|\}
  • \eta_{k}=\min\{0.5,\sqrt{\left\|\nabla L(x_{k})\right\|}\}
  • \eta_{k}=0.5

From the first rule to the third, the convergence rate of the corresponding inexact Newton method decreases.

Code for computing the descent direction in the exact Newton method:

direction = - hess\grad;
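Since the regularized Hessian here is symmetric positive definite, an equivalent alternative (a sketch, not from the original post; hess and grad assumed to be in the workspace) is to factor it once with chol and reuse the triangular factor:

% Alternative: solve the Newton system via a Cholesky factorization
R = chol(hess);                 % hess = R'*R with R upper triangular
direction = -(R \ (R' \ grad)); % two triangular solves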

Code for the conjugate gradient method used by the inexact Newton method:

① Inexact rules:

function CG_tol = InexactRule(ng)
%Inexact rules (stopping tolerances) for the conjugate gradient method
    CG_tol = zeros(3,1);
    CG_tol(1) = min(0.5,ng)*ng;
    CG_tol(2) = min(0.5,sqrt(ng))*ng;
    CG_tol(3) = 0.5*ng;
end

②CG: 

function x = CG(A,g,ng,CG_MaxIter,Rule)
%Conjugate gradient method for solving Ax = -g; ng is the 2-norm of g, Rule is the index of the inexact rule to use
%For Newton's method, A = Hessian, g = gradient, ng = norm(gradient,2), and x is the approximate Newton direction
    x = 0;
    CG_tol = InexactRule(ng);
%residual = Ax + g; taking the zero vector as the initial direction, the initial residual is g
    r = g;
    p = -r;
    for iter = 1:CG_MaxIter
        rr = r' * r;
        Ap = A * p;
        alpha = rr / (p'*Ap);
        x = x + alpha * p;
        r = r + alpha * Ap;
        nrl = norm(r);
        if nrl <= CG_tol(Rule)
           break;
        end
        beta = nrl^2 / rr;
        p = -r + beta * p;
    end
    fprintf('Number of CG iteration:\t%d\n',iter); %printout to help debugging
end
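As an optional cross-check (not in the original post), MATLAB's built-in pcg solves the same system; its relative tolerance plays the role of \eta_{k}, since the norm of the right-hand side equals \left\|\nabla L(x_{k})\right\|. A sketch, assuming hess and grad from one Newton iteration are in the workspace and CG is on the path:

% Hypothetical comparison of the hand-written CG with MATLAB's pcg (inexact rule 1)
ng = norm(grad);
eta = min(0.5, ng);
d1 = CG(hess, grad, ng, 1000, 1);   % hand-written CG, solves hess*d = -grad
d2 = pcg(hess, -grad, eta, 1000);   % pcg stops when norm(residual) <= eta*norm(-grad)
fprintf('relative difference between the two directions: %.2e\n', norm(d1-d2)/norm(d2));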

2. Question 1:

Result: the script prints the objective value and iteration count of each of the three gradient methods at termination, and plots log10 of the gradient norm against the number of iterations (figures not reproduced here).

Code:

id = mod(21307140051,pow2(32));
%The student ID is too large to be a seed for rng(), so take it modulo 2^32
rng(id,'twister');
%Use the 'twister' generator; seeding with rand('seed',id) does not seem to make the random numbers reproducible

m = 500;
m0 = 1/m;%precomputed to avoid repeating the 1/m division
n = 1000;
w = zeros(n,1);%optimization variable
A = randn(n,m);%data matrix (columns are the samples a_i)
b = sign(rand(m,1)-0.5);%random ±1 labels
i = zeros(3,1);%i records the iteration counts of the three gradient-method runs

fprintf("Constant Stepsize(0.02) Gradient Method:\n");
[history1,i(1)] = ConstantStepSizeGradient(w,m0,A,b,0.02,3000,1e-4);
%执行固定步长为0.02的梯度法,最多迭代3000次
fprintf("result = %f\twith %d steps\n",history1(i(1)+1,1),i(1));

fprintf("Constant Stepsize(0.04) Gradient Method:\n");
[history2,i(2)] = ConstantStepSizeGradient(w,m0,A,b,0.04,3000,1e-4);
%执行固定步长为0.04的梯度法,最多迭代3000次
fprintf("result = %f\twith %d steps\n",history2(i(2)+1,1),i(2));

fprintf("Backtracking Gradient Method:\n");
[history3,i(3)] = BacktrackingGradient(w,m0,A,b,3000,1e-4);
%执行回溯线搜索,最多迭代3000次,终止条件为梯度的二范数小于1e-4
fprintf("result = %f\twith %d steps\n",history3(i(3)+1,1),i(3));

%打印三个算法终止时的目标函数值以及迭代次数
fprintf(['Constant step size 0.02:\t%f\t(%d iteration)\n' ...
         'Constant step size 0.04:\t%f\t(%d iteration)\n' ...
         'Backtracking line search(t_hat = 1,alpha = 0.001,beta = 0.5):\t' ...
         '%f\t(%d iteration)\n'], ...
         history1(i(1)+1,1),i(1), ...
         history2(i(2)+1,1),i(2), ...
         history3(i(3)+1,1),i(3));

%Create a figure window
figure
%Plot log10(gradient norm) against the number of iterations
plot(0:i(1),log10(history1(1:i(1)+1,2)),'-.r', ...%line styles/colors:
     0:i(2),log10(history2(1:i(2)+1,2)),'--b', ...%dash-dot red, dashed blue and solid green
     0:i(3),log10(history3(1:i(3)+1,2)),'-g','LineWidth',2)%line width 2 points (4x the default)
%Add a legend
legend('constant step size gradient method with alpha = 0.02', ...
       'constant step size gradient method with alpha = 0.04', ...
       'backtracking line search gradient method')
%Add a title
title('log10(Euclidean norm of gradient)------Number of iterations')
%Add axis labels
xlabel('Number of iterations')
ylabel('log10(Euclidean norm of gradient)')

function z = Sigmoid(z)
%Sigmoid function
    z = 1./(1 + exp(-z));
end

function object = L(w,m0,A,b)
%Objective (loss) function of the logistic regression problem
    object = m0*(sum(log(1+exp(-b.*(A'*w))))+(1/100)*(w'*w));
end

function grad = Gradient(w,m0,A,b)
%Gradient of the objective function
    grad = m0*(-A*(b.*(1-Sigmoid(b.*(A'*w))))+(1/50)*w);
end

function [history,i] = ConstantStepSizeGradient(w,m0,A,b,StepSize,Iteration,epsilon)
%Gradient method with constant step size StepSize, at most Iteration iterations;
%terminates when the 2-norm of the gradient drops below epsilon
    history = zeros(Iteration+1,2);
    for i = 1:Iteration
        object = L(w,m0,A,b);
        history(i,1) = object;
        grad = Gradient(w,m0,A,b);
        w = w - StepSize*grad; %update w with the constant step size
        history(i,2) = norm(grad);
        if history(i,2) < epsilon %termination criterion
           break
        end
        if mod(i,500)==0 %printout to help debugging
           fprintf('Number of Iteration:\t%d\n',i);
        end
    end
    history(i+1,1) = L(w,m0,A,b); %after i iterations, i+1 objective values end up in column 1 of history
    grad = Gradient(w,m0,A,b);
    history(i+1,2) = norm(grad); %and i+1 gradient norms in column 2
end

function StepSize = Backtracking(w,m0,A,b,object,grad,direction)
%Backtracking line search: find a step size such that
%"new objective value <= old objective value + alpha*StepSize*<direction, gradient>"
%alpha is a small positive number (here alpha = 1e-4), direction is the descent direction, and < , > denotes the inner product
%Start from a fairly large step size (here 1) and multiply by beta each trial, with beta in (0,1) (here beta = 0.5);
%the name "backtracking" presumably comes from this repeated shrinking
%Compared with exact line search (which uses the optimal step size along the descent direction), backtracking is much easier to implement
    alpha = 1e-4;
    beta = 0.5;
    StepSize = 1;
    new_w = w + StepSize*direction;
    while L(new_w,m0,A,b) > (object + alpha * StepSize * (direction' * grad))
        StepSize = StepSize * beta;
        new_w = w + StepSize*direction;
    end
end

function [history,i] = BacktrackingGradient(w,m0,A,b,Iteration,epsilon)
%Gradient method with backtracking line search, at most Iteration iterations;
%terminates when the 2-norm of the gradient drops below epsilon
    history = zeros(Iteration+1,2);
    for i = 1:Iteration
        object = L(w,m0,A,b);
        history(i,1) = object;
        grad = Gradient(w,m0,A,b);
        direction = -grad; %descent direction
        StepSize = Backtracking(w,m0,A,b,object,grad,direction); %step size from backtracking
        w = w + StepSize*direction; %update w
        history(i,2) = norm(grad);
        if history(i,2) < epsilon %termination criterion
           break
        end
        if mod(i,500)==0 %printout to help debugging
           fprintf('Number of Iteration:\t%d\n',i);
        end
    end
    history(i+1,1) = L(w,m0,A,b); %after i iterations, i+1 objective values end up in column 1 of history
    grad = Gradient(w,m0,A,b);
    history(i+1,2) = norm(grad); %and i+1 gradient norms in column 2
end
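For an external sanity check (not in the original post; requires the Optimization Toolbox), the backtracking gradient result could be compared with MATLAB's fminunc on the same data. The snippet assumes it is appended to the script above before its local functions (or that Sigmoid, L and Gradient are saved as separate files on the path):

% Hypothetical cross-check against fminunc (quasi-Newton), reusing L and Gradient
fun = @(w) deal(L(w,m0,A,b), Gradient(w,m0,A,b));   % returns [objective, gradient]
opts = optimoptions('fminunc','Algorithm','quasi-newton', ...
                    'SpecifyObjectiveGradient',true,'OptimalityTolerance',1e-6);
[w_ref, f_ref] = fminunc(fun, zeros(n,1), opts);
fprintf('fminunc objective: %f vs backtracking gradient: %f\n', f_ref, history3(i(3)+1,1));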

3. Question 2: 

Matlab statements to load the a9a.test data set:

dataset = 'a9a.test';
[b,A] = libsvmread(dataset);

To load the CINA.test and ijcnn1.test data sets, only the first statement needs to be changed accordingly.
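A quick optional check after loading (not in the original post) that the matrix and labels have the expected shapes and that the labels are ±1; if a data set stores labels differently, they would have to be remapped first:

% Hypothetical post-load check; assumes libsvmread.mexw64 is on the MATLAB path
dataset = 'a9a.test';
[b,A] = libsvmread(dataset);
fprintf('%s: m = %d samples, n = %d features\n', dataset, size(A,1), size(A,2));
if ~all(ismember(b,[-1 1]))
    b = sign(b - 0.5);   % e.g. map {0,1} labels to {-1,+1}
end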

1. Question 2(a):

Result: for each data set (① a9a.test, ② CINA.test, ③ ijcnn1.test) the script prints the objective value and iteration count of each Newton variant at termination and plots log10 of the gradient norm and log10 of the objective-value residual against the number of iterations (figures not reproduced here).

Code:

%Load the data set
dataset = 'a9a.test';
[b,A] = libsvmread(dataset);

[m,n] = size(A);
m0 = 1/m;
w = zeros(n,1);
i = zeros(6,1);

%To stay aligned with Question 2(b), the history and i entries with index 1 are reserved
fprintf('Exact Newton:\n');
[history2,i(2)] = ExactNewton(w,m0,n,A,b,1e-6);
%Exact Newton method with tolerance 1e-6
fprintf('result = %f\twith %d steps\n',history2(i(2)+1,1),i(2));

fprintf('Inexact Newton (Rule 1):\n');
[history3,i(3)] = InexactNewton(w,m0,n,A,b,1e-6,1000,1);
%Inexact Newton method with tolerance 1e-6, at most 1000 CG iterations, inexact rule 1
fprintf('result = %f\twith %d steps\n',history3(i(3)+1,1),i(3));

fprintf('Inexact Newton (Rule 2):\n');
[history4,i(4)] = InexactNewton(w,m0,n,A,b,1e-6,1000,2);
%Inexact Newton method with tolerance 1e-6, at most 1000 CG iterations, inexact rule 2
fprintf('result = %f\twith %d steps\n',history4(i(4)+1,1),i(4));

fprintf('Inexact Newton (Rule 3):\n');
[history5,i(5)] = InexactNewton(w,m0,n,A,b,1e-6,1000,3);
%Inexact Newton method with tolerance 1e-6, at most 1000 CG iterations, inexact rule 3
fprintf('result = %f\twith %d steps\n',history5(i(5)+1,1),i(5));

fprintf('Exact Newton with higher accuracy:\n');
[history6,i(6)] = ExactNewton(w,m0,n,A,b,1e-12);
%Exact Newton method with tolerance 1e-12, used to approximate the optimal value
fprintf('result = %f\twith %d steps\n',history6(i(6)+1,1),i(6));

%Use the result of the higher-accuracy Newton run as an approximation of the optimal value
optimal = history6(i(6)+1,1);
%Print the objective value and iteration count of each algorithm at termination
fprintf(['Exact Newton:\t%6.5f\t(%d iteration)\n' ...
         'Inexact Newton (Rule 1):\t%6.5f\t(%d iteration)\n' ...
         'Inexact Newton (Rule 2):\t%6.5f\t(%d iteration)\n' ...
         'Inexact Newton (Rule 3):\t%6.5f\t(%d iteration)\n' ...
         'optimal:\t%6.5f\n'], ...
         history2(i(2)+1,1),i(2), ...
         history3(i(3)+1,1),i(3), ...
         history4(i(4)+1,1),i(4), ...
         history5(i(5)+1,1),i(5), ...
         optimal);

%Create a figure window
figure

%Subplot 1: log10(gradient norm) versus number of iterations
subplot(2,1,1);
plot(0:i(2),log10(history2(1:i(2)+1,2)),'b--*', ...
     0:i(3),log10(history3(1:i(3)+1,2)),'r-.+', ...
     0:i(4),log10(history4(1:i(4)+1,2)),'y-.s', ...
     0:i(5),log10(history5(1:i(5)+1,2)),'g-.x', ...
     'LineWidth',2 ,'MarkerEdgeColor','k','MarkerSize',6);
%Add the horizontal line y = -6 (the log10 of the tolerance 1e-6)
line(0:i(5),-6*ones(i(5)+1,1));
%Add a legend
legend('Exact Newton Method', ...
       'Inexact Newton Method (Rule 1)', ...
       'Inexact Newton Method (Rule 2)', ...
       'Inexact Newton Method (Rule 3)');
%Add a title
title('log10(Euclidean norm of gradient)------Number of iterations')
%Add axis labels
xlabel('Number of iterations')
ylabel('log10(Euclidean norm of gradient)')

%Subplot 2: log10(objective-value residual) versus number of iterations
subplot(2,1,2);
plot(0:i(2),log10(history2(1:i(2)+1,1)-optimal),'b--*', ...
     0:i(3),log10(history3(1:i(3)+1,1)-optimal),'r-.+',...
     0:i(4),log10(history4(1:i(4)+1,1)-optimal),'y-.s',...
     0:i(5),log10(history5(1:i(5)+1,1)-optimal),'g-.x',...
     'LineWidth',2 ,'MarkerEdgeColor','k','MarkerSize',6);
%Add a legend
legend('Exact Newton Method', ...
       'Inexact Newton Method (Rule 1)', ...
       'Inexact Newton Method (Rule 2)', ...
       'Inexact Newton Method (Rule 3)');
%Add a title
title('log10(Residuals of objective value)------Number of iterations')
%Add axis labels
xlabel('Number of iterations')
ylabel('log10(Residuals of objective value)')

function z = Sigmoid(z)
%Sigmoid function
    z = 1./(1 + exp(-z));
end

function object = L(w,m0,A,b)
%Objective (loss) function of the logistic regression problem
    object = m0*(sum(log(1+exp(-b.*(A*w))))+(1/100)*(w'*w));
end

function grad = Gradient(w,m0,A,b)
%Gradient of the objective function
    grad = m0*(-A'*(b.*(1-Sigmoid(b.*(A*w))))+(1/50)*w);
end

function hess = Hessian(w,m0,n,A,b)
%Hessian matrix of the objective function
    sigmoid = Sigmoid(b.*(A*w));
    vector = sigmoid.*(1-sigmoid);
%Strictly this should be b.*sigmoid.*(1-sigmoid).*b,
%but since the entries of b are ±1, multiplying by b on both sides has no effect, so it can be written as sigmoid.*(1-sigmoid)
    repeat_vector = repmat(vector',n,1);
%Repeat the transpose of vector n times along rows and once along columns, building the n-by-m matrix repeat_vector
%Using repeat_vector.*A'*A instead of A'*diag(vector)*A saves memory when m>n,
%and it replaces a matrix product with an element-wise product, reducing the cost
    hess = m0*(repeat_vector.*A'*A + diag((1/50)*ones(n,1)));
end

function StepSize = Backtracking(w,m0,A,b,object,grad,direction)
%Backtracking line search: find a step size such that
%"new objective value <= old objective value + alpha*StepSize*<direction, gradient>"
%alpha is a small positive number (here alpha = 1e-4), direction is the descent direction, and < , > denotes the inner product
%Start from a fairly large step size (here 1) and multiply by beta each trial, with beta in (0,1) (here beta = 0.5);
%the name "backtracking" presumably comes from this repeated shrinking
%Compared with exact line search (which uses the optimal step size along the descent direction), backtracking is much easier to implement
    alpha = 1e-4;
    beta = 0.5;
    StepSize = 1;
    new_w = w + StepSize*direction;
    while L(new_w,m0,A,b) > (object + alpha * StepSize * (direction' * grad))
        StepSize = StepSize * beta;
        new_w = w + StepSize*direction;
    end
end

function [history,i] = ExactNewton(w,m0,n,A,b,epsilon)
%Exact Newton method with gradient-norm tolerance epsilon
    history = zeros(1001,2);
    i = 0;
    object = L(w,m0,A,b);
    history(i+1,1) = object;
    grad = Gradient(w,m0,A,b);
    hess = Hessian(w,m0,n,A,b);
    direction = -hess\grad; %exact Newton direction
    StepSize = Backtracking(w,m0,A,b,object,grad,direction); %step size from backtracking
    w = w + StepSize*direction;
    history(i+1,2) = norm(grad);
    fprintf('%d iteration done!\n',i+1);
    while history(i+1,2) >= epsilon
          i = i + 1;
          object = L(w,m0,A,b);
          history(i+1,1) = object;
          grad = Gradient(w,m0,A,b);
          hess = Hessian(w,m0,n,A,b);
          direction = -hess\grad; %exact Newton direction
          StepSize = Backtracking(w,m0,A,b,object,grad,direction); %step size from backtracking
          w = w + StepSize*direction;
          history(i+1,2) = norm(grad);
          fprintf('%d iteration done!\n',i+1); %printout to help debugging
    end
end

function CG_tol = InexactRule(ng)
%Inexact rules (stopping tolerances) for the conjugate gradient method
    CG_tol = zeros(3,1);
    CG_tol(1) = min(0.5,ng)*ng;
    CG_tol(2) = min(0.5,sqrt(ng))*ng;
    CG_tol(3) = 0.5*ng;
end

function x = CG(A,g,ng,CG_MaxIter,Rule)
%Conjugate gradient method for solving Ax = -g; ng is the 2-norm of g, Rule is the index of the inexact rule to use
%For Newton's method, A = Hessian, g = gradient, ng = norm(gradient,2), and x is the approximate Newton direction
    x = 0;
    CG_tol = InexactRule(ng);
    r = g;
    p = -r;
    for iter = 1:CG_MaxIter
        rr = r' * r;
        Ap = A * p;
        alpha = rr / (p'*Ap);
        x = x + alpha * p;
        r = r + alpha * Ap;
        nrl = norm(r);
        if nrl <= CG_tol(Rule)
           break;
        end
        beta = nrl^2 / rr;
        p = -r + beta * p;
    end
    fprintf('Number of CG iteration:\t%d\n',iter); %printout to help debugging
end

function [history,i] = InexactNewton(w,m0,n,A,b,epsilon,CG_MaxIter,Rule)
%Inexact Newton method with gradient-norm tolerance epsilon; CG_MaxIter is the maximum number of CG iterations per outer iteration
%Rule is the index of the inexact rule to use
    history = zeros(1001,2);
    i = 0;
    object = L(w,m0,A,b);
    history(i+1,1) = object;
    grad = Gradient(w,m0,A,b);
    hess = Hessian(w,m0,n,A,b);
    norm_grad = norm(grad);
    direction = CG(hess,grad,norm_grad,CG_MaxIter,Rule); %approximate Newton direction via CG
    StepSize = Backtracking(w,m0,A,b,object,grad,direction); %step size from backtracking
    w = w + StepSize*direction;
    history(i+1,2) = norm_grad;
    while history(i+1,2) >= epsilon
          i = i + 1;
          object = L(w,m0,A,b);
          history(i+1,1) = object;
          grad = Gradient(w,m0,A,b);
          hess = Hessian(w,m0,n,A,b);
          norm_grad = norm(grad);
          direction = CG(hess,grad,norm_grad,CG_MaxIter,Rule); %approximate Newton direction via CG
          StepSize = Backtracking(w,m0,A,b,object,grad,direction); %step size from backtracking
          w = w + StepSize*direction;
          history(i+1,2) = norm_grad;
    end
end

2. Question 2(b):

Result: for each data set (① a9a.test, ② CINA.test, ③ ijcnn1.test) the script prints the objective value and iteration count of each algorithm at termination and produces four subplots: log10 of the gradient norm and log10 of the objective-value residual versus the number of iterations, shown separately for the Newton methods and for the backtracking gradient method (figures not reproduced here).

Code:

(The difference from the code in (a) lies mainly in the driver and plotting statements at the top; the BacktrackingGradient function is also added.)

%Load the data set
dataset = 'a9a.test';
[b,A] = libsvmread(dataset);

[m,n] = size(A);
m0 = 1/m;
w = zeros(n,1);
i = zeros(6,1);

fprintf('Backtracking Gradient:\n');
[history1,i(1)] = BacktrackingGradient(w,m0,A,b,1000);
%Backtracking line search gradient method without a tolerance, run for 1000 iterations
fprintf('result = %f\twith %d steps\n',history1(i(1)+1,1),i(1));

fprintf('Exact Newton:\n');
[history2,i(2)] = ExactNewton(w,m0,n,A,b,1e-6);
%Exact Newton method with tolerance 1e-6
fprintf('result = %f\twith %d steps\n',history2(i(2)+1,1),i(2));

fprintf('Inexact Newton (Rule 1):\n');
[history3,i(3)] = InexactNewton(w,m0,n,A,b,1e-6,1000,1);
%Inexact Newton method with tolerance 1e-6, at most 1000 CG iterations, inexact rule 1
fprintf('result = %f\twith %d steps\n',history3(i(3)+1,1),i(3));

fprintf('Inexact Newton (Rule 2):\n');
[history4,i(4)] = InexactNewton(w,m0,n,A,b,1e-6,1000,2);
%Inexact Newton method with tolerance 1e-6, at most 1000 CG iterations, inexact rule 2
fprintf('result = %f\twith %d steps\n',history4(i(4)+1,1),i(4));

fprintf('Inexact Newton (Rule 3):\n');
[history5,i(5)] = InexactNewton(w,m0,n,A,b,1e-6,1000,3);
%Inexact Newton method with tolerance 1e-6, at most 1000 CG iterations, inexact rule 3
fprintf('result = %f\twith %d steps\n',history5(i(5)+1,1),i(5));

fprintf('Exact Newton with higher accuracy:\n');
[history6,i(6)] = ExactNewton(w,m0,n,A,b,1e-12);
%Exact Newton method with tolerance 1e-12, used to approximate the optimal value
fprintf('result = %f\twith %d steps\n',history6(i(6)+1,1),i(6));

%Use the result of the higher-accuracy Newton run as an approximation of the optimal value
optimal = history6(i(6)+1,1);
%Print the objective value and iteration count of each algorithm at termination
fprintf(['Backtracking Line Search:\t%f\t(%d iteration)\n'...
         'Exact Newton:\t%f\t(%d iteration)\n' ...
         'Inexact Newton (Rule 1):\t%f\t(%d iteration)\n' ...
         'Inexact Newton (Rule 2):\t%f\t(%d iteration)\n' ...
         'Inexact Newton (Rule 3):\t%f\t(%d iteration)\n' ...
         'optimal:\t%f\n'], ...
         history1(i(1)+1,1),i(1), ...
         history2(i(2)+1,1),i(2), ...
         history3(i(3)+1,1),i(3), ...
         history4(i(4)+1,1),i(4), ...
         history5(i(5)+1,1),i(5), ...
         optimal);

%Create a figure window
figure
%Subplot 1: Newton methods, log10(gradient norm) versus number of iterations
subplot(2,2,1);
plot(0:i(2),log10(history2(1:i(2)+1,2)),'b--*', ...
     0:i(3),log10(history3(1:i(3)+1,2)),'r-.+', ...
     0:i(4),log10(history4(1:i(4)+1,2)),'y-.s', ...
     0:i(5),log10(history5(1:i(5)+1,2)),'g-.x', ...
     'LineWidth',2 ,'MarkerEdgeColor','k','MarkerSize',5);
%Add the horizontal line y = -6 (the log10 of the tolerance 1e-6)
line(0:i(5),-6*ones(i(5)+1,1));
%Add a legend
legend('Exact Newton Method', ...
       'Inexact Newton Method (Rule 1)', ...
       'Inexact Newton Method (Rule 2)', ...
       'Inexact Newton Method (Rule 3)')
%Remove the legend box
legend('boxoff')
%Add a title
title('log10(Euclidean norm of gradient)------Number of iterations')
%Add axis labels
xlabel('Number of iterations')
ylabel('log10(Euclidean norm of gradient)')

%Subplot 2: gradient method (backtracking line search), log10(gradient norm) versus number of iterations
subplot(2,2,2);
plot(0:i(1),log10(history1(1:i(1)+1,2)));
%Add a legend and remove its box
legend('Backtracking Line Search')
legend('boxoff')
%Add a title
title('log10(Euclidean norm of gradient)------Number of iterations')
%Add axis labels
xlabel('Number of iterations')
ylabel('log10(Euclidean norm of gradient)')

%Subplot 3: Newton methods, log10(objective-value residual) versus number of iterations
subplot(2,2,3);
plot(0:i(2),log10(history2(1:i(2)+1,1)-optimal),'b--*', ...
     0:i(3),log10(history3(1:i(3)+1,1)-optimal),'r-.+',...
     0:i(4),log10(history4(1:i(4)+1,1)-optimal),'y-.s',...
     0:i(5),log10(history5(1:i(5)+1,1)-optimal),'g-.x',...
     'LineWidth',2 ,'MarkerEdgeColor','k','MarkerSize',5);
%Add a legend
legend('Exact Newton Method', ...
       'Inexact Newton Method (Rule 1)', ...
       'Inexact Newton Method (Rule 2)', ...
       'Inexact Newton Method (Rule 3)')
%Remove the legend box
legend('boxoff')
%Add a title
title('log10(Residuals of objective value)------Number of iterations')
%Add axis labels
xlabel('Number of iterations')
ylabel('log10(Residuals of objective value)')

%Subplot 4: gradient method (backtracking line search), log10(objective-value residual) versus number of iterations
subplot(2,2,4);
plot(0:i(1),log10(history1(1:i(1)+1,1)-optimal));
%Add a legend and remove its box
legend('Backtracking Line Search')
legend('boxoff')
%Add a title
title('log10(Residuals of objective value)------Number of iterations')
%Add axis labels
xlabel('Number of iterations')
ylabel('log10(Residuals of objective value)')

function z = Sigmoid(z)
%Sigmoid function
    z = 1./(1 + exp(-z));
end

function object = L(w,m0,A,b)
%Objective (loss) function of the logistic regression problem
    object = m0*(sum(log(1+exp(-b.*(A*w))))+(1/100)*(w'*w));
end

function grad = Gradient(w,m0,A,b)
%Gradient of the objective function
    grad = m0*(-A'*(b.*(1-Sigmoid(b.*(A*w))))+(1/50)*w);
end

function hess = Hessian(w,m0,n,A,b)
%Hessian matrix of the objective function
    sigmoid = Sigmoid(b.*(A*w));
    vector = sigmoid.*(1-sigmoid);
%Strictly this should be b.*sigmoid.*(1-sigmoid).*b,
%but since the entries of b are ±1, multiplying by b on both sides has no effect, so it can be written as sigmoid.*(1-sigmoid)
    repeat_vector = repmat(vector',n,1);
%Repeat the transpose of vector n times along rows and once along columns, building the n-by-m matrix repeat_vector
%Using repeat_vector.*A'*A instead of A'*diag(vector)*A saves memory when m>n,
%and it replaces a matrix product with an element-wise product, reducing the cost
    hess = m0*(repeat_vector.*A'*A + diag((1/50)*ones(n,1)));
end

function StepSize = Backtracking(w,m0,A,b,object,grad,direction)
%Backtracking line search: find a step size such that
%"new objective value <= old objective value + alpha*StepSize*<direction, gradient>"
%alpha is a small positive number (here alpha = 1e-4), direction is the descent direction, and < , > denotes the inner product
%Start from a fairly large step size (here 1) and multiply by beta each trial, with beta in (0,1) (here beta = 0.5);
%the name "backtracking" presumably comes from this repeated shrinking
%Compared with exact line search (which uses the optimal step size along the descent direction), backtracking is much easier to implement
    alpha = 1e-4;
    beta = 0.5;
    StepSize = 1;
    new_w = w + StepSize*direction;
    while L(new_w,m0,A,b) > (object + alpha * StepSize * (direction' * grad))
        StepSize = StepSize * beta;
        new_w = w + StepSize*direction;
    end
end

function [history,i] = BacktrackingGradient(w,m0,A,b,Iteration)
%Gradient method with backtracking line search, run for Iteration iterations
    history = zeros(Iteration+1,2);
    for i = 1:Iteration
        object = L(w,m0,A,b);
        history(i,1) = object;
        grad = Gradient(w,m0,A,b);
        direction = -grad;
        StepSize = Backtracking(w,m0,A,b,object,grad,direction); %step size from backtracking
        w = w + StepSize*direction; %update w
        history(i,2) = norm(grad);
        if mod(i,100)==0
           fprintf('Number of Iteration:\t%d\n',i); %printout to help debugging
        end
    end
    history(i+1,1) = L(w,m0,A,b); %after i iterations, i+1 objective values end up in column 1 of history
    grad = Gradient(w,m0,A,b);
    history(i+1,2) = norm(grad); %and i+1 gradient norms in column 2
end

function [history,i] = ExactNewton(w,m0,n,A,b,epsilon)
%Exact Newton method with gradient-norm tolerance epsilon
    history = zeros(1001,2);
    i = 0;
    object = L(w,m0,A,b);
    history(i+1,1) = object;
    grad = Gradient(w,m0,A,b);
    hess = Hessian(w,m0,n,A,b);
    direction = -hess\grad; %exact Newton direction
    StepSize = Backtracking(w,m0,A,b,object,grad,direction); %step size from backtracking
    w = w + StepSize*direction;
    history(i+1,2) = norm(grad);
    fprintf('%d iteration done!\n',i+1);
    while history(i+1,2) >= epsilon
          i = i + 1;
          object = L(w,m0,A,b);
          history(i+1,1) = object;
          grad = Gradient(w,m0,A,b);
          hess = Hessian(w,m0,n,A,b);
          direction = -hess\grad; %exact Newton direction
          StepSize = Backtracking(w,m0,A,b,object,grad,direction); %step size from backtracking
          w = w + StepSize*direction;
          history(i+1,2) = norm(grad);
          fprintf('%d iteration done!\n',i+1); %printout to help debugging
    end
end

function CG_tol = InexactRule(ng)
%Inexact rules (stopping tolerances) for the conjugate gradient method
    CG_tol = zeros(3,1);
    CG_tol(1) = min(0.5,ng)*ng;
    CG_tol(2) = min(0.5,sqrt(ng))*ng;
    CG_tol(3) = 0.5*ng;
end

function x = CG(A,g,ng,CG_MaxIter,Rule)
%Conjugate gradient method for solving Ax = -g; ng is the 2-norm of g, Rule is the index of the inexact rule to use
%For Newton's method, A = Hessian, g = gradient, ng = norm(gradient,2), and x is the approximate Newton direction
    x = 0;
    CG_tol = InexactRule(ng);
    r = g;
    p = -r;
    for iter = 1:CG_MaxIter
        rr = r' * r;
        Ap = A * p;
        alpha = rr / (p'*Ap);
        x = x + alpha * p;
        r = r + alpha * Ap;
        nrl = norm(r);
        if nrl <= CG_tol(Rule)
           break;
        end
        beta = nrl^2 / rr;
        p = -r + beta * p;
    end
    fprintf('Number of CG iteration:\t%d\n',iter); %printout to help debugging
end

function [history,i] = InexactNewton(w,m0,n,A,b,epsilon,CG_MaxIter,Rule)
%Inexact Newton method with gradient-norm tolerance epsilon; CG_MaxIter is the maximum number of CG iterations per outer iteration
%Rule is the index of the inexact rule to use
    history = zeros(1001,2);
    i = 0;
    object = L(w,m0,A,b);
    history(i+1,1) = object;
    grad = Gradient(w,m0,A,b);
    hess = Hessian(w,m0,n,A,b);
    norm_grad = norm(grad);
    direction = CG(hess,grad,norm_grad,CG_MaxIter,Rule); %approximate Newton direction via CG
    StepSize = Backtracking(w,m0,A,b,object,grad,direction); %step size from backtracking
    w = w + StepSize*direction;
    history(i+1,2) = norm_grad;
    while history(i+1,2) >= epsilon
          i = i + 1;
          object = L(w,m0,A,b);
          history(i+1,1) = object;
          grad = Gradient(w,m0,A,b);
          hess = Hessian(w,m0,n,A,b);
          norm_grad = norm(grad);
          direction = CG(hess,grad,norm_grad,CG_MaxIter,Rule); %approximate Newton direction via CG
          StepSize = Backtracking(w,m0,A,b,object,grad,direction); %step size from backtracking
          w = w + StepSize*direction;
          history(i+1,2) = norm_grad;
    end
end

Thanks for reading!


Origin blog.csdn.net/ycy1300585044/article/details/130828097