Consider the following Logistic Regression Problem:
where are given data
The meaning here is the label vector
Attached are a9a.test, CINA.test and ijcnn1.test data sets, and libsvmread.mexw64 file for reading data sets
If you don't want to download from CSDN (because it sucks), you can also download it through Baidu Netdisk:
1. Mathematical form and its implementation in Matlab
1. The mathematical representation of the Logistic Regression loss function and its gradient:
Matlab implementation of the loss function and its gradient:
function z = Sigmoid(z)
%Sigmoid函数
z = 1./(1 + exp(-z));
end
function object = L(w,m0,A,b)
%逻辑回归(Logistic Regression)问题的目标函数(即损失函数)
object = m0*(sum(log(1+exp(-b.*(A'*w))))+(1/100)*(w'*w));
end
function grad = Gradient(w,m0,A,b)
%目标函数的梯度
grad = m0*(-A*(b.*(1-Sigmoid(b.*(A'*w))))+(1/50)*w);
end
2. The mathematical representation of the Hessian matrix of the loss function:
Matlab implementation of the Hessian matrix of the loss function:
function hess = Hessian(w,m0,n,A,b)
%目标函数的海塞矩阵
sigmoid = Sigmoid(b.*(A*w));
vector = sigmoid.*(1-sigmoid);
%实际上应是 b.*sigmoid.*(1-sigmoid).*b,其中b代表的是逻辑回归问题的标签向量
%由于 b 的元素只能是±1,故左右点乘 b 是不发生任何影响,故可写作 sigmoid.*(1-sigmoid)
repeat_vector = repmat(vector',n,1);
%在行维度和列维度上分别重复 vector 的转置 n 次和 1 次,构造 nxm 的矩阵 repeat_vector
%用 repeat_vector.*A'*A 代替 A'*diag(vector)*A,如果 m>n 的话可以节省空间
%而且将一次矩阵乘法变为矩阵点乘,时间复杂度也降低
hess = m0*(repeat_vector.*A'*A + diag((1/50)*ones(n,1)));
end
3. The mathematical form of the backtracking method:
(Note: It represents the step size, which represents the direction of descent, k is the number of iterations of the outer algorithm, and s is equivalent to the number of iterations of the backtracking method)
For the gradient method:
For Newton's method:
α, β and are the parameters of the backtracking method, α is a small positive number, β∈(0,1), is the initial step size
The backtracking method starts with a large step size and multiplies each iteration by β, that is, in the sth iteration , until the smallest non-negative integer s is found such that:
Compared with the exact method to determine the step size: , the backtracking method to determine the step size is easier to implement
Matlab implementation of backtracking method to determine step size:
function StepSize = Backtracking(w,m0,A,b,object,grad,direction)
%回溯法求步长,即找到一个步长使得“新的目标函数值≤原目标函数值+α*步长*<下降方向,梯度>"
%α为一个小的正数(这里取α=1e-4),direction是下降方向,< , >代表向量内积
%找法是从一个较大的步长开始(这里从1开始),每次迭代乘以β,β∈(0,1)(这里取β=0.5),相当于不断往前摸索
%所谓“回溯”可能就是来源于此,相较于精确法确定步长(即解出最优步长),回溯法确定步长更容易实现
%精确线搜索(Exact line search)中使用的步长是下降方向上最优的步长
alpha = 1e-4;
beta = 0.5;
StepSize = 1;
new_w = w + StepSize*direction;
while L(new_w,m0,A,b) > (object + alpha * StepSize * (direction' * grad))
StepSize = StepSize * beta;
new_w = w + StepSize*direction;
end
end
4. The difference between Exact Newton and Inexact Newton
The descending direction of Newton's method is , according to its calculation method, it can be divided into accurate method and inaccurate method;
- The exact method is obtained by calculating the inverse of the Hessian matrix , multiplying it with the gradient , and finally taking the negative ;
- The non-exact method uses the conjugate gradient method (Conjugate Gradient, CG) to calculate the direction of descent to avoid the computational complexity of matrix inversion
The approximate solution of the linear system can be obtained by the conjugate gradient method (different from the analytical solution )
Let , , , the solution of the linear system can be obtained
Remember the residual (residual)
For convex problems, the termination condition of the conjugate gradient method iteration is:
There are different methods, called inexact rules, and the three commonly used inexact rules are:
Obviously, from left to right, the convergence speed of the inexact Newton method corresponding to the three inexact rules decreases successively
The code for calculating the direction of descent by the exact Newton method:
direction = - hess\grad;
The code for the conjugate gradient method used by the inexact method:
① Non-exact rules:
function CG_tol = InexactRule(ng)
%共轭梯度法的inexact rule
CG_tol = zeros(3,1);
CG_tol(1) = min(0.5,ng)*ng;
CG_tol(2) = min(0.5,sqrt(ng))*ng;
CG_tol(3) = 0.5*ng;
end
②CG:
function x = CG(A,g,ng,CG_MaxIter,Rule)
%共轭梯度法,解方程Ax=-g,ng为g的二范数,Rule为使用的inexact rule的编号
%对于牛顿法,A=Hessian,g=gradient,ng=norm(gradient,2),x为近似牛顿方向
x = 0;
CG_tol = InexactRule(ng);
%残差(residual)=Ax+g,不妨设下降方向的初值是零向量,则残差的初值是g
r = g;
p = -r;
for iter = 1:CG_MaxIter
rr = r' * r;
Ap = A * p;
alpha = rr / (p'*Ap);
x = x + alpha * p;
r = r + alpha * Ap;
nrl = norm(r);
if nrl <= CG_tol(Rule)
break;
end
beta = nrl^2 / rr;
p = -r + beta * p;
end
fprintf('Number of CG iteration:\t%d\n',iter); %设置输出语句方便调试
end
2. Question 1:
result:
code:
id = mod(21307140051,pow2(32));
%对于rng()来说学号太大,故可取模2^32的余数,来设置种子
rng(id,'twister');
%随机数模式为twister,用rand('seed',id)设置种子似乎并不能使随机数结果复现
m = 500;
m0 = 1/m;%优化,用于减少1/m运算的次数
n = 1000;
w = zeros(n,1);%自变量
A = randn(n,m);%参量
b = sign(rand(m,1)-0.5);
i = zeros(3,1);%i用于记录三次梯度法的迭代次数
fprintf("Constant Stepsize(0.02) Gradient Method:\n");
[history1,i(1)] = ConstantStepSizeGradient(w,m0,A,b,0.02,3000,1e-4);
%执行固定步长为0.02的梯度法,最多迭代3000次
fprintf("result = %f\twith %d steps\n",history1(i(1)+1,1),i(1));
fprintf("Constant Stepsize(0.04) Gradient Method:\n");
[history2,i(2)] = ConstantStepSizeGradient(w,m0,A,b,0.04,3000,1e-4);
%执行固定步长为0.04的梯度法,最多迭代3000次
fprintf("result = %f\twith %d steps\n",history2(i(2)+1,1),i(2));
fprintf("Backtracking Gradient Method:\n");
[history3,i(3)] = BacktrackingGradient(w,m0,A,b,3000,1e-4);
%执行回溯线搜索,最多迭代3000次,终止条件为梯度的二范数小于1e-4
fprintf("result = %f\twith %d steps\n",history3(i(3)+1,1),i(3));
%打印三个算法终止时的目标函数值以及迭代次数
fprintf(['Constant step size 0.02:\t%f\t(%d iteration)\n' ...
'Constant step size 0.04:\t%f\t(%d iteration)\n' ...
'Backtracking line search(t_hat = 1,alpha = 0.001,beta = 0.5):\t' ...
'%f\t(%d iteration)\n'], ...
history1(i(1)+1,1),i(1), ...
history2(i(2)+1,1),i(2), ...
history3(i(3)+1,1),i(3));
%创建绘图窗口
figure
%绘制“log10(梯度范数)---迭代次数”的图像
plot(0:i(1),log10(history1(1:i(1)+1,2)),'-.r', ...%设置了线型和颜色分别为
0:i(2),log10(history2(1:i(2)+1,2)),'--b', ...%"实线绿色"、"虚线蓝色"和"点线红色"
0:i(3),log10(history3(1:i(3)+1,2)),'-g','LineWidth',2)%设置了线宽为2磅(默认值的4倍)
%添加图例
legend('constant step size gradient method with alpha = 0.02', ...
'constant step size gradient method with alpha = 0.04', ...
'backtracking line search gradient method')
%添加标题
title('log10(Euclidean norm of gradient)------Number of iterations')
%添加坐标轴标签
xlabel('Number of iterations')
ylabel('log10(Euclidean norm of gradient)')
function z = Sigmoid(z)
%Sigmoid函数
z = 1./(1 + exp(-z));
end
function object = L(w,m0,A,b)
%逻辑回归(Logistic Regression)问题的目标函数
object = m0*(sum(log(1+exp(-b.*(A'*w))))+(1/100)*(w'*w));
end
function grad = Gradient(w,m0,A,b)
%目标函数的梯度
grad = m0*(-A*(b.*(1-Sigmoid(b.*(A'*w))))+(1/50)*w);
end
function [history,i] = ConstantStepSizeGradient(w,m0,A,b,StepSize,Iteration,epsilon)
%固定步长为StepSize的梯度法,最多迭代Iteration次,终止条件为梯度的二范数小于epsilon
history = zeros(Iteration+1,2);
for i = 1:Iteration
object = L(w,m0,A,b);
history(i,1) = object;
grad = Gradient(w,m0,A,b);
w = w - StepSize*grad; %update w, 固定步长
history(i,2) = norm(grad);
if history(i,2) < epsilon %Termination Criteria
break
end
if mod(i,500)==0 %设置输出语句方便调试
fprintf('Number of Iteration:\t%d\n',i);
end
end
history(i+1,1) = L(w,m0,A,b); %若总共迭代i次,则会将i+1个目标函数值写入history的第一列
grad = Gradient(w,m0,A,b);
history(i+1,2) = norm(grad); %将i+1个梯度范数值写入history的第二列
end
function StepSize = Backtracking(w,m0,A,b,object,grad,direction)
%回溯法求步长,即找到一个步长使得“新的目标函数值≤原目标函数值+α*步长*<下降方向,梯度>"
%α为一个小的正数(这里取α=1e-4),direction是下降方向,< , >代表向量内积
%找法是从一个较大的步长开始(这里从1开始),每次迭代乘以β,β∈(0,1)(这里取β=0.5),相当于不断往前摸索
%所谓“回溯”可能就是来源于此,相较于精确法确定步长(即解出最优步长),回溯法确定步长更容易实现
%精确线搜索(Exact line search)中使用的步长是下降方向上最优的步长
alpha = 1e-4;
beta = 0.5;
StepSize = 1;
new_w = w + StepSize*direction;
while L(new_w,m0,A,b) > (object + alpha * StepSize * (direction' * grad))
StepSize = StepSize * beta;
new_w = w + StepSize*direction;
end
end
function [history,i] = BacktrackingGradient(w,m0,A,b,Iteration,epsilon)
%回溯线搜索,最多迭代Iteration次,终止条件为梯度的二范数小于epsilon
history = zeros(Iteration+1,2);
for i = 1:Iteration
object = L(w,m0,A,b);
history(i,1) = object;
grad = Gradient(w,m0,A,b);
direction = -grad; %下降方向
StepSize = Backtracking(w,m0,A,b,object,grad,direction); %回溯法确定步长
w = w + StepSize*direction; %update w
history(i,2) = norm(grad);
if history(i,2) < epsilon %终止条件(Termination Criteria)
break
end
if mod(i,500)==0 %设置输出语句方便调试
fprintf('Number of Iteration:\t%d\n',i);
end
end
history(i+1,1) = L(w,m0,A,b); %若总共迭代i次,则会将i+1个目标函数值写入history的第一列
grad = Gradient(w,m0,A,b);
history(i+1,2) = norm(grad); %将i+1个梯度范数值写入history的第二列
end
3. Question 2:
Matlab statement to load the a9a.test dataset:
dataset = 'a9a.test';
[b,A] = libsvmread(dataset);
To load the CINA.test and ijcnn1.test datasets, you only need to modify the first statement accordingly;
1. The second question (a):
result:
①a9a.test:
②CINA.test:
③ijcnn1.test:
code:
%读取数据集
dataset = 'a9a.test';
[b,A] = libsvmread(dataset);
[m,n] = size(A);
m0 = 1/m;
w = zeros(n,1);
i = zeros(6,1);
%为保持与第二题(b)的对齐,编号为1的history和i保留
fprintf('Exact Newton:\n');
[history2,i(2)] = ExactNewton(w,m0,n,A,b,1e-6);
%精度为1e-6的精确牛顿法
fprintf('result = %f\twith %d steps\n',history2(i(2)+1,1),i(2));
fprintf('Inexact Newton (Rule 1):\n');
[history3,i(3)] = InexactNewton(w,m0,n,A,b,1e-6,1000,1);
%精度为1e-6、CG最大迭代次数为1000、使用第1条规则的非精确牛顿法
fprintf('result = %f\twith %d steps\n',history3(i(3)+1,1),i(3));
fprintf('Inexact Newton (Rule 2):\n');
[history4,i(4)] = InexactNewton(w,m0,n,A,b,1e-6,1000,2);
%精度为1e-6、CG最大迭代次数为1000、使用第2条规则的非精确牛顿法
fprintf('result = %f\twith %d steps\n',history4(i(4)+1,1),i(4));
fprintf('Inexact Newton (Rule 3):\n');
[history5,i(5)] = InexactNewton(w,m0,n,A,b,1e-6,1000,3);
%精度为1e-6、CG最大迭代次数为1000、使用第3条规则的非精确牛顿法
fprintf('result = %f\twith %d steps\n',history5(i(5)+1,1),i(5));
fprintf('Exact Newton with higher accuracy:\n');
[history6,i(6)] = ExactNewton(w,m0,n,A,b,1e-12);
%精度为1e-12的精确牛顿法,以此近似最优值
fprintf('result = %f\twith %d steps\n',history6(i(6)+1,1),i(6));
%以更高精度的牛顿法的结果作为最优值的近似值
optimal = history6(i(6)+1,1);
%打印算法终止时的目标函数值以及迭代次数
fprintf(['Exact Newton:\t%6.5f\t(%d iteration)\n' ...
'Inexact Newton (Rule 1):\t%6.5f\t(%d iteration)\n' ...
'Inexact Newton (Rule 2):\t%6.5f\t(%d iteration)\n' ...
'Inexact Newton (Rule 3):\t%6.5f\t(%d iteration)\n' ...
'optimal:\t%6.5f\n'], ...
history2(i(2)+1,1),i(2), ...
history3(i(3)+1,1),i(3), ...
history4(i(4)+1,1),i(4), ...
history5(i(5)+1,1),i(5), ...
optimal);
%创建绘图窗口
figure
%子图1,绘制“log10(梯度范数)-迭代数”图
subplot(2,1,1);
plot(0:i(2),log10(history2(1:i(2)+1,2)),'b--*', ...
0:i(3),log10(history3(1:i(3)+1,2)),'r-.+', ...
0:i(4),log10(history4(1:i(4)+1,2)),'y-.s', ...
0:i(5),log10(history5(1:i(5)+1,2)),'g-.x', ...
'LineWidth',2 ,'MarkerEdgeColor','k','MarkerSize',6);
%添加直线y=-6
line(0:i(5),-6*ones(i(5)+1,1));
%添加图例
legend('Exact Newton Method', ...
'Inexact Newton Method (Rule 1)', ...
'Inexact Newton Method (Rule 2)', ...
'Inexact Newton Method (Rule 3)');
%添加标题
title('log10(Euclidean norm of gradient)------Number of iterations')
%添加坐标轴标签
xlabel('Number of iterations')
ylabel('log10(Euclidean norm of gradient)')
%子图2,绘制log10(残差)-迭代数图
subplot(2,1,2);
plot(0:i(2),log10(history2(1:i(2)+1,1)-optimal),'b--*', ...
0:i(3),log10(history3(1:i(3)+1,1)-optimal),'r-.+',...
0:i(4),log10(history4(1:i(4)+1,1)-optimal),'y-.s',...
0:i(5),log10(history5(1:i(5)+1,1)-optimal),'g-.x',...
'LineWidth',2 ,'MarkerEdgeColor','k','MarkerSize',6);
%添加图例
legend('Exact Newton Method', ...
'Inexact Newton Method (Rule 1)', ...
'Inexact Newton Method (Rule 2)', ...
'Inexact Newton Method (Rule 3)');
%添加标题
title('log10(Residuals of objective value)------Number of iterations')
%添加坐标轴标签
xlabel('Number of iterations')%添加坐标轴标签
ylabel('log10(Residuals of objective value)')
function z = Sigmoid(z)
%Sigmoid函数
z = 1./(1 + exp(-z));
end
function object = L(w,m0,A,b)
%逻辑回归(Logistic Regression)问题的目标函数
object = m0*(sum(log(1+exp(-b.*(A*w))))+(1/100)*(w'*w));
end
function grad = Gradient(w,m0,A,b)
%目标函数的梯度
grad = m0*(-A'*(b.*(1-Sigmoid(b.*(A*w))))+(1/50)*w);
end
function hess = Hessian(w,m0,n,A,b)
%目标函数的海塞矩阵
sigmoid = Sigmoid(b.*(A*w));
vector = sigmoid.*(1-sigmoid);
%实际上应是b.*sigmoid.*(1-sigmoid).*b
%但由于b的元素只能是±1,故左右点乘b是不发生任何影响,故可写作sigmoid.*(1-sigmoid)
repeat_vector = repmat(vector',n,1);
%在行维度和列维度上分别重复vector的转置n次和1次,构造nxm的矩阵repeat_vector
%用repeat_vector.*A'*A代替A'*diag(vector)*A,如果m>n的话可以节省空间
%而且将一次矩阵乘法变为矩阵点乘,时间复杂度也降低
hess = m0*(repeat_vector.*A'*A + diag((1/50)*ones(n,1)));
end
function StepSize = Backtracking(w,m0,A,b,object,grad,direction)
%回溯法求步长,即找到一个步长使得“新的目标函数值≤原目标函数值+α*步长*<下降方向,梯度>"
%α为一个小的正数(这里取α=1e-4),direction是下降方向,< , >代表向量内积
%找法是从一个较大的步长开始(这里从1开始),每次迭代乘以β,β∈(0,1)(这里取β=0.5),相当于不断往前摸索
%所谓“回溯”可能就是来源于此,相较于精确法确定步长(即解出最优步长),回溯法确定步长更容易实现
%精确线搜索(Exact line search)中使用的步长是下降方向上最优的步长
alpha = 1e-4;
beta = 0.5;
StepSize = 1;
new_w = w + StepSize*direction;
while L(new_w,m0,A,b) > (object + alpha * StepSize * (direction' * grad))
StepSize = StepSize * beta;
new_w = w + StepSize*direction;
end
end
function [history,i] = ExactNewton(w,m0,n,A,b,epsilon)
%精确牛顿法,精度要求为epsilon
history = zeros(1001,2);
i = 0;
object = L(w,m0,A,b);
history(i+1,1) = object;
grad = Gradient(w,m0,A,b);
hess = Hessian(w,m0,n,A,b);
direction = -hess\grad; %精确下降方向
StepSize = Backtracking(w,m0,A,b,object,grad,direction); %回溯法确定步长
w = w + StepSize*direction;
history(i+1,2) = norm(grad);
fprintf('%d iteration done!\n',i+1);
while history(i+1,2) >= epsilon
i = i + 1;
object = L(w,m0,A,b);
history(i+1,1) = object;
grad = Gradient(w,m0,A,b);
hess = Hessian(w,m0,n,A,b);
direction = -hess\grad; %精确下降方向
StepSize = Backtracking(w,m0,A,b,object,grad,direction); %回溯法确定步长
w = w + StepSize*direction;
history(i+1,2) = norm(grad);
fprintf('%d iteration done!\n',i+1); %设置输出语句方便调试
end
end
function CG_tol = InexactRule(ng)
%共轭梯度法的inexact rule
CG_tol = zeros(3,1);
CG_tol(1) = min(0.5,ng)*ng;
CG_tol(2) = min(0.5,sqrt(ng))*ng;
CG_tol(3) = 0.5*ng;
end
function x = CG(A,g,ng,CG_MaxIter,Rule)
%共轭梯度法,解方程Ax=-g,ng为g的二范数,Rule为使用的inexact rule的编号
%对于牛顿法,A=Hessian,g=gradient,ng=norm(gradient,2),x为近似牛顿方向
x = 0;
CG_tol = InexactRule(ng);
r = g;
p = -r;
for iter = 1:CG_MaxIter
rr = r' * r;
Ap = A * p;
alpha = rr / (p'*Ap);
x = x + alpha * p;
r = r + alpha * Ap;
nrl = norm(r);
if nrl <= CG_tol(Rule)
break;
end
beta = nrl^2 / rr;
p = -r + beta * p;
end
fprintf('Number of CG iteration:\t%d\n',iter); %设置输出语句方便调试
end
function [history,i] = InexactNewton(w,m0,n,A,b,epsilon,CG_MaxIter,Rule)
%非精确牛顿法,精度要求为epsilon,每次迭代中CG的最大迭代次数为CG_MaxIter
%Rule为使用的inexact rule的编号
history = zeros(1001,2);
i = 0;
object = L(w,m0,A,b);
history(i+1,1) = object;
grad = Gradient(w,m0,A,b);
hess = Hessian(w,m0,n,A,b);
norm_grad = norm(grad);
direction = CG(hess,grad,norm_grad,CG_MaxIter,Rule); %CG法计算近似下降方向
StepSize = Backtracking(w,m0,A,b,object,grad,direction); %回溯法确定步长
w = w + StepSize*direction;
history(i+1,2) = norm_grad;
while history(i+1,2) >= epsilon
i = i + 1;
object = L(w,m0,A,b);
history(i+1,1) = object;
grad = Gradient(w,m0,A,b);
hess = Hessian(w,m0,n,A,b);
norm_grad = norm(grad);
direction = CG(hess,grad,norm_grad,CG_MaxIter,Rule); %CG法计算近似下降方向
StepSize = Backtracking(w,m0,A,b,object,grad,direction); %回溯法确定步长
w = w + StepSize*direction;
history(i+1,2) = norm_grad;
end
end
2. Question 2(b):
result:
①a9a.test:
②CINA.test:
③ijcnn1.test
code:
(The difference from the code in (a) is mainly the previous input and drawing instructions, and the BacktrackingGradient function is added)
%读取数据集
dataset = 'a9a.test';
[b,A] = libsvmread(dataset);
[m,n] = size(A);
m0 = 1/m;
w = zeros(n,1);
i = zeros(6,1);
fprintf('Backtracking Gradient:\n');
[history1,i(1)] = BacktrackingGradient(w,m0,A,b,1000);
%未设置精度要求的回溯线搜索,迭代1000次
fprintf('result = %f\twith %d steps\n',history1(i(1)+1,1),i(1));
fprintf('Exact Newton:\n');
[history2,i(2)] = ExactNewton(w,m0,n,A,b,1e-6);
%精度为1e-6的精确牛顿法
fprintf('result = %f\twith %d steps\n',history2(i(2)+1,1),i(2));
fprintf('Inexact Newton (Rule 1):\n');
[history3,i(3)] = InexactNewton(w,m0,n,A,b,1e-6,1000,1);
%精度为1e-6、CG最大迭代次数为1000、使用第1条规则的非精确牛顿法
fprintf('result = %f\twith %d steps\n',history3(i(3)+1,1),i(3));
fprintf('Inexact Newton (Rule 2):\n');
[history4,i(4)] = InexactNewton(w,m0,n,A,b,1e-6,1000,2);
%精度为1e-6、CG最大迭代次数为1000、使用第2条规则的非精确牛顿法
fprintf('result = %f\twith %d steps\n',history4(i(4)+1,1),i(4));
fprintf('Inexact Newton (Rule 3):\n');
[history5,i(5)] = InexactNewton(w,m0,n,A,b,1e-6,1000,3);
%精度为1e-6、CG最大迭代次数为1000、使用第3条规则的非精确牛顿法
fprintf('result = %f\twith %d steps\n',history5(i(5)+1,1),i(5));
fprintf('Exact Newton with higher accuracy:\n');
[history6,i(6)] = ExactNewton(w,m0,n,A,b,1e-12);
%精度为1e-12的精确牛顿法,以此近似最优值
fprintf('result = %f\twith %d steps\n',history6(i(6)+1,1),i(6));
%以更高精度的牛顿法的结果作为最优值的近似值
optimal = history6(i(6)+1,1);
%打印算法终止时的目标函数值以及迭代次数
fprintf(['Backtracking Line Search:\t%f\t(%d iteration)\n'...
'Exact Newton:\t%f\t(%d iteration)\n' ...
'Inexact Newton (Rule 1):\t%f\t(%d iteration)\n' ...
'Inexact Newton (Rule 2):\t%f\t(%d iteration)\n' ...
'Inexact Newton (Rule 3):\t%f\t(%d iteration)\n' ...
'optimal:\t%f\n'], ...
history1(i(1)+1,1),i(1), ...
history2(i(2)+1,1),i(2), ...
history3(i(3)+1,1),i(3), ...
history4(i(4)+1,1),i(4), ...
history5(i(5)+1,1),i(5), ...
optimal);
%创建绘图窗口
figure
%子图1,绘制牛顿法“log10(梯度范数)-迭代数”图
subplot(2,2,1);
plot(0:i(2),log10(history2(1:i(2)+1,2)),'b--*', ...
0:i(3),log10(history3(1:i(3)+1,2)),'r-.+', ...
0:i(4),log10(history4(1:i(4)+1,2)),'y-.s', ...
0:i(5),log10(history5(1:i(5)+1,2)),'g-.x', ...
'LineWidth',2 ,'MarkerEdgeColor','k','MarkerSize',5);
%添加直线y=-6
line(0:i(5),-6*ones(i(5)+1,1));
%添加图例
legend('Exact Newton Method', ...
'Inexact Newton Method (Rule 1)', ...
'Inexact Newton Method (Rule 2)', ...
'Inexact Newton Method (Rule 3)')
%去掉图例的边线
legend('boxoff')
%添加标题
title('log10(Euclidean norm of gradient)------Number of iterations')
%添加坐标轴标签
xlabel('Number of iterations')
ylabel('log10(Euclidean norm of gradient)')
%子图2,绘制梯度法(回溯线搜索)“log10(梯度范数)-迭代数”图
subplot(2,2,2);
plot(0:i(1),log10(history1(1:i(1)+1,2)));
%添加图例,并去掉图例边线
legend('Backtracking Line Search')
legend('boxoff')
%添加标题
title('log10(Euclidean norm of gradient)------Number of iterations')
%添加坐标轴标签
xlabel('Number of iterations')
ylabel('log10(Euclidean norm of gradient)')
%子图3,绘制牛顿法“log10(目标函数与最优值残差)-迭代数”图
subplot(2,2,3);
plot(0:i(2),log10(history2(1:i(2)+1,1)-optimal),'b--*', ...
0:i(3),log10(history3(1:i(3)+1,1)-optimal),'r-.+',...
0:i(4),log10(history4(1:i(4)+1,1)-optimal),'y-.s',...
0:i(5),log10(history5(1:i(5)+1,1)-optimal),'g-.x',...
'LineWidth',2 ,'MarkerEdgeColor','k','MarkerSize',5);
%添加图例
legend('Exact Newton Method', ...
'Inexact Newton Method (Rule 1)', ...
'Inexact Newton Method (Rule 2)', ...
'Inexact Newton Method (Rule 3)')
%去掉图例的边线
legend('boxoff')
%添加标题
title('log10(Residuals of objective value)------Number of iterations')
%添加坐标轴标签
xlabel('Number of iterations')
ylabel('log10(Residuals of objective value)')
%子图4,绘制梯度法(回溯线搜索)“log10(目标函数与最优值残差)-迭代数”图
subplot(2,2,4);
plot(0:i(1),log(history1(1:i(1)+1,1)-optimal));
%添加图例,并去掉图例边线
legend('Backtracking Line Search')
legend('boxoff')
%添加标题
title('log10(Residuals of objective value)------Number of iterations')
%添加坐标轴标签
xlabel('Number of iterations')
ylabel('log10(Residuals of objective value)')
function z = Sigmoid(z)
%Sigmoid函数
z = 1./(1 + exp(-z));
end
function object = L(w,m0,A,b)
%逻辑回归(Logistic Regression)问题的目标函数
object = m0*(sum(log(1+exp(-b.*(A*w))))+(1/100)*(w'*w));
end
function grad = Gradient(w,m0,A,b)
%目标函数的梯度
grad = m0*(-A'*(b.*(1-Sigmoid(b.*(A*w))))+(1/50)*w);
end
function hess = Hessian(w,m0,n,A,b)
%目标函数的海塞矩阵
sigmoid = Sigmoid(b.*(A*w));
vector = sigmoid.*(1-sigmoid);
%实际上应是b.*sigmoid.*(1-sigmoid).*b
%但由于b的元素只能是±1,故左右点乘b是不发生任何影响,故可写作sigmoid.*(1-sigmoid)
repeat_vector = repmat(vector',n,1);
%在行维度和列维度上分别重复vector的转置n次和1次,构造nxm的矩阵repeat_vector
%用repeat_vector.*A'*A代替A'*diag(vector)*A,如果m>n的话可以节省空间
%而且将一次矩阵乘法变为矩阵点乘,时间复杂度也降低
hess = m0*(repeat_vector.*A'*A + diag((1/50)*ones(n,1)));
end
function StepSize = Backtracking(w,m0,A,b,object,grad,direction)
%回溯法求步长,即找到一个步长使得“新的目标函数值≤原目标函数值+α*步长*<下降方向,梯度>"
%α为一个小的正数(这里取α=1e-4),direction是下降方向,< , >代表向量内积
%找法是从一个较大的步长开始(这里从1开始),每次迭代乘以β,β∈(0,1)(这里取β=0.5),相当于不断往前摸索
%所谓“回溯”可能就是来源于此,相较于精确法确定步长(即解出最优步长),回溯法确定步长更容易实现
%精确线搜索(Exact line search)中使用的步长是下降方向上最优的步长
alpha = 1e-4;
beta = 0.5;
StepSize = 1;
new_w = w + StepSize*direction;
while L(new_w,m0,A,b) > (object + alpha * StepSize * (direction' * grad))
StepSize = StepSize * beta;
new_w = w + StepSize*direction;
end
end
function [history,i] = BacktrackingGradient(w,m0,A,b,Iteration)
%回溯线搜索,迭代Iteration次
history = zeros(Iteration+1,2);
for i = 1:Iteration
object = L(w,m0,A,b);
history(i,1) = object;
grad = Gradient(w,m0,A,b);
direction = -grad;
StepSize = Backtracking(w,m0,A,b,object,grad,direction); %回溯法确定步长
w = w + StepSize*direction; %update w
history(i,2) = norm(grad);
if mod(i,100)==0
fprintf('Number of Iteration:\t%d\n',i); %设置输出语句方便调试
end
end
history(i+1,1) = L(w,m0,A,b); %若总共迭代i次,则会将i+1个目标函数值写入history的第一列
grad = Gradient(w,m0,A,b);
history(i+1,2) = norm(grad); %将i+1个梯度范数值写入history的第二列
end
function [history,i] = ExactNewton(w,m0,n,A,b,epsilon)
%精确牛顿法,精度要求为epsilon
history = zeros(1001,2);
i = 0;
object = L(w,m0,A,b);
history(i+1,1) = object;
grad = Gradient(w,m0,A,b);
hess = Hessian(w,m0,n,A,b);
direction = -hess\grad; %精确下降方向
StepSize = Backtracking(w,m0,A,b,object,grad,direction); %回溯法确定步长
w = w + StepSize*direction;
history(i+1,2) = norm(grad);
fprintf('%d iteration done!\n',i+1);
while history(i+1,2) >= epsilon
i = i + 1;
object = L(w,m0,A,b);
history(i+1,1) = object;
grad = Gradient(w,m0,A,b);
hess = Hessian(w,m0,n,A,b);
direction = -hess\grad; %精确下降方向
StepSize = Backtracking(w,m0,A,b,object,grad,direction); %回溯法确定步长
w = w + StepSize*direction;
history(i+1,2) = norm(grad);
fprintf('%d iteration done!\n',i+1); %设置输出语句方便调试
end
end
function CG_tol = InexactRule(ng)
%共轭梯度法的inexact rule
CG_tol = zeros(3,1);
CG_tol(1) = min(0.5,ng)*ng;
CG_tol(2) = min(0.5,sqrt(ng))*ng;
CG_tol(3) = 0.5*ng;
end
function x = CG(A,g,ng,CG_MaxIter,Rule)
%共轭梯度法,解方程Ax=-g,ng为g的二范数,Rule为使用的inexact rule的编号
%对于牛顿法,A=Hessian,g=gradient,ng=norm(gradient,2),x为近似牛顿方向
x = 0;
CG_tol = InexactRule(ng);
r = g;
p = -r;
for iter = 1:CG_MaxIter
rr = r' * r;
Ap = A * p;
alpha = rr / (p'*Ap);
x = x + alpha * p;
r = r + alpha * Ap;
nrl = norm(r);
if nrl <= CG_tol(Rule)
break;
end
beta = nrl^2 / rr;
p = -r + beta * p;
end
fprintf('Number of CG iteration:\t%d\n',iter); %设置输出语句方便调试
end
function [history,i] = InexactNewton(w,m0,n,A,b,epsilon,CG_MaxIter,Rule)
%非精确牛顿法,精度要求为epsilon,每次迭代中CG的最大迭代次数为CG_MaxIter
%Rule为使用的inexact rule的编号
history = zeros(1001,2);
i = 0;
object = L(w,m0,A,b);
history(i+1,1) = object;
grad = Gradient(w,m0,A,b);
hess = Hessian(w,m0,n,A,b);
norm_grad = norm(grad);
direction = CG(hess,grad,norm_grad,CG_MaxIter,Rule); %CG法计算近似下降方向
StepSize = Backtracking(w,m0,A,b,object,grad,direction); %回溯法确定步长
w = w + StepSize*direction;
history(i+1,2) = norm_grad;
while history(i+1,2) >= epsilon
i = i + 1;
object = L(w,m0,A,b);
history(i+1,1) = object;
grad = Gradient(w,m0,A,b);
hess = Hessian(w,m0,n,A,b);
norm_grad = norm(grad);
direction = CG(hess,grad,norm_grad,CG_MaxIter,Rule); %CG法计算近似下降方向
StepSize = Backtracking(w,m0,A,b,object,grad,direction); %回溯法确定步长
w = w + StepSize*direction;
history(i+1,2) = norm_grad;
end
end