1 Multi-class Classification
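Viewed as a neural network, this problem is a model with only two layers: an input layer and an output layer. The input layer has 400 neurons (one per pixel) and the output layer has 10 neurons, numbered 1 to 10, which stand for the digits 1 to 9 and 0 (label 10 means digit 0). Each output neuron gives the predicted probability that the input image shows the digit for that neuron's number, and the number of the neuron with the highest probability is taken as the predicted digit.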
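The label convention above can be sketched on a toy example. The probability matrix `probs` below is made up for illustration (3 images, 10 output neurons) and the variable names are hypothetical; the sketch only shows how the winning neuron number maps back to a digit.

```octave
% Made-up output probabilities: one row per image, one column per neuron.
probs = zeros(3, 10);
probs(1, 2)  = 0.9;   % image 1: neuron 2 is the most confident
probs(2, 10) = 0.7;   % image 2: neuron 10 is the most confident
probs(3, 5)  = 0.6;   % image 3: neuron 5 is the most confident

% The column index of the largest value in each row is the predicted label (1..10).
[~, label] = max(probs, [], 2);

% Label 10 stands for digit 0; labels 1..9 are the digits themselves.
digit = mod(label, 10);
```

The assignment keeps predictions as labels 1 to 10, so the `mod` step is only needed when the actual digit has to be displayed.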
1.3 Vectorizing Logistic Regression
function [J, grad] = lrCostFunction(theta, X, y, lambda)
% Initialize some useful values
m = length(y); % number of training examples
% Hypothesis for all m examples at once (m x 1)
h = sigmoid(X * theta);
% Regularized cost; the bias term theta(1) is not penalized
J = (-y' * log(h) - (1 - y)' * log(1 - h)) / m ...
    + lambda / (2 * m) * sum(theta(2 : end) .^ 2);
% Regularized gradient; zero the bias entry so it receives no penalty
temp = theta;
temp(1) = 0;
grad = (X' * (h - y) + lambda * temp) / m;
grad = grad(:);
end
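One way to sanity-check a vectorized gradient like the one above is to compare it with a finite-difference estimate. The following is a minimal sketch, not part of the assignment: it restates the same cost and gradient formulas as anonymous functions so that it runs on its own, and the toy data, `epsilon`, and all variable names are assumptions chosen for illustration.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Same regularized cost and gradient formulas as lrCostFunction, inlined here.
cost = @(t, X, y, lambda) ...
    (-y' * log(sigmoid(X * t)) - (1 - y)' * log(1 - sigmoid(X * t))) / numel(y) ...
    + lambda / (2 * numel(y)) * sum(t(2:end) .^ 2);
gradf = @(t, X, y, lambda) ...
    (X' * (sigmoid(X * t) - y) + lambda * [0; t(2:end)]) / numel(y);

% Small made-up problem: 5 examples, bias column plus 3 features.
X = [ones(5, 1) reshape(1:15, 5, 3) / 10];
y = [1; 0; 1; 0; 1];
theta = [-2; -1; 1; 2];
lambda = 3;

% Central-difference estimate of each partial derivative.
epsilon = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
  e = zeros(size(theta));
  e(i) = epsilon;
  numgrad(i) = (cost(theta + e, X, y, lambda) - cost(theta - e, X, y, lambda)) / (2 * epsilon);
end

analytic = gradf(theta, X, y, lambda);
max_diff = max(abs(numgrad - analytic));   % should be very small if the gradient is right
```

If `max_diff` is not tiny, the usual suspects are a penalized bias term or a missing division by `m`.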
1.4 One-vs-all Classification
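The function fmincg works much like the fminunc used in week 3; see the week 3 programming assignment: https://blog.csdn.net/hugh___/article/details/81736271
Note that when calling fmincg, the initial value of theta is all_theta(c,:)' , which is transposed.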
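The shape bookkeeping behind that transpose can be sketched without calling the optimizer. In the snippet below, `theta_opt` is a hypothetical stand-in for the column vector that fmincg would return, and the sizes are shrunk to 3 classes and 2 features for readability.

```octave
num_labels = 3;                         % toy number of classes
n = 2;                                  % toy number of features
all_theta = zeros(num_labels, n + 1);   % one row of parameters per class

c = 2;
% A row of all_theta is 1 x (n+1); the cost function expects a column.
initial_theta = all_theta(c, :)';       % (n+1) x 1

% Hypothetical stand-in for the optimizer's result (a column vector).
theta_opt = [0.5; -1; 2];

% Transpose back to a row before storing it in all_theta.
all_theta(c, :) = theta_opt';
```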
function [all_theta] = oneVsAll(X, y, num_labels, lambda)
% Some useful variables
m = size(X, 1); % number of rows (5000)
n = size(X, 2); % number of columns (400)
% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1); % 10 * 401
% Add ones to the X data matrix
X = [ones(m, 1) X]; % 5000 * 401
options = optimset('GradObj', 'on', 'MaxIter', 50);
for c = 1 : num_labels
% lrCostFunction() expects theta as a column vector, hence all_theta(c,:)'
all_theta(c,:) = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), all_theta(c,:)', options)';
end
end
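The expression `(y == c)` in the loop is what turns the multi-class problem into one binary problem per class. A small sketch with made-up labels shows the relabeling; the variable names are for illustration only.

```octave
% Made-up labels for five examples (10 stands for digit 0).
y = [1; 2; 10; 2; 1];

% For class c, examples of that class become 1 and all others become 0.
c = 2;
y_binary = (y == c);          % logical column vector

% Every example is positive for exactly one class across all classifiers.
classes = [1 2 10];
positives = zeros(size(y));
for k = classes
  positives = positives + (y == k);
end
```

Each classifier is trained against its own `y_binary`, so the rows of `all_theta` are fitted independently of one another.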
1.4.1 One-vs-all Prediction
function p = predictOneVsAll(all_theta, X)
m = size(X, 1);
num_labels = size(all_theta, 1);
% You need to return the following variables correctly
p = zeros(size(X, 1), 1);
% Add ones to the X data matrix
X = [ones(m, 1) X];
% X: 5000 * 401, all_theta: 10 * 401
% X * all_theta': 5000 * 10
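% The column index of the largest score in each row is the predicted label;
% sigmoid is monotonic, so it can be skipped when only the argmax is needed
[~, p] = max(X * all_theta', [], 2);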
end
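Once predictions are available, training accuracy is usually reported as the percentage of predictions that match the labels. The sketch below uses made-up vectors `pred` and `y` in place of real predictions and labels.

```octave
% Made-up predictions and true labels for four examples.
pred = [1; 2; 10; 3];
y    = [1; 2; 10; 4];

% Fraction of matching entries, expressed as a percentage.
accuracy = mean(double(pred == y)) * 100;
printf('Training set accuracy: %.1f%%\n', accuracy);
```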
2 Neural Networks
function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
% trained weights of a neural network (Theta1, Theta2)
% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);
% You need to return the following variables correctly
p = zeros(size(X, 1), 1);
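% Forward propagation: add the bias column, then apply each layer in turn
X = [ones(m, 1) X];                        % m x (input units + 1)
a2 = [ones(m, 1) sigmoid(X * Theta1')];    % m x (hidden units + 1)
% The output unit with the largest activation gives the predicted label
[~, p] = max(sigmoid(a2 * Theta2'), [], 2);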
end
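The same forward pass can be traced by hand on a tiny network. The weights below are hypothetical values picked so that the result is easy to follow (2 inputs, 3 hidden units, 2 output classes); they are not the trained weights from the assignment.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical weights: first column of each matrix multiplies the bias unit.
Theta1 = [0 1 0;
          0 0 1;
          0 1 1];            % 3 hidden units x (2 inputs + 1)
Theta2 = [0  5 -5 0;
          0 -5  5 0];        % 2 classes x (3 hidden units + 1)

X = [ 3 -3;
     -3  3];                 % two made-up examples
m = size(X, 1);

a1 = [ones(m, 1) X];                       % add bias column: 2 x 3
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];   % hidden activations plus bias: 2 x 4
a3 = sigmoid(a2 * Theta2');                % output activations: 2 x 2

% Predicted label is the index of the largest output activation per row.
[~, p] = max(a3, [], 2);
```

With these weights the first example activates hidden unit 1 strongly and is assigned to class 1, while the second example activates hidden unit 2 and is assigned to class 2.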