1 One-vs-All

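The One-vs-All (OvA) algorithm handles multiclass classification. Logistic regression, introduced earlier, performs binary classification by comparing its output against a preset threshold.
Multiclass classification follows essentially the same approach.
The OvA procedure is: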

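Suppose there are m classes; then m logistic regression models are trained. (This m corresponds to `num_labels` in the code, where `m` instead denotes the number of examples.)
The i-th model solves a binary classification problem: deciding whether an input belongs to class i or not.
For a single input, the m models produce m predictions, i.e. m values of $ h_\theta(x) $: $$ h_\theta^{(i)}(x) = P(y=i \mid x;\theta),\quad i=1,2,\dots,m $$
The class whose $ h_\theta^{(i)}(x) $ is largest among the m values is taken as the final prediction.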
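The relabeling in the second step can be sketched in Octave as follows. The label vector `y` and the class count `K` are made-up illustration values, not taken from the assignment data; the sketch only shows how one multiclass label vector becomes one 0/1 target column per class.

```octave
% Hypothetical toy labels: 4 examples, 3 classes (illustration only)
y = [1; 3; 2; 3];
K = 3;

% Column c holds the binary target for classifier c: 1 where y == c, else 0
Y = zeros(numel(y), K);
for c = 1:K
    Y(:, c) = (y == c);
end

disp(Y);
```

Each column of `Y` is the target vector that one of the binary classifiers would be trained on.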

The corresponding code:

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta 
%corresponds to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);
% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);
% Add ones to the X data matrix
X = [ones(m, 1) X];

for c = 1: num_labels
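    % In each iteration, (y == c) maps every label other than c to 0,
    % which turns the task into a binary classification problem.
    % In this dataset the labels take the values 1..10, so there are ten classes.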
    
    initial_theta = zeros(n+1, 1);
    options = optimset('GradObj', 'on', 'MaxIter', 50);
    % Iteratively minimize the cost to obtain theta for class c
    [theta] = fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
                      initial_theta, options);
    all_theta(c, :) = theta;
end

end
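
`oneVsAll` relies on `lrCostFunction`, which is not shown here, and on `fmincg`, an optimizer supplied with the course materials. As a rough sketch only, assuming `lrCostFunction` implements the standard regularized logistic regression cost with an unregularized bias term, it might look like the following. The leading `1;` is there only so the block runs as a single Octave script, and the toy data at the end is invented for illustration.

```octave
1;  % marks this file as a script so the function can be defined inline

function [J, grad] = lrCostFunction(theta, X, y, lambda)
    % Sketch of a regularized logistic regression cost and gradient.
    % X is assumed to already contain the leading column of ones.
    m = length(y);
    h = 1 ./ (1 + exp(-X * theta));      % sigmoid hypothesis
    reg = theta;
    reg(1) = 0;                          % the bias term is not regularized
    J = (-y' * log(h) - (1 - y)' * log(1 - h)) / m ...
        + lambda / (2 * m) * sum(reg .^ 2);
    grad = X' * (h - y) / m + lambda / m * reg;
end

% Toy check (invented data): at theta = 0 every prediction is 0.5
X = [1 0.5; 1 -1.5; 1 2.0];
y = [1; 0; 1];
[J, grad] = lrCostFunction(zeros(2, 1), X, y, 1);
fprintf('cost at theta = 0: %f\n', J);
```

At `theta = 0` the cost equals `log(2)` regardless of the data, which makes a convenient sanity check before handing the function to an optimizer.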

Prediction with the trained OvA classifiers:

function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels 
%are in the range 1..K, where K = size(all_theta, 1). 

m = size(X, 1);
% num_labels = size(all_theta, 1);
% Add ones to the X data matrix
X = [ones(m, 1) X];

p = sigmoid(X * all_theta');  % 5000x10 matrix for this dataset: one row of class probabilities per example
[max_num, max_col] = max(p, [], 2);  % largest value in each row and its column index
p = max_col;

end
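
As a usage illustration, the sketch below repeats the steps of `predictOneVsAll` inline on a tiny example and computes the accuracy. The weights, inputs, and labels are hand-picked, hypothetical values, not trained parameters or assignment data.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical weights: 3 classifiers, each row is [bias, w1, w2] (not trained)
all_theta = [0  2 -1;
             0 -1  2;
             1 -1 -1];
X = [3 0; 0 3; -1 -1];   % 3 toy examples with 2 features each
y = [1; 2; 1];           % toy ground-truth labels

% Same steps as predictOneVsAll: add bias column, score, take the row-wise argmax
Xb = [ones(size(X, 1), 1) X];
probs = sigmoid(Xb * all_theta');
[~, p] = max(probs, [], 2);

accuracy = mean(double(p == y)) * 100;
fprintf('predictions: %s, accuracy: %.1f%%\n', mat2str(p'), accuracy);
```

Because the sigmoid is strictly increasing, taking the row-wise maximum of `Xb * all_theta'` directly would select the same classes; the sigmoid is only needed when the probabilities themselves are of interest.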

2 Neural Network

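For nonlinear problems, the previous chapter showed that polynomial features can be used. The drawback of that approach is that
as the number of original features n grows, the number of derived features grows rapidly (on the order of n^2/2 for the quadratic terms alone), so both the difficulty and the computational cost increase sharply.
The neural network, introduced next, is an algorithm that handles this situation well.
The course material on neural networks is summarized at: http://www.ai-start.com/ml2014/html/week4.html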
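
To make the growth of polynomial features concrete, the following sketch counts the monomials of degree 2 and degree 3 that can be formed from n features. The value `n = 100` is an arbitrary illustration, not a figure from the assignment.

```octave
n = 100;  % hypothetical number of original features

% Number of distinct monomials of exact degree d in n variables: C(n + d - 1, d)
quadratic_terms = nchoosek(n + 1, 2);
cubic_terms     = nchoosek(n + 2, 3);

fprintf('n = %d: %d quadratic terms, %d cubic terms\n', ...
        n, quadratic_terms, cubic_terms);
```

For n = 100 this gives 5050 quadratic terms and 171700 cubic terms, which shows how quickly the feature count outgrows the original input size.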

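A brief summary:
In a neural network, every layer performs a logistic regression. The difference is that layer j+1 takes the output of layer j as its input, which yields a flexible, complex model that can be applied to nonlinear problems.
A three-layer network consists of Layer 1 (input layer), Layer 2 (hidden layer), and Layer 3 (output layer).
Layer 1 is the original input X. After a bias unit is added, it is multiplied by the corresponding theta matrix and passed through the sigmoid to give Layer 2.
Layer 2, with a bias unit added, is multiplied by the corresponding theta matrix and passed through the sigmoid to give Layer 3.
Layer 3 is the model output, to which a threshold or a multiclass decision rule is then applied.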
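
Written out for a single input x, the three-layer forward pass described above can be summarized as follows. The symbols $a^{(l)}$ (activations of layer l), $\Theta^{(l)}$ (weights from layer l to layer l+1), and g (the sigmoid) follow the usual course notation and are assumptions here, since the text above does not name them.

```latex
\begin{aligned}
a^{(1)} &= \begin{bmatrix} 1 \\ x \end{bmatrix} \\
a^{(2)} &= g\left(\Theta^{(1)} a^{(1)}\right) \\
a^{(3)} &= g\left(\Theta^{(2)} \begin{bmatrix} 1 \\ a^{(2)} \end{bmatrix}\right) = h_\Theta(x),
\qquad g(z) = \frac{1}{1 + e^{-z}}
\end{aligned}
```

For multiclass output, the predicted class is the index of the largest component of $a^{(3)}$.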

The corresponding code:

function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

m = size(X, 1);

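% The network has three layers, so there are two propagation steps.
% When moving from layer j to layer j+1, a bias unit equal to 1 is prepended.
% Every layer performs a logistic regression.

% First step: input layer -> hidden layer (a1 holds the hidden-layer activations)
X1 = [ones(m, 1) X];
a1 = sigmoid(X1 * Theta1');
% Second step: hidden layer -> output layer (a2 holds the output-layer activations)
X2 = [ones(m, 1), a1];
a2 = sigmoid(X2 * Theta2');
% Output
[max_num, max_col] = max(a2, [], 2);  % largest value in each row and its column index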
p = max_col;

end
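
To illustrate how stacking two logistic layers handles a nonlinear problem, the sketch below runs the same forward pass as `predict` on a small network that separates XNOR from XOR, something a single logistic regression on the raw inputs cannot do. The weights are hand-picked for illustration (they are not trained and not part of the assignment), and the class numbering is an assumption.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hand-picked (not trained) weights, illustration only.
% Hidden unit 1 acts as x1 AND x2, hidden unit 2 as (NOT x1) AND (NOT x2).
Theta1 = [-30  20  20;
           10 -20 -20];
% Output 1 fires for XNOR(x1, x2), output 2 for XOR(x1, x2).
Theta2 = [-10  20  20;
           10 -20 -20];

X = [0 0; 0 1; 1 0; 1 1];
m = size(X, 1);

% Same forward pass as predict(): add bias, multiply, apply sigmoid, twice
a1 = sigmoid([ones(m, 1) X] * Theta1');
a2 = sigmoid([ones(m, 1) a1] * Theta2');
[~, p] = max(a2, [], 2);

disp(p');
```

The inputs (0,0) and (1,1) are assigned to class 1 and the inputs (0,1) and (1,0) to class 2.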

Machine Learning by Ng - Programming Exercise 3
