Machine Learning (20) SVM Example 1: Solving binary classification with a linear SVM in Octave

1 Visualization of the training set ex6data1.mat

The function plotData.m plots the training set:

function plotData(X, y)

  % plotData(X, y) plots the data points in X, marking them according to y.

  % Input: X  input feature matrix; one row per sample, 2 columns, so each row is a two-dimensional point

  %        y  output label vector; one row per sample, 1 column, each element is 0 or 1

  % Result: samples with label 1 are plotted as '+', samples with label 0 as 'o'

  %          

  % Create New Figure

  figure; hold on;

  % Find Indices of Positive and Negative Examples

  pos = find(y == 1); neg = find(y == 0);

  % Plot Examples

  plot(X(pos, 1), X(pos, 2), 'k+','LineWidth', 1, 'MarkerSize', 5);

  plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y','MarkerSize', 5);

  hold off;

endfunction

The first part of the LinearSVM.m script loads and visualizes the training set:

%% Initialization

clear ; close all; clc

%% =============== Part 1: Loading and visualizing data================

fprintf('Loading and Visualizing Data ...\n')

% Load the file ex6data1.mat; it puts the variables X and y into the environment:

load('ex6data1.mat');

% Plot the training set data

plotData(X, y);

fprintf('Program paused. Press enter to continue.\n');

pause;

Result of the execution (scatter plot of the training set):

The plot shows that the data set falls into two classes and contains one outlier at roughly (0.1, 4.1); this outlier will influence the decision boundary found by the SVM.

2 SVM linear kernel linearKernel.m

function sim = linearKernel(x1, x2)

  % linearKernel(x1, x2) returns the linear kernel (dot product) of x1 and x2.

  % Input: x1, x2  column vectors of the same dimension

  % Return: sim  the dot product of x1 and x2

  %          

  x1 = x1(:); x2 = x2(:);

  % Compute the kernel

  sim = x1' * x2;  % dot product

end
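For example, for two short column vectors the linear kernel is simply their inner product:

linearKernel([1; 2], [3; 4])   % returns 1*3 + 2*4 = 11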

3 Training function svmTrain.m

function [model] = svmTrain(X, Y, C, kernelFunction, ...

                            tol, max_passes)

  % svmTrain trains an SVM classifier using a simplified version of the SMO algorithm.

  % Input: X  training matrix; one row per sample, one column per input feature

  %        Y  output label vector of the training set; a column vector of 1s and 0s, one row per sample

  %        C  standard SVM regularization parameter

  %        kernelFunction  handle of the kernel function used for training

  %        tol  tolerance used when comparing floating-point numbers

  %        max_passes  number of passes over the data set without any alpha changing before the algorithm exits

  % Return: model  the trained model (a struct)

  %

  % Note: this is a simplified version of the SMO algorithm for training support vector machines.

  % In practice, if you want to train an SVM classifier, we recommend using an optimized package, for example:

  %       LIBSVM   (http://www.csie.ntu.edu.tw/~cjlin/libsvm/)

  %       SVMLight (http://svmlight.joachims.org/)

  % Set default values for tol and max_passes

  if ~exist('tol', 'var') || isempty(tol)

    tol = 1e-3;

  endif

  if ~exist('max_passes', 'var') || isempty(max_passes)

    max_passes = 5;

  endif

  % Parameters of the training set

  m = size(X, 1); % number of samples

  n = size(X, 2); % number of input features

 

  % Replace the elements with value 0 in the output feature vector with -1

  Y(Y==0) = -1;

  % Initialize variables in the training model

  alphas = zeros(m, 1);

  b = 0;

  E = zeros(m, 1);

  passes = 0;

  eta = 0;

  L = 0;

  H = 0;

  % Precompute the kernel matrix since our data set is small

  % (In practice, use optimized SVM packages that can handle large data sets gracefully instead of using this code)

  % We have implemented an optimized version of the vectorized kernel here to make svm training run faster.

  if strcmp(func2str(kernelFunction), 'linearKernel')

    % Vectorized computation of linear kernels.

    % This is equivalent to computing the kernel on each pair of examples

    K = X*X';

  elseif strfind(func2str(kernelFunction), 'gaussianKernel')

    % Vectorized RBF kernel

    % This is equivalent to computing the kernel on each pair of examples

    X2 = sum(X.^2, 2);

    K = bsxfun(@plus, X2, bsxfun(@plus, X2', - 2 * (X * X')));

    K = kernelFunction(1, 0) .^ K;
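    % After the bsxfun line, K(i,j) holds the squared distance ||x_i - x_j||^2. Assuming the RBF form
    % gaussianKernel(x1, x2) = exp(-||x1 - x2||^2 / (2*sigma^2)), kernelFunction(1, 0) equals
    % exp(-1/(2*sigma^2)), so raising it element-wise to the power K yields the full kernel matrix.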

  else

    % Precompute kernel matrix

    % The following operation may be slow due to lack of vectorization

    K = zeros(m);

    for i = 1:m

        for j = i:m

             K(i,j) = kernelFunction(X(i,:)', X(j,:)');

             K(j,i) = K(i,j); %the matrix is symmetric

        endfor

    endfor

  endif

  % train

  fprintf('\nTraining ...');

  dots = 12;

  while passes < max_passes,

    num_changed_alphas = 0;

    for i = 1:m,

      % Use (2) to calculate Ei = f(x(i)) - y(i).

      % E(i)=b +sum (X(i, :)*(repmat(alphas.*Y,1,n).*X)')- Y(i);

      E(i) = b + sum (alphas.*Y.*K(:,i)) - Y(i);

      if ((Y(i)*E(i)<-tol && alphas(i)< C) || (Y(i)*E(i)>tol && alphas(i)>0)),

        %In practice, many heuristics can be used to choose i and j.

        % In this simplified code we select them randomly.

        j = ceil(m * rand());

        while j == i,  % Make sure i \neq j

          j = ceil(m * rand());

        endwhile

        % Use (2) to calculate Ej = f(x(j)) - y(j).

        E(j) = b + sum (alphas.*Y.*K(:,j)) - Y(j);

        % save old alphas

        alpha_i_old = alphas(i);

        alpha_j_old = alphas(j);

        % Calculate L and H via (10) or (11).

        if (Y(i) == Y(j)),

          L = max(0, alphas(j) + alphas(i) - C);

          H = min(C, alphas(j) + alphas(i));

        else

          L = max(0, alphas(j) - alphas(i));

          H = min(C, C + alphas(j) - alphas(i));

        endif

        if (L == H),

          % continue to next i.

          continue;

        endif

        % Calculate eta through (14).

        eta = 2 * K(i,j) - K(i,i) - K(j,j);

        if (eta >= 0),

          % continue to next i.

          continue;

        endif

        % Calculate and clip the new value of alphas using (12) and (15).

        alphas(j) = alphas(j) - (Y(j) * (E(i) - E(j))) / eta;

        % clip

        alphas(j) = min (H, alphas(j));

        alphas(j) = max (L, alphas(j));

           

        % Check whether the change in alphas is significant

        if (abs(alphas(j) - alpha_j_old) < tol),

          % continue to next i.

          % replace anyway

          alphas(j) = alpha_j_old;

          continue;

        endif

           

        % Use (16) to determine the value of alpha i.

        alphas(i) = alphas(i) + Y(i)*Y(j)*(alpha_j_old - alphas(j));

           

        % Calculate b1 and b2 using (17) and equation (18) respectively.

        b1 = b - E(i) ...

               - Y(i) * (alphas(i) - alpha_i_old) *  K(i,j)' ...

               - Y(j) * (alphas(j) - alpha_j_old) *  K(i,j)';

        b2 = b - E(j) ...

               - Y(i) * (alphas(i) - alpha_i_old) *  K(i,j)' ...

               - Y(j) * (alphas(j) - alpha_j_old) *  K(j,j)';

        % Calculate b using (19).

        if (0 < alphas(i) && alphas(i) < C),

          b = b1;

        elseif (0 < alphas(j) && alphas(j) < C),

          b = b2;

        else

          b = (b1+b2)/2;

        endif

        num_changed_alphas = num_changed_alphas + 1;

      endif

    endfor

 

    if (num_changed_alphas == 0),

      passes = passes + 1;

    else

      passes = 0;

    endif

    fprintf('.');

    dots = dots + 1;

    if dots > 78

      dots = 0;

      fprintf('\n');

    endif

    if exist('OCTAVE_VERSION')

      fflush(stdout);

    endif

  endwhile

  fprintf(' Done! \n\n');

  % Save the model

  idx = alphas > 0;

  model.X= X(idx,:);

  model.y= Y(idx);

  model.kernelFunction = kernelFunction;

  model.b= b;

  model.alphas= alphas(idx);

  model.w = ((alphas.*Y)'*X)';

endfunction
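The 'gaussianKernel' branch above assumes an RBF kernel of the standard form exp(-||x1 - x2||^2 / (2*sigma^2)). That kernel function is not listed in this post, but a minimal sketch consistent with the kernelFunction(1, 0) .^ K trick could look like this (sigma is a free bandwidth parameter; the handle passed to svmTrain would typically bind it, e.g. @(x1, x2) gaussianKernel(x1, x2, sigma)):

function sim = gaussianKernel(x1, x2, sigma)

  % gaussianKernel returns the RBF similarity exp(-||x1 - x2||^2 / (2*sigma^2)) of the column vectors x1 and x2

  x1 = x1(:); x2 = x2(:);

  sim = exp(-sum((x1 - x2) .^ 2) / (2 * sigma ^ 2));

endfunction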

4 Visualizing the linear SVM decision boundary: visualizeBoundaryLinear.m

function visualizeBoundaryLinear(X, y, model)

  % visualizeBoundaryLinear draws the linear decision boundary learned by the support vector machine

  % Input: X  training matrix; one row per sample, one column per input feature

  %        y  output label vector of the training set; a column vector of 1s and 0s, one row per sample

  %        model  the model returned by svmTrain

  w = model.w;

  b = model.b;

  xp = linspace(min(X(:,1)), max(X(:,1)), 100);
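  % On the decision boundary w(1)*x1 + w(2)*x2 + b = 0, so the second coordinate is x2 = -(w(1)*x1 + b) / w(2)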

  yp = - (w(1)*xp + b)/w(2);

  plotData(X, y);

  hold on;

  plot(xp, yp, '-b');

  hold off

endfunction

5 The second part of the LinearSVM.m script: training the linear SVM

%% ==================== Part 2: Linear SVM training ====================

% The following code will train a linear support vector machine on the dataset and plot the learned decision boundaries

% Load the file ex6data1.mat; it puts the variables X and y into the environment:

load('ex6data1.mat');

fprintf('\nTraining Linear SVM ...\n')

% You should try changing the C value below to see how the decision boundary changes (for example, try C = 1000)

C = 1;

model = svmTrain(X, y, C, @linearKernel, 1e-3, 20);

visualizeBoundaryLinear(X, y, model);

fprintf('Program paused. Press enter to continue.\n');

pause;

load('ex6data1.mat');

fprintf('\nTraining Linear SVM ...\n')

% Now retrain with C = 100 to see how the decision boundary changes

C = 100;

model = svmTrain(X, y, C, @linearKernel, 1e-3, 20);

visualizeBoundaryLinear(X, y, model);
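To compare the two settings of C numerically, one could also compute the training accuracy directly from the learned linear model; a minimal sketch (not part of the original script), using the w and b fields stored in the model struct:

% Predict class 1 for points on the positive side of the hyperplane, class 0 otherwise
pred = double(X * model.w + model.b >= 0);

fprintf('Training accuracy: %.2f%%\n', mean(pred == y) * 100);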

6 Results of executing the LinearSVM.m script

The script first trains with C = 1; the outlier then lies on the wrong side of the decision boundary, so the fit around it is poor (left figure below). After changing C to 100 in the code, the output is the right figure below. Setting C = 100 is equivalent to reducing the regularization coefficient λ, i.e. increasing the weight of the misclassification term in the cost function, so the outlier now ends up above the boundary, on the correct side.

In the left figure, with C = 1, the SVM places the decision boundary in the gap between the two groups of points but misclassifies the data point farther to the left.

In the right figure, with C = 100, the SVM classifies every training sample correctly, but the decision boundary no longer appears to fit the data naturally.


Source: blog.csdn.net/luyouqi11/article/details/132173238