Machine Learning - SVM (Support Vector Machine) and Face Recognition

For Yiru's complete projects and code, see GitHub: https://github.com/yiru1225 (please credit the source when reposting; no freeloading, please star the projects, thanks)

Table of contents

Series Article Directory
1. The concept and principle of SVM
    1. Introduction to SVM
    2. Basic process of SVM
    3. Extending SVM to multi-class classification
2. Applying classic SVM to image recognition and classification
3. Applying SVM to face recognition
    1. Preprocessing
        1.1 Data import and processing
        1.2 Data dimensionality reduction
    2. Face recognition
        2.1 SVM via OVO
        2.2 SVM via OVR
    3. Recognition rates of PCA and LDA combined with SVM across dimensions and data sets, compared with KNN
        3.1 SVM vs. KNN on ORL: recognition rate of PCA and LDA versus dimension
        3.2 Average recognition rate of each method on different data sets
    4. Recognition rates of SVM implemented with OVO and OVR
        4.1 OVO vs. OVR on ORL: recognition rate of SVM + PCA versus dimension
        4.2 Average recognition rate of SVM + PCA with OVO and OVR on different data sets
    5. Supplementary methods and functions
4. Innovative SVM algorithm design
5. Others
    1. Datasets and resources
    2. References
Summary


Series Article Directory

This series of blog posts covers the concepts, principles, and code practice of machine learning. It skips tedious mathematical derivations. If you have questions, please discuss them in the comments or send me a private message.

You can copy the code in full. What really matters, though, is understanding the principles and the process well enough to reproduce it yourself!

Chapter 1  Machine Learning - PCA (Principal Component Analysis) and Face Recognition

Chapter 2  Machine Learning - LDA (Linear Discriminant Analysis) and Face Recognition

Chapter 3  Machine Learning - LR (Linear Regression), LRC (Linear Regression Classification) and Face Recognition

Chapter 4 Machine Learning - SVM (Support Vector Machine) and Face Recognition


Synopsis

This post introduces the SVM (Support Vector Machine) algorithm and covers the following:

- It uses a classic SVM for binary image classification.
- It applies SVM (with both OVO and OVR), combined with dimensionality reduction (PCA and LDA), to face recognition.
- It compares SVM with KNN, and OVO with OVR.

The data sets and the MATLAB and Python code are included.


1. The concept and principle of SVM

1. Introduction to SVM

A Support Vector Machine (SVM) is a supervised learning model, together with its learning algorithms, used to analyze data for classification and regression. At its core it is a binary classifier. The classic model is a linear classifier that maximizes the margin in the feature space, and its learning strategy is margin maximization.
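For reference, margin maximization can be written in its standard textbook form. The notation below is the conventional one and is assumed here rather than taken from this post: weights w, bias b, slack variables ξ_i, penalty C, multipliers α_i and kernel K.

- The first problem is the hard-margin SVM.
- The second adds slack variables to give the soft-margin variant.
- The third is the dual in α that SMO-type solvers work on, followed by the resulting decision function f(x).

```latex
% Hard margin: the margin width is 2 / ||w||, so maximizing it means minimizing ||w||^2
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{s.t.}\quad y_i\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1,\qquad i = 1,\dots,n

% Soft margin: slack variables xi_i allow violations, penalized by C
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1 - \xi_i,\quad \xi_i \ge 0

% Dual (kernel form); samples with alpha_i > 0 are the support vectors
\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{n}\alpha_i - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(\mathbf{x}_i,\mathbf{x}_j)
\quad\text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C

f(\mathbf{x}) = \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i y_i K(\mathbf{x}_i,\mathbf{x}) + b\Bigr)
```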

2. Basic process of SVM

An SVM model represents instances as points in a feature space and finds a separating hyperplane that leaves as wide a gap as possible between the classes. New instances are mapped into the same space, and their class is predicted from the side of the hyperplane on which they fall.
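The "widest gap, then check which side" idea can be made concrete with a minimal Python/NumPy sketch. This is not the MATLAB fitcsvm used later in the post. The sketch trains a soft-margin linear SVM by Pegasos-style subgradient descent on the hinge loss, a swapped-in technique chosen only because it fits in a few lines. It then classifies points by the sign of w·x + b. The function names, step size and toy data are illustrative assumptions.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Soft-margin linear SVM via Pegasos-style subgradient descent on the hinge loss.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    The bias is learned by appending a constant 1 feature, so it is regularized too
    (a simplification made in this sketch).
    """
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in range(Xa.shape[0]):  # fixed visiting order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)  # standard Pegasos step size
            if y[i] * (Xa[i] @ w) < 1:
                # sample is inside the margin or misclassified: the hinge term is active
                w = (1 - eta * lam) * w + eta * y[i] * Xa[i]
            else:
                # sample is outside the margin: only the regularizer shrinks w
                w = (1 - eta * lam) * w
    return w[:-1], w[-1]


def predict(X, w, b):
    """Predict +1/-1 from the side of the hyperplane w.x + b = 0 that each row of X falls on."""
    return np.where(X @ w + b >= 0, 1, -1)


# Toy data: two well-separated groups of 2-D points
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
print(predict(X, w, b))                                     # training points
print(predict(np.array([[2.5, 2.5], [-2.5, -2.5]]), w, b))  # new points
```

Folding the bias into w keeps the update rule to a single line, at the cost of lightly regularizing b. Production solvers such as SMO treat b separately.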

3. Extending SVM to multi-class classification

SVM is a binary classifier: it separates two classes with a line, a plane or a hyperplane. Two strategies are commonly used to extend it to multi-class problems:

① OVR (one-vs-rest): For K classes, train K SVMs. The j-th SVM decides whether a sample belongs to class j or to any other class. At prediction time, x is assigned to the class i whose SVM places it farthest from the hyperplane on the class-i side.

② OVO (one-vs-one): For K classes, train K*(K-1)/2 SVMs, each of which only decides between one specific pair of the K classes. At prediction time, all K*(K-1)/2 SVMs cast a vote, and x is assigned to the class that receives the most votes.
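The two combination rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the post's MATLAB code, and it makes these assumptions:

- The function names are hypothetical.
- Classes are numbered 1..K.
- Ties in OVO voting go to the smallest class index, the same way MATLAB's max returns the first maximum.
- The OVR scores are signed decision values, where larger means "more like class j".

```python
from itertools import combinations


def ovo_pairs(k):
    """Class pairs (i, j), i < j, that OVO trains one SVM for: k*(k-1)/2 of them."""
    return list(combinations(range(1, k + 1), 2))


def ovo_vote(pair_predictions, k):
    """Majority vote over the winners reported by the pairwise SVMs.

    pair_predictions: the class (1..k) each pairwise SVM chose for one sample.
    Ties go to the smallest class index, since max() keeps the first maximum.
    """
    votes = [0] * (k + 1)  # index 0 unused so that votes[c] belongs to class c
    for label in pair_predictions:
        votes[label] += 1
    return max(range(1, k + 1), key=lambda c: votes[c])


def ovr_decide(scores):
    """scores[j - 1]: decision value of the 'class j vs. rest' SVM for one sample.

    Returns the class whose SVM puts the sample farthest on its positive side.
    """
    return max(range(1, len(scores) + 1), key=lambda c: scores[c - 1])


k = 4
print(len(ovo_pairs(k)))                   # 4 classes -> 6 pairwise SVMs
print(ovo_vote([1, 3, 1, 3, 2, 3], k))     # class 3 collects the most votes
print(ovr_decide([-0.8, 0.3, 1.2, -0.1]))  # class 3 has the largest score
```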

Tip: For a detailed derivation of SVM, see "Detailed derivation of the SVM principle" (weixin_42001089's blog, CSDN).

2. Applying classic SVM to image recognition and classification

① Problem description: Determine whether an image shows a forest or a bedroom.

② Core of the implementation:

- Normalize the split data set with mapminmax.
- Train on the normalized training set with fitcsvm. This solves the quadratic program to obtain the multipliers α, the support vectors and the bias b.
- Call predict with the trained model to evaluate f(x) on the test set and obtain the predicted classes.

③ The code is as follows:

clear;
% dataset stacks bedroom and forest; the line dataset = [bedroom;MITforest]; below performs the merge
load forest.mat                  % load the data sets to classify
load bedroom.mat
dataset = [bedroom;MITforest];
load labelset.mat                % load the label set (used below as lableset)

% Select the training and test sets

% Samples 1-5 of class 1 and 11-15 of class 2 form the training set
train_set = [dataset(1:5,:);dataset(11:15,:)];
% Extract the matching training labels
train_set_labels = [lableset(1:5);lableset(11:15)];
% Samples 6-10 of class 1 and 16-20 of class 2 form the test set
test_set = [dataset(6:10,:);dataset(16:20,:)];
% Extract the matching test labels
test_set_labels = [lableset(6:10);lableset(16:20)];

% Preprocessing: normalize the training and test sets to [0,1]

[mtrain,ntrain] = size(train_set);
[mtest,ntest] = size(test_set);

test_dataset = [train_set;test_set];
% mapminmax is MATLAB's built-in normalization; it scales each row, so transpose to scale each feature
[dataset_scale,ps] = mapminmax(test_dataset',0,1);
dataset_scale = dataset_scale';

train_set = dataset_scale(1:mtrain,:);
test_set = dataset_scale( (mtrain+1):(mtrain+mtest),: );

% Train the SVM
model = fitcsvm(train_set,train_set_labels);

% Predict with the SVM
[predict_label] = predict(model,test_set);
%[predict_label] = model.IsSupportVector;


% Result analysis

% Plot actual vs. predicted labels of the test set
% A misclassified sample shows a red * that does not sit on its circle
figure;
hold on;
plot(test_set_labels,'o');
plot(predict_label,'r*');
xlabel('Test sample','FontSize',12);
ylabel('Class label','FontSize',12);
legend('Actual test labels','Predicted test labels');
title('Actual vs. predicted labels of the test set','FontSize',12);
grid on;

④ The recognition results are as follows:

Analysis: The figure shows that the predicted labels match the actual labels, so the prediction works very well.
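mapminmax applies y = (ymax - ymin)(x - xmin)/(xmax - xmin) + ymin to each row of its input. That is why the code above transposes the data: each row then holds one feature. A rough Python/NumPy equivalent is sketched below, with samples as rows and features as columns. The function names and the handling of constant features are assumptions of this sketch. The sketch fits the scaling on the training set only and reuses it for test data. The code above instead fits on training and test data together.

```python
import numpy as np


def minmax_fit(X, lo=0.0, hi=1.0):
    """Record the per-feature min and max of X (rows = samples, columns = features)."""
    return {"xmin": X.min(axis=0), "xmax": X.max(axis=0), "lo": lo, "hi": hi}


def minmax_apply(X, ps):
    """Apply y = (hi - lo) * (x - xmin) / (xmax - xmin) + lo feature by feature.

    Constant features (xmax == xmin) are mapped to lo; that choice belongs to this sketch.
    Test values outside the training range land outside [lo, hi].
    """
    span = ps["xmax"] - ps["xmin"]
    safe_span = np.where(span == 0, 1.0, span)  # avoid division by zero
    Y = (ps["hi"] - ps["lo"]) * (X - ps["xmin"]) / safe_span + ps["lo"]
    return np.where(span == 0, ps["lo"], Y)


X_train = np.array([[0.0, 10.0, 5.0],
                    [5.0, 20.0, 5.0],
                    [10.0, 30.0, 5.0]])
X_test = np.array([[2.5, 40.0, 7.0]])
ps = minmax_fit(X_train)          # fit on training data only
print(minmax_apply(X_train, ps))  # non-constant columns now span [0, 1]
print(minmax_apply(X_test, ps))   # same scaling reused for the test data
```

Keeping the fitted settings (like MATLAB's ps) is what lets new samples be scaled exactly as the training data was.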

3. Applying SVM to face recognition

1. Preprocessing

1.1 Data import and processing

The import step works as follows:

- Import the face database in batches with imread, or load the corresponding .mat file directly.
- During import, reshape each face into a column vector to build reshaped_faces.
- Take n% of each person's images as test data and the remaining (100-n)% as training data.

Wrapping these steps into a framework lets the same code import different data sets. The experimental framework works with the ORL, AR and FERET data sets.

Tip: The code is essentially the same as in the second article of this series (LDA and face recognition).
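The import-and-split step can be sketched in Python/NumPy as follows. The sketch flattens every face into a column and splits each person's images into training and test sets. The function name split_faces, the test_ratio parameter and the rule "first images train, last images test" are hypothetical. train_num and test_num play the role of train_pic_num_of_each and test_pic_num_of_each in the later MATLAB code.

```python
import numpy as np


def split_faces(faces, test_ratio):
    """Flatten faces into columns and split each person's images into train/test.

    faces[p][k] is the k-th image (2-D array) of person p; every person has the same count.
    The first images of each person go to training and the rest to testing; this ordering
    is an assumption of the sketch.
    """
    n_each = len(faces[0])
    test_num = int(round(n_each * test_ratio))
    train_num = n_each - test_num
    train_cols, test_cols = [], []
    for person in faces:
        # order="F" flattens column by column, like MATLAB's reshape(img, [], 1)
        cols = [img.reshape(-1, order="F") for img in person]
        train_cols += cols[:train_num]
        test_cols += cols[train_num:]
    # One face per column, grouped person by person
    return np.column_stack(train_cols), np.column_stack(test_cols), train_num, test_num


# 3 people x 4 images of 2x3 pixels; pixel value 10*p + k marks person p, image k
faces = [[np.full((2, 3), 10.0 * p + k) for k in range(4)] for p in range(3)]
train_data, test_data, train_num, test_num = split_faces(faces, test_ratio=0.25)
print(train_data.shape, test_data.shape, train_num, test_num)
```

The later code relies on this person-by-person column grouping: a face's label can be recovered from its column index alone.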

1.2 Data dimensionality reduction

Reduce the data dimension with LDA or PCA. See the first and second articles of this series for the principles and the full code.

% LDA
% Mean of each class
k = 1; 
class_mean = zeros(dimension, people_num); 
for i=1:people_num
    % Mean of one column block (one person)
    temp = class_mean(:,i);
    % Sum this person's train_pic_num_of_each training faces, then average
    for j=1:train_pic_num_of_each
        temp = temp + train_data(:,k);
        k = k + 1;
    end
    class_mean(:,i) = temp / train_pic_num_of_each;
end

% Between-class scatter matrix Sb
Sb = zeros(dimension, dimension);
all_mean = mean(train_data, 2); % mean of all training faces
for i=1:people_num
    % Center each person's mean face by subtracting the overall mean
    centered_data = class_mean(:,i) - all_mean;
    Sb = Sb + centered_data * centered_data';
end
Sb = Sb / people_num;

% Within-class scatter matrix Sw
Sw = zeros(dimension, dimension);
k = 1; % k indexes each training image
for i=1:people_num % each person
    for j=1:train_pic_num_of_each % accumulate over all of this person's faces
        centered_data = train_data(:,k) - class_mean(:,i);
        Sw = Sw + centered_data * centered_data';
        k = k + 1;
    end
end
Sw = Sw / (people_num * train_pic_num_of_each);

% Objective 1: classic LDA (pseudo-inverse instead of inverse, in case Sw is singular)
% To use LDA, uncomment this line and comment out the PCA target below
% target = pinv(Sw) * Sb;

% PCA
centered_face = (train_data - all_mean);
cov_matrix = centered_face * centered_face';
target = cov_matrix;

% Eigenvalues and eigenvectors
[eigen_vectors, dianogol_matrix] = eig(target);
eigen_values = diag(dianogol_matrix);

% Sort eigenvalues and eigenvectors in descending order
[sorted_eigen_values, index] = sort(eigen_values, 'descend'); 
eigen_vectors = eigen_vectors(:, index);
eigen_vectors = real(eigen_vectors); % drop imaginary parts (needed for LDA); this introduces some error
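The same LDA and PCA steps can be sketched in Python/NumPy, which makes the matrix shapes easy to check on a tiny example. The function names lda_scatter and top_eigvecs and the toy data are hypothetical. Eigenvalues are sorted by their real part before imaginary parts are dropped; this ordering rule is a choice of this sketch, not taken from the MATLAB code.

```python
import numpy as np


def lda_scatter(train_data, people_num, train_num):
    """Between-class (Sb) and within-class (Sw) scatter, following the MATLAB loops above.

    train_data: (dimension, people_num * train_num); columns are faces grouped person by person.
    """
    d = train_data.shape[0]
    all_mean = train_data.mean(axis=1, keepdims=True)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for p in range(people_num):
        faces = train_data[:, p * train_num:(p + 1) * train_num]
        class_mean = faces.mean(axis=1, keepdims=True)
        diff = class_mean - all_mean
        Sb += diff @ diff.T          # person mean centered by the overall mean
        centered = faces - class_mean
        Sw += centered @ centered.T  # each face centered by its person's mean
    return Sb / people_num, Sw / (people_num * train_num)


def top_eigvecs(target, dim):
    """First `dim` eigenvectors of target, ordered by descending (real part of) eigenvalue."""
    vals, vecs = np.linalg.eig(target)
    order = np.argsort(-vals.real)
    return np.real(vecs[:, order[:dim]])


# Toy data: 2 people x 3 faces, each "face" a 2-D feature column
train_data = np.array([[0.0, 1.0, 2.0, 10.0, 11.0, 12.0],
                       [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
all_mean = train_data.mean(axis=1, keepdims=True)

# PCA: target is the (unnormalized) covariance of the centered faces
centered_face = train_data - all_mean
W_pca = top_eigvecs(centered_face @ centered_face.T, 1)

# LDA: target = pinv(Sw) * Sb; pinv copes with the singular Sw of this toy set
Sb, Sw = lda_scatter(train_data, people_num=2, train_num=3)
W_lda = top_eigvecs(np.linalg.pinv(Sw) @ Sb, 1)

# Project the centered faces onto the chosen directions, as the SVM code does
projected = W_pca.T @ (train_data - all_mean)
print(W_pca.ravel(), W_lda.ravel(), projected.shape)
```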

2. Face recognition

Core process:

1. Preprocess the data and reduce its dimensionality with PCA or LDA.
2. Use OVO or OVR to train SVM models on the training split and predict the test split.
3. Record the recognition rate and compare it with earlier methods.

2.1 SVM via OVO

% Face recognition with SVM
% SVM (OVO)
rate = []; % recognition rate for each dimension
for i = 10:10:160
    right_num = 0;
    % Projection matrix from the first i eigenvectors
    project_matrix = eigen_vectors(:,1:i);
    projected_train_data = project_matrix' * (train_data - all_mean);
    projected_test_data = project_matrix' * (test_data - all_mean);

    % SVM training: one binary SVM per pair of people
    models = cell(1, people_num * (people_num - 1) / 2);         % trained binary SVMs
    scale_settings = cell(1, people_num * (people_num - 1) / 2); % mapminmax settings of each SVM
    model_num = 0;
    for j = 0:people_num - 2
        % Training faces and labels of person j+1
        train_img1 = projected_train_data(:, j * train_pic_num_of_each + 1 : (j + 1) * train_pic_num_of_each);
        train_label1 = ones(1,train_pic_num_of_each) * (j + 1);
        for z = j + 1:people_num - 1
            % Training faces and labels of person z+1
            train_img2 = projected_train_data(:, z * train_pic_num_of_each + 1 : (z + 1) * train_pic_num_of_each);
            train_label2 = ones(1,train_pic_num_of_each) * (z + 1);
            train_imgs = [train_img1, train_img2];
            train_label = [train_label1, train_label2];

            % Normalize each feature to [0,1]; keep ps so test faces get the same scaling
            [train_imgs, ps] = mapminmax(train_imgs, 0, 1);

            % fitcsvm expects one sample per row, but each face here is a column, so transpose
            model_num = model_num + 1;
            models{model_num} = fitcsvm(train_imgs', train_label');
            scale_settings{model_num} = ps;
        end
    end

    % Face recognition
    for k = 1:test_pic_num_of_each * people_num
        test_img = projected_test_data(:,k); % face to recognize
        test_real_label = fix((k - 1) / test_pic_num_of_each) + 1; % its true label
        predict_labels = zeros(1,people_num); % OVO vote counts

        % Each pairwise SVM votes for one of its two people
        for t = 1:model_num
            scaled_img = mapminmax('apply', test_img, scale_settings{t}); % same scaling as in training
            predict_label = predict(models{t}, scaled_img');
            predict_labels(predict_label) = predict_labels(predict_label) + 1;
        end
        [~, index] = max(predict_labels);
        if index == test_real_label
            right_num = right_num + 1;
        end
    end

    recognition_rate = right_num / (test_pic_num_of_each * people_num);
    rate = [rate, recognition_rate];
end

2.2 SVM via OVR

% SVM (OVR)
rate = []; % recognition rate for each dimension
for i = 10:10:160
    right_num = 0;
    % Projection matrix from the first i eigenvectors
    project_matrix = eigen_vectors(:,1:i);
    projected_train_data = project_matrix' * (train_data - all_mean);
    projected_test_data = project_matrix' * (test_data - all_mean);

    % SVM training: one SVM per person, each trained on the whole training set
    models = cell(1, people_num);
    scale_settings = cell(1, people_num);
    for j = 0:people_num - 1
        % Shift columns so that person j+1's faces always come first
        train_imgs = circshift(projected_train_data, -j * train_pic_num_of_each, 2);
        train_label1 = ones(1,train_pic_num_of_each) * (j + 1);            % person j+1
        train_label2 = zeros(1,train_pic_num_of_each * (people_num - 1));  % everyone else
        train_label = [train_label1, train_label2];

        % Normalize each feature to [0,1]; keep ps so test faces get the same scaling
        [train_imgs, ps] = mapminmax(train_imgs, 0, 1);

        % fitcsvm expects one sample per row, but each face here is a column, so transpose
        models{j + 1} = fitcsvm(train_imgs', train_label');
        scale_settings{j + 1} = ps;
    end

    % Face recognition
    for k = 1:test_pic_num_of_each * people_num
        test_img = projected_test_data(:,k); % face to recognize
        test_real_label = fix((k - 1) / test_pic_num_of_each) + 1; % its true label
        scores = zeros(1,people_num); % score of person t from the t-th SVM

        for t = 1:people_num
            scaled_img = mapminmax('apply', test_img, scale_settings{t}); % same scaling as in training
            [~, score] = predict(models{t}, scaled_img');
            % ClassNames are sorted as [0; t], so column 2 is the score of person t
            scores(t) = score(1,2);
        end
        % Pick the person whose SVM puts the face farthest on its positive side
        [~, index] = max(scores);
        if index == test_real_label
            right_num = right_num + 1;
        end
    end
    recognition_rate = right_num / (test_pic_num_of_each * people_num);
    rate = [rate, recognition_rate];
end

3. Recognition rates of PCA and LDA combined with SVM across dimensions and data sets, compared with KNN

Run SVM.m to compare two things:

- On ORL (or another data set), how the recognition rates of SVM + dimensionality reduction and KNN + dimensionality reduction change with dimension. The OVO implementation is used as the SVM example.
- The recognition rates of the methods across data sets.

The results and analysis are as follows:

3.1 SVM vs. KNN on ORL: recognition rate of PCA and LDA versus dimension

3.2 Average recognition rate of each method on different data sets

Tip: The ratio after each data set name is its training/test split ratio.

Analysis: The two sets of figures show that the recognition rate of every method depends strongly on the dimension, the data set and the parameter choices. No combination of methods outperforms all the others on every data set at every dimension.

4. Recognition rates of SVM implemented with OVO and OVR

SVM classifiers were trained with the OVO and OVR strategies and applied to face recognition. The recognition rates across dimensions and data sets are as follows:

4.1 OVO vs. OVR on ORL: recognition rate of SVM + PCA versus dimension

4.2 Average recognition rate of SVM + PCA with OVO and OVR on different data sets

Analysis: The two sets of figures show how OVR and OVO compare when implementing SVM.

- At low dimensions, OVR has a lower recognition rate than OVO.
- As the dimension increases, OVR gradually and consistently outperforms OVO.
- On all three experimental data sets, the average recognition rate of OVR is higher than that of OVO.
- OVR trains fewer classifiers, so it is more efficient on small data sets.
- However, every OVR classifier is trained on the entire training set. As the data grows, OVR's efficiency therefore gradually falls below that of OVO.
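The efficiency argument can be checked with a little arithmetic. The hypothetical helper below counts how many binary SVMs each strategy trains and how many training samples each one sees. It uses ORL's 40 people with 10 images each and assumes a 5/5 train/test split for illustration.

```python
def classifier_budget(people_num, train_per_person):
    """Binary SVMs trained by OVR and OVO, and the training samples each one sees."""
    n_train = people_num * train_per_person
    ovr_models = people_num                          # one "person vs. rest" SVM per person
    ovo_models = people_num * (people_num - 1) // 2  # one SVM per pair of people
    ovr = {"models": ovr_models, "samples_per_model": n_train,
           "total_samples": ovr_models * n_train}
    ovo = {"models": ovo_models, "samples_per_model": 2 * train_per_person,
           "total_samples": ovo_models * 2 * train_per_person}
    return ovr, ovo


# ORL: 40 people with 10 images each; a 5/5 train/test split is assumed for illustration
ovr, ovo = classifier_budget(40, 5)
print(ovr)  # {'models': 40, 'samples_per_model': 200, 'total_samples': 8000}
print(ovo)  # {'models': 780, 'samples_per_model': 10, 'total_samples': 7800}
```

The totals are similar, but each OVR problem is 20 times larger than an OVO one. SVM training time usually grows faster than linearly with the number of samples, which is consistent with OVR falling behind OVO as the data set grows.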

5. Supplementary methods and functions

For multi-class SVM, MATLAB offers the fitcecoc function, and Python (scikit-learn) offers svm.SVC with fit and predict. Both implement multi-class SVM directly, so you do not have to write the loop that trains the individual binary SVMs yourself.
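A hedged illustration of the Python route follows. scikit-learn's SVC trains one binary SVM per class pair internally, and fitcecoc likewise trains several binary learners under the hood, so the loop is hidden rather than avoided. In this sketch:

- MinMaxScaler stands in for mapminmax. It is fit on the training data inside the pipeline and reapplied to test data.
- The toy data are made up.
- decision_function_shape="ovo" exposes the K*(K-1)/2 pairwise decision values.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Three well-separated toy "people", four 2-D feature vectors each (rows are samples)
offsets = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_train = np.vstack([c + offsets for c in centers])
y_train = np.repeat([1, 2, 3], len(offsets))

# The scaler is fit on the training set only and reapplied to new data by the pipeline
clf = make_pipeline(MinMaxScaler(), SVC(kernel="linear", decision_function_shape="ovo"))
clf.fit(X_train, y_train)

X_test = np.array([[0.2, 0.3], [9.8, 0.1], [0.3, 9.9]])
pred = clf.predict(X_test)
votes_shape = clf.decision_function(X_test).shape  # (3 samples, 3*(3-1)/2 pairwise SVMs)
print(pred, votes_shape)
```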

4. Innovative SVM algorithm design

① Shortcomings of classic SVM: it cannot handle nonlinear data, and it is hard to apply to large-scale training sets.

② Existing improvements: soft-margin SVM, kernel SVM, CSVM, etc.

③ Brief description of the innovative SVM algorithm:

(1) Building on SMO, caching kernel function values can greatly improve efficiency. In classic SVM, memory goes mainly into model training, while time goes mainly into repeated calls to kernel functions and other models and functions.

(2) Computing f(x) inside the optimization loop happens constantly and is very costly. Instead, define and cache g(x) = Σ_j α_j y_j K(x_j, x) + b. The error E_i = g(x_i) - y_i of any sample is then available at any time.

(3) Optimize the partition of the data set by separating hot and cold data:

- Hot data (0 < α < C) comes first.
- Cold data (α ≤ 0 or α ≥ C, i.e., α at a bound) comes after.

As the iterations proceed, most steps only need to solve over the hot data, and its size keeps shrinking. After the hot/cold split, SVM therefore spends most of its iterations on a limited set of hot data.

(4) Correct the result with external per-class "weights" to improve the recognition rate. Multiply the original C by the weight of each class to obtain Cp and Cn. During iteration, each sample enters the computation with the C value that matches the sign of its label y.

(5) Support both sparse and dense data, choosing a suitable solver for each.
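Items (1) to (4) can be sketched in Python/NumPy. This is not a full SMO solver and not the author's implementation. It only shows the bookkeeping such a solver would keep:

- a precomputed (linear) kernel matrix as the kernel cache;
- the error cache E_i = g(x_i) - y_i;
- per-sample penalties Cp/Cn from class weights;
- the hot/cold split by whether α is strictly inside (0, C).

The function names, the weights and the toy α values are hypothetical.

```python
import numpy as np


def kernel_cache(X):
    """Item (1): compute the (here linear) kernel matrix once and reuse it."""
    return X @ X.T


def per_sample_C(y, C, weight_pos, weight_neg):
    """Item (4): Cp = C * weight_pos for y = +1 and Cn = C * weight_neg for y = -1."""
    return np.where(y > 0, C * weight_pos, C * weight_neg)


def error_cache(K, alpha, y, b):
    """Item (2): g(x_i) = sum_j alpha_j y_j K(x_j, x_i) + b and E_i = g(x_i) - y_i for every i."""
    g = K @ (alpha * y) + b
    return g - y


def split_hot_cold(alpha, C_i, tol=1e-8):
    """Item (3): hot = non-bound samples (0 < alpha < C_i); cold = samples at a bound."""
    hot = (alpha > tol) & (alpha < C_i - tol)
    return np.flatnonzero(hot), np.flatnonzero(~hot)


# Toy state: x = (+/-1, 0) are the support vectors of the hard-margin solution w = (1, 0), b = 0
X = np.array([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.5, 0.0, 0.5, 0.0])
K = kernel_cache(X)
C_i = per_sample_C(y, C=1.0, weight_pos=2.0, weight_neg=1.0)
E = error_cache(K, alpha, y, b=0.0)
hot, cold = split_hot_cold(alpha, C_i)
print(E, hot, cold)
```

The support vectors have zero error and are the hot samples. A solver using the shrinking idea from item (3) would concentrate its updates there.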

5. Others

1. Datasets and resources

Data sets used in this experiment: ORL5646, AR5040 and FERET_80. The code framework can be applied to multiple data sets.

Commonly used face data sets are linked below (no freeloading, haha):

Link: https://pan.baidu.com/s/12Le0mKEquGMgh5fhNagZGw 
Extraction code: yrnb

Complete SVM code and the data set for the binary classification example: Li Yiru/Yiru's Machine Learning - Gitee.com

2. References

1. Derivation of the Support Vector Machine (SVM) (linear SVM, soft-margin SVM, kernel trick) - liuyang0 - Cnblogs

2. Deriving SVM formulas by hand: hard margin, soft margin, kernel trick - Dominic_S's blog, CSDN

3. Handwritten SVM algorithm (MATLAB implementation) - Zhihu (zhihu.com)

4. SVM (support vector machine) + example demonstration - Snippers' blog, CSDN

5. Zhou Zhihua, "Machine Learning"


Summary

SVM is a classic linear binary classifier that separates data with a hyperplane, taking margin maximization as its objective. It still performs very well in many areas of machine learning, such as data classification, language and image processing, and recommender systems. As a supervised method, it makes use of the data's own label information and therefore preserves that information better.

SVM still has weaknesses:

- Its computational cost is high on high-dimensional, large-scale data.
- A linear SVM struggles to describe nonlinear problems.
- The data assumptions SVM relies on are often hard to satisfy in real-world problems, which affects experimental results.

Follow-up posts will analyze other algorithms that optimize or solve these problems.


Origin blog.csdn.net/weixin_51426083/article/details/124396497