Machine Learning - LDA (Linear Discriminant Analysis) and Face Recognition

    For details of Yiru's complete project/code, please refer to GitHub: https://github.com/yiru1225 (if you repost, please credit the source; please star the projects instead of just copying the code, thanks)

Table of contents

Series Article Directory

1. The concept and principle of LDA

1. Introduction to LDA

2. LDA algorithm model

3. Insufficiency and optimization of LDA

2. LDA is used in face recognition

1. Preprocessing

1.1 Data import and processing

1.2 Calculation of various mean values, inter-class scatter Sb, and intra-class scatter Sw

2. LDA core (construct the objective function and perform eigendecomposition on it)

3. Face recognition

4. LDA and PCA image dimensionality reduction and visualization comparison

5. Summary of similarities and differences between LDA and PCA

6. Other

6.1 Internal function definition

6.2 Datasets and resources

6.3 References

Summarize


Series Article Directory

This series of blogs focuses on the concepts, principles, and code practice of machine learning and does not include tedious mathematical derivations (if you have any questions, please discuss them in the comment area or contact me directly by private message).

The code can be copied in full, but it is far more meaningful for everyone to understand the principle and the process and reproduce it themselves!!!
Chapter 1 Machine Learning - PCA (Principal Component Analysis) and Face Recognition (@李忆如's blog, CSDN)

Chapter 2 LDA and Face Recognition


Synopsis

This blog mainly introduces the LDA (Linear Discriminant Analysis) algorithm, uses LDA and several of its equivalent models for face recognition, image dimensionality reduction, and visualization, and compares LDA with PCA (dataset and MATLAB code attached).


1. The concept and principle of LDA

1. Introduction to LDA

LDA (Linear Discriminant Analysis) is a mainstream linear dimensionality reduction algorithm. Guided by the goal of "minimizing intra-class variance and maximizing inter-class variance", it projects the data into a lower-dimensional subspace so that the samples can be classified more easily.

2. LDA algorithm model

The classic LDA solution can be divided into the following steps (a compact sketch on toy data follows the list):

1. Group the data set by class and calculate the mean of each class (LDA uses the class labels of the samples, so it is supervised learning).

2. Calculate the inter-class scatter matrix Sb and the intra-class scatter matrix Sw.

3. Construct the objective function (a variety of different objective functions) and perform eigendecomposition on it.

4. Take out a certain number of eigenvectors to get the projection matrix.

5. Project the test data into the subspace and use KNN for classification (for the actual recognition problem).
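The compact sketch mentioned above, written in MATLAB for randomly generated toy data (the data, class count, and variable names are illustrative assumptions, not the face data used later; the 1/c and 1/N normalizations of Sb and Sw are omitted here, which only rescales the matrices and does not change the eigenvectors):

% Minimal classic LDA sketch on toy data (3 classes, 10-dimensional samples as columns)
rng(0);
dim = 10; classes = 3; n_per_class = 20;
X = []; labels = [];
for c = 1:classes
    Xc = randn(dim, n_per_class) + 2*c;              % samples of class c
    X = [X Xc]; labels = [labels c*ones(1, n_per_class)];
end

mu_all = mean(X, 2);                                  % overall mean
Sb = zeros(dim); Sw = zeros(dim);
for c = 1:classes
    Xc = X(:, labels == c);
    mu_c = mean(Xc, 2);                               % step 1: class mean
    Sb = Sb + size(Xc,2) * (mu_c - mu_all) * (mu_c - mu_all)';   % step 2: between-class scatter
    Sw = Sw + (Xc - mu_c) * (Xc - mu_c)';             % step 2: within-class scatter
end

target = pinv(Sw) * Sb;                               % step 3: objective function
[V, D] = eig(target);
[~, order] = sort(real(diag(D)), 'descend');          % eigenvalues are real up to round-off
W = real(V(:, order(1:classes-1)));                   % step 4: at most (classes - 1) projection axes
projected = W' * (X - mu_all);                        % step 5 would classify these with KNN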

3. Insufficiency and optimization of LDA

1. Limited number of projection axes (at most the number of classes minus 1)

2. Small sample size problem (Sw becomes singular when there are fewer samples than dimensions)

3. High computational cost when dealing with high-dimensional data

Optimization: preprocess with PCA before applying LDA, or add a regularization perturbation to the objective function

4. Cannot describe nonlinear problems well

Optimization: process the data with a kernelized version of LDA (kernel LDA), as sketched below
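The following two-class kernel Fisher discriminant sketch follows the standard kernelized formulation; the RBF kernel, the toy data, and the regularization constant are illustrative assumptions and are not part of the face-recognition code later in this post:

% Two-class kernel Fisher discriminant (kernel LDA) sketch on toy data
rng(0);
X1 = randn(5, 20);                  % class 1: 20 samples with 5 dimensions (as columns)
X2 = randn(5, 25) + 1.5;            % class 2: 25 samples with a shifted mean
X  = [X1 X2];
n1 = size(X1, 2); n2 = size(X2, 2); n = n1 + n2;

sigma = 1;                                          % RBF kernel width (assumed)
sq = sum(X.^2, 1);
K  = exp(-(sq' + sq - 2*(X'*X)) / (2*sigma^2));     % n-by-n kernel matrix

K1 = K(:, 1:n1); K2 = K(:, n1+1:end);               % kernel columns of each class
M1 = mean(K1, 2); M2 = mean(K2, 2);                 % kernelized class means
N  = K1*(eye(n1) - ones(n1)/n1)*K1' + K2*(eye(n2) - ones(n2)/n2)*K2';   % within-class term
alpha = pinv(N + 1e-6*eye(n)) * (M1 - M2);          % expansion coefficients of the discriminant

projected_train = alpha' * K;                       % 1-D projection of the training samples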

2. LDA is used in face recognition

1. Preprocessing

1.1 Data import and processing

Use imread to import the face database in batches (or directly load the corresponding .mat file). During import, each face is reshaped into a column vector and appended to reshaped_faces; 30% of each person's images are then taken as test data and the remaining 70% as training data. The import is abstracted into a framework so that different data sets can be plugged in (the framework supports the ORL, AR, and FERET data sets).

clear;
% 1. Import of the face data set and data-processing framework
reshaped_faces=[];
% name of the data set to use
database_name = "ORL";

% ORL5646
if (database_name == "ORL")
  for i=1:40    
    for j=1:10       
        a=imread(strcat('C:\Users\hp\Desktop\face\ORL56_46\orl',num2str(i),'_',num2str(j),'.bmp'));
        b = reshape(a,2576,1);
        b=double(b);        
        reshaped_faces=[reshaped_faces, b];  
    end
  end
row = 56; 
column = 46;
people_num = 40;
pic_num_of_each = 10;
train_pic_num_of_each = 7; % number of training images per person
test_pic_num_of_each = 3;  % number of test images per person
end

%AR5040
if (database_name == "AR")
    for i=1:40    
      for j=1:10       
        if(i<10)
           a=imread(strcat('C:\AR_Gray_50by40\AR00',num2str(i),'-',num2str(j),'.tif'));     
        else
            a=imread(strcat('C:\AR_Gray_50by40\AR0',num2str(i),'-',num2str(j),'.tif'));  
        end          
        b = reshape(a,2000,1);
        b=double(b);        
        reshaped_faces=[reshaped_faces, b];  
      end
    end
row = 50;
column = 40;
people_num = 40;
pic_num_of_each = 10;
train_pic_num_of_each = 7;
test_pic_num_of_each = 3;
end

%FERET_80
if (database_name == "FERET")
    for i=1:80    
      for j=1:7       
        a=imread(strcat('C:\Users\hp\Desktop\face\FERET_80\ff',num2str(i),'_',num2str(j),'.tif'));              
        b = reshape(a,6400,1);
        b=double(b);        
        reshaped_faces=[reshaped_faces, b];  
      end
    end
row = 80;
column = 80;
people_num = 80;
pic_num_of_each = 7;
train_pic_num_of_each = 5;
test_pic_num_of_each = 2;
end

% take the first 30% of each person's images as test data and the remaining 70% as training data
test_data_index = [];
train_data_index = [];
for i=0:people_num-1
    test_data_index = [test_data_index pic_num_of_each*i+1:pic_num_of_each*i+test_pic_num_of_each];
    train_data_index = [train_data_index pic_num_of_each*i+test_pic_num_of_each+1:pic_num_of_each*(i+1)];
end

train_data = reshaped_faces(:,train_data_index);
test_data = reshaped_faces(:, test_data_index);
dimension = row * column; % dimension of one face image

1.2 Calculation of the various mean values, the inter-class scatter Sb, and the intra-class scatter Sw

The between-class scatter matrix Sb and the intra-class scatter matrix Sw of the sample are defined as:

Figure 1 Definition of Sb and Sw

Divide the face data set into n classes according to the number of people n, calculate the mean of each class, and then build the corresponding matrices according to the definitions (and the usual derivation) of Sb and Sw.
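In symbols (with c classes, class means \mu_i, overall mean \mu, the training samples X_i of class i, and N training samples in total), the scatter matrices as they are computed in the code below can be written as:

$$S_b = \frac{1}{c}\sum_{i=1}^{c}(\mu_i-\mu)(\mu_i-\mu)^{T}, \qquad S_w = \frac{1}{N}\sum_{i=1}^{c}\sum_{x\in X_i}(x-\mu_i)(x-\mu_i)^{T}$$

The 1/c and 1/N normalizations match the code; many textbooks omit them or weight Sb by class size, which only rescales the matrices and does not change the eigenvectors of the objective function.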

% compute the mean of each class
k = 1; 
class_mean = zeros(dimension, people_num); 
for i=1:people_num
    % mean vector for one person (i.e. one class)
    temp = class_mean(:,i);
    % loop over this person's train_pic_num_of_each training faces and accumulate
    for j=1:train_pic_num_of_each
        temp = temp + train_data(:,k);
        k = k + 1;
    end
    class_mean(:,i) = temp / train_pic_num_of_each;
end

% compute the inter-class scatter matrix Sb
Sb = zeros(dimension, dimension);
all_mean = mean(train_data, 2); % overall mean of all training data
for i=1:people_num
    % use each person's mean face; subtract the overall mean to center it
    centered_data = class_mean(:,i) - all_mean;
    Sb = Sb + centered_data * centered_data';
end
Sb = Sb / people_num;

% compute the intra-class scatter matrix Sw
Sw = zeros(dimension, dimension);
k = 1; % k indexes the current training image
for i=1:people_num % loop over every person
    for j=1:train_pic_num_of_each % loop over all of this person's training faces and accumulate
        centered_data = train_data(:,k) - class_mean(:,i);
        Sw = Sw + centered_data * centered_data';
        k = k + 1;
    end
end
Sw = Sw / (people_num * train_pic_num_of_each);

2. LDA core (construct the objective function and perform eigendecomposition on it)

Tips: This experiment uses pinv (the matrix pseudo-inverse) instead of inv (the matrix inverse) to avoid the problems caused by Sw being singular.
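A tiny sketch of why this matters (toy matrix, illustrative only): when there are fewer training samples than pixel dimensions, Sw is rank-deficient, so inv fails or amplifies noise, while the Moore-Penrose pseudo-inverse is still well defined:

% build an Sw-like matrix from fewer samples (3) than dimensions (5): it has rank 3 and is singular
A = randn(5, 3);
S = A * A';
rank(S)        % 3 < 5, so S is singular and inv(S) is not usable
P = pinv(S);   % the pseudo-inverse still exists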

Each objective function below corresponds to a different equivalent model of LDA (the quotient form, the difference form, the swapped order, etc.) or to the PCA model (the objective functions and the underlying principles can be found in the corresponding blogs); once the objective function is chosen, eigendecomposition is performed on it.

% Objective function 1: classic LDA (pseudo-inverse instead of inverse to handle singular Sw)
target = pinv(Sw) * Sb;

% Objective function 2: add a regularization perturbation when Sw is not invertible
%   Sw = Sw + eye(dimension)*10^-6;
%   target = Sw^-1 * Sb;

% Objective function 3: difference form
% target = Sb - Sw;

% Objective function 4: quotient form
% target = Sb/Sw;

% Objective function 5: swapped order
% target = Sb * pinv(Sw);

%PCA
% centered_face = (train_data - all_mean);
% cov_matrix = centered_face * centered_face';
% target = cov_matrix;

% compute eigenvalues and eigenvectors
[eigen_vectors, diagonal_matrix] = eig(target);
eigen_values = diag(diagonal_matrix);

% sort eigenvalues and eigenvectors in descending order
[sorted_eigen_values, index] = sort(eigen_values, 'descend'); 
eigen_vectors = eigen_vectors(:, index);

3. Face recognition

The dimensionality reduction process of LDA is basically the same as that of PCA. The eigenvectors corresponding to the n largest eigenvalues are taken to build a projection matrix, which reduces the data to n dimensions; KNN is then used for classification to perform face recognition. Finally, the recognition accuracy of the various equivalent LDA models and of PCA is compared on different data sets.

Tips: each single run reports the face recognition rate for the currently selected objective function.

% Face recognition

index = 1;
X = [];
Y = [];
% i is the reduced dimension
for i=1:5:161

    % projection matrix
    project_matrix = eigen_vectors(:,1:i);
    projected_train_data = project_matrix' * (train_data - all_mean);
    projected_test_data = project_matrix' * (test_data - all_mean);

    % k value for KNN
    K=1;

    % matrix holding the K smallest distances found so far
    % matrix holding the person labels corresponding to those K distances
    minimun_k_values = zeros(K,1);
    label_of_minimun_k_values = zeros(K,1);

    % number of test faces
    test_face_number = size(projected_test_data, 2);

    % number of correct predictions
    correct_predict_number = 0;

    % loop over every test face
    for each_test_face_index = 1:test_face_number

        each_test_face = projected_test_data(:,each_test_face_index);

        % fill the K slots first, to avoid repeated checks inside the main loop
        for each_train_face_index = 1:K
            minimun_k_values(each_train_face_index,1) = norm(each_test_face - projected_train_data(:,each_train_face_index));
            label_of_minimun_k_values(each_train_face_index,1) = floor((train_data_index(1,each_train_face_index) - 1) / pic_num_of_each) + 1;
        end

        % find the largest of the K stored distances and its index
        [max_value, index_of_max_value] = max(minimun_k_values);

        % compute the distance to each remaining known (training) face
        for each_train_face_index = K+1:size(projected_train_data,2)

            % compute the distance
            distance = norm(each_test_face - projected_train_data(:,each_train_face_index));

            % whenever a smaller distance is found, update the stored distance and label
            if (distance < max_value)
                minimun_k_values(index_of_max_value,1) = distance;
                label_of_minimun_k_values(index_of_max_value,1) = floor((train_data_index(1,each_train_face_index) - 1) / pic_num_of_each) + 1;
                [max_value, index_of_max_value] = max(minimun_k_values);
            end
        end

        % we now have the K smallest distances and their labels;
        % the most frequent label among them is the predicted face label
        predict_label = mode(label_of_minimun_k_values);
        real_label = floor((test_data_index(1,each_test_face_index) - 1) / pic_num_of_each)+1;

        if (predict_label == real_label)
            correct_predict_number = correct_predict_number + 1;
        end
    end

    correct_rate = correct_predict_number/test_face_number;

    X = [X i];
    Y = [Y correct_rate];

    fprintf("k=%d,i=%d,总测试样本:%d,正确数:%d,正确率:%1f\n", K, i,test_face_number,correct_predict_number,correct_rate);

    if (i == 161)
        waitfor(plot(X,Y));
    end
end

Figure 2 Face recognition rate vs. reduced dimension for classic LDA (using pinv) on ORL

Figure 3 Comparison of the face recognition rates of the LDA variants and PCA on FERET

Analysis: On the larger FERET data set, the recognition rates of the swapped-order LDA, the quotient-form LDA, and PCA fluctuate considerably and are lower than those of the other three models, while regularized LDA and classic LDA perform stably on every data set (the other figures are not shown) and achieve a higher recognition rate than classic PCA. Especially on large data sets, LDA has an obvious advantage over PCA.

4. LDA and PCA image dimensionality reduction and visualization comparison

Use PCA and LDA to reduce the face images to 2 and 3 dimensions, plot the distribution of the first three classes, and show the first picture of each class as its representative image.

Tips: This experiment takes the 2D and 3D visualization of the test set as an example

% 2-D and 3-D visualization
class_num_to_show = 3;
pic_num_in_a_class = pic_num_of_each;
pic_to_show = class_num_to_show * pic_num_in_a_class;
for i=[2 3]

    % take the corresponding number of eigenvectors
    project_matrix = eigen_vectors(:,1:i);

    % project
    projected_test_data = project_matrix' * (reshaped_faces - all_mean);
    projected_test_data = projected_test_data(:,1:pic_to_show);

    color = [];
    for j=1:pic_to_show
        color = [color floor((j-1)/pic_num_in_a_class)*20];
    end

    % display
    if (i == 2)
        subplot(1, 7, [1, 2, 3, 4]);
        scatter(projected_test_data(1, :), projected_test_data(2, :), [], color, 'filled');
        for j=1:3
            subplot(1, 7, j+4);
            fig = show_face(test_data(:,floor((j - 1) * pic_num_in_a_class) + 1), row, column);
        end
        waitfor(fig);
    else
        subplot(1, 7, [1, 2, 3, 4]);
        scatter3(projected_test_data(1, :), projected_test_data(2, :), projected_test_data(3, :), [], color, 'filled');
        for j=1:3
            subplot(1, 7, j+4);
            fig = show_face(test_data(:,floor((j - 1) * pic_num_in_a_class) + 1), row, column);
        end
        waitfor(fig);
    end
end

Figure 4 2D and 3D visualization of LDA under the ORL dataset

Figure 5 2D and 3D visualization of PCA under the ORL dataset

Analysis: Figures 4 and 5 (you can change the objective function or the framework yourself for other data sets and models) show the distribution of the images after dimensionality reduction by LDA and PCA. After LDA, samples of the same class are clearly more clustered and the different classes are more spread out, whereas after PCA the classes are relatively mixed together with no obvious pattern. From the purpose and principle of the two algorithms, this illustrates the difference between PCA and LDA (that is, the difference between eigenfaces and Fisherfaces).

5. Summary of similarities and differences between LDA and PCA

(Comparison table of PCA and LDA, referenced from other authors.)

Analysis: Both PCA and LDA are common linear dimensionality reduction algorithms, but their principles and purposes differ. The core idea of LDA is to "minimize the intra-class variance and maximize the inter-class variance", so as to better classify and identify the data, while the core idea of PCA is to maximize the retained variance (equivalently, minimize the reconstruction error), so as to compress the data and reconstruct its principal components. On small data sets PCA often performs comparably to or even better than LDA, but on large data sets LDA is significantly better than PCA. In addition, LDA uses the label (class) information of the data and is therefore supervised learning, while PCA does not and is unsupervised learning.
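The difference can also be seen in a tiny 2-D sketch (toy data and variable names are illustrative only): PCA picks the direction of largest total spread, while LDA picks the direction that best separates the two classes:

% toy 2-D data: two elongated clouds separated along the y-axis
rng(0);
X1 = randn(100, 2) .* [4 0.5] + [0  1.5];   % class 1 (rows are samples)
X2 = randn(100, 2) .* [4 0.5] + [0 -1.5];   % class 2
X  = [X1; X2];

% PCA direction: eigenvector of the covariance matrix with the largest eigenvalue
[V, D] = eig(cov(X));
[~, i_max] = max(diag(D));
w_pca = V(:, i_max);                         % roughly the x-axis (largest spread)

% LDA direction for two classes: w = pinv(Sw) * (mu1 - mu2)
mu1 = mean(X1)'; mu2 = mean(X2)';
Sw  = cov(X1) + cov(X2);
w_lda = pinv(Sw) * (mu1 - mu2);
w_lda = w_lda / norm(w_lda);                 % roughly the y-axis (separates the classes)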

6. Other

6.1 Internal function definition

In this experiment, the face image display is abstracted into a function, and the function is defined as follows:

% given a face as a column vector, reshape it and display the image
function fig = show_face(vector, row, column)
    fig = imshow(mat2gray(reshape(vector, [row, column])));
end
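For example (assuming the data-import code above has been run, so reshaped_faces, row, and column exist), a single face can be displayed with:

fig = show_face(reshaped_faces(:, 1), row, column);   % show the first face in the data set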

6.2 Datasets and resources

This experiment uses the ORL5646 dataset as a demonstration, and the code can be applied to multiple datasets.

The commonly used face data sets are shared below (please don't just freeload them, haha)

Link: https://pan.baidu.com/s/12Le0mKEquGMgh5fhNagZGw 
Extraction code: yrnb

LDA Complete Code: Li Yiru/Yiru's Machine Learning - Gitee.com

6.3 References

1. Lai Zhihui's class

2. LDA algorithm principle and MATLAB implementation, dulingtingzi's blog (CSDN)

3. LDA-based face recognition method (Fisherface), Zhihu (zhihu.com)

4. "Machine Learning" by Zhou Zhihua


Summarize

As a classic linear dimensionality reduction algorithm, LDA projects the data with the goal of "minimizing intra-class variance and maximizing inter-class variance", achieving dimensionality reduction while making the data easier to classify. It still performs well in many areas of machine learning (data classification, language and image processing, recommendation systems). As a supervised method (it uses the label information of the data), LDA can retain the discriminative information of the data better than unsupervised methods. However, LDA still has the limitations described above: the limited number of projection axes, the small sample problem, the high computational cost on high-dimensional data, and the difficulty of describing nonlinear problems. In addition, LDA was developed under the assumption that the data of each class follows a Gaussian distribution; this property often does not hold in real-world problems, and when it does not, the separability of the classes cannot be described well by the between-class scatter, which affects the experimental results. Subsequent blogs will analyse other algorithms that optimize or solve the above problems.


Source: blog.csdn.net/weixin_51426083/article/details/123885066