For Yiru's complete projects and code, see GitHub: https://github.com/yiru1225 (please credit the source when reprinting).
Series Article Directory
This series of blogs focuses on the concepts, principles, and code practice of machine learning and skips tedious mathematical derivations (if you have questions, discuss them in the comment area or contact me directly by private message).
The code can be copied in full, but reproducing it is most worthwhile once you understand the principle and the process!
Chapter 1 Machine Learning - PCA (Principal Component Analysis) and Face Recognition_@Li Yiru's blog-CSDN blog
Chapter 2 LDA and Face Recognition
Synopsis
This blog introduces the LDA (Linear Discriminant Analysis) algorithm, uses LDA and several of its equivalent models for face recognition, image dimensionality reduction, and visualization, and compares LDA with PCA (data sets and MATLAB code included).
1. The concept and principle of LDA
1. Introduction to LDA
LDA (Linear Discriminant Analysis) is a mainstream linear dimensionality reduction algorithm. Guided by the goal of "minimizing intra-class variance and maximizing inter-class variance", it projects the samples into a lower-dimensional subspace in which they can be classified more easily.
2. LDA algorithm model
Solving the classic LDA problem can be divided into the following steps:
1. Group the data set by class and calculate the mean of each class (LDA uses the class labels of the samples, so it is supervised learning).
2. Calculate the inter-class scatter matrix Sb and the intra-class scatter matrix Sw.
3. Construct the objective function (several different objective functions are possible) and perform eigendecomposition on it.
4. Take a certain number of eigenvectors to form the projection matrix.
5. Project the test data into the subspace and classify it with KNN (in practical problems).
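The five steps above can be sketched on a tiny made-up data set. This is only an illustrative sketch, not the code used in the experiments: the sample values, the new point x_new, and all variable names are assumptions chosen so that the two classes differ mainly in their second coordinate.

```matlab
% Toy data: two classes in 2-D, one sample per column (values are made up)
X1 = [1 2 3 4; 1.0 1.2 0.8 1.0];      % class 1
X2 = [1 2 3 4; 3.0 3.2 2.8 3.0];      % class 2
X = [X1 X2];
labels = [1 1 1 1 2 2 2 2];

% Step 1: class means and overall mean
mu1 = mean(X1, 2);
mu2 = mean(X2, 2);
mu  = mean(X, 2);

% Step 2: inter-class scatter Sb and intra-class scatter Sw
Sb = (mu1 - mu) * (mu1 - mu)' + (mu2 - mu) * (mu2 - mu)';
Sw = (X1 - mu1) * (X1 - mu1)' + (X2 - mu2) * (X2 - mu2)';

% Step 3: eigendecomposition of the classic objective pinv(Sw) * Sb
[V, D] = eig(pinv(Sw) * Sb);
[~, order] = sort(real(diag(D)), 'descend');

% Step 4: keep the leading eigenvector as the projection matrix
W = real(V(:, order(1)));

% Step 5: project a new sample and classify it with 1-NN in the subspace
x_new = [2.5; 2.9];
z_train = W' * (X - mu);
z_new = W' * (x_new - mu);
[~, nearest] = min(abs(z_train - z_new));
predicted_label = labels(nearest);
```

With two classes there is only one useful projection axis, so the data is reduced to a single dimension before the nearest neighbor is looked up.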
3. Insufficiency and optimization of LDA
1. Limited number of projection axes (at most the number of classes - 1)
2. Small-sample-size problem (Sw becomes singular when there are fewer training samples than feature dimensions)
3. High computational cost on high-dimensional data
Optimization: preprocess with PCA before applying LDA, or add a regularization perturbation to the objective function
4. Nonlinear problems are not described well
Optimization: process the data with kernel-based LDA
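The first two limitations can be checked numerically. The sketch below is an illustration under assumed sizes (20 features, 3 classes, 4 random samples per class); rng, randn, repelem, and rank are standard MATLAB functions, everything else is a made-up name. It shows that Sb has rank at most C - 1, that Sw is singular when samples are scarce, and that a small regularization perturbation restores full rank.

```matlab
rng(0);                              % fixed seed so the sketch is repeatable
d = 20;  C = 3;  n_per_class = 4;    % 20 features, 3 classes, 12 samples in total
X = randn(d, C * n_per_class);
labels = repelem(1:C, n_per_class);

mu = mean(X, 2);
Sb = zeros(d);
Sw = zeros(d);
for c = 1:C
    Xc = X(:, labels == c);
    mc = mean(Xc, 2);
    Sb = Sb + (mc - mu) * (mc - mu)';
    Sw = Sw + (Xc - mc) * (Xc - mc)';
end

rank_Sb = rank(Sb);                  % at most C - 1, hence at most C - 1 projection axes
rank_Sw = rank(Sw);                  % at most N - C = 9 < d, so Sw is singular here
Sw_reg = Sw + 1e-6 * eye(d);         % regularization perturbation
rank_Sw_reg = rank(Sw_reg);          % full rank after the perturbation
```

The perturbation size 1e-6 mirrors the value used in the regularized objective later in this post; other values are possible.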
2. LDA is used in face recognition
1. Preprocessing
1.1 Data import and processing
Use imread to import the face database in batches, or load the corresponding mat file directly. During import, each face is flattened into a column vector and appended to reshaped_faces. About 30% of each person's images are taken as test data and the remaining 70% as training data. This step is abstracted into a framework that can import different data sets (the experimental framework supports the ORL, AR, and FERET data sets).
clear;
% 1. Framework for importing and preprocessing the face data set
reshaped_faces = [];
% Name of the database to load
database_name = "ORL";
% ORL (56x46)
if (database_name == "ORL")
    for i = 1:40
        for j = 1:10
            % ORL file names are not zero-padded, so one pattern covers every i
            a = imread(strcat('C:\Users\hp\Desktop\face\ORL56_46\orl', num2str(i), '_', num2str(j), '.bmp'));
            b = reshape(a, 2576, 1);
            b = double(b);
            reshaped_faces = [reshaped_faces, b];
        end
    end
    row = 56;
    column = 46;
    people_num = 40;
    pic_num_of_each = 10;
    train_pic_num_of_each = 7; % training images per person
    test_pic_num_of_each = 3;  % test images per person
end
% AR (50x40)
if (database_name == "AR")
    for i = 1:40
        for j = 1:10
            % AR file names pad the person index to three digits
            if (i < 10)
                a = imread(strcat('C:\AR_Gray_50by40\AR00', num2str(i), '-', num2str(j), '.tif'));
            else
                a = imread(strcat('C:\AR_Gray_50by40\AR0', num2str(i), '-', num2str(j), '.tif'));
            end
            b = reshape(a, 2000, 1);
            b = double(b);
            reshaped_faces = [reshaped_faces, b];
        end
    end
    row = 50;
    column = 40;
    people_num = 40;
    pic_num_of_each = 10;
    train_pic_num_of_each = 7;
    test_pic_num_of_each = 3;
end
% FERET (80x80)
if (database_name == "FERET")
    for i = 1:80
        for j = 1:7
            a = imread(strcat('C:\Users\hp\Desktop\face\FERET_80\ff', num2str(i), '_', num2str(j), '.tif'));
            b = reshape(a, 6400, 1);
            b = double(b);
            reshaped_faces = [reshaped_faces, b];
        end
    end
    row = 80;
    column = 80;
    people_num = 80;
    pic_num_of_each = 7;
    train_pic_num_of_each = 5;
    test_pic_num_of_each = 2;
end
% Take roughly the first 30% of each person's images as test data
% and the remaining 70% as training data
test_data_index = [];
train_data_index = [];
for i = 0:people_num-1
    test_data_index = [test_data_index pic_num_of_each*i+1:pic_num_of_each*i+test_pic_num_of_each];
    train_data_index = [train_data_index pic_num_of_each*i+test_pic_num_of_each+1:pic_num_of_each*(i+1)];
end
train_data = reshaped_faces(:, train_data_index);
test_data = reshaped_faces(:, test_data_index);
dimension = row * column; % dimension of one face
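The split above can also be written without a loop, and it is often convenient to keep an explicit label vector for each column. The following is a sketch under the ORL settings (40 people, 10 images each, first 3 for testing); train_labels, test_labels, and offsets are hypothetical names that do not appear in the original code, and the implicit expansion it relies on needs MATLAB R2016b or later.

```matlab
people_num = 40;
pic_num_of_each = 10;
test_pic_num_of_each = 3;
train_pic_num_of_each = pic_num_of_each - test_pic_num_of_each;

% Column offset of each person's first image in reshaped_faces
offsets = (0:people_num-1) * pic_num_of_each;

% The first test_pic_num_of_each images of each person form the test set,
% the remaining images form the training set
test_data_index = reshape((1:test_pic_num_of_each)' + offsets, 1, []);
train_data_index = reshape((test_pic_num_of_each+1:pic_num_of_each)' + offsets, 1, []);

% Person label of every training and test column
train_labels = repelem(1:people_num, train_pic_num_of_each);
test_labels = repelem(1:people_num, test_pic_num_of_each);
```

The label vectors agree with the expression floor((index - 1) / pic_num_of_each) + 1 that the recognition code uses to recover a label from a column index.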
1.2 Calculation of various mean values, inter-class scatter Sb, and intra-class scatter Sw
The inter-class scatter matrix Sb and the intra-class scatter matrix Sw of the samples are defined as:
Figure 1 Definition of Sb and Sw
Divide the face data set into n classes according to the number of people n, calculate the mean of each class, and obtain the corresponding matrices from the definitions of Sb and Sw.
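For reference, the quantities computed by the code in this section can be written as follows. This is a reading of the code rather than a reproduction of Figure 1: C stands for people_num, n for train_pic_num_of_each, x_{ij} for the j-th training image of person i, \mu_i for class_mean(:,i), and \mu for all_mean. Textbook definitions may use a different normalization (for example weighting each class by its size), which only changes the scale when all classes have the same number of samples.

```latex
\mu_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}, \qquad
\mu = \frac{1}{C\,n}\sum_{i=1}^{C}\sum_{j=1}^{n} x_{ij}

S_b = \frac{1}{C}\sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^{T}, \qquad
S_w = \frac{1}{C\,n}\sum_{i=1}^{C}\sum_{j=1}^{n} (x_{ij} - \mu_i)(x_{ij} - \mu_i)^{T}
```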
% Compute the mean of each class
k = 1;
class_mean = zeros(dimension, people_num);
for i = 1:people_num
    % Mean of one column (i.e. one person)
    temp = class_mean(:,i);
    % Sum this person's train_pic_num_of_each training faces, then average
    for j = 1:train_pic_num_of_each
        temp = temp + train_data(:,k);
        k = k + 1;
    end
    class_mean(:,i) = temp / train_pic_num_of_each;
end
% Compute the inter-class scatter matrix Sb
Sb = zeros(dimension, dimension);
all_mean = mean(train_data, 2); % mean of all training faces
for i = 1:people_num
    % Center each person's mean face by subtracting the overall mean
    centered_data = class_mean(:,i) - all_mean;
    Sb = Sb + centered_data * centered_data';
end
Sb = Sb / people_num;
% Compute the intra-class scatter matrix Sw
Sw = zeros(dimension, dimension);
k = 1; % k indexes each training image
for i = 1:people_num % loop over every person
    for j = 1:train_pic_num_of_each % accumulate over all faces of one person
        centered_data = train_data(:,k) - class_mean(:,i);
        Sw = Sw + centered_data * centered_data';
        k = k + 1;
    end
end
Sw = Sw / (people_num * train_pic_num_of_each);
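The loops above can be replaced by matrix products, which is usually faster in MATLAB. The sketch below is an equivalent vectorized formulation on small random stand-in data (the sizes and the labels vector are assumptions for illustration); it relies on the training columns being grouped person by person, as they are in train_data.

```matlab
rng(1);
dimension = 6;
people_num = 4;
train_pic_num_of_each = 3;
train_data = randn(dimension, people_num * train_pic_num_of_each);
labels = repelem(1:people_num, train_pic_num_of_each);

% Class means: average each consecutive group of train_pic_num_of_each columns
grouped = reshape(train_data, dimension, train_pic_num_of_each, people_num);
class_mean = squeeze(mean(grouped, 2));          % dimension x people_num
all_mean = mean(train_data, 2);

% Inter-class scatter from the centered class means
centered_means = class_mean - all_mean;
Sb = (centered_means * centered_means') / people_num;

% Intra-class scatter from samples centered on their own class mean
centered_samples = train_data - class_mean(:, labels);
Sw = (centered_samples * centered_samples') / size(train_data, 2);
```

Both matrices come out symmetric by construction, which matters for the eigendecomposition in the next step.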
2. LDA core (construct the objective function and perform eigendecomposition on it)
Tips: This experiment uses pinv (matrix pseudo-inverse) instead of inv (matrix inverse), because Sw can be singular or nearly singular and a plain inverse would then distort the results.
Each objective function corresponds to a different equivalent LDA model (division, subtraction, swapped order, and so on) or to the PCA model (the objective functions and their principles are covered in the corresponding blog posts). Once the objective function is chosen, eigendecomposition is performed on it.
% Objective 1: classic LDA (pseudo-inverse instead of inverse, since Sw may be singular)
target = pinv(Sw) * Sb;
% Objective 2: regularization perturbation for the non-invertible case
% Sw = Sw + eye(dimension)*1e-6;
% target = Sw \ Sb;
% Objective 3: subtraction form
% target = Sb - Sw;
% Objective 4: division form
% target = Sb/Sw;
% Objective 5: swapped order
% target = Sb * pinv(Sw);
% PCA
% centered_face = (train_data - all_mean);
% cov_matrix = centered_face * centered_face';
% target = cov_matrix;
% Compute eigenvalues and eigenvectors
[eigen_vectors, diagonal_matrix] = eig(target);
eigen_values = diag(diagonal_matrix);
% Sort the eigenvalues in descending order and reorder the eigenvectors to match
[sorted_eigen_values, index] = sort(eigen_values, 'descend');
eigen_vectors = eigen_vectors(:, index);
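A product such as pinv(Sw) * Sb is generally not symmetric, so eig can return eigenvalues and eigenvectors with small imaginary parts. One alternative, which is not the formulation used in this post, is to solve the generalized eigenproblem Sb*v = lambda*Sw*v directly with the two-argument form eig(Sb, Sw). The sketch below assumes Sw is symmetric positive definite (for example after the regularization perturbation) and uses random stand-in matrices in place of the real Sb and Sw.

```matlab
rng(2);
d = 5;
A = randn(d, 3);
Sb = A * A';                          % symmetric, rank-3 stand-in for Sb
B = randn(d, d + 5);
Sw = B * B' + 1e-6 * eye(d);          % symmetric positive definite stand-in for Sw

% Symmetrize to guard against round-off before calling eig
Sb = (Sb + Sb') / 2;
Sw = (Sw + Sw') / 2;

% Generalized eigenproblem Sb*v = lambda*Sw*v
[V, D] = eig(Sb, Sw);
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);

% Eigenvalues of the product form, for comparison
lambda_product = sort(real(eig(Sw \ Sb)), 'descend');
```

The columns of V can then be used as the projection matrix in the same way as eigen_vectors.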
3. Face recognition
Dimensionality reduction with LDA works much like it does with PCA. The eigenvectors are sorted by eigenvalue, and those belonging to the n largest eigenvalues form the projection matrix, which reduces the data to n dimensions. KNN is then used for classification to perform face recognition, and the recognition accuracy of PCA and of the various equivalent and regularized LDA models is compared on different data sets.
Tips: A single run is the face recognition rate corresponding to the selected objective function.
% Face recognition
X = [];
Y = [];
% i is the number of dimensions kept after reduction
for i = 1:5:161
    % Projection matrix
    project_matrix = eigen_vectors(:,1:i);
    projected_train_data = project_matrix' * (train_data - all_mean);
    projected_test_data = project_matrix' * (test_data - all_mean);
    % k of KNN
    K = 1;
    % Buffers for the k smallest distances and the labels that belong to them
    minimum_k_values = zeros(K,1);
    label_of_minimum_k_values = zeros(K,1);
    % Number of test faces
    test_face_number = size(projected_test_data, 2);
    % Number of correct predictions
    correct_predict_number = 0;
    % Loop over every test face
    for each_test_face_index = 1:test_face_number
        each_test_face = projected_test_data(:,each_test_face_index);
        % Fill the k slots first to avoid repeated checks inside the main loop
        for each_train_face_index = 1:K
            minimum_k_values(each_train_face_index,1) = norm(each_test_face - projected_train_data(:,each_train_face_index));
            label_of_minimum_k_values(each_train_face_index,1) = floor((train_data_index(1,each_train_face_index) - 1) / pic_num_of_each) + 1;
        end
        % Largest of the k stored distances and its position
        [max_value, index_of_max_value] = max(minimum_k_values);
        % Distance to each remaining training face
        for each_train_face_index = K+1:size(projected_train_data,2)
            distance = norm(each_test_face - projected_train_data(:,each_train_face_index));
            % On a smaller distance, replace the current largest entry and its label
            if (distance < max_value)
                minimum_k_values(index_of_max_value,1) = distance;
                label_of_minimum_k_values(index_of_max_value,1) = floor((train_data_index(1,each_train_face_index) - 1) / pic_num_of_each) + 1;
                [max_value, index_of_max_value] = max(minimum_k_values);
            end
        end
        % The buffers now hold the k nearest neighbors;
        % the most frequent label is the predicted identity
        predict_label = mode(label_of_minimum_k_values);
        real_label = floor((test_data_index(1,each_test_face_index) - 1) / pic_num_of_each) + 1;
        if (predict_label == real_label)
            correct_predict_number = correct_predict_number + 1;
        end
    end
    correct_rate = correct_predict_number / test_face_number;
    X = [X i];
    Y = [Y correct_rate];
    fprintf("k=%d, i=%d, test samples: %d, correct: %d, accuracy: %f\n", K, i, test_face_number, correct_predict_number, correct_rate);
    if (i == 161)
        waitfor(plot(X,Y));
    end
end
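For K = 1, the nearest-neighbor search can also be written with a single distance matrix instead of nested loops. This is a compact sketch on made-up 2-D points, not a replacement tested on the face data; train_labels and test_labels are hypothetical label vectors, and the implicit expansion requires MATLAB R2016b or later.

```matlab
% Made-up projected data: one sample per column
projected_train_data = [0 0 5 5; 0 1 5 6];
train_labels = [1 1 2 2];
projected_test_data = [0.2 4.8; 0.4 5.4];
test_labels = [1 2];

% Squared Euclidean distance between every training column (rows of sq_dist)
% and every test column (columns of sq_dist)
sq_dist = sum(projected_train_data.^2, 1)' + sum(projected_test_data.^2, 1) ...
    - 2 * (projected_train_data' * projected_test_data);

% 1-NN: each test sample takes the label of its closest training sample
[~, nearest] = min(sq_dist, [], 1);
predicted_labels = train_labels(nearest);
correct_rate = mean(predicted_labels == test_labels);
```

Squared distances are enough here because the ordering of neighbors is the same as with norm.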
Figure 2 The relationship between the face recognition rate and the dimension of the classic LDA improved by pinv under ORL
Figure 3 Comparison of the face recognition rate between each model of LDA and PCA in FERET
Analysis: On the larger FERET data set, the recognition rates of swapped-order LDA, division LDA, and PCA fluctuate strongly and are lower than those of the other three models. Regularized LDA and classic LDA are relatively stable on every data set (the other figures are not shown), and their recognition rate is higher than that of classic PCA. On large data sets in particular, LDA has a clear advantage over PCA.
4. LDA and PCA image dimensionality reduction and visualization comparison
Use PCA and LDA to reduce the face images to 2 and 3 dimensions, plot the distribution of the first three classes, and show the first image of each class as its representative.
Tips: The code below plots all images of the first three classes in 2D and 3D.
% 2D and 3D visualization
class_num_to_show = 3;
pic_num_in_a_class = pic_num_of_each;
pic_to_show = class_num_to_show * pic_num_in_a_class;
for i = [2 3]
    % Take the required number of eigenvectors
    project_matrix = eigen_vectors(:,1:i);
    % Project all faces, then keep the images of the first three classes
    projected_data = project_matrix' * (reshaped_faces - all_mean);
    projected_data = projected_data(:,1:pic_to_show);
    % One color value per class
    color = [];
    for j = 1:pic_to_show
        color = [color floor((j-1)/pic_num_in_a_class)*20];
    end
    % Display
    if (i == 2)
        subplot(1, 7, [1, 2, 3, 4]);
        scatter(projected_data(1, :), projected_data(2, :), [], color, 'filled');
        for j = 1:3
            subplot(1, 7, j+4);
            % First image of class j, indexed in reshaped_faces to match the scatter plot
            fig = show_face(reshaped_faces(:,(j - 1) * pic_num_in_a_class + 1), row, column);
        end
        waitfor(fig);
    else
        subplot(1, 7, [1, 2, 3, 4]);
        scatter3(projected_data(1, :), projected_data(2, :), projected_data(3, :), [], color, 'filled');
        for j = 1:3
            subplot(1, 7, j+4);
            fig = show_face(reshaped_faces(:,(j - 1) * pic_num_in_a_class + 1), row, column);
        end
        waitfor(fig);
    end
end
Figure 4 2D and 3D visualization of LDA under the ORL dataset
Figure 5 Two-dimensional and three-dimensional visualization of PCA under the ORL dataset
Analysis: Figures 4 and 5 show the distribution of the images after dimensionality reduction by LDA and PCA (for other data sets and models, change the objective function or the framework yourself). With LDA, images of the same class cluster together and different classes are more scattered. With PCA, the classes are relatively mixed and show no obvious pattern. Given the purpose and principle of the two algorithms, this illustrates the difference between PCA and LDA (that is, the difference between eigenfaces and fisherfaces).
5. Summary of similarities and differences between LDA and PCA
Analysis: Both PCA and LDA are common linear dimensionality reduction algorithms, but their principles and purposes differ. The core idea of LDA is to "minimize the intra-class variance and maximize the inter-class variance", so that the data can be classified and recognized more easily. The core idea of PCA is to maximize the variance of the projected data (equivalently, to minimize the reconstruction error), so as to achieve data compression and principal component reconstruction. Generally, on small data sets PCA performs similarly to or even better than LDA, but on large data sets LDA is significantly better than PCA. In addition, LDA uses the label (class) information of the data and is therefore supervised learning, while PCA does not use labels and is unsupervised learning.
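The difference in purpose can be seen on a small made-up example. In the sketch below (all values and names are assumptions for illustration), two classes are stretched along the first coordinate but separated along the second. PCA ignores the labels and picks the direction of largest variance, while LDA picks the direction that separates the classes.

```matlab
% Two made-up classes: long along x, separated along y
x = -3:3;
wiggle = 0.1 * (-1).^(1:7);
X1 = [x; zeros(1, 7) + wiggle];      % class 1, y close to 0
X2 = [x; ones(1, 7) + wiggle];       % class 2, y close to 1
X = [X1 X2];
mu = mean(X, 2);
mu1 = mean(X1, 2);
mu2 = mean(X2, 2);

% PCA: leading eigenvector of the total scatter (labels are ignored)
St = (X - mu) * (X - mu)';
[Vp, Dp] = eig(St);
[~, ip] = max(diag(Dp));
w_pca = Vp(:, ip);

% LDA: leading eigenvector of pinv(Sw) * Sb (labels are used)
Sb = (mu1 - mu) * (mu1 - mu)' + (mu2 - mu) * (mu2 - mu)';
Sw = (X1 - mu1) * (X1 - mu1)' + (X2 - mu2) * (X2 - mu2)';
[Vl, Dl] = eig(pinv(Sw) * Sb);
[~, il] = max(real(diag(Dl)));
w_lda = real(Vl(:, il));
```

Here w_pca points along the first coordinate, where the two classes overlap completely, and w_lda points along the second coordinate, where they are separated.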
6. Other
6.1 Internal function definition
In this experiment, the face image display is abstracted into a function, and the function is defined as follows:
% Take a face vector as input and display it as an image
function fig = show_face(vector, row, column)
    fig = imshow(mat2gray(reshape(vector, [row, column])));
end
6.2 Datasets and resources
This experiment uses the ORL5646 dataset as a demonstration, and the code can be applied to multiple datasets.
The commonly used face data sets can be downloaded here (please don't just freeload, haha)
Link: https://pan.baidu.com/s/12Le0mKEquGMgh5fhNagZGw
Extraction code: yrnb
LDA Complete Code: Li Yiru/Yiru's Machine Learning - Gitee.com
Summary
As a classic linear dimensionality reduction algorithm, LDA projects the data with the goal of "minimizing intra-class variance and maximizing inter-class variance", which reduces the dimension and makes classification easier. It still performs well in many fields of machine learning (data classification, language and image processing, recommendation systems). As a supervised learning method (it uses the label information of the data), LDA also retains class information better. However, LDA still suffers from the problems described above: a limited number of projection axes, the small-sample-size problem, high computational cost on high-dimensional data, and difficulty describing nonlinear problems. In addition, LDA was developed under the assumption that the data of each class follow a Gaussian distribution, a property that is often absent in real-world problems. Without it, the separability of different classes cannot be described well by the inter-class scatter, which affects the experimental results. Subsequent blogs will analyze other algorithms that optimize or solve the problems above.