Machine Learning - LR (Linear Regression), LRC (Linear Regression Classification) and Face Recognition

    For Yiru's complete project/code, see GitHub: https://github.com/yiru1225 (if reprinting, please credit the source; starring the projects is appreciated, thanks)

Table of contents

Series Article Directory

1. The concept and principle of LR and the prediction of LR for simple data

1. Introduction to LR

2. LR algorithm model

3. LR is used for prediction of simple data

2. Introduction and algorithm flow of LRC

1. Introduction to LRC (Linear Regression Classification)

2. LRC algorithm process

3. LRC and its various optimization models are used for face recognition

1. Data import and processing

2. Face recognition and classification

2.1. Classical linear regression classification for face recognition

2.2. Ridge regression for face recognition

2.3. Lasso regression for face recognition

2.4. Block LRC for face recognition

2.5 Regression part code integration

3. Comparison of multiple regressions in face recognition

4. New linear regression design, practice, comparison

5. Other

5.1 Internal function definition

5.2 Datasets and resources

5.3 References

Summary


Series Article Directory

This series of blogs focuses on the concepts, principles, and code practice of machine learning, and omits tedious mathematical derivations (if you have questions, please raise them in the comment area or contact me directly by private message).

The code can be copied in full, but it is far more meaningful to understand the principles and the process before reproducing it!
Chapter 1 Machine Learning - PCA (Principal Component Analysis) and Face Recognition_@李忆如的博客-CSDN博客

Chapter 2  Machine Learning - LDA (Linear Discriminant Analysis) and Face Recognition_@李忆如的博客-CSDN博客

Chapter 3 LR (Linear Regression), LRC (Linear Regression Classification) and Face Recognition


Synopsis

This blog introduces LR (linear regression) and LRC (linear regression classification), applies LR to simple data prediction, applies LRC and several of its optimized variants (ridge regression, lasso regression, block LRC) to face recognition, and designs a new linear regression algorithm for face recognition, comparing it with classic LR (dataset and MATLAB code attached).


1. The concept and principle of LR and the prediction of LR for simple data

1. Introduction to LR

Regression analysis is a predictive modeling technique that studies the relationship between independent and dependent variables. LR (Linear Regression) is the most basic regression algorithm: it fits roughly linear data with a line (or hyperplane) so as to minimize the loss, and uses the fitted model to predict new data.

For the derivation of the least squares method, see: Optimization method - least squares method and gradient descent method

2. LR algorithm model

The classic LR problem solving can be divided into the following steps

① Import the data set to filter and control variables

② Do scatter plot and correlation analysis on the data with normal distribution

③ Determine the parameters by minimizing the loss function, and get (fit) the regression equation (using the normal equation to calculate w or least squares or gradient descent)

④ Continuously test the model, optimize parameters, and obtain the optimal regression equation

⑤ Use the regression equation to make predictions

Figure 1 Detailed process of linear regression analysis 
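As a minimal MATLAB sketch of step ③ (the variable names x, y, w, y_hat are illustrative, not from the original code), the normal-equation fit with an intercept term:

X = [ones(numel(x),1), x(:)]; % design matrix with an intercept column
w = (X' * X) \ (X' * y(:));   % normal equation: w = inv(X'X) * X'y
y_hat = X * w;                % fitted values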

Tips: For the mathematical principles of linear regression, the least squares method, gradient descent, and the derivation of the normal equation, see: Machine Learning Algorithms - Linear Regression (Super Detailed and Popular)_A Serious Vegetable Dog's Blog-CSDN Blog_Linear Regression

3. LR is used for prediction of simple data

① Problem description: Explore the relationship between student grades and student learning time

② Linear regression implementation: take study time as the variable and grade as the predicted value, build the regression equation by minimizing the loss with the least squares method, verify the fitted equation, and then use it for prediction.

③ The core code is as follows:

x=[23.80,27.60,31.60,32.40,33.70,34.90,43.20,52.80,63.80,73.40];
y=[41.4,51.8,61.70,67.90,68.70,77.50,95.90,137.40,155.0,175.0];
figure
plot(x,y,'r*') % scatter plot of the raw data
xlabel('x (study time)','fontsize',12)
ylabel('y (student grade)','fontsize',12)
set(gca,'linewidth',2)
% least-squares fit
Lxx=sum((x-mean(x)).^2);
Lxy=sum((x-mean(x)).*(y-mean(y)));
b1=Lxy/Lxx;
b0=mean(y)-b1*mean(x);
y1=b1*x+b0; % fitted linear equation, used for prediction
hold on
plot(x,y1,'linewidth',2);
m2=LinearModel.fit(x,y); % built-in linear regression as a cross-check (fitlm in newer MATLAB)

After data fitting, the regression model is plotted as follows:

 Figure 2 Regression equation fitted by LR for simple data prediction

Analysis: the model fits the data well and the relationship is close to linear. To predict a value not shown in the graph, simply substitute the corresponding study time as x into the regression model (equation).
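For instance, predicting the grade for an unseen study time (the value 50 is purely illustrative) only needs the fitted coefficients:

x_new = 50;               % illustrative study time
y_pred = b1 * x_new + b0; % predicted grade from the regression equation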

2. Introduction and algorithm flow of LRC

1. Introduction to LRC (Linear Regression Classification)

LRC applies LR to classification tasks (e.g., face recognition: deciding which person a given face belongs to, where each person is one class).

2. LRC algorithm process

① Import and classify the dataset

② Read each class's data, and split it into a training set and a test set for the per-class linear regression

③ Use the normal equation to calculate w (or least squares or gradient descent), and determine the parameters by minimizing the loss function to obtain (fit) the regression equation

Tips: here y is the single probe sample to be classified

④ Use the regression equation to predict data

⑤ Calculate the distance (loss) between the predicted data and the real data, and output the class with the minimum distance as the predicted class
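Putting steps ②-⑤ together, a minimal LRC sketch (assumptions: class_train{c} is the training matrix of class c with faces as columns, and y is the probe face as a column vector; neither name is from the original code):

% Minimal LRC sketch under the assumptions stated above
best_dist = inf; predicted_class = 0;
for c = 1:numel(class_train)
    Xc = class_train{c};
    w = (Xc' * Xc) \ (Xc' * y); % per-class normal-equation fit (step ③)
    d = norm(y - Xc * w);       % reconstruction error (step ⑤)
    if d < best_dist
        best_dist = d;
        predicted_class = c;    % class with the minimum distance
    end
end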

3. LRC and its various optimization models are used for face recognition

1. Data import and processing

Use imread to import the face database in batches (or load the corresponding .mat file directly), pulling each face into a column vector of reshaped_faces during import. Take n% of each class as test data and the remaining (100-n)% as training data. This step is abstracted into a framework that can import different datasets (the experimental framework supports the ORL, AR, and FERET datasets).

Tips: The code can be seen in the author's previous article (LDA and face recognition), which is basically the same.
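For convenience, a minimal sketch of that import framework (the path and loop bounds are illustrative; the real code and dataset-specific details are in the LDA article):

% Minimal import sketch; path and counts are illustrative
reshaped_faces = [];
for i = 1:people_num
    for j = 1:pic_num_of_each
        img = imread(sprintf('C:\\data\\orl%d_%d.bmp', i, j)); % illustrative path
        reshaped_faces = [reshaped_faces, reshape(double(img), [], 1)]; % face as a column vector
    end
end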

2. Face recognition and classification

Tips: Sections 2.1-2.4 describe the algorithm flow, core code, and analysis of LRC and its optimized variants; see Section 2.5 for the integrated code.

2.1. Classical linear regression classification for face recognition

① Algorithm flow: similar to the LRC flow in Part 2 above. Import and classify the dataset, split each class into training and test sets, run a linear regression per class using the normal equation to compute w, predict the probe image, compute the distance between the predicted and real data, take the class with the minimum distance as the prediction, and compare it with the test label to check whether the classification is correct and accumulate the accuracy.

② Core code:

 % 1. Linear regression (excerpt; the enclosing loops appear in full in Section 2.5)
   w = inv(train_data' * train_data) * train_data' * totest; % normal equation (A\b is numerically preferable to inv)
   img_predict = train_data * w; % predicted (reconstructed) image
 % Classification / prediction (the same pattern for every regression variant)
 % show_face(img_predict,row,column); % display the predicted face
   dis = img_predict - totest; % residual
   distest = [distest,norm(dis)]; % Euclidean distance for this class
   % the class with the smallest reconstruction error is output as the prediction
 end % end of the loop over candidate classes
 [min_dis,label_index] = min(distest); % index of the smallest Euclidean distance (predicted class)
 if label_index == totest_index
     count_right = count_right + 1;
 else
     fprintf("Misclassified: %d\n" ,(i + 1) * (k - train_num_each));
 end
 end % end of the loop over the probes of class i

③ Analysis: the different methods are distinguished by how w (the loss-minimizing parameters) is computed and how the predicted image is formed. Ridge regression and lasso regression are introduced to handle overfitting and singular (non-invertible) matrices; block LRC is introduced to handle occluded data.

2.2. Ridge regression for face recognition

① Algorithm core: add L2 regularization to classical linear regression (yielding a dense solution)
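For reference, ridge regression solves the objective below; its closed form is exactly what the core code computes (with λ = 10⁻⁶ there):

min_w ||y - Xw||² + λ||w||²   →   w = inv(X'X + λI) * X'y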

② Core code:

  % 2. Ridge regression
        rr_data = (train_data' * train_data) + eye(train_num_each)*10^-6; % regularized Gram matrix
        w = inv(rr_data) * train_data' * totest;
        img_predict = train_data * w; % predicted image

③ Analysis: the regularization perturbation effectively avoids severe overfitting and multicollinearity between variables, and guarantees that the matrix being inverted is non-singular.

2.3. Lasso regression for face recognition

① Algorithm core: add L1 regularization to classical linear regression (yielding a sparse solution)
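For reference, lasso solves the objective below; unlike ridge it has no closed form, so MATLAB's lasso solves it numerically:

min_w ||y - Xw||² + λ||w||₁   (||w||₁ is the sum of the absolute coefficients)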

② Core code:

         % 3. Lasso regression
        [B,FitInfo] = lasso(train_data , totest); % note: B has one column per lambda value tried
        img_predict = train_data * B + FitInfo.Intercept;

Tips: The lasso regression from the MATLAB library is used here. For a more detailed implementation and analysis of a hand-written lasso regression, see: Modeling Algorithm Series Nineteen: Lasso Regression Derivation with MATLAB Source Code - Zhihu (zhihu.com)
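Since MATLAB's lasso returns one column of B per regularization value, a sketch of pinning down a single solution (the Lambda value 1e-3 is an illustrative assumption, not a tuned choice) might look like:

[B,FitInfo] = lasso(train_data, totest, 'Lambda', 1e-3); % single lambda, so B is one column
img_predict = train_data * B + FitInfo.Intercept;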

③ Analysis: generally, Lasso regression is used for high-dimensional feature data, especially when the linear relationship is sparse; it is also the first choice for picking out the main features from a large set of features.

2.4. Block LRC for face recognition

① Algorithm core: divide each image in the database into M blocks (four blocks in this experiment), run LRC on each block, take the minimum over the M per-block distances min_dis of the same image, and use the class of that block as the prediction (alternatively, a voting scheme can select the class predicted most often across the M blocks); a sketch of this fusion step follows the analysis below.

② Core code:

%% Blocking during data import (the blocking scheme strongly affects results;
%% here each face is split into four equal blocks)
for i=1:40
    for j=1:10
        a = imread(strcat('C:\Users\hp\Desktop\face\ORL56_46\orl',num2str(i),'_',num2str(j),'.bmp'));
        a = double(a);
        a = mat2cell(a,[row/2,row/2],[column/2,column/2]); % split into 2x2 blocks
        % pull each block into a column vector
        b1 = reshape(double(a{1}),row * column / 4,1);
        b2 = reshape(double(a{2}),row * column / 4,1);
        b3 = reshape(double(a{3}),row * column / 4,1);
        b4 = reshape(double(a{4}),row * column / 4,1);
        reshaped_faces=[reshaped_faces, b1,b2,b3,b4];
    end
end

③ Analysis: Block linear regression is an effective method for dealing with occluded image recognition.
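A minimal sketch of the min-fusion step described in ① (assumption: dist_blocks is a num_classes-by-M matrix of per-block LRC distances for one probe; the name is illustrative, not from the original code):

[min_per_block, class_per_block] = min(dist_blocks); % best class for each block
[~, best_block] = min(min_per_block);                % block with the smallest distance overall
predicted_class = class_per_block(best_block);       % its class is the prediction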

2.5 Regression part code integration

% Regression procedure
dimension = row * column;
count_right = 0;

for i = 0:1:people_num - 1
    totest_index = i + 1; % label of the current class
    % classify every test image of class i
    for k = train_num_each + 1:1:pic_num_of_each
       totest = reshaped_faces(:,i*pic_num_of_each + k); % one face to be recognized (classified)
       distest = []; % distances to each class
     for j = 0:1:people_num - 1
       batch_faces = reshaped_faces(:,j * pic_num_of_each + 1 :j * pic_num_of_each + pic_num_of_each); % all images of class j
       % split training and test sets:
       % the first train_num_each images of the batch train, the rest test
       train_data = batch_faces(:,1:train_num_each);
       test_data = batch_faces(:,train_num_each + 1:pic_num_of_each);
         % 1. Linear regression
         w = inv(train_data' * train_data) * train_data' * totest;
         img_predict = train_data * w; % predicted image

         % 2. Ridge regression
%        rr_data = (train_data' * train_data) + eye(train_num_each)*10^-6;
%        w = inv(rr_data) * train_data' * totest;
%        img_predict = train_data * w; % predicted image

         % 3. Lasso regression
%        [B,FitInfo] = lasso(train_data , totest);
%        img_predict = train_data * B + FitInfo.Intercept;

         % 4. Weighted linear regression (this code is faulty)
%        W = eye(dimension);
%        kk = 10^-1;
%            for jj = 1:1:dimension
%               diff_data = reshaped_faces(j+1,:) - reshaped_faces(jj,:);
%               W(jj,jj) = exp((diff_data * diff_data')/(-2.0 * kk^2));
%            end
%            w = inv(train_data' * W * train_data) * train_data' * W * totest;

         % 5. New linear regression (PCA dimensionality reduction applied beforehand)
%          rr_data = (train_data' * train_data) + eye(train_num_each)*10^-6;
%          w = inv(rr_data) * train_data' * test_data; % improved w
%          img_predict = train_data * w; % predicted image

       % show_face(img_predict,row,column); % display the predicted face
         dis = img_predict - totest; % residual

       distest = [distest,norm(dis)]; % Euclidean distance
     % the class with the smallest reconstruction error is output as the prediction
     end
            [min_dis,label_index] = min(distest); % index of the smallest Euclidean distance (predicted class)
            if label_index == totest_index
              count_right = count_right + 1;
            else
                fprintf("Misclassified: %d\n" ,(i + 1) * (k - train_num_each));
            end
    end

end
recognition_rate = count_right / test_sum;
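Tips: test_sum is set during data import and is not shown here; to be consistent with the loops above it should equal the total number of probes, i.e. (an assumption based on the loop structure, not a line from the original code):

test_sum = people_num * (pic_num_of_each - train_num_each);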

3. Comparison of multiple regressions in face recognition

Datasets used: ORL5646, AR5040, FERET_80

Classical linear regression, ridge regression, lasso regression, and block linear regression were used to classify faces in the different datasets. The face recognition rates are compared below:

Table 1 Face recognition rate of different regressions under different data sets

Tips: the numbers in parentheses in the table are the numbers of training and test images per class

 Figure 3 Face recognition rate of different regressions under different data sets

Analysis: Analyzing the above figure and the above table, we can get the following conclusions

a. On all three datasets, classical linear regression and ridge regression achieve the same recognition rate. This is because each face is pulled into a column vector, so train' * train has low dimension, no singularity arises in the matrix computation, and the inverse exists; the ridge perturbation therefore changes nothing.

b. On all three datasets, the recognition rate of lasso regression is higher than that of classical linear regression.

c. On all three datasets, plain linear regression outperforms block linear regression. This is caused by the blocking scheme (the face is divided into four blocks in this experiment) and by the fact that the datasets used are essentially unoccluded, so the advantage of block linear regression cannot show.

4. New linear regression design, practice, comparison

① Disadvantages of classical linear regression: large data size, severe overfitting, multicollinearity between variables, and the normal-equation computation does not take the local within-class structure into account

② New linear regression design: PCA dimensionality reduction is applied in preprocessing to compress the data; L2 regularization (the ridge idea) is added when minimizing the loss, avoiding severe overfitting and multicollinearity; and the whole class's test set, rather than a single image, is used in the normal-equation computation of w, preserving some of the class structure.

③ New linear regression normal equation and core code

With the design in ②, the normal equation takes the following form (reconstructed from the core code below: λ = 10⁻⁶ there, X is the class's training matrix, and Y is the matrix whose columns are the class's test images):

W = inv(X'X + λI) * X' * Y

The core code is as follows:

Tips: See the complete code for PCA dimensionality reduction, or refer to the author's first article (PCA and face recognition).

         % 5. New linear regression (PCA dimensionality reduction applied beforehand)
           rr_data = (train_data' * train_data) + eye(train_num_each)*10^-6;
           w = inv(rr_data) * train_data' * test_data; % improved w
           img_predict = train_data * w; % predicted image

 Table 2 Face recognition rates of new and old linear regression under different data sets

 Tips: the numbers in parentheses in the table are the numbers of training and test images per class

  Figure 4 Face recognition rate of new and old linear regression under different data sets

Analysis: on ORL and FERET the recognition rate of the new linear regression is higher than that of classic linear regression, and on AR it is almost the same. This verifies the feasibility and correctness of the new linear regression and its advantage over classic linear regression.

5. Other

5.1 Internal function definition

In this experiment, displaying a face image is abstracted into a function, defined as follows:

% Display a face given its column vector
function fig = show_face(vector, row, column)
    fig = imshow(mat2gray(reshape(vector, [row, column])));
end
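For example, show_face(reshaped_faces(:,1), row, column) displays the first stored face as a grayscale image.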

5.2 Datasets and resources

The data sets used in this experiment: ORL5646, AR5040, FERET_80, the code framework can be applied to multiple data sets.

Commonly used face datasets are shared below (please don't just freeload, haha):

Link: https://pan.baidu.com/s/12Le0mKEquGMgh5fhNagZGw 
Extraction code: yrnb

LR, LRC complete code: Li Yiru/Yiru's machine learning - Gitee.com

5.3 References

1. Lai Zhihui's class

2. Summary of linear regression analysis ideas! Easy to understand and comprehensive! - Zhihu (zhihu.com)

3. Linear regression for face recognition - ORL dataset_HeiGe__'s blog-CSDN blog_orl face dataset

4. "Machine Learning" by Zhou Zhihua


Summary

As a classic and most basic regression analysis method, LR fits and predicts linear data with a simple process and excellent results, and LRC and its various optimized models also perform well on data classification tasks. As a supervised method that uses the original information of the data, LR retains data information well for classification. However, LR remains sensitive to outliers, is prone to overfitting, and does not describe nonlinear problems well, which affects the experimental results; some of its optimized models address part of these shortcomings. Follow-up blogs will analyze other algorithms' optimizations or solutions to other problems.

Origin: blog.csdn.net/weixin_51426083/article/details/124202201