Computer Vision Experiment: Face Recognition System Design

Experimental content

Design a computer vision object recognition system tied to a practical application (suggestion: the final deliverable should be a runnable system with a user interface). Choose one of the following topics.

1. Design of face recognition system

(1) Face recognition system design (required): Using the theory covered in class (including feature extraction and classifier design), design a face recognition system with a good recognition rate. Experiments can be carried out on the provided AR face image dataset (120 persons), the Feret face image dataset (175 persons), the face video dataset (10 persons), real face video, or other public datasets.

(2) Improvement of the face recognition system (choose at least one of the problems below): A face recognition system for a real environment involves more stages, including image preprocessing, feature extraction, feature selection, classifier design, training, and testing. In real applications, face recognition algorithms encounter problems such as noise interference, illumination changes, occlusion, and angle changes. Discuss at least one of these problems, analyze what causes recognition performance to degrade, and propose methods to enhance the system's performance and its handling of abnormal conditions, so that the adaptability and stability of the whole recognition system reach a better state.

Hints:

  1. For noise interference, image enhancement algorithms can be considered, including median filtering, mean filtering, Gaussian filtering, etc.;
  2. For illumination changes, the LBP algorithm and its extensions can be considered, or image enhancement algorithms such as histogram equalization and gamma transformation;
  3. For occlusion, consider voting over image blocks, or identifying occluded regions via a linear representation with minimal residual error;
  4. For angle changes, consider adding sample images taken at different angles, or introducing an affine transformation;
  5. For feature extraction, methods such as Gabor features, eigenfaces, and deep features can be used;
  6. For classifier design, Bayesian classifiers, neural networks, and other methods can be used;
  7. The AR and Feret face datasets can be used to test algorithm performance under noise interference (noise must be added artificially), illumination changes, occlusion, and angle changes; on this basis, the recognition system can be tested in a real environment.
  8. A face detector or an affine transformation can be introduced as needed; this experiment provides a Haar detector and a face detector based on affine transformation.
  9. Face dataset description:

AR face dataset: contains 120 people, 26 images per person, resolution 80 (width) × 100 (height); can be used to test algorithm performance under illumination changes and occlusion.

Feret face dataset: contains 175 people, 7 images per person, resolution 80 × 80; can be used to test algorithm performance under different angles and illumination changes.

Face video dataset: contains videos of 10 people, each with one training video sequence and one test video sequence; can be used to test algorithm performance under different angles, illumination changes, occlusion, and noise interference.

Real face data collection: images can be captured according to the actual setting of the system.

(3) Analyze how recognition algorithm design differs between laboratory and natural environments, and how the algorithm could be improved (optional).

2. Self-selected object recognition topic (define your own task)

Experimental steps and process

Face recognition system design

  The key to face recognition is feature extraction. Compared with data collection, preprocessing, and classifier selection, feature extraction is the most distinctive stage and the main difference between face recognition systems. Common face feature extraction methods include local binary pattern (LBP) histograms, Gabor filters, statistical methods (PCA, LDA), and texture features. In this experiment I chose LDA (linear discriminant analysis) for feature extraction. Before feature extraction, I used the Haar face detector to detect and locate the face, and then aligned the faces. After extraction, a KNN classifier is used to classify the features. The overall pipeline is shown in Figure 1.

Figure 1 Overview of the face recognition system

Dataset import

  Three datasets are provided: the AR and Feret datasets are standard face image datasets, while the video dataset consists of frames extracted from videos. Basic information about the datasets is shown in Figure 2 below.

Figure 2 Basic information of the three data sets

All images are read with OpenCV by traversing the dataset directory, and are then passed to the image preprocessing module. In the code, data loading is handled by the load_dataset function of the Face_reg class. This function takes four parameters: path is the dataset path; trun is the number of leading characters of the file name used as the class label; mode selects reset mode (1) or append mode (0); and format indicates the dataset file structure, where 1 corresponds to the AR/Feret layout and 0 to the video dataset layout.
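A minimal sketch of what such a loader might look like, based on the parameter description above (the accepted file extensions and the folder-as-label rule for the video set are assumptions):

```python
import os
import cv2

class Face_reg:
    def __init__(self):
        self.images, self.labels = [], []

    def load_dataset(self, path, trun, mode=1, format=1):
        """Read all images under `path`; the first `trun` characters
        of each file name are used as the class label."""
        if mode == 1:  # reset mode: discard previously loaded data
            self.images, self.labels = [], []
        for root, _, files in os.walk(path):
            for name in files:
                if not name.lower().endswith(('.bmp', '.jpg', '.png')):
                    continue
                img = cv2.imread(os.path.join(root, name))
                if img is None:
                    continue
                self.images.append(img)
                # format 1: label encoded in the file name (AR/Feret);
                # format 0: label taken from the containing folder (video set)
                self.labels.append(name[:trun] if format == 1
                                   else os.path.basename(root))
        return self.images, self.labels
```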

Image preprocessing

In the image preprocessing module, grayscale conversion, affine transformation for angle correction, face detection and cropping, histogram equalization, and image resizing are applied to each image. The pipeline is shown in Figure 3 below.

Figure 3 Schematic diagram of image preprocessing

1. Grayscale transformation

Converting a color image into a grayscale image reduces computational complexity and improves recognition performance. The result after conversion is shown in the figure below. The left image is the original image, and the right image is the grayscale transformed result.

Figure 4 Grayscale transformation result
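In OpenCV this is a single call; a minimal sketch (the file name is illustrative):

```python
import cv2

img = cv2.imread('face.jpg')                  # BGR color image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # single-channel grayscale
```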

2. Affine transformation for angle correction

After grayscale conversion, an affine transformation is used to correct the angle of the face. The correction rotates the image according to the relative coordinates of the eyes, so that the positions of the main feature points are roughly fixed, which simplifies feature extraction and matching. The feature points are detected with dlib's 'shape_predictor_68_face_landmarks' model. The result of the affine transformation is shown in Figure 5 below.

Figure 5 Affine transformation result map
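A minimal sketch of this eye-based alignment with dlib and OpenCV (the function name is illustrative; the 68-landmark model file must be downloaded separately):

```python
import math
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

def align_by_eyes(gray):
    """Rotate the image so the line between the eye centers is horizontal."""
    faces = detector(gray)
    if not faces:
        return gray  # no face found: leave the image unchanged
    pts = predictor(gray, faces[0])
    # landmarks 36-41 are the left eye, 42-47 the right eye
    left = np.mean([(pts.part(i).x, pts.part(i).y) for i in range(36, 42)], axis=0)
    right = np.mean([(pts.part(i).x, pts.part(i).y) for i in range(42, 48)], axis=0)
    angle = math.degrees(math.atan2(right[1] - left[1], right[0] - left[0]))
    center = tuple(((left + right) / 2).astype(float))
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    return cv2.warpAffine(gray, M, (gray.shape[1], gray.shape[0]))
```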

3. Face positioning and cropping

  For face detection and localization, I used a pre-trained Haar face detector. The Haar detector, also known as the Viola-Jones detector, was proposed in 2001; it detects and localizes faces by sliding a window over the image and extracting Haar-like features. Figure 6 below shows the result after detection and cropping.

Figure 6 Face positioning and cropping results
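A minimal sketch using OpenCV's bundled Haar cascade (the fallback behavior when no face is found is an assumption):

```python
import cv2

# OpenCV ships the pre-trained Haar cascade files with the library
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def detect_and_crop(gray):
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return gray  # fall back to the whole image
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
    return gray[y:y + h, x:x + w]
```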

4. Histogram equalization

In some images, illumination changes make the image too bright or too dark, so that certain details cannot stand out. Histogram equalization can therefore be applied to spread the brightness values more uniformly, reducing the influence of illumination changes. Figure 7 below shows the result of histogram equalization.

Figure 7 Histogram equalization

5. Image resizing

When using LDA for dimensionality reduction, the model requires every sample to have the same number of features, and here the flattened one-dimensional brightness values of the image are used directly as features. The images must therefore be resized so that all inputs have the same size.
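A minimal sketch of the equalization, resizing, and flattening steps (the 80×80 target size matches the dataset descriptions above):

```python
import cv2
import numpy as np

def normalize_face(gray, size=(80, 80)):
    eq = cv2.equalizeHist(gray)             # even out the brightness distribution
    eq = cv2.resize(eq, size)               # fixed size so feature counts match
    return eq.flatten().astype(np.float32)  # 1-D vector of 6400 features
```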

Feature extraction

  For feature extraction, LDA is used to reduce the dimensionality of the flattened grayscale vector of each image, ultimately down to 25 features. LBP features and Gabor features were also considered, but their final results were not good. The figure below shows the overall idea.

 

Figure 8 Feature extraction

1. Feature extraction

  • Flattened gray values: the image produced by preprocessing is an 80×80 two-dimensional matrix; to feed the dimensionality reduction step, it is flattened into a one-dimensional vector.
  • LBP features: the LBP feature map is computed, flattened, and then fed to LDA for dimensionality reduction.
  • Gabor features: Gabor filtering is applied to the image and the mean response is used as the feature.

2. LDA dimensionality reduction

  Of the features above, only the first is used for dimensionality reduction, because the latter two were not effective. The reduction method is LDA (linear discriminant analysis). Its principle is to map high-dimensional data into a low-dimensional space through a linear projection such that, in the projected space, samples of different classes are far apart (large between-class distance) while samples of the same class stay close together (small within-class distance), which improves classification accuracy. After several adjustments, 25 was chosen as the target dimension: below 25 the accuracy drops, while above 25 there is no significant improvement.
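A minimal sketch with scikit-learn, assuming `X` holds the flattened images row-wise and `y` the class labels:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# n_components must be below the number of classes, so 25 is valid for the
# AR (120 classes) and Feret (175 classes) sets but not for the 10-person
# video set
lda = LinearDiscriminantAnalysis(n_components=25)
X_lda = lda.fit_transform(X, y)  # X: (n_samples, 6400), y: class labels
```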

Classifier design and training

1. Classifier design

  For the classifier, I chose the K-nearest neighbor (KNN) algorithm. It is simple and intuitive, flexible, and can achieve good results; it determines a sample's class from the distances between the sample and its neighboring points.

2. Parameter optimization

In face recognition, choosing an appropriate value of K is crucial to classification performance. If K is too small, the result may be sensitive to noise, causing overfitting; if K is too large, the classification boundary becomes too blurred, causing underfitting. I therefore used a grid search over K values from 1 to 10 to find the best one, which helps select a K that performs well on the training data and generalizes better, improving the classification performance of face recognition.
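A minimal sketch of the grid search with scikit-learn (cv=5 is an assumption; `X_lda` continues the LDA sketch above):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Search K = 1..10 with 5-fold cross-validation on the LDA-reduced features
grid = GridSearchCV(KNeighborsClassifier(),
                    {'n_neighbors': list(range(1, 11))},
                    cv=5)
grid.fit(X_lda, y)
print(grid.best_params_)    # e.g. {'n_neighbors': 3}
knn = grid.best_estimator_  # best KNN model, refit on the training data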

Improvement of the face recognition system

Results before optimization

  In the initial system design, little image preprocessing was applied, and the parameters of LDA and KNN were not tuned. 80% of the data is used for training and 20% for evaluation. The evaluation results are shown below.

 

Figure 9 Accuracy before optimization

The accuracy on all three datasets is above 0.8, showing reasonable performance. The overall accuracy on the face video dataset is higher than on the other two. Inspecting the datasets shows that the faces in the video dataset are relatively stable and there are fewer classes, while the faces in the other two datasets vary more in expression, occlusion, and orientation, so the video dataset performs relatively better.

Problems with the model

  To find the problems, the wrongly predicted images were analyzed, as shown below.

 

Figure 10 Wrongly predicted faces

Based on these observations, the problems, the corresponding solutions, and their effects can be organized into the following table.

| Problem | Solution | Effect |
| --- | --- | --- |
| Expression changes affect recognition | Same as the "insufficient feature representation" row | Same as the "insufficient feature representation" row |
| Shooting direction affects recognition | 1. Add LBP features; 2. Correct the angle with an affine transformation | The LBP features performed poorly and reduced accuracy, so they were discarded |
| Insufficient feature representation | 1. Increase the LDA output dimension; 2. Extract Gabor features (mean value) | 1. Accuracy rises above 90% when the dimension is increased from 9 to 25, with no further gain above 25; 2. Adding Gabor features brings no obvious improvement |
| Illumination changes affect recognition | Apply histogram equalization to even out the brightness | Effective for faces misjudged due to lighting, but causes misjudgment of some other images |
| Occlusion by sunglasses affects recognition | Same as the "insufficient feature representation" row | Same as the "insufficient feature representation" row |

Results after optimization

The results after the final improvements are shown below. After optimization, the accuracy on the first two datasets rises above 0.9, a substantial improvement. The video dataset, however, does not improve: a key point in the optimization is the number of feature dimensions kept after reduction, and since LDA limits the output dimension to fewer than the number of classes, the 10-person video dataset cannot benefit from a higher dimension, so its accuracy stays the same while the other two datasets improve.

 

Figure 11 Metrics after optimization

Face recognition system based on deep learning

  Over the past decade, traditional machine learning algorithms were long the mainstream approach to face recognition. With the growth of computing power, however, deep learning has become the dominant approach. The following introduces common face recognition models and reproduces the FaceNet model.

Common Face Recognition Models

1. DeepFace

Released in 2014, DeepFace is a face recognition model based on deep learning. It first uses traditional face detection to locate faces in images, then aligns the detected faces to ensure consistent pose during feature extraction. The aligned face images are fed into a deep convolutional neural network; through convolutional and fully connected layers, DeepFace learns a fixed-length feature vector to represent each face image. In the recognition stage, it compares learned feature vectors: it computes the similarity between the feature vector of the face to be recognized and those of known faces, usually using cosine similarity. If the similarity to a known face exceeds a threshold, the two are judged to be the same person.

Figure 12 DeepFace structure diagram

2. FaceNet

FaceNet is a deep-learning face recognition model released by Google in 2015. It uses a deep convolutional neural network to extract a compact embedding of each face (128-dimensional in the original paper) and trains with a triplet loss to optimize the feature representation: embeddings of the same identity are pulled together while embeddings of different identities are pushed apart, so the learned vectors discriminate between faces. In the recognition stage, a face is identified by the distance between its embedding and the known face embeddings; the smaller the distance, the better the match.

Figure 13 FaceNet model structure
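For reference, the triplet loss FaceNet trains with can be written as follows, where $x^a$, $x^p$, $x^n$ are anchor, positive, and negative samples, $f$ is the embedding network, and $\alpha$ is the margin:

$$\mathcal{L} = \sum_i \max\left( \left\| f(x_i^a) - f(x_i^p) \right\|_2^2 - \left\| f(x_i^a) - f(x_i^n) \right\|_2^2 + \alpha,\; 0 \right)$$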

3. ArcFace

ArcFace is a deep learning model for face recognition, released in 2019. It optimizes the feature representation through an angular margin on the cosine distance: unlike traditional models, ArcFace works with the angles between feature vectors, making the feature vectors of the same face closer and those of different faces farther apart. This design gives ArcFace high accuracy and robustness in face recognition tasks.

Figure 14 ArcFace model structure

Model reproduction

  Here I chose FaceNet to reproduce; the code mainly follows GitHub - timesler/facenet-pytorch: Pretrained Pytorch face detection (MTCNN) and facial recognition (InceptionResnet) models.

Here are the steps to reproduce:

1. Environment installation

(1) Install facenet-pytorch

The facenet-pytorch package can be installed directly with pip. The installation command is as follows:

pip install facenet-pytorch

Note that facenet-pytorch depends on PyTorch, so PyTorch must be installed in the environment first.

(2) Import library files

After installation, the models can be imported and loaded with the following statements; if they import successfully, the installation is complete.

from facenet_pytorch import MTCNN, InceptionResnetV1

# Load the face detector (MTCNN) and the feature extractor
# (InceptionResnetV1 pre-trained on VGGFace2)
mtcnn = MTCNN()
resnet = InceptionResnetV1(pretrained='vggface2').eval()

2. Face registration

FaceNet provides a pre-trained model. Since this part is only an extension of the experiment, the pre-trained model is loaded directly. After loading, the face library is empty, so faces must be enrolled first. Enrollment extracts the features of a reference face: the image is preprocessed, the face is detected, cropped, and aligned, and then the model is called to obtain the feature embedding, which is stored in a list.
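A minimal enrollment sketch with the facenet-pytorch API (the list names are illustrative):

```python
from PIL import Image
import torch
from facenet_pytorch import MTCNN, InceptionResnetV1

mtcnn = MTCNN()
resnet = InceptionResnetV1(pretrained='vggface2').eval()

known_embeddings, known_names = [], []

def enroll(image_path, name):
    img = Image.open(image_path)
    face = mtcnn(img)  # detect, crop, and align the face
    if face is None:
        return
    with torch.no_grad():
        emb = resnet(face.unsqueeze(0))  # embedding (1, 512) for this model
    known_embeddings.append(emb)
    known_names.append(name)
```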

3. Face recognition

After faces are enrolled, unknown faces can be recognized. The idea is similar to enrollment: the image is preprocessed, the face is detected and cropped, and features are extracted and compared for classification. See 'FaceNet.py' for the full code.
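Continuing the sketch above, recognition can compare a new embedding against the enrolled ones by Euclidean distance (the threshold value is an assumption):

```python
def recognize(image_path, threshold=1.0):
    """Match an unknown face against the enrolled embeddings."""
    img = Image.open(image_path)
    face = mtcnn(img)
    if face is None:
        return None
    with torch.no_grad():
        emb = resnet(face.unsqueeze(0))
    dists = [torch.dist(emb, known).item() for known in known_embeddings]
    best = min(range(len(dists)), key=dists.__getitem__)
    return known_names[best] if dists[best] < threshold else 'unknown'
```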

Visualization platform construction

  Even a face recognition system with excellent accuracy and efficiency is of limited use without a corresponding application and visualization platform. Therefore, the Vue and Flask frameworks are used here to build a web-based online face recognition platform.

Visualization system design

  The core of the system is the encapsulated face_reg object, with Python's Flask as the back-end framework and Vue as the front-end framework connected through an API. After the user uploads an image, it is sent to the backend, the classification function is called to predict the face category, and the result is returned to the web page. The figure below shows the design of the visualization system.

 

Figure 15 Visualization system design

Backend implementation

  The backend receives the image sent from the frontend, crops the detected face, and predicts its label. Uploaded images are stored on the server, and the backend returns the image link and the predicted label to the frontend for display. The figure below shows the backend design.

 

Figure 16 Backend design
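A minimal sketch of such an endpoint (the route, field names, and the face_reg interface are assumptions based on the description above):

```python
import os
from flask import Flask, request, jsonify

app = Flask(__name__)
UPLOAD_DIR = 'static/uploads'
os.makedirs(UPLOAD_DIR, exist_ok=True)

@app.route('/api/recognize', methods=['POST'])
def recognize_face():
    f = request.files['image']                  # image uploaded by the frontend
    path = os.path.join(UPLOAD_DIR, f.filename)
    f.save(path)                                # keep a copy on the server
    label = face_reg.predict(path)              # assumed face_reg interface
    return jsonify({'image_url': '/' + path, 'label': label})
```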

Front-end implementation

  The frontend uses the Vue 3 framework together with element-plus components, axios, and other tools. It implements the visual pages and the logic for uploading and receiving images. The figure below shows the implemented result.

Figure 17 Front-end visualization

References:

[1] Zhao, W., Chellappa, R., Phillips, P. J., & Rosenfeld, A. (2003). Face recognition: A literature survey. ACM Computing Surveys, 35(4), 399-458.

[2] Taigman, Y., Yang, M., Ranzato, M., et al. (2014). DeepFace: Closing the gap to human-level performance in face verification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1701-1708.

[3] Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 815-823.

[4] Deng, J., Guo, J., Xue, N., et al. (2019). ArcFace: Additive angular margin loss for deep face recognition. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4690-4699.

[5] Jain, A. K., Ross, A., & Prabhakar, S. (2004). An introduction to biometric recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14(1), 4-20.

[6] Zhang, D., & Zhou, Z. H. (2011). Face recognition: A literature survey. ACM Computing Surveys, 43(3), 1-52.

[7] Ma, L., Tan, T., Wang, Y., & Zhang, D. (2003). Personal identification based on iris texture analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12), 1519-1533.

[8] Turk, M., & Pentland, A. (1991). Face recognition using eigenfaces. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 586-591.

[9] Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997). Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711-720.

[10] Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1, I-511.

[11] Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1, 886-893.

[12] Research and Application of the LDA Algorithm in Face Recognition, HUAWEI CLOUD Community.

[13] Research on Face Recognition Algorithms Based on LDA, 51CTO Blog.

[14] https://blog.csdn.net/weixin_42163563/article/details/127957504

Experimental conclusion or experience

This experiment achieved good results: a face recognition system was successfully designed and reached high accuracy on the AR, Feret, and video datasets. By increasing the feature dimension and accounting for factors such as illumination and angle changes, the system's accuracy improved from 0.8 to over 0.9. A visualization module for the face recognition system was also designed, with solid results.

However, some aspects of the system still need improvement. First, generalization needs to be enhanced: the current system shows high accuracy within a single dataset, but only 60% accuracy when two different datasets are combined. Improving generalization requires further research and optimization of the algorithm so that it adapts better to the characteristics and variations of different datasets.

Second, the system's functionality should be extended to support face enrollment as well as recognition. The current system focuses on the recognition process, but in practical applications face enrollment is an essential step. An enrollment function should therefore be designed and implemented so that users can easily add new face data to the system and have it recognized accurately.

Finally, the generalization of the face detector needs improvement. Face detection is the first stage of a face recognition system, and its accuracy and robustness are crucial to overall performance. The current system still leaves room for improvement in detecting faces across different scenes, angles, and lighting conditions, and further work on the detector will improve its generalization and accuracy.

In conclusion, this experiment made good progress in designing and optimizing a face recognition system, but challenges and room for improvement remain. By strengthening generalization, extending functionality, and improving the face detector, the accuracy and stability of the system can be further improved to meet practical needs.
