Deep Learning Algorithm Notes 2: Face Detection and Recognition (MTCNN + FaceNet)

Part One: the face detection process (MTCNN)

1. The whole model consists of three networks, applied in order: P-Net, R-Net, and O-Net.

2. Before an image is passed into P-Net, it is preprocessed: the image is resized into several different sizes (an image pyramid), and each size is passed into P-Net, so that faces of different scales can be detected (see the pyramid sketch after this list).

3. P-Net divides the picture into an h/2 × w/2 grid and produces two outputs, placed in the same list: [(1, h/2, w/2, 2), (1, h/2, w/2, 4)]. The first part holds, for each of the h/2 × w/2 cells, two values: (probability of not being a face, probability of being a face). The second part holds, for each cell, the offsets of the box's top-left and bottom-right corners; each corner has two offsets (dx, dy), so the two corners give four values in total: (dx1, dy1, dx2, dy2).

4. After P-Net, the result goes through a post-processing module. It first filters out boxes whose face probability falls below a set threshold, and then, for each remaining cell of the (h/2 × w/2) grid, computes the box's actual position in the original image from the cell's grid position and the predicted offsets via a decoding formula (see the sketch after this list). A score is also derived from the previously predicted face probability, so the final output is (n_p, x1, y1, x2, y2, score), where n_p is the number of boxes; it is smaller than h/2 × w/2 because the threshold in post-processing removes some boxes.

5. In the same way, we crop the image according to the box information obtained from P-Net's post-processing: each region considered to be a face is cut out separately, and each crop is passed into R-Net as its own image. The outputs are again (n_r_1, 2) and (n_r_1, 4).

6. After post-processing, the corrected box positions are computed from the predicted (dx1, dy1, dx2, dy2) using a refinement formula (it also appears in the decoding sketch after this list); a new threshold filters out some prediction boxes again, and the final output shape is (n_r_2, x1, y1, x2, y2, score).

7. The post-processed R-Net output is passed into O-Net. Its output is (x1, y1, x2, y2, score, pts0, pts1, pts2, pts3, pts4, pts5, pts6, pts7, pts8, pts9), i.e. shape (n_o_1, 15), containing the box position, the score, and the coordinates of five facial landmarks.

8. Finally, after one more post-processing module, the corrected box position and landmark coordinates are computed from the predicted (dx1, dy1, dx2, dy2) with the same kind of refinement formula.
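
To make step 2 concrete, here is a minimal sketch of the image-pyramid preprocessing. The minimum face size of 20 px, the scale factor 0.709, and the 12 px P-Net input window follow the common MTCNN convention; they are assumptions, not values stated in this post.

```python
import cv2  # pip install opencv-python


def compute_scales(height, width, min_face_size=20, factor=0.709):
    """Scales at which the image is resized before being fed to P-Net."""
    scales = []
    m = 12.0 / min_face_size          # map the smallest face to P-Net's 12 px window
    min_side = min(height, width) * m
    while min_side >= 12:             # stop once the image is smaller than the window
        scales.append(m)
        m *= factor                   # shrink by a constant factor at each level
        min_side *= factor
    return scales


def build_pyramid(image):
    """One resized copy per scale; each copy is passed through P-Net."""
    h, w = image.shape[:2]
    return [(s, cv2.resize(image, (int(w * s), int(h * s))))
            for s in compute_scales(h, w)]
```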
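
And here is a sketch of the post-processing math from steps 4, 6, and 8: mapping a confident P-Net grid cell back to original-image coordinates, then correcting a box with its predicted corner offsets. The stride of 2 and cell size of 12 match the h/2 × w/2 grid described above; the function names and the default threshold are illustrative, not taken from the project.

```python
import numpy as np


def refine(boxes, offsets):
    """Step 6/8 correction: offsets (dx1, dy1, dx2, dy2) are fractions of box size."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    size = np.stack([w, h, w, h], axis=1)
    return boxes + offsets * size


def decode_pnet(prob_map, offsets, scale, threshold=0.6, stride=2, cell=12):
    """prob_map: (h/2, w/2) face probabilities; offsets: (h/2, w/2, 4)."""
    ys, xs = np.where(prob_map >= threshold)   # keep only cells above the threshold
    score = prob_map[ys, xs]
    reg = offsets[ys, xs]                      # (dx1, dy1, dx2, dy2) per kept cell
    # Each grid cell corresponds to a 12x12 window in the scaled image;
    # dividing by the pyramid scale returns to original-image coordinates.
    x1, y1 = (stride * xs) / scale, (stride * ys) / scale
    x2, y2 = (stride * xs + cell) / scale, (stride * ys + cell) / scale
    boxes = np.stack([x1, y1, x2, y2], axis=1)
    return refine(boxes, reg), score           # shapes (n_p, 4) and (n_p,)
```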

Part Two: the face encoding process (FaceNet)

1. The input is a face picture, usually one extracted with the location information from face detection and warped by an affine transformation into a frontal, upright face.

2. Features are extracted by a deep network: FaceNet's backbone, Inception-ResNetV1. It has four important parts: the stem, Inception-ResNet-A, Inception-ResNet-B, and Inception-ResNet-C. The latter three modules are variant residual blocks that combine the Inception and ResNet ideas, and Inception-ResNetV1 uses each of them as a block repeated several times; for example, 5 Inception-ResNet-A blocks are chained in a row. Because these are residual blocks, the input and output can be added directly, so they can be stacked back to back; the same holds for the other two, which are repeated 10 and 5 times respectively (see the stacking sketch after this list). Finally, a fully connected dense layer outputs a (128,) feature vector. This vector is the key to face recognition: in Euclidean space the distance between feature vectors of similar faces is very small, so we can judge whether two faces match by checking that this distance is below a certain threshold.

3. L2-normalize the obtained (128,) feature vector to get the final feature vector (see the short sketch below).
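
To illustrate why the blocks in step 2 can simply be repeated 5, 10, and 5 times: a residual block adds its branch output back onto its input, so input and output shapes match and blocks chain freely. The sketch below uses Keras layers with a deliberately simplified branch; it is not the real Inception-ResNet-A internals.

```python
from tensorflow.keras import layers


def residual_block(x, filters=32, scale=0.17):
    """A simplified Inception-ResNet-style block: branch, project, add back."""
    branch = layers.Conv2D(filters, 1, padding='same', activation='relu')(x)
    branch = layers.Conv2D(filters, 3, padding='same', activation='relu')(branch)
    # Project back to the input's channel count so the residual add is valid.
    branch = layers.Conv2D(x.shape[-1], 1, padding='same')(branch)
    x = layers.add([x, layers.Lambda(lambda b: b * scale)(branch)])
    return layers.Activation('relu')(x)


def stack(x, n_blocks=5):
    # Output shape equals input shape, so repetition is just a loop.
    for _ in range(n_blocks):
        x = residual_block(x)
    return x
```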
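
Step 3's L2 normalization, and the Euclidean distance from step 2 that makes the (128,) vector usable for recognition, come down to a few lines. The 1.0 threshold in the usage comment is an assumed placeholder, not a value from this post.

```python
import numpy as np


def l2_normalize(v, eps=1e-10):
    """Scale the embedding to unit length (step 3)."""
    return v / np.sqrt(max(np.sum(v ** 2), eps))


def face_distance(enc_a, enc_b):
    """Euclidean distance between two L2-normalized embeddings."""
    return float(np.linalg.norm(enc_a - enc_b))

# Usage: face_distance(a, b) < 1.0 would be read as "same person"
# (the actual threshold must be tuned on real data).
```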

Part Three: the face recognition process (MTCNN + FaceNet)

1) Database initialization

The database initialization proceeds as follows:

1. Traverse all the pictures in the database.
2. Detect the position of the face in each picture.
3. Use MTCNN to crop out the faces.
4. Align the acquired faces.
5. Use FaceNet to encode the faces.
6. Put all the face encodings in one list, self.known_face_encodings = []; at the same time, put the corresponding names in another list, self.known_face_names = []. Elements at the same position in these two lists correspond to each other, so the name can later be looked up by the index of the matching encoding and displayed in the real-time test results.

The list obtained in step 6 is the feature list of all known faces. The faces in the real-time images obtained later are compared against these known faces, so that we can tell who is who (a sketch of this initialization follows below).
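
A minimal sketch of this initialization, assuming hypothetical detect_face, align_face, and encode_face helpers that wrap MTCNN and FaceNet; only the two parallel lists come from the post itself.

```python
import os
import cv2


class FaceDatabase:
    def __init__(self, image_dir, detect_face, align_face, encode_face):
        self.known_face_encodings = []
        self.known_face_names = []
        for filename in os.listdir(image_dir):           # 1. traverse the pictures
            image = cv2.imread(os.path.join(image_dir, filename))
            box, landmarks = detect_face(image)          # 2. locate the face (MTCNN)
            x1, y1, x2, y2 = box
            face = image[y1:y2, x1:x2]                   # 3. crop the face out
            aligned = align_face(face, landmarks)        # 4. align to a frontal face
            encoding = encode_face(aligned)              # 5. (128,) FaceNet encoding
            self.known_face_encodings.append(encoding)   # 6. same index in both lists
            self.known_face_names.append(os.path.splitext(filename)[0])
```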

2) Real-time image processing

1. Crop and align the face.
2. Use FaceNet to encode the corrected face.
3. Compare the facial features in the real-time image with those in the database.

The comparison process is as follows:
1. Obtain every face feature in the real-time picture.
2. Compare each face feature with all the faces in the database and calculate the distances. If a distance is below the threshold, the two faces are considered similar to some degree.
3. Obtain the index of the most similar face in the database for each detected face.
4. Check whether the face distance at this index is below the threshold; if so, the recognition is considered successful, and that is the person (see the sketch below).
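
A sketch of this comparison, assuming L2-normalized (128,) encodings as produced above; the tolerance value is an assumed placeholder.

```python
import numpy as np


def recognize(face_encoding, known_encodings, known_names, tolerance=1.0):
    """Return the best-matching name for one live face, or "Unknown"."""
    if not known_encodings:
        return "Unknown"
    # 2. distance to every face in the database
    distances = np.linalg.norm(np.array(known_encodings) - face_encoding, axis=1)
    best = int(np.argmin(distances))     # 3. index of the closest known face
    # 4. accept only if the closest face is within the threshold
    return known_names[best] if distances[best] < tolerance else "Unknown"
```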

Reference projects:

https://blog.csdn.net/weixin_44791964/article/details/103697409

https://github.com/bubbliiiing/keras-face-recognition


Original post: https://blog.csdn.net/qq_39507748/article/details/109983717