MTCNN for face detection (2): a model training attempt

Foreword

This article mainly records the process of training the model. It is called an "attempt" because, owing to my limited personal experience, only simple and superficial training is done here, and I cannot offer detailed advice on parameter tuning or on improving the results. The training code and the training-set data mainly follow the blog post linked in the reference section at the end. The code in that original post contains some errors, and a few modifications have been made here. In addition, the model trained here only outputs the face score and the position of the face bounding box; it does not output the five facial landmarks.

Data preparation

Ground-truth data is required for training. The relevant data was downloaded from a face dataset (password: ctvw); the download includes the images and annotated XML files containing the face bounding boxes. The face-box data must be parsed out of each XML file and saved to a text file corresponding to the image file. For details, see parse_xml.py in my project.
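For reference, here is a minimal sketch of what that parsing step might look like, assuming a PASCAL-VOC-style annotation layout; the tag names (`object`, `bndbox`, `xmin`, ...) are assumptions, since the exact schema of the downloaded dataset is not shown, so treat them as placeholders:

```python
# Sketch of the XML-parsing step; element names assume a
# PASCAL-VOC-style layout and may differ in the actual dataset.
import xml.etree.ElementTree as ET

def parse_xml(xml_path, out_txt_path):
    """Extract all face boxes from one annotation file and write them
    as 'x1 y1 x2 y2' lines to a text file paired with the image."""
    root = ET.parse(xml_path).getroot()
    with open(out_txt_path, "w") as f:
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            coords = [box.find(t).text
                      for t in ("xmin", "ymin", "xmax", "ymax")]
            f.write(" ".join(coords) + "\n")
```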
Afterwards, we need to generate the training set. In the original version of the algorithm, four types of samples are generated: pos, part, neg, and landmark. Since there is no facial-landmark data here, the fourth type is not generated. Regions are randomly cropped from the face photos, and the IoU between each cropped region and the ground-truth face box is computed. If the IoU is greater than 0.65, the crop is classified as pos; if the IoU is less than 0.4, it is classified as neg; otherwise, it is classified as part. During training, pos+neg samples are used to train the face-score (classification) branch of the network, and pos+part samples are used to train the face-box regression branch. The cropped images are scaled to 12×12; all of the data generated here is for training PNET.
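The IoU computation and the resulting pos/part/neg labelling can be sketched as follows (a simplified NumPy illustration, not the exact code from the project):

```python
import numpy as np

def iou(crop_box, gt_boxes):
    """IoU between one crop box and all ground-truth boxes,
    each given as [x1, y1, x2, y2]."""
    cx1, cy1, cx2, cy2 = crop_box
    crop_area = (cx2 - cx1) * (cy2 - cy1)
    gt_area = ((gt_boxes[:, 2] - gt_boxes[:, 0]) *
               (gt_boxes[:, 3] - gt_boxes[:, 1]))
    # Intersection rectangle of the crop with each ground-truth box
    ix1 = np.maximum(cx1, gt_boxes[:, 0])
    iy1 = np.maximum(cy1, gt_boxes[:, 1])
    ix2 = np.minimum(cx2, gt_boxes[:, 2])
    iy2 = np.minimum(cy2, gt_boxes[:, 3])
    inter = np.maximum(0, ix2 - ix1) * np.maximum(0, iy2 - iy1)
    return inter / (crop_area + gt_area - inter)

def classify_crop(crop_box, gt_boxes):
    """Label a random crop by its best IoU, using the thresholds
    described above (>0.65 pos, <0.4 neg, otherwise part)."""
    best = iou(crop_box, gt_boxes).max()
    if best > 0.65:
        return "pos"
    elif best < 0.4:
        return "neg"
    return "part"
```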

Model training

Training PNET: the input images are 12×12, and the outputs are the face score and the face-box position.
Training RNET: apply the trained PNET to the face dataset, compute the IoU between each detected face box and the ground-truth face box, and divide the crops into pos, part, and neg accordingly; scale the crops to 24×24 and use them as the training set for RNET.
Training ONET: apply the trained PNET and RNET to the face dataset in turn, compute the IoU between each detected face box and the ground-truth face box, and again divide the crops into pos, part, and neg; scale the crops to 48×48 and use them as the training set for ONET.
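Since pos+neg samples drive the face-score loss and pos+part samples drive the box-regression loss, each loss term has to be masked by sample type. A sketch of that masking, written in PyTorch purely for illustration (the referenced project may use a different framework, and the label encoding 1/-1/0 is an assumption):

```python
import torch
import torch.nn.functional as F

def mtcnn_loss(cls_logits, box_pred, labels, box_targets, box_weight=0.5):
    """labels: 1 = pos, -1 = part, 0 = neg (assumed encoding)."""
    # Face classification: only pos and neg samples contribute.
    cls_mask = (labels == 1) | (labels == 0)
    cls_loss = F.binary_cross_entropy_with_logits(
        cls_logits[cls_mask].squeeze(-1),
        labels[cls_mask].float())

    # Box regression: only pos and part samples contribute.
    box_mask = (labels == 1) | (labels == -1)
    box_loss = F.mse_loss(box_pred[box_mask], box_targets[box_mask])

    return cls_loss + box_weight * box_loss
```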

Test

The test results on an input image are shown below. You can see two adjacent face boxes on a single face region; these duplicates can be filtered out by adjusting the IoU threshold used in NMS. The complete project code is available at https://pan.baidu.com/s/1UdY_9J8hGfEEZoOv-1CdWg (extraction code: inny).
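For reference, a minimal greedy NMS sketch; lowering `iou_threshold` merges the duplicate boxes mentioned above (an illustration, not the project's exact implementation):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop neighbours
    whose IoU with it exceeds the threshold."""
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # IoU of the kept box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[rest, 2] - boxes[rest, 0]) *
                 (boxes[rest, 3] - boxes[rest, 1]))
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_threshold]
    return keep
```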

[Figure: detection result on the test image, with the predicted face boxes drawn on the photo]

Reference

https://blog.csdn.net/weixin_41668848/article/details/107333162
