Paper Notes -- PCN: Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks

  1. Keywords: rotation-invariant face detection, rotation-in-plane, coarse-to-fine
  2. Core summary: This is a CVPR 2018 paper by Dr. Wu Shuzhe and colleagues from the VIPL research group of the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences. It addresses face detection at arbitrary in-plane rotation angles; the core idea, Progressive Calibration Networks (PCN), progressively calibrates faces at different angles to an upright orientation.
  3. Existing methods: There are currently three main strategies for handling rotation-in-plane angles in face detection: data augmentation (train a single detector on faces rotated to all angles), divide-and-conquer (train several detectors, each covering a narrow range of angles), and rotation router (a router network first estimates the face's RIP angle, the image is rotated upright accordingly, and an upright face detector is then applied).

4. Improvement: To detect faces quickly at arbitrary in-plane rotation angles (the full 360° range), PCN calibrates candidate windows step by step. The first stage classifies each window as face-up or face-down and rotates face-down windows by 180°, so faces whose RIP angle lies in [-180°, 180°] are brought into [-90°, 90°]; in simple terms, upside-down faces are flipped upright, halving the angle range. The second stage classifies the window into one of three angle ranges and rotates it by ±90° or 0°, limiting the RIP angle to [-45°, 45°]. The third stage regresses the residual angle directly to obtain a precise prediction. The calibration proceeds coarse-to-fine as sketched below.
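This is only an illustrative Python sketch of the three-stage flow for a single candidate window, not the authors' released code; the stage predictors pcn1/pcn2/pcn3, the 0.5 score threshold, and the rotation sign conventions are assumptions for the example:

```python
import cv2

def rotate(window, angle_deg):
    """Rotate a square image patch about its center by angle_deg degrees (counter-clockwise)."""
    h, w = window.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
    return cv2.warpAffine(window, M, (w, h))

def progressive_calibration(window, pcn1, pcn2, pcn3, face_thresh=0.5):
    """Coarse-to-fine RIP calibration of one candidate window (illustrative sketch)."""
    # Stage 1: face score, box offsets, face-up score on a 24x24 input.
    score, box, up_score = pcn1(cv2.resize(window, (24, 24)))
    if score < face_thresh:
        return None                               # rejected as non-face
    theta1 = 0.0 if up_score >= 0.5 else 180.0    # flip upside-down faces upright
    window = rotate(window, theta1)               # RIP angle now in [-90, 90]

    # Stage 2: three-way coarse orientation classification on a 24x24 input.
    score, box, range_id = pcn2(cv2.resize(window, (24, 24)))
    if score < face_thresh:
        return None
    theta2 = (-90.0, 0.0, 90.0)[range_id]         # coarse RIP angle
    window = rotate(window, -theta2)              # RIP angle now in [-45, 45]

    # Stage 3: face score, box offsets, fine angle regression on a 48x48 input.
    score, box, theta3 = pcn3(cv2.resize(window, (48, 48)))
    if score < face_thresh:
        return None

    return theta1 + theta2 + theta3               # accumulated RIP angle of the face
```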

 

 5. Sample division (by IoU with the ground-truth face box):

Positive: IoU > 0.7

Negative: IoU < 0.3

Suspected: IoU ∈ [0.3, 0.7]

Positive and negative samples are used for face/non-face classification; positive and suspected samples are used for bounding box regression and angle calibration.
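A minimal sketch of this labeling rule (the function name, the keyword-argument thresholds, and the returned strings are only for illustration):

```python
def assign_sample_type(iou, pos_thresh=0.7, neg_thresh=0.3):
    """Label a candidate window by its IoU with the matched ground-truth face box."""
    if iou > pos_thresh:
        return "positive"   # classification + box regression + angle calibration
    if iou < neg_thresh:
        return "negative"   # face/non-face classification only
    return "suspected"      # box regression + angle calibration only
```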

It should be noted that the training input sizes of the three stage networks are 24×24, 24×24, and 48×48, respectively.

For the first-stage network, the face angle range is divided into 2 parts: the face-up range [-65°, 65°] and the face-down range [-180°, -115°] ∪ [115°, 180°]; faces at other angles are not used as training data. The face-up label is defined as 0 and the face-down label as 1.

For the second-stage network, the face angle range is divided into 3 parts: [-90°, -45°], [-45°, 45°], and [45°, 90°], with labels 0, 1, and 2 respectively.

For the third-stage network, the face angle range is [-45°, 45°]; unlike the first two networks, the training task here is regression of the face angle.
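To make the three stages' training targets concrete, here is a small illustrative sketch mapping a ground-truth RIP angle (in degrees) to each stage's label or regression target; the function names and the handling of boundary angles are assumptions:

```python
def stage1_label(angle_deg):
    """Up/down label for first-stage training data; None for angles outside both ranges."""
    if -65 <= angle_deg <= 65:
        return 0          # face-up
    if angle_deg <= -115 or angle_deg >= 115:
        return 1          # face-down
    return None           # not used as training data

def stage2_label(angle_deg):
    """Three-way coarse orientation label for second-stage training data."""
    if -90 <= angle_deg < -45:
        return 0
    if -45 <= angle_deg <= 45:
        return 1
    if 45 < angle_deg <= 90:
        return 2
    return None

def stage3_target(angle_deg):
    """Regression target for the third stage: the residual RIP angle in [-45, 45]."""
    return angle_deg if -45 <= angle_deg <= 45 else None
```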

 6. Training details:

Sample ratio per batch: positive : negative : suspected = 2 : 2 : 1

max_iters: 100,000

type: SGD

lr_base: 0.001

gamma: 0.1

lr_policy: step

step: 70,000

wd (weight decay): 0.0005
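As a rough sketch of these hyperparameters expressed with PyTorch's SGD and StepLR (this is only an illustration, not the paper's original training code; `net` is a placeholder module, and momentum is omitted since the notes do not state it):

```python
import torch

net = torch.nn.Linear(10, 2)  # placeholder for any one of the three stage networks

optimizer = torch.optim.SGD(net.parameters(),
                            lr=0.001,             # lr_base
                            weight_decay=0.0005)  # wd
# lr_policy "step": multiply the learning rate by gamma=0.1 every 70,000 iterations;
# training runs for max_iters=100,000 iterations with 2:2:1 positive/negative/suspected batches.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=70_000, gamma=0.1)
```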

7. Network structure:

 8. Algorithm introduction:

8.1 PCN-1

For each input sliding window, the first-stage network has three objectives: face / non-face classification (score f), bounding box regression (offsets t), and orientation classification (face-up vs. face-down score g).

 

The first objective, on f, uses softmax loss for face / non-face classification, where y = 1 if the sample is a face and 0 otherwise:

$L_{cls} = y \log f + (1 - y) \log(1 - f)$

The second objective, on t, uses smooth L1 loss, where t* is the ground-truth regression target and S denotes the smooth L1 function:

$L_{reg}(t, t^{*}) = S(t - t^{*})$

The bounding box regression target has 3 components, where w is the width of the (square) box and (a, b) its top-left corner; starred symbols denote the ground-truth face box:

$t_w = w^{*} / w$
$t_a = (a^{*} + 0.5\,w^{*} - a - 0.5\,w) / w^{*}$
$t_b = (b^{*} + 0.5\,w^{*} - b - 0.5\,w) / w^{*}$

The third objective, on g, uses softmax loss like the first, where y = 1 if the face is facing up and 0 if facing down:

$L_{cal} = y \log g + (1 - y) \log(1 - g)$

The final loss is a weighted combination of the three objectives, where each λ weights the corresponding term:

$\min L = L_{cls} + \lambda_{reg} \cdot L_{reg} + \lambda_{cal} \cdot L_{cal}$
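To make the three objectives concrete, here is a minimal PyTorch-style sketch of the regression targets and the combined loss. The (a, b, w) box layout, the default λ values of 1.0, and the use of binary cross-entropy as a stand-in for the two-class softmax loss are assumptions; in training, the regression and calibration terms are only applied to positive and suspected samples, per the sample division above.

```python
import torch
import torch.nn.functional as F

def box_regression_targets(box, gt_box):
    """Ground-truth (t_w, t_a, t_b) for a square window and its matched face box.

    Both boxes are given as (a, b, w): top-left corner (a, b) and width w.
    """
    a, b, w = box
    a_star, b_star, w_star = gt_box
    t_w = w_star / w
    t_a = (a_star + 0.5 * w_star - a - 0.5 * w) / w_star
    t_b = (b_star + 0.5 * w_star - b - 0.5 * w) / w_star
    return torch.tensor([t_w, t_a, t_b])

def pcn1_loss(f, t, g, y_face, t_star, y_up, lambda_reg=1.0, lambda_cal=1.0):
    """Combined stage-1 loss: face classification + box regression + orientation calibration."""
    l_cls = F.binary_cross_entropy(f, y_face)   # face vs. non-face
    l_reg = F.smooth_l1_loss(t, t_star)         # bounding box regression
    l_cal = F.binary_cross_entropy(g, y_up)     # face-up vs. face-down
    return l_cls + lambda_reg * l_reg + lambda_cal * l_cal
```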

The first-stage rotation is decided by the predicted orientation score g: θ1 = 0° means the face is predicted to be facing up and the window is not rotated, while θ1 = 180° means the face is predicted to be facing down and the window is flipped by 180°:

$\theta_1 = \begin{cases} 0^{\circ}, & g \ge 0.5 \\ 180^{\circ}, & g < 0.5 \end{cases}$

8.2 PCN-2

The second stage is similar to the first, except that the orientation classification now has three classes corresponding to the ranges [-90°, -45°], [-45°, 45°], and [45°, 90°]. The coarse rotation angle θ2 is chosen from the predicted class id = argmax_i g_i:

$\theta_2 = \begin{cases} -90^{\circ}, & id = 0 \\ 0^{\circ}, & id = 1 \\ 90^{\circ}, & id = 2 \end{cases}$

8.3 PCN-3

After the second-stage calibration, the face's RIP angle has been reduced to a nearly upright range of [-45°, 45°]. The third stage directly regresses the remaining angle θ3, so the angle-related loss becomes a smooth L1 regression loss.

The final RIP angle is obtained by accumulating the angles predicted by the three stages:

$\theta_{RIP} = \theta_1 + \theta_2 + \theta_3$

9. Experimental results: