Faster R-CNN for object detection: an analysis

Basic process

  1. Feed the image into the backbone network to obtain a feature map.
  2. Use the RPN to generate candidate boxes, and project each candidate box onto the feature map to obtain the corresponding feature matrix.
  3. Pass each feature matrix through RoI pooling to obtain a fixed-size feature map, then flatten it and feed it through fully connected layers to obtain the prediction result.
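
Steps 2 and 3 can be sketched in plain Python. This is only an illustration under stated assumptions: a single-channel feature map stored as a nested list, a backbone stride of 16, and a 2 x 2 output grid chosen to keep the example small. The function names `project_box` and `roi_max_pool` are hypothetical and do not come from any library.

```python
import math

def project_box(box, stride=16):
    """Map an (x1, y1, x2, y2) box from image pixels to feature-map cells.

    The stride of 16 is an assumption; it depends on the backbone.
    """
    x1, y1, x2, y2 = box
    return (math.floor(x1 / stride), math.floor(y1 / stride),
            math.ceil(x2 / stride), math.ceil(y2 / stride))

def roi_max_pool(feature, roi, out_size=2):
    """Max-pool the region roi = (x1, y1, x2, y2), end-exclusive, of a
    2-D nested list into an out_size x out_size grid."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        # Row range of bin i; max(...) keeps at least one row per bin.
        r0 = y1 + (i * h) // out_size
        r1 = max(y1 + ((i + 1) * h + out_size - 1) // out_size, r0 + 1)
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = max(x1 + ((j + 1) * w + out_size - 1) // out_size, c0 + 1)
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# A 4 x 4 feature map with values 0..15 and a box covering the whole image.
feature = [[r * 4 + c for c in range(4)] for r in range(4)]
roi = project_box((0, 0, 64, 64))
pooled = roi_max_pool(feature, roi)
flat = [v for row in pooled for v in row]  # flattened input for the FC layers
```

Whatever the size of the candidate box, the pooled output has the same shape, which is what lets the fully connected layers that follow accept it.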

Key analysis
The position of RPN in the network

[Figure: Faster R-CNN network structure]
In the figure above, two arrows lead upward from the feature map layer. The left one goes to the Region Proposal Network (RPN); the right one goes to RoI pooling, the same as in Fast R-CNN.

RPN structure

[Figure: RPN structure]
A $3 \times 3$ sliding window is applied over the feature map, and each position yields a 256-dimensional vector (this value is not fixed; the depth of the backbone output here is 256). Two sibling fully connected layers, implemented in practice as $1 \times 1$ convolutions, then produce 2k classification scores (one pair for each of the k anchors shown on the right of the figure, giving that anchor's foreground and background probability) and 4k bounding box regression parameters.

Correspondence between anchors on the feature map and the original image:
Using the scaling factor between the original image and the feature map, each feature-map position is mapped to the coordinates of an anchor center on the original image, and a set of anchors of predefined sizes is then generated around that center.
On the original image, each position has 9 anchors, combining three areas {$128 \times 128$, $256 \times 256$, $512 \times 512$} with three aspect ratios {1:1, 1:2, 2:1}.
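
The anchor generation described above can be sketched as follows. The stride of 16, the convention of placing the anchor center at the middle of the feature cell, and the reading of each ratio as height divided by width are all assumptions; implementations differ on these details. The function name `anchors_at` is hypothetical.

```python
import math

def anchors_at(fx, fy, stride=16,
               areas=(128 ** 2, 256 ** 2, 512 ** 2),
               ratios=(1.0, 0.5, 2.0)):
    """Return 9 anchors (x1, y1, x2, y2) in image pixels for feature cell (fx, fy)."""
    # Center of the cell mapped back to the original image (assumed convention).
    cx = fx * stride + stride / 2
    cy = fy * stride + stride / 2
    boxes = []
    for area in areas:
        for ratio in ratios:  # ratio is taken as height / width here
            w = math.sqrt(area / ratio)
            h = w * ratio
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

anchors = anchors_at(0, 0)
```

Each anchor keeps the requested area while its shape changes with the ratio. Anchors near the image border extend outside the image, so implementations usually clip or discard them.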

2k parameters (in groups of two, each group representing (foreground probability, background probability)):

(0.2 0.8) (0.7 0.3) (0.4 0.6) (0.9 0.1)

4k parameters (in groups of four, each group representing the predicted regression parameters ($d_x^k, d_y^k, d_w^k, d_h^k$) of the k-th anchor):

(0.12 0.21 0.74 0.33) (0.54 0.16 0.09 0.21)
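
To show how these numbers are used, here is a minimal sketch that groups the flat outputs per anchor and applies one group of regression parameters to an anchor. The decoding rule (center shifts scaled by the anchor size, width and height scaled by an exponential) is the usual R-CNN family parameterization; the text above does not spell it out, so treat it as an assumption. The function names are hypothetical.

```python
import math

def split_rpn_outputs(scores, deltas):
    """Group flat 2k scores and 4k regression values into per-anchor tuples."""
    k = len(scores) // 2
    assert len(scores) == 2 * k and len(deltas) == 4 * k
    pairs = [tuple(scores[2 * i:2 * i + 2]) for i in range(k)]
    quads = [tuple(deltas[4 * i:4 * i + 4]) for i in range(k)]
    return pairs, quads

def decode(anchor, delta):
    """Apply (dx, dy, dw, dh) to an anchor given as (cx, cy, w, h)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = delta
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

# Values taken from the example groups above (first two anchors only).
pairs, quads = split_rpn_outputs(
    [0.2, 0.8, 0.7, 0.3],
    [0.12, 0.21, 0.74, 0.33, 0.54, 0.16, 0.09, 0.21],
)
box = decode((100, 100, 128, 128), (0.5, -0.25, 0.0, 0.0))
```

With zero regression parameters the decoded box is the anchor itself, which is why the network only needs to learn small corrections relative to each anchor.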

RPN loss function

It consists of a classification loss and a bounding box regression loss:

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where
$p_i$ is the predicted probability that the i-th anchor is a real object
$p_i^*$ is the ground-truth label: 1 for a positive sample and 0 for a negative sample
$t_i$ is the predicted bounding box regression parameters of the i-th anchor
$t_i^*$ is the regression parameters of the ground-truth box corresponding to the i-th anchor
$N_{cls}$ is the number of samples in a mini-batch
$N_{reg}$ is the number of anchor positions
$\lambda$ is a weight that balances the two terms
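
A minimal sketch of the two loss terms follows. It assumes the choices made in the Faster R-CNN paper, which the text above does not state: a log loss for classification, a smooth L1 loss for regression, and a balancing weight of 10. The function names are hypothetical.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    x = abs(x)
    return 0.5 * x * x if x < 1 else x - 0.5

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=10.0):
    """p[i]: predicted object probability; p_star[i]: label, 1 or 0;
    t[i], t_star[i]: 4-tuples of predicted and ground-truth box parameters."""
    eps = 1e-12  # avoids log(0)
    cls = 0.0
    reg = 0.0
    for pi, ps, ti, ts in zip(p, p_star, t, t_star):
        # Log loss over two classes (object vs. not object).
        cls += -(ps * math.log(pi + eps) + (1 - ps) * math.log(1 - pi + eps))
        # The regression term counts only for positive anchors (label 1).
        reg += ps * sum(smooth_l1(a - b) for a, b in zip(ti, ts))
    return cls / n_cls + lam * reg / n_reg

loss = rpn_loss(
    p=[0.5, 0.5], p_star=[1, 0],
    t=[(0, 0, 0, 0), (9, 9, 9, 9)], t_star=[(0, 0, 0, 0), (0, 0, 0, 0)],
    n_cls=2, n_reg=1,
)
```

In this example the second anchor is a negative sample, so its badly wrong box prediction contributes nothing to the loss; only its classification term is counted.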

Faster R-CNN training

[Figure: Faster R-CNN training]
Joint training with RPN loss + Fast R-CNN loss is the common approach today. The original paper instead trains the RPN and Fast R-CNN in four alternating steps:

  1. Initialize the convolutional network parameters (the CNN layers in the figure above) from a pre-trained classification model and train the RPN on its own (the left-hand arrow in the figure above).
  2. Fix the convolutional and fully connected layer parameters unique to the RPN, and use the candidate boxes generated by the RPN to train the Fast R-CNN network (the right-hand arrow in the figure above).
  3. Fix the convolutional layers trained with Fast R-CNN and fine-tune only the parameters unique to the RPN.
  4. Keep the shared convolutional layers and the RPN parameters fixed, and fine-tune only the Fast R-CNN parameters (the RoI pooling layer and later).
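
The four steps can be summarized as a schedule of which parameter groups are updated at each stage. This is a sketch of one reading of the steps above, not the authors' training code; the group names `backbone`, `rpn` and `detector` are illustrative labels for the shared convolutional layers, the RPN-only layers and the Fast R-CNN-only layers.

```python
# Parameter groups updated in each step; everything else stays frozen.
SCHEDULE = [
    {"step": 1, "train": {"backbone", "rpn"}},       # train the RPN from a pre-trained backbone
    {"step": 2, "train": {"backbone", "detector"}},  # train Fast R-CNN on RPN proposals
    {"step": 3, "train": {"rpn"}},                   # fine-tune RPN-only layers
    {"step": 4, "train": {"detector"}},              # fine-tune Fast R-CNN-only layers
]

def trainable_flags(step):
    """Return {group: True/False} telling which groups receive gradients."""
    groups = ("backbone", "rpn", "detector")
    train = SCHEDULE[step - 1]["train"]
    return {g: g in train for g in groups}

flags_step3 = trainable_flags(3)
```

From step 3 onward the backbone is frozen, so the RPN and Fast R-CNN end up sharing the same convolutional features.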

Origin blog.csdn.net/qq_44116998/article/details/128427879