07 - Algorithm Interpretation: Faster R-CNN (Object Detection)

Main points:

  • Faster R-CNN = RPN + Fast R-CNN

GitHub repository: vision/torchvision/models/detection at main · pytorch/vision · GitHub


3 Faster R-CNN

Faster R-CNN is another masterpiece by Ross Girshick after Fast R-CNN. Also using VGG16 as the network backbone, its inference speed reaches 5 fps on a GPU (including the generation of candidate regions), and its accuracy is further improved. In the 2015 ILSVRC and COCO competitions it won first place in several tracks.

The Faster R-CNN algorithm can be divided into 3 steps:

  • Input the image into the network to obtain the corresponding feature map
  • Use the RPN structure to generate candidate boxes (proposals), and project the proposals generated by the RPN onto the feature map to obtain the corresponding feature matrices
  • Scale each feature matrix to a 7x7 feature map through the ROI pooling layer, then flatten it through a series of fully connected layers to get the prediction result (a minimal inference sketch follows this list)
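As a concrete reference point, here is a minimal inference sketch using torchvision's off-the-shelf implementation (the repository linked above); the input image and printed shapes are illustrative:

```python
import torch
import torchvision

# Faster R-CNN with a ResNet-50 + FPN backbone, pretrained on COCO
# (pretrained=True follows the older torchvision API; newer versions use weights=)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 600, 800)        # dummy RGB image, values in [0, 1]
with torch.no_grad():
    predictions = model([image])       # one dict per input image

print(predictions[0]["boxes"].shape)   # (N, 4) predicted boxes
print(predictions[0]["labels"])        # predicted class indices
print(predictions[0]["scores"])        # confidence scores
```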

3.1 Loss logic

For each 3x3 sliding window on the feature map, compute the point on the original image corresponding to the center of the sliding window, and generate k anchor boxes there (k = 9 here; note that anchors are different from proposals).

  • Three scales (areas): {128², 256², 512²}
  • Three aspect ratios: {1:1, 1:2, 2:1}
  • Each position (each sliding window) therefore corresponds to 3x3 = 9 anchors on the original image (see the sketch below)
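A short sketch of how the 9 anchor shapes per position can be derived from the scales and ratios above (the rounding and the h:w convention are illustrative; the reference implementation differs in details):

```python
import math

scales = [128, 256, 512]      # square roots of the areas 128², 256², 512²
ratios = [0.5, 1.0, 2.0]      # h:w ratios of 1:2, 1:1, 2:1 (assumed convention)

# For area s² and ratio r = h/w: w = s / sqrt(r), h = s * sqrt(r)
anchors = [(round(s / math.sqrt(r)), round(s * math.sqrt(r)))
           for s in scales for r in ratios]

print(len(anchors))           # 9 anchor (w, h) pairs per sliding-window position
print(anchors)
```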

For a 1000x600x3 image, there are about 60x40x9 ≈ 20k anchors. After ignoring the anchors that cross the image boundary, about 6k anchors remain. The proposals generated by the RPN overlap heavily, so non-maximum suppression based on the proposals' cls scores is applied with an IoU threshold of 0.7, leaving only about 2k proposals per image.
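This NMS step can be reproduced with torchvision's own operator; a minimal sketch with made-up boxes and scores:

```python
import torch
from torchvision.ops import nms

boxes = torch.tensor([[ 10.,  10., 110., 110.],
                      [ 12.,  12., 112., 112.],   # heavily overlaps box 0
                      [200., 200., 300., 300.]])
scores = torch.tensor([0.9, 0.8, 0.7])            # cls scores from the RPN

keep = nms(boxes, scores, iou_threshold=0.7)      # indices of surviving boxes
print(keep)                                       # tensor([0, 2]) here
proposals = boxes[keep][:2000]                    # keep at most ~2k per image
```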

3.2 RPN Multi-task loss
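From the original paper, the RPN is trained with the multi-task loss below, where p_i is the predicted objectness probability of anchor i, p_i* its ground-truth label (1 for positive anchors, 0 otherwise), t_i the predicted box regression parameters, t_i* the regression targets, and λ a balancing weight:

$$
L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
+ \lambda \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
$$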

3.3 Classification loss
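The classification term is a log loss over the two classes "object" vs. "not object"; written in binary cross-entropy form:

$$
L_{cls}(p_i, p_i^*) = -\left[ p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \right]
$$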

3.4 Bounding box regression loss
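The regression term is the smooth L1 loss from the paper, applied to the four box parameters and only to positive anchors (via the p_i* factor above):

$$
L_{reg}(t_i, t_i^*) = \sum_{j \in \{x,y,w,h\}} \operatorname{smooth}_{L_1}\!\left(t_i^j - t_i^{*j}\right),
\qquad
\operatorname{smooth}_{L_1}(x) =
\begin{cases}
0.5\,x^2 & \text{if } |x| < 1 \\
|x| - 0.5 & \text{otherwise}
\end{cases}
$$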

3.5 Faster R-CNN training

Directly adopt the joint training method: RPN loss + Fast R-CNN loss (a training-loop sketch follows).
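In torchvision this joint scheme is what you get for free: in train mode the model returns all four loss terms (two from the RPN, two from the Fast R-CNN head), which are simply summed. A minimal sketch with a dummy image and target:

```python
import torch
import torchvision

# 20 VOC classes + background
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=21)
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

images = [torch.rand(3, 600, 800)]
targets = [{"boxes": torch.tensor([[50., 50., 200., 200.]]),
            "labels": torch.tensor([1])}]

# loss_objectness + loss_rpn_box_reg (RPN) and
# loss_classifier + loss_box_reg (Fast R-CNN head)
loss_dict = model(images, targets)
loss = sum(loss_dict.values())       # RPN loss + Fast R-CNN loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```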

In the original paper, RPN and Fast R-CNN are instead trained alternately, in four steps:

(1) Use the ImageNet pre-trained classification model to initialize the parameters of the shared convolutional layers, and train the RPN network parameters separately;

(2) Fix the RPN-specific convolutional and fully connected layer parameters, re-initialize the shared convolutional layers with the ImageNet pre-trained classification model, and use the region proposals generated by the RPN to train the Fast R-CNN network parameters;

(3) Fix the shared convolutional layer parameters trained by Fast R-CNN, and fine-tune the RPN-specific convolutional and fully connected layer parameters;

(4) Keeping the shared convolutional layer parameters fixed, fine-tune the fully connected layer parameters of the Fast R-CNN network. In the end, the RPN and Fast R-CNN share the convolutional layer parameters and form a unified network.

3.6 Faster R-CNN framework

Structural comparison:


3.7 FPN network

  • For object detection tasks:
  • COCO AP increased by 2.3 points
  • PASCAL VOC AP increased by 3.8 points

3.7.1 The RPN and Fast R-CNN weights are shared across the different prediction feature levels
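In torchvision's FPN variant this sharing is literal: a single RPNHead instance (one 3x3 conv followed by 1x1 cls/reg convs) is run on every prediction feature level. A sketch with made-up feature-map sizes standing in for P2..P6:

```python
import torch
from torchvision.models.detection.rpn import RPNHead

head = RPNHead(in_channels=256, num_anchors=3)   # one head, weights shared

# dummy FPN outputs P2..P6, all with 256 channels
features = [torch.rand(1, 256, s, s) for s in (200, 100, 50, 25, 13)]
objectness, bbox_deltas = head(features)         # same head applied per level

for o in objectness:
    print(o.shape)   # (1, 3, H, W) objectness logits for each level
```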


3.8 Interpretation of source code

Environment configuration:

  • Python 3.6/3.7/3.8
  • PyTorch 1.7.1 (Note: must be 1.6.0 or above, because official mixed-precision training is only supported from 1.6.0)
  • pycocotools (Linux: pip install pycocotools; Windows: pip install pycocotools-windows, no need to install VS)
  • Best to use a GPU for training
  • For the detailed environment configuration, see requirements.txt

file structure:

  ├── backbone: feature extraction network, you can choose according to your own requirements
  ├── network_files: Faster R-CNN network (including Fast R-CNN and RPN modules)
  ├── train_utils: training and verification related modules (including cocotools)
  ├── my_dataset.py: custom dataset for reading the VOC dataset
  ├── train_mobilenet.py: training with MobileNetV2 as the backbone
  ├── train_resnet50_fpn.py: training with ResNet-50 + FPN as the backbone
  ├── train_multi_GPU.py: for users who train with multiple GPUs
  ├── predict.py: simple prediction script, uses the trained weights for prediction testing
  ├── validation.py: uses the trained weights to validate/test the COCO metrics on the dataset and generates a record_mAP.txt file
  └── pascal_voc_classes.json: PASCAL VOC label file

Pre-training weight download address (after downloading, put it in the backbone folder):

The PASCAL VOC2012 dataset is used in this project:

Training method:

  • Make sure to prepare your dataset ahead of time
  • Make sure to download the corresponding pre-trained model weights in advance
  • To train mobilenetv2+fasterrcnn, use the train_mobilenet.py training script directly
  • To train resnet50+fpn+fasterrcnn, use the train_resnet50_fpn.py training script directly

Precautions:

  • When using the training script, be careful to set '--data-path' (VOC_root) to the root directory where you store the 'VOCdevkit' folder
  • Since Faster R-CNN with the FPN structure consumes a lot of GPU memory, if GPU memory is insufficient (e.g. batch_size less than 8), it is recommended to use the default norm_layer in the create_model function, i.e. do not pass the norm_layer variable; FrozenBatchNorm2d (a BN layer whose parameters are not updated) is then used by default, and it has been found to work well in practice (see the sketch after this list)
  • When using the predict script, set 'train_weights' to the path of your own generated weights
  • When using the validation file, make sure your validation or test set contains targets of every category, and only modify '--num-classes', '--data-path' and '--weights'; try not to change other code
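For the FrozenBatchNorm2d precaution above, a minimal sketch of what the default amounts to (the create_model function itself is project-specific; this only shows the torchvision building blocks):

```python
import torchvision
from torchvision.ops.misc import FrozenBatchNorm2d

# A ResNet-50 whose BN layers keep fixed statistics and affine parameters,
# i.e. they are never updated during training (safe for small batch sizes)
backbone = torchvision.models.resnet50(norm_layer=FrozenBatchNorm2d)
```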


Original article: blog.csdn.net/March_A/article/details/130569656