Object detection: the R-CNN family

	The R-CNN family of deep-learning-based object detectors

1. R-CNN
The R-CNN series of papers (R-CNN, Fast R-CNN, Faster R-CNN) pioneered object detection with deep learning; Fast R-CNN and Faster R-CNN both build on the ideas introduced in R-CNN.
R-CNN stands for "Regions with CNN features", and the name describes the method well: a CNN extracts features from each region proposal, and those features are then used for SVM classification and bounding-box (bbox) regression.
2. The overall process of R-CNN

The R-CNN pipeline:

(1) Selective Search first generates about 2,000 region proposals (candidate regions) from the input image.
(2) A CNN is pre-trained for image classification.
(3) Each proposed region is cropped from the image and passed through the pre-trained CNN. The network's classification output layer is not used; the CNN serves only as a feature extractor.
(4) An SVM classifier is trained for each class on the extracted features, and the features of each region are fed to these classifiers for classification.
(5) The extracted features are also passed to a localization model (the bounding-box regressor), which is trained to predict the center position, width, and height of the target within the region.

(1) Selective Search working principle:
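The core idea of Selective Search is hierarchical grouping: start from a fine over-segmentation, repeatedly merge the two most similar neighboring regions, and keep every region created along the way as a proposal. The following is only a toy sketch of that loop. The region format (a dict with `box`, `mean`, `size`), the box-based adjacency test, and the single intensity-based `similarity` are assumptions made for illustration; the real algorithm combines color, texture, size, and fill similarities computed on segmented pixels.

```python
from itertools import combinations

def union_box(a, b):
    """Smallest box (x1, y1, x2, y2) that encloses boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def touches(a, b):
    """Toy adjacency test: True if two boxes overlap or share an edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def similarity(r1, r2):
    """Hypothetical similarity: closer mean intensity means more similar."""
    return -abs(r1["mean"] - r2["mean"])

def hierarchical_grouping(regions):
    """Greedily merge neighboring regions; return all boxes seen as proposals."""
    regions = list(regions)
    proposals = [r["box"] for r in regions]
    while len(regions) > 1:
        pairs = [(i, j) for i, j in combinations(range(len(regions)), 2)
                 if touches(regions[i]["box"], regions[j]["box"])]
        if not pairs:
            break  # no neighboring regions left to merge
        i, j = max(pairs, key=lambda p: similarity(regions[p[0]], regions[p[1]]))
        a, b = regions[i], regions[j]
        size = a["size"] + b["size"]
        merged = {
            "box": union_box(a["box"], b["box"]),
            # size-weighted mean keeps the statistic consistent after merging
            "mean": (a["mean"] * a["size"] + b["mean"] * b["size"]) / size,
            "size": size,
        }
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["box"])
    return proposals

# Three side-by-side regions: two dark ones and one bright one.
initial = [
    {"box": (0, 0, 10, 10), "mean": 10.0, "size": 100},
    {"box": (10, 0, 20, 10), "mean": 12.0, "size": 100},
    {"box": (20, 0, 30, 10), "mean": 200.0, "size": 100},
]
proposals = hierarchical_grouping(initial)
```

In this example the two dark regions merge first, so the proposal list contains regions at several scales: the three initial boxes, their first merge, and the box covering the whole strip.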

(2) Bounding Box Regression (boundary box regression):
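Bounding-box regression learns a small correction that moves a proposal toward its ground-truth box. A minimal sketch of the parameterization used in the R-CNN paper follows, with boxes written as (center x, center y, width, height). The function names `encode` and `decode` are chosen here for illustration.

```python
import math

def encode(proposal, target):
    """Regression targets (tx, ty, tw, th) that map proposal P onto ground truth G."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return (
        (gx - px) / pw,        # center shift, normalized by proposal width
        (gy - py) / ph,        # center shift, normalized by proposal height
        math.log(gw / pw),     # log-space width scaling
        math.log(gh / ph),     # log-space height scaling
    )

def decode(proposal, deltas):
    """Apply predicted (tx, ty, tw, th) to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))

proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (55.0, 60.0, 40.0, 40.0)
deltas = encode(proposal, ground_truth)   # what the regressor is trained to predict
refined = decode(proposal, deltas)        # recovers the ground-truth box
```

Normalizing the shifts by the proposal size and predicting the scale in log space keeps the targets comparable across proposals of very different sizes.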

Loss function:
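For reference, the regression objective given in the R-CNN paper can be written as follows. Here $t_\star^{i}$ is the regression target of proposal $P^{i}$ for coordinate $\star \in \{x, y, w, h\}$, $\phi_5(P^{i})$ denotes the pool5 features of that proposal, and $\lambda$ is the regularization weight. This is a restatement from the paper, not a reconstruction of the original figure.

```latex
\[
\mathbf{w}_\star = \arg\min_{\hat{\mathbf{w}}_\star}
  \sum_{i=1}^{N} \left( t_\star^{i} - \hat{\mathbf{w}}_\star^{\top} \phi_5(P^{i}) \right)^2
  + \lambda \left\lVert \hat{\mathbf{w}}_\star \right\rVert^2
\]
```

This is ordinary ridge regression: one linear regressor per coordinate, fitted on CNN features with an L2 penalty on the weights.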

(3) R-CNN architecture diagram

(4) Computational bottleneck of R-CNN

3. Fast R-CNN

(1) Fast R-CNN architecture diagram

	Fast R-CNN makes three main improvements over R-CNN:

1. Convolution is no longer run on each region proposal separately but once on the whole image, which eliminates a large amount of repeated computation. R-CNN convolves every region proposal independently; with about 2,000 proposals per image that overlap heavily, much of that computation is redundant.
2. ROI pooling converts the features of each region proposal to a fixed size. The fully connected layers require a fixed-size input, so features from proposals of different sizes cannot be fed to them directly.
3. The bounding-box regressor is moved into the network and trained jointly with it, with one regressor per category, and the SVM classifier is replaced by softmax.

(2) ROI pooling
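ROI pooling divides the part of the feature map covered by a region of interest into a fixed grid of bins and takes the maximum in each bin, so every region yields an output of the same size. The sketch below works on a single-channel feature map stored as a nested list. The function name `roi_max_pool`, the inclusive `(x1, y1, x2, y2)` ROI format, and the floor/ceil bin boundaries are one possible convention chosen for illustration, not necessarily the one used in the original post.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the ROI (x1, y1, x2, y2; inclusive feature-map indices)
    into a fixed out_h x out_w grid."""
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Floor the start and ceil the end so that no bin is ever empty.
        ys = y1 + (i * roi_h) // out_h
        ye = y1 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            xs = x1 + (j * roi_w) // out_w
            xe = x1 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
whole = roi_max_pool(feature_map, (0, 0, 3, 3), 2, 2)   # 4x4 region -> 2x2
corner = roi_max_pool(feature_map, (1, 1, 3, 3), 2, 2)  # 3x3 region -> 2x2
```

Both calls return a 2x2 grid even though the regions differ in size, which is what lets the fully connected layers accept any proposal. When the region size is not divisible by the grid size, as in the second call, neighboring bins overlap by one cell under this convention.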

(3) Loss function
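For reference, the multi-task loss defined in the Fast R-CNN paper combines classification and box regression for each region of interest. Here $p$ is the predicted softmax distribution over classes, $u$ the true class (with $u = 0$ for background), $t^{u}$ the predicted box offsets for class $u$, $v$ the regression targets, and $\lambda$ a balancing weight. This restates the paper's formula and is not recovered from the original figure.

```latex
\[
L(p, u, t^{u}, v) = L_{\mathrm{cls}}(p, u) + \lambda \, [u \ge 1] \, L_{\mathrm{loc}}(t^{u}, v)
\]
\[
L_{\mathrm{cls}}(p, u) = -\log p_u, \qquad
L_{\mathrm{loc}}(t^{u}, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^{u}_i - v_i)
\]
\[
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5\,x^2 & \text{if } |x| < 1 \\
|x| - 0.5 & \text{otherwise}
\end{cases}
\]
```

The indicator $[u \ge 1]$ switches off the localization term for background regions, which have no ground-truth box.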

(4) Summary

4. Faster R-CNN

Region Proposal Network (RPN):

The RPN (region proposal network) replaces the Selective Search step used in the earlier R-CNN versions and generates the candidate boxes. It has two tasks. The first is classification: decide whether each preset anchor is positive or negative, that is, whether it contains a target (a binary classification). The second is bounding-box regression: adjust the anchors to obtain more accurate proposals. In effect, the RPN performs part of the detection in advance: it decides whether a target is present (without predicting its category) and refines the anchors so that the boxes fit more tightly.
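The preset anchors can be pictured as a small set of reference boxes placed at every feature-map location. A minimal sketch follows, assuming the three scales (128, 256, 512) and three aspect ratios (1:2, 1:1, 2:1) reported in the Faster R-CNN paper and a feature stride of 16. The function names and the choice to center anchors in the middle of each stride cell are illustrative assumptions.

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchors (x1, y1, x2, y2) centered at (cx, cy).
    Each anchor has area scale**2 and height/width equal to ratio."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def grid_anchors(feat_h, feat_w, stride=16):
    """Place the anchor set at the image position of every feature-map cell.
    Centering at the middle of each stride cell is an assumption of this sketch."""
    return [
        anchor
        for y in range(feat_h)
        for x in range(feat_w)
        for anchor in make_anchors(x * stride + stride / 2, y * stride + stride / 2)
    ]

per_location = make_anchors(0.0, 0.0)   # 3 scales x 3 ratios = 9 anchors
all_anchors = grid_anchors(2, 3)        # 2 x 3 cells x 9 anchors = 54 anchors
```

The RPN then predicts, for each of these anchors, an objectness score and four box offsets.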
(1) Training steps

(2) Loss function
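The RPN loss in the Faster R-CNN paper is the sum of a normalized binary classification term over the sampled anchors and a smooth L1 regression term that is counted only for positive anchors, weighted by a factor lambda. The following is a rough pure-Python sketch of that structure. The function signature is hypothetical, ignored anchors are assumed to have been filtered out beforehand, and defaulting `n_reg` to the number of sampled anchors is a simplification (the paper normalizes the regression term by the number of anchor locations).

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for large errors."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def rpn_loss(probs, labels, deltas, targets, lam=10.0, n_reg=None):
    """probs:   predicted objectness per anchor, each in (0, 1)
    labels:  1 for positive anchors, 0 for negative anchors
    deltas:  predicted (tx, ty, tw, th) per anchor
    targets: ground-truth (tx, ty, tw, th) per anchor (unused for negatives)"""
    n_cls = len(probs)
    n_reg = n_cls if n_reg is None else n_reg
    # Binary cross-entropy over all sampled anchors.
    cls = sum(-math.log(p if y == 1 else 1.0 - p)
              for p, y in zip(probs, labels)) / n_cls
    # Regression term: the label acts as the switch p_i* from the paper.
    reg = sum(y * sum(smooth_l1(t - g) for t, g in zip(d, tg))
              for y, d, tg in zip(labels, deltas, targets)) / n_reg
    return cls + lam * reg

loss = rpn_loss(
    probs=[0.9, 0.2],
    labels=[1, 0],
    deltas=[(0.1, 0.0, 0.2, 0.0), (1.0, 1.0, 1.0, 1.0)],
    targets=[(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)],
)
```

Because the label multiplies the regression term, the second (negative) anchor contributes to the classification loss only, however far off its predicted offsets are.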

(3) Summary

5. Summary diagram of the R-CNN family

Origin blog.csdn.net/weixin_43391596/article/details/127999981