The R-CNN Family: Object Detection Based on Deep Learning
1. R-CNN
The R-CNN series of papers (R-CNN, Fast R-CNN, Faster R-CNN) pioneered object detection with deep learning. Fast R-CNN and Faster R-CNN both build on the ideas of R-CNN.
R-CNN stands for "Regions with CNN features", and the name describes the method well: a CNN extracts the features of each region proposal, and those features are then used for SVM classification and bounding-box (bbox) regression.
2. The overall process of R-CNN
The R-CNN pipeline:
(1) The image is first segmented with Selective Search, which yields about 2,000 candidate regions (region proposals).
(2) Pre-train a CNN for image classification.
(3) Extract the features of each cropped region with the pre-trained CNN. The network's final classification layer is not applied, so the CNN serves purely as a feature extractor.
(4) Train a corresponding SVM classifier on the extracted features, and feed the region features to it for classification.
(5) Pass the extracted features to the localization model, which is trained to predict the center position, width, and height of the target within the region.
(1) How Selective Search works:
(2) Bounding box regression:
Loss function:
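As a reference point, the bounding-box regressor described in the R-CNN paper learns offsets that move a proposal P = (Px, Py, Pw, Ph) onto its ground-truth box G = (Gx, Gy, Gw, Gh): the center shift is normalized by the proposal size, and the size change is taken in log space. The regressor is fit with a squared-error loss plus an L2 penalty on the weights. The sketch below illustrates this under stated assumptions: the function names `encode`, `decode`, and `ridge_loss` are hypothetical, boxes are given as (center x, center y, width, height), and the default `lam` is a placeholder rather than a value taken from this article.

```python
import numpy as np

def encode(P, G):
    """Regression targets (tx, ty, tw, th) that move proposal P onto ground truth G.

    Both boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return np.array([(gx - px) / pw,      # center shift, scaled by proposal width
                     (gy - py) / ph,      # center shift, scaled by proposal height
                     np.log(gw / pw),     # log-space width change
                     np.log(gh / ph)])    # log-space height change

def decode(P, t):
    """Apply predicted offsets t to proposal P to get the refined box."""
    px, py, pw, ph = P
    tx, ty, tw, th = t
    return np.array([px + pw * tx, py + ph * ty, pw * np.exp(tw), ph * np.exp(th)])

def ridge_loss(W, features, targets, lam=1.0):
    """Squared error of the linear predictions features @ W, plus an L2 penalty on W.

    features: (N, D) region features, W: (D, 4), targets: (N, 4).
    lam is an assumed placeholder for the regularization strength.
    """
    residual = targets - features @ W
    return np.sum(residual ** 2) + lam * np.sum(W ** 2)
```

Encoding a proposal against its ground truth and then decoding the result recovers the ground-truth box, which is a quick sanity check that the two functions are inverses.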
(3) R-CNN architecture diagram
(4) Computational bottleneck of R-CNN
3. Fast R-CNN
(1) Fast R-CNN architecture diagram
As can be seen, Fast R-CNN makes three main improvements:
1. Convolution is no longer run on each region proposal separately but once over the entire image, which removes a large amount of repeated computation. R-CNN convolves every region proposal independently; since an image has about 2,000 region proposals and they overlap heavily, the same areas are processed many times.
2. RoI pooling converts the features of each proposal to a fixed size. The fully connected layers require inputs of the same size, so the variably sized region proposals cannot be fed to them directly.
3. The bounding-box regressor is moved into the network and trained jointly with it, with one regressor per category, and the original SVM classifier is replaced by softmax.
(2) ROI pooling
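The idea of RoI pooling, turning a region of arbitrary size into a fixed-size grid so that it can feed the fully connected layers, can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in Fast R-CNN: the function name `roi_max_pool` is hypothetical, the RoI is assumed to be given already in feature-map coordinates with inclusive corners, and the bin boundaries use a simple floor/ceil split.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one RoI of a (C, H, W) feature map into a (C, out_h, out_w) grid.

    roi is (x1, y1, x2, y2) in feature-map coordinates, corners inclusive.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    region = feature_map[:, y1:y2 + 1, x1:x2 + 1]
    _, h, w = region.shape
    pooled = np.empty((feature_map.shape[0], out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        # Bin i spans rows [floor(i*h/out_h), ceil((i+1)*h/out_h)), at least one row.
        ys = (i * h) // out_h
        ye = max(-(-((i + 1) * h) // out_h), ys + 1)
        for j in range(out_w):
            xs = (j * w) // out_w
            xe = max(-(-((j + 1) * w) // out_w), xs + 1)
            # Take the maximum of every channel inside the bin.
            pooled[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))
    return pooled

# Two RoIs of different sizes produce outputs of the same shape.
fmap = np.arange(16).reshape(1, 4, 4)
full = roi_max_pool(fmap, (0, 0, 3, 3))      # 4x4 region
part = roi_max_pool(fmap, (1, 0, 3, 2))      # 3x3 region
```

Whatever the size of the input region, the output shape depends only on `output_size`, which is what lets proposals of different sizes share the same fully connected layers.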
(3) Loss function
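For orientation, the Fast R-CNN paper trains classification and box regression jointly with a multi-task loss: a log loss on the softmax probability of the true class, plus a smooth L1 loss on the box offsets predicted for that class, where the box term is applied only to non-background RoIs. The sketch below computes this for a single RoI. The function names, the argument layout, and the default weight `lam` are assumptions made for illustration.

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1: 0.5 * x^2 where |x| < 1, otherwise |x| - 0.5."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def fast_rcnn_loss(class_probs, bbox_preds, true_class, true_bbox, lam=1.0):
    """Multi-task loss for one RoI.

    class_probs: (K + 1,) softmax output, index 0 is background.
    bbox_preds:  (K + 1, 4) predicted offsets, one row per class.
    true_class:  integer label u; true_bbox: (4,) regression target v.
    lam is an assumed weight balancing the two terms.
    """
    loss_cls = -np.log(class_probs[true_class])
    if true_class == 0:
        # Background RoIs have no ground-truth box, so only the class term applies.
        return loss_cls
    loss_loc = smooth_l1(bbox_preds[true_class] - true_bbox).sum()
    return loss_cls + lam * loss_loc
```

Smooth L1 behaves like a squared error for small residuals and like an absolute error for large ones, which makes the box term less sensitive to outliers than a plain L2 loss.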
(4) Summary
4. Faster R-CNN
Region Proposal Network (RPN):
The RPN replaces the Selective Search step of the earlier R-CNN versions and generates the candidate boxes. It has two tasks. The first is classification: judge whether each preset anchor is positive or negative, that is, whether the anchor contains a target (a binary classification). The second is bounding box regression: correct the anchors to obtain more accurate proposals. The RPN therefore performs part of the detection in advance: it judges whether a target is present (without deciding its specific category) and corrects the anchors so that the boxes are more accurate.
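The two RPN ingredients described above, preset anchors and their positive/negative labels, can be sketched as follows. This is an illustration under assumptions, not the reference implementation: the stride of 16, the three scales and three aspect ratios, and the IoU thresholds of 0.7 and 0.3 follow the settings reported in the Faster R-CNN paper, while the function names and the single-ground-truth-box simplification are hypothetical.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * A, 4) anchors as (x1, y1, x2, y2) in image pixels."""
    base = []
    for s in scales:
        for r in ratios:
            # Keep the area at s * s while setting height / width = r.
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                                   # (A, 4), centered at origin
    # One anchor center per feature-map cell, mapped back to image coordinates.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (shifts + base[None]).reshape(-1, 4)

def iou(boxes, gt):
    """IoU between each row of boxes (N, 4) and one ground-truth box (4,)."""
    x1 = np.maximum(boxes[:, 0], gt[0])
    y1 = np.maximum(boxes[:, 1], gt[1])
    x2 = np.minimum(boxes[:, 2], gt[2])
    y2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_b + area_g - inter)

def label_anchors(anchors, gt, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor: 1 = positive, 0 = negative, -1 = ignored in training."""
    overlaps = iou(anchors, gt)
    labels = np.full(len(anchors), -1, dtype=int)
    labels[overlaps < neg_thresh] = 0
    labels[overlaps >= pos_thresh] = 1
    labels[np.argmax(overlaps)] = 1      # the best-matching anchor is always positive
    return labels

anchors = generate_anchors(feat_h=2, feat_w=3)
labels = label_anchors(anchors, gt=anchors[4])
```

The positive anchors would then be paired with regression targets toward the ground-truth box, and the class-agnostic labels train the binary "target or no target" branch.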
(1) Training steps
(2) Loss function
(3) Summary
5. Summary diagram of the R-CNN family