Understanding Faster R-CNN

1. Introduction to Object Detection

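Object detection is a very important vision task, and it drives applications such as instance segmentation and action recognition. As the technology has matured, two mainstream approaches have emerged: single-stage detection and two-stage detection.
Two-stage detection: first decide whether each anchor is foreground or background, then classify the object inside the foreground anchors. Example: the R-CNN family.
Single-stage detection: directly predict the class of the object in each anchor, for example with a sliding-window approach. Examples: the YOLO family, SSD.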

Object detection combines localization and recognition. The differences between the related tasks can be summarized as follows:
(1) Classification: What is it?
(2) Localization: Where is it, and what is it? (single object)
(3) Detection: Where are they, and what are they? (multiple objects)

2. Faster R-CNN framework

Faster R-CNN consists of four main parts: feature extraction, region proposal, ROI pooling, and classification and regression.

(1) Feature extraction (CNN)
A CNN extracts a feature map from the input image. This feature map is shared by the later stages: the RPN, ROI pooling, and the FC layers.
(2) Region proposal (RPN)
The RPN traverses the feature map and assigns nine anchors to every position (this is what provides multi-scale detection). A binary softmax then classifies each anchor as object or background, while a regression branch predicts anchor offsets so that the anchors become more accurate. The purpose of this step is to produce reasonably precise region proposals.
(3) ROI pooling
ROI pooling takes the proposals of different sizes produced by the RPN, together with the feature map extracted by the CNN, and generates a feature map for each region. It then applies a unified pooling operation (each region feature map is divided into the same grid and every cell is pooled) to produce features of the same dimensions. Because FC layers follow, this step is necessary.
(4) Classification and regression (FC, softmax)
The fixed-size features from ROI pooling pass through further FC layers and then split into two branches: one performs object classification with softmax, and the other directly regresses box offsets so that the boxes become more accurate.
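The "offset regression" in steps (2) and (4) can be made concrete with a small sketch. The text above does not spell out how the offsets are parameterized, so the code below assumes the center/size parameterization commonly used in the R-CNN family (shift the center by a fraction of the anchor size, scale the width and height exponentially). The function names `encode_box` and `decode_box` are made up for illustration.

```python
import math

def encode_box(anchor, target):
    """Offsets (tx, ty, tw, th) that move `anchor` onto `target`.

    Both boxes are (cx, cy, w, h). Assumed parameterization, not taken
    from the text above.
    """
    cx_a, cy_a, w_a, h_a = anchor
    cx, cy, w, h = target
    return ((cx - cx_a) / w_a,
            (cy - cy_a) / h_a,
            math.log(w / w_a),
            math.log(h / h_a))

def decode_box(anchor, deltas):
    """Apply predicted offsets to an anchor and return the refined box."""
    cx_a, cy_a, w_a, h_a = anchor
    tx, ty, tw, th = deltas
    return (cx_a + tx * w_a,
            cy_a + ty * h_a,
            w_a * math.exp(tw),
            h_a * math.exp(th))

anchor = (100.0, 100.0, 50.0, 80.0)
target = (110.0, 95.0, 60.0, 70.0)
deltas = encode_box(anchor, target)    # what the regression branch should learn
refined = decode_box(anchor, deltas)   # recovers the target box (up to rounding)
```

With all-zero offsets the anchor is returned unchanged, which is why a well-placed anchor only needs small corrections.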

3. Several important questions

(1) What are the advantages of Faster R-CNN?
First, Faster R-CNN integrates feature extraction, region proposal, object classification, and bounding-box regression into a single network. The network is differentiable and can be trained end to end, which significantly improves overall performance and also gives very good detection speed.
Second, instead of traditional region proposal methods such as sliding windows or selective search, it generates proposals directly with the RPN, which significantly improves speed.
(2) Design of the VGG16 feature extractor
Convolution layers: kernel size 3x3, padding = 1, stride = 1, so a convolution does not change the spatial size of its input.
Pooling layers: kernel size 2x2, padding = 0, stride = 2, so each pooling layer halves the width and height. After four pooling layers, the feature map is 1/16 of the original image in both width and height.
As a result, every position on the feature map produced by the conv layers can be mapped back to a location in the original image.
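The size arithmetic above can be checked with the standard output-size formula `(size + 2 * padding - kernel) // stride + 1`. The sketch below assumes the usual VGG16 layout of five conv blocks with 2, 2, 3, 3, 3 convolutions, with pooling applied after the first four blocks only; the helper names are made up for illustration.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Output size of a 3x3 / pad 1 / stride 1 convolution (size is preserved)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, padding=0, stride=2):
    """Output size of a 2x2 / pad 0 / stride 2 pooling layer (size is halved)."""
    return (size + 2 * padding - kernel) // stride + 1

def feature_map_size(width, height, convs_per_block=(2, 2, 3, 3, 3), num_pools=4):
    """Width and height of the feature map for a given input image size."""
    for block, n_convs in enumerate(convs_per_block):
        for _ in range(n_convs):
            width, height = conv_out(width), conv_out(height)
        if block < num_pools:  # assumed: no pooling after the last conv block
            width, height = pool_out(width), pool_out(height)
    return width, height

print(feature_map_size(800, 600))  # (50, 37)
```

Note that 600 / 16 is 37.5, so the integer division in the pooling layers rounds the height down to 37.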
(3) What is the structure of the RPN?
The goal of the RPN is to generate region proposals. It consists of two branches: one decides whether each anchor is foreground, and the other regresses the anchor offsets.
The feature map first passes through a 3x3 convolution that fuses neighbouring information, and then each branch applies a 1x1 convolution to change the number of channels. The classification branch outputs 2x9 channels (9 anchors, each with a foreground and a background score); the regression branch outputs 4x9 channels (each anchor has four variables: center x, center y, width, and height).
At this point the localization function is essentially complete.
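To make the "nine anchors per position" and the 2x9 / 4x9 channel counts concrete, here is a minimal sketch. The text only says there are nine anchors; the three scales (128, 256, 512), the three aspect ratios (0.5, 1, 2), and the stride of 16 used below are assumptions based on common Faster R-CNN settings, and the function names are hypothetical.

```python
import math

def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """One (w, h) shape per scale/ratio pair; area is scale**2 and h / w is ratio."""
    shapes = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            shapes.append((w, h))
    return shapes

def anchors_for_feature_map(feat_w, feat_h, stride=16):
    """Place every base anchor at the center of every feature-map cell.

    Returns (cx, cy, w, h) boxes in original-image coordinates.
    """
    shapes = base_anchors()
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx = (x + 0.5) * stride
            cy = (y + 0.5) * stride
            for w, h in shapes:
                anchors.append((cx, cy, w, h))
    return anchors

def rpn_output_channels(num_anchors=9):
    """Channels of the two 1x1 conv branches: 2 scores and 4 offsets per anchor."""
    return {"cls": 2 * num_anchors, "reg": 4 * num_anchors}

print(len(base_anchors()))                   # 9
print(len(anchors_for_feature_map(50, 37)))  # 16650
print(rpn_output_channels())                 # {'cls': 18, 'reg': 36}
```

For a 50x37 feature map this already gives 16650 candidate boxes, which is why the foreground/background classification is needed to keep only the promising ones.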
(4) Why is ROI pooling necessary?
Before an image is sent to a CNN, it usually has to be brought to a fixed size. There are two common approaches: crop a part of the image and feed that to the network, or warp the image to the size the network requires.
Neither approach is very good. Cropping destroys the complete structure of the image, and warping distorts the original shape information. This is why ROI pooling is needed.
How it works:
first, because each proposal is defined at the MxN scale of the input image, the spatial_scale parameter is used to map it onto the (M / 16) x (N / 16) feature map;
then the feature-map region corresponding to each proposal is divided into a {pooled_w} x {pooled_h} grid;
finally, max pooling is applied to every grid cell.
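The three steps can be sketched in plain Python for a single-channel feature map. This is a simplified illustration, not the code of any particular framework: the function name `roi_max_pool`, the rounding of the scaled coordinates, and the floor/ceil choice of cell boundaries are assumptions; `spatial_scale`, `pooled_w`, and `pooled_h` are the parameters named above.

```python
import math

def roi_max_pool(feature, roi, pooled_h=2, pooled_w=2, spatial_scale=1.0 / 16):
    """Max-pool one ROI into a pooled_h x pooled_w grid.

    feature: 2D list indexed as feature[y][x]
    roi:     (x1, y1, x2, y2) in original-image coordinates
    """
    feat_h, feat_w = len(feature), len(feature[0])
    # Step 1: map the proposal from image scale onto the feature map.
    x1, y1, x2, y2 = [int(round(v * spatial_scale)) for v in roi]
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)
    # Step 2: split the region into a pooled_h x pooled_w grid.
    bin_h = roi_h / pooled_h
    bin_w = roi_w / pooled_w
    out = []
    for ph in range(pooled_h):
        h_start = min(max(y1 + math.floor(ph * bin_h), 0), feat_h)
        h_end = min(max(y1 + math.ceil((ph + 1) * bin_h), 0), feat_h)
        row = []
        for pw in range(pooled_w):
            w_start = min(max(x1 + math.floor(pw * bin_w), 0), feat_w)
            w_end = min(max(x1 + math.ceil((pw + 1) * bin_w), 0), feat_w)
            # Step 3: max pooling inside the cell (0.0 if the cell is empty).
            cells = [feature[y][x]
                     for y in range(h_start, h_end)
                     for x in range(w_start, w_end)]
            row.append(max(cells) if cells else 0.0)
        out.append(row)
    return out

# 4x4 feature map with values 0..15, i.e. feature[y][x] = 4 * y + x
feature = [[4 * y + x for x in range(4)] for y in range(4)]
print(roi_max_pool(feature, (0, 0, 48, 48)))    # [[5, 7], [13, 15]]
print(roi_max_pool(feature, (16, 16, 48, 48)))  # [[10, 11], [14, 15]]
```

Both proposals have different sizes on the feature map (4x4 and 3x3), yet both produce a 2x2 output, which is exactly the property the FC layers need.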
(5) Faster R-CNN training process (two rounds of alternating training)
[1] Starting from a pretrained model, train the RPN; corresponds to stage1_rpn_train.pt
[2] Use the RPN trained in step 1 to collect proposals; corresponds to rpn_test.pt
[3] Train the Fast R-CNN network for the first time; corresponds to stage1_fast_rcnn_train.pt
[4] Train the RPN for the second time; corresponds to stage2_rpn_train.pt
[5] Use the RPN trained in step 4 to collect proposals; corresponds to rpn_test.pt
[6] Train the Fast R-CNN network for the second time; corresponds to stage2_fast_rcnn_train.pt

4. References

https://zhuanlan.zhihu.com/p/31426458
https://www.cnblogs.com/gujianhan/p/6035514.html
https://zhuanlan.zhihu.com/p/24916624

Origin www.cnblogs.com/laokanblog/p/11013354.html