Paper Reading: Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

1、Paper Overview


I came to this paper from DCNv2: the DCNv2 paper reuses RCNN's classification power through knowledge distillation in a scheme it calls RCNN mimicking, in order to better exploit DCNv2's feature extraction ability. The present paper mainly discusses the strengths of RCNN and several problems of Faster RCNN, and then proposes the DCR module (Decoupled Classification Refinement), which reuses RCNN's classification power; combined with either Faster RCNN or DCN (Deformable ConvNets), it improves accuracy on the VOC dataset.

This paper also refreshed my understanding of RCNN, a bit of "reviewing the old to learn the new". Although RCNN is clumsy and slow, it avoids several messy problems of today's detection networks, such as anchors that are misaligned with features, a fixed receptive field that gives large and small objects very different amounts of context, and the conflicting optimization goals of the classification and localization tasks in multi-task learning. Recommended reading.

Following the above argument, we propose a simple yet effective approach, named Decoupled Classification Refinement (DCR), to eliminate high-scored false positives and improve the region proposal classification results. DCR decouples the classification and localization tasks in Faster RCNN styled detectors. It takes input from a base classifier, e.g. the Faster RCNN, and refines the classification results using a RCNN-styled network. DCR samples hard false positives, namely the false positives with high confidence scores, from the base classifier, and then trains a stronger correctional classifier for the classification refinement. Designedly, we do not share any parameters between the Faster RCNN and our DCR module, so that the DCR module can not only utilize the multi-task learning improved results from region proposal networks (RPN) and bounding box regression tasks, but also better optimize the newly introduced module to address the challenging classification cases.
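To make the DCR training idea concrete, here is a minimal PyTorch-style sketch (my own illustration, not the authors' released code; `dcr_classifier`, the thresholds and the tensor layouts are assumptions). It keeps the confident detections of a frozen base detector, relabels them against the ground truth, crops them from the original image and trains a separate classifier on the crops:

```python
import torch
import torch.nn.functional as F
from torchvision.ops import box_iou

def dcr_targets(det_boxes, det_scores, gt_boxes, gt_labels,
                num_classes, score_thr=0.3, iou_thr=0.5):
    """Keep only confident detections (the potential hard false positives) and
    assign each a training label: the best-matching ground-truth class if the
    IoU is high enough, otherwise background."""
    background = num_classes                      # reserve the last index for background
    keep, labels = [], []
    for i in range(det_boxes.size(0)):
        if det_scores[i] < score_thr:
            continue                              # only confident boxes can be *hard* false positives
        ious = box_iou(det_boxes[i:i + 1], gt_boxes)[0]
        best_iou, best_gt = ious.max(dim=0)
        keep.append(i)
        labels.append(int(gt_labels[best_gt]) if best_iou >= iou_thr else background)
    return det_boxes[keep], torch.tensor(labels, dtype=torch.long)

def dcr_train_step(image, det_boxes, det_scores, gt_boxes, gt_labels,
                   dcr_classifier, optimizer, num_classes, crop_size=224):
    """One training step of the decoupled classifier: crop the sampled boxes
    from the *image* (not the feature map), resize them to a fixed size and
    apply a plain cross-entropy loss. No parameters are shared with the base detector."""
    boxes, labels = dcr_targets(det_boxes, det_scores, gt_boxes, gt_labels, num_classes)
    if boxes.numel() == 0:
        return 0.0
    crops = []
    for x1, y1, x2, y2 in boxes.round().long().tolist():
        patch = image[:, y1:y2, x1:x2].unsqueeze(0)          # crop directly on the image
        crops.append(F.interpolate(patch, size=(crop_size, crop_size),
                                   mode="bilinear", align_corners=False))
    logits = dcr_classifier(torch.cat(crops))                # (N, num_classes + 1)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference the refined classification scores from `dcr_classifier` would then be used to re-score the base detector's boxes (see the paper for the exact combination).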

When I looked for write-ups online there were very few; I did find one fairly high-quality interpretation that already covers the main content of the paper, so I will not rewrite it here: Revisiting RCNN: On Awakening the Classification Power of Faster RCNN.

2、Two things to note about two-stage vs. one-stage detectors

However, two-stage detectors are slow in speed and require very large input sizes due to the ROI Pooling operation. Aimed at achieving real-time object detection, one-stage methods, such as OverFeat [28], SSD [9,24] and YOLO [25,26], predict object classes and locations directly. Though single-stage methods are much faster than two-stage methods, their results are inferior and they need more extra data and extensive data augmentation to get better results. Our paper follows the method of two-stage detectors [10,11,27], but with a main focus on analyzing the reasons why detectors make mistakes. (In short: two-stage detectors need larger inputs, while one-stage detectors need more data and heavier data augmentation.)

3、Faster RCNN's failure cases and why they happen

Failure cases:

[Figure: Fig. 3 of the paper, examples of hard false positives produced by Faster RCNN]

Reasons for the failures (points 1, 2 and 3 correspond to the cases in the figure):

Such errors are mainly due to three reasons:

(1) A shared feature representation for both classification and localization may not be optimal for region proposal classification; the mismatched goals in feature learning reduce the classification power of Faster RCNN. (The translation invariance desired for classification conflicts with the translation variance required for localization.)

(2) Multi-task learning in general helps to improve the performance of object detectors, as shown in Fast RCNN [10] and Faster RCNN, but the joint optimization can also land in a sub-optimum when balancing the goals of multiple tasks, and it cannot directly exploit the full potential of each individual task. (Multi-task learning leaves the classification power not fully developed.)

(3) Receptive fields in deep CNNs such as ResNet-101 [15] are large; the whole image is usually fully covered for any given region proposal. Such large receptive fields can lead to inferior classification capacity by introducing redundant context information for small objects. (The receptive field is the same for every proposal, so the proportion of context differs with object size: a small object may be drowned in context, while for a large object the effective receptive field may cover only part of it.)
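To see why point (3) matters, recall the standard recursion for the theoretical receptive field of a convolution stack: each layer adds (k - 1) * jump pixels, where jump is the product of all strides so far. A small sketch (the layer list is a made-up toy stack, not the real ResNet-101 configuration) shows how quickly this exceeds typical input sizes:

```python
def receptive_field(layers):
    """Theoretical receptive field of a plain conv stack.
    `layers` is a list of (kernel_size, stride) pairs, ordered input -> output."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k - 1) positions * current jump
        jump *= s              # stride increases the spacing between neighbouring output positions
    return rf

# Toy stack: a 7x7 stride-2 stem, five (3x3 conv + 3x3 stride-2 conv) pairs,
# then ten more 3x3 convs. Only 21 layers, yet the receptive field already
# reaches 1535 pixels, i.e. larger than the whole input image of a typical detector.
toy_stack = [(7, 2)] + [(3, 1), (3, 2)] * 5 + [(3, 1)] * 10
print(receptive_field(toy_stack))   # -> 1535
```

Since this receptive field is the same for every proposal, a 30-pixel object and a 300-pixel object are classified from essentially the same amount of surrounding image.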

Analysis of the corresponding cases in the figure:

Faster RCNN produces three typical types of hard false positives, as shown in Fig 3:

(1) The classification is correct but the overlap between the predicted box and the ground truth has low IoU, e.g. < 0.5 in Fig 3 (a). This type of false positive box usually covers the most discriminative part of the object and therefore has enough information to predict the correct class, thanks to translation invariance.

(2) The classification is incorrect but the IoU with the ground truth is large enough, e.g. in Fig 3 (b). This happens mainly because some classes share similar discriminative parts and the predicted box, not aligning well with the true object, happens to cover only the confusing discriminative parts. Another reason is that the classifier used in the detector is not strong enough to distinguish two similar classes.

(3) The detection is a "confident" background, meaning that it has no or only a small intersection with any ground-truth box, yet the classifier's confidence score is large, e.g. in Fig 3 (c). In most of these cases the background pattern is similar to the predicted class and the classifier is too weak to tell them apart. Another reason is that the receptive field is fixed and, for some boxes, so large that the actual object still falls inside it.
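As a concrete reading of the three cases, here is a small helper (my own illustration; the function name, thresholds and input layout are assumptions) that categorizes one confident detection against the ground truth of its image:

```python
import torch
from torchvision.ops import box_iou

def false_positive_type(det_box, det_label, det_score,
                        gt_boxes, gt_labels,
                        score_thr=0.5, iou_thr=0.5, bg_iou_thr=0.1):
    """Categorize one confident detection into the three hard-false-positive
    types above. Returns None for low-score detections and true positives."""
    if det_score < score_thr:
        return None                                          # not a "confident" detection
    ious = box_iou(det_box.unsqueeze(0), gt_boxes)[0]         # IoU with every ground-truth box
    best_iou, best_gt = ious.max(dim=0)
    same_class = bool(gt_labels[best_gt] == det_label)
    if best_iou >= iou_thr and same_class:
        return None                                          # true positive
    if best_iou < bg_iou_thr:
        return "type 3: confident background"                # barely touches any object
    if same_class:
        return "type 1: correct class, IoU below threshold"  # covers a discriminative part only
    if best_iou >= iou_thr:
        return "type 2: misclassified but well-localized"
    return "other false positive"
```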

4、Three principles for designing a better object detector

Decoupled Features
Current detectors still place the classification head and the localization head on the same backbone. Hence we propose that the classification head and the localization head should not share parameters (as per the analysis given in Section 3), resulting in the decoupled feature-usage pattern of RCNN.
Decoupled Optimization
RCNN also decouples the optimization of object proposal and classification. In this paper we make a small change to the optimization: we propose a novel two-stage training where, instead of optimizing the sum of the classification and localization losses, we optimize the concatenation of the classification and localization losses, L_detection = [L_cls + L_bbox, L_cls], where each entry is optimized independently in two steps.
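A hedged sketch of what that two-step schedule could look like in PyTorch (not the authors' code; `base_detector`, `dcr_classifier`, the loss-dict keys and the `make_dcr_batch` callable are all assumptions): step 1 optimizes the base detector with L_cls + L_bbox, step 2 freezes it and optimizes the DCR classifier with its own L_cls.

```python
import torch
import torch.nn.functional as F

def train_two_step(base_detector, dcr_classifier, loader, make_dcr_batch,
                   epochs_per_step=10, lr=1e-3):
    """Sketch of the two-step optimization of L_detection = [L_cls + L_bbox, L_cls].
    `make_dcr_batch(images, dets, targets)` is a callable that crops hard false
    positives and assigns labels, e.g. the logic sketched in Section 1."""
    # Step 1: optimize the joint detection loss of the base detector.
    opt1 = torch.optim.SGD(base_detector.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs_per_step):
        for images, targets in loader:
            losses = base_detector(images, targets)        # assumed to return a dict of losses
            loss = losses["loss_cls"] + losses["loss_bbox"]
            opt1.zero_grad()
            loss.backward()
            opt1.step()

    # Step 2: freeze the base detector and optimize only the DCR classifier's L_cls.
    base_detector.eval()
    for p in base_detector.parameters():
        p.requires_grad_(False)                            # no shared parameters, no gradient flow back
    opt2 = torch.optim.SGD(dcr_classifier.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs_per_step):
        for images, targets in loader:
            with torch.no_grad():
                dets = base_detector(images)               # detections from the frozen base detector
            crops, labels = make_dcr_batch(images, dets, targets)
            logits = dcr_classifier(crops)
            loss = F.cross_entropy(logits, labels)
            opt2.zero_grad()
            loss.backward()
            opt2.step()
```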
Adaptive Receptive Field
The most important advantage of RCNN is that its receptive field always covers the whole ROI, i.e. the receptive field size adjusts according to the size of the object, by cropping and resizing each proposal to a fixed size.

5、How DCR's crop & resize differs from ROI Pooling's fixed-size resize

The main difference is that one operates on the original image while the other operates on the feature map.

Notice that this processing is very similar to moving the ROI Pooling from the final feature maps to the image; however, it is quite different from doing ROI Pooling on feature maps. Even though the final output feature-map sizes are the same, features from ROI Pooling see a larger region, because an object embedded in an image has richer context. We truncate that context by cropping objects directly from the image, so the network cannot see any context outside the object region.
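A minimal side-by-side sketch of the two operations, assuming a torchvision-style setup (the feature-map size, stride and box coordinates below are placeholders, not values from the paper):

```python
import torch
import torch.nn.functional as F
from torchvision.ops import roi_align

image = torch.rand(1, 3, 600, 800)                    # toy input image
boxes = [torch.tensor([[100., 150., 300., 400.]])]    # one proposal per image, (x1, y1, x2, y2)

# (a) RoIAlign / ROI Pooling on the *feature map*: every output cell is computed
# from backbone features whose receptive field extends far beyond the box,
# so surrounding context leaks into the ROI feature.
feature_map = torch.rand(1, 256, 38, 50)              # placeholder for a stride-16 backbone output
roi_feat = roi_align(feature_map, boxes, output_size=(7, 7), spatial_scale=1.0 / 16)

# (b) DCR-style crop & resize on the *image*: the network only ever sees the
# pixels inside the proposal, so the context is truncated at the box border
# and the effective receptive field adapts to the object size.
x1, y1, x2, y2 = boxes[0][0].long().tolist()
crop = image[:, :, y1:y2, x1:x2]
crop = F.interpolate(crop, size=(224, 224), mode="bilinear", align_corners=False)

print(roi_feat.shape, crop.shape)                     # (1, 256, 7, 7) and (1, 3, 224, 224)
```

Both paths end in a fixed-size tensor, but only the image crop guarantees that nothing outside the proposal influences the classifier.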

References

1、Revisiting RCNN: On Awakening the Classification Power of Faster RCNN, ECCV 2018



Reposted from blog.csdn.net/j879159541/article/details/102731619