1. Paper Overview

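This paper was a CVPR 2018 oral on general object detection. Building on the Faster R-CNN framework, it cascades multiple detectors to progressively improve localization accuracy, a solid step toward high-quality object detection. The cascade the authors use has 4 stages in total (the RPN counts as the first), and the three detection stages use increasing IoU thresholds of 0.5/0.6/0.7. At each stage, proposals whose IoU with the ground truth (GT) exceeds that stage's threshold are taken as positives; all others are negatives.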
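The per-stage labeling rule can be illustrated with a small sketch. This is not the authors' code: the helper names (`iou`, `assign_labels`), the `(x1, y1, x2, y2)` box format, the toy boxes, and the choice to treat an IoU exactly equal to the threshold as positive are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds of the three detection stages, as stated in the text.
STAGE_THRESHOLDS = (0.5, 0.6, 0.7)


def assign_labels(proposals, gt_box, threshold):
    """Label a proposal 1 (positive) if its IoU with the GT reaches the threshold, else 0."""
    return [1 if iou(p, gt_box) >= threshold else 0 for p in proposals]


# Toy data: three proposals that lie inside a 10x10 GT box.
gt = (0.0, 0.0, 10.0, 10.0)
proposals = [
    (0.0, 0.0, 10.0, 5.5),  # IoU 0.55
    (0.0, 0.0, 10.0, 6.5),  # IoU 0.65
    (0.0, 0.0, 10.0, 8.0),  # IoU 0.80
]

labels_per_stage = {u: assign_labels(proposals, gt, u) for u in STAGE_THRESHOLDS}
for u, labels in labels_per_stage.items():
    print(f"u={u}: {labels}")
```

The same set of proposals yields fewer positives as the threshold rises, which is why a later stage only trains well if its input boxes have already been improved by the earlier stages.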

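The authors argue that a single detector can only handle proposals of one quality level. The RPN, for example, supplies fairly coarse positives, i.e. proposals whose IoU with the GT may be around 0.5 or 0.6. The stage that follows can refine these coarse proposals and bring them closer to the GT, but only roughly: if it were fed proposals that are already close to the GT, it would not have the capacity to refine them further. So the authors cascade several more detectors after it, each with a higher IoU threshold separating positives from negatives. With a threshold of 0.6, the positives entering a stage all have IoU above 0.6, so that stage, once trained, is able to keep refining proposals that are already fairly close to the GT; its job has become finer. In the stage after that the threshold rises again, e.g. to 0.7, so the last stage has the finest job of all: pushing these even higher-quality proposals still closer to the GT.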

The authors also raise a mismatch problem in the paper:

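This refers to a mismatch between the training and testing phases. During training the GT is known, so proposals whose IoU with the GT exceeds the threshold (0.5) are naturally taken as positives, and these positives take part in the subsequent bbox regression learning. At inference the GT is unknown, so every proposal has to be treated as a positive and handed to the bbox regressor to regress its coordinates.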

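In other words, the proposals fed to the box regressor follow different distributions at training and test time. Those used in training are of high quality, and the regressor learns to adjust them accordingly; at test time the proposals vary widely in quality, and the network may not know how to adjust them.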

In this work, we define the quality of an hypothesis as its IoU with the ground truth, and the quality of the detector as the IoU threshold u used to train it. The goal is to investigate the, so far, poorly researched problem of learning high quality object detectors, whose outputs contain few close false positives, as shown in Figure 1 (b). The basic idea is that a single detector can only be optimal for a single quality level. This is known in the cost-sensitive learning literature [7, 24], where the optimization of different points of the receiver operating characteristic (ROC) requires different loss functions. The main difference is that we consider the optimization for a given IoU threshold, rather than false positive rate.

In general, a detector optimized at a single IoU level is not necessarily optimal at other levels. These observations suggest that higher quality detection requires a closer quality match between the detector and the hypotheses that it processes. In general, a detector can only have high quality if presented with high quality proposals.

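The resampling strategy mentioned in the paper works as follows: after a stage has adjusted the boxes, their IoU with the GT is, on the whole, higher than that of the stage's input. The IoU distribution of the boxes has therefore shifted upward, and these boxes are then used to train the next stage.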
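A toy simulation can make the resampling effect concrete. Everything here is an assumption for illustration: `toy_refine` is a hypothetical stand-in for a learned regressor (it simply moves each coordinate halfway toward the GT), and the proposals are synthetic jittered copies of one GT box, not real RPN output.

```python
import random


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def toy_refine(box, gt_box, step=0.5):
    """Hypothetical regressor: move every coordinate a fraction `step` toward the GT."""
    return tuple(b + step * (g - b) for b, g in zip(box, gt_box))


def mean_iou(boxes, gt_box):
    return sum(iou(b, gt_box) for b in boxes) / len(boxes)


def positive_fraction(boxes, gt_box, threshold):
    """Share of boxes that would count as positives at the given IoU threshold."""
    return sum(iou(b, gt_box) >= threshold for b in boxes) / len(boxes)


random.seed(0)
gt = (20.0, 20.0, 80.0, 80.0)
# Synthetic stage-1 input: the GT with each coordinate jittered by up to 12 pixels.
boxes = [tuple(c + random.uniform(-12.0, 12.0) for c in gt) for _ in range(200)]

# Each entry: (stage, threshold, mean IoU of that stage's input, positive fraction).
history = []
for stage, u in enumerate((0.5, 0.6, 0.7), start=1):
    history.append((stage, u, mean_iou(boxes, gt), positive_fraction(boxes, gt, u)))
    # The refined output of this stage becomes the input of the next one.
    boxes = [toy_refine(b, gt) for b in boxes]

for stage, u, m, frac in history:
    print(f"stage {stage}: u={u}, mean input IoU={m:.3f}, positives={frac:.2f}")
```

Because each stage receives boxes that the previous stage has already improved, raising the threshold does not starve the later stages of positive samples.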

2. Differences between Cascade R-CNN and the iterative BBox architecture

It differs from the iterative BBox architecture of Figure 3 (b) in several ways.
First, while iterative BBox is a postprocessing procedure used to improve bounding boxes, cascaded regression is a resampling procedure that changes the distribution of hypotheses to be processed by the different stages.
Second, because it is used at both training and inference, there is no discrepancy between training and inference distributions.
Third, the multiple specialized regressors {f_T, f_{T-1}, ..., f_1} are optimized for the resampled distributions of the different stages. This opposes to the single f of (3), which is only optimal for the initial distribution. These differences enable more precise localization than iterative BBox, with no further human engineering.
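The contrast in the third point can be written out as two compositions. This is a sketch reconstructed from the quoted passage, assuming x denotes the image features and b the input box; the equation layout is my own.

```latex
% Iterative BBox: the same regressor f, trained once, is applied repeatedly at inference.
\[
  f'(x, b) = f \circ f \circ \cdots \circ f(x, b)
\]
% Cascaded regression: a separate regressor f_t per stage, each trained on the
% resampled output distribution of the stage before it.
\[
  f(x, b) = f_T \circ f_{T-1} \circ \cdots \circ f_1(x, b)
\]
```

In the first form every application sees boxes from a distribution that f was not trained on; in the second, each f_t is trained on the boxes it will actually receive.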

3. Performance comparison with other networks

Note: The Cascade R-CNN, based on FPN+ (FPN with RoIAlign added) and ResNet-101 backbone, is compared to state-of-the-art single-model object detectors in Table 5.


贾小树


Paper reading: Cascade R-CNN: Delving into High Quality Object Detection
