1. Paper overview

[Figure: RefineDet detection framework]

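This is the detection framework of RefineDet. It consists of two modules, the Anchor Refinement Module (ARM) on top and the Object Detection Module (ODM) below, connected by the Transfer Connection Block (TCB) in between.
In this framework, the ARM focuses on binary classification: it filters out a large number of easy negatives for the ODM, much like an RPN. It also performs a coarse box adjustment, so that the ODM starts its box regression from a better initialization. The ARM mimics the first stage of a two-stage detector, such as the RPN in Faster R-CNN.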

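(Note: I still have a question here. The ARM is not followed by RoI pooling, so how does it pass the refined anchors on to the ODM? Could it be this: an ODM layer takes a W×H×channel tensor as input and outputs a W×H×N×(Class+4) tensor, then drops those of the W×H×N anchors that the ARM has already filtered out, and adds the two regression offsets together to get the final regression? If so, the ODM and the ARM would be predicting separately, rather than the ODM refining the boxes the ARM predicted. That doesn't feel right to me. Hopefully I'll get the chance to resolve this later.)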

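The ODM takes the anchors refined by the ARM as input and focuses on multi-class classification and further box regression. It mimics the second stage of a two-stage detector, such as the Fast R-CNN head in Faster R-CNN.
The ODM does not use a costly per-proposal operation like RoIPooling. Instead, it is connected directly through the TCB, which transforms the ARM features and fuses in higher-level features. The result is features with a rich receptive field, sufficient detail, and abstract content, used for further classification and regression. RefineDet is therefore a one-stage method, but it keeps three advantages of two-stage methods: two-stage classification, two-stage regression, and two-stage features.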
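One reading consistent with the paper's description of two-step cascaded regression is that the ODM predicts its offsets relative to the ARM-refined anchor, not relative to the original tiled anchor. The sketch below illustrates that reading only. The SSD-style box encoding, the `variances` values, and the function names `decode` and `cascade_decode` are assumptions made for this example; they are not taken from the paper's text.

```python
import math

def decode(anchor, offsets, variances=(0.1, 0.2)):
    """Apply predicted offsets to one anchor.

    anchor  : (cx, cy, w, h) in any consistent unit
    offsets : (dx, dy, dw, dh) as predicted by a regression head
    variances are the usual SSD-style scaling factors (assumed here).
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (
        cx + dx * variances[0] * w,
        cy + dy * variances[0] * h,
        w * math.exp(dw * variances[1]),
        h * math.exp(dh * variances[1]),
    )

def cascade_decode(anchor, arm_offsets, odm_offsets):
    """Two-step regression: ARM refines the anchor, ODM refines the result."""
    refined_anchor = decode(anchor, arm_offsets)   # step 1: ARM
    return decode(refined_anchor, odm_offsets)     # step 2: ODM, relative to step 1

if __name__ == "__main__":
    anchor = (0.5, 0.5, 0.2, 0.2)
    print(cascade_decode(anchor, (1.0, 0.0, 5.0, 0.0), (1.0, 0.0, 0.0, 0.0)))
```

Under this reading the two sets of offsets are not simply added: the ODM shift is scaled by the width and height of the refined anchor, so the ARM output changes how the ODM output is interpreted. Both heads still predict on the same W×H grid, with no RoI pooling in between.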

Follow-up work: AlignDet (2019) is more or less an improvement on RefineDet, using DCN to align the features in a one-stage detector. (RPDet treats AlignDet as a baseline in its paper.)

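I don't have much time these days to write up summaries of these papers. I found some well-written blog posts, so I'll just point to them here instead of writing a detailed walkthrough myself: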

RefineDet算法笔记 (RefineDet algorithm notes)

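『计算机视觉』物体检测之RefineDet系列 ([Computer Vision] Object detection: the RefineDet series). This blogger's posts are very good, comprehensive, and close to what I work on; when I have time I should read through all of them.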

2. TCB module: diagram and function

[Figure: TCB architecture (Figure 2 in the paper)]

Function:

1. The features in the ARM focus on distinguishing positive anchors from background. We design the TCB to transfer the features in the ARM to handle the more challenging tasks in the ODM, i.e., predict accurate object locations, sizes and multi-class labels. (My current suspicion is that this feature-transfer module is what passes the offsets learned by the ARM on to the ODM: they are passed implicitly, rather than explicitly at the anchor level.)
2. Notably, from the ARM, we only use the TCBs on the feature maps associated with anchors. Another function of the TCBs is to integrate large-scale context [13, 27] by adding the high-level features to the transferred features to improve detection accuracy. To match the dimensions between them, we use the deconvolution operation to enlarge the high-level feature maps and sum them in the element-wise way. Then, we add a convolution layer after the summation to ensure the discriminability of features for detection. The architecture of the TCB is shown in Figure 2.
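The "enlarge with deconvolution, then sum element-wise" step can be sketched on a single-channel toy feature map. This is a minimal illustration, not the paper's layer configuration: the 2×2 kernel with stride 2 is an assumption chosen so that the output is exactly twice the input size, the convolutions before and after the sum are left out, and the names `deconv2x`, `elementwise_sum`, and `tcb_fuse` are made up for this example.

```python
def deconv2x(fmap, kernel):
    """Single-channel transposed convolution with a 2x2 kernel and stride 2.

    Each input value is spread over a 2x2 output block, weighted by the
    kernel, so an H x W map becomes 2H x 2W.
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += fmap[i][j] * kernel[di][dj]
    return out

def elementwise_sum(a, b):
    """Add two feature maps of identical shape."""
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "shapes must match"
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def tcb_fuse(arm_feat, high_feat, kernel):
    """Fuse an ARM feature map with the upsampled higher-level map."""
    return elementwise_sum(arm_feat, deconv2x(high_feat, kernel))

if __name__ == "__main__":
    arm_feat = [[1.0] * 4 for _ in range(4)]   # 4x4 map from the ARM branch
    high_feat = [[1.0, 2.0], [3.0, 4.0]]       # 2x2 map from the level above
    ones = [[1.0, 1.0], [1.0, 1.0]]            # stand-in for learned weights
    for row in tcb_fuse(arm_feat, high_feat, ones):
        print(row)
```

In a real network the kernel weights are learned and there are many channels, but the shape logic is the same: the higher-level map must be enlarged to the resolution of the current ARM map before the two can be summed.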

Note: before CNNs, DPM seems to have held the top spot on the VOC detection leaderboard for a long time. I should read the DPM paper later.

3. Inference

Inference. At inference phase, the ARM first filters out the regularly tiled anchors with the negative confidence scores larger than the threshold θ, and then refines the locations and sizes of remaining anchors. After that, the ODM takes over these refined anchors, and outputs top 400 high confident detections per image. Finally, we apply the non-maximum suppression with jaccard overlap of 0.45 per class and retain the top 200 high confident detections per image to produce the final detection results.
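The inference steps quoted above can be sketched in plain Python. This is a simplified illustration under several assumptions: boxes are already decoded to `(x1, y1, x2, y2)`, `odm_scores[i]` holds only foreground-class scores for anchor `i`, the default `theta=0.99` is the value I recall from the paper and should be treated as an assumption, and the function names are made up for this example. Real implementations usually add a score threshold and run on tensors.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(dets, thresh):
    """Greedy NMS over a list of (score, box); returns the kept entries."""
    keep = []
    for score, box in sorted(dets, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= thresh for _, kept_box in keep):
            keep.append((score, box))
    return keep

def refinedet_postprocess(arm_neg_conf, odm_scores, boxes, theta=0.99,
                          pre_nms_top_k=400, nms_thresh=0.45, keep_top_k=200):
    # 1. Drop anchors the ARM is confident are background.
    cands = []
    for i, neg in enumerate(arm_neg_conf):
        if neg > theta:
            continue
        for cls, score in enumerate(odm_scores[i]):
            cands.append((score, cls, boxes[i]))
    # 2. Keep the 400 most confident ODM detections.
    cands.sort(key=lambda c: c[0], reverse=True)
    cands = cands[:pre_nms_top_k]
    # 3. Per-class NMS with Jaccard overlap 0.45.
    final = []
    for cls in sorted({c[1] for c in cands}):
        cls_dets = [(s, b) for s, c, b in cands if c == cls]
        final.extend((s, cls, b) for s, b in nms(cls_dets, nms_thresh))
    # 4. Keep the 200 most confident detections per image.
    final.sort(key=lambda d: d[0], reverse=True)
    return final[:keep_top_k]

if __name__ == "__main__":
    arm_neg = [0.995, 0.10, 0.20]
    scores = [[0.99], [0.90], [0.80]]
    boxes = [(50, 50, 60, 60), (0, 0, 10, 10), (1, 1, 11, 11)]
    print(refinedet_postprocess(arm_neg, scores, boxes))
```

In the toy run, the first anchor is discarded by the ARM filter even though its ODM score is high, and the third box is suppressed by NMS because it overlaps the second one by more than 0.45.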