I. Summary
The refinement network (RefineDet) comprises two inter-connected modules: the anchor refinement module (ARM) and the object detection module (ODM).
anchor refinement module --- removes negative anchors to narrow the classifier's search space, and coarsely adjusts the locations and sizes of the anchors so that the subsequent regressor starts from better anchors
object detection module --- takes the refined anchors as input, further regresses their locations and sizes, and predicts the object categories
A transfer connection block (TCB) is also designed. It transfers features from the anchor refinement module to the object detection module, which uses them to predict object locations, sizes and categories.
II. Introduction
Compared with one-stage detectors, two-stage detectors such as Faster R-CNN, R-FCN and FPN have three advantages:
1. The two-stage structure, combined with sampling heuristics, can handle class imbalance.
2. A two-step cascade is used to regress the parameters of the object box.
3. Two-stage features are used to describe the objects.
40.2 FPS on a Titan X GPU (320x320 input)
24.1 FPS on a Titan X GPU (512x512 input)
III. Related work
Classical object detection:
These methods use a sliding-window scheme: hand-crafted features and classifiers are applied on a dense image grid to locate objects. Viola and Jones use Haar features and AdaBoost to train a series of cascaded classifiers for face detection. DPM uses deformable part models to handle large object variations.
Two-stage methods:
A two-stage method comprises two parts: the first generates a sparse set of object proposals, and the second determines the category and refines the position of each object.
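One-stage methods:
YOLOv1 to YOLOv3
YOLOv1 predicts object locations and categories directly with a single feed-forward network. YOLOv2 builds on YOLOv1 by adding batch normalization, a high-resolution classifier, anchor boxes and other improvements. SSD assigns anchors of different scales to different layers and makes predictions from each of those layers. Focal loss addresses the class imbalance problem.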
IV. Network architecture
1. Transfer connection block
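One function of the TCBs is to convert the features of different ARM layers into the form required by the ODM. The other is to fuse large-scale context information into the converted features to improve detection accuracy. To match the dimensions, a deconvolution operation enlarges the higher-level feature map, which is then added element-wise to the converted features. A convolution layer is added after the sum to keep the features discriminative.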
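The dimension matching described above can be checked with a little arithmetic. The sketch below assumes a 320x320 input, so the four feature maps with total strides 8, 16, 32 and 64 have sides 40, 20, 10 and 5. It also assumes a transposed convolution with kernel 4, stride 2 and padding 1; these hyperparameters are illustrative and are not taken from these notes. The function names are hypothetical.

```python
def deconv_out_size(size, kernel=4, stride=2, padding=1):
    """Output side length of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def feature_map_sizes(input_size, strides=(8, 16, 32, 64)):
    """Side length of each feature map for a square input."""
    return [input_size // s for s in strides]


sizes = feature_map_sizes(320)
# Upsampling each higher-level map must reproduce the size of the level
# below it, otherwise the element-wise sum in the TCB is not defined.
upsampled = [deconv_out_size(s) for s in sizes[1:]]
print(sizes)
print(upsampled)
```

With these assumed settings each deconvolution exactly doubles the side length, so every upsampled map lines up with the converted features of the next lower level.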
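2. Two-step cascaded regression
The ARM first adjusts the locations and sizes of the anchors and then passes them to the ODM. At each cell of a feature map, four offsets relative to the original anchors and the corresponding confidence scores are predicted. Once the refined anchors are obtained, they are passed to the ODM, which produces the object categories and the precise object locations and sizes. Each refined anchor yields c+4 outputs (c class scores and 4 offsets).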
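The two regression steps can be sketched as two successive box decodings. The example below assumes a centre-size parametrization (centre offsets scaled by the anchor width and height, log-scale size offsets) in the style of Faster R-CNN and SSD; the notes do not state the exact encoding, so treat it as an assumption. The function names are hypothetical.

```python
import math


def decode(anchor, offsets):
    """Apply (dx, dy, dw, dh) offsets to an anchor given as (cx, cy, w, h)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def cascade_decode(anchor, arm_offsets, odm_offsets):
    """Step 1: the ARM refines the original anchor.
    Step 2: the ODM regresses again, starting from the refined anchor."""
    refined = decode(anchor, arm_offsets)
    final = decode(refined, odm_offsets)
    return refined, final


anchor = (100.0, 100.0, 32.0, 32.0)
refined, final = cascade_decode(
    anchor,
    arm_offsets=(0.25, 0.0, 0.0, 0.0),  # ARM shifts the centre by 0.25 * 32 = 8 px in x
    odm_offsets=(0.0, 0.5, 0.0, 0.0),   # ODM shifts the refined centre by 0.5 * 32 = 16 px in y
)
print(refined)
print(final)
```

The point of the cascade is that the ODM offsets are interpreted relative to the refined anchor, not the original one.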
3. Negative anchors filtering
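During training, a refined anchor whose negative (background) confidence exceeds a threshold (0.99) is discarded when training the ODM; only hard negative anchors and refined positive anchors are passed on to train the ODM. Likewise, at test time, a refined anchor whose negative confidence exceeds the threshold is discarded.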
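A minimal sketch of this filtering step, assuming the ARM provides one negative (background) confidence per refined anchor. The function name and the placeholder anchor values are hypothetical; only the 0.99 threshold comes from the text.

```python
NEG_THRESHOLD = 0.99  # threshold quoted in the text


def filter_refined_anchors(refined_anchors, negative_scores, threshold=NEG_THRESHOLD):
    """Keep refined anchors whose negative confidence does not exceed the threshold."""
    return [a for a, s in zip(refined_anchors, negative_scores) if s <= threshold]


anchors = ["a0", "a1", "a2", "a3"]       # placeholders for refined anchor boxes
neg_scores = [0.999, 0.40, 0.995, 0.98]  # ARM background confidences
kept = filter_refined_anchors(anchors, neg_scores)
print(kept)
```

Anchors a0 and a2 are well-classified background and are dropped, so the ODM only sees the remaining candidates.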
V. Training and inference
1. Data augmentation
Training images are randomly expanded and cropped (following the method in SSD: Single Shot MultiBox Detector).
2. backbone network
VGG-16, ResNet-101
3. Anchor design and matching
Four feature layers are selected, with total stride sizes of 8, 16, 32 and 64. Each feature layer is associated with one specific anchor scale (4 times that layer's total stride) and three aspect ratios (0.5, 1.0, 2.0).
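The anchor layout above can be sketched as follows. The example assumes a 320x320 input, anchors centred on each feature-map cell, and the common convention that an aspect ratio r means width/height with the anchor area held fixed; none of these details are stated in the notes. The function name is hypothetical.

```python
import math


def make_anchors(input_size=320, strides=(8, 16, 32, 64), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (cx, cy, w, h) in pixels for a square input."""
    anchors = []
    for stride in strides:
        scale = 4 * stride                # anchor scale = 4 x total stride
        cells = input_size // stride      # feature-map side length
        for row in range(cells):
            for col in range(cells):
                cx = (col + 0.5) * stride  # centre of the cell in image coordinates
                cy = (row + 0.5) * stride
                for r in ratios:
                    w = scale * math.sqrt(r)  # assumed convention: r = w / h
                    h = scale / math.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return anchors


anchors = make_anchors()
print(len(anchors))
print(anchors[1])
```

Under these assumptions a 320x320 input has (40*40 + 20*20 + 10*10 + 5*5) * 3 anchors in total, three per cell.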
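4. Hard negative mining
Most anchors are negative samples. Rather than using all of them, the negatives with the largest loss values are selected so that the negative:positive ratio is 3:1.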
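The selection rule can be sketched as below: sort the negatives by loss and keep the hardest ones, capped at three per positive. The per-anchor losses here are made-up numbers for illustration, and the function name is hypothetical.

```python
def hard_negative_mining(losses, is_positive, neg_pos_ratio=3):
    """Return indices of the negatives kept for training:
    the highest-loss ones, at most neg_pos_ratio per positive."""
    num_pos = sum(is_positive)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    negatives.sort(key=lambda i: losses[i], reverse=True)
    return negatives[: neg_pos_ratio * num_pos]


losses = [0.1, 2.0, 0.05, 1.5, 0.7, 0.9, 0.3]
is_positive = [False, True, False, False, False, False, False]
selected = hard_negative_mining(losses, is_positive)
print(selected)
```

With one positive anchor, only the three negatives with the largest losses are kept; the easy negatives contribute nothing to the loss.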
5. Loss function
loss=loss_arm+loss_odm
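In the ARM stage, each anchor is assigned a binary label (object or not object), and its location and size are regressed to obtain the refined anchors. The refined anchors are then passed to the ODM (those whose negative confidence exceeds the threshold are discarded), which produces the precise object locations and sizes.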
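A rough sketch of how the two terms might be combined. The notes only state that the total loss is the sum of the ARM and ODM losses; the use of cross-entropy for classification, smooth L1 for regression, and normalization by the number of positive anchors are assumptions borrowed from common SSD-style practice. All function names are hypothetical.

```python
import math


def smooth_l1(pred, target):
    """Smooth L1 loss summed over the four box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under the predicted probabilities."""
    return -math.log(probs[label])


def module_loss(cls_probs, labels, box_preds, box_targets):
    """Classification loss over all given anchors plus regression loss over
    positives (label >= 1), normalized by the number of positives."""
    num_pos = sum(1 for l in labels if l >= 1)
    if num_pos == 0:
        return 0.0
    cls = sum(cross_entropy(p, l) for p, l in zip(cls_probs, labels))
    reg = sum(smooth_l1(b, g)
              for b, g, l in zip(box_preds, box_targets, labels) if l >= 1)
    return (cls + reg) / num_pos


def total_loss(arm, odm):
    """loss = loss_arm + loss_odm; each argument is a tuple of module_loss inputs."""
    return module_loss(*arm) + module_loss(*odm)


zero_box = (0.0, 0.0, 0.0, 0.0)
# ARM: binary labels (1 = object, 0 = not object) for two anchors.
arm = ([[0.2, 0.8], [0.9, 0.1]], [1, 0], [(0.1, 0.0, 0.0, 0.0), zero_box], [zero_box, zero_box])
# ODM: multi-class labels (0 = background) for the same two refined anchors.
odm = ([[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]], [2, 0], [zero_box, zero_box], [zero_box, zero_box])
print(total_loss(arm, odm))
```

In a real training loop the ODM inputs would contain only the refined anchors that survive negative anchor filtering and hard negative mining.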