ECCV 2018
Acquisition of Localization Confidence for Accurate Object Detection
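PreciseRoIPooling code
ECCV 2018 | Megvii Oral paper walkthrough: IoU-Net brings localization confidence to object detection
It is best to read the paper yourself first, then come back to the summary below.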
IoU-Net
Problem addressed: NMS keeps the box with the highest classification confidence, but that box is not necessarily the best-localized one
Two drawbacks in object localization
- the misalignment between classification confidence and localization accuracy
- the non-monotonic bounding box regression
joint training
Backbone
- ResNet-FPN
FPN
Precise RoI Pooling
Head
- works in parallel, based on the same visual feature from the backbone
- IoU predictor
- R-CNN
- classification and regression branches take 512 RoIs per image from RPNs
Training
- input images: resized to 800 px on the short side and 1200 px on the long side
- batch size: 16
- lr: 0.01
- iterations: 160k
- warm-up: lr 0.004 for the first 10k iterations
Training the IoU detector
- smooth-L1 loss
- IoU labels: normalized so that the values are distributed over [-1, 1]
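A minimal sketch of the training target and loss for the IoU predictor. The notes only say that the labels are normalized to [-1, 1]. The linear map from [0.5, 1] used here (0.5 being the cut-off for training candidates) is an assumption, and the names `normalize_iou` and `smooth_l1` are hypothetical.

```python
def normalize_iou(iou, lo=0.5, hi=1.0):
    """Linearly map an IoU in [lo, hi] to [-1, 1].

    ASSUMPTION: the notes do not give the exact normalization; a linear
    map over the IoU range of the training candidates is used here.
    """
    return 2.0 * (iou - lo) / (hi - lo) - 1.0


def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss for one scalar prediction.

    Quadratic for small errors (|diff| < beta), linear otherwise.
    """
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


# Example: a candidate whose true IoU with its ground truth is 0.75
target = normalize_iou(0.75)      # midpoint of [0.5, 1] maps to 0.0
loss = smooth_l1(0.5, target)     # predictor output 0.5 vs. target 0.0
```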
Inference
- first apply bounding box regression to the initial coordinates
- apply IoU-guided NMS on all detected bounding boxes
- refine the 100 bounding boxes with the highest classification confidence using the optimization-based algorithm
Predict IoU
IoU predictor
aim
- takes features from the FPN
- estimates the localization accuracy (IoU) for each bounding box
data generation
generate candidate bounding box set
- generate bounding boxes and labels for training the IoU-Net by augmenting the ground truth, instead of taking proposals from RPNs
- for every ground-truth bounding box in the training set, manually transform it with a set of randomized parameters
- remove the bounding boxes having an IoU < 0.5 with the matched ground truth
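The data-generation steps above can be sketched as follows. The notes do not say which randomized parameters are used, so the shift and scale ranges are assumptions, and `jitter_box` and `make_candidates` are hypothetical helper names. Boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
import random


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def jitter_box(gt, max_shift=0.3, max_scale=0.3, rng=random):
    """Randomly shift and rescale a ground-truth box.

    ASSUMPTION: the shift/scale ranges are illustrative, not from the paper.
    """
    x1, y1, x2, y2 = gt
    w, h = x2 - x1, y2 - y1
    cx = (x1 + x2) / 2 + rng.uniform(-max_shift, max_shift) * w
    cy = (y1 + y2) / 2 + rng.uniform(-max_shift, max_shift) * h
    nw = w * (1 + rng.uniform(-max_scale, max_scale))
    nh = h * (1 + rng.uniform(-max_scale, max_scale))
    return (cx - nw / 2, cy - nh / 2, cx + nw / 2, cy + nh / 2)


def make_candidates(gt, n=100, min_iou=0.5, rng=random):
    """Return (box, iou_label) pairs, dropping boxes with IoU < min_iou."""
    out = []
    for _ in range(n):
        box = jitter_box(gt, rng=rng)
        label = iou(box, gt)
        if label >= min_iou:
            out.append((box, label))
    return out
```

Each surviving pair gives the IoU predictor one training box together with its regression target, the true IoU with the ground truth it was derived from.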
feature
- extracted from the output of FPN with the proposed PrRoI-Pooling layers
- then fed into a two-layer feedforward network for the IoU prediction
use class-aware IoU predictors
IoU-guided NMS
- use the predicted IoU instead of the classification confidence as the ranking keyword for bounding boxes
to determine the classification scores
- select the box having the highest (predicted) IoU with a ground truth
- eliminate all other boxes whose overlap with it is greater than the NMS threshold $\Omega_{nms}$
- for a group of bounding boxes matching the same ground truth, take the most confident prediction for the class label
- that is, the classification confidence assigned to the highest-IoU box is the maximum classification confidence over itself and the boxes it suppresses (those matching the same ground truth and overlapping it above the threshold)
Algorithm
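- ① From the bounding box set B, select the box with the highest predicted IoU (localization confidence), denoted $b_m$, and remove it from B
- ② Pick out, one by one, the boxes whose IoU with $b_m$ exceeds a threshold and remove them from B; record the highest classification confidence among these boxes (including $b_m$ itself) as $s$
- ③ Add the pair $(b_m, s)$ to the set D (in essence, bounding boxes and classification confidences are reassigned to each other); repeat from ① until B is empty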
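Steps ① to ③ can be sketched in plain Python as below. This is a single-class illustration, not the authors' code: the function name `iou_guided_nms`, the `(x1, y1, x2, y2)` box format and the default threshold of 0.5 are all assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_guided_nms(boxes, cls_scores, loc_scores, nms_thresh=0.5):
    """Rank by predicted IoU, keep the best classification score per cluster.

    boxes:      list of (x1, y1, x2, y2)
    cls_scores: classification confidence per box
    loc_scores: predicted IoU (localization confidence) per box
    Returns a list of (box, score) pairs, i.e. the set D.
    """
    remaining = list(range(len(boxes)))  # indices still in B
    kept = []                            # the output set D
    while remaining:
        # Step 1: take the box with the highest predicted IoU.
        m = max(remaining, key=lambda i: loc_scores[i])
        remaining.remove(m)
        s = cls_scores[m]
        # Step 2: suppress overlapping boxes, but inherit their best class score.
        survivors = []
        for j in remaining:
            if iou(boxes[m], boxes[j]) > nms_thresh:
                s = max(s, cls_scores[j])
            else:
                survivors.append(j)
        remaining = survivors
        # Step 3: record the (box, score) pair.
        kept.append((boxes[m], s))
    return kept
```

When two boxes overlap heavily and one has the higher classification score while the other has the higher predicted IoU, the output keeps the better-localized box and gives it the higher classification score.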
Optimization-based bounding box refinement
- Algorithm
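- For each detected bounding box, extract its features with PrPool and compute the IoU predicted by IoU-Net; record this IoU as PrevScore and its gradient with respect to the box coordinates as grad
- Then update the bounding box along grad
- Predict the IoU again for the updated box; the result is NewScore
- If the difference between PrevScore and NewScore is smaller than an early-stop threshold, or NewScore falls below PrevScore by more than a "localization degeneration tolerance", the bounding box is considered fully updated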
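A rough sketch of the refinement loop above. Several things here are assumptions made only for illustration: the IoU predictor is replaced by a generic `score_fn(box)`, the gradient is taken by finite differences instead of back-propagating through PrPool, the step is scaled by the box width and height, and the defaults (`steps`, `step_size`, `early_stop`, `degeneration_tol`) are placeholders.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def make_toy_predictor(target):
    """HYPOTHETICAL stand-in for IoU-Net: scores a box by its true IoU with target."""
    return lambda box: iou(box, target)


def numeric_grad(score_fn, box, eps=1e-3):
    """Central finite-difference gradient of score_fn w.r.t. the 4 coordinates."""
    grad = []
    for k in range(4):
        hi, lo = list(box), list(box)
        hi[k] += eps
        lo[k] -= eps
        grad.append((score_fn(tuple(hi)) - score_fn(tuple(lo))) / (2 * eps))
    return grad


def refine_box(box, score_fn, steps=5, step_size=0.5,
               early_stop=1e-3, degeneration_tol=-0.01):
    """Gradient ascent on the predicted IoU with the two stopping rules."""
    box = tuple(float(v) for v in box)
    for _ in range(steps):
        grad = numeric_grad(score_fn, box)
        prev_score = score_fn(box)
        # ASSUMPTION: scale each coordinate's step by the box size on its axis.
        w, h = box[2] - box[0], box[3] - box[1]
        box = tuple(b + step_size * g * s
                    for b, g, s in zip(box, grad, (w, h, w, h)))
        new_score = score_fn(box)
        # Stop when the score barely changes or has degraded too much.
        if (abs(prev_score - new_score) < early_stop
                or new_score - prev_score < degeneration_tol):
            break
    return box


# Usage with the toy predictor: the refined box should score higher.
predictor = make_toy_predictor(target=(10, 10, 110, 110))
start = (20.0, 20.0, 120.0, 125.0)
refined = refine_box(start, predictor)
```

In the real model the gradient would come from back-propagation through the IoU predictor and PrPool, which is why the pooling layer has to be differentiable with respect to the box coordinates.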
PrPool
- continuous (no quantization of coordinates)
- differentiable with respect to the bounding box coordinates
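For reference, the definition of PrPool, restated from the paper rather than from the notes above. Here $w_{i,j}$ is the feature at grid location $(i,j)$, and a bin has top-left corner $(x_1, y_1)$ and bottom-right corner $(x_2, y_2)$. The discrete feature map is first made continuous by bilinear interpolation:

$$
f(x,y) = \sum_{i,j} IC(x,y,i,j) \times w_{i,j}, \qquad
IC(x,y,i,j) = \max(0,\, 1-|x-i|) \times \max(0,\, 1-|y-j|)
$$

The pooled value of a bin is the average of $f$ over the bin, computed as an integral instead of by sampling a fixed number of points:

$$
\mathrm{PrPool}(bin, F) = \frac{\int_{y_1}^{y_2}\int_{x_1}^{x_2} f(x,y)\,dx\,dy}{(x_2-x_1)\times(y_2-y_1)}
$$

Since the bin coordinates appear only as integration limits and in the normalizer, the output is differentiable in them, for example:

$$
\frac{\partial\, \mathrm{PrPool}(bin, F)}{\partial x_1} = \frac{\mathrm{PrPool}(bin, F)}{x_2-x_1} - \frac{\int_{y_1}^{y_2} f(x_1,y)\,dy}{(x_2-x_1)\times(y_2-y_1)}
$$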