Normalization coefficients

Faster R-CNN

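$L(\{p_{i}\},\{t_{i}\}) = \frac{1}{N_{cls}}\sum_{i} L_{cls}(p_{i},p_{i}^{*}) + \lambda \frac{1}{N_{reg}}\sum_{i} p_{i}^{*} L_{reg}(t_{i},t_{i}^{*})$
Here $N_{cls}$ is the mini-batch size, while $N_{reg}$ is the number of anchor locations, i.e. the number of pixels in the input feature map. For $\lambda$ between 1 and 100 the final results differ very little.
One point worth thinking about: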
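The normalization in the loss above can be sketched in plain Python. This is a minimal illustration, not code from the paper: the helper names (`log_loss`, `smooth_l1`, `rpn_loss`) and the toy inputs are assumptions. The values $N_{cls}=256$, $N_{reg}\approx 2400$ and $\lambda=10$ are the defaults reported in the Faster R-CNN paper, for which $\lambda/N_{reg}$ comes out close to $1/N_{cls}$, so the two terms are weighted roughly equally.

```python
import math

def log_loss(p, p_star):
    """Binary log loss for one anchor; p is the predicted object probability."""
    eps = 1e-12  # avoid log(0)
    return -(p_star * math.log(p + eps) + (1 - p_star) * math.log(1 - p + eps))

def smooth_l1(t, t_star):
    """Smooth L1 loss summed over the 4 box coordinates of one anchor."""
    total = 0.0
    for a, b in zip(t, t_star):
        d = abs(a - b)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=10.0):
    """L = (1/N_cls) * sum_i L_cls + lam * (1/N_reg) * sum_i p_i^* * L_reg."""
    cls_term = sum(log_loss(pi, si) for pi, si in zip(p, p_star)) / n_cls
    # p_i^* gates the regression loss: only positive anchors contribute.
    reg_term = sum(si * smooth_l1(ti, tsi)
                   for si, ti, tsi in zip(p_star, t, t_star)) / n_reg
    return cls_term + lam * reg_term

# Toy mini-batch of 2 anchors (hypothetical values): one positive, one negative.
p = [0.9, 0.2]
p_star = [1, 0]
t = [(0.1, 0.2, 0.0, 0.0), (0.5, 0.5, 0.5, 0.5)]
t_star = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
loss = rpn_loss(p, p_star, t, t_star, n_cls=256, n_reg=2400, lam=10.0)
print(loss)
```

Changing the box prediction of the negative anchor leaves `loss` unchanged, because its $p_{i}^{*}$ is 0.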

| $\lambda$ | 0.1 | 1 | 10 | 100 |
| --- | --- | --- | --- | --- |
| mAP (%) | 67.2 | 68.9 | 69.9 | 69.1 |

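Faster R-CNN places 9 overlapping anchors at each anchor location, so with $\lambda=0.1$ the weight of the second term, $\lambda/N_{reg} = 1/(10N_{reg})$, is close to $1/(9N_{reg})$, i.e. a normalization by the total number of anchors. Yet the paper reports a somewhat lower result for that setting (67.2% mAP, against 68.9% to 69.9% for larger $\lambda$).
Note also that $N_{reg} \neq N_{p^{*}}$: the regression term is not normalized by the number of positive anchors.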

R-FCN

$L(s,t_{x,y,w,h})=L_{cls}(s_{c^{*}})+\lambda[c^{*}>0]L_{reg}(t,t^{*})$
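$c^{*}$ is the class label, with 0 for background; $t^{*}$ is the ground-truth box $(x,y,w,h)$. An RoI counts as a positive when its IoU with a ground-truth box is at least 0.5.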

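Here $\lambda$ is set to 1, and neither loss term is normalized.
The R-FCN output can be $k^{2}(C+1)$ score channels plus either $4k^{2}$ or $4k^{2}C$ box channels: in the former case classification and a class-agnostic box regression are two parallel branches, while in the latter each class carries its own regression box. Either way, this head no longer predicts a confidence score (is that specific to two-stage detectors?).
OHEM (online hard example mining) also fits this network structure well: the B RoIs with the highest loss out of N proposals are taken directly as the hard examples, so the procedure is logically one step shorter.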
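The per-RoI loss and the OHEM selection described above can be sketched as follows. This is a minimal illustration in plain Python, assuming the regression loss of each RoI has already been computed in the forward pass; `rfcn_roi_loss` and `select_hard_examples` are hypothetical names, not taken from the R-FCN code.

```python
import math

def rfcn_roi_loss(scores, c_star, reg_loss, lam=1.0):
    """L = L_cls(s_{c*}) + lam * [c* > 0] * L_reg for a single RoI.

    scores: softmax probabilities over C+1 classes (index 0 = background).
    reg_loss: box regression loss already computed for this RoI.
    """
    cls_loss = -math.log(scores[c_star] + 1e-12)  # cross-entropy on the true class
    indicator = 1.0 if c_star > 0 else 0.0        # background RoIs skip regression
    return cls_loss + lam * indicator * reg_loss

def select_hard_examples(roi_losses, b):
    """Return the indices of the b RoIs with the highest loss out of N."""
    order = sorted(range(len(roi_losses)),
                   key=lambda i: roi_losses[i], reverse=True)
    return order[:b]

# N = 5 RoIs evaluated in the forward pass, B = 2 kept for backpropagation.
losses = [0.1, 2.3, 0.7, 1.9, 0.05]
hard = select_hard_examples(losses, b=2)
print(hard)  # [1, 3]
```

Only the selected RoIs would then take part in backpropagation; no separate network pass is needed to find them.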

Settings: RPN on conv4; $k \times k = 7 \times 7$; no hard example mining.

| R-FCN with ResNet-101 on | conv4, stride=16 | conv5, stride=32 | conv5, à trous, stride=16 |
| --- | --- | --- | --- |
| mAP (%) on VOC07 test | 72.5 | 74.0 | 76.6 |

Another interesting experiment:

| method | RoI output size ($k \times k$) | mAP on VOC07 (%) |
| --- | --- | --- |
| naive Faster R-CNN | 1×1 | 61.7 |
| naive Faster R-CNN | 7×7 | 68.9 |
| ResNet-101 Faster R-CNN | 7×7 | 76.4 |
| ResNet-101 Faster R-CNN (OHEM: 300 RoIs forward, 128 RoIs for backpropagation) | 7×7 | 79.3 |
| class-specific RPN | - | 67.6 |
| R-FCN (w/o position sensitivity) | 1×1 | fail |
| R-FCN | 3×3 | 75.5 |
| R-FCN | 7×7 | 76.6 |
| ResNet-101 R-FCN (OHEM: 300 RoIs forward, 128 RoIs for backpropagation) | 7×7 | 79.5 |

RetinaNet

Uses focal loss, summed over the anchors and normalized by the number of anchors assigned to a ground-truth box (the positive anchors).
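A minimal sketch of this normalization in plain Python. The function names are assumptions for illustration; $\alpha=0.25$ and $\gamma=2$ are the values the RetinaNet paper reports as working best, and the `max(num_pos, 1)` guard for images without positives is an added assumption, not something stated in these notes.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one anchor and one class: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t + 1e-12)

def normalized_focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Sum the focal loss over all anchors, then divide by the number of positives."""
    total = sum(focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels))
    num_pos = sum(1 for y in labels if y == 1)
    return total / max(num_pos, 1)  # assumed guard against dividing by zero

# Four anchors (hypothetical values), two of them assigned to a ground-truth box.
probs = [0.9, 0.1, 0.05, 0.6]
labels = [1, 0, 0, 1]
print(normalized_focal_loss(probs, labels))
```

Dividing by the number of positives rather than by all anchors keeps the many easy negatives, whose individual losses are already tiny, from also shrinking the scale of the loss.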

Evolution of the architectures

Head structure of two-stage detectors

Faster R-CNN
- fc layers follow the RoI pooling layer
ResNet
- a fully convolutional form is attached after conv4; the conv5 layers act as the head
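- an RoI pooling layer is inserted between the convolutional layers, to balance the translation invariance that a fully convolutional network offers for classification against the position sensitivity that detection needs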
R-FCN
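- also fully convolutional, but introduces position-sensitive RoI pooling: average pooling is applied group by group to the $k^{2}(C+1)$-channel score maps produced on top of ResNet. Throughout, the $(C+1)$ channels stay together as one unit; what changes is that as the bin moves over the $k \times k$ grid, it also moves along the depth (channel) dimension of the input feature.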
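The position-sensitive pooling described in the last item can be sketched in plain Python with nested lists. This is a minimal illustration under stated assumptions: the channel layout `(i * k + j) * (C + 1) + c`, integer RoI coordinates, and the name `ps_roi_pool` are all chosen here for clarity and may differ from the actual R-FCN implementation.

```python
import math

def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI average pooling followed by voting.

    score_maps: nested list [k*k*(C+1)][H][W] (assumed channel layout, see above).
    roi: (x0, y0, x1, y1) in feature-map pixels, integers, x1 and y1 exclusive.
    num_classes: C + 1, including background.
    Returns one score per class, averaged over the k*k bins.
    """
    x0, y0, x1, y1 = roi
    bin_w = (x1 - x0) / k
    bin_h = (y1 - y0) / k
    votes = [0.0] * num_classes
    for i in range(k):  # bin row
        ys = range(y0 + math.floor(i * bin_h), y0 + math.ceil((i + 1) * bin_h))
        for j in range(k):  # bin column
            xs = range(x0 + math.floor(j * bin_w), x0 + math.ceil((j + 1) * bin_w))
            for c in range(num_classes):
                # Bin (i, j) reads only its own group of C+1 channels.
                channel = score_maps[(i * k + j) * num_classes + c]
                bin_sum = sum(channel[y][x] for y in ys for x in xs)
                votes[c] += bin_sum / (len(ys) * len(xs))
    return [v / (k * k) for v in votes]

# Toy input: k = 2, C + 1 = 2, a 4x4 feature map where channel n is filled with n.
k, num_classes, size = 2, 2, 4
score_maps = [[[float(n)] * size for _ in range(size)]
              for n in range(k * k * num_classes)]
scores = ps_roi_pool(score_maps, roi=(0, 0, 4, 4), k=k, num_classes=num_classes)
print(scores)  # [3.0, 4.0]
```

In the toy input, class 0 reads channels 0, 2, 4 and 6 (one per bin) and class 1 reads channels 1, 3, 5 and 7, which shows the bin sliding over the grid and the channel dimension at the same time.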

Training strategy

| name | schedule length | starting lr | rate decay policy | weight decay | momentum | data augmentation | positive examples | negative examples |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv2 | 160 epochs | $10^{-3}$ | lr*0.1 at 60, 90 epochs | 0.0005 | 0.9 | similar to SSD | | |
| RetinaNet | 90k batches | 0.01 | lr*0.1 at 60k, 80k batches | 0.0001 | 0.9 | horizontal image flipping | IoU>0.5 | IoU<0.4 |
| R-FCN | 30k batches | $10^{-3}$ | { $10^{-3}$: 20k batches; $10^{-4}$: 10k batches } | 0.0005 | 0.9 | | IoU>=0.5 | IoU<0.5 |
| RPN | | $10^{-3}$ | { $10^{-3}$: 60k batches; $10^{-4}$: 20k batches } | 0.0005 | 0.9 | | IoU>0.7 or highest IoU | IoU<0.3 |
| FPN | - | 0.02 | { 0.02: 30k batches; 0.002: 10k batches } | 0.0001 | 0.9 | *using anchors outside the image when training | | |
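Two recurring ingredients of the table, the step learning-rate decay and the IoU-based choice of positive and negative examples, can be sketched as follows. This is a minimal illustration in plain Python; the function names, the `-1` code for ignored anchors, and the choice to switch the rate exactly at a milestone are assumptions. The thresholds in the example follow the RetinaNet row.

```python
def step_lr(step, base_lr, milestones, gamma=0.1):
    """Learning rate after multiplying by gamma at every milestone already reached."""
    passed = sum(1 for m in milestones if step >= m)
    return base_lr * gamma ** passed

def iou(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def assign_label(anchor, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Return 1 (positive), 0 (negative) or -1 (ignored, between the thresholds)."""
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    if best > pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return -1

# RetinaNet row: 0.01 at the start, multiplied by 0.1 at 60k and again at 80k batches.
for step in (0, 70_000, 85_000):
    print(step, step_lr(step, base_lr=0.01, milestones=[60_000, 80_000]))

print(assign_label((0, 0, 2, 2), [(1, 0, 3, 2)]))  # IoU = 1/3, below 0.4
```

The R-FCN and RPN rows differ only in the thresholds and milestones passed in; the RPN rule that also marks the highest-IoU anchor of each ground-truth box as positive is not covered by this sketch.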

Object detection: assorted notes
