Deep Neural Networks for Object Detection

Reposted from: https://blog.csdn.net/u012420309/article/details/52763788

The network is AlexNet with slight modifications.

The original AlexNet network:

Specific modifications:

1. Replace the final softmax layer with a regression layer.

The regression layer predicts a mask of a fixed size: 1 means the pixel lies within the bounding box, 0 means it does not.

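During training, an objective function is minimized over the ground-truth masks m ∈ [0,1]^N.

A plain L2 loss has only the second factor of the paper's objective; my guess is that the first factor acts as regularization.

N = d × d. The paper uses d = 24.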
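To make the two factors concrete, here is a minimal NumPy sketch. It assumes the objective has the form I recall from the paper, a squared error between the predicted mask and the ground-truth mask m in which output i is weighted by (m_i + λ). The function name `weighted_mask_loss` and the value λ = 0.1 are placeholders for illustration, not values taken from the paper.

```python
import numpy as np

def weighted_mask_loss(pred, target, lam=0.1):
    """Sum over outputs of (m_i + lam) * (pred_i - m_i)^2.

    Assumed form of the objective; lam is a hypothetical value.
    """
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    weights = target + lam          # object cells weigh 1 + lam, background cells lam
    return float(np.sum(weights * (pred - target) ** 2))

d = 24
target = np.zeros((d, d))
target[8:16, 8:16] = 1.0            # an 8 x 8 block of object cells

# Predicting "no object anywhere" misses 64 object cells.
loss_all_zero = weighted_mask_loss(np.zeros((d, d)), target)
# Predicting "object everywhere" is wrong on 512 background cells.
loss_all_one = weighted_mask_loss(np.ones((d, d)), target)
print(loss_all_zero, loss_all_one)
```

With a small λ, missing the 64 object cells costs more than being wrong on all 512 background cells, so under this weighting the trivial all-zero mask is no longer the cheap answer it would be under a plain L2 loss.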

5 Precise Object Localization via DNN-generated Masks    // DNN stands for deep neural network.

Some challenges:

1. A single object mask might not be sufficient to disambiguate objects which are placed next to each other. // When two objects are adjacent, their output masks are adjacent too and cannot be told apart, so they are emitted as a single object mask.

2. Because the output size is limited, we generate masks that are much smaller than the original image. For example, with a 400×400 input and a 24×24 output, each output corresponds to a cell of roughly 16×16 pixels, which is not enough to precisely localize an object, especially a small one.

3. Because we input the full image, small objects will affect very few input neurons and thus will be hard to recognize.
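The resolution problem in points 2 and 3 is plain arithmetic, sketched below. The 400-pixel input and d = 24 come from the example above; the 40-pixel object and the helper name `mask_cells_covered` are made up for illustration.

```python
def mask_cells_covered(object_side, image_side=400, d=24):
    """Approximate number of output cells covered by a square object."""
    cell_side = image_side / d              # about 16.67 pixels per cell side
    return (object_side / cell_side) ** 2

cell_side = 400 / 24
print(f"one output cell spans about {cell_side:.2f} pixels per side")

# A hypothetical 40 x 40 pixel object touches only a handful of the 576 cells.
print(f"cells covered by a 40-pixel object: {mask_cells_covered(40):.2f}")
```

An object this small accounts for about 1% of the mask, which gives a sense of why a single coarse mask struggles with it.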

5.1 Multiple Masks for Robust Localization

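To address the first problem, we generate several masks, each representing either the full object or part of it.

One network predicts the object box mask, and four additional networks predict the four halves of the box: the bottom, top, left and right halves. The five predictions are over-complete, but they help reduce uncertainty and cope with errors in some of the predictions.

During training, we need to convert the object box into these five masks. Because the masks are smaller than the original image, the ground-truth mask has to be downsampled to the output size.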
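The conversion from a box to five downsampled masks can be sketched as follows. This is one possible way to do it, not necessarily the paper's exact procedure: each output cell stores the fraction of its area covered by the box (or half box), which amounts to area-average downsampling. The function names, the (x0, y0, x1, y1) box layout, and the assumption that y grows downward (so "top" is the half with the smaller y) are all mine.

```python
import numpy as np

def box_to_mask(box, image_size, d=24):
    """Coverage mask: each cell holds the fraction of its area inside the box.

    box is (x0, y0, x1, y1) in pixels; image_size is (width, height).
    """
    x0, y0, x1, y1 = box
    w, h = image_size
    xs = np.linspace(0, w, d + 1)           # cell edges along x
    ys = np.linspace(0, h, d + 1)           # cell edges along y
    # Overlap length between the box and each cell, per axis.
    ox = np.clip(np.minimum(xs[1:], x1) - np.maximum(xs[:-1], x0), 0, None)
    oy = np.clip(np.minimum(ys[1:], y1) - np.maximum(ys[:-1], y0), 0, None)
    cell_area = (w / d) * (h / d)
    return np.outer(oy, ox) / cell_area     # shape (d, d), rows index y

def five_masks(box, image_size, d=24):
    """Full-box mask plus left/right/top/bottom half masks (y grows downward)."""
    x0, y0, x1, y1 = box
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    parts = {
        "full": (x0, y0, x1, y1),
        "left": (x0, y0, xm, y1),
        "right": (xm, y0, x1, y1),
        "top": (x0, y0, x1, ym),
        "bottom": (x0, ym, x1, y1),
    }
    return {name: box_to_mask(b, image_size, d) for name, b in parts.items()}

masks = five_masks((100, 50, 300, 250), (400, 400))
for name, m in masks.items():
    print(name, m.shape, round(float(m.sum()), 2))
```

The left and right masks sum to the full mask, as do the top and bottom masks. That redundancy is what makes the five predictions over-complete.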

5.2 Object Localization from DNN Output

5.3 Multi-scale Refinement of DNN Localizer

6 DNN Training

One advantage of this network is its simplicity. However, it needs a large amount of training data: objects of different sizes need to occur at almost every location.

We generate several thousand samples from each image, divided into 60% negative and 40% positive samples.

Because localization is harder than classification, it is important to start with the weights of a model with high-quality low-level filters. To achieve this, we first train a classification network, then reuse its trained weights for localization and fine-tune the network.

The networks were trained by stochastic gradient using ADAGRAD to estimate the learning rate of the layers automatically.
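For readers unfamiliar with ADAGRAD, here is a minimal sketch of the standard per-parameter update rule on a toy quadratic. It is not the paper's training code: the notes do not say how the per-layer rates were set up, and the function name, `base_lr`, and `eps` are placeholder choices.

```python
import numpy as np

def adagrad_step(params, grads, accum, base_lr=0.01, eps=1e-8):
    """One AdaGrad update; accum holds the running sum of squared gradients."""
    accum += grads ** 2
    # Each parameter's effective step shrinks as its own gradient history grows.
    params -= base_lr * grads / (np.sqrt(accum) + eps)
    return params, accum

# Toy problem (hypothetical): minimise f(w) = sum(w^2), whose gradient is 2w.
w = np.array([1.0, -2.0])
acc = np.zeros_like(w)
for _ in range(500):
    w, acc = adagrad_step(w, 2 * w, acc, base_lr=0.5)
print(w)
```

The point is that no hand-tuned learning-rate schedule is needed: the accumulated squared gradients scale the step for each parameter separately.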

Results: VOC2007 is a detection-and-classification task/dataset.

On a 12-core machine, our implementation took about 5-6 secs per image for each class. That seems awfully slow.

Reposted from blog.csdn.net/qq_20481015/article/details/82694595