Object detection networks on convolutional feature maps

Copyright notice: this is the blogger's original article; reproduction without permission is prohibited. https://blog.csdn.net/u012193416/article/details/88592100


Abstract:

1. Introduction

Use convolutional layers to extract region-independent features, followed by region-wise MLPs for classification.

We focus on region-wise classifier architectures that sit on top of the shared, region-independent convolutional features. We call them 'networks on convolutional feature maps' (NoCs).

We use backbones with the fully connected layers removed; simply swapping in ResNet, GoogLeNet, or VGG does not by itself improve detection accuracy, so the network on convolutional feature maps is a more important factor.

2. Related work

Traditional object detection

ConvNet-based object detection

 

3. Ablation experiments

The ablations are mainly designed on the SPPnet system. We consider the following settings:

  1. The shared feature maps are frozen, so we can focus on the classifiers.
  2. The proposals are pre-computed by selective search (later replaced by RPN).
  3. The training step ends with a post-hoc SVM.

Experimental settings

Outline of Method (More attention on NoCs)

2000 region proposals by selective search

RoI pooling produces a fixed-resolution (m×m) feature map for each region.

  We treat the m×m feature map as a new data source and design NoC architectures to classify it.

  The NoC ends with an (n+1)-d output: n object categories plus one background class. It is trained with SGD and backpropagation.
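As a rough sketch of the RoI-pooling step described above (NumPy; the function name, coordinate layout, and shapes here are illustrative, not the paper's actual implementation):

```python
import numpy as np

def roi_pool(feat, roi, m=7):
    """Max-pool one RoI on a conv feature map (C, H, W) to a fixed m*m grid.

    `roi` = (x0, y0, x1, y1) in feature-map coordinates (assumed layout).
    """
    x0, y0, x1, y1 = roi
    c = feat.shape[0]
    out = np.empty((c, m, m))
    # split the RoI into m*m bins and keep the max activation in each bin
    ys = np.linspace(y0, y1, m + 1).astype(int)
    xs = np.linspace(x0, x1, m + 1).astype(int)
    for i in range(m):
        for j in range(m):
            y_hi = max(ys[i + 1], ys[i] + 1)   # ensure a non-empty bin
            x_hi = max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = feat[:, ys[i]:y_hi, xs[j]:x_hi].max(axis=(1, 2))
    return out

feat = np.random.rand(256, 40, 60)          # shared conv feature map (C, H, W)
pooled = roi_pool(feat, (5, 5, 35, 25), m=7)
print(pooled.shape)                          # (256, 7, 7)
```

Whatever the region's size, the NoC always sees the same m×m input, which is what makes a shared region-wise classifier possible.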

3.1 Using MLP as NoC

The simplest design uses only fc layers. We consider 2 to 4 fc layers; the last fc layer is always (n+1)-d with softmax, and the other fc layers are 4096-d (with ReLU).
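A minimal forward pass of such an MLP NoC can be sketched as follows (NumPy, with toy dimensions standing in for the paper's 4096-d fc layers; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_noc(x, weights):
    """fc layers with ReLU, ending in an (n+1)-d softmax classifier."""
    *hidden, last = weights
    for w, b in hidden:
        x = relu(x @ w + b)
    w, b = last
    return softmax(x @ w + b)

# toy sizes for illustration; the paper uses 4096-d hidden layers and
# d_in = C * m * m from the flattened RoI-pooled feature map
d_in, d_h, n = 64, 32, 20          # n object classes + 1 background
weights = [
    (0.1 * rng.standard_normal((d_in, d_h)), np.zeros(d_h)),
    (0.1 * rng.standard_normal((d_h, d_h)), np.zeros(d_h)),
    (0.1 * rng.standard_normal((d_h, n + 1)), np.zeros(n + 1)),
]
probs = mlp_noc(rng.standard_normal((1, d_in)), weights)
print(probs.shape)                 # (1, 21)
```

Varying the number of entries in `weights` between 2 and 4 reproduces the 2- to 4-layer comparison discussed here.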

3.2 Using ConvNet as NoC

In recent detection systems, the pre-trained ConvNet is treated as a region-independent feature extractor, shared across all RoIs without distinction. Although this saves computation, it ignores the opportunity to use conv layers to learn region-aware features. From the NoC perspective, the NoC part can also have its own conv layers.

The mAP is nearly unchanged when using one additional conv layer, but drops when using more conv layers. We observe that the degradation is a result of overfitting.
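A sketch of one extra region-wise conv layer applied to the RoI-pooled feature (NumPy with naive loops and toy channel counts; this only illustrates the idea of region-aware conv layers, not the paper's implementation):

```python
import numpy as np

def conv3x3(x, w):
    """3x3 conv with zero padding on an RoI feature x: (C_in, m, m).

    w: (C_out, C_in, 3, 3). Padding keeps the m*m spatial size.
    """
    c_out, m = w.shape[0], x.shape[1]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, m, m))
    for i in range(m):
        for j in range(m):
            # broadcast (C_in,3,3) patch against all C_out filters
            out[:, i, j] = (w * xp[:, i:i + 3, j:j + 3]).sum(axis=(1, 2, 3))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 7, 7))           # RoI-pooled feature (toy C=16)
w = 0.1 * rng.standard_normal((32, 16, 3, 3))
h = np.maximum(conv3x3(x, w), 0)              # region-aware conv layer + ReLU
print(h.shape)                                # (32, 7, 7)
```

Because this conv layer runs per region rather than once on the whole image, its weights can specialize to region-aware patterns, at the cost of extra per-RoI computation.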

3.3 Maxout for Scale Selection

We incorporate a local competition operation (maxout) into NoCs to improve scale selection from the feature pyramid.

To improve scale invariance, for each proposal region we select two adjacent scales in the feature pyramid. Two fixed-resolution (m × m) features are RoI-pooled, and the NoC model has two data sources.
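The maxout merge of the two scales can be sketched as an element-wise max over the two RoI-pooled features (NumPy; shapes are illustrative):

```python
import numpy as np

# two RoI-pooled features of the same region from adjacent pyramid scales
f_scale_a = np.random.rand(256, 7, 7)
f_scale_b = np.random.rand(256, 7, 7)

# maxout: element-wise competition merges the two data sources, so the
# NoC sees whichever scale responds more strongly at each position
merged = np.maximum(f_scale_a, f_scale_b)
print(merged.shape)   # (256, 7, 7)
```

The rest of the NoC then consumes `merged` exactly as it would a single-scale input, so maxout adds scale selection without changing the classifier architecture.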

3.4 Fine-tuning NoC

In the above, all NoC architectures are initialized randomly. Next, pre-trained weights are transferred to the NoC part and fine-tuned.

3.5 Deep features vs deep classifiers

This means that for exploiting very deep networks, both the depth of the features and the depth of the classifiers are important.

3.6 Error Analysis

The error can be roughly decomposed into two parts: localization error and recognition error.

Localization error is defined as false positives that are correctly categorized but do not sufficiently overlap the ground truth.
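The overlap criterion behind this decomposition is IoU: a detection that is correctly classified but falls below the IoU threshold (commonly 0.5) counts as a localization error. A minimal IoU sketch:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix1 - ix0, 0) * max(iy1 - iy0, 0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# correctly classified, but IoU below 0.5 -> counted as localization error
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```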

Recognition error covers the remaining false positives: detections that are incorrectly categorized.

3.7 Comparisons of results

3.8 Summary of observations

(i) A deeper region-wise classifier is useful and is in general orthogonal to deeper feature maps.
(ii) A convolutional region-wise classifier is more effective than an MLP-based region-wise classifier.

 

4. NoC for Faster R-CNN with ResNet

We demonstrate that the NoC design is an essential factor for Faster R-CNN [14] to achieve superior results using ResNets.

A deep and convolutional NoC is an essential factor for Faster R-CNN + ResNet to perform accurate object detection.