Towards Universal Object Detection by Domain Attention

Paper and Code

Paper: https://arxiv.org/abs/1904.04402

Code: http://www.svcl.ucsd.edu/projects/universal-detection/

Overview

This paper presents a universal object detection system that works across different image domains without prior knowledge of the domain of interest. It does so by introducing a new family of adaptation layers, based on squeeze-and-excitation (SE) modules and a new domain-attention mechanism. In the proposed universal detector, all parameters and computations are shared across domains, and a single network processes all domains all the time. Experiments on a newly established benchmark of 11 diverse object detection datasets show that the proposed detector outperforms a bank of individual single-domain detectors, a multi-domain detector, and a baseline universal detector.

Introduction

Object detection tasks are diverse. They differ in object category (faces, horses, medical lesions, etc.), in camera viewpoint (images taken from aircraft, from autonomous vehicles, etc.), and in image style (comics, clip art, watercolor, medical images, etc.). Most conventional detectors are tied to a specific domain (trained and tested on a single dataset), partly because object detection datasets are diverse and there is a non-trivial domain shift between them.

It is well known that training a dedicated detector for each domain gives good detection results. In practice, however, a system may need to process images from several domains. A simple, brute-force approach is to train D detectors for D image domains, one per domain. But the system does not necessarily know which domain an image belongs to at a given point in time, and the overall model becomes very large. Researchers have therefore proposed two kinds of solutions (for image classification): one solves multiple tasks with a common model, the other solves the same task over multiple domains. Object detection, however, is considerably more complex than classification.

The paper establishes a new universal object detection benchmark (comprising 11 diverse object detection datasets), shown in Figure 1:

 

It also proposes a series of architectures for universal / multi-domain object detection (Figure 2):

 

Here D denotes a domain, O an output, A a domain-specific adapter, and DA the domain attention module proposed in the paper; blue marks domain-universal components and the other colors mark domain-specific ones. Figure 2 (a) and (b) are multi-domain detectors, which require prior knowledge of the domain of interest. (a) is a bank of domain-specific detectors that share neither parameters nor computation; (b) shares the convolutional layers of (a) and achieves adaptation with lightweight domain-specific adaptation layers. Figure 2 (c) and (d) are universal detectors. (c) shares all parameters and computation across domains (except the output layers); because a single set of parameters struggles to cover the non-trivial domain shift, it performs worse than (b). (d) is the proposed method, which adds the DA (domain attention) module: a bank of universal SE adapters is added first, and a feature-based attention mechanism is then introduced to achieve domain sensitivity. The module learns to assign network activations to different domains through the universal SE adapter bank, and routes their responses through the domain attention mechanism, so that the adapters can specialize to individual domains. Because this process is data-driven, the number of domains does not have to match the number of datasets, and a dataset can span multiple domains. The network can thus exploit knowledge shared across domains.

Multi-domain Object Detection

Universal object detection benchmark (UODB): Pascal VOC, WiderFace, KITTI, LISA, DOTA, COCO, Watercolor, Clipart, Comic, Kitchen and DeepLesion.

Single-domain Detector Bank

Faster R-CNN is used as the baseline: a detector is trained on each dataset separately, giving 11 detectors. The mean and variance of the convolutional activations of each detector are shown below:

 

The activation distributions of COCO and VOC are similar, whereas those of DOTA, DeepLesion and CrossDomain differ considerably. The statistics also vary from layer to layer. Earlier layers, which contribute more to correcting the domain shift, show more pronounced differences than later layers; the RPN layers also differ noticeably (even though they are category-independent). Many layers have similar statistics across datasets, especially the intermediate layers, which indicates that they can be shared by at least some domains.
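The kind of comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the arrays are random stand-ins for one layer's activations, and the names `channel_stats`, `domain_a` and `domain_b` are hypothetical, not taken from the paper's code.

```python
import numpy as np

def channel_stats(activations):
    """Per-channel mean and variance of a batch of conv activations (N, C, H, W)."""
    mean = activations.mean(axis=(0, 2, 3))
    var = activations.var(axis=(0, 2, 3))
    return mean, var

rng = np.random.default_rng(0)
# Random stand-ins for the activations of one layer on two domains (assumption).
acts = {
    "domain_a": rng.normal(0.0, 1.0, size=(8, 4, 6, 6)),
    "domain_b": rng.normal(2.0, 3.0, size=(8, 4, 6, 6)),
}
stats = {name: channel_stats(a) for name, a in acts.items()}

# A large gap suggests the layer needs domain-specific adaptation;
# a small gap suggests the layer could be shared across the two domains.
mean_gap = np.abs(stats["domain_a"][0] - stats["domain_b"][0]).mean()
print("mean gap between domains:", mean_gap)
```

Repeating this per layer would give the layer-by-layer picture the paragraph describes, with early layers and RPN layers expected to show the larger gaps.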

Adaptive Multi-domain Detector

 

In this model the output layers and the RPN layers are domain-specific, while the rest of the network (e.g. all convolutional layers) is shared. To adapt to new domains, the paper adds some extra lightweight domain-specific layers that compensate for the domain shift.

All domain adapters in the detector are built from SE modules, for the following reasons. Domain adaptation is related to feature-based attention, and the SE module, which rescales the response of each channel according to channel interdependencies, can be seen as a feature-based attention mechanism. In addition, SENet showed that the SE module works well for ImageNet classification, and it is lightweight.

SE Adapters

 

The SE adapter consists of the following operations. First, a global pooling layer performs the squeeze. Then a bottleneck of two fully connected layers models the dependencies between channels: the first reduces the channel dimension to 1/r of the input, and after a ReLU activation the second restores the original dimension. This adds non-linearity while significantly reducing the number of parameters and the amount of computation.

 

r is set to 16 in the paper, and $F_{SE}$ denotes the FC + ReLU + FC combination.
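A minimal NumPy sketch of the squeeze and bottleneck steps may help. It assumes global average pooling for the squeeze (as in the original SE module), uses random weights, and the function name `se_adapter` and the chosen sizes are hypothetical.

```python
import numpy as np

def se_adapter(x, w1, b1, w2, b2):
    """FC + ReLU + FC applied to the pooled features of x with shape (C, H, W)."""
    squeezed = x.mean(axis=(1, 2))                 # squeeze: global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeezed + b1, 0.0)   # FC (C -> C/r) followed by ReLU
    return w2 @ hidden + b2                        # FC (C/r -> C)

C, r = 64, 16                                      # r = 16 as in the paper; C is arbitrary
rng = np.random.default_rng(0)
w1, b1 = rng.normal(scale=0.1, size=(C // r, C)), np.zeros(C // r)
w2, b2 = rng.normal(scale=0.1, size=(C, C // r)), np.zeros(C)

x = rng.normal(size=(C, 7, 7))
x_se = se_adapter(x, w1, b1, w2, b2)               # one response per channel

# Weight counts: bottleneck versus a single full C x C layer.
bottleneck_params = w1.size + w2.size              # 2 * C * C / r
full_fc_params = C * C
print(x_se.shape, bottleneck_params, full_fc_params)
```

With C = 64 and r = 16 the bottleneck holds 512 weights against 4096 for a full C x C layer, which is the parameter saving the text refers to.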

For multi-domain object detection (the SE adapter bank, shown in Figure 4b), an SE adapter branch is added for each domain together with a domain switch, which selects the SE adapter associated with the domain of interest. This implements the architecture of Figure 2b; the resulting model is about 1/5 the size of that of Figure 2a.
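The domain switch can be pictured as a plain lookup into per-domain adapter weights. The sketch below is an illustration only: the class name `SEAdapterBank`, the domain keys, the omission of biases and the sigmoid gate applied after the adapter are all assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SEAdapterBank:
    """One SE adapter per domain plus a hard domain switch (hypothetical helper)."""

    def __init__(self, domains, channels, r=16, seed=0):
        rng = np.random.default_rng(seed)
        hidden = channels // r
        # Each domain owns its own pair of FC weights; biases are omitted for brevity.
        self.params = {
            d: (rng.normal(scale=0.1, size=(hidden, channels)),
                rng.normal(scale=0.1, size=(channels, hidden)))
            for d in domains
        }

    def __call__(self, x, domain):
        w1, w2 = self.params[domain]               # domain switch: the domain must be known
        pooled = x.mean(axis=(1, 2))               # squeeze -> (C,)
        gate = sigmoid(w2 @ np.maximum(w1 @ pooled, 0.0))
        return x * gate[:, None, None]             # channel-wise rescaling

bank = SEAdapterBank(["voc", "kitti", "widerface"], channels=128)
x = np.random.default_rng(1).normal(size=(128, 5, 5))
y_voc = bank(x, "voc")
y_kitti = bank(x, "kitti")
```

The same input produces different outputs depending on the selected domain, and an unknown domain key fails outright, which is exactly the prior-knowledge requirement that the universal detector is meant to remove.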

Universal Object Detection

The approaches above require prior knowledge of the domain, which is undesirable in autonomous systems such as robots or self-driving cars. The authors therefore design universal detectors to remove this requirement.

Universal Detector

 

The simplest implementation, shown above, shares a single detector across all tasks, with only the output layers being task-specific. This approach is simple and crude: with no domain-specific parameters, the same parameters and representations are forced to serve all domains, and detection results are poor.

Domain-attentive Universal Detector

Ideally, a universal detector should have some domain sensitivity and be able to adapt to different domains. It differs from multi-domain detection in two ways: first, the domain must be inferred automatically; second, domains are not tied to tasks.

A common domain often contains many sub-domains. Traffic scenes, for example, include sub-domains such as environment (urban, rural) and weather conditions (sunny, rainy, etc.). In fact, domains need not have clear semantics; they can be data-driven. In that case each adapter need not be bound to a single domain, and a soft domain assignment makes more sense. The paper proposes the DA module to overcome the limitation of handling each domain separately within the network, as shown below.

 

Universal SE Adapter Bank

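There is no domain switch. Instead, the outputs of the individual domain adapters are concatenated to form a universal representation space, $X_{USE} = [X_{SE}^1, X_{SE}^2, \dots, X_{SE}^N] \in \mathbb{R}^{C \times N}$: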

 

where N is the number of adapters.

Each branch (a non-linear mapping) projects the input along a subspace matched to the statistics of a particular domain. The domain attention component then produces a set of domain-sensitive weights, which combine these projections in a data-driven way. No prior knowledge of the domain is needed, since an input image can excite several SE adapter branches.

 

Domain attention produces a set of domain-sensitive weights that are used to combine the projections of the SE bank. The DA component first applies global pooling to the input features and then a softmax layer (a linear layer followed by softmax), i.e. $S_{DA} = F_{DA}(X) = \mathrm{softmax}(W_{DA} F_{avg}(X))$:

 

The resulting vector $S_{DA}$ is then used to weight the USE output $X_{USE}$, giving the domain-adapted response vector $X_{DA} = X_{USE} S_{DA}^T$:

 

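Finally, the domain-adapted response, denoted $X_{DA}$, is used to rescale the channel activations, namely $\tilde{X} = F_{scale}(X, \sigma(X_{DA}))$ with $\sigma$ the sigmoid function: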

 

where $F_{scale}$ is channel-wise multiplication.
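The steps of this section (universal SE bank, softmax attention, weighted combination, channel rescaling) can be put together in a short NumPy sketch. It is a rough illustration under assumptions: weights are random, biases are omitted, pooling is global average pooling, a sigmoid is applied before rescaling, and the names `domain_attention`, `adapters` and `w_da` are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                        # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def domain_attention(x, adapters, w_da):
    """x: (C, H, W); adapters: list of N (w1, w2) pairs; w_da: (N, C)."""
    pooled = x.mean(axis=(1, 2))                   # global pooling -> (C,)
    # Universal SE bank: stack the N adapter responses as columns -> (C, N).
    x_use = np.stack(
        [w2 @ np.maximum(w1 @ pooled, 0.0) for w1, w2 in adapters], axis=1
    )
    s_da = softmax(w_da @ pooled)                  # soft domain assignment -> (N,)
    x_da = x_use @ s_da                            # weighted combination -> (C,)
    return x * sigmoid(x_da)[:, None, None], s_da  # channel-wise rescaling

C, r, N = 64, 16, 5
rng = np.random.default_rng(0)
adapters = [
    (rng.normal(scale=0.1, size=(C // r, C)), rng.normal(scale=0.1, size=(C, C // r)))
    for _ in range(N)
]
w_da = rng.normal(scale=0.1, size=(N, C))
x = rng.normal(size=(C, 7, 7))

out, s_da = domain_attention(x, adapters, w_da)
print(out.shape, s_da)
```

Because the weights in `s_da` are computed from the input itself and always sum to one, no domain label is passed in, and several adapters can contribute to the same image.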

Experiments

Experimental backbone: Faster R-CNN + SE-ResNet-50 (pretrained on ImageNet)

Datasets, hyperparameter settings and the mAP of the single-domain detectors:

 

The mAP comparison is as follows:

 

Only five datasets are used here.

The multi-domain detector (adaptive) improves average accuracy by 0.7% over the baseline and is significantly better than the BN adapter and the residual adapter (RA). The universal detector adds only 0.5M parameters, but its average accuracy is poor (only 72.5%). The universal detector with the domain attention mechanism performs best: with a parameter increase of about 7% per domain, it improves average accuracy by 1.6% over the baseline. If the domain attention mechanism is fixed (i.e. the SE adapter responses are simply averaged), average accuracy drops by 0.5% (leaving a 1.1% improvement over the baseline).
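One way to read the fixed variant is as the special case in which every adapter receives the same weight 1/N. The sketch below illustrates that reading only; the adapter responses are random stand-ins, and setting the attention weights to zero is an assumption used to obtain uniform weights, not a description of how the ablation was run.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

N, C = 5, 8
rng = np.random.default_rng(0)
x_use = rng.normal(size=(C, N))                   # stand-in responses of N SE adapters
pooled = rng.normal(size=C)                       # stand-in pooled input features

# With all-zero attention weights the softmax is uniform, whatever the input is.
s_uniform = softmax(np.zeros((N, C)) @ pooled)    # every entry equals 1/N
x_da_fixed = x_use @ s_uniform                    # same as averaging the adapter responses

# Learned (here: random) attention weights give an input-dependent combination.
s_learned = softmax(rng.normal(size=(N, C)) @ pooled)
x_da_learned = x_use @ s_learned
print(s_uniform, s_learned)
```

Under this reading, the 0.5% gap reported above is the benefit of letting the weights depend on the input rather than fixing them.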

Effect of the number of SE adapters

 

Around 5 adapters appears to be the best choice.

The results:

 

What does the domain attention module learn? The figure below shows the weights learned in the first and last residual blocks of the fourth and fifth residual stages.

 

Official evaluation

 

The table shows the official evaluation results of the universal + DA model: after adding domain attention, mAP improves to varying degrees on several datasets.

 


Origin www.cnblogs.com/SuperLab/p/11608078.html