[CVPR 2022] QueryDet: Accelerating high-resolution small target detection

Dalian has lived up to expectations: the epidemic has closed our school again. It may be closed for 5,678 days to start with. *smile.jpg*

Paper address: https://arxiv.org/pdf/2103.09136.pdf
Project address: https://github.com/ChenhongyiYang/QueryDet-PyTorch

1. Introduction

Background: both the speed and the accuracy of small object detection remain unsatisfactory.

Solution: first use low-resolution features to coarsely locate small objects; then use these coarse locations to guide computation on the high-resolution features and produce accurate final predictions.

Reasons why performance degrades for small objects:

(1) Due to downsampling, the features that characterize small objects vanish, or are contaminated by background noise.

(2) The receptive fields corresponding to low-resolution features cannot match the scale of small objects.

(3) A small localization deviation for a small object causes a large drop in IoU, which makes small objects inherently harder to detect than large ones.

Existing small object detection methods usually maintain higher-resolution features by enlarging the input image or reducing the downsampling rate, thereby improving small object performance. The introduction of FPN alleviates, to some extent, the heavy computation that high resolution brings, but running the detection head on the low-level (high-resolution) features is still expensive.
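To see why the low-level features dominate the cost, here is a back-of-the-envelope sketch (my own illustration, not from the paper): counting the positions a dense detection head visits on each FPN level, assuming a hypothetical 800x800 input and the usual P3-P7 strides of 8-128. The P3 level alone accounts for roughly three quarters of all head evaluations.

```python
import math

def head_positions(input_size=800, strides=(8, 16, 32, 64, 128)):
    """Spatial positions a dense detection head visits on each FPN level."""
    return [math.ceil(input_size / s) ** 2 for s in strides]

positions = head_positions()
p3_share = positions[0] / sum(positions)
print(positions)            # → [10000, 2500, 625, 169, 49]
print(round(p3_share, 3))   # P3 alone is ~75% of all head positions
```

This is exactly the redundancy QueryDet targets: the bulk of the head's work happens on the one level where small objects are sparse.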

This paper builds on two observations:

(1) Computation on the high-resolution, low-level feature maps is highly redundant: small objects are sparsely distributed in space and occupy only a small fraction of the feature map.
(2) In an FPN, even though the low-resolution (high-level) feature maps cannot localize small objects accurately, they can still determine, with high confidence, whether a small object exists and roughly where. Since adjacent pyramid levels are related by simple down/up-sampling, a coarse detection on one level can be propagated to the corresponding region on the next finer level.
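The correspondence between adjacent pyramid levels in observation (2) is easy to state concretely. As a minimal sketch (the function name is mine): because adjacent FPN levels differ by a factor-2 downsampling, one cell (x, y) on a coarse level covers a 2x2 block of cells on the next finer level.

```python
def coarse_to_fine(x, y):
    """Map one coarse-level cell to the 2x2 block of finer-level cells it
    covers (adjacent FPN levels differ by a factor-2 downsampling)."""
    return [(2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]

print(coarse_to_fine(3, 5))  # → [(6, 10), (7, 10), (6, 11), (7, 11)]
```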

Digression: I noticed that putting a figure in the abstract states the problem very clearly — a nice trick worth borrowing when writing a thesis.

2. What did the paper do?

The goal of this paper is to bring in shallower, high-resolution features to help small object detection while keeping the computation lightweight.

Based on the observations above, QueryDet proposes a Cascade Sparse Query (CSQ) mechanism. *Query* means that each level uses the queries passed down from the level above (higher-level, lower-resolution features) to guide its own small object detection, and in turn passes its predicted queries down to guide detection on the next level; *Cascade* refers to this level-by-level chaining; *Sparse* refers to the use of sparse convolution to drastically reduce the computational cost of the detection head on the low-level feature layers.

Put simply, the feature map at the higher level has rich semantics but low resolution and is responsible for a first coarse screening of small objects; the resulting queries are passed to the lower, high-resolution level for refinement. This "glance and focus" two-stage structure performs dynamic inference efficiently and produces the final detections.

Speeding up inference with sparse queries:

In previous feature-pyramid-based detector designs, small objects tend to be detected from the high-resolution, low-level feature maps. However, since small objects are usually sparsely distributed in space, the dense computation paradigm on high-resolution feature maps is very inefficient. Inspired by this observation, the authors propose a coarse-to-fine approach to reduce the computational cost of the low pyramid levels: first predict the coarse locations of small objects on the coarse feature maps, and then compute only at the corresponding locations on the fine feature maps. The whole procedure can be regarded as a query process: the coarse locations are the query keys, and the high-resolution features used to detect small objects are the query values. The process is shown in the figure below.

The figure in the introduction contains two cascaded query operations: Large -> Medium and Medium -> Small. Take Large -> Medium as an example. First, during training, the Large level is supervised to predict the confidence that each grid cell contains a small object (objects smaller than a preset size threshold are defined as small), so the Large level learns which grid cells contain small objects. Second, during inference, the network selects the positions whose predicted small-object score exceeds a score threshold as queries, and maps these positions to the corresponding positions on the Medium feature map. Finally, the detection heads at the Medium level run only on these query positions, producing both the detections for this level and the queries for the next level; this sparse computation is implemented with sparse convolution.
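The cascade described above can be sketched in a few lines of plain Python (an illustration under my own simplified assumptions: dictionaries stand in for score maps, thresholding a score map stands in for the sparse-convolution head, and all names are mine). It counts how many head evaluations the cascade actually performs.

```python
def cascade_sparse_query(score_maps, score_thresh=0.5):
    """Coarse-to-fine query cascade. score_maps is ordered coarse -> fine,
    each a dict {(x, y): small-object confidence}. Each level evaluates the
    head only at positions queried from the level above, and passes the
    positions scoring above the threshold down as new queries."""
    queries = set(score_maps[0])  # the coarsest level runs densely
    evaluated = 0
    for level in score_maps:
        evaluated += len(queries)
        # Keep positions the head scores above the threshold ...
        hits = [p for p in queries if level.get(p, 0.0) > score_thresh]
        # ... and map each hit to its 2x2 block on the next finer level.
        queries = {(2 * x + dx, 2 * y + dy)
                   for (x, y) in hits for dx in (0, 1) for dy in (0, 1)}
    return evaluated

# Toy 2-level pyramid: one confident coarse cell spawns 4 fine queries.
coarse = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.0}
fine = {(x, y): 0.0 for x in range(4) for y in range(4)}
print(cascade_sparse_query([coarse, fine]))  # → 8 (vs 4 + 16 = 20 dense)
```

The savings come entirely from the fine level: only the 2x2 blocks spawned by confident coarse cells are ever touched, which is what the sparse convolution realizes on real feature maps.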

3. Conclusion

QueryDet uses high-resolution features to improve small object detection while using a novel query mechanism, Cascade Sparse Query (CSQ), to accelerate inference for feature-pyramid-based dense object detectors: the high-level, low-resolution features first screen for regions that contain small objects, and the screened positions are then processed on the high-resolution feature layers with sparse convolution, which greatly reduces the computational cost.

Still to be added to v7; some issues would need to be resolved first, and I suspect they won't be...


Reprinted from: blog.csdn.net/Zosse/article/details/128032496