[CVPR 2023] Selective Query Recollection: Enhanced Training of Query-Based Object Detection

Paper title: Enhanced Training of Query-Based Object Detection via Selective Query Recollection

bb0ee014136f38e0770b8c07b591d2c6.png

Code: https://github.com/Fangyi-Chen/SQR

a91049c0bbe2808507427769a7d9f37c.png

The author's introduction on Zhihu: https://zhuanlan.zhihu.com/p/610347565

Introduction

Traditional object detection methods rely on many hand-designed components, which limits end-to-end optimization. This paper explores a compelling area: query-based object detection. In query-based detectors, the model exhibits different prediction accuracy at different stages of the decoding process. This exposes a difficult problem: as decoding progresses, a later stage can make an erroneous prediction for an object that an intermediate stage had already predicted correctly.

The paper identifies two key issues: first, the training burden is distributed unevenly across stages; second, the sequential structure of the decoder causes corrections to intermediate queries to cascade into subsequent stages, increasing the difficulty of training. To address these issues, the paper introduces Selective Query Recollection (SQR), a training strategy that improves training by collecting intermediate queries and selectively feeding them to subsequent stages. This strategy alleviates the performance problem in query-based object detection and offers a new direction for more accurate detection.

Contributions

  • Quantitative study of the phenomenon: This paper is the first to study in detail an important phenomenon in query-based object detection, namely that the model exhibits different prediction accuracy at different stages of the decoding process. The paper quantifies this phenomenon through experiments and data analysis, providing a basis for further research.

  • Identifying the training limitations: The paper points out that this overlooked phenomenon stems from two training limitations: the training burden is distributed unevenly across stages, and the sequential structure of the decoder causes modifications to intermediate queries to cascade into subsequent stages, increasing the difficulty of training.

  • An effective training strategy, SQR: To solve the above problems, the paper proposes Selective Query Recollection (SQR) as a training strategy. SQR improves training by collecting intermediate queries and selectively feeding them to subsequent stages, significantly boosting the performance of query-based object detectors without adding any inference-time cost.

  • Experimental verification: The paper validates SQR by applying it to several query-based detectors under multiple experimental settings. The results show that SQR consistently improves performance, bringing average precision (AP) gains of 1.4 to 2.8.

Related work

Training strategies for object detection: Traditional object detectors are usually built on dense priors such as anchor points or anchor boxes, which are matched to ground-truth objects via IoU values or other soft scores. Multi-stage models refine bounding boxes and categories iteratively, stage by stage. For example, Cascade R-CNN trains each stage on the output of the previous one while gradually raising the IoU threshold, so that detections improve progressively. More recently, DETR treats object detection as a set-prediction problem: the model is trained by matching a fixed number of object queries to ground truth, and the queries are progressively refined through multiple decoding stages.

Query-based object detection: In recent years, many algorithms have adopted the DETR paradigm, treating query-based detection as a new formulation. These methods include Deformable DETR, Conditional DETR, Anchor-DETR, DAB-DETR, DN-DETR, AdaMixer, and others. They introduce various modifications, such as deformable attention modules, decoupled queries, and explicit anchors, to improve accuracy and convergence speed.

Method

The paper hopes to design a training strategy that meets the following expectations:

  • Place uneven supervision emphasis across stages, weighting the late decoding stages more heavily to improve the final results.

  • Directly introduce diverse early-stage queries into later stages to mitigate the impact of cascading errors.

To this end, the authors design a concise training strategy called Query Recollection (QR). Unlike existing techniques, it collects the intermediate queries at every stage and forwards them along the original path. Dense Query Recollection (DQR) is its basic form, and Selective Query Recollection (SQR) is the advanced variant.

018685f667da7ab6ec7650a20b5acb39.png
Dense Query Recollection

Notation: the paper uses a set of queries

52b311d3ae9fa820cf10c5c81d6c2e0a.png
, where n is typically 100, 300, or 500. At each decoding stage the queries go through self-attention, cross-attention, and an FFN, followed by ground-truth assignment and loss computation.

Base path: The paper introduces the concept of the base path, along which the queries are refined through all decoding stages. Taking a 4-stage decoder as an example, the final query is obtained by applying each stage's refinement in cascade, as shown in Equations (3) and (4).

de2d6644d7190478ceb8271364e4c3a9.png
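The base-path cascade of Equations (3) and (4) can be sketched in toy form as below. The stage functions are simple stand-ins for real decoder layers, and the names `make_stage` and `base_path` are illustrative, not from the paper's code:

```python
def make_stage(delta):
    """A toy decoding stage: 'refines' every query by adding a correction."""
    def stage(queries):
        return [q + delta for q in queries]
    return stage

def base_path(q0, stages):
    """Cascade the queries through every stage, keeping all intermediates."""
    intermediates = [q0]
    queries = q0
    for stage in stages:
        queries = stage(queries)
        intermediates.append(queries)
    return intermediates

stages = [make_stage(1) for _ in range(4)]   # a toy 4-stage decoder
qs = base_path([0, 0], stages)
# qs[0] is the initial query set; qs[-1] is the final, fully refined one: [4, 4]
```

Each intermediate `qs[s]` corresponds to the query after stage s, which is exactly what QR later collects and reuses.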

Formalization of DQR: DQR densely collects every intermediate query and passes each one independently to every subsequent stage. After each stage a collection C is formed whose size grows exponentially: it contains the refined outputs of the current stage together with all the query groups collected before it, so exactly half of each collection is newly generated. The number of supervision signals therefore doubles at every stage, as shown in Equation (6).

eeedea7aeea9227eb83a8a010844d632.png

Inference: During inference only the base path is used, so DQR does not affect the inference process at all. For a standard 6-stage decoder, the inference path is:

05a76ec2b79d76bb9e51bad2655637e6.png

In summary, DQR builds collections whose query count grows exponentially by densely collecting the intermediate queries at every stage and passing them independently to the subsequent stages. The number of supervision signals multiplies as the stages deepen, and earlier queries remain visible to later stages.
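The exponential growth of DQR's collections can be illustrated with a toy sketch, assuming each stage is a simple function applied to query groups. The recursion mirrors C_s = D_s(C_{s-1}) ∪ C_{s-1}; the names `dqr_collections` and `toy_stage` are made up for illustration:

```python
def dqr_collections(q0, stages):
    """Return the collections C_1..C_S; each is a list of query groups."""
    collections = []
    C = [q0]                                  # C_0 holds only the initial queries
    for stage in stages:
        refined = [stage(group) for group in C]   # pass every group through D_s
        C = refined + C                           # recollect: keep the old groups too
        collections.append(C)
    return collections

toy_stage = lambda group: [q + 1 for q in group]  # stand-in for a decoder stage
cols = dqr_collections([0], [toy_stage] * 4)
sizes = [len(C) for C in cols]
# sizes == [2, 4, 8, 16]: group counts double at every stage,
# and cols[-1][0] == [4] is the fully refined base-path group.
```

The doubling is exactly why DQR's supervision multiplies with depth, and also why its training cost becomes heavy for deep decoders.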

Selective Query Recollection

The authors argue that Dense Query Recollection has two problems: the computational cost is high, and queries that span too many stages may have negative effects. Selective Query Recollection therefore collects queries more judiciously, introducing at each stage only the queries most likely to contribute. In this way SQR reduces the computational burden while improving performance.

To find a better Query Recollection scheme, the authors analyzed in detail the TP decay rate and FP intensification rate introduced in Section 3. They found that most of the superior alternatives came from stages 4 and 5, whose TP decay rate and FP intensification rate reached 23.9% and 40.8% respectively, close to the figures for stages 1-5 combined, whereas stages 1-3 produced only 11.2% and 32.4%. This suggests that queries from the adjacent stage, and the one just before it, are the most likely to have positive effects.

Before each stage D_s starts, queries are collected from the two nearest stages (D_{s-1} and D_{s-2}) and used together as the input to D_s.

Formalization of SQR :

fe15321280c3de3e9ecdcd5749b9e4c6.png
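A minimal sketch of the SQR forward pass, assuming each stage is a toy function over query groups: only the outputs of the two nearest stages are gathered as input to each stage. The function name `sqr_forward` is illustrative, not from the released code:

```python
def sqr_forward(q0, stages):
    """Run SQR over toy stages; return how many query groups enter each stage."""
    prev_prev, prev = [], [q0]        # groups emitted by D_{s-2} and D_{s-1}
    sizes = []
    for stage in stages:
        inputs = prev + prev_prev     # recollect from the two nearest stages only
        sizes.append(len(inputs))
        outputs = [stage(group) for group in inputs]
        prev_prev, prev = prev, outputs
    return sizes

toy_stage = lambda group: [q + 1 for q in group]  # stand-in for a decoder stage
sizes = sqr_forward([0], [toy_stage] * 6)
# sizes == [1, 2, 3, 5, 8, 13]: the Fibonacci growth of supervision signals
```

Because each stage's input is the union of the previous two stages' outputs, the group count obeys the Fibonacci recurrence rather than doubling, which is the source of SQR's savings over DQR.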

Effects of Selective Query Recollection :

Selective Query Recollection still meets both expectations, and the number of supervision signals grows as a Fibonacci sequence (1, 2, 3, 5, 8, 13). Compared with dense recollection, SQR greatly reduces the computational burden and even surpasses DQR in accuracy. This supports the authors' hypothesis that queries which skip too many stages introduce noise at distant stages, masking their positive effects.

Recollection starting stage: Rather than always starting the collection from stage 1, the starting stage can be changed according to actual needs, further reducing the total number of queries in each collection and hence the computational burden. It can be regarded as a hyperparameter of Selective Query Recollection.
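Treating the starting stage as a hyperparameter, the per-stage group counts can be sketched as below. The function name and recurrence are illustrative, assuming the Fibonacci-style accumulation described above and a plain decoder (one group per stage) before the starting stage:

```python
def sqr_group_counts(num_stages, start=1):
    """Query groups entering each stage when recollection starts at `start`."""
    counts, prev2, prev1 = [], 0, 1
    for s in range(1, num_stages + 1):
        # Before the starting stage, behave like a plain decoder (one group);
        # afterwards, gather the groups from the two nearest stages.
        current = prev1 + prev2 if s >= start else 1
        counts.append(current)
        prev2, prev1 = prev1, current
    return counts

print(sqr_group_counts(6, start=1))  # [1, 2, 3, 5, 8, 13]
print(sqr_group_counts(6, start=3))  # [1, 1, 2, 3, 5, 8] -- cheaper overall
```

Delaying the start to stage 3 keeps the late-stage emphasis while shrinking the total number of recollected groups, which is the trade-off the ablation in Table 5 explores.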

Experiments

Experimental results

Comparison with SOTA, as shown in Table 8:

8212ab4dce56e193163b5871485f0480.png
  • On DAB-DETR, SQR improves AP by +2.3 and +2.6 under R50 and Swin-B respectively.

  • On Deformable-DETR, SQR improves AP by +2.7 at 12 epochs and +1.4 at 50 epochs.

  • On AdaMixer with R50, SQR achieves +1.9 AP under the basic setting (100 queries, 12 epochs).

  • With an extra decoding stage, the gap between the models with and without SQR grows to +2.8 AP.

Ablation experiments
9d15b94f634bddf6371097b8aabe483f.png
  • Baseline vs. DQR and SQR: Table 4 shows that both DQR and SQR significantly improve the baseline. DQR reaches 44.2 AP (+1.7), while SQR reaches a slightly higher 44.4 AP (+1.9). Note that SQR is far more efficient than DQR: Table 5 shows that under the same training settings, SQR greatly reduces training time while still achieving equal or higher AP.

  • Varying the starting stage of SQR: Table 5 reports the performance of SQR as the starting stage changes. Starting the recollection from stage 1 gives the best performance but also the highest computational cost. Starting from stage 2 performs similarly while modestly reducing the computational burden. As recollection starts later, the benefit of SQR diminishes as expected, because fewer queries are recollected from earlier stages and the training emphasis gradually evens out.

66bc8343a434e5ebb802fd595faba61f.png
  • Table 6 verifies that the TP decay rate and FP intensification rate both drop after applying SQR, confirming the intended training effect.

Conclusion

In this work, we study the phenomenon that the best detections of query-based object detectors do not always come from the last decoding stage but can sometimes come from an intermediate stage. We identify two limitations behind it: the lack of training emphasis on late stages, and the cascading of errors along the sequential query path. We address the problem with Selective Query Recollection (SQR), a simple yet effective training strategy. SQR significantly improves the performance of AdaMixer, DAB-DETR, and Deformable-DETR under various training settings.

☆ END ☆


Origin blog.csdn.net/woshicver/article/details/133503074