Paper title: Enhanced Training of Query-Based Object Detection via Selective Query Recollection
Code: https://github.com/Fangyi-Chen/SQR
The author's introduction on Zhihu: https://zhuanlan.zhihu.com/p/610347565
Introduction
Traditional object detection methods rely on many hand-designed components, which limits end-to-end optimization. This paper explores a compelling alternative: query-based object detection. In query-based detectors, the model produces predictions of different accuracy at different stages of the decoding process. This exposes a difficult problem: as decoding progresses, a prediction that an intermediate stage got right can be corrupted by a later stage, so errors appear late in decoding even when an earlier stage was accurate.
The paper identifies two key issues: first, the training burden is unevenly distributed across stages; second, the sequential structure of the decoder causes corrections applied to intermediate queries to cascade to subsequent stages, increasing the difficulty of training. To address these issues, the paper introduces Selective Query Recollection (SQR), a training strategy that improves results by accumulating intermediate queries and selectively feeding them to subsequent stages. This strategy alleviates the performance problem in query-based detection and offers a new direction for more accurate object detection.
Contributions of this paper
Quantifying the phenomenon: this paper is the first to study in detail an important phenomenon in query-based object detection, namely that the model's prediction accuracy differs across decoding stages. The paper quantifies this phenomenon through experiments and data analysis, providing a basis for further research.
Identifying the training limitations: the paper attributes this overlooked phenomenon to two training limitations: the uneven distribution of the training burden across stages, and the sequential structure of the decoder, which causes modifications to intermediate queries to cascade to subsequent stages and increases the difficulty of training.
Proposing an effective training strategy, SQR: to solve the above problems, the paper proposes Selective Query Recollection (SQR) as a training strategy. SQR improves training by accumulating intermediate queries and selectively feeding them to subsequent stages, significantly improving query-based detectors without increasing the computational cost of inference.
Experimental verification: the paper validates SQR by testing multiple query-based detectors under various experimental settings. Results show that SQR consistently improves performance, bringing average precision (AP) gains of 1.4 to 2.8.
Related work
Training strategies for object detection: traditional detectors typically rely on dense priors, such as anchor points or anchor boxes, which are matched to ground-truth objects via their IoU values or other soft scoring rules. Multi-stage models iteratively refine bounding boxes and categories stage by stage; for example, Cascade R-CNN uses the output of each intermediate stage to train the next, with a gradually increasing IoU threshold so that detection quality improves progressively. More recently, DETR casts object detection as a set-prediction problem, training the model by matching a fixed number of object queries to ground truth and progressively refining the queries through multiple decoding stages.
Query-based object detection: in recent years, many algorithms have adopted DETR's formulation as a new paradigm. These methods include Deformable DETR, Conditional DETR, Anchor-DETR, DAB-DETR, DN-DETR, AdaMixer, and others. They introduce various improvements, such as deformable attention modules, decoupled queries, and anchor-based query designs, to improve accuracy and convergence speed.
Method
The paper hopes to design a training strategy that meets the following expectations:
Distribute supervision unevenly, placing extra emphasis on late decoding stages to improve the final results.
Directly introduce diverse early-stage queries into later stages to mitigate the impact of cascading errors.
To this end, the authors design a concise training strategy called Query Recollection (QR). Unlike existing pipelines, it collects intermediate queries at every stage and forwards them along the original path. Dense Query Recollection (DQR) is the basic form, and Selective Query Recollection (SQR) is the refined variant.
Dense Query Recollection
Notation: the paper represents the queries entering each decoding stage as a set, with each stage refining the queries it receives and emitting them for the next stage.
Base path: the paper first defines the base path, in which a query is refined through all decoding stages in sequence. Taking a 4-stage decoder as an example, the final query is obtained by applying the refinement of every stage in cascade, as shown in equations (3) and (4).
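The base path can be sketched in a few lines of Python. The stage functions below are placeholders standing in for transformer decoder layers (their names and behavior are illustrative assumptions, not the paper's implementation); each one simply records that it processed the query:

```python
# Minimal sketch of the base path through a 4-stage decoder.
# Each stage D_s is modeled as a generic refinement function; real stages
# are transformer decoder layers operating on query embeddings.

def make_stage(s):
    """Hypothetical stage: appends its index to mark that it refined the query."""
    def stage(query):
        return query + [s]
    return stage

stages = [make_stage(s) for s in range(1, 5)]  # D_1 .. D_4

# Base path: q_4 = D_4(D_3(D_2(D_1(q_0))))
q = []  # stands in for q_0, the initial query (a learned embedding in practice)
for D in stages:
    q = D(q)

print(q)  # → [1, 2, 3, 4]: the final query was refined by every stage in order
```

The point of the cascade is that the final query depends on every stage having run, which is exactly why an error introduced mid-cascade propagates to the end.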
Formalization of DQR: DQR densely collects every intermediate query and passes each one independently to every subsequent stage. After each stage, a collection C is formed whose size grows exponentially: each collection contains the queries of the previous collection together with the queries the current stage produced from them. As a result, the number of supervision signals doubles at every stage, as shown in equation (6).
Inference: in the inference phase, only the base path is used, so recollection does not affect inference at all. For a standard 6-stage decoder, the inference path is q6 = D6(D5(D4(D3(D2(D1(q0)))))).
In short, DQR builds a collection whose size grows exponentially by densely collecting the intermediate queries at each stage and passing them independently to subsequent stages. This multiplies the number of supervision signals as the stages progress and ensures that earlier queries remain visible to later stages.
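The exponential growth can be verified by tracking collection sizes alone. This is a sketch of the counting argument only (the function name is my own), not of the actual tensor bookkeeping:

```python
# Sketch of Dense Query Recollection (DQR) collection growth.
# The collection fed to stage s contains the previous collection plus the
# queries the stage produces from it, so its size doubles at every stage.

def dqr_collection_sizes(num_stages, n_init=1):
    """Size of the collection after each stage, starting from n_init queries."""
    sizes = []
    collection = n_init          # C_0: the initial queries
    for _ in range(num_stages):
        # the stage refines every query in the collection, producing as many
        # new queries; the new collection keeps both the old and the new
        collection = collection * 2
        sizes.append(collection)
    return sizes

print(dqr_collection_sizes(6))  # → [2, 4, 8, 16, 32, 64]
```

With a 6-stage decoder the final collection is 64x the initial query set, which is exactly the computational-cost problem SQR addresses next.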
Selective Query Recollection
The authors argue that Dense Query Recollection has two problems: its computational cost is high, and queries that span too many stages may have negative effects. Selective Query Recollection therefore collects queries more judiciously, introducing at each stage only the queries most likely to contribute. In this way, SQR reduces the computational burden while improving performance.
To find a better recollection scheme, the authors analyzed in detail the TP decay rate and FP intensification rate introduced in Section 3. They found that most of the better alternatives come from stages 4 and 5, where the two rates reach 23.9% and 40.8% respectively, close to the results of stages 1 to 5 combined, while stages 1 to 3 account for only 11.2% and 32.4%. This suggests that queries from the adjacent stage and the one just before it are the most likely to have positive effects.
Before each stage D_s starts, queries are collected from the two closest preceding stages (D_{s-1} and D_{s-2}) and used together as the input to D_s.
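The recollection rule above can be sketched by counting how many queries each stage processes; the Fibonacci growth quoted later falls out directly. The function below is my own illustrative counting sketch, not the paper's code:

```python
# Sketch of the SQR recollection rule: the input to stage s is the union of
# the outputs of the two closest preceding stages, D_{s-1} and D_{s-2}.
# We track query counts only; real queries are embedding tensors.

def sqr_stage_inputs(num_stages):
    """Return, for each stage 1..num_stages, how many queries it processes."""
    outputs = [1]  # position 0: the initial queries q_0
    for s in range(1, num_stages + 1):
        prev1 = outputs[s - 1]                   # recollected from D_{s-1}
        prev2 = outputs[s - 2] if s >= 2 else 0  # recollected from D_{s-2}
        outputs.append(prev1 + prev2)
    return outputs[1:]

print(sqr_stage_inputs(6))  # → [1, 2, 3, 5, 8, 13]
```

Summed over a 6-stage decoder this is 32 supervised query groups, versus 126 for DQR's doubling scheme, which is where the efficiency gain comes from.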
Formalization of SQR: before each stage, the recollected input consists of the outputs of the two preceding stages, so the collections grow far more slowly than under DQR's exponential scheme.
Effects of Selective Query Recollection:
Selective Query Recollection still satisfies both expectations, while the number of supervision signals grows as a Fibonacci sequence (1, 2, 3, 5, 8, 13). Compared with Dense Query Recollection, SQR greatly reduces the computational burden and even outperforms dense recollection in accuracy. This supports the authors' hypothesis that queries which skip too many stages may introduce noise at distant stages, masking their positive effects.
Starting stage of recollection: instead of always starting recollection from stage 1, the starting stage can be changed as needed, further reducing the total number of queries in each collection and the computational burden. This can be viewed as a hyperparameter of SQR.
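A rough illustration of this hyperparameter, under the simplifying assumption (mine, not the paper's exact formulation) that a stage before the starting point just passes its queries through without recollection:

```python
# Sketch: delaying the stage at which recollection starts shrinks the
# per-stage query counts, trading some of SQR's benefit for compute.

def sqr_counts(num_stages, start):
    """Query counts per stage; `start` is the first stage whose input
    includes recollected queries from the two preceding stages."""
    outputs = [1]  # position 0: the initial queries q_0
    for s in range(1, num_stages + 1):
        if s >= start:
            prev2 = outputs[s - 2] if s >= 2 else 0
            outputs.append(outputs[s - 1] + prev2)
        else:
            outputs.append(outputs[s - 1])  # no recollection yet
    return outputs[1:]

print(sqr_counts(6, start=3))  # → [1, 1, 2, 3, 5, 8]
print(sqr_counts(6, start=5))  # → [1, 1, 1, 1, 2, 3]
```

Later starting stages truncate the Fibonacci growth, which matches the ablation's observation that cost drops as recollection starts later.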
Experiments
Experimental results
Comparison with SOTA, as shown in Table 8:
On DAB-DETR, SQR improves AP by +2.3 and +2.6 with R50 and Swin-B backbones, respectively.
On Deformable-DETR, SQR improves AP by +2.7 at 12 epochs and +1.4 at 50 epochs.
On AdaMixer with R50, SQR achieves +1.9 AP in the basic setting (100 queries, 12 epochs).
With an extra decoding stage added, the gap between with and without SQR widens to +2.8 AP.
Ablation studies
Comparison of the baseline with DQR and SQR: Table 4 shows that both DQR and SQR significantly improve the baseline. DQR reaches 44.2 AP (+1.7), while SQR reaches a slightly higher 44.4 AP (+1.9). Notably, SQR is far more efficient than DQR: Table 5 shows that under the same training settings, SQR greatly reduces training time while matching or exceeding DQR's AP.
Changing the starting stage of SQR: Table 5 also reports SQR's performance as the starting stage varies. Starting recollection from stage 1 gives the best performance but the highest computational cost; starting from stage 2 performs similarly with a modest reduction in cost. As recollection starts later, the benefit of SQR diminishes, as expected, since fewer queries are recollected from early stages and the training emphasis gradually evens out.
Table 6 verifies that applying SQR reduces both the TP decay rate and the FP intensification rate, confirming that the gain comes from the improved training rather than from inference-time changes.
Conclusion
In this work, we study the phenomenon that the best detection results of query-based object detectors do not always come from the last decoding stage, but can sometimes come from intermediate decoding stages. We identify two limitations that cause this problem: lack of training emphasis, and cascading errors from the query sequence. We address it with Selective Query Recollection (SQR), a simple yet effective training strategy. SQR significantly improves the performance of AdaMixer, DAB-DETR, and Deformable-DETR under various training settings.