ICCV 2023 Paper Interpretation: Random Boxes Are Open-world Object Detectors


1. Paper information

Title: Random Boxes Are Open-world Object Detectors (ICCV 2023)

2. Introduction


Object detection is one of the basic tasks of computer vision: it aims to locate and classify the objects in an image. Mainstream object detectors rely on the closed-world assumption, i.e., that every category to be detected is labeled in the training set. In practice, however, a detection system faces an open and dynamic visual world in which objects of unknown types may appear at any time. Building open-world object detectors that keep detecting the known categories while also flagging objects of unseen categories as "unknown" has therefore become an important research topic.

Although existing open-world object detection methods can detect both known and unknown objects, they perform poorly on the unknown ones. The reason is that they generate detection boxes from training data labeled only for a limited set of known categories, so the boxes are biased toward known objects and rarely cover the locations of unknown ones. Unknown objects are therefore often classified as background, and their recall is low: on the Pascal VOC and MS-COCO datasets, existing methods usually recall only about 5% of unknown objects, with most of the rest confused with background. This is a significant weakness of current open-world detectors.

The core motivation of RandBox is to remove the known-category bias in the training data by using random boxes, which lets the detector explore more regions that may contain unknown objects. Instead of generating proposals with a mechanism trained on known-category data, RandBox samples a fresh, independent set of random proposal boxes in every training iteration. Because these boxes depend neither on the training data nor on the known categories, they are not biased toward known objects. Compared with existing methods, RandBox has two distinctive features:

1. Random boxes remove the known-category bias. Training on them resembles a randomized controlled trial, which helps the model learn the causal relationship between an object and its label instead of overfitting the limited known categories.

2. A new matching mechanism does not wrongly penalize boxes that contain unknown objects and better estimates how likely each box is to contain a foreground object.

Why does this address the shortcomings described above? First, random boxes are not biased toward the known categories: their positions are spread over all possible object locations rather than following the distribution of known objects, which greatly increases the chance of covering unknown instances. Second, the new matching mechanism estimates more accurately how likely each box is to contain an unknown object, so unknown-object boxes are no longer classified as background as they are in existing methods. Together, these make the identification of unknown instances more reliable.

RandBox is an original contribution to this field. Starting from an analysis of the problems in existing methods, the authors take a new perspective: making the detector more robust to unknown categories by removing the bias in the training data. From this perspective they design the random boxes and the matching mechanism used during training. Because the detection pipeline does not depend on the limited known-category data, RandBox covers unknown instances more fully and estimates more accurately how likely each box is to contain an unknown object. Extensive experiments on multiple datasets confirm its effectiveness, and it achieves state-of-the-art open-world detection performance. Beyond advancing this task, the results suggest that removing bias lets open-world detection improve unknown-category detection and known-category generalization at the same time. The paper therefore offers a significant new idea and paradigm for building vision systems that are more intelligent and robust to unknown objects, and supports further work on handling complex, dynamic environments.

3. Method


3.1 Concept review

We first review some basic concepts. Existing open-world object detection methods build on two detector frameworks: the two-stage Faster R-CNN and the end-to-end DETR. The main difference between them is how proposals are generated. Faster R-CNN uses a region proposal network (RPN) trained on data annotated only with the known categories, whereas DETR uses a transformer decoder to produce proposals directly from image features. Current methods split the detector's predictions into three groups: Known-FG, Unknown-FG and BG. Known-FG is selected by a matching score between the proposals and the ground truth: Faster R-CNN uses IoU, while DETR uses bipartite matching that also takes the class probabilities into account. Unknown-FG consists of the highest-scoring unmatched proposals, and BG contains the remaining unmatched proposals. The training objective of existing methods combines a classification cross-entropy loss and a bounding-box regression loss on Known-FG with a classification cross-entropy loss on Unknown-FG and BG.
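The three-way split can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `partition_predictions`, the per-prediction `scores`, and the `num_unknown` cutoff are hypothetical, and the matching that produces the Known-FG indices is assumed to have been done already.

```python
from typing import Dict, List, Sequence, Set


def partition_predictions(
    scores: Sequence[float], matched: Set[int], num_unknown: int
) -> Dict[str, List[int]]:
    """Split prediction indices into Known-FG, Unknown-FG and BG.

    scores: hypothetical foreground score for each prediction
    matched: indices already matched to a ground-truth box (Known-FG)
    num_unknown: how many top-scoring unmatched predictions become Unknown-FG
    """
    known_fg = sorted(matched)
    unmatched = [i for i in range(len(scores)) if i not in matched]
    # Highest-scoring unmatched predictions are pseudo-labeled as unknown objects.
    unmatched.sort(key=lambda i: scores[i], reverse=True)
    unknown_fg = sorted(unmatched[:num_unknown])
    background = sorted(unmatched[num_unknown:])
    return {"known_fg": known_fg, "unknown_fg": unknown_fg, "bg": background}


if __name__ == "__main__":
    print(partition_predictions([0.9, 0.2, 0.7, 0.4, 0.1], matched={0}, num_unknown=1))
    # {'known_fg': [0], 'unknown_fg': [2], 'bg': [1, 3, 4]}
```

What differs between methods is mainly how `scores` is computed; the split itself is just a sort over the unmatched predictions.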

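Known-FG consists of the predictions matched to ground-truth objects. Formula (1) of the paper gives the Known-FG loss as the sum of a classification loss and a regression loss:

$$\mathcal{L}_{\text{known}} = \sum_{i \in \text{Known-FG}} \left[ \mathcal{L}_{\text{cls}}(\hat{p}_i, c_i) + \lambda \, \mathcal{L}_{\text{reg}}(\hat{b}_i, b_i) \right] \quad (1)$$

Here $\mathcal{L}_{\text{cls}}$ is usually a cross-entropy or focal loss, $\lambda$ is a balancing weight, and $\mathcal{L}_{\text{reg}}$ is a smooth L1 loss. $c_i$ and $b_i$ are the class and bounding box of the ground truth, and $\hat{p}_i$ and $\hat{b}_i$ are the predicted class probabilities and bounding box.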
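For concreteness, here is a minimal per-prediction version of the Known-FG loss in plain Python, using softmax cross-entropy as the classification loss and smooth L1 as the regression loss. The helper names and the defaults `lam=1.0` and `beta=1.0` are illustrative assumptions, not values from the paper; a real detector computes these losses in batch inside a deep-learning framework.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one prediction, via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Smooth L1 loss summed over the box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def known_fg_loss(
    logits: Sequence[float],
    gt_class: int,
    pred_box: Sequence[float],
    gt_box: Sequence[float],
    lam: float = 1.0,  # hypothetical balancing weight
) -> float:
    """Classification loss plus weighted regression loss for one Known-FG prediction."""
    return cross_entropy(logits, gt_class) + lam * smooth_l1(pred_box, gt_box)


if __name__ == "__main__":
    # Two equal logits give log(2) for the classification term; identical boxes give 0.
    print(known_fg_loss([0.0, 0.0], 0, [0, 0, 10, 10], [0, 0, 10, 10]))
```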

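Unknown-FG is the set of highest-scoring unmatched predictions, and BG is the remaining unmatched predictions. Since neither has a ground-truth bounding box, only the classification loss is computed, with the target class set to "unknown" for Unknown-FG and to background for BG:

$$\mathcal{L}_{\text{unk}} = \sum_{i \in \text{Unknown-FG}} \mathcal{L}_{\text{cls}}(\hat{p}_i, c_{\text{unk}}), \qquad \mathcal{L}_{\text{bg}} = \sum_{i \in \text{BG}} \mathcal{L}_{\text{cls}}(\hat{p}_i, c_{\text{bg}})$$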
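A sketch of this classification-only loss follows. It assumes a hypothetical class layout of C known classes, followed by an "unknown" class at index C and a background class at index C+1; the actual ordering depends on the implementation, and the function name `unmatched_cls_loss` is illustrative.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one prediction, via a stable log-sum-exp."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[target]


def unmatched_cls_loss(
    unknown_fg_logits: List[Sequence[float]],
    bg_logits: List[Sequence[float]],
    num_known: int,
) -> float:
    """Classification-only loss for unmatched predictions (no box regression term).

    Assumed class layout: [known_0, ..., known_{C-1}, unknown, background].
    """
    unknown_idx, bg_idx = num_known, num_known + 1
    loss = sum(cross_entropy(z, unknown_idx) for z in unknown_fg_logits)
    loss += sum(cross_entropy(z, bg_idx) for z in bg_logits)
    return loss
```

With two known classes, `unmatched_cls_loss([[0, 0, 3, 0]], [[0, 0, 0, 3]], num_known=2)` is small because each prediction already favors its target class.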

3.2 RandBox

Detector with Random Proposals

During training, RandBox generates 500 random boxes per image as detection proposals. Each box's four coordinates (center x, center y, width and height) are sampled from a standard normal distribution, then truncated and scaled to the range [0, 1]. At test time, randomness is removed and a predefined set of 10,000 boxes is used.
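The sampling step above can be sketched with Python's standard `random` module. This is a minimal sketch under assumptions: the truncation threshold `clip=2.0`, the linear mapping to [0, 1], and clamping the corners to the image are illustrative choices, and fixing a seed is only one hypothetical way to obtain a predefined test-time set; the paper's exact procedure may differ.

```python
import random
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def random_proposals(
    num_boxes: int = 500,
    img_w: float = 1.0,
    img_h: float = 1.0,
    clip: float = 2.0,  # hypothetical truncation threshold
    seed: Optional[int] = None,
) -> List[Box]:
    """Sample proposal boxes independently of the image and its labels.

    Each of (cx, cy, w, h) is drawn from N(0, 1), truncated to [-clip, clip]
    and mapped linearly to [0, 1]; boxes are returned as pixel corners.
    """
    rng = random.Random(seed)

    def unit_sample() -> float:
        z = max(-clip, min(clip, rng.gauss(0.0, 1.0)))  # truncate
        return (z + clip) / (2 * clip)  # scale to [0, 1]

    boxes: List[Box] = []
    for _ in range(num_boxes):
        cx, cy, w, h = unit_sample(), unit_sample(), unit_sample(), unit_sample()
        # Convert center/size to corners, clamped to the image.
        x1 = max(0.0, cx - w / 2) * img_w
        y1 = max(0.0, cy - h / 2) * img_h
        x2 = min(1.0, cx + w / 2) * img_w
        y2 = min(1.0, cy + h / 2) * img_h
        boxes.append((x1, y1, x2, y2))
    return boxes


if __name__ == "__main__":
    train_boxes = random_proposals(500, img_w=640, img_h=480)  # new boxes on every call
    test_boxes = random_proposals(10_000, img_w=640, img_h=480, seed=0)  # fixed set
    print(len(train_boxes), len(test_boxes))
```

Calling the function without a seed on every training iteration mirrors the "independent boxes per iteration" idea; nothing in it looks at the image content or the labels.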

Known-FG

Known-FG is obtained with a dynamic-k matcher: each ground-truth box is matched to its top-k proposals, where k is chosen dynamically for each ground truth based on the IoUs of the proposals that overlap it most.
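A simplified dynamic-k matcher is sketched below. It is an assumption-laden illustration, not the paper's implementation: the matching cost here is IoU only (practical matchers usually combine classification and box costs), and the rule k = max(1, int(sum of the top-q IoUs)) with a hypothetical `q=10` is borrowed from common dynamic-k assignment schemes.

```python
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def dynamic_k_match(
    gts: Sequence[Box], proposals: Sequence[Box], q: int = 10
) -> Dict[int, int]:
    """Return {proposal_index: gt_index} using an IoU-only dynamic-k rule.

    For each ground truth, k = max(1, int(sum of its top-q IoUs)) and its k
    highest-IoU proposals are assigned to it. A proposal claimed by several
    ground truths keeps the one it overlaps most.
    """
    best: Dict[int, Tuple[float, int]] = {}
    for g, gt in enumerate(gts):
        ious = sorted(((iou(gt, p), j) for j, p in enumerate(proposals)), reverse=True)
        k = max(1, int(sum(v for v, _ in ious[:q])))
        for v, j in ious[:k]:
            if j not in best or v > best[j][0]:
                best[j] = (v, g)
    return {j: g for j, (_, g) in best.items()}


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    props = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30), (20, 20, 30, 29)]
    print(dynamic_k_match(gts, props))
```

Ground truths that overlap many proposals well get a larger k, so well-covered objects receive more positive proposals than poorly covered ones.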

Unknown-FG

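Unknown-FG selects the top-k unmatched proposals with the highest scores. The key contribution is a new way of computing this matching score:

$$s_i = \sum_{c=1}^{C+1} \sigma\left(z_{i,c}\right)$$

where $z_{i,c}$ is the classification logit of proposal $i$ for category $c$ and $\sigma$ is the sigmoid function. That is, the sigmoid is applied to the prediction for each category (the $C$ known categories plus "unknown") and the results are summed. This score estimates how likely a proposal is to contain a foreground object, so boxes covering unknown objects are not misclassified as BG.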
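The summed-sigmoid score and the top-k selection can be sketched as follows. The function names and the logit layout (known classes followed by "unknown", no background column) are assumptions for illustration; the sigmoid is written in a numerically stable form so that very negative logits do not overflow.

```python
import math
from typing import List, Sequence, Set


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def foreground_score(logits: Sequence[float]) -> float:
    """Sum of per-class sigmoids over the known classes plus 'unknown'."""
    return sum(sigmoid(z) for z in logits)


def select_unknown_fg(
    all_logits: List[Sequence[float]], matched: Set[int], k: int
) -> List[int]:
    """Pick the k unmatched proposals with the highest foreground score."""
    unmatched = [i for i in range(len(all_logits)) if i not in matched]
    unmatched.sort(key=lambda i: foreground_score(all_logits[i]), reverse=True)
    return unmatched[:k]


if __name__ == "__main__":
    # Layout per proposal: [known_0, known_1, unknown]; proposal 0 is already matched.
    logits = [[3.0, -5.0, -5.0], [-5.0, -5.0, 4.0], [-5.0, -5.0, -5.0], [1.0, -5.0, -5.0]]
    print(select_unknown_fg(logits, matched={0}, k=2))
```

Because the "unknown" logit contributes to the sum, a proposal that looks like neither known class but does look like an object (proposal 1 above) still ranks high instead of being pushed toward BG.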

4. Experiment


Analyzing the experimental results, we can draw the following conclusions:

1. When new categories are added (Tasks 2-4), the Known-mAP of the conventional Faster R-CNN and DETR detectors drops sharply, indicating catastrophic forgetting. Data replay substantially improves Known-mAP.

2. Compared with the conventional detectors, all existing open-world detection methods have lower Known-mAP, which means they gain the ability to detect unknown objects at the cost of known-category accuracy.

3. On Unknown Recall and Absolute Open-Set Error (A-OSE), RandBox clearly outperforms existing open-world detection methods, showing that it discovers and learns more unknown instances and improves the unknown-category metrics.

4. Most importantly, RandBox achieves not only the best Unknown Recall but also the best Known-mAP, even surpassing the conventional detectors. This indicates that by removing the training-data bias with random boxes, and learning to infer the category of whatever object a box contains, RandBox avoids overfitting to the known categories.
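The two unknown-category metrics can be sketched as simple counts. This is a simplified illustration under an assumption: each ground-truth unknown object has already been paired (for example at IoU ≥ 0.5) with at most one detection upstream, and `matched_labels` holds that detection's label or `None` if the object was missed. Unknown Recall is the fraction labeled "unknown"; A-OSE counts unknown objects that were assigned a known class. Real evaluation code handles score thresholds and multiple overlapping detections, which this sketch ignores.

```python
from typing import List, Optional, Tuple

UNKNOWN = "unknown"


def unknown_metrics(matched_labels: List[Optional[str]]) -> Tuple[float, int]:
    """Return (unknown_recall, a_ose) for the ground-truth unknown objects.

    matched_labels[i] is the label of the detection paired with the i-th
    unknown object (pairing assumed to be done upstream), or None if missed.
    """
    total = len(matched_labels)
    hits = sum(1 for lab in matched_labels if lab == UNKNOWN)
    # Open-set errors: unknown objects detected but given a known-class label.
    a_ose = sum(1 for lab in matched_labels if lab is not None and lab != UNKNOWN)
    recall = hits / total if total else 0.0
    return recall, a_ose


if __name__ == "__main__":
    print(unknown_metrics(["unknown", None, "dog", "unknown"]))  # (0.5, 1)
```

Higher Unknown Recall together with lower A-OSE is the combination the comparison above refers to.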

The paper runs detailed ablation experiments to verify RandBox's design. The results show that its gains come from the combination of its two components, the random boxes and the new matching mechanism. Using either component alone improves the metrics only partially, while combining them yields a strong synergistic effect with large gains on every metric, so both the region exploration provided by random boxes and the new matching mechanism are needed to recognize unknown objects accurately. For random box generation, using more boxes works better, and sampling from a Gaussian or a uniform distribution gives similar results. At inference, increasing the number of boxes further raises the recall of both known and unknown categories. Overall, the ablations confirm that each design choice plays a key role and that the components reinforce each other, which is why RandBox improves substantially over existing methods.

5. Discussion


RandBox has several notable strengths. First, its randomly generated proposals remove the known-category bias in the training data and, through randomness, explore a wider sample space, covering potential unknown-object regions more fully. Second, its new matching mechanism estimates the probability that a proposal contains an unknown object, so unknown-object boxes are not misclassified as background. Third, it substantially raises the recall of unknown categories while reducing overfitting to the limited known categories, improving both classification and localization metrics on open-world detection. Finally, the method is elegantly designed and practical to train and deploy, and the paper supports it with extensive experiments. In short, RandBox is an efficient and practical open-world detection technique and a valuable reference for building more intelligent and robust vision systems.

Although RandBox makes significant progress on open-world object detection, it is a preliminary exploration and still has limitations. Its random box generation strategy is relatively simple and may not fully adapt to the diverse distribution of unknown objects. The matching mechanism also relies on a feature-transfer assumption that may not always hold. In terms of computational cost, generating random boxes in every training iteration increases the amount of computation. In terms of comparisons, the paper lacks evaluation against more transformer-based detectors. In terms of scalability, the stability of RandBox as the number of unknown categories grows remains to be verified. Finally, the theoretical analysis of random boxes needs further refinement.

6. Conclusion

This paper proposes RandBox for open-world object detection. Its key innovations are using randomly generated boxes as detection proposals and designing a new matching mechanism. The random boxes remove the known-category bias of the training data, and together with the new matching mechanism they estimate more accurately the probability that a proposal contains an unknown object. On the Pascal VOC and MS-COCO datasets, RandBox achieves state-of-the-art open-world detection performance and significantly improves detection of both known and unknown categories. Ablation experiments further confirm that each of its design choices contributes. Overall, RandBox provides an effective paradigm for building open-world detectors that are more robust to unknown objects. Future work could extend RandBox to more categories and explore more advanced random box generation strategies.


Source: blog.csdn.net/limingmin2020/article/details/132334810