[Transferable object detection] A universal object detection algorithm

I would like to introduce a very interesting new CVPR 2023 paper published today. A traditional object detection algorithm can detect only the categories that were annotated in its training data. This paper instead tackles universal object detection.

By aligning images and text during training, the detector can automatically extend to categories that never appear in the visual annotations.
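To make the idea concrete, here is a minimal sketch of classifying a detected region by comparing its feature with text embeddings of category names. Everything in it is an assumption for illustration: the function names, the temperature, and the hand-written 3-dimensional "embeddings" that stand in for a real image encoder and text encoder. It is not the paper's implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def classify_regions(region_feats, text_embeds, temperature=0.1):
    """Score each region against each category-name embedding.

    region_feats: (num_regions, dim) visual features, one per box proposal.
    text_embeds:  (num_classes, dim) embeddings of the category names.
    Returns sigmoid scores of shape (num_regions, num_classes).
    """
    sims = l2_normalize(region_feats) @ l2_normalize(text_embeds).T
    return 1.0 / (1.0 + np.exp(-sims / temperature))

# Toy embeddings standing in for a real text encoder (hypothetical values).
vocab = {
    "cat": np.array([1.0, 0.0, 0.0]),
    "dog": np.array([0.0, 1.0, 0.0]),
}
region = np.array([[0.1, 0.1, 0.9]])  # a region that matches neither class well

names = list(vocab)
scores = classify_regions(region, np.stack([vocab[n] for n in names]))

# Extending the vocabulary only needs a new text embedding, not new box labels.
vocab["zebra"] = np.array([0.0, 0.0, 1.0])
names = list(vocab)
scores_ext = classify_regions(region, np.stack([vocab[n] for n in names]))
best = names[int(np.argmax(scores_ext[0]))]
print(best)
```

The point of the sketch is that the classifier has no fixed output layer: the set of detectable categories is whatever list of names is embedded at inference time.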

This should make it much easier to transfer a vision system's detection capability to new settings, and it looks like a very promising technical direction.

Paper information:

[image]

Most of the authors of this paper are scholars based in China.

Traditional object detection algorithms are constrained by laborious manual annotation. When new categories appear in the open world, they often have to "start from scratch": even if only one category is added, the whole cycle of labeling, training, and deployment must be repeated. This limits their versatility and is obviously not "scientific".

The authors propose UniDetector, which aims to give an object detector the ability to recognize a large number of categories in the open world.

Its key points:

1) Because the image and text spaces are aligned, it can train on images from multiple sources with heterogeneous label spaces, which provides enough information to learn universal representations.

2) Thanks to the rich information in both the visual and language modalities, it generalizes easily to the open world while keeping a balance between seen and unseen categories.

3) To handle the new challenges this training setup raises, the authors also propose a decoupled training scheme and probability calibration, which further improve generalization to novel categories.
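Point 3 is easier to picture with numbers. The post does not give the formula, so the sketch below assumes one common form of prior-based calibration: divide each class score by a power of how often that class tends to be predicted, then combine the result with a class-agnostic objectness score. The function names, the exponents `gamma`, `alpha`, `beta`, and the toy values are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def calibrate(probs, prior, gamma=0.6, eps=1e-8):
    """Divide class scores by a power of the class prior.

    probs: (num_boxes, num_classes) raw per-class scores in [0, 1].
    prior: (num_classes,) estimated frequency of each class among predictions.
    gamma: calibration strength; 0 disables calibration (assumed value).
    The output is used for ranking only and may exceed 1.
    """
    return probs / np.power(prior + eps, gamma)

def fuse_with_objectness(class_scores, objectness, alpha=0.3, beta=0.7):
    """Weighted geometric combination of class score and objectness."""
    return np.power(class_scores, alpha) * np.power(objectness[:, None], beta)

# Toy case: class 0 is a seen class the model over-predicts, class 1 is novel.
raw = np.array([[0.60, 0.50]])
prior = np.array([0.90, 0.10])  # hypothetical prior, e.g. from prediction counts
objectness = np.array([0.8])    # class-agnostic "is this an object" score

calibrated = calibrate(raw, prior)
final = fuse_with_objectness(calibrated, objectness)

print(raw.argmax(axis=1), calibrated.argmax(axis=1))
```

In this toy case the seen class wins on the raw scores, while the novel class wins after calibration, which is the bias toward seen categories that calibration is meant to counteract.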

In the paper, UniDetector is trained on only about 500 categories, yet it can detect more than 7,000! This does not mean it is capped at 7,000 categories; rather, existing public datasets only allow detection to be evaluated on up to about 7,000 categories!

(So it is the available benchmarks, not UniDetector itself, that set the limit~)

Illustration of the UniDetector algorithm:

[image]

UniDetector algorithm flow:

[image]

Heterogeneous label spaces during training:

[image]
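One way to read "heterogeneous label spaces" is that every source dataset keeps its own category list, and a training sample is only scored against the categories of the dataset it came from. The sketch below assumes that reading. The dataset names, category lists, random vectors standing in for text embeddings, and the helper `bce_over_label_space` are all hypothetical.

```python
import numpy as np

def bce_over_label_space(region_feat, label_embeds, target_index, temperature=0.1):
    """Sigmoid cross-entropy against one dataset's own categories only."""
    logits = label_embeds @ region_feat / temperature
    targets = np.zeros(len(label_embeds))
    targets[target_index] = 1.0
    # Numerically stable binary cross-entropy with logits.
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())

def unit(v):
    """Normalize a vector to unit length."""
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
dim = 8

# Hypothetical label spaces: they overlap on "car" but are otherwise different.
label_spaces = {
    "dataset_a": ["person", "car"],
    "dataset_b": ["car", "traffic light", "bench"],
}
all_names = sorted({n for names in label_spaces.values() for n in names})
embeds = {n: unit(rng.normal(size=dim)) for n in all_names}  # stand-in text embeddings

# Each sample: (source dataset, region feature, ground-truth category name).
batch = [
    ("dataset_a", embeds["person"], "person"),
    ("dataset_b", embeds["bench"], "bench"),
]

losses = []
for source, feat, label in batch:
    names = label_spaces[source]
    label_embeds = np.stack([embeds[n] for n in names])
    losses.append(bce_over_label_space(feat, label_embeds, names.index(label)))

print(losses)
```

Because a "dataset_a" sample is never scored against "bench" or "traffic light", categories that dataset_a simply did not annotate are not pushed down as negatives, and the shared text embedding for "car" ties the two label spaces together.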
Experiments show that on LVIS, ImageNetBoxes, and VisualGenome, object detection datasets with a large number of categories, UniDetector has strong zero-shot generalization ability (that is, no images from these datasets are used in training), surpassing traditional supervised baselines by more than 4% on average! On another 13 object detection datasets covering different scenes, UniDetector achieves state-of-the-art performance using only 3% of the training data!
Detection performance on open-world datasets:

[image]

Performance on the COCO dataset:

[image]

Detection performance on 13 open-world datasets in the zero-shot setting:

[image]

Comparison with other open-vocabulary object detection methods on the COCO dataset:

[image]

Comparison with other open-vocabulary object detection methods on the LVIS dataset:

[image]

Origin blog.csdn.net/weixin_42468475/article/details/129725192