CutLER: A Method for Unsupervised Object Detection and Instance Segmentation

This article is shared from the Huawei Cloud Community post "CutLER: A Method for Unsupervised Object Detection and Instance Segmentation", author: Hint.

Object detection is an important task in computer vision: it lets AI systems perceive, reason about, and understand the objects in a scene. Training a localization model normally requires specialized annotations such as object boxes, masks, or localized points. This work [1] studies unsupervised object detection and instance segmentation, with no human annotations at all. First, it proposes MaskCut, a method that automatically generates initial coarse masks. Second, it proposes a simple loss function that helps the detector find objects MaskCut missed. Finally, because the detector's predicted masks turn out to be finer than the initial masks, the detector can be retrained iteratively for further improvement.

The overall framework of the method is shown in the figure above. The model first uses self-supervised DINO [2] features to generate a binary mask. Plain Normalized Cuts can produce a mask for only a single foreground object; MaskCut removes this limitation by applying Normalized Cuts repeatedly and masking out the foreground found so far at each step (see formulas 2 and 3 for details). The result is a set of coarse masks covering multiple foreground objects in one image.
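The idea of cutting out one foreground object at a time can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `ncut_foreground` and `maskcut_sketch` are hypothetical, the affinity is assumed to be plain cosine similarity between patch features (clipped to a small positive value), the foreground side of each cut is assumed to be the one containing the patch with the largest absolute eigenvector value, and found patches are simply dropped from the graph rather than handled as in formulas 2 and 3.

```python
import numpy as np

def ncut_foreground(W):
    """One Normalized Cuts bipartition of a symmetric affinity matrix W.

    Returns a boolean vector marking the side assumed to be foreground.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian; its eigenvectors y give the solutions
    # x = D^{-1/2} y of the generalized problem (D - W) x = lambda * D x.
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)      # eigenvalues in ascending order
    x = d_inv_sqrt * vecs[:, 1]          # second-smallest eigenvector
    fg = x > x.mean()                    # bipartition at the mean value
    # Assumed heuristic: foreground is the side holding the strongest response.
    if not fg[np.argmax(np.abs(x))]:
        fg = ~fg
    return fg

def maskcut_sketch(feats, num_masks=3, eps=1e-5):
    """Repeatedly cut a foreground group out of the patch graph.

    feats: (N, d) array of patch features (e.g. from a self-supervised ViT).
    Returns a list of boolean masks over the N patches.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    W = np.maximum(f @ f.T, eps)         # assumed affinity: clipped cosine similarity
    remaining = np.arange(len(f))        # patches not yet assigned to a mask
    masks = []
    for _ in range(num_masks):
        if len(remaining) < 2:
            break
        fg_local = ncut_foreground(W[np.ix_(remaining, remaining)])
        mask = np.zeros(len(f), dtype=bool)
        mask[remaining[fg_local]] = True
        masks.append(mask)
        remaining = remaining[~fg_local]  # exclude found patches from later cuts
    return masks
```

In a real pipeline, each patch-level mask would still have to be reshaped to the patch grid and upsampled to image resolution; that step is omitted here.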

However, the standard detection loss penalizes predicted regions that do not overlap with the "ground truth", which here consists only of the coarse masks. This discourages the detector from discovering new objects, so the authors propose a new loss function: a prediction contributes to the loss only when its IoU with a coarse mask exceeds a certain threshold. Finally, the authors adopt multiple rounds of iterative training to further improve the model's performance.
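The thresholding rule described above can be sketched as follows. This is only an illustration under stated assumptions: the names `box_iou` and `drop_loss_sketch` are hypothetical, boxes in `(x1, y1, x2, y2)` format stand in for masks to keep the IoU computation short, the per-prediction losses are assumed to be computed elsewhere, and the default threshold value is a placeholder rather than a value given in this article.

```python
import numpy as np

def box_iou(pred, gt):
    """Pairwise IoU between pred (P, 4) and gt (G, 4) boxes in (x1, y1, x2, y2)."""
    lt = np.maximum(pred[:, None, :2], gt[None, :, :2])   # intersection top-left
    rb = np.minimum(pred[:, None, 2:], gt[None, :, 2:])   # intersection bottom-right
    inter = np.clip(rb - lt, 0, None).prod(axis=2)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    return inter / (area_p[:, None] + area_g[None, :] - inter)

def drop_loss_sketch(per_pred_loss, pred_boxes, coarse_boxes, iou_thresh=0.01):
    """Sum the loss only over predictions that overlap some coarse region.

    per_pred_loss: (P,) loss of each prediction, computed elsewhere.
    iou_thresh: placeholder threshold (an assumption, not taken from the article).
    Returns the total loss and the boolean vector of predictions that were kept.
    """
    max_iou = box_iou(pred_boxes, coarse_boxes).max(axis=1)
    keep = max_iou > iou_thresh          # low-overlap predictions are not penalized
    return float((per_pred_loss * keep).sum()), keep
```

Predictions that overlap no coarse region are ignored rather than treated as false positives, so the detector is not pushed away from objects the coarse masks missed.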

The authors run experiments on multiple datasets. According to the experimental results, the method achieves state-of-the-art performance on the zero-shot task and improves significantly over other unsupervised methods. The authors also demonstrate the effectiveness of each of the proposed innovations.

Visualization:

[1] Wang X, Girdhar R, Yu S X, et al. Cut and learn for unsupervised object detection and instance segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 3124-3134.

[2] Caron M, Touvron H, Misra I, et al. Emerging properties in self-supervised vision transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 9650-9660.

