CVPR 2022 | Peking University & ByteDance Propose DA-WSOL: A New Framework for Weakly Supervised Object Localization


Reprinted from: Heart of the Machine | Author: Zhu Lei

By treating weakly supervised object localization as a domain adaptation task between the image-level and pixel-level feature domains, Peking University and ByteDance propose a new framework that significantly improves object localization trained only with image-level labels.

As a fundamental problem in computer vision, object localization provides important target location information for scene understanding, autonomous driving, intelligent diagnosis and treatment, and other fields. However, training an object localization model typically relies on dense annotations such as object bounding boxes or object masks. Acquiring these dense labels requires judging the class of every pixel in the image, which greatly increases the time and labor needed for annotation.

To reduce the annotation burden, Weakly Supervised Object Localization (WSOL) trains object localization models using image-level labels (such as image categories) as the supervision signal, removing the need for pixel-level annotations during training. Most of these methods follow the class activation map (CAM) pipeline: they train a classifier on image-level features and then apply it to pixel-level features to obtain the localization result. However, image-level features usually retain sufficient object information, so identifying only the most discriminative object features is enough to classify the image correctly. As a result, when the classifier is applied to pixel-level features, which carry much less object information, the resulting localization map often covers only part of the object rather than the entire object.
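To make the CAM pipeline above concrete, here is a minimal PyTorch sketch of the idea, with assumed tensor shapes and function names (it is not the authors' code): the classifier is trained on globally pooled image-level features and then reused on every pixel-level feature to produce the localization heatmap.

```python
# Minimal CAM-style localization sketch; shapes and names are assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def cam_localization(pixel_features: torch.Tensor, classifier_weight: torch.Tensor, class_idx: int):
    """
    pixel_features:    (B, C, H, W) backbone feature map (pixel-level / target domain).
    classifier_weight: (num_classes, C) weights of a classifier trained on globally pooled
                       features (image-level / source domain).
    Returns image-level logits and a (B, H, W) localization heatmap for `class_idx`.
    """
    # Image-level feature = spatial aggregation of pixel-level features (global average pooling).
    image_features = pixel_features.mean(dim=(2, 3))          # (B, C)
    logits = image_features @ classifier_weight.t()           # (B, num_classes), supervised by image labels

    # Localization: apply the same classifier to every pixel-level feature.
    cam = torch.einsum("bchw,c->bhw", pixel_features, classifier_weight[class_idx])
    cam = F.relu(cam)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-6)   # normalize to [0, 1]
    return logits, cam
```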

To address this problem, this paper treats CAM-based weakly supervised object localization as a special domain adaptation task: the goal is to ensure that a classifier trained on the source image-level feature domain retains good classification performance when applied to the target pixel-level feature domain, so that it localizes objects better at test time. From this perspective, domain adaptation methods can be naturally transferred to the weakly supervised object localization task, allowing a model trained only with image labels to localize the target object more accurately.


  • Weakly Supervised Object Localization as Domain Adaption

  • Article address: https://arxiv.org/abs/2203.01714

  • Project address: https://github.com/zh460045050/DA-WSOL_CVPR2022

This research has been accepted to CVPR 2022, and the complete training code and models are open source. The work was mainly carried out by Zhu Lei of the Peking University Molecular Imaging/Medical Intelligence Laboratory and She Qi of ByteDance, under the guidance of Lu Yanye of the Peking University Molecular Imaging/Medical Intelligence Laboratory.

Method


Figure 1 - The overall idea of the method

Weakly supervised object localization can be viewed as fully supervised training of a model e(∙) on the image feature domain (source domain S) using image-level labels (source-domain ground-truth labels Y^s), followed by applying this model to the pixel feature domain (target domain T) to obtain object localization heatmaps. Our method introduces domain adaptation into this process to narrow the gap between the feature distributions of the source domain S and the target domain T, thereby improving the classification performance of e(∙) on the target domain T. The loss function can thus be expressed as:

L = L_c(e(S), Y^s) + L_a(S, T)

where L_c is the source domain classification loss and L_a is the domain adaptation loss.
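As a minimal illustration of how the two terms are combined during training (lambda_a is an assumed weighting hyperparameter, and l_a stands for the domain adaptation term described below):

```python
import torch
import torch.nn.functional as F

def wsol_objective(source_logits: torch.Tensor, image_labels: torch.Tensor,
                   l_a: torch.Tensor, lambda_a: float = 1.0) -> torch.Tensor:
    """Sketch of the overall objective: L = L_c + lambda_a * L_a."""
    l_c = F.cross_entropy(source_logits, image_labels)  # L_c: source-domain (image-level) classification
    return l_c + lambda_a * l_a                          # L_a: domain adaptation loss, e.g. the DAL loss
```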

Since the source and target domains in weakly supervised localization are the image domain and the pixel domain, respectively, the domain adaptation task we face has some unique properties: (1) the numbers of target-domain and source-domain samples are unbalanced (the target domain has N times as many samples as the source domain, where N is the number of pixels in an image); (2) the target domain contains samples whose labels do not exist in the source domain (background pixels belong to no object category); (3) the target-domain samples are related to the source-domain samples (image features are obtained by aggregating pixel features). To better account for these three properties, we further propose a Domain Adaptive Localization (DAL) loss as L_a(S, T) to narrow the feature distributions of the image domain S and the pixel domain T.


Figure 2 - Division of the source and target domains in weakly supervised localization and the roles of the loss terms

First, as shown in Figure 2-A, we further divide the target-domain samples T into three subsets: (1) the "pseudo source domain sample set T^f", target-domain samples whose feature distribution is similar to the source domain; (2) the "unknown class sample set T^u", target-domain samples whose categories do not exist in the source domain; (3) the "true target domain sample set T^t", the remaining samples. Based on these three subsets, our proposed domain-adaptive localization loss can be expressed as:

L_a(S, T) = L_d(S ∪ T^f, T^t) + L_u(T^u)

As shown in the formula above, the domain-adaptive localization loss treats the pseudo-source-domain samples as a complement to the source-domain samples rather than as target-domain samples, which alleviates the sample-imbalance problem. Meanwhile, to reduce the interference of the samples T^u, whose categories are unknown in the source domain, on classification accuracy, we apply a conventional adaptation loss L_d (such as maximum mean discrepancy, MMD) only to narrow the feature distributions of the augmented source-domain sample set S ∪ T^f and the true target-domain sample set T^t. The samples T^u excluded from the domain adaptation process are instead used in a Universum regularization term L_u, which ensures that the class boundaries defined by the classifier also better perceive the target domain.
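The sketch below illustrates one plausible form of this loss, assuming a simple linear-kernel MMD for L_d and a magnitude penalty on the classifier scores of T^u for the Universum term; the exact definitions and weightings used in the paper may differ.

```python
import torch

def mmd_linear(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear-kernel MMD: squared distance between the mean embeddings of two sample sets.
    x: (n, C), y: (m, C)."""
    return (x.mean(dim=0) - y.mean(dim=0)).pow(2).sum()

def universum_reg(logits_u: torch.Tensor) -> torch.Tensor:
    """Push classifier scores of unknown-class (background) samples toward the decision
    boundary by penalizing their magnitude. logits_u: (k, num_classes). Assumed form."""
    return logits_u.pow(2).mean()

def dal_loss(src_feats, pseudo_src_feats, true_tgt_feats, logits_unknown, beta=1.0):
    """DAL loss sketch: treat the pseudo-source set T^f as extra source samples, align the
    augmented source set S ∪ T^f with the true target set T^t, and regularize with the
    unknown set T^u. `beta` is an assumed weighting hyperparameter."""
    augmented_source = torch.cat([src_feats, pseudo_src_feats], dim=0)  # S ∪ T^f
    l_d = mmd_linear(augmented_source, true_tgt_feats)                  # domain alignment term L_d
    l_u = universum_reg(logits_unknown)                                 # Universum regularization L_u
    return l_d + beta * l_u
```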

Figure 2-B also illustrates the expected effect of the source-domain classification loss and the domain-adaptive localization loss: L_c ensures that source-domain samples of different categories are correctly separated, L_d narrows the gap between the source-domain and target-domain distributions, and L_u pulls the class boundaries closer to the target-domain samples with unknown labels.


Figure 3 - Overall workflow and the structure of the Target Sample Assigner

With the proposed domain-adaptive localization loss, domain adaptation methods can easily be embedded into existing weakly supervised localization methods to greatly improve their performance. As shown in Figure 3, embedding our method into an existing weakly supervised localization model only requires introducing a Target Sample Assigner to divide the target-domain samples into subsets. During training, the assigner maintains a memory matrix M to update, in real time, the anchor points of the unknown-class sample set T^u and the true target-domain sample set T^t, and performs a three-way K-means clustering that uses these two anchors and the source-domain (image-level) feature as cluster centers to determine the subset each target-domain sample belongs to. Based on this assignment, we compute the domain adaptation loss L_d and the Universum regularization L_u, and use them together with the source-domain classification loss L_c to supervise training. This preserves source-domain classification accuracy as much as possible while narrowing the gap between the source-domain and target-domain features and reducing the impact of unknown-class samples. As a result, when the model is applied to the target domain (i.e., pixel features) for object localization, the quality of the resulting localization heatmap is significantly improved.
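A rough PyTorch sketch of such an assigner is given below, assuming Euclidean distances and a moving-average anchor update; the clustering details and update rules in the official repository may differ.

```python
import torch

class TargetSampleAssigner:
    """Keeps two anchors in a memory matrix M (row 0: T^u, row 1: T^t) and assigns each pixel
    feature to the nearest of {image-level feature, T^u anchor, T^t anchor}. Names, the distance
    metric, and the momentum value are assumptions for illustration."""

    def __init__(self, feat_dim: int, momentum: float = 0.9):
        self.M = torch.zeros(2, feat_dim)
        self.momentum = momentum

    @torch.no_grad()
    def assign(self, image_feat: torch.Tensor, pixel_feats: torch.Tensor) -> torch.Tensor:
        """
        image_feat:  (C,)    image-level feature of the current sample (source-domain center).
        pixel_feats: (N, C)  pixel-level features of the same image (target-domain samples).
        Returns labels in {0: T^f (pseudo-source), 1: T^u (unknown), 2: T^t (true target)}.
        """
        centers = torch.stack([image_feat, self.M[0], self.M[1]], dim=0)  # (3, C) cluster centers
        labels = torch.cdist(pixel_feats, centers).argmin(dim=1)          # one K-means-style step

        # Update the T^u and T^t anchors with a moving average of their assigned pixel features.
        for anchor_idx, label in ((0, 1), (1, 2)):
            assigned = pixel_feats[labels == label]
            if assigned.numel() > 0:
                self.M[anchor_idx] = (self.momentum * self.M[anchor_idx]
                                      + (1 - self.momentum) * assigned.mean(dim=0))
        return labels
```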

Experiments


Figure 4 - Object localization heatmaps and final localization/segmentation results

We validate the effectiveness of our method on three weakly supervised object localization benchmarks.

In terms of visual quality, our method captures the object region more completely because it enforces distribution consistency between the image and pixel feature domains. Meanwhile, since the Universum regularization accounts for the influence of background pixels on the classifier, the localization heatmaps generated by our method adhere more closely to object boundaries and suppress responses from category-correlated background, such as the water surface around a duck.

The quantitative results also show that our method achieves strong localization performance on all three datasets; in particular, for non-fine-grained object localization (the ImageNet and OpenImages datasets), it achieves the best localization performance across the board. In terms of image classification, introducing domain adaptation causes some accuracy loss on the source domain, but this side effect can be resolved by adopting a multi-stage strategy that uses an additional classification model (trained only with L_c) to produce the classification results.
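For illustration, a minimal sketch of this multi-stage inference with placeholder model names (not the repository's API): the class prediction comes from a classifier trained only with L_c, while the adapted model is used only for the localization map.

```python
import torch

@torch.no_grad()
def two_stage_inference(image, classifier_model, localization_model):
    """classifier_model: trained only with L_c, so its accuracy is unaffected by domain adaptation.
    localization_model: the DA-WSOL-trained model, assumed here to expose a
    `localize(image, class_idx)` method returning a CAM-style heatmap. Both names are placeholders."""
    class_idx = classifier_model(image).argmax(dim=1)        # stage 1: predict the image class
    heatmap = localization_model.localize(image, class_idx)  # stage 2: localize the predicted class
    return class_idx, heatmap
```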

In addition, our framework generalizes well: it is compatible with multiple domain adaptation methods and a variety of weakly supervised object localization methods, improving their localization performance.




Origin: blog.csdn.net/amusi1994/article/details/123625738