CVPR 2023 | UniMatch: Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation

Here we would like to share our work "Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation", accepted to CVPR 2023. In this work, we revisit the weak-to-strong consistency approach in semi-supervised semantic segmentation. We first found that the most basic weak-to-strong consistency method, FixMatch [1] (proposed three years ago, in 2020), already achieves performance comparable to the current SOTA. Inspired by this, we further expand the perturbation space of FixMatch and use dual-stream perturbations to explore the original perturbation space more fully.

Our final method, UniMatch, is very simple and effective, and clearly outperforms previous results on natural images (Pascal VOC, Cityscapes, COCO), remote sensing change detection (WHU-CD, LEVIR-CD), and medical images (ACDC). We have open-sourced the code and training logs for all scenarios to facilitate reproduction, and we hope it can serve as a baseline for everyone.
[Figure]
Paper link (this is the CVPR camera-ready version; compared with arXiv v1, we have added and updated some experimental results):

https://arxiv.org/abs/2208.09910

Code and experiment log links:

https://github.com/LiheYoung/UniMatch

We have also compiled an awesome list of semi-supervised semantic segmentation:

https://github.com/LiheYoung/UniMatch/blob/main/docs/SemiSeg.md

Background

Semi-supervised semantic segmentation aims to learn a better segmentation model from as few labeled images as possible together with a large number of unlabeled images. Learning on the labeled images is essentially the same as in fully supervised semantic segmentation (a cross-entropy loss between the predictions and the manual annotations); the crux of the problem is how to exploit the unlabeled images.
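The way FixMatch [1] exploits the unlabeled images is weak-to-strong consistency: predict on a weakly perturbed view, turn confident predictions into per-pixel pseudo-labels, and use them to supervise a strongly perturbed view of the same image. A minimal NumPy sketch of that loss (the function name and the 0.95 confidence threshold are illustrative, not necessarily the paper's exact values):

```python
import numpy as np

def weak_to_strong_loss(probs_weak, probs_strong, conf_thresh=0.95):
    """Per-pixel weak-to-strong consistency loss (simplified sketch).

    probs_weak, probs_strong: (H, W, C) softmax outputs for the weakly and
    strongly perturbed views of the same unlabeled image.
    """
    pseudo_label = probs_weak.argmax(axis=-1)   # hard pseudo-label from the weak view
    confidence = probs_weak.max(axis=-1)        # per-pixel confidence of that label
    mask = confidence >= conf_thresh            # train only on confident pixels
    h, w, _ = probs_strong.shape
    rows, cols = np.arange(h)[:, None], np.arange(w)
    ce = -np.log(probs_strong[rows, cols, pseudo_label] + 1e-8)
    return float((ce * mask).sum() / max(mask.sum(), 1))
```

Pixels where the weak view is unsure contribute nothing, so early in training the unsupervised loss ramps up naturally as the model grows confident.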

[Figure]

Observations

When transferring this very simple FixMatch to semi-supervised semantic segmentation, we found that under multiple settings it matches or even clearly surpasses current SOTA methods. The comparison is as follows:
[Figure]
We then ran ablations on the core module of FixMatch, the strong perturbations, as shown in the table below. We find that strong perturbations are also crucial for FixMatch in semi-supervised semantic segmentation. The strong perturbations we use by default are color jitter, blur, grayscale, and CutMix. When all of them are removed (w/o any SP), FixMatch performs extremely poorly. Moreover, some earlier methods such as CPS [3] use only CutMix as the single strong perturbation strategy; we found that using CutMix alone (w/ CutMix) is also clearly worse than using the full set of strong perturbations (w/ whole SP).

The Importance of Strong Perturbations (SP) in FixMatch
[Figure]
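Among these strong perturbations, CutMix is the one that mixes images: a rectangular region of one image (together with its label or pseudo-label map) is pasted into another. A minimal sketch for segmentation (the box size and RNG handling are illustrative choices, not the exact recipe from the code release):

```python
import numpy as np

def cutmix_pair(img_a, label_a, img_b, label_b, rng, ratio=0.5):
    """Paste a random rectangle of image B (and its label map) into image A."""
    h, w = img_a.shape[:2]
    ch, cw = int(h * ratio), int(w * ratio)   # box covers ratio^2 of the area
    y = int(rng.integers(0, h - ch + 1))      # random top-left corner
    x = int(rng.integers(0, w - cw + 1))
    img, label = img_a.copy(), label_a.copy()
    img[y:y + ch, x:x + cw] = img_b[y:y + ch, x:x + cw]
    label[y:y + ch, x:x + cw] = label_b[y:y + ch, x:x + cw]
    return img, label
```

Because the label map is cut with the image, the mixed target stays pixel-aligned, which is what makes CutMix applicable to dense prediction.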

Our UniMatch

Unified Perturbations (UniPerb)
The results above show that strong perturbations bring large performance gains. However, FixMatch only applies strong perturbations in the input (image) space, so we propose to further expand its perturbation space by adding a training branch that applies strong perturbations in the feature space, as shown in Figure (a) below.
[Figure]
Note that some works, such as PS-MT [4], also perturb images and features simultaneously. However, they apply all these strong perturbations within the same branch, which makes learning too difficult; in contrast, we separate strong perturbations of different natures into different branches and learn from each of them separately. We demonstrate the superiority of this design in the ablation study. In addition, we show that our feature dropout strategy is simpler and more effective than feature-perturbation methods such as VAT [5].
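The feature perturbation itself is just dropout applied to the feature map. A minimal channel-wise sketch (whether the dropout is per-channel or per-element, and the rate of 0.5, are assumptions here; see the code release for the exact setting):

```python
import numpy as np

def feature_dropout(feat, rng, p=0.5):
    """Zero out whole channels of an (H, W, C) feature map with probability p,
    rescaling the survivors so the expected activation is unchanged."""
    c = feat.shape[-1]
    keep = (rng.random(c) >= p).astype(feat.dtype)   # one Bernoulli draw per channel
    return feat * keep / (1.0 - p)
```

The appeal of dropout over adversarial schemes like VAT is exactly what the text claims: it needs no extra forward/backward passes and has a single hyperparameter.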

Dual-stream Perturbations (DusPerb)

Besides the feature-space branch, we also exploit the original image-level perturbation space more fully: two independently strongly perturbed views of the same image are fed into the network, and both are supervised by the pseudo-label obtained from the weakly perturbed view.

[Figure]

Overall UniMatch

[Figure]

Experiments

Comparison with SOTA method: Pascal VOC 2012
Pascal VOC 2012 contains 10,582 training images in total, covering 21 classes; 1,464 of these images have relatively high annotation quality. There are therefore three strategies for selecting the labeled images: (1) select only from the 1,464 finely annotated images; (2) select randomly from all 10,582 images; (3) select first from the 1,464 finely annotated images and, if more labeled images are needed, from the remaining coarsely annotated ones. The results are shown below. Across selection strategies, data partitions, and backbones, our method achieves the best performance.

Results under the first selection strategy:

[Figure]
Results under the second (w/o) and third (w/) selection strategies:
[Figure]
Comparison with the SOTA method: Cityscapes
Cityscapes contains a total of 2,975 training images, covering 19 categories.
[Figure]
Comparison with the SOTA method: COCO
Following the existing work PseudoSeg [8], we use the COCO-Things dataset here (excluding the Stuff categories), which contains 81 classes and 118,287 training images in total.
[Figure]

Expand to more scenarios: semi-supervised remote sensing image change detection

This scenario requires identifying changed regions from remote sensing images of the same location taken at different times, which can roughly be treated as a binary segmentation problem. The usual framework is shown below: an encoder extracts features from the two temporal images, their features are differenced, and the result is fed into a decoder for prediction.
[Figure]
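That forward pass can be sketched as follows, with `encoder` and `decoder` as placeholder callables (real models would be, e.g., a PSPNet or DeepLabv3+ backbone and head; the absolute difference is one common choice of feature comparison, used here as an illustrative assumption):

```python
import numpy as np

def change_detection_forward(img_t1, img_t2, encoder, decoder):
    """Siamese change-detection forward pass (sketch): one shared encoder
    embeds both temporal images, their features are differenced, and a
    decoder turns the difference into a per-pixel change prediction."""
    f1 = encoder(img_t1)    # shared weights: the same encoder for both times
    f2 = encoder(img_t2)
    diff = np.abs(f1 - f2)  # where features differ, something likely changed
    return decoder(diff)    # per-pixel (binary) change map
```

Because only the comparison step differs from ordinary segmentation, the semi-supervised recipe above carries over with the change map playing the role of the two-class label.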
We verified the approach on two mainstream datasets, WHU-CD and LEVIR-CD, with two network structures, PSPNet and DeepLabv3+, and achieved significant improvements across all data partitions.
[Figure]

Expand to more scenarios: semi-supervised medical image segmentation

We performed validation on the ACDC dataset, using only 1/3/7 labeled cases. With only one labeled case, UniMatch far outperforms other methods that use three labeled cases.
[Figure]
Ablation experiments
Only part of the ablation experiments are shown here. For more ablation experiments, please refer to the paper.

Advantages of UniMatch compared with FixMatch

Below, we verify the superiority of UniMatch over FixMatch on Pascal VOC, Cityscapes, and COCO:
[Figure]
We also verify that the gain of dual-stream strong perturbation (DusPerb) is non-trivial: it is not equivalent to simply doubling the batch size or doubling the number of training epochs.
[Figure]

We verify the necessity of separating strong perturbations of different natures into different branches:
[Figure]
We also compare different feature perturbation strategies:
[Figure]

Summary

In this work, we demonstrate the effectiveness of transferring FixMatch to semi-supervised semantic segmentation. We verified the importance of strong perturbations through ablation studies, further expanded FixMatch's perturbation space with feature-level strong perturbations, and explored the original image-level perturbation space more fully with dual-stream strong perturbations. Our final method is very simple and effective, achieving the best performance on natural images, remote sensing change detection, and medical image segmentation.

Origin blog.csdn.net/qq_39523365/article/details/133140800