CVPR2023 Semantic Segmentation Paper Collection

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) is one of the top conferences in computer science, spanning image processing, machine learning, artificial intelligence, and related fields.

Every year, CVPR attracts a large number of paper submissions and hosts many academic exchange activities, covering research directions such as image processing, computer vision, pattern recognition, machine learning, deep learning, and artificial intelligence. It is one of the most influential and representative academic conferences in the field.

AMiner has used AI to classify and organize the papers accepted to CVPR2023. Today we share 72 papers on semantic segmentation and present the ten most popular here. Feel free to download and bookmark them!

1. Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP paper details page
Authors: Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu
Link: https://www.aminer.cn/pub/6344dede90e50fcafd24d0b0/
AI Review (Large Model Driven): Open-vocabulary semantic segmentation aims to divide an image into semantic regions according to text descriptions. Recent two-stage approaches first generate class-agnostic mask proposals and then use a pre-trained vision-language model (e.g., CLIP) to classify the masked regions. We identify the performance bottleneck of this paradigm: the pre-trained CLIP model does not perform well on masked images. To address this issue, we propose an improved CLIP training method that adapts the pre-trained CLIP features to masked image regions. Experimental results show that the F-measure of the best resulting system improves by 8.8% over the previous best system.
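
The second stage of the two-stage idea in the review above (classify each proposed region with CLIP) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the vectors are hard-coded toy values standing in for the outputs of a CLIP image encoder on masked regions and a CLIP text encoder on class names, and the names `cosine`, `classify_regions`, and `temperature` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify_regions(region_embeddings, text_embeddings, temperature=0.01):
    """Assign each masked-region embedding to the most similar class name.

    region_embeddings: list of vectors, one per class-agnostic mask proposal
    text_embeddings:   dict mapping class name -> vector
    Returns a list of (class_name, probability) pairs.
    """
    names = list(text_embeddings)
    results = []
    for region in region_embeddings:
        logits = [cosine(region, text_embeddings[n]) / temperature for n in names]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - peak) for l in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(names)), key=probs.__getitem__)
        results.append((names[best], probs[best]))
    return results

# Toy stand-ins for encoder outputs (assumed values, for illustration only).
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "grass": [0.0, 1.0, 0.0],
    "sky": [0.0, 0.0, 1.0],
}
region_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]

predictions = classify_regions(region_embeddings, text_embeddings)
labels = [name for name, _ in predictions]
print(labels)
```

Because the class names enter only through their text embeddings, the label set can be changed at inference time without retraining, which is what makes the setting open-vocabulary.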

2. LaserMix for Semi-Supervised LiDAR Semantic Segmentation paper details page
Authors: Lingdong Kong, Jiawei Ren, Liang Pan, Ziwei Liu
Link: https://www.aminer.cn/pub/62c2a9595aee126c0fcf0a45/
AI Review (Large Model Driven): We study the underexplored problem of semi-supervised learning in LiDAR segmentation. Our central idea is to exploit the strong spatial cues of LiDAR point clouds to make better use of unlabeled data. We propose LaserMix, which mixes laser beams from different LiDAR scans and then encourages the model to make consistent and confident predictions before and after mixing. The framework has three appealing properties, the first of which is generality: LaserMix is agnostic to the LiDAR representation (e.g., range view and voxel), so it can be applied universally.
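
The beam-mixing step mentioned in the review above can be illustrated with a small sketch. It assumes that "mixing laser beams" means splitting each scan into bands by inclination angle and interleaving bands from two scans; the function names, the band layout, and the toy points are assumptions made for illustration, and the consistency loss that would be applied to the mixed scan is omitted.

```python
import math

def inclination(xyz):
    """Inclination angle (radians) of an (x, y, z) point above the sensor's horizontal plane."""
    x, y, z = xyz
    return math.atan2(z, math.hypot(x, y))

def laser_mix(scan_a, scan_b, num_areas, phi_min, phi_max):
    """Interleave two scans by inclination band.

    The range [phi_min, phi_max) is split into num_areas equal bands.
    Even-numbered bands keep points from scan_a, odd-numbered bands keep
    points from scan_b. Each point is a tuple (x, y, z, label).
    """
    width = (phi_max - phi_min) / num_areas

    def band(point):
        idx = int((inclination(point[:3]) - phi_min) / width)
        return min(max(idx, 0), num_areas - 1)  # clamp out-of-range angles

    mixed = [p for p in scan_a if band(p) % 2 == 0]
    mixed += [p for p in scan_b if band(p) % 2 == 1]
    return mixed

# Toy scans (assumed values): two points each, one below and one above the horizon.
scan_a = [(1.0, 0.0, -0.5, "road"), (1.0, 0.0, 0.5, "car")]
scan_b = [(1.0, 0.0, -0.3, "sidewalk"), (1.0, 0.0, 0.3, "tree")]

mixed = laser_mix(scan_a, scan_b, num_areas=2, phi_min=-math.pi / 4, phi_max=math.pi / 4)
mixed_labels = [p[3] for p in mixed]
print(mixed_labels)
```

With two bands, the lower band comes from the first scan and the upper band from the second. For unlabeled scans, the labels carried along here would be the model's own pseudo-labels.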

3. Understanding Imbalanced Semantic Segmentation Through Neural Collapse paper details page
Authors: Zhisheng Zhong, Jiequan Cui, Yibo Yang, Xiaoyang Wu, Xiaojuan Qi, Xiangyu Zhang, Jiaya Jia
Link: https://www.aminer.cn/pub/63b63fd190e50fcafd8f584f/
AI Review (Large Model Driven): In this paper, we explore the structures of the last-layer feature centers and the corresponding classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and an imbalanced class distribution, which break the equiangular, maximally separated structure of neural collapse. However, this symmetric structure is beneficial for minor classes. To preserve these advantages, we introduce a regularizer on the feature centers to encourage the network to learn features closer to the appealing structure. Experimental results show that the method brings significant improvements on both 2D and 3D datasets. Furthermore, our method ranks first and sets a new record on the ScanNet200 test leaderboard.

4. Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision paper details page
Authors: Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie
Link: https://www.aminer.cn/pub/640559c290e50fcafddb3868/
AI Review (Large Model Driven): In this paper, we consider the open-vocabulary semantic segmentation (OVS) problem, which aims to segment objects of arbitrary classes rather than predefined, closed-set categories. The main contributions include: First, we propose a transformer-based OVS model called OVSegmentor, which is trained only on image-text pairs crawled from the web, without using any mask annotations. OVSegmentor assembles image pixels into a set of learnable group tokens and aligns them with the corresponding caption embeddings. Second, we propose two proxy tasks for training: masked entity completion and cross-image mask consistency. The former infers all masked entities in a caption given the group tokens, which enables the model to learn fine-grained alignment between visual groups and text entities.

5. Dynamic Focus-aware Positional Queries for Semantic Segmentation paper details page
Authors: Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, Dacheng Tao, Bohan Zhuang
Link: https://www.aminer.cn/pub/624bb3a25aee126c0fea4e5a/
AI Review (Large Model Driven): This paper proposes a query design for semantic segmentation called Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores of the preceding decoder block and the positional encodings of the corresponding image features. In addition, our method efficiently handles high-resolution cross-attention by aggregating only the contextual tokens selected from the low-resolution cross-attention scores to perform local relation aggregation. Extensive experiments on ADE20K and Cityscapes show that the framework achieves state-of-the-art performance and clearly outperforms Mask2former.

6. Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation paper details page
Authors: Lihe Yang, Lei Qi, Litong Feng, Wayne Zhang, Yinghuan Shi
Link: https://www.aminer.cn/pub/6304456b90e50fcafd12fe39/
AI Review (Large Model Driven): This paper revisits the weak-to-strong consistency framework popularized in semi-supervised classification, in which the prediction on a weakly perturbed image supervises the prediction on its strongly perturbed version. We observe that this simple pipeline, when transferred to the segmentation scenario, already achieves results competitive with recent state-of-the-art work. Building on this, we propose an auxiliary feature perturbation stream as a supplement that expands the perturbation space. Furthermore, we propose a dual-stream perturbation technique in which a common weak view guides two strong views simultaneously. The resulting method outperforms all existing methods on the Pascal, Cityscapes, and COCO benchmarks, and also shows strong performance in remote sensing interpretation and medical image analysis.
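
As a rough illustration of weak-to-strong consistency in general (not the paper's full method, which adds further perturbation streams), the sketch below computes a pseudo-label loss in which confident predictions on a weakly augmented view supervise the predictions on a strongly augmented view. The function names, the 0.95 confidence threshold, and the toy logits are all assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weak_to_strong_loss(weak_logits, strong_logits, threshold=0.95):
    """Cross-entropy of strong-view predictions against weak-view pseudo-labels.

    Only pixels whose weak-view confidence reaches the threshold contribute.
    Both inputs are per-pixel lists of class logits for the same image.
    The sum is divided by the total number of pixels.
    """
    if not weak_logits:
        return 0.0
    total = 0.0
    for weak, strong in zip(weak_logits, strong_logits):
        weak_probs = softmax(weak)
        confidence = max(weak_probs)
        if confidence < threshold:
            continue  # low-confidence pixels are masked out
        pseudo_label = weak_probs.index(confidence)
        total += -math.log(softmax(strong)[pseudo_label])
    return total / len(weak_logits)

# Toy two-pixel, two-class example (assumed values).
weak_logits = [[10.0, 0.0], [0.2, 0.0]]    # pixel 0 is confident, pixel 1 is not
strong_logits = [[2.0, 0.0], [0.0, 3.0]]

loss = weak_to_strong_loss(weak_logits, strong_logits)
print(round(loss, 4))
```

Only the first pixel passes the threshold, so the second pixel's disagreement between views does not affect the loss. In a real training loop this term would be added to the supervised loss on labeled images.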

7. Augmentation Matters: A Simple-yet-Effective Approach to Semi-supervised Semantic Segmentation paper details page
Authors: Zhen Zhao, Lihe Yang, Sifan Long, Jimin Pi, Luping Zhou, Jingdong Wang
Link: https://www.aminer.cn/pub/63969ba790e50fcafdcf1c76/
AI Review (Large Model Driven): This paper proposes AugSeg, a simple and clean semi-supervised semantic segmentation (SSS) method that focuses on data perturbations to improve SSS performance. We adopt a simplified intensity-based augmentation that selects a random number of data transformations, with distortion strengths sampled uniformly from a continuous space. In addition, based on the model's estimated confidence on different unlabeled samples, we randomly inject labeled information to augment the unlabeled samples adaptively. AugSeg achieves new state-of-the-art results under different partition protocols.

8. PIDNet: A Real-time Semantic Segmentation Network Inspired by PID Controllers paper details page
Authors: Jiacong Xu, Zixiang Xiong, Shankar P. Bhattacharyya
Link: https://www.aminer.cn/pub/629ec1f85aee126c0fb6e78d/
AI Review (Large Model Driven): Two-branch network architectures are widely used for real-time semantic segmentation. However, directly fusing high-resolution details with low-frequency context produces an overshoot phenomenon, in which detailed features are easily overwhelmed by the surrounding context, and this limits the accuracy of existing two-branch models. In this paper, we draw a connection between convolutional neural networks (CNNs) and Proportional-Integral-Derivative (PID) controllers, and reveal that a two-branch network is equivalent to a Proportional-Integral (PI) controller, which inherently suffers from similar overshoot. To address this problem, we propose a new three-branch network architecture, PIDNet, which has three branches to parse detailed, context, and boundary information.

9. Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning paper details page
Authors: Jishnu Mukhoti, Tsung-Yu Lin, Omid Poursaeed, Rui Wang, Ashish Shah, Philip HS Torr, Ser-Nam Lim
Link: https://www.aminer.cn/pub/63969ba790e50fcafdcf1cbd/
AI Review (Large Model Driven): We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss, aimed at training an alignment between the patch tokens of the vision encoder and the text encoder. With this alignment, the model can identify the image regions corresponding to a given text input and thus transfer seamlessly to open-vocabulary semantic segmentation without requiring any segmentation annotations during training. Using pre-trained CLIP encoders, we evaluate this task on 4 different segmentation benchmarks: PASCAL VOC, PASCAL Context, COCO Stuff, and ADE20K. Furthermore, we show that when applied to a CLIP backbone, PACL is also suitable for image-level prediction and achieves better zero-shot classification accuracy than CLIP across a suite of 12 datasets.

10. Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation paper details page
Authors: Zicheng Wang, Zhen Zhao, Xiaoxia Xing, Dong Xu, Xiangyu Kong, Luping Zhou
Link: https://www.aminer.cn/pub/640166a590e50fcafd68b4fb/
AI Review (Large Model Driven): Semi-supervised semantic segmentation has received increasing research attention in recent years. In this paper, we propose a new conflict-based cross-view consistency (CCVC) method, which aims to make two sub-networks learn informative features from unrelated views. In particular, we first propose a novel cross-view consistency (CVC) strategy, which encourages the two sub-networks to learn distinct features from the same input while expecting these distinct features to produce consistent prediction scores for that input. Furthermore, we propose a conflict-based pseudo-labeling (CPL) method to ensure that the model learns more useful information from conflicting predictions. We evaluate our new method on the widely used benchmark datasets PASCAL VOC 2012 and Cityscapes.

——————————————————————————————————————

To view all semantic segmentation papers, click here:
https://www.aminer.cn/conf/5eba43d8edb6e7d53c0fb8a1/CVPR2023

Origin blog.csdn.net/AI_Conf/article/details/130771790