[Computer Vision | Image Segmentation] arXiv Computer Vision Academic Express on Image Segmentation (July 6 Paper Collection)

Article Directory

1. Segmentation | Semantics-Related (15 papers)

1.1 Prompting Diffusion Representations for Cross-Domain Semantic Segmentation

https://arxiv.org/abs/2307.02138

Although originally designed for image generation, diffusion models have recently been shown to provide excellent pretrained feature representations for semantic segmentation. Intrigued by this result, we set out to explore how well diffusion-pretrained representations generalize to new domains, a key capability of any representation. We find that diffusion pretraining achieves extraordinary domain-generalization results for semantic segmentation, outperforming both supervised and self-supervised backbones. Building on this, we study how to exploit the model's unique ability to take input prompts to further improve its cross-domain performance. We introduce a scene prompt and a prompt-randomization strategy to help further disentangle domain-invariant information when training the segmentation head. Furthermore, we propose a simple yet effective test-time domain adaptation method based on learning scene prompts on the target domain in an unsupervised manner. Extensive experiments on four synthetic-to-real and clear-to-adverse-weather benchmarks demonstrate the effectiveness of our approach. Without employing any complex techniques such as image translation, augmentation, or rare-class sampling, we set a new state of the art on all benchmarks. Our implementation will be publicly available at \url{https://github.com/ETHRuiGong/PTDiffSeg}.
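To make the prompt-randomization idea concrete, here is a minimal PyTorch-style sketch, assuming a frozen diffusion backbone `backbone(images, prompt)` that returns per-pixel features conditioned on a text prompt; the function names and the prompt list are illustrative assumptions, not the authors' actual API.

```python
import random
import torch

# Hypothetical scene prompts; randomizing them per batch encourages the
# segmentation head to rely on domain-invariant content, not style cues.
SCENE_PROMPTS = [
    "a photo of a city street",
    "a photo of a road in fog",
    "a photo of a street at night",
]

def training_step(backbone, seg_head, images, labels, criterion):
    prompt = random.choice(SCENE_PROMPTS)     # prompt randomization
    with torch.no_grad():                     # the backbone stays frozen
        feats = backbone(images, prompt)      # (B, C, H, W) diffusion features
    logits = seg_head(feats)                  # only the head is trained
    return criterion(logits, labels)
```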

1.2 ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation

https://arxiv.org/abs/2307.02010

The Associating Objects with Transformers (AOT) framework has shown outstanding performance in various complex scenarios of video object segmentation. In this study, we introduce MSDeAOT, a variant of the AOT family that employs Transformers at multiple feature scales. Using a hierarchical gated propagation module (GPM), MSDeAOT efficiently propagates object masks from previous frames to the current frame at a feature scale with a stride of 16. Furthermore, we employ GPM at a finer feature scale with a stride of 8, which improves the accuracy of detecting and tracking small objects. With test-time augmentation and model ensembling, we achieve the top ranking in the EPIC-KITCHENS VISOR Semi-Supervised Video Object Segmentation Challenge.
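Test-time augmentation of this kind is commonly implemented by averaging logits over flipped inputs. The following minimal sketch (our illustration, not the ZJU ReLER code) shows horizontal-flip TTA for a per-frame mask predictor.

```python
import torch

def tta_logits(model, frame):
    """Average mask logits over the original and horizontally flipped frame."""
    logits = model(frame)
    flipped_logits = model(torch.flip(frame, dims=[-1]))
    # Flip the prediction back before averaging so pixels line up.
    return 0.5 * (logits + torch.flip(flipped_logits, dims=[-1]))
```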

1.3 Multi-Modal Prototypes for Open-Set Semantic Segmentation

https://arxiv.org/abs/2307.02003

Adapting the visual system to new object categories at inference time is both valuable and challenging for semantic segmentation. To achieve such generalization, existing methods rely either on providing several support examples as visual cues or on class names as textual cues. Having developed relatively independently, these two lines of work have been studied in isolation, ignoring the inherent complementarity of low-level visual information and high-level linguistic information. In this paper, we define a unified setting called open-set semantic segmentation (O3S), which aims to learn both seen and unseen semantics from visual examples and textual names. Our pipeline extracts multimodal prototypes for the segmentation task: first single-modal self-augmentation and aggregation, then multimodal complementary fusion. Specifically, we aggregate visual features into several tokens as visual prototypes, and augment class names with detailed descriptions to generate textual prototypes. The two modalities are then fused to produce multimodal prototypes for the final segmentation. We conduct extensive experiments on the Pascal and COCO datasets to evaluate the effectiveness of the framework. Even on the finer-grained part-segmentation benchmark Pascal-Animals, state-of-the-art results can be achieved by training only on coarse-grained datasets. A thorough ablation study quantitatively and qualitatively dissects each component.
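As a rough sketch of prototype-based segmentation of this kind, the snippet below classifies each pixel by cosine similarity to per-class prototypes, with a naive averaging fusion of visual and textual prototypes; shapes, names, and the fusion rule are our own illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def fuse(visual_protos, text_protos):
    # One plain fusion choice for illustration: average the two modalities.
    return 0.5 * (visual_protos + text_protos)

def segment_with_prototypes(pixel_feats, prototypes):
    """pixel_feats: (B, C, H, W); prototypes: (K, C) for K classes."""
    pixel_feats = F.normalize(pixel_feats, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    # Cosine similarity between every pixel feature and every prototype.
    logits = torch.einsum("bchw,kc->bkhw", pixel_feats, prototypes)
    return logits.argmax(dim=1)  # (B, H, W) predicted class map
```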

1.4 The KiTS21 Challenge: Automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase CT

https://arxiv.org/abs/2307.01984

This paper presents the challenge report of the 2021 Kidney and Kidney Tumor Segmentation Challenge (KiTS21), held jointly with the 2021 International Conference on Medical Image Computing and Computer-Assisted Interventions (MICCAI). KiTS21 is the sequel to the first edition in 2019; in addition to a larger dataset, it introduces various innovations in challenge design. A novel annotation scheme was used to collect three separate annotations for each region of interest, in a fully transparent setting, using a web-based annotation tool. Furthermore, the KiTS21 test set was collected from an external institution, challenging participants to develop methods that generalize well to new populations. Nonetheless, the top-performing teams achieved significant improvements over the 2019 state of the art, with performance drawing ever closer to human level. An in-depth meta-analysis describes which methods were used and how they fared on the leaderboard, as well as which characteristics of cases generally saw good performance and which did not. Overall, KiTS21 contributed a significant advance in the state of the art of kidney tumor segmentation and provides useful insights applicable to the field of semantic segmentation as a whole.

1.5 Advancing Wound Filling Extraction on 3D Faces: An Auto-Segmentation and Wound Face Regeneration Approach

https://arxiv.org/abs/2307.01844

Facial wound segmentation plays a vital role in preoperative planning and in optimizing patient outcomes across various medical applications. In this paper, we propose an efficient method for automated 3D facial wound segmentation using a two-stream graph convolutional network. Our method utilizes the Cir3D-FaIR dataset and addresses the challenge of data imbalance through extensive experiments with different loss functions. To achieve accurate segmentation, we conduct thorough experiments and select the best-performing of the trained models; the selected model exhibits exceptional segmentation performance on complex 3D facial wounds. Furthermore, building on the segmentation model, we propose an improved method for extracting 3D facial wound fillers and compare it with previous findings. Our method achieves a remarkable accuracy of 0.9999986% on the test set, surpassing the performance of previous methods. Based on this result, we used 3D printing technology to illustrate the shape of the wound filler. The findings of this study have important implications for physicians involved in preoperative planning and intervention design. By automating facial wound segmentation and improving the accuracy of wound-filling extraction, our method can help carefully evaluate and optimize interventions to improve patient outcomes. Additionally, it helps advance facial reconstruction techniques that use machine learning and 3D bioprinting to print skin-tissue implants. Our source code can be found at \url{https://github.com/SIMOGroup/WoundFilling3D}.
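Since the abstract stresses handling class imbalance via the choice of loss function, here is a standard soft Dice loss sketch, a common option for imbalanced segmentation; the paper's exact loss is not specified here, so this is only an assumed example over per-vertex mesh labels.

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, target, eps=1e-6):
    """logits: (B, C, N) raw scores over N mesh vertices; target: (B, N) int labels."""
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, probs.shape[1]).permute(0, 2, 1).float()  # (B, C, N)
    inter = (probs * onehot).sum(dim=2)
    denom = probs.sum(dim=2) + onehot.sum(dim=2)
    # Dice rewards overlap per class, so rare classes are not swamped
    # by the dominant background the way they are with plain cross-entropy.
    return (1.0 - (2 * inter + eps) / (denom + eps)).mean()
```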

1.6 Synchronous Image-Label Diffusion Probability Model with Application to Stroke Lesion Segmentation on Non-contrast CT

https://arxiv.org/abs/2307.01740

Stroke lesion volume is a key radiological indicator for assessing the prognosis of patients with acute ischemic stroke (AIS), yet automatically measuring stroke lesion volume in non-contrast CT (NCCT) scans is challenging. Recent diffusion probabilistic models have shown potential for image segmentation. This paper proposes a novel synchronous image-label diffusion probability model (SDPM) for stroke lesion segmentation in NCCT, based on a Markov diffusion process. The proposed SDPM is fully formulated with latent variable models (LVM), providing a complete probabilistic treatment. An additional network stream, introduced in parallel with the noise-prediction stream, obtains initial noisy label estimates so that final labels can be inferred efficiently. By optimizing a specified variational bound, the trained model can infer multiple label estimates for reference given an input image with noise. The model was evaluated on three stroke lesion datasets, including one public dataset and two private ones. Compared with several U-Net and Transformer based segmentation methods, our proposed SDPM achieves state-of-the-art performance. The code is publicly available.
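For readers new to diffusion-based segmentation, the sketch below shows a generic DDPM-style training objective where a network denoises a noised one-hot label map conditioned on the image; this is our simplified reading of the general technique, not the exact SDPM formulation with its parallel label stream.

```python
import torch

def diffusion_seg_loss(eps_net, image, label_onehot, alphas_cumprod):
    """eps_net(image, noisy_label, t) predicts the noise added at step t.
    image: (B, 1, H, W); label_onehot: (B, C, H, W); alphas_cumprod: (T,)."""
    B = image.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=image.device)
    a_bar = alphas_cumprod[t].view(B, 1, 1, 1)
    noise = torch.randn_like(label_onehot)
    # Forward (Markov) diffusion: corrupt the label map to step t in one shot.
    noisy_label = a_bar.sqrt() * label_onehot + (1 - a_bar).sqrt() * noise
    pred_noise = eps_net(image, noisy_label, t)
    return torch.mean((pred_noise - noise) ** 2)
```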

1.7 Augment Features Beyond Color for Domain Generalized Segmentation

https://arxiv.org/abs/2307.01703

Domain-generalized semantic segmentation (DGSS) is an important yet challenging task, in which models are trained only on source data without access to any target data. Previous DGSS methods can be classified as augmentation-based or normalization-based. The former either introduce additional biased data or perform only channel-wise adjustments for data augmentation, while the latter may discard beneficial visual information; both lead to limited DGSS performance. In contrast, our method performs inter-channel transformations while avoiding domain-specific biases, thereby diversifying the data and improving the model's generalization. Specifically, our method consists of two modules: random image color augmentation (RICA) and random feature distribution augmentation (RFDA). RICA converts an image from RGB to the CIELAB color model and randomizes its colormap in a perceptually based manner for augmentation. RFDA extends this beyond color to feature space via a CycleGAN-based generative network, complementing RICA and further improving generalization. We conduct extensive experiments; generalization results from the synthetic GTAV and SYNTHIA datasets to the real Cityscapes, BDDS, and Mapillary datasets show that our method achieves state-of-the-art DGSS performance.
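A minimal sketch of RICA-style augmentation as we understand it: convert to CIELAB and randomize the channels in a perceptual color space before converting back. The jitter ranges below are our own illustrative choices, not the paper's.

```python
import numpy as np
from skimage import color

def rica_augment(rgb, rng=np.random):
    """rgb: float image in [0, 1], shape (H, W, 3)."""
    lab = color.rgb2lab(rgb)
    # Rescale lightness L* and shift the a*/b* chroma channels, so the
    # perturbation is perceptually meaningful rather than raw RGB noise.
    lab[..., 0] = np.clip(lab[..., 0] * rng.uniform(0.8, 1.2), 0, 100)
    lab[..., 1] += rng.uniform(-20, 20)
    lab[..., 2] += rng.uniform(-20, 20)
    return np.clip(color.lab2rgb(lab), 0.0, 1.0)
```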

1.8 EffSeg: Efficient Fine-Grained Instance Segmentation using Structure-Preserving Sparsity

https://arxiv.org/abs/2307.01545

Many two-stage instance segmentation heads predict a coarse 28×28 mask per instance, which is insufficient to capture the fine-grained details of many objects. To address this, PointRend and RefineMask predict 112×112 segmentation masks, yielding higher-quality segmentations. However, both approaches have limitations: they either lack access to neighboring features (PointRend) or perform computation at all spatial locations instead of sparsely (RefineMask). In this work, we propose EffSeg, which performs fine-grained instance segmentation efficiently using our structure-preserving sparsity (SPS) method: active features and passive features are stored separately, together with a dense 2D index map containing the indices of the features. The index map preserves the 2D spatial configuration, or structure, between features so that any 2D operation can still be performed. EffSeg achieves performance similar to RefineMask on COCO while reducing FLOPs by 71% and increasing FPS by 29%. Code will be released.
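The snippet below is a small sketch of the structure-preserving sparsity idea as described: pack the active features densely while keeping a 2D index map so that each packed feature can still be located in the image grid. Names and the exact layout are illustrative assumptions.

```python
import torch

def build_sparse(features, active_mask):
    """features: (C, H, W); active_mask: (H, W) bool."""
    C, H, W = features.shape
    flat = features.view(C, -1)
    active_idx = active_mask.view(-1).nonzero(as_tuple=True)[0]
    active_feats = flat[:, active_idx]  # (C, N_active), densely packed
    # index_map[y, x] gives the column of that pixel in active_feats,
    # or -1 for passive locations; the 2D structure is thus preserved,
    # so 2D operations can still look up spatial neighbors.
    index_map = torch.full((H * W,), -1, dtype=torch.long,
                           device=features.device)
    index_map[active_idx] = torch.arange(active_idx.numel(),
                                         device=features.device)
    return active_feats, index_map.view(H, W)
```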

1.9 Exploiting Richness of Learned Compressed Representation of Images for Semantic Segmentation

https://arxiv.org/abs/2307.01524

Autonomous vehicles and advanced driver-assistance systems (ADAS) have the potential to fundamentally change the way we travel. Many such vehicles currently rely on segmentation and object-detection algorithms to detect and track the objects around them. The data collected from the vehicles is usually sent to cloud servers to facilitate continual/lifelong learning of these algorithms. To account for bandwidth constraints, the data is compressed before being sent to the server, where it is typically decompressed for training and analysis. In this work, we propose using a learning-based compression codec to reduce the latency overhead incurred by the decompression step in standard pipelines. We demonstrate that the learned compressed representations can also be used to perform tasks such as semantic segmentation, in addition to being decompressed to obtain the images. We experimentally validate the proposed pipeline on the Cityscapes dataset, achieving a compression factor of up to 66× while preserving the information required for segmentation, with a dice coefficient of 0.84 compared to 0.88 when using decompressed images, and reducing the overall computation by 11%.
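Conceptually, the pipeline feeds the codec's latent directly to a segmentation head, skipping reconstruction. The sketch below uses made-up module names to show this wiring; it is an illustration of the idea, not the authors' implementation.

```python
import torch.nn as nn

class LatentSegPipeline(nn.Module):
    """Segment directly from a learned codec's compressed representation."""
    def __init__(self, codec_encoder, seg_head):
        super().__init__()
        self.codec_encoder = codec_encoder  # frozen, learned compression encoder
        self.seg_head = seg_head            # segmentation head trained on latents

    def forward(self, image):
        latent = self.codec_encoder(image)  # compressed representation
        return self.seg_head(latent)        # no image reconstruction needed
```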

1.10 Semantic Segmentation on 3D Point Clouds with High Density Variations

https://arxiv.org/abs/2307.01489

LiDAR scanning for surveying applications acquires measurements over wide areas and long distances, producing large-scale 3D point clouds with significant local density variation. While existing 3D semantic segmentation models downsample and upsample to build robustness against varying point densities, they are less effective under the large local density variations characteristic of point clouds from surveying applications. To mitigate this weakness, we propose a new architecture, HDVNet, consisting of a nested set of encoder-decoder paths, each handling a specific range of point densities. Restricting the interconnections between feature maps enables HDVNet to gauge the reliability of each feature based on the density of its points, e.g., down-weighting high-density features that do not exist in low-density objects. By effectively handling input density variation, HDVNet outperforms state-of-the-art models in segmentation accuracy on real point clouds with inconsistent density, while using just over half as many weights.
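To make the density notion concrete, here is a tiny sketch (our simplification, not HDVNet's actual density-specific paths) that estimates local point density by counting neighbors within a radius; such an estimate could then be used to route points or gate features by density range.

```python
import torch

def local_density(points, radius=0.5):
    """points: (N, 3) point cloud. Returns per-point neighbor counts."""
    d = torch.cdist(points, points)                # (N, N) pairwise distances
    return (d < radius).float().sum(dim=1) - 1.0   # exclude the point itself
```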

1.11 AxonCallosumEM Dataset: Axon Semantic Segmentation of Whole Corpus Callosum cross section from EM Images

https://arxiv.org/abs/2307.02464

Electron microscopy (EM) remains the dominant technique for elucidating the intricate details of animal nervous systems at the nanoscale. However, accurately reconstructing the complex morphology of axons and myelin sheaths poses significant challenges. Furthermore, the lack of publicly available, large-scale EM datasets covering complete cross-sections of the corpus callosum, with dense ground-truth segmentation of axons and myelin, hampers the progress and evaluation of whole-corpus-callosum reconstruction. To overcome these obstacles, we introduce the AxonCallosumEM dataset, consisting of a 1.83 × 5.76 mm EM image captured from the corpus callosum of a mouse model of Rett syndrome (RTT), which contains extensive axon bundles. We carefully proofread more than 600,000 patches at 1024 × 1024 resolution, providing comprehensive ground truth for myelinated axons and myelin sheaths. Furthermore, we extensively annotated three distinct regions of the dataset for training, testing, and validation. Using this dataset, we develop a fine-tuning method that adapts the Segment Anything Model (SAM) to EM image segmentation, called EM-SAM, which outperforms other state-of-the-art methods. We present the evaluation results of EM-SAM as a baseline.
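A common recipe for adapting a SAM-style model is to freeze the heavy pretrained image encoder and train only the lightweight mask decoder on domain patches. The sketch below shows that recipe under our own assumptions; it is not the authors' EM-SAM code, and the encoder/decoder call signatures are illustrative.

```python
import torch

def finetune_step(image_encoder, mask_decoder, optimizer, patch, gt_mask,
                  criterion=torch.nn.BCEWithLogitsLoss()):
    """One training step: frozen encoder, trainable decoder."""
    with torch.no_grad():
        embedding = image_encoder(patch)   # frozen pretrained ViT features
    pred = mask_decoder(embedding)         # only the decoder gets gradients
    loss = criterion(pred, gt_mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```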

1.12 Direct segmentation of brain white matter tracts in diffusion MRI

https://arxiv.org/abs/2307.02223

The white matter of the brain consists of a set of tracts that connect different regions of the brain. Segmentation of these tracts is often needed in clinical and research settings. Diffusion-weighted MRI offers unique contrast to delineate these tracts. However, existing segmentation methods rely on intermediate computations such as tractography or estimation of fiber orientation density. These intermediate computations, in turn, entail complex processing that can introduce unnecessary errors. Moreover, they often require dense multi-shell measurements that are unavailable in many clinical and research applications. As a result, current methods suffer from low accuracy and poor generalizability. Here, we propose a novel deep learning method that segments these tracts directly from the diffusion MRI data, thereby sidestepping the intermediate computation errors. Our experiments show that the proposed method can achieve segmentation accuracy on par with state-of-the-art methods (mean Dice similarity coefficient of 0.826). Compared with the state of the art, our method offers far superior generalizability to the undersampled data typical of clinical studies and to data obtained with different acquisition protocols. Furthermore, we propose a new method for detecting inaccurate segmentations and show that it is more accurate than standard methods based on estimation uncertainty quantification. These new methods can serve many important clinical and scientific applications that require accurate and reliable non-invasive segmentation of white matter tracts.

1.13 ToothSegNet: Image Degradation meets Tooth Segmentation in CBCT Images

https://arxiv.org/abs/2307.01979

In computer-aided orthodontics, three-dimensional tooth models are required for many medical treatments. Segmenting teeth from cone-beam computed tomography (CBCT) images is a crucial step in building such models. However, CBCT image-quality issues, such as metal artifacts and blurring caused by the imaging equipment and the patient's dental condition, make segmentation difficult. In this paper, we propose ToothSegNet, a new framework that familiarizes the segmentation model with degraded images generated during training. ToothSegNet fuses information from the high-quality and low-quality images produced by a degradation-simulation module using channel-wise cross-fusion to reduce the semantic gap between encoder and decoder, and refines tooth shape predictions through a structural constraint loss. Experimental results show that ToothSegNet produces more precise segmentations and outperforms state-of-the-art medical image segmentation methods.
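An illustrative degradation-simulation step in the spirit described above: synthesize blur and bright metal-like streaks on CT slices so the model sees degraded inputs during training. The specific operations and parameters are our own assumptions, not the paper's module.

```python
import torch
import torch.nn.functional as F

def simulate_degradation(x):
    """x: (B, 1, H, W) CT slices normalized to [0, 1]."""
    # Cheap blur stand-in: 3x3 average pooling with stride 1.
    blurred = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
    # Sparse random bright pixels mimicking metal-artifact streaks.
    streaks = (torch.rand_like(x) > 0.999).float()
    return torch.clamp(blurred + streaks, 0.0, 1.0)
```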

1.14 Edge-aware Multi-task Network for Integrating Quantification Segmentation and Uncertainty Prediction of Liver Tumor on Multi-modality Non-contrast MRI

https://arxiv.org/abs/2307.01798

Simultaneous multi-index quantification, segmentation, and uncertainty estimation of liver tumors on multimodal non-contrast magnetic resonance imaging (NCMRI) are crucial for accurate diagnosis. However, existing methods lack an effective mechanism for multimodal NCMRI fusion and for capturing accurate boundary information, making these tasks challenging. To address these issues, this paper proposes a unified framework, the edge-aware multi-task network (EaMtNet), to jointly perform multi-index quantification, segmentation, and uncertainty estimation of liver tumors on multimodal NCMRI. EaMtNet employs two parallel CNN encoders and Sobel filters to extract local features and edge maps, respectively. A newly designed edge-aware feature aggregation (EaFA) module performs feature fusion and selection, making the network edge-aware by capturing long-range dependencies between features and edge maps. Multi-tasking leverages prediction discrepancy to estimate uncertainty and improve segmentation and quantification performance. Extensive experiments were performed on multimodal NCMRI from 250 clinical subjects. The proposed model outperforms the state of the art by a large margin, achieving a dice similarity coefficient of 90.01±1.23% and a mean absolute error of 2.72±0.58 mm for MD. The results demonstrate the potential of EaMtNet as a reliable clinical aid for medical image analysis.
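The Sobel edge-map branch mentioned above can be expressed with fixed convolution kernels; the following minimal sketch computes an edge-magnitude map that such an edge-aware fusion module could consume (our illustration of the standard Sobel operator, not the EaFA module itself).

```python
import torch
import torch.nn.functional as F

def sobel_edges(x):
    """x: (B, 1, H, W) single-channel image tensor; returns edge magnitude."""
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]])
    ky = kx.t()                          # vertical-gradient kernel
    kx = kx.view(1, 1, 3, 3).to(x)
    ky = ky.view(1, 1, 3, 3).to(x)
    gx = F.conv2d(x, kx, padding=1)      # horizontal gradients
    gy = F.conv2d(x, ky, padding=1)      # vertical gradients
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
```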

1.15 ARHNet: Adaptive Region Harmonization for Lesion-aware Augmentation to Improve Segmentation Performance

https://arxiv.org/abs/2307.01220

Accurate segmentation of brain lesions in MRI scans is critical for reliable patient prognosis and neurological monitoring. However, the performance of CNN-based segmentation methods is constrained by limited training-set sizes. Advanced data augmentation is an effective strategy for improving model robustness, but it often introduces intensity disparities between foreground and background regions, as well as boundary artifacts, which weakens its effectiveness. In this paper, we propose a foreground harmonization framework (ARHNet) to address intensity disparities and make synthesized images look more realistic. In particular, we propose an adaptive region harmonization (ARH) module that dynamically aligns foreground feature maps with the background via an attention mechanism. We demonstrate the efficacy of our method in improving segmentation performance using real and synthetic images. Experimental results on the ATLAS 2.0 dataset show that ARHNet outperforms other methods on the image harmonization task and improves downstream segmentation performance. Our code is publicly available at https://github.com/King-HAW/ARHNet.
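A simplified sketch of region harmonization: align the mean and standard deviation of foreground features to background statistics, which is one plain way to reduce intensity gaps. This is our assumption-laden stand-in; the actual ARH module uses a learned attention mechanism rather than fixed moment matching.

```python
import torch

def harmonize_foreground(feats, fg_mask, eps=1e-5):
    """feats: (B, C, H, W); fg_mask: (B, 1, H, W) in {0, 1}."""
    bg_mask = 1.0 - fg_mask

    def masked_stats(m):
        w = m.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
        mean = (feats * m).sum(dim=(2, 3), keepdim=True) / w
        var = ((feats - mean) ** 2 * m).sum(dim=(2, 3), keepdim=True) / w
        return mean, (var + eps).sqrt()

    fg_mean, fg_std = masked_stats(fg_mask)
    bg_mean, bg_std = masked_stats(bg_mask)
    # Re-standardize foreground features with background statistics.
    aligned = (feats - fg_mean) / fg_std * bg_std + bg_mean
    return feats * bg_mask + aligned * fg_mask
```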
