[Computer Vision | Image Segmentation] arXiv Computer Vision Academic Express on Image Segmentation (papers collected August 31)

1. Segmentation | Semantics-related (6 papers)

1.1 Semantic Image Synthesis via Class-Adaptive Cross-Attention

https://arxiv.org/abs/2308.16071

In semantic image synthesis, the state of the art is dominated by methods that use spatially-adaptive normalization layers, which allow excellent visual generation quality and editing versatility. Given their efficacy, recent research efforts have focused on fine-grained local style control and multi-modal generation. However, by construction, such layers tend to ignore global image statistics, leading to unconvincing local style editing and causing global inconsistencies such as color or lighting distribution shifts. Additionally, mapping styles in the generator requires the semantic layout, which imposes strict alignment constraints on the features. In response, we design a new architecture in which cross-attention layers are used in place of denormalization layers to condition image generation. Our model inherits the advantages of both solutions, retaining state-of-the-art reconstruction quality as well as improved global and local style transfer. Code and models are available at https://github.com/TFonta/CA2SIS.
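
To make the idea concrete, here is a minimal sketch of cross-attention conditioning, assuming per-class style tokens and flattened generator features; all shapes, names, and dimensions are illustrative, not the authors' released implementation:

```python
import torch
import torch.nn as nn

class CrossAttentionConditioning(nn.Module):
    """Condition generator activations on per-class style tokens via
    cross-attention instead of SPADE-style denormalization.
    Illustrative sketch, not the paper's actual code."""

    def __init__(self, feat_dim=256, style_dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads,
                                          kdim=style_dim, vdim=style_dim,
                                          batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, feats, class_styles):
        # feats:        (B, H*W, feat_dim) flattened generator activations
        # class_styles: (B, num_classes, style_dim), one style token per class
        attended, _ = self.attn(query=feats, key=class_styles, value=class_styles)
        return self.norm(feats + attended)   # residual update

feats = torch.randn(2, 16 * 16, 256)
styles = torch.randn(2, 19, 256)                     # e.g. 19 semantic classes
out = CrossAttentionConditioning()(feats, styles)    # (2, 256, 256)
```

Because every pixel can attend to any class token, style injection no longer requires the pixel-exact spatial alignment that denormalization layers impose.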

1.2 Semi-supervised Domain Adaptation with Inter and Intra-domain Mixing for Semantic Segmentation

https://arxiv.org/abs/2308.15855

Despite recent advances in semantic segmentation, performance degradation caused by domain shift remains an unavoidable challenge in practical applications. The dominant approach to this problem is Unsupervised Domain Adaptation (UDA). However, the complete absence of labeled target data in UDA is overly restrictive and limits performance. To overcome this limitation, a more realistic scenario called semi-supervised domain adaptation (SSDA) has been proposed. Existing SSDA methods originate from the UDA paradigm and mainly focus on exploiting unlabeled target data together with source data. In this paper, we emphasize the significance of exploiting the intra-domain information between the limited labeled target data and the unlabeled target data, as it greatly benefits domain adaptation. Instead of relying solely on the scarce labeled data for supervision, we propose a new SSDA framework that combines inter-domain mixing with intra-domain mixing: inter-domain mixing alleviates the source-target domain gap, while intra-domain mixing enriches the available target-domain information. By learning from inter-domain and intra-domain mixing simultaneously, the network captures more domain-invariant features and improves its performance in the target domain. We also explore different domain-mixing operations to better exploit target-domain information. Comprehensive experiments on the GTA5→Cityscapes and SYNTHIA→Cityscapes benchmarks demonstrate the effectiveness of our approach, which significantly outperforms previous methods.
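
As a rough illustration of the two mixing directions, here is a ClassMix-style paste operation (a common choice in this line of work; the paper's exact mixing operation may differ, and all names here are hypothetical):

```python
import torch

def class_mix(img_a, lbl_a, img_b, lbl_b, classes):
    """Paste the pixels of the chosen classes from sample A onto sample B.
    A ClassMix-style operation; illustrative, not the paper's exact method."""
    mask = torch.zeros_like(lbl_a, dtype=torch.bool)
    for c in classes:
        mask |= lbl_a == c
    mixed_img = torch.where(mask.unsqueeze(0), img_a, img_b)
    mixed_lbl = torch.where(mask, lbl_a, lbl_b)
    return mixed_img, mixed_lbl

# Inter-domain mixing: a labeled source sample pasted onto a target image
# (the target "labels" would come from a teacher network's pseudo-labels).
src_img, src_lbl = torch.rand(3, 512, 512), torch.randint(0, 19, (512, 512))
tgt_img, tgt_lbl = torch.rand(3, 512, 512), torch.randint(0, 19, (512, 512))
inter_img, inter_lbl = class_mix(src_img, src_lbl, tgt_img, tgt_lbl, classes=[0, 5, 11])

# Intra-domain mixing: a scarce labeled target sample pasted onto an
# unlabeled target image, spreading target annotations further.
lab_img, lab_lbl = torch.rand(3, 512, 512), torch.randint(0, 19, (512, 512))
intra_img, intra_lbl = class_mix(lab_img, lab_lbl, tgt_img, tgt_lbl, classes=[2, 8, 13])
```

The inter-domain pair narrows the source-target gap, while the intra-domain pair lets the few labeled target pixels supervise many more target images.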

1.3 Shatter and Gather: Learning Referring Image Segmentation with Text Supervision

https://arxiv.org/abs/2308.15512

Referring image segmentation, the task of segmenting arbitrary entities described by free-form text, opens up a variety of vision applications. However, manually labeling training data for this task is prohibitively expensive, resulting in a lack of labeled data for training. We address this problem with a weakly supervised learning approach that uses the textual descriptions of training images as the only source of supervision. To this end, we first propose a new model that discovers semantic entities in the input image and then combines the entities relevant to the text query to predict the mask of the referent. We also propose a new loss function that allows the model to be trained without any further supervision. Our method is evaluated on four public benchmarks for referring image segmentation, where it significantly outperforms both existing methods for the same task and recent open-vocabulary segmentation models on all benchmarks.
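
A toy sketch of the discover-then-combine idea, under loose assumptions: pixels are softly grouped into entity masks ("shatter"), and the masks are reweighted by similarity to the text embedding ("gather"). The entity prototypes, the temperature, and every name below are hypothetical stand-ins, not the paper's architecture:

```python
import torch
import torch.nn.functional as F

def shatter_and_gather(pix_feats, entity_queries, text_emb):
    # pix_feats:      (B, D, H, W) image features
    # entity_queries: (K, D) learnable entity prototypes (assumed)
    # text_emb:       (B, D) embedding of the referring expression
    B, D, H, W = pix_feats.shape
    flat = pix_feats.flatten(2)                          # (B, D, H*W)

    # Shatter: soft-assign every pixel to one of K entities.
    assign = torch.softmax(entity_queries @ flat, dim=1)  # (B, K, H*W)

    # Entity descriptors: assignment-weighted mean of pixel features.
    ent = (assign @ flat.transpose(1, 2)) / (assign.sum(-1, keepdim=True) + 1e-6)

    # Gather: weight each entity's mask by its similarity to the text.
    sim = torch.softmax(F.cosine_similarity(ent, text_emb[:, None], dim=-1) / 0.1, dim=-1)
    return (sim[:, :, None] * assign).sum(1).view(B, H, W)  # referent mask

m = shatter_and_gather(torch.randn(2, 64, 32, 32), torch.randn(8, 64), torch.randn(2, 64))
print(m.shape)  # torch.Size([2, 32, 32])
```

The appeal of this decomposition is that entity discovery needs no mask labels at all; only the entity-text matching is driven by the (weak) text supervision.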

1.4 Modality Cycles with Masked Conditional Diffusion for Unsupervised Anomaly Segmentation in MRI

https://arxiv.org/abs/2308.16150

Unsupervised anomaly segmentation aims to detect patterns that differ from any pattern processed during training, commonly called anomalies or out-of-distribution patterns, without any associated manual segmentation. Since anomalies at deployment time can lead to model failure, detecting them can improve model reliability, which is valuable in high-risk fields such as medical imaging. This paper introduces Masked Modality Cycles with Conditional Diffusion (MMCCD), a method that enables segmentation of anomalies across different patterns in multimodal MRI. The method rests on two ideas. First, we propose cyclic modality translation as a mechanism for anomaly detection: an image-translation model learns tissue-specific modality mappings, which are characteristic of tissue physiology; the learned mappings therefore fail to translate tissue or image patterns never encountered during training, and the resulting errors enable their segmentation. Second, we combine image translation with a masked conditional diffusion model, which attempts to "imagine" what tissue underlies the masked region, further exposing unknown patterns as the generative model fails to recreate them. We evaluate our method on a proxy task, training on healthy-looking slices from the BraTS2021 multimodal MRI dataset and testing on slices with tumors. We show that our method compares favorably to previous unsupervised approaches based on image reconstruction and denoising with autoencoders and diffusion models.
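
A minimal sketch of the cyclic-translation scoring idea, assuming two translators trained only on healthy anatomy (the stand-in convolutions below are placeholders; in the paper the reverse mapping is a masked conditional diffusion model):

```python
import torch

@torch.no_grad()
def cycle_anomaly_map(x_t1, t1_to_t2, t2_to_t1):
    """Score anomalies by cyclic modality translation, e.g. T1 -> T2 -> T1.
    Tissue never seen during training round-trips poorly, so the
    cycle-reconstruction error localizes it. Hypothetical interface."""
    x_t2_hat = t1_to_t2(x_t1)          # translate to the other modality
    x_t1_hat = t2_to_t1(x_t2_hat)      # translate back
    return (x_t1 - x_t1_hat).abs()     # per-pixel anomaly score

# Stand-in "translators" purely for illustration.
t1_to_t2 = torch.nn.Conv2d(1, 1, 3, padding=1)
t2_to_t1 = torch.nn.Conv2d(1, 1, 3, padding=1)

amap = cycle_anomaly_map(torch.rand(1, 1, 128, 128), t1_to_t2, t2_to_t1)
seg = amap > amap.flatten().quantile(0.95)   # threshold into a binary mask
```

Masking regions before the reverse translation pushes the diffusion model to "imagine" healthy tissue there, which widens the error gap on truly anomalous regions.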

1.5 Attention-based CT Scan Interpolation for Lesion Segmentation of Colorectal Liver Metastases

https://arxiv.org/abs/2308.15932

Small liver lesions, common in colorectal liver metastases (CRLM), are challenging for convolutional neural network (CNN) segmentation models, especially when computed tomography (CT) scans span a wide range of slice thicknesses. Slice thickness may vary with the clinical indication: thinner slices, for example, are used for preoperative planning when fine anatomical detail of small vessels is required, while a range of thicknesses is employed in CRLM imaging to keep the effective radiation dose to the patient as low as possible. However, differences in slice thickness across CTs cause significant performance degradation in CNN-based segmentation models. In this paper, we propose a novel unsupervised attention-based interpolation model that generates intermediate slices from triplets of consecutive CT slices. We integrate a segmentation loss into the training of the interpolation model, leveraging the segmentation labels of existing slices when generating intermediate ones. Unlike common interpolation techniques for CT volumes, our model emphasizes the regions of interest (liver and lesions) of abdominal CT scans within the interpolated slices. Moreover, the model's output is consistent with the original input slices while improving segmentation performance in two state-of-the-art 3D segmentation pipelines. We tested the proposed model on the CRLM dataset, upsampling subjects' thick slices to create isotropic volumes for our segmentation model. The resulting isotropic dataset increases the Dice score in lesion segmentation and outperforms other interpolation approaches on interpolation metrics.
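
A rough sketch of the combined objective, assuming the middle slice of each triplet is held out as the interpolation target and carries a liver/lesion label; the model, the loss weighting, and every name here are illustrative, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def interpolation_step(model, seg_head, triplet, mid_lbl, alpha=0.5):
    """One training step: predict the middle slice of a consecutive
    triplet from its two neighbours, supervised by (a) reconstruction
    against the real middle slice and (b) a segmentation loss on the
    interpolated slice. Hypothetical names and weighting."""
    prev_s, mid_s, next_s = triplet                       # (B, 1, H, W) each
    mid_hat = model(torch.cat([prev_s, next_s], dim=1))   # interpolated slice
    rec_loss = F.l1_loss(mid_hat, mid_s)
    seg_loss = F.cross_entropy(seg_head(mid_hat), mid_lbl)  # reuse slice labels
    return rec_loss + alpha * seg_loss

model = torch.nn.Conv2d(2, 1, 3, padding=1)   # stand-in interpolator
seg_head = torch.nn.Conv2d(1, 3, 1)           # background / liver / lesion
triplet = [torch.rand(2, 1, 64, 64) for _ in range(3)]
loss = interpolation_step(model, seg_head, triplet, torch.randint(0, 3, (2, 64, 64)))
loss.backward()
```

The segmentation term is what steers the interpolator toward the regions of interest instead of treating all anatomy uniformly.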

1.6 Interpretability-guided Data Augmentation for Robust Segmentation in Multi-centre Colonoscopy Data

https://arxiv.org/abs/2308.15881

Multicenter colonoscopy images from different medical centers exhibit distinct complicating factors and overlays that affect image content, depending on the specific acquisition center. Existing deep segmentation networks struggle to generalize across such datasets, and currently available data augmentation methods do not effectively address these sources of variability. As a solution, we introduce an innovative data augmentation method centered on interpretability-derived saliency maps, aiming to enhance the generalization ability of deep learning models for multicenter colonoscopy image segmentation. The proposed augmentation technique proves robust across different segmentation models and domains. Thorough testing on a publicly available multicenter polyp detection dataset demonstrates the effectiveness and versatility of our approach, in both quantitative and qualitative results. The code is publicly available at: https://github.com/nki-radiology/interpretability_augmentation
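
One plausible reading of the pipeline, sketched under assumptions: an interpretability signal (a simple gradient saliency map here, standing in for whatever the paper uses) locates the regions the model relies on, and the augmentation is applied there. All names below are hypothetical; see the linked repository for the actual method:

```python
import torch
import torch.nn.functional as F

def saliency_map(model, img, lbl):
    """Gradient-based saliency: |d loss / d input|, max over channels.
    A generic stand-in for the paper's interpretability signal."""
    img = img.clone().requires_grad_(True)
    loss = F.cross_entropy(model(img), lbl)
    loss.backward()
    return img.grad.abs().amax(dim=1)          # (B, H, W)

def saliency_guided_occlusion(img, sal, patch=32):
    """Occlude the most salient patch, forcing the model to rely on
    other cues; one plausible augmentation, not the paper's exact one."""
    pooled = F.avg_pool2d(sal.unsqueeze(1), patch, stride=patch)  # (B,1,h,w)
    B, _, h, w = pooled.shape
    idx = pooled.flatten(1).argmax(1)                             # (B,)
    out = img.clone()
    for b in range(B):
        r = int(idx[b]) // w * patch
        c = int(idx[b]) % w * patch
        out[b, :, r:r + patch, c:c + patch] = 0.0
    return out

net = torch.nn.Conv2d(3, 2, 3, padding=1)     # stand-in segmentation net
img, lbl = torch.rand(2, 3, 256, 256), torch.randint(0, 2, (2, 256, 256))
aug = saliency_guided_occlusion(img, saliency_map(net, img, lbl))
```

Steering the augmentation with the model's own saliency targets exactly the center-specific overlays and cues the network would otherwise overfit to.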
