[Computer Vision | Image Segmentation] arXiv Computer Vision Academic Express on Image Segmentation (Paper Collection, September 13)

1. Segmentation | Semantics-related (13 papers)

1.1 Semantic and Articulated Pedestrian Sensing Onboard a Moving Vehicle

https://arxiv.org/abs/2309.06313

Due to the large forward motion of the vehicle, it is difficult to perform 3D reconstruction from videos collected onboard. Even object detection and human perception models perform significantly worse on onboard video: compared to standard object detection benchmarks, objects often appear farther from the camera, image quality is often lower and degraded by motion blur, and occlusions occur frequently. This has led to the popularity of dedicated benchmarks for traffic data. Recently, light detection and ranging (LiDAR) sensors have become popular for estimating depth directly, without performing 3D reconstruction. However, compared with image-based methods, LiDAR-based methods still lack articulated human detection at larger distances. We hypothesize that benchmarking human perception on LiDAR data could spur research on human perception and prediction in traffic and potentially improve traffic safety for pedestrians.

1.2 360° from a Single Camera: A Few-Shot Approach for LiDAR Segmentation

https://arxiv.org/abs/2309.06197

Deep learning applications on LiDAR data face strong domain gaps when applied to different sensors or tasks. For these methods to achieve accuracy on diverse data similar to the values reported on public benchmarks, large-scale annotated datasets are necessary. However, in practical applications, labeled data is expensive and time-consuming to obtain. This has motivated a range of work on label-efficient methods, yet a large gap to their fully supervised counterparts remains. Therefore, we propose ImageTo360, an effective and streamlined few-shot method for label-efficient LiDAR segmentation. Our method uses an image teacher network to generate semantic predictions on the LiDAR data within a single camera view. The teacher is used to pretrain the LiDAR segmentation student network before optional fine-tuning on 360° data. Our method operates at point level in a modular manner and therefore generalizes to different architectures. We improve upon the results of current state-of-the-art label-efficient methods and even surpass some traditional fully supervised segmentation networks.
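
The teacher-to-student transfer hinges on linking LiDAR points to the image teacher's predictions. Below is a minimal sketch of that pseudo-labeling step under a standard pinhole camera model; the function name, arguments, and projection details are our assumptions, not the authors' code.

```python
import numpy as np

def pseudo_label_points(points_lidar, teacher_seg, K, T_cam_from_lidar, ignore_label=255):
    """Assign each LiDAR point the teacher's semantic class at its image projection.

    points_lidar:     (N, 3) xyz coordinates in the LiDAR frame.
    teacher_seg:      (H, W) per-pixel class map predicted by the image teacher.
    K:                (3, 3) camera intrinsics.
    T_cam_from_lidar: (4, 4) rigid transform from LiDAR to camera frame.
    Returns (N,) labels; points outside the camera view get ignore_label.
    """
    H, W = teacher_seg.shape
    # Transform points into the camera frame (homogeneous coordinates).
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.1               # keep points in front of the camera
    # Pinhole projection to pixel coordinates.
    uvw = (K @ pts_cam.T).T
    u = uvw[:, 0] / uvw[:, 2]
    v = uvw[:, 1] / uvw[:, 2]
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    labels = np.full(len(points_lidar), ignore_label, dtype=np.int64)
    labels[valid] = teacher_seg[v[valid].astype(int), u[valid].astype(int)]
    return labels
```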

1.3 Active Label Refinement for Semantic Segmentation of Satellite Images

https://arxiv.org/abs/2309.06159

Remote sensing through semantic segmentation of satellite images contributes to the understanding and utilization of the Earth's surface. For this purpose, semantic segmentation networks are typically trained on large sets of labeled satellite images. However, obtaining expert labels for these images is expensive. Therefore, we propose to label the images in a first step using low-cost approaches such as crowdsourcing or pretrained networks. Since these initial labels are partially erroneous, we refine them in a second step in a cost-efficient manner using active learning strategies. We evaluate the active learning strategies on satellite images of Bangalore, India, with land cover and land use labels. Our experimental results show that active label refinement is beneficial for improving the performance of semantic segmentation networks.
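
As a concrete illustration of the two-step idea, here is one generic active-learning heuristic for deciding which initially labeled regions an expert should refine first; the scoring rule and all names are illustrative assumptions, not the strategies evaluated in the paper.

```python
import numpy as np

def select_regions_to_relabel(probs, init_labels, region_ids, budget):
    """Rank regions for expert relabeling (a generic active-learning heuristic).

    probs:       (C, H, W) softmax output of the segmentation network.
    init_labels: (H, W) cheap initial labels (crowdsourced / pretrained network).
    region_ids:  (H, W) integer id of the region (e.g. superpixel) per pixel.
    budget:      number of regions the expert can afford to check.
    """
    pred = probs.argmax(axis=0)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=0)  # per-pixel uncertainty
    disagree = pred != init_labels                          # model vs. initial label
    scores = []
    for rid in np.unique(region_ids):
        mask = region_ids == rid
        # Regions that are both uncertain and in conflict with their initial
        # labels are the most promising candidates for refinement.
        scores.append((entropy[mask].mean() * disagree[mask].mean(), rid))
    return [rid for _, rid in sorted(scores, reverse=True)[:budget]]
```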

1.4 Real-Time Semantic Segmentation: A Brief Survey & Comparative Study in Remote Sensing

https://arxiv.org/abs/2309.06047

Real-time semantic segmentation of remote sensing images is a challenging task that requires a trade-off between effectiveness and efficiency. It has many applications, including tracking forest fires, detecting land use and land cover changes, crop health monitoring, and more. With the success of efficient deep learning methods (i.e., efficient deep neural networks) for real-time semantic segmentation in computer vision, researchers have adopted these efficient networks for remote sensing image analysis. This article first summarizes the basic compression methods for designing efficient deep neural networks and provides a brief but comprehensive survey outlining recent developments in real-time semantic segmentation of remote sensing images. We study several pioneering efficient deep learning methods and place them in a taxonomy based on network architecture design. Furthermore, we evaluate the quality and efficiency of some existing efficient deep neural networks on the publicly available remote sensing semantic segmentation benchmark dataset OpenEarthMap. Experimental results from an extensive comparative study show that most existing efficient deep neural networks achieve good segmentation quality but suffer from low inference speed (i.e., high latency), which may limit their deployability in real-time remote sensing segmentation applications. The article concludes with an outlook on the current state of research and future research directions for real-time semantic segmentation of remote sensing images.
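
Since the comparison turns on inference latency, a rough timing harness like the following (our own sketch, not the paper's measurement protocol) is how such numbers are typically obtained with PyTorch:

```python
import time
import torch

@torch.no_grad()
def measure_latency(model, input_size=(1, 3, 512, 512), warmup=10, iters=50):
    """Rough single-image inference latency in milliseconds."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.eval().to(device)
    x = torch.randn(*input_size, device=device)
    for _ in range(warmup):            # warm-up stabilizes clocks and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()       # flush queued GPU work before timing
    t0 = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters * 1000.0
```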

1.5 FLDNet: A Foreground-Aware Network for Polyp Segmentation Leveraging Long-Distance Dependencies

https://arxiv.org/abs/2309.05987

Given the close relationship between colorectal cancer and polyps, the diagnosis and identification of colorectal polyps plays a key role in the detection and surgical intervention of colorectal cancer. In this context, automatic detection and segmentation of polyps in colonoscopy images has become an important problem that has attracted widespread attention. Current polyp segmentation techniques face several challenges: first, polyps vary in size, texture, color, and pattern; second, the boundary between polyps and mucosa is often blurred; and third, existing work focuses on learning local polyp features while ignoring long-range feature dependencies and neglecting to combine local and global contextual information. To address these challenges, we propose FLDNet (Foreground-Long-Distance Network), a Transformer-based neural network that captures long-distance dependencies for accurate polyp segmentation. Specifically, the proposed model consists of three main modules: a pyramid-based Transformer encoder, a local context module, and a foreground-aware module. Multi-level features with long-distance dependency information are first captured by the pyramid-based Transformer encoder. On the high-level features, the local context module obtains local features related to polyps by constructing different local context information. The coarse map obtained by decoding the reconstructed highest-level features guides the feature fusion process in the foreground-aware module to achieve foreground enhancement of the polyps. Our proposed FLDNet is evaluated on common datasets using seven metrics and is shown to outperform state-of-the-art methods on widely used evaluation measures.
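
To make the described data flow concrete, here is a toy PyTorch skeleton of how a coarse map decoded from the top pyramid level could guide fusion of high-level features; the module internals, channel sizes, and gating are our placeholders, not FLDNet's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForegroundGuidedFusion(nn.Module):
    """Toy sketch of coarse-map-guided fusion of high-level pyramid features."""
    def __init__(self, chans=(64, 128, 320, 512), num_classes=1):
        super().__init__()
        # Placeholder "local context" convolutions on the two highest levels.
        self.local_ctx = nn.ModuleList(
            [nn.Conv2d(c, 64, 3, padding=1) for c in chans[2:]])
        self.coarse_head = nn.Conv2d(chans[-1], num_classes, 1)  # coarse mask
        self.fuse = nn.Conv2d(64 * 2 + num_classes, num_classes, 3, padding=1)

    def forward(self, feats):
        # feats: pyramid features from a (Transformer) encoder, low to high level.
        f3, f4 = feats[2], feats[3]
        coarse = self.coarse_head(f4)            # decode highest level -> coarse map
        c3 = self.local_ctx[0](f3)
        c4 = F.interpolate(self.local_ctx[1](f4), size=f3.shape[2:],
                           mode="bilinear", align_corners=False)
        g = F.interpolate(coarse, size=f3.shape[2:],
                          mode="bilinear", align_corners=False)
        # The coarse map gates the fusion toward foreground (polyp) regions.
        fused = self.fuse(torch.cat([c3 * g.sigmoid(), c4 * g.sigmoid(), g], dim=1))
        return coarse, fused
```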

1.6 Medical Image Segmentation with Belief Function Theory and Deep Learning

https://arxiv.org/abs/2309.05914

Deep learning shows powerful learning and feature representation capabilities in medical image segmentation. However, it has limitations in reasoning with and combining imperfect (imprecise, uncertain, and partial) information. This paper studies medical image segmentation methods based on belief function theory and deep learning, focusing on information modeling and fusion based on uncertain evidence. First, we review existing medical image segmentation methods based on belief function theory and discuss their advantages and challenges. Second, we propose a semi-supervised medical image segmentation framework to reduce the uncertainty caused by the lack of annotations via evidential segmentation and evidence fusion. Third, we compare two evidential classifiers, evidential neural networks and radial basis function networks, demonstrating the effectiveness of belief function theory in uncertainty quantification; we use the two evidential classifiers with deep neural networks to build deep evidential models for lymphoma segmentation. Fourth, we propose a multimodal medical image fusion framework that takes into account the reliability of each MR image source when performing different segmentation tasks, using mass functions and contextual discounting.
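
At the heart of belief function theory is Dempster's rule for combining mass functions from independent sources. The following self-contained example (our illustration, with hypothetical masses) combines two sources judging whether a voxel is tumor or background:

```python
import numpy as np
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    m1, m2: dicts mapping focal sets (frozensets of class labels) to masses
    summing to 1. Returns the combined, conflict-renormalized mass function.
    """
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are fully contradictory")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Hypothetical example: two sources judging a voxel.
omega = frozenset({"tumor", "background"})                    # frame of discernment
m1 = {frozenset({"tumor"}): 0.6, omega: 0.4}                  # partially ignorant source
m2 = {frozenset({"tumor"}): 0.5, frozenset({"background"}): 0.3, omega: 0.2}
print(dempster_combine(m1, m2))   # tumor ~0.76, background ~0.15, omega ~0.10
```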

1.7 Self-Correlation and Cross-Correlation Learning for Few-Shot Remote Sensing Image Semantic Segmentation

https://arxiv.org/abs/2309.05840

Semantic segmentation of remote sensing images is an important problem in remote sensing image interpretation. Although significant progress has been made, existing deep neural network methods suffer from their reliance on large amounts of training data. Few-shot remote sensing image semantic segmentation aims to learn to segment target objects in query images using only a small number of annotated support images of the target class. Most existing few-shot learning methods suffer from focusing solely on extracting information from the support images, and thus fail to effectively address the large appearance and scale variations of geographic objects. To address these issues, we propose a self-correlation and cross-correlation learning network for few-shot remote sensing image semantic segmentation. Our model enhances generalization by considering both self-correlation and cross-correlation between the support and query images when making segmentation predictions. To further exploit self-correlation within the query image, we apply a classical spectral method to generate class-agnostic segmentation masks based on the image's underlying visual information. Experiments on two remote sensing image datasets demonstrate the effectiveness and superiority of this model for few-shot remote sensing image semantic segmentation. Code and models will be available at https://github.com/linhanwang/SCCNe.
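
One classical spectral recipe matching this description partitions pixels by the Fiedler eigenvector of a feature-affinity graph. The sketch below is our reading of the general technique, not the paper's exact algorithm, and is meant for small patch grids (the affinity matrix is HW x HW):

```python
import numpy as np

def spectral_mask(feats):
    """Class-agnostic foreground mask from the Fiedler vector of an affinity graph.

    feats: (H, W, D) per-patch feature vectors (keep H*W small, e.g. 32x32).
    """
    H, W, D = feats.shape
    X = feats.reshape(-1, D)
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    A = np.clip(X @ X.T, 0, None)     # cosine affinity, negatives clipped
    d = A.sum(axis=1)
    L = np.diag(d) - A                # unnormalized graph Laplacian
    # The second-smallest eigenvector (Fiedler vector) bipartitions the graph.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return (fiedler > np.median(fiedler)).reshape(H, W)
```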

1.8 Lung Diseases Image Segmentation using Faster R-CNNs

https://arxiv.org/abs/2309.06386

Lung disease is a leading cause of death among children in developing countries, and India accounted for approximately half of global pneumonia deaths in 2016 (370,000). Prompt diagnosis is critical to reducing mortality. This paper introduces a low-density neural network structure to alleviate topological challenges in deep networks. The network incorporates parameters into the feature pyramid, enhancing data extraction and minimizing information loss. Soft non-maximum suppression optimizes the region proposals generated by the region proposal network. The study evaluates the model on chest X-ray images, computing a confusion matrix to determine accuracy, precision, sensitivity, and specificity. We analyze the loss function, emphasizing its trends during training; region-proposal loss and classification loss are used to evaluate model performance during the training and classification phases. The article closes with an analysis of lung disease detection and the proposed network structure.
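
Soft non-maximum suppression, mentioned above, is a well-documented algorithm (Bodla et al., 2017): rather than discarding proposals that overlap a higher-scoring box, it decays their scores. A minimal NumPy version of the Gaussian variant:

```python
import numpy as np

def _iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores by exp(-iou^2 / sigma).

    boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,).
    Returns indices of kept proposals, highest score first.
    """
    boxes, scores = boxes.astype(float), scores.astype(float)
    idxs, keep = list(range(len(boxes))), []
    while idxs:
        best = max(idxs, key=lambda i: scores[i])
        if scores[best] < score_thresh:
            break
        keep.append(best)
        idxs.remove(best)
        for i in idxs:
            iou = _iou(boxes[best], boxes[i])
            scores[i] *= np.exp(-(iou ** 2) / sigma)   # soft score decay
    return keep
```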

1.9 Improving Generalization Capability of Deep Learning-Based Nuclei Instance Segmentation by Non-deterministic Train Time and Deterministic Test Time Stain Normalization

https://arxiv.org/abs/2309.06143

With the advent of digital pathology and microscopy systems that can automatically scan and save whole-slide histological images, there is a growing trend toward using computerized methods to analyze the acquired images. Among the different histopathological image analysis tasks, nuclei instance segmentation plays an important role in a wide range of clinical and research applications. Although many semi- and fully automatic computerized methods have been proposed for nuclei instance segmentation, deep learning (DL)-based methods have shown the best performance. However, the performance of such approaches often degrades when tested on unseen datasets. In this work, we propose a new method to improve the generalization capability of DL-based automatic segmentation methods. Besides utilizing a state-of-the-art DL-based model as a baseline, our approach combines non-deterministic train-time and deterministic test-time stain normalization. We train the model with a single training set and evaluate its segmentation performance on seven test datasets. Our results show that, compared to the baseline segmentation model, the proposed method provides up to 5.77%, 5.36%, and 5.27% better performance based on the Dice score, aggregated Jaccard index, and panoptic quality score, respectively.
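
The train/test asymmetry can be pictured with a Reinhard-style normalizer that matches per-channel LAB statistics: at train time each patch is normalized toward a randomly drawn reference, at test time always toward one fixed reference. The paper's exact normalizer may differ; this sketch (using scikit-image) is our illustration:

```python
import numpy as np
from skimage.color import rgb2lab, lab2rgb

def stain_stats(img):
    """Per-channel mean and std in LAB space (img: float RGB in [0, 1])."""
    lab = rgb2lab(img)
    return lab.mean((0, 1)), lab.std((0, 1))

def reinhard_normalize(img, ref_mean, ref_std):
    """Reinhard-style stain normalization: match per-channel LAB statistics."""
    lab = rgb2lab(img)
    mean, std = lab.mean((0, 1)), lab.std((0, 1)) + 1e-8
    lab = (lab - mean) / std * ref_std + ref_mean
    return np.clip(lab2rgb(lab), 0, 1)

def normalize(img, ref_pool, training):
    # Train time (non-deterministic): a random reference per call, so the
    # model sees varied stain appearances.
    # Test time (deterministic): always the same fixed reference.
    ref = ref_pool[np.random.randint(len(ref_pool))] if training else ref_pool[0]
    return reinhard_normalize(img, *stain_stats(ref))
```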

1.10 A2V: A Semi-Supervised Domain Adaptation Framework for Brain Vessel Segmentation via Two-Phase Training Angiography-to-Venography Translation

https://arxiv.org/abs/2309.06075

We propose a semi-supervised domain adaptation framework for brain vessel segmentation across different image modalities. Existing state-of-the-art methods focus on a single modality, despite the wide range of available cerebrovascular imaging techniques. This can lead to significant distribution shifts that negatively impact generalization across modalities. By relying on annotated angiographies and a limited number of annotated venographies, our framework performs image-to-image translation and semantic segmentation, exploiting a disentangled and semantically rich latent space to represent heterogeneous data and performing image-level adaptation from the source to the target domain. Furthermore, we reduce the typical complexity of cycle-based architectures and minimize the use of adversarial training, which allows us to build an efficient and intuitive model with stable training. We evaluate our method on magnetic resonance angiographies and venographies. While achieving state-of-the-art performance in the source domain, our method attains a Dice similarity coefficient only 8.9% lower in the target domain, highlighting its promise for robust cerebrovascular image segmentation across modalities.

1.11 Introducing Shape Prior Module in Diffusion Model for Medical Image Segmentation

https://arxiv.org/abs/2309.05929

Medical image segmentation is key to diagnosing and treating spinal diseases. However, high noise, ambiguity, and uncertainty make this task extremely challenging. Factors such as unclear anatomical boundaries, inter-class similarities, and unreasonable annotations contribute to this challenge. Producing accurate and diverse segmentation templates is crucial to support radiologists in clinical practice. In recent years, the denoising diffusion probabilistic model (DDPM) has become a prominent research topic in computer vision, demonstrating effectiveness in a variety of vision tasks, including image deblurring, super-resolution, anomaly detection, and even pixel-level semantic representation generation. Despite the robustness of existing diffusion models in visual generation tasks, they still struggle with discrete masks and their varied effects. To meet the demand for accurate and diverse spine medical image segmentation templates, we propose an end-to-end framework called VerseDiff-UNet, which leverages the DDPM. Our method integrates the diffusion model into a standard U-shaped architecture. At each step, we combine the noisy image with a labeled mask to accurately guide the diffusion direction toward the target region. Furthermore, to capture specific anatomical priors in medical images, we incorporate a shape prior module that efficiently extracts structural semantic information from the input spine images. We evaluate our method on a single dataset of spine images acquired via X-ray imaging. Our results show that VerseDiff-UNet significantly outperforms other state-of-the-art methods in accuracy while preserving natural features and anatomical variations.
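
A generic image-conditioned mask-diffusion training step looks roughly like the following; the conditioning-by-concatenation and all names are our assumptions about the standard recipe, not VerseDiff-UNet's actual code.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(model, image, mask, alphas_cumprod):
    """One DDPM training step: a noisy mask is concatenated with the image so
    the denoiser is steered toward the target anatomy.

    image: (B, C, H, W); mask: (B, 1, H, W) scaled to [-1, 1];
    alphas_cumprod: (T,) cumulative noise schedule.
    """
    alphas_cumprod = alphas_cumprod.to(mask.device)
    B, T = mask.shape[0], alphas_cumprod.shape[0]
    t = torch.randint(0, T, (B,), device=mask.device)   # random timestep per sample
    a = alphas_cumprod[t].view(B, 1, 1, 1)
    noise = torch.randn_like(mask)
    # Forward process q(x_t | x_0): interpolate between mask and pure noise.
    noisy_mask = a.sqrt() * mask + (1 - a).sqrt() * noise
    # Condition the denoiser on the image by channel concatenation.
    pred_noise = model(torch.cat([noisy_mask, image], dim=1), t)
    return F.mse_loss(pred_noise, noise)
```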

1.12 Deep evidential fusion with uncertainty quantification and contextual discounting for multimodal medical image segmentation

https://arxiv.org/abs/2309.05919

Single-modality medical images often do not contain enough information for an accurate and reliable diagnosis. For this reason, physicians typically diagnose diseases from multimodal medical images, e.g., PET/CT. Effective fusion of multimodal information is critical to reaching reliable decisions and explaining how those decisions are made. In this paper, we propose a fusion framework for multimodal medical image segmentation based on deep learning and the Dempster-Shafer theory of evidence. In this framework, the reliability of each single-modality image when segmenting different objects is taken into account through a contextual discounting operation. Evidence from each modality is then combined via Dempster's rule to reach a final decision. Experimental results on a PET-CT dataset with lymphoma and multi-MRI datasets with brain tumors demonstrate that our method outperforms state-of-the-art methods in accuracy and reliability.
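
Contextual discounting weakens a source's evidence class by class according to its reliability, moving the discounted mass to the whole frame (total ignorance); combination then proceeds with Dempster's rule as in entry 1.6 above. The snippet below is a deliberately simplified reading of the operation, with hypothetical reliabilities, not the paper's exact operator:

```python
import numpy as np

def contextual_discount(masses, beta):
    """Discount per-class masses by per-class reliabilities.

    masses: (C,) masses on the C singleton classes (summing to <= 1; the
            remainder is assumed to already sit on the full frame Omega).
    beta:   (C,) reliability of this source for each class, in [0, 1].
    The retained mass beta_c * m_c stays on class c; everything else moves
    to Omega, i.e. an unreliable source is pushed toward total ignorance.
    """
    discounted = beta * masses
    m_omega = 1.0 - discounted.sum()
    return discounted, m_omega

# Hypothetical PET source: reliable for tumor (class 0), weak for edema (class 1).
m, m_omega = contextual_discount(np.array([0.7, 0.2]), np.array([0.9, 0.4]))
print(m, m_omega)   # mass kept per class, and mass moved to ignorance
```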

1.13 LUNet: Deep Learning for the Segmentation of Arterioles and Venules in High Resolution Fundus Images

https://arxiv.org/abs/2309.05780

The retina is the only part of the human body where blood vessels can be accessed non-invasively using imaging techniques such as digital fundus imaging (DFI). The spatial distribution of retinal microvessels can change with cardiovascular disease, so the eye can be considered a window to our heart. In silico segmentation of retinal arterioles and venules (A/Vs) is essential for automated microvascular analysis. Using active learning, we create a new DFI dataset of 240 crowdsourced manual A/V segmentations performed by 15 medical students and reviewed by ophthalmologists, and we develop LUNet, a novel deep learning architecture for high-resolution A/V segmentation. The LUNet architecture includes a double dilated convolution block designed to enlarge the model's receptive field and reduce its parameter count. Furthermore, LUNet has a long tail that operates at high resolution to refine the segmentation. A custom loss function emphasizes the continuity of the blood vessels. LUNet significantly outperforms two state-of-the-art segmentation algorithms on the local test set, as well as on four external test sets simulating distribution shifts in race, comorbidity, and annotation. We make the newly created dataset open access (at the time of publication).
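
The "double dilated convolution block" suggests parallel dilated convolutions widening the receptive field at low parameter cost. One plausible PyTorch reading (the authors' exact block may differ):

```python
import torch
import torch.nn as nn

class DoubleDilatedBlock(nn.Module):
    """Two parallel dilated 3x3 convolutions followed by a 1x1 projection."""
    def __init__(self, in_ch, out_ch, dilations=(2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ) for d in dilations
        ])
        self.project = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        # Each branch sees a wider context than a plain 3x3 convolution,
        # without the parameter cost of larger kernels.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```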
