BoundaryCAM, a boundary-based weakly supervised semantic segmentation and refinement framework for medical images

BoundaryCAM: A Boundary-based Refinement Framework for Weakly Supervised Semantic Segmentation of Medical Images

Summary

Paper Link
Code Link

  • Most state-of-the-art weakly supervised semantic segmentation (WSSS) techniques lack an understanding of the geometric features embedded in images, because a network cannot obtain any object boundary information from image-level labels alone.
  • The paper defines a boundary as a line that separates an object from its background, or two different objects from each other.
  • BoundaryCAM combines state-of-the-art class activation maps (CAMs) with several post-processing techniques to produce fine-grained, higher-precision segmentation masks. It uses a state-of-the-art unsupervised semantic segmentation network to build boundary maps, which lets BoundaryCAM localize objects with sharper boundaries.

[Figure: example results produced by BoundaryCAM]

Method in this paper

First, an unsupervised segmentation (USS) network, based on the Quickshift and SLIC algorithms, clusters the image pixels into larger groups. Second, a trained classifier generates a CAM for each class in a given input image, based on the clusters from the USS; these CAMs serve as the initial segmentation masks. Third, the BoundaryFit module combines the two to generate a refined segmentation mask for each object in the image. If necessary, these pseudo-labels can then be used to train a fully supervised segmentation network to improve its accuracy.

Unsupervised Segmentation Detection

Our method focuses on pre-refining the CAM predictions to the nearest edge within the mask, which reduces the number of false-positive pixels the model predicts. The USS network generates an edge map that decomposes the image into simpler parts: it clusters similar pixels together and removes unimportant details, leaving a simplified version of the input image.
The USS network consists of two convolutional layers with ReLU and batch normalization, and it is trained with a continuity loss and a similarity loss.

  • The continuity loss prevents the network from using an arbitrary number of pixel clusters and constrains it to a user-defined maximum value q.
  • The similarity loss encourages the network to cluster only pixels that are close to each other in the feature space.

For the BoundaryCAM framework, we use SLIC and Quickshift as the continuity loss and perform an extensive hyperparameter search to generate suitably simplified images. Clustering too aggressively blends the object into the background, while clustering too little does not simplify the image enough.
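
To make the idea of an edge map concrete, here is a minimal sketch of how one could be derived from a cluster label map. It assumes the USS output is already available as an integer array with one cluster id per pixel; the function name `edge_map_from_labels` and the "differs from the right or lower neighbour" rule are illustrative choices, not something taken from the paper.

```python
import numpy as np


def edge_map_from_labels(labels):
    """Return a boolean edge map for an (H, W) array of cluster ids.

    Assumption for illustration: a pixel is marked as an edge when its
    cluster id differs from the pixel to its right or the pixel below it.
    """
    labels = np.asarray(labels)
    edges = np.zeros(labels.shape, dtype=bool)
    # Compare each pixel with its right-hand neighbour.
    edges[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    # Compare each pixel with the neighbour directly below it.
    edges[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return edges
```

With this rule, any 4-connected step from one cluster into another has to pass through an edge pixel, so each cluster ends up enclosed by a closed boundary.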

Class Activation Map - CAM

  • First, a conventional classifier is trained on the target dataset.
  • GradCAM then uses this model to extract a response map for each input edge map obtained from the USS stage.
  • The segmentation masks generated for the individual classes are taken together as the initial response map, defined as $M_{CAM} := \{M^{0}_{CAM}, M^{1}_{CAM}, \dots, M^{n}_{CAM}\}$.
  • The original CAM predictions highlight the most discriminative object parts and a large number of background pixels around them. Our initial CAM response map instead covers a large part of the image, to ensure that the complete object, along with some background pixels, lies inside it.
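
For readers who have not used GradCAM, the core computation can be sketched in a few lines of NumPy. This is only an illustration: it assumes the activations of the last convolutional layer and the gradients of the class score with respect to them have already been extracted from the trained classifier (not shown), and the names `grad_cam`, `initial_mask` and the threshold value are hypothetical rather than the paper's settings.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Compute a normalized Grad-CAM map from (K, H, W) arrays.

    activations: feature maps A_k of the chosen convolutional layer.
    gradients:   d(class score) / d(A_k), same shape as activations.
    """
    # Channel weights: global average of the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps over the channel axis -> (H, W).
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only the evidence that supports the class.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def initial_mask(cam, threshold=0.3):
    """Binarize a CAM; the threshold is an assumed, illustrative value."""
    return cam >= threshold
```

A low threshold gives a generous mask, which matches the goal described above of covering the complete object at the cost of including some background.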

BoundaryFit module

Once we have generated an edge map, we can use it as an additional source of supervision that provides geometric guidance for extracting the relevant boundaries. We combine it with the initial CAM response map, which usually captures complete objects at the cost of including many background pixels. Using the edge map from the previous stage together with the Floodfill algorithm, we then remove irrelevant background pixels from the mask.

The Floodfill algorithm does this by starting from any negative pixel (x, y) in the initial response map and making every pixel that belongs to the same cluster as (x, y) negative as well. Note that we define a cluster as the set of pixels within a region enclosed by a closed boundary in the edge map I_EM (obtained from the unsupervised method), including the pixels that demarcate the image boundary.

Because some pixel clusters only partly overlap the CAM prediction, only the clusters that lie completely within the CAM mask remain positive. We define this step as follows:
[Equation: definition of the BoundaryFit step]
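
The flood-fill step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the CAM mask and the edge map are taken to be boolean arrays of the same shape, regions are 4-connected, and edge pixels themselves simply keep their CAM value. The function name `boundary_fit` is hypothetical.

```python
from collections import deque

import numpy as np


def boundary_fit(cam_mask, edge_map):
    """Flood-fill negatives through every region that the edge map encloses.

    cam_mask: (H, W) bool, True where the initial CAM mask is positive.
    edge_map: (H, W) bool, True on boundary pixels.
    Returns a refined mask in which only regions lying completely inside
    the CAM mask stay positive.
    """
    cam_mask = np.asarray(cam_mask, dtype=bool)
    edge_map = np.asarray(edge_map, dtype=bool)
    height, width = cam_mask.shape
    refined = cam_mask.copy()
    # Seeds: every negative pixel that is not itself a boundary pixel.
    seeds = ~cam_mask & ~edge_map
    visited = seeds.copy()
    queue = deque(zip(*np.nonzero(seeds)))
    while queue:
        y, x = queue.popleft()
        refined[y, x] = False
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            inside = 0 <= ny < height and 0 <= nx < width
            # Spread only through unvisited pixels that are not boundaries.
            if inside and not visited[ny, nx] and not edge_map[ny, nx]:
                visited[ny, nx] = True
                queue.append((ny, nx))
    return refined
```

Starting the fill from all negative pixels at once means each pixel is visited at most once, so the cost is linear in the image size.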

Experimental results

[Figures: experimental results]

Conclusion

  • The paper proposes the BoundaryCAM framework and introduces a new BoundaryFit module between the CAM and the final fully supervised semantic segmentation (FSSS) model. The module produces finer segmentation masks, which can be used to improve the overall accuracy of the model.
  • The BoundaryFit module provides additional saliency by using an unsupervised semantic segmentation model to refine the CAM predictions, yielding higher-quality training labels for state-of-the-art FSSS models.
  • The BoundaryFit module can be integrated into any existing WSSS framework to improve the quality of its predictions, as demonstrated on three real-world medical imaging datasets.

Origin blog.csdn.net/qq_45745941/article/details/129912807