Exploring Weakly Supervised Semantic Segmentation Ensembles for Medical Imaging Systems

Summary

  • Leverage low-quality CAM predictions on difficult datasets to improve the accuracy of the final results
  • Cover target objects with high certainty by using low-threshold CAMs
  • Combine multiple low-threshold CAMs so that target objects are consistently highlighted while the individual errors average out

Method in this paper

[Figure: overview of the proposed pipeline]
First, a classifier model (a ResNet) is trained on the target dataset.
Second, Grad-CAM is used to create initial masks for the different classifiers.
Next, ensemble methods for combining two or more sets of predictions are tested.
Finally, the ensemble that produces the best results is selected, followed by a calibration step that determines the combination of thresholds giving the highest detection score.

Classifier training and exploration

This framework aims to create an ensemble of different CAM methods, since combining them offsets their individual shortcomings and thus yields more accurate predictions than any single component. The best ensembles are produced by high-quality CAMs, and high-quality CAMs are produced by high-quality classifiers, so we must first study which classifiers work well on the target dataset.
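As a concrete starting point, here is a minimal training sketch for such a classifier, assuming a recent torchvision (≥ 0.13 for the `weights` argument) and an existing `DataLoader` of images with image-level labels. It is an illustration of the general setup, not the paper's exact training recipe.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_classifier(num_classes: int = 2) -> nn.Module:
    """ImageNet-pretrained ResNet-50 with a new classification head."""
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def train(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4) -> nn.Module:
    """Standard supervised training on image-level labels only (no pixel masks)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:           # labels: class indices per image
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```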


What to explore in this article

Instead of striving to test more complex networks, the authors took inspiration from other approaches such as the self-supervised SwAV. Instead of using annotations, SwAV learns to distinguish images without supervision. To do this, it uses a contrastive loss function that compares pairs of images: the loss pushes different images apart in feature space while pulling different augmented views of the same image together.
The authors' interest in SwAV rests on two reasons. First, a model pre-trained without supervision on medical datasets may improve the quality of the Grad-CAM results, because unsupervised models tend to discriminate shapes rather than just guess the correct class. Second, many state-of-the-art methods add extra regularization terms to the classification loss, and regularizations such as the affine-transformation consistency in SEAM serve largely the same purpose as SwAV's contrastive learning loss. The hypothesis is therefore that the activations of an unsupervised pre-trained model allow for more complete object recognition. The paper evaluated several pre-trained SwAV models, but observed that their contrastive training did not produce higher-quality CAMs than traditionally trained classifiers.
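To make the push/pull idea concrete, here is a simplified contrastive loss in the spirit described above. Note that this is an NT-Xent-style sketch, not SwAV's actual swapped-prediction loss (which additionally uses prototypes and Sinkhorn normalization).

```python
import torch
import torch.nn.functional as F

def simple_contrastive_loss(z1: torch.Tensor, z2: torch.Tensor,
                            temperature: float = 0.1) -> torch.Tensor:
    """Pull two augmented views of the same image together in feature space
    and push different images apart. z1, z2: (N, D) embeddings of the two
    views of the same batch of N images."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                  # (2N, D)
    sim = z @ z.t() / temperature                   # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))               # ignore self-similarity
    n = z1.size(0)
    # the positive for view i of image k is the other view of image k
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```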


Evaluating Grad-CAM on the trained models

After creating the candidate classifiers, we can focus on generating CAM predictions. For this we apply Grad-CAM. Gradient-weighted Class Activation Mapping (Grad-CAM) takes a network and an image as input and returns a coarse mask.
However, Grad-CAM cannot correctly localize objects when an image contains multiple occurrences of the same class. In addition, because it weights the activation maps by a plain average of the partial derivatives, the localization often covers only part of the object rather than the whole object. Grad-CAM++ was therefore introduced, addressing these issues with a more elaborate importance-score formula. As a further refinement, Smooth Grad-CAM++ was introduced.
These improvements, however, mainly sharpen the detected object boundaries and alleviate the original method's problem with multiple objects of the same class in one image, and neither issue occurs in the medical datasets considered here: none of the images in the BraTS and DECATHLON datasets contain multiple instances of the target object, and the targets are usually roughly circular with boundaries that are ambiguous even to experts. Nevertheless, Smooth Grad-CAM++ was also tested, as it is the most recent version of the method.
For CAM generation, each trained candidate model and the images are run through the chosen Grad-CAM variant, producing a mask for every image.
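A minimal, hook-based Grad-CAM sketch in PyTorch is shown below. It is an illustration of the technique, not the paper's implementation, and it assumes a torchvision-style ResNet where `model.layer4[-1]` is the last convolutional block.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, target_layer):
    """Minimal Grad-CAM: returns a coarse mask in [0, 1] with the spatial size
    of the input image. `image` is a (C, H, W) tensor; `target_layer` is a
    convolutional block, e.g. model.layer4[-1] for a torchvision ResNet."""
    activations, gradients = [], []
    h_fwd = target_layer.register_forward_hook(
        lambda m, inp, out: activations.append(out))
    h_bwd = target_layer.register_full_backward_hook(
        lambda m, gin, gout: gradients.append(gout[0]))

    model.eval()
    logits = model(image.unsqueeze(0))
    model.zero_grad()
    logits[0, target_class].backward()              # gradients of the class score
    h_fwd.remove()
    h_bwd.remove()

    acts, grads = activations[0], gradients[0]      # (1, K, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)  # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)                  # normalize to [0, 1]
    return cam[0, 0].detach()
```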

Ensemble methods

[Figure: illustration of the ensemble methods]
Once the candidate models and their Grad-CAM masks have been collected, the goal is to combine them into higher-quality predictions.
The first rule is the "or" ensemble, which takes the union of the candidate predictions. It works best when every mask has a high true-positive rate, and it yields the largest possible activated region from the combined masks. The second is the "and" ensemble, which multiplies the candidate predictions and therefore keeps only the regions detected by all masks; it works best when every model has a high true-negative rate.
The "min" and "max" rules keep only the mask with the fewest or the most positively classified pixels, respectively. They are useful when the models do find the target but tend to predict it too large or too small, so that the complete prediction of a single candidate is still better than the "and" or "or" of all candidates.
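A small NumPy sketch of the four combination rules; the function name and signature are illustrative, not taken from the paper.

```python
import numpy as np

def ensemble(cams, thresholds, rule):
    """Combine per-model CAMs (each in [0, 1], same shape) into one binary mask.
    `cams` and `thresholds` are aligned lists, one entry per candidate model."""
    masks = [cam >= t for cam, t in zip(cams, thresholds)]   # binarize each CAM
    if rule == "or":                       # union of positive regions
        return np.logical_or.reduce(masks)
    if rule == "and":                      # intersection of positive regions
        return np.logical_and.reduce(masks)
    sizes = [m.sum() for m in masks]       # positive pixels per candidate mask
    if rule == "min":                      # keep the smallest prediction
        return masks[int(np.argmin(sizes))]
    if rule == "max":                      # keep the largest prediction
        return masks[int(np.argmax(sizes))]
    raise ValueError(f"unknown rule: {rule}")
```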
Since Grad-CAM returns predictions between 0 and 1, we must choose the threshold above which a pixel is considered positive. This hyperparameter gives considerable leeway over the false-positive and false-negative rates of any given prediction: a very high threshold shrinks the regions classified as positive and thus raises the false-negative rate, while a very low threshold enlarges them and raises the false-positive rate. The optimal threshold also varies from one candidate model to another. We therefore test the chosen ensemble method with all combinations of thresholds from 0 to 1 in steps of 0.1, running these tests on the training set to determine the thresholds used on the validation set. Our experiments show that the optimal threshold combination for the training set is also one of the optimal combinations for the validation set.
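A sketch of this threshold calibration, reusing the `ensemble` helper from the previous snippet and assuming the Dice coefficient as the detection score (the paper's exact metric may differ). The grid search is exhaustive, which is feasible for a small number of candidate models.

```python
import itertools
import numpy as np

def dice(pred, gt, eps: float = 1e-8) -> float:
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def calibrate_thresholds(cams_per_model, ground_truth, rule, step: float = 0.1):
    """Grid-search per-model thresholds in [0, 1] (step 0.1) on the training set
    and keep the combination with the best mean Dice. `cams_per_model` is a list
    of arrays shaped (N, H, W); `ground_truth` is a binary (N, H, W) array."""
    grid = np.round(np.arange(0.0, 1.0 + 1e-9, step), 2)
    best_score, best_combo = -1.0, None
    for combo in itertools.product(grid, repeat=len(cams_per_model)):
        preds = [ensemble([c[i] for c in cams_per_model], combo, rule)
                 for i in range(ground_truth.shape[0])]
        score = float(np.mean([dice(p, g) for p, g in zip(preds, ground_truth)]))
        if score > best_score:
            best_score, best_combo = score, combo
    return best_combo, best_score
```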

Results

[Result figures]


Origin blog.csdn.net/qq_45745941/article/details/129925344