[Computer Vision | Image Segmentation] How does the general-purpose foundation model Segment Anything perform on medical image segmentation?

I saw a paper recently:

The paper address is:

https://arxiv.org/pdf/2304.14660.pdf

The paper explores how the recently popular foundation model SAM performs on medical images.

1. Introduction

Over the past six months, ChatGPT, DALL·E, and similar systems have triggered a frenzy around large foundation models. In early April, Meta AI released the Segment Anything Model (SAM), the first large foundation model for image segmentation.

The biggest highlight of SAM is its strong zero-shot segmentation performance on unseen datasets and tasks.

Segmentation can run fully automatically (Everything mode) or be driven by different manual prompts (Prompt mode), for example text, points, and boxes.
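As a minimal sketch of the prompt conventions used by the official `segment-anything` Python package: points are (x, y) pixel coordinates with label 1 for foreground and 0 for background, and a box is given as [x_min, y_min, x_max, y_max]. The coordinate values below are purely illustrative.

```python
import numpy as np

# Point prompts: (x, y) pixel coordinates plus a label per point
# (1 = foreground / positive, 0 = background / negative).
point_coords = np.array([[120, 85], [40, 200]], dtype=np.float32)
point_labels = np.array([1, 0], dtype=np.int64)

# Box prompt: [x_min, y_min, x_max, y_max] around the target object.
box = np.array([30, 50, 210, 180], dtype=np.float32)

# With a loaded model these arrays would be passed to
# SamPredictor.predict(point_coords=..., point_labels=..., box=...),
# which returns candidate masks with quality scores.
```

Everything mode instead uses `SamAutomaticMaskGenerator`, which prompts the model with a dense grid of points and keeps all resulting masks.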


Although SAM has achieved impressive results on various natural image segmentation tasks, medical image segmentation is far more challenging due to its diverse imaging modalities, fine anatomical structures, ambiguous and complex boundaries, and wide range of object scales; SAM's performance on large medical imaging datasets has yet to be verified.

The intelligent ultrasound team of Professor Ni Dong at the School of Biomedical Engineering, Shenzhen University, together with ETH Zurich, Shenzhen People's Hospital, Zhejiang University, and Shenzhen Duying Medical Technology, compiled COSMOS 553K, an ultra-large-scale medical image segmentation dataset of 553,000 images covering 16 imaging modalities and 68 segmentation targets from the biomedical field. Based on this dataset, they are the first to conduct a comprehensive, multi-angle, large-scale evaluation of SAM, aiming to promote the development of medical image analysis and answer an important question: how well does SAM perform on medical image segmentation?

2. Dataset Overview

To comprehensively evaluate and analyze SAM's performance on medical image segmentation, the team collected and standardized 52 public datasets, ultimately constructing COSMOS 553K, a large-scale medical image segmentation dataset containing 16 imaging modalities and 68 biomedical segmentation targets (Table 1). The dataset is illustrated in Figure 1 and its statistics are shown in Figure 2:

Table 1. Segmentation targets included in COSMOS 553K. H: head and neck; C: chest; A: abdomen; P: pelvis; B: bone; O: other.

Figure 1 COSMOS 553K covers most medical imaging modalities and segmentation targets in the biomedical field, e.g. brain tumors, fundus blood vessels, thyroid nodules, spine, lung, heart, abdominal organs or tumors, cells, polyps, and surgical instruments. The human body illustration comes from Freepik, by the author brgfx (URL)

Figure 2 Statistics of COSMOS 553K. (a) The amount of processed data collected from public datasets; (b) Histogram distribution of target categories; (c) Histogram distribution of image modality; (d) Histogram distribution of image resolution.

3. Method Overview

SAM provides different types of segmentation prompts, including points and boxes.

Point prompts include positive points representing the foreground and negative points representing the background.

A box prompt marks the region of the object to be segmented.

Our testing strategies cover Everything mode, i.e. automatic segmentation (S1H, S1B), and Prompt mode: a single positive point (S2), five positive points (S3), five positive points plus five negative points (S4), a single box (S5), and a single box plus a single positive point (S6). Figure 3 shows the SAM testing framework we designed.
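The point-based strategies (S2–S4) can be simulated by sampling prompt points from a ground-truth mask. The helper below is an illustrative sketch (the function name and the toy mask are our own, not from the paper); it follows SAM's label convention of 1 for foreground and 0 for background.

```python
import numpy as np

def sample_prompt_points(gt_mask, n_pos, n_neg, seed=None):
    """Sample (x, y) point prompts from a binary ground-truth mask:
    positive points from the foreground, negative points from the
    background. Labels follow SAM's convention (1 = fg, 0 = bg)."""
    rng = np.random.default_rng(seed)
    fg = np.argwhere(gt_mask > 0)    # (row, col) foreground pixels
    bg = np.argwhere(gt_mask == 0)   # (row, col) background pixels
    pos = fg[rng.choice(len(fg), n_pos, replace=False)]
    neg = (bg[rng.choice(len(bg), n_neg, replace=False)]
           if n_neg else np.empty((0, 2), dtype=int))
    pts = np.concatenate([pos, neg])[:, ::-1]  # (row, col) -> (x, y)
    labels = np.array([1] * n_pos + [0] * n_neg)
    return pts.astype(np.float32), labels

# S2 = 1 positive; S3 = 5 positive; S4 = 5 positive + 5 negative.
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 20:40] = 1                          # toy square "organ"
pts, labels = sample_prompt_points(mask, n_pos=5, n_neg=5, seed=0)
```

The returned arrays have the same shape as the `point_coords`/`point_labels` inputs that `SamPredictor.predict` expects.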

Fig. 3 The detailed testing framework of SAM designed in this study.

4. Results Analysis

This study comprehensively evaluates the segmentation performance of SAM's various modes on a large-scale, diverse medical imaging dataset; the DICE evaluation results are shown in Figure 4.
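For reference, the DICE score used throughout the evaluation measures the overlap between a predicted mask P and the ground truth G as 2|P ∩ G| / (|P| + |G|). A minimal numpy implementation (the small epsilon, our addition, keeps the empty-vs-empty case well defined):

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """DICE coefficient for binary masks: 2|P ∩ G| / (|P| + |G|)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

a = np.zeros((8, 8)); a[2:6, 2:6] = 1   # 16-pixel square
b = np.zeros((8, 8)); b[4:8, 4:8] = 1   # 16-pixel square, 4-pixel overlap
# dice(a, b) = 2*4 / (16 + 16) = 0.25
```

A score of 1.0 means perfect overlap; the boxplots in Figure 4 summarize this score per testing strategy.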

Figure 4 DICE boxplots for different testing strategies. From top to bottom: S1H, S2, S3, S4, S5, S6.

Based on the experimental analysis, our main conclusions are as follows:

  1. Everything mode is not suitable for most medical image segmentation tasks. In this mode, SAM has poor awareness of medical segmentation targets and outputs a large number of false-positive prediction masks (Figure 5).
  2. In Everything mode, the number of grid sampling points used as prompts affects segmentation performance to some extent, as shown in Figure 6. This is a trade-off between segmentation performance and testing efficiency.
  3. In Prompt mode, adding more foreground points significantly improves SAM's segmentation results. However, foreground and background are easily confused in medical images, and randomly adding negative points can degrade segmentation performance. Box prompts (S5) carry rich object-location information, so they outperform point prompts on most medical segmentation tasks in our study. The mixed strategy (adding point and box prompts simultaneously) did not yield a significant improvement in the current study, which may relate to how SAM encodes mixed prompts. Figures 7 and 8 show visualization results of SAM under the various testing strategies.
  4. Different attributes of the segmentation target can affect SAM's ability to perceive it. In particular, SAM may perform poorly on objects with complex shapes, small areas, or low contrast. Figure 9 shows the relationship between DICE and different target attributes.
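The grid sampling behind conclusion 2 can be reproduced in a few lines; the sketch below mirrors the `build_point_grid` helper in the official `segment-anything` repository (`segment_anything/utils/amg.py`), which generates the normalized point grid that Everything mode prompts with. A denser grid (e.g. 32 vs. 16 points per side) raises recall but quadruples the number of forward prompts, which is the performance/efficiency trade-off noted above.

```python
import numpy as np

def build_point_grid(n_per_side):
    """Evenly spaced n x n grid of normalized (x, y) points in (0, 1),
    each later scaled to image size and used as a single point prompt."""
    offset = 1.0 / (2 * n_per_side)               # half a cell from the edge
    coords_1d = np.linspace(offset, 1 - offset, n_per_side)
    xx, yy = np.meshgrid(coords_1d, coords_1d)
    return np.stack([xx.ravel(), yy.ravel()], axis=-1)

grid = build_point_grid(32)   # default density: 32 x 32 = 1024 prompts
```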

Figure 5 Visualization results of the Everything mode.

Figure 6 The effect of the number of grid sampling points on segmentation performance in Everything mode.

Figure 7 Typical SAM success cases

Figure 8 Typical SAM failure cases

In general, although SAM has the potential to become a general medical image segmentation model, its performance on medical image segmentation tasks is currently unstable. Future research should therefore focus on how to fine-tune SAM effectively with a small number of medical images to improve its reliability and build a Segment Anything model tailored to medical imaging. Extending SAM to 3D and exploring its segmentation performance on volumetric data is another interesting direction. We hope this report helps readers and the community understand SAM's performance on medical image segmentation in more detail, and ultimately promotes the development of a new generation of foundation models for medical image segmentation.

We sincerely thank all organizers and owners of the public datasets for their open-source contributions, and we will prepare the compiled dataset for open-source release to advance the field and the community. We are also very grateful to Meta AI for publicly releasing SAM's source code.

Source link:

https://github.com/facebookresearch/segment-anything


Origin blog.csdn.net/wzk4869/article/details/130493847