1. Introduction to the SAM model
- Segment Anything Model, or SAM for short, is the first foundation model for image segmentation, released by Meta in early April 2023. It is built from three interrelated elements: Task, Model and Data. The composition of the Task is shown in the figure below. Given a segmentation prompt and an image, the model generates a mask. The prompt can be a marked point, a regular or irregular boundary around the object, or a piece of text. If you enter "Cat", for example, the model recognizes the cat and generates a mask for it.
The Model takes prompts (such as text) and images as its main inputs. The architecture is shown in the figure below:
Finally, there is Data, which is the input for model training. During training, the data is annotated and used to improve the model. SAM's explanation of Data is as follows:
2. Model use
- set-up
import numpy as np
import torch
import matplotlib.pyplot as plt
import cv2

# Helper to display the masks
def show_anns(anns):
    if len(anns) == 0:
        return
    # Draw large masks first so that small ones stay visible on top
    sorted_anns = sorted(anns, key=(lambda x: x['area']), reverse=True)
    ax = plt.gca()
    ax.set_autoscale_on(False)
    # Transparent RGBA canvas with the same height and width as the masks
    img = np.ones((sorted_anns[0]['segmentation'].shape[0], sorted_anns[0]['segmentation'].shape[1], 4))
    img[:,:,3] = 0
    for ann in sorted_anns:
        m = ann['segmentation']
        # Random color with a fixed alpha of 0.65
        color_mask = np.concatenate([np.random.random(3), [0.65]])
        img[m] = color_mask
    ax.imshow(img)
- example image: load the picture
image = cv2.imread('images/lzu.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
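Note that cv2.imread returns None instead of raising an error when the file cannot be read, so a wrong path only shows up later as a confusing failure in cv2.cvtColor. A minimal sketch of a guarded loader follows. The helper name load_rgb_image and its error messages are hypothetical; they are not part of OpenCV or SAM.

```python
from pathlib import Path


def load_rgb_image(path):
    """Load an image as an RGB array, failing early on a bad path.

    Hypothetical helper: the name and error messages are illustrative.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"image not found: {path}")

    import cv2  # imported lazily so the path check works without OpenCV

    image = cv2.imread(str(path))  # returns None if decoding fails
    if image is None:
        raise ValueError(f"OpenCV could not decode: {path}")
    # OpenCV loads channels as BGR; matplotlib expects RGB
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
```

With this helper, the two lines above could be written as image = load_rgb_image('images/lzu.jpg').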
- automatic mask generation
import sys
sys.path.append("..")
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
sam_checkpoint = "sam_vit_b_01ec64.pth"  # model checkpoint file
model_type = "vit_b"
device = "cuda"  # use the GPU
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)
mask_generator = SamAutomaticMaskGenerator(sam)
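The snippet above hard-codes device = "cuda", which fails on a machine without a CUDA-capable GPU. A small sketch of a fallback is shown here, relying only on torch.cuda.is_available(). The helper name pick_device is hypothetical, and running SAM on the CPU should be expected to be much slower.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu".

    Hypothetical helper for illustration; not part of segment_anything.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch SAM cannot run at all; "cpu" is only a placeholder
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"running SAM on: {device}")
```

The result can be passed to sam.to(device=device) exactly as before.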
Note: SamAutomaticMaskGenerator has several adjustable parameters. They control the sampling density and the thresholds used to remove low-quality and duplicate masks. The generator can also be run again on crops of the image to improve performance on smaller objects, and a post-processing step can remove stray pixels and holes in the generated masks.
The parameter settings are as follows:
# SamAutomaticMaskGenerator parameter settings
mask_generator_2 = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=32,
    pred_iou_thresh=0.86,
    stability_score_thresh=0.92,
    crop_n_layers=1,
    crop_n_points_downscale_factor=2,
    min_mask_region_area=100,  # Requires opencv to run post-processing
)
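To get a feel for what these settings cost, the number of point prompts can be estimated by hand. The sketch below assumes the cropping scheme of the reference implementation as we understand it: crop layer i splits the image into 2^i x 2^i overlapping crops, and each crop is sampled on a square grid whose side is points_per_side divided by crop_n_points_downscale_factor^i. The function name count_point_prompts is hypothetical.

```python
def count_point_prompts(points_per_side, crop_n_layers=0,
                        crop_n_points_downscale_factor=1):
    """Estimate how many point prompts the generator evaluates.

    Assumption: crop layer i uses (2**i) ** 2 crops, each sampled on a
    square grid of side points_per_side / downscale_factor**i.
    """
    total = 0
    for layer in range(crop_n_layers + 1):
        n_crops = (2 ** layer) ** 2  # layer 0 is the full image
        side = int(points_per_side / (crop_n_points_downscale_factor ** layer))
        total += n_crops * side * side
    return total


# Default settings: a 32 x 32 grid on the full image only
print(count_point_prompts(32))  # 1024
# Settings of mask_generator_2: full image plus 4 crops with 16 x 16 grids
print(count_point_prompts(32, crop_n_layers=1,
                          crop_n_points_downscale_factor=2))  # 2048
```

Under this assumption the custom settings evaluate about twice as many prompts as the defaults, so a longer run time is to be expected.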
- Call the generate method to generate the masks. To use the generator with custom parameters, mask_generator_2, replace the call below with masks = mask_generator_2.generate(image).
masks = mask_generator.generate(image)
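generate returns a list with one dictionary per mask. Besides the 'segmentation' and 'area' entries used by show_anns, each record in the reference implementation also carries 'bbox' (in XYWH format), 'predicted_iou', 'point_coords', 'stability_score' and 'crop_box'. The sketch below filters records by area and predicted quality. The function name filter_masks, the thresholds and the sample records are made up for illustration.

```python
def filter_masks(masks, min_area=0, min_iou=0.0):
    """Keep mask records that are large enough and confidently predicted."""
    kept = [
        m for m in masks
        if m["area"] >= min_area and m["predicted_iou"] >= min_iou
    ]
    # Largest first, the same order show_anns uses for drawing
    return sorted(kept, key=lambda m: m["area"], reverse=True)


# Made-up records with only the scalar fields needed here; real records
# also hold the boolean 'segmentation' array.
sample = [
    {"area": 5200, "predicted_iou": 0.97, "bbox": [10, 20, 80, 65]},
    {"area": 40, "predicted_iou": 0.99, "bbox": [3, 3, 8, 5]},
    {"area": 900, "predicted_iou": 0.85, "bbox": [50, 50, 30, 30]},
]

large = filter_masks(sample, min_area=100, min_iou=0.9)
print([m["area"] for m in large])  # [5200]
```

On real output the call would look like filter_masks(masks, min_area=500), where 500 is an arbitrary threshold that would need tuning for the image at hand.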
- show image: display the picture with the segmentation masks overlaid
plt.figure(figsize=(20,20))
plt.imshow(image)
show_anns(masks)
plt.axis('off')
plt.show()
3. Comparison of results
1. Urban complex image segmentation: the goal is building extraction
Fig1. Segmentation result with default parameters (mask_generator generator)
Fig2. Segmentation result with custom parameters (mask_generator_2 generator)
2. Segmentation of high-resolution remote sensing images with simple categories: the goal is water extraction
Fig1. Segmentation result with default parameters (mask_generator generator)
Fig2. Segmentation result with custom parameters (mask_generator_2 generator)
4. Conclusion
It can be seen that:
(1) For complex urban images, the default parameters may miss some buildings, while the custom parameters identify them more completely;
(2) For images with simple content, the results with default and custom parameters differ very little;
(3) Comparing Example 1 and Example 2, the model segments complex images (such as cities) less accurately than simple ones;
(4) In short, the model can be used for automatic segmentation, with the results then refined manually. This improves the efficiency of image segmentation and recognition and, compared with manual digitization, can also improve accuracy;
(5) The images in the official SAM examples are also relatively complex, yet our segmentation results are not as good as theirs. The difference is that the official examples all use ordinary photographs, whereas we use high-resolution remote sensing images. Combining the results of Example 1 and Example 2, we guess that this may be related to the pixel characteristics of these two images. A follow-up experiment could compare the segmentation of remote sensing images with that of ordinary photographs.
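The comparison in this post is purely visual. If a reference mask were available (for example, manually digitized building footprints), the accuracy could be quantified with intersection over union (IoU). A minimal sketch follows, assuming both masks are boolean arrays of the same shape. The function name mask_iou and the toy arrays are hypothetical, and no such evaluation was run for the results above.

```python
import numpy as np


def mask_iou(pred, ref):
    """Intersection over union of two boolean masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, ref).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, ref).sum() / union)


# Toy 4 x 4 example: the prediction covers 4 pixels, the reference 6,
# and they share 4, so IoU = 4 / 6
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:3] = True
ref = np.zeros((4, 4), dtype=bool)
ref[1:3, 1:4] = True
print(round(mask_iou(pred, ref), 3))  # 0.667
```

For SAM output, the union of the 'segmentation' arrays of the records selected as buildings or water could serve as pred.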