Image Segmentation: Segment Anything in Practice

1. Model introduction

        The Segment Anything Model (SAM) is an image segmentation model from Meta AI that can segment any object in an image from simple prompts, without requiring task-specific annotated data. This helps researchers and developers in computer vision build segmentation into applications more easily, improving the performance of computer vision systems. SAM's image encoder is a Vision Transformer pre-trained with masked autoencoding (a self-supervised method), and the full model is then trained on the promptable segmentation task using a model-in-the-loop "data engine", so it learns features that generalize to new images and can be applied to segmentation tasks zero-shot.

The Segment Anything model can be used in many application scenarios such as:

  • Self-driving cars: Self-driving cars need to be able to recognize objects such as roads, vehicles, and pedestrians, and segment them. Using the Segment Anything model allows for more accurate object segmentation, improving the performance of self-driving cars.

  • Medical image analysis: Medical images often contain many different types of tissues and organs. Using the Segment Anything model can more accurately segment these tissues and organs, helping doctors better diagnose diseases.

  • Video Surveillance: Video surveillance systems need to be able to identify and track different objects and segment them. Using the Segment Anything model allows for more accurate object segmentation, thereby improving the performance of video surveillance systems.

Compared with traditional image segmentation methods, the advantages and differences of the Segment Anything model mainly include the following points:

  • No task-specific labeled data needed: traditional image segmentation methods require a large amount of labeled data to train a model for each task, while SAM can be applied zero-shot from prompts;

  • Any object can be segmented: traditional image segmentation methods can usually only segment the specific object categories they were trained on;

  • More accurate: compared with traditional image segmentation methods, the Segment Anything model can segment objects in images more accurately;

  • Faster to apply: since no labeled data collection or retraining is needed for a new task, a working segmentation pipeline can be stood up much faster.

2. How to use

        Segment Anything can segment and mask any object in any photo or video with one click, including objects and image types not seen during training. Meta also released a companion dataset, SA-1B, which contains 400 times more masks than any existing segmentation dataset. The model generates high-quality object masks from input prompts and can also generate masks for every object in an image. It was trained on a dataset of 11 million images and 1.1 billion masks, and achieves strong performance on a variety of segmentation tasks.

        To use the Segment Anything model for image segmentation, you can use Meta's (Facebook's) segment-anything library. It is a PyTorch library that provides pre-trained SAM checkpoints in three sizes (vit_b, vit_l, vit_h); these pre-trained models can be used for image segmentation directly and integrated into your application.
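
The library can be installed directly from GitHub (the command below is taken from the official README):

pip install git+https://github.com/facebookresearch/segment-anything.git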

  • Facebook official example: https://github.com/facebookresearch/segment-anything

  • segment-anything online demo: https://segment-anything.com/demo

  • SA-1B dataset: https://ai.meta.com/datasets/segment-anything/

  • SAM paper: https://arxiv.org/abs/2304.02643

3. Code practice

       The Segment Anything Model (SAM) predicts object masks given prompts that indicate which object to segment. The model first encodes the image into an image embedding, and the decoder then generates high-quality masks from user-supplied prompts.

The SamPredictor class provides a simple interface for prompting the model. The user first sets an image with the "set_image" method, which encodes the image into a feature-space embedding. Prompts can then be passed to the "predict" method, which efficiently predicts masks from them. The method accepts point and box prompts, as well as a mask from a previous prediction iteration, as input.

# Load the SAM model
import sys
import numpy as np
import matplotlib.pyplot as plt
import cv2
sys.path.append("..")
from segment_anything import sam_model_registry, SamPredictor

# Download a checkpoint from the official repo; there are three model sizes, any one works
sam_checkpoint = "sam_vit_b_01ec64.pth"

model_type = "vit_b"

device = "cuda"

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)

predictor = SamPredictor(sam)

# Read the test image; OpenCV loads BGR, so convert to RGB
image = cv2.imread('dog11.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Encode the image into an embedding feature vector used for all subsequent mask predictions
predictor.set_image(image)
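
The snippets below use the plotting helpers show_mask, show_points, and show_box, which the original post does not define; the versions below follow the helpers in the official example notebook.

def show_mask(mask, ax, random_color=False):
    # Overlay a single mask on the axes in a translucent color
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30/255, 144/255, 255/255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

def show_points(coords, labels, ax, marker_size=375):
    # Draw foreground points as green stars, background points as red stars
    pos_points = coords[labels == 1]
    neg_points = coords[labels == 0]
    ax.scatter(pos_points[:, 0], pos_points[:, 1], color='green', marker='*',
               s=marker_size, edgecolor='white', linewidth=1.25)
    ax.scatter(neg_points[:, 0], neg_points[:, 1], color='red', marker='*',
               s=marker_size, edgecolor='white', linewidth=1.25)

def show_box(box, ax):
    # Draw an XYXY box as a green rectangle outline
    x0, y0 = box[0], box[1]
    w, h = box[2] - box[0], box[3] - box[1]
    ax.add_patch(plt.Rectangle((x0, y0), w, h, edgecolor='green',
                               facecolor=(0, 0, 0, 0), lw=2))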

1. Single-point input mask prediction

A single point is input to the model in (x, y) format with label 1 (foreground point) or 0 (background point); multiple points can be entered, and the selected points appear as stars on the image. With multimask_output=True, SAM outputs 3 candidate masks, where "scores" gives the model's own estimate of each mask's quality. The call returns the masks (masks), their quality scores (scores), and low-resolution mask logits (logits) that can be passed to the next prediction iteration; the entry with the highest value in "scores" is the best mask.

input_point = np.array([[250, 187]])
input_label = np.array([1])

masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True,
)

print(masks.shape)  # (number_of_masks) x H x W  | output (3, 600, 900)

for i, (mask, score) in enumerate(zip(masks, scores)):
    plt.figure(figsize=(10,10))
    plt.imshow(image)
    show_mask(mask, plt.gca())
    show_points(input_point, input_label, plt.gca())
    plt.title(f"Mask {i+1}, Score: {score:.3f}", fontsize=18)
    plt.axis('off')
    plt.show()   
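
Since the best mask is simply the candidate with the highest score, it can be selected programmatically (the names best_idx and best_mask below are my own, not from the library):

best_idx = int(np.argmax(scores))
best_mask = masks[best_idx]  # the candidate the model itself rates highest
print(f"Best mask: {best_idx + 1}, score: {scores[best_idx]:.3f}")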

Model output:

2. Multi-point input mask prediction

        Multiple points are input in (x, y) format with labels 1 (foreground point) or 0 (background point), and the selected points appear as stars on the image. The low-resolution logits from the previous prediction can be passed back through "mask_input" to refine the result. When several prompts specify a single object, a single mask can be requested by setting "multimask_output=False".

# (2) Multi-point input to generate a mask
input_point = np.array([[250, 284], [362, 422]])
input_label = np.array([1, 1])

mask_input = logits[np.argmax(scores), :, :]  # choose the best low-res mask from the previous prediction

masks, _, _ = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    mask_input=mask_input[None, :, :],
    multimask_output=False,
)
print(masks.shape) #output: (1, 600, 900)

plt.figure(figsize=(10,10))
plt.imshow(image)
show_mask(masks, plt.gca())
show_points(input_point, input_label, plt.gca())
plt.axis('off')
plt.show() 
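
The label 0 (background point) option is not exercised above; a minimal sketch of carving a region out of a mask with a background point (the coordinates here are purely illustrative):

# A foreground point on the object plus a background point on a region to exclude
input_point = np.array([[250, 284], [100, 100]])  # second point is illustrative
input_label = np.array([1, 0])                    # 1 = foreground, 0 = background

masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=False,
)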

Model output:

3. Box input mask prediction

        Segment Anything supports taking a box in XYXY format as input and segmenting the main object inside the box (similar to instance segmentation).

input_box = np.array([70, 140, 500, 610])
masks, _, _ = predictor.predict(
    point_coords=None,
    point_labels=None,
    box=input_box[None, :],
    multimask_output=False,
)
plt.figure(figsize=(10, 10))
plt.imshow(image)
show_mask(masks[0], plt.gca())
show_box(input_box, plt.gca())
plt.axis('off')
plt.show()
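
Box and point prompts can also be combined in a single call, for example a box plus a background point to exclude part of the boxed region, following the pattern in the official example notebook (the point coordinate is illustrative):

input_box = np.array([70, 140, 500, 610])
input_point = np.array([[300, 400]])  # illustrative: a point inside the box to exclude
input_label = np.array([0])           # 0 = background

masks, _, _ = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=input_box,
    multimask_output=False,
)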

Model output:

4. Automatic mask generation

SamAutomaticMaskGenerator samples a grid of points across the image, predicts masks at each point, and then filters duplicate and low-quality masks, so it can generate masks for all objects in an image without any manual prompts.

import numpy as np
import torch
import matplotlib.pyplot as plt
import cv2
import sys
sys.path.append("..")
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor

# Overlay every mask on the current axes, largest first, each in a random translucent color
def show_anns(anns):
    if len(anns) == 0:
        return
    sorted_anns = sorted(anns, key=(lambda x: x['area']), reverse=True)
    ax = plt.gca()
    ax.set_autoscale_on(False)

    img = np.ones((sorted_anns[0]['segmentation'].shape[0], sorted_anns[0]['segmentation'].shape[1], 4))
    img[:,:,3] = 0
    for ann in sorted_anns:
        m = ann['segmentation']
        color_mask = np.concatenate([np.random.random(3), [0.35]])
        img[m] = color_mask
    ax.imshow(img)

image = cv2.imread('dog11.jpg')
image = cv2.resize(image,None,fx=0.5,fy=0.5)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

sam_checkpoint = "sam_vit_b_01ec64.pth"
model_type = "vit_b"

device = "cuda"

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)
# Automatically generate sampling points to segment the whole image
mask_generator = SamAutomaticMaskGenerator(sam)

masks = mask_generator.generate(image)

print(len(masks))       # number of masks generated
print(masks[0].keys())  # per-mask records: segmentation, area, bbox, predicted_iou, point_coords, stability_score, crop_box
print(masks[0])

plt.figure(figsize=(16,16))
plt.imshow(image)
show_anns(masks)
plt.axis('off')
plt.show() 
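
SamAutomaticMaskGenerator also exposes tuning parameters for the point grid and quality filtering; a sketch with stricter settings, using the parameter names from the official repository (the values here are illustrative):

mask_generator_tuned = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=32,            # density of the sampled point grid
    pred_iou_thresh=0.88,          # keep only masks the model itself rates highly
    stability_score_thresh=0.95,   # keep only masks stable under threshold changes
    crop_n_layers=1,               # also run on image crops to catch small objects
    min_mask_region_area=100,      # drop tiny regions and holes (requires OpenCV)
)
masks_tuned = mask_generator_tuned.generate(image)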

Model output:

OK, my recent work involves some image segmentation applications, and among large vision models, Meta's Segment Anything is a must to try in practice. Segment Anything can be used for image segmentation in a variety of scenarios and supports multiple prompting methods, so you can tailor it to your own application scenario. Everyone is welcome to communicate~
