[Computer Vision | Object Detection | Image Segmentation] Grounding DINO + Segment Anything Model (SAM): Source Code Walkthrough (full source code included)

1. Grounding DINO + Segment Anything Model (SAM)

In this tutorial, we will learn how to automatically annotate images with two breakthrough models - Grounding DINO and the Segment Anything Model (SAM).

We can then use the resulting dataset to train a real-time object detection or instance segmentation model. Annotating images with polygons the traditional way is extremely time-consuming and expensive. With Grounding DINO and SAM, the initial annotation takes only a few minutes, and our work is reduced to manually verifying the labels we obtain.


1.1 Getting Started

Let's make sure we have access to a GPU. We can check with the nvidia-smi command.

!nvidia-smi
Tue Sep 19 21:10:12 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   53C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Note: To make managing datasets, images, and models easier, we create a HOME constant.

import os
HOME = os.getcwd()
print("HOME:", HOME)

The output is:

HOME: /content

1.2 Install Grounding DINO and Segment Anything Model

Our project uses two breakthrough models: Grounding DINO for zero-shot detection and the Segment Anything Model (SAM) for converting boxes into segmentation masks. We must install them first.

%cd {HOME}
!git clone https://github.com/IDEA-Research/GroundingDINO.git
%cd {HOME}/GroundingDINO
!git checkout -q 57535c5a79791cb76e36fdb64975271354f10251
!pip install -q -e .

Partial output:

/content
Cloning into 'GroundingDINO'...
remote: Enumerating objects: 267, done.
remote: Counting objects: 100% (41/41), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 267 (delta 18), reused 16 (delta 16), pack-reused 226
Receiving objects: 100% (267/267), 12.33 MiB | 22.16 MiB/s, done.
Resolving deltas: 100% (122/122), done.
/content/GroundingDINO
  Preparing metadata (setup.py) ... done
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 40.4 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.9/200.9 kB 14.2 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 549.1/549.1 kB 32.1 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.1/200.1 kB 16.4 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 62.7 MB/s eta 0:00:00
%cd {HOME}

import sys
!{sys.executable} -m pip install 'git+https://github.com/facebookresearch/segment-anything.git'

Partial output:

/content
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/facebookresearch/segment-anything.git
  Cloning https://github.com/facebookresearch/segment-anything.git to /tmp/pip-req-build-e2789hn3
  Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/segment-anything.git /tmp/pip-req-build-e2789hn3
  Resolved https://github.com/facebookresearch/segment-anything.git to commit 567662b0fd33ca4b022d94d3b8de896628cd32dd
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: segment-anything
  Building wheel for segment-anything (setup.py) ... done
  Created wheel for segment-anything: filename=segment_anything-1.0-py3-none-any.whl size=36610 sha256=ed90a8a550d5948a1b3b9c5f2a173970657a518009512fb20e89dc5ec5240160
  Stored in directory: /tmp/pip-ephem-wheel-cache-xtm37yc6/wheels/d5/11/03/7aca746a2c0e09f279b10436ced7175926bc38f650b736a648
Successfully built segment-anything
Installing collected packages: segment-anything
Successfully installed segment-anything-1.0

Note: To glue all the pieces of this demo together, we will use the supervision pip package, which helps us process, filter, and visualize our detections, as well as save our dataset.

However, in this demo we need functionality introduced in the latest release. Therefore, we uninstall the currently installed supervision version and install version 0.6.0. A short usage sketch follows the install output below.

!pip uninstall -y supervision
!pip install -q supervision==0.6.0

import supervision as sv
print(sv.__version__)

The output is:

Found existing installation: supervision 0.4.0
Uninstalling supervision-0.4.0:
  Successfully uninstalled supervision-0.4.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
groundingdino 0.1.0 requires supervision==0.4.0, but you have supervision 0.6.0 which is incompatible.
0.6.0
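
To illustrate what supervision gives us, here is a minimal, self-contained sketch (not part of the original notebook; the boxes and class ids are made up purely for illustration) showing how detections are represented, filtered by confidence, and drawn onto an image with the same 0.6.0 API used later in this post:

import numpy as np
import supervision as sv

# dummy canvas and hand-made detections, purely for illustration
image = np.zeros((480, 640, 3), dtype=np.uint8)
detections = sv.Detections(
    xyxy=np.array([[50, 60, 200, 220], [300, 100, 450, 300]], dtype=float),
    confidence=np.array([0.82, 0.31]),
    class_id=np.array([0, 1])
)

# keep only confident detections, then draw labeled boxes
detections = detections[detections.confidence > 0.5]
labels = [
    f"class {class_id} {confidence:0.2f}"
    for _, _, confidence, class_id, _
    in detections]
annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections, labels=labels)
sv.plot_image(annotated, (8, 8))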

Let's install the roboflow pip package.

!pip install -q roboflow

Partial output:

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.2/56.2 kB 2.6 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.8/67.8 kB 8.0 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.5/54.5 kB 4.9 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.8/58.8 kB 6.2 MB/s eta 0:00:00
  Building wheel for wget (setup.py) ... done

1.3 Download Grounding DINO Model Weights

To run Grounding DINO we need two files - the config and the model weights. The config file is part of the Grounding DINO repository that we have already cloned. The weights file, on the other hand, needs to be downloaded. We write the paths to both files into the GROUNDING_DINO_CONFIG_PATH and GROUNDING_DINO_CHECKPOINT_PATH variables and verify that the paths are correct and the files exist on disk.

import os

GROUNDING_DINO_CONFIG_PATH = os.path.join(HOME, "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py")
print(GROUNDING_DINO_CONFIG_PATH, "; exist:", os.path.isfile(GROUNDING_DINO_CONFIG_PATH))

The output is:

/content/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py ; exist: True

%cd {HOME}
!mkdir -p {HOME}/weights
%cd {HOME}/weights

!wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

The output is:

/content
/content/weights

import os

GROUNDING_DINO_CHECKPOINT_PATH = os.path.join(HOME, "weights", "groundingdino_swint_ogc.pth")
print(GROUNDING_DINO_CHECKPOINT_PATH, "; exist:", os.path.isfile(GROUNDING_DINO_CHECKPOINT_PATH))

The output is:

/content/weights/groundingdino_swint_ogc.pth ; exist: True

1.4 Download Segment Anything Model (SAM) Weights

As with Grounding DINO, to run SAM we need a weights file, which we first have to download. We write the path of the local weights file into the SAM_CHECKPOINT_PATH variable and verify that the path is correct and the file exists on disk.

%cd {HOME}
!mkdir -p {HOME}/weights
%cd {HOME}/weights

!wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

The output is:

/content
/content/weights

import os

SAM_CHECKPOINT_PATH = os.path.join(HOME, "weights", "sam_vit_h_4b8939.pth")
print(SAM_CHECKPOINT_PATH, "; exist:", os.path.isfile(SAM_CHECKPOINT_PATH))

The output is:

/content/weights/sam_vit_h_4b8939.pth ; exist: True

1.5 Load models

import torch

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

1.5.1 Load Grounding DINO Model

%cd {HOME}/GroundingDINO

from groundingdino.util.inference import Model

grounding_dino_model = Model(model_config_path=GROUNDING_DINO_CONFIG_PATH, model_checkpoint_path=GROUNDING_DINO_CHECKPOINT_PATH)


1.5.2 Load Segment Anything Model (SAM)

SAM_ENCODER_VERSION = "vit_h"
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry[SAM_ENCODER_VERSION](checkpoint=SAM_CHECKPOINT_PATH).to(device=DEVICE)
sam_predictor = SamPredictor(sam)

1.6 Download Example Data

f"{
      
      HOME}/data"

The output is:

/content/data

%cd {HOME}
!mkdir {HOME}/data
%cd {HOME}/data

!wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-2.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-3.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-4.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-5.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-6.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-7.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-8.jpeg

The output is:

/content
/content/data

1.7 Single Image Mask Auto Annotation

Before we auto-annotate the entire dataset, let's focus on a single image first.

SOURCE_IMAGE_PATH = f"{
      
      HOME}/data/dog-3.jpeg"
CLASSES = ['car', 'dog', 'person', 'nose', 'chair', 'shoe', 'ear']
BOX_TRESHOLD = 0.35   # minimum confidence for a predicted box to be kept
TEXT_TRESHOLD = 0.25  # minimum score for matching a text phrase to a box

1.7.1 Zero-Shot Object Detection with Grounding DINO

Note: To get better Grounding DINO detections, we apply a bit of prompt engineering via the enhance_class_name function defined below.

from typing import List

def enhance_class_name(class_names: List[str]) -> List[str]:
    return [
        f"all {class_name}s"
        for class_name
        in class_names
    ]
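
As a quick sanity check of this helper (a snippet added here for illustration, not part of the original notebook), calling it on a couple of class names simply pluralizes the prompts:

print(enhance_class_name(class_names=['car', 'dog']))
# prints: ['all cars', 'all dogs']
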
import cv2
import supervision as sv

# load image
image = cv2.imread(SOURCE_IMAGE_PATH)

# detect objects
detections = grounding_dino_model.predict_with_classes(
    image=image,
    classes=enhance_class_name(class_names=CLASSES),
    box_threshold=BOX_TRESHOLD,
    text_threshold=TEXT_TRESHOLD
)

# annotate image with detections
box_annotator = sv.BoxAnnotator()
# each detection iterates as (xyxy, mask, confidence, class_id, tracker_id)
labels = [
    f"{CLASSES[class_id]} {confidence:0.2f}"
    for _, _, confidence, class_id, _
    in detections]
annotated_frame = box_annotator.annotate(scene=image.copy(), detections=detections, labels=labels)

%matplotlib inline
sv.plot_image(annotated_frame, (16, 16))

The output is the annotated detection image.

1.7.2 Prompting SAM with detected boxes

import numpy as np
from segment_anything import SamPredictor


def segment(sam_predictor: SamPredictor, image: np.ndarray, xyxy: np.ndarray) -> np.ndarray:
    # embed the image once, then prompt SAM with each detected box
    sam_predictor.set_image(image)
    result_masks = []
    for box in xyxy:
        masks, scores, logits = sam_predictor.predict(
            box=box,
            multimask_output=True
        )
        # SAM returns several candidate masks; keep the highest-scoring one
        index = np.argmax(scores)
        result_masks.append(masks[index])
    return np.array(result_masks)
import cv2

# convert detections to masks
detections.mask = segment(
    sam_predictor=sam_predictor,
    image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB),
    xyxy=detections.xyxy
)

# annotate image with detections
box_annotator = sv.BoxAnnotator()
mask_annotator = sv.MaskAnnotator()
labels = [
    f"{CLASSES[class_id]} {confidence:0.2f}"
    for _, _, confidence, class_id, _
    in detections]
annotated_image = mask_annotator.annotate(scene=image.copy(), detections=detections)
annotated_image = box_annotator.annotate(scene=annotated_image, detections=detections, labels=labels)

%matplotlib inline
sv.plot_image(annotated_image, (16, 16))


import math

grid_size_dimension = math.ceil(math.sqrt(len(detections.mask)))

titles = [
    CLASSES[class_id]
    for class_id
    in detections.class_id
]

sv.plot_images_grid(
    images=detections.mask,
    titles=titles,
    grid_size=(grid_size_dimension, grid_size_dimension),
    size=(16, 16)
)


1.8 Full Dataset Mask Auto Annotation

import os

IMAGES_DIRECTORY = os.path.join(HOME, 'data')
IMAGES_EXTENSIONS = ['jpg', 'jpeg', 'png']

CLASSES = ['car', 'dog', 'person', 'nose', 'chair', 'shoe', 'ear', 'coffee', 'backpack', 'cap']
BOX_TRESHOLD = 0.35
TEXT_TRESHOLD = 0.25

1.8.1 Extract labels from images

import cv2
from tqdm.notebook import tqdm

images = {}
annotations = {}

image_paths = sv.list_files_with_extensions(
    directory=IMAGES_DIRECTORY, 
    extensions=IMAGES_EXTENSIONS)

for image_path in tqdm(image_paths):
    image_name = image_path.name
    image_path = str(image_path)
    image = cv2.imread(image_path)

    detections = grounding_dino_model.predict_with_classes(
        image=image,
        classes=enhance_class_name(class_names=CLASSES),
        box_threshold=BOX_TRESHOLD,
        text_threshold=TEXT_TRESHOLD
    )
    # drop detections that Grounding DINO could not match to any class prompt
    detections = detections[detections.class_id != None]
    detections.mask = segment(
        sam_predictor=sam_predictor,
        image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB),
        xyxy=detections.xyxy
    )
    images[image_name] = image
    annotations[image_name] = detections

Note: Before saving the detections in Pascal VOC XML format, let's take a look at the annotations we obtained. This step is optional; feel free to skip it.

plot_images = []
plot_titles = []

box_annotator = sv.BoxAnnotator()
mask_annotator = sv.MaskAnnotator()

for image_name, detections in annotations.items():
    image = images[image_name]
    plot_images.append(image)
    plot_titles.append(image_name)

    labels = [
        f"{CLASSES[class_id]} {confidence:0.2f}"
        for _, _, confidence, class_id, _
        in detections]
    annotated_image = mask_annotator.annotate(scene=image.copy(), detections=detections)
    annotated_image = box_annotator.annotate(scene=annotated_image, detections=detections, labels=labels)
    plot_images.append(annotated_image)
    title = " ".join(set([
        CLASSES[class_id]
        for class_id
        in detections.class_id
    ]))
    plot_titles.append(title)

sv.plot_images_grid(
    images=plot_images,
    titles=plot_titles,
    grid_size=(len(annotations), 2),
    size=(2 * 4, len(annotations) * 4)
)


1.8.2 Save labels in Pascal VOC XML

Before uploading the annotations to Roboflow, we first have to save them to disk. To do this, we will use one of the newest supervision features, introduced in the 0.6.0 update - dataset saving.

ANNOTATIONS_DIRECTORY = os.path.join(HOME, 'annotations')

MIN_IMAGE_AREA_PERCENTAGE = 0.002
MAX_IMAGE_AREA_PERCENTAGE = 0.80
APPROXIMATION_PERCENTAGE = 0.75
sv.Dataset(
    classes=CLASSES,
    images=images,
    annotations=annotations
).as_pascal_voc(
    annotations_directory_path=ANNOTATIONS_DIRECTORY,
    min_image_area_percentage=MIN_IMAGE_AREA_PERCENTAGE,
    max_image_area_percentage=MAX_IMAGE_AREA_PERCENTAGE,
    approximation_percentage=APPROXIMATION_PERCENTAGE
)
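
As an optional sanity check (not in the original post), we can list the generated files to confirm that one XML annotation was written per image:

import os
print(sorted(os.listdir(ANNOTATIONS_DIRECTORY)))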

1.8.3 Upload annotations to Roboflow

Now we are ready to upload the annotations to Roboflow using the API.

PROJECT_NAME = "auto-generated-dataset-7"
PROJECT_DESCRIPTION = "auto-generated-dataset-7"
import roboflow
from roboflow import Roboflow

roboflow.login()

workspace = Roboflow().workspace()
new_project = workspace.create_project(
    project_name=PROJECT_NAME,
    project_license="MIT",
    project_type="instance-segmentation", 
    annotation=PROJECT_DESCRIPTION)

Partial output:

visit https://app.roboflow.com/auth-cli to get your authentication token.
Paste the authentication token here: ··········
loading Roboflow workspace...
loading Roboflow project...

import os

for image_path in tqdm(image_paths):
    image_name = image_path.name
    annotation_name = f"{
      
      image_path.stem}.xml"
    image_path = str(image_path)
    annotation_path = os.path.join(ANNOTATIONS_DIRECTORY, annotation_name)
    new_project.upload(
        image_path=image_path, 
        annotation_path=annotation_path, 
        split="train", 
        is_prediction=True, 
        overwrite=True, 
        tag_names=["auto-annotated-with-grounded-sam"],
        batch_name="auto-annotated-with-grounded-sam"
    )


1.8.4 Convert Object Detection to Instance Segmentation Dataset

If you already have a dataset but it only comes with bounding-box annotations, you can quickly and easily convert it into an instance segmentation dataset: SAM accepts boxes as prompts (the loop in section 1.8.7 below does exactly this).

1.8.5 Download Object Detection Dataset from Roboflow

To use the dataset in this tutorial, make sure you download it in Pascal VOC XML format. We will use the BlueBerries dataset as an example.

%cd {HOME}

import roboflow
from roboflow import Roboflow

roboflow.login()

rf = Roboflow()

project = rf.workspace("inkyu-sa-e0c78").project("blueberries-u0e84")
dataset = project.version(1).download("voc")


dataset.location

The location is: /content/BlueBerries-1

!ls {dataset.location}

The output is: README.dataset.txt README.roboflow.txt test train valid

1.8.6 Load and Visualize Object Detection Dataset with Supervision

object_detection_dataset = sv.Dataset.from_pascal_voc(
    images_directory_path=f"{dataset.location}/train",
    annotations_directory_path=f"{dataset.location}/train"
)
import random
random.seed(9001)

Note: Re-run the cell below to see a different image from the dataset.

image_names = list(object_detection_dataset.images.keys())
image_name = random.choice(image_names)

image = object_detection_dataset.images[image_name]
detections = object_detection_dataset.annotations[image_name]

box_annotator = sv.BoxAnnotator()

annotated_image = box_annotator.annotate(scene=image.copy(), detections=detections, skip_label=True)

%matplotlib inline
sv.plot_image(annotated_image, (16, 16))


1.8.7 Run SAM to Convert Boxes into Masks

from tqdm.notebook import tqdm

for image_name, image in tqdm(object_detection_dataset.images.items()):
    detections = object_detection_dataset.annotations[image_name]
    detections.mask = segment(
        sam_predictor=sam_predictor,
        image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB),
        xyxy=detections.xyxy
    )


image_names = list(object_detection_dataset.images.keys())
image_name = random.choice(image_names)

image = object_detection_dataset.images[image_name]
detections = object_detection_dataset.annotations[image_name]

mask_annotator = sv.MaskAnnotator()

annotated_image = mask_annotator.annotate(scene=image.copy(), detections=detections)

%matplotlib inline
sv.plot_image(annotated_image, (16, 16))

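The original post stops at visualizing the converted masks. If you also want to persist them, the same supervision 0.6.0 dataset-saving call from section 1.8.2 should work here as well - a sketch, with the output directory name being an assumption:

import os

# assumed output directory for the converted annotations (not in the original post)
CONVERTED_ANNOTATIONS_DIRECTORY = os.path.join(HOME, 'converted-annotations')

object_detection_dataset.as_pascal_voc(
    annotations_directory_path=CONVERTED_ANNOTATIONS_DIRECTORY,
    min_image_area_percentage=MIN_IMAGE_AREA_PERCENTAGE,
    max_image_area_percentage=MAX_IMAGE_AREA_PERCENTAGE,
    approximation_percentage=APPROXIMATION_PERCENTAGE
)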

Reposted from blog.csdn.net/wzk4869/article/details/133064328