[Computer Vision | Object Detection | Image Segmentation] Grounding DINO + Segment Anything Model (SAM): Source Code Walkthrough (full source code included)

1. Grounding DINO + Segment Anything Model (SAM)

In this tutorial, we will learn how to automatically annotate images with two breakthrough models - Grounding DINO and the Segment Anything Model (SAM).

We can then use the resulting dataset to train a real-time object detection or instance segmentation model. Annotating images with polygons the traditional way is extremely time-consuming and expensive. With Grounding DINO and SAM, the initial annotation takes only a few minutes, and our work is reduced to manually verifying the labels we obtain.


1.1 Getting Started

Let's make sure we have access to a GPU. We can check with the nvidia-smi command.

!nvidia-smi
Tue Sep 19 21:10:12 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   53C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Note: To make managing datasets, images, and models easier, we create a HOME constant.

import os
HOME = os.getcwd()
print("HOME:", HOME)

The output is:

HOME: /content

1.2 Install Grounding DINO and Segment Anything Model

Our project uses two breakthrough models: Grounding DINO for zero-shot detection and the Segment Anything Model (SAM) for converting boxes into segmentation masks. We must install them first.

%cd {HOME}
!git clone https://github.com/IDEA-Research/GroundingDINO.git
%cd {HOME}/GroundingDINO
!git checkout -q 57535c5a79791cb76e36fdb64975271354f10251
!pip install -q -e .

Partial output:

/content
Cloning into 'GroundingDINO'...
remote: Enumerating objects: 267, done.
remote: Counting objects: 100% (41/41), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 267 (delta 18), reused 16 (delta 16), pack-reused 226
Receiving objects: 100% (267/267), 12.33 MiB | 22.16 MiB/s, done.
Resolving deltas: 100% (122/122), done.
/content/GroundingDINO
  Preparing metadata (setup.py) ... done
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 40.4 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.9/200.9 kB 14.2 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 549.1/549.1 kB 32.1 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.1/200.1 kB 16.4 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 62.7 MB/s eta 0:00:00
%cd {HOME}

import sys
!{sys.executable} -m pip install 'git+https://github.com/facebookresearch/segment-anything.git'

Partial output:

/content
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/facebookresearch/segment-anything.git
  Cloning https://github.com/facebookresearch/segment-anything.git to /tmp/pip-req-build-e2789hn3
  Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/segment-anything.git /tmp/pip-req-build-e2789hn3
  Resolved https://github.com/facebookresearch/segment-anything.git to commit 567662b0fd33ca4b022d94d3b8de896628cd32dd
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: segment-anything
  Building wheel for segment-anything (setup.py) ... done
  Created wheel for segment-anything: filename=segment_anything-1.0-py3-none-any.whl size=36610 sha256=ed90a8a550d5948a1b3b9c5f2a173970657a518009512fb20e89dc5ec5240160
  Stored in directory: /tmp/pip-ephem-wheel-cache-xtm37yc6/wheels/d5/11/03/7aca746a2c0e09f279b10436ced7175926bc38f650b736a648
Successfully built segment-anything
Installing collected packages: segment-anything
Successfully installed segment-anything-1.0

Note: To glue all the pieces of this demo together, we will use the supervision pip package, which helps us process, filter, and visualize our detections, as well as save our dataset.

However, in this demo we need functionality introduced in the latest release. Therefore, we uninstall the currently installed supervision version and install version 0.6.0. A short usage sketch follows the install output below.

!pip uninstall -y supervision
!pip install -q supervision==0.6.0

import supervision as sv
print(sv.__version__)

The output is:

Found existing installation: supervision 0.4.0
Uninstalling supervision-0.4.0:
  Successfully uninstalled supervision-0.4.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
groundingdino 0.1.0 requires supervision==0.4.0, but you have supervision 0.6.0 which is incompatible.
0.6.0
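
To illustrate what supervision gives us, here is a minimal, self-contained sketch (not part of the original notebook; the boxes and class ids are made up purely for illustration) showing how detections are represented, filtered by confidence, and drawn onto an image with the same 0.6.0 API used later in this post:

import numpy as np
import supervision as sv

# dummy canvas and hand-made detections, purely for illustration
image = np.zeros((480, 640, 3), dtype=np.uint8)
detections = sv.Detections(
    xyxy=np.array([[50, 60, 200, 220], [300, 100, 450, 300]], dtype=float),
    confidence=np.array([0.82, 0.31]),
    class_id=np.array([0, 1])
)

# keep only confident detections, then draw labeled boxes
detections = detections[detections.confidence > 0.5]
labels = [
    f"class {class_id} {confidence:0.2f}"
    for _, _, confidence, class_id, _
    in detections]
annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections, labels=labels)
sv.plot_image(annotated, (8, 8))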

Let's install the roboflow pip package.

!pip install -q roboflow

Partial output:

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.2/56.2 kB 2.6 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.8/67.8 kB 8.0 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.5/54.5 kB 4.9 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.8/58.8 kB 6.2 MB/s eta 0:00:00
  Building wheel for wget (setup.py) ... done

1.3 Download Grounding DINO Model Weights

To run Grounding DINO we need two files - the config and the model weights. The config file is part of the Grounding DINO repository that we have already cloned. The weights file, on the other hand, needs to be downloaded. We write the paths to both files into the GROUNDING_DINO_CONFIG_PATH and GROUNDING_DINO_CHECKPOINT_PATH variables and verify that the paths are correct and the files exist on disk.

import os

GROUNDING_DINO_CONFIG_PATH = os.path.join(HOME, "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py")
print(GROUNDING_DINO_CONFIG_PATH, "; exist:", os.path.isfile(GROUNDING_DINO_CONFIG_PATH))

The output is:

/content/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py ; exist: True

%cd {HOME}
!mkdir -p {HOME}/weights
%cd {HOME}/weights

!wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

The output is:

/content
/content/weights

import os

GROUNDING_DINO_CHECKPOINT_PATH = os.path.join(HOME, "weights", "groundingdino_swint_ogc.pth")
print(GROUNDING_DINO_CHECKPOINT_PATH, "; exist:", os.path.isfile(GROUNDING_DINO_CHECKPOINT_PATH))

The output is:

/content/weights/groundingdino_swint_ogc.pth ; exist: True

1.4 Download Segment Anything Model (SAM) Weights

As with Grounding DINO, to run SAM we need a weights file, which we first have to download. We write the path of the local weights file into the SAM_CHECKPOINT_PATH variable and verify that the path is correct and the file exists on disk.

%cd {HOME}
!mkdir -p {HOME}/weights
%cd {HOME}/weights

!wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

The output is:

/content
/content/weights

import os

SAM_CHECKPOINT_PATH = os.path.join(HOME, "weights", "sam_vit_h_4b8939.pth")
print(SAM_CHECKPOINT_PATH, "; exist:", os.path.isfile(SAM_CHECKPOINT_PATH))

The output is:

/content/weights/sam_vit_h_4b8939.pth ; exist: True

1.5 Load models

import torch

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

1.5.1 Load Grounding DINO Model

%cd {HOME}/GroundingDINO

from groundingdino.util.inference import Model

grounding_dino_model = Model(model_config_path=GROUNDING_DINO_CONFIG_PATH, model_checkpoint_path=GROUNDING_DINO_CHECKPOINT_PATH)


1.5.2 Load Segment Anything Model (SAM)

SAM_ENCODER_VERSION = "vit_h"
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry[SAM_ENCODER_VERSION](checkpoint=SAM_CHECKPOINT_PATH).to(device=DEVICE)
sam_predictor = SamPredictor(sam)

1.6 Download Example Data

f"{
      
      HOME}/data"

The output is:

/content/data

%cd {HOME}
!mkdir {HOME}/data
%cd {HOME}/data

!wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-2.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-3.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-4.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-5.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-6.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-7.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-8.jpeg

The output is:

/content
/content/data

1.7 Single Image Mask Auto Annotation

Before we auto-annotate the entire dataset, let's focus on a single image first.

SOURCE_IMAGE_PATH = f"{
      
      HOME}/data/dog-3.jpeg"
CLASSES = ['car', 'dog', 'person', 'nose', 'chair', 'shoe', 'ear']
BOX_TRESHOLD = 0.35   # minimum confidence for a predicted box to be kept
TEXT_TRESHOLD = 0.25  # minimum score for matching a text phrase to a box

1.7.1 Zero-Shot Object Detection with Grounding DINO

Note: To get better Grounding DINO detections, we apply a bit of prompt engineering via the enhance_class_name function defined below.

from typing import List

def enhance_class_name(class_names: List[str]) -> List[str]:
    return [
        f"all {class_name}s"
        for class_name
        in class_names
    ]
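
As a quick sanity check of this helper (a snippet added here for illustration, not part of the original notebook), calling it on a couple of class names simply pluralizes the prompts:

print(enhance_class_name(class_names=['car', 'dog']))
# prints: ['all cars', 'all dogs']
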
import cv2
import supervision as sv

# load image
image = cv2.imread(SOURCE_IMAGE_PATH)

# detect objects
detections = grounding_dino_model.predict_with_classes(
    image=image,
    classes=enhance_class_name(class_names=CLASSES),
    box_threshold=BOX_TRESHOLD,
    text_threshold=TEXT_TRESHOLD
)

# annotate image with detections
box_annotator = sv.BoxAnnotator()
# each detection iterates as (xyxy, mask, confidence, class_id, tracker_id)
labels = [
    f"{CLASSES[class_id]} {confidence:0.2f}"
    for _, _, confidence, class_id, _
    in detections]
annotated_frame = box_annotator.annotate(scene=image.copy(), detections=detections, labels=labels)

%matplotlib inline
sv.plot_image(annotated_frame, (16, 16))

The output is the annotated detection image.

1.7.2 Prompting SAM with detected boxes

import numpy as np
from segment_anything import SamPredictor


def segment(sam_predictor: SamPredictor, image: np.ndarray, xyxy: np.ndarray) -> np.ndarray:
    # embed the image once, then prompt SAM with each detected box
    sam_predictor.set_image(image)
    result_masks = []
    for box in xyxy:
        masks, scores, logits = sam_predictor.predict(
            box=box,
            multimask_output=True
        )
        # SAM returns several candidate masks; keep the highest-scoring one
        index = np.argmax(scores)
        result_masks.append(masks[index])
    return np.array(result_masks)
import cv2

# convert detections to masks
detections.mask = segment(
    sam_predictor=sam_predictor,
    image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB),
    xyxy=detections.xyxy
)

# annotate image with detections
box_annotator = sv.BoxAnnotator()
mask_annotator = sv.MaskAnnotator()
labels = [
    f"{CLASSES[class_id]} {confidence:0.2f}"
    for _, _, confidence, class_id, _
    in detections]
annotated_image = mask_annotator.annotate(scene=image.copy(), detections=detections)
annotated_image = box_annotator.annotate(scene=annotated_image, detections=detections, labels=labels)

%matplotlib inline
sv.plot_image(annotated_image, (16, 16))


import math

grid_size_dimension = math.ceil(math.sqrt(len(detections.mask)))

titles = [
    CLASSES[class_id]
    for class_id
    in detections.class_id
]

sv.plot_images_grid(
    images=detections.mask,
    titles=titles,
    grid_size=(grid_size_dimension, grid_size_dimension),
    size=(16, 16)
)


1.8 Full Dataset Mask Auto Annotation

import os

IMAGES_DIRECTORY = os.path.join(HOME, 'data')
IMAGES_EXTENSIONS = ['jpg', 'jpeg', 'png']

CLASSES = ['car', 'dog', 'person', 'nose', 'chair', 'shoe', 'ear', 'coffee', 'backpack', 'cap']
BOX_TRESHOLD = 0.35
TEXT_TRESHOLD = 0.25

1.8.1 Extract labels from images

import cv2
from tqdm.notebook import tqdm

images = {}
annotations = {}

image_paths = sv.list_files_with_extensions(
    directory=IMAGES_DIRECTORY, 
    extensions=IMAGES_EXTENSIONS)

for image_path in tqdm(image_paths):
    image_name = image_path.name
    image_path = str(image_path)
    image = cv2.imread(image_path)

    detections = grounding_dino_model.predict_with_classes(
        image=image,
        classes=enhance_class_name(class_names=CLASSES),
        box_threshold=BOX_TRESHOLD,
        text_threshold=TEXT_TRESHOLD
    )
    # drop detections that Grounding DINO could not match to any class prompt
    detections = detections[detections.class_id != None]
    detections.mask = segment(
        sam_predictor=sam_predictor,
        image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB),
        xyxy=detections.xyxy
    )
    images[image_name] = image
    annotations[image_name] = detections

Note: Before saving the detections in Pascal VOC XML format, let's take a look at the annotations we obtained. This step is optional; feel free to skip it.

plot_images = []
plot_titles = []

box_annotator = sv.BoxAnnotator()
mask_annotator = sv.MaskAnnotator()

for image_name, detections in annotations.items():
    image = images[image_name]
    plot_images.append(image)
    plot_titles.append(image_name)

    labels = [
        f"{CLASSES[class_id]} {confidence:0.2f}"
        for _, _, confidence, class_id, _
        in detections]
    annotated_image = mask_annotator.annotate(scene=image.copy(), detections=detections)
    annotated_image = box_annotator.annotate(scene=annotated_image, detections=detections, labels=labels)
    plot_images.append(annotated_image)
    title = " ".join(set([
        CLASSES[class_id]
        for class_id
        in detections.class_id
    ]))
    plot_titles.append(title)

sv.plot_images_grid(
    images=plot_images,
    titles=plot_titles,
    grid_size=(len(annotations), 2),
    size=(2 * 4, len(annotations) * 4)
)


1.8.2 Save labels in Pascal VOC XML

Before uploading the annotations to Roboflow, we first have to save them to disk. To do this, we will use one of the newest supervision features, introduced in the 0.6.0 update - dataset saving.

ANNOTATIONS_DIRECTORY = os.path.join(HOME, 'annotations')

MIN_IMAGE_AREA_PERCENTAGE = 0.002
MAX_IMAGE_AREA_PERCENTAGE = 0.80
APPROXIMATION_PERCENTAGE = 0.75
sv.Dataset(
    classes=CLASSES,
    images=images,
    annotations=annotations
).as_pascal_voc(
    annotations_directory_path=ANNOTATIONS_DIRECTORY,
    min_image_area_percentage=MIN_IMAGE_AREA_PERCENTAGE,
    max_image_area_percentage=MAX_IMAGE_AREA_PERCENTAGE,
    approximation_percentage=APPROXIMATION_PERCENTAGE
)
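
As an optional sanity check (not in the original post), we can list the generated files to confirm that one XML annotation was written per image:

import os
print(sorted(os.listdir(ANNOTATIONS_DIRECTORY)))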

1.8.3 Upload annotations to Roboflow

Now we are ready to upload the annotations to Roboflow using the API.

PROJECT_NAME = "auto-generated-dataset-7"
PROJECT_DESCRIPTION = "auto-generated-dataset-7"
import roboflow
from roboflow import Roboflow

roboflow.login()

workspace = Roboflow().workspace()
new_project = workspace.create_project(
    project_name=PROJECT_NAME,
    project_license="MIT",
    project_type="instance-segmentation", 
    annotation=PROJECT_DESCRIPTION)

Partial output:

visit https://app.roboflow.com/auth-cli to get your authentication token.
Paste the authentication token here: ··········
loading Roboflow workspace...
loading Roboflow project...

import os

for image_path in tqdm(image_paths):
    image_name = image_path.name
    annotation_name = f"{
      
      image_path.stem}.xml"
    image_path = str(image_path)
    annotation_path = os.path.join(ANNOTATIONS_DIRECTORY, annotation_name)
    new_project.upload(
        image_path=image_path, 
        annotation_path=annotation_path, 
        split="train", 
        is_prediction=True, 
        overwrite=True, 
        tag_names=["auto-annotated-with-grounded-sam"],
        batch_name="auto-annotated-with-grounded-sam"
    )


1.8.4 Convert Object Detection to Instance Segmentation Dataset

If you already have a dataset but it only comes with bounding-box annotations, you can quickly and easily convert it into an instance segmentation dataset: SAM accepts boxes as prompts (the loop in section 1.8.7 below does exactly this).

1.8.5 Download Object Detection Dataset from Roboflow

To use the dataset in this tutorial, make sure you download it in Pascal VOC XML format. We will use the BlueBerries dataset as an example.

%cd {HOME}

import roboflow
from roboflow import Roboflow

roboflow.login()

rf = Roboflow()

project = rf.workspace("inkyu-sa-e0c78").project("blueberries-u0e84")
dataset = project.version(1).download("voc")


dataset.location

The location is: /content/BlueBerries-1

!ls {dataset.location}

The output is: README.dataset.txt README.roboflow.txt test train valid

1.8.6 Load and Visualize Object Detection Dataset with Supervision

object_detection_dataset = sv.Dataset.from_pascal_voc(
    images_directory_path=f"{dataset.location}/train",
    annotations_directory_path=f"{dataset.location}/train"
)
import random
random.seed(9001)

Note: Re-run the cell below to see a different image from the dataset.

image_names = list(object_detection_dataset.images.keys())
image_name = random.choice(image_names)

image = object_detection_dataset.images[image_name]
detections = object_detection_dataset.annotations[image_name]

box_annotator = sv.BoxAnnotator()

annotated_image = box_annotator.annotate(scene=image.copy(), detections=detections, skip_label=True)

%matplotlib inline
sv.plot_image(annotated_image, (16, 16))


1.8.7 Run SAM to Convert Boxes into Masks

from tqdm.notebook import tqdm

for image_name, image in tqdm(object_detection_dataset.images.items()):
    detections = object_detection_dataset.annotations[image_name]
    detections.mask = segment(
        sam_predictor=sam_predictor,
        image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB),
        xyxy=detections.xyxy
    )


image_names = list(object_detection_dataset.images.keys())
image_name = random.choice(image_names)

image = object_detection_dataset.images[image_name]
detections = object_detection_dataset.annotations[image_name]

mask_annotator = sv.MaskAnnotator()

annotated_image = mask_annotator.annotate(scene=image.copy(), detections=detections)

%matplotlib inline
sv.plot_image(annotated_image, (16, 16))

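The original post stops at visualizing the converted masks. If you also want to persist them, the same supervision 0.6.0 dataset-saving call from section 1.8.2 should work here as well - a sketch, with the output directory name being an assumption:

import os

# assumed output directory for the converted annotations (not in the original post)
CONVERTED_ANNOTATIONS_DIRECTORY = os.path.join(HOME, 'converted-annotations')

object_detection_dataset.as_pascal_voc(
    annotations_directory_path=CONVERTED_ANNOTATIONS_DIRECTORY,
    min_image_area_percentage=MIN_IMAGE_AREA_PERCENTAGE,
    max_image_area_percentage=MAX_IMAGE_AREA_PERCENTAGE,
    approximation_percentage=APPROXIMATION_PERCENTAGE
)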

Reposted from blog.csdn.net/wzk4869/article/details/133064328