Table of Contents
- I. Grounding DINO + Segment Anything Model (SAM)
- 1.1 Getting Started
- 1.2 Install Grounding DINO and Segment Anything Model
- 1.3 Download Grounding DINO Model Weights
- 1.4 Download Segment Anything Model (SAM) Weights
- 1.5 Load models
- 1.6 Download Example Data
- 1.7 Single Image Mask Auto Annotation
- 1.8 Full Dataset Mask Auto Annotation
- 1.8.1 Extract labels from images
- 1.8.2 Save labels in Pascal VOC XML
- 1.8.3 Upload annotations to Roboflow
- 1.8.4 Convert Object Detection to Instance Segmentation Dataset
- 1.8.5 Download Object Detection Dataset from Roboflow
- 1.8.6 Load and Visualize Object Detection Dataset with Supervision
- 1.8.7 Run SAM to Convert Boxes into Masks
I. Grounding DINO + Segment Anything Model (SAM)
In this tutorial, we'll learn how to automatically annotate images using two groundbreaking models - Grounding DINO and Segment Anything Model (SAM).
We can then use this dataset to train real-time object detection or instance segmentation models. Annotating images using polygons in the traditional way is extremely time-consuming and expensive. With Grounding DINO and SAM, the initial annotation takes only a few minutes and our work is reduced to manual verification of the obtained labels.
1.1 Getting Started
Let's make sure we have access to the GPU. We can do this using the nvidia-smi command.
!nvidia-smi
Tue Sep 19 21:10:12 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 53C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
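Outside a notebook, a similar check can be scripted. The sketch below is a convenience and not part of the original tutorial: the helper name `gpu_driver_available` is hypothetical, and it only tests whether the `nvidia-smi` executable is on the PATH, which suggests but does not prove that a usable GPU is present.

```python
import shutil


def gpu_driver_available() -> bool:
    """Return True if the nvidia-smi executable can be found on PATH."""
    return shutil.which("nvidia-smi") is not None


if gpu_driver_available():
    print("nvidia-smi found; an NVIDIA driver appears to be installed.")
else:
    print("nvidia-smi not found; expect the models to run on CPU.")
```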
Note: To make it easier for us to manage datasets, images, and models, we created a HOME constant.
import os
HOME = os.getcwd()
print("HOME:", HOME)
The output is:
HOME: /content
1.2 Install Grounding DINO and Segment Anything Model
Our project will use two groundbreaking models: Grounding DINO (for zero-shot detection) and Segment Anything Model (SAM) (for converting boxes into segments). We have to install them first.
%cd {HOME}
!git clone https://github.com/IDEA-Research/GroundingDINO.git
%cd {HOME}/GroundingDINO
!git checkout -q 57535c5a79791cb76e36fdb64975271354f10251
!pip install -q -e .
Part of the output is:
/content
Cloning into 'GroundingDINO'...
remote: Enumerating objects: 267, done.
remote: Counting objects: 100% (41/41), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 267 (delta 18), reused 16 (delta 16), pack-reused 226
Receiving objects: 100% (267/267), 12.33 MiB | 22.16 MiB/s, done.
Resolving deltas: 100% (122/122), done.
/content/GroundingDINO
Preparing metadata (setup.py) ... done
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 40.4 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.9/200.9 kB 14.2 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 549.1/549.1 kB 32.1 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.1/200.1 kB 16.4 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 62.7 MB/s eta 0:00:00
%cd {HOME}
import sys
!{sys.executable} -m pip install 'git+https://github.com/facebookresearch/segment-anything.git'
Part of the output is:
/content
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/facebookresearch/segment-anything.git
Cloning https://github.com/facebookresearch/segment-anything.git to /tmp/pip-req-build-e2789hn3
Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/segment-anything.git /tmp/pip-req-build-e2789hn3
Resolved https://github.com/facebookresearch/segment-anything.git to commit 567662b0fd33ca4b022d94d3b8de896628cd32dd
Preparing metadata (setup.py) ... done
Building wheels for collected packages: segment-anything
Building wheel for segment-anything (setup.py) ... done
Created wheel for segment-anything: filename=segment_anything-1.0-py3-none-any.whl size=36610 sha256=ed90a8a550d5948a1b3b9c5f2a173970657a518009512fb20e89dc5ec5240160
Stored in directory: /tmp/pip-ephem-wheel-cache-xtm37yc6/wheels/d5/11/03/7aca746a2c0e09f279b10436ced7175926bc38f650b736a648
Successfully built segment-anything
Installing collected packages: segment-anything
Successfully installed segment-anything-1.0
Note: To glue all the elements of the demo together, we will use the supervision pip package, which helps us process, filter, and visualize our detections as well as save our dataset.
Grounding DINO installs supervision 0.4.0 as a dependency, but this demo needs features introduced in a later release. Therefore, we uninstall the current supervision version and install version 0.6.0.
!pip uninstall -y supervision
!pip install -q supervision==0.6.0
import supervision as sv
print(sv.__version__)
The output is:
Found existing installation: supervision 0.4.0
Uninstalling supervision-0.4.0:
Successfully uninstalled supervision-0.4.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
groundingdino 0.1.0 requires supervision==0.4.0, but you have supervision 0.6.0 which is incompatible.
0.6.0
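The dependency message in the output above reflects the mismatch between the version Grounding DINO pins (0.4.0) and the one this tutorial installs (0.6.0). A small guard can make the tutorial's requirement explicit. The sketch below is illustrative only: `parse_version`, `meets_minimum`, and `MIN_SUPERVISION` are hypothetical names, and the parser assumes plain dotted numeric versions such as 0.6.0.

```python
from typing import Tuple

MIN_SUPERVISION = "0.6.0"  # hypothetical constant: the version installed above


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = MIN_SUPERVISION) -> bool:
    """Compare versions numerically, so that 0.10.0 ranks above 0.6.0."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("0.4.0"))  # False
print(meets_minimum("0.6.0"))  # True
```

In the notebook, `meets_minimum(sv.__version__)` could be asserted right after the import.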
Let's install the roboflow pip package.
!pip install -q roboflow
Some results are shown below:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.2/56.2 kB 2.6 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.8/67.8 kB 8.0 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.5/54.5 kB 4.9 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.8/58.8 kB 6.2 MB/s eta 0:00:00
Building wheel for wget (setup.py) ... done
1.3 Download Grounding DINO Model Weights
To run Grounding DINO we need two files - configuration and model weights. The configuration file is part of the Grounding DINO repository, which we have cloned. On the other hand, we need to download the weights file. We write the paths to these two files into the GROUNDING_DINO_CONFIG_PATH and GROUNDING_DINO_CHECKPOINT_PATH variables and verify that the paths are correct and that the files exist on disk.
import os
GROUNDING_DINO_CONFIG_PATH = os.path.join(HOME, "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py")
print(GROUNDING_DINO_CONFIG_PATH, "; exist:", os.path.isfile(GROUNDING_DINO_CONFIG_PATH))
The output is:
/content/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py ; exist: True
%cd {HOME}
!mkdir -p {HOME}/weights
%cd {HOME}/weights
!wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
The output is:
/content
/content/weights
import os
GROUNDING_DINO_CHECKPOINT_PATH = os.path.join(HOME, "weights", "groundingdino_swint_ogc.pth")
print(GROUNDING_DINO_CHECKPOINT_PATH, "; exist:", os.path.isfile(GROUNDING_DINO_CHECKPOINT_PATH))
The output is:
/content/weights/groundingdino_swint_ogc.pth ; exist: True
1.4 Download Segment Anything Model (SAM) Weights
As with Grounding DINO, in order to run SAM we need a weights file, which we must first download. We write the path to the local weight file into the SAM_CHECKPOINT_PATH variable and verify that the path is correct and that the file exists on disk.
%cd {HOME}
!mkdir -p {HOME}/weights
%cd {HOME}/weights
!wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
The output is:
/content
/content/weights
import os
SAM_CHECKPOINT_PATH = os.path.join(HOME, "weights", "sam_vit_h_4b8939.pth")
print(SAM_CHECKPOINT_PATH, "; exist:", os.path.isfile(SAM_CHECKPOINT_PATH))
The output is:
/content/weights/sam_vit_h_4b8939.pth ; exist: True
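The checks above only print a boolean, so a failed download can go unnoticed until a model fails to load. One possible stricter variant is sketched below; `require_file` is a hypothetical helper, and a temporary file stands in for a downloaded checkpoint so that the snippet runs on its own.

```python
import os
import tempfile


def require_file(path: str) -> str:
    """Return path unchanged if it is an existing file, else raise."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Expected file is missing: {path}")
    return path


# A temporary file plays the role of a downloaded weights file.
with tempfile.NamedTemporaryFile(suffix=".pth", delete=False) as handle:
    demo_path = handle.name

print(require_file(demo_path) == demo_path)  # True
os.remove(demo_path)
```

In the tutorial, the same helper could wrap `GROUNDING_DINO_CHECKPOINT_PATH` and `SAM_CHECKPOINT_PATH` before the models are loaded.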
1.5 Load models
import torch
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
1.5.1 Load Grounding DINO Model
%cd {HOME}/GroundingDINO
from groundingdino.util.inference import Model
grounding_dino_model = Model(model_config_path=GROUNDING_DINO_CONFIG_PATH, model_checkpoint_path=GROUNDING_DINO_CHECKPOINT_PATH)
1.5.2 Load Segment Anything Model (SAM)
SAM_ENCODER_VERSION = "vit_h"
from segment_anything import sam_model_registry, SamPredictor
sam = sam_model_registry[SAM_ENCODER_VERSION](checkpoint=SAM_CHECKPOINT_PATH).to(device=DEVICE)
sam_predictor = SamPredictor(sam)
1.6 Download Example Data
f"{HOME}/data"
The output is:
/content/data
%cd {HOME}
!mkdir {HOME}/data
%cd {HOME}/data
!wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-2.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-3.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-4.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-5.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-6.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-7.jpeg
!wget -q https://media.roboflow.com/notebooks/examples/dog-8.jpeg
The output is:
/content
/content/data
1.7 Single Image Mask Auto Annotation
Before we automatically annotate the entire dataset, let's focus on a single image.
SOURCE_IMAGE_PATH = f"{HOME}/data/dog-3.jpeg"
CLASSES = ['car', 'dog', 'person', 'nose', 'chair', 'shoe', 'ear']
BOX_TRESHOLD = 0.35
TEXT_TRESHOLD = 0.25
1.7.1 Zero-Shot Object Detection with Grounding DINO
NOTE: For better Grounding DINO detection, we will apply some prompt engineering using the enhance_class_name function defined below, which rewrites each class name as a plural phrase such as "all dogs".
from typing import List
def enhance_class_name(class_names: List[str]) -> List[str]:
    return [
        f"all {class_name}s"
        for class_name
        in class_names
    ]
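To see what the model is actually prompted with, the function can be called on its own. The snippet below repeats the definition so that it runs standalone. Note that the pluralization is a plain string append, so an irregular noun such as "person" becomes "persons".

```python
from typing import List


def enhance_class_name(class_names: List[str]) -> List[str]:
    # Wrap every class name in the "all <name>s" prompt pattern.
    return [f"all {class_name}s" for class_name in class_names]


prompts = enhance_class_name(["car", "dog", "person"])
print(prompts)  # ['all cars', 'all dogs', 'all persons']
```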
import cv2
import supervision as sv
# load image
image = cv2.imread(SOURCE_IMAGE_PATH)
# detect objects
detections = grounding_dino_model.predict_with_classes(
    image=image,
    classes=enhance_class_name(class_names=CLASSES),
    box_threshold=BOX_TRESHOLD,
    text_threshold=TEXT_TRESHOLD
)
# annotate image with detections
box_annotator = sv.BoxAnnotator()
labels = [
    f"{CLASSES[class_id]} {confidence:0.2f}"
    for _, _, confidence, class_id, _
    in detections]
annotated_frame = box_annotator.annotate(scene=image.copy(), detections=detections, labels=labels)
%matplotlib inline
sv.plot_image(annotated_frame, (16, 16))
The output is the source image with the detected boxes and their labels drawn on it.
1.7.2 Prompting SAM with detected boxes
import numpy as np
from segment_anything import SamPredictor
def segment(sam_predictor: SamPredictor, image: np.ndarray, xyxy: np.ndarray) -> np.ndarray:
    # compute the image embedding once, then prompt SAM with each box
    sam_predictor.set_image(image)
    result_masks = []
    for box in xyxy:
        masks, scores, logits = sam_predictor.predict(
            box=box,
            multimask_output=True
        )
        # keep only the highest-scoring candidate mask for this box
        index = np.argmax(scores)
        result_masks.append(masks[index])
    return np.array(result_masks)
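With `multimask_output=True`, SAM returns several candidate masks per box together with a score for each, and `segment` keeps the highest-scoring one. That selection step can be sketched on toy data without loading the model. The masks and scores below are made up for illustration, and `pick_best` is a hypothetical helper, not part of the SAM API.

```python
from typing import List, Sequence

Mask = List[List[bool]]


def pick_best(masks: Sequence[Mask], scores: Sequence[float]) -> Mask:
    """Return the mask with the highest score (the first one wins on ties)."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return masks[best_index]


# Three invented 2x2 candidate masks for a single box prompt.
candidate_masks = [
    [[True, False], [False, False]],
    [[True, True], [False, False]],
    [[True, True], [True, True]],
]
candidate_scores = [0.71, 0.93, 0.88]

best = pick_best(candidate_masks, candidate_scores)
print(best)  # [[True, True], [False, False]]
```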
import cv2
# convert detections to masks
detections.mask = segment(
    sam_predictor=sam_predictor,
    image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB),
    xyxy=detections.xyxy
)
# annotate image with detections
box_annotator = sv.BoxAnnotator()
mask_annotator = sv.MaskAnnotator()
labels = [
    f"{CLASSES[class_id]} {confidence:0.2f}"
    for _, _, confidence, class_id, _
    in detections]
annotated_image = mask_annotator.annotate(scene=image.copy(), detections=detections)
annotated_image = box_annotator.annotate(scene=annotated_image, detections=detections, labels=labels)
%matplotlib inline
sv.plot_image(annotated_image, (16, 16))
import math
grid_size_dimension = math.ceil(math.sqrt(len(detections.mask)))
titles = [
    CLASSES[class_id]
    for class_id
    in detections.class_id
]
sv.plot_images_grid(
    images=detections.mask,
    titles=titles,
    grid_size=(grid_size_dimension, grid_size_dimension),
    size=(16, 16)
)
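The grid dimension computed above is the side of the smallest square grid that fits every mask. A quick check of the arithmetic, using a hypothetical helper `square_grid`:

```python
import math
from typing import Tuple


def square_grid(item_count: int) -> Tuple[int, int]:
    """Smallest (side, side) grid with at least item_count cells."""
    side = math.ceil(math.sqrt(item_count))
    return side, side


for count in (1, 4, 5, 9, 10):
    print(count, square_grid(count))
# 1 (1, 1)
# 4 (2, 2)
# 5 (3, 3)
# 9 (3, 3)
# 10 (4, 4)
```

When the number of masks is not a perfect square, some cells of the grid simply stay empty.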
1.8 Full Dataset Mask Auto Annotation
import os
IMAGES_DIRECTORY = os.path.join(HOME, 'data')
IMAGES_EXTENSIONS = ['jpg', 'jpeg', 'png']
CLASSES = ['car', 'dog', 'person', 'nose', 'chair', 'shoe', 'ear', 'coffee', 'backpack', 'cap']
BOX_TRESHOLD = 0.35
TEXT_TRESHOLD = 0.25
1.8.1 Extract labels from images
import cv2
from tqdm.notebook import tqdm
images = {}
annotations = {}
image_paths = sv.list_files_with_extensions(
    directory=IMAGES_DIRECTORY,
    extensions=IMAGES_EXTENSIONS)
for image_path in tqdm(image_paths):
    image_name = image_path.name
    image_path = str(image_path)
    image = cv2.imread(image_path)
    detections = grounding_dino_model.predict_with_classes(
        image=image,
        classes=enhance_class_name(class_names=CLASSES),
        box_threshold=BOX_TRESHOLD,
        text_threshold=TEXT_TRESHOLD
    )
    # drop detections that could not be matched to any class
    detections = detections[detections.class_id != None]
    detections.mask = segment(
        sam_predictor=sam_predictor,
        image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB),
        xyxy=detections.xyxy
    )
    images[image_name] = image
    annotations[image_name] = detections
Note: Before saving the detections in Pascal VOC XML format, let's take a look at the annotations we obtained. This step is optional; feel free to skip it.
plot_images = []
plot_titles = []
box_annotator = sv.BoxAnnotator()
mask_annotator = sv.MaskAnnotator()
for image_name, detections in annotations.items():
    image = images[image_name]
    plot_images.append(image)
    plot_titles.append(image_name)
    labels = [
        f"{CLASSES[class_id]} {confidence:0.2f}"
        for _, _, confidence, class_id, _
        in detections]
    annotated_image = mask_annotator.annotate(scene=image.copy(), detections=detections)
    annotated_image = box_annotator.annotate(scene=annotated_image, detections=detections, labels=labels)
    plot_images.append(annotated_image)
    title = " ".join(set([
        CLASSES[class_id]
        for class_id
        in detections.class_id
    ]))
    plot_titles.append(title)
sv.plot_images_grid(
    images=plot_images,
    titles=plot_titles,
    grid_size=(len(annotations), 2),
    size=(2 * 4, len(annotations) * 4)
)
1.8.2 Save labels in Pascal VOC XML
Before uploading annotations to Roboflow, we must first save them to the hard drive. To do this, we will use one of the latest supervision features (available in the 0.6.0 update) - dataset saving.
ANNOTATIONS_DIRECTORY = os.path.join(HOME, 'annotations')
MIN_IMAGE_AREA_PERCENTAGE = 0.002
MAX_IMAGE_AREA_PERCENTAGE = 0.80
APPROXIMATION_PERCENTAGE = 0.75
sv.Dataset(
    classes=CLASSES,
    images=images,
    annotations=annotations
).as_pascal_voc(
    annotations_directory_path=ANNOTATIONS_DIRECTORY,
    min_image_area_percentage=MIN_IMAGE_AREA_PERCENTAGE,
    max_image_area_percentage=MAX_IMAGE_AREA_PERCENTAGE,
    approximation_percentage=APPROXIMATION_PERCENTAGE
)
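The two area parameters read as bounds on how much of the image a single annotation may cover, which would discard tiny specks and near-full-frame masks. The sketch below illustrates that idea on toy masks under two assumptions: that the values are fractions of the total image area (so 0.002 means 0.2%), and that counting mask pixels is an acceptable stand-in for the polygon areas the library works with. `area_fraction` and `keep_mask` are hypothetical helpers, and the library's own implementation may differ in detail.

```python
from typing import List

Mask = List[List[bool]]

MIN_IMAGE_AREA_PERCENTAGE = 0.002
MAX_IMAGE_AREA_PERCENTAGE = 0.80


def area_fraction(mask: Mask) -> float:
    """Fraction of the pixels in the mask that are set."""
    total = sum(len(row) for row in mask)
    covered = sum(1 for row in mask for pixel in row if pixel)
    return covered / total


def keep_mask(
    mask: Mask,
    minimum: float = MIN_IMAGE_AREA_PERCENTAGE,
    maximum: float = MAX_IMAGE_AREA_PERCENTAGE,
) -> bool:
    """Keep a mask only if its area fraction lies within [minimum, maximum]."""
    return minimum <= area_fraction(mask) <= maximum


small = [[False] * 10 for _ in range(10)]
small[0][0] = True  # 1 of 100 pixels, fraction 0.01
huge = [[True] * 10 for _ in range(10)]  # every pixel set, fraction 1.0

print(keep_mask(small))  # True
print(keep_mask(huge))   # False
```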
1.8.3 Upload annotations to Roboflow
Now we are ready to upload the annotations to Roboflow using the API.
PROJECT_NAME = "auto-generated-dataset-7"
PROJECT_DESCRIPTION = "auto-generated-dataset-7"
import roboflow
from roboflow import Roboflow
roboflow.login()
workspace = Roboflow().workspace()
new_project = workspace.create_project(
    project_name=PROJECT_NAME,
    project_license="MIT",
    project_type="instance-segmentation",
    annotation=PROJECT_DESCRIPTION)
Some results are as follows:
visit https://app.roboflow.com/auth-cli to get your authentication token.
Paste the authentication token here: ··········
loading Roboflow workspace...
loading Roboflow project...
import os
for image_path in tqdm(image_paths):
    image_name = image_path.name
    # the annotation file shares the image's base name, with an .xml extension
    annotation_name = f"{image_path.stem}.xml"
    image_path = str(image_path)
    annotation_path = os.path.join(ANNOTATIONS_DIRECTORY, annotation_name)
    new_project.upload(
        image_path=image_path,
        annotation_path=annotation_path,
        split="train",
        is_prediction=True,
        overwrite=True,
        tag_names=["auto-annotated-with-grounded-sam"],
        batch_name="auto-annotated-with-grounded-sam"
    )
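The upload loop assumes that every image has an annotation file with the same base name. Before uploading, it can be worth confirming that pairing. The sketch below does so in a temporary directory with invented file names; `annotation_path_for` is a hypothetical helper that mirrors the naming rule used in the loop.

```python
import tempfile
from pathlib import Path


def annotation_path_for(image_path: Path, annotations_directory: Path) -> Path:
    """Map an image such as data/dog-3.jpeg to annotations/dog-3.xml."""
    return annotations_directory / f"{image_path.stem}.xml"


with tempfile.TemporaryDirectory() as root:
    annotations_directory = Path(root) / "annotations"
    annotations_directory.mkdir()
    # Only one of the two images below has an annotation file.
    (annotations_directory / "dog-3.xml").write_text("<annotation/>")

    image_paths = [
        Path(root) / "data" / "dog-3.jpeg",
        Path(root) / "data" / "dog-4.jpeg",
    ]
    missing = [
        path.name
        for path in image_paths
        if not annotation_path_for(path, annotations_directory).is_file()
    ]

print(missing)  # ['dog-4.jpeg']
```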
1.8.4 Convert Object Detection to Instance Segmentation Dataset
If you already have a dataset but it has bounding box annotations, you can quickly and easily convert it into an instance segmentation dataset. SAM allows the use of boxes as prompts.
1.8.5 Download Object Detection Dataset from Roboflow
To use the dataset in this tutorial, be sure to download it in Pascal VOC XML format. We will use the BlueBerries dataset as an example.
%cd {HOME}
import roboflow
from roboflow import Roboflow
roboflow.login()
rf = Roboflow()
project = rf.workspace("inkyu-sa-e0c78").project("blueberries-u0e84")
dataset = project.version(1).download("voc")
dataset.location
The output is:
/content/BlueBerries-1
!ls {dataset.location}
The output is:
README.dataset.txt README.roboflow.txt test train valid
1.8.6 Load and Visualize Object Detection Dataset with Supervision
object_detection_dataset = sv.Dataset.from_pascal_voc(
    images_directory_path=f"{dataset.location}/train",
    annotations_directory_path=f"{dataset.location}/train"
)
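For readers unfamiliar with the format, a Pascal VOC file is an XML document with one `object` element per annotation, each holding a class name and a `bndbox` with pixel coordinates. The sketch below parses a hand-written example with the standard library. The XML content, including the class name "blueberry" and the coordinates, is invented for illustration, and `read_boxes` is a hypothetical helper rather than what supervision does internally.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Invented annotation; real files from the dataset will contain other values.
VOC_EXAMPLE = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>blueberry</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>
"""


def read_boxes(xml_text: str) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (class name, (xmin, ymin, xmax, ymax)) for every object."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        xyxy = tuple(
            int(bndbox.findtext(tag))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((name, xyxy))
    return boxes


print(read_boxes(VOC_EXAMPLE))  # [('blueberry', (10, 20, 110, 220))]
```

The `(xmin, ymin, xmax, ymax)` order matches the `xyxy` box layout that the `segment` function passes to SAM.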
import random
random.seed(9001)
Note: Rerun the cell below to see different images in the dataset.
image_names = list(object_detection_dataset.images.keys())
image_name = random.choice(image_names)
image = object_detection_dataset.images[image_name]
detections = object_detection_dataset.annotations[image_name]
box_annotator = sv.BoxAnnotator()
annotated_image = box_annotator.annotate(scene=image.copy(), detections=detections, skip_label=True)
%matplotlib inline
sv.plot_image(annotated_image, (16, 16))
1.8.7 Run SAM to Convert Boxes into Masks
from tqdm.notebook import tqdm
for image_name, image in tqdm(object_detection_dataset.images.items()):
    detections = object_detection_dataset.annotations[image_name]
    detections.mask = segment(
        sam_predictor=sam_predictor,
        image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB),
        xyxy=detections.xyxy
    )
image_names = list(object_detection_dataset.images.keys())
image_name = random.choice(image_names)
image = object_detection_dataset.images[image_name]
detections = object_detection_dataset.annotations[image_name]
mask_annotator = sv.MaskAnnotator()
annotated_image = mask_annotator.annotate(scene=image.copy(), detections=detections)
%matplotlib inline
sv.plot_image(annotated_image, (16, 16))