Foreword
The CamVid dataset is a collection of urban road scenes publicly released by the University of Cambridge. It includes 701 precisely labeled images for semantic segmentation. If you want to use the CamVid dataset for object detection, you need bbox labels. This article provides code to extract bbox labels from the semantic labels, which is convenient for subsequent object detection model training.
Introduction to the CamVid Dataset
The CamVid dataset (full name: The Cambridge-driving Labeled Video Database) is a dataset of urban road scenes publicly released by the University of Cambridge, and the first collection of videos with semantic labels for object categories. The dataset includes 701 precisely labeled images for semantic segmentation model training, divided into training, validation, and test sets.
Dataset official download address: CamVid Dataset (cam.ac.uk)
A sample image from the dataset is shown below:
Class label link: CamVid ClassLabel
The database provides 32 ground-truth semantic classes; the proportion of each category is shown below:
CamVid semantic labels are RGB images; the color corresponding to each category name is given in the name_color_dict below:
name_color_dict = {
'Animal': [64, 128, 64],
'Archway': [192, 0, 128],
'Bicyclist': [0, 128, 192],
'Bridge': [0, 128, 64],
'Building': [128, 0, 0],
'Car': [64, 0, 128],
'CartLuggagePram': [64, 0, 192],
'Child': [192, 128, 64],
'Column_Pole': [192, 192, 128],
'Fence': [64, 64, 128],
'LaneMkgsDriv': [128, 0, 192],
'LaneMkgsNonDriv': [192, 0, 64],
'Misc_Text': [128, 128, 64],
'MotorcycleScooter': [192, 0, 192],
'OtherMoving': [128, 64, 64],
'ParkingBlock': [64, 192, 128],
'Pedestrian': [64, 64, 0],
'Road': [128, 64, 128],
'RoadShoulder': [128, 128, 192],
'Sidewalk': [0, 0, 192],
'SignSymbol': [192, 128, 128],
'Sky': [128, 128, 128],
'SUVPickupTruck': [64, 128, 192],
'TrafficCone': [0, 0, 64],
'TrafficLight': [0, 64, 64],
'Train': [192, 64, 128],
'Tree': [128, 128, 0],
'Truck_Bus': [192, 128, 192],
'Tunnel': [64, 0, 64],
'VegetationMisc': [192, 192, 0],
'Void': [0, 0, 0],
'Wall': [64, 192, 0],
}
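To illustrate how such a color palette maps an RGB label image to class indices, here is a minimal sketch on a tiny synthetic 2x2 patch. The patch and the palette subset are made up for illustration; the loop is the same pixel-equality trick used in the extraction code later in this article:

```python
import numpy as np

# hypothetical 2x2 RGB label patch: Sky, Road / Car, Void
patch = np.array([[[128, 128, 128], [128, 64, 128]],
                  [[ 64,   0, 128], [  0,   0,   0]]])

# subset of the CamVid palette, enough for this toy patch
palette = {'Void': [0, 0, 0], 'Car': [64, 0, 128],
           'Road': [128, 64, 128], 'Sky': [128, 128, 128]}

# replace each RGB pixel with the index of its matching palette entry
index_map = np.zeros(patch.shape[:-1], dtype=np.int64)
for idx, color in enumerate(palette.values()):
    index_map[np.all(patch == color, axis=-1)] = idx

print(index_map)  # → [[3 2]
                  #    [1 0]]
```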
Convert semantic labels to bbox labels
Select the categories to be extracted, e.g. names = ['Pedestrian', 'Car', 'Truck_Bus']. The code to extract object detection bbox labels from the CamVid semantic segmentation labels is as follows:
import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

def mask_to_2D(label):
    """ Convert an RGB label image to a 2D class-index map [H, W]. """
    color_list = list(name_color_dict.values())
    semantic_map = np.zeros(label.shape[:-1])
    for index, color in enumerate(color_list):
        equality = np.equal(label, color)
        class_map = np.all(equality, axis=-1)
        semantic_map[class_map] = index
    return semantic_map

def draw_box(img, boxes, colors):
    """ Plot the bounding boxes on image img. """
    for box, color in zip(boxes, colors):
        cv2.rectangle(img, (box[0], box[1]), (box[2], box[3]), color, thickness=2, lineType=cv2.LINE_AA)
    plt.imshow(img)
    plt.axis('off')
    plt.show()

def get_bbox(label_file, names):
    """ Get bboxes from a semantic label image. """
    # convert the RGB mask to a 2D class-index mask
    mask = np.array(Image.open(label_file))
    mask_2D = mask_to_2D(mask)
    mask_to_save = np.zeros_like(mask_2D)
    # classes present in this image
    obj_ids = np.unique(mask_2D)
    # split the index mask into a set of binary masks, one per class
    masks = mask_2D == obj_ids[:, None, None]
    # get bounding box coordinates for each connected component
    num_objs = len(obj_ids)
    boxes, colors = [], []
    for i in range(num_objs):
        id = obj_ids[i]
        name = list(name_color_dict.keys())[int(id)]
        if name in names:
            # connectedComponents expects a single-channel 8-bit unsigned image
            binary = masks[i].astype(np.uint8)
            num_labels, labels = cv2.connectedComponents(binary, connectivity=8, ltype=cv2.CV_16U)
            for id_label in range(1, num_labels):
                temp_mask = labels == id_label
                pos = np.where(temp_mask)
                xmin, xmax = np.min(pos[1]), np.max(pos[1])
                ymin, ymax = np.min(pos[0]), np.max(pos[0])
                # filter small regions: keep boxes wider and taller than 20 px
                if (xmax - xmin) > 20 and (ymax - ymin) > 20:
                    boxes.append([xmin, ymin, xmax, ymax])
                    color = list(name_color_dict.values())[int(id)]
                    colors.append(color)
                    mask_to_save[pos] = id_label
    # draw the mask with its bboxes
    draw_box(mask, boxes, colors)

if __name__ == '__main__':
    names = ['Pedestrian', 'Car', 'Truck_Bus']
    label_file = "camvid/labels/0001TP_006690_L.png"
    get_bbox(label_file, names)
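The core of the box-extraction step above is just np.where on a per-component binary mask. As a quick sanity check, here is a numpy-only sketch with a made-up 6x8 mask containing a single foreground blob:

```python
import numpy as np

# toy binary mask with one foreground blob (rows 2..4, cols 3..6)
mask = np.zeros((6, 8), dtype=np.uint8)
mask[2:5, 3:7] = 1

# tight bounding box: min/max of the foreground coordinates
pos = np.where(mask)
xmin, xmax = pos[1].min(), pos[1].max()
ymin, ymax = pos[0].min(), pos[0].max()
box = [int(xmin), int(ymin), int(xmax), int(ymax)]
print(box)  # → [3, 2, 6, 4]
```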
The bbox extraction results from the semantic labels are shown below; the bounding boxes of pedestrians and vehicles are extracted effectively:
After getting the bbox labels, you can use the CamVid dataset for semantic segmentation and object detection at the same time.
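If the extracted boxes are destined for detector training, a common next step is converting the [xmin, ymin, xmax, ymax] pixel coordinates into a normalized center-based format such as YOLO's (x_center, y_center, width, height). A minimal sketch, assuming the standard 960x720 CamVid frame size (the helper name and example box are hypothetical):

```python
def to_yolo(box, img_w=960, img_h=720):
    """Convert [xmin, ymin, xmax, ymax] pixels to normalized (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2 / img_w   # box center x, as a fraction of width
    cy = (ymin + ymax) / 2 / img_h   # box center y, as a fraction of height
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h

print(to_yolo([100, 200, 300, 400]))
```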
References
(1) CamVid official website: http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/
(2) Segmentation and Recognition Using Structure from Motion Point Clouds, ECCV 2008
(3) Semantic Object Classes in Video: A High-Definition Ground Truth Database