Object detection: Converting CamVid semantic labels to bbox labels

Foreword

The CamVid dataset is an urban road scene dataset publicly released by the University of Cambridge. It includes 701 precisely labeled images for semantic segmentation. To use CamVid for object detection, bbox labels are needed. This article provides code for extracting bbox labels from CamVid's semantic labels, which is convenient for subsequent object detection model training.

Figure 1 bbox extraction effect

Introduction to the CamVid Dataset

The CamVid dataset (full name: The Cambridge-driving Labeled Video Database) is an urban road scene dataset publicly released by the University of Cambridge, and the first collection of videos with semantic labels for object categories. The dataset includes 701 precisely labeled images for semantic segmentation model training, divided into a training set, a validation set, and a test set.

Dataset official download address: CamVid Dataset (cam.ac.uk)

A data example is shown below:

Figure 2 picture example

Class label link: CamVid ClassLabel

The database provides 32 ground-truth semantic classes, and the proportion of each category is as follows:

Figure 3 Proportion of CamVid categories
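The per-class proportions shown in Figure 3 can be recomputed from a 2D class-index mask with `np.bincount`. Below is a minimal sketch on a toy mask; in practice the mask would come from converting a real CamVid RGB label, as done later in this article.

```python
import numpy as np

# toy 2D index mask (in practice, obtained by converting an RGB label)
mask_2D = np.array([[0, 0, 1],
                    [1, 1, 2]])

counts = np.bincount(mask_2D.ravel(), minlength=32)  # 32 CamVid classes
proportions = counts / counts.sum()
print(proportions[:3])  # class 0: 2/6, class 1: 3/6, class 2: 1/6
```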

CamVid semantic labels are RGB images; the color corresponding to each category name (`name_color_dict`) is as follows:

name_color_dict={ 'Animal': [64, 128, 64], 
                  'Archway': [192, 0, 128], 
                  'Bicyclist': [0, 128, 192], 
                  'Bridge': [0, 128, 64],
                  'Building': [128, 0, 0], 
                  'Car': [64, 0, 128], 
                  'CartLuggagePram': [64, 0, 192], 
                  'Child': [192, 128, 64],
                  'Column_Pole': [192, 192, 128], 
                  'Fence': [64, 64, 128], 
                  'LaneMkgsDriv': [128, 0, 192],
                  'LaneMkgsNonDriv': [192, 0, 64], 
                  'Misc_Text': [128, 128, 64], 
                  'MotorcycleScooter': [192, 0, 192],
                  'OtherMoving': [128, 64, 64], 
                  'ParkingBlock': [64, 192, 128], 
                  'Pedestrian': [64, 64, 0], 
                  'Road': [128, 64, 128],
                  'RoadShoulder': [128, 128, 192], 
                  'Sidewalk': [0, 0, 192], 
                  'SignSymbol': [192, 128, 128], 
                  'Sky': [128, 128, 128],
                  'SUVPickupTruck': [64, 128, 192], 
                  'TrafficCone': [0, 0, 64], 
                  'TrafficLight': [0, 64, 64], 
                  'Train': [192, 64, 128],
                  'Tree': [128, 128, 0], 
                  'Truck_Bus': [192, 128, 192], 
                  'Tunnel': [64, 0, 64], 
                  'VegetationMisc': [192, 192, 0],
                  'Void': [0, 0, 0], 
                  'Wall': [64, 192, 0],
                 }
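Since the RGB-to-class mapping above is one-to-one, a reverse dictionary makes it easy to look up a class name from a pixel color. A small sketch, using only a subset of the full table for brevity:

```python
# subset of the full name_color_dict above
name_color_dict = {'Car': [64, 0, 128],
                   'Pedestrian': [64, 64, 0],
                   'Sky': [128, 128, 128]}

# invert the mapping: RGB tuple -> class name (lists are unhashable, so cast to tuple)
color_name_dict = {tuple(rgb): name for name, rgb in name_color_dict.items()}

print(color_name_dict[(64, 0, 128)])  # Car
```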

Convert semantic labels to bbox labels

Select the categories to be extracted, names = ['Pedestrian', 'Car', 'Truck_Bus']. The code to extract object detection bbox labels from the CamVid semantic segmentation labels is as follows:

import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

def mask_to_2D(label):
    """ return: semantic_map -> [H, W] """
    color_list = list(name_color_dict.values())
    semantic_map = np.zeros(label.shape[:-1])
    for index, color in enumerate(color_list):
        equality = np.equal(label, color)
        class_map = np.all(equality, axis=-1)
        semantic_map[class_map] = index
    return semantic_map

def draw_box(img, boxes, colors):
    """ draw bounding boxes on image img """
    for box, color in zip(boxes, colors):
        cv2.rectangle(img, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), color, thickness=2, lineType=cv2.LINE_AA)
    plt.imshow(img)
    plt.axis('off')
    plt.show()

def get_bbox(label_file, names):
    """ get bbox from semantic label """
    # convert RGB mask to 2D mask
    mask = np.array(Image.open(label_file))
    mask_2D = mask_to_2D(mask)
    mask_to_save = np.zeros_like(mask_2D)  # per-instance mask, filled below (not written to disk here)
    # instances are encoded as different colors
    obj_ids = np.unique(mask_2D)
    # split the color-encoded mask into a set of binary masks
    masks = mask_2D == obj_ids[:, None, None]
    # get bounding box coordinates for each mask
    num_objs = len(obj_ids)
    boxes, colors = [], []
    for i in range(num_objs):
        id = obj_ids[i]
        name = list(name_color_dict.keys())[int(id)]
        if name in names:
            # connectedComponents requires an 8-bit unsigned single-channel image
            binary = masks[i].astype(np.uint8)
            num_labels, labels = cv2.connectedComponents(binary, connectivity=8, ltype=cv2.CV_16U)
            for id_label in range(1, num_labels):
                temp_mask = labels == id_label
                pos = np.where(temp_mask)
                xmin = np.min(pos[1])
                xmax = np.max(pos[1])
                ymin = np.min(pos[0])
                ymax = np.max(pos[0])
                # filter results by a width/height threshold of 20 px
                if (xmax - xmin) > 20 and (ymax - ymin) > 20:
                    boxes.append([xmin, ymin, xmax, ymax])
                    color = list(name_color_dict.values())[int(id)]
                    colors.append(color)
                    mask_to_save[pos] = id_label

    # draw mask and bbox
    draw_box(mask, boxes, colors)


if __name__ == '__main__':
    names = ['Pedestrian', 'Car', 'Truck_Bus']
    label_file = "camvid/labels/0001TP_006690_L.png"
    get_bbox(label_file, names)
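The core of `mask_to_2D` (matching each pixel's RGB value against the color table) can be checked in isolation on a tiny synthetic label. The sketch below uses a hypothetical two-entry color table rather than the full 32-class dictionary:

```python
import numpy as np

toy_colors = {'Void': [0, 0, 0], 'Car': [64, 0, 128]}  # two-entry table for the demo

def to_2d(label, color_table):
    """Convert an RGB label [H, W, 3] to a class-index map [H, W]."""
    semantic_map = np.zeros(label.shape[:-1])
    for index, color in enumerate(color_table.values()):
        semantic_map[np.all(np.equal(label, color), axis=-1)] = index
    return semantic_map

label = np.array([[[0, 0, 0], [64, 0, 128]],
                  [[64, 0, 128], [0, 0, 0]]])
print(to_2d(label, toy_colors))  # [[0. 1.] [1. 0.]]
```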

The extraction results from the semantic labels are shown below; the bboxes of pedestrians and vehicles are effectively extracted:

Figure 4 Extraction results

After obtaining the bbox labels, the CamVid dataset can be used for semantic segmentation and object detection at the same time.
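For detection training, the extracted [xmin, ymin, xmax, ymax] boxes usually need converting to the training framework's format. Below is a sketch of the common YOLO-style normalized [cx, cy, w, h] conversion; the helper name and the example box are illustrative, not part of the article's code.

```python
def xyxy_to_yolo(box, img_w, img_h):
    """Convert [xmin, ymin, xmax, ymax] pixel coords to normalized [cx, cy, w, h]."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return [cx, cy, w, h]

# CamVid images are 960x720 px
print(xyxy_to_yolo([100, 200, 300, 400], 960, 720))
```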


References

(1) CamVid official website: http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/
(2) Segmentation and Recognition Using Structure from Motion Point Clouds, ECCV 2008
(3) Semantic Object Classes in Video: A High-Definition Ground Truth Database


Origin blog.csdn.net/weixin_46142822/article/details/106027202