Computer Vision - [Dataset] MOT17, COCO data input format, dataset visualization script

Preface: The purpose of this blog post is to 1) clarify the files in the MOT17 dataset and what each of them contains; 2) explain the differences between the COCO, YOLO, and VOC data input formats; 3) provide a dataset visualization script that draws the ground truth of a chosen dataset onto the JPG frames and renders them into a video for playback.

Track1

Each annotation line has the following format:

<camera_id> <obj_id> <frame_id> <xmin> <ymin> <width> <height> <xworld> <yworld>
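As a small illustration, a line in this format can be parsed into a typed record. This is a sketch; the field names follow the header above, while `Track1Box` and `parse_track1_line` are names chosen here, not part of any official toolkit:

```python
from typing import NamedTuple

class Track1Box(NamedTuple):
    camera_id: int
    obj_id: int
    frame_id: int
    xmin: int
    ymin: int
    width: int
    height: int
    xworld: float
    yworld: float

def parse_track1_line(line: str) -> Track1Box:
    # First 7 fields are integers, the last 2 are world coordinates.
    fields = line.split()
    return Track1Box(*map(int, fields[:7]), *map(float, fields[7:9]))
```

For example, `parse_track1_line('1 5 42 100 200 50 120 10.5 -3.2')` yields a record with `obj_id == 5` and `xworld == 10.5`.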

MOT17 dataset

Dataset download: https://pan.baidu.com/s/1TtKOUdcACLXBzS9L3lmE0A?pwd=67ey Extraction code: 67ey
Reference blog: Multi-target tracking dataset: mot16, mot17 dataset introduction and multi-target tracking index evaluation

Dataset introduction

The file structure of this dataset is shown in the figure below. MOT17 has 21 training sequences and 21 test sequences.

Training set

det

The /det folder in each training sequence holds the detection information. It contains a single det.txt file with one annotation per line, each line describing one detected object.
The fields of each line are: the first is the frame number; the second is the track ID (detections carry no identity, only a bounding box and score, so it is always -1); the next four give the top-left corner coordinates and the width and height of the bounding box; conf is the detection confidence; the last three fields are used by MOT3D and are always -1 for 2D detection.

<frame>, -1, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <conf>, <3D_x>, <3D_y>, <3D_z> 
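A minimal sketch of reading a det.txt with this layout and grouping detections by frame (`load_det` is a helper name chosen here, not part of the MOT17 release):

```python
from collections import defaultdict

def load_det(det_path):
    """Group MOT det.txt detections by frame number."""
    dets = defaultdict(list)
    with open(det_path) as f:
        for line in f:
            fields = line.strip().split(',')
            frame = int(fields[0])
            # fields[1] is the track ID, always -1 for detections
            bb_left, bb_top, bb_w, bb_h = map(float, fields[2:6])
            conf = float(fields[6])
            dets[frame].append((bb_left, bb_top, bb_w, bb_h, conf))
    return dets
```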

gt

The /gt folder in each training sequence holds the tracking ground truth. It contains a single gt.txt file with one annotation per line, each line describing one annotated object.
The fields of each line are: the first is the frame number; the second is the ID of the target trajectory; the next four give the top-left corner coordinates and the width and height of the bounding box; the seventh is a flag marking whether the entry should be considered in evaluation (0 = ignore, 1 = active); the eighth is the class ID of the target (e.g. 1 = pedestrian); the ninth is the visibility ratio of the box, i.e. how much of the target remains visible when it is occluded by other targets or clipped at the image border while moving.

<frame>, <id>, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <consider_flag>, <class_id>, <visibility> 
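A sketch of loading gt.txt into per-trajectory lists, dropping entries whose consider flag is 0 or whose visibility falls below a threshold (`load_gt` and `min_visibility` are names chosen here for illustration):

```python
def load_gt(gt_path, min_visibility=0.0):
    """Load MOT gt.txt, keeping only active, sufficiently visible boxes."""
    tracks = {}
    with open(gt_path) as f:
        for line in f:
            fields = line.strip().split(',')
            frame, obj_id = int(fields[0]), int(fields[1])
            bb = tuple(map(float, fields[2:6]))          # left, top, width, height
            consider = int(fields[6])                    # 0 = ignore, 1 = active
            cls_id = int(fields[7])                      # class ID (1 = pedestrian)
            vis = float(fields[8])                       # visibility ratio in [0, 1]
            if consider == 1 and vis >= min_visibility:
                tracks.setdefault(obj_id, []).append((frame, bb, cls_id, vis))
    return tracks
```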

COCO data input format and YOLO data input format and VOC data input format

Reference blog: VOC/YOLO/COCO dataset format conversion and an introduction to the LabelImg/Labelme/Colabeler annotation tools.
In the VOC label format, annotations are stored in XML files (one per image).
In the YOLO label format, annotations are stored in TXT files (one per image).
In the COCO label format, annotations for the whole dataset are stored in a single JSON file.
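The practical difference between the three formats shows up when converting a box from one convention to another: YOLO stores a normalized center point plus width/height, VOC stores absolute corner coordinates, and COCO stores an absolute top-left corner plus width/height. A minimal sketch, with helper names chosen here:

```python
def yolo_to_voc(cx, cy, w, h, img_w, img_h):
    """Normalized YOLO (center x/y, w, h) -> absolute VOC (xmin, ymin, xmax, ymax)."""
    xmin = (cx - w / 2) * img_w
    ymin = (cy - h / 2) * img_h
    xmax = (cx + w / 2) * img_w
    ymax = (cy + h / 2) * img_h
    return int(round(xmin)), int(round(ymin)), int(round(xmax)), int(round(ymax))

def voc_to_coco(xmin, ymin, xmax, ymax):
    """VOC corners -> COCO [x, y, width, height] with the top-left corner."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]
```

For example, the YOLO box `0.5 0.5 0.5 0.5` in a 100x100 image becomes the VOC box `(25, 25, 75, 75)` and the COCO box `[25, 25, 50, 50]`.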

Dataset Visualization Script

Visualization in YOLO format

import cv2
import os

label_path = '*.txt'   # YOLO label file: <class> <cx> <cy> <w> <h>, normalized to [0, 1]
pic_path = '*.bmp'     # the matching image

img = cv2.imread(pic_path)
height, width = img.shape[:2]

if os.path.exists(label_path):
    with open(label_path, 'r') as label_f:
        for line in label_f:
            txt_list = line.split()
            print('txt_list', txt_list)
            # YOLO stores the normalized box center and size
            norm_x, norm_y, norm_w, norm_h = map(float, txt_list[1:5])
            xmin = int(width * (norm_x - 0.5 * norm_w))
            ymin = int(height * (norm_y - 0.5 * norm_h))
            xmax = int(width * (norm_x + 0.5 * norm_w))
            ymax = int(height * (norm_y + 0.5 * norm_h))
            # Draw one rectangle per label line, not just the first box
            cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (120, 255, 120), 1)

cv2.imshow('vis', img)
cv2.waitKey(0)


Origin blog.csdn.net/qq_42312574/article/details/128972328