I previously shared the multi-object tracking task and 5 related datasets with you. The TAO dataset mentioned there has proven especially popular, so today let's take a closer look at it.
Table of contents
1. Dataset introduction
2. Dataset details
3. Dataset task definition and introduction
4. Interpretation of the dataset file structure
5. Dataset download link
1. Dataset Introduction
Published by: Carnegie Mellon University, Inria, Argo AI
Release time: 2020
Background:
In multi-object tracking, the tracked categories have typically been limited to a handful of classes (vehicles, pedestrians, animals, etc.) relevant to autonomous driving and video surveillance, ignoring most objects in the real world.
As is well known, large-scale datasets with rich category coverage, such as COCO, have greatly advanced object detection. Researchers from CMU and other institutions therefore released TAO, a COCO-like, diverse MOT dataset for Tracking Any Object, aiming to change this status quo in multi-object tracking.
Introduction:
The TAO dataset consists of 2,907 high-definition videos captured in diverse environments, with an average length of 36.8 seconds, and covers 833 categories. Compared with existing public object-tracking datasets, TAO offers far greater sample diversity.
2. Dataset details
1. Data volume
Training set: 500 videos, containing 534,092 jpg images
Test set: 1,419 videos, containing 1,502,450 jpg images
Validation set: 993 videos, containing 1,045,668 jpg images
2. Annotated categories
The dataset annotates 833 object categories, 488 of which are shared with the LVIS dataset. Its 2,907 high-definition videos come from 7 existing datasets: ArgoVerse, BDD, Charades, LaSOT, YFCC100M, AVA, and HACS.
3. Visualization
Below are annotation results on some sample frames from TAO's original videos. The source video data comes from three datasets: Charades, LaSOT, and ArgoVerse.
Original video annotation result
Car-Original Video Annotation Result
Indoor-Original Video Annotation Result
3. Dataset task definition and introduction
Multi-object tracking
● Task definition
Detect and track multiple objects in a given video.
● Evaluation
a. Evaluation process
● In frame t, let the ground-truth objects be {o_i} and the predicted hypotheses be {h_j}; each belongs to a specific trajectory, i.e. it may already have appeared in earlier frames.
● A pair (o_i, h_j) can be associated if their intersection-over-union satisfies IoU(o_i, h_j) ≥ τ, where τ is a preset threshold; in TAO, τ = 0.5.
● If o_i and h_j were matched in frame t−1 and can still be associated in frame t, they are matched directly in frame t.
● For the remaining unmatched ground truths and hypotheses, a maximum bipartite matching is computed over the valid associations.
● Any ground truth left unmatched counts as a false negative; any hypothesis left unmatched counts as a false positive.
● If o_i is matched to h_j in frame t but its most recent match was a different hypothesis h_k (k ≠ j), an identity switch (IDSW) is counted at frame t.
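The per-frame matching steps above can be sketched in a few lines of Python. This is only an illustrative simplification: the real evaluation uses a true maximum bipartite matching, while step 2 here is a greedy approximation, and all names (`iou`, `match_frame`) are our own.

```python
def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes (TAO bbox format)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def match_frame(gts, hyps, prev_matches, thresh=0.5):
    """Match ground-truth boxes to hypothesis boxes in one frame.

    gts, hyps: dicts mapping id -> [x, y, w, h].
    prev_matches: dict gt_id -> hyp_id carried over from the previous frame.
    Returns (matches, false_negatives, false_positives, id_switches).
    """
    matches = {}
    # Step 1: keep previous-frame matches that are still valid (IoU >= thresh).
    for g, h in prev_matches.items():
        if g in gts and h in hyps and iou(gts[g], hyps[h]) >= thresh:
            matches[g] = h
    # Step 2: match the rest greedily by descending IoU (a simplification
    # of the maximum bipartite matching used in the actual evaluation).
    pairs = sorted(
        ((iou(gts[g], hyps[h]), g, h)
         for g in gts if g not in matches
         for h in hyps if h not in matches.values()),
        reverse=True)
    for score, g, h in pairs:
        if score >= thresh and g not in matches and h not in matches.values():
            matches[g] = h
    # Unmatched ground truths are FNs; unmatched hypotheses are FPs.
    fns = [g for g in gts if g not in matches]
    fps = [h for h in hyps if h not in matches.values()]
    # An ID switch: a ground truth now matched to a different hypothesis.
    idsw = [g for g, h in matches.items()
            if g in prev_matches and prev_matches[g] != h]
    return matches, fns, fps, idsw
```

Running the function frame by frame, while carrying each frame's `matches` into the next call as `prev_matches`, accumulates the FN/FP/IDSW counts that CLEAR-MOT-style metrics are built from.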
b. Evaluation indicators
The evaluation metrics used in the TAO challenge are listed at:
https://motchallenge.net/results/TAO_Challenge/
4. Interpretation of the dataset file structure
1. Directory structure
dataset_root/
├── train/ # training set; drawn from the 7 datasets ArgoVerse, BDD, Charades, LaSOT, YFCC100M, AVA, HACS; 352,504 files in total
│   ├── ArgoVerse/ # ArgoVerse subset: 37 videos, 21,967 jpg files in total
│   │   ├── rear_right_26d141ec-f952-3908-b4cc-ae359377424e/ # one video; this directory holds 451 jpg (frame) files
│   │   │   ├── ring_rear_right_315970942123500864.jpg # *.jpg: image file (frame)
│   │   │   └── ...
│   │   ├── side_right_84c35ea7-1a99-3a0c-a3ea-c5915d68acbc/ # one video; this directory holds 871 jpg (frame) files
│   │   │   └── ...
│   │   └── ...
│   ├── BDD/ # BDD subset: 54 videos, 84,294 jpg files in total
│   │   ├── b1d0a191-03dcecc2/ # one video; this directory holds 1,171 jpg (frame) files
│   │   │   ├── frame0001.jpg # *.jpg: image file (frame)
│   │   │   └── ...
│   │   ├── b1f85377-44885085/ # one video; this directory holds 2,341 jpg (frame) files
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── test/ # test set; drawn from the same 7 datasets; 982,754 files in total
│   ├── ArgoVerse/ # ArgoVerse subset: 112 videos, 62,572 jpg files in total
│   │   ├── 08aa8ed7-386d-373e-a56a-f01444a8d7e5/ # one video; this directory holds 451 jpg files
│   │   │   └── ...
│   │   ├── rear_right_0f0d7759-fa6e-3296-b528-6c862d061bdd/ # one video; this directory holds 451 jpg files
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── val/ # validation set; drawn from the same 7 datasets; 698,485 files in total
│   ├── ArgoVerse/ # ArgoVerse subset: 74 videos, 44,474 jpg files in total
│   │   ├── 00c561b9-2057-358d-82c6-5b06d76cebcf/ # one video; this directory holds 901 jpg files
│   │   │   ├── ring_front_center_315969629022515560.jpg # *.jpg: image file (frame)
│   │   │   └── ...
│   │   ├── rear_right_028d5cb1-f74d-366c-85ad-84fde69b0fd3/ # one video; this directory holds 451 jpg files
│   │   │   ├── ring_rear_right_315972720027556456.jpg # *.jpg: image file (frame)
│   │   │   └── ...
│   │   └── ...
│   └── ...
└── annotations/ # annotation files
    ├── checksums/
    │   ├── train_checksums.json # checksums for every directory and jpg file in the training set
    │   └── validation_checksums.json # checksums for every directory and jpg file in the validation set
    ├── video-lists/ # lists of video files
    │   ├── README.md
    │   ├── tao_argoverse.txt # video names from the ArgoVerse dataset
    │   ├── tao_ava.txt # video names from the AVA dataset
    │   ├── tao_bdd.txt # video names from the BDD dataset
    │   ├── tao_charades.txt # video names from the Charades dataset
    │   ├── tao_charades_subjects.txt # video names for subjects in the Charades dataset
    │   ├── tao_hacs.txt # video names from the HACS dataset
    │   ├── tao_lasot.txt # video names from the LaSOT dataset
    │   └── tao_yfcc.txt # video names from the YFCC100M dataset
    ├── README.md
    ├── test_categories.json # only the categories information from test_without_annotations.json
    ├── test_without_annotations.json # test-set annotation file (without annotations)
    ├── train.json # training-set annotation file
    ├── train_with_freeform.json # training annotations in free-form format
    ├── validation.json # validation-set annotation file
    └── validation_with_freeform.json # validation annotations in free-form format
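The layout above can be explored with a short script. This is a hypothetical sketch, not part of the official toolkit; `dataset_root` and the helper name `count_frames` are our own assumptions.

```python
import os

def count_frames(split_dir):
    """Map 'source_dataset/video_name' -> number of jpg frames in that video."""
    counts = {}
    for source in sorted(os.listdir(split_dir)):          # e.g. ArgoVerse, BDD, ...
        source_path = os.path.join(split_dir, source)
        if not os.path.isdir(source_path):
            continue
        for video in sorted(os.listdir(source_path)):     # one directory per video
            video_path = os.path.join(source_path, video)
            if not os.path.isdir(video_path):
                continue
            n = sum(1 for f in os.listdir(video_path) if f.endswith(".jpg"))
            counts[f"{source}/{video}"] = n
    return counts

# Typical use, assuming the archive was extracted to dataset_root/:
# counts = count_frames("dataset_root/train")
```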
2. Annotation files and other file descriptions
● The annotation JSON files have the following format:
Take train.json as an example; the test and validation annotation files carry similar information. train.json contains a single dictionary with the following seven fields: videos, annotations, tracks, images, info, categories, licenses.
An annotated example from the training data follows:
{
"videos": [
{
"id": 0,
"height": 425,
"width": 640,
"date_captured": "2013-11-19 21:22:42",
"neg_category_ids": [
342,
57,
651,
357,
738
],
"name": "train/YFCC100M/v_f69ebe5b731d3e87c1a3992ee39c3b7e",
"not_exhaustive_category_ids": [805,95],
"metadata": {"dataset": "YFCC100M", "user_id": "22634709@N00", "username": "Amsterdamized"}
},
...
],
"annotations": [
{
"bbox": [
114,
166,
67,
71
],
"category_id": 95,
"iscrowd":0,
"image_id": 0,
"id": 0,
"track_id":0,
"_scale_uuid":"5a32709e-44a0-47b9-85af-b01286adea67",
"scale_category":"moving object",
"video_id": 0,
"segmentation": [
[
114,
166,
181,
166,
181,
237,
114,
237
]
],
"area": 4757
},
...
],
"tracks": [
{
"id":0,
"category_id":95,
"video_id":0
}
],
"images": [
{
"video": "train/YFCC100M/v_f69ebe5b731d3e87c1a3992ee39c3b7e",
"_scale_task_id": "5de800eddb2c18001a56aa11",
"id": 0,
"license": 0,
"height": 480,
"width": 640,
"file_name": "train/YFCC100M/v_f69ebe5b731d3e87c1a3992ee39c3b7e/frame0391.jpg",
"frame_index": 390,
"video_id": 0
},
...
],
"info": {
"year": 2020,
"version": "0.1.20200120",
"description": "Annotation imported from Scale",
"contributor": " ",
"url": " ",
"date_created": "2020-01-20 15:49:53.519740"
},
"categories": [
{
"frequency": "r",
"id": 1,
"synset":"acorn.n.01",
"image_count":0,
"instance_count":0,
"synonyms": [
"acorn"
],
"def": "nut from an oak tree",
"name": "acorn"
},
...
],
"licenses": [
"Unknown"
]
}
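As a quick sanity check on this format, the annotations can be grouped by track with a few lines of Python. The helper below (`index_tracks`, our own name) is a sketch that relies only on the `images` and `annotations` fields shown above.

```python
import json
from collections import defaultdict

def index_tracks(data):
    """Group annotation bboxes by track_id, sorted by frame_index."""
    images = {img["id"]: img for img in data["images"]}
    boxes_by_track = defaultdict(list)
    for ann in data["annotations"]:
        frame = images[ann["image_id"]]["frame_index"]
        boxes_by_track[ann["track_id"]].append((frame, ann["bbox"]))
    for track in boxes_by_track.values():
        track.sort()  # chronological order by frame_index
    return dict(boxes_by_track)

# Typical use (the path is an assumption about where the file was extracted):
# with open("annotations/train.json") as f:
#     tracks = index_tracks(json.load(f))
```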
Overall field structure
{
"videos": [video], # video-level attributes
"annotations": [annotation],
"tracks": [track], # e.g. {'category_id': 805, 'id': 72, 'video_id': 11}
"images": [image], # image-level attributes
"info": info,
"categories": [category], # e.g. {'id': 1, 'name': 'acorn', 'synset': 'acorn.n.01', ...}
"licenses": [license],
}
Analysis of each field
videos: {
"id": int,
"height": int, # image height in the video
"width": int, # image width in the video
"date_captured": str, # capture time of the data
"neg_category_ids": [int], # categories verified to be absent from the video
"name": str, # relative path of the video, e.g. 'train/YFCC100M/v_f69ebe5b731d3e87c1a3992ee39c3b7e'
"not_exhaustive_category_ids": [int], # categories present but not exhaustively annotated
"metadata": dict, # metadata about the video
}
annotation: {
"bbox": [x,y,width,height], # [top-left x, top-left y, width, height]
"category_id": int,
"iscrowd": int,
"id": int,
"track_id": int,
"_scale_uuid": str,
"scale_category": str,
"video_id": int,
"segmentation": [polygon], # annotated polygon points, given as x, y pairs
"area": int
}
tracks: {
"id": int, # track ID
"category_id": int, # category ID
"video_id": int # video ID
}
images: {
"video": str,
"_scale_task_id": str,
"id": int,
"license": int,
"height": int,
"width": int,
"file_name": str,
"frame_index": int,
"video_id": int
}
info: {
"year": int,
"version": str,
"description": str,
"contributor": str,
"url": str,
"date_created": datetime,
}
categories: {
"frequency": str, # how often the category appears across all images ('r' = rare, 'c' = common, 'f' = frequent, as in LVIS)
"id": int,
"synset": str, # WordNet synset that uniquely identifies the category
"image_count": int, # number of images in which this category is annotated
"instance_count": int, # number of annotated instances of this category
"synonyms": [str], # synonyms
"def": str, # definition of the synset
"name": str,
}
licenses: ["Unknown"]
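The relationship between `bbox`, `segmentation`, and `area` can be verified on the example annotation above: the segmentation polygon is simply the four corners of the box, and the area is width × height. A minimal sketch (the helper name is ours):

```python
def bbox_to_polygon(bbox):
    """[x, y, w, h] top-left box -> flat [x1, y1, x2, y2, ...] corner polygon."""
    x, y, w, h = bbox
    return [x, y, x + w, y, x + w, y + h, x, y + h]

bbox = [114, 166, 67, 71]           # from the train.json example above
print(bbox_to_polygon(bbox))        # [114, 166, 181, 166, 181, 237, 114, 237]
print(bbox[2] * bbox[3])            # 4757, matching the "area" field
```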
● Other file formats:
The format of tao_lasot.txt is as follows:
microphone-10
elephant-12
bus-4
elephant-10
basketball-14
frog-11
...
That is, each line is the name of one video, roughly one and a half minutes long. The other txt files, such as tao_argoverse.txt and tao_ava.txt, follow the same format.
5. Dataset download link
The OpenDataLab platform provides complete dataset information, intuitive data-distribution statistics, fast download speeds, and convenient visualization scripts. You are welcome to try it; click the original link to view.
References
[1] Official website: https://taodataset.org/
[2] Reference: A. Dave, T. Khurana, P. Tokmakov, et al. TAO: A Large-Scale Benchmark for Tracking Any Object. In ECCV, 2020. Springer, Cham.
[3] GitHub: https://github.com/TAO-Dataset/tao