Dataset formats commonly used in object detection tasks (VOC, COCO, YOLO)

1. Pascal VOC

VOC dataset (annotations are stored as XML files)

The Pascal VOC dataset is one of the commonly used large-scale datasets for object detection. The PASCAL VOC challenges were held from 2005 to 2012 and covered the following tasks:

  • Classification
  • Object Detection
  • Semantic segmentation (Class Segmentation)
  • Instance segmentation (Object Segmentation)
  • Action Classification (a classification task focusing on human actions)
  • Person Layout (a detection task focusing on the parts of the human body)

A. Categories contained in the dataset

A total of 20 categories are included:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

B. The difference between VOC2007 and VOC2012

VOC2007 contains 9,963 labeled images split into train/val/test, with a total of 24,640 annotated objects.

For the detection task, the trainval/test splits of VOC2012 contain all corresponding images from 2008 to 2011. The trainval split has 11,540 images with a total of 27,450 annotated objects.

C. VOC dataset file structure

The Pascal VOC dataset consists of five parts: JPEGImages, Annotations, ImageSets, SegmentationClass, and SegmentationObject.

.
└── VOCdevkit                   # Root directory
    └── VOC2012                 # Data for one year; only 2012 is shown here, other years such as 2007 follow the same layout
        ├── Annotations         # XML files, one per image in JPEGImages, describing that image's annotations
        ├── ImageSets           # txt files; each line holds an image name, optionally followed by ±1 to mark a positive or negative sample
        │   ├── Action          # Image lists for human actions (such as running, jumping, etc.)
        │   ├── Layout          # Image lists for data with human body parts
        │   ├── Main            # Image lists for object recognition, divided into 20 categories in total
        │   └── Segmentation    # Image lists usable for segmentation
        ├── JPEGImages          # Source images
        ├── SegmentationClass   # Images segmented by class (semantic segmentation; not needed for detection)
        └── SegmentationObject  # Images segmented by object (instance segmentation; not needed for detection)

  • JPEGImages: Stores all images for training and testing.
  • Annotations: stores the dataset labels in XML format. Each image file in JPEGImages has exactly one corresponding XML file describing the objects annotated in that image; for detection these are the bounding-box labels.
  • ImageSets: only the Main folder is discussed here. It contains four main text files: test.txt, train.txt, trainval.txt, and val.txt, which list the image file names of the test set, training set, train+val set, and validation set respectively.
  • SegmentationClass and SegmentationObject: store image segmentation results, which are not needed for detection tasks. Class segmentation marks the category of each pixel; object segmentation marks which object instance each pixel belongs to.

The directory is as follows

VOC
├─Annotations
│      ├─img0001.xml
│      ├─img0002.xml
│      ├─img0003.xml
│      ├─img0004.xml
│      ├─img0005.xml
│      └─img0006.xml

├─ImageSets
│  └─Main
│      ├─test.txt
│      ├─train.txt
│      ├─trainval.txt
│      └─val.txt

└─JPEGImages
        ├─img0001.jpg
        ├─img0002.jpg
        ├─img0003.jpg
        ├─img0004.jpg
        ├─img0005.jpg
        └─img0006.jpg
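As a sketch of how the pieces above fit together, the split files under ImageSets/Main can be used to resolve image/annotation path pairs. The `voc_split_files` helper below is illustrative, not part of the official VOC devkit:

```python
import os

def voc_split_files(voc_root, split="train"):
    """Read ImageSets/Main/<split>.txt and return (image, annotation) path pairs.

    Each line of the txt file is an image ID without extension.
    """
    list_file = os.path.join(voc_root, "ImageSets", "Main", split + ".txt")
    pairs = []
    with open(list_file) as f:
        for line in f:
            if not line.strip():
                continue
            stem = line.split()[0]  # per-class lists append a +1/-1 flag; keep only the image ID
            pairs.append((
                os.path.join(voc_root, "JPEGImages", stem + ".jpg"),
                os.path.join(voc_root, "Annotations", stem + ".xml"),
            ))
    return pairs
```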
 

D. Annotation information is organized in XML files

The annotation format of the xml file is as follows:

<annotation>
	<folder>VOC2007</folder>   # Folder containing the image
	<filename>000001.jpg</filename>  # Image file name
	<path>pathto/000001.jpg</path>
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
		<flickrid>341012865</flickrid>
	</source>
	<owner>
		<flickrid>Fried Camels</flickrid>
		<name>Jinky the Fruit Bat</name>
	</owner>
	<size>  # Image width, height, and depth
		<width>353</width>
		<height>500</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>  # Whether the image is used for segmentation
	<object>  # Annotated object 1
		<name>dog</name>  # Object class
		<pose>Left</pose>  # Shooting angle: front, rear, left, right, unspecified
		<truncated>1</truncated>  # Whether the object is truncated (e.g. extends beyond the image) or occluded (more than 15%)
		<difficult>0</difficult>  # Detection difficulty, judged from object size, lighting changes and image quality; such objects are annotated but usually ignored
		<bndbox>     # Bounding box of the object
			<xmin>48</xmin>
			<ymin>240</ymin>
			<xmax>195</xmax>
			<ymax>371</ymax>
		</bndbox>
	</object>
	<object>  # Annotated object 2
		<name>person</name>
		<pose>Left</pose>
		<truncated>1</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>8</xmin>
			<ymin>12</ymin>
			<xmax>352</xmax>
			<ymax>498</ymax>
		</bndbox>
	</object>
</annotation>
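A file in this format can be parsed with Python's standard library alone; a minimal sketch (the `parse_voc_xml` helper name is illustrative):

```python
import xml.etree.ElementTree as ET

def parse_voc_xml(xml_path):
    """Parse one VOC annotation file.

    Returns ((width, height), [(class_name, xmin, ymin, xmax, ymax), ...]).
    """
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    objects = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        box = tuple(int(bb.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, *box))
    return (width, height), objects
```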

2. COCO dataset

The COCO dataset has 91 categories. Although this is fewer than ImageNet and SUN, each category has more images, which helps models learn each category in more specific scenes. Compared with PASCAL VOC, it has both more categories and more images.

Compared with VOC, COCO contains more small objects and more objects per image, and most objects are not centered in the frame, which better matches everyday environments. This makes detection on COCO harder, and reported detection accuracies on it are correspondingly lower than on other datasets.

To introduce the dataset, Microsoft published the paper Microsoft COCO: Common Objects in Context at ECCV Workshops. The dataset is aimed at scene understanding: images are mainly taken from complex everyday scenes, and object positions are calibrated through precise segmentation. It includes 91 object types, 328,000 images and 2,500,000 labels.

A. Categories in the COCO dataset

Although the paper describes 91 categories, the released detection annotations cover 80 of them:

['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

B. COCO dataset file structure

COCO_ROOT                  # Root directory
├── annotations            # Annotations in JSON format
│   ├── instances_train2017.json
│   └── instances_val2017.json
├── train2017              # Training images
│   ├── 000000000001.jpg
│   ├── 000000000002.jpg
│   └── 000000000003.jpg
└── val2017                # Validation images
    ├── 000000000004.jpg
    └── 000000000005.jpg

Unlike VOC's one-XML-per-image convention, all COCO box annotations for a split are stored in a single JSON file (instances_train2017.json or instances_val2017.json), which parses into a dictionary.

COCO has three annotation types, which share the basic fields info, image and license and are stored in JSON files. Each type has a training and a validation file:

  • object instances (object detection annotations);
  • object keypoints (keypoints on objects);
  • image captions (textual descriptions of the image).

The JSON annotation format is as follows. Taking object instances as the example, the file contains the following sections in order from beginning to end:

(1) the images list has one element per image in the training (or validation) set;
(2) the annotations list has one element per bounding box in the training (or validation) set;
(3) the categories list has one element per category.

# Overall JSON file structure
{
    "info": info,               # dict
    "licenses": [license],      # list of dicts
    "images": [image],          # list of dicts
    "annotations": [annotation],# list of dicts
    "categories": [category]    # list of dicts
}

# Structure of each dict above
info{                           # Dataset description
    "year": int,                # Dataset year
    "version": str,             # Dataset version
    "description": str,         # Dataset description
    "contributor": str,         # Dataset contributor
    "url": str,                 # Dataset download link
    "date_created": datetime,   # Dataset creation date
}
license{
    "id": int,     # License id; the license field in images refers to this id
    "name": str,   # License name
    "url": str,    # License link
}

# images is a list holding the info of all images as dicts; image is one dict holding a single image's info
image{
    "id": int,                  # Image ID (unique per image)
    "width": int,               # Image width
    "height": int,              # Image height
    "file_name": str,           # Image file name
    "license": int,             # License id
    "flickr_url": str,          # Flickr URL of the image
    "coco_url": str,            # COCO URL of the image
    "date_captured": datetime,  # Capture date
}

# annotations is a list holding all annotations as dicts; annotation is one dict for a single annotated object.
annotation{
    "id": int,                  # Annotation ID (unique per annotated object); an image may contain several objects
    "image_id": int,            # ID of the image this object belongs to
    "category_id": int,         # Category ID of the object, matching an id in categories
    "segmentation": RLE or [polygon],   # Instance segmentation: boundary point coordinates [x1,y1,x2,y2,...,xn,yn]
    "area": float,              # Area of the object region
    "bbox": [xmin,ymin,width,height], # Detection box [x,y,w,h]
    "iscrowd": 0 or 1,          # 1 if the annotation marks a crowd of objects (segmentation stored as RLE), 0 for a single object; default 0
}
# Category description
categories{
    "id": int,                  # Category ID (0 is reserved for background)
    "name": str,                # Sub-category name
    "supercategory": str,       # Super-category name, i.e. the broad class the category belongs to, e.g. truck and car both belong to vehicle
}
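A minimal sketch of reading such a file with only the standard library, grouping the boxes per image (the `load_coco_boxes` helper name is illustrative; for real COCO files the pycocotools API is commonly used instead):

```python
import json
from collections import defaultdict

def load_coco_boxes(json_path):
    """Group COCO bbox annotations by image file name.

    Returns {file_name: [(category_name, [x, y, w, h]), ...]}.
    """
    with open(json_path) as f:
        coco = json.load(f)
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}
    id_to_cat = {cat["id"]: cat["name"] for cat in coco["categories"]}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        boxes[id_to_name[ann["image_id"]]].append(
            (id_to_cat[ann["category_id"]], ann["bbox"]))
    return dict(boxes)
```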

3. YOLO dataset format

Labels are saved as txt files. A YOLO dataset directory looks like this:

dataset
├─images
│  ├─train
│  │    ├─flip_mirror_himg0026393.jpg
│  │    ├─flip_mirror_himg0026394.jpg
│  │    ├─flip_mirror_himg0026395.jpg
│  │    ├─flip_mirror_himg0027314.jpg
│  │    ├─flip_mirror_himg0027315.jpg
│  │    └─flip_mirror_himg0027316.jpg
│  └─val
│     ├─flip_mirror_himg0027317.jpg
│     └─flip_mirror_himg0027318.jpg
└─labels
    ├─train
    │    ├─flip_mirror_aimg0025023.txt
    │    ├─flip_mirror_aimg0025024.txt
    │    ├─flip_mirror_aimg0025025.txt
    │    ├─flip_mirror_aimg0025026.txt
    │    ├─flip_mirror_aimg0025027.txt
    │    └─flip_mirror_aimg0025028.txt
    └─val
         ├─flip_mirror_aimg0025029.txt
         └─flip_mirror_aimg0025030.txt


The yolo annotation format is as follows:

<object-class> <x> <y> <width> <height>

For example:

0 0.412500 0.318981 0.358333 0.636111

Each row represents one annotated object:

  • object-class (e.g. 0): the class index of the object
  • x, y: the center coordinates of the bounding box, normalized by the image width W and height H respectively, i.e. x_center/W and y_center/H
  • width, height: the width and height of the bounding box, normalized by W and H respectively
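Converting between the VOC pixel-corner format described earlier and YOLO's normalized center format is a common need; a small sketch (function names are illustrative):

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel corner box to YOLO's normalized (x_center, y_center, width, height)."""
    x = (xmin + xmax) / 2 / img_w
    y = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x, y, w, h

def yolo_to_voc(x, y, w, h, img_w, img_h):
    """Convert a normalized YOLO box back to pixel corner coordinates (xmin, ymin, xmax, ymax)."""
    xmin = (x - w / 2) * img_w
    ymin = (y - h / 2) * img_h
    return round(xmin), round(ymin), round(xmin + w * img_w), round(ymin + h * img_h)
```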

Origin blog.csdn.net/ytusdc/article/details/131972922