Microsoft COCO: Common Objects in Context - Data Format

http://cocodataset.org/#home
http://cocodataset.org/#format-data

Data format
COCO has five annotation types: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning. The annotations are stored using JSON. Please note that the COCO API described on the download page can be used to access and manipulate all annotations. All annotations share the basic data structure shown below; the data structures specific to each annotation type are described in the numbered sections that follow.
http://cocodataset.org/#detection-2018
http://cocodataset.org/#keypoints-2018
http://cocodataset.org/#stuff-2018
http://cocodataset.org/#panoptic-2018
http://cocodataset.org/#captions-2015
http://json.org/
https://github.com/cocodataset/cocoapi

{
"info": info,
"images": [image],
"annotations": [annotation],
"licenses": [license],
}

info{
"year": int,
"version": str,
"description": str,
"contributor": str,
"url": str,
"date_created": datetime,
}

image{
"id": int,
"width": int,
"height": int,
"file_name": str,
"license": int,
"flickr_url": str,
"coco_url": str,
"date_captured": datetime,
}

license{
"id": int,
"name": str,
"url": str,
}
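
As a minimal sketch of the top-level structure above, the following Python builds a tiny in-memory dataset and indexes its images and licenses by id. Every field value here (file name, URLs, dates, license name) is invented for illustration, and the string encoding of the datetime fields is an assumption, since the page only gives their type as datetime.

```python
import json

# A toy dataset following the top-level layout above.
# All values are invented for illustration.
dataset = {
    "info": {
        "year": 2018,
        "version": "1.0",
        "description": "toy example",
        "contributor": "example",
        "url": "http://cocodataset.org",
        "date_created": "2018-01-01 00:00:00",  # assumed string encoding of datetime
    },
    "images": [
        {
            "id": 1,
            "width": 640,
            "height": 480,
            "file_name": "000000000001.jpg",
            "license": 1,  # refers to licenses[].id
            "flickr_url": "",
            "coco_url": "",
            "date_captured": "2018-01-01 00:00:00",
        }
    ],
    "annotations": [],
    "licenses": [{"id": 1, "name": "example license", "url": ""}],
}

# Round-trip through JSON, since the annotations are stored as JSON.
dataset = json.loads(json.dumps(dataset))

# Index images and licenses by id for direct lookup.
images_by_id = {img["id"]: img for img in dataset["images"]}
licenses_by_id = {lic["id"]: lic for lic in dataset["licenses"]}

image = images_by_id[1]
license_name = licenses_by_id[image["license"]]["name"]
print(image["file_name"], license_name)
```

For a real annotation file, the same indexing applies after reading the file with json.load; the COCO API linked above provides equivalent lookups.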

1. Object Detection
Each object instance annotation contains a series of fields, including the category id and segmentation mask of the object. The segmentation format depends on whether the instance represents a single object (iscrowd=0, in which case polygons are used) or a collection of objects (iscrowd=1, in which case RLE is used). Note that a single object (iscrowd=0) may require multiple polygons, for example if it is occluded. Crowd annotations (iscrowd=1) are used to label large groups of objects (e.g. a crowd of people). In addition, an enclosing bounding box is provided for each object (box coordinates are measured from the top left image corner and are 0-indexed). Finally, the categories field of the annotation structure stores the mapping of category id to category and supercategory names. See also the detection task.
http://cocodataset.org/#detection-2018

annotation{
"id": int,
"image_id": int,
"category_id": int,
"segmentation": RLE or [polygon],
"area": float,
"bbox": [x,y,width,height],
"iscrowd": 0 or 1,
}
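
The sketch below shows one way to read the bbox and segmentation fields of a single annotation. It assumes each polygon is a flat list [x1, y1, x2, y2, ...], which the page does not spell out, and the helper names (bbox_to_corners, polygon_area, segmentation_kind) are hypothetical. The shoelace area is only an approximation of the stored "area" value when polygons overlap.

```python
def bbox_to_corners(bbox):
    """Convert [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def polygon_area(polygon):
    """Shoelace area of one polygon given as a flat list [x1, y1, x2, y2, ...]."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    n = len(xs)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2.0


def segmentation_kind(annotation):
    """Polygons for single objects (iscrowd=0), RLE for crowds (iscrowd=1)."""
    return "RLE" if annotation["iscrowd"] == 1 else "polygons"


# Toy annotation: a 40 x 20 rectangle described by one polygon.
annotation = {
    "id": 1,
    "image_id": 1,
    "category_id": 1,
    "segmentation": [[10.0, 10.0, 50.0, 10.0, 50.0, 30.0, 10.0, 30.0]],
    "area": 800.0,
    "bbox": [10.0, 10.0, 40.0, 20.0],
    "iscrowd": 0,
}

corners = bbox_to_corners(annotation["bbox"])
kind = segmentation_kind(annotation)
total_area = sum(polygon_area(p) for p in annotation["segmentation"])
print(corners, kind, total_area)
```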

categories[{
"id": int,
"name": str,
"supercategory": str,
}]
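
To illustrate the mapping that the categories field stores, here is a small sketch that builds lookup tables from category id to name and from supercategory to member names. The three entries are a toy subset chosen for illustration, not the full category list.

```python
from collections import defaultdict

# Toy subset of a categories list, for illustration only.
categories = [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 2, "name": "bicycle", "supercategory": "vehicle"},
    {"id": 3, "name": "car", "supercategory": "vehicle"},
]

# category id -> category name
name_by_id = {cat["id"]: cat["name"] for cat in categories}

# supercategory -> list of category names
names_by_supercategory = defaultdict(list)
for cat in categories:
    names_by_supercategory[cat["supercategory"]].append(cat["name"])

print(name_by_id[2])
print(dict(names_by_supercategory))
```

An annotation's category_id can then be resolved with name_by_id[annotation["category_id"]].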

2. Keypoint Detection
A keypoint annotation contains all the data of the object annotation (including id, bbox, etc.) and two additional fields. First, "keypoints" is a length 3k array, where k is the total number of keypoints defined for the category. Each keypoint has a 0-indexed location x,y and a visibility flag v, defined as v=0: not labeled (in which case x=y=0), v=1: labeled but not visible, and v=2: labeled and visible. A keypoint is considered visible if it falls inside the object segment. Second, "num_keypoints" indicates the number of labeled keypoints (v>0) for a given object (many objects, e.g. crowds and small objects, will have num_keypoints=0). Finally, for each category, the categories struct has two additional fields: "keypoints", which is a length k array of keypoint names, and "skeleton", which defines connectivity via a list of keypoint edge pairs and is used for visualization. Currently keypoints are only labeled for the person category (for most medium/large non-crowd person instances). See also the keypoint task.
http://cocodataset.org/#keypoints-2018

annotation{
"keypoints": [x1,y1,v1,...],
"num_keypoints": int,
"[cloned]": ...,
}

categories[{
"keypoints": [str],
"skeleton": [edge],
"[cloned]": ...,
}]
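
The keypoint layout described above can be unpacked as in the following sketch, which splits the flat length 3k array into (x, y, v) triplets, pairs them with the category's keypoint names, and recomputes num_keypoints as the count of entries with v>0. The helper names and the three-keypoint toy category are hypothetical.

```python
def split_keypoints(flat):
    """Split [x1, y1, v1, x2, y2, v2, ...] into a list of (x, y, v) triplets."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints array length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat):
    """Number of labeled keypoints, i.e. those with visibility flag v > 0."""
    return sum(1 for _, _, v in split_keypoints(flat) if v > 0)


# Toy category with k = 3 keypoints (names chosen for illustration).
category = {"keypoints": ["nose", "left_eye", "right_eye"]}

# v=2: labeled and visible; v=0: not labeled (x=y=0); v=1: labeled, not visible.
annotation = {
    "keypoints": [100, 50, 2, 0, 0, 0, 110, 48, 1],
    "num_keypoints": 2,
}

triplets = split_keypoints(annotation["keypoints"])
named = dict(zip(category["keypoints"], triplets))
labeled = count_labeled(annotation["keypoints"])
print(named, labeled)
```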

3. Stuff Segmentation
The stuff annotation format is identical to and fully compatible with the object detection format above (except that iscrowd is unnecessary and set to 0 by default). We provide annotations in both JSON and PNG format for easier access, as well as conversion scripts between the two formats. In the JSON format, each category present in an image is encoded with a single RLE annotation (see the Mask API for more details). The category_id represents the id of the current stuff category. For more details on stuff categories and supercategories, see the stuff evaluation page. See also the stuff task.
https://github.com/nightrome/coco
http://cocodataset.org/#stuff-eval
http://cocodataset.org/#stuff-2018
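
Two properties stated above can be checked mechanically: each stuff category present in an image has a single annotation, and iscrowd is 0. The sketch below does this on toy data. The helper names and category ids are hypothetical, and the "<RLE>" strings are placeholders standing in for real RLE payloads.

```python
from collections import Counter


def duplicated_stuff_entries(annotations):
    """Return (image_id, category_id) pairs that occur more than once."""
    counts = Counter((a["image_id"], a["category_id"]) for a in annotations)
    return sorted(pair for pair, n in counts.items() if n > 1)


def nonzero_iscrowd(annotations):
    """Return ids of annotations whose iscrowd is not 0 (missing counts as 0)."""
    return [a["id"] for a in annotations if a.get("iscrowd", 0) != 0]


# Toy stuff annotations; category ids and "<RLE>" payloads are placeholders.
stuff_annotations = [
    {"id": 1, "image_id": 1, "category_id": 101, "segmentation": "<RLE>", "iscrowd": 0},
    {"id": 2, "image_id": 1, "category_id": 102, "segmentation": "<RLE>", "iscrowd": 0},
    {"id": 3, "image_id": 2, "category_id": 101, "segmentation": "<RLE>", "iscrowd": 0},
]

duplicates = duplicated_stuff_entries(stuff_annotations)
crowd_ids = nonzero_iscrowd(stuff_annotations)
print(duplicates, crowd_ids)
```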

4. Panoptic Segmentation
Details coming soon!

5. Image Captioning
These annotations are used to store image captions. Each caption describes the specified image and each image has at least 5 captions (some images have more). See also the captioning task.
http://cocodataset.org/#captions-2015

annotation{
"id": int,
"image_id": int,
"caption": str,
}
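
Because each caption annotation points at its image through image_id, captions are usually grouped per image before use. The following sketch groups toy caption annotations and lists any image with fewer than 5 captions; the helper names and caption strings are invented for illustration.

```python
from collections import defaultdict


def captions_by_image(annotations):
    """Group caption strings by the image_id they describe."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)


def images_below_minimum(grouped, minimum=5):
    """Return image ids that have fewer than `minimum` captions."""
    return sorted(i for i, caps in grouped.items() if len(caps) < minimum)


# Toy data: image 1 has five captions, image 2 has only one.
caption_annotations = [
    {"id": i, "image_id": 1, "caption": f"toy caption number {i}"}
    for i in range(1, 6)
]
caption_annotations.append({"id": 6, "image_id": 2, "caption": "a lone toy caption"})

grouped = captions_by_image(caption_annotations)
short = images_below_minimum(grouped)
print(len(grouped[1]), short)
```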
