COCO dataset

I. Introduction

Official website: http://cocodataset.org/
Full name: Microsoft Common Objects in Context (MS COCO)
Supported tasks: Detection, Keypoints, Stuff, Panoptic, Captions
Description: The COCO dataset is currently available in three versions: 2014, 2015, and 2017. The 2015 release contains only a test set; the other two contain training, validation, and test sets.
(This content combines material from the official website with my own understanding and descriptions.)

II. Downloading the Dataset

Method 1: download directly from the official website (requires a VPN from within China).
Method 2: I have mirrored the official dataset on a Baidu Cloud drive; you can download it from there (no VPN needed).

III. Dataset Description

The COCO dataset consists of two parts: Images and Annotations.
Images: folders named "task + version" (for example, train2014) containing the xxx.jpg image files;
Annotations: a folder containing the xxx.json annotation files (for example, instances_train2014.json);
The core of working with the COCO dataset is reading the xxx.json files, so the structure and use of the annotation files are detailed below.

3.1 Common fields

  COCO has five annotation types, one for each supported task: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning. Annotations are stored as JSON. The entire content of each xxx.json file is a dictionary whose keys are "info", "images", "annotations", and "licenses", as follows:

{
    "info"        : info,
    "images"      : [image],
    "annotations" : [annotation],
    "licenses"    : [license],
}

  Each value has a corresponding data type: info is a dictionary, while images, annotations, and licenses are lists. Except for annotation, the content of each part is defined as follows:

info{
    "year"          : int,       # year the dataset was released
    "version"       : str,       # dataset version
    "description"   : str,       # dataset description
    "contributor"   : str,       # contributor
    "url"           : str,       # dataset official website
    "date_created"  : datetime,  # dataset creation date
}

image{
    "id"            : int,       # image id
    "width"         : int,       # image width
    "height"        : int,       # image height
    "file_name"     : str,       # image file name
    "license"       : int,       # license id
    "flickr_url"    : str,       # flickr link
    "coco_url"      : str,       # coco link
    "date_captured" : datetime,  # capture time
}

license{
    "id"   : int,   # license id, 1-8
    "name" : str,   # license name
    "url"  : str,   # license URL
}
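  To make the layout concrete, here is a minimal sketch of my own (not part of the official routine) that inspects the top-level structure with the standard json module; the file path is an assumption, so point it at your local copy:

import json

# Load one annotation file and peek at its top-level structure.
# The path below is an assumption; adjust it to your local copy.
with open('annotations/instances_train2014.json') as f:
    data = json.load(f)

print(list(data.keys()))        # e.g. ['info', 'images', 'licenses', 'annotations', 'categories']
print(data['info']['description'])
print(len(data['images']), 'images;', len(data['annotations']), 'annotations')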

  The value of the "annotations" key in xxx.json differs slightly from task to task, but in every case it describes the images and their instances. Alongside "annotations" there is also a "categories" key that lists the categories. The annotations and categories for the different tasks are explained below.

3.2 Task-specific fields

3.2.1 Object Detection

  Taking the detection task as an example: each image contains at least one object, and the COCO dataset describes each object rather than each image. Each object is described by a set of fields, including its category id and a segmentation mask. The mask format depends on the annotation: a single object (iscrowd=0) is encoded as polygons, while a crowd of objects (iscrowd=1) is encoded in RLE format.

annotation{
    "id"           : int,                 # annotation id; each object has its own annotation
    "image_id"     : int,                 # id of the image the object belongs to
    "category_id"  : int,                 # category id of the object
    "segmentation" : RLE or [polygon],    # segmentation mask
    "area"         : float,               # object area in pixels
    "bbox"         : [x,y,width,height],  # x, y are the top-left corner coordinates
    "iscrowd"      : 0 or 1,              # 0: segmentation is polygon; 1: segmentation is RLE
}

categories[{
    "id"            : int,   # category id
    "name"          : str,   # category name
    "supercategory" : str,   # parent category; for example, bicycle's parent is vehicle
}]
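  As a quick illustration (my own sketch, reusing the data dictionary from the json example above), the detection annotations can be grouped by image and printed with readable category names:

from collections import defaultdict

# Group detection annotations by the image they belong to.
anns_by_img = defaultdict(list)
for ann in data['annotations']:
    anns_by_img[ann['image_id']].append(ann)

# Map category ids to names for readable output.
cat_names = {c['id']: c['name'] for c in data['categories']}

img = data['images'][0]
for ann in anns_by_img[img['id']]:
    x, y, w, h = ann['bbox']   # top-left corner plus width and height
    print(cat_names[ann['category_id']], 'at', (x, y), 'size', (w, h), 'iscrowd =', ann['iscrowd'])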

3.2.2 Keypoint Detection

  As in the detection task, an image contains several objects and each object corresponds to one keypoint annotation. A keypoint annotation contains all the fields of a detection annotation (id, bbox, etc.) plus two additional fields.
  First, the value of the key "keypoints" is an array of length 3k, where k is the total number of keypoints defined for the category (for example, k = 17 for human pose). Each keypoint has a 0-indexed location x, y and a visibility flag v (v=0: not labeled, in which case x=y=0; v=1: labeled but not visible, for example because it is occluded; v=2: labeled and visible). A keypoint is considered visible if it falls inside the object segment.

annotation{
    "keypoints"     : [x1,y1,v1,...],
    "num_keypoints" : int,   # number of labeled keypoints, i.e. those with v=1 or v=2
    "[cloned]"      : ...,
}

categories[{
    "keypoints" : [str],    # the k keypoint names
    "skeleton"  : [edge],   # keypoint connectivity, given as a list of keypoint-pair edges; used for visualization
    "[cloned]"  : ...,
}]

  Here, [cloned] denotes the fields copied from the object-detection annotation defined above: a keypoint JSON file contains all the fields required for the detection task.
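  A small sketch of my own (assuming ann is one keypoint annotation loaded as above): the flat keypoints list can be reshaped into (x, y, v) triples, and the count of labeled points should then match num_keypoints:

import numpy as np

# Reshape the flat [x1,y1,v1,...] list into one (x, y, v) row per keypoint.
kp = np.array(ann['keypoints']).reshape(-1, 3)
labeled = kp[kp[:, 2] > 0]          # keypoints with v=1 or v=2
assert len(labeled) == ann['num_keypoints']
print(len(labeled), 'of', len(kp), 'keypoints are labeled')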

3.2.3 Stuff Segmentation

  The stuff segmentation task uses the same annotation format as object detection above and is fully compatible with it (except that iscrowd is unnecessary and defaults to 0). The main field for the segmentation task is "segmentation".

3.2.4 Panoptic Segmentation

For the panoptic segmentation task, each annotation structure is a per-image annotation rather than a per-object annotation, which differs from the three tasks above. The annotation for each image has two parts: 1) a PNG that stores the class-agnostic image segmentation; and 2) a JSON structure that stores the semantic information of each image segment.

  1. To match an annotation with an image, use the image_id field (that is, annotation.image_id == image.id);
  2. For each annotation, the per-pixel segment ids are stored as a separate PNG located in a folder with the same name as the JSON file. Each segment has a unique id; unlabeled pixels are 0 (a sketch of reading one such mask follows the format below);
  3. For each annotation, the per-segment semantic information is stored in annotation.segments_info. segment_info.id stores the segment's unique id and is used to retrieve the corresponding mask from the PNG (ids == segment_info.id). iscrowd indicates that the segment covers a group of objects. The bbox and area fields provide additional information.
annotation{
    "image_id"      : int,
    "file_name"     : str,
    "segments_info" : [segment_info],
}

segment_info{
    "id"          : int,
    "category_id" : int,
    "area"        : int,
    "bbox"        : [x,y,width,height],
    "iscrowd"     : 0 or 1,
}

categories[{
    "id"            : int,
    "name"          : str,
    "supercategory" : str,
    "isthing"       : 0 or 1,
    "color"         : [R,G,B],
}]
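  As promised in point 2 above, here is a sketch of retrieving one segment's mask from the panoptic PNG. COCO panoptic PNGs encode each segment id in the RGB channels as id = R + 256*G + 256^2*B (the rgb2id convention from the official panopticapi); the file name below is an assumption:

import numpy as np
from PIL import Image

# Decode per-pixel segment ids from the panoptic PNG (rgb2id convention).
rgb = np.array(Image.open('panoptic_val2017/000000000139.png'), dtype=np.uint32)
ids = rgb[..., 0] + 256 * rgb[..., 1] + 256**2 * rgb[..., 2]

# 'annotation' is assumed to be the matching entry from the panoptic JSON.
segment_info = annotation['segments_info'][0]
mask = (ids == segment_info['id'])   # boolean mask for this one segment
print(mask.sum(), 'pixels; annotated area:', segment_info['area'])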

3.2.5 Image Captioning

  These annotations store the captions of the images; each caption describes its designated image, and every image has at least five captions.

annotation{
    "id"       : int,
    "image_id" : int,
    "caption"  : str,
}

IV. Using the Dataset (Python)

4.1 COCOAPI

  As the description above shows, the COCO annotations are fairly involved, and reading them means traversing several kinds of records. To let users make better use of the dataset, COCO provides an API, the cocoapi, introduced below.

4.2 API Installation

  Install the dependencies:

~$ pip install numpy Cython matplotlib

  Clone the API from git: https://github.com/cocodataset/cocoapi.git
  After downloading, enter the PythonAPI directory and build:

~$ cd cocoapi/PythonAPI
~/cocoapi/PythonAPI$ make
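  Alternatively (a convenience not in the original routine), pycocotools is also published on PyPI, so a plain pip install usually works as well; verify that this fits your environment:

~$ pip install pycocotools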

4.3 Using the COCO API (official routine)

  After installation, a pycocotools package appears in the site-packages folder. This package is the Python API for the COCO dataset; it helps with loading, parsing, and visualizing COCO annotations. Using the API amounts to calling the functions it provides to load the annotation files and read the annotations into Python dictionaries. The API functions are defined as follows:

  1. COCO: class that loads a COCO annotation file and prepares the COCO api data structures.
  2. decodeMask: decode a binary mask M from its run-length encoding (see the sketch after this list).
  3. encodeMask: encode a binary mask M using run-length encoding.
  4. getAnnIds: get the ids of the annotations that satisfy the given filter conditions.
  5. getCatIds: get the ids of the categories that satisfy the given filter conditions.
  6. getImgIds: get the ids of the images that satisfy the given filter conditions.
  7. loadAnns: load the annotations with the specified ids.
  8. loadCats: load the categories with the specified ids.
  9. loadImgs: load the images with the specified ids.
  10. annToMask: convert an annotation's segmentation to a binary mask.
  11. showAnns: display the specified annotations.
  12. loadRes: load algorithm results and create an API for accessing them.
  13. download: download COCO images from the mscoco.org server.
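  As flagged in items 2 and 3, here is a small sketch of RLE round-tripping using the mask utilities bundled with pycocotools (pycocotools.mask); this is my own illustration rather than part of the official routine:

import numpy as np
from pycocotools import mask as maskUtils

# Build a toy binary mask; encode() expects a Fortran-ordered uint8 array.
m = np.zeros((240, 320), dtype=np.uint8, order='F')
m[60:120, 80:200] = 1

rle = maskUtils.encode(m)        # run-length encode the binary mask
print(maskUtils.area(rle))       # 60 * 120 = 7200 pixels
m2 = maskUtils.decode(rle)       # decode back to a binary mask
assert (m == m2).all()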

  Loading, parsing, and visualizing the annotations proceeds in the following steps:

1. Import the necessary packages

%matplotlib inline
from pycocotools.coco import COCO
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
import pylab
pylab.rcParams['figure.figsize'] = (8.0, 10.0)

2. Define the annotation file path (using "instances_val2014.json" as an example)

dataDir='..'
dataType='val2014'
annFile='{}/annotations/instances_{}.json'.format(dataDir,dataType)

3. Read instances_val2014.json into the COCO class

# initialize COCO api for instance annotations
coco = COCO(annFile)

Output:
loading annotations into memory…
Done (t=4.19s)
creating index…
index created!

4. Read the COCO image categories

# display COCO categories and supercategories
cats = coco.loadCats(coco.getCatIds())
nms=[cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))

nms = set([cat['supercategory'] for cat in cats])
print('COCO supercategories: \n{}'.format(' '.join(nms)))

Output:
COCO categories:
person bicycle car motorcycle airplane bus train truck boat traffic light fire hydrant stop sign parking meter bench bird cat dog horse sheep cow elephant bear zebra giraffe backpack umbrella handbag tie suitcase frisbee skis snowboard sports ball kite baseball bat baseball glove skateboard surfboard tennis racket bottle wine glass cup fork knife spoon bowl banana apple sandwich orange broccoli carrot hot dog pizza donut cake chair couch potted plant bed dining table toilet tv laptop mouse remote keyboard cell phone microwave oven toaster sink refrigerator book clock vase scissors teddy bear hair drier toothbrush

COCO supercategories:
sports furniture electronic food appliance vehicle animal kitchen outdoor indoor person accessory

5. Read an original COCO image

# find the category_ids for 'person', 'dog', 'skateboard'
catIds = coco.getCatIds(catNms=['person','dog','skateboard']);
# find the image_ids that satisfy the category_id filter
imgIds = coco.getImgIds(catIds=catIds);
# narrow imgIds down to the image with image_id 324158
imgIds = coco.getImgIds(imgIds=[324158])
# load the image record
img = coco.loadImgs(imgIds[np.random.randint(0,len(imgIds))])[0]
# fetch and display the image
I = io.imread(img['coco_url'])
plt.axis('off')
plt.imshow(I)
plt.show()

Output: the selected image is displayed.

6. Load and display instance annotations

# load and display instance annotations
plt.imshow(I); plt.axis('off')
annIds = coco.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
anns = coco.loadAnns(annIds)
coco.showAnns(anns)

Output: the image is displayed with the instance annotations overlaid.

7. Load and display the annotations in person_keypoints_val2014.json

# initialize COCO api for person keypoints annotations
annFile = '{}/annotations/person_keypoints_{}.json'.format(dataDir,dataType)
coco_kps=COCO(annFile)

# load and display keypoints annotations
plt.imshow(I); plt.axis('off')
ax = plt.gca()
annIds = coco_kps.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
anns = coco_kps.loadAnns(annIds)
coco_kps.showAnns(anns)

Output:
loading annotations into memory…
Done (t=2.08s)
creating index…
index created!

8. Load and display the annotations in captions_val2014.json

# initialize COCO api for caption annotations
annFile = '{}/annotations/captions_{}.json'.format(dataDir,dataType)
coco_caps=COCO(annFile)

# load and display caption annotations
annIds = coco_caps.getAnnIds(imgIds=img['id']);
anns = coco_caps.loadAnns(annIds)
coco_caps.showAnns(anns)
plt.imshow(I); plt.axis('off'); plt.show()

Output:
loading annotations into memory…
Done (t=0.41s)
creating index…
index created!
A man is skate boarding down a path and a dog is running by his side.
A man on a skateboard with a dog outside.
A person riding a skate board with a dog following beside.
This man is riding a skateboard behind a dog.
A man walking his dog on a quiet country road.

V. Evaluating on the COCO Dataset

5.1 Computing IoU

  The IoU (Intersection over Union) between a detected region A and the corresponding ground-truth region B is computed as:

  IoU = area(A ∩ B) / area(A ∪ B)
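  As a concrete illustration (a sketch of my own, not from the official code), the IoU of two boxes in COCO's [x, y, width, height] format can be computed like this:

def box_iou(box_a, box_b):
    """IoU of two boxes given in COCO [x, y, width, height] format."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # intersection width
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # intersection height
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(box_iou([0, 0, 10, 10], [5, 5, 10, 10]))   # 25 / 175 ≈ 0.143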

5.2 COCO evaluation metrics

  1. Unless otherwise specified, AP and AR are averaged over multiple IoU values: mAP is averaged over the 10 IoU thresholds from 0.50 to 0.95 (see the sketch after this list), and the result is the AP as defined by the COCO dataset. Compared with AP computed at the single threshold IoU=0.50, this is a step forward;
  2. AP is the average over all categories. Traditionally this is called "mean average precision" (mAP). Officially, no distinction is made between AP and mAP (nor between AR and mAR), and the difference is assumed to be clear from context.
  3. AP (averaged over all 10 IoU thresholds and all 80 categories) determines the challenge winner; it should be regarded as the single most important metric when considering performance on COCO.
  4. COCO contains more small objects than large ones. Specifically, about 41% of objects are small (area < 32²), 34% are medium (32² < area < 96²), and 24% are large (area > 96²). The measured area is the number of pixels in the segmentation mask.
  5. AR is the maximum recall given a fixed number of detections per image, averaged over categories and IoU thresholds. AR is related to the metric of the same name used in proposal evaluation, but is computed per category.
  6. All metrics are computed using at most the 100 top-scoring detections per image (across all categories).
  7. The evaluation metrics for detection with bounding boxes and with segmentation masks are identical in all respects except for the IoU computation (which is performed on boxes or masks, respectively).
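  To make points 1 and 4 concrete, the following sketch mirrors the default COCOeval parameters (the threshold grid is np.linspace(0.5, 0.95, 10); the area ranges are my restatement of the percentages above):

import numpy as np

# The 10 IoU thresholds averaged over in the COCO AP definition.
iou_thrs = np.linspace(0.5, 0.95, 10)   # [0.50, 0.55, ..., 0.95]

# Object size buckets, measured in segmentation-mask pixels.
area_rng = {
    'small':  (0,       32 ** 2),
    'medium': (32 ** 2, 96 ** 2),
    'large':  (96 ** 2, float('inf')),
}
print(iou_thrs)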

5.3 Unified format for COCO result files

Object Detection

[{
    "image_id"    : int,
    "category_id" : int,
    "bbox"        : [x,y,width,height],
    "score"       : float,
}]

  Box coordinates are floats measured from the top-left corner of the image (and are 0-indexed). The official recommendation is to round coordinates to the nearest tenth of a pixel to reduce the size of the JSON file.
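  As a sketch of producing such a file (my own illustration; the values and the file name are made up, and the category id is looked up with getCatIds rather than hard-coded):

import json

# Write one fake detection in the required format, then load it for evaluation.
results = [{
    "image_id"    : 324158,
    "category_id" : coco.getCatIds(catNms=['dog'])[0],
    "bbox"        : [258.2, 41.3, 348.3, 243.6],   # rounded to a tenth of a pixel
    "score"       : 0.92,
}]
with open('my_fake_results.json', 'w') as f:
    json.dump(results, f)

cocoDt = coco.loadRes('my_fake_results.json')   # 'coco' is the COCO(annFile) object from section 4.3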
  For detection of object segments (instance segmentation), use the following format:

[{
    "image_id"     : int,
    "category_id"  : int,
    "segmentation" : RLE,
    "score"        : float,
}]

Keypoint Detection

[{
    "image_id"    : int,
    "category_id" : int,
    "keypoints"   : [x1,y1,v1,...,xk,yk,vk],
    "score"       : float,
}]

  Keypoint coordinates are floats measured from the top-left corner of the image (and are 0-indexed). The official recommendation is to round coordinates to the nearest pixel to reduce file size. Also note that the visibility flags vi are not currently used (except for controlling visualization); the official recommendation is simply to set vi=1.

Stuff Segmentation

[{
    "image_id"     : int,
    "category_id"  : int,
    "segmentation" : RLE,
}]

  Except that no score field is required, the stuff segmentation format is identical to the object segmentation format. Note: the official recommendation is to encode each label that appears in an image with a single binary mask. Binary masks should be RLE-encoded using the MaskApi function encode(); see, for example, segmentationToCocoResult() in cocostuffhelper.py. For convenience, official scripts for converting between the JSON and PNG formats are also provided.
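  A sketch of building one such RLE entry with pycocotools (my own illustration; segmentationToCocoResult() in the official cocostuffhelper.py is the canonical route, and the ids below are arbitrary):

import json
import numpy as np
from pycocotools import mask as maskUtils

# RLE-encode a binary mask for one label, ready for the result JSON.
label_mask = np.zeros((480, 640), dtype=np.uint8, order='F')
label_mask[100:200, 150:400] = 1
rle = maskUtils.encode(label_mask)
rle['counts'] = rle['counts'].decode('ascii')   # JSON needs str, not bytes

result = {"image_id": 324158, "category_id": 118, "segmentation": rle}
print(json.dumps(result)[:80], '...')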

Panoptic Segmentation

annotation{
    "image_id"      : int,
    "file_name"     : str,
    "segments_info" : [segment_info],
}

segment_info{
    "id"          : int,
    "category_id" : int,
}

Image Captioning

[{
    "image_id" : int,
    "caption"  : str,
}]

5.4 Using the COCOeval API (official routine)

COCO also provides an API for computing the evaluation metrics: once your model's output follows the officially defined format above, this API lets you quickly evaluate the whole set of metrics for the model.

1. Import the necessary packages

%matplotlib inline
import matplotlib.pyplot as plt
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import numpy as np
import skimage.io as io
import pylab
pylab.rcParams['figure.figsize'] = (10.0, 8.0)

2. Choose the task

annType = ['segm','bbox','keypoints']
annType = annType[1]      # specify type here
prefix = 'person_keypoints' if annType=='keypoints' else 'instances'
print('Running demo for *%s* results.'%(annType))

Output:
Running demo for bbox results.

3. Load the JSON annotation file (i.e., the ground truth)

#initialize COCO ground truth api
dataDir='../'
dataType='val2014'
annFile = '%s/annotations/%s_%s.json'%(dataDir,prefix,dataType)
cocoGt=COCO(annFile)

Output:
loading annotations into memory…
Done (t=3.16s)
creating index…
index created!

4. Load the result file (i.e., the predictions)

  COCO.loadRes(resFile) also returns a COCO class; the difference from COCO(annFile) is that the former loads a result file in the officially specified format, while the latter loads an official annotation JSON file.

#initialize COCO detections api
resFile='%s/results/%s_%s_fake%s100_results.json'
resFile = resFile%(dataDir, prefix, dataType, annType)
cocoDt=cocoGt.loadRes(resFile)

Output:
Loading and preparing results…
DONE (t=0.03s)
creating index…
index created!

5. Evaluate using 100 images from the validation set

imgIds=sorted(cocoGt.getImgIds())    # sort the image ids in ascending order
imgIds=imgIds[0:100]    # take the first 100 images
imgId = imgIds[np.random.randint(100)]    # pick one image id at random

6. Run the evaluation

# running evaluation
cocoEval = COCOeval(cocoGt,cocoDt,annType)
cocoEval.params.imgIds  = imgIds
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()

Output:
Running per image evaluation…
Evaluate annotation type bbox
DONE (t=0.21s).
Accumulating evaluation results…
DONE (t=0.25s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.505
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.697
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.573
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.586
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.519
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.501
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.387
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.594
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.595
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.640
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.566
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.564

VI. Summary

  The above combines the official COCO example routines with my own understanding. It serves as my personal study notes and, at the same time, as an introduction for newcomers. If you find any errors or omissions, please point them out in the comments. (Please credit the source when reposting.)
