An Introduction to the COCO Dataset Annotation Format

This post draws on the following two articles:

https://zhuanlan.zhihu.com/p/29393415

https://blog.csdn.net/yeyang911/article/details/78675942

Both are good introductions, but without inspecting the structure of the JSON file myself, I found the details did not stick.

I work on object detection, so I only use the instances JSON. The other two kinds of JSON file can be inspected in the same way.

First, load the JSON file. The result is a dictionary with the following keys:

>>> import json
>>> val=json.load(open('instances_val2017.json', 'r'))
>>> val.keys()
dict_keys(['info', 'licenses', 'images', 'annotations', 'categories'])

There are five keys in total. Let's look at the lightweight ones first.

>>> val['info']
{'description': 'COCO 2017 Dataset', 'url': 'http://cocodataset.org', 'version': '1.0', 'year': 2017, 'contributor': 'COCO Consortium', 'date_created': '2017/09/01'}

>>> val['licenses']
[{'url': 'http://creativecommons.org/licenses/by-nc-sa/2.0/', 'id': 1, 'name': 'Attribution-NonCommercial-ShareAlike License'}, {'url': 'http://creativecommons.org/licenses/by-nc/2.0/', 'id': 2, 'name': 'Attribution-NonCommercial License'}, {'url': 'http://creativecommons.org/licenses/by-nc-nd/2.0/', 'id': 3, 'name': 'Attribution-NonCommercial-NoDerivs License'}, {'url': 'http://creativecommons.org/licenses/by/2.0/', 'id': 4, 'name': 'Attribution License'}, {'url': 'http://creativecommons.org/licenses/by-sa/2.0/', 'id': 5, 'name': 'Attribution-ShareAlike License'}, {'url': 'http://creativecommons.org/licenses/by-nd/2.0/', 'id': 6, 'name': 'Attribution-NoDerivs License'}, {'url': 'http://flickr.com/commons/usage/', 'id': 7, 'name': 'No known copyright restrictions'}, {'url': 'http://www.usa.gov/copyright.shtml', 'id': 8, 'name': 'United States Government Work'}]

These two keys are rarely needed for detection; they only describe the dataset and its licensing.

Next, the categories key:

>>> len(val['categories'])
80
>>> val['categories']
[{'supercategory': 'person', 'id': 1, 'name': 'person'}, {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'}, {'supercategory': 'vehicle', 'id': 3, 'name': 'car'}, {'supercategory': 'vehicle', 'id': 4, 'name': 'motorcycle'}, {'supercategory': 'vehicle', 'id': 5, 'name': 'airplane'}, {'supercategory': 'vehicle', 'id': 6, 'name': 'bus'}, {'supercategory': 'vehicle', 'id': 7, 'name': 'train'},

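The value of this key is a list of length 80; only the first few entries are shown here, and they all share the same structure. 'supercategory' is the broader group a category belongs to; for example, bicycle belongs to the vehicle group. 'id' is the category's number. There are 80 categories, but the ids are not contiguous: they run from 1 to 90 with gaps. No id 0 appears in the file; detection frameworks commonly reserve 0 for the background class.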
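Because the category ids have gaps, training code usually needs a lookup from id to name and from id to a contiguous class index. The following is a minimal sketch of one way to build both; the `categories` list is a small hand-written sample with the same structure as `val['categories']`, and `build_category_maps` is a hypothetical helper name, not part of any COCO tooling.

```python
# Hand-written sample with the same structure as val['categories'].
categories = [
    {'supercategory': 'person', 'id': 1, 'name': 'person'},
    {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'},
    {'supercategory': 'outdoor', 'id': 13, 'name': 'stop sign'},
]


def build_category_maps(categories):
    """Return (id -> name, id -> contiguous index starting at 0)."""
    id_to_name = {c['id']: c['name'] for c in categories}
    # Sort the ids so the contiguous index is stable across runs.
    id_to_index = {cid: i for i, cid in enumerate(sorted(id_to_name))}
    return id_to_name, id_to_index


id_to_name, id_to_index = build_category_maps(categories)
print(id_to_name[13])   # stop sign
print(id_to_index[13])  # 2
```

If a framework reserves index 0 for background, shift the contiguous index by one.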

Now the images key:

>>> len(val['images'])
5000
>>> val['images'][:2]
[{'license': 4, 'file_name': '000000397133.jpg', 'coco_url': 'http://images.cocodataset.org/val2017/000000397133.jpg', 'height': 427, 'width': 640, 'date_captured': '2013-11-14 17:02:52', 'flickr_url': 'http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg', 'id': 397133}, {'license': 1, 'file_name': '000000037777.jpg', 'coco_url': 'http://images.cocodataset.org/val2017/000000037777.jpg', 'height': 230, 'width': 352, 'date_captured': '2013-11-14 20:55:31', 'flickr_url': 'http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg', 'id': 37777}]
>>> val['images'][0].keys()
dict_keys(['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'id'])

The images key holds 5000 entries, one per image. The fields I find most important are 'file_name', 'height', 'width', and 'id'. 'height' and 'width' give the image size in pixels.
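Since annotations refer to images by id rather than by list position, it is convenient to index the image entries by their 'id'. A minimal sketch, using the two entries shown above trimmed to the four fields of interest; `index_images` is a hypothetical helper name.

```python
# The two entries printed above, trimmed to the most useful fields.
images = [
    {'file_name': '000000397133.jpg', 'height': 427, 'width': 640, 'id': 397133},
    {'file_name': '000000037777.jpg', 'height': 230, 'width': 352, 'id': 37777},
]


def index_images(images):
    """Map each image id to its metadata dictionary."""
    return {img['id']: img for img in images}


image_by_id = index_images(images)
info = image_by_id[37777]
print(info['file_name'], info['width'], info['height'])
# 000000037777.jpg 352 230
```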

Finally, the most important key, annotations:

>>> len(val['annotations'])
36781
>>> val['annotations'][0]
{'segmentation': [[510.66, 423.01, 511.72, 420.03, 510.45, 416.0, 510.34, 413.02, 510.77, 410.26, 510.77, 407.5, 510.34, 405.16, 511.51, 402.83, 511.41, 400.49, 510.24, 398.16, 509.39, 397.31, 504.61, 399.22, 502.17, 399.64, 500.89, 401.66, 500.47, 402.08, 499.09, 401.87, 495.79, 401.98, 490.59, 401.77, 488.79, 401.77, 485.39, 398.58, 483.9, 397.31, 481.56, 396.35, 478.48, 395.93, 476.68, 396.03, 475.4, 396.77, 473.92, 398.79, 473.28, 399.96, 473.49, 401.87, 474.56, 403.47, 473.07, 405.59, 473.39, 407.71, 476.68, 409.41, 479.23, 409.73, 481.56, 410.69, 480.4, 411.85, 481.35, 414.93, 479.86, 418.65, 477.32, 420.03, 476.04, 422.58, 479.02, 422.58, 480.29, 423.01, 483.79, 419.93, 486.66, 416.21, 490.06, 415.57, 492.18, 416.85, 491.65, 420.24, 492.82, 422.9, 493.56, 424.39, 496.43, 424.6, 498.02, 423.01, 498.13, 421.31, 497.07, 420.03, 497.07, 415.15, 496.33, 414.51, 501.1, 411.96, 502.06, 411.32, 503.02, 415.04, 503.33, 418.12, 501.1, 420.24, 498.98, 421.63, 500.47, 424.39, 505.03, 423.32, 506.2, 421.31, 507.69, 419.5, 506.31, 423.32, 510.03, 423.01, 510.45, 423.01]], 'area': 702.1057499999998, 'iscrowd': 0, 'image_id': 289343, 'bbox': [473.07, 395.93, 38.65, 28.67], 'category_id': 18, 'id': 1768}
>>> val['annotations'][0].keys()
dict_keys(['segmentation', 'area', 'iscrowd', 'image_id', 'bbox', 'category_id', 'id'])

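segmentation is a list. In polygon format (iscrowd is 0) it holds one or more polygons, each stored as a flat list of alternating x and y coordinates: [x1, y1, x2, y2, ...]. An object outlined by 4 points, for example, gives a polygon of eight numbers, the x and y coordinates of the four points. When iscrowd is 1, the field uses run-length encoding (RLE) instead.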

area is the area of the region enclosed by segmentation, in pixels.
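To make the flat polygon layout concrete, here is a minimal sketch that splits a polygon into (x, y) pairs and computes the enclosed area with the shoelace formula. Both function names are hypothetical, and the stored 'area' value may have been computed differently, so exact agreement with the file should not be assumed.

```python
def polygon_points(flat):
    """Turn [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError('a polygon needs an even number of coordinates')
    return list(zip(flat[0::2], flat[1::2]))


def polygon_area(flat):
    """Area of a simple polygon via the shoelace formula."""
    pts = polygon_points(flat)
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A 4 x 3 rectangle described by four points (eight numbers).
rect = [0.0, 0.0, 4.0, 0.0, 4.0, 3.0, 0.0, 3.0]
print(polygon_points(rect))  # [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(polygon_area(rect))    # 12.0
```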

image_id is the id of the image that contains this object; it matches the 'id' field of an entry under the images key.

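bbox is the axis-aligned rectangle that encloses the segmentation, stored as [x, y, width, height], where (x, y) is the top-left corner.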
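Many detection code bases expect corner coordinates rather than [x, y, width, height], so a conversion is often needed. The sketch below shows that conversion, plus how an enclosing box can be derived from a flat polygon; both helper names are hypothetical.

```python
def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def bbox_from_polygon(flat):
    """Enclosing [x, y, width, height] box of a flat [x1, y1, ...] polygon."""
    xs, ys = flat[0::2], flat[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


print(xywh_to_xyxy([10, 20, 30, 40]))                 # [10, 20, 40, 60]
print(bbox_from_polygon([0, 0, 4, 0, 4, 3, 0, 3]))    # [0, 0, 4, 3]
```

Applied to the annotation printed above, the polygon's minimum x and y are 473.07 and 395.93, which match the first two values of its bbox.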

category_id is the object's class; it matches the 'id' field of an entry under the categories key.

id is the id of this annotation. The ids do not start from 0 and are not consecutive, but every annotation has a unique id:

>>> ids = [ann['id'] for ann in val['annotations']]
>>> sorted(ids)[:10]
[283, 381, 567, 760, 810, 1363, 1367, 1536, 1599, 1747]
>>> len(set(ids))
36781
>>> len(ids)
36781
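Because annotations form one flat list covering all images, a typical next step for detection is to group them by image_id. A minimal sketch follows; the first record reuses fields from the annotation printed earlier, the other two are made-up sample records, and `group_by_image` is a hypothetical helper name.

```python
from collections import defaultdict

annotations = [
    # Fields taken from the annotation printed earlier.
    {'id': 1768, 'image_id': 289343, 'category_id': 18,
     'bbox': [473.07, 395.93, 38.65, 28.67], 'iscrowd': 0},
    # Made-up records for illustration only.
    {'id': 9001, 'image_id': 289343, 'category_id': 1,
     'bbox': [10.0, 20.0, 30.0, 40.0], 'iscrowd': 0},
    {'id': 9002, 'image_id': 37777, 'category_id': 3,
     'bbox': [5.0, 5.0, 50.0, 25.0], 'iscrowd': 0},
]


def group_by_image(annotations):
    """Map each image_id to the list of annotations on that image."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann['image_id']].append(ann)
    return dict(grouped)


anns_by_image = group_by_image(annotations)
print(len(anns_by_image[289343]))                  # 2
print([a['id'] for a in anns_by_image[37777]])     # [9002]
```

Images with no annotations will simply be missing from the result, so use `anns_by_image.get(image_id, [])` when iterating over all images.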

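Finally, to borrow from others: the two posts mentioned at the top explain the format clearly and are worth reading alongside these notes.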

Reposted from blog.csdn.net/scut_salmon/article/details/88252363