(I) COCO Python API - Usage

To make the COCO dataset easier to use, COCO provides several APIs. This article gives a short introduction to the Python API.

Before looking at the API, you should have a rough idea of what the COCO dataset contains.

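By year, COCO has so far released datasets for 2014, 2015 and 2017.
By task, the data is divided into object detection and object segmentation (annotation fields “bbox” and “segmentation”), image captioning (“captions”) and human keypoints (“keypoints”).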

This article focuses on the object detection dataset; the other two are covered only briefly at the end and may be expanded later.

1. Annotation information in the object detection dataset

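In a nutshell, the API extracts information from the annotation files for use in each scenario: bounding-box parameters for object detection, mask parameters for image segmentation, keypoint parameters for human pose estimation, and so on.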

To follow along, download the dataset first.

The annotations of the object detection dataset are stored in .json files; for example, the val2017 annotations are in instances_val2017.json. Its content looks like this:


{"info": 
    {"description": "This is stable 1.0 version of the 2017 MS COCO dataset.", 
     "url": "http://mscoco.org",  "version": "1.0", "year": 2017, 
     "contributor": "Microsoft COCO group", 
     "date_created": "2017-11-11 02:11:36.777541"
    }, 
    "images": [
        {"license": 2,"file_name": "000000289343.jpg",
         "coco_url": "http://images.cocodataset.org/val2017/000000289343.jpg",
         "height": 640,"width": 529,"date_captured": "2013-11-15 00:35:14",
         "flickr_url": "http://farm5.staticflickr.com/4029/4669549715_7db3735de0_z.jpg","id": 289343}, 
        ...
        {"license": 1,"file_name": "000000329219.jpg",
         "coco_url": "http://images.cocodataset.org/val2017/000000329219.jpg",
         "height": 427,"width": 640,"date_captured": "2013-11-14 19:21:56",
         "flickr_url": "http://farm9.staticflickr.com/8104/8505307842_465524a6a6_z.jpg",
         "id": 329219},
        ...
    ],
    "annotations": [
        {"segmentation": [[510.66,423.01,511.72,420.03,510.45,416.0,510...,423.01]],
         "area": 702.1057499999998,
         "iscrowd": 0,
         "image_id": 289343,
         "bbox": [473.07,395.93,38.65,28.67], "category_id": 18, "id": 1768
        },
        ...
        {"segmentation": [[304.09,266.18,308.95,263.56,313.06,262.81,...,266.55]],
         "area": 4290.290900000001,
         "iscrowd": 0,
         "image_id": 329219,
         "bbox": [297.73,252.34,60.21,108.45],"category_id": 18,"id": 8032}
    ],

    "licenses": [
        {"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/", 
         "id": 1, "name": "Attribution-NonCommercial-ShareAlike License"}, 
        ...
        {"url": "http://www.usa.gov/copyright.shtml", 
         "id": 8, "name": "United States Government Work"}
    ],
    "categories": [
        {"supercategory": "person", "id": 1, "name": "person"}, 
        ...
        {"supercategory": "indoor", "id": 90, "name": "toothbrush"}
    ]
}

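The entries have been indented to show their hierarchy. To avoid listing every entry, similar entries have been removed while keeping the overall structure complete.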

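As you can see, the parts we care about most are images and annotations. annotations holds the actual labels, which, depending on the application, include the object bounding box bbox and the segmentation region segmentation.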
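For reference, COCO stores bbox as [x, y, width, height], where (x, y) is the top-left corner in pixels, and a polygon segmentation as a flat list [x1, y1, x2, y2, ...]. The sketch below uses two hypothetical helpers (bbox_to_corners and polygon_area, not part of pycocotools) to turn the sample bbox above into corner coordinates and to compute a polygon's area with the shoelace formula. We do not assume the shoelace value reproduces the “area” field in the file exactly.

```python
from typing import List, Tuple


def bbox_to_corners(bbox: List[float]) -> Tuple[float, float, float, float]:
    """Convert a COCO bbox [x, y, w, h] into (x1, y1, x2, y2) corners."""
    x, y, w, h = bbox
    return x, y, x + w, y + h


def polygon_area(poly: List[float]) -> float:
    """Shoelace area of one polygon given as a flat list [x1, y1, x2, y2, ...]."""
    xs, ys = poly[0::2], poly[1::2]
    n = len(xs)
    s = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to close the polygon
        s += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(s) / 2.0


# bbox of annotation id 1768 from the sample json above
x1, y1, x2, y2 = bbox_to_corners([473.07, 395.93, 38.65, 28.67])
print(x1, y1, round(x2, 2), round(y2, 2))  # 473.07 395.93 511.72 424.6

# a made-up 10 x 5 rectangle, just to exercise polygon_area
print(polygon_area([0, 0, 10, 0, 10, 5, 0, 5]))  # 50.0
```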

Next, we look at how to extract this information with the Python API.

The API usage below follows the pycocoDemo.ipynb demo script from the COCO API: https://github.com/dengdan/coco/blob/master/PythonAPI/pycocoDemo.ipynb.

1) Installing COCO

Installation is simple; run the following commands:

git clone https://github.com/pdollar/coco.git

cd coco/PythonAPI
# If you are using python2, run:
make -j8
# If you are using python3, edit the Makefile first:
vi Makefile
# change python to python3 in the Makefile, then run:
make -j8

COCO is now installed.

2) Load the json file and parse its annotations

The code needs the following packages; import them first.

from pycocotools.coco import COCO
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
# default figure size for the plots below
plt.rcParams['figure.figsize'] = (8.0, 10.0)

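COCO is a class, so we create a coco object with its constructor. The constructor first loads the json file, then indexes the images, annotations and categories by id and builds the links between them (for example, from each image to its annotations).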
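To make the “link by id” step concrete, here is a simplified sketch of that kind of indexing. pycocotools keeps similar lookup tables on the object (such as imgToAnns and catToImgs), but build_index below is a hypothetical stand-in written for illustration, not the library's code, and it runs on a tiny hand-made dataset instead of a real annotation file.

```python
from collections import defaultdict


def build_index(dataset):
    """Build id-based lookup tables from a dict shaped like an instances json."""
    anns = {ann["id"]: ann for ann in dataset.get("annotations", [])}
    imgs = {img["id"]: img for img in dataset.get("images", [])}
    cats = {cat["id"]: cat for cat in dataset.get("categories", [])}
    img_to_anns = defaultdict(list)  # image id -> list of its annotation dicts
    cat_to_imgs = defaultdict(list)  # category id -> ids of images containing it
    for ann in dataset.get("annotations", []):
        img_to_anns[ann["image_id"]].append(ann)
        cat_to_imgs[ann["category_id"]].append(ann["image_id"])
    return anns, imgs, cats, img_to_anns, cat_to_imgs


# Tiny hand-made dataset reusing ids from the sample json above
dataset = {
    "images": [{"id": 289343, "file_name": "000000289343.jpg"},
               {"id": 329219, "file_name": "000000329219.jpg"}],
    "annotations": [{"id": 1768, "image_id": 289343, "category_id": 18},
                    {"id": 8032, "image_id": 329219, "category_id": 18}],
    "categories": [{"id": 18, "name": "dog", "supercategory": "animal"}],
}
anns, imgs, cats, img_to_anns, cat_to_imgs = build_index(dataset)
print([a["id"] for a in img_to_anns[289343]])  # [1768]
print(cat_to_imgs[18])                          # [289343, 329219]
```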

dataDir='/path/to/your/coco_data'
dataType='val2017'
annFile='{}/annotations/instances_{}.json'.format(dataDir,dataType)
# initialize the COCO api for instance annotations
coco=COCO(annFile)

Once the coco object has been created, it prints:

loading annotations into memory...
Done (t=0.81s)
creating index...
index created!

At this point the json file has been parsed and each image is linked to its annotations.

3) Display the COCO categories and supercategories

This information is stored at the end of the json file; see the “Annotation information in the object detection dataset” section above.

# display COCO categories and supercategories
cats = coco.loadCats(coco.getCatIds())
nms=[cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))

nms = set([cat['supercategory'] for cat in cats])
print('COCO supercategories: \n{}'.format(' '.join(nms)))

The output is:

COCO categories: 
person bicycle car motorcycle airplane bus train truck boat traffic light fire hydrant stop sign parking meter bench bird cat dog horse sheep cow elephant bear zebra giraffe backpack umbrella handbag tie suitcase frisbee skis snowboard sports ball kite baseball bat baseball glove skateboard surfboard tennis racket bottle wine glass cup fork knife spoon bowl banana apple sandwich orange broccoli carrot hot dog pizza donut cake chair couch potted plant bed dining table toilet tv laptop mouse remote keyboard cell phone microwave oven toaster sink refrigerator book clock vase scissors teddy bear hair drier toothbrush

COCO supercategories: 
outdoor food indoor appliance sports person animal vehicle furniture accessory electronic kitchen
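Note that 80 category names are listed, while the ids in the json go up to 90 (toothbrush has id 90), so the category ids are not contiguous. Training code often wants labels 0..79; one common approach, sketched here with a hypothetical helper make_label_maps, is to sort the ids and map each one to its position. With pycocotools the id list would come from coco.getCatIds().

```python
def make_label_maps(cat_ids):
    """Map (possibly non-contiguous) COCO category ids to labels 0..N-1 and back."""
    sorted_ids = sorted(cat_ids)
    id_to_label = {cat_id: i for i, cat_id in enumerate(sorted_ids)}
    label_to_id = {i: cat_id for cat_id, i in id_to_label.items()}
    return id_to_label, label_to_id


# A few non-contiguous ids for illustration (in practice: coco.getCatIds())
id_to_label, label_to_id = make_label_maps([1, 90, 18, 13])
print(id_to_label)  # {1: 0, 13: 1, 18: 2, 90: 3}
```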

One important function here is loadCats(self, ids=[]); its definition is:

def loadCats(self, ids=[]):
    """
    Load cats with the specified ids.
    :param ids (int array)       : integer ids specifying cats
    :return: cats (object array) : loaded cat objects
    """
    if _isArrayLike(ids):
        return [self.cats[id] for id in ids]
    elif type(ids) == int:
        return [self.cats[ids]]

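The function returns the category objects (dicts) loaded from the json file for the requested ids; with every id requested, that is all 80 categories. It takes a list of ids (or a single int id); if no ids are given, it returns an empty list. In this example the ids come from getCatIds(self, catNms=[], supNms=[], catIds=[]), whose definition is: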

def getCatIds(self, catNms=[], supNms=[], catIds=[]):
    """
    filtering parameters. default skips that filter.
    :param catNms (str array)  : get cats for given cat names
    :param supNms (str array)  : get cats for given supercategory names
    :param catIds (int array)  : get cats for given cat ids
    :return: ids (int array)   : integer array of cat ids
    """
    catNms = catNms if _isArrayLike(catNms) else [catNms]
    supNms = supNms if _isArrayLike(supNms) else [supNms]
    catIds = catIds if _isArrayLike(catIds) else [catIds]

    if len(catNms) == len(supNms) == len(catIds) == 0:
        cats = self.dataset['categories']
    else:
        cats = self.dataset['categories']
        cats = cats if len(catNms) == 0 else [cat for cat in cats if cat['name'] in catNms]
        cats = cats if len(supNms) == 0 else [cat for cat in cats if cat['supercategory'] in supNms]
        cats = cats if len(catIds) == 0 else [cat for cat in cats if cat['id'] in catIds]
    ids = [cat['id'] for cat in cats]

    return ids

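With no arguments, this function returns the ids of all categories; otherwise it returns the ids of the categories that match every given filter (categories can be selected by 'name', 'supercategory' or 'id').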
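The filtering in getCatIds can be tried without downloading anything. The sketch below re-implements the same sequence of filters over a plain list of category dicts; get_cat_ids is a hypothetical stand-in for coco.getCatIds, simplified to accept only lists (the real method also wraps a single value into a list), and the last lines mimic what loadCats does with the resulting ids.

```python
def get_cat_ids(categories, cat_nms=(), sup_nms=(), cat_ids=()):
    """Return ids of categories matching all given filters; an empty filter is skipped."""
    cats = categories
    if cat_nms:
        cats = [c for c in cats if c["name"] in cat_nms]
    if sup_nms:
        cats = [c for c in cats if c["supercategory"] in sup_nms]
    if cat_ids:
        cats = [c for c in cats if c["id"] in cat_ids]
    return [c["id"] for c in cats]


categories = [
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "animal", "id": 18, "name": "dog"},
    {"supercategory": "sports", "id": 41, "name": "skateboard"},
    {"supercategory": "indoor", "id": 90, "name": "toothbrush"},
]
print(get_cat_ids(categories))                                           # [1, 18, 41, 90]
print(get_cat_ids(categories, cat_nms=["person", "dog", "skateboard"]))  # [1, 18, 41]
print(get_cat_ids(categories, sup_nms=["animal"]))                       # [18]

# loadCats-style lookup: ids -> category dicts
cats_by_id = {c["id"]: c for c in categories}
print([cats_by_id[i]["name"] for i in get_cat_ids(categories, cat_ids=[18, 90])])  # ['dog', 'toothbrush']
```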

4) Load and display the image with a given id

The following example loads and displays the image with a given id.

# get all images containing all of the given categories, select one at random
catIds = coco.getCatIds(catNms=['person','dog','skateboard'])
imgIds = coco.getImgIds(catIds=catIds)
# The demo then replaces the query result with one fixed image (id 324158);
# remove the next line to pick a random image from the query above instead.
imgIds = coco.getImgIds(imgIds = [324158])
# loadImgs() returns a list with a single element; use [0] to access it.
# That element is a dict with the keys: ["license", "file_name", "coco_url",
#  "height", "width", "date_captured", "flickr_url", "id"]
img = coco.loadImgs(imgIds[np.random.randint(0,len(imgIds))])[0]

# The image can be loaded in two ways: 1) from a local file, 2) from the remote URL
# 1) local path, via the "file_name" key
# I = io.imread('%s/images/%s/%s'%(dataDir,dataType,img['file_name']))

# 2) URL, via the "coco_url" key
I = io.imread(img['coco_url'])
plt.axis('off')
plt.imshow(I)
plt.show()
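In pycocotools, getImgIds(catIds=...) returns only the images that contain every listed category (the per-category image sets are intersected), not images containing any of them, which is why a query for person, dog and skateboard can return few images. The sketch below shows those semantics on a hypothetical category-to-images table; get_img_ids_with_all is an illustrative helper, not the library function.

```python
def get_img_ids_with_all(cat_to_imgs, cat_ids):
    """Ids of images that contain every category in cat_ids (set intersection).

    Unlike coco.getImgIds, an empty cat_ids returns [] here, not all images.
    """
    ids = None
    for cat_id in cat_ids:
        imgs = set(cat_to_imgs.get(cat_id, []))
        ids = imgs if ids is None else ids & imgs
    return sorted(ids) if ids is not None else []


# Hypothetical table: category id -> ids of images that contain it
cat_to_imgs = {1: [10, 11, 12], 18: [11, 12, 13], 41: [12, 14]}
print(get_img_ids_with_all(cat_to_imgs, [1, 18, 41]))  # [12]
print(get_img_ids_with_all(cat_to_imgs, [1, 18]))      # [11, 12]
```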

The loaded image is shown below:

5) Load the “segmentation” annotations and display them on the image

The following code loads the “segmentation” annotations and draws them on the image.

# load and display instance annotations
plt.imshow(I); plt.axis('off')
annIds = coco.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
anns = coco.loadAnns(annIds)
coco.showAnns(anns)

The result is shown below:

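getAnnIds() returns the ids of the annotations belonging to the given image id (here also filtered by catIds; iscrowd=None means no filtering on the iscrowd flag). loadAnns() then loads the annotation dicts for those ids, and showAnns() draws them on the image.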
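The filtering that getAnnIds applies can be sketched on plain dicts: start from the annotations of the given images, keep those whose category_id is in the category list, and, only when iscrowd is not None, keep those whose iscrowd flag matches. get_ann_ids below is a hypothetical helper that mirrors this behaviour; the real method also supports an areaRng filter, which is omitted here.

```python
def get_ann_ids(anns, img_ids=(), cat_ids=(), iscrowd=None):
    """Return ids of annotations matching the image, category and iscrowd filters."""
    selected = anns if not img_ids else [a for a in anns if a["image_id"] in img_ids]
    if cat_ids:
        selected = [a for a in selected if a["category_id"] in cat_ids]
    if iscrowd is not None:  # None means: do not filter on iscrowd
        selected = [a for a in selected if a["iscrowd"] == iscrowd]
    return [a["id"] for a in selected]


anns = [
    {"id": 1, "image_id": 7, "category_id": 1, "iscrowd": 0},
    {"id": 2, "image_id": 7, "category_id": 18, "iscrowd": 0},
    {"id": 3, "image_id": 7, "category_id": 1, "iscrowd": 1},
    {"id": 4, "image_id": 8, "category_id": 1, "iscrowd": 0},
]
print(get_ann_ids(anns, img_ids=[7]))                          # [1, 2, 3]
print(get_ann_ids(anns, img_ids=[7], cat_ids=[1]))             # [1, 3]
print(get_ann_ids(anns, img_ids=[7], cat_ids=[1], iscrowd=0))  # [1]
```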

The image the annotations belong to has already been drawn on the current “figure”. How do the annotations find that “figure”?

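Answer: showAnns() calls plt.gca() to get the current Axes of the current “figure”. If one exists, it is returned directly; otherwise a new one is created (together with a new “figure” if none exists) and returned.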
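This mechanism can be checked with matplotlib alone. In the sketch below, a random array stands in for the loaded image I, and the non-interactive Agg backend is selected so it runs without a display. The image is drawn first; a later plt.gca() call returns the same Axes that holds it, so a patch added afterwards (a single Polygon outline here, standing in for the polygons showAnns draws) lands on the same picture.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Polygon

plt.close("all")
image = np.random.rand(32, 32, 3)  # stand-in for the image I loaded earlier
plt.imshow(image)
plt.axis("off")

ax = plt.gca()  # current Axes of the current figure: the one holding the image
ax.add_patch(Polygon([[2, 2], [10, 2], [10, 8]], fill=False, edgecolor="red"))

print(len(ax.images), len(ax.patches))  # 1 1
print(ax is plt.gcf().axes[0])          # True
```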

2. Annotation information in the human keypoint dataset

1) Load the “keypoints” annotations and display them on the image

The “keypoints” annotations are stored in a separate file named “person_keypoints_xxx.json”, so a new coco object has to be created for them.
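For reference, each annotation in that file carries a “keypoints” field: a flat list [x1, y1, v1, x2, y2, v2, ...] with one (x, y, v) triple per keypoint (17 for the person category), where v = 0 means not labeled, v = 1 labeled but not visible, and v = 2 labeled and visible; “num_keypoints” counts the labeled ones. The hypothetical helper below splits such a list into triples; the sample values are made up.

```python
def split_keypoints(flat):
    """Split a COCO keypoints list [x1, y1, v1, x2, y2, v2, ...] into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


# Made-up values for three keypoints (a real person annotation has 17)
kps = split_keypoints([100, 50, 2, 0, 0, 0, 120, 60, 1])
labeled = [p for p in kps if p[2] > 0]   # v > 0: labeled (visible or not)
visible = [p for p in kps if p[2] == 2]  # v == 2: labeled and visible
print(len(labeled), len(visible))  # 2 1
```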

The following code creates a new coco object for the “person_keypoints” annotations.

# initialize the COCO api for person keypoints annotations
annFile = '{}/annotations/person_keypoints_{}.json'.format(dataDir,dataType)
coco_kps=COCO(annFile)

As with the coco object for the “segmentation” annotations, it prints:

loading annotations into memory...
Done (t=0.58s)
creating index...
index created!

The following code loads the “keypoints” annotations and draws them on the image.

# load and display keypoints annotations
plt.imshow(I); plt.axis('off')
ax = plt.gca()
annIds = coco_kps.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
anns = coco_kps.loadAnns(annIds)
coco_kps.showAnns(anns)

The result is shown below:

3. Annotation information in the image caption dataset

1) Load and print the “caption” annotations

The “caption” annotations are stored in a separate file named “captions_xxx.json”, so a new coco object has to be created for them.

The following code creates a new coco object for the “caption” annotations.

# initialize the COCO api for caption annotations
annFile = '{}/annotations/captions_{}.json'.format(dataDir,dataType)
coco_caps=COCO(annFile)

As with the coco object for the “segmentation” annotations, it prints:

loading annotations into memory...
Done (t=0.13s)
creating index...
index created!

The following code loads the “caption” annotations and prints them.

# load and print caption annotations
annIds = coco_caps.getAnnIds(imgIds=img['id'])
anns = coco_caps.loadAnns(annIds)
coco_caps.showAnns(anns)
plt.imshow(I); plt.axis('off'); plt.show()

The printed “caption” output is:


A man is skate boarding down a path and a dog is running by his side.
A man on a skateboard with a dog outside. 
A person riding a skate board with a dog following beside.
This man is riding a skateboard behind a dog.
A man walking his dog on a quiet country road.
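For caption annotations, showAnns() draws nothing; it prints each annotation's “caption” field, which is why the output above is plain text. Each image has several captions (five here), each stored as its own annotation with an image_id. The sketch below groups captions by image on plain dicts; the helper name captions_by_image and the annotation ids are hypothetical, and two of the sentences are taken from the output above.

```python
from collections import defaultdict


def captions_by_image(anns):
    """Group caption annotations by image_id, keeping their original order."""
    grouped = defaultdict(list)
    for ann in anns:
        grouped[ann["image_id"]].append(ann["caption"])
    return grouped


anns = [
    {"id": 1, "image_id": 324158, "caption": "A man on a skateboard with a dog outside."},
    {"id": 2, "image_id": 324158, "caption": "This man is riding a skateboard behind a dog."},
    {"id": 3, "image_id": 999, "caption": "Made-up caption for another image."},
]
grouped = captions_by_image(anns)
for caption in grouped[324158]:
    print(caption)
```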

Summary

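Once you are familiar with the COCO Python API, you can study how the data is distributed, and then convert your own dataset to the COCO format while keeping it consistent with COCO's original data distribution.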

References

[1]. https://github.com/dengdan/coco