Microsoft COCO: Common Objects in Context - COCO API - MASK API

http://cocodataset.org/#home
http://cocodataset.org/#download


1. COCO API
The COCO API assists in loading, parsing, and visualizing annotations in COCO. The API supports multiple annotation formats, including object instance, object keypoint, and image caption annotations (see the data format page). For additional details, see CocoApi.m, coco.py, and CocoApi.lua for the Matlab, Python, and Lua code, respectively, and the Python API demo:
https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoDemo.ipynb

Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
getAnnIds Get ann ids that satisfy given filter conditions.
getCatIds Get cat ids that satisfy given filter conditions.
getImgIds Get img ids that satisfy given filter conditions.
loadAnns Load anns with the specified ids.
loadCats Load cats with the specified ids.
loadImgs Load imgs with the specified ids.
loadRes Load algorithm results and create API for accessing them.
showAnns Display the specified annotations.
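
The filter-then-load pattern behind the get*Ids and load* functions can be sketched in plain Python on a toy COCO-style dictionary. This is only an illustration of the idea, not the pycocotools implementation: the helper names (get_cat_ids, get_ann_ids, load_anns) and the toy data are assumptions made for this example.

```python
# Toy COCO-style dataset (hypothetical values, for illustration only).
dataset = {
    "images": [
        {"id": 1, "file_name": "a.jpg", "height": 4, "width": 6},
        {"id": 2, "file_name": "b.jpg", "height": 4, "width": 6},
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 18, "name": "dog", "supercategory": "animal"},
    ],
    "annotations": [
        {"id": 100, "image_id": 1, "category_id": 1, "bbox": [0, 0, 2, 3], "area": 6, "iscrowd": 0},
        {"id": 101, "image_id": 1, "category_id": 18, "bbox": [3, 1, 2, 2], "area": 4, "iscrowd": 0},
        {"id": 102, "image_id": 2, "category_id": 18, "bbox": [1, 1, 4, 2], "area": 8, "iscrowd": 0},
    ],
}

# Index annotations by id once, so every "load" call is a dictionary lookup.
anns_by_id = {ann["id"]: ann for ann in dataset["annotations"]}


def get_cat_ids(names):
    """Return ids of the categories whose name is in `names` (hypothetical helper)."""
    return [cat["id"] for cat in dataset["categories"] if cat["name"] in names]


def get_ann_ids(img_ids=None, cat_ids=None):
    """Return annotation ids matching the filters; an empty filter means no restriction."""
    return [
        ann["id"]
        for ann in dataset["annotations"]
        if (not img_ids or ann["image_id"] in img_ids)
        and (not cat_ids or ann["category_id"] in cat_ids)
    ]


def load_anns(ids):
    """Return the full annotation records for the given ids."""
    return [anns_by_id[i] for i in ids]


dog_ids = get_cat_ids(["dog"])
dog_ann_ids = get_ann_ids(cat_ids=dog_ids)
dogs_in_img1 = load_anns(get_ann_ids(img_ids=[1], cat_ids=dog_ids))
```

The point of the split is that the "get" step only returns ids, which are cheap to combine and filter, while the "load" step returns the full records.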

2. MASK API
COCO provides segmentation masks for every object instance. This creates two challenges: storing masks compactly and performing mask computations efficiently. We solve both challenges using a custom Run Length Encoding (RLE) scheme. The size of the RLE representation is proportional to the number of boundary pixels of a mask, and operations such as area, union, or intersection can be computed efficiently directly on the RLE. Specifically, assuming fairly simple shapes, the RLE representation is O(√n), where n is the number of pixels in the object, and common computations are likewise O(√n). Naively computing the same operations on the decoded masks (stored as arrays) would be O(n).
The MASK API provides an interface for manipulating masks stored in RLE format. The API is defined below; for additional details see MaskApi.m, mask.py, or MaskApi.lua. Finally, we note that a majority of ground truth masks are stored as polygons (which are quite compact); these polygons are converted to RLE when needed.
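
To make the RLE idea concrete, here is a minimal pure-Python sketch of run-length encoding a binary mask. It assumes the uncompressed COCO convention (pixels traversed in column-major order, run counts alternating between background and foreground and starting with background). The function names are made up for this example, and the compressed string form used by the real library is not covered.

```python
def rle_encode(mask):
    """Encode a binary mask (list of rows of 0/1) as column-major run lengths."""
    h, w = len(mask), len(mask[0])
    counts = []
    prev, run = 0, 0  # the first run counts zeros, so it may have length 0
    for x in range(w):  # column-major: walk down each column in turn
        for y in range(h):
            value = mask[y][x]
            if value != prev:
                counts.append(run)
                prev, run = value, 0
            run += 1
    counts.append(run)
    return {"size": [h, w], "counts": counts}


def rle_decode(rle):
    """Rebuild the list-of-rows mask from the run lengths."""
    h, w = rle["size"]
    flat, value = [], 0
    for count in rle["counts"]:
        flat.extend([value] * count)
        value = 1 - value
    # flat is column-major, so pixel (y, x) sits at index x * h + y
    return [[flat[x * h + y] for x in range(w)] for y in range(h)]


def rle_area(rle):
    """Foreground area: the sum of every second run, no decoding needed."""
    return sum(rle["counts"][1::2])


mask = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
rle = rle_encode(mask)
```

Note that rle_area touches one number per run rather than one per pixel, which is where the saving over the decoded array comes from.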

encode Encode binary masks using RLE.
decode Decode binary masks encoded via RLE.
merge Compute union or intersection of encoded masks.
iou Compute intersection over union between masks.
area Compute area of encoded masks.
toBbox Get bounding boxes surrounding encoded masks.
frBbox Convert bounding boxes to encoded masks.
frPoly Convert polygon to encoded mask.
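
As an illustration of computing directly on the encoding, the sketch below derives intersection area and IoU from two run-length lists without decoding either mask. It assumes both lists describe masks of the same size, with counts alternating background/foreground and starting with background. The function names are hypothetical, and the crowd flag accepted by the real iou function is ignored here.

```python
def rle_intersection_area(counts_a, counts_b):
    """Count pixels that are foreground in both masks by walking the runs in lockstep."""
    ia = ib = 0  # index of the current run in each list
    rem_a, rem_b = counts_a[0], counts_b[0]  # pixels left in the current runs
    inter = 0
    while True:
        step = min(rem_a, rem_b)
        if ia % 2 == 1 and ib % 2 == 1:  # odd run index means foreground
            inter += step
        rem_a -= step
        rem_b -= step
        if rem_a == 0:
            ia += 1
            if ia == len(counts_a):
                break
            rem_a = counts_a[ia]
        if rem_b == 0:
            ib += 1
            if ib == len(counts_b):
                break
            rem_b = counts_b[ib]
    return inter


def rle_iou(counts_a, counts_b):
    """Intersection over union computed from run lengths only."""
    inter = rle_intersection_area(counts_a, counts_b)
    union = sum(counts_a[1::2]) + sum(counts_b[1::2]) - inter
    return inter / union if union else 0.0


# Two 9-pixel masks: a = 0,0,0,1,1,0,1,0,0 and b = 0,0,0,0,1,1,1,0,0
a = [3, 2, 1, 1, 2]
b = [4, 3, 2]
overlap = rle_intersection_area(a, b)
score = rle_iou(a, b)
```

The loop runs once per run boundary rather than once per pixel, so its cost scales with the length of the encodings.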

Glossary
annotation,ann
category,cat
image,img

Reposted from blog.csdn.net/chengyq116/article/details/80489439