【COCO】Analysis of the COCO Dataset

COCO dataset download links

Training set

http://images.cocodataset.org/zips/train2017.zip

http://images.cocodataset.org/annotations/annotations_trainval2017.zip

Validation set

http://images.cocodataset.org/zips/val2017.zip

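http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip

(This archive holds the "stuff" segmentation annotations for both train and val. The regular val2017 instance, keypoint and caption annotations are inside annotations_trainval2017.zip, listed under the training set.)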

Test set

http://images.cocodataset.org/zips/test2017.zip

http://images.cocodataset.org/annotations/image_info_test2017.zip
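To fetch one split without clicking through the links, a minimal Python sketch using only the standard library (urllib.request and zipfile) could look like the following. The function name fetch and the coco/ destination folder are my own choices for illustration, not part of any official COCO tool.

```python
import os
import urllib.request
import zipfile

# URLs copied from the lists above, grouped by split.
URLS = {
    "train": [
        "http://images.cocodataset.org/zips/train2017.zip",
        "http://images.cocodataset.org/annotations/annotations_trainval2017.zip",
    ],
    "val": [
        "http://images.cocodataset.org/zips/val2017.zip",
        "http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip",
    ],
    "test": [
        "http://images.cocodataset.org/zips/test2017.zip",
        "http://images.cocodataset.org/annotations/image_info_test2017.zip",
    ],
}


def fetch(split, dest="coco"):
    """Download and unpack every archive listed for one split into dest (hypothetical helper)."""
    os.makedirs(dest, exist_ok=True)
    for url in URLS[split]:
        path = os.path.join(dest, os.path.basename(url))
        if not os.path.exists(path):  # skip archives that were already downloaded
            urllib.request.urlretrieve(url, path)
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)
```

For example, fetch("val") downloads and unpacks the validation archives. The val images are the smallest of the three image archives, while train2017.zip is by far the largest, so start with val if you just want to explore.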

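COCO is a dataset built by a team at Microsoft. With it, we can train neural networks to perform detection, classification, segmentation and captioning on images. See the official website for a full description.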

Annotation format

Annotation data format:

https://blog.csdn.net/zziahgf/article/details/72819043

https://blog.csdn.net/u013735511/article/details/79099483

A direct look at the contents of instances_train2014.json:

https://blog.csdn.net/hehangjiang/article/details/79084794
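If you would rather look at the file yourself, a small Python sketch can load it with the standard json module and print the size of each top-level section plus its first entry. The helper names section_sizes and inspect_coco are hypothetical, and the path in the final comment assumes the 2014 annotations were unzipped into an annotations/ folder. The full training file is hundreds of megabytes, so loading it takes a while and a fair amount of memory.

```python
import json


def section_sizes(data):
    """Map each top-level key to its length (lists) or None (e.g. the info dict)."""
    return {k: (len(v) if isinstance(v, list) else None) for k, v in data.items()}


def inspect_coco(path, n=1, width=500):
    """Load an instances_*.json file and print a short preview of every section."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for key, size in section_sizes(data).items():
        value = data[key]
        preview = value[:n] if isinstance(value, list) else value
        print(f"== {key} ({size} entries)" if size is not None else f"== {key}")
        # cut the dump short: a single segmentation can be very long
        print(json.dumps(preview, indent=2)[:width])
    return data


# e.g. data = inspect_coco("annotations/instances_train2014.json")  # hypothetical path
```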

Object Instance Annotations

Each instance annotation contains a series of fields, including the category id and segmentation mask of the object. The segmentation format depends on whether the instance represents a single object (iscrowd=0 in which case polygons are used) or a collection of objects (iscrowd=1 in which case RLE is used). Note that a single object (iscrowd=0) may require multiple polygons, for example if occluded. Crowd annotations (iscrowd=1) are used to label large groups of objects (e.g. a crowd of people). In addition, an enclosing bounding box is provided for each object (box coordinates are measured from the top left image corner and are 0-indexed). Finally, the categories field of the annotation structure stores the mapping of category id to category and supercategory names.

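In other words: an annotation with iscrowd=0 is a single object whose mask is stored as one or more polygons (several when, for example, occlusion splits it into parts), while iscrowd=1 marks a group of objects, such as a crowd of people, whose mask is stored as RLE (explained below). Every annotation also has a bbox in the form [x, y, width, height], with 0-based pixel coordinates measured from the top-left corner of the image. Finally, the categories field maps each category id to its category name and supercategory name.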
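To make these fields concrete, here is a small Python sketch that walks through a hand-made annotation file laid out like instances_*.json. All values in TOY are invented; only the field names (iscrowd, segmentation, bbox, category_id, categories and so on) follow the COCO format. It reads the mask format from iscrowd, converts the [x, y, width, height] box into corner coordinates, and resolves the category name through the categories list.

```python
import json

# A tiny hand-made file in the layout of instances_*.json. Field names follow
# COCO; every value here is invented for illustration.
TOY = json.loads("""
{
  "images": [{"id": 1, "file_name": "toy.jpg", "width": 4, "height": 4}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "iscrowd": 0,
     "segmentation": [[0, 0, 2, 0, 2, 2, 0, 2]],
     "bbox": [0, 0, 2, 2], "area": 4},
    {"id": 11, "image_id": 1, "category_id": 1, "iscrowd": 1,
     "segmentation": {"counts": [5, 3, 8], "size": [4, 4]},
     "bbox": [1, 1, 1, 3], "area": 3}
  ],
  "categories": [{"id": 1, "name": "person", "supercategory": "person"}]
}
""")


def summarize(coco):
    """Return (ann id, category, supercategory, mask format, corner box) tuples."""
    # categories: category id -> (name, supercategory)
    cats = {c["id"]: (c["name"], c["supercategory"]) for c in coco["categories"]}
    rows = []
    for ann in coco["annotations"]:
        name, sup = cats[ann["category_id"]]
        if ann["iscrowd"] == 0:
            # single object: a list of polygons (possibly more than one)
            fmt = f"polygons={len(ann['segmentation'])}"
        else:
            # crowd region: one RLE dict with "counts" and "size" = [height, width]
            fmt = "RLE"
        # bbox is [x, y, width, height], measured from the top-left corner
        x, y, w, h = ann["bbox"]
        rows.append((ann["id"], name, sup, fmt, (x, y, x + w, y + h)))
    return rows


for row in summarize(TOY):
    print(row)
```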

A brief look at how masks are stored

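As mentioned above, COCO stores masks in two ways: as polygons or as RLE. Polygons are easy to understand: each polygon is just a flat list of vertex coordinates [x1, y1, x2, y2, ...]. So what is RLE?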
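Since a COCO polygon is stored as a flat [x1, y1, x2, y2, ...] list, one quick way to get a feel for the format is to compute a polygon's area with the shoelace formula. This is a sketch for illustration only: the area values stored in the COCO files are not necessarily computed this way, and summing the polygons of one object ignores any overlap between them.

```python
def polygon_area(flat):
    """Shoelace area of one COCO-style polygon given as [x1, y1, x2, y2, ...]."""
    if len(flat) < 6 or len(flat) % 2:
        raise ValueError("need at least 3 (x, y) pairs")
    # pair up the flat list into (x, y) vertices
    pts = list(zip(flat[0::2], flat[1::2]))
    s = 0.0
    # sum cross products of consecutive vertices, wrapping back to the first
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def segmentation_area(polygons):
    # an iscrowd=0 object may have several polygons (e.g. when occluded);
    # this simply adds them up and ignores any overlap between parts
    return sum(polygon_area(p) for p in polygons)


print(polygon_area([10, 10, 60, 10, 60, 50, 10, 50]))  # a 50 x 40 rectangle
```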

Put simply, RLE (run-length encoding) is a compression method, and about the most obvious one you could think of.

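For example, for M = [0,0,0,1,1,1,1,1,1,0,0] the RLE is [3,6,2]: three 0s, then six 1s, then two 0s. This is, of course, an encoding for binary data, which is exactly what COCO uses. RLE in general goes well beyond this simple case, but it is not the focus here; search for it if you want to know more.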
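The example can be reproduced with a few lines of Python. This is a sketch of plain binary RLE that follows the COCO convention described in the code comment quoted below: runs alternate 0, 1, 0, ... and always start with zeros, so a vector beginning with 1 gets a leading count of 0. The function names are my own.

```python
def rle_encode(bits):
    """Run lengths of a 0/1 sequence, always starting with a run of zeros."""
    counts = []
    current, run = 0, 0  # the first run is defined to be zeros
    for b in bits:
        if b != current:
            counts.append(run)  # close the current run (may be 0 at the start)
            current, run = b, 0
        run += 1
    counts.append(run)
    return counts


def rle_decode(counts):
    """Inverse of rle_encode: expand the alternating 0/1 runs."""
    bits = []
    for i, n in enumerate(counts):
        # 1st, 3rd, 5th ... counts (even 0-based index) are runs of zeros
        bits.extend([i % 2] * n)
    return bits


M = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
print(rle_encode(M))  # [3, 6, 2]
assert rle_decode(rle_encode(M)) == M
```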

The comment in the COCO API code (pycocotools/mask.py) puts it this way:

# RLE is a simple yet efficient format for storing binary masks. RLE
# first divides a vector (or vectorized image) into a series of piecewise
# constant regions and then for each piece simply stores the length of
# that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would
# be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1]
# (note that the odd counts are always the numbers of zeros). Instead of
# storing the counts directly, additional compression is achieved with a
# variable bitrate representation based on a common scheme called LEB128.

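In short: RLE splits a binary vector into a series of runs of constant value and, for each run, stores only its length. For example, M=[0 0 1 1 1 0 1] gives [2 3 1 1], and M=[1 1 1 1 1 1 0] gives [0 6 1]. Note that the counts in odd positions (1st, 3rd, ...) are always numbers of zeros, which is why a vector starting with 1 gets a leading count of 0. In addition, instead of storing the counts directly, further compression is achieved with a variable-bitrate representation based on a common scheme called LEB128.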
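Putting the pieces together, here is a sketch that (1) flattens a small 2-D mask column by column, the order pycocotools works in (its encoder takes Fortran-order arrays), (2) computes the run-length counts, and (3) packs them with standard unsigned LEB128, where each byte carries 7 data bits and the high bit says whether more bytes follow. This illustrates the idea only and is not COCO's exact byte format: as far as I know, the COCO API's C code uses its own LEB128-like variant that stores the numbers as printable characters, so these bytes will not match the counts strings in the JSON files. For real data, use pycocotools.mask.encode and pycocotools.mask.decode. All function names below are hypothetical.

```python
def mask_to_counts(mask):
    """RLE counts of a 2-D 0/1 mask (list of rows), scanned column by column."""
    h, w = len(mask), len(mask[0])
    # column-major (Fortran-order) flattening: walk down each column in turn
    flat = [mask[r][c] for c in range(w) for r in range(h)]
    counts, current, run = [], 0, 0  # counts start with a run of zeros
    for b in flat:
        if b != current:
            counts.append(run)
            current, run = b, 0
        run += 1
    counts.append(run)
    return counts


def leb128_encode(n):
    """Standard unsigned LEB128: 7 data bits per byte, high bit = 'more follows'."""
    if n < 0:
        raise ValueError("unsigned LEB128 needs n >= 0")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte of this number
            return bytes(out)


def leb128_decode_all(data):
    """Decode a concatenation of unsigned LEB128 numbers back into a list."""
    values, value, shift = [], 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # high bit clear: this number is complete
            values.append(value)
            value, shift = 0, 0
    return values


mask = [[0, 1, 0],
        [0, 1, 0],
        [0, 1, 1]]
counts = mask_to_counts(mask)  # column-major: 0 0 0 | 1 1 1 | 0 0 1
packed = b"".join(leb128_encode(c) for c in counts)
assert leb128_decode_all(packed) == counts
```

Small counts fit in one byte each, so the saving only shows up for long runs, which are common in real masks.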

Reposted from blog.csdn.net/qq_30159015/article/details/82900248