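Preparing to train YOLOv3 on a custom dataset (Part 1): download Open Images V4 by object class (single class or multiple classes) quickly and accurately, save the labels in YOLOv3 annotation format, and verify them with Yolo_mark (Python script included).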

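Preface

Open Images V4 is a dataset of roughly 9 million (9M) images released by Google in 2018 and split into training, validation, and test sets. It is annotated with image-level labels, object bounding boxes, and visual relationships.

Image-level labels: nearly 20,000 distinct classes carry labels, some annotated by humans and some by machine.

Object bounding boxes: about 15.44 million (15.44M) boxes for 600 classes on about 1.91 million (1.91M) images, which makes it the largest existing dataset with object location annotations. Its strengths are fine-grained classes (for people alone there are human head, human face, beard, human hand, and so on), everyday subject matter (from musical instruments, computers, and books to kitchenware), and accurate boxes. Professional annotators at Google manually drew 90% of the boxes in the training set, and the remaining 10% were generated semi-automatically. Those boxes were human-verified to have IoU > 0.7 against a perfect box on the object, and in practice they are accurate (mean IoU ~0.82) [1].

The dataset can shorten many computer vision tasks by days or even months. For example, to build a detector for one or several object classes, we can download only the images of those classes together with their annotations and start training. (This post focuses on downloading data for object detection.)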

Preparation

Ubuntu
Disk space
Python
awscli

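Structure

The complete Open Images V4 dataset is about 560 GB, including the training, validation, and test sets. As shown in the figure, the repository contains image archives (.zip) and annotation files (.csv), listed in the table below.

Let's first look at the annotation files (.csv) and their data format: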

Class Names:

class-descriptions-boxable.csv - maps the class identifiers used inside the dataset to human-readable names. For example, /m/011k07 corresponds to Tortoise, /m/011q46kg to Container, /m/012074 to Magpie, and so on.

/m/011k07	Tortoise
/m/011q46kg	Container
/m/012074	Magpie
/m/0120dh	Sea turtle
/m/01226z	Football
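A minimal sketch of how this mapping can be loaded and queried in Python. It assumes the real file is comma-separated with the MID in the first column and the name in the second (the appendix script reads it the same way with csv.reader). The helper names load_class_map and resolve_mid are made up for illustration, and the sample rows are the five shown above.

```python
import csv
import io

# Sample rows copied from the listing above, written comma-separated
# as in the real class-descriptions-boxable.csv (assumption).
SAMPLE_CSV = """/m/011k07,Tortoise
/m/011q46kg,Container
/m/012074,Magpie
/m/0120dh,Sea turtle
/m/01226z,Football
"""


def load_class_map(csv_file):
    """Return a dict mapping human-readable class name -> MID."""
    return {row[1]: row[0] for row in csv.reader(csv_file) if len(row) >= 2}


def resolve_mid(class_map, cli_name):
    """Look up a command-line style name such as 'Sea_turtle'.

    Underscores are turned back into spaces, matching how class names
    with two words are passed to the download script.
    """
    return class_map[cli_name.replace('_', ' ')]


class_map = load_class_map(io.StringIO(SAMPLE_CSV))
print(resolve_mid(class_map, 'Sea_turtle'))
```

For the real file, replace io.StringIO(SAMPLE_CSV) with an open file handle on class-descriptions-boxable.csv.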

Boxes:

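train-annotations-bbox.csv - bounding-box annotations of object instances in the training images.
validation-annotations-bbox.csv - bounding-box annotations of object instances in the validation images.
test-annotations-bbox.csv - bounding-box annotations of object instances in the test images.

Files of this type contain the columns shown in the table below.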

ImageID	Source	LabelName	Confidence	XMin	XMax	YMin	YMax	IsOccluded	IsTruncated	IsGroupOf	IsDepiction	IsInside
000026e7ee790996	freeform	/m/07j7r	1	0.071905	0.145346	0.206591	0.391306	0	1	1	0	0
000026e7ee790996	freeform	/m/07j7r	1	0.439756	0.572466	0.264153	0.435122	0	1	1	0	0
000026e7ee790996	freeform	/m/07j7r	1	0.668455	1	0	0.552825	0	1	1	0	0
000062a39995e348	freeform	/m/015p6	1	0.205719	0.849912	0.154144	1	0	0	0	0	0
000062a39995e348	freeform	/m/05s2s	1	0.137133	0.377634	0	0.884185	1	1	0	0	0
0000c64e1253d68f	freeform	/m/07yv9	1	0	0.97385	0	0.043342	0	1	1	0	0
0000c64e1253d68f	freeform	/m/0k4j	1	0	0.513534	0.321356	0.689661	0	1	0	0	0

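ImageID: the image this box belongs to. An ID can appear once or several times, one row per box on that image.
Source: how the box was made. (1) freeform and xclick are manually drawn boxes. (2) activemil are boxes generated with an enhanced version of an automatic method.
LabelName: the MID of the object class the box belongs to; the matching name is in class-descriptions-boxable.csv.
Confidence: a dummy value, always 1.
XMin, XMax, YMin, YMax: the box coordinates in normalized image coordinates. XMin is in [0,1], where 0 is the leftmost pixel and 1 is the rightmost pixel of the image. Y coordinates run from 0 at the top pixel to 1 at the bottom pixel.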
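Because the coordinates are normalized, they have to be scaled by the image size before a box can be drawn. The sketch below shows that scaling for the first sample row above. The function name box_to_pixels and the 1024x768 image size are assumptions for illustration only; the real size must be read from the downloaded image.

```python
def box_to_pixels(xmin, xmax, ymin, ymax, img_w, img_h):
    """Scale normalized [0, 1] box coordinates to integer pixel coordinates.

    Returns (left, top, right, bottom), the corner order most drawing
    libraries expect.
    """
    left = round(xmin * img_w)
    right = round(xmax * img_w)
    top = round(ymin * img_h)
    bottom = round(ymax * img_h)
    return left, top, right, bottom


# First sample row: XMin, XMax, YMin, YMax = 0.071905, 0.145346, 0.206591, 0.391306
# The 1024x768 image size is hypothetical.
pixel_box = box_to_pixels(0.071905, 0.145346, 0.206591, 0.391306, 1024, 768)
print(pixel_box)
```

Note the column order in the CSV is XMin, XMax, YMin, YMax, not the more common XMin, YMin, XMax, YMax, which is easy to mix up.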

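The last five columns are attributes. For each of them, 1 means present, 0 means absent, and -1 means unknown.

IsOccluded: the object is occluded by another object in the image.
IsTruncated: the object extends beyond the image boundary.
IsGroupOf: the box spans a group of objects (for example, a bed of flowers or a crowd of people). Annotators were asked to use this tag for cases with more than 5 instances that heavily occlude each other and are physically touching.
IsDepiction: the object is a depiction (for example, a cartoon or drawing of the object rather than a real physical instance).
IsInside: the picture was taken from inside the object (for example, a car interior or the inside of a building).

test-images.csv: paths of the test images.
validation-images.csv: paths of the validation images.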

image_name	image_url
e0c995e9359596dd.jpg	https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/validation/e0c995e9359596dd.jpg
110487ec7e9be60a.jpg	https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/validation/110487ec7e9be60a.jpg
90596bf3313e72e3.jpg	https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/validation/90596bf3313e72e3.jpg
4b3c6afd44adbe59.jpg	https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/validation/4b3c6afd44adbe59.jpg
69248ebbbea5aa0c.jpg	https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/validation/69248ebbbea5aa0c.jpg

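The official site also describes Image Labels, Visual relationships, and more. I could not find those files, so they are not covered here.

As shown above, the dataset downloaded directly from the official site is packaged by training, validation, and test set. Sometimes we do not need that much data, cannot store that much data, or need only a few classes for our task. In those cases we have to download by class. Taking object detection as an example, 600 classes have bounding boxes; their names are listed below. Thanks to Sunita Nayak, author at Learn OpenCV, for providing this table. For space, only the first hundred are listed here. For the full list visit https://www.learnopencv.com/fast-image-downloader-for-open-images-v4/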

class	train	validation	test
Person	1034721	13274	40861
Wheel	340639	11394	34604
Car	248075	9381	28737
Human hair	234057	8594	26301
Clothing	1438128	8527	26531
Human arm	208982	8341	25162
Human head	201633	7865	25080
Footwear	744474	7189	21205
Human body	175244	6769	20246
Man	1418594	5654	17514
Human face	1037710	5170	15536
Flower	345296	5089	15040
Mammal	156154	4349	13479
Human nose	60142	4341	12718
Human eye	77233	4304	13034
Tire	122615	4181	13177
Human hand	75307	4123	12505
Human leg	71479	4093	13334
Sports equipment	44900	3951	11992
Plant	267913	3808	11579
Tree	1051344	3209	10148
Auto part	13586	2898	8845
Woman	767337	2865	9047
Food	88422	2736	8331
Land vehicle	81108	2689	8480
Human mouth	44197	2505	7424
Girl	197155	2420	7479
Vehicle	50959	2105	7064
Dog	28675	1930	5818
Fruit	26236	1905	6215
Window	503467	1650	5091
Airplane	21285	1027	3272
Fashion accessory	91024	1026	3164
Baked goods	23010	1020	2907
Building	178634	984	2915
Bird	47921	943	2751
Boat	79113	903	2672
Human ear	17774	870	2611
Bicycle wheel	59521	733	2018
Table	85691	714	2279
Snack	37374	708	2173
Book	41280	698	2147
Furniture	38527	646	1893
Dessert	27407	645	2092
Boy	87555	600	2031
Dress	52999	567	1581
Fish	23195	564	1422
Vehicle registration plate	7852	512	1570
Chair	132483	511	1535
Vegetable	18621	496	1679
Fast food	24991	492	1599
Drink	40323	482	1427
Helmet	16502	440	1275
Toy	70963	437	1205
Bicycle	40161	403	1158
Jeans	78473	396	1433
Horse	13368	392	1144
Cat	15183	381	1095
Bottle	40188	340	979
Strawberry	7944	326	774
Cake	5784	326	878
Suit	110848	321	857
Houseplant	22834	319	825
Sports uniform	19396	315	1135
Truck	12135	311	969
Rose	12053	309	899
Dairy	8146	308	970
Flowerpot	22760	302	659
Roller skates	5476	295	723
Animal	17442	290	882
Tableware	41086	285	936
Bread	3846	277	911
Ball	6845	266	902
Glasses	57946	262	890
Palm tree	42026	253	620
Paddle	6951	253	699
House	136152	246	822
Seafood	3063	226	689
Sculpture	34533	221	653
Tomato	6254	216	722
Salad	3088	213	605
Insect	8981	210	717
Hat	13245	201	557
Carnivore	3501	200	625
Human foot	2237	199	467
Monkey	3026	195	543
Wine	15400	193	388
Shelf	22899	191	563
Cabinetry	9191	188	451
Aircraft	1898	186	556
Drawer	4414	184	448
Cookie	4158	184	636
Sandal	2938	181	393
Musical instrument	16503	178	525
Orange	6195	175	839
Juice	2838	174	512
Motorcycle	13382	173	530
Lemon	1756	171	425
Cattle	11603	170	450
Door	19256	165	524

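With this table we can download only the classes we need, and we also know how many images each class has. For example, to download Human head we could write a script for the job. Sunita Nayak has already written that script, so we can use it directly.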
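Counts like those in the table can also be recomputed from the annotation files. The following is only a sketch: it assumes the bbox CSV is comma-separated with the header row shown earlier, uses the seven sample rows from this post as input, and the function name count_boxes is made up. It counts both boxes and distinct images per class, since the two can differ.

```python
import csv
import io
from collections import Counter

# The seven sample rows shown earlier, in comma-separated form (assumption).
SAMPLE_BBOX_CSV = """ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
000026e7ee790996,freeform,/m/07j7r,1,0.071905,0.145346,0.206591,0.391306,0,1,1,0,0
000026e7ee790996,freeform,/m/07j7r,1,0.439756,0.572466,0.264153,0.435122,0,1,1,0,0
000026e7ee790996,freeform,/m/07j7r,1,0.668455,1,0,0.552825,0,1,1,0,0
000062a39995e348,freeform,/m/015p6,1,0.205719,0.849912,0.154144,1,0,0,0,0,0
000062a39995e348,freeform,/m/05s2s,1,0.137133,0.377634,0,0.884185,1,1,0,0,0
0000c64e1253d68f,freeform,/m/07yv9,1,0,0.97385,0,0.043342,0,1,1,0,0
0000c64e1253d68f,freeform,/m/0k4j,1,0,0.513534,0.321356,0.689661,0,1,0,0,0
"""


def count_boxes(csv_file):
    """Return (boxes per MID, distinct images per MID) for a bbox CSV."""
    boxes = Counter()
    image_ids = {}
    for row in csv.DictReader(csv_file):
        mid = row['LabelName']
        boxes[mid] += 1
        image_ids.setdefault(mid, set()).add(row['ImageID'])
    images = {mid: len(ids) for mid, ids in image_ids.items()}
    return boxes, images


boxes, images = count_boxes(io.StringIO(SAMPLE_BBOX_CSV))
for mid, n in boxes.most_common():
    print(mid, n, images[mid])
```

The training annotation file is large, so streaming it row by row as above is preferable to loading it all into memory.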

Download

Install the AWS Command Line Interface (CLI), the unified tool for managing AWS services.

sudo pip3 install awscli

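Download the four .csv files for boxes and class names. (I recommend opening the links directly and downloading them with Xunlei; in my own tests this was far faster.)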

wget https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv
 
wget https://storage.googleapis.com/openimages/2018_04/train/train-annotations-bbox.csv
 
wget https://storage.googleapis.com/openimages/2018_04/validation/validation-annotations-bbox.csv
 
wget https://storage.googleapis.com/openimages/2018_04/test/test-annotations-bbox.csv

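Put the four files in the same folder as the script and run the script. Class names made of two words are joined with an underscore. An error may occur at this point; if it does, run pip3 install tqdm first and then run the script again.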

python3 downloadOI.py --classes 'Cheese,Ice_cream,Cookie' --mode train

The script downloadOI.py is available from the author's GitHub. Thanks again to the author.

https://github.com/spmallick/learnopencv/blob/master/downloadOpenImages/downloadOI.py

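In addition, optional arguments can be set explicitly to 0 on the command line to exclude certain types of instances.

--occluded=0 excludes occluded instances.

--truncated=0 excludes instances truncated at the image boundary.

--groupOf=0 excludes instances that represent a group of objects. These usually contain 5 or more objects of the same kind that touch or occlude each other, such as a bag of apples.

--depiction=0 excludes instances that are sketches or cartoons rather than pictures of real physical objects.

--inside=0 excludes instances photographed from inside the object, such as inside a car.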

python3 downloadOI.py --classes 'Cheese,Ice_cream,Cookie' --mode train --groupOf=0 --inside=0
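The filtering these flags perform can be sketched as a single predicate over one annotation row. This is an illustrative rewrite of the checks in the appendix script, not the script itself; the function name keep_annotation is made up. As in the script, an attribute value of -1 (unknown) does not exclude a row.

```python
def keep_annotation(row, occluded=1, truncated=1, group_of=1,
                    depiction=1, inside=1):
    """Return True if the annotation row survives the exclusion flags.

    row is one bbox CSV line already split on commas. Columns 8..12 hold
    IsOccluded, IsTruncated, IsGroupOf, IsDepiction, IsInside.
    A flag set to 0 drops rows whose matching attribute is 1.
    """
    attributes = [int(value) for value in row[8:13]]
    switches = [occluded, truncated, group_of, depiction, inside]
    return not any(switch == 0 and attr > 0
                   for switch, attr in zip(switches, attributes))


# Two sample rows from the table earlier in this post.
group_row = ('000026e7ee790996,freeform,/m/07j7r,1,0.071905,0.145346,'
             '0.206591,0.391306,0,1,1,0,0').split(',')
plain_row = ('000062a39995e348,freeform,/m/015p6,1,0.205719,0.849912,'
             '0.154144,1,0,0,0,0,0').split(',')

print(keep_annotation(group_row, group_of=0))   # IsGroupOf=1, so dropped
print(keep_annotation(plain_row, group_of=0, inside=0))
```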

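Results

Wait... wait... and wait. After a whole afternoon, 7,170 of 49,308 files had been downloaded, so I expect it to take another day and night. The downloaded files consist of the images containing the target classes and a matching txt file for each image, which holds the coordinates of every box of that class in the image. By default the script writes the txt files in the format class, XMin, XMax, YMin, YMax, as this code shows: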

with open('%s/%s/%s.txt'%(run_mode,class_name,line_parts[0]),'a') as f:
    f.write(','.join([class_name, line_parts[4], line_parts[5], line_parts[6], line_parts[7] ])+'\n')

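To match the YOLO label structure, <object-class> <x_center> <y_center> <width> <height>, this has to change. Another article by Sunita Nayak shows how. The code below needs some adjustment before it can be used; see the appendix.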

with open('labels/%s.txt'%(lineParts[0]),'a') as f:
    f.write(' '.join([str(ind),str((float(lineParts[5]) + float(lineParts[4]))/2), str((float(lineParts[7]) + float(lineParts[6]))/2), str(float(lineParts[5])-float(lineParts[4])),str(float(lineParts[7])-float(lineParts[6]))])+'\n')

This is the content of a txt file, in proper YOLO format:

2 0.7209375 0.5066665 0.556875 0.608333
5 0.146875 0.7058335 0.115 0.401667
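The conversion behind these numbers can be written as a small function. This is a sketch of the same arithmetic as the one-line f.write above; the name to_yolo is made up. The input box used here (XMin 0.4425, XMax 0.999375, YMin 0.2025, YMax 0.810833) was back-computed from the first label line for illustration and is not taken from the dataset files.

```python
def to_yolo(class_index, xmin, xmax, ymin, ymax):
    """Convert an Open Images box to YOLO (class, x_center, y_center, w, h).

    All coordinates stay normalized to [0, 1], so no image size is needed.
    """
    x_center = (xmin + xmax) / 2
    y_center = (ymin + ymax) / 2
    width = xmax - xmin
    height = ymax - ymin
    return class_index, x_center, y_center, width, height


def format_label(label):
    """Render a YOLO label tuple as one space-separated text line."""
    class_index, *values = label
    return ' '.join([str(class_index)] + [str(v) for v in values])


# Back-computed example box (assumption, see the paragraph above).
label = to_yolo(2, 0.4425, 0.999375, 0.2025, 0.810833)
print(format_label(label))
```

The printed values may differ from the line above in the last decimal places because of floating-point rounding.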

Loading the result into the Yolo_mark annotation tool shows that the downloaded labels match the YOLO format. Success.
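Before opening thousands of files in a GUI tool, a quick scripted sanity check can catch malformed label lines. The sketch below is my own assumption of what a reasonable check looks like: five fields, an integer class index within range, and a box that stays inside the image. The function name check_label_line and the class count of 6 are hypothetical.

```python
def check_label_line(line, num_classes, eps=1e-6):
    """Return True if one YOLO label line looks well-formed."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        class_index = int(parts[0])
        x, y, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return False
    if not 0 <= class_index < num_classes:
        return False
    if w <= 0 or h <= 0:
        return False
    # The box edges must stay within the normalized image area.
    return (x - w / 2 >= -eps and x + w / 2 <= 1 + eps and
            y - h / 2 >= -eps and y + h / 2 <= 1 + eps)


# The two label lines shown above; 6 classes is a hypothetical count.
sample_lines = [
    '2 0.7209375 0.5066665 0.556875 0.608333',
    '5 0.146875 0.7058335 0.115 0.401667',
]
results = [check_label_line(line, num_classes=6) for line in sample_lines]
print(results)
```

Running this over every file in the labels folder complements, but does not replace, the visual check in Yolo_mark.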

References

https://storage.googleapis.com/openimages/web/factsfigures.html

https://www.learnopencv.com/fast-image-downloader-for-open-images-v4/

https://www.learnopencv.com/training-yolov3-deep-learning-based-custom-object-detector/

https://blog.csdn.net/wulala789/article/details/80646618

Appendix (code quoted from the original author, with modifications)

#Author : Sunita Nayak, Big Vision LLC

#### Usage example: python3 downloadOI.py --classes 'Ice_cream,Cookie' --mode train

import argparse
import csv
import subprocess
import os
from tqdm import tqdm
import multiprocessing
from multiprocessing import Pool as thread_pool

cpu_count = multiprocessing.cpu_count()

parser = argparse.ArgumentParser(description='Download Class specific images from OpenImagesV4')
parser.add_argument("--mode", help="Dataset category - train, validation or test", required=True)
parser.add_argument("--classes", help="Names of object classes to be downloaded", required=True)
parser.add_argument("--nthreads", help="Number of threads to use", required=False, type=int, default=cpu_count*2)
parser.add_argument("--occluded", help="Include occluded images", required=False, type=int, default=1)
parser.add_argument("--truncated", help="Include truncated images", required=False, type=int, default=1)
parser.add_argument("--groupOf", help="Include groupOf images", required=False, type=int, default=1)
parser.add_argument("--depiction", help="Include depiction images", required=False, type=int, default=1)
parser.add_argument("--inside", help="Include inside images", required=False, type=int, default=1)

args = parser.parse_args()

run_mode = args.mode

threads = args.nthreads

classes = []
for class_name in args.classes.split(','):
    classes.append(class_name)

with open('./class-descriptions-boxable.csv', mode='r') as infile:
    reader = csv.reader(infile)
    dict_list = {rows[1]:rows[0] for rows in reader}

subprocess.run(['rm', '-rf', 'labels'])
subprocess.run([ 'mkdir', 'labels'])

subprocess.run(['rm', '-rf', 'JPEGImages'])
subprocess.run([ 'mkdir', 'JPEGImages'])

pool = thread_pool(threads)
commands = []
cnt = 0

for ind in range(0, len(classes)):
    
    class_name = classes[ind]
    print("Class "+str(ind) + " : " + class_name)
    
    # Create the per-class folder that the download command below writes into
    os.makedirs(os.path.join('JPEGImages', class_name), exist_ok=True)

    command = "grep "+dict_list[class_name.replace('_', ' ')] + " ./" + run_mode + "-annotations-bbox.csv"
    class_annotations = subprocess.run(command.split(), stdout=subprocess.PIPE).stdout.decode('utf-8')
    class_annotations = class_annotations.splitlines()

    for line in class_annotations:

        line_parts = line.split(',')
        
        #IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
        if (args.occluded==0 and int(line_parts[8])>0):
            print("Skipped %s" % line_parts[0])
            continue
        if (args.truncated==0 and int(line_parts[9])>0):
            print("Skipped %s" % line_parts[0])
            continue
        if (args.groupOf==0 and int(line_parts[10])>0):
            print("Skipped %s" % line_parts[0])
            continue
        if (args.depiction==0 and int(line_parts[11])>0):
            print("Skipped %s" % line_parts[0])
            continue
        if (args.inside==0 and int(line_parts[12])>0):
            print("Skipped %s" % line_parts[0])
            continue

        cnt = cnt + 1

        command = 'aws s3 --no-sign-request --only-show-errors cp s3://open-images-dataset/'+run_mode+'/'+line_parts[0]+'.jpg '+ 'JPEGImages'+'/'+class_name+'/'+line_parts[0]+'.jpg'
        commands.append(command)
        

        with open('labels/%s.txt'%(line_parts[0]),'a') as f:
            f.write(' '.join([str(ind), str((float(line_parts[5]) + float(line_parts[4]))/2), str((float(line_parts[7]) + float(line_parts[6]))/2), str(float(line_parts[5])-float(line_parts[4])), str(float(line_parts[7])-float(line_parts[6]))])+'\n')

print("Annotation Count : "+str(cnt))
commands = list(set(commands))
print("Number of images to be downloaded : "+str(len(commands)))

list(tqdm(pool.imap(os.system, commands), total = len(commands) ))

pool.close()
pool.join()
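One thing to be aware of in the script above: grep matches the MID as a substring anywhere in the line, so a MID that is a prefix of another MID would also pull in the other class's rows. A possible alternative, sketched here under the assumption that the annotation file is comma-separated with LabelName in the third column, is to filter on an exact column match in Python. The function name rows_for_class is made up, and the second sample row uses an invented MID (/m/0k4jx) purely to show the difference.

```python
import csv
import io

# First row is a real sample from this post; the second row and its MID
# /m/0k4jx are hypothetical, added only to demonstrate the prefix problem.
SAMPLE_ROWS = """0000c64e1253d68f,freeform,/m/0k4j,1,0,0.513534,0.321356,0.689661,0,1,0,0,0
00000000deadbeef,freeform,/m/0k4jx,1,0.1,0.2,0.1,0.2,0,0,0,0,0
"""


def rows_for_class(csv_file, mid):
    """Yield annotation rows whose LabelName column equals mid exactly."""
    for row in csv.reader(csv_file):
        if len(row) > 2 and row[2] == mid:
            yield row


exact = list(rows_for_class(io.StringIO(SAMPLE_ROWS), '/m/0k4j'))
substring = [line for line in SAMPLE_ROWS.splitlines() if '/m/0k4j' in line]
print(len(exact), len(substring))
```

This is slower than grep on the full training file, so whether the trade-off is worth it depends on the classes being downloaded.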

Author: 意疏
