Preparing a custom dataset for YOLOv3 (1): Download Open Images V4 quickly and accurately by object class (single class or multiple classes), save it in the YOLOv3 annotation format, and verify it with Yolo_mark (with Python script).

Foreword

Open Images V4 is a dataset of about 9 million (9M) images that Google released in 2018. It is divided into a training set, a validation set, and a test set, and is annotated with image-level labels, object bounding boxes, and visual relationships. Image-level labels: nearly 20,000 different classes are labeled, some by humans and some by machines. Object bounding boxes: about 1.91 million (1.91M) images contain about 15.44 million (15.44M) bounding boxes for 600 classes, which makes it the largest existing dataset with object location annotations. Its features include fine-grained classes (such as human head, face, beard, and hand), classes close to everyday life (from musical instruments, computers, and books to kitchen utensils), and accurate boxes [1]. Professional annotators at Google manually drew 90% of the boxes in the training set; the remaining 10% were generated semi-automatically and then manually verified to have IoU > 0.7 with a perfect box on the object, and in practice they are accurate (average IoU ~ 0.82). This dataset can save days or even months of work on many computer vision tasks. For example, if we want to build an object detector for one or more object classes, we can download only the images of those classes together with their annotations and start training. (This article focuses on downloading object detection data.)

Prerequisites

Ubuntu
Disk space
Python
awscli

Structure

The entire Open Images V4 dataset is about 560 GB, including the training, validation, and test sets. The repository shown in the figure includes image files (.zip) and annotation files (.csv), as shown in the following table.

Let's first look at the annotation files (.csv) and their data format:

Class Names:

class-descriptions-boxable.csv: maps the class names (MIDs) used in the dataset to human-readable names, such as /m/011k07 for Tortoise, /m/011q46kg for Container, /m/012074 for Magpie, etc.

/m/011k07	Tortoise
/m/011q46kg	Container
/m/012074	Magpie
/m/0120dh	Sea turtle
/m/01226z	Football
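
The name-to-MID lookup that the download script relies on can be sketched as follows. This is a minimal sketch, assuming the real file is comma-separated (the appendix script reads it with the default `csv.reader`); the helper name `load_class_map` and the inline sample are illustrative and not part of the original script.

```python
import csv
import io


def load_class_map(lines):
    """Return {display name: MID} from rows of class-descriptions-boxable.csv."""
    reader = csv.reader(lines)
    # Each row is expected to be: MID, display name
    return {row[1]: row[0] for row in reader if len(row) >= 2}


# Inline sample standing in for the real file (rows copied from the table above).
sample = io.StringIO(
    "/m/011k07,Tortoise\n"
    "/m/011q46kg,Container\n"
    "/m/012074,Magpie\n"
)
name_to_mid = load_class_map(sample)
print(name_to_mid["Magpie"])  # /m/012074
```

With the real file, the same function can be called on an open file handle, for example `load_class_map(open('class-descriptions-boxable.csv'))`.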

Boxes:

train-annotations-bbox.csv: bounding-box annotations of the object instances in the training images.
validation-annotations-bbox.csv: bounding-box annotations of the object instances in the validation images.
test-annotations-bbox.csv: bounding-box annotations of the object instances in the test images.

Each of these files contains the columns shown in the following table.

ImageID	Source	LabelName	Confidence	XMin	XMax	YMin	YMax	IsOccluded	IsTruncated	IsGroupOf	IsDepiction	IsInside
000026e7ee790996	freeform	/m/07j7r	1	0.071905	0.145346	0.206591	0.391306	0	1	1	0	0
000026e7ee790996	freeform	/m/07j7r	1	0.439756	0.572466	0.264153	0.435122	0	1	1	0	0
000026e7ee790996	freeform	/m/07j7r	1	0.668455	1	0	0.552825	0	1	1	0	0
000062a39995e348	freeform	/m/015p6	1	0.205719	0.849912	0.154144	1	0	0	0	0	0
000062a39995e348	freeform	/m/05s2s	1	0.137133	0.377634	0	0.884185	1	1	0	0	0
0000c64e1253d68f	freeform	/m/07yv9	1	0	0.97385	0	0.043342	0	1	1	0	0
0000c64e1253d68f	freeform	/m/0k4j	1	0	0.513534	0.321356	0.689661	0	1	0	0	0

ImageID: The image this box belongs to. An ID can appear one or more times, once for each box on that image.
Source: Indicates how the box was made: ① freeform and xclick are boxes drawn manually; ② activemil marks boxes generated using an enhanced version of the method.
LabelName: The MID of the object class this box belongs to, as listed in class-descriptions-boxable.csv.
Confidence: A dummy value, always 1.
XMin, XMax, YMin, YMax: Coordinates of the box, in normalized image coordinates. XMin is in [0,1], where 0 is the leftmost pixel and 1 is the rightmost pixel in the image. Y coordinates go from the top pixel (0) to the bottom pixel (1).
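
To make the normalized coordinates concrete, here is a minimal sketch that scales one box to pixel coordinates. The image size (1024x768) is an assumption chosen for illustration, since the annotation file does not store image dimensions, and `to_pixel_box` is a hypothetical helper name.

```python
def to_pixel_box(xmin, xmax, ymin, ymax, img_w, img_h):
    """Scale normalized [0, 1] box coordinates to rounded pixel coordinates."""
    left = round(xmin * img_w)
    right = round(xmax * img_w)
    top = round(ymin * img_h)
    bottom = round(ymax * img_h)
    return left, top, right, bottom


# Third row of the table above, with an assumed 1024x768 image.
box = to_pixel_box(0.668455, 1.0, 0.0, 0.552825, img_w=1024, img_h=768)
print(box)  # (684, 0, 1024, 425)
```

Note the argument order: the CSV stores XMin, XMax, YMin, YMax, while most drawing APIs expect left, top, right, bottom.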

The last five columns are attributes. For each of them, the value 1 indicates presence, 0 indicates absence, and -1 indicates unknown.

IsOccluded: Indicates that the object is occluded by another object in the image.
IsTruncated: Indicates that the object extends beyond the boundary of the image.
IsGroupOf: Indicates that the box spans a group of objects (for example, a bed of flowers or a crowd of people). Annotators were asked to use this tag for cases with more than 5 instances that heavily occlude each other and are physically touching.
IsDepiction: Indicates that the object is a depiction (for example, a cartoon or drawing of the object, rather than a real physical instance).
IsInside: Indicates that the picture was taken from inside the object (for example, inside a car or inside a building).

test-images.csv: lists the paths of the test images.
validation-images.csv: lists the paths of the validation images.

image_name	image_url
e0c995e9359596dd.jpg	https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/validation/e0c995e9359596dd.jpg
110487ec7e9be60a.jpg	https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/validation/110487ec7e9be60a.jpg
90596bf3313e72e3.jpg	https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/validation/90596bf3313e72e3.jpg
4b3c6afd44adbe59.jpg	https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/validation/4b3c6afd44adbe59.jpg
69248ebbbea5aa0c.jpg	https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/validation/69248ebbbea5aa0c.jpg

The official website also describes Image Labels, Visual Relationships, and so on. Since the corresponding files were not found, they are not covered here.

As can be seen from the above, the dataset offered on the official website is packaged by training set, validation set, and test set. Sometimes we don't need that much data, or we can't store that much data, or our task only requires a few classes. In those cases we need to download by class. Taking object detection as an example, there are 600 classes with bounding boxes, and their names are listed below. Thanks to Sunita Nayak of Learn OpenCV for providing this table. Space is limited, so only the first 100 classes are listed here. For the full list, visit https://www.learnopencv.com/fast-image-downloader-for-open-images-v4/

class	train	validation	test
Person	1034721	13274	40861
Wheel	340639	11394	34604
Car	248075	9381	28737
Human hair	234057	8594	26301
Clothing	1438128	8527	26531
Human arm	208982	8341	25162
Human head	201633	7865	25080
Footwear	744474	7189	21205
Human body	175244	6769	20246
Man	1418594	5654	17514
Human face	1037710	5170	15536
Flower	345296	5089	15040
Mammal	156154	4349	13479
Human nose	60142	4341	12718
Human eye	77233	4304	13034
Tire	122615	4181	13177
Human hand	75307	4123	12505
Human leg	71479	4093	13334
Sports equipment	44900	3951	11992
Plant	267913	3808	11579
Tree	1051344	3209	10148
Auto part	13586	2898	8845
Woman	767337	2865	9047
Food	88422	2736	8331
Land vehicle	81108	2689	8480
Human mouth	44197	2505	7424
Girl	197155	2420	7479
Vehicle	50959	2105	7064
Dog	28675	1930	5818
Fruit	26236	1905	6215
Window	503467	1650	5091
Airplane	21285	1027	3272
Fashion accessory	91024	1026	3164
Baked goods	23010	1020	2907
Building	178634	984	2915
Bird	47921	943	2751
Boat	79113	903	2672
Human ear	17774	870	2611
Bicycle wheel	59521	733	2018
Table	85691	714	2279
Snack	37374	708	2173
Book	41280	698	2147
Furniture	38527	646	1893
Dessert	27407	645	2092
Boy	87555	600	2031
Dress	52999	567	1581
Fish	23195	564	1422
Vehicle registration plate	7852	512	1570
Chair	132483	511	1535
Vegetable	18621	496	1679
Fast food	24991	492	1599
Drink	40323	482	1427
Helmet	16502	440	1275
Toy	70963	437	1205
Bicycle	40161	403	1158
Jeans	78473	396	1433
Horse	13368	392	1144
Cat	15183	381	1095
Bottle	40188	340	979
Strawberry	7944	326	774
Cake	5784	326	878
Suit	110848	321	857
Houseplant	22834	319	825
Sports uniform	19396	315	1135
Truck	12135	311	969
Rose	12053	309	899
Dairy	8146	308	970
Flowerpot	22760	302	659
Roller skates	5476	295	723
Animal	17442	290	882
Tableware	41086	285	936
Bread	3846	277	911
Ball	6845	266	902
Glasses	57946	262	890
Palm tree	42026	253	620
Paddle	6951	253	699
House	136152	246	822
Seafood	3063	226	689
Sculpture	34533	221	653
Tomato	6254	216	722
Salad	3088	213	605
Insect	8981	210	717
Hat	13245	201	557
Carnivore	3501	200	625
Human foot	2237	199	467
Monkey	3026	195	543
Wine	15400	193	388
Shelf	22899	191	563
Cabinetry	9191	188	451
Aircraft	1898	186	556
Drawer	4414	184	448
Cookie	4158	184	636
Sandal	2938	181	393
Musical instrument	16503	178	525
Orange	6195	175	839
Juice	2838	174	512
Motorcycle	13382	173	530
Lemon	1756	171	425
Cattle	11603	170	450
Door	19256	165	524

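With this table, we can download only the classes we need, and we also know how many images each class has. For example, if we want to download Human head, we can do it with a script. Sunita Nayak has already implemented such a script, so we can use it directly.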
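
As a rough cross-check of per-class counts like those in the table, the annotation file can be scanned directly. The sketch below is an illustration under assumptions: it expects the comma-separated header used by the `*-annotations-bbox.csv` files, matches `LabelName` exactly, and uses a hypothetical helper name `count_class`. It runs on a small inline sample built from the rows shown earlier.

```python
import csv
import io


def count_class(lines, mid):
    """Count boxes and distinct images whose LabelName equals mid exactly."""
    boxes = 0
    images = set()
    for row in csv.DictReader(lines):
        if row["LabelName"] == mid:
            boxes += 1
            images.add(row["ImageID"])
    return boxes, len(images)


header = ("ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,"
          "IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside\n")
sample = io.StringIO(
    header
    + "000026e7ee790996,freeform,/m/07j7r,1,0.071905,0.145346,0.206591,0.391306,0,1,1,0,0\n"
    + "000026e7ee790996,freeform,/m/07j7r,1,0.439756,0.572466,0.264153,0.435122,0,1,1,0,0\n"
    + "000026e7ee790996,freeform,/m/07j7r,1,0.668455,1,0,0.552825,0,1,1,0,0\n"
    + "0000c64e1253d68f,freeform,/m/0k4j,1,0,0.513534,0.321356,0.689661,0,1,0,0,0\n"
)
print(count_class(sample, "/m/07j7r"))  # (3, 1): three boxes on one image
```

The training annotation file is large, so passing an open file handle and streaming row by row, as this function does, avoids loading it into memory.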

Download

Install the AWS Command Line Interface (CLI), the unified tool for managing AWS services:

sudo pip3 install awscli

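Download the four .csv files for boxes and class names (I recommend opening the links given earlier directly and downloading with Xunlei; in my own tests this was much faster):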

wget https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv
 
wget https://storage.googleapis.com/openimages/2018_04/train/train-annotations-bbox.csv
 
wget https://storage.googleapis.com/openimages/2018_04/validation/validation-annotations-bbox.csv
 
wget https://storage.googleapis.com/openimages/2018_04/test/test-annotations-bbox.csv

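Put the four files and the script in the same folder and run the script. Class names that consist of two words are joined with an underscore. An error may occur at this point; if it does, first run pip3 install tqdm and then run the script again.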

python3 downloadOI.py --classes 'Cheese,Ice_cream,Cookie' --mode train
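
The underscore convention in `--classes` can be illustrated with a short sketch. The appendix script splits the value on commas and replaces `_` with a space before looking the name up in class-descriptions-boxable.csv; the helper name `parse_classes` and the whitespace stripping are additions for illustration only.

```python
def parse_classes(arg):
    """Split a --classes value into (folder name, display name) pairs."""
    names = [name.strip() for name in arg.split(",") if name.strip()]
    # The display name with spaces is what appears in the class description file.
    return [(name, name.replace("_", " ")) for name in names]


pairs = parse_classes("Cheese,Ice_cream,Cookie")
print(pairs)
# [('Cheese', 'Cheese'), ('Ice_cream', 'Ice cream'), ('Cookie', 'Cookie')]
```

A class name that does not match an entry in the description file exactly (including capitalization) would fail the lookup, so it is worth checking names against the file before starting a long download.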

To get the script downloadOI.py, visit the author's GitHub. Thanks again to the author.

https://github.com/spmallick/learnopencv/blob/master/downloadOpenImages/downloadOI.py

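In addition, you can exclude certain types of images by explicitly setting the following optional arguments to 0 on the command line.

--occluded=0 excludes occluded instances.

--truncated=0 excludes instances truncated at the image boundary.

--groupOf=0 excludes instances that represent a group of objects. These typically contain 5 or more objects of the same class that touch or occlude each other, such as a bag of apples.

--depiction=0 excludes instances that are sketches or cartoons rather than pictures of real physical objects.

--inside=0 excludes instances photographed from inside the object, such as inside a car.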

python3 downloadOI.py --classes 'Cheese,Ice_cream,Cookie' --mode train --groupOf=0 --inside=0
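
The effect of these flags on a single annotation row can be sketched as below. It mirrors the checks in the appendix script (columns 8 to 12 of a comma-split row), but the function name `keep_instance` and its keyword arguments are illustrative, not part of the script.

```python
def keep_instance(line_parts, occluded=1, truncated=1, group_of=1,
                  depiction=1, inside=1):
    """Return False if a flag set to 0 excludes this annotation row."""
    flags = (occluded, truncated, group_of, depiction, inside)
    # Columns 8..12: IsOccluded, IsTruncated, IsGroupOf, IsDepiction, IsInside
    for flag, value in zip(flags, line_parts[8:13]):
        if flag == 0 and int(value) > 0:
            return False
    return True


row_group = ("000026e7ee790996,freeform,/m/07j7r,1,0.071905,0.145346,"
             "0.206591,0.391306,0,1,1,0,0").split(",")
row_plain = ("000062a39995e348,freeform,/m/015p6,1,0.205719,0.849912,"
             "0.154144,1,0,0,0,0,0").split(",")

print(keep_instance(row_group, group_of=0))  # False
print(keep_instance(row_plain, group_of=0))  # True
```

Because the test is `> 0`, rows whose attribute is -1 (unknown) are kept, which matches the behavior of the script.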

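Results

Wait... wait... wait. After a whole afternoon, 7170 of the 49308 files have been downloaded, and I estimate it will take another day and night. The downloaded files include the images for the requested classes and a txt file for each image, which holds the coordinates of all boxes of that class on the image. By default, the txt files written by the script use the format class, XMin, XMax, YMin, YMax, as the following code shows: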

with open('%s/%s/%s.txt'%(run_mode,class_name,line_parts[0]),'a') as f:
    f.write(','.join([class_name, line_parts[4], line_parts[5], line_parts[6], line_parts[7]])+'\n')

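To match the YOLO label format, <object-class> <x_center> <y_center> <width> <height>, we need to make some changes. These are shown in another article by Sunita Nayak; the following code needs some adjustment before it can be used (see the appendix).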

with open('labels/%s.txt'%(lineParts[0]),'a') as f:
    f.write(' '.join([str(ind), str((float(lineParts[5]) + float(lineParts[4]))/2), str((float(lineParts[7]) + float(lineParts[6]))/2), str(float(lineParts[5])-float(lineParts[4])), str(float(lineParts[7])-float(lineParts[6]))])+'\n')
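
The same conversion, written as a small standalone function, may be easier to follow. This is a sketch with a hypothetical helper name (`to_yolo`) and made-up corner values; the arithmetic is the same as in the one-liner above.

```python
def to_yolo(class_index, xmin, xmax, ymin, ymax):
    """Convert normalized corner coordinates to a YOLO label line."""
    x_center = (xmin + xmax) / 2
    y_center = (ymin + ymax) / 2
    width = xmax - xmin
    height = ymax - ymin
    return f"{class_index} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical box: left 0.2, right 0.6, top 0.1, bottom 0.5, class index 0.
print(to_yolo(0, 0.2, 0.6, 0.1, 0.5))  # 0 0.400000 0.300000 0.400000 0.400000
```

Since Open Images coordinates are already normalized to [0, 1], no division by image width or height is needed, unlike conversions from pixel-based formats.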

This is the content of a txt file, in proper YOLO format:

2 0.7209375 0.5066665 0.556875 0.608333
5 0.146875 0.7058335 0.115 0.401667

After loading the data into the Yolo_mark annotation tool, we can see that the downloaded annotations conform to the YOLO format. Success.
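
Before opening a large dataset in a GUI tool, a quick programmatic sanity check of the label lines can catch obvious problems. The sketch below is only an illustration: the helper names are hypothetical, the class count of 6 is an assumption (the sample above contains class index 5), and the check only verifies that each box lies inside the image.

```python
def yolo_to_corners(label_line):
    """Parse a YOLO label line into (class_index, xmin, ymin, xmax, ymax)."""
    parts = label_line.split()
    class_index = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:5])
    xmin = x_center - width / 2
    xmax = x_center + width / 2
    ymin = y_center - height / 2
    ymax = y_center + height / 2
    return class_index, xmin, ymin, xmax, ymax


def is_valid_label(label_line, num_classes, eps=1e-6):
    """Check the class index range and that the box stays within [0, 1]."""
    class_index, xmin, ymin, xmax, ymax = yolo_to_corners(label_line)
    return (0 <= class_index < num_classes
            and -eps <= xmin < xmax <= 1 + eps
            and -eps <= ymin < ymax <= 1 + eps)


# The two label lines shown above, with an assumed total of 6 classes.
for line in ["2 0.7209375 0.5066665 0.556875 0.608333",
             "5 0.146875 0.7058335 0.115 0.401667"]:
    print(line, "->", is_valid_label(line, num_classes=6))
```

A check like this does not replace visual inspection in Yolo_mark, since a box can be inside the image and still be on the wrong object.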

References

https://storage.googleapis.com/openimages/web/factsfigures.html

https://www.learnopencv.com/fast-image-downloader-for-open-images-v4/

https://www.learnopencv.com/training-yolov3-deep-learning-based-custom-object-detector/

https://blog.csdn.net/wulala789/article/details/80646618

Appendix (code quoted from the author and modified)

#Author : Sunita Nayak, Big Vision LLC

#### Usage example: python3 downloadOI.py --classes 'Ice_cream,Cookie' --mode train

import argparse
import csv
import subprocess
import os
from tqdm import tqdm
import multiprocessing
from multiprocessing import Pool as thread_pool

cpu_count = multiprocessing.cpu_count()

parser = argparse.ArgumentParser(description='Download Class specific images from OpenImagesV4')
parser.add_argument("--mode", help="Dataset category - train, validation or test", required=True)
parser.add_argument("--classes", help="Names of object classes to be downloaded", required=True)
parser.add_argument("--nthreads", help="Number of threads to use", required=False, type=int, default=cpu_count*2)
parser.add_argument("--occluded", help="Include occluded images", required=False, type=int, default=1)
parser.add_argument("--truncated", help="Include truncated images", required=False, type=int, default=1)
parser.add_argument("--groupOf", help="Include groupOf images", required=False, type=int, default=1)
parser.add_argument("--depiction", help="Include depiction images", required=False, type=int, default=1)
parser.add_argument("--inside", help="Include inside images", required=False, type=int, default=1)

args = parser.parse_args()

run_mode = args.mode

threads = args.nthreads

classes = []
for class_name in args.classes.split(','):
    classes.append(class_name)

with open('./class-descriptions-boxable.csv', mode='r') as infile:
    reader = csv.reader(infile)
    dict_list = {rows[1]:rows[0] for rows in reader}

subprocess.run(['rm', '-rf', 'labels'])
subprocess.run([ 'mkdir', 'labels'])

subprocess.run(['rm', '-rf', 'JPEGImages'])
subprocess.run([ 'mkdir', 'JPEGImages'])

pool = thread_pool(threads)
commands = []
cnt = 0

for ind in range(0, len(classes)):
    
    class_name = classes[ind]
    print("Class "+str(ind) + " : " + class_name)
    
    # Images are saved under JPEGImages/<class_name>, so create that directory
    subprocess.run(['mkdir', '-p', 'JPEGImages/'+class_name])

    command = "grep "+dict_list[class_name.replace('_', ' ')] + " ./" + run_mode + "-annotations-bbox.csv"
    class_annotations = subprocess.run(command.split(), stdout=subprocess.PIPE).stdout.decode('utf-8')
    class_annotations = class_annotations.splitlines()

    for line in class_annotations:

        line_parts = line.split(',')
        
        #IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
        if (args.occluded==0 and int(line_parts[8])>0):
            print("Skipped %s" % line_parts[0])
            continue
        if (args.truncated==0 and int(line_parts[9])>0):
            print("Skipped %s" % line_parts[0])
            continue
        if (args.groupOf==0 and int(line_parts[10])>0):
            print("Skipped %s" % line_parts[0])
            continue
        if (args.depiction==0 and int(line_parts[11])>0):
            print("Skipped %s" % line_parts[0])
            continue
        if (args.inside==0 and int(line_parts[12])>0):
            print("Skipped %s" % line_parts[0])
            continue

        cnt = cnt + 1

        command = 'aws s3 --no-sign-request --only-show-errors cp s3://open-images-dataset/'+run_mode+'/'+line_parts[0]+'.jpg '+ 'JPEGImages'+'/'+class_name+'/'+line_parts[0]+'.jpg'
        commands.append(command)
        

        with open('labels/%s.txt'%(line_parts[0]),'a') as f:
            f.write(' '.join([str(ind), str((float(line_parts[5]) + float(line_parts[4]))/2), str((float(line_parts[7]) + float(line_parts[6]))/2), str(float(line_parts[5])-float(line_parts[4])), str(float(line_parts[7])-float(line_parts[6]))])+'\n')

print("Annotation Count : "+str(cnt))
commands = list(set(commands))
print("Number of images to be downloaded : "+str(len(commands)))

list(tqdm(pool.imap(os.system, commands), total = len(commands) ))

pool.close()
pool.join()
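
The script prints two different numbers, "Annotation Count" and "Number of images to be downloaded", because one image can carry several boxes of the same class and the duplicate download commands are removed with `set`. The sketch below isolates that step; the function name `build_download_commands` is hypothetical, the class name `Cookie` is only a placeholder, and the command string is the one used in the script above. Nothing is executed or downloaded here.

```python
def build_download_commands(image_ids, run_mode, class_name):
    """Build one aws s3 cp command per distinct image ID."""
    commands = set()
    for image_id in image_ids:
        commands.add(
            "aws s3 --no-sign-request --only-show-errors cp "
            f"s3://open-images-dataset/{run_mode}/{image_id}.jpg "
            f"JPEGImages/{class_name}/{image_id}.jpg"
        )
    # Sorting is optional; it just makes the order reproducible.
    return sorted(commands)


# Four annotations on two images (IDs reused from the table earlier as sample strings).
ids = ["000026e7ee790996", "000026e7ee790996",
       "000026e7ee790996", "0000c64e1253d68f"]
commands = build_download_commands(ids, "train", "Cookie")
print(len(ids), "annotations ->", len(commands), "downloads")
# 4 annotations -> 2 downloads
```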

意疏
