Faster-RCNN Series (1): Making Your Own Dataset in Pascal VOC Format

1. Data set preparation

When we do object detection, we need a training set, a validation set, and a test set. There are many dataset formats; here we use the Pascal VOC format. Below is the folder structure of the VOC2007 dataset (a small scaffolding sketch follows the list):

  • JPEGImages: holds your data images. For Faster-RCNN, all images must be in jpg/jpeg format; other formats must be converted first. The images must also be numbered, conventionally with the VOC six-digit scheme: 000001.jpg, 000002.jpg, and so on.
  • Annotations: holds your annotations for all the data images. The annotation for each photo must be an xml file.
  • ImageSets: contains a Main folder, and under Main are four txt files: train.txt, test.txt, trainval.txt, and val.txt. Each stores image numbers, one number per line. Since we only care about training here, put all the image numbers used for training into train.txt.
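
To save creating these folders by hand, a minimal scaffolding sketch like the one below will do it; the VOC_ROOT path is a placeholder, so replace it with your own location:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

# Hypothetical dataset root; replace with your own path.
VOC_ROOT = 'VOCdevkit2007/VOC2007'

# Create the standard VOC2007 folder layout described above.
for sub in ('JPEGImages', 'Annotations', os.path.join('ImageSets', 'Main')):
    os.makedirs(os.path.join(VOC_ROOT, sub), exist_ok=True)

# Create empty split files under ImageSets/Main.
for split in ('train', 'val', 'trainval', 'test'):
    open(os.path.join(VOC_ROOT, 'ImageSets', 'Main', split + '.txt'), 'w').close()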

One thing to note: if you want to train a good model, the dataset has to be large. I previously used a training set of just over 1,000 images and, from my own experiments, could not train a good classifier with it. Building and annotating your own dataset is also a substantial undertaking; for details on how to make annotations, refer to other blogs.
In practice, you only need to modify these three folders of the VOC2007 dataset, which lets you skip some of the more cumbersome steps.

1. Pictures required for training

Step 1: Rename the pictures

The VOC2007 format requires JPG images with uniform six-digit numbering, starting from 000001, so we need to rename all the training images accordingly. The Python code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
# Batch-rename images to six-digit VOC-style names (000001.jpg, ...)

class ImageRename():
    def __init__(self):
        self.path = '/Users/douglaswang/workspace/2019-02/train/train'

    def rename(self):
        filelist = os.listdir(self.path)
        total_num = len(filelist)

        i = 1  # VOC numbering starts from 000001
        for item in filelist:
            if item.endswith('.jpg'):
                src = os.path.join(os.path.abspath(self.path), item)
                # Zero-pad the index to six digits, e.g. 000001.jpg
                dst = os.path.join(os.path.abspath(self.path), '%06d.jpg' % i)
                os.rename(src, dst)
                print('converting %s to %s ...' % (src, dst))
                i = i + 1
        print('total %d files, renamed %d jpgs' % (total_num, i - 1))

if __name__ == '__main__':
    newname = ImageRename()
    newname.rename()

2. An XML file with the ROI label information for each picture

Step 2: Mark out the ROIs

For each image to be trained, we need to output the ROI. The position information consists of four values: the X and Y of the ROI's upper-left corner and the X and Y of its lower-right corner. The question is how to extract ROIs quickly for a large dataset. After searching the Internet for a long time, I found that MATLAB 2014 already has this function built in.
Note that the four values produced by MATLAB's box labeling are the X and Y of the ROI's upper-left corner plus the ROI's width and height, so they need converting to corner coordinates. Excel is one way to handle this; a small Python alternative is sketched below.
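
The conversion is just xmax = x + width and ymax = y + height. Here is a minimal sketch; it assumes the MATLAB output was saved as a whitespace-separated text file with columns filename x y w h (the file name roi-matlab.txt is hypothetical):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Convert MATLAB-style boxes (x, y, width, height) to VOC-style
# corner boxes (xmin, ymin, xmax, ymax). File names are placeholders.
with open('roi-matlab.txt') as fin, open('pos-all.txt', 'w') as fout:
    for line in fin:
        name, x, y, w, h = line.strip().split()
        xmin, ymin = int(x), int(y)
        xmax, ymax = xmin + int(w), ymin + int(h)
        fout.write('%s %d %d %d %d\n' % (name, xmin, ymin, xmax, ymax))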

  • The resulting pos-all.txt looks like this:
# filename xmin ymin xmax ymax
1.jpg 132 1769 808 2193
2.jpg 132 1769 808 2193
3.jpg 132 1769 808 2193
4.jpg 132 1769 808 2193
5.jpg 132 1769 808 2193
6.jpg 132 1769 808 2193
……

Step 3: Generate the XML files

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import codecs
import cv2

# Path written into each xml's <path> tag
path = "/Users/douglaswang/workspace/2019-02/Faster-RCNN_TF/data/VOCdevkit2007/VOC2007/JPEGImages"
root = r'/Users/douglaswang/workspace/2019-02/train'
fp = open('pos-all.txt')
fp2 = open('train.txt', 'w')
uavinfo = fp.readlines()

for i in range(len(uavinfo)):
    line = uavinfo[i]
    line = line.strip().split(' ')
    line[0] = "/Users/douglaswang/workspace/2019-02/train/jpg/" + str(line[0])
    # Read the image to get width, height, and depth for the <size> tag
    img = cv2.imread(line[0])
    print(line[0])
    sp = img.shape
    height = sp[0]
    width = sp[1]
    depth = sp[2]
    info1 = line[0].split('/')[-1]  # file name, e.g. 1.jpg
    info2 = info1.split('.')[0]     # image number, e.g. 1

    # Corner coordinates from pos-all.txt
    l_pos1 = line[1]  # xmin
    l_pos2 = line[2]  # ymin
    r_pos1 = line[3]  # xmax
    r_pos2 = line[4]  # ymax
    fp2.writelines(info2 + '\n')
    with codecs.open(root + r'/xml/' + info2 + '.xml', 'w', 'utf-8') as xml:
        xml.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        xml.write('<annotation>\n')
        xml.write('\t<folder>' + 'voc2007' + '</folder>\n')
        xml.write('\t<filename>' + info1 + '</filename>\n')
        xml.write('\t<path>' + path + "/" + info1 + '</path>\n')
        xml.write('\t<source>\n')
        xml.write('\t\t<database>The UAV autolanding</database>\n')
        xml.write('\t</source>\n')
        xml.write('\t<size>\n')
        xml.write('\t\t<width>' + str(width) + '</width>\n')
        xml.write('\t\t<height>' + str(height) + '</height>\n')
        xml.write('\t\t<depth>' + str(depth) + '</depth>\n')
        xml.write('\t</size>\n')
        xml.write('\t<segmented>0</segmented>\n')
        xml.write('\t<object>\n')
        xml.write('\t\t<name>cat</name>\n')
        xml.write('\t\t<pose>Unspecified</pose>\n')
        xml.write('\t\t<truncated>0</truncated>\n')
        xml.write('\t\t<difficult>0</difficult>\n')
        xml.write('\t\t<bndbox>\n')
        xml.write('\t\t\t<xmin>' + l_pos1 + '</xmin>\n')
        xml.write('\t\t\t<ymin>' + l_pos2 + '</ymin>\n')
        xml.write('\t\t\t<xmax>' + r_pos1 + '</xmax>\n')
        xml.write('\t\t\t<ymax>' + r_pos2 + '</ymax>\n')
        xml.write('\t\t</bndbox>\n')
        xml.write('\t</object>\n')
        xml.write('</annotation>')
fp2.close()
fp.close()
  • The details of a generated xml file are as follows.
    Note: object->name holds the label. The script above writes "cat"; replace it with your own class name (the sample below was generated with the label "bill").
<?xml version="1.0" encoding="UTF-8"?>
<annotation>
	<folder>voc2007</folder>
	<filename>1.jpg</filename>
	<path>/Users/douglaswang/workspace/2019-02/Faster-RCNN_TF/data/VOCdevkit2007/VOC2007/JPEGImages/1.jpg</path>
	<source>
		<database>The UAV autolanding</database>
	</source>
	<size>
		<width>1668</width>
		<height>2361</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>bill</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>132</xmin>
			<ymin>1769</ymin>
			<xmax>808</xmax>
			<ymax>2193</ymax>
		</bndbox>
	</object>
</annotation>
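
As a quick sanity check, the sketch below (using Python's standard xml.etree.ElementTree) parses the generated files and confirms each box lies inside its image; XML_DIR is the output folder from the script above:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import glob
import os
import xml.etree.ElementTree as ET

# Directory of generated annotations; adjust to your own path.
XML_DIR = '/Users/douglaswang/workspace/2019-02/train/xml'

for xml_path in glob.glob(os.path.join(XML_DIR, '*.xml')):
    anno = ET.parse(xml_path).getroot()
    w = int(anno.find('size/width').text)
    h = int(anno.find('size/height').text)
    for obj in anno.findall('object'):
        box = obj.find('bndbox')
        xmin = int(box.find('xmin').text)
        ymin = int(box.find('ymin').text)
        xmax = int(box.find('xmax').text)
        ymax = int(box.find('ymax').text)
        # A valid VOC box has xmin < xmax and ymin < ymax, inside the image.
        if not (0 <= xmin < xmax <= w and 0 <= ymin < ymax <= h):
            print('bad box in %s: %s' % (xml_path, (xmin, ymin, xmax, ymax)))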

3. Split the dataset into the three parts frcnn needs for training, validation, and testing.

Step 4: Split the dataset

Create a .py file in your own VOC2007 folder and run the following program; you can modify the two parameters trainval_percent and train_percent to adjust how many images are used for training, validation, and testing (the script reads the image numbers from name_list.txt):

# coding=utf-8
import os
import random

# Fraction of all images that go to trainval (the rest become test),
# and fraction of trainval images that go to train (the rest become val).
trainval_percent = 0.5
train_percent = 0.5

root = '/Users/douglaswang/workspace/2019-02/train/Main'
fp = open(root + '/' + 'name_list.txt')
fp_trainval = open(root + '/' + 'trainval.txt', 'w')
fp_test = open(root + '/' + 'test.txt', 'w')
fp_train = open(root + '/' + 'train.txt', 'w')
fp_val = open(root + '/' + 'val.txt', 'w')

# First pass: split all images into trainval and test
filenames = fp.readlines()
for i in range(len(filenames)):
    pic_name = filenames[i].strip()
    pic_info = pic_name.split('.')[0]
    if random.uniform(0, 1) < trainval_percent:
        fp_trainval.writelines(pic_info + '\n')
    else:
        fp_test.writelines(pic_info + '\n')

fp_trainval.close()
fp_test.close()
fp.close()

# Second pass: split trainval into train and val
fp = open(root + '/' + 'trainval.txt')
filenames = fp.readlines()
for i in range(len(filenames)):
    pic_info = filenames[i].strip().split('.')[0]
    if random.uniform(0, 1) < train_percent:
        fp_train.writelines(pic_info + '\n')
    else:
        fp_val.writelines(pic_info + '\n')

fp_train.close()
fp_val.close()
fp.close()
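
After running it, a quick count of each split (paths as above) is a cheap sanity check:

# coding=utf-8
# Print how many image numbers ended up in each split file.
root = '/Users/douglaswang/workspace/2019-02/train/Main'
for split in ('train', 'val', 'trainval', 'test'):
    with open(root + '/' + split + '.txt') as f:
        print('%s: %d images' % (split, len(f.readlines())))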

Step 5: Put the resulting files in the correct paths

The pre-trained model and the dataset go in the following locations:

- Faster-RCNN_TF
    - data
        - VOCdevkit2007
            - VOC2007
                - JPEGImages
                - Annotations
                - ImageSets
        - pretrain_model
            - VGG_imagenet.npy
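
Before starting training, a quick check like the sketch below (run from the Faster-RCNN_TF repository root, with the paths assumed to match the tree above) can catch misplaced folders early:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

# Expected layout relative to the Faster-RCNN_TF repository root.
required = [
    'data/VOCdevkit2007/VOC2007/JPEGImages',
    'data/VOCdevkit2007/VOC2007/Annotations',
    'data/VOCdevkit2007/VOC2007/ImageSets/Main',
    'data/pretrain_model/VGG_imagenet.npy',
]

for p in required:
    print('%-50s %s' % (p, 'ok' if os.path.exists(p) else 'MISSING'))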

End


Origin blog.csdn.net/sage_wang/article/details/87614485