[YOLO] Training yolov5 on your own dataset (2)

0 Preliminary Tutorials

1 Introduction

  The previous tutorial gave a rough introduction to setting up the yolov5 development environment and to the basic structure of the yolov5 project. The next step is to train on your own dataset, starting from a yolov5 pre-trained model. Whether you only want to use yolov5 as a tool or want to study object detection algorithms like yolov5 in depth, this is an introductory step that cannot be avoided. This article briefly walks through the process, based on the information I found and my own experience.

2 Prepare the dataset

2.1 Dataset source

  Because data labeling is tedious and time-consuming, I personally suggest trying the workflow with a public dataset first, and only labeling and training on your own data once you are familiar with it. The latter just has one extra labeling step; everything else is exactly the same.
  The dataset recommended here is PASCAL VOC2012, the dataset of a world-class computer vision competition. It contains 20 common everyday object categories, and its class distribution is shown in the figure below.
insert image description here

Official website download link

  After opening the page, shown in the figure below, just click the first download link.
insert image description here

2.2 Dataset Structure Introduction

  The download is a compressed archive. Its folder structure is as follows:

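├───Annotations       //annotation files
├───ImageSets         //image set lists
│   ├───Action        //person actions
│   ├───Layout        //person body-part dataset
│   ├───Main          //main dataset (training and validation sets)
│   └───Segmentation  //semantic segmentation training and validation sets
├───JPEGImages        //all images
├───SegmentationClass //semantic segmentation by class
└───SegmentationObject//semantic segmentation by object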

The files under the Main folder follow a very regular pattern:

insert image description here

Each target class corresponds to three txt files. The one ending in train is the training set, the one ending in val is the validation set, and the one ending in trainval is the union of the two. Here we take the files for the bottle class as an example:
insert image description here
The first column is the name of the corresponding image file in the JPEGImages folder, and the second column is the target flag: -1 means the target is absent, 1 means the target is present, and 0 means the target is present but difficult to detect.
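As a small illustration of the two-column layout described above, here is a sketch that parses such lines into (image name, flag) pairs. The function name parse_split_lines and the sample lines are made up for illustration; only the column layout comes from the dataset.

```python
def parse_split_lines(lines):
    """Parse lines like '2008_000002 -1' into (image_id, flag) tuples."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        image_id, flag = parts
        entries.append((image_id, int(flag)))
    return entries


# Hypothetical sample lines in the same two-column layout
sample = ["2008_000002 -1", "2008_000008  1", "2008_000015  0", ""]
entries = parse_split_lines(sample)

# Keep images where the target is present (1) or present but difficult (0)
with_target = [name for name, flag in entries if flag != -1]
print(with_target)
```

Filtering on the flag like this is one way to see how many images in a split actually contain the class before converting anything.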

2.3 Label format conversion

  The Annotations folder stores the label information for all images. Label file names and image file names correspond one to one, so a label can be looked up by file name.
  The labels in the VOC dataset are in xml format, but yolov5 training uses the yolo format, so they have to be converted first. Take one of these labels as an example:
insert image description here

For the object detection task, two tags matter most: <size> and <object>. The <size> tag describes the dimensions of the image. Each <object> tag describes one target in the image, where <name> gives the target's class, <truncated> indicates whether the target is truncated (1 means truncated), <difficult> indicates whether the target is difficult to detect (1 means difficult), and <bndbox> gives the upper-left and lower-right coordinates of the target.
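To make the tag layout concrete, here is a minimal sketch that reads the <size> and <object> information with xml.etree.ElementTree. The XML string is a hand-written, trimmed-down stand-in for a real annotation file, and read_annotation is a name chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down annotation in the VOC tag layout
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>bottle</name>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>200</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""


def read_annotation(xml_text):
    """Return ((width, height), [object dicts]) from VOC-style xml text."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    objects = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        box = tuple(int(float(bndbox.find(tag).text))
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append({
            "name": obj.find("name").text,
            "difficult": int(obj.find("difficult").text),
            "box": box,  # (xmin, ymin, xmax, ymax) in pixels
        })
    return (width, height), objects


size, objects = read_annotation(SAMPLE_XML)
print(size, objects)
```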

  And the label in yolo format is as follows
insert image description here

The fields are, in order: [class (usually given as a number) x_center y_center width height], with the coordinates normalized by the image width and height. The label format therefore has to be converted before training on the dataset, and Python's xml library is used to parse the xml files.
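A quick worked example of the arithmetic may help. The sketch below converts one pixel box to a yolo label line using the plain normalization formula. The image size and box are made-up numbers and voc_box_to_yolo is a name chosen for this example; note that the sample code in this article additionally subtracts 1 from the center coordinates before normalizing, which this sketch leaves out.

```python
def voc_box_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Convert a pixel box to normalized (x_center, y_center, width, height)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


# Made-up example: a 500x400 image with a box from (100, 50) to (300, 250)
values = voc_box_to_yolo(500, 400, 100, 50, 300, 250)

# Class index 0 followed by the four normalized values
line = "0 " + " ".join(f"{v:.6f}" for v in values)
print(line)
```

The center is at pixel (200, 150), so dividing by the image size gives 0.4 and 0.375; the box is 200 pixels on each side, giving 0.4 and 0.5.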

The sample code is as follows:

import xml.etree.ElementTree as ET
import os
import shutil
import tqdm

def convert(size, box):
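    '''	@func: convert box coordinates to the format yolo expects
        @para	size: image size as (width, height), eg:[500, 200]
        @para	box: box coordinates, [xmin, xmax, ymin, ymax]
        @return: yolo-format coordinates [x_center, y_center, width, height], normalized to 0-1
    '''
    dw = 1. / (size[0])
    dh = 1. / (size[1])
    # box center in pixels; the -1 presumably compensates for VOC's 1-based pixel coordinates
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    # normalize by the image width and height
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return x, y, w, h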

def transfer(xmlfile:str, txtfile:str, classes:list[str]=['bottle']):
    '''	@func: convert an xml file to a txt file
        @para	xmlfile: path of the xml file
        @para	txtfile: path of the txt file
        @para	classes: list of class names; objects of other classes are skipped
        @return: None
    '''
    # parse the annotation first so the input file is closed right away
    with open(xmlfile, encoding='utf-8') as in_file:
        tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    with open(txtfile, 'w') as out_file:
        for obj in root.iter('object'):
            cls = obj.find('name').text
            if cls not in classes: continue
            num_id = classes.index(cls) # index of this class
            xmlbox = obj.find('bndbox')
            # coordinate conversion
            box = [float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                   float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text)]
            bb = convert((w, h), box)
            out_file.write(str(num_id) + ' ' + " ".join([str(a) for a in bb]) + '\n')


def construct(train_set, val_set, JPEG_Path, Ano_Path, dataset_path='./dataset', classes:list[str]=['bottle']):
    '''	@func: build the VOC dataset into the dataset layout yolo training needs
        @para	train_set: path of the training set txt
        @para	val_set: path of the validation set txt
        @para	JPEG_Path: path of the VOC image folder
        @para	Ano_Path: path of the VOC annotation folder
        @para	dataset_path: output path of the constructed dataset, need not exist, defaults to ./dataset
        @para	classes: list of class names
        @return: None
    '''
    os.makedirs(dataset_path, exist_ok=True)
    img_path = os.path.join(dataset_path, 'images'); os.makedirs(img_path, exist_ok=True)
    img_train_path = os.path.join(img_path, 'train'); os.makedirs(img_train_path, exist_ok=True)
    img_val_path = os.path.join(img_path, 'val'); os.makedirs(img_val_path, exist_ok=True)
    label_path = os.path.join(dataset_path, 'labels'); os.makedirs(label_path, exist_ok=True)
    label_train_path = os.path.join(label_path, 'train'); os.makedirs(label_train_path, exist_ok=True)
    label_val_path = os.path.join(label_path, 'val'); os.makedirs(label_val_path, exist_ok=True)

    with open(train_set, 'r') as f:
        print('Start converting train set...')
        for line in tqdm.tqdm(f.readlines()):
            num = line.strip().split()[1]
            filename = line.strip().split()[0]
            shutil.copy(os.path.join(JPEG_Path, filename + '.jpg'), img_train_path)
            if num == '-1':
                # no target in this image: write an empty label file
                with open(os.path.join(label_train_path, filename + '.txt'), 'w') as empty_label:
                    empty_label.write("")
            else:
                transfer(os.path.join(Ano_Path, filename + '.xml'),
                         os.path.join(label_train_path, filename + '.txt'), classes)

    with open(val_set, 'r') as f:
        print('Start converting val set...')
        for line in tqdm.tqdm(f.readlines()):
            num = line.strip().split()[1]
            filename = line.strip().split()[0]
            shutil.copy(os.path.join(JPEG_Path, filename + '.jpg'), img_val_path)
            if num == '-1':
                with open(os.path.join(label_val_path, filename + '.txt'), 'w') as empty_label:
                    empty_label.write("")
            else:
                transfer(os.path.join(Ano_Path, filename + '.xml'),
                         os.path.join(label_val_path, filename + '.txt'), classes)

    print('Done!')

    # write the yaml file
    with open(os.path.join(dataset_path, 'dataset.yaml'), 'w') as f:
        f.write('path: {}\n'.format(dataset_path))
        f.write('train: images/train\n')
        f.write('val: images/val\n\n')
        f.write('nc: {}\n'.format(len(classes)))
        f.write('names: {}'.format(classes))

if __name__ ==  "__main__":
    dataset = r'C:\Users\Zeoy\Desktop\dataset' # output path of the constructed dataset
    train_set = r'C:\Users\Zeoy\Desktop\VOC2012\ImageSets\Main\bottle_train.txt'
    val_set = r'C:\Users\Zeoy\Desktop\VOC2012\ImageSets\Main\bottle_val.txt'
    JPEG_Path = r'C:\Users\Zeoy\Desktop\VOC2012\JPEGImages'
    Ano_Path = r'C:\Users\Zeoy\Desktop\VOC2012\Annotations'
    classes = ['bottle'] # with many classes, you can read them from a txt file with numpy
    construct(train_set, val_set, JPEG_Path, Ano_Path, dataset, classes)

When using the script, remember to replace dataset, train_set, val_set, JPEG_Path, and Ano_Path in the main section with your own paths.
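After running the conversion, it can be worth checking that every copied image has a label file next to it. The following is only a sketch of such a check: find_unlabeled_images is a hypothetical helper, and the demo builds a throwaway folder with dummy files rather than touching a real dataset.

```python
import os
import tempfile


def find_unlabeled_images(image_dir, label_dir):
    """Return sorted image file names that have no matching .txt label."""
    label_stems = {os.path.splitext(name)[0]
                   for name in os.listdir(label_dir) if name.endswith(".txt")}
    missing = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() in (".jpg", ".jpeg", ".png") and stem not in label_stems:
            missing.append(name)
    return missing


# Demo on a temporary folder: two dummy images, only one has a label
with tempfile.TemporaryDirectory() as root:
    img_dir = os.path.join(root, "images", "train")
    lbl_dir = os.path.join(root, "labels", "train")
    os.makedirs(img_dir)
    os.makedirs(lbl_dir)
    for name in ("a.jpg", "b.jpg"):
        open(os.path.join(img_dir, name), "w").close()
    open(os.path.join(lbl_dir, "a.txt"), "w").close()
    missing = find_unlabeled_images(img_dir, lbl_dir)

print(missing)
```

On a real dataset you would point it at images/train and labels/train (and the val folders); an empty list means every image has a label file.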

One thing to note about this code: when training a model with yolov5, the argument you pass is not the path of the dataset but the yaml file, and the yaml file only gives the paths of the images, not the paths of the label files, as shown in the figure below:
insert image description here
This is because yolov5 looks for labels in a fixed location relative to the images by default, so the folder names must match the following:

├───images
│   ├───train
│   └───val
└───labels
    ├───train
    └───val
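As far as I understand it, yolov5 locates each label file by swapping the last images folder in the image path for labels and changing the extension to .txt. The sketch below imitates that idea in simplified form; it is not the project's own code, and image_to_label_path is a name made up for this example.

```python
import os


def image_to_label_path(image_path):
    """Map .../images/<split>/<name>.jpg to .../labels/<split>/<name>.txt."""
    images_part = f"{os.sep}images{os.sep}"
    labels_part = f"{os.sep}labels{os.sep}"
    # Split on the last 'images' folder only
    head, sep, tail = image_path.rpartition(images_part)
    if not sep:
        raise ValueError("path does not contain an 'images' folder")
    return head + labels_part + os.path.splitext(tail)[0] + ".txt"


img = os.path.join("dataset", "images", "train", "2008_000002.jpg")
label = image_to_label_path(img)
print(label)
```

This is why a folder named anything other than images or labels would leave training without labels even though the yaml file itself looks fine.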

3 Training and training results

3.1 Training

  The previous step produced the dataset and the yaml file. The next step is to train on the dataset to obtain a pt model. This uses the train.py file in the top-level folder of the project; for usage, refer to the comment at the beginning of the file.
insert image description here

Enter directly on the command line:

python train.py --data "C:\Users\Zeoy\Desktop\dataset\dataset.yaml" --weights yolov5s.pt --img 640 --batch-size -1

Here the --data parameter is the path of the yaml file of the newly constructed dataset, and the --batch-size parameter is the number of images read per batch. If it is set to -1, the program chooses a batch size automatically according to the GPU's video memory.

Once you see the following interface, just wait patiently. The default number of epochs is 100. If that does not suit you, add the parameter --epochs xxxx to set it.

insert image description here
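If you prefer to launch training from a Python script, the same command can be assembled as an argument list. This is only a sketch: build_train_command is a hypothetical helper, the yaml path is a placeholder, and only flags already used in this article are included.

```python
import sys


def build_train_command(data_yaml, weights="yolov5s.pt", img=640,
                        batch_size=-1, epochs=None):
    """Assemble the train.py command line as a list of arguments."""
    cmd = [sys.executable, "train.py",
           "--data", data_yaml,
           "--weights", weights,
           "--img", str(img),
           "--batch-size", str(batch_size)]
    if epochs is not None:
        cmd += ["--epochs", str(epochs)]  # only override the default when asked
    return cmd


# Placeholder yaml path; 50 epochs instead of the default
cmd = build_train_command("dataset/dataset.yaml", epochs=50)
print(cmd)
```

The list can then be passed to subprocess.run(cmd, check=True, cwd=...) with cwd set to the yolov5 project folder, which avoids quoting problems with paths that contain spaces.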

3.2 Testing

  After training finishes, the runs/train/exp? folder contains a weights folder holding the trained weight files, which end in .pt. There are usually two, best.pt and last.pt, and the former is generally the one to use. The next step is to use the detect.py file to load the model and process test images.

  The method is similar to running train.py: first read the comment section at the beginning of the file to see how to use it.
insert image description here

As you can see, the input for yolov5 detection can be images, videos, folders containing images or videos, or even cameras and network video streams, which makes it very convenient to use. The model can also be in the formats exported for various deep learning frameworks.

  Copy the path of the pt file, and then enter the following on the command line:

python detect.py --weights 'C:\Users\Zeoy\Desktop\Code\Python\yolov5-master\runs\train\exp19\weights\best.pt' --source 'C:\Users\Zeoy\Desktop\img.png'

The --weights parameter is the path of the pt file just trained, and the --source parameter is the data source to run detection on.
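If detection results are saved as text (detect.py has a --save-txt option for this, as far as I know), each line uses the same normalized layout as the training labels. The sketch below converts such a line back to pixel coordinates. The function name, the sample line, and the image size are made up for illustration.

```python
def yolo_line_to_pixel_box(line, img_w, img_h):
    """Convert 'cls x_center y_center width height' to (cls, pixel box)."""
    parts = line.split()
    cls = int(parts[0])
    # Only the first four values after the class are box coordinates
    xc, yc, w, h = (float(v) for v in parts[1:5])
    xmin = (xc - w / 2) * img_w
    ymin = (yc - h / 2) * img_h
    xmax = (xc + w / 2) * img_w
    ymax = (yc + h / 2) * img_h
    return cls, (round(xmin), round(ymin), round(xmax), round(ymax))


# Made-up line for a 500x400 image
cls, box = yolo_line_to_pixel_box("0 0.4 0.375 0.4 0.5", 500, 400)
print(cls, box)
```

The same function is handy for spot-checking converted training labels against the original xml boxes.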

4 Data labeling

  The above is the whole process of training and testing with yolov5 on the VOC dataset. As mentioned earlier, the only difference between using an open dataset and your own dataset is the labeling step, so here is a brief introduction to labeling a dataset.

  At present, most image labeling tutorials on the Internet use the labelImg tool. It is a third-party Python package and can be installed directly with pip or conda:

pip install labelImg

  Note, however, that this package does not seem to support newer Python versions well: it often crashes at runtime unless the package's internal code is modified. I took the lazy route and installed it under python3.8 (which happened to be on my computer as well), and it ran without problems.

  labelImg is fairly simple to use; just refer to the figure below.

insert image description here

5 Follow-up Tutorials


Origin blog.csdn.net/ZHOU_YONG915/article/details/131136833