Train YOLOv5 on your own object detection dataset (using mask detection as an example)

        To run YOLOv5, you first need to set up a deep learning environment. For a walkthrough, see the tutorial "Installing the PyTorch Deep Learning Environment (GPU Version)".

The YOLOv5 code is open source on GitHub (ultralytics/yolov5). Using it for your own object detection task takes three steps: 1. prepare the dataset; 2. configure the code parameters and train the model; 3. predict. The rest of this article walks through training your own object detection model step by step.

1. Prepare the dataset

1.1 Collect pictures

        We collect pictures according to our own needs; here we take mask recognition as an example. We collected pictures of people with and without masks from the Internet, as shown below:

1.2 Use labelimg software to label the collected pictures

1.2.1 Installation of labelimg software

        labelimg is an open source data labeling tool that can save labels in three formats: ① xml files in PascalVOC format; ② txt files in YOLO format; ③ json files in CreateML format.

        The installation of labelimg is very simple. We open cmd and enter the following command:

pip install labelimg -i https://pypi.tuna.tsinghua.edu.cn/simple

1.2.2 Labeling with labelimg software 

        First, create a folder named VOC2007. Inside it, create a folder named JPEGImages to hold the collected pictures that need labeling, a folder named Annotations to hold the label files, and a txt file named predefined_classes.txt that lists the class names to label. The structure is shown in the figure below:

        What we want to achieve here is to detect whether a mask is worn, so there are only 2 categories in the predefined_classes.txt file, as shown in the following figure:
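        The folder layout described above can also be created with a short script. This is only a minimal sketch: the function name create_labeling_workspace is hypothetical, and the class names unmask and mask are taken from the conversion script in section 1.3.1.

```python
from pathlib import Path


def create_labeling_workspace(root, class_names):
    """Create VOC2007/JPEGImages, VOC2007/Annotations and predefined_classes.txt under root."""
    voc_dir = Path(root) / "VOC2007"
    # Folder for the collected pictures and folder for the label files
    (voc_dir / "JPEGImages").mkdir(parents=True, exist_ok=True)
    (voc_dir / "Annotations").mkdir(parents=True, exist_ok=True)
    # One class name per line, in the order later used for the class ids
    classes_file = voc_dir / "predefined_classes.txt"
    classes_file.write_text("\n".join(class_names) + "\n", encoding="utf-8")
    return voc_dir
```

        Calling create_labeling_workspace(".", ["unmask", "mask"]) in an empty directory would produce the structure used in the rest of this section.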

                                   

        Then, we need to open cmd in the directory of VOC2007 (it must be in this directory), and enter the following command: 

labelimg JPEGImages predefined_classes.txt

         This command means to use the labelimg software to label the pictures in the JPEGImages folder according to the categories in the predefined_classes.txt file.

        The interface that opens is shown in the figure below, where:

        Open Dir selects the folder that holds the pictures; our command already sets it to the JPEGImages folder;

        Change Save Dir selects the folder where the labels are saved; here we use the Annotations folder;

        PascalVOC selects the label format. As mentioned above, there are three formats. We usually choose the PascalVOC xml format; the YOLO format also works, and the two can be converted to each other;

        Create RectBox shows crosshair guide lines for drawing a bounding box on the picture.

        After you draw a box around the target, a label selection dialog appears and you can pick the corresponding label, as shown in the figure below. Then click Next Image to label the next picture, and repeat until all the pictures are labeled.

        The two label formats are shown in the figure below:

        The xml format of PascalVOC:

                                           

        The txt format of YOLO:
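        As a rough illustration of how the two formats relate: a PascalVOC xml file stores each box as pixel coordinates (xmin, ymin, xmax, ymax), while a YOLO txt file stores one line per box with the class id followed by the box center, width and height, all normalized by the image size. The sketch below converts one box; the image size, box coordinates and function name are made up for the example.

```python
def voc_box_to_yolo_line(class_id, img_w, img_h, xmin, ymin, xmax, ymax):
    """Convert a pixel-coordinate VOC box to a normalized YOLO label line."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center} {y_center} {width} {height}"


# Hypothetical example: a 640x480 image with one box of class 1
print(voc_box_to_yolo_line(1, 640, 480, 100, 120, 300, 360))
# prints: 1 0.3125 0.5 0.3125 0.5
```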

                                     

1.3 Converting the label format and splitting the training and validation sets

1.3.1 Convert xml labels to txt format, and split into a training set (80%) and a validation set (20%)

import xml.etree.ElementTree as ET
import os
import random
from shutil import copyfile

classes = ["unmask", "mask"]

TRAIN_RATIO = 80  # percentage of images assigned to the training set


def clear_hidden_files(path):
    dir_list = os.listdir(path)
    for i in dir_list:
        abspath = os.path.join(os.path.abspath(path), i)
        if os.path.isfile(abspath):
            if i.startswith("._"):
                os.remove(abspath)
        else:
            clear_hidden_files(abspath)


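def convert(size, box):
    """Convert a pixel box (xmin, xmax, ymin, ymax) to normalized YOLO (x_center, y_center, w, h).

    size is (image_width, image_height).
    """
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)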


def convert_annotation(image_id):
    """Convert one VOC xml annotation into a YOLO txt label file."""
    tree = ET.parse('VOCdevkit/VOC2007/Annotations/%s.xml' % image_id)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    with open('VOCdevkit/VOC2007/YOLOLabels/%s.txt' % image_id, 'w') as out_file:
        for obj in root.iter('object'):
            difficult = obj.find('difficult').text
            cls = obj.find('name').text
            # Skip unknown classes and objects marked as difficult
            if cls not in classes or int(difficult) == 1:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                 float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


wd = os.getcwd()
data_base_dir = os.path.join(wd, "VOCdevkit/")
work_space_dir = os.path.join(data_base_dir, "VOC2007/")
annotation_dir = os.path.join(work_space_dir, "Annotations/")
image_dir = os.path.join(work_space_dir, "JPEGImages/")
yolo_labels_dir = os.path.join(work_space_dir, "YOLOLabels/")
yolov5_images_dir = os.path.join(data_base_dir, "images/")
yolov5_labels_dir = os.path.join(data_base_dir, "labels/")
yolov5_images_train_dir = os.path.join(yolov5_images_dir, "train/")
yolov5_images_test_dir = os.path.join(yolov5_images_dir, "val/")
yolov5_labels_train_dir = os.path.join(yolov5_labels_dir, "train/")
yolov5_labels_test_dir = os.path.join(yolov5_labels_dir, "val/")

# Create every folder that is missing, then remove hidden "._" files from it
for d in (annotation_dir, image_dir, yolo_labels_dir,
          yolov5_images_dir, yolov5_labels_dir,
          yolov5_images_train_dir, yolov5_images_test_dir,
          yolov5_labels_train_dir, yolov5_labels_test_dir):
    os.makedirs(d, exist_ok=True)
    clear_hidden_files(d)

# Text files that list the image paths of each subset (overwritten on every run)
train_file = open(os.path.join(wd, "yolov5_train.txt"), 'w')
test_file = open(os.path.join(wd, "yolov5_val.txt"), 'w')
list_imgs = os.listdir(image_dir)  # list image files
for i in range(0, len(list_imgs)):
    path = os.path.join(image_dir, list_imgs[i])
    if not os.path.isfile(path):
        continue  # skip sub-folders
    image_path = image_dir + list_imgs[i]
    voc_path = list_imgs[i]
    (nameWithoutExtention, extention) = os.path.splitext(os.path.basename(image_path))
    annotation_name = nameWithoutExtention + '.xml'
    annotation_path = os.path.join(annotation_dir, annotation_name)
    label_name = nameWithoutExtention + '.txt'
    label_path = os.path.join(yolo_labels_dir, label_name)
    if not os.path.exists(annotation_path):
        continue  # skip images without an xml annotation
    prob = random.randint(1, 100)
    print("Probability: %d" % prob)
    convert_annotation(nameWithoutExtention)  # convert label
    if prob <= TRAIN_RATIO:  # train dataset
        train_file.write(image_path + '\n')
        copyfile(image_path, yolov5_images_train_dir + voc_path)
        copyfile(label_path, yolov5_labels_train_dir + label_name)
    else:  # validation dataset
        test_file.write(image_path + '\n')
        copyfile(image_path, yolov5_images_test_dir + voc_path)
        copyfile(label_path, yolov5_labels_test_dir + label_name)
train_file.close()
test_file.close()
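
        The script above assigns each image to the training set at random, so the resulting split is only approximately 80/20 and changes on every run. If an exact, repeatable split is preferred, one option is to shuffle the file names with a fixed seed and cut the list at the ratio. This is a sketch of that alternative, not the method used above; split_train_val and its parameters are hypothetical names.

```python
import random


def split_train_val(names, train_ratio=0.8, seed=0):
    """Shuffle names reproducibly and split them into (train, val) lists."""
    shuffled = sorted(names)  # sort first so the result does not depend on listing order
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


train_names, val_names = split_train_val([f"img{i}.jpg" for i in range(10)])
print(len(train_names), len(val_names))
# prints: 8 2
```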

1.3.2 Convert txt labels to xml format, then split the dataset with the method in 1.3.1

from xml.dom.minidom import Document
import os
import cv2


def makexml(picPath, txtPath, xmlPath):  # picture folder, txt label folder, folder where the xml files are saved
    """Convert YOLO-format txt label files into VOC-format xml label files.

    The picture folder, the txt folder and the xml folder must all exist before running.
    """
    dic = {'0': "unmask",  # dictionary that maps class ids to class names
           '1': "mask",  # must match the classes in your classes.txt file, in the same order
           }
    files = os.listdir(txtPath)
    for i, name in enumerate(files):
        xmlBuilder = Document()
        annotation = xmlBuilder.createElement("annotation")  # create the annotation tag
        xmlBuilder.appendChild(annotation)
        with open(txtPath + name) as txtFile:
            txtList = txtFile.readlines()
        img = cv2.imread(picPath + name[0:-4] + ".jpg")
        Pheight, Pwidth, Pdepth = img.shape

        folder = xmlBuilder.createElement("folder")  # folder tag
        foldercontent = xmlBuilder.createTextNode("driving_annotation_dataset")
        folder.appendChild(foldercontent)
        annotation.appendChild(folder)  # end of folder tag

        filename = xmlBuilder.createElement("filename")  # filename tag
        filenamecontent = xmlBuilder.createTextNode(name[0:-4] + ".jpg")
        filename.appendChild(filenamecontent)
        annotation.appendChild(filename)  # end of filename tag

        size = xmlBuilder.createElement("size")  # size tag
        width = xmlBuilder.createElement("width")  # width, child of size
        widthcontent = xmlBuilder.createTextNode(str(Pwidth))
        width.appendChild(widthcontent)
        size.appendChild(width)  # end of width

        height = xmlBuilder.createElement("height")  # height, child of size
        heightcontent = xmlBuilder.createTextNode(str(Pheight))
        height.appendChild(heightcontent)
        size.appendChild(height)  # end of height

        depth = xmlBuilder.createElement("depth")  # depth, child of size
        depthcontent = xmlBuilder.createTextNode(str(Pdepth))
        depth.appendChild(depthcontent)
        size.appendChild(depth)  # end of depth

        annotation.appendChild(size)  # end of size tag

        for j in txtList:
            # Each line is: class_id x_center y_center width height (normalized)
            oneline = j.strip().split(" ")
            obj = xmlBuilder.createElement("object")  # object tag
            picname = xmlBuilder.createElement("name")  # name tag
            namecontent = xmlBuilder.createTextNode(dic[oneline[0]])
            picname.appendChild(namecontent)
            obj.appendChild(picname)  # end of name tag

            pose = xmlBuilder.createElement("pose")  # pose tag
            posecontent = xmlBuilder.createTextNode("Unspecified")
            pose.appendChild(posecontent)
            obj.appendChild(pose)  # end of pose tag

            truncated = xmlBuilder.createElement("truncated")  # truncated tag
            truncatedContent = xmlBuilder.createTextNode("0")
            truncated.appendChild(truncatedContent)
            obj.appendChild(truncated)  # end of truncated tag

            difficult = xmlBuilder.createElement("difficult")  # difficult tag
            difficultcontent = xmlBuilder.createTextNode("0")
            difficult.appendChild(difficultcontent)
            obj.appendChild(difficult)  # end of difficult tag

            bndbox = xmlBuilder.createElement("bndbox")  # bndbox tag
            xmin = xmlBuilder.createElement("xmin")  # xmin tag
            mathData = int(((float(oneline[1])) * Pwidth + 1) - (float(oneline[3])) * 0.5 * Pwidth)
            xminContent = xmlBuilder.createTextNode(str(mathData))
            xmin.appendChild(xminContent)
            bndbox.appendChild(xmin)  # end of xmin tag

            ymin = xmlBuilder.createElement("ymin")  # ymin tag
            mathData = int(((float(oneline[2])) * Pheight + 1) - (float(oneline[4])) * 0.5 * Pheight)
            yminContent = xmlBuilder.createTextNode(str(mathData))
            ymin.appendChild(yminContent)
            bndbox.appendChild(ymin)  # end of ymin tag

            xmax = xmlBuilder.createElement("xmax")  # xmax tag
            mathData = int(((float(oneline[1])) * Pwidth + 1) + (float(oneline[3])) * 0.5 * Pwidth)
            xmaxContent = xmlBuilder.createTextNode(str(mathData))
            xmax.appendChild(xmaxContent)
            bndbox.appendChild(xmax)  # end of xmax tag

            ymax = xmlBuilder.createElement("ymax")  # ymax tag
            mathData = int(((float(oneline[2])) * Pheight + 1) + (float(oneline[4])) * 0.5 * Pheight)
            ymaxContent = xmlBuilder.createTextNode(str(mathData))
            ymax.appendChild(ymaxContent)
            bndbox.appendChild(ymax)  # end of ymax tag

            obj.appendChild(bndbox)  # end of bndbox tag

            annotation.appendChild(obj)  # end of object tag

        with open(xmlPath + name[0:-4] + ".xml", 'w') as f:
            xmlBuilder.writexml(f, indent='\t', newl='\n', addindent='\t', encoding='utf-8')


if __name__ == "__main__":
    picPath = "VOCdevkit/VOC2007/JPEGImages/"  # picture folder; the trailing / is required
    txtPath = "VOCdevkit/VOC2007/YOLO/"  # txt label folder; the trailing / is required
    xmlPath = "VOCdevkit/VOC2007/Annotations/"  # folder where the xml files are saved; the trailing / is required
    makexml(picPath, txtPath, xmlPath)

        If your labels are in txt format, pay attention to the following points during conversion:

        1. As the last few lines of the code show, the txt labels must be stored in the YOLO folder;

        2. The converted xml labels are saved in the Annotations folder;

        3. If an error involving .shape occurs, check whether the YOLO folder contains a classes.txt file and delete it.
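
        The .shape error in point 3 arises because cv2.imread returns None when no picture matches a file in the txt folder. As an alternative to deleting the file by hand, the label files could be filtered before conversion. The helper below is a hypothetical sketch (list_convertible_labels is not part of the script above) that keeps only the txt files with a matching .jpg picture.

```python
import os


def list_convertible_labels(txt_dir, pic_dir, image_ext=".jpg"):
    """Return the label file names in txt_dir that have a matching image in pic_dir."""
    usable = []
    for name in sorted(os.listdir(txt_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".txt":
            continue  # ignore anything that is not a label file
        if not os.path.isfile(os.path.join(pic_dir, stem + image_ext)):
            continue  # e.g. classes.txt, which has no corresponding picture
        usable.append(name)
    return usable
```

        In makexml, iterating over list_convertible_labels(txtPath, picPath) instead of os.listdir(txtPath) would skip such files.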

        With that, dataset preparation is complete.

2. Configure code parameters and train models

2.1 Download source code

The YOLOv5 code is open source on GitHub (ultralytics/yolov5), and we can download the source code from there. As shown in the figure below, here we selected version v6.0.

        We unzip the downloaded yolov5 code and open it with PyCharm. After opening, the code directory looks as follows:

2.2 Add the dataset

        Put the dataset you prepared into the VOCdevkit folder under the project directory, and use the label conversion and dataset division method introduced in 1.3 to divide the dataset, as shown in the following figure:

                  

2.3 Configure code parameters       

1. Select the configured PyTorch environment in the lower right corner of PyCharm. If the environment has not been installed yet, please refer to the previous article;

2. Install the dependent libraries required by yolov5. Open the command terminal of pycharm and enter the following command, as shown in the figure:

pip install -r requirements.txt

3. Download the pretrained weights file from the pretrained weights download page. Here we use yolov5s.pt; download it and place it in the project directory.

4. Modify the configuration files.

① Copy the VOC.yaml file under the data folder in this directory, name the copy mask.yaml, and modify it as shown in the figure below.

② Copy the yolov5s.yaml file under the models folder in this directory, name the copy yolov5s_mask.yaml, and modify it as shown in the figure below.
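
        As a rough guide to what the modified data file contains, the sketch below generates a minimal mask.yaml. It assumes the v6.0 data file layout (train, val, nc, names) and the folders created by the split script in 1.3.1; the helper name build_data_yaml is hypothetical. In yolov5s_mask.yaml, the value that usually needs changing is nc, which would be set to 2 here.

```python
def build_data_yaml(train_dir, val_dir, class_names):
    """Return the text of a minimal YOLOv5 dataset configuration."""
    names = ", ".join(f"'{n}'" for n in class_names)
    return (
        f"train: {train_dir}\n"  # folder with the training images
        f"val: {val_dir}\n"  # folder with the validation images
        f"nc: {len(class_names)}\n"  # number of classes
        f"names: [{names}]\n"  # class names, in class id order
    )


# Paths assume the VOCdevkit layout produced in section 1.3.1
print(build_data_yaml("VOCdevkit/images/train", "VOCdevkit/images/val", ["unmask", "mask"]))
```

        The order of names should match the classes list used when the labels were converted, since the class id in each label line is an index into it.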

5. Set the parameters in train.py

        As shown in the figure below, line 436 sets the pretrained weights file, lines 437 and 438 set the configuration files (the model yaml and the data yaml), and line 440 sets the number of training epochs (set it according to your needs).
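
        Instead of editing the defaults in train.py, the same settings can be passed on the command line through the --weights, --cfg, --data, --epochs, --batch-size and --workers options. The sketch below only assembles and prints such a command; the helper name and the values (100 epochs, batch size 16, 8 workers) are placeholders, not recommendations.

```python
import shlex


def build_train_command(weights, cfg, data, epochs, batch_size=16, workers=8):
    """Assemble the argument list for launching train.py from the project directory."""
    return [
        "python", "train.py",
        "--weights", weights,  # pretrained weights file
        "--cfg", cfg,  # model configuration yaml
        "--data", data,  # dataset configuration yaml
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--workers", str(workers),
    ]


cmd = build_train_command("yolov5s.pt", "models/yolov5s_mask.yaml", "data/mask.yaml", epochs=100)
print(shlex.join(cmd))
```

        The printed line can be pasted into the PyCharm terminal, or the list can be passed to subprocess.run(cmd, check=True).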

        After the above parameters are configured, you can run train.py to start training. You may run into the following problems:

Problem 1:

This means there is not enough virtual memory. To fix it, edit the datasets.py file under the utils folder and set nw (the num_workers value) on line 117 to 0.

Problem 2:

This indicates that GPU memory has run out. Reducing the batch-size and workers parameters solves it.

        After completing the above configuration, we can train on our own data. Run train.py; output in the Run panel like that shown in the figure below indicates that training has started.

3. Predict

        After the training is completed, the project will have a runs/train/exp folder, which contains the trained weight data and other parameter files, as shown in the following figure: 

        Then open detect.py and modify the following parameters: line 269 sets the trained weights file, and line 270 sets the source to detect, which can be a picture folder, a single picture, or the camera (0 means the camera).
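
        These two settings correspond to the --weights and --source options of detect.py, so they can also be supplied on the command line. The sketch below assembles such a command; the helper name is hypothetical, and the weights path assumes the first training run, which saves best.pt under runs/train/exp/weights.

```python
def build_detect_command(weights, source):
    """Assemble the argument list for launching detect.py from the project directory."""
    # source may be a picture, a folder of pictures, or 0 for the camera
    return ["python", "detect.py", "--weights", weights, "--source", str(source)]


print(" ".join(build_detect_command("runs/train/exp/weights/best.pt", 0)))
# prints: python detect.py --weights runs/train/exp/weights/best.pt --source 0
```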

         After the setting is complete, just run the detect.py file, and the detection results are saved in the runs\detect\exp folder.

         The test results are as follows:

        In this way, we have completed an experimental project that trains YOLOv5 on our own object detection dataset. Are you surprised by the test results? You may also find the process cumbersome or unclear at first. Practice makes perfect: with more practice, you will soon grasp the main points. Thank you, everyone; if you run into other problems, feel free to get in touch.


Origin blog.csdn.net/cxzgood/article/details/124618506