YOLOv5 object detection record

This article only records some notes from the YOLOv5 object detection process; it does not cover CUDA configuration or other environment setup issues. For those, please refer to my other blogs.

1. Run the official YOLOv5 code

1. Download the source code

First of all, you need the open-source code. At present, most YOLOv5 code floating around seems to be modified from the official repository, so we fetch the open-source code from GitHub:

ultralytics/yolov5 at v5.0 (github.com)

On the repository page, select the v5.0 branch and download the full archive, or, if your computer has git, clone it with the following command:

git clone -b v5.0 https://github.com/ultralytics/yolov5.git

The download is only about 1.4 MB, and intuition says that cannot be the whole thing, so let's see what is missing: the weights. Open the .sh script in the weights folder and you will see the download links; you can also click through to the release page and scroll down to find the weight file for the required version. The weights are needed for both training and detection; a weight file can be understood as a model file, i.e. an already-trained model.

In fact, data/voc.yaml describes a common 20-class dataset (Pascal VOC); you can look at its contents in the data folder, and the modifications below are based on this file. Note that the official pretrained weights themselves are trained on the 80-class COCO dataset.
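If the .sh script is inconvenient on your machine, here is a minimal Python sketch that fetches a weight file directly from the v5.0 release assets; the URL pattern is taken from the release page, and yolov5s.pt can be swapped for the m/l/x variant you need:

import urllib.request

# Minimal sketch: fetch the small (s) v5.0 weight file straight from the
# GitHub release assets; swap yolov5s.pt for yolov5m/l/x.pt as needed.
url = "https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5s.pt"
urllib.request.urlretrieve(url, "yolov5s.pt")
print("saved yolov5s.pt")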

2. Folder analysis

Here is a reference: Object detection - teach you to use yolov5 to train your own object detection model

The following is a description of some important folders


data folder:

  • It mainly stores the yaml configuration files, i.e. the hyperparameter and dataset configuration files.
  • Official test pictures are provided; of course, you can also use your own.
  • If you are training your own dataset, you need to modify the yaml files in here. There is actually nothing else needed in this folder: the .sh scripts are only used for downloading, so you can delete them all.

models folder:

  • It mainly contains the configuration files and functions for network construction, including the four different model sizes of the project: s, m, l, and x.
  • Therefore, if the dataset configuration earlier has changed (for example the number of classes), corresponding changes must also be made here.

utils folder:

  • It stores utility functions, including the loss functions, metrics, plotting functions, and so on.

weights folder:

  • Holds the trained weight files. By default it only contains a .sh script with the download links; if your network speed is poor, download the weights in advance so the script does not have to.

The following is an introduction to some important py files

  • detect.py: uses the trained weights to perform detection on images, videos, and cameras.
  • train.py: trains a model on your own dataset.
  • test.py: evaluates the results of training.
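For reference, instead of editing the default parameters inside the scripts (as this article does below), they can also be driven from the command line. The flags below are a sketch based on the v5.0 argparse definitions; check python detect.py --help on your copy:

python detect.py --weights yolov5s.pt --source data/images
python train.py --weights yolov5s.pt --cfg models/yolov5s.yaml --data data/voc.yaml --epochs 100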

3. Start the test

After the official files are downloaded, you can test directly. The tests below cover pictures, videos, and the camera.

1. Picture test

Just modify the preset parameters in detect.py: the source is the jpg file to detect. Then put the weight file we prepared earlier into the same path and write that weight's path into the parameters; the following tests are all based on this weight.

The picture here is an official sample. After running, the results are saved under the runs folder, and the detections are fairly accurate.
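As a quick alternative sanity check (not the workflow used in this article), the model can also be pulled and run through torch.hub, assuming internet access; this is a sketch of the hub API the ultralytics repository exposes:

import torch

# Load the small model through torch.hub and run it on an official sample image.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
results = model('data/images/bus.jpg')
results.print()  # prints detected classes and confidences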

2. Video test

For the video test, change the source from the picture to the corresponding video file. I used a video of driving on the road, and the detection ran fine.

3. Camera real-time test

For real-time detection, change the source to your own camera index. For example, the computer's built-in camera is 0; the external camera I use here is 1.

But after an actual test it did not work right away. I first ran this project last year and downloaded the weight file back then; while writing this article I downloaded the weight file again, and the two weight files reported different errors, described below:

The first problem comes from using the latest weights. The error message says the SPPF module cannot be found: SPPF is newly added, so it has to be added to the source file. The suggestion from other bloggers is to add the missing SPPF class, because this weight is actually a yolov5 v6.0 weight, and yolov5 is updated very quickly. According to the official description, detection speed improves a lot with this module; we cannot take advantage of that here, but let's try adding it:

# Add this class to models/common.py, which already imports torch and
# torch.nn as nn and defines the Conv block that SPPF uses.
import warnings


class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
    def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat([x, y1, y2, self.m(y2)], 1))

But after this modification another error is reported and more things would need to change, so the simple route is not to use the v6.0 weights at all and switch directly to the v5.0 weights, which is the solution below.

The second problem appears with the old weights. An error like the following is reported, but detection can still run; only one frame is displayed and then it freezes. According to explanations online, the cause lies in how the dataset was produced: some samples in the dataset are empty, i.e. have no labels, and that triggers this problem. The solution proposed online is to reprocess the dataset and clean up the annotated images that have no label boxes, which is far too tedious. I think it is easier to deal with it directly: just comment out the offending line of code, and after that real-time camera detection works.

2. Train your own neural network model

1. Data set production

Annotation software: labelImg. The most important thing about this tool is its shortcut keys; learn them, or labeling will make you question life. The rest is straightforward: the software imports pictures, you draw boxes, and it exports an xml file per image containing the box information, such as position and size, for use in subsequent training.
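labelImg can be installed and started from pip (assuming a Python environment is already set up; the package name on PyPI is labelImg):

pip install labelImg
labelImg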

The shortcut keys are as follows:

  • A: switch to the previous picture
  • D: switch to the next picture
  • W: bring up the annotation crosshair
  • Del: delete the selected label box
  • Ctrl+U: select the folder of pictures to annotate
  • Ctrl+R: select the folder where the label files are saved

We often obtain object detection datasets from the Internet with labels in VOC (xml) format, while yolov5 training requires the yolo (txt) format, so the xml label files need to be converted to txt files. At the same time, when training your own yolov5 detection model, the dataset must be split into a training set and a validation set. Below is a script that converts the xml annotations into txt annotations and splits them into training and validation sets by a given ratio. The code comes first, followed by its caveats.
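For orientation, each line of a yolo txt label is: class index, then the box center x and y and the box width and height, all normalized to the 0-1 range by the image size. An illustrative line for class 0 ("hat" in the script below) sitting in the middle of the image:

0 0.5 0.5 0.1 0.1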

2. Divide the dataset

The voc2007 dataset layout is used here; the reference article is:

https://blog.csdn.net/didiaopao/article/details/120022845?spm=1001.2014.3001.5501

First modify the dataset configuration file, which needs the class names and paths filled in, and then modify the training configuration file in the same way; it is all just the necessary information and paths. After that comes the dataset part: create the new folder structure first. The blogger above provides the code that converts our labeled dataset into the voc layout, and it can be used after changing a few parameters; you can see where to modify the label classes and the proportion of the training set. The output is the txt files used by yolo, and after running it, you can see that the files we need for training have been generated.

The source code is as follows; it comes from:

https://blog.csdn.net/didiaopao/article/details/120022845?spm=1001.2014.3001.5501

import xml.etree.ElementTree as ET
import os
import random
from shutil import copyfile

classes = ["hat", "person"]  # label classes of your own dataset
TRAIN_RATIO = 80  # percentage of images assigned to the training set


def clear_hidden_files(path):
    # recursively remove the hidden "._*" files that macOS creates
    dir_list = os.listdir(path)
    for i in dir_list:
        abspath = os.path.join(os.path.abspath(path), i)
        if os.path.isfile(abspath):
            if i.startswith("._"):
                os.remove(abspath)
        else:
            clear_hidden_files(abspath)


def convert(size, box):
    # convert a VOC box (xmin, xmax, ymin, ymax) on an image of size
    # (width, height) into a normalized yolo (x_center, y_center, w, h)
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


def convert_annotation(image_id):
    # parse one VOC xml annotation and write the matching yolo txt label
    in_file = open('VOCdevkit/VOC2007/Annotations/%s.xml' % image_id)
    out_file = open('VOCdevkit/VOC2007/YOLOLabels/%s.txt' % image_id, 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),
             float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    in_file.close()
    out_file.close()


wd = os.getcwd()
data_base_dir = os.path.join(wd, "VOCdevkit/")
if not os.path.isdir(data_base_dir):
    os.mkdir(data_base_dir)
work_space_dir = os.path.join(data_base_dir, "VOC2007/")
if not os.path.isdir(work_space_dir):
    os.mkdir(work_space_dir)
annotation_dir = os.path.join(work_space_dir, "Annotations/")
if not os.path.isdir(annotation_dir):
    os.mkdir(annotation_dir)
clear_hidden_files(annotation_dir)
image_dir = os.path.join(work_space_dir, "JPEGImages/")
if not os.path.isdir(image_dir):
    os.mkdir(image_dir)
clear_hidden_files(image_dir)
yolo_labels_dir = os.path.join(work_space_dir, "YOLOLabels/")
if not os.path.isdir(yolo_labels_dir):
    os.mkdir(yolo_labels_dir)
clear_hidden_files(yolo_labels_dir)
yolov5_images_dir = os.path.join(data_base_dir, "images/")
if not os.path.isdir(yolov5_images_dir):
    os.mkdir(yolov5_images_dir)
clear_hidden_files(yolov5_images_dir)
yolov5_labels_dir = os.path.join(data_base_dir, "labels/")
if not os.path.isdir(yolov5_labels_dir):
    os.mkdir(yolov5_labels_dir)
clear_hidden_files(yolov5_labels_dir)
yolov5_images_train_dir = os.path.join(yolov5_images_dir, "train/")
if not os.path.isdir(yolov5_images_train_dir):
    os.mkdir(yolov5_images_train_dir)
clear_hidden_files(yolov5_images_train_dir)
yolov5_images_test_dir = os.path.join(yolov5_images_dir, "val/")
if not os.path.isdir(yolov5_images_test_dir):
    os.mkdir(yolov5_images_test_dir)
clear_hidden_files(yolov5_images_test_dir)
yolov5_labels_train_dir = os.path.join(yolov5_labels_dir, "train/")
if not os.path.isdir(yolov5_labels_train_dir):
    os.mkdir(yolov5_labels_train_dir)
clear_hidden_files(yolov5_labels_train_dir)
yolov5_labels_test_dir = os.path.join(yolov5_labels_dir, "val/")
if not os.path.isdir(yolov5_labels_test_dir):
    os.mkdir(yolov5_labels_test_dir)
clear_hidden_files(yolov5_labels_test_dir)

train_file = open(os.path.join(wd, "yolov5_train.txt"), 'w')
test_file = open(os.path.join(wd, "yolov5_val.txt"), 'w')
list_imgs = os.listdir(image_dir)  # list image files
for i in range(0, len(list_imgs)):
    path = os.path.join(image_dir, list_imgs[i])
    if not os.path.isfile(path):
        continue
    image_path = image_dir + list_imgs[i]
    voc_path = list_imgs[i]
    (nameWithoutExtention, extention) = os.path.splitext(os.path.basename(image_path))
    annotation_name = nameWithoutExtention + '.xml'
    annotation_path = os.path.join(annotation_dir, annotation_name)
    label_name = nameWithoutExtention + '.txt'
    label_path = os.path.join(yolo_labels_dir, label_name)
    prob = random.randint(1, 100)  # random draw decides the train/val split
    if prob < TRAIN_RATIO:  # train dataset
        if os.path.exists(annotation_path):
            train_file.write(image_path + '\n')
            convert_annotation(nameWithoutExtention)  # convert label
            copyfile(image_path, yolov5_images_train_dir + voc_path)
            copyfile(label_path, yolov5_labels_train_dir + label_name)
    else:  # validation dataset
        if os.path.exists(annotation_path):
            test_file.write(image_path + '\n')
            convert_annotation(nameWithoutExtention)  # convert label
            copyfile(image_path, yolov5_images_test_dir + voc_path)
            copyfile(label_path, yolov5_labels_test_dir + label_name)
train_file.close()
test_file.close()

Run it to convert our marked data into the format we need!
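As a quick sanity check of convert(): for a 640x480 image and a box with xmin=100, xmax=300, ymin=120, ymax=360, the normalized result works out by hand to (0.3125, 0.5, 0.3125, 0.5):

convert((640, 480), (100.0, 300.0, 120.0, 360.0))
# -> (0.3125, 0.5, 0.3125, 0.5), written as the line "0 0.3125 0.5 0.3125 0.5" for class 0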

3. Start training

For training we move to train.py. It is the same routine: modify the parameters first. Some of the more important parameters are listed there, which you can refer to ( modify according to the actual situation ), and then you can run the code. You can use tensorboard to watch the training process in real time by entering the following command (of course, it must be installed in advance)

tensorboard --logdir=runs/train
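If tensorboard is not present in your environment yet, install it first with pip install tensorboard.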

Click the URL it prints to view the results, mainly the mAP, precision, and recall curves.

4. Training parameters

After training completes, a results folder is written out, and you can view the training metrics inside it. Personally, I mainly look at the results plot.

The meaning of these parameters is as follows:

  • Box: the bounding-box regression loss; the smaller, the more accurate the boxes.
  • Objectness: presumably the mean objectness (target presence) loss; the smaller, the more accurate the detection.
  • Classification: presumably the mean classification loss; the smaller, the more accurate the classification.
  • Precision: the accuracy rate (detections that are right / detections found).
  • Recall: the recall rate (detections that are right / objects that should be found).
  • [email protected] & [email protected]:0.95: AP is the area under the curve with Precision and Recall as the two axes, and m means the mean over classes; the number after @ is the IoU threshold for judging a detection as a positive or negative sample. @0.5:0.95 means the thresholds run 0.5:0.05:0.95 and the results are averaged.

Reference here:

https://blog.csdn.net/flyfish1986/article/details/118858068

5. Use the trained model to make predictions

This is really just the detection step from earlier: run detect.py again and only modify the weights path to point at the newly trained model.
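For example, something like the following (a sketch: the exp folder name depends on your run, since yolov5 saves the best checkpoint under runs/train/exp*/weights/best.pt):

python detect.py --weights runs/train/exp/weights/best.pt --source data/images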

3. Obtain the target center coordinates

Generally, if you use yolo for target tracking, you need the center position of each box. This is of course a very traditional method, so let's look at the classic solution. Starting from detect.py, step into the function that draws the boxes (plot_one_box in utils/plots.py, where c1 and c2 are the top-left and bottom-right corners of the box) and add one line to it:

print("中心位置的坐标为"+str((c1[0]+c2[0])/2)+','+str((c1[1]+c2[1])/2))

That's it; run it and the center coordinates are printed for every detected box.
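Incidentally, yolov5 v5.0's detect.py can already produce normalized box centers without this edit: running with the --save-txt flag writes one txt line per detection via xyxy2xywh, in the form class x_center y_center width height, normalized by the image size.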


Origin blog.csdn.net/m0_51220742/article/details/124581990