Training YOLOv5 on a custom dataset for video recognition

Foreword

I originally used this model to locate handwritten signatures, but the competition organizer does not allow that data to be made public, so I used an opening clip of Animal World to build my dataset. While getting the whole model to run I referred to many good blogs; the reason for writing this one is to record what I saw and learned along the way. My model runs on China Mobile's Jiutian platform. The blogs referenced in this article are as follows:
Training your own dataset with YOLOv5 (ultra-detailed full version)
Object detection: teaching you to train your own detection model with yolov5

The purpose of writing this article is to give you some pointers and also to record my own growth.

1. Environment

There are many recognition models, but one of the more practical ones is YOLOv5, which comes from a foreign company (Ultralytics) and is relatively easy to use. Here is the link to the GitHub repository.

Everything is hard at the beginning. I feel that once the environment is set up for a model, the model is 90% of the way there. Fortunately, the libraries YOLOv5 requires are all fairly common and there are no real pitfalls (this article is mainly meant to boost your confidence), so the environment is easy to get right.
Here I still suggest creating a virtual environment first; I did not create one when running on the server. Find a tutorial on creating a virtual environment, then activate your own environment. I have to say, working on a server really is a treat.
The environment we need is already listed in the project's requirements.txt, so we only have to pull the GitHub code to the local machine, and then a single command installs everything:

pip install -r requirements.txt

I ran this on the server as a demonstration:
[screenshot]
If output like the above appears, you are roughly OK.

Next, you can directly run train.py to check whether your environment is properly configured. I think this step is very important: seeing the model run through gave me great confidence.
[screenshot]
If you run it directly, the dataset used is the default coco128. It has to be downloaded from the official source the first time and can then be reused. If epochs start appearing as in the figure above, the model has begun training and the environment is configured correctly; in other words, you no longer have to worry about this model's environment. Congratulations on taking a big step.
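For reference, this smoke test can be as simple as the bare command below; in the YOLOv5 versions I have used, train.py falls back to yolov5s.pt weights and the coco128 dataset by default, which matches the behaviour described above. The explicit flags in the second line are optional and only keep the first run short.

python train.py
python train.py --data coco128.yaml --weights yolov5s.pt --epochs 3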

If it tells you packages are missing, just pip install whatever is missing; generally no further errors will be reported.

After running successfully, you can create your own data set to prepare for training.

2. Make your own dataset

2.1 Convert video to picture

In this section, we need to process the video frame by frame and convert it into pictures, which makes it easy to build the final dataset. Here I use the vedio_to_pictures code; the program is in the appendix below. Its main job is to split the video frame by frame into individual pictures and save them to the allimages folder in the current directory.
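The actual vedio_to_pictures script is linked in the appendix; the following is only a minimal sketch of the same idea, assuming OpenCV is installed and with a placeholder video path:

import os
import cv2  # pip install opencv-python

video_path = "animal.mp4"   # placeholder: path to the recorded video
out_dir = "allimages"       # frames are saved here, as in the original script
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
count = 0
while True:
    ok, frame = cap.read()  # read the video one frame at a time
    if not ok:              # no more frames
        break
    cv2.imwrite(os.path.join(out_dir, "%06d.jpg" % count), frame)
    count += 1
cap.release()
print("saved %d frames to %s" % (count, out_dir))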
The basic information of my video is:
[screenshot]
I used QQ's built-in screen recorder to record a video from Bilibili (or some other website, I cannot remember exactly). The frame rate is 19.58 frames per second and the total duration is 69 seconds, so the number of pictures should be about 19.58 × 69 ≈ 1351. In the end the frame extraction gave me 1357 pictures, as follows, which roughly matches.
[screenshot]

From the pictures you get, you can choose some to build your own dataset. Since I only used 45 pictures in the end, I selected them manually; you could also write a small random-sampling function to pick them automatically.

2.2 Annotate pictures to create a standard dataset

This section labels the 45 pictures obtained above and turns them into the standard dataset format that YOLO can train on.

What a standard dataset looks like

First, look at the YOLO label format: each line is the class number of the object followed by x, y, w, h.
[screenshot]
Take the first line of data as an example:

0 0.21584984358706988 0.5041743970315399 0.11678832116788321 0.10296846011131726

The first 0 is the class id of this type of object, which is numbered automatically when you label the objects. The following four numbers are the center coordinates of the target and the width and height of the target, all normalized by the image width and height.
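To make the normalization concrete, here is a small sketch of how a YOLO label line maps back to pixel coordinates; the image size and label values below are made-up placeholders, not taken from my data:

# one YOLO label line: class_id x_center y_center width height (the last four are normalized to [0, 1])
line = "0 0.5 0.5 0.25 0.4"
img_w, img_h = 1280, 720  # width and height of the corresponding image, in pixels

cls_id, x, y, w, h = line.split()
x, w = float(x) * img_w, float(w) * img_w    # back to pixel units
y, h = float(y) * img_h, float(h) * img_h
xmin, ymin = x - w / 2, y - h / 2            # top-left corner
xmax, ymax = x + w / 2, y + h / 2            # bottom-right corner
print(cls_id, xmin, ymin, xmax, ymax)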

Label your own dataset with labelImg

Any image labeling tool will do. Here I chose labelImg (an installation tutorial is linked above). I label in VOC format (shown on the left side of the picture below) and later convert it to YOLO format with code. The reason I did not label in YOLO format directly is that my labelImg's YOLO output was not standard, probably because the version I installed is too old. Some students label directly in YOLO format; you can also try that.

[screenshot]
There are some tips for labeling: set the category to be labeled in advance, turn on auto-save, and learn a few shortcut keys (w quickly opens a bounding box, d switches to the next image, and so on). Remember to set the save folder. You can search for more details.

After annotation is complete, the VOC format gives you xml files. This is what mine look like:
[screenshot]
Next comes a format conversion.
First, create a new data folder under the YOLOv5 folder. I named it hanzi_data here:

[screenshot]
Then create an images folder (the name cannot be changed) and an Annotations folder under the hanzi_data folder: one stores the pictures you want to train on, i.e. the pictures we labeled, and the other stores our xml files. As follows:
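To summarize, at this point my folder layout looks roughly like this (hanzi_data is just my own name for the folder):

yolov5/
    hanzi_data/
        images/          # the pictures selected for training (the labeled pictures)
        Annotations/     # the xml files exported by labelImg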

[screenshot]

Divide training set, test set and validation set

Next, divide the training set, test set and validation set. This is done by running split_train_val.py (the link is given later); if your folder names are different, modify the code accordingly.
[screenshot]
If the folder names in the code have not been modified, a new ImageSets folder will appear after it finishes. My result is as follows:

[screenshot]
Open the folder and there is a Main folder inside, which contains four txt files: test, train, trainval and val, all holding picture names without file suffixes. There is an uninvited guest in mine; it is a quirk of where I stored my code and normally will not appear, so I simply delete it.

[screenshot]
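The real split_train_val.py is linked in the appendix; the following is only a minimal sketch of the same idea (randomly splitting the xml file names into the four txt files under ImageSets/Main; the 0.9/0.9 ratios are placeholders):

import os
import random

xml_dir = "hanzi_data/Annotations"        # folder with the labelImg xml files
out_dir = "hanzi_data/ImageSets/Main"     # the four txt files go here
os.makedirs(out_dir, exist_ok=True)

names = [f[:-4] for f in os.listdir(xml_dir) if f.endswith(".xml")]  # names without suffix
random.shuffle(names)

n = len(names)
n_trainval = int(n * 0.9)            # 90% trainval, 10% test
n_train = int(n_trainval * 0.9)      # of trainval: 90% train, 10% val
trainval, test = names[:n_trainval], names[n_trainval:]
train, val = trainval[:n_train], trainval[n_train:]

for split_name, split in [("trainval", trainval), ("train", train), ("val", val), ("test", test)]:
    with open(os.path.join(out_dir, split_name + ".txt"), "w") as f:
        f.write("\n".join(split) + "\n")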

XML format to yolo_txt format

Here, run text_to_yolo.py, after which you get the dataSet_path folder and the labels folder, as shown below:

[screenshot]
The three txt files in the dataSet_path folder store the paths of your training set, validation set and test set respectively.

[screenshot]
At this point, the txt files under the labels folder are in the standard YOLO format, as shown below:

[screenshot]
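The real text_to_yolo.py is linked in the appendix; as a sketch, the core of the conversion (reading each VOC xml box and normalizing it into a YOLO line) looks roughly like this. The class list and the file names are placeholders; use your own:

import xml.etree.ElementTree as ET

classes = ["animal"]   # placeholder: your own class names, in label order

def convert_annotation(xml_path, txt_path):
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    img_w, img_h = float(size.find("width").text), float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in classes:
            continue
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO line: class id, then normalized center x, center y, width, height
        x = (xmin + xmax) / 2 / img_w
        y = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append("%d %f %f %f %f" % (classes.index(name), x, y, w, h))
    with open(txt_path, "w") as f:
        f.write("\n".join(lines) + "\n")

convert_annotation("hanzi_data/Annotations/000001.xml", "hanzi_data/labels/000001.txt")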
At this point, the data set is created and ready for training.

Create your own configuration file

Create a new myvoc.yaml file (you can choose the name yourself) under the data folder in the yolov5 directory, and open it with a text editor such as Notepad.
The content is: the paths of the training set and the validation set (train.txt and val.txt, which can be relative paths), the number of classes, and the class names of the targets.
[screenshot]
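For reference, a minimal myvoc.yaml looks roughly like the following; the paths and the class name below are placeholders for your own dataset:

train: hanzi_data/dataSet_path/train.txt   # txt listing the training images
val: hanzi_data/dataSet_path/val.txt       # txt listing the validation images
nc: 1                                      # number of classes
names: ['animal']                          # class names, in label order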

3. Model training

Change model configuration

Select a model. The models folder under the yolov5 directory contains the model configuration files; there are n, s, m, l and x versions, which are increasingly complex, have more and more weights, and take longer and longer to train.
Here I choose yolov5s and then make the changes, as follows:

[screenshot]
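The screenshot shows the edited configuration; as far as I can tell, the usual (and often only) change needed in models/yolov5s.yaml is the number of classes near the top of the file, roughly:

# models/yolov5s.yaml
nc: 1  # number of classes: change from the default 80 to the number of classes in your own dataset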

start training

python train.py --weights weights/yolov5s.pt  --cfg models/yolov5s.yaml  --data data/myvoc.yaml --epoch 200 --batch-size 8

--weights: the path to your own weights; check where your yolov5s.pt is located, you may need to change it.
--cfg: the path to the model configuration, i.e. the one changed in the previous step.
--data: the path to the configuration file you created when making your own dataset.
--epoch: the number of training epochs.
--batch-size: the number of photos fed into one training step; if your machine is not powerful, make it smaller.

training process

During training, the location where the results are stored is printed; generally they go into the latest exp folder under runs/train/.

[screenshot]
My trained model is stored under runs/train/exp22.
[screenshot]
In addition, the exp folder of your training run also contains plots of the training process:
[screenshot]
Here are some of the other plots of the training process:
R_curve: the relationship between recall and confidence.
PR_curve: P stands for precision and R for recall; the curve shows the relationship between the two. Recall is usually plotted on the horizontal axis and precision on the vertical axis. The area under the PR curve is the AP, and the mean of the AP values over all classes is the mAP. AP is an important metric for the performance of an object detector: the larger the AP, the better the detector, and the smaller the AP, the worse.
P_curve: the relationship between precision and confidence.
F1_curve: the F1 score, also known as the balanced F score, defined as the harmonic mean of precision and recall.
confusion_matrix: the confusion matrix.
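For reference, the standard definitions behind these curves are:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 * Precision * Recall / (Precision + Recall)
AP = area under the precision-recall curve (per class); mAP = mean of the AP values over all classes

where TP, FP and FN are the numbers of true positives, false positives and false negatives at a given confidence threshold.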

model detection

The command for running detection is as follows:

python detect.py --weights runs/train/exp/weights/best.pt --source ../data/video/animal.mp4

--weights: where the trained weights are stored
--source: what you want to detect; pictures, folders, videos and cameras are all supported
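For example (the weight and input paths below are placeholders):

python detect.py --weights runs/train/exp/weights/best.pt --source data/images/   # a folder of pictures
python detect.py --weights runs/train/exp/weights/best.pt --source test.jpg       # a single picture
python detect.py --weights runs/train/exp/weights/best.pt --source 0              # webcam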

Here I am testing Animal.mp4 under test_data

[screenshot]
It can be seen that when running on a video it processes the frames one by one, which reflects how fast YOLOv5 is.
Finally, it will be saved in runs/detect/exp, as shown below
[screenshot]
My final result is as follows:

[video: YOLO demo]

The model part basically ends here. Don't worry if you run into problems: the machine will not break, and you can always just try a few more times.

4. Related questions

training cache

A data cache is also generated during training, under your hanzi_data/dataSet_path folder. If you train again later you may need to delete it (in my case it was fine even without deleting it).
[screenshot]

training time

My dataset is 45 pictures; using a GPU, 200 epochs took about 25 minutes to train.

[screenshot]

code appendix

vedio_to_pictures
split_train_val.py
text_to_yolo.py

Origin: blog.csdn.net/weixin_53665577/article/details/129648364