Foreword
I originally used this model to locate handwritten signatures, but the competition organizer does not allow that data to be made public, so I built my own dataset from an opening video of Animal World. While running the whole model I referred to many good blogs; I am writing this post to record what I saw and learned. My model runs on the Mobile Nine-Day platform. The blogs referenced in this article are as follows:
YOLOv5 trains your own data set (ultra-detailed full version)
target detection - teach you to use yolov5 to train your own target detection model
The purpose of writing this article is to offer you some pointers and also to record my own growth.
1. Environment
There are many detection models, but one of the more practical ones is YOLOv5, developed by Ultralytics and fairly easy to use. Here is the link to its GitHub repository.
Everything is difficult at the beginning. In my experience, once the environment is set up for a model, the model is 90% of the way to success. Fortunately, the libraries YOLOv5 requires are all fairly popular, and there are no real pitfalls (this article is mainly meant to build your confidence), so I consider the environment easy to match.
Here I still suggest that you create a virtual environment first; I didn't create one when running on the server. Follow a tutorial on creating a virtual environment, then activate your own environment. I have to say, working on the server is really great.
The environment we need is already listed in requirements.txt in the YOLOv5 project, so we only need to clone the GitHub code locally, and then a single command installs everything:
pip install -r requirements.txt
I'm running on the server to demonstrate:
If output like the above appears, you are roughly OK.
Next, you can run train.py directly to check whether your environment is configured properly. I think this step is very important; seeing the model run end to end gave me great confidence.
If you run it directly, the dataset used is the default coco128. It needs to be downloaded from the official site the first time, after which it can be reused. If the epoch output appears as in the figure above, the model has started training and the environment is configured correctly. In other words, you no longer have to worry about this model's environment. Congratulations on taking a big step.
If it complains about missing packages, just pip install whatever is missing; generally no further errors will be reported.
After running successfully, you can create your own data set to prepare for training.
2. Make your own dataset
2.1 Convert video to pictures
In this section, we split the video frame by frame into pictures, which is convenient for building the final dataset. Here I use vedio_to_pictures; the program will be placed in the appendix below. Its main job is to extract the video frame by frame and save each frame as a picture in the allimages folder in the current directory.
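As a reference, here is a minimal sketch of what such a frame-extraction script can look like, using OpenCV. This is not the appendix code itself; the function names and the output folder default are my own placeholders:

```python
import os


def frame_filename(index: int) -> str:
    """Zero-padded file name for the i-th extracted frame."""
    return f"frame_{index:05d}.jpg"


def video_to_pictures(video_path: str, out_dir: str = "allimages") -> int:
    """Read a video frame by frame and save every frame as a JPEG.

    Returns the number of frames written.
    """
    import cv2  # OpenCV; install with `pip install opencv-python`

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video (or read error)
            break
        cv2.imwrite(os.path.join(out_dir, frame_filename(count)), frame)
        count += 1
    cap.release()
    return count
```

Usage would be a single call such as `video_to_pictures("animal.mp4")`.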
The basic information of my video is:
I used QQ's built-in screen recorder to record a video from bilibili (or some other website, I can't remember exactly). The frame rate is 19.58 frames per second and the total duration is 69 seconds, so the expected number of pictures is about 19.58 × 69 ≈ 1351. In the end the extraction produced 1357 pictures, as follows, which roughly matches.
You can select some of the extracted pictures to build your own dataset. I ended up using 45 pictures, which I selected manually; you could also write a small random-selection function to pick them automatically.
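If you prefer an automatic selection, a minimal sketch using only the standard library (the folder names and function name are my own assumptions):

```python
import os
import random
import shutil


def pick_random_pictures(src_dir: str, dst_dir: str, k: int, seed: int = 0) -> list:
    """Copy k randomly chosen .jpg pictures from src_dir to dst_dir.

    A fixed seed makes the selection reproducible. Returns the chosen names.
    """
    rng = random.Random(seed)
    names = sorted(n for n in os.listdir(src_dir) if n.endswith(".jpg"))
    chosen = rng.sample(names, k)
    os.makedirs(dst_dir, exist_ok=True)
    for name in chosen:
        shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return chosen
```

For example, `pick_random_pictures("allimages", "selected", 45)` would pick 45 frames.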
2.2 Annotate pictures to create a standard dataset
In this section we label the 45 pictures obtained above and convert them into the standard dataset format that YOLO can train on.
What a standard dataset looks like
First, look at the YOLO label format: each line is the class number of the object, followed by x, y, w, h.
Take the first piece of data as an example, for example:
0 0.21584984358706988 0.5041743970315399 0.11678832116788321 0.10296846011131726
The leading 0 is the class index of this type of object, assigned automatically once you label it. The four numbers that follow are the center coordinates of the target and its width and height, where all values are normalized to the image size.
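The normalization is easy to verify by hand. A minimal sketch of the conversion from a pixel-coordinate box (xmin, ymin, xmax, ymax) to a YOLO label line (the function name is my own):

```python
def box_to_yolo(cls_id: int, xmin: float, ymin: float, xmax: float, ymax: float,
                img_w: float, img_h: float) -> str:
    """Convert a pixel bounding box to the normalized YOLO 'cls x y w h' line."""
    x = (xmin + xmax) / 2.0 / img_w   # normalized center x
    y = (ymin + ymax) / 2.0 / img_h   # normalized center y
    w = (xmax - xmin) / img_w         # normalized width
    h = (ymax - ymin) / img_h         # normalized height
    return f"{cls_id} {x} {y} {w} {h}"
```

For a 200x100 image with a box from (0, 0) to (100, 50), this yields `0 0.25 0.25 0.5 0.5`.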
Label your own dataset with labelImg
Any image-annotation software will do. Here I choose labelImg (the installation tutorial is linked above). I label in VOC format (shown on the left of the picture below) and later use code to convert the VOC format into the YOLO format. The reason I did not label directly in YOLO format is that my labelImg's YOLO output was not standard, possibly because the version I installed is too old. Some students label directly in YOLO format; you can also try that.
There are some tips for labeling: set the category to be marked in advance, enable auto-save, and learn the shortcut keys (W quickly opens a bounding box, D switches to the next image, etc.). Remember to set the save folder. You can search for more details.
After the annotation is completed, VOC gives you XML files. What I do next is a format conversion.
First, create a new data folder under the yolov5 folder; I named mine hanzi_data:
Then create an images folder (the name cannot be changed) and an Annotations folder under the hanzi_data folder: one stores the pictures you want to train on, that is, the pictures we labeled, and the other stores our XML files. As follows:
Divide training set, test set and validation set
Next, divide the data into training, test and validation sets by running split_train_val.py; the link will be given later. If your folder names differ, modify the code accordingly.
If you did not modify the folders, a new ImageSets folder will appear after the run finishes. My results are as follows:
Open the folder and inside is a Main folder containing four txt files: test, train, trainval and val, each listing picture names without suffixes. I have an uninvited guest here due to where I stored my code; normally this will not happen, and I just delete it.
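The split logic itself is simple. Here is a minimal sketch of the idea behind such a split script; the ratios and the function name are my own assumptions, not necessarily those of split_train_val.py:

```python
import random


def split_names(names, trainval_ratio=0.9, train_ratio=0.9, seed=0):
    """Shuffle picture names and split them into train/val/test lists.

    trainval_ratio: fraction of all pictures used for train+val;
    train_ratio:    fraction of trainval used for train (the rest is val);
    everything outside trainval becomes the test set.
    """
    rng = random.Random(seed)
    names = list(names)
    rng.shuffle(names)
    n_trainval = int(len(names) * trainval_ratio)
    n_train = int(n_trainval * train_ratio)
    train = names[:n_train]
    val = names[n_train:n_trainval]
    test = names[n_trainval:]
    return train, val, test
```

Each returned list would then be written to the corresponding txt file (train.txt, val.txt, test.txt).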
XML format to yolo_txt format
Here, run text_to_yolo.py, and you will get the dataSet_path folder and the labels folder, as shown below:
The three txt files in the dataSet_path folder store the paths of your training set, validation set and test set respectively.
At this point, the txt data under the labels folder is also in the standard YOLO format, as shown below:
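For reference, here is a minimal sketch of the core of such a VOC-to-YOLO conversion; text_to_yolo.py itself is linked later, and the function name and class list here are my own assumptions:

```python
import xml.etree.ElementTree as ET


def voc_xml_to_yolo(xml_text: str, classes: list) -> list:
    """Parse one VOC-style XML annotation and return YOLO label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in classes:
            continue                      # skip classes we do not train on
        cls_id = classes.index(name)
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        x = (xmin + xmax) / 2.0 / img_w   # normalized center
        y = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w         # normalized size
        h = (ymax - ymin) / img_h
        lines.append(f"{cls_id} {x} {y} {w} {h}")
    return lines
```

Running this over every XML file in Annotations and writing the lines to same-named txt files produces the labels folder.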
At this point, the data set is created and ready for training.
Create your own configuration file
Create a new myvoc.yaml file (you can customize the name) under the data folder in the yolov5 directory, and open it with Notepad.
The content is: the paths of the training and validation sets (train.txt and val.txt; these can be relative paths), the number of classes, and the class names of the targets.
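For example, with a single class, myvoc.yaml can look like the sketch below. The paths follow the hanzi_data layout from earlier, and the class name "animal" is just a placeholder; adjust both to your own setup:

```yaml
# myvoc.yaml -- dataset configuration for YOLOv5
train: hanzi_data/dataSet_path/train.txt   # list of training image paths
val: hanzi_data/dataSet_path/val.txt       # list of validation image paths

nc: 1                                      # number of classes
names: ["animal"]                          # class names, in index order
```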
3. Model training
Change model configuration
Select a model. The models folder under the yolov5 directory contains the model configuration files; there are n, s, m, l and x versions, with increasing model complexity, increasing weight size, and increasing training time.
Here I choose yolov5s, and then make changes, as follows:
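The only field that normally needs changing in models/yolov5s.yaml is the class count. For my single-class dataset the top of the file becomes something like the following (the two multiplier values are the defaults shipped with yolov5s and are left untouched):

```yaml
# models/yolov5s.yaml (top of the file)
nc: 1                  # number of classes -- change this to match your dataset
depth_multiple: 0.33   # model depth multiple (yolov5s default)
width_multiple: 0.50   # layer channel multiple (yolov5s default)
```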
start training
python train.py --weights weights/yolov5s.pt --cfg models/yolov5s.yaml --data data/myvoc.yaml --epoch 200 --batch-size 8
--weights: the path of the pretrained weights; check where your yolov5s.pt is located, you may need to change it.
--cfg: the path of the model configuration, i.e. the one changed in the previous step.
--data: the path of the configuration file you created when making your own dataset.
--epoch: the number of training rounds.
--batch-size: the number of pictures fed in per training step; if your computer's configuration is weak, make it smaller.
training process
During training, the location where the results are stored will be printed; generally they are stored in the latest exp folder under runs/train/.
My trained model is stored under runs/train/exp22.
In addition, there are pictures of your training process under the trained exp:
There are also some other pictures of the training process:
R_curve: the relationship between recall and confidence.
PR_curve: P stands for Precision and R stands for Recall; the curve shows the relationship between the two. Conventionally, recall is plotted on the abscissa and precision on the ordinate. The area under the PR curve is the AP, and the mean AP over all classes is the mAP. AP is an important metric for evaluating an object detection model: the larger the AP, the better the classifier performs, and the smaller the AP, the worse.
P_curve is a graph of the relationship between Precision and confidence.
F1_curve: the F1 score, also known as the balanced F score, is defined as the harmonic mean of precision and recall.
confusion_matrix refers to the confusion matrix.
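The quantities behind these curves are easy to compute by hand. A small sketch, assuming we have already counted true positives, false positives and false negatives from the detections:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall and their harmonic mean (F1) from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # how many detections are correct
    recall = tp / (tp + fn) if tp + fn else 0.0      # how many targets were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, with 8 true positives, 2 false positives and 8 missed targets, precision is 0.8 and recall is 0.5.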
model testing
The command for model testing (inference) is as follows:
python detect.py --weights runs/train/exp/weights/best.pt --source ../data/video/animal.mp4
--weights where the weights are stored
--source what you want to detect; it supports pictures, folders, videos and cameras.
Here I am testing Animal.mp4 under test_data
You can see that when running on a video it processes the images frame by frame, which shows off its speed.
Finally, it will be saved in runs/detect/exp, as shown below
My final result is as follows:
YOLO demo
The model part basically ends here. Don't worry if you run into problems; the machine will not break, and you can always try a few more times.
4. Related questions
training cache
A data cache is also generated during training, under your hanzi_data/dataSet_path folder; if you need to train again later, you may need to delete it. (It was fine for me even without deleting it.)
training time
My data is 45 pictures; using a GPU, 200 epochs took about 25 minutes of training.