YOLOv5 learning (5) -- train.py program file and yolov5s model file explained

This post walks through the train.py file and the yolov5s.yaml file. The YOLOv5 code has been updated since, but the overall structure is largely the same.

1. Usage

This section is a usage note left by the code's author.
The first line says that the dataset passed in is coco128, the weights are the yolov5s model, and --img sets the image size to 640. The main difference between the two lines is that the first trains starting from the loaded yolov5s weights, while the second builds the model from the yolov5s network configuration and then trains it from scratch.
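To make the difference between the two usage lines concrete, here is a minimal sketch. The two argument lists are written from my recollection of the usage comment in train.py, so treat their exact form as an assumption. parse_usage is a hypothetical stand-in with only four flags, not the real option parser.

```python
import argparse

# Assumption: these mirror the two commands in the train.py usage note.
FINETUNE = ["--data", "coco128.yaml", "--weights", "yolov5s.pt", "--img", "640"]
SCRATCH = ["--data", "coco128.yaml", "--weights", "", "--cfg", "yolov5s.yaml", "--img", "640"]


def parse_usage(argv):
    """Hypothetical stand-in for the real option parser (four flags only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, default="coco128.yaml")
    parser.add_argument("--weights", type=str, default="yolov5s.pt")
    parser.add_argument("--cfg", type=str, default="")
    parser.add_argument("--img", type=int, default=640)
    return parser.parse_args(argv)


def describe(opt):
    """Say which training mode a parsed option set corresponds to."""
    if opt.weights:
        return "fine-tune from pretrained weights"
    return "build from cfg and train from scratch"


for argv in (FINETUNE, SCRATCH):
    print(describe(parse_usage(argv)))
```

The only thing that changes between the two modes is whether --weights is empty and --cfg is set.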

2. Importing packages and libraries

Above are the package and library imports. After them, train.py defines three parameters that are mainly used for distributed training. For beginners like us, they normally keep their default values.
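As a rough illustration, the three parameters can be read from environment variables like this. The names LOCAL_RANK, RANK and WORLD_SIZE and their defaults are what I recall from train.py, so take them as an assumption. read_ddp_env is a hypothetical helper written only for this post.

```python
import os


def read_ddp_env(env=None):
    """Read the three distributed-training variables with single-process defaults."""
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", -1))  # -1 means "not distributed"
    rank = int(env.get("RANK", -1))
    world_size = int(env.get("WORLD_SIZE", 1))   # one process by default
    return local_rank, rank, world_size


# With none of the variables set (a plain single-GPU or CPU run):
print(read_ddp_env({}))
# As a launcher might set them for the second of two processes:
print(read_ddp_env({"LOCAL_RANK": "1", "RANK": "1", "WORLD_SIZE": "2"}))
```

Passing a plain dict instead of os.environ keeps the example independent of whatever happens to be set on your machine.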

3. Parsing parameters

The command-line arguments are passed in and then parsed. This stage mainly consists of four parts.

3.1 Verification function


3.2 main function

The main function has four parts. The first runs some checks on the code and environment. The second branches on whether resume was passed on the command line. The third determines whether the DDP training method is in use, and the fourth starts the actual training.
checks
First, the rank variable decides whether the next three lines of code run. As mentioned at the beginning, this value defaults to -1 when distributed training is not used, so the three lines are executed. The first line prints the parameters used in the file, both those passed on the command line and the defaults. The second checks whether the YOLOv5 GitHub repository has been updated and prints a notice if it has. The third checks whether the packages listed in requirements are installed correctly and prints a message if they are not.
Resume
First, it checks whether the resume parameter was passed on the command line. The resume parameter restores a previous training run that was interrupted. Because we are training from the pretrained model yolov5s.pt, we do not need to pass it, so the code in the else branch runs.
In the else branch, the code first checks the paths of several files, including data, cfg, weights and project. We do not use cfg here, so it is passed in as empty. It then checks whether cfg and weights are both empty and reports an error if they are.
Next, it checks whether evolve was passed in order to decide which folder the results are saved under; we did not pass evolve here. After that, the name of the save directory is set.
DDP mode
This part selects whether to run on the CPU or the GPU. If distributed training is used, some additional setup is performed below it. We generally do not use distributed training here, so none of that runs.
train
In the training part, if evolve is passed, the hyperparameter evolution code runs instead. Because we did not pass evolve and are not doing distributed training, the train function is called directly. In practice, then, the train function is the only part we need to focus on. Evolve is a method provided by the author for evolving (tuning) hyperparameters. Under normal circumstances, the default parameters plus some manual adjustment are good enough.
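The four parts of the main function described above can be summarized in a small control-flow sketch. It only collects step names and calls nothing from YOLOv5. main_steps and the step strings are hypothetical, and treating rank 0 like rank -1 in the first check is my assumption about how the main process is identified.

```python
from types import SimpleNamespace


def main_steps(opt, rank=-1):
    """Return, in order, the steps the main function would take (names only)."""
    steps = []

    # Part 1: checks run on the main process (assumption: rank -1 or 0).
    if rank in (-1, 0):
        steps += ["print arguments", "check git status", "check requirements"]

    # Part 2: resume an interrupted run, or validate the given paths.
    if opt.resume:
        steps.append("restore options from the last checkpoint")
    else:
        if not (opt.cfg or opt.weights):
            raise ValueError("either --cfg or --weights must be specified")
        steps.append("check data, cfg, weights and project paths")

    # Part 3: pick CPU or GPU; extra setup only for distributed training.
    steps.append("select device")
    if rank != -1:
        steps.append("set up distributed training")

    # Part 4: normal training unless evolve was requested.
    steps.append("evolve hyperparameters" if opt.evolve else "train")
    return steps


opt = SimpleNamespace(resume=False, cfg="", weights="yolov5s.pt", evolve=False)
print(main_steps(opt))
```

With the options used in this post (pretrained weights, no resume, no evolve, no distributed training), the last step is simply "train".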

3.3 train function

Next is the most important part, the train function. It first receives its arguments, then defines the paths where the trained weight files will be saved, and then loads the hyperparameters needed during training and prints them out.
Save run settings
The run settings are saved next. The hyperparameters are written to the hyp.yaml file in your training directory, and the options used for this run are written to the opt.yaml file.
Loggers
The Loggers part handles visualization during training, built on two libraries, wandb and TensorBoard. It is also where the training log of the program is recorded.
Config
In the Config part, the first line decides, from the boolean plots flag, whether to plot the training process and results, and the second checks whether the machine supports CUDA. The third line makes our training reproducible by fixing the random seeds, and the fourth is related to distributed training, so it does nothing when distributed training is not used. The fifth line checks and reads the dataset, the sixth takes out the dataset's training and validation paths, the seventh takes out the class names, and the eighth checks that the number of class names matches the number of classes, raising an error if they differ. The last line checks whether this is the COCO dataset and, if so, enables some additional operations. It is not in our case, so the result is False.
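The dataset-related lines of the Config part can be sketched as follows. read_dataset_config and the example dictionary are hypothetical, and recognising COCO by the suffix of its validation path is my assumption about how that last check works.

```python
def read_dataset_config(data_dict):
    """Pull out paths and class info, failing early on a names/nc mismatch."""
    train_path, val_path = data_dict["train"], data_dict["val"]
    nc = int(data_dict["nc"])          # number of classes
    names = data_dict["names"]         # class names
    if len(names) != nc:
        raise ValueError(f"{len(names)} names found for nc={nc}")
    # Assumption: COCO is recognised by its validation path suffix.
    is_coco = isinstance(val_path, str) and val_path.endswith("coco/val2017.txt")
    return train_path, val_path, nc, names, is_coco


# Hypothetical two-class dataset description.
example = {
    "train": "images/train",
    "val": "images/val",
    "nc": 2,
    "names": ["cat", "dog"],
}
print(read_dataset_config(example))
```

If names had three entries while nc stayed at 2, the function would raise instead of letting training start with a mislabelled dataset.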
Model
Next is the model loading part. It first checks whether the weights argument that was passed in ends with .pt. If the file does not exist locally, it tries to download the weight file from the official YOLOv5 repository and then loads it. The weight file carries the yolov5s.yaml configuration with it, and the code builds the model to be trained according to yolov5s.yaml. The point of this block is that yolov5s.pt is our pretrained model, and our new model is built on top of yolov5s for our own recognition and detection needs.
Freeze
Freeze controls layer freezing and depends on the argument we pass. By default nothing is frozen. If we pass 10 in opt, the backbone is frozen, which means only the head is trained. With the Freeze code you can manually control which layers you want to freeze.
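Here is a minimal sketch of how such a freeze argument can be turned into a list of frozen parameters. It works on parameter names only, so no PyTorch is needed. frozen_parameter_names and the example names are hypothetical, and the "model.<index>." naming pattern is an assumption about how the layers are named.

```python
def frozen_parameter_names(param_names, freeze):
    """Return the parameter names that would be frozen.

    freeze is either an int N (freeze layers 0..N-1) or a list of layer indices.
    """
    layers = range(freeze) if isinstance(freeze, int) else freeze
    # The trailing dot keeps "model.1." from matching "model.10...".
    prefixes = tuple(f"model.{i}." for i in layers)
    return [name for name in param_names if name.startswith(prefixes)]


# Hypothetical parameter names; layers 0-9 stand for the backbone.
names = [
    "model.0.conv.weight",
    "model.9.cv1.conv.weight",
    "model.10.conv.weight",
    "model.24.m.0.weight",
]
print(frozen_parameter_names(names, 10))
```

In a real training loop, the parameters selected this way would have requires_grad set to False so that the optimizer leaves them untouched.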
Image size
This part of the code checks whether the input image size is a multiple of 32. If it is not, the size is automatically rounded up to a multiple of 32 for you.
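The rounding itself is simple arithmetic. Below is a minimal sketch, assuming the size is always rounded up. The function names follow the ones I recall from the repository, but the bodies here are simplified stand-ins.

```python
import math


def make_divisible(x, divisor):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def check_img_size(imgsz, stride=32):
    """Return an image size that is a multiple of stride, warning if it changed."""
    new_size = make_divisible(imgsz, stride)
    if new_size != imgsz:
        print(f"WARNING: --img {imgsz} must be a multiple of {stride}, using {new_size}")
    return new_size


print(check_img_size(640))  # already a multiple of 32, returned unchanged
print(check_img_size(650))  # rounded up to the next multiple of 32
```

For example, 650 / 32 is 20.3125, which rounds up to 21, and 21 * 32 = 672.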
Batchsize
The Batchsize part normally does not run unless we manually pass -1, which asks the code to estimate a suitable batch size automatically. The default batch size is 16.
Optimizer
Scheduler/EMA
Next come the methods that create the deep learning optimizer, the learning rate decay schedule, and the exponential moving average (EMA) of the model weights.
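To show what a decay schedule and an EMA actually compute, here is a small sketch in plain Python. The linear and cosine formulas are the ones I believe train.py uses, but treat them as an assumption. The parameter name lrf (final learning rate fraction) and the fixed decay value are illustrative.

```python
import math


def linear_lr_factor(epoch, epochs, lrf):
    """Factor applied to the initial LR: 1.0 at epoch 0, lrf at epoch == epochs."""
    return (1 - epoch / epochs) * (1.0 - lrf) + lrf


def cosine_lr_factor(epoch, epochs, lrf):
    """Cosine-shaped alternative with the same start and end values."""
    return ((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1) + 1


def ema_update(ema_value, new_value, decay=0.9999):
    """One exponential-moving-average step for a single weight value."""
    return decay * ema_value + (1 - decay) * new_value


for epoch in (0, 50, 100):
    print(epoch, linear_lr_factor(epoch, 100, 0.01), cosine_lr_factor(epoch, 100, 0.01))
```

Both schedules start at the full learning rate and end at lrf times that rate; they differ only in the shape of the curve in between. The EMA keeps a slowly moving copy of the weights, which is usually the copy that gets evaluated and saved.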
DP mode
The # DP mode block checks whether multiple graphics cards are in use, and the # SyncBatchNorm block is related to distributed training. After them, the training data and then the validation data are loaded.
start training
end training
Next, the training itself starts. In this project, "compute_loss=ComputeLoss(model)" defines the loss function. When training is over, the best weights (best.pt) are selected and evaluated on the validation set, and the results are printed out.

4. yolov5s.yaml

Parameters
nc is the number of classes that yolov5s can predict, 80 here. Anchors are rectangular boxes defined in advance and used as references for detection. The three rows of anchors correspond to three feature levels, and each row defines three anchors. depth_multiple is the depth multiple: the number value of a layer in the backbone, multiplied by depth_multiple, gives the actual number of repeats. width_multiple is the channel multiple: the channel value in args, multiplied by width_multiple, gives the actual number of channels. Comparing yolov5s, 5l, 5m, 5n and 5x shows that only depth_multiple and width_multiple differ between their .yaml files, so these two parameters can be understood simply as controls for the scale of the network.
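A short worked example makes the two multiples easier to picture. The values 0.33 and 0.50 are what I recall for yolov5s, so treat them as an assumption. The rounding rules (at least one repeat, channels rounded up to a multiple of 8) are likewise my understanding of how the model parser applies them.

```python
import math


def scale_depth(number, depth_multiple):
    """Actual repeat count for a layer whose config says `number` repeats."""
    return max(round(number * depth_multiple), 1) if number > 1 else number


def scale_width(channels, width_multiple, divisor=8):
    """Actual channel count, rounded up to a multiple of divisor."""
    return math.ceil(channels * width_multiple / divisor) * divisor


# Assumed yolov5s multiples.
DEPTH_MULTIPLE, WIDTH_MULTIPLE = 0.33, 0.50

for number in (1, 3, 6, 9):
    print("number", number, "->", scale_depth(number, DEPTH_MULTIPLE))
for channels in (64, 128, 1024):
    print("channels", channels, "->", scale_width(channels, WIDTH_MULTIPLE))
```

So a layer written as 9 repeats with 1024 channels in the config becomes 3 repeats with 512 channels under these multiples, while with both multiples at 1.0 the values would be used as written.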
backbone
backbone is the backbone part of the YOLOv5 model structure. Each row describes the structure of one layer. The different layer types, such as Conv, C3 and Concat, are defined in the common.py file. args holds the arguments for the layer, and which arguments are expected depends on the module named before it.
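To make the row layout concrete, here is a small sketch that unpacks a few rows. The rows are written from my recollection of a yolov5s-style backbone with the layout [from, number, module, args], so treat both the values and the layout as an assumption. Module names are plain strings so that nothing from YOLOv5 has to be imported.

```python
# Assumed rows in the layout [from, number, module, args].
BACKBONE_ROWS = [
    [-1, 1, "Conv", [64, 6, 2, 2]],
    [-1, 1, "Conv", [128, 3, 2]],
    [-1, 3, "C3", [128]],
]


def describe_row(index, row):
    """Turn one config row into a readable one-line description."""
    source, number, module, args = row
    return f"layer {index}: {module} x{number}, input from {source}, args {args}"


for index, row in enumerate(BACKBONE_ROWS):
    print(describe_row(index, row))
```

A from value of -1 means the layer takes its input from the layer directly before it.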
head
head is the head part of YOLOv5. The YOLOv5 configuration has no separate neck section; the author includes the neck-like structure in the head.
That's all for this time. Let's learn from each other and make progress together~

Origin blog.csdn.net/qq_43499961/article/details/127589071