Use YOLOv7 to train your own dataset


reference tutorial

preliminary work

Label with labelImg

  • Select the folder containing the images, and the folder where the annotation files will be saved.
  • Use the shortcut keys; most of them are listed in the menu bar:
    W: create a box, then press and hold the left mouse button and drag (or right-click to generate a rectangular frame),
    A/D: previous/next image,
    Space: mark the current image as verified,
    Ctrl+mouse wheel: zoom the image in and out,
    Ctrl+F: scale the image back to its original size,
    Ctrl+E: edit the label of the selected bounding box,
    Ctrl+S: save.
    You can also change the color of the bounding boxes in the menu bar,
    and the option on the left toolbar switches the saved label format: VOC mode saves xml, YOLO mode saves txt, and there is also a json mode (a small sketch relating the xml and txt formats follows this list).
  • In the View menu, enable auto save mode, so that after labeling one image you can move straight to the next one (press D) and the annotation is saved automatically.
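For reference, the two annotation formats describe the same box differently: the VOC xml keeps absolute pixel corners (xmin, ymin, xmax, ymax), while a YOLO txt line is "class x_center y_center width height" with all values normalized to the image size. This repo expects the VOC xml files, so if you labeled in YOLO mode you would have to convert. A minimal sketch of the relationship (the class list and file path below are hypothetical examples, not part of the repo):

# Minimal sketch: read a labelImg VOC xml and print the equivalent YOLO txt lines.
# The class list and file path are hypothetical examples.
import xml.etree.ElementTree as ET

classes = ["cat", "dog"]  # same order as in your model_data txt file

def voc_xml_to_yolo_lines(xml_path):
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = classes.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO txt: class x_center y_center width height, normalized to [0, 1]
        x_c, y_c = (xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h
        w, h = (xmax - xmin) / img_w, (ymax - ymin) / img_h
        lines.append(f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines

print(voc_xml_to_yolo_lines("VOCdevkit/VOC2007/Annotations/000001.xml"))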

reference

Put the files in the corresponding locations

  • Put the images in the yolov7-pytorch-master\VOCdevkit\VOC2007\JPEGImages folder;
  • Put the label files (xml format) generated by labelImg in the yolov7-pytorch-master\VOCdevkit\VOC2007\Annotations folder;
  • Add a txt file under yolov7-pytorch-master\model_data listing the categories to predict, one class name per line;
  • In the voc_annotation.py file, set classes_path to the path of the txt file above, adjust trainval_percent and train_percent according to the size of the dataset, and then run voc_annotation.py to generate the train/val split files (a sketch of the relevant settings follows this list).
    At this point the preparation work is complete. Note that some of the other parameters in voc_annotation.py may also need changing; each parameter is documented with a comment, so adjust them as the situation requires.
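A minimal sketch of the two items above, i.e. the class list file and the settings in voc_annotation.py. The variable names follow the yolov7-pytorch code this tutorial is based on, so verify them against your own copy; all paths and values here are examples only:

# model_data/my_classes.txt (hypothetical two-class example, one name per line):
#     cat
#     dog

# Relevant settings near the top of voc_annotation.py (example values only):
classes_path     = 'model_data/my_classes.txt'   # the txt file created above
trainval_percent = 0.9    # (train + val) share of the whole dataset
train_percent    = 0.9    # train share within trainval
VOCdevkit_path   = 'VOCdevkit'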

train

Preparation

Mount Google Drive:

from google.colab import drive
drive.mount('/content/drive')

Show the current working directory:

!pwd

Change into the project directory:

%cd /content/drive/MyDrive/yolov7-pytorch-master


Also change the runtime type so that training uses a GPU (Runtime → Change runtime type → Hardware accelerator → GPU).

Modify several parameters in the train.py file, mainly:

  • model_path: the weight file to start from (the pretrained weights);
  • classes_path: the txt file listing the categories to predict;
  • epochs and batch size: set these according to the size of the training set; a common rule of thumb is total steps > 50000, where total steps = dataset_size / batch_size × epochs.
  • input_shape: the size of the input image.
  • Usually only these few need changing; a sketch of the relevant lines follows this list.
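A hedged sketch of those lines in train.py. The variable names follow the yolov7-pytorch code used here but may differ slightly in your copy; all values are examples only:

# Example values only; adjust to your own dataset and hardware.
classes_path = 'model_data/my_classes.txt'      # categories to predict
model_path   = 'model_data/yolov7_weights.pth'  # pretrained weights to start from
input_shape  = [640, 640]                       # input image size

# Aim for total steps > 50000:
#   total_steps = dataset_size / batch_size * epochs
#   e.g. 2000 images / 8 per batch * 300 epochs = 75000 steps
UnFreeze_Epoch      = 300
Unfreeze_batch_size = 8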

prevent disconnections

Reportedly, the frequency of disconnections can be reduced by clicking the Connect button automatically.
Press F12 in Google Colab to open the browser console, paste the following code, and press Enter:

// Click Colab's Connect button once a minute so the session is not considered idle.
function ConnectButton(){
	console.log("Connect pushed");
	document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click();
}
setInterval(ConnectButton, 60000);  // run every 60 seconds

start training

Run the train.py file.

It ran for over 3 hours.

predict

Modify several parameters in the yolo.py file, mainly the following (a sketch follows this list):

  • model_path: the weights used for prediction, i.e. the weight file we trained ourselves under the logs folder;
  • classes_path: the txt file listing the categories of the dataset;
  • input_shape: the size of the input image, consistent with the one in train.py;
  • letterbox_image: pad the image with gray bars so that it can be resized without distorting its content;
  • nms_iou: the IoU threshold for NMS (non-maximum suppression), which keeps only the highest-scoring box among overlapping boxes of the same category in a region;
    the smaller this value, the stricter the filtering and the fewer boxes remain.
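A hedged sketch of the corresponding entries in yolo.py (in the yolov7-pytorch code these typically sit in a defaults dictionary; the names and values below are examples, so check your own copy, and the weight file name depends on what was saved in your logs folder):

_defaults = {
    "model_path"      : 'logs/best_epoch_weights.pth',  # weights we trained ourselves
    "classes_path"    : 'model_data/my_classes.txt',    # same class list as in training
    "input_shape"     : [640, 640],                     # keep consistent with train.py
    "letterbox_image" : True,   # pad with gray bars instead of distorting the image
    "nms_iou"         : 0.3,    # smaller value -> stricter NMS, fewer boxes kept
}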
Run the predict.py file and enter the image path when prompted.

View Results.

If you want to predict all the images under a folder:
Modify the mode parameter in the predict.py file; by default it traverses the img folder and saves the results to the img_out folder (see the sketch below).
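A sketch of that change in predict.py (parameter names follow the yolov7-pytorch code; verify them in your copy):

mode            = "dir_predict"   # folder mode instead of single-image "predict"
dir_origin_path = "img/"          # folder that is traversed
dir_save_path   = "img_out/"      # detection results are written here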

Continue YOLOv7 training from a checkpoint

Modify two parameters in train.py (a sketch follows this list):

  1. model_path: point it to the weight file (checkpoint) saved under the logs folder;

  2. init_epoch: set it to the epoch at which the previous run stopped, so training resumes from there.
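A hypothetical example of the two changes (the checkpoint file name depends on what was actually saved in your logs folder):

# Hypothetical example; the checkpoint name depends on your logs folder, and
# the epoch parameter may be spelled Init_Epoch in your copy of train.py.
model_path = 'logs/ep100-loss3.412-val_loss3.877.pth'
Init_Epoch = 100   # training continues from epoch 100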

Use colab to debug

At the first line of the script, enter:

import pdb; pdb.set_trace() # debug mode

b / break: set a breakpoint
c / continue: run until the next breakpoint (if there is no breakpoint, run to the end)
l / list: show the code currently being executed
s / step: step into a function
r / return: run until the current function returns
q / exit: abort and exit debug mode
n / next: execute the next line
pp: print the value of a variable
help: show help

Enter these commands at the (Pdb) prompt.

Reference: [Colab] Debug method - PDB

What needs to be modified after changing the dataset

1. Put the images, the labels, and the ImageSets\Main files generated by voc_annotation.py in the corresponding locations under the VOCdevkit\VOC2007 folder.
2. Modify the category file, the txt file usually stored in the model_data folder.
3. Remember to adjust the parameters of the freeze and unfreeze phases in train.py.
4. Check whether the other parameters in train.py need to be modified; for the rest, refer to the preparation work from the previous training.

problems in training

Problem:
Training stopped automatically within about 3 minutes of starting, several times in a row, and a ^C symbol appeared in the output:

Epoch 1/450:  23% 6/26 [00:58<01:30,  4.51s/it, loss=10.3, lr=0.000125]^C

Reason:
This usually indicates that an error or exception occurred during training and forced the program to stop. The ^C symbol means the process was interrupted or forcibly terminated.

There are many possible reasons, such as insufficient computer resources, bugs or exceptions in the code, problems with the dataset, etc. Here are some possible workarounds:

  1. Check your code for errors or exceptions. You can view the program's output or log files for exceptions or error messages, or use debugging tools to locate problems.

  2. Check that computer resources are sufficient. Training models requires a lot of computing resources, including CPU, memory, graphics card, etc. Insufficient computer resources may cause the program to run slowly or crash. You can try to reduce the batch size, reduce the model size, use more efficient algorithms, etc. to reduce the consumption of computing resources.

  3. Check the dataset for problems. There may be problems such as missing values, outliers, duplicate values, etc. in the data set, which may cause abnormalities in the program. Datasets can be preprocessed, such as filling missing values, removing outliers, etc.

  4. Try using a more stable training method. For example, use smaller learning rates, smaller batch sizes, use regularization, etc. to improve the stability of the model.

In conclusion, this can happen for many reasons; examine the program and the dataset carefully to find the problem, and use a more stable training configuration to avoid crashing the program.

Solution:
I had set input_shape in train.py too large, which likely exhausted the available computing resources. It was [1120, 1120]; changing it to [800, 800] fixed the problem.
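The change, for reference (a minimal sketch; these repos generally expect input_shape values that are multiples of 32):

# in train.py
# input_shape = [1120, 1120]   # too large for the Colab runtime in this case
input_shape = [800, 800]       # smaller input fits the available memory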


Origin blog.csdn.net/ThreeS_tones/article/details/130044034