Running the YOLOv5 object detection project and common errors

Preface

In the last issue, we introduced an object detection model and set up the required environment. This issue records how to run the project and the common errors you may hit along the way. After all, errors are part and parcel of deep learning environment setup, so it is worth learning how to analyze and resolve them.

1. Preparation

Please refer to my previous blog for the source code and environment construction of this issue:

Pytorch builds yolov5 object detection environment configuration_yutu-7's blog-CSDN blog: https://blog.csdn.net/m0_73414212/article/details/129770438

For software configuration (I use VSCode and Anaconda), please refer to:

vscode and Anaconda installation and related environment configuration_vscode configuration anaconda environment_yutu-7's blog-CSDN blog: https://blog.csdn.net/m0_73414212/article/details/129704221

The project training data set (voc data set used) can be obtained through network disk resources:

Link: Baidu Cloud Disk, extraction code: j5ge

The weights required for training can also be downloaded from Baidu Cloud Disk. Link: Baidu Cloud Disk, extraction code: 3mjs

After downloading the resources above, first open the voc dataset, i.e. the VOC07+12+test folder, and navigate down to the VOC2007 folder. My path here is F:\VOC07+12+test\VOCdevkit\VOC2007:

Copy the three folders there, then go to the yolov5-pytorch-main\yolov5-pytorch-main\VOCdevkit\VOC2007 path in the source code (I obtained the source by downloading the zip archive; if you used git clone, the path is yolov5-pytorch\VOCdevkit\VOC2007). You will find three folders with the same names there; paste directly to replace the originals:

 

Then copy the pretrained weight file and paste it into the yolov5-pytorch-main\yolov5-pytorch-main\model_data path:

We use yolov5_s for training here. If you want better accuracy, you can choose yolov5_x, but training will take correspondingly longer and consume more GPU memory.

2. Running process

2.1 Open the project file

First, open the root directory of the project folder in VSCode. This is important: if you open a directory other than the root, the source files will still appear, but at run time you will get errors about modules not being found. If you obtained the source by downloading the zip archive, the root directory is the second yolov5-pytorch-main directory under the yolov5-pytorch-main\yolov5-pytorch-main path:

If you used git clone, the first yolov5-pytorch folder is the root directory.

2.2 Run the voc_annotation.py file

Open the voc_annotation.py file and set annotation_mode to 2; this generates the label files for the training and validation sets. When the value is 0, the whole dataset is split and the label files are generated; when it is 1, only the dataset split is performed. Since our dataset has already been split in advance, we can generate the label files directly:

 

The dataset split can be viewed under VOCdevkit\VOC2007\ImageSets\Main in the root directory:

 

Finally, the 2007_train.txt and 2007_val.txt files will be generated in the root directory:
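The split logic behind annotation_mode 0 and 1 can be sketched as a two-level random split. This is a minimal, self-contained illustration; the function name and ratios are illustrative, not the exact code of voc_annotation.py:

```python
import random

def split_dataset(image_ids, trainval_ratio=0.9, train_ratio=0.9, seed=0):
    """Illustrative two-level split: first trainval vs. test,
    then train vs. val -- the same idea voc_annotation.py applies."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    n_trainval = int(len(ids) * trainval_ratio)
    trainval, test = ids[:n_trainval], ids[n_trainval:]
    n_train = int(len(trainval) * train_ratio)
    train, val = trainval[:n_train], trainval[n_train:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 81 9 10
```

The resulting ID lists correspond to the train.txt, val.txt and test.txt files under ImageSets\Main; mode 2 skips this step and only writes the label files from an existing split.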

 

2.3 Run the train.py file

Then run the train.py file. If you are using the official dataset as I am, nothing needs to be modified; just run it. The errors that can occur during the run, and their solutions, are covered later:

During training, the weights obtained along the way are saved to the logs folder in the root directory. When training reaches 100%, the final weight file can be used to detect objects in images:

 

In addition, you can switch to Task Manager while training, click Performance, and set one of the GPU graphs to CUDA to see GPU utilization during training:

2.4 Run the predict.py file

Before running it, go back to the yolo.py file and modify the model_path and classes_path entries. model_data/yolov5_s.pth, the value of model_path, is the weight file used for prediction; it can be replaced by the weights from your last training run, for example logs/ep096-loss0.075-val_loss0.054.pth:

classes_path points to model_data/coco_classes.txt, the list of classes to recognize; this coco_classes.txt contains 20 classes:

Which file to use depends on the targets you want to recognize. For example, if you only want to recognize people and cars, create a clss.txt file in the model_data folder, enter person and car (one per line), save it, and use model_data/clss.txt as classes_path instead of coco_classes.txt:

Of course, if the classes you want happen to match the 20 categories above and you do not want to train, you can use the provided weight file without any changes; none of the steps above are needed, and the final prediction results will still be quite good.
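Creating and reading such a classes file is straightforward. A small sketch (the clss.txt name comes from the example above; the write path here is illustrative, and classes_path is then read back as one class per line, roughly like this):

```python
# Write the custom classes file, one class name per line
classes = ["person", "car"]
with open("clss.txt", "w") as f:
    f.write("\n".join(classes) + "\n")

# Reading classes_path back, one class per line, skipping blanks
with open("clss.txt") as f:
    loaded = [line.strip() for line in f if line.strip()]
print(loaded)  # ['person', 'car']
```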

After that, you can return to the predict.py file to make predictions. After running the program:

Configurations:
----------------------------------------------------------------------
|                     keys |                                   values|
----------------------------------------------------------------------
|               model_path |                  model_data/yolov5_s.pth|
|             classes_path |              model_data/coco_classes.txt|
|             anchors_path |              model_data/yolo_anchors.txt|
|             anchors_mask |        [[6, 7, 8], [3, 4, 5], [0, 1, 2]]|
|              input_shape |                               [640, 640]|
|                 backbone |                               cspdarknet|
|                      phi |                                        s|
|               confidence |                                      0.5|
|                  nms_iou |                                      0.3|
|          letterbox_image |                                     True|
|                     cuda |                                     True|
----------------------------------------------------------------------
Input image filename:

Enter img\street.jpg to recognize the street scene image:

Input image filename:img\street.jpg
b'person 0.90' 550 69 940 279
b'person 0.88' 506 913 994 1144
b'person 0.87' 519 508 854 671
b'person 0.80' 540 438 858 545
b'person 0.78' 566 384 697 424
b'person 0.53' 580 208 693 262
b'person 0.51' 560 331 690 372
b'bicycle 0.90' 714 781 1029 1250
b'car 0.86' 585 657 771 964
b'car 0.81' 610 1 678 49
b'car 0.69' 544 584 718 797
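Each printed line pairs a class label and confidence with four box coordinates. If you want to post-process this console output, a line can be parsed like this (a sketch assuming exactly the format shown above; parse_detection is a hypothetical helper, not part of the project):

```python
def parse_detection(line):
    """Hypothetical helper: parse a console line of the form
    b'<class> <score>' c1 c2 c3 c4 into structured values."""
    label = line.split("'")[1]            # e.g. "person 0.90"
    coords = line.split("'")[2].split()   # the four box coordinates
    name, score = label.rsplit(" ", 1)
    return name, float(score), tuple(int(c) for c in coords)

print(parse_detection("b'person 0.90' 550 69 940 279"))
# ('person', 0.9, (550, 69, 940, 279))
```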

If the recognition result does not meet your expectations, for example you do not want heavily overlapping targets detected and recognition boxes appearing everywhere, you can make non-maximum suppression stricter by lowering the NMS IoU threshold: any box whose IoU with a higher-scoring box exceeds the threshold is discarded, so a smaller value suppresses more boxes. In the code, go back to the yolo.py file and modify the value of nms_iou. The original value was 0.3; change it to 0.01 and try again:

You will find that many boxes with even a small overlap are now suppressed.
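The effect of nms_iou can be seen in a minimal single-class NMS sketch (a simplified illustration, not the project's actual implementation):

```python
def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, iou_thres):
    """Keep the highest-scoring box, drop every box whose IoU with it
    exceeds iou_thres, then repeat on the remainder."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thres]
    return keep

boxes = [[0, 0, 10, 10], [8, 0, 18, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.3))   # [0, 1, 2] -- the pair with IoU ~0.11 survives
print(nms(boxes, scores, 0.01))  # [0, 2]   -- even a small overlap is suppressed
```

With the threshold at 0.01, almost any overlap between two boxes is enough for the lower-scoring one to be discarded, which matches the behavior observed above.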

3. Common errors and solutions during operation

3.1 Compute capability not supported

UserWarning: 
NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at PyTorch

This error occurs because the CUDA version your PyTorch build was compiled for is too old. RTX 30-series cards (compute capability sm_86) require CUDA 11.0 or above.

3.2 No module problem

ModuleNotFoundError: No module named 'tensorboard'

If you configured the environment from the requirements file, this module is indeed missing. Simply install the package in the torch environment; no version needs to be specified:

(torch) C:\Users\Hasee>pip install tensorboard

If other module errors occur, such as:

No module named 'utils.utils' or No module named 'matplotlib'

the cause is that the wrong root directory was opened, as explained above; reread that section and the problem will be clear.

3.3 No attribute problem

AttributeError: 'numpy.ndarray' object has no attribute 'split'
TypeError: cat() got an unexpected keyword argument 'axis'
AttributeError: 'Tensor' object has no attribute 'bool'

These errors usually mean some packages were not installed at the versions pinned in the requirements file. Reinstall them with the specified versions.

3.4 RuntimeError

RuntimeError: CUDA out of memory.

The cause is that GPU memory runs out at that moment; it does not necessarily mean your card's memory is too small overall.

You can try reducing batch_size; keep it a multiple of 2, with a minimum of 2.
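The "shrink the batch until it fits" idea can be sketched without any GPU. Here run_with_backoff and fake_step are hypothetical names used only for this illustration; the train.py in this project expects you to lower batch_size by hand instead:

```python
def run_with_backoff(step, batch, min_batch=2):
    """Retry a training step, halving the batch whenever it raises
    a CUDA out-of-memory RuntimeError, down to min_batch samples."""
    while True:
        try:
            return step(batch)
        except RuntimeError as e:
            if "out of memory" not in str(e) or len(batch) <= min_batch:
                raise
            batch = batch[: len(batch) // 2]  # halve and retry

# Simulated step: pretend only 4 samples fit in GPU memory at once
def fake_step(batch):
    if len(batch) > 4:
        raise RuntimeError("CUDA out of memory.")
    return len(batch)

print(run_with_backoff(fake_step, list(range(16))))  # 4
```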

If that still does not solve it, you can increase the virtual memory. The steps are:

Click Advanced system settings:

Click Advanced, then click Settings in Performance:

Then click Advanced and click Change in Virtual Memory:

If your environment is set up on the D drive, increase the D drive's virtual memory and make it as large as possible, regardless of how much physical memory your machine actually has. Here I set it to 50 GB:

If you are prompted after setting that you need to restart your computer for it to take effect, just restart it.

In addition, if you are playing games or running other projects while this project runs, you can check GPU memory usage with the nvidia-smi command:

If the corresponding value is too large, you can terminate some processes to free GPU memory. For example, to kill the process with PID 836, enter:

taskkill -PID 836 -F

 After pressing Enter:

(base) C:\Users\Hasee>taskkill -PID 836 -F
SUCCESS: The process with PID 836 has been terminated.

 Enter nvidia-smi again:

(base) C:\Users\Hasee>nvidia-smi

 After pressing Enter:

GPU memory usage dropped by 40%. There is usually no need to kill such small processes, but when your GPU memory is nearly full, terminating a few large ones with these commands has an immediate effect.

3.5 cuDNN error

cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

The cause here is a mismatch between the PyTorch version and the CUDA version. Go to the PyTorch official website and install the torch build that matches your CUDA version.

 

 

 


Origin blog.csdn.net/m0_73414212/article/details/129779475