Common mistakes
python train.py --img 640 --batch 16 --epochs 10 --data ./data/custom_data.yaml --cfg ./models/custom_yolov5.yaml --weights ./weights/yolov5s.pt
1. In-place operation on a view of a leaf variable
Reason for error
This error occurs when an in-place operation is applied to a view of a leaf variable that requires gradients. In PyTorch, a view of a tensor that requires gradients also requires gradients. An in-place operation modifies a tensor's values directly, which can cause the gradient information autograd relies on to be lost or become inconsistent, so PyTorch rejects the operation.
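A minimal reproduction, assuming a stand-alone tensor in place of a real model parameter (the 255-element bias reshaped to (3, 85) is only illustrative, and reproduce_leaf_view_error is a hypothetical helper):

```python
import torch


def reproduce_leaf_view_error() -> str:
    """Trigger the in-place-on-leaf-view error and return its message ("" if none)."""
    # A leaf tensor that requires gradients, standing in for a Conv2d bias.
    bias = torch.zeros(255, requires_grad=True)
    # view() shares storage with `bias`, and the view requires grad as well.
    b = bias.view(3, -1)  # shape (3, 85)
    try:
        b[:, 4] += 1.0  # in-place add on a view of the leaf: autograd refuses this
    except RuntimeError as err:
        return str(err)
    return ""


print(reproduce_leaf_view_error())
```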
Solution
The simplest, if crude, fix is to open yolo.py in the models folder, add with torch.no_grad(): at line 149, and indent the in-place operation at that line underneath it so that it runs with gradient tracking disabled.
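A sketch of the same idea outside YOLOv5, assuming the in-place operation is a bias initialization that adds a prior to a view of a detection-head convolution's bias (the layer sizes, stride, and image size below are assumptions, not values taken from yolo.py):

```python
import math

import torch

# Hypothetical detection-head conv: 3 anchors x 85 outputs = 255 channels.
conv = torch.nn.Conv2d(128, 255, kernel_size=1)
before = conv.bias.detach().clone()

stride = 8.0    # assumed feature-map stride
img_size = 640  # assumed training image size

# Inside no_grad, autograd is not recording, so in-place edits
# through a view of the leaf parameter are allowed.
with torch.no_grad():
    b = conv.bias.view(3, -1)                          # (3, 85) view of the bias
    b[:, 4] += math.log(8 / (img_size / stride) ** 2)  # add an objectness prior

print(conv.bias.is_leaf, conv.bias.requires_grad)  # True True
```

The parameter stays a trainable leaf after the edit; only the recording of the initialization step is skipped.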
2. Insufficient GPU memory
Solution:
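A common general remedy for out-of-memory errors is to lower --batch (or --img) in the training command. As a hypothetical sketch (run_with_batch_fallback and fake_train are illustrative helpers, not part of YOLOv5), the retry-with-a-smaller-batch idea looks like this:

```python
import torch


def run_with_batch_fallback(train_fn, batch_size: int, min_batch: int = 1):
    """Call train_fn(batch_size), halving batch_size whenever it runs out of memory."""
    while batch_size >= min_batch:
        try:
            return train_fn(batch_size)
        except RuntimeError as err:
            # CUDA out-of-memory errors are RuntimeErrors whose text mentions "out of memory".
            if "out of memory" not in str(err):
                raise
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # release cached blocks before retrying
            batch_size //= 2
    raise RuntimeError("out of memory even at the minimum batch size")


def fake_train(batch_size: int) -> int:
    # Stand-in for a real training run: pretend anything above 4 images runs out of memory.
    if batch_size > 4:
        raise RuntimeError("CUDA out of memory (simulated)")
    return batch_size


used = run_with_batch_fallback(fake_train, batch_size=16)
print(used)  # 4
```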
3. Tensors on different devices
Both the model and the data were moved to the GPU, and printing them confirmed that they were on the GPU. GPU memory was indeed being used at runtime, yet this error was still reported.
Solution:
I happened to come across the fix in the post below, and it solved the problem easily. Thank you CCTV, thank you CSDN, and thank you to the expert who wrote it.
Source:
yolov5_obb error collection, while-L's blog, CSDN
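One possible cause of this symptom, assumed here rather than taken from the linked post, is a tensor created inside the loss or target-building code without a device argument: it defaults to the CPU even though the model and data are on the GPU. A sketch of the pattern and its fix (make_gain is a hypothetical helper):

```python
import torch


def make_gain(targets: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: build an auxiliary tensor on the same device as `targets`."""
    # torch.ones(7) alone would land on the CPU; device= keeps it next to the data.
    return torch.ones(7, device=targets.device)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
targets = torch.rand(4, 7, device=device)  # illustrative target rows
gain = make_gain(targets)
scaled = targets * gain  # both operands share a device, so no device-mismatch error
print(gain.device == targets.device)  # True
```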
4. Converting a CUDA tensor to NumPy
can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Solution:
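The error message itself names the fix: copy the tensor to host memory with .cpu() before calling .numpy(). A minimal sketch (to_numpy is a hypothetical helper; which .numpy() call triggers the error in a given code base is not identified here):

```python
import numpy as np
import torch


def to_numpy(t: torch.Tensor) -> np.ndarray:
    """Convert a tensor on any device to a NumPy array."""
    # detach() drops autograd history (numpy() refuses tensors that require grad);
    # cpu() copies a CUDA tensor to host memory and is a no-op for CPU tensors.
    return t.detach().cpu().numpy()


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.arange(6, dtype=torch.float32, device=device).reshape(2, 3)
arr = to_numpy(x)
print(arr.shape)  # (2, 3)
```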
5. Nice! Testing the trained model
python detect.py --source ./inference/test/02.jpg --weights ./weights/helmet_head_person_s.pt