YOLOv5 produces NaN values when training a model

I ran into a problem today. When I trained a YOLOv5 model with the yolov5s.pt weights on the GPU, everything ran normally. However, when I switched to the larger yolov5m.pt weights for training, all of my values turned into NaN. I then tried running yolov5m.pt on the CPU, and the values and the model went back to normal. Finally, I switched to the smaller yolov5n.pt weights (running on the GPU) and the model and values were normal again.
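For reference, the three runs correspond roughly to the commands below (the data file, image size, and batch size are placeholders for my own settings, not the exact values I used):

```python
# A minimal sketch of the three runs described above, using the standard
# yolov5 train.py flags. "mydata.yaml", the image size, and the batch size
# are placeholders, not my real configuration.
import subprocess

runs = [
    # yolov5s.pt on the GPU: trains normally
    ["python", "train.py", "--weights", "yolov5s.pt", "--data", "mydata.yaml",
     "--img", "640", "--batch-size", "8", "--device", "0"],
    # yolov5m.pt on the GPU: all values become NaN
    ["python", "train.py", "--weights", "yolov5m.pt", "--data", "mydata.yaml",
     "--img", "640", "--batch-size", "8", "--device", "0"],
    # yolov5m.pt on the CPU: values are normal again
    ["python", "train.py", "--weights", "yolov5m.pt", "--data", "mydata.yaml",
     "--img", "640", "--batch-size", "8", "--device", "cpu"],
]

for cmd in runs:
    subprocess.run(cmd, check=True)
```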

My guess is that the graphics card's capability limits which weight files I can choose (I personally use an MX450 with 2 GB of video memory), but strangely, no out-of-memory error is reported while it runs.
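Since YOLOv5 trains with mixed precision (AMP) on CUDA but in plain FP32 on the CPU, one check I can think of is whether a small FP16 forward/backward pass already produces NaN on this card. This is only a guess on my part, not a confirmed diagnosis, and the model below is a toy stand-in rather than YOLOv5 itself:

```python
# Sanity check: does a small mixed-precision forward/backward pass on this GPU
# already produce non-finite values? (My own guess at a test, not a confirmed cause.)
import torch

device = torch.device("cuda:0")

# Toy stand-in for a YOLOv5 backbone block, just to exercise FP16 math on the GPU.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.BatchNorm2d(64),
    torch.nn.SiLU(),
).to(device)

x = torch.randn(2, 3, 320, 320, device=device)

# Forward pass under autocast, the same mixed-precision mechanism train.py uses on CUDA.
with torch.cuda.amp.autocast():
    y = model(x)
    loss = y.float().mean()

loss.backward()

print("loss is finite:", torch.isfinite(loss).item())
print("gradients are finite:",
      all(torch.isfinite(p.grad).all().item()
          for p in model.parameters() if p.grad is not None))
```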

I have found very few related questions on blogs, and I am still puzzled.

In addition, I ran into another problem today. When I split the data into a training set and a validation set at a 7:3 ratio, the model trained normally until about 80% of the way through the first epoch, but between 80% and 100% all the values became NaN. I then changed the ratio to 6:4 and ran it again, and the values went back to normal.
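For context, this is roughly how I split the images into the two sets; the directory path is a placeholder for my dataset, and only the ratio changed between the two runs:

```python
# A minimal sketch of the train/validation split. The image directory and file
# extension are placeholders for my own dataset layout.
import random
from pathlib import Path

def split_dataset(image_dir, train_ratio=0.7, seed=0):
    """Shuffle the image list and split it at the given training ratio."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * train_ratio)
    return images[:n_train], images[n_train:]

train_files, val_files = split_dataset("datasets/images", train_ratio=0.7)  # 7:3 split
# train_files, val_files = split_dataset("datasets/images", train_ratio=0.6)  # 6:4 split
print(len(train_files), "training images,", len(val_files), "validation images")
```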

I hope someone with experience, or who has run into something similar, can give me an answer.

Original post: blog.csdn.net/qq_54575112/article/details/128508028