CUDA out of memory reported even though there is clearly enough free GPU memory

I ran into this problem while training yolov5. Reducing batch-size to 1 did not help, setting num_workers to 1 did not help, and even shrinking the dataset to only 50 images it still would not run. It was not a multi-GPU training issue either (my machine has only one graphics card). Another computer with the same configuration could run it, and my machine had run it fine the day before, but suddenly it failed. The error said something like "Tried to allocate 64.00 MiB" with about 2 GiB free (I no longer have a screenshot of the exact message). In other words, I supposedly still had around 2 GB of free video memory, yet allocating 64 MiB failed. The "free" memory here means the memory left over after subtracting what PyTorch has already reserved.
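For anyone who wants to see these numbers for themselves, here is a minimal sketch (my addition, not from the original post) of how one might compare the driver's view of free memory with what PyTorch's caching allocator holds. It assumes a reasonably recent PyTorch build that provides torch.cuda.mem_get_info:

```python
import torch

device = torch.device("cuda:0")
free, total = torch.cuda.mem_get_info(device)    # free/total memory as reported by the driver
allocated = torch.cuda.memory_allocated(device)  # memory currently held by live tensors
reserved = torch.cuda.memory_reserved(device)    # memory held by PyTorch's caching allocator

print(f"driver free: {free / 1024**2:.0f} MiB of {total / 1024**2:.0f} MiB")
print(f"allocated:   {allocated / 1024**2:.0f} MiB")
print(f"reserved:    {reserved / 1024**2:.0f} MiB")

# Releasing cached (reserved but unallocated) blocks back to the driver can sometimes
# clear a "Tried to allocate ... MiB" error, although it did not help in my case.
torch.cuda.empty_cache()
```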

Troubleshooting process:

At first I thought a background process was taking up GPU memory, but both Task Manager and nvidia-smi showed nothing unusual.
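A quick way to double-check this is to list the processes that actually hold GPU memory. The snippet below is only an illustrative sketch (not part of the original post); it assumes nvidia-smi is on the PATH, and the exact query columns can vary by driver version:

```python
import subprocess

# List every process currently holding GPU memory.
out = subprocess.run(
    ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory", "--format=csv"],
    capture_output=True, text=True, check=True,
)
print(out.stdout)  # only the header line means no process is using video memory
```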

Then I suspected a problem with CUDA or cuDNN, but uninstalling and reinstalling CUDA did not help. The error originally reported was CUDA out of memory; after the reinstall it changed to a broken pipe error.
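If you want to rule out the CUDA/cuDNN installation without a full reinstall, a small sanity check like the following sketch (my addition, not part of the original troubleshooting) shows which versions PyTorch actually sees and confirms that a tiny allocation on the GPU works:

```python
import torch

print(torch.__version__)                 # PyTorch build
print(torch.version.cuda)                # CUDA version PyTorch was compiled against
print(torch.backends.cudnn.version())    # cuDNN version visible to PyTorch
print(torch.cuda.is_available())         # should be True on a working setup

x = torch.randn(64, 64, device="cuda")   # tiny allocation to confirm the runtime works
print(x.sum().item())
```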

Next I suspected a conflict between TensorFlow and PyTorch in the same environment, but some debugging along those lines also led nowhere.

Solution:

In the end it turned out to be an environment problem, most likely a conflict caused by some package that was installed during development the previous day without my noticing. Because I am in the habit of backing up my environments, I copied the original environment back into anaconda's envs directory, deleted the broken environment, and training ran again.
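I simply copied the backed-up environment folder back by hand. A more portable way to keep such backups is to export and recreate the environment with conda; the sketch below (my addition) shows the idea, where the environment name yolov5 and the backup file name are placeholders and the exact flags may differ slightly between conda versions:

```python
import subprocess

# Back up the current conda environment to a YAML file.
subprocess.run(
    ["conda", "env", "export", "-n", "yolov5", "-f", "yolov5_backup.yml"], check=True
)

# Later, delete the broken environment and recreate it from the backup.
subprocess.run(["conda", "env", "remove", "-n", "yolov5"], check=True)
subprocess.run(["conda", "env", "create", "-f", "yolov5_backup.yml"], check=True)
```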


Origin: blog.csdn.net/m0_50317149/article/details/132308495