How to fix: the GPU shows no running processes, but the video memory is still occupied

Under normal circumstances, video memory is released when the process using it stops.

But if the process exits abnormally, the memory may not be released, and you end up with a situation like this:

Mon Oct 19 16:00:00 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:0D.0 Off |                    0 |
| N/A   38C    P0    35W / 250W |  16239MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
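
If you want to detect this condition from a script rather than by eye, the sketch below compares the memory nvidia-smi reports as used against the compute processes it lists. The --query-gpu and --query-compute-apps flags are standard nvidia-smi options; the 100 MiB threshold is an arbitrary assumption for illustration.

```python
import subprocess

# Per-GPU memory in use (MiB); these query fields are standard nvidia-smi options.
used = subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout.split()

# Compute processes nvidia-smi can see (empty in the broken state shown above).
procs = subprocess.run(
    ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.split()

for gpu, mib in enumerate(used):
    print(f"GPU {gpu}: {mib} MiB in use")

# Memory occupied but no visible process is exactly the symptom described above.
# The 100 MiB threshold is only a rough heuristic.
if not procs and any(int(mib) > 100 for mib in used):
    print("Memory is held but no process is listed; find the holders with fuser.")
```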

The solution, of course, is to kill the processes that are still holding the video memory.

To kill them, you first need to find them:

fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        root      26031 F...m python
                     root      26035 F...m python
                     root      26041 F...m python
                     root      26050 F...m python
                     root      32512 F...m ZMQbg/1
/dev/nvidiactl:      root      26031 F...m python
                     root      26035 F...m python
                     root      26041 F...m python
                     root      26050 F...m python
                     root      32512 F.... ZMQbg/1
/dev/nvidia-uvm:     root      26031 F.... python
                     root      26035 F.... python
                     root      26041 F.... python
                     root      26050 F.... python
                     root      32512 F.... ZMQbg/1
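
This lookup can also be scripted. A minimal sketch, assuming the fuser binary (from psmisc) is installed and the script has permission to see other users' processes; it relies on the fact that fuser prints only the PIDs to stdout and sends the verbose table to stderr.

```python
import glob
import subprocess

# fuser prints matching PIDs to stdout and everything else (file names and
# the verbose USER/ACCESS/COMMAND table) to stderr, so stdout is easy to parse.
result = subprocess.run(
    ["fuser"] + glob.glob("/dev/nvidia*"),
    capture_output=True, text=True,
)

# The same PID usually holds several /dev/nvidia* files, so deduplicate.
pids = sorted({int(tok) for tok in result.stdout.split() if tok.isdigit()})
print("PIDs holding /dev/nvidia*:", pids)
```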

Then kill each process with kill -9 26031 (and so on), and the memory it held is released. Every process returned by the query above needs to be killed, one at a time.
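
The same loop can be done from Python instead of typing kill -9 for each PID. A minimal sketch; the PIDs are the ones from the example output above and will differ on your machine, and SIGKILL is abrupt, so make sure none of these processes is one you still need.

```python
import os
import signal

# PIDs taken from the fuser output above; on your machine they will differ.
pids = [26031, 26035, 26041, 26050, 32512]

for pid in pids:
    try:
        os.kill(pid, signal.SIGKILL)          # same effect as kill -9 <pid>
        print(f"killed {pid}")
    except ProcessLookupError:
        print(f"{pid} is already gone")
    except PermissionError:
        print(f"no permission to kill {pid} (run as root for root-owned processes)")
```

fuser also has a -k option that sends SIGKILL to every holder of the given files in one go, but going PID by PID, as described above, lets you double-check each process before killing it.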

If nothing unexpected happens, nvidia-smi will then show the memory as free again.

Original post: blog.csdn.net/zhou_438/article/details/109162654