Hengyuan Cloud (Gpushare) | How to Check Graphics Card Usage? Tips Giveaway 2

Article Source | Hengyuan Cloud Community

Original Address | [Tips - Graphics Card]


1. How to check the graphics card usage?

Run the nvidia-smi command in the terminal to view the graphics card's status, power consumption, memory usage, and other information.

root@I15b96311d0280127d:~# nvidia-smi
Mon Jan 11 13:42:18 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    On   | 00000000:02:00.0 Off |                  N/A |
| 63%   55C    P2   298W / 370W |  23997MiB / 24268MiB |     62%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Because instances run as Docker containers, nvidia-smi cannot show the processes using the GPU due to container PID namespace isolation.

Run the py3smi command in the terminal to see whether any process is using the graphics card.

root@I15b96311d0280127d:~# py3smi
Mon Jan 11 13:43:00 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI                        Driver Version: 460.27.04                 |
+---------------------------------+---------------------+---------------------+
| GPU Fan  Temp Perf Pwr:Usage/Cap|        Memory-Usage | GPU-Util Compute M. |
+=================================+=====================+=====================+
|   0 63%   55C    2  284W / 370W | 23997MiB / 24268MiB |      80%    Default |
+---------------------------------+---------------------+---------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
| GPU        Owner      PID      Uptime  Process Name                   Usage |
+=============================================================================+
|   0          ???    10494                                          23995MiB |
+-----------------------------------------------------------------------------+
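
The same information can also be read programmatically. Below is a minimal sketch using NVML from Python; it assumes the nvidia-ml-py package (imported as pynvml) is installed, which the article does not cover:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)        # GPU 0

util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # core / memory utilization in %
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # total / used / free in bytes

print(f"GPU util: {util.gpu}%  "
      f"Memory: {mem.used / 1024**2:.0f}MiB / {mem.total / 1024**2:.0f}MiB")

pynvml.nvmlShutdown()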

2. Why doesn't GPU utilization go up during training?

If you check graphics card utilization during training, you may find that the card's core utilization and power draw are low, meaning the GPU is not being fully used.

This usually happens because, within each training step, most of the time is spent on the CPU (for example, loading and preprocessing data) rather than on the GPU, so GPU utilization rises and falls periodically.

To fix the utilization problem, the training code needs to be optimized. You can refer to Xi Xiaoyao's article "Low training efficiency? GPU utilization won't go up?".
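
One common cause is single-process data loading. As a rough sketch (assuming a PyTorch pipeline; the dataset, batch size, and worker count below are illustrative, not from the original article), moving preprocessing into background workers and using pinned memory helps keep the GPU fed:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative dataset; replace with your real Dataset
dataset = TensorDataset(torch.randn(2000, 3, 32, 32), torch.randint(0, 10, (2000,)))

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,     # preprocess batches in parallel CPU worker processes
    pin_memory=True,   # page-locked host memory speeds up host-to-GPU copies
)

device = torch.device("cuda")
for images, labels in loader:
    # non_blocking=True lets the copy overlap with GPU compute when pin_memory=True
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward pass, loss, backward pass, optimizer step ...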

3. What versions of CUDA and cuDNN are installed?

The CUDA Version shown by nvidia-smi is the highest version supported by the installed driver; it does not represent the CUDA version actually installed in the instance.

The actual versions are determined by the official image selected when the instance was created.

# Check the CUDA version
root@I15b96311d0280127d:~# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

# Check the cuDNN version
root@I15b96311d0280127d:~# dpkg -l | grep libcudnn | awk '{print $2}'
libcudnn8
libcudnn8-dev

# Check where cuDNN is installed
root@I15b96311d0280127d:~# dpkg -L libcudnn8 | grep so
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.1
...
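
If a deep learning framework is installed in the image, you can also confirm which CUDA and cuDNN versions the framework itself was built against. A minimal sketch, assuming PyTorch (not mentioned in the commands above):

import torch

print(torch.version.cuda)              # e.g. "11.2" -- CUDA toolkit PyTorch was built with
print(torch.backends.cudnn.version())  # e.g. 8101 for cuDNN 8.1.1
print(torch.cuda.is_available())       # True if the driver and runtime work together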

4. Why does training hang when starting on an RTX 30 series graphics card?

Check whether the CUDA version used by your framework or library is lower than 11.0.

RTX 30 series graphics cards require CUDA 11 or above; running with a lower version will cause the process to hang.
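
As a quick sanity check (again assuming PyTorch; adapt for other frameworks), you can verify that the framework's CUDA version is at least 11 and that the build includes the RTX 30 series architecture (sm_86):

import torch

cuda_version = torch.version.cuda or "0"
print("CUDA used by PyTorch:", cuda_version)

if int(cuda_version.split(".")[0]) < 11:
    print("CUDA < 11: training on an RTX 30 series card is likely to hang")

# Compute architectures this build was compiled for; RTX 30 series needs sm_86
print("Supported architectures:", torch.cuda.get_arch_list())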


Origin: blog.csdn.net/weixin_39881439/article/details/123898988