What does the output of nvidia-smi represent?

nvidia-smi (NVIDIA System Management Interface) is a GPU system management interface built on NVML (NVIDIA Management Library). It is mainly used to manage graphics cards and monitor their status.

nvidia-smi, NVSMI for short, lets you monitor GPU usage and change GPU state. It is a cross-platform tool that supports all Linux distributions supported by the standard NVIDIA drivers, as well as 64-bit versions of Windows starting with Windows Server 2008 R2. The tool ships with the NVIDIA driver, so the command is available once the driver is installed.
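Because nvidia-smi is installed together with the driver, you can check whether the driver is present by looking for the command on PATH. The following is a minimal sketch that assumes a Bash shell. The function name `nvidia_driver_version` is hypothetical. `--query-gpu=driver_version` and `--format=csv,noheader` are standard nvidia-smi options.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the installed NVIDIA driver version, or fail
# with a message if nvidia-smi (and therefore the driver) is not found.
nvidia_driver_version() {
    # `command -v` succeeds only if nvidia-smi can be found on PATH
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: is the NVIDIA driver installed?" >&2
        return 1
    fi
    # nvidia-smi prints one line per GPU; all GPUs share one driver, so keep the first
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}
```

On a machine with the driver installed, `nvidia_driver_version` prints the driver version string. Without the driver, it prints the error message and returns a non-zero status, which a setup script can check before doing any GPU work.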

1. Basic command: nvidia-smi

After installing the driver, run nvidia-smi in a command prompt or terminal. You will see output like the following:

[Figure: sample nvidia-smi output]

The output includes each graphics card's model, temperature, fan speed, power draw, memory usage, utilization, compute mode, and other information.
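You can also request the same information in machine-readable form with `--query-gpu`, which is more convenient for scripts than parsing the table. The following is a minimal sketch that assumes Bash and awk. `gpu_report` is a hypothetical helper name. The field list is one reasonable choice among the fields documented by `nvidia-smi --help-query-gpu`.

```bash
# Hypothetical helper: turn the CSV produced by
#   nvidia-smi --query-gpu=index,name,temperature.gpu,utilization.gpu,memory.used,memory.total \
#              --format=csv,noheader,nounits
# into one human-readable line per GPU. Reads the CSV from stdin.
gpu_report() {
    awk -F', ' '{
        # $1=index $2=name $3=temperature (C) $4=utilization (%)
        # $5=memory used (MiB) $6=memory total (MiB)
        printf "GPU %s (%s): %s C, util %s%%, mem %s/%s MiB\n", $1, $2, $3, $4, $5, $6
    }'
}
```

Usage: pipe the query into the helper, e.g. `nvidia-smi --query-gpu=index,name,temperature.gpu,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits | gpu_report`. `nounits` strips the "MiB" and "%" suffixes, so the values stay plain numbers.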

Meaning of each field in the table:

  • GPU: the index of the GPU on this machine (with multiple graphics cards, indices start from 0). The GPU in the figure is GPU 0

  • Fan: fan speed (0%-100%); N/A means the GPU has no fan of its own (for example, a passively cooled server card)

  • Name: the GPU model; the GPU in the figure is a Tesla T4

  • Temp: GPU temperature in degrees Celsius (if the GPU gets too hot, it lowers its clock frequency)

  • Perf: the GPU performance state, from P0 (maximum performance) to P12 (minimum performance); the figure shows P0

  • Persistence-M: persistence mode status. With persistence mode on, the driver stays loaded even when no application is using the GPU. This draws somewhat more power, but new GPU applications start faster. The figure shows Off

  • Pwr:Usage/Cap: power, where Usage is the current power draw and Cap is the power limit

  • Bus-Id: the PCI bus ID of the GPU, in the format domain:bus:device.function

  • Disp.A: Display Active, indicating whether a display is initialized on the GPU (this can be On even when no monitor is physically attached)

  • Memory-Usage: GPU memory usage, shown as used / total

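  • GPU-Util: GPU utilization, the percentage of time over the last sample period during which one or more kernels were running on the GPU. In the table header, the word "Volatile" printed above this field belongs to the Uncorr. ECC column, not to GPU-Util

  • Uncorr. ECC: the volatile count of uncorrectable ECC (error checking and correction) errors, i.e. errors detected since the driver was last loaded. N/A or Off means that ECC is not supported or not enabled on this GPU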

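  • Compute M.: compute mode, one of 0/DEFAULT (multiple processes may share the GPU), 1/EXCLUSIVE_PROCESS (only one process may use it at a time), 2/PROHIBITED (no compute processes allowed)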

  • Processes: for each process using the GPU, the GPU index, process ID (PID), and amount of GPU memory used
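You can also query the Processes section per process with `--query-compute-apps`. This reports compute (CUDA) processes only, not graphics processes. The following is a minimal sketch that assumes Bash and the standard `sort` and `head` tools. `top_gpu_procs` is a hypothetical helper that lists compute processes by GPU memory, largest first.

```bash
# Hypothetical helper: sort the output of
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#              --format=csv,noheader,nounits
# by GPU memory (MiB, third column), largest first, and keep the top N lines.
top_gpu_procs() {
    local n="${1:-5}"   # how many processes to show (default 5)
    # -t ',' splits on commas; -n ignores the leading space before each number
    sort -t ',' -k3,3 -n -r | head -n "$n"
}
```

Usage: `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits | top_gpu_procs 3` shows the three processes holding the most GPU memory. This is handy when deciding which job to stop on a shared machine.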

2. Useful commands

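# Help
nvidia-smi -h

# Continuously monitor GPU status (-lms gives millisecond-level refresh)
nvidia-smi -l 3   # refresh the status every 3 seconds, monitoring continuously

# List all GPUs (capital L)
nvidia-smi -L

# Query all information
nvidia-smi -q

# Query a specific GPU; 0, 1, 2, ... are the GPU indices
nvidia-smi -i 0

# Display specific information, used together with -q: MEMORY, UTILIZATION, ECC, TEMPERATURE, POWER, CLOCK, COMPUTE, PIDS, PERFORMANCE, SUPPORTED_CLOCKS, PAGE_RETIREMENT, ACCOUNTING
nvidia-smi -q -d MEMORY

# Monitor processes
nvidia-smi pmon

# Monitor devices
nvidia-smi dmon

# nvidia-smi can also directly set many configurable properties and modes
# Commands other than queries change the GPU configuration: use them with caution!
# Before running any configuration command, make sure you understand exactly what it does!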
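For scripted monitoring, you can combine the loop option `-l` with `--query-gpu` so that each refresh emits plain CSV lines instead of the full table. The following is a minimal sketch that assumes Bash and awk. `temp_alert` is a hypothetical helper. The 80 C default threshold is an arbitrary example value, not an NVIDIA recommendation.

```bash
# Hypothetical helper: read "index, temperature" lines, e.g. from
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits -l 5
# and print a warning for every sample above the threshold (degrees C).
temp_alert() {
    local limit="${1:-80}"   # example threshold; tune for your hardware
    awk -F', ' -v limit="$limit" '$2 + 0 > limit + 0 {
        printf "WARNING: GPU %s at %s C (limit %s C)\n", $1, $2, limit
        fflush()   # flush right away so alerts appear while the loop is still running
    }'
}
```

Usage: `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits -l 5 | temp_alert 85` checks every 5 seconds and prints a line only when a GPU exceeds 85 C. Press Ctrl+C to stop. This only reads data, so unlike the configuration commands above it does not change the GPU's state.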

Reference: http://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf


Source: blog.csdn.net/weixin_45277161/article/details/131943221