[Audio and video related] nvidia-smi command extension and problem analysis example (dmon/pmon/GPU performance related)

Common nvidia-smi commands:

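nvidia-smi dmon -s xxx (device monitor) shows device-level metrics; xxx is any combination of the option letters listed below.

	For example, nvidia-smi dmon -s t -i 1 shows the PCIe read/write bandwidth of GPU 1.

p: power usage and temperature (pwr: power draw, temp: temperature)
u: GPU utilization (sm: streaming multiprocessors, mem: memory, enc: encoder, dec: decoder)
c: GPU processor and memory clock frequencies (mclk: memory clock, pclk: processor clock)
v: power and thermal violations
m: FB memory and BAR1 memory
e: ECC error count and PCIe replay error count
t: PCIe read/write bandwidth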
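
The option letters above can be combined in a single -s string. As a minimal sketch, the following Python snippet checks the letters and assembles the command line; the names DMON_METRICS and build_dmon_command are hypothetical helpers introduced here for illustration, not part of nvidia-smi.

```python
# Metric groups accepted by `nvidia-smi dmon -s`, as listed above.
DMON_METRICS = {
    "p": "power usage and temperature",
    "u": "utilization (sm, mem, enc, dec)",
    "c": "processor and memory clocks",
    "v": "power and thermal violations",
    "m": "FB and BAR1 memory",
    "e": "ECC errors and PCIe replay errors",
    "t": "PCIe read/write bandwidth",
}


def build_dmon_command(gpu_index, metrics, interval_s=1):
    """Return the argv list for an nvidia-smi dmon call (hypothetical helper)."""
    unknown = [m for m in metrics if m not in DMON_METRICS]
    if unknown:
        raise ValueError(f"unknown dmon metric letters: {unknown}")
    return [
        "nvidia-smi", "dmon",
        "-i", str(gpu_index),   # GPU index to monitor
        "-s", metrics,          # metric groups to display
        "-d", str(interval_s),  # sampling interval in seconds
        "-o", "TD",             # add time and date columns
    ]


print(" ".join(build_dmon_command(1, "mutc")))
```

The resulting list can be passed to subprocess.run on a machine that has the NVIDIA driver installed.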

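nvidia-smi dmon -i 0 -s mutc -d 1 -o TD prints, once per second and with date and time columns, the memory, utilization, PCIe throughput, and clock metrics of GPU 0. (The sample output below reports gpu index 1, so it was presumably captured with -i 1.)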

#Date       Time        gpu    fb  bar1    sm   mem   enc   dec rxpci txpci  mclk  pclk
#YYYYMMDD   HH:MM:SS    Idx    MB    MB     %     %     %     %  MB/s  MB/s   MHz   MHz
 20221215   15:43:54      1  3217    13    10    15   100    30    15    14  6250  1455
 20221215   15:43:55      1  3217    13     9    14    88    32    45    12  6250  1507
 20221215   15:43:56      1  3217    13     9    13    80    30    23     9  6250  1260
 20221215   15:43:57      1  3217    13     9    14    95    31    33    22  6250  1372
 20221215   15:43:58      1  3217    13    10    15   100    30    44    25  6250  1440
 20221215   15:43:59      1  3217    13    10    15   100    28    14    12  6250  1530
 20221215   15:44:00      1  3217    13    10    15   100    30    39    15  6250  1297

nvidia-smi pmon -i 1 shows the per-process GPU usage of the programs currently running on GPU 1.

# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    1    3524892     C     8    13    95    29   Pangu          
    1    3524892     C     8    14    94    29   Pangu          
    1    3524892     C     8    14    95    30   Pangu          
    1    3524892     C     9    14    96    30   Pangu          
    1    3524892     C     8    14    96    30   Pangu 
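
Since pmon prints one row per process per sample, averaging a column per process gives a steadier picture than a single line. This is a rough sketch under the assumption that the output has the columns shown above; mean_by_pid is a hypothetical helper name.

```python
from collections import defaultdict


def mean_by_pid(text, column):
    """Average one numeric pmon column per (pid, command) pair."""
    columns = None
    samples = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            continue
        # maxsplit keeps a command name that contains spaces in one field
        values = line.split(None, len(columns) - 1)
        if len(values) != len(columns):
            continue
        row = dict(zip(columns, values))
        if not row[column].isdigit():
            continue  # skip placeholders such as '-'
        samples[(row["pid"], row["command"])].append(float(row[column]))
    return {key: sum(v) / len(v) for key, v in samples.items()}


sample = """\
# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    1    3524892     C     8    13    95    29   Pangu
    1    3524892     C     8    14    94    29   Pangu
    1    3524892     C     8    14    95    30   Pangu
    1    3524892     C     9    14    96    30   Pangu
    1    3524892     C     8    14    96    30   Pangu
"""

print(mean_by_pid(sample, "enc"))
```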

lspci -vv | grep xxx -C50 (where xxx is the device name, such as A16) or nvidia-smi -q -i 1 shows detailed information about the PCI device, including the GPU's PCIe link speed.

Timestamp                                 : Thu Dec 15 15:51:49 2022
Driver Version                            : 515.43.04
CUDA Version                              : 11.7

Attached GPUs                             : 4
GPU 00000000:47:00.0
    Product Name                          : NVIDIA A16
    Product Brand                         : NVIDIA
    Product Architecture                  : Ampere
    Display Mode                          : Disabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : xxxxxxxxxxxx
    GPU UUID                              : xxxxxxxxxxxx
    Minor Number                          : 1
    VBIOS Version                         : 94.07.54.00.01
    MultiGPU Board                        : Yes
    Board ID                              : xxxxxxxxxxxxxxx
    GPU Part Number                       : xxxxxxxxxxxxxxx
    Module ID                             : 0
    Inforom Version
        Image Version                     : G171.0200.00.04
        OEM Object                        : 2.0
        ECC Object                        : 6.16
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : 515.43.04
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x47
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : xxxxxxxxxxx
        Bus Id                            : 00000000:47:00.0
        Sub System Id                     : xxxxxxxxxxx
        GPU Link Info
            PCIe Generation
                Max                       : 4
                Current                   : 1
            Link Width
                Max                       : 16x
                Current                   : 4x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 2000 KB/s
        Rx Throughput                     : 6000 KB/s
    Fan Speed                             : 0 %
    Performance State                     : P8
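
The GPU Link Info block above is the part that matters for PCIe troubleshooting: current generation 1 of 4 and current width x4 of x16. The same four values can be requested directly with nvidia-smi --query-gpu; the sketch below only parses one CSV line of that output, and the sample line is hand-built from the values above rather than captured from a real run. Note that a lower current generation at idle (P8) is not necessarily a fault, whereas a reduced width is worth a closer look.

```python
# Fields requested from nvidia-smi, in this order.
QUERY_FIELDS = (
    "pcie.link.gen.current,pcie.link.gen.max,"
    "pcie.link.width.current,pcie.link.width.max"
)
# Command that would produce one CSV line per GPU (not executed here):
#   nvidia-smi -i 1 --query-gpu=<QUERY_FIELDS> --format=csv,noheader


def link_status(csv_line):
    """Compare current and maximum PCIe generation and width (illustrative)."""
    gen_cur, gen_max, width_cur, width_max = (
        int(field) for field in csv_line.split(",")
    )
    return {
        "generation": (gen_cur, gen_max),
        "width": (width_cur, width_max),
        "generation_below_max": gen_cur < gen_max,
        "width_below_max": width_cur < width_max,
    }


# Assumed sample line built from the nvidia-smi -q output above.
status = link_status("1, 4, 4, 16")
print(status)
```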

Example of problem analysis

12-channel cuda->nv12->nv12->cuda codec performance and PCIe read/write speed

#Date       Time        gpu    fb  bar1    sm   mem   enc   dec rxpci txpci  mclk  pclk
#YYYYMMDD   HH:MM:SS    Idx    MB    MB     %     %     %     %  MB/s  MB/s   MHz   MHz
 20221215   15:33:04      1  1149     7    22     6    31    11   566   635  6250  1755
 20221215   15:33:05      1  1149     7    23     7    46    13   670   672  6250  1755
 20221215   15:33:06      1  1149     7    22     6    35    11   619   738  6250  1755
 20221215   15:33:07      1  1149     7    19     5    31    10   565   548  6250  1755
 20221215   15:33:08      1  1149     7    20     6    36    11   485   641  6250  1755
 20221215   15:33:09      1  1149     7    18     6    36    11   466   555  6250  1755
 20221215   15:33:10      1  1149     7    20     5    31    10   481   595  6250  1755
 20221215   15:33:12      1  1149     7    21     7    43    12   512   518  6250  1755
 20221215   15:33:13      1  1149     7    18     6    32    10   564   593  6250  1755
 20221215   15:33:14      1  1149     7    18     6    35    10   383   605  6250  1755
 20221215   15:33:15      1  1149     7    21     6    39    11   497   601  6250  1755
 20221215   15:33:16      1  1149     7    19     6    35    11   488   565  6250  1755
 20221215   15:33:17      1  1149     7    20     6    36    11   504   539  6250  1755
 20221215   15:33:18      1  1149     7    20     6    37    11   486   655  6250  1755
 20221215   15:33:19      1  1149     7    19     6    36    10   643   703  6250  1755
 20221215   15:33:20      1  1149     7    19     6    34    11   408   609  6250  1755
 20221215   15:33:21      1  1149     7    21     6    36    11   356   580  6250  1755
 20221215   15:33:22      1  1149     7    23     6    41    11   513   582  6250  1755
 20221215   15:33:23      1  1149     7    21     6    36    11   691   654  6250  1755
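
To put a number on the PCIe columns of the sample above, the rxpci and txpci values can simply be averaged. The lists below are copied from the 19 rows of that sample; this is only an arithmetic aid, not a benchmark.

```python
from statistics import mean

# rxpci and txpci columns (MB/s) from the 19 sample rows above.
rxpci = [566, 670, 619, 565, 485, 466, 481, 512, 564, 383,
         497, 488, 504, 486, 643, 408, 356, 513, 691]
txpci = [635, 672, 738, 548, 641, 555, 595, 518, 593, 605,
         601, 565, 539, 655, 703, 609, 580, 582, 654]

print(f"rx: mean {mean(rxpci):.0f} MB/s, peak {max(rxpci)} MB/s")
print(f"tx: mean {mean(txpci):.0f} MB/s, peak {max(txpci)} MB/s")
```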
 

30-channel cuda->cuda codec performance

#Date       Time        gpu    fb  bar1    sm   mem   enc   dec rxpci txpci  mclk  pclk
#YYYYMMDD   HH:MM:SS    Idx    MB    MB     %     %     %     %  MB/s  MB/s   MHz   MHz
 20221215   15:43:07      1  3203    13     9    14   100    30    47     8  6250  1260
 20221215   15:43:08      1  3203    13     9    15   100    30    28     9  6250  1500
 20221215   15:43:09      1  3203    13     9    15   100    30    27    13  6250  1567
 20221215   15:43:10      1  3203    13     9    15   100    30    37    15  6250  1552
 20221215   15:43:11      1  3203    13    10    15   100    30    19    20  6250  1710
 20221215   15:43:13      1  3203    13    10    15   100    30    15     4  6250  1747
 20221215   15:43:14      1  3205    13    10    15   100    30    62    12  6250  1102
 20221215   15:43:15      1  3207    13    10    15   100    30    22    14  6250  1432
 20221215   15:43:16      1  3209    13     8    13    88    31    39     6  6250  1590
 20221215   15:43:17      1  3209    13     8    13    82    31    21    13  6250  1485
 20221215   15:43:18      1  3215    13     8    13    87    31    42    16  6250  1372
 20221215   15:43:19      1  3215    13     9    13    83    31    43    23  6250  1447
 20221215   15:43:20      1  3215    13    10    15   100    30    16    12  6250  1590
 20221215   15:43:21      1  3215    13    10    15   100    30    47     6  6250  1470

Judging from the results at 12 channels, neither GPU memory, SM utilization, nor codec utilization has reached its peak, and the previously suspected transfer rate is only about 600MB/s for reads and 600MB/s for writes. The next step is to check the card's PCIe link settings in the lspci -vv output:

                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 16GT/s (ok), Width x4 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

LnkCap is the link capability that the card supports, and LnkSta is the currently negotiated link state. The Speed rises from 2.5GT/s (PCIe 1.0) when idle to 16GT/s (PCIe 4.0) when a program is running. The Width (lane count) is downgraded from x16 to x4, but according to the PCIe bandwidth table a PCIe 4.0 x4 link still provides about 7.88GB/s, far more than the roughly 600MB/s observed, so the link itself does not look like the bottleneck.
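
The 7.88GB/s figure can be reproduced from the per-lane transfer rate and the line encoding of each PCIe generation. The following is a small worked calculation; it gives the theoretical one-direction bandwidth before protocol overhead, and the 0.6 GB/s observed value is the rough figure quoted above, not a new measurement.

```python
# Per-lane raw rate in GT/s and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gb_s(generation, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency / 8 * lanes  # 8 bits per byte


x4 = link_bandwidth_gb_s(4, 4)
x16 = link_bandwidth_gb_s(4, 16)
observed_gb_s = 0.6  # roughly 600 MB/s, as quoted in the text

print(f"Gen4 x4: {x4:.2f} GB/s, Gen4 x16: {x16:.2f} GB/s")
print(f"observed share of the x4 link: {observed_gb_s / x4:.0%}")
```

Even on the downgraded x4 link, the observed traffic uses well under a tenth of the theoretical bandwidth.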

[Image: PCIe bandwidth table]

Other factors that may affect performance include temperature, but checking with nvidia-smi -q -i 1 shows that the temperature has not reached the level at which the GPU slows down:

Temperature
        GPU Current Temp                  : 87 C
        GPU Shutdown Temp                 : 98 C
        GPU Slowdown Temp                 : 95 C
        GPU Max Operating Temp            : 88 C
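
The margin to each threshold can be read off mechanically. This sketch assumes the Temperature block has the "GPU <name> Temp : <value> C" layout shown above; parse_temperatures is an illustrative helper, and entries without a numeric value are simply skipped.

```python
import re


def parse_temperatures(text):
    """Extract 'GPU <name> Temp : <N> C' lines into a dict of ints (deg C)."""
    temps = {}
    for name, value in re.findall(r"GPU (.+?) Temp\s*:\s*(\d+) C", text):
        temps[name] = int(value)
    return temps


sample = """\
Temperature
        GPU Current Temp                  : 87 C
        GPU Shutdown Temp                 : 98 C
        GPU Slowdown Temp                 : 95 C
        GPU Max Operating Temp            : 88 C
"""

temps = parse_temperatures(sample)
slowdown_margin = temps["Slowdown"] - temps["Current"]
operating_margin = temps["Max Operating"] - temps["Current"]
print(f"{slowdown_margin} C below the slowdown threshold")
print(f"{operating_margin} C below the max operating temperature")
```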


Origin blog.csdn.net/Daibvly/article/details/128416006