RTX 4090 deep learning benchmark results are here! Model training can be up to 60-80% faster

We recently ran a full-system test of the turbo version of the RTX 4090. This article shares single-card, 4-card, and 8-card RTX 4090 test results to evaluate its performance advantage over the previous-generation RTX 30 series.
First, let's look at the hardware configuration used for this test.

Test hardware configuration

[Image: test hardware configuration]
The platform used for this test is the Supermicro SYS-420GP-TNR, a GPU system designed for AI and graphics-intensive workloads: a 4U, dual-processor (3rd generation Intel® Xeon®), dual-root GPU system that supports up to 10 PCIe GPUs. Detailed product parameters can be found at https://www.hynx.com.cn/product/detail/65

Software Environment

[Image: software environment]

Photos of the Supermicro server with 8 turbo version RTX 4090 graphics cards installed

The Supermicro SYS-420GP-TNR server fits 8 RTX 4090 (turbo version) graphics cards cleanly, with sufficient clearance at the front and rear and no structural interference.
[Images: server with 8 turbo version RTX 4090 graphics cards installed]

Turbo version RTX 4090 performance test

  • Comparison of Graphics Card Hardware Parameters
    For a more intuitive comparison, we set the GeForce RTX 4090 against the GeForce RTX 3090 and RTX 3080. The hardware parameters of the three GPU cards are as follows:
    [Image: hardware parameter comparison of the RTX 4090, RTX 3090, and RTX 3080]
  • Single graphics card FP32/FP16 ResNet50 performance test
    Test task
    TensorFlow-1.15.5: ResNet50, fp32 and fp16
    To test single-card TensorFlow FP32 and FP16 performance, we used the official NVIDIA NGC container nvcr.io/nvidia/tensorflow:23.01-tf1-py3. Command examples:
    python resnet.py --layers=50 --precision=fp16 --batch_size=128
    python resnet.py --layers=50 --precision=fp32 --batch_size=128
    [Image: single-card ResNet50 FP32/FP16 test results]
    Result analysis:
    Thanks to its new architecture and manufacturing process, the RTX 4090 is 40%-80% faster than the RTX 3090, and its lead over the RTX 3080 is even larger. (The RTX 3080 tested is the 10GB version, and some test items report insufficient video memory on it.)
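A percentage such as "40%-80% faster" is normally derived from the throughput (images/sec) each card reports for the same test item. The sketch below shows that arithmetic only; the function name `percent_gain` and the two throughput values are hypothetical placeholders for illustration, not measurements from this test.

```python
def percent_gain(new_throughput: float, baseline_throughput: float) -> float:
    """Return how much faster new_throughput is than the baseline, in percent."""
    if baseline_throughput <= 0:
        raise ValueError("baseline throughput must be positive")
    return (new_throughput / baseline_throughput - 1.0) * 100.0


# Placeholder throughputs in images/sec. These are NOT results from this test;
# substitute the numbers printed by resnet.py for each card.
example_baseline = 1000.0   # e.g. the older card
example_new = 1600.0        # e.g. the newer card

print(f"{percent_gain(example_new, example_baseline):.1f}% faster")
```

Comparing like with like matters here: the same precision, batch size, and container version should be used for both cards before the ratio is taken.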

- 8-card RTX 4090 performance test
Test task
TensorFlow-1.15.5: ResNet50, fp32 and fp16
To test the TensorFlow FP32 and FP16 performance of 8 RTX 4090 graphics cards, we used the official NVIDIA NGC container nvcr.io/nvidia/tensorflow:23.01-tf1-py3.
Command example:
mpiexec --allow-run-as-root --bind-to socket -np 8 python resnet.py …
[Image: 8-card ResNet50 FP32/FP16 test results]
Result analysis:
On the 420GP-TNR platform, multi-card throughput is far higher than that of a single card, but because of PCIe bandwidth limits and additional communication overhead, it does not scale linearly with the number of cards. In real applications the code can be tuned for the specific environment, so there is still room to improve the multi-GPU speedup ratio.
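The "speedup ratio" mentioned above can be made concrete by dividing multi-card throughput by single-card throughput, then dividing again by the number of cards to get scaling efficiency (1.0 would be perfectly linear). This is a minimal sketch under stated assumptions: the function name `scaling_efficiency` and the throughput values are hypothetical placeholders, not figures from this test.

```python
from typing import Tuple


def scaling_efficiency(multi_gpu_throughput: float,
                       single_gpu_throughput: float,
                       num_gpus: int) -> Tuple[float, float]:
    """Return (speedup, efficiency) of a multi-GPU run versus one GPU.

    speedup    = multi-GPU throughput / single-GPU throughput
    efficiency = speedup / number of GPUs (1.0 means linear scaling)
    """
    if single_gpu_throughput <= 0 or num_gpus <= 0:
        raise ValueError("throughput and GPU count must be positive")
    speedup = multi_gpu_throughput / single_gpu_throughput
    return speedup, speedup / num_gpus


# Placeholder throughputs in images/sec. These are NOT results from this test.
speedup, efficiency = scaling_efficiency(6400.0, 1000.0, num_gpus=8)
print(f"speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```

An efficiency well below 1.0 points to the communication and PCIe overhead described above, which is where environment-specific tuning is likely to pay off.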

We also tested the temperature and power consumption of the whole machine. The SYS-420GP-TNR is equipped with 2000W Titanium-level (2+2) redundant power supplies with a conversion efficiency of 96%, which can meet the power requirements of a complete 8-GPU system. The turbo version of the RTX 4090 has enhanced turbo-fan cooling of its own, so it keeps temperatures under control and runs continuously and stably without an auxiliary fan at the rear of the chassis. (For the detailed report, follow the official account and reply: 4090)

Test summary

The new-generation RTX 4090 graphics card is significantly faster than the previous-generation RTX 30 series, by up to nearly 80%. Paired with a suitable server platform, it delivers powerful whole-system computing performance.

To learn more about the RTX 4090 system test, follow the public account [Haoyuan Nuoxin] and reply 4090 to get the detailed test report, visit www.hynx.com.cn, or call 400-6997-916.

Reposted from: blog.csdn.net/weixin_50197960/article/details/129557219