==== Test the bandwidth between dual 4090 graphics cards (Nvidia official test case) ====
Refer to Li Mu's video on bilibili: Single card, multi-card BERT, GPT2 training performance [10 billion model plan]
Refer to Li Mu project: GitHub - mli/transformers-benchmarks: real Transformer TeraFLOPS on various GPUs
Refer to other people's tests: https://gist.github.com/joshlk/bbb1aca6e70b11d251886baee6423dcb
Refer to the specific sample: cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest (master branch of NVIDIA/cuda-samples on GitHub)
Nvidia official repository: GitHub - NVIDIA/cuda-samples: Samples for CUDA Developers which demonstrates features in CUDA Toolkit
Nvidia official general project download address: https://github.com/NVIDIA/cuda-samples.git
==First check the connection between dual 4090 graphics cards==
$ nvidia-smi topo -m
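The matrix printed by nvidia-smi topo -m uses short codes for the path between each pair of GPUs. As a reading aid, here is a minimal sketch that maps those codes to abbreviated versions of the descriptions in the tool's own legend. The helper name describe_link is hypothetical and not part of nvidia-smi. Since the RTX 4090 has no NVLink connector, a dual-4090 machine would normally show one of the PCIe codes rather than NV#.

```shell
# Hypothetical helper: translate a link code from `nvidia-smi topo -m`
# into a short description (wording abbreviated from the tool's legend).
describe_link() {
  case "$1" in
    X)        echo "self" ;;
    PIX)      echo "at most a single PCIe bridge" ;;
    PXB)      echo "multiple PCIe bridges, no host bridge" ;;
    PHB)      echo "PCIe host bridge (typically the CPU)" ;;
    NODE)     echo "PCIe plus interconnect between host bridges in one NUMA node" ;;
    SYS)      echo "PCIe plus SMP interconnect between NUMA nodes" ;;
    NV[0-9]*) echo "bonded set of NVLinks" ;;
    *)        echo "unknown" ;;
  esac
}

# Example: describe_link PHB
```

Roughly speaking, codes nearer the top of the list (PIX, PXB) indicate a shorter path than those nearer the bottom (NODE, SYS), so they tend to go with better peer-to-peer bandwidth and latency.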
Reproduction steps: download the source code -> compile the program -> execute it
==Download==
$ git clone https://github.com/NVIDIA/cuda-samples.git # Download the whole repository
$ sudo apt install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev libglfw3-dev libgles2-mesa-dev # Install dependencies that may be needed
==Compile==
$ cd ~/cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest # Enter the test project folder
$ make # Compile the program
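The make step assumes the sample folder contains a Makefile. As far as I know, recent checkouts of cuda-samples build with CMake instead, so plain make may fail there. The sketch below only inspects the folder and prints a plausible build command. The helper name pick_build_cmd is hypothetical, and the assumption that a single sample can be configured on its own with CMake should be checked against the repository's README.

```shell
# Hypothetical helper: suggest a build command for a sample folder
# based on which build file it contains. It only prints; it builds nothing.
pick_build_cmd() {
  dir="$1"
  if [ -f "$dir/Makefile" ]; then
    # Older layout: per-sample Makefile, as used in the steps above.
    echo "make"
  elif [ -f "$dir/CMakeLists.txt" ]; then
    # Assumed newer layout: configure into ./build, then build.
    echo "cmake -S . -B build && cmake --build build"
  else
    echo "none"
    return 1
  fi
}

# Example: pick_build_cmd ~/cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest
```

With a CMake build the binary would normally end up under the build folder rather than next to the sources.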
==Execute==
$ cd ~/cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest; ./p2pBandwidthLatencyTest # Execute
or
$ cd ~/cuda-samples/bin/x86_64/linux/release; ./p2pBandwidthLatencyTest # Execute
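Because the binary can sit in either of the two folders above, a small lookup helper saves some guessing. This is a minimal sketch: the function name find_p2p_binary is hypothetical, and it checks only the two paths mentioned in these notes, relative to the root of the cloned repository.

```shell
# Hypothetical helper: print the path of the first p2pBandwidthLatencyTest
# executable found under the given cuda-samples root; return 1 if none exists.
find_p2p_binary() {
  root="$1"
  for dir in \
    "$root/Samples/5_Domain_Specific/p2pBandwidthLatencyTest" \
    "$root/bin/x86_64/linux/release"
  do
    candidate="$dir/p2pBandwidthLatencyTest"
    # Require a regular file so a folder of the same name is not matched.
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Example:
#   if bin=$(find_p2p_binary "$HOME/cuda-samples"); then "$bin"; fi
```

If neither path holds an executable, the function prints nothing and returns a non-zero status, which usually means the compile step has not succeeded yet.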