GPU selection research: the 3090 is still the price-performance king

Lately I have been short on computing power. For some Transformer-based 3D image segmentation work, either the memory on my existing graphics cards is not enough, or a single experiment runs for more than a week. So I recently set aside some time to research GPU options.

I currently have two 3090 graphics cards. Because they are the reference design, the cards are relatively large, and the Dell server can only fit two of them. My original idea was to build an 8-card 3090 machine, but the Dell supplier told me they no longer build 8-card configurations: 4 cards is typical, and some models can take 6. In any case, the 3090 has only 24 GB of video memory, which is not enough for 3D image segmentation with a large batch size. So I turned my attention from consumer-grade graphics cards to professional compute cards.
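To give a rough sense of why 3D segmentation eats video memory so quickly, here is a minimal sketch that estimates the size of a single dense activation tensor. The helper name `feature_map_gib` and the shape used (batch of 4, 32 channels, a 128×128×128 volume) are hypothetical values chosen for illustration, not figures from my own experiments.

```python
def feature_map_gib(batch, channels, depth, height, width, bytes_per_value=4):
    """Size in GiB of one dense activation tensor of shape (B, C, D, H, W)."""
    n_values = batch * channels * depth * height * width
    return n_values * bytes_per_value / 2**30

# Hypothetical example: batch of 4, 32 channels, 128^3 voxels.
fp32 = feature_map_gib(4, 32, 128, 128, 128)     # float32: 4 bytes per value
fp16 = feature_map_gib(4, 32, 128, 128, 128, 2)  # float16: 2 bytes per value
print(f"{fp32:.2f} GiB in float32, {fp16:.2f} GiB in float16")
```

Under these assumptions, one feature map already takes 1 GiB in float32. Training has to keep the activations of many layers around for the backward pass, so 24 GB fills up fast once the batch size grows.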

Nvidia's lineup looks dizzying at first, but for a given set of requirements only a few cards actually qualify. The table below introduces several mainstream Nvidia GPU models.

[Image: table of mainstream Nvidia GPU models]

As the table shows, apart from the A-series and V-series professional compute cards, the rest are consumer-grade graphics cards. Among them, the TITAN Xp, 1080Ti and 3060 are all fine entry-level choices: their video memory is not large, but they have no trouble running small and medium-sized models. For the next level up, the 2080Ti, A4000, A5000, 3080Ti and 3090 are all suitable, especially the 3090, which can be regarded as the price-performance king. Its single-precision and half-precision compute are weaker than those of the A40 professional compute card, but its memory bandwidth is relatively high, and its measured speed on most algorithms is no worse than the A40's. The A40, for its part, can be thought of as a 3090 with expanded video memory. For those who, like me at the moment, need a lot of video memory, the A40 is a good choice. The V100 is the king of the previous generation of professional compute cards, and the A100 is the king of the new generation. Cards at this level have no drawback other than being expensive.

[Image: Nvidia RTX 3090]

For more detailed GPU specifications, see:

https://www.techpowerup.com/gpu-specs/

Below are throughput measurements for the 3090 and the A40 on ResNet50 and ViT. All runs are inference only (train=False in the logs), at both float16 and float32.

3090:

>>> ResNet50
Namespace(device=0, model='resnet50', precision='float16', train=False)
Iteration 0, 2294.06 images/s in 0.837s.
Iteration 1, 2391.29 images/s in 0.803s.
Iteration 2, 2396.06 images/s in 0.801s.
Iteration 3, 2394.62 images/s in 0.802s.
Iteration 4, 2402.61 images/s in 0.799s.
Namespace(device=0, model='resnet50', precision='float32', train=False)
Iteration 0, 1453.34 images/s in 1.321s.
Iteration 1, 1490.90 images/s in 1.288s.
Iteration 2, 1491.79 images/s in 1.287s.
Iteration 3, 1493.76 images/s in 1.285s.
Iteration 4, 1494.50 images/s in 1.285s.


>>> ViT Transformer
Namespace(device=0, model='vit_base_patch16_224', precision='float16', train=False)
Iteration 0, 1044.44 images/s in 1.838s.
Iteration 1, 1047.37 images/s in 1.833s.
Iteration 2, 1046.37 images/s in 1.835s.
Iteration 3, 1044.68 images/s in 1.838s.
Iteration 4, 1043.91 images/s in 1.839s.
Namespace(device=0, model='vit_base_patch16_224', precision='float32', train=False)
Iteration 0, 596.59 images/s in 3.218s.
Iteration 1, 599.41 images/s in 3.203s.
Iteration 2, 598.86 images/s in 3.206s.
Iteration 3, 597.92 images/s in 3.211s.
Iteration 4, 597.46 images/s in 3.214s.

A40:

>>> ResNet50
Namespace(device=0, model='resnet50', precision='float16', train=False)
Iteration 0, 1837.41 images/s in 1.045s.
Iteration 1, 1892.04 images/s in 1.015s.
Iteration 2, 1893.29 images/s in 1.014s.
Iteration 3, 1892.99 images/s in 1.014s.
Iteration 4, 1892.73 images/s in 1.014s.
Namespace(device=0, model='resnet50', precision='float32', train=False)
Iteration 0, 1102.49 images/s in 1.742s.
Iteration 1, 1115.45 images/s in 1.721s.
Iteration 2, 1118.49 images/s in 1.717s.
Iteration 3, 1117.32 images/s in 1.718s.
Iteration 4, 1117.80 images/s in 1.718s.


>>> ViT Transformer
Namespace(device=0, model='vit_base_patch16_224', precision='float16', train=False)
Iteration 0, 1155.09 images/s in 1.662s.
Iteration 1, 1153.70 images/s in 1.664s.
Iteration 2, 1152.89 images/s in 1.665s.
Iteration 3, 1150.99 images/s in 1.668s.
Iteration 4, 1150.53 images/s in 1.669s.
Namespace(device=0, model='vit_base_patch16_224', precision='float32', train=False)
Iteration 0, 675.17 images/s in 2.844s.
Iteration 1, 680.69 images/s in 2.821s.
Iteration 2, 679.15 images/s in 2.827s.
Iteration 3, 678.90 images/s in 2.828s.
Iteration 4, 678.21 images/s in 2.831s.

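It can be seen that although the A40 is a professional compute card with large memory, and its single-precision and half-precision compute are both stronger than the 3090's, its measured model performance is not necessarily better, because of its lower memory bandwidth. In the logs above, the 3090 is clearly faster on ResNet50 (about 2,400 vs. 1,890 images/s at float16), while the A40 is somewhat faster on ViT (about 1,150 vs. 1,045 images/s at float16).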
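To put a number on the comparison, the float16 logs above can be parsed and averaged. This is only a small sketch: the helper name `mean_throughput` and the choice to drop iteration 0 as warm-up are my own assumptions, and the log text is copied from the runs above.

```python
import re
from statistics import mean

# Matches lines such as "Iteration 1, 2391.29 images/s in 0.803s."
LINE = re.compile(r"Iteration \d+, ([\d.]+) images/s in [\d.]+s\.")

def mean_throughput(log, skip_warmup=True):
    """Mean images/s over the iterations found in a log excerpt."""
    rates = [float(r) for r in LINE.findall(log)]
    if skip_warmup and len(rates) > 1:
        rates = rates[1:]  # assumption: treat iteration 0 as warm-up
    return mean(rates)

# float16 excerpts copied from the logs above
LOGS_FP16 = {
    ("3090", "resnet50"): """
Iteration 0, 2294.06 images/s in 0.837s.
Iteration 1, 2391.29 images/s in 0.803s.
Iteration 2, 2396.06 images/s in 0.801s.
Iteration 3, 2394.62 images/s in 0.802s.
Iteration 4, 2402.61 images/s in 0.799s.
""",
    ("A40", "resnet50"): """
Iteration 0, 1837.41 images/s in 1.045s.
Iteration 1, 1892.04 images/s in 1.015s.
Iteration 2, 1893.29 images/s in 1.014s.
Iteration 3, 1892.99 images/s in 1.014s.
Iteration 4, 1892.73 images/s in 1.014s.
""",
    ("3090", "vit_base_patch16_224"): """
Iteration 0, 1044.44 images/s in 1.838s.
Iteration 1, 1047.37 images/s in 1.833s.
Iteration 2, 1046.37 images/s in 1.835s.
Iteration 3, 1044.68 images/s in 1.838s.
Iteration 4, 1043.91 images/s in 1.839s.
""",
    ("A40", "vit_base_patch16_224"): """
Iteration 0, 1155.09 images/s in 1.662s.
Iteration 1, 1153.70 images/s in 1.664s.
Iteration 2, 1152.89 images/s in 1.665s.
Iteration 3, 1150.99 images/s in 1.668s.
Iteration 4, 1150.53 images/s in 1.669s.
""",
}

# Ratio of 3090 throughput to A40 throughput, per model
speedup = {
    model: mean_throughput(LOGS_FP16[("3090", model)])
    / mean_throughput(LOGS_FP16[("A40", model)])
    for model in ("resnet50", "vit_base_patch16_224")
}

for model, ratio in speedup.items():
    print(f"{model}: 3090 runs at {ratio:.2f}x the A40's throughput")
```

At float16 this works out to roughly 1.27x for ResNet50 and roughly 0.91x for ViT, so which card wins depends on the model.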

So, to sum up: when buying a graphics card, go for the 3090 if you can!

References:

https://www.autodl.com/docs/gpu_perf/
