How to choose a GPU for your projects: a summary of GPU models and their features

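Choosing the right GPU is very important. The GPU performs the bulk of a deep learning model's computation, so its video memory and its FP32/FP16 throughput largely decide which models you can run and how fast they train. The rest of the system, in turn, has to process and deliver data faster than the training throughput, or the GPU sits idle waiting for input.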
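A minimal sketch of the throughput requirement above: data has to be prepared at least as fast as training consumes it. The function names and the example numbers are hypothetical, chosen only for illustration; in practice you would measure both rates on your own setup.

```python
def train_throughput(batch_size: int, step_seconds: float) -> float:
    """Samples per second the training loop consumes."""
    return batch_size / step_seconds


def gpu_busy_fraction(load_samples_per_sec: float,
                      train_samples_per_sec: float) -> float:
    """Rough upper bound on how busy the GPU can be kept.

    If data arrives slower than training consumes it, the GPU
    waits; the ratio of the two rates caps its utilization.
    """
    return min(1.0, load_samples_per_sec / train_samples_per_sec)


# Hypothetical numbers: batches of 64 samples, 0.125 s per training step.
need = train_throughput(batch_size=64, step_seconds=0.125)  # 512 samples/s
# Hypothetical data pipeline delivering 384 samples/s.
busy = gpu_busy_fraction(load_samples_per_sec=384.0,
                         train_samples_per_sec=need)
print(f"training needs {need:.0f} samples/s, GPU busy at most {busy:.0%}")
```

With these made-up numbers the data pipeline is the bottleneck, so a faster GPU alone would not shorten training.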

Here's a rundown of some GPU models:

  • Tesla P40: 24GB of video memory, single-precision (FP32) 11.76 TFLOPS, half-precision (FP16) 11.76 TFLOPS. An older Pascal-architecture GPU, and a very good choice for algorithms that predate CUDA 11.x and need a lot of video memory.
  • TITAN Xp: 12GB of video memory, single-precision (FP32) 12.15 TFLOPS, half-precision (FP16) 12.15 TFLOPS. An older Pascal-architecture GPU, well suited to getting started.
  • 1080Ti: 11GB of video memory, single-precision 11.34 TFLOPS, half-precision 11.34 TFLOPS. The same generation as the TITAN Xp and also suitable for beginners, but 11GB of video memory occasionally falls just short.
  • 2080Ti: 11GB of video memory, single-precision 13.45 TFLOPS, half-precision 53.8 TFLOPS. A Turing-architecture GPU with decent performance. Among older-generation GPUs it is well suited to mixed-precision computing, and it is cost-effective.
  • V100: 16/32GB of video memory, single-precision 15.7 TFLOPS, half-precision 125 TFLOPS. The previous generation's top professional compute card. Its high half-precision performance makes it well suited to mixed-precision computing.
  • 3060: 12GB of video memory, single-precision 12.74 TFLOPS, half-precision about 24 TFLOPS. If the 1080Ti's video memory falls just short, the 3060 is a good choice. Suitable for beginners. Requires CUDA 11.x.
  • A4000: 16GB of video memory, single-precision 19.17 TFLOPS, half-precision about 76 TFLOPS. Video memory and compute are relatively balanced, making it suitable for more advanced work. Requires CUDA 11.x.
  • 3080Ti: 12GB of video memory, single-precision 34.10 TFLOPS, half-precision about 70 TFLOPS. A performance powerhouse, and a very good choice if your video memory requirements are modest. Requires CUDA 11.x.
  • A5000: 24GB of video memory, single-precision 27.77 TFLOPS, half-precision about 117 TFLOPS. A performance powerhouse. If the 3080Ti's video memory is not enough for you, the A5000 is a suitable choice, and its high half-precision compute suits mixed-precision training. Requires CUDA 11.x.
  • 3090: 24GB of video memory, single-precision 35.58 TFLOPS, half-precision about 71 TFLOPS. It can be regarded as a larger-memory version of the 3080Ti. Performance and video memory are both ample, it is very versatile, and it is the first choice for value. Requires CUDA 11.x.
  • A40: 48GB of video memory, single-precision 37.42 TFLOPS, half-precision 149.7 TFLOPS. It can be regarded as a larger-memory version of the 3090. Its compute is basically the same as the 3090's, so choose between them by video memory size. Requires CUDA 11.x.
  • A100 SXM4: 40/80GB of video memory, single-precision 19.5 TFLOPS, half-precision 312 TFLOPS. The new generation's top professional compute card, with no drawbacks other than its price. It has a lot of video memory and is very well suited to half-precision computing. Thanks to NVLink, its multi-GPU parallel speedup ratio is very high. Requires CUDA 11.x.
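The list above can be turned into a small lookup to narrow down candidates. This is only a sketch under a few assumptions: the figures are copied from the list (TFLOPS, with the "about" values taken as given), cards sold in two memory sizes are recorded with the larger one, and the names `Gpu`, `GPUS`, and `candidates` are made up for this example.

```python
from typing import List, NamedTuple


class Gpu(NamedTuple):
    name: str
    mem_gb: int         # video memory in GB (largest variant if two exist)
    fp32: float         # single-precision TFLOPS
    fp16: float         # half-precision TFLOPS
    needs_cuda11: bool  # True if the list says CUDA 11.x is required


GPUS = [
    Gpu("Tesla P40", 24, 11.76, 11.76, False),
    Gpu("TITAN Xp", 12, 12.15, 12.15, False),
    Gpu("1080Ti", 11, 11.34, 11.34, False),
    Gpu("2080Ti", 11, 13.45, 53.8, False),
    Gpu("V100", 32, 15.7, 125.0, False),
    Gpu("3060", 12, 12.74, 24.0, True),
    Gpu("A4000", 16, 19.17, 76.0, True),
    Gpu("3080Ti", 12, 34.10, 70.0, True),  # the 3080 Ti ships with 12 GB
    Gpu("A5000", 24, 27.77, 117.0, True),
    Gpu("3090", 24, 35.58, 71.0, True),
    Gpu("A40", 48, 37.42, 149.7, True),
    Gpu("A100 SXM4", 80, 19.5, 312.0, True),
]


def candidates(min_mem_gb: int, cuda11_ok: bool = True) -> List[str]:
    """Names of GPUs with enough memory, best half-precision first.

    Set cuda11_ok=False when the code to run predates CUDA 11.x;
    cards that the list marks as requiring CUDA 11.x are then skipped.
    """
    fits = [
        g for g in GPUS
        if g.mem_gb >= min_mem_gb and (cuda11_ok or not g.needs_cuda11)
    ]
    fits.sort(key=lambda g: g.fp16, reverse=True)
    return [g.name for g in fits]


# Older algorithm that needs 24 GB and cannot use CUDA 11.x.
print(candidates(24, cuda11_ok=False))
# Anything with at least 40 GB.
print(candidates(40))
```

Sorting by half-precision throughput reflects the list's emphasis on mixed-precision training; sort by `fp32` instead if your workload runs in single precision.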

Origin blog.csdn.net/m0_73939236/article/details/132042884