GPU comparison for large-model training
The A100 is the first choice for large-model training, the A40 is typically used for inference, and the H100 has been launched as the next-generation replacement for the A100.
Can the 4090 be used to train large models?
The 4090 is not practical for training large models, but for inference/serving it is not only feasible, it is even slightly more cost-effective than the H100. In fact, the biggest differences between the H100/A100 and the 4090 are interconnect bandwidth and memory; raw compute is not far apart.
|  | H100 | A100 | 4090 |
| --- | --- | --- | --- |
| Tensor FP16 compute | 989 Tflops | 312 Tflops | 330 Tflops |
| Tensor FP32 compute | 495 Tflops | 156 Tflops | 83 Tflops |
| Memory capacity | 80 GB | 80 GB | 24 GB |
| Memory bandwidth | 3.35 TB/s | 2 TB/s | 1 TB/s |
| Communication bandwidth | 900 GB/s | 900 GB/s | 64 GB/s |
| Communication latency | ~1 us | ~1 us | ~10 us |
| Price | $30,000~$40,000 | $15,000 | $1,600 |
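To see why communication bandwidth dominates the training question, a back-of-the-envelope sketch helps. The snippet below estimates how long one gradient synchronization would take at each card's interconnect bandwidth from the table. The 7B-parameter model size and the ~2x traffic factor for a ring all-reduce are illustrative assumptions, not measurements from the source:

```python
# Rough estimate of per-step gradient all-reduce time, using the
# communication bandwidths from the table above.
# Assumptions (not from the source): a 7B-parameter model, FP16
# gradients, and ~2x the payload crossing each link in a ring all-reduce.

PARAMS = 7e9          # assumed model size: 7B parameters
BYTES_PER_PARAM = 2   # FP16 gradients

payload_gb = PARAMS * BYTES_PER_PARAM / 1e9  # 14 GB of gradients

bandwidth_gbps = {
    "H100 (NVLink)": 900,
    "A100 (NVLink)": 900,
    "4090 (PCIe)":   64,
}

for gpu, bw in bandwidth_gbps.items():
    seconds = 2 * payload_gb / bw  # ring all-reduce moves ~2x the payload
    print(f"{gpu}: ~{seconds:.2f} s per gradient sync")
```

Under these assumptions the NVLink cards sync gradients in a few tens of milliseconds, while the 4090 needs on the order of half a second per step, which is why the 4090's 64 GB/s link, not its compute, rules it out for multi-GPU training while leaving single-GPU inference unaffected.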