State-of-the-art (SOTA) deep learning models tend to have a huge memory footprint. Many GPUs simply do not have enough VRAM to train them.
In this article, we test several common GPUs to see which SOTA language and image models each one can train without exceeding its memory, and we benchmark each GPU's training performance. We also offer some advice to engineers who need to purchase and deploy GPUs.
Common GPU models and prices
As of February 2020, the following GPUs can train all SOTA language and image models:
· RTX 8000: 48 GB VRAM, ~$5,500.
· RTX 6000: 24 GB VRAM, ~$4,000.
· Titan RTX: 24 GB VRAM, ~$2,500.
The following GPUs can train most (but not all) SOTA models:
· RTX 2080 Ti: 11 GB VRAM, ~$1,150. *
· GTX 1080 Ti: 11 GB VRAM, ~$800 refurbished. *
· RTX 2080: 8 GB VRAM, ~$720. *
· RTX 2070: 8 GB VRAM, ~$500. *
The following GPU is not suitable for training SOTA models:
· RTX 2060: 6 GB VRAM, ~$359.
* Indicates that training on this GPU requires a reduced mini-batch size, which can reduce model accuracy.
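When VRAM forces a smaller mini-batch, a common workaround is gradient accumulation: process the large "effective" batch as several micro-batches and combine their gradients before a single weight update. The article does not cover this; the toy linear-regression example below is purely illustrative, a minimal pure-Python sketch showing that the accumulated gradient matches the full-batch gradient.

```python
# Gradient accumulation on a toy 1-D linear regression, so the math is
# checkable without a GPU. The "model" is a single weight w with MSE loss.

def grad(w, xs, ys):
    """Full-batch gradient of mean squared error: d/dw mean((w*x - y)^2)."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch):
    """Same gradient, accumulated micro-batch by micro-batch."""
    total, n = 0.0, len(xs)
    for i in range(0, n, micro_batch):
        mb_x, mb_y = xs[i:i + micro_batch], ys[i:i + micro_batch]
        # weight each micro-batch gradient by its share of the full batch
        total += grad(w, mb_x, mb_y) * len(mb_x) / n
    return total

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]
w = 0.0
full = grad(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch=2)
print(abs(full - accum) < 1e-9)  # the two gradients agree
```

Note that this keeps the *optimizer* step identical to large-batch training; it trades memory for extra forward/backward passes, and does not help with batch-size-sensitive layers such as batch normalization.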
Overall GPU recommendations
RTX 2060 (6 GB): for exploring deep learning in your spare time.
RTX 2070 or 2080 (8 GB): for serious deep learning work on a GPU budget of $600-800. 8 GB of VRAM can fit most models.
RTX 2080 Ti (11 GB): for serious deep learning work with a GPU budget of around $1,200. The RTX 2080 Ti is about 40% faster than the RTX 2080.
Titan RTX and Quadro RTX 6000 (24 GB): for working with all kinds of SOTA deep learning models, when the budget does not stretch to an RTX 8000.
Quadro RTX 8000 (48 GB): an investment in the future, capable of training the SOTA deep learning models of 2020.
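As a rough sanity check when matching a GPU's VRAM to a model, the weights and optimizer state alone set a floor on memory. The rule of thumb below is not from the article: it assumes fp32 weights (4 B), gradients (4 B), and Adam's two moment buffers (8 B), i.e. ~16 bytes per parameter, and ignores activations, which often dominate.

```python
# Rough lower bound on training-time VRAM from parameter count alone:
# fp32 weights + gradients + Adam first/second moments = ~16 bytes/parameter.
# Activations are NOT counted and usually add substantially more.

def optimizer_state_gb(num_params, bytes_per_param=16):
    """Approximate GiB needed for weights + gradients + Adam state."""
    return num_params * bytes_per_param / 1024**3

# BERT-large has roughly 340 million parameters (assumed figure, for illustration).
print(round(optimizer_state_gb(340e6), 1))  # ≈ 5.1 GiB before activations
```

With activations included, this is why an 8 GB card can struggle with large language models even at batch size 1.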
Image models
Maximum batch size that fits in memory (* indicates the GPU does not have enough memory to run the model).
Performance, measured in images processed per second (* indicates the GPU does not have enough memory to run the model).
Language models
Maximum batch size that fits in memory (* indicates the GPU does not have enough memory to run the model).
Performance (* indicates the GPU does not have enough memory to run the model).
Results normalized to the Quadro RTX 8000
Figure 2. Training throughput normalized to the Quadro RTX 8000. Left: image models. Right: language models.
Conclusions
Language models benefit more from larger GPU memory than image models do. Note that the right-hand chart is steeper than the left: this indicates that language models are more memory-bound, while image models are more compute-bound.
GPUs with more VRAM deliver better performance, since larger batch sizes help saturate the CUDA cores.
GPUs with more VRAM support proportionally larger batch sizes. Back-of-the-envelope reasoning yields a plausible result: a GPU with 24 GB of VRAM can fit batches roughly three times larger than a GPU with 8 GB.
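The proportionality can be made concrete with a simplified memory model. The numbers below (fixed model footprint, constant per-sample cost) are hypothetical and not from the article; real batch limits also depend on activations, framework overhead, and fragmentation.

```python
# Simplified model: batch size is limited by the VRAM left over after the
# model's own footprint, divided by a constant per-sample activation cost.

def max_batch_size(vram_gb, model_gb, per_sample_gb):
    """Largest batch that fits: (free memory) / (memory per sample)."""
    return int((vram_gb - model_gb) / per_sample_gb)

# Hypothetical numbers, for illustration only.
model_gb, per_sample_gb = 0.5, 0.25
b24 = max_batch_size(24, model_gb, per_sample_gb)  # 94
b8 = max_batch_size(8, model_gb, per_sample_gb)    # 30
print(b24, b8, round(b24 / b8, 2))  # ratio ≈ 3.13, i.e. roughly 3x
```

The fixed model footprint makes the ratio slightly larger than the raw 24/8 = 3, which matches the "roughly three times" observation above.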
For long sequences, language models are disproportionately memory-intensive, because attention cost is a quadratic function of sequence length.
Related image and language model resources
Image models
Language models
Original article (the GitHub links shown in the figures are accessible there):
https://lambdalabs.com/blog/choosing-a-gpu-for-deep-learning/