How to choose a graphics card for Stable Diffusion AI training?

Stable Diffusion, mainly used to generate images from text, is a growing application of artificial intelligence in the content creation industry. Running it on your local computer requires a powerful GPU: a faster GPU generates images more quickly, and a GPU with more VRAM can create higher-resolution images. So, what is the best consumer GPU for Stable Diffusion? Let's look at Stable Diffusion performance on some GPUs from NVIDIA and AMD to find out.

About Stable Diffusion

What is Stable Diffusion?

Stable Diffusion is a machine learning model that is increasingly used in content creation because it can generate images from text prompts. It is unusual in that it has no commercially developed software and instead relies on various open-source applications. Also, unlike similar text-to-image models, it is usually run locally rather than through an online web service.

Stable Diffusion can run on mid-range GPUs with at least 8GB of VRAM. However, it greatly benefits from a powerful modern GPU with more VRAM.

Stable Diffusion implementations

You can directly use the version of Stable Diffusion developed by Stability AI and Runway. However, most people use web-based interfaces created by third parties. The most commonly used implementations are:

  • Automatic 1111: This is primarily for use with NVIDIA GPUs, although forks exist for AMD and Apple Silicon. It allows you to use xformers, which can significantly increase the performance of NVIDIA GPUs.
  • SHARK: SHARK is an alternative to Automatic 1111. It natively supports NVIDIA and AMD GPUs. However, AMD GPUs tend to perform better with SHARK than with Automatic 1111, while NVIDIA GPUs tend to perform worse.
  • Custom implementations: Because Stable Diffusion is publicly available, some people build their own applications with exactly the functionality they need.

Each implementation has unique strengths and weaknesses in terms of functionality and usability. From a performance and benchmarking standpoint, Automatic 1111 and SHARK are recommended. Which one to use depends on the GPU you want to test: Automatic 1111 is the primary choice for NVIDIA GPUs and SHARK for AMD GPUs.

Note: Stable Diffusion is constantly being updated, so performance may vary depending on the version you are using.

What affects the performance of Stable Diffusion?

First, Stable Diffusion settings and model

The most commonly tuned settings, such as the prompt, negative prompt, CFG scale, and seed, have no noticeable impact on performance. Generating an image of a dog takes the same amount of time as generating a mountain landscape. Even the chosen model often makes only a small difference in generation time: images with different prompts and CFG scales generate in almost the same time.

Other settings, such as the number of steps, resolution, and sampling method, do affect the performance of Stable Diffusion.

  • Steps: Adjusting the number of steps changes how long an image takes to generate, but not the processing speed in iterations per second. Although many users choose 20 to 50 steps, increasing the number of steps to around 200 tends to produce more consistent results from run to run.
  • Resolution: Image resolution has the greatest impact on performance, and it also determines the amount of VRAM required to generate the image. For benchmarking purposes, you can use 512×512 resolution to ensure compatibility with various GPU models.
  • Sampling method (Euler, DPM, etc.): This can significantly affect generation time. "Euler" and "Euler a" are the most widely used and tend to give the best performance, while other methods (such as DPM2) tend to take about twice as long. For GPU benchmarking purposes, it is recommended to stick to the Euler variants for consistency.
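The relationship between steps, sampler, and generation time described above can be sketched roughly as follows. This is only an illustration: the function name, the 10 it/s figure, and the flat cost multipliers are assumptions made for the example (the 2.0 value simply mirrors the "about twice as long" remark), not measured values.

```python
# Rough sampler cost multipliers relative to Euler. These are illustrative
# assumptions based on the "about twice as long" figure, not benchmarks.
SAMPLER_COST = {"Euler": 1.0, "Euler a": 1.0, "DPM2": 2.0}


def estimate_seconds(steps: int, euler_its_per_second: float, sampler: str = "Euler") -> float:
    """Estimate the time to generate one image.

    euler_its_per_second is the speed measured with an Euler sampler at the
    target resolution; slower samplers are modeled as a flat multiplier.
    """
    if steps <= 0 or euler_its_per_second <= 0:
        raise ValueError("steps and euler_its_per_second must be positive")
    return steps / euler_its_per_second * SAMPLER_COST[sampler]


if __name__ == "__main__":
    # Hypothetical GPU that reaches 10 it/s with Euler at 512x512.
    for steps in (20, 50, 200):
        print(steps, "steps:", estimate_seconds(steps, 10.0), "s")
```

Under these assumptions, more steps lengthen the run in proportion while the it/s figure itself stays fixed, which is why it/s is the number to compare across GPUs.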

Second, hardware

  • GPU: The GPU has the greatest impact on generation speed and on the image resolution you can reach. More powerful GPUs with higher memory bandwidth and more VRAM can generate Stable Diffusion images faster, especially at higher resolutions. The amount of VRAM on the GPU determines the highest resolution image that can be generated. At least 8GB is recommended, and 12GB or more is required for higher resolutions.
  • CPU: While the GPU handles most of the heavy lifting, a fast CPU can still improve performance to a lesser extent. A CPU with a higher clock speed and more cores can provide a small boost.
  • RAM: System memory helps feed data to the GPU, so at least 16GB of RAM is recommended. More RAM (32GB or 64GB) can further increase speed.
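As a quick way to apply the memory guidance above, here is a minimal sketch that checks a machine's VRAM and RAM against the thresholds mentioned in this article (8GB and 12GB of VRAM, 16GB of RAM). The function and constant names are hypothetical, and the values must be supplied by hand; the sketch does not query the hardware.

```python
from typing import List

# Thresholds taken from the guidance in this article.
MIN_VRAM_GB = 8        # recommended minimum VRAM
HIGH_RES_VRAM_GB = 12  # VRAM needed for higher resolutions
MIN_RAM_GB = 16        # recommended minimum system RAM


def assess_hardware(vram_gb: float, ram_gb: float) -> List[str]:
    """Return a list of notes; an empty list means no concerns were found."""
    notes = []
    if vram_gb < MIN_VRAM_GB:
        notes.append("VRAM is below the recommended 8GB minimum")
    elif vram_gb < HIGH_RES_VRAM_GB:
        notes.append("VRAM is enough to run, but 12GB+ is needed for higher resolutions")
    if ram_gb < MIN_RAM_GB:
        notes.append("system RAM is below the recommended 16GB")
    return notes


if __name__ == "__main__":
    for note in assess_hardware(vram_gb=8, ram_gb=16):
        print("-", note)
```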

Best GPU for Stable Diffusion

To understand which consumer GPUs are best suited for Stable Diffusion, we will examine their performance in the latest public releases of the two most popular implementations.

Many Stable Diffusion implementations report their speed in "iterations per second", or "it/s", which makes this metric a common and convenient measure of performance. Iterations per second is calculated by dividing the number of iterations by the number of seconds required to generate the image. For example, if it takes 15 seconds to generate an image with 200 iterations, the speed is approximately 13.3 it/s (that is, 200 iterations divided by 15 seconds).
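The calculation above is simple enough to sketch in a few lines. The `benchmark` helper and its `run` argument are hypothetical: `run` stands for whatever call triggers one image generation in your setup and is not part of any particular implementation's API.

```python
import time
from typing import Callable


def iterations_per_second(iterations: int, seconds: float) -> float:
    """Divide the number of iterations by the elapsed time in seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return iterations / seconds


def benchmark(run: Callable[[], None], iterations: int) -> float:
    """Time one call to run(), assumed to perform `iterations` sampling steps."""
    start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - start
    return iterations_per_second(iterations, elapsed)


if __name__ == "__main__":
    # The worked example from the text: 200 iterations in 15 seconds.
    print(round(iterations_per_second(200, 15), 1))  # 13.3
```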

First, let's take a look at the benchmark results from Puget Systems, which tested NVIDIA's 4000-series GPUs, the top-of-the-line GPUs from the last three NVIDIA generations, and two AMD GPUs, the RX 7900 XTX and RX 6900 XT.

Automatic 1111 performance

Automatic 1111 is the most commonly used implementation of Stable Diffusion and generally provides the best performance on NVIDIA GPUs.

NVIDIA is significantly ahead of AMD here. Among the NVIDIA GPUs, the RTX 4090 is the winner, delivering the highest performance in Automatic 1111. Even the RTX 3060 Ti is twice as fast as the Radeon GPUs, and only the GTX 1080 Ti is slower than the RX 7900 XTX.

The newer 4000-series GPUs offer a clear advantage in image generation speed, with performance scaling roughly linearly with price. For example, the RTX 4070 Ti was around 5% faster than the previous generation's flagship, the RTX 3090 Ti, and the RTX 4060 Ti was almost 43% faster than the RTX 3060 Ti. If you still have a 2000 or 1000 series GPU, even a mid-range 4000 series GPU can provide a significant performance boost.

SHARK performance

Although SHARK is not as commonly used as Automatic 1111, many AMD users prefer it, and the benchmark results make it clear why.

The performance of the RX 7900 XTX quadrupled with SHARK, reaching a similar number of iterations per second to the RTX 4090 running Automatic 1111. The RX 6900 XT sees an even bigger performance boost of 1,100%, but that only makes it competitive with the lower-end NVIDIA GPUs tested.

With SHARK, NVIDIA GPUs performed about 30% worse than with Automatic 1111, although their relative performance remained the same.

Important note: Choosing the right Stable Diffusion implementation for your GPU is very important, as it can greatly affect performance, anywhere from a 30% reduction to a massive 1,100% increase! The GTX 1080 Ti results bear this out: in the Puget Systems test, it failed to run SHARK.
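Percentage changes this large are easy to misread, so it may help to convert them to plain speed multipliers. The sketch below does only that arithmetic; the function names are hypothetical.

```python
def multiplier_from_percent_change(percent: float) -> float:
    """Convert a percentage change (e.g. +1100 or -30) to a speed multiplier."""
    return 1 + percent / 100


def percent_change_from_multiplier(multiplier: float) -> float:
    """Convert a speed multiplier (e.g. 4 for 'quadrupled') to a percentage change."""
    return (multiplier - 1) * 100


if __name__ == "__main__":
    print(multiplier_from_percent_change(1100))  # a 1,100% increase
    print(multiplier_from_percent_change(-30))   # a 30% reduction
    print(percent_change_from_multiplier(4))     # performance that quadrupled
```

In other words, a 1,100% increase means running at 12 times the original speed, a 30% reduction means 0.7 times, and a quadrupling corresponds to a 300% increase.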

Summary

What stands out the most is the huge difference in performance between the various Stable Diffusion implementations. NVIDIA GPUs provide maximum performance on Automatic 1111, while AMD GPUs work best on SHARK. In their respective best implementations, the top GPUs from each vendor perform similarly.

If you haven't decided on a particular implementation, both NVIDIA's and AMD's high-end GPUs offer excellent performance. The GeForce RTX 4090 and Radeon RX 7900 XTX both deliver around 21 it/s in their preferred implementation of Stable Diffusion.

It's worth noting that Stable Diffusion is an evolving model with an evolving set of tools. The way things work today may be very different from how they worked a few months ago or how they will work in the future, so the performance results in this article may change over time. Please treat these benchmark results as informational only.

If you're interested in testing the performance of your current implementation of Stable Diffusion on a top-of-the-line GPU like the RTX 4090, check out our service below.

Zanqi Cloud Workstation - Cloud Service Platform of Stable Diffusion

Stable Diffusion is primarily designed for single-GPU use; however, with some additional software and configuration, it can take advantage of multiple GPUs. Spreading the work across multiple GPUs can increase overall throughput. While most Stable Diffusion implementations run on a single GPU by default, one commonly used implementation (Automatic 1111) has the option to enable multi-GPU support with minimal additional configuration.
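One simple way to picture spreading work across GPUs is to give each GPU its own share of the prompts to generate. The sketch below shows only that scheduling idea, assuming each GPU generates whole images independently; this is an assumption made for illustration, not a description of how Automatic 1111 or any other implementation distributes work, and the function name is hypothetical.

```python
from typing import Dict, List


def assign_round_robin(prompts: List[str], gpu_count: int) -> Dict[int, List[str]]:
    """Distribute prompts across GPU indices 0..gpu_count-1 in round-robin order."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    assignments: Dict[int, List[str]] = {gpu: [] for gpu in range(gpu_count)}
    for index, prompt in enumerate(prompts):
        assignments[index % gpu_count].append(prompt)
    return assignments


if __name__ == "__main__":
    jobs = ["a dog", "a mountain landscape", "a city at night"]
    for gpu, batch in assign_round_robin(jobs, 2).items():
        print("GPU", gpu, "->", batch)
```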

The more computing power available to Stable Diffusion, the faster images are generated, and the more video memory, the higher the resolution that can be set. A typically configured computer may therefore struggle to run Stable Diffusion, in which case Zanqi Cloud Workstation is worth considering: ready to use, available on demand, and an efficient aid to design work.

Zanqi Cloud Workstation requires no complicated installation or deployment. You can use a machine with an industry-leading configuration anytime, anywhere, and produce high-quality, stable output while reducing local setup time and cost, without worrying about your computer freezing or being unable to run the software.

 

 


Origin blog.csdn.net/XDEMO_/article/details/132454910