The Principles and Use of the Stable Diffusion Web UI

Stable Diffusion is a family of image-generation solutions built on diffusion-model technology. Thanks to continuous improvements to the technique and to engineering optimizations across the industry, it can now run on a personal computer. This article walks through how to use the Stable Diffusion Web UI to generate AI images, starting from downloading it from GitHub.

1. About Diffusion Models

1.1 Concept introduction

Stable Diffusion is an application of the diffusion model, so let's first look at what a diffusion model is.

Before diffusion models emerged, GANs (Generative Adversarial Networks) were the popular approach. A GAN consists of a generator and a discriminator.

But GANs have drawbacks. First, the many intermediate steps in which the generator and discriminator co-evolve are a complete black box and cannot be debugged. Second, GANs are notoriously hard to train and prone to problems such as mode collapse.

The diffusion model is not a brand-new technology; it is better seen as steady progress in a different technical direction. Compared with a GAN, diffusion breaks generation into individual steps and trains on each of them repeatedly.

It is like drawing a stick figure: a GAN has the generator and discriminator evolve against each other until a finished stick figure emerges, whereas diffusion breaks the drawing into individual steps, trains each one, and finally reproduces the whole sequence.

1.2 The principle

Diffusion is a model that generates images through a process of adding noise and then removing it. Assume that going from an image so full of noise that nothing is recognizable to a completely clean image takes 1000 denoising steps.

In the training phase, the diffusion model randomly selects one of those 1000 steps, adds the corresponding amount of noise to the image, and then denoises it through the network. The network is trained to predict the noise that was added, and the loss used in backpropagation compares the predicted noise with the actual noise. Adding and removing noise are not simple pixel addition and subtraction: the network estimates the noise component, which is then subtracted back out. We won't go deeper into the details here.
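The training step just described can be sketched in NumPy. The closed-form jump to a random step t and the noise-prediction loss follow the standard DDPM formulation; the UNet itself is replaced by a zero stand-in, and the schedule values are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy linear noise schedule over T steps (common DDPM defaults shown).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # cumulative product: how much signal survives by step t

def add_noise(x0, t):
    """Forward process: jump straight to step t in closed form,
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

# One "training step": pick a random t, noise the image, and score how
# well the network's predicted noise matches the true noise eps.
x0 = rng.standard_normal((8, 8))         # stand-in for an image
t = int(rng.integers(0, T))
xt, eps = add_noise(x0, t)

eps_pred = np.zeros_like(eps)            # stand-in for the UNet's output
loss = np.mean((eps - eps_pred) ** 2)    # MSE between true and predicted noise
print(t, loss)
```

At generation time this runs in reverse: starting from pure noise, the trained network's noise estimate is subtracted out step by step.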

This denoising is performed by a UNet, a convolutional network with skip connections, named for its U-shaped structure. The prompt we type in is first encoded by the CLIP text encoder and then injected into the denoising process as conditioning.

Later, Latent Diffusion Models appeared: a VAE encodes the original image into a much smaller latent-space representation, diffusion runs in that latent space, and the final latent is decoded back into a full-size image by the VAE. This greatly reduces memory usage and laid the foundation for the later Stable Diffusion.
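The memory savings are easy to quantify. Stable Diffusion's VAE downsamples each spatial dimension by a factor of 8 and uses 4 latent channels, so for a 512x512 RGB image the UNet works on far fewer values:

```python
# Pixel space: a 512x512 RGB image diffused directly.
pixel_elems = 512 * 512 * 3

# Latent space: the VAE downsamples by 8x in each dimension and uses
# 4 channels, so the UNet denoises a 64x64x4 tensor instead.
latent_elems = 64 * 64 * 4

print(pixel_elems // latent_elems)  # -> 48, i.e. 48x fewer elements per image
```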

2. Stable Diffusion Web UI

Next comes the practical part: Stable Diffusion itself and the third-party open-source Web UI repository.

2.1 Introduction to Web UI

If you search for StableDiffusion on GitHub, three repositories appear at the top.

Stable Diffusion was developed by CompVis, Stability AI, and LAION, so the CompVis and Stability-AI repositories are, in theory, the official ones.

stable-diffusion-webui is an unofficial open-source project, but it is the repository we will use below. It genuinely works out of the box: no CUDA configuration, no strange errors, and it even downloads the base model for you automatically.

2.2 Download and configuration

2.2.1 Start Stable Diffusion

First, download the Stable Diffusion Web UI from the AUTOMATIC1111 repository; it will automatically download Stable Diffusion and the base model:
https://github.com/AUTOMATIC1111/stable-diffusion-webui

Follow the repository's setup instructions and finally run webui-user.bat.

Once webui-user.bat has finished installing and downloading everything, you can open the Stable Diffusion interface at http://127.0.0.1:7860. The base model is shown in the upper-left corner.
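Besides the browser interface, the web UI also exposes a REST API when launched with the --api flag (added to COMMANDLINE_ARGS in webui-user.bat). Below is a minimal sketch of calling its txt2img endpoint; the fields shown are a small subset of what the endpoint accepts, and the defaults here (20 steps, 512x512) are illustrative choices:

```python
import base64
import json
import urllib.request

# Local web UI address; only reachable after launching with --api.
URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

def build_payload(prompt, steps=20, width=512, height=512):
    """Minimal txt2img request body; omitted fields fall back to UI defaults."""
    return {"prompt": prompt, "steps": steps, "width": width, "height": height}

def txt2img(prompt):
    """POST the payload and decode the base64 images in the response."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return [base64.b64decode(img) for img in result["images"]]

if __name__ == "__main__":
    # Build (but do not send) a request body, to show its shape.
    payload = build_payload("a watercolor fox, masterpiece")
    print(json.dumps(payload))
```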

2.2.2 Installing large models

Checkpoint models such as SDXL are considered large models, while Lora and ControlNet models are considered small models. A small model depends on the version of the large model it was trained against; if the large model's version changes, the small model will fail and throw errors.

To download large models, you need to look in other repositories.

If you don't want the hassle, you can also download large models from these sites:
https://rentry.org/sdmodels
https://civitai.pro/

2.2.3 Installing plugins

The Stable Diffusion web UI can also be extended with plug-ins, such as the popular Lora and ControlNet. Neither was originally developed as an SD plug-in; both started as serious academic papers and were later adapted into Stable Diffusion plug-in versions.

The full ControlNet 1.1 model set can be downloaded from Hugging Face:
https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main

There is also the "little monk" lip-sync effect that has recently become popular on short-video platforms, generated with the SadTalker plug-in:
https://github.com/OpenTalker/SadTalker

Origin: blog.csdn.net/grayrail/article/details/132248058