Stable Diffusion in Practice: Introducing the SDXL 1.0 Large Model

Series Article Directory

For a detailed introduction to the principles and operation of Stable Diffusion, head to the link below (this article is just the writing template for the rest of the series):
stable diffusion practical operation




Foreword

After the SDXL 0.9 and SDXL Beta iterations, the official SDXL 1.0 release is finally here! Previously, AIGC image generation typically produced 512×512 images that were then upscaled to meet high-resolution output requirements. The problem is that the underlying base model was trained on a large corpus of 512×512 images, so the upscaled results were often unsatisfactory and weak on detail. SDXL 1.0 instead trains its base model directly on a massive set of 1024×1024 images, and splits the work between a base model for text-to-image generation and a refiner model that optimizes and upscales in an image-to-image pass, achieving output quality that rivals Midjourney.



1. What SDXL improves

1. How SDXL differs from the SD 1.5 model

Apart from the size difference between SDXL and the original SD 1.5 model, the biggest difference is that SDXL consists of two models: a base model and a refiner model. You run the base model first, then the refiner. The base model sets the global composition, while the refiner adds finer detail. You can also choose to run only the base model.
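As a rough conceptual sketch (plain Python, not real SDXL code), the two-stage workflow can be pictured as splitting the denoising schedule: the base model handles the first portion to establish composition, then hands its intermediate latent to the refiner, which finishes the remaining steps. The 0.8 split fraction below is illustrative, not an official value.

```python
# Conceptual sketch of the SDXL two-stage pipeline (not real model code).
# The base model denoises the early steps (global composition); the refiner
# takes over for the remaining steps (fine detail). base_fraction=0.8 is
# an illustrative default, not an official constant.

def split_denoising_steps(total_steps: int, base_fraction: float = 0.8):
    """Return (base_steps, refiner_steps) for a two-stage run."""
    cut = round(total_steps * base_fraction)
    steps = list(range(total_steps))
    return steps[:cut], steps[cut:]

base_steps, refiner_steps = split_denoising_steps(40)
print(len(base_steps), len(refiner_steps))  # -> 32 8
```

Running only the base model corresponds to `base_fraction=1.0`, i.e. the refiner gets no steps at all.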

The language model (the module that interprets your prompt) combines the largest OpenCLIP model (ViT-G/14) with OpenAI's proprietary CLIP ViT-L. This is a smart choice: Stable Diffusion v2 used only OpenCLIP, which made successful prompting difficult, and reintroducing OpenAI's CLIP makes prompting easier. Prompts that work on v1.5 work as well or even better on SDXL.
The U-Net, the most important part of the diffusion model, is now three times larger. Combined with the larger language model, SDXL can generate high-quality images that closely match the prompt.
Because the base model is trained at 1024×1024, four times the pixel count of the original 512×512, the base checkpoint is close to 7 GB and the refiner is almost another 7 GB, which places higher demands on hardware (GPU memory). If you have less than 8 GB of VRAM, it is best not to touch SDXL.
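The "four times larger" claim is simple arithmetic, worth making explicit since it drives the VRAM warning. The latent sizes below assume the standard SD VAE downsampling factor of 8 per side:

```python
# Going from 512x512 to 1024x1024 quadruples the pixel count, and the
# U-Net operates on latents that the VAE downsamples by 8x per side.
sd15_pixels = 512 * 512
sdxl_pixels = 1024 * 1024
print(sdxl_pixels // sd15_pixels)  # -> 4 (four times as many pixels)
print(512 // 8, 1024 // 8)         # latent side lengths: 64 vs 128
```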

2. Images look more realistic

Because the semantic understanding of prompts is more accurate and the base model's native resolution is higher, lighting, image quality, lens, angle, focus, and so on are rendered more faithfully. The following is the prompt I used to generate an image directly with the XL base model:

photo of young Chinese woman, highlight hair, sitting outside restaurant, wearing dress,
 rim lighting, studio lighting, looking at the camera, dslr, ultra quality, sharp focus, 
 tack sharp, dof, film grain, Fujifilm XT3, crystal clear, 8K UHD, highly detailed glossy eyes, 
 high detailed skin, skin pores

3. Text rendering is more reliable

Up through SD 1.5, there was no way to control the text inside an image through the prompt. SDXL can now render English words reasonably well, though it still partly depends on luck and flaws are often visible. Even so, it is far better than nothing and a big improvement. Here is the prompt:
A fast food restaurant on the moon with name "zhoulilian"

2. Download and installation

SDXL 1.0 model and VAE download
SD WebUI does not come with the SDXL 1.0 base models; they must be downloaded manually from Hugging Face at these URLs:
https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/resolve/main/sd_xl_refiner_1.0.safetensors
These two files are the base and refiner models, about 7 GB each.
After downloading, place them on the GPU server in the stable-diffusion-webui/models/Stable-diffusion folder. There is also an optional VAE file, which can be downloaded from:
https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors
and placed in the stable-diffusion-webui/models/VAE folder.
Then go to SD WebUI and refresh the checkpoint list; you should see the XL base and refiner models.
Note that the VAE is not displayed in the UI by default.
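The download-and-place steps above can be scripted. Below is a sketch using wget; `WEBUI_DIR` is an assumption, so point it at your own stable-diffusion-webui install. The script only prints the commands (dry run); remove the leading `echo` on each line to actually download (~14 GB total).

```shell
#!/bin/sh
# Sketch: fetch the SDXL 1.0 checkpoints into a stock stable-diffusion-webui
# tree. WEBUI_DIR is an assumption; adjust it to your own install path.
WEBUI_DIR="${WEBUI_DIR:-$HOME/stable-diffusion-webui}"
HF="https://huggingface.co/stabilityai"

# Dry run: print the commands. Remove the leading `echo` to download (~14 GB).
echo wget -c "$HF/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors" \
  -P "$WEBUI_DIR/models/Stable-diffusion"
echo wget -c "$HF/stable-diffusion-xl-refiner-1.0/resolve/main/sd_xl_refiner_1.0.safetensors" \
  -P "$WEBUI_DIR/models/Stable-diffusion"
echo wget -c "$HF/sdxl-vae/resolve/main/sdxl_vae.safetensors" \
  -P "$WEBUI_DIR/models/VAE"
```

The `-c` flag lets wget resume a partially downloaded checkpoint, which is useful for files this large.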


Summary

1. Before the SDXL 1.0 release, you had to install the Demo extension in sd-webui to use SDXL. That is no longer necessary, so if you installed the extension earlier, you can delete it.
2. Generate images directly at 1024×1024 or above rather than at 512×512.
3. Previously downloaded LoRAs, base models, embeddings, and so on cannot be used with SDXL 1.0; you need to download dedicated SDXL versions from Civitai. Many LoRAs have not yet released an XL version, so it may be wise to wait until the ecosystem matures before adopting SDXL as a production tool.
4. The LoRA training tools have also been updated with a corresponding SDXL branch, so if you train LoRAs, remember to switch your training tool to that branch and retrain your own XL LoRAs.


Origin blog.csdn.net/weixin_43360707/article/details/132680125