This article introduces the practical use of Stable Diffusion WebUI, covering prompt derivation, LoRA models, VAE models, and ControlNet, and gives hands-on text-to-image (txt2img) and image-to-image (img2img) examples. It is aimed at readers who are interested in Stable Diffusion but unsure how to use Stable Diffusion WebUI. I hope this article lowers the learning cost of Stable Diffusion WebUI and helps you experience the charm of AIGC image generation sooner.
Introduction
Stable Diffusion (SD for short) is a deep-learning text-to-image generation model. Stable Diffusion WebUI is a tool that wraps the Stable Diffusion model and provides a graphical interface. The models loaded into Stable Diffusion WebUI are typically retrained from a Stable Diffusion base model in order to achieve higher-quality results in a particular style. Currently, Stable Diffusion 1.5 is the most popular base model in the community.
▐ Installation
For the installation of sd web-ui, please refer to: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-NVidia-GPUs
SD WebUI uses the gradio package. When gradio is configured with share=True, it creates an frpc tunnel that connects to AWS (for details, see https://www.gradio.app/guides/sharing-your-app). Therefore, when starting SD WebUI, consider whether to disable the share=True configuration or delete the frpc client, depending on your own security and privacy requirements.
▐ Models
https://civitai.com/ is an open SD model community that offers a wealth of models for free download and use. A brief overview of model categories here will help when using SD WebUI. SD model training methods fall into four main categories: Dreambooth, LoRA, Textual Inversion, and Hypernetwork.
Dreambooth: trained on top of an SD base model, a Dreambooth model is a complete new model. Training is slow and the resulting file is large, generally several GB, in safetensors or ckpt format. Such models tend to draw well and show clear improvements in particular artistic styles. As shown in the figure below, this type of model is selected from the checkpoint dropdown in SD WebUI.
LoRA: a lightweight fine-tuning method that adjusts an existing large model to output people or objects with fixed characteristics. It produces good results for specific styles, trains quickly, and yields small files, usually tens to a little over a hundred MB. A LoRA model cannot be used on its own and must be paired with its original large model. SD WebUI supports LoRA models both through a plugin and natively; for details, see "Operating procedures -> LoRA model" in this article.
Textual Inversion: a method of fine-tuning a model using text prompts paired with pictures of a given style. The text prompts are generally special words; after training, using these words in a prompt reproduces the learned style and controls details. The result must be used together with the original large model.
Hypernetwork: a LoRA-like method for fine-tuning a large model; it must also be used together with the original large model.
Operating procedures
▐ Prompt derivation
Upload an image in SD WebUI.
Keywords can be derived in reverse from the image using one of two models, CLIP or DeepBooru. Take Figure 1 as an example:
Figure 1: A high-definition photo taken with the native iPhone 14 Pro Max camera
The result of prompt inversion using CLIP:
a baby is laying on a blanket surrounded by balloons and balls in the air and a cake with a name on it, Bian Jingzhao, phuoc quan, a colorized photo, dada
The result of prompt inversion using DeepBooru:
1boy, ball, balloon, bubble_blowing, chewing_gum, hat, holding_balloon, male_focus, military, military_uniform, open_mouth, orb, solo, uniform, yin_yang
CLIP inverts the image into a sentence, while DeepBooru inverts it into a list of keywords.
The forward (positive) prompt can be edited, and a reverse (negative) prompt can also be added. The negative prompt restricts the model from including the listed elements in generated images; it is optional and can be left empty.
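As a quick illustration of SD WebUI's prompt format, the sketch below assembles positive and negative prompt strings using the (token:weight) attention syntax that appears in the examples later in this article. weight_token and build_prompt are hypothetical helpers for illustration, not part of SD WebUI:

```python
def weight_token(token: str, weight: float = 1.1) -> str:
    """Wrap a token in SD WebUI attention syntax: (token:weight)."""
    return f"({token}:{weight})"

def build_prompt(tokens, weighted=None):
    """Join plain tokens and (token, weight) pairs into one prompt string."""
    parts = list(tokens)
    for token, weight in (weighted or []):
        parts.append(weight_token(token, weight))
    return ", ".join(parts)

# A positive prompt with one weighted term and a purely weighted negative prompt.
positive = build_prompt(["1boy", "hat", "smile"], weighted=[("full body", 1.2)])
negative = build_prompt([], weighted=[("low quality", 1.3), ("worst quality", 1.3)])
print(positive)  # 1boy, hat, smile, (full body:1.2)
print(negative)  # (low quality:1.3), (worst quality:1.3)
```

Keeping prompts as structured data like this makes it easy to tweak weights between runs instead of hand-editing long strings.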
▐ LoRA model
A LoRA model strongly steers or enhances the style and quality of a large model's output, but it must be used together with its matching large model and cannot be used alone. There are two main ways to use LoRA models in SD WebUI:
Method One
Install the additional-networks plugin (GitHub: https://github.com/kohya-ss/sd-webui-additional-networks); it can be downloaded and installed directly from the Extensions tab in SD WebUI. The plugin only supports LoRA models trained with the sd-scripts toolkit, but most open-source LoRA models on https://civitai.com/ are trained with that script, so the plugin covers the majority of them. The downloaded LoRA model needs to be placed under
*/stable-diffusion-webui/extensions/sd-webui-additional-networks/models/lora
After adding a new model, restart SD WebUI. Once the plugin and model load correctly, an "Additional Networks (LoRA plugin)" panel appears in the lower-left corner of the WebUI. To trigger the LoRA during generation, select the LoRA model in the plugin and add its trigger words to the positive prompt. In the figure below, the selected LoRA model is blindbox_v1_mix and the trigger words are full body, chibi. Each LoRA model has its own trigger words, which are noted in the model's description.
If clicking Install has no effect, or an error about a flag is reported, extension installation was disabled when the WebUI started. Add the startup flag --enable-insecure-extension-access:
./webui.sh --xformers --enable-insecure-extension-access
Method Two
Without the additional-networks plugin, SD WebUI supports LoRA models natively. Put the LoRA model in the
*/stable-diffusion-webui/models/Lora
directory and restart SD WebUI; the model is loaded automatically.
Add a LoRA activation tag to the positive prompt, and the LoRA model will be triggered during generation:
The WebUI can fill in the LoRA tag automatically: click the icon shown in the figure to open the LoRA model list, then click a model card, and the tag is inserted into the positive prompt area automatically:
Either of the two methods makes the LoRA model take effect during generation, and using both at the same time causes no problems.
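The activation tag used in Method Two is plain text of the form <lora:name:weight>, so it is easy to build and inspect programmatically. Below is a minimal sketch; lora_tag and extract_loras are illustrative helpers, not SD WebUI APIs:

```python
import re

def lora_tag(name: str, weight: float = 1.0) -> str:
    """Build the prompt tag that activates a LoRA model in SD WebUI."""
    return f"<lora:{name}:{weight}>"

def extract_loras(prompt: str):
    """Return (name, weight) pairs for every LoRA tag found in a prompt."""
    return [(m.group(1), float(m.group(2)))
            for m in re.finditer(r"<lora:([^:>]+):([\d.]+)>", prompt)]

prompt = "(masterpiece), full body, chibi, " + lora_tag("blindbox_v1_mix", 1)
print(prompt)                 # ...ends with <lora:blindbox_v1_mix:1>
print(extract_loras(prompt))  # [('blindbox_v1_mix', 1.0)]
```

Extracting the tags back out is handy when auditing saved prompts to see which LoRA models (and at what weights) produced a given image.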
▐ ControlNet
ControlNet controls pretrained large models such as Stable Diffusion through additional input conditions. With text-only control, generating content is like drawing lottery cards: the result is uncontrollable and rarely matches expectations. ControlNet makes the output of Stable Diffusion models controllable, turning generation into deliberate creation and pushing AIGC further toward industrial applications.
Install ControlNet
In SD WebUI, open the Extensions tab, find the ControlNet plugin (sd-webui-controlnet), and click Install to complete the installation.
Download the open-source ControlNet models
Download address: https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main
Each model consists of two files, a .pth and a .yaml, which must be downloaded together. The letter after "v11" in the file name indicates maturity: p means ready for use, e means still experimental, and u means unfinished. Place the downloaded models in the following directory; the ControlNet models are loaded after restarting SD WebUI.
*\stable-diffusion-webui\extensions\sd-webui-controlnet\models
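The p/e/u naming convention can be checked mechanically when sorting through downloaded files. The helper below is purely illustrative (not part of SD WebUI or ControlNet):

```python
import re

# Maturity letters used in ControlNet v1.1 file names, e.g. control_v11p_sd15_openpose.pth
STATUS = {"p": "production-ready", "e": "experimental", "u": "unfinished"}

def controlnet_status(filename: str) -> str:
    """Read the maturity letter that follows 'v11' in a ControlNet v1.1 file name."""
    m = re.search(r"_v11([peu])_", filename)
    if not m:
        return "unknown"
    return STATUS[m.group(1)]

print(controlnet_status("control_v11p_sd15_openpose.pth"))  # production-ready
print(controlnet_status("control_v11e_sd15_ip2p.pth"))      # experimental
```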
▐ Image-to-image (img2img) example
Model selection
1. Stable Diffusion large model: revAnimated_v11 (https://civitai.com/models/7371?modelVersionId=46846)
2. LoRA model: blindbox_v1_mix (https://civitai.com/models/25995?modelVersionId=32988)
3. Sampling method: Euler a
4. Use Figure 1 as the source image, generate the positive prompt with the DeepBooru model, add revAnimated_v11-specific prompts, delete some of the generated prompts, and add negative prompts. The final prompts are as follows.
Forward:
(masterpiece),(best quality), (full body:1.2), (beautiful detailed eyes), 1boy, hat, male, open_mouth, smile, cloud, solo, full body, chibi, military_uniform, <lora:blindbox_v1_mix:1>
Reverse:
(low quality:1.3), (worst quality:1.3)
The resulting image is:
Figure 1: Original image
Figure 2: SD-generated image
5. Keeping the generation settings unchanged, add the ControlNet model, select Openpose, and set the control mode to Balanced. The generated image is shown below; the character's pose is constrained by Openpose and is closer to the original image.
Figure 3: SD-generated image (with Openpose)
Figure 4: Pose map generated by Openpose
▐ Text-to-image (txt2img) examples
Model selection
Stable Diffusion large model: revAnimated_v11 (https://civitai.com/models/7371?modelVersionId=46846)
LoRA model: blindbox_v1_mix (https://civitai.com/models/25995?modelVersionId=32988)
Sampling method: Euler a
Example 1
Prompts
Forward:
(masterpiece),(best quality),(ultra-detailed), (full body:1.2), 1girl, youth, dynamic, smile, palace,tang dynasty, shirt, long hair, blurry, black hair, blush stickers, black hair, (beautiful detailed face), (beautiful detailed eyes), <lora:blindbox_v1_mix:1>, full body, chibi
Reverse:
(low quality:1.3), (worst quality:1.3)
The resulting image is:
Figure 5: txt2img example 1
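Example 1 can also be submitted programmatically through SD WebUI's HTTP API, which becomes available when the WebUI is launched with the --api flag. The sketch below builds a request for the /sdapi/v1/txt2img endpoint; the server URL, step count, and image size are assumptions, not values from the example above:

```python
import json
import urllib.request

# Payload fields follow SD WebUI's /sdapi/v1/txt2img API; steps/width/height are assumed.
payload = {
    "prompt": ("(masterpiece),(best quality),(ultra-detailed), (full body:1.2), "
               "1girl, youth, dynamic, smile, palace,tang dynasty, shirt, long hair, "
               "<lora:blindbox_v1_mix:1>, full body, chibi"),
    "negative_prompt": "(low quality:1.3), (worst quality:1.3)",
    "sampler_name": "Euler a",
    "steps": 20,
    "width": 512,
    "height": 512,
}

def txt2img(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload to the txt2img endpoint and return the parsed JSON reply."""
    req = urllib.request.Request(
        base_url + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # the "images" field holds base64-encoded results

# result = txt2img(payload)  # requires a running WebUI started with --api
```

Driving generation through the API makes runs reproducible: the exact prompts, sampler, and settings live in code rather than in UI state.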
Example 2
Prompts
Forward:
(masterpiece),(best quality),(ultra-detailed), (full body:1.2), 1girl,chibi,sex, smile, open mouth, flower, outdoors, beret, jk, blush, tree, :3, shirt, short hair, cherry blossoms, blurry, brown hair, blush stickers, long sleeves, bangs, black hair, pink flower, (beautiful detailed face), (beautiful detailed eyes), <lora:blindbox_v1_mix:1>,
Reverse:
(low quality:1.3), (worst quality:1.3)
The generated picture is:
Figure 6: txt2img example 2
Prompt analysis
(masterpiece),(best quality),(ultra-detailed), (full body:1.2), (beautiful detailed face), (beautiful detailed eyes): these parenthesized terms are supporting prompts for the revAnimated_v11 model that improve output quality. Parentheses raise a term's attention weight, and the (term:1.2) form sets the multiplier explicitly.
<lora:blindbox_v1_mix:1> is the tag that activates the blindbox_v1_mix model.
full body, chibi are the trigger words of the blindbox_v1_mix model.
The remaining prompts are descriptions of the image content.
The revAnimated_v11 model is sensitive to prompt order: prompts near the front of the list influence the result more than those near the back.
▐ VAE
In practice, a VAE model acts as a filter and fine-tuner for SD output. Some SD models ship with a built-in VAE and do not need a separate one. The release page of a model usually links to a matching VAE.
Model installation
Download the VAE model into the following SD WebUI directory and restart SD WebUI; the VAE model is loaded automatically.
/stable-diffusion-webui/models/VAE
As shown in the figure below, the VAE model can be switched in SD WebUI.
If this selection box is not visible in the WebUI, go to Settings -> User Interface -> Quicksettings list and add "sd_vae", as shown below:
Effect
Keeping the generation settings of Figure 6 unchanged and adding the Blessed2 VAE (https://huggingface.co/NoCrypt/blessed_vae/blob/main/blessed2.vae.pt), the color and contrast of the image change noticeably.
Figure 7: Before adding the VAE model
Figure 8: After adding the VAE model; saturation and contrast are improved
Conclusion
The learning curve of SD WebUI is relatively steep, and some background in image processing helps users select and combine models more effectively. Beginners with no such foundation can easily end up picking models at random, combining them arbitrarily, and clicking through the SD WebUI interface without a solid basis for making a choice.
SD is open source, and SD WebUI is a toolbox rather than a commercial product. The community offers many models with impressive results; the ceiling of what can be drawn is high, but the floor is also very low. Open source does not mean zero cost: deploying SD WebUI requires substantial hardware. For lower learning cost, relatively stable results, a simple and convenient user experience, and no hardware requirements, Midjourney is currently the first choice, though it requires a subscription fee.
Team introduction
We are the technology intelligence strategy team of Taobao FC, responsible for R&D of Mobile Tmall search, recommendation, Polaroid, and other businesses, as well as for building the underlying technology platforms. We apply cutting-edge technologies such as search and recommendation algorithms, machine vision, and AIGC, and are committed to using technological progress to improve scene efficiency and drive product innovation, bringing users a better shopping experience.