Stable Diffusion WebUI: From Zero to Getting Started


This article introduces the hands-on use of Stable Diffusion WebUI, covering prompt derivation, LoRA models, VAE models, and ControlNet, and gives practical, reproducible text-to-image and image-to-image examples. It is aimed at readers who are interested in Stable Diffusion but unsure how to use Stable Diffusion WebUI. Hopefully this article lowers the learning cost of Stable Diffusion WebUI and lets you experience the charm of AIGC image generation more quickly.


Introduction

Stable Diffusion (sd for short) is a deep-learning text-to-image generation model. Stable Diffusion WebUI is a tool that wraps the Stable Diffusion model and provides an operable web interface. The models loaded into Stable Diffusion WebUI are retrained from a Stable Diffusion base model in order to obtain higher-quality results in a particular style. Currently, Stable Diffusion 1.5 is the most popular base model in the community.

▐ Installation

For the installation of sd web-ui, please refer to: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-NVidia-GPUs



SD web-ui uses the gradio component library. When gradio is configured with share=True, it creates an frpc tunnel and connects to AWS; for details see https://www.gradio.app/guides/sharing-your-app. Therefore, when starting the sd web-ui application, consider disabling the share=True configuration or deleting the frpc client, according to your own security and privacy requirements.
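To make the flag concrete, here is a minimal gradio sketch (not part of sd web-ui itself) showing the setting in question; sd web-ui exposes the same behavior through its --share startup option.

```python
import gradio as gr

def greet(name):
    return f"Hello, {name}!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")

# share=True would create a public *.gradio.live tunnel (via frpc);
# share=False (the default) keeps the app reachable only locally.
demo.launch(share=False, server_name="127.0.0.1")
```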

▐ Model

https://civitai.com/ is an open source sd model community that provides a wealth of models for free download and use. A brief description of the model categories follows; knowing them helps in using sd web-ui. sd model training methods fall into four main categories: Dreambooth, LoRA, Textual Inversion, and Hypernetwork.

  1. Dreambooth: a complete new large model trained from the sd base model with the Dreambooth method. Training is slow and the resulting model file is large, typically several GB, in safetensors or ckpt format. Its strength is good drawing quality, with clear improvements in certain artistic styles. In sd web-ui, this type of model is selected from the Stable Diffusion checkpoint dropdown in the upper left corner of the interface.


  2. LoRA: a lightweight fine-tuning method that adjusts a model on top of the original large model, used to output people or things with fixed characteristics. It produces good results for specific styles, trains quickly, and yields small model files, usually tens to a bit over a hundred MB; it cannot be used independently and must be combined with the original large model (a minimal sketch of the low-rank idea follows this list). sd web-ui provides a LoRA plugin and a built-in way to use LoRA models; for specific operations, see "Operating procedures -> LoRA model" in this article.

  3. Textual Inversion: a method of fine-tuning a model with text prompts and matching style pictures. The text prompts are generally special words; after training, using these words in a prompt controls the style and details of the generated pictures. It must be used together with the original large model.

  4. Hypernetwork: a method similar to LoRA for fine-tuning a large model; it likewise must be used with the original large model.
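As a rough illustration of why LoRA files are so small, here is a minimal numpy sketch of the low-rank update idea (illustrative only, not the sd-scripts implementation): instead of retraining a full weight matrix, only two small factor matrices are trained.

```python
import numpy as np

d, k, r = 768, 768, 8              # layer dimensions and LoRA rank (r << d, k)
W = np.random.randn(d, k)          # frozen pretrained weight (stays in the base model)
B = np.zeros((d, r))               # trainable factor, initialized to zero
A = np.random.randn(r, k) * 0.01   # trainable factor
alpha = 8                          # LoRA scaling hyperparameter

# At inference time the low-rank update is merged into the frozen weight.
W_eff = W + (alpha / r) * (B @ A)

# The LoRA file only needs to store A and B, far fewer values than the
# full matrix, which is why LoRA files are tens of MB rather than GB.
print(W.size, A.size + B.size)     # 589824 vs 12288
```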


Operating procedures

▐ Prompt derivation

  1. Upload a picture in sd web-ui.

  2. Derive the keywords in reverse from the picture. Two models are available, CLIP and DeepBooru; take Figure 1 as an example:


Figure 1: A high-definition photo taken with the stock camera of an iPhone 14 Pro Max

The result of prompt derivation using CLIP:

a baby is laying on a blanket surrounded by balloons and balls in the air and a cake with a name on it, Bian Jingzhao, phuoc quan, a colorized photo, dada

The result of prompt derivation using DeepBooru:

1boy, ball, balloon, bubble_blowing, chewing_gum, hat, holding_balloon, male_focus, military, military_uniform, open_mouth, orb, solo, uniform, yin_yang

CLIP derives a full sentence, while DeepBooru derives a list of keywords.
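The same derivation can be scripted. The sketch below assumes sd web-ui was started with the --api flag and is reachable at the local address shown; the endpoint follows the webui API, where DeepBooru is addressed as "deepdanbooru" (names may vary across versions), and the image path is hypothetical.

```python
import base64
import requests

URL = "http://127.0.0.1:7860"  # local sd web-ui started with --api

with open("figure1.jpg", "rb") as f:      # hypothetical local copy of Figure 1
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

for model in ("clip", "deepdanbooru"):    # the two interrogators shown above
    resp = requests.post(f"{URL}/sdapi/v1/interrogate",
                         json={"image": image_b64, "model": model})
    print(model, "->", resp.json().get("caption"))
```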

The positive prompt can be modified, and a negative prompt can also be added. The negative prompt restricts the model from adding the listed elements when producing pictures; it is optional and can be left blank.

▐ LoRA model

The LoRA model has a strong intervention or enhancement effect on the style and quality of images generated by the large model, but it must be used together with a matching large model and cannot be used alone. There are two main ways to use a LoRA model in sd-webui:

  • Method one

Install the additional-networks plugin (github: https://github.com/kohya-ss/sd-webui-additional-networks); it can be downloaded and installed directly from the Extensions tab in sd-webui. This plugin only supports LoRA models trained with the sd-scripts project. At present most open source LoRA models on https://civitai.com/ are trained with that script, so the plugin supports most LoRA models. The downloaded LoRA model must be placed under the path

*/stable-diffusion-webui/extensions/sd-webui-additional-networks/models/lora

After adding a new model, restart sd-webui. Once the plugin and model are loaded correctly, an "Additional Networks (LoRA plugin)" panel appears in the lower left corner of the webui interface. To trigger LoRA during generation, select the LoRA model in the plugin and add its trigger words to the positive prompt. For example, with the LoRA model blindbox_v1_mix selected, the trigger words are full body, chibi. Each LoRA model has its own trigger words, which are noted in the model's introduction.


If the plugin does not respond after clicking Install, or reports a Flag-related error, it is because extension installation was disabled when the webui started. Add the startup parameter --enable-insecure-extension-access:

./webui.sh --xformers --enable-insecure-extension-access
  • Method two

Without the additional-networks plugin, you can use sd-webui's native LoRA support; put the LoRA model in the

*/stable-diffusion-webui/models/Lora

directory and restart sd-webui to load the model automatically.

Add the LoRA enabling statement, in the form <lora:model_name:weight> (for example <lora:blindbox_v1_mix:1>), to the positive prompt, and the LoRA model will be triggered when producing pictures.


The web-ui can fill in the LoRA statement automatically: click the extra networks icon under the Generate button to open the list of LoRA models, then click a model card, and the statement is automatically filled into the positive prompt area.


Either of the two methods above makes the LoRA model take effect during generation, and using both at the same time does not cause problems.

▐ ControlNet

ControlNet controls pretrained large models such as Stable Diffusion through additional input conditions. With pure text control, producing content is like drawing cards by chance: the result is uncontrollable and it is hard to achieve the expected effect. The emergence of ControlNet moves content generation with Stable Diffusion large models into a controllable period, making creation controllable and advancing AIGC further toward industrial applications.

  • Install ControlNet

In sd-webui, click Extensions to open the plugin installation page, find the sd-webui-controlnet plugin, and click Install to complete the installation.


  • Download the open source ControlNet models

Download address: https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main

A model consists of two files, .pth and .yaml, which must be downloaded together. The letter after "v11" in the file name indicates the model's status: p means usable, e means still in testing, and u means unfinished. Place the downloaded models in the following directory and restart sd-webui to load them.

*\stable-diffusion-webui\extensions\sd-webui-controlnet\models
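If you prefer scripting the download, the files can also be fetched with the huggingface_hub library; a hedged sketch (adjust the local_dir path to your own install):

```python
from huggingface_hub import hf_hub_download

# Download one ControlNet v1.1 model pair (.pth + .yaml) into the
# sd-webui-controlnet extension's models directory.
MODELS_DIR = "stable-diffusion-webui/extensions/sd-webui-controlnet/models"

for filename in ("control_v11p_sd15_openpose.pth",
                 "control_v11p_sd15_openpose.yaml"):
    hf_hub_download(repo_id="lllyasviel/ControlNet-v1-1",
                    filename=filename,
                    local_dir=MODELS_DIR)
```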

▐ Image-to-image example

  • Model selection

1. Stable Diffusion large model: revAnimated_v11 (https://civitai.com/models/7371?modelVersionId=46846)

2. LoRA model: blindbox_v1_mix (https://civitai.com/models/25995?modelVersionId=32988)

3. Sampling method: Euler a

4. Use Figure 1 as the source image, generate the positive prompt with the DeepBooru model, add the revAnimated_v11-specific supporting prompts, delete some of the derived prompts, and add a negative prompt. The final prompts are as follows.

Positive:

(masterpiece),(best quality), (full body:1.2), (beautiful detailed eyes), 1boy, hat, male, open_mouth, smile, cloud, solo, full body, chibi, military_uniform, <lora:blindbox_v1_mix:1>

Negative:

(low quality:1.3), (worst quality:1.3)

The resulting image is:


Figure 1: Original picture


Figure 2: Image generated by sd

5. Keeping the generation conditions unchanged, add the ControlNet model, select Openpose, and set the control mode to Balanced. The generated image is shown below; the character's pose is constrained by Openpose and is closer to the original image. A hedged API sketch of this step follows the figures.


Figure 3: Image generated by sd (with Openpose added)


Figure 4: Pose image generated by Openpose
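For completeness, here is a hedged sketch of step 5 through the webui API (assumes the webui was started with --api; the alwayson_scripts block is the sd-webui-controlnet extension's API hook, whose field names may vary with the plugin version; the source image path is hypothetical):

```python
import base64
import requests

URL = "http://127.0.0.1:7860"

with open("figure1.jpg", "rb") as f:   # hypothetical local copy of the source image
    src = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "init_images": [src],
    "prompt": "(masterpiece),(best quality), (full body:1.2), (beautiful detailed eyes), "
              "1boy, hat, male, open_mouth, smile, cloud, solo, full body, chibi, "
              "military_uniform, <lora:blindbox_v1_mix:1>",
    "negative_prompt": "(low quality:1.3), (worst quality:1.3)",
    "sampler_name": "Euler a",
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "input_image": src,
                "module": "openpose",                   # pose-extraction preprocessor
                "model": "control_v11p_sd15_openpose",  # model downloaded earlier
                "control_mode": "Balanced",
            }]
        }
    },
}
resp = requests.post(f"{URL}/sdapi/v1/img2img", json=payload)
with open("figure3.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```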

▐ Text-to-image example

  • Model selection

  1. Stable Diffusion large model: revAnimated_v11 (https://civitai.com/models/7371?modelVersionId=46846)

  2. LoRA model: blindbox_v1_mix (https://civitai.com/models/25995?modelVersionId=32988)

  3. Sampling method: Euler a

Example 1

Prompts

Positive:

(masterpiece),(best quality),(ultra-detailed), (full body:1.2), 1girl, youth, dynamic, smile, palace,tang dynasty, shirt, long hair, blurry, black hair, blush stickers, black hair, (beautiful detailed face), (beautiful detailed eyes), <lora:blindbox_v1_mix:1>, full body, chibi

Negative:

(low quality:1.3), (worst quality:1.3)

The resulting image is:


Figure 5: Text-to-image example 1
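Example 1 can also be reproduced through the API; a hedged sketch assuming the webui was started with --api (steps and size are illustrative defaults, not specified in the example):

```python
import base64
import requests

URL = "http://127.0.0.1:7860"

payload = {
    "prompt": "(masterpiece),(best quality),(ultra-detailed), (full body:1.2), "
              "1girl, youth, dynamic, smile, palace,tang dynasty, shirt, long hair, "
              "blurry, black hair, blush stickers, black hair, (beautiful detailed face), "
              "(beautiful detailed eyes), <lora:blindbox_v1_mix:1>, full body, chibi",
    "negative_prompt": "(low quality:1.3), (worst quality:1.3)",
    "sampler_name": "Euler a",
    "steps": 20,          # illustrative defaults
    "width": 512,
    "height": 512,
}
resp = requests.post(f"{URL}/sdapi/v1/txt2img", json=payload)
with open("figure5.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```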

Example 2

Prompts

Positive:

(masterpiece),(best quality),(ultra-detailed), (full body:1.2), 1girl,chibi,sex, smile, open mouth, flower, outdoors, beret, jk, blush, tree, :3, shirt, short hair, cherry blossoms, blurry, brown hair, blush stickers, long sleeves, bangs, black hair, pink flower, (beautiful detailed face), (beautiful detailed eyes), <lora:blindbox_v1_mix:1>,

Negative:

(low quality:1.3), (worst quality:1.3)

The generated picture is:


Figure 6: Text-to-image example 2

Prompt analysis

  1. (masterpiece),(best quality),(ultra-detailed), (full body:1.2), (beautiful detailed face), (beautiful detailed eyes): these parenthesized words are supporting prompts for the revAnimated_v11 model that improve image quality. In webui prompt syntax, (word) raises the word's attention weight by a factor of 1.1, and (word:1.2) sets the factor to 1.2 explicitly.

  2. <lora:blindbox_v1_mix:1> is the statement that triggers the blindbox_v1_mix model.

  3. full body, chibi are the trigger words of the blindbox_v1_mix model.

  4. The remaining prompts describe the image content.

  5. The revAnimated_v11 model is sensitive to prompt order: prompts placed earlier have a greater impact on the result than prompts placed later.

▐ VAE

In actual use of sd, the VAE model plays the role of a filter and fine-tuner. Some sd models ship with their own VAE, and there is no need to mount one separately. A VAE that matches a model usually has a download link on the model's release page.
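To see what the VAE actually does, here is a hedged diffusers sketch (independent of sd web-ui) that decodes a latent into pixels; swapping the VAE changes how those pixels come out, which is the "filter" effect described above. The model id is a publicly released sd 1.x VAE on Hugging Face.

```python
import torch
from diffusers import AutoencoderKL

# A public sd 1.x VAE; any compatible VAE could be swapped in here.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

latents = torch.randn(1, 4, 64, 64)   # sd 1.5 latent shape for a 512x512 image
with torch.no_grad():
    # 0.18215 is sd 1.x's latent scaling factor; decode maps latents -> pixels.
    image = vae.decode(latents / 0.18215).sample

print(image.shape)                     # torch.Size([1, 3, 512, 512])
```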

  • Model installation

Download the VAE model into the following sd web-ui directory and restart sd web-ui; the VAE model is then loaded automatically.

/stable-diffusion-webui/models/VAE

The VAE model can then be switched from the sd_vae dropdown at the top of the sd web-ui interface.


If you do not see this selection box in the web-ui, go to Settings -> User Interface -> Quicksettings list and add the configuration "sd_vae".

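The VAE can also be selected per request through the API's override_settings field; a hedged sketch (assumes --api; the VAE filename must match a file placed in models/VAE, and the prompt here is a placeholder):

```python
import requests

URL = "http://127.0.0.1:7860"

payload = {
    "prompt": "a placeholder prompt",  # substitute the Figure 6 prompt to reproduce it
    "negative_prompt": "(low quality:1.3), (worst quality:1.3)",
    "sampler_name": "Euler a",
    "override_settings": {"sd_vae": "blessed2.vae.pt"},  # file placed in models/VAE
    "override_settings_restore_afterwards": True,        # keep the global setting unchanged
}
resp = requests.post(f"{URL}/sdapi/v1/txt2img", json=payload)
print(resp.status_code)
```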

  • Effect

Keeping the generation conditions of Figure 6 unchanged and adding the Blessed2 VAE (https://huggingface.co/NoCrypt/blessed_vae/blob/main/blessed2.vae.pt), the color and contrast of the picture change noticeably.


Figure 7: Before adding the VAE model


Figure 8: After adding the VAE model; the saturation and contrast of the picture are improved


Conclusion

  1. The learning curve of sd web-ui is fairly steep; some knowledge of image processing helps users select and combine models more effectively.

  2. Beginners with zero background can easily pick models at random, combine them arbitrarily, and run a series of operations on the sd web-ui interface without getting the expected result; it is better to first understand what each model does and then make a selection.

  3. sd is open source, and sd web-ui is a toolbox, not a commercial product. The community offers many models with great effects; the upper limit of the output quality is high, but the lower limit is also very low. Open source does not mean zero cost: sd web-ui deployment demands a high hardware configuration. For lower learning cost, relatively stable output, a simple and convenient user experience, and no hardware requirements, midjourney is currently the first choice, though it requires a subscription fee.


Team introduction

We are the technology intelligence strategy team of Taobao FC, responsible for the R&D of mobile Tmall search, recommendation, Polaroid, and other businesses, and for building the technology platform. We comprehensively apply cutting-edge technologies such as search and recommendation algorithms, machine vision, and AIGC, and are committed to using technological progress to support improvements in scene efficiency and product innovation, bringing users a better shopping experience.
