Stable Diffusion WebUI source code analysis

1. Key Python dependencies

(1) xformers: an acceleration/optimization library. It can optimize the model to speed up image generation and reduce memory usage. The drawback is that its output is not deterministic: the same settings may produce slightly different images, occasionally marginally worse than without xformers.

(2) GFPGAN: Tencent's open-source face restoration algorithm. It leverages the rich and diverse priors encapsulated in a pre-trained face GAN (such as StyleGAN2) to perform blind face restoration, and is practical on real-world face images.

(3) CLIP: Contrastive Language-Image Pre-Training, a multi-modal algorithm. It trains a single model to process both images and text, so that the model can understand an image and relate it to a textual description at the same time.

(4) OpenCLIP: an open-source implementation of CLIP.

(5) pyngrok: a Python wrapper around the ngrok tool, used to tunnel the local web UI through NAT/firewalls so it can be reached from outside.

2. Core directories and files

(1) repositories (under the SD root directory)

Stores the source code of the bundled algorithms:

1) stable-diffusion-stability-ai: the core Stable Diffusion algorithm

2) taming-transformers: high-resolution image synthesis algorithm

3) k-diffusion: diffusion algorithm

4) CodeFormer: face restoration (image repair) algorithm

5) BLIP: a multi-modal algorithm (used for image captioning)

(2) models (under the SD root directory)

Stores the model files.

3. Instructions for using Gradio

For the interface layer, see the companion article in this series: [Stable Diffusion WebUI source code analysis], ui.py.

The SD web UI is built on top of Gradio, a Python library that lets you construct an HTML interface with just a few lines of code.

Test example (a minimal sketch follows the parameter list below):

gr.Interface produces a layout with just a left column (inputs) and a right column (outputs), and takes three parameters:

Parameter 1: the processing function; its positional arguments correspond, in order, to the components passed in inputs

Parameter 2: the input component information

Parameter 3: the output data type
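A minimal sketch, assuming a toy greet function (the function, its labels, and the slider are illustrative and not from the webui source):

```python
import gradio as gr

# Hypothetical processing function; its positional parameters match
# the components listed in `inputs`, in order (parameter 1).
def greet(name, intensity):
    return "Hello, " + name + "!" * int(intensity)

demo = gr.Interface(
    fn=greet,                    # parameter 1: the processing function
    inputs=["text", "slider"],   # parameter 2: the input components
    outputs=["text"],            # parameter 3: the output data type
)

demo.launch()  # serves the two-column HTML interface locally
```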

4. The model-processing flow of the webui

(1) The cleanup_models function moves model files

It moves files in the models directory into the relevant subdirectories; for example, .ckpt and .safetensors files go into the Stable-diffusion subdirectory, as sketched below.
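A simplified, hedged sketch of that move (the real function handles more file types and edge cases):

```python
import os
import shutil

# Move loose checkpoint files from models/ into models/Stable-diffusion.
def move_files(src_path: str, dest_path: str, ext_filter: str) -> None:
    os.makedirs(dest_path, exist_ok=True)
    for fname in os.listdir(src_path):
        fullpath = os.path.join(src_path, fname)
        if os.path.isfile(fullpath) and fname.endswith(ext_filter):
            shutil.move(fullpath, os.path.join(dest_path, fname))

move_files("models", "models/Stable-diffusion", ".ckpt")
move_files("models", "models/Stable-diffusion", ".safetensors")
```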

(2) The setup_model process for the SD model

The models are located at: /data/work/xiehao/stable-diffusion-webui/models/Stable-diffusion

This mainly uses the list_models function to traverse all model information and store it in checkpoint_alisases.

Step 1: check whether there are files ending in .ckpt or .safetensors under sd/models/Stable-diffusion. If there are, put them in the model_list list; if not, download a model from Hugging Face.

Step 2: build the checkpoint information of each model in model_list through the CheckpointInfo class. If it is a safetensors file, read the file information through read_metadata_from_safetensors: a safetensors file carries a JSON header, and its key-value pairs are read out and stored in the metadata field.

Step 3: finally, store each model in the checkpoint_alisases global variable as {id: model object} key-value pairs, as sketched below.
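A hedged, self-contained sketch of steps 1-3 (the real CheckpointInfo class records more fields and computes hashes lazily):

```python
import hashlib
import json
import os
import struct

def read_metadata_from_safetensors(filename: str) -> dict:
    # A .safetensors file starts with an 8-byte little-endian length,
    # followed by that many bytes of JSON (tensor index plus metadata).
    with open(filename, "rb") as f:
        header_size = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_size))
    return header.get("__metadata__", {})

def list_models(model_dir: str = "models/Stable-diffusion") -> dict:
    checkpoint_alisases = {}  # spelled this way in the webui source
    for fname in sorted(os.listdir(model_dir)):
        if not fname.endswith((".ckpt", ".safetensors")):
            continue
        path = os.path.join(model_dir, fname)
        metadata = (read_metadata_from_safetensors(path)
                    if fname.endswith(".safetensors") else {})
        with open(path, "rb") as f:
            short_hash = hashlib.sha256(f.read()).hexdigest()[:10]
        info = {"filename": path, "hash": short_hash, "metadata": metadata}
        # each model is reachable both by name and by "name [hash]"
        checkpoint_alisases[fname] = info
        checkpoint_alisases[f"{fname} [{short_hash}]"] = info
    return checkpoint_alisases
```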

(3) The setup_model process for the CodeFormer model

The model is located at: /data/work/xiehao/stable-diffusion-webui/models/Codeformer

This mainly instantiates CodeFormer and puts the instance into the shared.face_restorers list. The model parameters are not yet loaded into the CodeFormer network during this step.

(4) The setup_model process for the GFPGAN model

(5) Traverse and load the built-in upscaler algorithms

These algorithms are located at: /data/work/xiehao/stable-diffusion-webui/modules

Files ending in _model.py in this directory are traversed and loaded through importlib.import_module(). The import itself produces no visible effect; its purpose is the side effect of registering the upscaler classes, as sketched below.
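A hedged sketch of that import pass:

```python
import importlib
import os

# Import every modules/*_model.py; the visible effect is nil, but each
# import makes the Upscaler subclasses defined there discoverable later.
for fname in os.listdir("modules"):
    if fname.endswith("_model.py"):
        importlib.import_module("modules." + fname[:-3])  # strip ".py"
```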

The following upscaler algorithms are initialized: [<class 'modules.upscaler.UpscalerNone'>, <class 'modules.upscaler.UpscalerLanczos'>, <class 'modules.upscaler.UpscalerNearest'>, <class 'modules.esrgan_model.UpscalerESRGAN'>, <class 'modules.realesrgan_model.UpscalerRealESRGAN'>]. The first applies no algorithm at all; the second and third are implemented with the img.resize() method; the last two need to load a model separately. Each entry is stored in UpscalerData format, and the object's local_data_path field stores the local path of the model.

For example, shared.sd_upscalers[5].local_data_path is:

'/data/work/xiehao/stable-diffusion-webui/models/RealESRGAN/RealESRGAN_x4plus_anime_6B.pth'
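For the resize-based entries, a hedged sketch of what "implemented by img.resize()" amounts to:

```python
from PIL import Image

# Lanczos/Nearest "upscaling" is a plain PIL resample; no model is needed.
def upscale(img: Image.Image, scale: float, mode=Image.LANCZOS) -> Image.Image:
    w, h = img.size
    return img.resize((int(w * scale), int(h * scale)), resample=mode)

img = Image.open("input.png")          # hypothetical input file
big = upscale(img, 2, Image.LANCZOS)   # or Image.NEAREST for the Nearest upscaler
```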

(6) load_scripts: load the Python scripts to be executed

The .py scripts under the SD root directory's scripts folder and the .py scripts of each extension under extensions are traversed and put into the scripts_list variable, in the following format: ScriptFile(basedir='/data/work/xiehao/stable-diffusion-webui/extensions/sd-webui-controlnet', filename='processor.py', path='/data/work/xiehao/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/processor.py')

The files in scripts_list that define Script or ScriptPostprocessing classes are then traversed and imported.

load_module(path) may emit log output when loading third-party components. A hedged sketch of such a loader follows.
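This pattern, importing a .py file from an arbitrary path, is close to what modules/script_loading.py does:

```python
import importlib.util
import os

# Import a .py file from an arbitrary filesystem path and return the module.
def load_module(path: str):
    spec = importlib.util.spec_from_file_location(os.path.basename(path), path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # third-party scripts may print/log here
    return module
```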

(7) Traverse the VAE models

No VAE models are currently installed in this setup.

(8) load_model: loading the model

The select_checkpoint() function fetches the SD model information, here majicmixRealistic_v4.safetensors/majicmixRealistic_v4.safetensors [d819c8be6b].

The do_inpainting_hijack function replaces p_sample_plms of PLMSSampler. The method is applied to the image at each step of the reverse denoising process that reconstructs the image.

The get_checkpoint_state_dict function: if the checkpoint is a safetensors file, the model parameters are loaded with safetensors.torch.load_file, otherwise with torch.load. The result is loaded into the dict-typed variable pl_sd.

The pl_sd dictionary is then processed further: if the outermost key is state_dict, the value under that key is taken, so that pl_sd maps each model node name to its corresponding weights. Certain legacy key prefixes are then replaced with their current equivalents, as sketched below.
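A hedged sketch of this load-and-normalize path; the replacement table is paraphrased from modules/sd_models.py and may not be exhaustive:

```python
import safetensors.torch
import torch

# Old CLIP text-encoder key prefixes remapped to their current names
# (illustrative subset).
checkpoint_dict_replacements = {
    'cond_stage_model.transformer.embeddings.':
        'cond_stage_model.transformer.text_model.embeddings.',
    'cond_stage_model.transformer.encoder.':
        'cond_stage_model.transformer.text_model.encoder.',
    'cond_stage_model.transformer.final_layer_norm.':
        'cond_stage_model.transformer.text_model.final_layer_norm.',
}

def transform_key(key: str) -> str:
    for old, new in checkpoint_dict_replacements.items():
        if key.startswith(old):
            return new + key[len(old):]
    return key

def get_checkpoint_state_dict(checkpoint_file: str) -> dict:
    if checkpoint_file.endswith(".safetensors"):
        pl_sd = safetensors.torch.load_file(checkpoint_file, device="cpu")
    else:
        pl_sd = torch.load(checkpoint_file, map_location="cpu")
    pl_sd = pl_sd.get("state_dict", pl_sd)  # unwrap the outer "state_dict" key
    return {transform_key(k): v for k, v in pl_sd.items()}
```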

The find_checkpoint_config function first looks for a yaml configuration file in the model directory; if there is none, it executes the guess_model_config_from_state_dict function, which derives the model configuration from the model parameters. Here it finally returns /data/work/xiehao/stable-diffusion-webui/configs/v1-inference.yaml as the configuration file.

The yaml file is then loaded with OmegaConf.load, and the model is obtained from the yaml information through instantiate_from_config() in /data/work/xiehao/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/util.py. The specific steps are:

Step 1: from the target field of the yaml, the model is known to be the LatentDiffusion class of ldm.models.diffusion.ddpm. The model's source code is located at: SD root directory/modules/models/diffusion/ddpm_edit.py.

Step 2: obtain the model class through getattr(module object, class_name) and instantiate it with the params from the yaml, as in the sketch below.
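These two steps correspond closely to instantiate_from_config in ldm/util.py; a paraphrased sketch:

```python
import importlib

def get_obj_from_str(string: str):
    # Split "ldm.models.diffusion.ddpm.LatentDiffusion" into a module path
    # and a class name: import the module (step 1), getattr the class (step 2).
    module, cls = string.rsplit(".", 1)
    return getattr(importlib.import_module(module), cls)

def instantiate_from_config(config: dict):
    # config is the "model" section of the yaml loaded via OmegaConf:
    # a "target" class path plus its constructor "params".
    return get_obj_from_str(config["target"])(**config.get("params", {}))
```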

The load_model_weights function loads the parameters into the model via model.load_state_dict(state_dict, strict=False). Because the no_half program argument is false, the model is converted from float32 to half-precision tensors; the VAE module is excluded from half(). The VAE is the model.first_stage_model part, so it is first stashed in a temporary variable and assigned back once the half() conversion completes; the VAE ends up converted to float16 on its own. Finally the model is moved onto CUDA. A hedged sketch follows.
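A simplified sketch of that pass (the real load_model_weights also handles caching, dtype options, and device placement):

```python
import torch.nn as nn

def load_half_and_move(model: nn.Module, state_dict: dict) -> nn.Module:
    model.load_state_dict(state_dict, strict=False)
    vae = model.first_stage_model      # stash the VAE ...
    model.first_stage_model = None
    model.half()                       # ... so half() skips it
    model.first_stage_model = vae      # reattach; the VAE is converted separately
    return model.cuda()                # finally, move the model onto the GPU
```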

The hijack function processes the embedding information input by the user. Given only a random initial value, SD would generate arbitrary content; we add extra information (such as prompts) so that SD generates in the direction we want. That is what the hijack does, and the hook point is the embeddings layer. The model's embedding class is transformers.models.clip.modeling_clip.CLIPTextEmbeddings, and its token embedding is of class torch.nn.modules.sparse.Embedding.

The embedding-processing class for prompts is FrozenCLIPEmbedderWithCustomWords. The vocabulary holds about 49K tokens. Token weights are then handled: ordinary words get weight 1.0, square brackets divide the weight by 1.1, and parentheses multiply it by 1.1, as sketched below.
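A hedged sketch of those weighting rules (the real parser also supports explicit weights such as (word:1.3) and escaping):

```python
def token_weight(token: str) -> tuple[str, float]:
    # (word) multiplies the weight by 1.1, [word] divides it by 1.1;
    # nesting compounds the factor, and plain words keep weight 1.0.
    weight = 1.0
    while token.startswith("(") and token.endswith(")"):
        token, weight = token[1:-1], weight * 1.1
    while token.startswith("[") and token.endswith("]"):
        token, weight = token[1:-1], weight / 1.1
    return token, weight

print(token_weight("((masterpiece))"))  # ('masterpiece', ~1.21)
print(token_weight("[blurry]"))         # ('blurry', ~0.91)
```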

apply_optimizations selects the optimization method, using the xformers library to optimize the CrossAttention modules in the SD model. (The cross-attention mechanism is an extension of self-attention. Self-attention assigns a weight to each element of the input sequence by computing the relevance between queries, keys, and values; cross-attention introduces an additional input sequence, fusing information from two different sources to achieve more accurate modeling.)
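A hedged sketch of the core substitution: xformers' fused kernel stands in for the naive softmax(Q K^T / sqrt(d)) V computation inside CrossAttention:

```python
import torch
import xformers.ops

def xformers_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, seq_len, num_heads, head_dim). The fused kernel avoids
    # materializing the full (seq_len x seq_len) attention matrix in memory.
    return xformers.ops.memory_efficient_attention(q, k, v)
```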

The load_textual_inversion_embeddings function loads the embedding files under root directory/embeddings. It loads the embeddings information [('/data/work/xiehao/stable-diffusion-webui/embeddings', <modules.textual_inversion.textual_inversion.DirWithTextualInversionEmbeddings object at 0x7ff2900b39d0>)], for example: badhandv4, easynegative, EasyNegativeV2, ng_deepnegative_v1_75t, etc.

The model_loaded_callback function traverses all callback functions in callback_map['callbacks_model_loaded'] and executes them in turn, passing in the sd_model. Examples are the get_embeddings method of /data/work/xiehao/stable-diffusion-webui/extensions/a1111-sd-webui-tagcomplete/scripts/tag_autocomplete_helper.py and the assign_lora_names_to_compvis_modules method of /data/work/xiehao/stable-diffusion-webui/extensions-builtin/Lora/scripts/lora_script.py. A simplified sketch follows.
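A hedged, simplified sketch of the dispatch (the real callback_map in modules/script_callbacks.py also wraps each call with error handling):

```python
callback_map = {"callbacks_model_loaded": []}

def on_model_loaded(callback):
    # Extensions register their hooks at import time.
    callback_map["callbacks_model_loaded"].append(callback)

def model_loaded_callback(sd_model):
    # Invoked once the checkpoint is fully loaded.
    for callback in callback_map["callbacks_model_loaded"]:
        callback(sd_model)  # e.g. tag-autocomplete's get_embeddings, Lora's setup
```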

5. Page layout

The layout is based on Gradio; the interface entry function is create_ui() in modules/ui.py.

To be continued.
