Training a LoRA model with Stable Diffusion

Refer to the following content: https://www.bilibili.com/video/BV1Qk4y1E7nv/?spm_id_from=333.337.search-card.all.click&vd_source=3969f30b089463e19db0cc5e8fe4583a

1. Two key steps in training a LoRA

The first step is to prepare the images used for training, that is, high-quality pictures.

The second step is to tag these images, that is, to give each one precise tags.

2. Image requirements

The recommended quantity is 20-50 images, with a maximum of about 100.

Avoid bad images: blurry pictures, motion distortion, occluded faces, and complex backgrounds (remove the background if necessary).

Resolution: if SD2 is used as the base model, images need to be at least 768*768.

Batch image resizing: https://www.birme.net/?target_width=512&target_height=512

Batch image format conversion: https://www.wdku.net/image/imageformat
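
If you prefer a local script to the web tools above, a minimal sketch with Pillow can do both the resizing and the format conversion in one pass (the folder names raw_images and train_images are placeholders):

```python
# pip install Pillow
from pathlib import Path
from PIL import Image

SRC = Path("raw_images")      # placeholder: folder with the original pictures
DST = Path("train_images")    # placeholder: output folder
SIZE = (512, 512)             # use (768, 768) for an SD2 base model

DST.mkdir(exist_ok=True)
for path in SRC.iterdir():
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    img = Image.open(path).convert("RGB")  # drop alpha so JPEG saving works
    # center-crop to a square before resizing, like birme's default behavior
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize(SIZE, Image.LANCZOS)
    img.save(DST / (path.stem + ".jpg"), quality=95)
```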

3. Image tagging

Two WebUI plugins need to be installed: Tagger and Dataset Tag Editor (address: https://github.com/toshiaki1729/stable-diffusion-webui-dataset-tag-editor).

(1) Tagger plugin

For each image, the plugin generates a txt file with its tag information; the input directory is usually set to the same path as the output directory.
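
Each txt file holds one line of comma-separated tags. For a portrait it might look something like 1girl, solo, black hair, looking at viewer, simple background (an illustrative example; the actual tags depend on the image and the tagger model).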

(2) Dataset Tag Editor

Use it to process the tags:

1) Remove duplicate tags.

2) Delete tags that describe the character's inherent features, such as eyes, eyebrows, nose, hair length, and other attributes that represent the character herself. Any tag bound to the character must be deleted, because we later want the model to generate these features directly from the LoRA's name, without other prompt words. (A scripted alternative is sketched below.)
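
The same cleanup can also be scripted instead of clicked through in the editor. A minimal sketch, assuming the txt files sit in train_images and CHARACTER_TAGS is a hand-maintained list of the character's inherent features:

```python
from pathlib import Path

TAG_DIR = Path("train_images")  # placeholder: folder holding the *.txt tag files

# assumption: tags bound to the character herself, to be learned by the LoRA
CHARACTER_TAGS = {"black hair", "brown eyes", "long hair", "thick eyebrows"}

for txt in TAG_DIR.glob("*.txt"):
    tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
    seen, cleaned = set(), []
    for tag in tags:
        # skip empties, duplicates (step 1) and character-feature tags (step 2)
        if tag and tag not in seen and tag not in CHARACTER_TAGS:
            seen.add(tag)
            cleaned.append(tag)
    txt.write_text(", ".join(cleaned), encoding="utf-8")
```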

Refer to the following content: https://www.jianshu.com/p/e8cb3ba45b1a

4. Training

Install the graphical training tool kohya_ss; the GUI wraps the sd-scripts written by the Japanese developer Kohya.

(1) Download

Project address: https://github.com/bmaltais/kohya_ss

The location on the server after downloading: /data/work/xiehao/kohya_ss

(2) Install project dependencies

Enter the directory and install the dependency packages: pip install -r requirements.txt

(3) Generate configuration files for execution

Execute the accelerate config command and answer the interactive prompts.
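
For reference, on a single machine with one GPU the prompts can be answered roughly as follows (the exact questions vary across accelerate versions, and these answers are illustrative, not the author's original configuration):

```
In which compute environment are you running?          -> This machine
Which type of machine are you using?                   -> No distributed training
Do you want to run your training on CPU only?          -> NO
Do you wish to optimize your script with torch dynamo? -> NO
Do you want to use DeepSpeed?                          -> NO
What GPU(s) (by id) should be used for training?       -> all
Do you wish to use FP16 or BF16 (mixed precision)?     -> fp16
```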

(4) Start the training graphical interface

Execute the command: python kohya_gui.py --listen 0.0.0.0 --server_port 12348 --inbrowser
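
The GUI is then reachable from a browser at http://<server-ip>:12348, the port given by --server_port.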

5. Hands-on example

(1) Download 25 pictures of zhangluyi from Baidu

(2) Crop the pictures to 768*768:

https://www.birme.net/?target_width=768&target_height=768

(3) Convert all pictures to jpg format:

https://www.wdku.net/image/imageformat

(4) Use the Tagger plugin to extract tags

Use the batch processing mode to extract tags for the whole folder.

After execution, a corresponding txt file is generated for each image on the Linux server.

(5) Processing tags through Dataset Tag Editor

First, remove duplicate tags and character-trait tags.

Then, save this modification.

(6) Organize the training-set files into the directory structure required by the trainer

At this point, each image has a matching txt tag file.

These files need to be placed in a directory named 10_zly. The number before the underscore in the directory name is the number of repeats, that is, how many times the network trains on each single image during every training pass; the part after the underscore is the concept name. The naming of this directory is very important; it took me an hour to locate this bug.
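
For reference, the expected layout looks like this (train_data is a placeholder for the folder selected as the image folder in the GUI; zly is the concept name used in this walkthrough):

```
train_data/
└── 10_zly/            # 10 = repeats per image, zly = concept name
    ├── 0001.jpg
    ├── 0001.txt       # tags for 0001.jpg
    ├── 0002.jpg
    ├── 0002.txt
    └── ...
```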

(7) Training in kohya

After completing the dataset preparation, training can be carried out in kohya.

First, configure the base model information.

The Linux path specified by Pretrained model name or path must contain the full model layout, including model_index.json and the tokenizer directory; a single safetensors file is not enough. https://huggingface.co/digiplay/majicMIX_realistic_v4 (about 18G) can be downloaded via git lfs clone.

This key point is very important; locating the problem plus downloading the model took me several hours.
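
Besides git lfs clone, the full repository can also be fetched with the huggingface_hub Python package; a minimal sketch (local_dir is an arbitrary target path):

```python
# pip install huggingface_hub
from huggingface_hub import snapshot_download

# downloads model_index.json, the tokenizer directory, and all weight files
snapshot_download(
    repo_id="digiplay/majicMIX_realistic_v4",
    local_dir="/data/work/models/majicMIX_realistic_v4",  # arbitrary target path
)
```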

Then, configure the training directory

Next, configure the training parameters

The Optimizer cannot be left at its default value; currently, only 5 optimizer types are supported in the source code.

Try them one by one to see which one runs without errors.

After the training starts successfully, it occupies about 6G of GPU memory, takes about 20 minutes, and the final generated LoRA file is about 10M.

(8) Test the effect of the LoRA model in the Stable Diffusion WebUI

After the training is complete, put the generated LoRA file into extensions/sd-webui-additional-networks/models/lora under the SD root directory.

In the WebUI, open the Additional Networks panel, enable it, select the trained LoRA model, and set its weight.
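
Alternatively, if the WebUI's built-in LoRA support is used instead of the extension, the file can be placed in models/Lora and invoked directly in the prompt as <lora:filename:0.8>, where 0.8 is the weight.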


Origin: blog.csdn.net/benben044/article/details/132365625