Detailed tutorial on fine-tuning a Stable Diffusion 1.5 checkpoint model. You can train a full model for just a few dollars.


Preface

I have already covered the training process for SDXL_LORA and SD1.5_LORA before, and those posts explain it fairly thoroughly; take a look if you are interested. Now I will explain fine-tuning the SD 1.5 checkpoint model.

SDXL_LORA:https://tian-feng.blog.csdn.net/article/details/132955577

SD1.5_LORA:https://tian-feng.blog.csdn.net/article/details/132133361

DreamBooth is a way to customize a personalized text-to-image diffusion model; excellent results can be achieved with only a small amount of training data. DreamBooth was originally built on Imagen: when using it, you only need to export the model as a ckpt, which can then be loaded into the various UIs.

However, neither Imagen's model nor its pretrained weights are available, so the original DreamBooth could not be used with Stable Diffusion. Later, the diffusers library implemented DreamBooth and fully adapted it to Stable Diffusion.

Model fine-tuning

Data preparation

Just like with LoRA, crop the images to 512x512 and then tag them. Please refer to the previous article for data processing; I won't go into details here. The amount of data depends on your task (real person, anime, or art style). Don't rush; it's best to hear me out first. Once the whole process has been explained, you can prepare your data and start training.

Anime: 20-50 images (it is recommended to cut out the subject and fill the background with white; adding face crops has little effect, but you can try it yourself, maybe it works better for you)

Real person: 50-100 images (it is recommended to cut out the subject, fill in the background, and add extra face crops to strengthen facial training)

Art style: 1000 and up

This is just a reference range. If your image quality is first-rate, then of course the more the better. But don't force in ordinary photos just for the sake of quantity; padding the count with mediocre images makes training worse.
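If you would rather batch-crop than do it by hand, a minimal sketch like the following works; it uses Pillow to center-crop and resize everything to 512x512, and the folder names are placeholders:

import os
from PIL import Image

SRC, DST = "raw-images", "instance-images"   # placeholder folder names
os.makedirs(DST, exist_ok=True)

for name in os.listdir(SRC):
    img = Image.open(os.path.join(SRC, name)).convert("RGB")
    side = min(img.size)                              # largest centered square
    left, top = (img.width - side) // 2, (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((512, 512))
    img.save(os.path.join(DST, os.path.splitext(name)[0] + ".png"))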

Suppose I am fine-tuning a real-person model. This is my dataset, just as a sample.


After cutting out the images, there are two folders in the Cyber Dan Furnace (赛博丹炉) training tool: the cropped subjects and the cropped faces.

Then prepare your base model, one whose style is close to the images you are training on, then compress it and upload it to Baidu Netdisk. This is for convenience of downloading, because you can create a folder in AutoDL that shares space with Baidu Netdisk. Manually dragging files into AutoDL is really slow, especially for checkpoint models of 2 GB or more. See the SDXL_LORA article for that tutorial; it is not difficult, and the Chinese guide explains it very clearly. Just follow it step by step.

Fine-tuning on AutoDL

https://www.autodl.com/create

Search for dreambooth directly and it will come up; create the instance right away.

Image name:  Akegarasu/dreambooth-autodl/dreambooth-autodl:v3


Open the terminal and move dreambooth-aki from the system disk to the data disk, because the system disk is only 30 GB while the data disk is 50 GB.

mv /root/dreambooth-aki/ /root/autodl-tmp/
ipython kernel install --user --name=diffusers   # install the diffusers kernel, used for testing later


Click the ipynb training-script file.


Direct method: simple, but average upload speed

File placement: put the model file directly into model-sd and the dataset images directly into instance-images. You can drag them in manually.

Indirect method: requires setting up a shared space, but the upload speed is fast

After downloading, the files are under the autodl-tmp path.

Important: it is best to upload compressed zip files. The decompression command is as follows:

unzip /root/autodl-tmp/dreambooth-aki/<filename>.zip

Move the files:

# move the dataset
mv /root/autodl-tmp/dreambooth-aki/<folder name>/* /root/autodl-tmp/dreambooth-aki/instance-images/
# move the model
mv /root/autodl-tmp/dreambooth-aki/<model name> /root/autodl-tmp/dreambooth-aki/model-sd/

mv means move: it moves a file from the source path to the destination path.

Data preparation is complete

Training parameters

autodl-tmp/dreambooth-aki/dreambooth-aki.ipynb

Actually, Qiuye has explained everything very clearly; you can basically just change the file paths and run the notebook cell by cell. Even so, I will walk through it step by step.

The first cell defines the model-conversion helpers, some global variables, and so on. Just press Ctrl+Enter to run it.


Qiuye has already converted an anime base model under model-hf. Since we placed our real-person base model in model-sd earlier, change model.safetensors to the name of your own base model. After the cell runs, it automatically overwrites the model-hf folder.
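Under the hood this is the standard single-file-checkpoint to diffusers-layout conversion. With a recent diffusers release you could do the equivalent yourself; a minimal sketch, assuming the paths from this image and a base model named model.safetensors:

from diffusers import StableDiffusionPipeline

# Load the single-file checkpoint and re-save it in the multi-folder
# diffusers layout that the trainer expects.
pipe = StableDiffusionPipeline.from_single_file(
    "/root/autodl-tmp/dreambooth-aki/model-sd/model.safetensors"
)
pipe.save_pretrained("/root/autodl-tmp/dreambooth-aki/model-hf")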


You can see that after the conversion, the model has been split into a text encoder, a tokenizer, a UNet, and a VAE.
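You can verify the split by listing the converted folder; a minimal check (the path matches the step above, and the exact subfolders may vary by diffusers version):

import os

# Each pipeline component lives in its own subfolder after conversion.
print(sorted(os.listdir("/root/autodl-tmp/dreambooth-aki/model-hf")))
# Typically: ['feature_extractor', 'model_index.json', 'safety_checker',
#             'scheduler', 'text_encoder', 'tokenizer', 'unet', 'vae']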


Change bocchitherock to a tag you name yourself (an English word that does not otherwise exist), which acts like a trigger word. Training the text encoder will link this word to the style of your training images, so your custom tag will more easily reproduce that style.

  • Instance Image: the target dataset you are training on.
  • Instance Prompt (custom tag): by default a single prompt shared globally, meaning this tag is added to the captions of all training images.
  • Class/Regularization Image: regularization images used to preserve the model's prior knowledge. They should be automatically generated images; do not put non-AI-generated images here. Don't worry, they are generated automatically (see the sketch after this list).
  • Class Prompt: a simple tag for the class of subject you are training. For example, for a girl model, use 1girl plus some quality words.
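To make the regularization idea concrete: with prior preservation enabled, each training step combines the loss on your instance images with a loss on the auto-generated class images, weighted by prior_loss_weight. A minimal sketch of that combination (not the notebook's actual code; the function and argument names are mine):

import torch.nn.functional as F

def dreambooth_loss(instance_pred, instance_target,
                    class_pred, class_target, prior_loss_weight=1.0):
    # Noise-prediction loss on your own training images.
    instance_loss = F.mse_loss(instance_pred, instance_target)
    # Loss on the auto-generated class images; this anchors the model's
    # prior knowledge of the class (e.g. "1girl") so it does not drift.
    prior_loss = F.mse_loss(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss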

This cell basically needs no changes; just run it.


These two cells can be run directly without modification.


# Common parameters. I won't go through them one by one; if you're interested, see my
# SDXL_LORA training article, where the parameters are explained fairly clearly.
# I'll explain the ones not covered there.
## Maximum number of training steps
max_train_steps = 3000    # anime: 3000-5000 steps; real person: 5000-10000; art style: as many as you can manage
## Learning rate
learning_rate = 5e-6        # default value; adjust according to the training loss
## Learning-rate scheduler
## ["linear", "cosine", "cosine_with_restarts", "polynomial", "constant", "constant_with_warmup", "cosine_with_restarts_mod", "cosine_mod"]
lr_scheduler = "cosine_with_restarts"
lr_warmup_steps = 100
train_batch_size = 1 # batch_size
num_class_images = 20 # number of automatically generated class_images

with_prior_preservation = True   # enable when training a character; disable when training an art style
train_text_encoder = False # train the text encoder
use_aspect_ratio_bucket = False # use ARB (aspect-ratio bucketing)

# read the prompt from the file name
read_prompt_from_filename = False
# read the prompt from a txt file
read_prompt_from_txt = False
append_prompt = "instance"
# checkpoint save interval (in steps)
save_interval = 500               # keep this at 1000 or more, otherwise the data disk fills up after saving just a few checkpoints
# use DeepDanbooru
use_deepdanbooru = False

# Advanced parameters
resolution = 512
gradient_accumulation_steps = 1
seed = 1337
log_interval = 10
clip_skip = 1
sample_batch_size = 4
prior_loss_weight = 1.0  # lower makes overfitting harder, but also makes it harder to learn anything
# Scale the learning rate with the effective batch size, a common optimizer
# heuristic to help the model converge when the batch size changes.
scale_lr = False
scale_lr_sqrt = False   # same as above, but scales by the square root
gradient_checkpointing = True
pad_tokens = False
debug_arb = False
debug_prompt = False
use_ema = False
# only works with the _mod schedulers
restart_cycle = 1
last_epoch = -1
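For reference, the scheduler names in the list above (apart from the _mod variants, which are this notebook's own additions) map onto diffusers' get_scheduler helper. A minimal sketch using the values set above:

import torch
from diffusers.optimization import get_scheduler

# Dummy optimizer just to show the wiring.
optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=5e-6)
lr_sched = get_scheduler(
    "cosine_with_restarts",
    optimizer=optimizer,
    num_warmup_steps=100,      # lr_warmup_steps above
    num_training_steps=3000,   # max_train_steps above
)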

Basically, those are the only parameters you need to adjust.

Fine-tuning base learning rate: 3e-6; automatic tagging confidence threshold: above 0.35.

batch size:

  • Small sample sets (hundreds of images): keep it within 3; scale_lr / scale_lr_sqrt can be enabled (see the sketch after this list).
  • Large sample sets (thousands to tens of thousands of images): as large as your GPU allows.
  • Number of steps: for about a hundred images, roughly 10,000 steps, i.e., more than 5 epochs.
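As promised, scale_lr and scale_lr_sqrt are just arithmetic on the base learning rate. A sketch of the two rules, assuming they scale by the effective batch size as is conventional:

import math

base_lr = 5e-6
train_batch_size, gradient_accumulation_steps = 2, 1
effective_batch = train_batch_size * gradient_accumulation_steps

lr_linear = base_lr * effective_batch            # scale_lr: linear scaling rule
lr_sqrt = base_lr * math.sqrt(effective_batch)   # scale_lr_sqrt: sqrt scaling
print(lr_linear, lr_sqrt)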

After settling on the batch_size, adjust your learning rate accordingly and run the cell.


This cell lets you continue training from the last result. For a first run you do not need it; just start the training cell and let it run to completion.

Then convert the result to a ckpt file: change the model name saved under your output directory, then run the conversion cell.


The next step is testing. Check whether your model looks right and decide whether to download it. Note, however, that inference here is not identical to running the model locally, so the results may differ somewhat. Change the test model name, same as above.
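The test cell itself is ordinary diffusers inference. A minimal sketch, assuming a converted output folder and your custom trigger word (both names below are placeholders):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "/root/autodl-tmp/dreambooth-aki/output/mymodel",   # placeholder output path
    torch_dtype=torch.float16,
).to("cuda")

# "mytag" stands for your custom trigger word from earlier.
image = pipe("mytag, 1girl, best quality", num_inference_steps=30).images[0]
image.save("test.png")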


Run it.


Summary

In fact, fine-tuning a model is only part of the picture. All of today's popular checkpoint models are fine-tuned step by step from the original base model, because training a large model from scratch is basically impossible for individuals, while fine-tuning is within our reach. Thanks to the efforts of many different authors, you can see the SD 1.5 ecosystem blooming.

Model makers are more like bartenders. As Maiju said before, everyone seems to know how to merge different checkpoint models in different proportions to create a better one. I won't say much about that, since publicly merging other people's models raises some issues; I'll just mention it briefly here and go into detail when I get the chance!


Original article: https://blog.csdn.net/weixin_62403633/article/details/133357040