Fine-tuning of Stable Diffusion based on LoRA
Dataset
This fine-tuning run uses the Pokemon dataset from LambdaLabs.
Download the dataset with git clone:
git clone https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions
The dataset has 883 samples in total. Each sample consists of two parts, an image and a text caption, as shown in the figure below.
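To inspect the two fields of a sample, a minimal sketch along the following lines may help. It assumes the `datasets` package is installed, that the dataset was cloned to `./pokemon-blip-captions`, and that the columns are named `image` and `text` as described above. The helper names `summarize_sample` and `load_and_summarize` are hypothetical and only serve the illustration.

```python
def summarize_sample(sample):
    """Return a one-line description of a row with `image` and `text` fields."""
    image, caption = sample["image"], sample["text"]
    width, height = image.size  # PIL images expose (width, height)
    return f"{width}x{height}: {caption}"


def load_and_summarize(dataset_dir="./pokemon-blip-captions"):
    """Load the locally cloned dataset and describe its first sample."""
    # Imported lazily so summarize_sample works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(dataset_dir, split="train")
    return len(dataset), summarize_sample(dataset[0])
```

Calling `load_and_summarize()` from the directory that contains the clone should return the sample count together with the size and caption of the first image.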
Model download
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
Environment configuration
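# Create a new conda environment
conda create -n diffusers python==3.10
# Activate the conda environment
conda activate diffusers
# Clone the diffusers repository
git clone https://github.com/huggingface/diffusers
# Enter the diffusers directory
cd diffusers
# Install diffusers from source
pip install .
cd examples/text_to_image
# Install the packages required by the example
pip install -r requirements.txt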
Fine-tuning process
To fine-tune, run train_text_to_image_lora.py with the command below. Set path parameters such as MODEL_NAME and DATASET_NAME to match the locations where you downloaded the model and dataset. Parameters such as train_batch_size and gradient_accumulation_steps can be adjusted to fit the available GPU resources.
export MODEL_NAME="/data/sim_chatgpt/stable-diffusion-v1-5"
export OUTPUT_DIR="./finetune/lora/pokemon"
export DATASET_NAME="./pokemon-blip-captions"
nohup accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME \
--dataloader_num_workers=8 \
--resolution=512 --center_crop --random_flip \
--train_batch_size=2 \
--gradient_accumulation_steps=4 \
--max_train_steps=7500 \
--learning_rate=1e-04 \
--max_grad_norm=1 \
--lr_scheduler="cosine" --lr_warmup_steps=0 \
--output_dir=${OUTPUT_DIR} \
--checkpointing_steps=500 \
--validation_prompt="Totoro" \
--seed=1337 \
>> finetune_log0725.out 2>&1 &
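When the paths or batch settings change often, the same command can be assembled in Python instead of edited by hand. The following is only a sketch: `build_launch_command` is a hypothetical helper, and it uses no flags beyond those in the command above.

```python
import shlex


def build_launch_command(model_name, dataset_name, output_dir,
                         train_batch_size=2, gradient_accumulation_steps=4,
                         max_train_steps=7500):
    """Assemble the `accelerate launch` argument list shown above."""
    return [
        "accelerate", "launch", "--mixed_precision=fp16",
        "train_text_to_image_lora.py",
        f"--pretrained_model_name_or_path={model_name}",
        f"--dataset_name={dataset_name}",
        "--dataloader_num_workers=8",
        "--resolution=512", "--center_crop", "--random_flip",
        f"--train_batch_size={train_batch_size}",
        f"--gradient_accumulation_steps={gradient_accumulation_steps}",
        f"--max_train_steps={max_train_steps}",
        "--learning_rate=1e-04",
        "--max_grad_norm=1",
        "--lr_scheduler=cosine", "--lr_warmup_steps=0",
        f"--output_dir={output_dir}",
        "--checkpointing_steps=500",
        "--validation_prompt=Totoro",
        "--seed=1337",
    ]


command = build_launch_command(
    model_name="/data/sim_chatgpt/stable-diffusion-v1-5",
    dataset_name="./pokemon-blip-captions",
    output_dir="./finetune/lora/pokemon",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the examples/text_to_image directory. On a smaller GPU, one option is to call the helper with `train_batch_size=1, gradient_accumulation_steps=8`, which keeps the product of the two values unchanged.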
Remarks: the parameter settings follow the reference example (see Reference), with the following lines removed:
export HUB_MODEL_ID="pokemon-lora"
--push_to_hub
--hub_model_id=${HUB_MODEL_ID}
--report_to=wandb
The dataset has 883 samples. With train_batch_size set to 2 and max_train_steps set to 7500, training uses about 11 GB of video memory and takes about 8 hours.
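To relate these numbers to each other, the arithmetic can be worked through as below. This is a rough sketch that assumes a single GPU process and assumes partial batches are rounded up, which is how the training script appears to count steps. The 8-hour figure is approximate, so the per-step time is only an estimate.

```python
import math

num_samples = 883
train_batch_size = 2
gradient_accumulation_steps = 4
max_train_steps = 7500
checkpointing_steps = 500
training_hours = 8   # approximate figure reported above
num_processes = 1    # assumption: single-GPU training

# Samples that contribute to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_processes
# Dataloader batches per pass over the data (last partial batch counted).
batches_per_epoch = math.ceil(num_samples / (train_batch_size * num_processes))
# Optimizer updates per pass over the data.
update_steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
# Passes over the data needed to reach max_train_steps.
num_epochs = math.ceil(max_train_steps / update_steps_per_epoch)
# Checkpoints written during the run.
num_checkpoints = max_train_steps // checkpointing_steps
# Average wall-clock time per optimizer update.
seconds_per_step = training_hours * 3600 / max_train_steps

print(effective_batch_size, batches_per_epoch, update_steps_per_epoch)  # 8 442 111
print(num_epochs, num_checkpoints, round(seconds_per_step, 2))          # 68 15 3.84
```

Under these assumptions, 7500 steps correspond to roughly 68 passes over the 883 samples, at just under 4 seconds per optimizer update.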
The video memory usage is as follows:
Inference
After fine-tuning completes, the following code can be used for inference.
from diffusers import StableDiffusionPipeline
import torch

# Directory holding the fine-tuned LoRA weights (OUTPUT_DIR from training)
model_path = "./finetune/lora/pokemon"
# Load the base Stable Diffusion v1.5 model in half precision
pipe = StableDiffusionPipeline.from_pretrained("/data/sim_chatgpt/stable-diffusion-v1-5", torch_dtype=torch.float16)
# Attach the LoRA attention weights to the UNet
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")

prompt = "A pokemon with green eyes and red legs."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")
Running the code generates an image named pokemon.png, as shown in the figure below.
WebUI deployment
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
Place the original model files under ~/stable-diffusion-webui/models/Stable-diffusion and the fine-tuned LoRA model files under ~/stable-diffusion-webui/models/Lora, then start the WebUI:
cp -r /data/sim_chatgpt/stable-diffusion-v1-5/* ~/stable-diffusion-webui/models/Stable-diffusion/
mkdir -p ~/stable-diffusion-webui/models/Lora
cp -r ~/diffusers/examples/text_to_image/finetune/lora/pokemon/* ~/stable-diffusion-webui/models/Lora/
cd ~/stable-diffusion-webui
./webui.sh --no-download-sd-model --xformers --no-gradio-queue
Error:
RuntimeError: Couldn't install gfpgan.
Solution:
Install GFPGAN manually from https://github.com/TencentARC/GFPGAN
git clone https://github.com/TencentARC/GFPGAN
cd GFPGAN
pip install basicsr -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
# Install facexlib - https://github.com/xinntao/facexlib
# We use face detection and face restoration helper in the facexlib package
pip install facexlib
pip install -r requirements.txt
# Error: installation fails (unresolved)
python setup.py develop
# If you want to enhance the background (non-face) regions with Real-ESRGAN,
# you also need to install the realesrgan package
pip install realesrgan
Reference:
https://huggingface.co/blog/lora
https://huggingface.co/blog/zh/lora
https://github.com/AUTOMATIC1111/stable-diffusion-webui