Multi-card (multi-GPU) training of HyperNetwork and LoRA models on Ubuntu 20.04: the whole process and solutions to difficult problems


Hardware and software configuration:
CPU: AMD 5800, 8 cores / 16 threads
GPU: NVIDIA RTX 3090 ×1, NVIDIA TITAN RTX ×1
OS: Ubuntu 20.04

1. LoRA model multi-card training

1.1 Install libraries such as xformers

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
git clone https://github.com/facebookresearch/xformers/
cd xformers
git submodule update --init --recursive
export FORCE_CUDA="1"
# See https://developer.nvidia.com/cuda-gpus#compute
# Set the Compute Capability of the GPU you are using; the 3090 and A5000 are both 8.6
export TORCH_CUDA_ARCH_LIST=8.6
pip install -r requirements.txt
pip install -e .
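The build above sets TORCH_CUDA_ARCH_LIST to 8.6 only, while the machine also has a TITAN RTX. A minimal sketch of deriving one value that covers every card follows. The lookup table and the function name torch_cuda_arch_list are illustrative; 8.6 for the 3090 and A5000 comes from the comment above, and 7.5 for the TITAN RTX is taken from NVIDIA's compute capability table and should be checked there.

```python
# Compute capabilities for the cards mentioned in this post.
# 8.6 (RTX 3090, A5000) is from the post; 7.5 (TITAN RTX) is from NVIDIA's
# table at https://developer.nvidia.com/cuda-gpus and should be verified.
KNOWN_CAPABILITIES = {
    "RTX 3090": "8.6",
    "RTX A5000": "8.6",
    "TITAN RTX": "7.5",
}


def torch_cuda_arch_list(gpu_names):
    """Return a TORCH_CUDA_ARCH_LIST value covering every listed GPU.

    Raises KeyError for a card that is not in KNOWN_CAPABILITIES.
    """
    caps = {KNOWN_CAPABILITIES[name] for name in gpu_names}
    # Sort numerically so that "7.5" comes before "8.6".
    ordered = sorted(caps, key=lambda c: tuple(int(p) for p in c.split(".")))
    return ";".join(ordered)


if __name__ == "__main__":
    value = torch_cuda_arch_list(["RTX 3090", "TITAN RTX"])
    print(f'export TORCH_CUDA_ARCH_LIST="{value}"')
    # prints: export TORCH_CUDA_ARCH_LIST="7.5;8.6"
```

Building for both architectures takes longer, but the compiled kernels can then run on either card.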

Download the training code:
git clone https://github.com/derrian-distro/LoRA_Easy_Training_Scripts.git

cd LoRA_Easy_Training_Scripts
git submodule init
git submodule update
cd sd_scripts
pip install --upgrade -r requirements.txt

1.2 Set path

In general, three paths need to be set: the base (large) model path, the image input path, and the image output path.
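A wrong path usually only surfaces after the launcher has started, so it can help to check the three paths first. The sketch below is a hypothetical helper, not part of LoRA_Easy_Training_Scripts; the parameter names base_model, img_folder, and output_folder are assumptions chosen for illustration.

```python
from pathlib import Path


def check_training_paths(base_model, img_folder, output_folder):
    """Return a list of problems with the three paths; empty means OK.

    Hypothetical helper: the argument names are illustrative and may not
    match the names used by the training scripts.
    """
    problems = []
    if not Path(base_model).is_file():
        problems.append(f"base model not found: {base_model}")
    if not Path(img_folder).is_dir():
        problems.append(f"image folder not found: {img_folder}")
    out = Path(output_folder)
    # A missing output folder is acceptable; a file in its place is not.
    if out.exists() and not out.is_dir():
        problems.append(f"output path is not a folder: {output_folder}")
    return problems


if __name__ == "__main__":
    # Placeholder paths; replace them with your own.
    for problem in check_training_paths("model.safetensors", "images", "output"):
        print(problem)
```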
Next, generate the training configuration file:

accelerate config

Select the options that match your machine and training strategy. The answers used here were:

- This machine
- 1
- No
- NO
- NO
- NO
- 0,1
- fp16

After the configuration is complete, a training configuration file will be automatically generated.
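To confirm that the answers were saved, the generated file can be inspected. The sketch below assumes the default location ~/.cache/huggingface/accelerate/default_config.yaml and the keys mixed_precision and num_processes; both depend on the accelerate version and environment variables, so treat them as assumptions. The value 2 for num_processes is inferred from the two GPU ids (0,1) and is not stated in the list above.

```python
from pathlib import Path

# Assumed default location; it changes if HF_HOME or XDG_CACHE_HOME is set.
DEFAULT_CONFIG = Path.home() / ".cache/huggingface/accelerate/default_config.yaml"


def read_flat_yaml(text):
    """Parse top-level `key: value` lines; enough for a quick sanity check."""
    settings = {}
    for line in text.splitlines():
        # Skip blanks, comments, and nested or list entries.
        if not line.strip() or line.startswith(("#", " ", "-")):
            continue
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip().strip("'\"")
    return settings


def check_multi_gpu(settings, expected_gpus=2):
    """Return a list of problems; an empty list means the config looks right."""
    problems = []
    if settings.get("mixed_precision") != "fp16":
        problems.append("mixed_precision is not fp16")
    if settings.get("num_processes") != str(expected_gpus):
        problems.append(f"num_processes is not {expected_gpus}")
    return problems


if __name__ == "__main__":
    if DEFAULT_CONFIG.exists():
        print(check_multi_gpu(read_flat_yaml(DEFAULT_CONFIG.read_text())))
    else:
        print(f"No config at {DEFAULT_CONFIG}; run `accelerate config` first.")
```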

1.3 Multi-card training

accelerate launch main.py 

With the same model and configuration, dual-card training takes 3:46 and single-card training takes 7:57, which shows that the dual-card acceleration strategy is effective.
Dual-card time: 3:46
Single-card time: 7:57
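The two timings can be turned into a speedup figure. The sketch below only does the arithmetic on the values quoted above; the helper name to_seconds is illustrative. The ratio is the same whether the times are read as minutes:seconds or hours:minutes.

```python
def to_seconds(clock):
    """Convert a colon-separated time such as '7:57' to its smallest unit."""
    total = 0
    for part in clock.split(":"):
        total = total * 60 + int(part)
    return total


single = to_seconds("7:57")  # 477
dual = to_seconds("3:46")    # 226
speedup = single / dual

print(f"speedup: {speedup:.2f}x")  # prints: speedup: 2.11x
```

The post does not say which of the two cards ran the single-card test, so the ratio is a rough indicator, not a precise scaling measurement.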

2. HyperNetwork model multi-card training

2.1 HyperNetwork training through WebUI

First select preprocessing, then select HyperNetwork training

Troubleshooting

Multi-card training error

After running the multi-card training command accelerate launch main.py, training fails with an error.
The cause is that xformers 0.0.18 corresponds to PyTorch 2.0.0, which is too new here. Downgrade to PyTorch 1.13.0 and xformers 0.0.16, and stop using xformers by setting self.xformers: bool = False.
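Before launching, the installed versions can be compared against the pairings described above. This is a minimal sketch: the table holds only the two pairings reported in this post, with xformers versions written in their PyPI form (0.0.18, 0.0.16), and is not an official compatibility matrix. The function names are illustrative.

```python
from importlib import metadata

# Pairings as reported in this post, not an official compatibility matrix.
REQUIRED_TORCH = {"0.0.18": "2.0.0", "0.0.16": "1.13.0"}


def base_version(version):
    """Drop local build suffixes such as '+cu117'."""
    return version.split("+", 1)[0]


def is_known_pair(xformers_version, torch_version):
    """True on a match, False on a mismatch, None if the xformers version is not listed."""
    wanted = REQUIRED_TORCH.get(base_version(xformers_version))
    if wanted is None:
        return None
    return base_version(torch_version) == wanted


def installed(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    xf, torch_v = installed("xformers"), installed("torch")
    if xf is None or torch_v is None:
        print("torch or xformers is not installed in this environment")
    else:
        print(f"xformers {xf} with torch {torch_v}: known pair = {is_known_pair(xf, torch_v)}")
```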


Origin blog.csdn.net/m0_46339652/article/details/130247397