Stable Diffusion: Fine-tuning the LoRA model with your own dataset

Foreword

Prerequisites

  1. Install Python 3.10.6 : https://www.python.org/downloads/release/python-3106/
  2. Install git : https://git-scm.com/download/win
  3. Install the Visual Studio 2015, 2017, 2019, and 2022 redistributable: https://aka.ms/vs/17/release/vc_redist.x64.exe
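
The first two prerequisites can be checked from a script before installing anything else. The following is a minimal sketch, not part of the kohya_ss project: the function name `check_prerequisites` is made up for illustration, it only checks for the Python 3.10 series rather than 3.10.6 exactly, and it does not detect the Visual Studio redistributable.

```python
import shutil
import sys

# The list above names Python 3.10.6; this sketch only checks for the 3.10 series.
REQUIRED_PYTHON = (3, 10)


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    if tuple(version_info[:2]) != REQUIRED_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found, "
            f"expected {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}.x"
        )
    if which("git") is None:
        problems.append("git was not found on PATH")
    return problems


# Print one warning per failed check; prints nothing when everything is in place.
for problem in check_prerequisites():
    print("WARNING:", problem)
```

The two parameters exist only so the check can be exercised without changing the real environment.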

Background

  • Python is a cross-platform, high-level programming language that is interpreted, interactive and object-oriented. It was originally used mainly for writing automation scripts; as new language features have been added across releases, it is increasingly used for standalone and large-scale projects.
  • PyTorch is a deep learning framework that packages many network building blocks and training tools so that they do not have to be written from scratch. It is available in CPU and GPU builds; comparable frameworks include TensorFlow and Caffe. PyTorch was released by Facebook AI Research (FAIR) and is based on Torch. It is a Python-based scientific computing package that provides two main features: 1. tensor computation (similar to NumPy) with powerful GPU acceleration; 2. automatic differentiation for building and training deep neural networks.
  • AIGC (Artificial Intelligence Generated Content) refers to content created or generated by artificial intelligence systems. It involves the use of artificial intelligence techniques, such as machine learning, natural language processing, and computer vision, to generate various forms of content, including text, images, video, music, and more.
  • Stable Diffusion is a latent diffusion model for generating images, most commonly from a text prompt (txt2img). Despite the name, it is not a classical image-smoothing or denoising filter.
  • During training, noise is progressively added to image representations and a neural network (a U-Net) learns to predict and remove that noise. During generation, the model starts from random noise and denoises it step by step into an image.
  • The diffusion process runs in the compressed latent space of an autoencoder rather than in pixel space, which keeps memory and compute costs manageable. The text prompt is encoded by a text encoder and steers the denoising through cross-attention layers.
  • Besides text-to-image generation, Stable Diffusion is also used for image-to-image translation, inpainting and outpainting.
  • LoRA stands for Low-Rank Adaptation; the name comes from the paper "LoRA: Low-Rank Adaptation of Large Language Models". In Stable Diffusion it works like a plug-in: a small add-on model that can be trained with only a small amount of data.
  • LoRA was introduced for fine-tuning large language models, where it reduces the cost of fine-tuning.
  • A LoRA model is used together with a large base model and steers the results that the base model produces.
  • LoRA inserts new trainable layers alongside the original model's layers and optimizes only the parameters of the inserted layers. The original parameters are not modified, so there is no need to copy the entire model, which makes this a very lightweight way to tune a model.
  • Concretely, LoRA freezes the weights of the pre-trained model and injects trainable rank-decomposition matrices into each Transformer block. In Stable Diffusion, LoRA can be applied to the cross-attention layers to improve text-to-image generation.
  • LoRA model files are relatively small; a common size is about 144 MB. Use them together with a pruned version of the Stable Diffusion 1.5 model.
    insert image description here
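
The low-rank idea described in the list above can be illustrated in a few lines. This is a minimal sketch in pure Python, written for readability rather than speed; the names `down`, `up` and `alpha` are illustrative, and real training code (including kohya_ss) works on PyTorch tensors instead. The frozen weight matrix W is never modified; only the two small factors would be trained.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w, down, up, alpha):
    """Compute x @ W + (alpha / r) * (x @ down @ up) without modifying W.

    Shapes: x is (n, d_in), W is (d_in, d_out), down is (d_in, r), up is (r, d_out).
    """
    rank = len(up)                              # r = number of rows of `up`
    scale = alpha / rank
    base = matmul(x, w)                         # frozen pre-trained path
    update = matmul(matmul(x, down), up)        # trainable low-rank path
    return [[b + scale * u for b, u in zip(brow, urow)]
            for brow, urow in zip(base, update)]


def lora_param_count(d_in, d_out, rank):
    """Number of trainable values in the two low-rank factors."""
    return rank * (d_in + d_out)


# A 768 x 768 layer has 589824 weights; a rank-8 adapter for it trains 12288 values.
print(768 * 768, lora_param_count(768, 768, 8))
```

When `up` is initialized to zeros, the adapted layer starts out identical to the original one, which is why training can begin from the unmodified base model.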

Fine-tuning the trained LoRA model

Download the kohya_ss project

Download the kohya_ss project from https://github.com/bmaltais/kohya_ss.
insert image description here
After downloading and decompressing, the project directory is as shown in the figure below.
insert image description here

Install the kohya_ss project

  1. Open a terminal, navigate to the directory where the project was extracted, and enter the kohya_ss directory:

    cd kohya_ss
    
  2. Execute the following command to run the setup script:

    .\setup.bat
    

If no error is reported, the installation is successful.

Run the kohya_ss project

On Windows, use the gui.bat script and run it in a terminal with the following command:

gui.bat --listen 127.0.0.1 --server_port 7860 --inbrowser --share

Once the GUI is running, open http://127.0.0.1:7860/ in a browser, as shown in the figure below.
insert image description here

Note: For detailed tutorials, please refer to the README.md file in this project.
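
If the GUI is launched from a script, the command line above can be assembled from a few options. The sketch below is an illustration only: `build_gui_command` is a hypothetical helper, and it uses nothing beyond the four flags shown in the command above. As far as I know, `--share` asks Gradio for a public share link, so it is left off by default here to keep the GUI local.

```python
def build_gui_command(listen="127.0.0.1", server_port=7860, inbrowser=True, share=False):
    """Assemble the gui.bat argument list using only the flags shown above."""
    command = ["gui.bat", "--listen", listen, "--server_port", str(server_port)]
    if inbrowser:
        command.append("--inbrowser")
    if share:
        command.append("--share")
    return command


# Reproduces the command used in this article.
print(" ".join(build_gui_command(share=True)))
```

The resulting list can be passed to `subprocess.run` from inside the kohya_ss directory on Windows.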

Prepare dataset

Under the kohya_ss project, create a train directory with the following structure:
insert image description here

  • image: training images are placed here.
  • log: training logs.
  • model: the path where trained models are saved.
  • The image directory contains a subdirectory named <repeats>_<name>. In this article it is 100_Freeman: 100 is the number of times each image is repeated per epoch, which directly affects the total number of training steps and the result, and Freeman is the name of the person in the pictures.
  • Put the processed pictures in the 100_Freeman directory, then generate the keywords (captions).
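
The folder layout and the effect of the numeric prefix can be sketched as follows. Several things here are assumptions: the helper names are made up, the `train/Freeman/...` nesting is inferred from the `kohya_ss/train/Freeman/model` path used later in this article, the image count of 10 is hypothetical, and the step formula is only a rough estimate (the trainer may count steps differently, for example when images are bucketed by resolution).

```python
import math
import tempfile
from pathlib import Path


def create_dataset_layout(project_root, subject, repeats):
    """Create train/<subject>/{image,log,model} and the <repeats>_<subject> image folder."""
    base = Path(project_root) / "train" / subject
    image_dir = base / "image" / f"{repeats}_{subject}"
    for folder in (image_dir, base / "log", base / "model"):
        folder.mkdir(parents=True, exist_ok=True)
    return image_dir


def estimate_steps(num_images, repeats, epochs, batch_size):
    """Rough step estimate: each epoch sees every image `repeats` times."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


with tempfile.TemporaryDirectory() as root:      # stand-in for the kohya_ss folder
    image_dir = create_dataset_layout(root, "Freeman", 100)
    demo_path = image_dir.relative_to(root).as_posix()

demo_steps = estimate_steps(num_images=10, repeats=100, epochs=1, batch_size=2)
print(demo_path)    # train/Freeman/image/100_Freeman
print(demo_steps)   # 500
```

With a prefix of 100, even a small image set produces many steps per epoch, so the prefix and the epoch count should be chosen together.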

Generate keywords

  • In the GUI, go to Utilities -> Captioning -> BLIP Captioning.

insert image description here
After successful generation, a TXT file will appear, as shown in the figure below.
insert image description here
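
Before training, it can be worth confirming that every image received a caption. The sketch below assumes each caption file shares its image's base name and uses the .txt extension; the set of image suffixes and the function name are assumptions as well.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}   # assumed set of image types


def images_missing_captions(image_dir, caption_suffix=".txt"):
    """Return the names of images that have no caption file with the same base name."""
    missing = []
    for path in sorted(Path(image_dir).iterdir()):
        is_image = path.suffix.lower() in IMAGE_SUFFIXES
        if is_image and not path.with_suffix(caption_suffix).exists():
            missing.append(path.name)
    return missing


# Demo on a throwaway folder; in practice pass the 100_Freeman image directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.png", "a.txt", "b.jpg"):
        (Path(tmp) / name).touch()
    demo_missing = images_missing_captions(tmp)

print(demo_missing)   # ['b.jpg']
```

An empty list means every image has a caption next to it.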


Model parameter settings

Pretrained model settings

insert image description here

Folder settings

insert image description here

Training parameter settings

insert image description here

Start training the LoRA model

insert image description here
After the training is completed, a .safetensors model file will be generated in the model folder.
insert image description here

View training status with TensorBoard

On the page, click Start TensorBoard, then open the URL http://127.0.0.1:6006 to view the training status.
insert image description here


Test the trained LoRA model

To test the model, you need the stable-diffusion-webui project. For the installation method, refer to the article "Deploying and Using Stable Diffusion AI Open Source Project Drawing under Windows".

  1. Copy the model file Freeman_bs2_epoch50_fp16.safetensors from the kohya_ss/train/Freeman/model directory to the stable-diffusion-webui/models/Lora directory and the stable-diffusion-webui/models/Stable-diffusion directory of the stable-diffusion-webui project. The copy in models/Lora is the one that the <lora:...> prompt tag loads.
    insert image description here
    insert image description here

  2. In the stable-diffusion-webui directory, run the webui-user.bat script in a terminal with the following command:

webui-user.bat

insert image description here

Once the web UI is running, open http://127.0.0.1:7861/ in a browser, as shown in the figure below.
insert image description here

  3. Choose your own trained LoRA model
    insert image description here
    insert image description here
    insert image description here
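
The copy in step 1 above could also be scripted. This is a minimal sketch with a hypothetical helper name, `install_lora`; it copies the file into models/Lora only, and the demo uses a throwaway folder and a placeholder file instead of real model weights.

```python
import shutil
import tempfile
from pathlib import Path


def install_lora(model_file, webui_root):
    """Copy a trained .safetensors file into <webui_root>/models/Lora and return the new path."""
    model_file = Path(model_file)
    if model_file.suffix != ".safetensors":
        raise ValueError(f"expected a .safetensors file, got {model_file.name}")
    lora_dir = Path(webui_root) / "models" / "Lora"
    lora_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(model_file, lora_dir))


# Demo with a fake file; in practice pass the file from kohya_ss/train/Freeman/model.
with tempfile.TemporaryDirectory() as tmp:
    trained = Path(tmp) / "Freeman_bs2_epoch50_fp16.safetensors"
    trained.write_bytes(b"placeholder")
    installed = install_lora(trained, Path(tmp) / "stable-diffusion-webui")
    demo_ok = installed.exists() and installed.parent.name == "Lora"

print(demo_ok)   # True
```

After copying a new file while the web UI is already open, the LoRA list in the UI may need to be refreshed before the model shows up.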

Generate images from text (txt2img)

 <lora:Freeman_bs2_epoch50_fp16:1>Freeman a beautiful woman with glasses and a white dress,modelshoot style,beautiful light,photo realistic game cg
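
The prompt above has three parts: the <lora:NAME:WEIGHT> tag, where NAME is the model file name without the .safetensors extension, then the name used in the training data (Freeman), then comma-separated descriptors. A small sketch of assembling it follows; the helper names are made up for illustration.

```python
def lora_tag(name, weight=1.0):
    """Format the <lora:NAME:WEIGHT> tag used in the prompt above."""
    return f"<lora:{name}:{weight:g}>"


def build_prompt(lora_name, trigger, descriptors, weight=1.0):
    """Join the LoRA tag, the trigger word and comma-separated descriptors into one prompt."""
    return f"{lora_tag(lora_name, weight)}{trigger} " + ",".join(descriptors)


prompt = build_prompt(
    "Freeman_bs2_epoch50_fp16",
    "Freeman",
    [
        "a beautiful woman with glasses and a white dress",
        "modelshoot style",
        "beautiful light",
        "photo realistic game cg",
    ],
)
print(prompt)
```

Lowering the weight, for example `lora_tag("Freeman_bs2_epoch50_fp16", 0.8)`, reduces how strongly the LoRA influences the generated image.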

insert image description here

References

[1] https://github.com/bmaltais/kohya_ss.git
[2] https://github.com/AUTOMATIC1111/stable-diffusion-webui
[3] https://github.com/camenduru/stable-diffusion-webui
[4] https://www.kaggle.com/code/camenduru/stable-diffusion-webui-kaggle
[5] https://blog.csdn.net/wpgdream/article/details/130607099
[6] https://zhuanlan.zhihu.com/p/620583928

Origin blog.csdn.net/FriendshipTang/article/details/132395013