Stable Diffusion: Fine-tuning the LoRA model with your own dataset
foreword
- My knowledge is limited, so mistakes and omissions are inevitable; criticism and corrections are welcome.
- For more content, see the YOLO series column, the natural language processing column, or my personal homepage:
- Face Masquerade Detection Based on DETR
- YOLOv7 trains its own data set (mask detection)
- YOLOv8 trains its own data set (football detection)
- YOLOv5: TensorRT accelerates YOLOv5 model reasoning
- YOLOv5:IoU、GIoU、DIoU、CIoU、EIoU
- Playing with Jetson Nano (5): TensorRT accelerates YOLOv5 target detection
- YOLOv5: Add SE, CBAM, CoordAtt, ECA attention mechanism
- YOLOv5: Interpretation of yolov5s.yaml configuration file, adding small target detection layer
- Python converts the COCO format instance segmentation dataset to the YOLO format instance segmentation dataset
- YOLOv5: Use version 7.0 to train your own instance segmentation model (instance segmentation of vehicles, pedestrians, road signs, lane lines, etc.)
- Use Kaggle GPU resources to experience the Stable Diffusion open source project for free
prerequisite
- Install Python 3.10.6: https://www.python.org/downloads/release/python-3106/
- Install git: https://git-scm.com/download/win
- Install the Visual Studio 2015, 2017, 2019, and 2022 redistributable: https://aka.ms/vs/17/release/vc_redist.x64.exe
related introduction
- Python is a cross-platform, high-level scripting language that combines interpretation, compilation, interactivity, and object orientation. Originally designed for writing automation scripts, it has gained new language features with each release and is increasingly used to develop independent, large-scale projects.
- PyTorch is a deep learning framework that packages many networks and deep-learning tools for us to call, so we do not have to write them from scratch. It comes in CPU and GPU versions; other frameworks include TensorFlow and Caffe. PyTorch was launched by the Facebook AI Research lab (FAIR) and is based on Torch. It is a Python-based scientific computing package that provides two high-level features: 1. tensor computing with powerful GPU acceleration (similar to NumPy); 2. automatic differentiation for building deep neural networks.
- AIGC (Artificial Intelligence Generated Content) refers to content created or generated by artificial intelligence systems. It involves the use of artificial intelligence techniques, such as machine learning, natural language processing, and computer vision, to generate various forms of content, including text, images, video, music, and more.
- Stable Diffusion is an open-source latent diffusion model for text-to-image generation. It is based on the theory of diffusion processes: during training, noise is gradually added to images (the forward diffusion process), and a U-Net is trained to reverse that process, denoising step by step.
- To keep computation tractable, the diffusion runs in a compressed latent space produced by a variational autoencoder (VAE) rather than in pixel space, which is what makes high-resolution generation feasible on consumer GPUs.
- A text encoder (CLIP) conditions the denoising process on a prompt through cross-attention layers, which is how the text guides the generated image.
- Stable Diffusion is widely used for text-to-image generation, image-to-image translation, inpainting, and similar tasks in computer vision and content creation.
- The full name of the LoRA model is Low-Rank Adaptation of Large Language Models. It can be understood as a plug-in for Stable Diffusion: a model that can be trained with only a small amount of data.
- LoRA can be used to fine-tune large models and greatly reduces the cost of fine-tuning.
- A LoRA model is used together with the large model to steer the results the large model produces.
- LoRA inserts new trainable layers into the original model instead of modifying the original parameters, which avoids copying the entire model; only the inserted layers are optimized, yielding a very lightweight tuning method.
- Specifically, LoRA freezes the weights of the pre-trained model and injects trainable rank-decomposition matrices into each Transformer block. In Stable Diffusion, LoRA can also be applied to the cross-attention layers to improve text-to-image generation.
- LoRA model files are relatively small, commonly around 144 MB. They should be used together with the pruned version of the Stable Diffusion 1.5 model.
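The rank-decomposition idea described above can be sketched in a few lines of NumPy (a hypothetical illustration, not kohya_ss code): the pretrained weight `W` stays frozen, only the small factors `A` and `B` are trained, and the effective weight becomes `W + (alpha / r) * B @ A`.

```python
# Minimal sketch of the LoRA idea (illustrative names and shapes).
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 4               # r << d is the low-rank bottleneck
alpha = 4                                # LoRA scaling factor

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init: update starts at 0

def lora_forward(x):
    # Frozen base path plus the trainable low-rank update path.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, LoRA output equals the base model output.
assert np.allclose(lora_forward(x), W @ x)

full_params = W.size          # 64 * 64 = 4096 frozen parameters
lora_params = A.size + B.size  # 4*64 + 64*4 = 512 trainable parameters
```

Because only `A` and `B` are saved, the resulting file is a small fraction of the full model, which is why LoRA checkpoints are around 144 MB rather than several gigabytes.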
Fine-tuning the trained LoRA model
Download the kohya_ss project
- Official source address : https://github.com/bmaltais/kohya_ss.git
After downloading and decompressing, the project directory is shown in the figure below.
Install the kohya_ss project
- Open a terminal and navigate to the desired installation directory.
- Enter the kohya_ss directory: cd kohya_ss
- Execute the setup script: .\setup.bat
- If no error is reported, the installation is successful.
Run the kohya_ss project
On Windows, run the gui.bat script from a terminal:
gui.bat --listen 127.0.0.1 --server_port 7860 --inbrowser --share
After it starts, open http://127.0.0.1:7860/ in a browser, as shown in the figure below.
Note: for a detailed tutorial, refer to the README.md file in the project.
Prepare dataset
Under the kohya_ss project, create a train directory with the following structure:
- image: training images go here.
- log: training records.
- model: model save path.
- The image directory contains a subdirectory; in this article it is 100_Freeman. The prefix 100 is the number of repeats per image, which directly affects the total number of training steps and the result, and Freeman is the name of the person in the pictures.
- After creating the directories, put the processed pictures into the 100_Freeman directory, then prepare to generate keywords.
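The directory layout above can be created with a short pathlib snippet (paths here are illustrative; a temporary directory stands in for the real project path):

```python
# Sketch of the kohya_ss train/ layout described above.
# image/100_Freeman holds the pictures; "100" is the per-image repeat
# count and "Freeman" the subject name.
from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp()) / "kohya_ss" / "train"  # stand-in path
for sub in ("image/100_Freeman", "log", "model"):
    (root / sub).mkdir(parents=True, exist_ok=True)

layout = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
print(layout)  # -> ['image', 'image/100_Freeman', 'log', 'model']
```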
generate keywords
- Specific steps: Utilities -> Captioning -> BLIP Captioning
After generation succeeds, a .txt caption file appears for each image, as shown in the figure below.
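kohya_ss pairs each image with a caption .txt file of the same stem, so a quick check for missing captions can be sketched like this (a hypothetical helper using a faked folder, not part of kohya_ss):

```python
# Check that every image has a matching BLIP caption .txt (same stem).
from pathlib import Path
import tempfile

folder = Path(tempfile.mkdtemp())  # stand-in for train/image/100_Freeman
# Fake a tiny dataset: two images, one caption deliberately missing.
(folder / "001.png").touch()
(folder / "002.png").touch()
(folder / "001.txt").write_text("Freeman, a man wearing glasses")

missing = [img.name for img in sorted(folder.glob("*.png"))
           if not img.with_suffix(".txt").exists()]
print("images without captions:", missing)  # -> ['002.png']
```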
Model parameter settings
Pretrained model settings
folder settings
Training parameter settings
Start training the LoRA model
After training completes, a .safetensors model file is generated in the model folder.
View training status with TensorBoard
On the page, click Start TensorBoard, then open http://127.0.0.1:6006 in a browser to view the training status.
Test the trained LoRA model
To test the model, you need the stable-diffusion-webui project. For the installation method, refer to Deploying and Using the Stable Diffusion AI Open Source Project for Drawing under Windows.
- Copy the model file Freeman_bs2_epoch50_fp16.safetensors from the kohya_ss/train/Freeman/model directory to the stable-diffusion-webui/models/Lora directory and the stable-diffusion-webui/models/Stable-diffusion directory of the stable-diffusion-webui project.
- In the stable-diffusion-webui directory, run the webui-user.bat script from a terminal:
webui-user.bat
After it starts, open http://127.0.0.1:7861/ in a browser, as shown in the figure below.
- Select your trained LoRA model.
Generate images from text (txt2img)
<lora:Freeman_bs2_epoch50_fp16:1>Freeman a beautiful woman with glasses and a white dress,modelshoot style,beautiful light,photo realistic game cg
reference
[1] https://github.com/bmaltais/kohya_ss.git
[2] https://github.com/AUTOMATIC1111/stable-diffusion-webui
[3] https://github.com/camenduru/stable-diffusion-webui
[4] https://www.kaggle.com/code/camenduru/stable-diffusion-webui-kaggle
[5] https://blog.csdn.net/wpgdream/article/details/130607099
[6] https://zhuanlan.zhihu.com/p/620583928