HCP-Diffusion: Sun Yat-sen University Open-Sources a Unified, General-Purpose Diffusion Model Codebase

Source: the editorial department of Machine Heart (机器之心)

In recent years, image generation models based on diffusion models have emerged one after another, demonstrating impressive results. However, the code for this line of research is highly fragmented and lacks a unified framework, so implementations are hard to migrate, have a high barrier to entry, and vary widely in quality.

To address this, the Human-Cyber-Physical Intelligence Integration Laboratory (HCP Lab) at Sun Yat-sen University built the HCP-Diffusion framework, which systematically implements diffusion-model algorithms for model fine-tuning, personalized training, inference optimization, image editing, and more. Its structure is shown in Figure 1.


Figure 1: Structure of the HCP-Diffusion framework, which unifies existing diffusion-related methods under a single framework and provides a variety of modular training and inference optimizations.

HCP-Diffusion deploys all components and algorithms through a unified configuration file, which greatly improves the framework's flexibility and extensibility: developers combine algorithms like building blocks without re-implementing low-level details.

For example, with HCP-Diffusion, common algorithms such as LoRA, DreamArtist, and ControlNet can be deployed and combined simply by modifying the configuration file. This lowers the barrier to experimentation and keeps the framework compatible with a wide range of customized designs.
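As a minimal illustration, a configuration fragment in this spirit might attach a LoRA adapter to the UNet's attention layers purely declaratively. The key names below are an illustrative sketch, not necessarily the repo's exact schema:

```yaml
# Minimal sketch (illustrative keys): attach a LoRA plug-in to the UNet
# purely through configuration; no training code is modified.
lora_unet:
  - lr: 1.0e-4               # learning rate for this adapter
    rank: 8                  # LoRA rank
    layers:
      - 're:.*\.attn.?$'     # regex selecting the attention layers to wrap
```

Swapping in a different method is then a matter of editing this block rather than the training loop.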

  • HCP-Diffusion code tool: https://github.com/7eu7d7/HCP-Diffusion

  • HCP-Diffusion GUI: https://github.com/7eu7d7/HCP-Diffusion-webui

HCP-Diffusion: Functional Modules

Framework Features

HCP-Diffusion achieves generality by modularizing the mainstream diffusion training pipeline. Its main features are as follows:

  • Unified architecture: a unified code framework for the Diffusion model family

  • Operator plug-ins: pluggable operator-level algorithms for data, training, inference, and performance optimization, such as DeepSpeed, Colossal-AI, and model offload for accelerated training

  • One-click configuration: Diffusion-family models can be implemented simply by editing a configuration file, with high flexibility

  • One-click training: a Web UI provides one-click training and inference

Data Module

HCP-Diffusion supports defining multiple parallel datasets. Each dataset can use a different image size and annotation format, and each training iteration draws one batch from every dataset, as shown in Figure 2. In addition, each dataset can be configured with multiple data sources, supports txt, json, yaml, and custom annotation formats, and provides a highly flexible data preprocessing and loading mechanism.


Figure 2: Schematic of the dataset structure
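Schematically, two parallel datasets with different resolutions and annotation formats might be declared as follows (field names are illustrative, not the exact schema):

```yaml
# Sketch (illustrative keys): two parallel datasets; one batch is drawn
# from each dataset per training iteration.
data:
  dataset1:
    batch_size: 4
    img_size: 512
    source:
      source1:
        img_root: ./data/cats
        caption_file: ./data/cats/captions.json    # json annotations
  dataset2:
    batch_size: 2
    img_size: 768
    source:
      source1:
        img_root: ./data/sketches
        caption_file: ./data/sketches/captions.txt # txt annotations
```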

For dataset processing, the framework provides an aspect-ratio bucketing mechanism with automatic clustering that handles datasets of mixed image sizes. Users need no extra resizing or alignment of the data: the framework automatically selects the optimal grouping by aspect ratio or resolution. This greatly lowers the barrier to data preparation, improves the user experience, and lets developers focus on the algorithms themselves.

For image preprocessing, the framework is also compatible with image-processing libraries such as torchvision and albumentations. Users can configure preprocessing directly in the configuration file as needed, or extend it with custom image-processing methods.


Figure 3: Example dataset configuration file
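Since Figure 3 is an image in the original post, a schematic equivalent is sketched below; the class paths and keys are assumptions for illustration:

```yaml
# Sketch (illustrative keys and class paths): aspect-ratio bucketing plus a
# configurable preprocessing transform, both declared in the config file.
data:
  dataset1:
    bucket:
      _target_: hcpdiff.data.bucket.RatioBucket  # assumed class path
      target_area: 262144                        # ~512x512; buckets cluster by aspect ratio
      num_bucket: 5
    source:
      source1:
        image_transforms:
          _target_: albumentations.ColorJitter   # any pip-installable transform
          p: 0.5
```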

For text labeling, HCP-Diffusion provides a flexible and clear prompt-template specification that supports complex and diverse training paradigms and data annotations. It corresponds to word_names under the source section of the dataset configuration above, where the special tokens in curly braces (shown in the figure below) can be bound to custom word embeddings and class descriptions, making the framework compatible with methods such as DreamBooth and DreamArtist.


Figure 4: Prompt template example
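As an illustration (the syntax here is a sketch, not the exact template grammar), a template line and its word_names binding might look like:

```yaml
# Sketch (illustrative syntax): the template's braced tokens are bound
# through word_names in the data-source config.
# contents of object.txt:  "a photo of a {pt1} {class}"
source:
  source1:
    prompt_template: ./templates/object.txt
    word_names:
      pt1: my-cat    # learnable custom embedding (Textual Inversion style)
      class: cat     # class description word (DreamBooth style)
```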

For text annotations, the framework also provides several text-augmentation methods, such as TagDropout (randomly erasing tags) and TagShuffle (randomly shuffling tag order), which reduce overfitting between image and text data and make the generated images more diverse.

Model Framework Module

The model framework module decomposes the mainstream diffusion training pipeline into modular components. Specifically, the Image Encoder and Image Decoder handle image encoding and decoding, the Noise Generator produces the noise for the forward process, the Diffusion Model implements the diffusion process itself, the Condition Encoder encodes the generation conditions, the Adapter fine-tunes the model to align it with downstream tasks, and the positive/negative dual channels implement generation controlled by positive and negative conditions.


Figure 5: Example model-structure configuration (model plug-ins, custom words, etc.)

As shown in Figure 5, HCP-Diffusion can express mainstream training algorithms such as LoRA, ControlNet, and DreamArtist through simple combinations in the configuration file. It also supports combining these algorithms, for example training LoRA and Textual Inversion together and binding an exclusive trigger word to a LoRA. Moreover, the plug-in module makes it easy to implement custom plug-ins compatible with all current mainstream insertion points. Through this modularization, HCP-Diffusion can construct any mainstream algorithm within one framework, lowering the development threshold and promoting collaborative model innovation.
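For instance, jointly training a LoRA adapter with a custom embedding that acts as its trigger word might be sketched as follows (the keys, including tokenizer_pt, are illustrative guesses at the schema):

```yaml
# Sketch (illustrative keys): a LoRA adapter trained jointly with a
# custom word embedding that serves as its exclusive trigger word.
lora_unet:
  - lr: 1.0e-4
    rank: 8
    layers: ['re:.*\.attn.?$']
tokenizer_pt:
  train:
    - { name: my-style, lr: 0.003 }  # trigger word trained alongside the LoRA
```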

HCP-Diffusion abstracts Adapter-style algorithms such as LoRA and ControlNet into model plug-ins. By defining a small set of general plug-in base classes, all such algorithms can be treated uniformly, reducing both usage and development costs: every Adapter-class algorithm is unified under this interface.

The framework provides four plug-in base classes, which together cover the current mainstream algorithms (a schematic configuration follows the list):

  • SinglePluginBlock: a single-layer plug-in that modifies a layer's output as a function of that layer's input, e.g., the LoRA family. Insertion layers can be selected with regular expressions (re: prefix); the pre_hook: prefix is not supported.

  • PluginBlock: has exactly one input layer and one output layer, e.g., for defining a residual connection. Both layers can be selected with regular expressions (re: prefix), and both support the pre_hook: prefix.

  • MultiPluginBlock: can have multiple input and output layers, e.g., ControlNet. Regular expressions (re: prefix) are not supported; both input and output layers support the pre_hook: prefix.

  • WrapPluginBlock: replaces a layer of the original model, keeping the original layer as a member object. Replacement layers can be selected with regular expressions (re: prefix); the pre_hook: prefix is not supported.
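A schematic custom plug-in declaration (illustrative keys and a hypothetical class path) shows how the re: and pre_hook: prefixes select layers:

```yaml
# Sketch (illustrative keys): declaring a PluginBlock-style residual adapter.
# 're:' selects layers by regex over module names; 'pre_hook:' attaches to a
# layer's input rather than its output.
plugin_unet:
  my_residual_adapter:
    _target_: mypkg.plugins.ResidualAdapter     # hypothetical PluginBlock subclass
    lr: 1.0e-4
    from_layer: 'pre_hook:re:.*\.resnets\.0$'   # read the input of this layer
    to_layer: 're:.*\.resnets\.1$'              # add output after this layer
```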

Training and Inference Modules


Figure 6: Custom optimizer configuration

Configuration files in HCP-Diffusion support defining Python objects that are instantiated automatically at runtime. This design lets developers plug in any pip-installable module, such as custom optimizers, loss functions, or noise samplers, without modifying framework code, as shown in the figure above. The configuration structure is clear, easy to read, and highly reproducible, helping bridge academic research and engineering deployment.
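Schematically, such a definition might look like the following; the hydra-style `_target_` key is an assumption about the exact syntax:

```yaml
# Sketch (illustrative keys): a third-party optimizer and a scheduler
# declared in the config and instantiated automatically at runtime.
train:
  optimizer:
    _target_: bitsandbytes.optim.AdamW8bit   # any pip-installable optimizer
    weight_decay: 1.0e-3
  scheduler:
    name: cosine
    num_warmup_steps: 200
```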

Accelerated Optimization Support

HCP-Diffusion supports multiple training-acceleration frameworks such as Accelerate, DeepSpeed, and Colossal-AI, which significantly reduce memory usage during training and speed it up. It also supports EMA (exponential moving average) of weights, which can further improve the model's generation quality and generalization. In the inference stage, it supports operations such as model offloading and VAE tiling, enabling image generation with as little as 1 GB of GPU memory.
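A low-memory inference setup in that spirit might be sketched as follows; the option names are illustrative assumptions, not the repo's actual flags:

```yaml
# Sketch (hypothetical option names): memory-saving inference settings.
infer:
  dtype: float16
  offload: true        # keep idle weights on CPU, stream to GPU as needed
  vae_tiling: true     # decode latents tile by tile to bound peak VRAM
```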


Figure 7: Modular configuration file

With the simple file configuration above, a model can be fully specified without hunting for framework resources, as the figure shows. HCP-Diffusion's modular design completely separates the model definition, training logic, and inference logic, so configuring a model requires no reasoning about the training or inference internals, helping users focus on the method itself. HCP-Diffusion also ships configuration templates for most mainstream algorithms, so deployment often only requires changing a few parameters.

HCP-Diffusion: Web UI Graphical Interface

Beyond editing configuration files directly, HCP-Diffusion provides a corresponding Web UI covering modules such as image generation and model training. It improves the user experience, greatly lowers the framework's learning curve, and accelerates the transformation of algorithms from theory to practice.


Figure 8: The HCP-Diffusion Web UI graphical interface

Laboratory Introduction

The HCP Lab at Sun Yat-sen University was founded by Professor Liang Lin in 2010. In recent years it has produced rich academic results in multimodal content understanding, causal and cognitive reasoning, embodied learning, and related areas, and has won a number of domestic and international science-and-technology awards and best-paper awards. The lab is committed to building product-grade AI technology and platforms. Laboratory website: http://www.sysu-hcp.net
