Source: the Machine Heart (机器之心) editorial team.
In recent years, image generation models based on diffusion models have appeared one after another, showing impressive results. However, the code for existing research is highly fragmented and lacks a unified framework, which makes implementations hard to migrate, high-barrier to use, and uneven in quality.
To address this, the Human-Cyber-Physical Intelligence Integration Laboratory (HCP Lab) at Sun Yat-sen University built the HCP-Diffusion framework, which systematically implements diffusion-model-related algorithms such as model fine-tuning, personalized training, inference optimization, and image editing, as shown in Figure 1.
Figure 1: Structure of the HCP-Diffusion framework, which unifies existing diffusion-related methods under a single framework and provides a variety of modular training and inference optimizations.
HCP-Diffusion deploys its components and algorithms through a unified configuration file, which greatly improves the framework's flexibility and extensibility. Developers can combine algorithms like building blocks without re-implementing low-level details.
For example, with HCP-Diffusion, common algorithms such as LoRA, DreamArtist, and ControlNet can be deployed and combined simply by editing the configuration file. This not only lowers the barrier to experimentation but also makes the framework compatible with a wide range of custom designs.
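To make the "building blocks" idea concrete, here is a minimal sketch of config-driven composition. The registry, factory functions, and config keys below are invented for illustration; HCP-Diffusion's real schema and registry differ.

```python
# A minimal sketch of config-driven composition (illustrative only;
# HCP-Diffusion's real registry and config schema differ).

def make_lora(rank=8, alpha=4):
    return {"type": "lora", "rank": rank, "alpha": alpha}

def make_controlnet(conditioning="canny"):
    return {"type": "controlnet", "conditioning": conditioning}

# Algorithms register themselves by name; configs select them by name.
PLUGIN_REGISTRY = {"lora": make_lora, "controlnet": make_controlnet}

# Swapping algorithms is a one-line config change rather than new code.
config = {"plugin": {"name": "lora", "args": {"rank": 16}}}

spec = config["plugin"]
plugin = PLUGIN_REGISTRY[spec["name"]](**spec["args"])
print(plugin)  # {'type': 'lora', 'rank': 16, 'alpha': 4}
```

Changing `"name": "lora"` to `"name": "controlnet"` in the config swaps the algorithm without touching any implementation code, which is the design property the article describes.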
HCP-Diffusion code tool: https://github.com/7eu7d7/HCP-Diffusion
HCP-Diffusion GUI: https://github.com/7eu7d7/HCP-Diffusion-webui
HCP-Diffusion: Function Module Introduction
Framework Features
HCP-Diffusion achieves its generality by modularizing the mainstream diffusion training pipeline. Its main features are as follows:
Unified architecture: a single code framework covering the Diffusion family of models
Operator plug-ins: supports operators for data, training, inference, and performance optimization, such as DeepSpeed, Colossal-AI, and offloading for acceleration
One-click configuration: models in the Diffusion family can be implemented simply by editing configuration files, with high flexibility
One-click training: provides a Web UI for one-click training and inference
Data Module
HCP-Diffusion supports defining multiple parallel datasets. Each dataset can use a different image size and annotation format, and each training iteration draws one batch from every dataset, as shown in Figure 2. In addition, each dataset can be configured with multiple data sources, supporting txt, json, yaml, and custom annotation formats, with a highly flexible data preprocessing and loading mechanism.
Figure 2: Schematic of the dataset structure
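The "one batch per dataset per step" scheme can be sketched as follows. This is an assumed toy implementation, not the framework's actual loader; shorter datasets are cycled so every dataset contributes a batch at every iteration.

```python
from itertools import cycle

# Two toy "datasets" of different lengths (stand-ins for image sets
# of different sizes/annotation formats).
dataset_a = [f"a{i}" for i in range(6)]   # e.g. 512x512 images
dataset_b = [f"b{i}" for i in range(4)]   # e.g. 768x768 images

def batches(data, size):
    """Yield consecutive batches of the given size."""
    for i in range(0, len(data), size):
        yield data[i:i + size]

loader_a = batches(dataset_a, 2)
loader_b = cycle(list(batches(dataset_b, 2)))  # cycle the shorter set

steps = []
for batch_a, batch_b in zip(loader_a, loader_b):
    steps.append((batch_a, batch_b))  # train on both batches this step

print(len(steps))  # 3 steps, each with one batch from each dataset
```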
The dataset-processing stage provides aspect-ratio bucketing with automatic clustering, which handles datasets containing images of different sizes. Users do not need to align image sizes manually; the framework automatically chooses the optimal grouping by aspect ratio or resolution. This greatly lowers the barrier to data preparation, improves the user experience, and lets developers focus on the algorithm itself.
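The core idea of aspect-ratio bucketing can be shown in a few lines. The bucket list and nearest-ratio assignment below are an assumed simplification; the framework's automatic clustering is more sophisticated.

```python
# A minimal sketch of aspect-ratio bucketing (assumed logic).
# Each image goes to the bucket whose aspect ratio is closest to its
# own, so a batch never mixes incompatible shapes.
BUCKETS = [(512, 512), (512, 768), (768, 512)]  # (width, height)

def assign_bucket(width, height):
    ratio = width / height
    return min(BUCKETS, key=lambda wh: abs(wh[0] / wh[1] - ratio))

images = [(500, 510), (510, 770), (1024, 680)]
for w, h in images:
    print((w, h), "->", assign_bucket(w, h))
```

In a real pipeline the image would then be resized/cropped to its bucket's resolution, and batches are drawn bucket by bucket.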
For image preprocessing, the framework is also compatible with various image-processing libraries such as torchvision and albumentations. Users can configure the preprocessing method directly in the configuration file, or extend it with custom image-processing methods.
Figure 3: Example dataset configuration file
For text annotation, HCP-Diffusion provides a flexible and readable prompt-template specification that supports complex and diverse training schemes and data labels. It corresponds to the word_names field under the source section of the configuration file above, where users can bind the special words in curly brackets (see the figure below) to custom embedding vectors and category descriptions, enabling compatibility with methods such as DreamBooth and DreamArtist.
Figure 4: Prompt template
For text annotations, it also provides several augmentation methods, such as tag dropout (TagDropout) and tag shuffling (TagShuffle), which reduce overfitting between image and text data and make the generated images more diverse.
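The two augmentations can be sketched on a comma-separated tag list. Only the names TagDropout and TagShuffle come from the article; this implementation is an assumption about what they do.

```python
import random

# Sketch of tag-level text augmentation in the spirit of TagDropout /
# TagShuffle (names from the article; the implementation is assumed).

def tag_dropout(tags, p, rng):
    """Independently drop each tag with probability p."""
    return [t for t in tags if rng.random() >= p]

def tag_shuffle(tags, rng):
    """Randomly permute the tag order."""
    tags = list(tags)
    rng.shuffle(tags)
    return tags

rng = random.Random(0)
tags = ["1girl", "smile", "outdoors", "blue sky"]
print(tag_dropout(tags, 0.5, rng))
print(tag_shuffle(tags, rng))
```

Because the caption seen at each step differs, the model cannot latch onto a fixed word order or a fixed co-occurrence of tags, which is how these augmentations curb text-image overfitting.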
Model Framework Module
HCP-Diffusion achieves its generality by modularizing the mainstream diffusion training pipeline. Specifically, the Image Encoder and Image Decoder encode and decode images, the Noise Generator produces the noise for the forward process, the Diffusion Model implements the diffusion process, the Condition Encoder encodes the generation conditions, Adapters align the fine-tuned model with downstream tasks, and the positive/negative dual channels implement positive- and negative-conditioned image generation.
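The forward process that the Noise Generator and Diffusion Model implement is the standard DDPM noising step. The scalar sketch below is for clarity only; real models operate on latent tensors.

```python
import math

# DDPM forward (noising) process, x_t = sqrt(a_bar_t) * x_0
#                                     + sqrt(1 - a_bar_t) * eps,
# where a_bar_t is the cumulative product of the noise schedule and
# eps is Gaussian noise. Scalars stand in for image/latent tensors.
def add_noise(x0, alpha_bar_t, eps):
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1 - alpha_bar_t) * eps

x0, eps = 1.0, 0.5
print(add_noise(x0, 0.9, eps))  # mostly signal at small t (large a_bar)
```

Training then asks the network to predict `eps` from the noised sample and the condition embedding produced by the Condition Encoder.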
Figure 5: Example model-structure configuration (model plug-ins, custom words, etc.)
As shown in Figure 5, HCP-Diffusion can implement mainstream training algorithms such as LoRA, ControlNet, and DreamArtist through simple combinations in the configuration file. It also supports combining these algorithms, for example training LoRA and Textual Inversion simultaneously, or binding dedicated trigger words to a LoRA. In addition, the plug-in module makes it easy to customize new plug-ins compatible with all current mainstream insertion methods. Through this modularization, HCP-Diffusion can express any mainstream algorithm within one framework, lowering the development barrier and promoting collaborative model innovation.
HCP-Diffusion abstracts Adapter-style algorithms such as LoRA and ControlNet into model plug-ins. By defining a few general plug-in base classes, all such algorithms can be handled uniformly, reducing both usage and development costs: every Adapter-style algorithm shares one unified interface.
The framework provides four types of plug-in, which together cover essentially all current mainstream algorithms:
+ SinglePluginBlock: a single-layer plug-in that modifies the layer's output based on its input, e.g. the LoRA family. Supports regular expressions (re: prefix) for specifying the insertion layer; does not support the pre_hook: prefix.
+ PluginBlock: has exactly one input layer and one output layer, e.g. for defining residual connections. Supports regular expressions (re: prefix) for specifying the insertion layers; both input and output layers support the pre_hook: prefix.
+ MultiPluginBlock: can have multiple input and output layers, e.g. ControlNet. Does not support regular expressions (re: prefix); both input and output layers support the pre_hook: prefix.
+ WrapPluginBlock: replaces a layer of the original model, holding the original layer as a member object. Supports regular expressions (re: prefix) for specifying the layer to replace; does not support the pre_hook: prefix.
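The plug-in abstraction can be illustrated with a toy SinglePluginBlock. The class names mirror the article, but the code below is an invented minimal analogue, not HCP-Diffusion's actual base classes.

```python
# Minimal sketch of the plug-in idea (class names mirror the article;
# the actual base classes in HCP-Diffusion differ in detail).
class SinglePluginBlock:
    """Wraps one layer and adds a correction computed from its input."""
    def __init__(self, layer):
        self.layer = layer

    def __call__(self, x):
        return self.layer(x) + self.delta(x)

    def delta(self, x):
        raise NotImplementedError

class ToyLoRA(SinglePluginBlock):
    """LoRA-style residual branch: out = layer(x) + scale * B(A(x)),
    collapsed to a scalar multiplication for illustration."""
    def __init__(self, layer, scale=0.5):
        super().__init__(layer)
        self.scale = scale

    def delta(self, x):
        return self.scale * x  # stands in for the low-rank branch

base_layer = lambda x: 2 * x   # stands in for a frozen linear layer
plugged = ToyLoRA(base_layer)
print(plugged(3.0))  # 6.0 (base) + 1.5 (plug-in branch) = 7.5
```

The key property is that the wrapped layer is untouched: removing the plug-in restores the original model, and several plug-ins can wrap different layers independently.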
Training and Inference Modules
Figure 6: Custom optimizer configuration
The configuration files in HCP-Diffusion support defining Python objects, which are instantiated automatically at runtime. This design lets developers plug in any pip-installable module, such as custom optimizers, loss functions, or noise samplers, without modifying the framework code, as shown in the figure above. The configuration structure is clear, easy to read, and highly reproducible, smoothing the path from academic research to engineering deployment.
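Such config-to-object instantiation is commonly implemented with a dotted import path plus keyword arguments. The `_target_` key below is an assumed convention chosen for illustration (it resembles Hydra's), and `collections.Counter` merely stands in for a real optimizer class; HCP-Diffusion's actual key names may differ.

```python
import importlib

# Sketch: a config entry names an arbitrary importable object by its
# dotted path, and the remaining keys become constructor arguments.
def instantiate(spec):
    module_name, _, attr = spec["_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), attr)
    kwargs = {k: v for k, v in spec.items() if k != "_target_"}
    return cls(**kwargs)

# A stand-in for e.g. {"_target_": "torch.optim.AdamW", "lr": 1e-4}.
optimizer_cfg = {"_target_": "collections.Counter", "a": 2, "b": 1}
obj = instantiate(optimizer_cfg)
print(obj)  # Counter({'a': 2, 'b': 1})
```

With this pattern, swapping in a custom optimizer is just a matter of writing its dotted path in the config, no framework changes required.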
Acceleration and Optimization Support
HCP-Diffusion supports multiple training-acceleration frameworks such as Accelerate, DeepSpeed, and Colossal-AI, which can significantly reduce memory usage during training and speed it up. It supports EMA, which can further improve the model's generation quality and generalization. At inference time, it supports operations such as model offloading and VAE tiling, allowing image generation with as little as 1 GB of GPU memory.
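The idea behind VAE tiling is to decode a large latent in overlapping tiles so that peak memory depends on the tile size rather than the full image. The geometry calculation below is an illustrative sketch, not the framework's implementation.

```python
# Sketch of VAE-tiling geometry: cover a dimension of length `size`
# with fixed-size tiles that overlap by `overlap` pixels, so seams
# can be blended. (Illustrative only.)
def tile_boxes(size, tile, overlap):
    step = tile - overlap
    starts = list(range(0, max(size - tile, 0) + 1, step))
    if starts[-1] + tile < size:          # make sure the edge is covered
        starts.append(size - tile)
    return [(s, s + tile) for s in starts]

print(tile_boxes(1024, 512, 64))  # [(0, 512), (448, 960), (512, 1024)]
```

Each tile is decoded independently and the overlapping regions are blended, so only one 512-pixel tile ever resides in memory at a time instead of the full 1024-pixel image.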
Figure 7: Modular configuration file
With the simple file configuration shown above, a model can be set up without hunting for framework resources. HCP-Diffusion's modular design completely separates model definition, training logic, and inference logic, so configuring a model does not require thinking about the training or inference parts, helping users focus on the method itself. HCP-Diffusion also ships configuration samples for most mainstream algorithms, which can be deployed by changing only a few parameters.
HCP-Diffusion: Web UI Graphical Interface
In addition to editing configuration files directly, HCP-Diffusion provides a corresponding Web UI covering modules such as image generation and model training. It improves the user experience, greatly lowers the framework's learning curve, and speeds the path from theory to practice.
Figure 8: The HCP-Diffusion Web UI
Laboratory Introduction
The HCP Lab at Sun Yat-sen University was founded by Professor Liang Lin in 2010. In recent years it has produced a rich body of academic work on multimodal content understanding, causal and cognitive reasoning, embodied learning, and related topics, winning a number of domestic and international science-and-technology awards and best-paper awards, and it is committed to building product-grade AI technology and platforms. Laboratory website: http://www.sysu-hcp.net