The most complete and detailed illustration of Diffusion Models

Transformers have recently become very popular in AI painting. AI-generated art has woken up from the nightmare of DeepDream in 2015 and, starting with OpenAI's DALL·E 2[2] in 2022, has begun to deliver striking illustrative and associative effects.
But you need to understand: bringing Transformers to AI + art starts from language, meets multi-modality, and strikes sparks with art. The topic requires a lot of additional background, and progress may not come the same way as the breakthroughs in fields such as CV and NLP. Besides the Transformer architecture, AI + art also involves a series of mathematical topics such as VAE, ELBO, and the Diffusion Model.
In the Transformer + Art series, today we open a new topic: Diffusion Models. As with VAE, the principle is complicated while the implementation is fairly crude. Generative diffusion models are said to be known for their mathematical complexity and seem much harder to understand than VAE and GAN. Is that true? Can Diffusion Models be explained with less mathematics? Is there really no simple way to understand them?
In this article, we examine the theoretical basis of diffusion models and then demonstrate how to use diffusion models in PyTorch to generate images. Let's dive in!

1. Basic introduction of Diffusion Model

Diffusion Models did not receive much attention when they were first published because they are not as simple and easy to understand as GANs. In recent years, however, they have risen to prominence among generative models. Two of the most advanced text-to-image systems, OpenAI's DALL·E 2 and Google's Imagen, are both based on diffusion models.

The current wave of generative diffusion models started with DDPM (Denoising Diffusion Probabilistic Model), proposed in 2020. That groundbreaking paper showed the world what diffusion models can do in image synthesis, where they went on to beat GANs, so much subsequent image generation research turned to DDPM.

Many articles on the Internet introduce DDPM by starting with transition probability distributions, followed by variational inference, then maximum likelihood and the Evidence Lower Bound (ELBO). That pile of mathematical notation scared me away for weeks (although this kind of introduction does show once more that DDPM and VAE are closely related in theory). Combined with people's preconceptions about traditional diffusion models, it creates the illusion that "very deep mathematical knowledge is required".

2. Comparison of generative models

Let’s first take a look at the recently popular generative models GAN, VAE, Flow-based Models, and Diffusion Models.

A GAN consists of a generator and a discriminator. The generator is responsible for producing realistic data to "fool" the discriminator, and the discriminator is responsible for judging whether a sample is real or "manufactured". Training a GAN really amounts to two models learning from each other, so perhaps it could be called something more harmonious than "adversarial".

A VAE also aims to train a generative model, one that maps a sampled probability distribution onto the probability distribution of the training set. It produces a latent variable z that carries both data information and noise, so besides reconstructing the input samples it can also be used to generate new data.
Diffusion Models are inspired by non-equilibrium thermodynamics. They first define a Markov chain of diffusion steps that slowly adds random noise to the data, and then learn a reverse diffusion process that constructs the desired data samples from noise. Unlike VAE or flow models, diffusion models are learned with a fixed procedure, and the latent space has high dimensionality (the same as the original data).

Generally speaking, the field of Diffusion Models is flourishing, a bit like GANs when they were first proposed. Current training techniques let Diffusion Models skip the model-tuning stage that GANs had to go through, so they can be used directly for downstream tasks.

3. Intuitive understanding of Diffusion model

A generative model is essentially a family of probability distributions. In the original figure, the left side is a training data set whose samples are all drawn independently and identically distributed (i.i.d.) from some data distribution Pdata. The right side is the generative model (a family of probability distributions). Within this family we look for the distribution Pθ that is closest to Pdata. Drawing new samples from Pθ then gives a steady stream of new data.
However, the form of Pdata is often very complex, and the image dimension is very high. It is difficult for us to traverse the entire space, and the data samples we can observe are also limited.

What does Diffusion do?

We can add noise to any distribution, including of course the Pdata we are interested in, so that it eventually becomes a pure noise distribution N(0,I). How to understand it?

From the perspective of probability distributions, consider a two-dimensional joint distribution P(x, y) shaped like a Swiss roll. The diffusion process q is very intuitive: the originally concentrated, ordered sample points are perturbed by noise and spread outward until they become a completely disordered noise distribution.
The diffusion model is actually the inverse process P: it gradually denoises the noise distribution N(0,I) and maps it back to Pdata. With such a mapping, we can sample from the noise distribution and end up with the desired generated image.

Looking at this process from a single image sample, the diffusion process q is to continuously add noise to the image until the image becomes pure noise, and the inverse diffusion process P is the process of generating an image from pure noise.
Figure: variation of a single image sample.
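The spreading-out behaviour described above can be sketched in a few lines. This is only an illustrative sketch, not the author's code: the 2D spiral is a hypothetical stand-in for the Swiss-roll data, the linear noise schedule and the 200 steps are assumptions, and the per-step update x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps is the standard DDPM forward step.

```python
import math
import random


def forward_diffuse(points, betas, rng):
    """Noise 2D points step by step:
    x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps, with eps ~ N(0, 1)."""
    for beta in betas:
        keep, noise = math.sqrt(1.0 - beta), math.sqrt(beta)
        points = [
            (keep * x + noise * rng.gauss(0.0, 1.0),
             keep * y + noise * rng.gauss(0.0, 1.0))
            for x, y in points
        ]
    return points


def mean_and_variance(values):
    """Return the sample mean and (population) variance of a list of floats."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(0)

# Hypothetical stand-in for the Swiss-roll data: ordered points on a 2D spiral.
spiral = []
for i in range(2000):
    angle = 3.0 * math.pi * i / 2000
    spiral.append((0.3 * angle * math.cos(angle), 0.3 * angle * math.sin(angle)))

# Assumed linear noise schedule with T = 200 steps.
T = 200
betas = [1e-4 + (0.05 - 1e-4) * t / (T - 1) for t in range(T)]

noised = forward_diffuse(spiral, betas, rng)
mean_x, var_x = mean_and_variance([p[0] for p in noised])
mean_y, var_y = mean_and_variance([p[1] for p in noised])
print(f"x: mean={mean_x:.2f} var={var_x:.2f} | y: mean={mean_y:.2f} var={var_y:.2f}")
```

After enough steps the spiral structure is gone: both coordinates have a mean near 0 and a variance near 1, which is what sampling from N(0,I) would give.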

4. Formal analysis of Diffusion model

Diffusion Models are called generative models, which means that Diffusion Models are used to generate data similar to the training data. Fundamentally, Diffusion Models work by continuously adding Gaussian noise to destroy the training data, and then learning to recover the data by reversing the noise process.

After training, you can pass randomly sampled noise into a Diffusion Model and generate data through the learned denoising process. That is the basic principle, although the picture so far is still rather coarse.
More specifically, the diffusion model is a latent variable model that maps to the latent space using a Markov Chain (MC). Along the Markov chain, noise is gradually added to the data at each time step t to obtain the posterior probability q(x1:T|x0), where x1,...,xT are the latent variables and have the same shape as the input data x0. That is to say, the latent space of Diffusion Models has the same dimension as the input data.

  • Posterior probability. In Bayesian statistics, the posterior probability of a random event or an uncertain event is the conditional probability obtained after considering and giving relevant evidence or data.
  • A Markov chain is a random process in state space that goes through transitions from one state to another. This process requires "memoryless" properties: the probability distribution of the next state can only be determined by the current state, and the previous events in the time series have nothing to do with it. This particular type of "memorylessness" is called the Markov property.
Diffusion Models are divided into a forward diffusion process and a reverse diffusion process. The path from x0 to the final xT is a Markov chain, that is, a random process of transitions from one state to another in the state space. The subscripts index the steps of the image diffusion process.

In the end, the real input image x0 is gradually transformed by the diffusion process into a pure Gaussian noise image xT. Model training mainly focuses on the reverse diffusion process. The goal of training a diffusion model is to learn the inverse of the forward process, that is, to train a probability distribution pθ(xt-1|xt). New data x0 can then be generated by traversing backwards along the Markov chain.
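Written out, the two Markov chains factorize as follows. This is a sketch in the notation of the DDPM paper listed in the references; the prior p(x_T) = N(0, I) is the standard assumption there and is not stated explicitly in the text above.

```latex
% Forward (diffusion) chain: fixed, adds noise step by step
q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})

% Reverse (generative) chain: learned, starts from pure noise
p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t),
\qquad p(x_T) = \mathcal{N}(x_T;\, 0,\, I)
```

The Markov property is what allows both joint distributions to be written as products of one-step transitions.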
This is where it gets interesting. The biggest difference between Diffusion Models and GAN or VAE is that data is not produced by a single pass through a model; instead, generation is based on a Markov chain, and the model generates data by learning the noise.
Besides generating very interesting high-quality pictures, Diffusion Models have many other benefits. The most important is that training involves no adversarial game. For GANs, adversarial training is very hard to debug, because the two models competing with each other during training are a black box to us. In addition, in terms of training efficiency, diffusion models are scalable and parallelizable. How to speed up training, how to add more mathematical rules and constraints, and how to extend the approach to speech, text, and 3D are all interesting questions that are producing many new papers.

5. Detailed explanation of Diffusion Model

As stated above, Diffusion Models consist of a forward process (or diffusion process), in which the input data is gradually noised, and a reverse process (or inverse diffusion process), in which the noise is converted back into samples of the source distribution.
Next comes a little mathematics, which I will try to keep as simple as possible. It comes down to a Markov chain plus conditional probability distributions. The core question is how to use a neural network to model the probability distributions of the Markov process.

5.1 Diffusion forward process (diffusion process)

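The formulas for this section were in figures that are not available here. As a hedged sketch, the forward process in the notation of the DDPM paper listed in the references can be written as follows, where the noise schedule β_1, ..., β_T (with each β_t in (0, 1)) is a fixed hyperparameter:

```latex
% One forward step: shrink the previous state slightly and add Gaussian noise
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)

% Shorthand
\alpha_t = 1 - \beta_t, \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s

% Closed form: x_t can be sampled directly from x_0 at any step t
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\right)

% Equivalent reparameterized form
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)
```

Because \bar{\alpha}_t shrinks toward 0 as t grows, x_T is approximately distributed as N(0, I) regardless of x_0, which matches the intuition from section 3.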

5.2 Diffusion reverse process (reverse diffusion process)

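The formulas for this section were also in figures that are not available here. As a hedged sketch following the DDPM paper listed in the references, with α_t = 1 - β_t and \bar{α}_t the product of α_1, ..., α_t, the reverse process can be written as:

```latex
% Learned reverse step: a Gaussian whose parameters come from a neural network
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)

% The true reverse step is tractable once it is also conditioned on x_0
q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t I\right)

\tilde{\mu}_t(x_t, x_0) =
  \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t}\, x_0
  + \frac{\sqrt{\alpha_t}\, (1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, x_t,
\qquad
\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\, \beta_t

% DDPM parameterizes the mean through a noise-prediction network \epsilon_\theta
\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}
  \left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right)
```

In DDPM the covariance Σ_θ is not learned but fixed to σ_t² I, with σ_t² set to either β_t or \tilde{β}_t.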

5.3 Training loss

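The derivation for this section was in figures that are not available here. As a hedged sketch following the DDPM paper listed in the references, with \bar{α}_t defined as the product of (1 - β_s) for s = 1, ..., t, the training loss comes from the variational bound on the negative log-likelihood:

```latex
% Variational bound (negative ELBO)
\mathbb{E}\left[-\log p_\theta(x_0)\right]
  \le \mathbb{E}_q\!\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right]
  = L_{\mathrm{VLB}}

% Decomposition into per-step terms
L_{\mathrm{VLB}} = \mathbb{E}_q\!\left[ L_T + \sum_{t=2}^{T} L_{t-1} + L_0 \right]

L_T = D_{\mathrm{KL}}\!\left(q(x_T \mid x_0)\ \|\ p(x_T)\right), \quad
L_{t-1} = D_{\mathrm{KL}}\!\left(q(x_{t-1} \mid x_t, x_0)\ \|\ p_\theta(x_{t-1} \mid x_t)\right), \quad
L_0 = -\log p_\theta(x_0 \mid x_1)

% Simplified objective used in practice: predict the noise that was added
L_{\mathrm{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}
  \left[ \left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0
  + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\ t\right) \right\|^2 \right]
```

This is also where the close relationship with VAE shows up: both are trained by optimizing an evidence lower bound.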

5.4 Training process

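The content of this section was in a figure that is not available here. As a rough sketch under stated assumptions, the DDPM training step (noise a sample, predict the noise, take the squared error) and the matching sampling loop can be written with the standard library alone. Everything named here is hypothetical: the scalar "data", the linear schedule with T = 100, the choice σ_t² = β_t, and especially `oracle_eps`, which stands in for a trained network and is only exact for a data set consisting of the single value `DATA_POINT`. A real implementation would use PyTorch tensors and a neural network for `eps_model`.

```python
import math
import random

# Assumed linear noise schedule; betas[0] plays the role of beta_1.
T = 100
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
running = 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)  # cumulative product of alphas


def q_sample(x0, t, eps):
    """Closed-form forward sample: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1.0 - alpha_bars[t]) * eps


def training_loss(eps_model, x0, rng):
    """One Monte Carlo term of the simplified loss ||eps - eps_model(x_t, t)||^2."""
    t = rng.randrange(T)        # pick a random time step
    eps = rng.gauss(0.0, 1.0)   # the noise the model must predict
    x_t = q_sample(x0, t, eps)
    return (eps - eps_model(x_t, t)) ** 2


def sample(eps_model, rng):
    """Reverse process: start from pure noise and denoise step by step."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        z = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise on the final step
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps_model(x, t)) / math.sqrt(alphas[t])
        x = mean + math.sqrt(betas[t]) * z         # assumes sigma_t^2 = beta_t
    return x


# Hypothetical "perfectly trained" model for a data set with one value.
DATA_POINT = 2.0


def oracle_eps(x_t, t):
    """Exact noise for single-point data, obtained by inverting q_sample."""
    return (x_t - math.sqrt(alpha_bars[t]) * DATA_POINT) / math.sqrt(1.0 - alpha_bars[t])


rng = random.Random(0)
loss = sum(training_loss(oracle_eps, DATA_POINT, rng) for _ in range(1000)) / 1000
generated = sample(oracle_eps, rng)
print(f"average loss: {loss:.3e}, generated sample: {generated:.6f}")
```

With the oracle in place the loss is zero up to floating-point error and the sampler returns the data point, which is a quick sanity check that the two loops are consistent with each other. In actual training, `eps_model` is a network whose parameters are updated by gradient descent on this loss.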

6. Summary


References

https://zhuanlan.zhihu.com/p/549623622
https://zhuanlan.zhihu.com/p/449284962
https://zhuanlan.zhihu.com/p/532736667
https://zhuanlan.zhihu.com/p/525106459
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
Denoising Diffusion Probabilistic Models
Diffusion Models Beat GANs on Image Synthesis
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Generative Modeling by Estimating Gradients of the Data Distribution

7. Welcome everyone to join the column [Artificial Intelligence Algorithm Frontier]

This column covers computer vision, natural language processing, machine learning and other artificial intelligence related fields;

This column will explain the principles of popular algorithms in various fields in detail, and will reproduce the code from papers step by step, starting from scratch.


Origin blog.csdn.net/DFCED/article/details/132394895