DDPM: Denoising Diffusion Probabilistic Model
Reference for this article: a video explaining the principle and derivation of the diffusion model DDPM, "AI painting underlying model" (bilibili).
1. General principle
From right to left is the forward noising process; from left to right is the reverse denoising process.
The forward process keeps adding noise; after $T$ steps we hope that $x_T$ is approximately pure Gaussian noise, $x_T \sim N(0, I)$.
In this way, during inference we can draw a random $x_T'$ (the prime indicates that this is a new value).
If we can learn the denoising method, we can finally obtain a new picture $x_0'$.
2. What does the denoising method of the diffusion model predict?
What needs to be learned now is the denoising method. The DDPM algorithm does not directly learn to predict the value $x_{t-1}$; instead it predicts the conditional probability distribution $p(x_{t-1} \mid x_t)$, and then obtains $x_{t-1}$ by sampling from that distribution. This is similar to the DeepAR forecasting method, in that distributions are predicted instead of values.
So why predict distributions instead of exact values?
Because a distribution can be sampled, the model has randomness.
Furthermore, once you have $p(x_{t-1} \mid x_t)$, you can obtain $x_{t-1}$ by sampling, and in this way you can reach $x_0$ step by step. Therefore, what we want to learn is the distribution p, not an exact image.
Conclusion: The whole learning process is predicting the distribution p .
Later we will see that the model is predicting noise; not the raw difference between $x_{t-1}$ and $x_t$, but the noise $\epsilon$ involved in calculating the mean of the normal distribution p.
So we obtain $\epsilon$ by prediction and then obtain p. This also verifies our conclusion: the whole learning process is predicting the distribution p.
3. Decomposing the conditional probability distribution
Formula 1: $p(x_{t-1} \mid x_t) = \dfrac{p(x_t \mid x_{t-1})\,p(x_{t-1})}{p(x_t)}$. The original conditional probability distribution is transformed according to Bayes' formula; the new formula contains 3 probability distributions.
(1) Calculation of the first p
The first p is $p(x_t \mid x_{t-1})$: the probability distribution governing the step from $x_{t-1}$ to $x_t$ in the noising process. Because the noising process is defined in advance, this distribution can also be written down directly.
Now we define the noising process as follows:
Formula 2: $x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon$, where $\epsilon \sim N(0, I)$ is the noise and $\beta_t \in (0, 1)$.
Because $\epsilon \sim N(0, I)$, we have $x_t \mid x_{t-1} \sim N(\sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I)$. (ps: the coefficient $\sqrt{\beta_t}$ on the noise must be squared to obtain the variance $\beta_t$.)
$\beta_t$ can be seen as the variance of the added noise; it needs to be very small, close to 0. Only when each added noise is small do both the forward and the reverse steps obey a normal distribution.
Further derivation gives:
Formula 3: $p(x_t \mid x_{t-1}) = N(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I)$.
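Formulas 2 and 3 can be checked with a tiny numpy sketch. A 4-element vector stands in for an image, and the value of $\beta_t$ is an assumption chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

beta_t = 0.02                      # noise variance for this step, close to 0
x_prev = rng.standard_normal(4)    # toy stand-in for an image x_{t-1}

# Formula 2: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps
eps = rng.standard_normal(4)       # eps ~ N(0, I)
x_t = np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps

# Formula 3 says x_t | x_{t-1} ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I):
# scaling eps by sqrt(beta_t) is what makes the variance beta_t
# ("the variance needs to be squared").
mean = np.sqrt(1.0 - beta_t) * x_prev
print(x_t - mean)                  # this residual is exactly sqrt(beta_t) * eps
```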
(2) Calculation of the third p
The third p is $p(x_t)$, which is similar to the second p, $p(x_{t-1})$: if you find a way to calculate one, the other can be obtained in the same way.
In the previous step we obtained Formula 2 for each noising step, and the conditional probability distribution Formula 3 for each noising step.
In the noising process, $x_t$ is generated from $x_0$, so $p(x_t)$ can be conditioned on $x_0$ and written as $p(x_t \mid x_0)$.
Conditioning Formula 1 on $x_0$:
Formula 4: $p(x_{t-1} \mid x_t, x_0) = \dfrac{p(x_t \mid x_{t-1}, x_0)\,p(x_{t-1} \mid x_0)}{p(x_t \mid x_0)}$
Because the noising process is a Markov process, $x_t$ is related only to the previous step $x_{t-1}$ and has nothing to do with earlier steps; in particular it has nothing to do with $x_0$, so $p(x_t \mid x_{t-1}, x_0) = p(x_t \mid x_{t-1})$.
$p(x_{t-1} \mid x_0)$ and $p(x_t \mid x_0)$ are obtained step by step from $x_0$, so they cannot be simplified further. Formula 4 therefore simplifies to:
Formula 5: $p(x_{t-1} \mid x_t, x_0) = \dfrac{p(x_t \mid x_{t-1})\,p(x_{t-1} \mid x_0)}{p(x_t \mid x_0)}$
Now we calculate the value of the new third p, $p(x_t \mid x_0)$, derived from Formula 2 (ps: an informal derivation; unimportant constants are omitted).
Applying Formula 2 recursively and merging the Gaussians, the official result is:
Formula 6: $p(x_t \mid x_0) = N(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I)$, where $\alpha_s = 1-\beta_s$ and $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$ ($\prod$ denotes a product over all steps).
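The closed form of Formula 6 can be sanity-checked numerically: propagating the mean coefficient and variance of Formula 2 step by step should reproduce $\sqrt{\bar{\alpha}_t}$ and $1-\bar{\alpha}_t$. A minimal numpy sketch, assuming a linear $\beta$ schedule (the schedule is an assumption for illustration, not taken from this article):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # assumed linear beta schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)            # \bar{alpha}_t = prod_{s<=t} alpha_s

# Propagate the mean coefficient and variance through Formula 2 step by step:
#   mean coefficient: c_t = sqrt(1 - beta_t) * c_{t-1}
#   variance:         v_t = (1 - beta_t) * v_{t-1} + beta_t
c, v = 1.0, 0.0
for beta in betas:
    c = np.sqrt(1.0 - beta) * c
    v = (1.0 - beta) * v + beta

# Formula 6 claims x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)
print(np.isclose(c, np.sqrt(alpha_bar[-1])), np.isclose(v, 1.0 - alpha_bar[-1]))
# prints: True True; also note alpha_bar[-1] is nearly 0, so x_T ~ N(0, I)
```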
(3) Diffusion formula solution
Having obtained $p(x_t \mid x_0)$ in the previous step, $p(x_{t-1} \mid x_0)$ can be obtained in the same way.
Substituting the three Gaussians into Formula 5, the official result is given directly:
Formula 7: $p(x_{t-1} \mid x_t, x_0) = N(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t I)$
Here $\tilde{\mu}_t$ and $\tilde{\beta}_t$ are determined by the hyperparameters $\beta_t$; the formulas are as follows:
Formula 8: $\tilde{\mu}_t(x_t, x_0) = \dfrac{\sqrt{\alpha_t}(1-\bar{\alpha}_{t-1})\,x_t + \sqrt{\bar{\alpha}_{t-1}}\,\beta_t\,x_0}{1-\bar{\alpha}_t}$, $\quad \tilde{\beta}_t = \dfrac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t$
Because $\tilde{\beta}_t$ is fixed by the schedule, the task of finding $p(x_{t-1} \mid x_t, x_0)$ becomes the task of finding $\tilde{\mu}_t$.
Once $\tilde{\mu}_t$ and $\tilde{\beta}_t$ are known, the predicted inference value $x_{t-1}$ can be obtained from the following formula:
Formula 9: $x_{t-1} = \tilde{\mu}_t + \sqrt{\tilde{\beta}_t}\,z$, where $z \sim N(0, I)$.
If $x_{t-1}$ is drawn directly from the distribution (e.g. by feeding the mean and variance to a Python sampling routine), the sampling operation is non-differentiable, which breaks the reverse process; so the reparameterization trick converts it into Formula 9, a differentiable formula that expresses $x_{t-1}$.
In the inference stage $x_0$ is the value we ultimately want; it is unknown, so we need a formula that converts $x_0$ into known quantities.
Formula 6 is transformed by the reparameterization trick as follows:
Formula 10: $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, where $\epsilon \sim N(0, I)$. Rearranging gives:
Formula 11: $x_0 = \dfrac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon}{\sqrt{\bar{\alpha}_t}}$, where $t$ is the current noising step and changes over time. Note that this $x_0$ is an intermediate quantity and cannot be used as the final prediction, because the reverse process p must follow the Markov chain, so the result must still be derived step by step.
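With the same $\epsilon$, Formulas 10 and 11 form an exact round trip; a minimal numpy sketch (the linear $\beta$ schedule, step index, and toy vector are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

betas = np.linspace(1e-4, 0.02, 1000)     # assumed linear beta schedule
alpha_bar = np.cumprod(1.0 - betas)

t = 499                                   # an arbitrary noising step
x0 = rng.standard_normal(4)               # toy stand-in for an image
eps = rng.standard_normal(4)              # eps ~ N(0, I)

# Formula 10: jump straight from x_0 to x_t in one step
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# Formula 11: with the same eps, x_0 is recovered exactly
x0_rec = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
print(np.allclose(x0_rec, x0))            # True
```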
In Formula 7 the unknown value is $\tilde{\mu}_t$; within $\tilde{\mu}_t$ the unknown value is $x_0$; within $x_0$ (Formula 11) the unknown value is $\epsilon$, which cannot be calculated or derived from the existing formulas.
So we use a UNet network: input $x_t$ (and the step $t$), output the predicted noise $\epsilon$.
Substituting Formula 11 into Formula 8, we get:
Formula 12: $\tilde{\mu}_t = \dfrac{1}{\sqrt{\alpha_t}}\left(x_t - \dfrac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon\right)$, where $x_t$, $\alpha_t$ and $\bar{\alpha}_t$ are all known.
$\epsilon$ is predicted by the UNet network and can be written $\epsilon_\theta(x_t, t)$, where $\theta$ denotes the UNet model parameters.
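That Formula 12 really is Formula 8 with Formula 11 substituted in can be verified numerically. A numpy sketch using the true $\epsilon$ in place of the UNet prediction (the schedule, step index, and toy values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

betas = np.linspace(1e-4, 0.02, 1000)     # assumed linear beta schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

t = 499
a_t, ab_t, ab_prev = alphas[t], alpha_bar[t], alpha_bar[t - 1]

x0 = rng.standard_normal(4)               # toy stand-in for an image
eps = rng.standard_normal(4)              # eps ~ N(0, I)
x_t = np.sqrt(ab_t) * x0 + np.sqrt(1.0 - ab_t) * eps   # Formula 10

# Formula 8: the posterior mean written in terms of x_t and x_0
mu_from_x0 = (np.sqrt(a_t) * (1.0 - ab_prev) * x_t
              + np.sqrt(ab_prev) * betas[t] * x0) / (1.0 - ab_t)

# Formula 12: the same mean written in terms of x_t and eps
mu_from_eps = (x_t - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps) / np.sqrt(a_t)

print(np.allclose(mu_from_x0, mu_from_eps))            # True
```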
[Figure: the process by which the diffusion model obtains the predicted image through the UNet network]
The above is the most important logic of the diffusion model DDPM .
4. Model training
According to Formula 12, the UNet network is trained to predict the normally distributed noise $\epsilon$.
Question 1: What are the input and output during model training?
Answer: input $x_t$ and the step $t$; output the predicted noise $\epsilon_\theta(x_t, t)$.
Question 2: In which process are the UNet network parameters trained?
Answer: the noising process. Noising corresponds to the training phase; denoising corresponds to the inference phase.
According to Formula 2, the noise $\epsilon$ of the noising process is defined by us in advance, so we can compute the loss from the KL divergence between the predicted noise and the real noise. In the official paper, the KL divergence formula simplifies to the MSE between the two values.
Question 3: Is $x_t$ derived step by step during training?
Answer: No. During training, according to Formula 10, $x_t$ can be computed directly from the four values $x_0$, $\epsilon$, $\bar{\alpha}_t$ and $t$.
$\bar{\alpha}_t$ can be computed in advance and stored in memory; $x_0$ is an image from the input set, $\epsilon$ is the input noise, and $t$ is the noising step.
Therefore each forward step can obtain the value of $x_t$ directly.
5. Pseudo-code implementation of training and inference
(1) Training stage
The training procedure (Algorithm 1 in the DDPM paper):
1: repeat
2: $x_0 \sim q(x_0)$
3: $t \sim \mathrm{Uniform}(\{1, \dots, T\})$
4: $\epsilon \sim N(0, I)$
5: take a gradient descent step on $\nabla_\theta \left\|\epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\right)\right\|^2$
6: until converged
Interpretation:
$x_0 \sim q(x_0)$ represents taking an image from the data set.
$t \sim \mathrm{Uniform}(\{1, \dots, T\})$ indicates that a noising step is randomly selected; as mentioned earlier, the noising process does not need to be done step by step.
$\epsilon \sim N(0, I)$ is the noise used to build $x_t$ in a single jump via Formula 10.
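The training step can be sketched in numpy. Here `eps_theta` is a hypothetical stub standing in for the UNet noise predictor, and the linear $\beta$ schedule is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # assumed linear beta schedule
alpha_bar = np.cumprod(1.0 - betas)      # precomputed once, kept in memory

def eps_theta(x_t, t):
    """Hypothetical stub standing in for the UNet noise predictor."""
    return np.zeros_like(x_t)

def training_step(x0):
    t = rng.integers(0, T)               # t ~ Uniform: one random noising step
    eps = rng.standard_normal(x0.shape)  # eps ~ N(0, I)
    # Formula 10: build x_t from x_0 in a single jump, no step-by-step loop
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    # the KL divergence simplifies to the MSE between real and predicted noise
    loss = np.mean((eps - eps_theta(x_t, t)) ** 2)
    return loss

x0 = rng.standard_normal(4)              # toy stand-in for a dataset image
loss = training_step(x0)
print(loss)
```

In a real implementation the loss would be backpropagated through the network; the stub only shows the data flow of one step.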
(2) Inference stage
The sampling procedure (Algorithm 2 in the DDPM paper):
1: $x_T \sim N(0, I)$
2: for $t = T, \dots, 1$ do
3: $z \sim N(0, I)$ if $t > 1$, else $z = 0$
4: $x_{t-1} = \dfrac{1}{\sqrt{\alpha_t}}\left(x_t - \dfrac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z$
5: end for
6: return $x_0$
Interpretation:
The for loop means the reverse process must be done step by step.
The calculation in step 4 corresponds to Formula 9, and the mean term inside it corresponds to Formula 12.
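The step-by-step reverse process can likewise be sketched in numpy. Again `eps_theta` is a hypothetical stub for the trained UNet, and the small $T$ and schedule are assumptions to keep the toy loop fast:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50                                   # small T so the toy loop is quick
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def eps_theta(x_t, t):
    """Hypothetical stub standing in for the trained UNet noise predictor."""
    return np.zeros_like(x_t)

x = rng.standard_normal(4)               # x_T ~ N(0, I): pure noise
for t in range(T - 1, -1, -1):           # the reverse process is step by step
    # Formula 12: the posterior mean, with eps supplied by the network
    mu = (x - (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bar[t])
          * eps_theta(x, t)) / np.sqrt(alphas[t])
    if t > 0:
        # Formula 9: reparameterized sample x_{t-1} = mu + sigma_t * z
        sigma = np.sqrt((1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t]) * betas[t])
        x = mu + sigma * rng.standard_normal(4)
    else:
        x = mu                           # the last step outputs the mean only
print(x.shape, np.isfinite(x).all())
```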