SCORE-BASED GENERATIVE MODELING THROUGH STOCHASTIC DIFFERENTIAL EQUATIONS Reading Notes

The paper proposes a generative modeling framework based on stochastic differential equations. Both Score Matching with Langevin Dynamics (SMLD, i.e., denoising score matching with annealed Langevin sampling) and Denoising Diffusion Probabilistic Models (DDPM) can be unified as special cases of this framework.

Construct a continuous-time diffusion process $\{\mathbf{x}(t)\}_{t=0}^T$, $t \in [0, T]$, such that $\mathbf{x}(0)\sim p_0$ is the target data distribution to be learned, and $\mathbf{x}(T)\sim p_T$ is a prior distribution that is easy to sample from. This diffusion process can be described as the solution of the following stochastic differential equation (SDE):
$$\mathrm{d}\mathbf{x} = f(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w} \tag{5}$$
where $\mathbf{w}$ is the standard Wiener process, $f(\cdot, t)$ is a vector-valued function called the drift coefficient, and $g(t)$ is a scalar function called the diffusion coefficient.
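To make equation (5) concrete, the forward process can be simulated with a simple Euler-Maruyama discretization. Below is a minimal sketch, assuming the VE (variance exploding) SDE that corresponds to SMLD, with zero drift and $g(t) = \sigma(t)\sqrt{2\log(\sigma_{\max}/\sigma_{\min})}$ for $\sigma(t)=\sigma_{\min}(\sigma_{\max}/\sigma_{\min})^t$; the hyperparameter values here are illustrative, not taken from the paper's code.

```python
import torch

def g(t, sigma_min=0.01, sigma_max=50.0):
    # Diffusion coefficient of the VE SDE: g(t) = sigma(t) * sqrt(2 log(sigma_max / sigma_min))
    sigma_t = sigma_min * (sigma_max / sigma_min) ** t
    return sigma_t * (2 * torch.log(torch.tensor(sigma_max / sigma_min))).sqrt()

def forward_sde(x0, n_steps=1000, T=1.0):
    """Euler-Maruyama simulation of dx = g(t) dw (VE SDE, zero drift)."""
    dt = T / n_steps
    x = x0.clone()
    for i in range(n_steps):
        t = torch.tensor(i * dt)
        dw = torch.randn_like(x) * dt ** 0.5   # Brownian increment: dw ~ N(0, dt * I)
        x = x + g(t) * dw                      # the drift term f(x, t) dt is zero here
    return x

# x0 is a batch of data points; x(T) approximately follows the prior N(0, sigma_max^2 I)
x0 = torch.randn(16, 2)
xT = forward_sde(x0)
```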
By sampling $\mathbf{x}(T)\sim p_T$ and reversing the above process, we can obtain $\mathbf{x}(0)\sim p_0$, i.e., samples from the target data distribution. Prior work has proved that the reverse of the above diffusion process is itself a diffusion process, with time running backwards from $T$ to $0$:
$$\mathrm{d}\mathbf{x} = [f(\mathbf{x}, t) - g(t)^2 \nabla_\mathbf{x}\log p_t(\mathbf{x})]\,\mathrm{d}t + g(t)\,\mathrm{d}\overline{\mathbf{w}} \tag{6}$$
where $\overline{\mathbf{w}}$ is a standard Wiener process with time flowing from $T$ to $0$. The quantity $\nabla_\mathbf{x}\log p_t(\mathbf{x})$ is called the score. If we can obtain the score $\nabla_\mathbf{x}\log p_t(\mathbf{x})$, we can then sample from $p_0$.

In order to obtain $\mathbf{x}(0)\sim p_0$, we therefore need to estimate the score $\nabla_\mathbf{x}\log p_t(\mathbf{x})$, and then solve the reverse-time SDE (6) starting from $\mathbf{x}(T)$.

Score estimation

The score can be estimated by training a time-dependent score-based model $s_\theta(\mathbf{x}, t) \approx \nabla_\mathbf{x}\log p_t(\mathbf{x})$ with a weighted sum of denoising score matching objectives over $t$.
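The sketch below shows what such training could look like, again assuming the VE SDE, whose perturbation kernel is (approximately) $p_{0t}(\mathbf{x}(t)\,|\,\mathbf{x}(0)) = \mathcal{N}(\mathbf{x}(0), \sigma^2(t)\mathbf{I})$, so the target score has the closed form $-(\mathbf{x}(t)-\mathbf{x}(0))/\sigma^2(t)$. The `ScoreNet` architecture and all hyperparameters are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Placeholder time-conditional score model s_theta(x, t)."""
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        # Condition on t by simple concatenation (a real model would use time embeddings)
        return self.net(torch.cat([x, t[:, None]], dim=-1))

def sigma(t, sigma_min=0.01, sigma_max=50.0):
    return sigma_min * (sigma_max / sigma_min) ** t

def dsm_loss(model, x0):
    """Denoising score matching loss, weighted by lambda(t) = sigma(t)^2."""
    t = torch.rand(x0.shape[0])              # t ~ Uniform(0, 1)
    std = sigma(t)[:, None]
    noise = torch.randn_like(x0)
    xt = x0 + std * noise                    # x(t) | x(0) ~ N(x0, sigma(t)^2 I)
    score = model(xt, t)
    # The target score of the Gaussian kernel is -(xt - x0) / sigma^2 = -noise / sigma,
    # so the sigma^2-weighted objective simplifies to ||sigma * score + noise||^2.
    return ((std * score + noise) ** 2).sum(dim=-1).mean()

model = ScoreNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x0 = torch.randn(128, 2)                     # stand-in for a data batch
loss = dsm_loss(model, x0)
opt.zero_grad(); loss.backward(); opt.step()
```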

Solving the reverse SDE

The reverse-time SDE can be solved directly with general-purpose numerical SDE solvers. But since a score model is available, a better approach is to also exploit score-based MCMC methods such as Langevin dynamics.
The authors propose Predictor-Corrector (PC) samplers. At each time step, a numerical SDE solver first estimates the sample at the next time step (predictor), and then a score-based MCMC method corrects the marginal distribution of the estimated sample (corrector); see the sketch below.
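A minimal sketch of one PC sampling loop under the same VE SDE assumption, reusing the `ScoreNet` model from the training sketch above: the predictor is a reverse-time Euler-Maruyama step of equation (6), and the corrector runs a few Langevin dynamics steps with a signal-to-noise step-size heuristic; all constants are illustrative.

```python
import torch

@torch.no_grad()
def pc_sampler(model, shape, n_steps=500, n_corrector=1, snr=0.16,
               sigma_min=0.01, sigma_max=50.0, T=1.0):
    """Predictor-Corrector sampling for the VE SDE (drift f = 0)."""
    dt = -T / n_steps                                  # negative: time runs from T to 0
    x = torch.randn(shape) * sigma_max                 # x(T) ~ prior N(0, sigma_max^2 I)
    log_ratio = torch.log(torch.tensor(sigma_max / sigma_min))
    for i in range(n_steps):
        t = torch.full((shape[0],), T + i * dt)
        sigma_t = sigma_min * (sigma_max / sigma_min) ** t[:, None]
        g2 = sigma_t ** 2 * 2 * log_ratio              # g(t)^2
        # Predictor: one reverse-time Euler-Maruyama step of equation (6)
        score = model(x, t)
        x_mean = x - g2 * score * dt                   # dt < 0, so this follows the score
        x = x_mean + (g2 * -dt).sqrt() * torch.randn_like(x)
        # Corrector: Langevin MCMC steps targeting the marginal p_t(x)
        for _ in range(n_corrector):
            score = model(x, t)
            noise = torch.randn_like(x)
            grad_norm = score.reshape(shape[0], -1).norm(dim=-1).mean()
            noise_norm = noise.reshape(shape[0], -1).norm(dim=-1).mean()
            eps = 2 * (snr * noise_norm / grad_norm) ** 2   # step-size heuristic
            x = x + eps * score + (2 * eps).sqrt() * noise
    return x_mean                                      # denoised mean at the final step

samples = pc_sampler(model, shape=(16, 2))
```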

Probability flow

For every diffusion process, there exists a corresponding deterministic process whose trajectories share the same marginal probability densities $\{p_t(\mathbf{x})\}_{t=0}^T$ as the diffusion process. The deterministic process corresponding to equation (5) satisfies the following ordinary differential equation (ODE):
$$\mathrm{d}\mathbf{x} = \left[f(\mathbf{x}, t) - \frac{1}{2}g(t)^2 \nabla_\mathbf{x}\log p_t(\mathbf{x})\right]\mathrm{d}t$$
The authors call the above ODE the probability flow ODE.
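Since the randomness is gone, a black-box ODE solver can integrate this equation from $t = T$ down to $t \approx 0$ to produce samples. A minimal sketch under the same VE SDE assumption, using `scipy.integrate.solve_ivp` as one possible solver choice:

```python
import numpy as np
import torch
from scipy.integrate import solve_ivp

def ode_sampler(model, shape, sigma_min=0.01, sigma_max=50.0, T=1.0):
    """Sample by integrating the probability flow ODE from t = T to t ~ 0."""
    log_ratio = np.log(sigma_max / sigma_min)

    def ode_func(t, x_flat):
        x = torch.from_numpy(x_flat.reshape(shape)).float()
        t_vec = torch.full((shape[0],), float(t))
        sigma_t = sigma_min * (sigma_max / sigma_min) ** t
        g2 = sigma_t ** 2 * 2 * log_ratio              # g(t)^2
        with torch.no_grad():
            score = model(x, t_vec)
        # dx/dt = f(x, t) - (1/2) g(t)^2 * score, with f = 0 for the VE SDE
        dx = -0.5 * g2 * score
        return dx.numpy().reshape(-1)

    x_T = (torch.randn(shape) * sigma_max).numpy().reshape(-1)
    sol = solve_ivp(ode_func, (T, 1e-3), x_T, method="RK45", rtol=1e-5, atol=1e-5)
    return torch.from_numpy(sol.y[:, -1].reshape(shape))

samples = ode_sampler(model, shape=(16, 2))
```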
DDPM cannot compute the likelihood directly and can only use the ELBO as a surrogate. The likelihood of the diffusion process, however, can be computed by converting it into the probability flow ODE: the probability flow ODE is a special case of a neural ODE, and the neural ODE paper has shown that the density of such a flow can be computed exactly via the instantaneous change of variables formula.
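Concretely, writing the probability flow ODE with the learned score plugged in as $\mathrm{d}\mathbf{x}/\mathrm{d}t = \tilde{f}_\theta(\mathbf{x}, t)$, the instantaneous change of variables formula gives (a sketch of the standard neural ODE result):
$$\log p_0(\mathbf{x}(0)) = \log p_T(\mathbf{x}(T)) + \int_0^T \nabla \cdot \tilde{f}_\theta(\mathbf{x}(t), t)\,\mathrm{d}t$$
where the divergence $\nabla \cdot \tilde{f}_\theta$ is typically estimated with the Skilling-Hutchinson trace estimator, and the integral is computed by the same ODE solver.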
In addition, because equation (5) has no learnable parameters, given a perfectly estimated score the probability flow ODE defines a one-to-one mapping between data and latent representations. After all, the probability flow ODE is a deterministic process similar to a normalizing flow, unlike methods such as DDPM that repeatedly inject noise during sampling.

Controlled generation

Controllable generation can be achieved by solving the conditional reverse-time SDE:
$$\mathrm{d}\mathbf{x} = \{f(\mathbf{x}, t) - g(t)^2 [\nabla_\mathbf{x}\log p_t(\mathbf{x}) + \nabla_\mathbf{x}\log p_t(\mathbf{y}\,|\,\mathbf{x})]\}\,\mathrm{d}t + g(t)\,\mathrm{d}\overline{\mathbf{w}}$$
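With class labels $\mathbf{y}$, for instance, the extra term $\nabla_\mathbf{x}\log p_t(\mathbf{y}\,|\,\mathbf{x})$ can come from a separately trained time-dependent classifier, so the conditional score is just the sum of the two gradients. A minimal sketch, where `classifier(x, t)` is a hypothetical model returning per-class logits:

```python
import torch

def conditional_score(model, classifier, x, t, y):
    """Score of p_t(x | y): unconditional score plus classifier gradient."""
    with torch.enable_grad():                    # works even inside no_grad samplers
        x_in = x.detach().requires_grad_(True)
        logits = classifier(x_in, t)             # hypothetical time-dependent classifier
        log_prob = torch.log_softmax(logits, dim=-1)
        # Sum of log p_t(y_i | x_i) over the batch; the gradient stays per-sample
        selected = log_prob[torch.arange(x.shape[0]), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]
    return model(x, t) + grad
```

Plugging `conditional_score` into the PC or probability flow ODE samplers above, in place of the unconditional `model(x, t)` call, then yields class-conditional samples.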

Source: blog.csdn.net/icylling/article/details/132135547