Simulating a discrete Gaussian distribution with differences of the standard normal cumulative distribution function

How should we understand "using the difference of the cumulative distribution function of the standard normal distribution to simulate a discrete Gaussian distribution" in image generation?

Image generation needs the likelihood of an entire image. In practice (for example, when computing L0 in diffusion models), what the model provides is a continuous Gaussian distribution (a mean and a variance) for each pixel, whereas pixel values are discrete integers from 0 to 255. Computing the likelihood of an image therefore requires a discrete Gaussian distribution. There are several ways to obtain a discrete distribution from a continuous one. One of them defines the discrete probability p(x) as the area under the continuous Gaussian probability density function (PDF) over a unit-width interval centered on x. That area can be computed with the cumulative distribution function (CDF): p(x) = CDF(x + half unit) - CDF(x - half unit), which is the "difference" in the title. Note that CDF routines and approximations are usually written for the standard normal distribution, so for a general (non-standard) Gaussian you first standardize the input, by subtracting the mean and dividing by the standard deviation, and then evaluate the standard normal CDF.
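The difference of CDF values can be checked numerically. The following is a minimal sketch, assuming pixel values on the original 0-255 scale with half a unit on each side of every value. The mean 100.3 and standard deviation 4.0 are made up for the example, `normal_cdf` and `discrete_prob` are hypothetical helper names, and the exact CDF via `math.erf` is used rather than an approximation.

```python
import math

def normal_cdf(x, mean, std):
    """CDF of N(mean, std^2), evaluated by standardizing x first."""
    z = (x - mean) / std  # convert to standard-normal units
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def discrete_prob(k, mean, std, half=0.5):
    """Probability mass assigned to the integer pixel value k."""
    return normal_cdf(k + half, mean, std) - normal_cdf(k - half, mean, std)

mean, std = 100.3, 4.0  # illustrative values, not produced by any model
probs = [discrete_prob(k, mean, std) for k in range(256)]
best = max(range(256), key=lambda k: probs[k])  # most likely pixel value
print(best, sum(probs))
```

With these values the distribution sits far from 0 and 255, so the 256 probabilities should add up to very nearly 1 without any special treatment of the edges.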


discretized_gaussian_log_likelihood

code

The following excerpt is from gaussian_diffusion.py in the official IDDPM source code.
It handles t = 0, where the term to compute is $L_0 = -\log p_{\theta}(x_0 \mid x_1)$.
When t = 0, the mean and variance output by the model are the mean and variance of $x_0$.


        # This corresponds to the L[0] loss term, a negative log-likelihood (NLL),
        # computed from differences of the Gaussian CDF.
        # Usually called the decoder NLL: L0 = -log p(x0|x1)
        # discretized_gaussian_log_likelihood is a common trick: differences of a
        # continuous distribution's CDF are used to model a discrete distribution.
        decoder_nll = -discretized_gaussian_log_likelihood(
            x_start, means=out["mean"], log_scales=0.5 * out["log_variance"]
        )  # x_start is the observed data; we evaluate how likely it is under the distribution
        # means and log_scales are the mean and log standard deviation of the distribution
        # (0.5 * log variance = log standard deviation).
        assert decoder_nll.shape == x_start.shape
        # Average over the non-batch dimensions, then divide by ln 2 to convert
        # nats to bits per dimension.
        decoder_nll = mean_flat(decoder_nll) / np.log(2.0)

        # At the first timestep return the decoder NLL,
        # otherwise return KL(q(x_{t-1}|x_t,x_0) || p(x_{t-1}|x_t))
        # At t == 0 the likelihood uses the discretized Gaussian, because the model
        # outputs the mean and variance of x0; at t > 0 the KL divergence is used.
        output = th.where((t == 0), decoder_nll, kl)
        return {"output": output, "pred_xstart": out["pred_xstart"]}

losses.py from the official IDDPM source code

import numpy as np
import torch as th


def approx_standard_normal_cdf(x):
    """
    A fast approximation of the cumulative distribution function of the
    standard normal.
    """
    return 0.5 * (1.0 + th.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * th.pow(x, 3))))


def discretized_gaussian_log_likelihood(x, *, means, log_scales):
    """
    A common trick: differences of a continuous distribution's CDF are used to
    model a discrete distribution.
    Compute the log-likelihood of a Gaussian distribution discretizing to a
    given image.

    :param x: the target images. It is assumed that this was uint8 values,
              rescaled to the range [-1, 1].
    :param means: the Gaussian mean Tensor.
    :param log_scales: the Gaussian log stddev Tensor.
    :return: a tensor like x of log probabilities (in nats).
    """
    assert x.shape == means.shape == log_scales.shape
    # Subtract the mean; x is in [-1, 1].
    centered_x = x - means
    # 1 / sigma; multiplying by it below standardizes the bin edges.
    inv_stdv = th.exp(-log_scales)

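    # The 256 pixel levels split [-1, 1] into 255 intervals of width 2/255,
    # so each level's bin extends 1/255 to either side of it.
    # The CDF to the left of the leftmost bin is treated as 0 and the CDF to the
    # right of the rightmost bin as 1 (handled by th.where further down).
    plus_in = inv_stdv * (centered_x + 1.0 / 255.0)  # standardized right edge of the bin
    cdf_plus = approx_standard_normal_cdf(plus_in)

    min_in = inv_stdv * (centered_x - 1.0 / 255.0)  # standardized left edge of the bin
    cdf_min = approx_standard_normal_cdf(min_in)
    # Multiplying by inv_stdv divides by the standard deviation, so both edges are
    # in standard-normal units before the approximate standard normal CDF is applied.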

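    # For numerical stability, take the log of the right-edge CDF with a clamp
    # so that the argument of the log is never 0.
    log_cdf_plus = th.log(cdf_plus.clamp(min=1e-12))
    # Likewise for the mass to the right of the left edge.
    log_one_minus_cdf_min = th.log((1.0 - cdf_min).clamp(min=1e-12))

    # The difference of the two CDF values is the probability mass of the bin.
    cdf_delta = cdf_plus - cdf_min

    # The two extreme pixel values are handled with two nested th.where calls.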
    log_probs = th.where(
        x < -0.999,
        log_cdf_plus,
        th.where(x > 0.999, log_one_minus_cdf_min, th.log(cdf_delta.clamp(min=1e-12))),
    )
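    # x < -0.999: x is the leftmost level, so use log_cdf_plus directly; the CDF
    # to the left of this bin is treated as 0.
    # x > 0.999: x is the rightmost level, so use log(1 - cdf_min); the CDF to
    # the right of this bin is forced to 1.
    # Anything in between: take the log of cdf_delta.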

    assert log_probs.shape == x.shape
    return log_probs

Both approx_standard_normal_cdf and discretized_gaussian_log_likelihood are utility functions for modeling probability distributions in image generation models.

The approx_standard_normal_cdf function computes an approximation to the cumulative distribution function of the standard normal distribution. Differences of this CDF over small intervals turn a continuous Gaussian into probabilities over a finite set of values. Image generation models often model pixel values with such a discretized Gaussian, and approx_standard_normal_cdf supplies the CDF values needed to evaluate it.

The discretized_gaussian_log_likelihood function computes the log-probability of each pixel value under a discretized Gaussian distribution. It takes the target image together with the mean and log standard deviation of the Gaussian as input. The image is assumed to have already been rescaled to [-1, 1], where the 256 possible pixel values are 2/255 apart. For each pixel, the function evaluates the CDF at the two edges of that pixel's bin (the value plus and minus 1/255, standardized with the mean and standard deviation) and uses the difference between the two CDF values as the probability of that pixel value. The first and last bins are extended to minus and plus infinity so that the probabilities sum to 1. The function returns these log-probabilities, in nats, as a tensor with the same shape as the input.
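To look at the edge handling in isolation, here is a scalar re-implementation in plain Python. It is a sketch for illustration, not the IDDPM code: `log_prob_scalar` is a hypothetical name, the exact CDF from `math.erf` replaces the tanh approximation, and the mean and log standard deviation are invented values.

```python
import math

def std_normal_cdf(z):
    """Exact standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_prob_scalar(x, mean, log_scale):
    """Log-probability of one pixel x in [-1, 1] under a discretized Gaussian."""
    inv_stdv = math.exp(-log_scale)
    cdf_plus = std_normal_cdf(inv_stdv * (x - mean + 1.0 / 255.0))
    cdf_min = std_normal_cdf(inv_stdv * (x - mean - 1.0 / 255.0))
    if x < -0.999:    # leftmost level: the bin absorbs the whole left tail
        p = cdf_plus
    elif x > 0.999:   # rightmost level: the bin absorbs the whole right tail
        p = 1.0 - cdf_min
    else:             # interior level: mass between the two bin edges
        p = cdf_plus - cdf_min
    return math.log(max(p, 1e-12))  # same clamp as the original code

levels = [k / 127.5 - 1.0 for k in range(256)]  # uint8 levels mapped to [-1, 1]
mean, log_scale = 0.2, math.log(0.05)           # illustrative values
total = sum(math.exp(log_prob_scalar(x, mean, log_scale)) for x in levels)
print(total)
```

Because the first and last bins absorb the tails, the probabilities over all 256 levels add up to 1, apart from the tiny excess introduced by the 1e-12 clamp on bins that are far from the mean.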

The docstring of the discretized_gaussian_log_likelihood function contains this note about x:

It is assumed that this was uint8 values, rescaled to the range [-1, 1].
This note means that the input image x was originally stored as 8-bit unsigned integers in the range [0, 255] and was then rescaled so that its pixel values lie in [-1, 1]. The rescaling code typically lives in the Dataset or DataLoader, depending on the dataset. Here it is enough to know that the image has already been rescaled to [-1, 1] by the time it reaches discretized_gaussian_log_likelihood.
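The rescaling itself is not shown in this function. A common convention, assumed here rather than taken from the code above, is x = v / 127.5 - 1 for a uint8 value v. Under that assumption neighbouring pixel levels are 2/255 apart, which is why the likelihood code uses a half-width of 1/255. The names `to_unit_range` and `to_uint8` are hypothetical.

```python
def to_unit_range(v):
    """Map a uint8 pixel value (0..255) to [-1, 1]."""
    return v / 127.5 - 1.0

def to_uint8(x):
    """Inverse mapping, rounding to the nearest pixel level."""
    return int(round((x + 1.0) * 127.5))

levels = [to_unit_range(v) for v in range(256)]
spacing = levels[1] - levels[0]  # distance between neighbouring levels
print(levels[0], levels[-1], spacing, 2.0 / 255.0)
```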

kl = mean_flat(kl) / np.log(2.0)
mean_flat(kl) averages over all non-batch dimensions (c, h, and w), so the resulting kl.shape = (b,). Dividing by np.log(2.0) then converts the value from nats to bits.
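As a sketch of what these two steps do, the pure-Python snippet below averages per-element values for each sample and converts the result from nats to bits. The nested lists stand in for a tensor whose non-batch dimensions have already been flattened, `mean_flat_lists` is a hypothetical stand-in for `mean_flat`, and the numbers are invented.

```python
import math

def mean_flat_lists(batch):
    """Average over all non-batch entries, giving one value per sample."""
    return [sum(sample) / len(sample) for sample in batch]

# Two samples, each with four per-element values in nats (made-up numbers).
kl_nats = [[0.5, 1.0, 1.5, 1.0], [2.0, 2.0, 2.0, 2.0]]
per_sample_nats = mean_flat_lists(kl_nats)                      # length b
per_sample_bits = [v / math.log(2.0) for v in per_sample_nats]  # nats -> bits
print(per_sample_nats, per_sample_bits)
```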

Information content
Information content describes how much information an event carries. For an event $\mathrm{x} = x$ it is defined as $I(x) = -\log P(x)$. Its unit is the nat: 1 nat is the amount of information gained by observing an event with probability $\frac{1}{e}$.
(Here log is base e. Base-2 logarithms are also used, in which case the unit is the bit, also called the shannon.)
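A small sketch of the definition in both units; the function names are hypothetical and chosen only for this example.

```python
import math

def information_nats(p):
    """Information content -ln P(x), in nats."""
    return -math.log(p)

def information_bits(p):
    """Information content -log2 P(x), in bits."""
    return -math.log2(p)

print(information_nats(1.0 / math.e))  # an event with probability 1/e
print(information_bits(0.5))           # an event with probability 1/2
print(information_nats(0.1) / math.log(2.0), information_bits(0.1))
```

The last line illustrates that dividing a value in nats by ln 2 gives the same quantity in bits.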

information theory

approx_standard_normal_cdf

Why is the expression in approx_standard_normal_cdf 0.5 * (1.0 + th.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * th.pow(x, 3))))? Is there any basis for it?

This expression is a way to approximate the cumulative distribution function (CDF) of the standard normal distribution. It relies on the fact that the shape of the activation function $\tanh$ resembles the CDF of the standard normal distribution. Specifically, it uses the following two properties:

The range of $\tanh(x)$ is $(-1, 1)$, and $\tanh(x)$ approaches $\pm 1$ as $x$ approaches positive or negative infinity. Therefore $0.5 \cdot (1 + \tanh(x))$ ranges over $(0, 1)$, approaching $1$ as $x \to +\infty$ and $0$ as $x \to -\infty$, just as a CDF does.

When $x$ follows a standard normal distribution, the mean and variance of $\sqrt{2/\pi} \cdot x$ are:
Mean: $E[\sqrt{2/\pi} \cdot x] = \sqrt{\frac{2}{\pi}} \cdot E[x] = 0$
Variance: $Var[\sqrt{2/\pi} \cdot x] = (\sqrt{2/\pi})^2 \cdot Var[x] = \frac{2}{\pi}$
Since the standard normal distribution has a mean of 0 and a variance of 1, $\sqrt{2/\pi} \cdot x$ has a mean of 0 and a variance of $\frac{2}{\pi}$, so it is still a zero-mean Gaussian, only rescaled. The factor $\sqrt{2/\pi}$ is also the one that makes the slope of $0.5 \cdot (1 + \tanh(\sqrt{2/\pi} \cdot x))$ at $x = 0$, which is $\frac{1}{2}\sqrt{2/\pi} = \frac{1}{\sqrt{2\pi}}$, equal to the value of the standard normal density at 0.

Combining these two properties gives the approximation $0.5 \cdot (1 + \tanh(\sqrt{2/\pi} \cdot x))$ to the standard normal CDF: it takes values in $(0, 1)$, tends to $0$ and $1$ at minus and plus infinity, and matches the standard normal CDF in value and slope at $x = 0$.

The polynomial inside $\tanh$ is motivated by Taylor series expansions. The idea is to add a cubic term so that the approximation follows the cumulative distribution function of the standard normal distribution more closely away from the origin. The Taylor series of the CDF around 0 is:

$$\Phi(x) \approx \frac{1}{2} + \frac{1}{\sqrt{2\pi}} x - \frac{1}{6\sqrt{2\pi}} x^{3} + \frac{1}{40\sqrt{2\pi}} x^{5}$$

where $\Phi(x)$ denotes the cumulative distribution function of the standard normal distribution. Expanding $0.5 \cdot (1.0 + \tanh(\sqrt{2/\pi} \cdot x))$ in the same way gives the same constant and linear terms but a different cubic term. The approximation can be improved by replacing $x$ inside $\tanh$ with $x + c x^3$ and adjusting the coefficient $c$. Matching the cubic Taylor terms exactly gives $c = \frac{2}{3\pi} - \frac{1}{6} \approx 0.0455$.

After some derivation and tweaking, the polynomial used in the code is:
$$x + 0.044715 x^3$$

Its cubic coefficient is close to the Taylor-matched value, but it has been adjusted slightly to make the approximation better.
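The effect of the cubic term can be checked numerically. The sketch below compares the tanh expression with the exact CDF computed from `math.erf` on a grid over [-6, 6], once with the coefficient 0.044715 and once with no cubic term. It also evaluates 2/(3π) - 1/6, the coefficient obtained by matching the cubic Taylor terms of the two functions. The grid and the helper names are choices made for this example.

```python
import math

def exact_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tanh_cdf(x, c=0.044715):
    """tanh-based approximation; c is the cubic coefficient."""
    a = math.sqrt(2.0 / math.pi)
    return 0.5 * (1.0 + math.tanh(a * (x + c * x ** 3)))

grid = [i / 100.0 for i in range(-600, 601)]
err_cubic = max(abs(tanh_cdf(x) - exact_cdf(x)) for x in grid)
err_linear = max(abs(tanh_cdf(x, c=0.0) - exact_cdf(x)) for x in grid)
c_taylor = 2.0 / (3.0 * math.pi) - 1.0 / 6.0  # Taylor-matched cubic coefficient
print(err_cubic, err_linear, c_taylor)
```

On this grid the maximum error with the cubic term should come out well below the error without it, which is the practical justification for the extra term.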

tanh activation function

Tanh appeared later than Sigmoid. The sigmoid function has the disadvantage that its output is not centered on 0, which slows convergence, and Tanh addresses this problem. Tanh is the hyperbolic tangent function, equal to the hyperbolic sine divided by the hyperbolic cosine. Its expression is given below. The function is odd.
$$\tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
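The formula can be evaluated directly and compared with the library function. This is only a sketch: `tanh_from_exp` is a hypothetical name, and the direct formula overflows in `math.exp` for very large |x|, so `math.tanh` remains the better choice in practice.

```python
import math

def tanh_from_exp(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x), written out literally."""
    ex, enx = math.exp(x), math.exp(-x)
    return (ex - enx) / (ex + enx)

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, tanh_from_exp(x), math.tanh(x))
```

The printed values for x and -x differ only in sign, which reflects that the function is odd.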

Cumulative Distribution Function of the Standard Normal Distribution

The probability density function of the standard normal distribution:
$$\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^{2}}{2}}$$

Cumulative distribution function of the standard normal distribution:
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^{2}}{2}} \, dt$$
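The relationship between the two functions can be verified numerically. The sketch below integrates the density with the trapezoidal rule and compares the result with the closed form based on `math.erf`. The lower limit of -10 and the step count are arbitrary choices for this example, and the helper names are hypothetical.

```python
import math

def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf_exact(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_trapezoid(x, lower=-10.0, steps=20000):
    """Integrate the density from `lower` to x with the trapezoidal rule."""
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h

for x in (-1.0, 0.0, 1.96):
    print(x, cdf_trapezoid(x), cdf_exact(x))
```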
[Figure: density function of the standard normal distribution]

[Figure: CDF of the standard normal distribution]


Origin blog.csdn.net/weixin_43845922/article/details/129984992