[Study notes] Gibbs Sampling

Gibbs Sampling

Intro

I recently came across Gibbs sampling while reading papers on probabilistic graphical models. The method works roughly as follows: it is an iterative sampler that starts from a random sample and then, using the current sample as the condition, draws from a conditional distribution one dimension at a time. Once every dimension has been resampled, the next iteration begins.

Random Sampling

Suppose we are given the probability density function of a random variable. How do we draw samples that follow that distribution?

When I studied matrix theory, the teacher taught us to use the inverse function to generate random numbers from any probability distribution, so we can use the same inverse-function method to draw samples from a distribution. That is, if \(\xi\) is a random variable uniformly distributed on the interval \([0, 1]\), then \(\mathrm{cdf}^{-1}(\xi)\), where \(\mathrm{cdf}\) is the cumulative distribution function of \(p(x)\), follows the distribution with probability density function \(p(x)\).
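
As a minimal sketch of the inverse-function method, the example below samples from an exponential distribution, whose cumulative distribution function \(F(x) = 1 - e^{-\lambda x}\) can be inverted by hand. The choice of distribution, the rate 2.0, and the function name `sample_exponential` are assumptions made only for illustration.

```python
import math
import random

def sample_exponential(rate, rng):
    """Draw one sample from Exp(rate) by inverting F(x) = 1 - exp(-rate * x)."""
    xi = rng.random()                   # xi ~ Uniform[0, 1)
    return -math.log(1.0 - xi) / rate   # F^{-1}(xi); 1 - xi is never 0

rng = random.Random(0)                  # fixed seed so the run is repeatable
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(f"sample mean = {mean:.3f} (theoretical mean 1/rate = 0.5)")
```

The sample mean should land close to the theoretical mean \(1/\lambda\), which is a quick sanity check that the transformed uniform numbers follow the intended distribution.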

The problem is this: when \(p(x)\) is so complex that the inverse of its cumulative distribution function cannot be computed, or when the exact value of \(p(x)\) is not known, how do we sample from it?

In that case we need other sampling strategies, such as rejection sampling, importance sampling, and Gibbs sampling. Below I go through these strategies in turn.

Rejection Sampling

Rejection sampling assumes a known proposal distribution \(q\) (usually a simple distribution) and the original distribution \(p\), together with a constant \(k\) chosen so that \(kq(x) \ge p(x)\) for every \(x\). Draw a sample \(\hat{x}\) from the proposal distribution, then compute the acceptance rate \(a(\hat{x}) = \frac{p(\hat{x})}{kq(\hat{x})}\). Next draw a value \(z\) from the uniform distribution on \([0, 1]\): if \(z \le a(\hat{x})\), accept the sample; otherwise reject it and keep sampling until enough samples have been collected.

The figure needs a word of explanation: the blue line on top is the scaled proposal distribution \(kq(x)\), which must lie above the original distribution everywhere, and the acceptance rate is evaluated at \(z_0\).

However, rejection sampling requires the proposal distribution to be close to the original distribution for the acceptance rate to be reasonably high. Otherwise the method is inefficient, so it is often impractical. Importance sampling is likewise a relatively inefficient method (omitted here).
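
The accept/reject step described in this section can be sketched as follows. The target density (Beta(2, 2), i.e. \(p(x) = 6x(1-x)\) on \([0, 1]\)), the uniform proposal, and the constant `K = 1.5` are assumptions picked so that \(kq(x) \ge p(x)\) is easy to verify by hand; they do not come from the original notes.

```python
import random

def p(x):
    """Target density: Beta(2, 2) on [0, 1] (illustrative choice)."""
    return 6.0 * x * (1.0 - x)

def q(x):
    """Proposal density: Uniform[0, 1]."""
    return 1.0

K = 1.5  # p(x) <= K * q(x) everywhere, because max p = p(0.5) = 1.5

def rejection_sample(n, rng):
    """Collect n accepted samples; also return how many proposals were needed."""
    accepted, proposed = [], 0
    while len(accepted) < n:
        x_hat = rng.random()               # draw x_hat from the proposal q
        proposed += 1
        a = p(x_hat) / (K * q(x_hat))      # acceptance rate a(x_hat)
        z = rng.random()                   # z ~ Uniform[0, 1)
        if z <= a:
            accepted.append(x_hat)
    return accepted, proposed

rng = random.Random(0)
samples, proposed = rejection_sample(50_000, rng)
mean = sum(samples) / len(samples)
accept_rate = len(samples) / proposed
print(f"mean = {mean:.3f}, fraction accepted = {accept_rate:.3f}")
```

For normalized densities the expected fraction of accepted proposals is \(1/k\), here about two thirds. A proposal that fits the target poorly forces a larger \(k\), which is exactly why the method becomes inefficient.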

MCMC

MCMC stands for Markov Chain Monte Carlo; it is a family of methods for sampling high-dimensional variables.

The core idea of MCMC is to treat the sampling process as a Markov chain: the \((t+1)\)-th sample depends only on the \(t\)-th sample \(x_t\) and a state transition distribution \(q(x \mid x_t)\). By the convergence properties of well-behaved Markov chains, after enough transitions the distribution of the state converges to a fixed (stationary) distribution. If the chain is built so that this stationary distribution is \(p(x)\), then samples drawn once the chain has reached the steady state follow \(p(x)\).
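
The convergence claim can be checked on a toy example. The sketch below uses a hypothetical two-state chain (the transition matrix `T` is an assumption chosen for illustration) and pushes two different starting distributions through it; both end up at the same stationary distribution.

```python
# T[i][j] = probability of moving from state i to state j (illustrative values)
T = [[0.9, 0.1],
     [0.5, 0.5]]

def step(dist, T):
    """One transition: new_dist[j] = sum_i dist[i] * T[i][j]."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]

def run(dist, T, steps):
    """Apply the transition `steps` times to a starting distribution."""
    for _ in range(steps):
        dist = step(dist, T)
    return dist

from_state_0 = run([1.0, 0.0], T, 50)   # chain started in state 0
from_state_1 = run([0.0, 1.0], T, 50)   # chain started in state 1
print(from_state_0, from_state_1)
```

Solving \(\pi T = \pi\) by hand for this matrix gives \(\pi = (5/6, 1/6)\), and both runs approach it regardless of the starting state. MCMC relies on the same behavior, with the transition rule designed so that the stationary distribution is the target \(p(x)\).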

The commonly used MCMC methods are the Metropolis-Hastings algorithm and the Gibbs sampling algorithm. To get to Gibbs sampling quickly, the former is omitted here.

Gibbs Sampling

Suppose we have a random vector \(X = (x_1, x_2, \ldots, x_d)\), where \(d\) is its dimension. Each dimension is a random variable and, as is usually the case, the dimensions are not independent of one another. If we know the probability distribution of this random vector, how do we sample from it?

Sampling directly from the joint distribution of a multivariate distribution is clearly difficult, and Gibbs sampling is a simple and effective alternative. The steps of Gibbs sampling are outlined below:

Starting from a randomly initialized state \(x^{(0)} = [x_1^{(0)}, x_2^{(0)}, x_3^{(0)}, \cdots, x_d^{(0)}]\), sample each dimension in turn, always conditioning on the most recent value of every other dimension. The sampling sequence is as follows:
\[
\begin{aligned}
x_1^{(1)} &\sim p(x_1 \mid x_2^{(0)}, x_3^{(0)}, \cdots, x_d^{(0)}) \\
x_2^{(1)} &\sim p(x_2 \mid x_1^{(1)}, x_3^{(0)}, \cdots, x_d^{(0)}) \\
&\;\;\vdots \\
x_d^{(1)} &\sim p(x_d \mid x_1^{(1)}, x_2^{(1)}, \cdots, x_{d-1}^{(1)}) \\
&\;\;\vdots \\
x_1^{(t)} &\sim p(x_1 \mid x_2^{(t-1)}, x_3^{(t-1)}, \cdots, x_d^{(t-1)}) \\
&\;\;\vdots \\
x_d^{(t)} &\sim p(x_d \mid x_1^{(t)}, x_2^{(t)}, \cdots, x_{d-1}^{(t)})
\end{aligned}
\]
Following the sampling steps above, we eventually obtain samples from the required high-dimensional distribution. Note that the samples drawn in the first iterations do not yet follow the required distribution, because the starting sample comes from an initial distribution (generally a uniform distribution) rather than from the target. Gibbs sampling is a step-by-step iterative process, which reminds me of the EM algorithm: both iterate step by step towards the final result.
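
The sweep over dimensions can be made concrete with a case where the conditionals are known in closed form. For a standard bivariate normal with correlation \(\rho\), the conditionals are \(x_1 \mid x_2 \sim \mathcal{N}(\rho x_2, 1-\rho^2)\) and \(x_2 \mid x_1 \sim \mathcal{N}(\rho x_1, 1-\rho^2)\). The sketch below is illustrative only: the target distribution, \(\rho = 0.8\), the burn-in length, and the function names are all assumptions.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    cond_sd = math.sqrt(1.0 - rho * rho)          # std of each conditional
    x1, x2 = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)  # arbitrary start
    samples = []
    for t in range(burn_in + n_samples):
        x1 = rng.gauss(rho * x2, cond_sd)         # x1 ~ p(x1 | x2), current x2
        x2 = rng.gauss(rho * x1, cond_sd)         # x2 ~ p(x2 | x1), freshly drawn x1
        if t >= burn_in:                          # discard the early iterations
            samples.append((x1, x2))
    return samples

def correlation(samples):
    """Empirical correlation coefficient of a list of (x1, x2) pairs."""
    n = len(samples)
    m1 = sum(a for a, _ in samples) / n
    m2 = sum(b for _, b in samples) / n
    cov = sum((a - m1) * (b - m2) for a, b in samples) / n
    v1 = sum((a - m1) ** 2 for a, _ in samples) / n
    v2 = sum((b - m2) ** 2 for _, b in samples) / n
    return cov / math.sqrt(v1 * v2)

rng = random.Random(0)
samples = gibbs_bivariate_normal(rho=0.8, n_samples=50_000, burn_in=1_000, rng=rng)
print(f"estimated correlation = {correlation(samples):.3f} (target: 0.8)")
```

Each update conditions on the newest value of the other coordinate, and the first `burn_in` iterations are thrown away because they still reflect the uniform starting point rather than the target.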

I found a picture online that illustrates this process:

As shown above, the right side is the distribution we want and the left side is the iterative process. The first sampling points, 0 and 1, are drawn from a uniform distribution, but the further the iteration goes, the better the sampled points match the distribution on the right, which shows that the Gibbs sampling procedure works.

The figure below shows much the same thing:

Coding

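I first learned about Gibbs sampling from a paper on image synthesis. The paper is based on an MRF and uses a neural network to fit the conditional distribution \(p(x_i|x_{-i})\), where \(x_{-i}\) denotes all attributes other than the \(i\)-th.

For images, \(x_i\) is the pixel value at the \(i\)-th position and \(x_{-i}\) describes all the other pixels, so the distribution above is a conditional distribution.

A neural network can fit this distribution, but how to then generate images is a separate question.

The solution given in the paper is Gibbs sampling: start from random noise and generate pixel by pixel. The first iteration produces one image; the second, third, and later iterations each start from the image produced by the previous iteration. Once the number of iterations is large enough, we consider the chain to have reached its stationary distribution, and the generated image then follows that distribution.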

For the original paper, see:

Link to the original paper

Concretely, here is my code:

import torch
import torch.nn.functional as F
from torch import nn, optim
from torch.utils import data
from torchvision import datasets, transforms, utils
from tqdm import tqdm


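class MaskedConv2d(nn.Conv2d):
    '''
    Convolution whose kernel only sees pixels that come earlier in raster order.
    mask_type A or B
    A : the center is zero
    B : the center is not zero
    '''
    def __init__(self,mask_type,*args,**kwargs):
        super(MaskedConv2d,self).__init__(*args,**kwargs)
        assert mask_type in ["A","B"]
        self.mask_type = mask_type
        self.register_buffer('mask', self.weight.data.clone())
        _,_,h,w = self.weight.size()
        self.mask.fill_(1)
        # center row: zero the center and everything to its right (A),
        # or only what lies to the right of the center (B)
        self.mask[:,:,h//2,w//2 + (mask_type == 'B'):] = 0
        # zero every row below the center
        self.mask[:,:,h//2+1:,:] = 0

    def forward(self,x):
        # re-apply the mask so the masked weights stay zero after each optimizer step
        self.weight.data *= self.mask
        return super(MaskedConv2d,self).forward(x)

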
class DoublePixelCNN(nn.Module):
    '''
    Two masked stacks: net1 sees the pixels before the current one in raster
    order, net2 sees the pixels after it (by running on the flipped image).
    Their features are concatenated to predict p(x_i | x_{-i}) over 256 levels.
    '''
    def __init__(self,fm,kernel_size = 7,padding = 3):
        super(DoublePixelCNN, self).__init__()
        self.net1 = self._make_stack(fm, kernel_size, padding)
        self.net2 = self._make_stack(fm, kernel_size, padding)
        self.conv1x1 = nn.Conv2d(fm*2, 256, 1)

    @staticmethod
    def _make_stack(fm, kernel_size, padding):
        # one type-A layer (17x17, padding 8) followed by five type-B layers
        layers = [MaskedConv2d('A', 1, 64, 17, 1, 8, bias=False), nn.BatchNorm2d(64), nn.ReLU(True)]
        in_channels = 64
        for _ in range(5):
            layers += [MaskedConv2d('B', in_channels, fm, kernel_size, 1, padding, bias=False),
                       nn.BatchNorm2d(fm), nn.ReLU(True)]
            in_channels = fm
        return nn.Sequential(*layers)

    def forward(self,x):
        x1 = self.net1(x)
        # flip height and width, run the second stack, then flip the features back
        x2 = self.net2(x.flip(dims = [-1,-2]))
        x = torch.cat([x1,x2.flip(dims = [-1,-2])],dim = 1)
        x = self.conv1x1(x)  # (b, 256, h, w) logits
        return x

if __name__ == "__main__":
    # Side length of the sampled images. k is not defined in the original listing;
    # 28 (the MNIST image size) is an assumption, and a smaller value samples faster.
    k = 28
    tr = data.DataLoader(datasets.MNIST(root="/media/xueaoru/Ubuntu/dataset/data", transform=transforms.ToTensor()),
                         batch_size=64, shuffle=True, num_workers=12, pin_memory=True)
    net = DoublePixelCNN(128)
    net.cuda()
    sample = torch.rand(64, 1, k, k).cuda()  # the Gibbs chain starts from uniform noise
    optimizer = optim.Adam(net.parameters(), lr=0.0001)
    for epoch in range(1000):
        net.train()
        running_loss = 0.
        for images, _ in tqdm(tr):
            images = images.cuda()
            # grey level of each pixel as a class index in [0, 255]; drop the channel dim -> (b, h, w)
            target = (images[:, 0] * 255).round().long()
            # net(images) is (b, 256, h, w): 256-way cross-entropy for every pixel
            loss = F.cross_entropy(net(images), target)
            running_loss += loss.item()

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print("training loss: {:.8f}".format(running_loss / len(tr)))
        if epoch % 5 == 0:
            torch.save(net.state_dict(), "./{}.pth".format(epoch))
            net.eval()
            with torch.no_grad():
                for t in tqdm(range(300)):  # 300 Gibbs sweeps over the whole image
                    for i in range(k):
                        for j in range(k):
                            out = net(sample)  # (b, 256, k, k)
                            probs = F.softmax(out[:, :, i, j], dim=1)  # (b, 256)
                            # resample pixel (i, j) given the current values of all other pixels
                            sample[:, :, i, j] = torch.multinomial(probs, 1).float() / 255.

                utils.save_image(sample, 'sample_{:02d}.png'.format(epoch), nrow=12, padding=0)
                sample = torch.rand(64, 1, k, k).cuda()  # restart the chain for the next checkpoint

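Because sampling with this method is extremely slow, the images I generate are fairly small and the training schedule is short; this is only meant as a demo.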
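
To make the two mask types in `MaskedConv2d` easier to picture, here is a small sketch that rebuilds the same mask for a single kernel slice with plain Python lists. The helper name `build_mask` and the 3x3 kernel size are assumptions for illustration; the two loops mirror the two slice assignments in the class above.

```python
def build_mask(h, w, mask_type):
    """Return an h x w mask (list of lists) built like the one in MaskedConv2d."""
    assert mask_type in ("A", "B")
    mask = [[1] * w for _ in range(h)]
    # center row: zero from the center (A) or from just right of it (B)
    start = w // 2 + (1 if mask_type == "B" else 0)
    for col in range(start, w):
        mask[h // 2][col] = 0
    # every row below the center is zeroed completely
    for row in range(h // 2 + 1, h):
        for col in range(w):
            mask[row][col] = 0
    return mask

mask_a = build_mask(3, 3, "A")
mask_b = build_mask(3, 3, "B")
for name, mask in (("A", mask_a), ("B", mask_b)):
    print("type", name)
    for row in mask:
        print(" ".join(str(v) for v in row))
```

Type A hides the center pixel, so the first layer never sees the value it is supposed to predict; type B keeps the center, which is safe in later layers because their inputs are already features that exclude that pixel.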


Origin www.cnblogs.com/aoru45/p/12092453.html