[Bayesian Analysis ①] Understanding and a Simple Implementation of the Metropolis-Hastings Algorithm

The goal of Bayesian analysis is to compute the posterior distribution from a prior distribution and observed data. A computer cannot store a continuous distribution exactly, so in general the posterior cannot be obtained analytically; instead it is represented by discrete samples drawn from it, which raises the problem of how to sample from the (posterior) distribution. Computing the posterior over the parameters is known as the inference problem, and methods for solving it fall mainly into non-Markovian methods (e.g., grid approximation, quadratic approximation, variational methods) and Markovian methods (e.g., Metropolis-Hastings, Hamiltonian Monte Carlo / No-U-Turn Sampler).

Grid approximation is a brute-force method: the parameter space is discretized on a grid and the posterior is evaluated at every grid point. Its drawback is very low sampling efficiency, especially when the parameter space is high-dimensional. Modern Bayesian analysis mostly relies on Markov Chain Monte Carlo (MCMC) methods, whose advantage is much higher sampling efficiency: MCMC methods outperform the grid approximation because they are designed to spend more time in higher-probability regions than in lower ones; in fact, an MCMC sampler visits different regions of the parameter space in accordance with their relative probabilities. The most classic MCMC method is Metropolis-Hastings. Its principle is to explore the target distribution with the help of a distribution that is easy to sample from (e.g., a Gaussian): at each step, propose a candidate from the proposal distribution, compute the acceptance ratio of the target densities at the candidate and the current point, and accept or reject the candidate accordingly.
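For contrast, the grid approximation mentioned above can be sketched in a few lines. This is a minimal sketch, not from the original post: the data (6 heads in 9 coin tosses), the flat prior, and the grid size of 200 are all arbitrary choices for illustration.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical data: 6 heads out of 9 tosses; infer the coin's bias theta.
grid = np.linspace(0, 1, 200)                  # discretize the parameter space
prior = np.ones_like(grid)                     # flat prior over theta
likelihood = stats.binom.pmf(6, n=9, p=grid)   # likelihood of the data at each grid point
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()  # normalize so the grid values sum to 1
```

Note that every grid point is evaluated, including the many where the posterior is nearly zero; with `d` parameters the grid has exponentially many points, which is exactly the inefficiency MCMC avoids.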

A simple example in Python:

# -*- coding: utf-8 -*-
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

def metropolis(func, steps=100000):
    """A very simple random-walk Metropolis implementation."""
    samples = np.zeros(steps)
    old_x = func.mean()          # start the chain at the target's mean
    old_prob = func.pdf(old_x)

    for i in range(steps):
        # Propose a candidate via a symmetric Gaussian random walk.
        new_x = old_x + np.random.normal(0, 1)
        new_prob = func.pdf(new_x)
        # With a symmetric proposal the Hastings correction cancels,
        # so the acceptance ratio is just the ratio of target densities.
        acceptance = new_prob / old_prob
        if acceptance >= np.random.random():
            samples[i] = new_x   # accept: move to the candidate
            old_x = new_x
            old_prob = new_prob
        else:
            samples[i] = old_x   # reject: repeat the current state
    return samples

np.random.seed(345)
func = stats.beta(0.4, 2)
samples = metropolis(func=func)
x = np.linspace(0.001, .999, 1000)
y = func.pdf(x)
plt.xlim(0, 1)
plt.plot(x, y, 'r-', lw=3, label='True distribution')
plt.hist(samples, bins=100, density=True, label='Estimated distribution')
plt.xlabel('$x$', fontsize=14)
plt.ylabel('$pdf(x)$', fontsize=14)
plt.legend(fontsize=14)
plt.show()

Output: a plot of the sample histogram against the true Beta(0.4, 2) density.
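Besides the plot, a quick numerical sanity check is to compare the mean of the chain with the known mean of the target. This is a sketch, not part of the original post; the burn-in length of 1000 is an arbitrary choice.

```python
import numpy as np
import scipy.stats as stats

def metropolis(func, steps=100000):
    """Random-walk Metropolis, as in the listing above."""
    samples = np.zeros(steps)
    old_x = func.mean()
    old_prob = func.pdf(old_x)
    for i in range(steps):
        new_x = old_x + np.random.normal(0, 1)
        new_prob = func.pdf(new_x)
        if new_prob / old_prob >= np.random.random():
            old_x, old_prob = new_x, new_prob
        samples[i] = old_x
    return samples

np.random.seed(345)
func = stats.beta(0.4, 2)
samples = metropolis(func)
burned = samples[1000:]   # discard burn-in before summarizing
print(burned.mean(), func.mean())
```

If the chain has mixed, the two numbers should agree to a couple of decimal places; large disagreement usually means the proposal scale or burn-in needs tuning.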



Reprinted from blog.csdn.net/qq_32464407/article/details/81291963