O(1) 的离散概率分布采样方法 - Alias Method

前言

如果你对这篇文章感兴趣，可以点击「【访客必读 - 指引页】一文囊括主页内所有高质量博客」，查看完整博客分类与对应链接。

Alias Method

给定一个离散概率分布 $\ 0.2\ 0.4\ 0.1]$ ，现要对该分布进行采样，最直接的方法是随机一个 $[0, 1]$ 之间的数 $v$ ，从左往右比：

若 $v\leq 0.3$ ，则采样类别 $A$ ；
若 $v > 0.3$ ，则和 $0.5 = 0.3 + 0.2$ 对比，若比 $0.5$ 小，则采样类别 $B$ ；
否则继续往后，和 $0.9 = 0.3 + 0.2 + 0.4$ 对比。

上述过程的时间复杂度为 $O (n)$ ，可通过二分算法加速至 $O(\log n)$ 。那么还有没有更快的方式？

考虑一种「接受 & 拒绝」的方式，对于上述离散概率分布 $p$ ，先在 $[1, 4]$ 中随机一个整数 $i$ ，再在 $[0, 1]$ 中随机一个实数 $r$ ，若 $r\leq p[i]$ ，则接受 $i$ （即采样 $i$ 对应类别），否则拒绝 $i$ ，重新执行上述过程。

上述过程显然较慢，而 Alias Method 便是对其进行加速的方法。Alias Method 的思想是，用概率高的填充概率低的，使得随机的 $i$ 一定可以对应一个结果，具体如下：

将 $p * n$ 得到 $p'=p*4=[1.2\ 0.8\ 1.6\ 0.4 ]$ ，如下图所示：
随后将概率高的类别填充至概率低的类别，实现每个位置只存在两个类别，且和为 1，即得到 Alias Table：

可以发现经过转换后，每一列最多只有两个类别，且和为 1。因此重新执行之前「接受 & 拒绝」的方式，先在 $[1, 4]$ 中随机一个整数 $i$ ，再在 $[0, 1]$ 中随机一个实数 $r$ ，例如 $i = 2$ ，则若 $r\leq 0.8$ ，则采样类别 $B$ ，否则采样类别 $A$ 。

通过上述的构造，Alias Method 只需执行一遍，实现 $O (1)$ 的时间复杂度。具体细节可参照如下代码实现：

import numpy as np


class aliasMethod:
    def __init__(self, prob):
        self.n = len(prob)
        self.prob = prob
        assert np.sum(prob) == 1.0
        self.accept, self.alias = self.create_table()

    def create_table(self):
        area = self.prob * self.n
        small, large = [], []
        accept, alias = [0] * self.n, [0] * self.n

        for idx, value in enumerate(area):
            if value < 1.0:
                small.append(idx)
            else:
                large.append(idx)

        while small and large:
            sid, lid = small.pop(), large.pop()
            accept[sid] = area[sid]
            alias[sid] = lid
            area[lid] = area[lid] + area[sid] - 1

            if area[lid] < 1.0:
                small.append(lid)
            elif area[lid] > 1.0:
                large.append(lid)

        for lid in large:
            accept[lid] = 1
        for sid in small:
            accept[sid] = 1

        return accept, alias

    def sample(self):
        i = int(np.random.random() * self.n)
        r = np.random.random()
        if r < self.accept[i]:
            return i
        else:
            return self.alias[i]


if __name__ == "__main__":
    n = 10
    prob = np.random.rand(n)
    prob /= np.sum(prob)

    alias_table = aliasMethod(prob)
    sample_cnt = 10000
    res = np.zeros(n)
    
    for i in range(sample_cnt):
        res[alias_table.sample()] += 1
    res /= np.sum(res)

    print(f"prob: {
      
      list(prob)}")
    print(f"simulation: {
      
      list(res)}")

O(1) 的离散概率分布采样方法 - Alias Method

前言

Alias Method

参考资料

猜你喜欢