[Summary] Random Sampling and Reservoir Sampling Problems

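Many real-world problems require random sampling, and it is also a common question type in interviews for algorithm engineers, so I am recording it here. The example problems below walk through the main techniques for solving random sampling problems.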

[leetcode]470.Implement Rand10() Using Rand7()

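An API Rand7() is provided that generates a uniformly random integer from 1 to 7. Use Rand7 to implement Rand10, which generates a uniformly random integer from 1 to 10. This can be done with rejection sampling: combine two Rand7 calls into a uniform "Rand49" (values 1 to 49); if the value falls in 41-49, discard it and generate again; once a value in 1-40 is generated, map it to 1-10 with (val - 1) % 10 + 1. Every accepted value in 1-40 is equally likely and each of the ten outputs corresponds to exactly four of them, so the result is exactly uniform, not merely approximately uniform.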

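class Solution:
    def rand10(self):
        """
        :rtype: int
        """
        # rand7() is the judge-provided API returning a uniform integer in 1..7
        while True:
            # (rand7() - 1) * 7 is uniform over {0, 7, ..., 42}; adding rand7() gives a uniform 1..49
            val = (rand7() - 1) * 7 + rand7()  # subtract 1 so the first draw starts from 0
            if val <= 40:  # reject 41..49 and resample
                break
        return (val - 1) % 10 + 1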
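To check the claim that the result is exactly uniform, the sketch below enumerates all 49 equally likely pairs of Rand7 outcomes and counts where the accepted ones land. Because the judge's rand7() is not available outside LeetCode, a stand-in built on random.randint is assumed here; accepted_outcome_counts is a hypothetical helper for illustration only.

```python
import random
from collections import Counter


def rand7() -> int:
    # Stand-in for the judge-provided API (an assumption): uniform integer in 1..7
    return random.randint(1, 7)


def rand10() -> int:
    # Rejection sampling: build a uniform value in 1..49, keep only 1..40, fold to 1..10
    while True:
        val = (rand7() - 1) * 7 + rand7()
        if val <= 40:
            return (val - 1) % 10 + 1


def accepted_outcome_counts() -> Counter:
    # Enumerate all 49 equally likely (a, b) pairs and count where each accepted pair lands
    counts = Counter()
    for a in range(1, 8):
        for b in range(1, 8):
            val = (a - 1) * 7 + b
            if val <= 40:
                counts[(val - 1) % 10 + 1] += 1
    return counts
```

accepted_outcome_counts() maps each of 1..10 to exactly 4 of the 49 pairs, so every output has probability 4/40 = 1/10. Each round is accepted with probability 40/49, so one rand10() call costs on average 2 × 49/40 = 2.45 calls to rand7().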

[leetcode]478.Generate Random Point in a Circle

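A uniformly random point in the circle means that any two small regions of equal area are equally likely to receive the point; note that the requirement is equal probability per unit area. One simple approach is rejection sampling: draw a point uniformly in the bounding square and retry if it falls outside the circle. The code below instead samples in polar coordinates. If both the radius and the angle were drawn uniformly, every radius would be equally likely to receive a point, but the circumference of a ring grows with its radius, so points would be dense near the center and sparse near the edge. To correct this, the distance from the center must satisfy \(P(\text{dist} \le t) = (t/R)^2\), which is achieved by taking \(R\sqrt{U}\) with \(U\) uniform on [0, 1).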

import math
import random
from typing import List


class Solution:
    def __init__(self, radius: float, x_center: float, y_center: float):
        self.r = radius
        self.x = x_center
        self.y = y_center

    def randPoint(self) -> List[float]:
        # sqrt compensates for ring circumference growing with r: P(dist <= t) = (t / R)^2
        nr = math.sqrt(random.random()) * self.r
        alpha = random.random() * 2 * math.pi  # uniform angle in [0, 2*pi)
        newx = self.x + nr * math.cos(alpha)
        newy = self.y + nr * math.sin(alpha)
        return [newx, newy]
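For comparison, here is a minimal sketch of the rejection-sampling approach: draw uniformly in the bounding square and retry until the point lands inside the circle. RejectionCircleSampler is a hypothetical name, not part of the LeetCode interface.

```python
import random
from typing import List


class RejectionCircleSampler:
    # Hypothetical variant: rejection sampling from the bounding square of the circle

    def __init__(self, radius: float, x_center: float, y_center: float):
        self.r = radius
        self.x = x_center
        self.y = y_center

    def randPoint(self) -> List[float]:
        while True:
            # uniform offsets in [-r, r] x [-r, r]
            dx = random.uniform(-self.r, self.r)
            dy = random.uniform(-self.r, self.r)
            if dx * dx + dy * dy <= self.r * self.r:  # keep only points inside the circle
                return [self.x + dx, self.y + dy]
```

A draw is accepted with probability π/4 ≈ 0.785 (circle area over square area), so each point costs about 4/π ≈ 1.27 rounds on average. It avoids sqrt, cos and sin, while the polar version always finishes in one round.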

[leetcode]519.Random Flip Matrix (Fisher-Yates shuffle, with a dict as a sparse swap map)

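Given the number of rows and columns of an all-zero matrix, implement flip, which turns a randomly chosen 0 into 1 and returns its index, and reset, which sets every entry back to 0.
Use the Fisher-Yates shuffle, an algorithm for randomly permuting a sequence. Its steps are: pick a random index m in [0, n] (inclusive), swap the elements at indices m and n, then decrement n by 1; repeat until n reaches 0. The idea is that each sampled element is swapped to the end of the current pool and the pool then shrinks by one, so an element that has already been drawn can never be drawn again; sampling and swapping continue until the pool is empty.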
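The steps just described can be sketched on an ordinary list as follows; fisher_yates_shuffle is an illustrative name, and the list is shuffled in place.

```python
import random
from typing import List


def fisher_yates_shuffle(a: List[int]) -> List[int]:
    # Shuffle in place: repeatedly swap a random pool element to the end of the pool
    n = len(a) - 1  # index of the last element still in the pool
    while n > 0:
        m = random.randint(0, n)  # uniform index in [0, n], inclusive
        a[m], a[n] = a[n], a[m]   # move the sampled element to the end of the pool
        n -= 1                    # shrink the pool by one
    return a
```

Each of the n! permutations comes out with probability 1/n!. The Random Flip Matrix solution runs the same loop lazily, one step per flip() call.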

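import random


class Solution(object):
    def __init__(self, n_rows, n_cols):
        self.n_rows, self.n_cols = n_rows, n_cols
        self.reset()

    def flip(self):
        self.n -= 1                                 # self.n is now the last index of the remaining pool
        i = random.randrange(0, self.n + 1)         # random slot in [0, self.n]
        index = self.dic.get(i, i)                  # value held by slot i (itself if never swapped)
        self.dic[i] = self.dic.get(self.n, self.n)  # swap: move the pool's last value into slot i
        return [index // self.n_cols, index % self.n_cols]  # flat index -> [row, col]

    def reset(self):
        self.n = self.n_rows * self.n_cols          # size of the pool of zero cells
        self.dic = {}                               # sparse record of swapped slots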

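We cannot simulate this process with an explicit array here, because the matrix can be too large to fit in memory. Instead, a dict records only the slots whose contents have changed: dic[i] = v means that slot i of the virtual array now holds v, and a slot missing from the dict still holds its own index. Each flip draws a random slot i. If i is not in the dict, the slot has never been touched, so its value is i itself; if i is in the dict, the slot was involved in an earlier swap, and the value stored in the dict is returned instead. Either way, slot i is then overwritten with the value at the current end of the pool (again looked up through the dict), which is exactly the swap step of Fisher-Yates.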
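The same dict-backed trick works as a general "k distinct integers from range(n)" sampler whose memory grows with the number of draws rather than with n. The sketch below is an illustration under that framing; sample_without_replacement is a hypothetical name, and the extra pop of the slot leaving the pool is a design choice not present in the solution above (it keeps the dict from holding dead entries).

```python
import random
from typing import Dict, List


def sample_without_replacement(n: int, k: int) -> List[int]:
    # Sparse Fisher-Yates: k distinct integers from range(n) without building an n-element array
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    moved: Dict[int, int] = {}  # slot -> value now stored there (only overwritten slots)
    out: List[int] = []
    last = n
    for _ in range(k):
        last -= 1                         # index of the last slot still in the pool
        i = random.randint(0, last)       # random slot in [0, last]
        out.append(moved.get(i, i))       # value currently held by slot i
        moved[i] = moved.get(last, last)  # swap: slot i now holds the pool's last value
        moved.pop(last, None)             # slot `last` leaves the pool, so its entry is dead
    return out
```

In everyday Python code, random.sample(range(n), k) already draws k distinct values from a large range without materializing it; the explicit version is mainly useful for seeing how flip() works.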

[leetcode]528.Random Pick with Weight

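Convert the probability distribution into a cumulative distribution, then draw a random number and locate it with binary search.
For example, for input [1,2,3,4] the total weight is 10, the probability distribution is [1/10, 2/10, 3/10, 4/10], and the cumulative distribution is [1/10, 3/10, 6/10, 10/10]. If we draw a uniform random integer in [0, 10) and find which interval it falls into, we get the corresponding index.
For input [1,2,3,4], the prefix sums preSum are [1,3,6,10] (the code below stores them with a leading 0, as [0,1,3,6,10]). Draw a random s and find its interval: s = 0 gives index 0, s in {1, 2} gives index 1, s in {3, 4, 5} gives index 2, and s in {6, ..., 9} gives index 3, so index i is chosen with probability w[i]/10.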

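import random
from typing import List


class Solution:
    def __init__(self, w: List[int]):
        self.cursum = [0]  # prefix sums with a leading 0: [0, 1, 3, 6, 10] for [1, 2, 3, 4]
        for weight in w:
            self.cursum.append(self.cursum[-1] + weight)

    def pickIndex(self) -> int:
        r = random.randrange(0, self.cursum[-1])  # uniform integer in [0, total)
        # r = random.random() * self.cursum[-1]   # a float in [0, total) works the same way
        return self.gethi(self.cursum, r)

    def gethi(self, nums, target):
        # Return the last index hi with nums[hi] <= target. Since nums[0] == 0 and
        # target < nums[-1], hi is a valid index into w with nums[hi] <= target < nums[hi + 1].
        lo, hi = 0, len(nums) - 1
        while lo <= hi:
            mid = lo + ((hi - lo) >> 1)
            if target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        return hi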
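The hand-written binary search can also be replaced by the standard library. The sketch below uses itertools.accumulate for the prefix sums (without the leading 0) and bisect.bisect_right for the lookup; WeightedPicker and index_for are hypothetical names.

```python
import bisect
import itertools
import random
from typing import List


class WeightedPicker:
    # Hypothetical helper: the same prefix-sum + binary-search idea, using the standard library

    def __init__(self, w: List[int]):
        self.prefix = list(itertools.accumulate(w))  # [1, 3, 6, 10] for w = [1, 2, 3, 4]

    def index_for(self, r: int) -> int:
        # First i with prefix[i] > r, i.e. prefix[i - 1] <= r < prefix[i]
        return bisect.bisect_right(self.prefix, r)

    def pick_index(self) -> int:
        return self.index_for(random.randrange(self.prefix[-1]))  # r uniform in [0, total)
```

bisect_right (not bisect_left) is needed here: for r = 1 it must skip past prefix[0] = 1 and return index 1, matching the interval table above.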

[leetcode]382.Linked List Random Node

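When the data are too large to load into memory, or the values arrive from a stream of unknown length, we cannot count the elements first and then draw a random index, so we use reservoir sampling. Since head is guaranteed to exist, first set the result res to head's value, point cur at the next node, and initialize a counter i to 1 (the 0-based index of cur). While cur is not null, draw a random integer in [0, i]; if it is 0, which happens with probability 1/(i + 1), update res to cur's value. Then increment i and advance cur. This amounts to maintaining a reservoir of size 1: the (i + 1)-th node replaces the value in the reservoir with probability 1/(i + 1), which guarantees that every node is returned with the same probability.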
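The equal-probability claim can be checked directly. For a list of n nodes, node k (1-based) enters the reservoir with probability 1/k when it is reached (the head with probability 1), and it is returned only if none of the later nodes k+1, ..., n replaces it:

\[
P(\text{node } k \text{ is returned}) = \frac{1}{k}\prod_{j=k+1}^{n}\left(1-\frac{1}{j}\right) = \frac{1}{k}\cdot\frac{k}{k+1}\cdot\frac{k+1}{k+2}\cdots\frac{n-1}{n} = \frac{1}{n}
\]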

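import random


# ListNode is provided by LeetCode; a minimal definition is included so the code runs standalone.
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


class Solution:

    def __init__(self, head: ListNode):
        """
        @param head The linked list's head.
        Note that the head is guaranteed to be not null, so it contains at least one node.
        """
        self.head = head

    def getRandom(self) -> int:
        """
        Returns a random node's value.
        """
        res = self.head.val  # reservoir of size 1, filled with the first node

        i = 1                # 0-based index of cur, so cur is the (i + 1)-th node
        cur = self.head.next
        while cur:
            j = random.randint(0, i)  # uniform over the i + 1 values 0..i
            if j == 0:                # probability 1 / (i + 1): replace the reservoir
                res = cur.val
            i += 1
            cur = cur.next
        return res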
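The same idea generalizes to a reservoir of size k (usually called Algorithm R). The sketch below is an illustration rather than part of the original solution; reservoir_sample is a hypothetical name. With k = 1 it reduces to getRandom above: the item at 0-based index i is kept with probability 1/(i + 1).

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int) -> List[T]:
    # Keep a uniform random sample of k items from a stream of unknown length
    reservoir: List[T] = []
    for i, item in enumerate(stream):  # i is the 0-based index of the current item
        if i < k:
            reservoir.append(item)     # fill the reservoir with the first k items
        else:
            j = random.randint(0, i)   # uniform over the i + 1 values 0..i
            if j < k:                  # happens with probability k / (i + 1)
                reservoir[j] = item    # evict a uniformly chosen slot
    return reservoir
```

Only k items are ever held in memory, and the stream is read once, so the input can be a generator or a file iterator whose length is never known in advance. If the stream has fewer than k items, all of them are returned.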

[leetcode]384.Shuffle an Array

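Traverse the array from front to back. At the i-th element (1-based), choose one of positions 1 to i uniformly at random, each with probability 1/i, and swap the i-th element with the element at that position (choosing position i itself leaves it in place). After the last step, every element is equally likely to be at every position.
Proof (by induction):
Suppose that after step \(m\), a given element \(x\) is at position \(i\) (with \(i \le m\)) with probability \(1/m\). At step \(m+1\), position \(i\) is left untouched with probability \(m/(m+1)\), so \(x\) is still at position \(i\) with probability \(\frac{1}{m}\cdot\frac{m}{m+1} = \frac{1}{m+1}\). Likewise, the new element lands at each of the \(m+1\) positions with probability \(1/(m+1)\), and \(x\) moves to position \(m+1\) exactly when its current position is chosen, which has probability \(m\cdot\frac{1}{m}\cdot\frac{1}{m+1} = \frac{1}{m+1}\).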

import random
from typing import List


class Solution:

    def __init__(self, nums: List[int]):
        self.nums = nums
        self.shuffle_nums = nums[:]  # working copy; the original stays intact for reset()

    def reset(self) -> List[int]:
        """
        Resets the array to its original configuration and return it.
        """
        return self.nums

    def shuffle(self) -> List[int]:
        """
        Returns a random shuffling of the array.
        """
        # Reshuffling the previous result is fine: a uniform shuffle of any permutation is still uniform
        for i in range(len(self.shuffle_nums)):
            j = random.randint(0, i)  # uniform position among the first i + 1 slots
            self.shuffle_nums[i], self.shuffle_nums[j] = self.shuffle_nums[j], self.shuffle_nums[i]
        return self.shuffle_nums
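The induction argument can also be confirmed by brute force for small n. Step i (0-based) chooses j from i + 1 values, so there are exactly n! equally likely choice sequences; the sketch below runs every sequence once and checks that each of the n! permutations appears exactly once. forward_shuffle_with_choices and enumerate_outcomes are hypothetical helper names.

```python
import itertools
from collections import Counter
from typing import Sequence, Tuple


def forward_shuffle_with_choices(nums: Sequence[int], choices: Sequence[int]) -> Tuple[int, ...]:
    # Apply the front-to-back shuffle with a fixed choice j = choices[i] (0 <= j <= i) at each step i
    a = list(nums)
    for i, j in enumerate(choices):
        a[i], a[j] = a[j], a[i]
    return tuple(a)


def enumerate_outcomes(n: int) -> Counter:
    # Run every possible sequence of random choices once; each sequence is equally likely
    ranges = [range(i + 1) for i in range(n)]  # step i picks j uniformly from 0..i
    return Counter(forward_shuffle_with_choices(range(n), c) for c in itertools.product(*ranges))
```

For n = 4 the counter holds 24 distinct permutations, each with count 1, so every permutation has probability exactly 1/24.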

[leetcode]497.Random Point in Non-overlapping Rectangles

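Use each rectangle's number of integer points, (x2 - x1 + 1) * (y2 - y1 + 1), as its weight and pick a rectangle with probability proportional to it; then choose a uniform random integer x and y inside that rectangle and output the point. Every integer point is then returned with the same probability. Using the geometric area (x2 - x1) * (y2 - y1) instead would be wrong, because it undercounts boundary points and gives zero weight to rectangles that are a single row or column of points.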

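import random
from typing import List


class Solution:
    def __init__(self, rects: List[List[int]]):
        self.rects = rects
        self.weights = []
        for x1, y1, x2, y2 in self.rects:
            # weight = number of integer points, hence the +1 on each side
            self.weights.append((x2 - x1 + 1) * (y2 - y1 + 1))

    def pick(self) -> List[int]:
        # choose a rectangle with probability proportional to its point count
        x1, y1, x2, y2 = random.choices(
            self.rects, weights=self.weights)[0]
        res = [
            random.randrange(x1, x2 + 1),  # uniform integer x in [x1, x2]
            random.randrange(y1, y2 + 1)   # uniform integer y in [y1, y2]
        ]
        return res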
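An alternative, combining this problem with the prefix-sum lookup from 528: number all integer points consecutively across the rectangles, draw one integer r in [0, total), and decode it into a rectangle and an (x, y) offset. This also sidesteps a cost of the version above, since random.choices recomputes cumulative weights from weights on every call (passing precomputed cum_weights would avoid that too). LatticePointPicker and point_for are hypothetical names, and the rectangles are assumed non-overlapping, as the problem states.

```python
import bisect
import itertools
import random
from typing import List


class LatticePointPicker:
    # Hypothetical variant: one random integer selects both the rectangle and the point inside it

    def __init__(self, rects: List[List[int]]):
        self.rects = rects
        counts = [(x2 - x1 + 1) * (y2 - y1 + 1) for x1, y1, x2, y2 in rects]
        self.prefix = list(itertools.accumulate(counts))  # cumulative point counts

    def point_for(self, r: int) -> List[int]:
        # Map r in [0, total) to a unique integer point
        idx = bisect.bisect_right(self.prefix, r)  # rectangle containing the r-th point
        x1, y1, x2, _ = self.rects[idx]
        offset = r - (self.prefix[idx - 1] if idx > 0 else 0)
        width = x2 - x1 + 1
        return [x1 + offset % width, y1 + offset // width]  # row-major order inside the rectangle

    def pick(self) -> List[int]:
        return self.point_for(random.randrange(self.prefix[-1]))
```

Because point_for is a bijection between [0, total) and the set of integer points, uniformity of r directly gives uniformity over points, and each pick costs one random number plus an O(log n) search.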

References:
470.Implement Rand10() Using Rand7() (rejection sampling)
519.Random Flip Matrix (Fisher-Yates shuffle)