RANSAC (Random Sample Consensus)

RANSAC

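RANSAC is short for Random Sample Consensus. Given a sample data set that contains outliers, it estimates the parameters of a mathematical model of the data and identifies the valid data points (the inliers). Unlike least squares, which fits the model to the data set as a whole to obtain an overall optimum, RANSAC assumes that the data set may contain outliers: it first tries to exclude those outliers and only then fits the model.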
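To illustrate why outliers are a problem for least squares, here is a small Python sketch. The function name `least_squares_line` and the data are illustrative assumptions. Fitting y = a*x + b to points that lie exactly on y = 2x + 1 recovers the line, but adding a single gross outlier drags the slope far away from 2.

```python
def least_squares_line(points):
    """Ordinary least-squares fit of y = a*x + b, weighting every point equally."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


clean = [(x, 2 * x + 1) for x in range(10)]  # exactly on y = 2x + 1
data = clean + [(9, -20)]                    # plus one gross outlier

print(least_squares_line(clean))  # (2.0, 1.0)
print(least_squares_line(data))   # slope drops to about 0.42
```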

Basic steps

RANSAC achieves its goal by repeating the following steps:

1. Select a random subset of the original data. Call this subset the hypothetical inliers.

2. A model is fitted to the set of hypothetical inliers.

3. All other data are then tested against the fitted model. Those points that fit the estimated model well, according to some model-specific loss function, are considered as part of the consensus set.

4. The estimated model is reasonably good if sufficiently many points have been classified as part of the consensus set.

5. Afterwards, the model may be improved by reestimating it using all members of the consensus set.
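The five steps above can be sketched in Python for the line-fitting case (n = 2). This is a minimal illustration, not a reference implementation: the function names, the perpendicular-distance loss, the `threshold` and `min_inliers` parameters, and the restriction to non-vertical lines are all assumptions made for the example.

```python
import math
import random


def line_through(p, q):
    """Slope/intercept (m, c) of the line y = m*x + c through p and q."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None  # vertical line: not representable as y = m*x + c
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1


def distance(model, point):
    """Perpendicular distance from point to the line y = m*x + c."""
    m, c = model
    x, y = point
    return abs(m * x - y + c) / math.hypot(m, 1.0)


def least_squares(points):
    """Ordinary least-squares fit of y = m*x + c to the given points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    m = sxy / sxx
    return m, my - m * mx


def ransac_line(points, k, threshold, min_inliers, seed=None):
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(k):
        # 1. Hypothetical inliers: a random minimal subset (2 distinct points).
        p, q = rng.sample(points, 2)
        # 2. Fit a model to the hypothetical inliers.
        model = line_through(p, q)
        if model is None:
            continue
        # 3. Consensus set: all points within `threshold` of the model.
        inliers = [pt for pt in points if distance(model, pt) <= threshold]
        # 4. Accept the model only if its consensus set is large enough,
        #    and keep the largest consensus set seen so far.
        if len(inliers) >= min_inliers and len(inliers) > len(best_inliers):
            best_inliers = inliers
    if not best_inliers:
        return None, []
    # 5. Re-estimate the model from all members of the best consensus set.
    return least_squares(best_inliers), best_inliers


clean = [(x, 2 * x + 1) for x in range(10)]  # points on y = 2x + 1
data = clean + [(1, 20), (8, 0), (4, -10)]   # plus three gross outliers
model, inliers = ransac_line(data, k=100, threshold=0.5, min_inliers=5, seed=0)
print(model)         # (2.0, 1.0)
print(len(inliers))  # 10
```

Refitting with least squares only on the consensus set (step 5) is what lets RANSAC keep the accuracy of least squares while ignoring the outliers.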

Parameter selection

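k - number of iterations

n - number of data points drawn in one sample

p - probability that, in at least one of the k iterations, all of the sampled points are inliers

w - probability that any single chosen point is an inlier:

w = (number of inliers in the data) / (total number of data points)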

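First, assume that the n points in one sample are chosen independently of each other (in practice they are not, as explained below). Then the probability that all n points in a sample are inliers is $w^n$, the probability that at least one of them is an outlier is $1 - w^n$, and the probability that the samples in all k iterations contain at least one outlier is $(1 - w^n)^k$. This is exactly the event that RANSAC fails, so it must equal $1 - p$:

$$1 - p = (1 - w^n)^k$$

Taking the logarithm of both sides gives:

$$k = \frac{\log(1 - p)}{\log(1 - w^n)}$$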
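The formula for k can be evaluated directly. Below is a minimal Python sketch; the function name `ransac_iterations` is hypothetical. The result is rounded up because k must be a whole number of iterations, and `math.log1p` is used so that very small values of w^n do not lose precision.

```python
import math


def ransac_iterations(p, w, n):
    """Iterations k solving 1 - p = (1 - w**n)**k, rounded up to an integer."""
    if not (0 < p < 1 and 0 < w <= 1 and n >= 1):
        raise ValueError("need 0 < p < 1, 0 < w <= 1 and n >= 1")
    if w == 1:
        return 1  # no outliers: the first sample is already all inliers
    # log1p(-x) computes log(1 - x) accurately even when x is tiny
    return math.ceil(math.log1p(-p) / math.log1p(-(w ** n)))


print(ransac_iterations(0.99, 0.5, 2))  # 17
print(ransac_iterations(0.99, 0.5, 4))  # 72
```

With p = 0.99 and half of the data being inliers, a line (n = 2) needs 17 iterations, while a model with n = 4 already needs 72, which shows how quickly k grows with n.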

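In practice, however, the n points in one sample are drawn without replacement (fitting a line, for instance, requires two distinct points), so the draws are not independent. If the data contain N points, of which wN are inliers, the probability that all n points in a sample are inliers is exactly

$$\frac{\binom{wN}{n}}{\binom{N}{n}} = \prod_{i=0}^{n-1} \frac{wN - i}{N - i},$$

which for n ≥ 2 and w < 1 is slightly smaller than $w^n$. A smaller per-iteration success probability means a larger k, so the formula for k slightly underestimates the number of iterations needed. When N is much larger than n the difference is negligible, and the formula is used as a practical estimate of k.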
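The gap between the independent-draw approximation and drawing without replacement can be checked numerically. In this Python sketch the function names and the data-set sizes are hypothetical; `iterations` applies the same logarithm formula to whichever per-sample success probability it is given.

```python
import math


def p_independent(w, n):
    """Probability that n independent draws are all inliers: w**n."""
    return w ** n


def p_without_replacement(num_inliers, num_points, n):
    """Probability that n distinct points drawn at random are all inliers."""
    return math.comb(num_inliers, n) / math.comb(num_points, n)


def iterations(p, p_all_inliers):
    """Smallest k with 1 - (1 - p_all_inliers)**k >= p."""
    return math.ceil(math.log(1 - p) / math.log(1 - p_all_inliers))


N, num_inliers, n = 20, 10, 2  # small hypothetical data set, w = 0.5
p_ind = p_independent(num_inliers / N, n)             # 0.25
p_exact = p_without_replacement(num_inliers, N, n)    # 45 / 190, about 0.237
print(iterations(0.99, p_ind), iterations(0.99, p_exact))  # 17 18

# With many more points the two probabilities almost coincide.
print(p_without_replacement(5000, 10000, 2))  # about 0.249975
```

For a small data set the exact probability is noticeably lower and one extra iteration is needed; for 10000 points the difference from 0.25 is already in the fifth decimal place.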

n is usually chosen as the minimum number of data points needed to determine the model; for example, fitting a straight line needs n = 2.

Applications

Refining feature point matches

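A feature point matching algorithm produces many matched pairs of feature points between two images, but some of these matches are wrong, i.e. they are outliers. RANSAC can remove these outliers and leave the best set of matches.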
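As a toy illustration of removing wrong matches, the Python sketch below assumes that the second image is a pure translation of the first, so a single match (n = 1) determines the model. This is a simplifying assumption; practical pipelines usually fit a homography or a fundamental matrix instead (OpenCV's `cv2.findHomography`, for example, accepts a `cv2.RANSAC` method flag). The function name `ransac_translation` and the match data are hypothetical.

```python
import math
import random


def ransac_translation(matches, k, threshold, seed=None):
    """Keep the matches consistent with the best-supported translation.

    matches: list of ((x1, y1), (x2, y2)) pairs, point in image 1 -> image 2.
    Assumed model: (x2, y2) = (x1 + tx, y1 + ty), so one match (n = 1) fixes it.
    """
    rng = random.Random(seed)
    best = []
    for _ in range(k):
        (x1, y1), (x2, y2) = rng.choice(matches)  # minimal sample: one match
        tx, ty = x2 - x1, y2 - y1                  # model fitted to that match
        # Consensus set: matches whose displacement is close to (tx, ty).
        consensus = [
            ((a1, b1), (a2, b2))
            for (a1, b1), (a2, b2) in matches
            if math.hypot(a2 - a1 - tx, b2 - b1 - ty) <= threshold
        ]
        if len(consensus) > len(best):
            best = consensus
    return best


points = [(0, 0), (10, 4), (20, 8), (3, 15), (7, 2), (12, 12), (30, 1), (25, 20)]
correct = [(p, (p[0] + 5, p[1] - 3)) for p in points]  # true shift (5, -3)
wrong = [((10, 10), (50, 2)), ((3, 7), (0, 30))]       # mismatched pairs
kept = ransac_translation(correct + wrong, k=50, threshold=1.0, seed=1)
print(len(kept))  # 8
```

Only the eight matches that agree on the shift (5, -3) survive; each wrong match implies a different shift that no other match supports.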
