Random Sample Consensus (RANSAC)

Preface

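Random sample consensus (RANSAC) is an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers, in such a way that the outliers have no influence on the estimate. It can therefore also be interpreted as an outlier detection method. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, and this probability increases as more iterations are allowed. RANSAC was first proposed by Fischler and Bolles in 1981 to solve the Location Determination Problem (LDP; see Appendix A for a brief description)¹ ².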
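The statement that the success probability grows with the number of iterations can be made concrete with a common back-of-the-envelope estimate. The sketch below is an illustration rather than part of the original description: it assumes that each data point is an inlier with probability `inlier_ratio`, that a minimal sample has `n` points, and that samples are drawn independently. The function name and its parameters are hypothetical.

```python
import math


def required_iterations(p_success: float, inlier_ratio: float, n: int) -> int:
    """Smallest k for which at least one of k random samples of size n is
    outlier-free with probability >= p_success.

    Assumption (not from the original text): points are inliers independently
    with probability inlier_ratio, and samples are independent of each other.
    """
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must be strictly between 0 and 1")
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")

    # Probability that a single sample of n points contains only inliers.
    p_clean = inlier_ratio ** n
    if p_clean >= 1.0:
        return 1  # no outliers at all: one sample is enough

    # Solve 1 - (1 - p_clean)^k >= p_success for k.
    # log1p keeps the denominator accurate when p_clean is tiny.
    return math.ceil(math.log(1.0 - p_success) / math.log1p(-p_clean))


if __name__ == "__main__":
    for ratio in (0.9, 0.7, 0.5, 0.3):
        print(ratio, required_iterations(0.99, ratio, n=2))
```

Under these assumptions the required number of iterations rises quickly as the inlier ratio drops or as the minimal sample size `n` grows, which is why `n` is usually kept as small as the model allows.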

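In short, the generic RANSAC algorithm fits or estimates a highly robust model from the most plausible subset of the data, the inliers, while excluding the outliers³.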

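RANSAC can therefore also be understood as a general idea: exclude data that may be erroneous, then estimate model parameters or do something else with what remains, such as matching image feature points. This has something in common with active learning. Active learning looks for as few labeled points as possible to train a model, much like RANSAC's iterative search for inliers, and both processes involve identifying and handling outliers. Active learning, however, is the more elaborate framework.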

Returning to RANSAC, it rests on a few basic assumptions that we need to know.

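The data set as a whole consists of inliers and outliers;
The distribution of the inliers can be explained by a parametric model, although the inliers may be subject to noise;
The outliers do not fit the model; they come from, for example, extreme values of the noise, erroneous measurements, or incorrect hypotheses about the data.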
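A toy data set that satisfies these assumptions can be generated as follows. This is only an illustrative sketch: the line parameters, noise level, value ranges, and the function name `make_line_data` are all assumptions chosen for the example, not values from the original text.

```python
import random


def make_line_data(n_inliers=80, n_outliers=20, slope=2.0, intercept=1.0,
                   noise=0.1, seed=0):
    """Return a shuffled list of (x, y) points.

    Inliers follow y = slope * x + intercept plus small Gaussian noise
    (the parametric model); outliers are drawn uniformly from a much wider
    range and are not explained by that model. All numbers are illustrative.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_inliers):
        x = rng.uniform(0.0, 10.0)
        points.append((x, slope * x + intercept + rng.gauss(0.0, noise)))
    for _ in range(n_outliers):
        points.append((rng.uniform(0.0, 10.0), rng.uniform(-20.0, 40.0)))
    rng.shuffle(points)  # the algorithm must not know which points are which
    return points


if __name__ == "__main__":
    data = make_line_data()
    print(len(data), data[:3])
```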

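Even if the above assumptions do not hold for a data set, that is, even if it contains no outliers, RANSAC's parameter estimation is not affected: in that case the iterative process can take the entire data set as inliers and then estimate the model parameters. Let us now look at how the iterations of the RANSAC algorithm proceed.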

Algorithm

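The content below is taken directly from Wikipedia¹, together with a figure from the KTH lecture slides⁴. The two descriptions differ slightly, but the overall idea is the same.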

1. Description

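1. Randomly select a subset of the data set, called the ==“hypothetical inliers”==; it is the initial sample of the consensus set;
2. Estimate or train a model that fits this subset;
3. Using some loss function or rule, select from the remaining data the samples that fit the model well and add them to the consensus set. For example, if the model is the equation of a line, any remaining sample whose distance to the line is less than a threshold $TH$ is considered consistent with the model and is added to the consensus set. The points in the consensus set are the inliers; all others are outliers;
4. If the consensus set contains sufficiently many samples, the model estimated in step 2 is considered reasonably good;
5. Re-estimate the model using all samples in the consensus set.

Repeat the procedure above and finally return the model with the smallest error, or the model with the most inliers.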
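The line example in the consensus-set step can be sketched as below. This is a minimal illustration under assumptions that the text does not fix: the line is given in implicit form `a*x + b*y + c = 0`, the error is the perpendicular distance, and `threshold` plays the role of $TH$. The function name `consensus_set` is hypothetical.

```python
import math


def consensus_set(points, a, b, c, threshold):
    """Return the points whose perpendicular distance to the line
    a*x + b*y + c = 0 is strictly smaller than threshold.

    Assumption: perpendicular distance is used as the error measure;
    other loss functions or rules are equally possible.
    """
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("a and b must not both be zero")
    return [(x, y) for x, y in points
            if abs(a * x + b * y + c) / norm < threshold]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.05), (1.0, 5.0)]
    # Line y = x written as 1*x - 1*y + 0 = 0.
    print(consensus_set(pts, 1.0, -1.0, 0.0, threshold=0.1))
```

Points returned by the function are treated as inliers for the current candidate model; everything else counts as an outlier for that iteration only, since a different random subset may classify the same point differently.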

2. Pseudocode

Given:
    data – A set of observations.
    model – A model to explain observed data points.
    n – Minimum number of data points required to estimate model parameters.
    k – Maximum number of iterations allowed in the algorithm.
    t – Threshold value to determine data points that are fit well by model.
    d – Number of close data points required to assert that a model fits well to data.

Return:
    bestFit – model parameters which best fit the data (or null if no good model is found)

iterations := 0
bestFit := null
bestErr := something really large

while iterations < k do
    maybeInliers := n randomly selected values from data
    maybeModel := model parameters fitted to maybeInliers
    alsoInliers := empty set
    for every point in data not in maybeInliers do
        if point fits maybeModel with an error smaller than t then
            add point to alsoInliers
        end if
    end for
    if the number of elements in alsoInliers is > d then
        // This implies that we may have found a good model
        // now test how good it is.
        betterModel := model parameters fitted to all points in maybeInliers and alsoInliers
        thisErr := a measure of how well betterModel fits these points
        if thisErr < bestErr then
            bestFit := betterModel
            bestErr := thisErr
        end if
    end if
    increment iterations
end while

return bestFit
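
For reference, the pseudocode above could be translated to Python roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the model is a line `y = m*x + b` fitted by ordinary least squares, the per-point error is the vertical residual, and `thisErr` is taken to be the mean squared residual over the consensus set. The helper names `fit_line`, `residual`, and `ransac_line`, as well as the `seed` parameter, are hypothetical and were added for the example.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    count = len(points)
    mean_x = sum(x for x, _ in points) / count
    mean_y = sum(y for _, y in points) / count
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        raise ValueError("degenerate sample: all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def residual(model, point):
    """Vertical distance between a point and the line (an assumed error measure)."""
    m, b = model
    x, y = point
    return abs(y - (m * x + b))


def ransac_line(data, n=2, k=100, t=0.5, d=10, seed=None):
    """Follow the pseudocode: n, k, t, d have the same meaning as there.

    Returns (m, b) for the best model, or None if no model ever gathered
    more than d additional inliers.
    """
    rng = random.Random(seed)
    best_fit, best_err = None, float("inf")
    for _ in range(k):
        # maybeInliers: n randomly selected points (tracked by index).
        sample_idx = set(rng.sample(range(len(data)), n))
        maybe_inliers = [data[i] for i in sample_idx]
        try:
            maybe_model = fit_line(maybe_inliers)
        except ValueError:
            continue  # degenerate sample; it still uses up one iteration
        # alsoInliers: remaining points that fit maybeModel within t.
        also_inliers = [p for i, p in enumerate(data)
                        if i not in sample_idx and residual(maybe_model, p) < t]
        if len(also_inliers) > d:
            # Refit on the whole consensus set and measure how well it fits.
            consensus = maybe_inliers + also_inliers
            better_model = fit_line(consensus)
            this_err = sum(residual(better_model, p) ** 2
                           for p in consensus) / len(consensus)
            if this_err < best_err:
                best_fit, best_err = better_model, this_err
    return best_fit


if __name__ == "__main__":
    line = [(float(x), 2.0 * x + 1.0) for x in range(20)]
    junk = [(3.0, 40.0), (7.0, -30.0), (12.0, 100.0)]
    print(ransac_line(line + junk, seed=0))
```

Like the pseudocode, this variant keeps the model with the smallest error; keeping the model with the largest consensus set instead would only change the comparison at the end of the loop.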