The RANSAC algorithm: read this and you are sure to understand it

RANSAC stands for Random Sample Consensus. It is an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers (erroneous data points). Compared with the least squares method, it adds the idea of rejecting unsuitable data, so for samples that are partly corrupted it can give a more accurate, and often faster, identification result. The algorithm was first proposed by Fischler and Bolles in 1981, who used it to solve the Location Determination Problem (LDP). Today it is widely used in image processing and recognition.

1 Defects of the least squares algorithm

The least squares method finds the best-fitting function for a set of data by minimizing the sum of squared errors. It is widely used in data identification, but it has a weakness when fitting certain kinds of data. Least squares is globally optimal over all of the data, yet not all data is suitable for fitting: some points may carry large errors or be outright wrong, and identifying such points has a cost of its own. The following figure shows a typical example:
[Figure: least squares fit in the presence of outliers]
Clearly, this linear fit is not what we want. What we actually want is the following fit:
[Figure: RANSAC fit]
This is the result of the RANSAC algorithm. The red points that should not take part in the fit are removed and only the blue points within a certain range are used, which keeps the sample data clean and brings the fit closer to the true line.
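To make the weakness concrete, here is a minimal Python sketch (standard library only). The helper name `least_squares_line` and the data are made up for illustration; they do not come from the original figures. It fits y = a*x + b by ordinary least squares, once to clean data and once to the same data with a single gross outlier added.

```python
def least_squares_line(points):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Ten points lying exactly on y = 2x + 1 (made-up illustration data).
clean = [(float(x), 2.0 * x + 1.0) for x in range(10)]
# The same points plus one gross outlier.
dirty = clean + [(9.0, -60.0)]

a_clean, b_clean = least_squares_line(clean)
a_dirty, b_dirty = least_squares_line(dirty)
print(f"clean data:   a = {a_clean:.3f}, b = {b_clean:.3f}")
print(f"with outlier: a = {a_dirty:.3f}, b = {b_dirty:.3f}")
```

On the clean data the fit recovers a = 2 and b = 1. With the single outlier added, the fitted slope even turns negative, because least squares weights every point, including the bad one.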

2 RANSAC algorithm

2.1 Principle

The workflow of the general RANSAC algorithm is as follows [2] :

Given:
data – a set of observed data points.
model – the model to be fitted (e.g. linear, quadratic, etc.).
n – the minimum number of data points required to fit the model.
k – the maximum number of iterations allowed by the algorithm.
t – the threshold that decides whether a data point fits the model: points whose error is smaller than t are inliers, the rest are outliers.
d – the number of close data points (inliers) required to assert that the model fits the data well.

Returns:
bestFit – the model parameters that best fit the data (or null if no good model is found)

Of the inputs above, data and model are easy to understand, but the meaning of n, k, t and d may still be a little vague. Let us set that aside for now; it will become clear further on.

The pseudo code of the function body is as follows:

Parameter initialization:
iterations = 0   /// iteration counter
bestFit = null
bestErr = something really large   /// smallest error found so far

/// Main loop
while iterations < k do
    maybeInliers := n randomly selected values from data   /// randomly pick n data points to fit
    maybeModel := model parameters fitted to maybeInliers   /// fit the model parameters to these n points
    alsoInliers := empty set   /// start with an empty set

    for every point in data not in maybeInliers do   /// compare each point outside maybeInliers with the model
        if point fits maybeModel with an error smaller than t then   /// if the error is within t, add the point to the inlier set
            add point to alsoInliers
        end if
    end for

    if the number of elements in alsoInliers is > d then   /// the current inlier set has more than d points
        // This implies that we may have found a good model (i.e. good parameters);
        // now test how good it is.
        betterModel := model parameters fitted to all points in maybeInliers and alsoInliers
        thisErr := a measure of how well betterModel fits these points
        /// If this model's error on its inliers is smaller than the smallest error so far,
        /// update the smallest error and take this model's parameters as the best ones.
        if thisErr < bestErr then
            bestFit := betterModel
            bestErr := thisErr
        end if
    end if

    increment iterations   /// next iteration
end while

return bestFit
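The pseudocode can be turned into a short runnable program. The following is a minimal Python sketch (standard library only), not a reference implementation. It assumes a straight-line model y = a*x + b, uses the vertical residual as the point error and the mean squared residual as thisErr. The function names `fit_line`, `point_error` and `ransac_line`, and the synthetic data, are hypothetical choices made for this illustration.

```python
import random

def fit_line(points):
    """Least squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        raise ValueError("degenerate sample: all x values are equal")
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return a, mean_y - a * mean_x

def point_error(model, point):
    """Absolute vertical residual of a point with respect to the line."""
    a, b = model
    x, y = point
    return abs(y - (a * x + b))

def ransac_line(data, n, k, t, d, rng=random):
    """Follows the pseudocode: returns the best (a, b), or None if no model qualifies."""
    best_fit, best_err = None, float("inf")
    for _ in range(k):
        # maybeInliers: n randomly selected points.
        sample_idx = set(rng.sample(range(len(data)), n))
        maybe_inliers = [data[i] for i in sample_idx]
        try:
            maybe_model = fit_line(maybe_inliers)
        except ValueError:
            continue  # skip degenerate samples (e.g. a vertical pair)
        # alsoInliers: the remaining points that fit maybeModel within t.
        also_inliers = [p for i, p in enumerate(data)
                        if i not in sample_idx and point_error(maybe_model, p) < t]
        if len(also_inliers) > d:
            # Refit on the whole consensus set and measure how good the fit is.
            consensus = maybe_inliers + also_inliers
            better_model = fit_line(consensus)
            this_err = sum(point_error(better_model, p) ** 2
                           for p in consensus) / len(consensus)
            if this_err < best_err:
                best_fit, best_err = better_model, this_err
    return best_fit

# Synthetic test data (an assumption of this sketch): 50 noisy points on
# y = 3x - 2 plus 20 uniformly scattered outliers.
rng = random.Random(0)
inliers = [(float(x), 3.0 * x - 2.0 + rng.uniform(-0.5, 0.5)) for x in range(50)]
outliers = [(rng.uniform(0.0, 50.0), rng.uniform(-100.0, 250.0)) for _ in range(20)]
data = inliers + outliers

model = ransac_line(data, n=2, k=200, t=1.0, d=30, rng=rng)
print(model)
```

With these settings, n = 2 because two points determine a line, t = 1.0 is a little larger than the noise amplitude, and d = 30 asks for a clear majority of the 50 true inliers before a candidate is accepted.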

Read through the pseudocode above a couple of times and the meaning of the four parameters should become clear. If it is still not clear, do not worry: a practical example follows.

2.2 Examples

The following is a MATLAB function, together with an example, that implements the RANSAC algorithm for a linear (straight-line) model.

function [bestParameter1,bestParameter2] = ransac_demo(data,num,iter,threshDist,inlierRatio)
 % data: a 2xn dataset with n data points (row 1: x, row 2: y)
 % num: the minimum number of points needed to fit the model. For the line fitting problem, num=2
 % iter: the number of iterations
 % threshDist: the maximum distance between a point and the fitted line for the point to count as an inlier
 % inlierRatio: the minimum fraction of all points that must be inliers for a line to be accepted
 
 %% Plot the data points
 figure;plot(data(1,:),data(2,:),'o');hold on;
 number = size(data,2); % Total number of points
 bestInNum = 0; % Best fitting line with largest number of inliers
 bestParameter1=0;bestParameter2=0; % parameters for best fitting line
 for i=1:iter
 %% Randomly select num points (2 for a line)
     idx = randperm(number,num); sample = data(:,idx);
 %% Compute the distances from all points to the line through the two samples
     kLine = sample(:,2)-sample(:,1); % direction vector from the first sample point to the second
     kLineNorm = kLine/norm(kLine); % unit direction vector
     normVector = [-kLineNorm(2),kLineNorm(1)]; % unit normal of the line Ax+By+C=0: A=-kLineNorm(2), B=kLineNorm(1)
     distance = normVector*(data - repmat(sample(:,1),1,number)); % signed perpendicular distances (1 x number)
 %% Compute the inliers with distances smaller than the threshold
     inlierIdx = find(abs(distance)<=threshDist);
     inlierNum = length(inlierIdx);
 %% Update the number of inliers and fitting model if better model is found     
     if inlierNum>=round(inlierRatio*number) && inlierNum>bestInNum
         bestInNum = inlierNum;
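         % slope and intercept of y = parameter1*x + parameter2 (assumes the two samples do not share the same x)
         parameter1 = (sample(2,2)-sample(2,1))/(sample(1,2)-sample(1,1));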
         parameter2 = sample(2,1)-parameter1*sample(1,1);
         bestParameter1=parameter1; bestParameter2=parameter2;
     end
 end
 
 %% Plot the best fitting line
 xAxis = min(data(1,:)):max(data(1,:)); % draw the line over the x range of the data
 yAxis = bestParameter1*xAxis + bestParameter2;
 plot(xAxis,yAxis,'r-','LineWidth',2);
end


%% Generate random data for the test (run as a separate script, with the function above saved as ransac_demo.m)
data = 150*(2*rand(2,100)-1); data = data.*rand(2,100);
ransac_demo(data,2,100,10,0.1);

The result is as follows:
[Figure: ransac_demo result]
Here the data set (data) consists of 100 randomly generated (x, y) points and the model is y = ax + b, so the parameters to identify are a and b. n is 2, since two points determine a line. The number of iterations k is 100 (the data set is small, so the loop is cheap). t is 10, i.e. only points whose distance from the line y = ax + b is at most 10 count as inliers. d is the minimum number of inliers needed for the identified model and parameters to be considered good; here it is given as a ratio, and 0.1 means one tenth of 100, so a model is accepted once it has 10 inliers. Note that, unlike the pseudocode in 2.1, this demo does not refit the line to all inliers; it keeps the line through the two sampled points that has the most inliers.
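The distance step of the demo (kLine, normVector, distance) can be looked at in isolation. The following is a minimal Python sketch of the same construction: it projects each point onto the unit normal of the line through two sample points. The function name `signed_distances` and the five sample points are made up for illustration.

```python
import math

def signed_distances(p1, p2, points):
    """Signed perpendicular distance of each point to the line through p1 and p2."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("p1 and p2 must be distinct")
    # Unit normal of the line, built the same way as normVector in the demo.
    nx, ny = -dy / length, dx / length
    return [nx * (x - p1[0]) + ny * (y - p1[1]) for x, y in points]

points = [(0.0, 0.0), (4.0, 4.0), (0.0, 2.0), (3.0, -1.0), (10.0, 30.0)]
# Use the first two points as the random sample; they define the line y = x.
dist = signed_distances(points[0], points[1], points)

thresh_dist = 2.0  # plays the role of threshDist (t) in the demo
inlier_idx = [i for i, value in enumerate(dist) if abs(value) <= thresh_dist]
print(inlier_idx)  # prints [0, 1, 2]
```

The two sample points always have distance 0 to their own line, so they are counted among the inliers, just as in the MATLAB demo.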

2.3 Parameters

As can be seen, besides choosing suitable data and a suitable model, the RANSAC algorithm also needs four well-chosen parameters n, k, t and d. Of these, n, t and d can be chosen from experience, and k can then be calculated from the following formula:

$$k = \frac{\log(1 - p)}{\log(1 - w^n)}$$

Here $p$ is the desired probability that the RANSAC algorithm returns a useful result, and $w$ is the probability that a randomly selected data point is an inlier. For the $n$ points needed to fit the model once, the probability that all of them are inliers is $w^n$ (assuming the points are sampled independently, i.e. with replacement), and the probability that at least one of them is an outlier is $1 - w^n$. The probability that all $k$ iterations fail is therefore:

$$1 - p = (1 - w^n)^k$$

Taking the logarithm of both sides gives the formula for $k$ above.
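As a quick numerical check, k can be computed from p, w and n using k = log(1 - p) / log(1 - w^n), rounded up to the next integer. The following is a minimal Python sketch; the function name `ransac_iterations` and the example values are assumptions made for illustration.

```python
import math

def ransac_iterations(p, w, n):
    """Smallest integer k such that 1 - (1 - w**n)**k >= p."""
    if not (0.0 < p < 1.0 and 0.0 < w < 1.0):
        raise ValueError("p and w must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w ** n))

# Line fitting (n = 2), half of the data are inliers, 99% desired success rate.
print(ransac_iterations(p=0.99, w=0.5, n=2))  # prints 17
```

The required number of iterations grows quickly as the inlier ratio w falls or as the model needs more points n, since w^n shrinks in both cases.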

In fact, MATLAB also provides a ready-made ransac function. It is designed to be more general; interested readers can study it on their own.

Finally, the examples and principles in this article are mainly based on the article "Random sample consensus".

References

  1. Wikipedia: "Random sample consensus".
  2. Martin A. Fischler and Robert C. Bolles (June 1981). "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography". Comm. of the ACM 24 (6): 381–395. doi:10.1145/358669.358692.
  3. MathWorks introduction to RANSAC: https://www.mathworks.com/discovery/ransac.html

Thanks for reading

Origin blog.csdn.net/zhoucoolqi/article/details/105497572