The Principle of Straight Line Fitting Based on the Least Squares Method and Its Implementation in C++

In data analysis, noise should be suppressed as far as possible so that the underlying pattern in the data can be studied. A common way to do this is data fitting, in which a straight line, a parabola, or a higher-order curve serves as the model for the data.

This article explains the principle of straight line fitting based on the least squares method and, building on that, introduces a line fitting algorithm that combines least squares with the RANSAC algorithm.


01

The principle of straight line fitting based on the least squares method

The core idea of least squares line fitting is to take the sum of squared differences between each sample value and its corresponding model value as the objective function. The model whose parameters minimize this objective function is taken as the fit to the samples.

A differentiable function has zero derivative at an extremum, so the parameters can be found by taking the partial derivative of the objective function with respect to each parameter and setting it to 0. Because the objective function here is a sum of squares (a convex quadratic in the parameters), the parameters obtained this way give its minimum, and they constitute the least squares solution.

Next, let's derive the formulas for least squares line fitting.

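Suppose there are n points (x_i, y_i) (0 ≤ i < n) and that their fitting line is Y = ax + b. Then for each x_i the fitted value is Y_i = ax_i + b, so the objective function is:

$$ f = \sum_{i=0}^{n-1} (Y_i - y_i)^2 = \sum_{i=0}^{n-1} (a x_i + b - y_i)^2 $$

Our goal is to find the parameters a and b at which f attains its minimum. Take the partial derivatives of f with respect to a and b:

$$ \frac{\partial f}{\partial a} = 2 \sum_{i=0}^{n-1} (a x_i + b - y_i)\, x_i, \qquad \frac{\partial f}{\partial b} = 2 \sum_{i=0}^{n-1} (a x_i + b - y_i) $$

Setting both partial derivatives to 0 gives a system of two linear equations in a and b:

$$ \begin{cases} a \sum_{i=0}^{n-1} x_i^2 + b \sum_{i=0}^{n-1} x_i = \sum_{i=0}^{n-1} x_i y_i \\ a \sum_{i=0}^{n-1} x_i + n b = \sum_{i=0}^{n-1} y_i \end{cases} $$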

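Let:

$$ S_{xx} = \sum_{i=0}^{n-1} x_i^2, \quad S_x = \sum_{i=0}^{n-1} x_i, \quad S_{xy} = \sum_{i=0}^{n-1} x_i y_i, \quad S_y = \sum_{i=0}^{n-1} y_i $$

Then the system becomes:

$$ \begin{cases} a S_{xx} + b S_x = S_{xy} \\ a S_x + n b = S_y \end{cases} $$

Solving this system gives a and b, the line fitting parameters we are looking for:

$$ a = \frac{n S_{xy} - S_x S_y}{n S_{xx} - S_x^2}, \qquad b = \frac{S_{xx} S_y - S_x S_{xy}}{n S_{xx} - S_x^2} $$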
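As a quick sanity check of the closed-form solution, here is a small worked example. The three points (0, 1), (1, 3), (2, 5) are made up for illustration and are not from the original article; they lie exactly on y = 2x + 1, so the formulas should return a = 2 and b = 1:

```latex
\begin{aligned}
n &= 3, \quad \sum x_i = 3, \quad \sum y_i = 9, \quad \sum x_i^2 = 5, \quad \sum x_i y_i = 13, \\
a &= \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}
   = \frac{3 \cdot 13 - 3 \cdot 9}{3 \cdot 5 - 3^2} = \frac{12}{6} = 2, \\
b &= \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}
   = \frac{5 \cdot 9 - 3 \cdot 13}{6} = \frac{6}{6} = 1.
\end{aligned}
```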

Code:

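// Least squares fit of the line y = a*x + b.
#include <cmath>
#include <vector>
#include <opencv2/core.hpp>   // Point2f

using namespace std;
using namespace cv;

// points_list: sample points; points_num: number of points to use.
// a, b: outputs. Both are set to 0 when the points do not determine a
// non-vertical line (for example, when all x values are equal).
void lineplofit(const vector<Point2f>& points_list, int points_num, float* a, float* b)
{
    float sum_x2 = 0.0f;  // sum of x_i^2
    float sum_y = 0.0f;   // sum of y_i
    float sum_x = 0.0f;   // sum of x_i
    float sum_xy = 0.0f;  // sum of x_i * y_i

    int num = points_num;

    for (int i = 0; i < num; ++i)
    {
        sum_x2 += points_list[i].x * points_list[i].x;
        sum_y += points_list[i].y;
        sum_x += points_list[i].x;
        sum_xy += points_list[i].x * points_list[i].y;
    }

    // Denominator n*sum(x^2) - (sum x)^2; it is zero when all x are equal.
    float tmp = num * sum_x2 - sum_x * sum_x;
    if (fabs(tmp) > 0.000001f)   // fabs: floating-point absolute value
    {
        *a = (num * sum_xy - sum_x * sum_y) / tmp;
        *b = (sum_x2 * sum_y - sum_x * sum_xy) / tmp;
    }
    else
    {
        *a = 0;
        *b = 0;
    }
}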
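The function above signals the degenerate case by returning a = b = 0, which cannot be told apart from a genuine fit of the line y = 0. The following is a minimal sketch of a variant that reports failure through its return value and accumulates in double precision. The `Pt` struct and the name `fitLineLS` are hypothetical, introduced only so the sketch builds without OpenCV; they are not part of the article's code.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for cv::Point2f so the sketch builds without OpenCV.
struct Pt { float x; float y; };

// Fits y = a*x + b by least squares. Returns false when the points do not
// determine a non-vertical line (fewer than 2 points, or all x equal).
bool fitLineLS(const std::vector<Pt>& pts, double& a, double& b)
{
    if (pts.size() < 2) return false;
    const double n = static_cast<double>(pts.size());

    // Accumulate in double to limit rounding error for large coordinates.
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (const Pt& p : pts) {
        sx  += p.x;
        sy  += p.y;
        sxx += static_cast<double>(p.x) * p.x;
        sxy += static_cast<double>(p.x) * p.y;
    }

    // Same denominator as in the closed-form solution: n*Sxx - Sx^2.
    const double denom = n * sxx - sx * sx;
    if (std::fabs(denom) < 1e-12) return false;  // absolute tolerance (assumption)

    a = (n * sxy - sx * sy) / denom;
    b = (sxx * sy - sx * sxy) / denom;
    return true;
}
```

Called on the points (0, 1), (1, 3), (2, 5), this returns true with a close to 2 and b close to 1; called on two points with the same x, it returns false and leaves a and b untouched.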

02

The principle of straight line fitting based on the least squares method and the RANSAC algorithm

In the least squares line fitting algorithm of the previous section, all sample points take part in the calculation. In practice, however, noise often places some sample points far away from most of the others. These points are usually called outliers. If outliers take part in the line fit, the result can have a large error, as shown below:

55f79d75efc02d44d91434e3fdce448f.png

In a situation like the one above, the outliers must be removed before fitting the line, otherwise the error will be large. RANSAC is a classic algorithm for removing such outlier sample points.

Before introducing the RANSAC algorithm, let's define inliers and outliers. A sample point whose distance from the model is less than a set threshold is called an inlier; a sample point whose distance from the model is greater than or equal to the threshold is called an outlier (these are the outliers mentioned above). For example, with a threshold of 5, a point at distance 2 from the line is an inlier because 2 is less than 5, while a point at distance 8 is an outlier because 8 is greater than 5.
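The definition can be written down directly as code. This is a minimal sketch; the `Split` struct and the function name `splitByThreshold` are hypothetical names chosen for illustration, and the distances are assumed to have been computed already.

```cpp
#include <cstddef>
#include <vector>

// Index lists produced by splitting samples on a distance threshold.
struct Split {
    std::vector<std::size_t> inliers;
    std::vector<std::size_t> outliers;
};

// dist[i] is the distance of sample i from the current model.
// A sample is an inlier only if its distance is strictly below the threshold;
// a distance equal to the threshold counts as an outlier.
Split splitByThreshold(const std::vector<double>& dist, double threshold)
{
    Split s;
    for (std::size_t i = 0; i < dist.size(); ++i) {
        if (dist[i] < threshold)
            s.inliers.push_back(i);
        else
            s.outliers.push_back(i);
    }
    return s;
}
```

With the threshold of 5 from the example, the distances {2, 8, 5} give sample 0 as the only inlier; the sample at distance exactly 5 lands in the outlier list.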

The following describes the RANSAC procedure. The algorithm repeats the same cycle many times. Across the cycles it keeps track of the inlier set with the most points, and at the end it uses that largest inlier set to estimate the fitting model.

Let MaxInline denote the largest inlier set found so far, and MaxM the number of points in it.

1. First, choose a target model according to the task. For straight line fitting the model is y = ax + b; for quadratic curve fitting it is y = ax^2 + bx + c; for affine transformation fitting it is a 2×3 affine transformation matrix; and so on.

2. Randomly select, from all sample points, the minimum number of points needed to compute the model parameters. Call this minimum number n. Different models have different values of n: n = 2 for a straight line, n = 3 for a quadratic curve, n = 3 for an affine transformation, and so on.

3. Use the n sample points selected in the previous step to compute the model parameters: a and b for a straight line; a, b and c for a quadratic curve; the 6 parameters of the 2×3 matrix for an affine transformation; and so on.

4. For each sample point, compute its distance to the model. If the distance is less than the threshold, the point is an inlier and is added to the inlier set Inline; otherwise it is added to the outlier set. Then compare the number of inliers m with the largest count so far, MaxM: if m > MaxM, set MaxM = m and MaxInline = Inline.

5. Check whether MaxM exceeds a certain percentage of the total number of samples (for example 80%), or whether the number of cycles has reached the configured maximum. If neither condition holds, go back to step 2 and start another cycle; otherwise stop.

6. Check whether MaxM ≥ n. If so, use the points in MaxInline to estimate the fitting model; otherwise the RANSAC algorithm is considered to have failed.
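The six steps above can be sketched for the line model as follows, including the early stop of step 5. This is a sketch under stated assumptions rather than the article's implementation: `Pt`, `fitLS`, `ransacLine`, `stopRatio` and `seed` are hypothetical names, the inlier threshold is an absolute distance, and the standard `<random>` generator is used so that a fixed seed gives repeatable runs.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Pt { double x; double y; };  // hypothetical point type

// Least squares fit of y = a*x + b; returns false for degenerate input.
static bool fitLS(const std::vector<Pt>& pts, double& a, double& b)
{
    if (pts.size() < 2) return false;
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (const Pt& p : pts) {
        sx += p.x; sy += p.y; sxx += p.x * p.x; sxy += p.x * p.y;
    }
    const double n = static_cast<double>(pts.size());
    const double denom = n * sxx - sx * sx;
    if (std::fabs(denom) < 1e-12) return false;  // all x (nearly) equal
    a = (n * sxy - sx * sy) / denom;
    b = (sxx * sy - sx * sxy) / denom;
    return true;
}

// RANSAC line fit following steps 1-6. `threshold` is an absolute distance;
// `stopRatio` is the inlier fraction that ends the loop early (step 5).
bool ransacLine(const std::vector<Pt>& pts, int maxIters, double threshold,
                double stopRatio, unsigned seed, double& a, double& b)
{
    const std::size_t n = pts.size();
    if (n < 2) return false;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);

    std::vector<Pt> best;  // plays the role of MaxInline
    for (int it = 0; it < maxIters; ++it) {
        // Step 2: draw two distinct sample indices.
        const std::size_t i = pick(rng);
        std::size_t j = pick(rng);
        while (j == i) j = pick(rng);

        // Step 3: line through the two samples; skip vertical pairs.
        double aa = 0, bb = 0;
        if (!fitLS({pts[i], pts[j]}, aa, bb)) continue;

        // Step 4: collect the inliers of this candidate line.
        std::vector<Pt> inliers;
        const double norm = std::sqrt(aa * aa + 1.0);
        for (const Pt& p : pts)
            if (std::fabs(aa * p.x - p.y + bb) / norm < threshold)
                inliers.push_back(p);
        if (inliers.size() > best.size()) best.swap(inliers);

        // Step 5: stop early once enough points agree with the model.
        if (static_cast<double>(best.size()) >= stopRatio * static_cast<double>(n))
            break;
    }
    // Step 6: refit on the largest inlier set, or report failure.
    return best.size() >= 2 && fitLS(best, a, b);
}
```

For ten points on y = 2x + 1 plus two far-away points, a call such as `ransacLine(pts, 100, 0.5, 0.8, 42u, a, b)` should stop as soon as a pair of on-line points is sampled and then refit on the ten inliers only.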

The steps above require the distance from a sample point to the model, and the way to compute it depends on the model. For a line model, it is the perpendicular distance from the point (x_0, y_0) to the line y = ax + b:

$$ d = \frac{|a x_0 - y_0 + b|}{\sqrt{a^2 + 1}} $$
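The distance follows from writing the line as ax - y + b = 0 and applying the general point-to-line formula |A·x_0 + B·y_0 + C| / sqrt(A^2 + B^2) with A = a, B = -1, C = b. A minimal sketch (the function name `pointLineDistance` is a hypothetical choice for illustration):

```cpp
#include <cmath>

// Perpendicular distance from (x0, y0) to the line y = a*x + b,
// i.e. |a*x0 - y0 + b| / sqrt(a^2 + 1).
double pointLineDistance(double a, double b, double x0, double y0)
{
    return std::fabs(a * x0 - y0 + b) / std::sqrt(a * a + 1.0);
}
```

For the horizontal line y = 0 the result reduces to |y_0|, so the point (3, 4) is at distance 4; a point that lies on the line, such as (1, 3) for y = 2x + 1, is at distance 0.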

The distance threshold used to identify inliers must be set to a suitable value for outlier removal to work well, so it may take several tries with different thresholds. In this article, the threshold is derived from a ratio α between 0 and 1 that interpolates between the minimum distance MinD and the maximum distance MaxD. This narrows the range of values to try and makes a suitable threshold easier to find:

$$ \mathrm{threshold} = MinD + (MaxD - MinD) \cdot \alpha $$

Code:

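#include <cfloat>
#include <cmath>
#include <cstdlib>
#include <ctime>

#define RANSAC_K 2   // minimum number of points needed to define a line

// Fill p[0..num-1] with num distinct random integers in [0, n-1].
// Requires 0 < num <= n; otherwise the function returns without writing.
static void GetRansacRandomNum(int n, int num, int p[])
{
    if (num <= 0 || num > n)
        return;

    int i = 0;
    while (i < num)
    {
        int r = rand() % n;

        // Reject r if it has already been drawn.
        bool duplicate = false;
        for (int j = 0; j < i; j++)
        {
            if (p[j] == r)
            {
                duplicate = true;
                break;
            }
        }

        if (!duplicate)
        {
            p[i] = r;
            i++;
        }
    }
}

// RANSAC line fit of y = a*x + b.
// p: sample points; iter_num: number of iterations;
// alpha: ratio in [0, 1] that places the inlier threshold between the
// minimum and maximum point-to-line distance.
// a, b: outputs; both are set to 0 if the algorithm fails.
void RansacPolyfitLine(const vector<Point2f>& p, int iter_num, float alpha, float* a, float* b)
{
    *a = 0;
    *b = 0;

    int n = (int)p.size();
    if (n < RANSAC_K)   // not enough points to sample from
        return;

    int r_idx[RANSAC_K];
    vector<Point2f> pick_p;
    vector<Point2f> inline_p;
    vector<Point2f> max_inline_p;
    vector<float> d_list;
    size_t max_inline_num = 0;

    // Seeding here means two calls within the same second draw the same samples.
    srand((unsigned)time(NULL));

    for (int i = 0; i < iter_num; i++)   // iterate iter_num times in total
    {
        // Draw RANSAC_K distinct random indices in [0, n-1].
        GetRansacRandomNum(n, RANSAC_K, r_idx);

        // Collect the 2 randomly selected points.
        pick_p.clear();
        for (int j = 0; j < RANSAC_K; j++)
        {
            pick_p.push_back(p[r_idx[j]]);
        }

        // Two points with (almost) the same x define a vertical line, for which
        // lineplofit returns a = b = 0, so skip this sample.
        float dx = pick_p[0].x - pick_p[1].x;
        if (dx * dx <= 0.000001f)
            continue;

        // Compute the line through the two selected points.
        float aa = 0, bb = 0;
        lineplofit(pick_p, RANSAC_K, &aa, &bb);

        // Compute the distance from every point to that line and record
        // the minimum and maximum distances.
        float mind = FLT_MAX;
        float maxd = 0.0f;
        d_list.clear();
        for (int j = 0; j < n; j++)
        {
            float d = fabs(aa * p[j].x - p[j].y + bb) / sqrtf(aa * aa + 1.0f);
            d_list.push_back(d);
            mind = MIN(mind, d);
            maxd = MAX(maxd, d);
        }

        // Compute the threshold from alpha and the minimum and maximum distances.
        float threld = mind + (maxd - mind) * alpha;

        // A point whose distance is below the threshold is an inlier.
        inline_p.clear();
        for (int j = 0; j < n; j++)
        {
            if (d_list[j] < threld)
            {
                inline_p.push_back(p[j]);
            }
        }

        // Keep this inlier set if it is larger than the largest one so far.
        if (max_inline_num < inline_p.size())
        {
            max_inline_num = inline_p.size();
            max_inline_p.swap(inline_p);
        }
    }

    // If the largest inlier set has at least 2 points, fit the line to it;
    // otherwise RANSAC has failed and a, b stay at 0.
    if (max_inline_num >= RANSAC_K)
    {
        lineplofit(max_inline_p, (int)max_inline_p.size(), a, b);
    }
}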
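GetRansacRandomNum draws indices by retrying until an unused value appears, which is fine for two samples but gets slower as the number of samples approaches n. One alternative is a partial Fisher-Yates shuffle, sketched below with the standard `<random>` facilities. The function name `sampleDistinct` is hypothetical, and this is not the method used in the listing above.

```cpp
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Returns k distinct indices drawn uniformly from [0, n-1] using a partial
// Fisher-Yates shuffle. Requires 0 <= k <= n.
std::vector<int> sampleDistinct(int n, int k, std::mt19937& rng)
{
    assert(k >= 0 && k <= n);
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);        // 0, 1, ..., n-1
    for (int i = 0; i < k; ++i) {
        // Pick one of the not-yet-chosen positions i..n-1 for slot i.
        std::uniform_int_distribution<int> pick(i, n - 1);
        std::swap(idx[i], idx[pick(rng)]);
    }
    idx.resize(k);                               // keep the first k entries
    return idx;
}
```

The sketch allocates an index array of size n on every call, so for a fixed sample size of 2 the retry loop may still be the cheaper choice; the shuffle mainly helps when the model needs many sample points.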

03

Straight line fitting result

When there are few outliers, the least squares line fit and the fit that combines least squares with RANSAC give similar results. In the figure below, the blue line is the result of the least squares line fitting algorithm and the green line is the result of the algorithm that combines least squares and RANSAC; the two lines essentially coincide.

9f3b7832b98c25cdc3ee0194af99c289.png

When there are many outliers, the result of the least squares line fitting algorithm deviates considerably (blue line), while the algorithm that combines least squares and RANSAC does not (green line):

6b03e79eb5b2cf9f0bd786d179939ea5.png

04

Digression

Personally, I find it really difficult to run an official account well and to grow it. I was tired and busy with work, so I had not updated the account for a long time. I have now decided to resume updating because I am very happy to see that, even without updates, it has continued to help people. That was my original intention: to improve myself, to summarize what I learn, and to help others.

If you have read this far and found the article helpful, please share and repost it so that more people who need it can see it. Your support is my biggest motivation to keep updating!


Origin blog.csdn.net/shandianfengfan/article/details/130799228