Qingfeng Mathematical Modeling - Fitting Algorithm

Fitting algorithm

Concept

As mentioned in the previous chapter, an interpolation algorithm constructs a curve that passes through the given sample points, and that curve is then used to estimate the values we want. This approach has two problems. First, with many sample points the interpolating polynomial has a very high degree, which can cause Runge's phenomenon. Second, avoiding Runge's phenomenon by interpolating piecewise produces a curve whose function is very complicated.
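The first problem can be seen in a small experiment. The sketch below is only an illustration: the test function 1/(1+25x^2), the 11 equally spaced nodes and the variable names are assumptions chosen for the demonstration, not data from this chapter.

```matlab
% Runge's phenomenon: interpolate 1/(1+25x^2) at equally spaced nodes.
runge = @(x) 1 ./ (1 + 25 * x.^2);
xn = linspace(-1, 1, 11);          % 11 equally spaced sample points
p  = polyfit(xn, runge(xn), 10);   % degree 10, so the curve passes through all 11 points
xf = linspace(-1, 1, 1001);        % dense grid for measuring the error
max_err = max(abs(polyval(p, xf) - runge(xf)));
disp(max_err);                     % the largest errors occur near the ends of the interval
```

Although the polynomial matches every sample point, it oscillates strongly between the outer nodes, so the maximum error is large.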

To address these problems, a fitting problem does not require the curve to pass through the given points. The goal is instead to find a function (curve) that is as simple as possible and lies closest to all the data points under some criterion, that is, one that minimizes a loss function. As long as the error is small enough, the fit is acceptable. This is the idea of fitting.

Determine the fitting curve

Given a set of data points (x, y), find a curve that fits y as a function of x.

image-20230811201338259

Plot this set of data in MATLAB:

plot(x,y,'o');

image-20230811205402341

Next, fit a curve that comes close to the sample points. Here I use the simple fitting curve y=kx+b. The question is: for which values of k and b is the fitting curve closest to the sample points?

Geometric Interpretation of the Least Squares Method

image-20230811210132216

  • The first definition uses absolute values, which are not differentiable everywhere, so the subsequent derivation is harder. We therefore usually use the second definition, which is exactly the least squares method.
  • We do not use the third power: an odd power keeps the sign of each deviation between a sample point and the fitted curve, so positive and negative deviations cancel each other out.
  • We do not use the fourth power either: a single outlier far from the curve would then have a very large influence on the fitted curve.
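The three points above can be checked numerically. This is a minimal sketch with made-up residuals (the vectors r and r_out are hypothetical and not taken from the data set in this chapter).

```matlab
% Hypothetical residuals y_i - y_hat_i, for illustration only.
r = [-1; 1; -1; 1];
sum_abs  = sum(abs(r));   % absolute values: a valid distance, but not differentiable at 0
sum_sq   = sum(r.^2);     % squares: the least squares criterion
sum_cube = sum(r.^3);     % cubes: positive and negative residuals cancel

% Add one outlier and compare how much of each criterion it accounts for.
r_out = [r; 3];
share_sq     = r_out(end)^2 / sum(r_out.^2);   % 9/13 of the squared criterion
share_fourth = r_out(end)^4 / sum(r_out.^4);   % 81/85 of the fourth-power criterion
fprintf('abs = %d, squares = %d, cubes = %d\n', sum_abs, sum_sq, sum_cube);
fprintf('outlier share: squares %.2f, fourth power %.2f\n', share_sq, share_fourth);
```

The cubes sum to zero even though no point lies on the curve, and the single outlier dominates the fourth-power criterion far more than the squared one.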

image-20230811210708976

Solving the least squares problem

image-20230811210825318

The two formulas we finally arrive at are the expressions for the estimates of k and b.

  • These formulas are obtained by setting the partial derivatives of the loss with respect to k and b to zero and solving the resulting equations for the two coefficients.
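For reference, the derivation can be written out as follows. This is a sketch that assumes the loss is the sum of squared vertical deviations; the symbol L for the loss is introduced here for illustration, and the final expressions agree with the MATLAB code used in this section.

```latex
\begin{align*}
L(k,b) &= \sum_{i=1}^{n} (y_i - k x_i - b)^2 \\
\frac{\partial L}{\partial k} &= -2\sum_{i=1}^{n} x_i\,(y_i - k x_i - b) = 0 \\
\frac{\partial L}{\partial b} &= -2\sum_{i=1}^{n} (y_i - k x_i - b) = 0 \\
\intertext{Rearranging gives two linear equations in $k$ and $b$:}
k\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} x_i y_i \\
k\sum_{i=1}^{n} x_i + n\,b &= \sum_{i=1}^{n} y_i \\
\intertext{whose solution is}
\hat{k} &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} \\
\hat{b} &= \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}
\end{align*}
```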

Solving least squares in MATLAB

image-20230811211416247

The formulas translate directly into code:

plot(x,y,'o');
xlabel("x");
ylabel("y");
n=size(x,1); % number of data points (assumes x is a column vector)
k=(n*sum(x.*y)-sum(x)*sum(y))/(n*sum(x.*x)-sum(x)*sum(x));
b=(sum(x.*x)*sum(y)-sum(x)*sum(x.*y))/(n*sum(x.*x)-sum(x)*sum(x));
hold on; % keep the current figure so that later plots are drawn on top of it
grid on; % show grid lines
f=@(x) k*x+b; % f = k*x + b is an anonymous function; fplot can draw it from the handle alone
fplot(f,[2.5,7]);
legend('Sample data','Fitted function','location','southeast');
  1. f is an anonymous function, so its graph can be drawn from the function handle alone, without passing any sample points. Plotting in MATLAB normally requires passing data; under normal circumstances, f would have to be given its parameter before it could be drawn.

Basic usage of anonymous functions

handle = @(arglist) anonymous_function
  • handle is the variable that holds the anonymous function and is used to call it.

  • arglist is the list of input parameters; there can be one or several, separated by commas.

  • anonymous_function is the expression that the function evaluates.

  • By convention, leave a space between the parameter list and the expression. MATLAB does not require it, but it is easier to read.
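A few calls make the syntax concrete. The values of k and b and the second function g are hypothetical, chosen only for illustration.

```matlab
% Hypothetical slope and intercept, for illustration only.
k = 2;
b = 1;
f = @(x) k*x + b;          % one input; k and b are captured when f is created
g = @(x, y) x.^2 + y.^2;   % two inputs, separated by a comma

disp(f(3));                % scalar input
disp(f([0 1 2]));          % vector input: evaluated element by element
disp(g(3, 4));

k = 100;                   % changing k afterwards does not change f
disp(f(3));                % same value as the first call
```

Because the handle stores the values of k and b at the moment it is created, f should be defined after k and b have been computed.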

  2. fplot draws the graph of an anonymous function of one variable

Basic usage

fplot(f,xinterval) 
  • Plots the anonymous function f over the interval xinterval, where xinterval = [xmin xmax] gives the range of x values to draw.

image-20230811214612764
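As a cross-check on the closed-form expressions for k and b used in the code above, the same line can be obtained with MATLAB's built-in polyfit. The data below are hypothetical and stand in for the chapter's data set, which is not reproduced here.

```matlab
% Hypothetical sample data (column vectors), for illustration only.
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];

% Closed-form least squares estimates, as in the code above.
n = numel(x);
d = n*sum(x.*x) - sum(x)*sum(x);
k = (n*sum(x.*y) - sum(x)*sum(y)) / d;
b = (sum(x.*x)*sum(y) - sum(x)*sum(x.*y)) / d;

% polyfit with degree 1 returns the coefficients as [slope, intercept].
p = polyfit(x, y, 1);
fprintf('closed form: k = %.4f, b = %.4f\n', k, b);
fprintf('polyfit:     k = %.4f, b = %.4f\n', p(1), p(2));
```

The two lines of output should agree up to rounding, which is a quick way to catch a typo in the hand-written formulas.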

How to evaluate the quality of fitting

image-20230811214710021
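For reference, the three sums of squares can be written as below. The notation is an assumption about what the figure shows; it is chosen to match the MATLAB code later in this section, where the fitted values are y_hat and the sample mean is mean(y).

```latex
\begin{align*}
\mathrm{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 && \text{(total sum of squares)} \\
\mathrm{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 && \text{(error sum of squares)} \\
\mathrm{SSR} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 && \text{(regression sum of squares)} \\
R^2 &= \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
\end{align*}
```

Writing $y_i - \bar{y} = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$ and squaring gives SST = SSE + SSR plus a cross term; for the least squares line that cross term vanishes, because the residuals satisfy $\sum (y_i - \hat{y}_i) = 0$ and $\sum x_i (y_i - \hat{y}_i) = 0$.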

  • From the definitions of SST, SSE and SSR it can be shown that:
  1. SST = SSE + SSR (for a least squares fit that includes an intercept term, such as y = kx + b).
  2. Goodness of fit: R^2 = 1 - SSE/SST, with 0 <= R^2 <= 1. The smaller the sum of squared errors SSE, the closer R^2 is to 1 and the better the fit.
  3. Note: R^2 should only be used when the fitting function is linear in its parameters. For other fitting functions, compare SSE directly: the smaller the SSE, the better the fit.
  4. "Linear" here means linear in the parameters: each parameter appears only to the first power, is not multiplied or divided by another parameter, and does not appear inside a composite function. The parameters are not the independent variable x. In y = kx + b, for example, the parameters are k and b, as distinct from the independent variable x and the dependent variable y.
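To illustrate what "linear in the parameters" allows, the sketch below fits y = b0 + b1*x^2, which is nonlinear in x but linear in b0 and b1. The model, the generated data and the names A and theta are hypothetical. By contrast, a model such as y = a*exp(c*x) is not linear in its parameters, because c appears inside a composite function.

```matlab
% Hypothetical data generated exactly from y = 1 + 2*x.^2, for illustration only.
x = (0:0.5:5)';
y = 1 + 2*x.^2;

% One column per parameter: a column of ones for b0 and x.^2 for b1.
A = [ones(size(x)), x.^2];
theta = A \ y;                    % least squares solution [b0; b1]

y_hat = A * theta;
SSE = sum((y - y_hat).^2);
SST = sum((y - mean(y)).^2);
R_2 = 1 - SSE/SST;                % R^2 is meaningful here: linear in b0 and b1
fprintf('b0 = %.4f, b1 = %.4f, R^2 = %.4f\n', theta(1), theta(2), R_2);
```

Because the data were generated without noise, the recovered parameters should match the generating values and R^2 should be 1 up to rounding.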

image-20230811221145242

Code to calculate goodness of fit

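plot(x,y,'o');
xlabel("x");
ylabel("y");
n=size(x,1); % number of data points (assumes x is a column vector)
k=(n*sum(x.*y)-sum(x)*sum(y))/(n*sum(x.*x)-sum(x)*sum(x));
b=(sum(x.*x)*sum(y)-sum(x)*sum(x.*y))/(n*sum(x.*x)-sum(x)*sum(x));
hold on; % keep the current figure so that later plots are drawn on top of it
grid on; % show grid lines
f=@(x) k*x+b; % f = k*x + b is an anonymous function; fplot can draw it from the handle alone
fplot(f,[2.5,7]);
legend('Sample data','Fitted function','location','southeast');
y_hat=k*x+b; % fitted values at the sample points
SSR=sum((y_hat-mean(y)).^2); % regression sum of squares
SSE=sum((y-y_hat).^2); % error sum of squares
SST=sum((y-mean(y)).^2); % total sum of squares
disp(SST-SSE-SSR); % should be very close to 0
R_2=SSR/SST; % goodness of fit
disp(R_2);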

image-20230811222406484

  • SST-SSE-SSR does not come out as exactly 0 because floating-point arithmetic in MATLAB is subject to rounding error. The printed value, 5.6843e-14 (that is, 5.6843 × 10^-14), is very small, so the result is effectively 0.

  • The goodness of fit R^2 is 0.9635, which is close to 1, so the fitted function describes the data well.
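Because of this rounding, it is safer to compare SST-SSE-SSR against a small tolerance than to test it for exact equality with 0. The sketch below uses hypothetical data and an assumed tolerance of 1e-10 relative to SST; both are illustrative choices, not values from this chapter.

```matlab
% Hypothetical sample data (column vectors), for illustration only.
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];

p = polyfit(x, y, 1);          % least squares line, [slope, intercept]
y_hat = polyval(p, x);         % fitted values at the sample points

SST = sum((y - mean(y)).^2);   % total sum of squares
SSE = sum((y - y_hat).^2);     % error sum of squares
SSR = sum((y_hat - mean(y)).^2); % regression sum of squares

% Compare against a tolerance scaled by SST instead of testing for exact zero.
tol = 1e-10 * SST;             % assumed tolerance; adjust to the scale of the data
identity_holds = abs(SST - SSE - SSR) <= tol;
R_2 = 1 - SSE/SST;
fprintf('identity holds: %d, R^2 = %.4f\n', identity_holds, R_2);
```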

Origin blog.csdn.net/m0_71841506/article/details/132327028