Personal Notes:
These notes cover least squares, weighted least squares, and iteratively reweighted least squares. Because the goal is a working implementation, the text below mainly gives the derivation results, the code, and some practical applications. Articles and other reference material for the full derivations are listed at the end.
A recommended video walks through the derivation, using matrix partial derivatives to obtain $x = (A^TA)^{-1}A^TB$: Matrix Multiplication Derivation Video
One: Least Squares Method (OLS)
1: Overview
The method of least squares is a mathematical optimization technique that finds the function best fitting a set of data by minimizing the sum of squared errors. It gives a simple way to estimate unknown parameters so that the sum of squared errors between the fitted values and the measured values is as small as possible. For example, suppose the form of the objective function $y = a_0 + a_1 x + a_2 x^2$ is fixed, $x$ (the independent variable) and $y$ (the dependent variable) are measured values, and the three parameters $a_0, a_1, a_2$ are unknown. Three equations are enough to determine a unique solution for three unknowns. In practice, however, the parameters usually have to be estimated from an overdetermined system (more equations than unknowns), which generally has no exact solution. Least squares then gives the optimal solution in the squared-error sense. The algebraic and matrix solutions are given below. The matrix solution is recommended because it is much more convenient.
2: Algebraic formula
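The idea of the least squares method is to minimize the sum of squared differences between the observed values and the values predicted by the model.
Example: the most basic and most common case of curve fitting is fitting a straight line. Let the relationship between $x$ and $y$ be the linear function $y = f(a_0, a_1) = a_0 + a_1 x$. For $n$ sample points $(x_i, y_i)$ the sum of squared errors is
$$F(a_0, a_1) = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$$
Take the partial derivatives with respect to the unknown parameters $a_0$ and $a_1$ and set them to zero:
$$\frac{\partial F}{\partial a_0} = -2\sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right) = 0, \qquad \frac{\partial F}{\partial a_1} = -2\sum_{i=1}^{n} x_i\left(y_i - a_0 - a_1 x_i\right) = 0$$
Organize into a system of equations:
$$n\,a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
Then simplify to:
$$a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a_0 = \frac{\sum y_i - a_1 \sum x_i}{n}$$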
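The algebraic solution for the straight line can be sketched in a few lines of standard C++. This is only an illustrative sketch, not the article's implementation: the names `Line` and `fitLine` are made up for this example, and the code assumes at least two distinct x values so that the denominator is nonzero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Intercept a0 and slope a1 of the fitted line y = a0 + a1 * x.
// (Hypothetical helper type for this sketch.)
struct Line {
    double a0;
    double a1;
};

// Closed-form least-squares line fit obtained from the two normal equations:
//   a1 = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2),  a0 = (Sy - a1*Sx) / n
Line fitLine(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;  // zero only if all x are equal
    assert(denom != 0.0);
    const double a1 = (n * sxy - sx * sy) / denom;
    const double a0 = (sy - a1 * sx) / n;
    return Line{a0, a1};
}
```

Calling `fitLine` on points that lie exactly on a line recovers that line. With noisy data it returns the line with the smallest sum of squared vertical errors.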
3: Matrix (recommended)
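Example: the most basic and most common case of curve fitting is fitting a straight line. Let the relationship between $x$ and $y$ be the linear function $y = f(a_0, a_1) = a_0 + a_1 x$. In matrix form this is $Ax = B$, and we solve for the parameter vector $x$ (note that the vector $x$ here holds the unknown parameters, not the sample coordinates $x_i$):
$$A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \qquad x = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}, \qquad B = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
Links to the full derivation are given at the end. The result is:
$$A^TA\,x = A^TB \quad\Longrightarrow\quad x = (A^TA)^{-1}A^TB$$
For a univariate polynomial function:
$$y = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m$$
where $m$ is the degree of the polynomial. The sum of squared deviations between the sample points and the polynomial is $F(a_0, a_1, \dots, a_m)$, where $n$ is the number of sample points:
$$F(a_0, a_1, \dots, a_m) = \sum_{i=1}^{n} \Big(y_i - \sum_{j=0}^{m} a_j x_i^j\Big)^2$$
The matrix expression for a univariate polynomial has the same form as for the straight line, $Ax = B$, where row $i$ of $A$ is now $(1, x_i, x_i^2, \dots, x_i^m)$ and $x = (a_0, a_1, \dots, a_m)^T$.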
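To make the matrix form concrete without any library, the sketch below builds $A^TA$ and $A^TB$ for a polynomial of degree m, where row i of A is (1, x_i, ..., x_i^m), and solves the normal equations by Gaussian elimination. The function names `solveLinear` and `polyFit` are hypothetical and chosen for this example. Forming $A^TA$ explicitly is fine for small, well-scaled problems, but it can be numerically fragile for high degrees.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Solve the square system M * a = r by Gaussian elimination with partial pivoting.
// M and r are taken by value because they are modified during elimination.
std::vector<double> solveLinear(std::vector<std::vector<double>> M, std::vector<double> r)
{
    const std::size_t n = r.size();
    for (std::size_t c = 0; c < n; ++c) {
        std::size_t piv = c;
        for (std::size_t i = c + 1; i < n; ++i)
            if (std::fabs(M[i][c]) > std::fabs(M[piv][c])) piv = i;
        assert(std::fabs(M[piv][c]) > 1e-12);  // otherwise the system is singular
        std::swap(M[c], M[piv]);
        std::swap(r[c], r[piv]);
        for (std::size_t i = c + 1; i < n; ++i) {
            const double f = M[i][c] / M[c][c];
            for (std::size_t j = c; j < n; ++j) M[i][j] -= f * M[c][j];
            r[i] -= f * r[c];
        }
    }
    std::vector<double> a(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        double s = r[i];
        for (std::size_t j = i + 1; j < n; ++j) s -= M[i][j] * a[j];
        a[i] = s / M[i][i];
    }
    return a;
}

// Fit y = a_0 + a_1 x + ... + a_m x^m and return {a_0, ..., a_m}.
std::vector<double> polyFit(const std::vector<double>& x, const std::vector<double>& y, std::size_t m)
{
    assert(x.size() == y.size() && x.size() > m);
    std::vector<std::vector<double>> AtA(m + 1, std::vector<double>(m + 1, 0.0));
    std::vector<double> AtB(m + 1, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Row i of A is (1, x_i, x_i^2, ..., x_i^m).
        std::vector<double> row(m + 1, 1.0);
        for (std::size_t j = 1; j <= m; ++j) row[j] = row[j - 1] * x[i];
        for (std::size_t j = 0; j <= m; ++j) {
            for (std::size_t k = 0; k <= m; ++k) AtA[j][k] += row[j] * row[k];
            AtB[j] += row[j] * y[i];
        }
    }
    return solveLinear(AtA, AtB);  // solves (A^T A) a = A^T B
}
```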
3.1: Implementation code
/* Ordinary least squares, Ax = B
 *   (A^T * A) * x = A^T * B
 *   x = (A^T * A)^-1 * A^T * B
 * Assumes #include <Eigen/Dense> and using namespace Eigen;
 */
Array<double,Dynamic,1> GlobleFunction::leastSquares(Matrix<double,Dynamic,Dynamic> A, Matrix<double,Dynamic,1> B)
{
    // Number of rows and columns of A
    int rows = A.rows();
    int col = A.cols();
    // Transpose of A
    Matrix<double,Dynamic,Dynamic> AT;
    AT.resize(col,rows);
    // Solution vector x
    Array<double,Dynamic,1> x;
    x.resize(col,1);
    // Compute the transpose AT
    AT = A.transpose();
    // x = (A^T * A)^-1 * A^T * B
    x = ((AT * A).inverse()) * (AT * B);
    return x;
}
Matrix is the matrix class of the Eigen library, which is used here to simplify the linear algebra.
Two: Weighted Least Squares (WLS)
Weighted least squares is a mathematical optimization technique that weights the original model to obtain a new model without heteroskedasticity, and then estimates its parameters with ordinary least squares (source: Baidu Encyclopedia).
1: Adding the diagonal matrix W
On top of ordinary least squares, add a diagonal matrix $W$ that gives each data point its own weight.
Because $W$ is diagonal, $W^TW$ squares each diagonal entry, so the effective weights are never negative. The solution becomes $x = (A^TW^TWA)^{-1}A^TW^TWB$.
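For the straight-line case, the effect of the weights can be sketched without any matrix library. In the sketch below, `w[i]` stands for the i-th diagonal entry of W, so each sample enters the sums with the squared weight `w[i] * w[i]`. The names `WeightedLine` and `fitLineWeighted` are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Intercept and slope of the weighted fit (hypothetical helper type).
struct WeightedLine {
    double a0;
    double a1;
};

// Weighted straight-line fit minimizing sum_i (w[i] * (y[i] - a0 - a1 * x[i]))^2.
// w[i] plays the role of the i-th diagonal entry of W, so the effective weight is w[i]^2.
WeightedLine fitLineWeighted(const std::vector<double>& x,
                             const std::vector<double>& y,
                             const std::vector<double>& w)
{
    assert(x.size() == y.size() && x.size() == w.size());
    double s = 0.0, sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double w2 = w[i] * w[i];  // diagonal entry of W^T W
        s   += w2;
        sx  += w2 * x[i];
        sy  += w2 * y[i];
        sxx += w2 * x[i] * x[i];
        sxy += w2 * x[i] * y[i];
    }
    const double denom = s * sxx - sx * sx;
    assert(denom != 0.0);  // needs two distinct x values with nonzero weight
    const double a1 = (s * sxy - sx * sy) / denom;
    const double a0 = (sy - a1 * sx) / s;
    return WeightedLine{a0, a1};
}
```

Setting a weight to zero removes that sample from the fit entirely, and setting all weights to one reduces the result to ordinary least squares.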
1.1: Implementation code
/* Weighted least squares (WLS); W is a diagonal matrix
 *   minimize ||W * (A * x - B)||^2
 *   (A^T * W^T * W * A) * x = A^T * W^T * W * B
 *   x = (A^T * W^T * W * A)^-1 * A^T * W^T * W * B
 */
Array<double,Dynamic,1> GlobleFunction::reweightedLeastSquares(Matrix<double,Dynamic,Dynamic> A, Matrix<double,Dynamic,1> B,Array<double,Dynamic,1> vectorW)
{
    // Number of rows and columns of A
    int rows = A.rows();
    int col = A.cols();
    // If vectorW is empty (or all zeros), fall back to unit weights
    if(vectorW.isZero())
    {
        vectorW = Array<double,Dynamic,1>::Ones(rows);
    }
    // Transpose of A
    Matrix<double,Dynamic,Dynamic> AT;
    AT.resize(col,rows);
    // Solution vector x
    Array<double,Dynamic,1> x;
    x.resize(col,1);
    // W and its transpose
    Matrix<double,Dynamic,Dynamic> WT,W;
    W.resize(rows,rows);
    WT.resize(rows,rows);
    // Build the diagonal weight matrix
    W = vectorW.matrix().asDiagonal();
    // Transpose of W
    WT = W.transpose();
    // Transpose of A
    AT = A.transpose();
    // x = (A^T * W^T * W * A)^-1 * A^T * W^T * W * B
    x = ((AT * WT * W * A).inverse()) * (AT * WT * W * B);
    return x;
}
Three: Iterative Reweighted Least Squares (IRLS)
1: Iteratively reweighted least squares (IRLS, also called iterative weighted least squares) is used to solve certain optimization problems whose objective function has the form of a $p$-norm (Wikipedia). An ordinary least-squares fit treats all the data equally, so a few points that lie far from the overall trend can strongly distort the estimated parameters. For $p < 2$, IRLS counters this by giving smaller weights to points that are far from the current fit, so that they matter less, and larger weights to points that are close to it. It repeatedly solves a weighted least squares problem, updating the weights after each pass, until the estimate settles.
Each step of the iteration solves a weighted least squares problem of the form:
$$x^{(k+1)} = \arg\min_x \sum_{i=1}^{n} w_i^{(k)} \left(Ax - B\right)_i^2, \qquad w_i^{(k)} = \left|\left(Ax^{(k)} - B\right)_i\right|^{p-2}$$
2: The iteration below follows the paper: Burrus, C. S. (2014). Iterative Reweighted Least Squares.
MATLAB code 1:
% m-file IRLS0.m to find the optimal solution to Ax=b
% minimizing the L_p norm ||Ax-b||_p, using basic IRLS.
% csb 11/10/2012
function x = IRLS0(A,b,p,KK)
if nargin < 4, KK=10; end;
x = pinv(A)*b; % Initial L_2 solution
E = [];
for k = 1:KK % Iterate
e = A*x - b; % Error vector
w = abs(e).^((p-2)/2); % Error weights for IRLS
W = diag(w/sum(w)); % Normalize weight matrix
WA = W*A; % apply weights
x = (WA'*WA)\(WA'*W)*b; % weighted L_2 sol.
ee = norm(e,p); E = [E ee]; % Error at each iteration
end
plot(E)
MATLAB code 2:
% m-file IRLS1.m to find the optimal solution to Ax=b
% minimizing the L_p norm ||Ax-b||_p, using IRLS.
% Newton iterative update of solution, x, for M > N.
% For 2<p<infty, use homotopy parameter K = 1.01 to 2
% For 0<p<2, use K = approx 0.7 - 0.9
% csb 10/20/2012
function x = IRLS1(A,b,p,K,KK)
if nargin < 5, KK=10; end;
if nargin < 4, K = 1.5; end;
if nargin < 3, p = 10; end;
pk = 2; % Initial homotopy value
x = pinv(A)*b; % Initial L_2 solution
E = [];
for k = 1:KK % Iterate
if p >= 2, pk = min([p, K*pk]); % Homotopy change of p
else pk = max([p, K*pk]); end
e = A*x - b; % Error vector
w = abs(e).^((pk-2)/2); % Error weights for IRLS
W = diag(w/sum(w)); % Normalize weight matrix
WA = W*A; % apply weights
x1 = (WA'*WA)\(WA'*W)*b; % weighted L_2 sol.
q = 1/(pk-1); % Newton's parameter
if p > 2, x = q*x1 + (1-q)*x; nn=p; % partial update for p>2
else x = x1; nn=2; end % no partial update for p<2
ee = norm(e,nn); E = [E ee]; % Error at each iteration
end
plot(E)
C++ code:
/* Iteratively reweighted least squares (IRLS); W holds the weights, p is the norm
 *   e = Ax - B
 *   W = |e|^((p-2)/2)
 *   minimize ||W * (A * x - B)||^2
 *   (A^T * W^T * W * A) * x = A^T * W^T * W * B
 *   x = (A^T * W^T * W * A)^-1 * A^T * W^T * W * B
 * Reference paper: https://www.semanticscholar.org/paper/Iterative-Reweighted-Least-Squares-%E2%88%97-Burrus/9b9218e7233f4d0b491e1582c893c9a099470a73
 * qMin and qMax are Qt helpers declared in <QtGlobal>.
 */
Array<double,Dynamic,1> GlobleFunction::iterativeReweightedLeastSquares(Matrix<double,Dynamic,Dynamic> A, Matrix<double,Dynamic,1> B,double p,int kk)
{
    /* x(k) = q x1(k) + (1-q)x(k-1)
     * q = 1 / (pk-1)
     */
    // Number of rows and columns of A
    int rows = A.rows();
    int col = A.cols();
    double pk = 2;            // initial homotopy value
    double K = 1.5;           // homotopy growth factor
    double epsilong = 10e-9;  // ε: convergence tolerance (10e-9 equals 1e-8)
    double delta = 10e-15;    // δ: lower bound on |e| (10e-15 equals 1e-14)
    Array<double,Dynamic,1> x,_x,x1,e,w;
    x.resize(col,1);
    _x.resize(col,1);
    x1.resize(col,1);
    e.resize(rows,1);
    w.resize(rows,1);
    // Initial x: weighted least squares with unit weights
    x = reweightedLeastSquares(A,B);
    // Iterate at most kk times
    for(int i=0;i<kk;++i)
    {
        // Keep the previous x for the convergence check at the end of the loop
        _x = x;
        if(p>=2)
        {
            pk = qMin(p,K*pk);
        }
        else
        {
            pk = qMax(p,K*pk);
        }
        // Residual
        e = (A * x.matrix()) - B;
        // Absolute value of the residual
        // (for a Matrix, use e.cwiseAbs() or e.array().abs().matrix())
        e = e.abs();
        // Clamp residuals below delta to delta, so that a negative
        // exponent (p < 2) never divides by zero
        for(int j=0;j<e.rows();++j)
        {
            e(j,0) = qMax(delta,e(j,0));
        }
        // Weights |e|^((p-2)/2); note that they use the target p directly,
        // while pk only enters the partial-update factor q below
        w = e.pow(p/2.0-1);
        w = w / w.sum();
        x1 = reweightedLeastSquares(A,B,w);
        double q = 1 / (pk-1);
        if(p>2)
        {
            // Partial update for p > 2
            x = x1*q + x*(1-q);
        }
        else
        {
            x = x1;
        }
        // Stop once the change in x is below the tolerance
        if((x-_x).abs().sum()<epsilong)
        {
            return x;
        }
    }
    return x;
}
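The C++ implementation is essentially the same as the MATLAB code, with small changes based on Wikipedia and Burrus, C. S. (2014), Iterative Reweighted Least Squares: residuals smaller than δ are clamped to δ before the weights are computed, and the iteration stops early once the change in x falls below ε.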
Four: Application
The following applications all solve overdetermined systems, where the number of data points is greater than the number of unknown parameters.
1: Circle fitting (algorithm: iteratively reweighted least squares)
1: With ordinary least squares, the outlying noise points still distort the fit considerably. The fit is therefore refined below with the iteratively reweighted least squares algorithm.
2: Iteratively reweighted least squares
[Figures: circle fit after the 1st, 2nd, 3rd, 4th, ... and 20th iterations]
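The article does not show how the circle is turned into a system $Ax = B$. One common choice, assumed here purely for illustration, is the algebraic (Kåsa) formulation: writing the circle as $x^2 + y^2 + Dx + Ey + F = 0$ makes the problem linear in $(D, E, F)$, so each row of A is $(x_i, y_i, 1)$ and the right-hand side is $-(x_i^2 + y_i^2)$. The sketch below solves the resulting 3x3 normal equations by Cramer's rule. The names `Circle`, `det3` and `fitCircle` are hypothetical, and only the plain least squares step is shown, not the reweighting.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Center (cx, cy) and radius r (hypothetical helper type).
struct Circle {
    double cx, cy, r;
};

using Mat3 = std::array<std::array<double, 3>, 3>;

// Determinant of a 3x3 matrix by cofactor expansion along the first row.
double det3(const Mat3& m)
{
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Algebraic least-squares circle fit: x^2 + y^2 + D*x + E*y + F = 0.
Circle fitCircle(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 3);
    Mat3 N{};                   // A^T A, where row i of A is (x_i, y_i, 1)
    std::array<double, 3> g{};  // A^T B, where B_i = -(x_i^2 + y_i^2)
    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::array<double, 3> row{x[i], y[i], 1.0};
        const double b = -(x[i] * x[i] + y[i] * y[i]);
        for (int j = 0; j < 3; ++j) {
            for (int k = 0; k < 3; ++k) N[j][k] += row[j] * row[k];
            g[j] += row[j] * b;
        }
    }
    const double d = det3(N);
    assert(std::fabs(d) > 1e-12);  // the points must not all be collinear
    std::array<double, 3> p{};     // p = (D, E, F) by Cramer's rule
    for (int c = 0; c < 3; ++c) {
        Mat3 Nc = N;
        for (int j = 0; j < 3; ++j) Nc[j][c] = g[j];
        p[c] = det3(Nc) / d;
    }
    const double cx = -p[0] / 2.0;
    const double cy = -p[1] / 2.0;
    return Circle{cx, cy, std::sqrt(cx * cx + cy * cy - p[2])};
}
```

In an IRLS setting, the same sums would simply be accumulated with a per-point weight that is recomputed from the residuals after every pass.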
2: Straight line fitting (algorithm: iteratively reweighted least squares)
1: $y = a_0 + a_1 x$. The figure below uses ordinary least squares. The noise points far below the line pull the fit away from the dense bulk of the data.
2: Using iteratively reweighted least squares
[Figures: line fit after the 1st, ... and 100th iterations]
After $n$ iterations, the noise has essentially no effect on the overall fit, and the parameters obtained at that point are close to ideal.
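The behaviour described for the straight line can be reproduced with a small, library-free sketch. It is a simplified variant of the basic IRLS loop (no homotopy and no partial update), so it is not the article's implementation. The names `RobustLine`, `weightedFit` and `irlsLine` are hypothetical, and here `w[i]` is the effective weight $|e_i|^{p-2}$, that is, the squared diagonal entry of W.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Intercept and slope of the fitted line (hypothetical helper type).
struct RobustLine {
    double a0, a1;
};

// Weighted line fit minimizing sum_i w[i] * (y[i] - a0 - a1 * x[i])^2.
RobustLine weightedFit(const std::vector<double>& x, const std::vector<double>& y,
                       const std::vector<double>& w)
{
    double s = 0.0, sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        s   += w[i];
        sx  += w[i] * x[i];
        sy  += w[i] * y[i];
        sxx += w[i] * x[i] * x[i];
        sxy += w[i] * x[i] * y[i];
    }
    const double a1 = (s * sxy - sx * sy) / (s * sxx - sx * sx);
    return RobustLine{(sy - a1 * sx) / s, a1};
}

// Basic IRLS for min sum_i |y_i - a0 - a1 * x_i|^p.
RobustLine irlsLine(const std::vector<double>& x, const std::vector<double>& y,
                    double p, int maxIter)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double delta = 1e-8;  // floor on |e| so that p < 2 never divides by zero
    std::vector<double> w(x.size(), 1.0);
    RobustLine line = weightedFit(x, y, w);  // initial ordinary least-squares fit
    for (int k = 0; k < maxIter; ++k) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double e = std::fabs(y[i] - line.a0 - line.a1 * x[i]);
            w[i] = std::pow(std::max(e, delta), p - 2.0);  // effective weight |e|^(p-2)
        }
        line = weightedFit(x, y, w);
    }
    return line;
}
```

With p = 1, a single far-away point is progressively down-weighted and the fit moves back onto the remaining points. With p = 2 every weight equals one and the result is the ordinary least-squares line.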
3: Curve fitting (algorithm: least squares)
1: Least squares curve fitting: finding the best polynomial degree
Least squares is a common method for curve fitting, and choosing the matching function is critical when using it. The matching function is the curve whose path best matches the points in the plot.
A poor choice leads to overfitting or underfitting.
Linear (degree 1) fit:
Quadratic (degree 2) fit:
Cubic (degree 3) fit:
Quartic (degree 4) fit:
Quintic (degree 5) fit:
Sextic (degree 6) fit:
Septic (degree 7) fit:
Octic (degree 8) fit:
Nonic (degree 9) fit:
The fit is already very good at degree 5. Beyond that, the higher-degree fits increasingly overfit the data.
4: N-point calibration (including 9-point calibration) (algorithm: least squares)
The 9-point calibration finds the relationship between pixel coordinates and world coordinates in a vision system.
The HALCON operator vector_to_hom_mat2d computes this matrix with the least squares method. In the figure, the external algorithm [2] is implemented with the least squares algorithm from this article, while the internal algorithm [1] is implemented by computing partial derivatives. This article implements N-point calibration.
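As a rough sketch of what N-point calibration involves, the code below fits an affine map from pixel coordinates (u, v) to world coordinates (X, Y) by least squares. It assumes the relationship is affine, X = a·u + b·v + c and Y = d·u + e·v + f, which gives two least squares problems sharing the same matrix A with rows (u_i, v_i, 1). The names `Point2`, `Affine`, `determinant3` and `fitAffine` are hypothetical, and this is neither HALCON's implementation nor the article's.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 {
    double x, y;
};

// Affine map: X = m[0]*u + m[1]*v + m[2],  Y = m[3]*u + m[4]*v + m[5].
using Affine = std::array<double, 6>;
using Mat3x3 = std::array<std::array<double, 3>, 3>;

// Determinant of a 3x3 matrix by cofactor expansion along the first row.
double determinant3(const Mat3x3& m)
{
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Least-squares affine fit from N >= 3 non-collinear point correspondences.
Affine fitAffine(const std::vector<Point2>& pixel, const std::vector<Point2>& world)
{
    assert(pixel.size() == world.size() && pixel.size() >= 3);
    Mat3x3 N{};                        // A^T A, where row i of A is (u_i, v_i, 1)
    std::array<double, 3> gx{}, gy{};  // A^T X and A^T Y
    for (std::size_t i = 0; i < pixel.size(); ++i) {
        const std::array<double, 3> row{pixel[i].x, pixel[i].y, 1.0};
        for (int j = 0; j < 3; ++j) {
            for (int k = 0; k < 3; ++k) N[j][k] += row[j] * row[k];
            gx[j] += row[j] * world[i].x;
            gy[j] += row[j] * world[i].y;
        }
    }
    const double d = determinant3(N);
    assert(std::fabs(d) > 1e-12);      // pixel points must not be collinear
    Affine m{};
    for (int c = 0; c < 3; ++c) {      // Cramer's rule for both right-hand sides
        Mat3x3 Nx = N, Ny = N;
        for (int j = 0; j < 3; ++j) {
            Nx[j][c] = gx[j];
            Ny[j][c] = gy[j];
        }
        m[c] = determinant3(Nx) / d;
        m[c + 3] = determinant3(Ny) / d;
    }
    return m;
}
```

With nine correspondences (the usual 3x3 grid of calibration points) the system is overdetermined, so measurement noise in individual points is averaged out by the fit.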
Five: Summary
1: Tools: mainly Qt, the Eigen library, and the QCustomPlot class
The Eigen library is used for matrix and linear algebra computation
The QCustomPlot class is used for plotting and data visualization
2: The complete code above has been uploaded to GitHub
3: Reference
Least Squares Algebra Derivation
Least Squares Matrix Derivation
Least Squares?
The principle understanding of the absolute value regularization
robust learning algorithm least square method for Shenma is not bad
Understanding the least squares method from the perspective of maximum likelihood