[Mathematical knowledge] The least squares method: starting from linear regression, with numerical examples, solving the regression model by least squares

Articles in this series:

  1. [Mathematical knowledge] Degrees of freedom and how to calculate them
  2. [Mathematical knowledge] Rigid bodies and the motion of rigid bodies
  3. [Mathematical knowledge] Basic motions of rigid bodies: translation and rotation
  4. [Mathematical knowledge] Vector multiplication: inner product, outer product, and a MATLAB implementation
  5. [Mathematical knowledge] Covariance: the covariance of random variables, for scalar and vector random variables respectively
  6. [Mathematical knowledge] Derivation of the rotation matrix based on vector rotation, while addressing the nonlinear constraints of the Euclidean transformation

I checked my notes and found that the least squares method was already mentioned when discussing the SG filter. For convenience, here is a link to that article: [UWB] Savitzky Golay filter Explanation of the principle of the SG filter.

The least squares method is a mathematical optimization technique used to minimize the difference between predicted values and true values (usually expressed as the sum of squared residuals).

The core idea of the least squares method is to find the best parameter values of the model by minimizing the sum of squares of the differences between the predicted values and the true values.

Because the least squares method is an optimization technique, discussing it purely in the abstract is dry. Applying it to linear regression in regression analysis, where it is used to solve for the optimal parameters of the regression model, makes the idea behind the method much easier to grasp.

The original purpose of regression analysis is to estimate the parameters of a model so as to achieve the best fit to the data. In many cases, especially in linear regression, the least squares method provides a closed-form, analytical solution for the optimal parameters. For this reason, the least squares method is often used to fit straight lines and polynomials to a set of data points in order to predict or explain the behavior of variables.
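For instance, MATLAB's built-in polyfit performs exactly this kind of least squares fit of a polynomial (a straight line when the degree is 1). A minimal sketch, using the same four data points that are worked through by hand later in this article:

% Least squares fit of a degree-1 polynomial (a straight line)
x = [1 2 3 4];
y = [6 5 7 10];

p = polyfit(x, y, 1);   % p(1) is the slope, p(2) is the intercept
disp(p);                % should be approximately [1.4  3.5]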


In what follows, linear regression is introduced first, together with its two forms: the simple and the multiple linear regression model. Then, concrete numerical examples are worked through, solving for the optimal model parameters with the least squares method, to show how the method is used to determine model parameters.


1. Linear regression

When discussing Linear Regression, let’s discuss it together with Regression Analysis.

Linear regression and regression analysis are both commonly used methods in statistics, but there are some key differences between them:

By definition,

  • Linear Regression: a prediction method that tries to find the linear function, of one explanatory variable (simple linear regression) or of several explanatory variables (multiple linear regression), that best describes the relationship between the variables.
  • Regression analysis: a broader term for the family of statistical methods that describe the relationship between one variable (or several variables) and one or more other variables. Linear regression is just one form of regression analysis.

In terms of type,

  • Linear Regression: Mainly focuses on linear relationships.
  • Regression analysis: can include linear regression, polynomial regression, logistic regression, ridge regression and other types.

In terms of purpose,

  • Linear regression: Predicts or explains the relationship between a response variable and one or more predictor variables.
  • Regression analysis: Exploring and modeling relationships between variables, which may be linear, nonlinear, or other relationships.

In terms of application,

  • Linear Regression: Linear regression is usually used when we believe that there is a linear relationship between variables.
  • Regression analysis: can be applied to various relationships, including but not limited to linear relationships.

Simply put, linear regression is a subset of regression analysis. Regression analysis includes a variety of methods for modeling and explaining relationships between variables, while linear regression focuses specifically on linear relationships.


Given a random sample $(y, x_1, x_2, \cdots)$, a linear regression model assumes that the relationship between the response variable (often called the dependent variable or target) $y$ and the explanatory variables (often called the independent variables or features) $x_1, x_2, \cdots$ is not determined by the explanatory variables alone, but is also affected by other factors. We therefore add an error term $\epsilon$ (also a random variable) to capture any influence on $y$ other than that of $x_1, x_2, \cdots$.


However, in this article we are mainly interested in learning the regression model itself, so we leave out the influence of the error term $\epsilon$.

At the same time, according to the number of explanatory variables, linear regression models are also divided into simple linear regression and multiple linear regression.


1. Simple linear regression model

Describes the relationship between a response variable $y$ and a single explanatory variable $x$.

Assume that this relationship is linear, that is, a linear function

$$y = \beta_0 + \beta_1 x$$

where $y$ is the response variable, $x$ is the explanatory variable, and $\beta_0$ and $\beta_1$ are the regression coefficients.


2. Multiple linear regression model

Describes the relationship between a response variable $y$ and two or more explanatory variables $x_1, x_2, \cdots$.

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$


2. From linear regression to the least squares method: numerical examples, solving the regression model by least squares

The core idea of the least squares method is to find model parameters that minimize the sum of squares of the prediction errors.

Specifically, consider a linear model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

Our goal is to find a set of $\beta$ values such that the difference between the predicted value $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$ and the actual value $y$ is as small as possible. This difference can be measured by the residual sum of squares (RSS):

$$\text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The goal of the least squares method is to minimize RSS, that is, to find the $\beta$ values that make RSS as small as possible.
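As a side note (a standard result that is not derived in this article): if the samples are stacked into a design matrix $X$, with one row per sample and a leading 1 in each row for the intercept, and into a vector $y$, then the $\beta$ that minimizes RSS has the well-known closed-form expression

$$\hat{\beta} = (X^\top X)^{-1} X^\top y,$$

provided $X^\top X$ is invertible. The worked examples below reach the same answer by expanding RSS and setting its partial derivatives to zero.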

The intuitive idea behind the least squares method is to try to find a model that makes the predicted value as close as possible to the observed value. This method is based on the squared error loss.
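To make the squared error loss concrete, here is a minimal MATLAB sketch (the data are the four points used in the example below; the candidate parameters are arbitrary, not optimal) that evaluates RSS for a simple linear model:

% Illustrative data (the same four points as the worked example below)
x = [1 2 3 4]';
y = [6 5 7 10]';

% An arbitrary, non-optimal candidate parameter pair
beta_0 = 3;
beta_1 = 1;

% Predicted values and residual sum of squares
y_hat = beta_0 + beta_1 * x;
RSS = sum((y - y_hat).^2);
disp(RSS);

Trying a few different candidate pairs by hand quickly shows that some choices give a much smaller RSS than others; the least squares method finds the pair with the smallest RSS of all.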

Next, we will observe the role of the least squares method by giving specific numerical examples.


1. Single explanatory variable, single response variable

This regression model with a single explanatory variable and a single response variable can be written as

$$y = \beta_0 + \beta_1 x$$


Assume a simple linear regression model $y = \beta_0 + \beta_1 x$ and four data points $(x, y)$: $(1, 6)$, $(2, 5)$, $(3, 7)$, $(4, 10)$. Find the best-fitting parameters $\beta_0, \beta_1$.

Substituting each of the data points into this linear regression model gives

$$\begin{aligned} \beta_0 + \beta_1 \cdot 1 &= 6 \\ \beta_0 + \beta_1 \cdot 2 &= 5 \\ \beta_0 + \beta_1 \cdot 3 &= 7 \\ \beta_0 + \beta_1 \cdot 4 &= 10 \end{aligned}$$

The least squares method tries to minimize the sum of the squared differences between the two sides of each equation, that is, to find the minimum of the following function:

$$\begin{aligned} S(\beta_0, \beta_1) &= [6 - (\beta_0 + \beta_1 \cdot 1)]^2 \\ &+ [5 - (\beta_0 + \beta_1 \cdot 2)]^2 \\ &+ [7 - (\beta_0 + \beta_1 \cdot 3)]^2 \\ &+ [10 - (\beta_0 + \beta_1 \cdot 4)]^2 \end{aligned}$$

% Initialize the symbolic variables
syms beta_0 beta_1 real

% Define the objective function
f = (6 - (beta_0 + beta_1 * 1))^2 ... 
  + (5 - (beta_0 + beta_1 * 2))^2 ... 
  + (7 - (beta_0 + beta_1 * 3))^2 ... 
  + (10 - (beta_0 + beta_1 * 4))^2;

% Expand the expression
f_expanded = expand(f);

% You can try to simplify it further, but whether it reduces to the desired form is not guaranteed
f_simplified = simplify(f_expanded);

% Display the results
disp(f_expanded);
disp(f_simplified);
>>
4*beta_0^2 + 20*beta_0*beta_1 - 56*beta_0 + 30*beta_1^2 - 154*beta_1 + 210
 
4*beta_0^2 + 20*beta_0*beta_1 - 56*beta_0 + 30*beta_1^2 - 154*beta_1 + 210

The minimum can be found by taking the partial derivatives of $S(\beta_0, \beta_1)$ with respect to $\beta_0$ and $\beta_1$ and setting each of them equal to zero.

$$\begin{aligned} \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_0} &= 8 \beta_0 + 20 \beta_1 - 56 = 0 \\ \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} &= 20 \beta_0 + 60 \beta_1 - 154 = 0 \end{aligned}$$

% Partial derivative with respect to beta_0
df_dbeta_0 = diff(f, beta_0);

% Partial derivative with respect to beta_1
df_dbeta_1 = diff(f, beta_1);

disp(df_dbeta_0)

disp(df_dbeta_1)
>> 
8*beta_0 + 20*beta_1 - 56
20*beta_0 + 60*beta_1 - 154

Solving this system of two linear equations in two unknowns gives

$$\begin{aligned} \beta_0 &= 3.5 \\ \beta_1 &= 1.4 \end{aligned}$$

% Set the partial derivatives to zero and solve
solutions = solve([df_dbeta_0 == 0, df_dbeta_1 == 0], [beta_0, beta_1]);

% Display the solutions
disp(solutions.beta_0);

disp(solutions.beta_1);
>>
7/2
 
7/5

Therefore the optimal regression model is $y = 3.5 + 1.4x$.
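As a quick cross-check of this hand-derived result (a sketch, not part of the derivation above), the same least squares problem can be written as an overdetermined linear system and handed to MATLAB's backslash operator, which solves it in the least squares sense:

% Least squares fit of y = beta_0 + beta_1 * x in matrix form
x = [1 2 3 4]';
y = [6 5 7 10]';

% Design matrix: a column of ones for beta_0, the x values for beta_1
X = [ones(size(x)) x];

% Backslash solves the overdetermined system in the least squares sense;
% this should reproduce beta_0 = 3.5 and beta_1 = 1.4
beta = X \ y;
disp(beta);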

x = [1  2  3  4]';
y = [6  5  7  10]';

plot_x = linspace(1,4,50);
plot_y = 3.5 + 1.4 * plot_x;

scatter(x(1), y(1)); hold on;
scatter(x(2), y(2));
scatter(x(3), y(3));
scatter(x(4), y(4));
plot(plot_x, plot_y);
xlabel("$x$", "Interpreter","latex", "FontSize",16);
ylabel("$y$", "Interpreter","latex", "FontSize",16);
grid on;

(Figure: the four data points and the fitted line $y = 3.5 + 1.4x$, as produced by the code above.)


2. Multiple explanatory variables, single response variable

This regression model with multiple explanatory variables and a single response variable can be written as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$


Based on the above regression model, take the number of explanatory variables to be $p = 2$, so the model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$. Four data points $(x_1, x_2, y)$ were obtained in an experiment: $(1, 3, 6)$, $(2, 4, 5)$, $(3, 5, 7)$, $(4, 8, 10)$. Find the best-fitting parameters $\beta_0, \beta_1, \beta_2$.

Substituting each of the data points into this linear regression model gives

$$\begin{aligned} \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 3 &= 6 \\ \beta_0 + \beta_1 \cdot 2 + \beta_2 \cdot 4 &= 5 \\ \beta_0 + \beta_1 \cdot 3 + \beta_2 \cdot 5 &= 7 \\ \beta_0 + \beta_1 \cdot 4 + \beta_2 \cdot 8 &= 10 \end{aligned}$$

The corresponding sum-of-squared-differences function is

$$\begin{aligned} S(\beta_0, \beta_1, \beta_2) &= [6 - (\beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 3)]^2 \\ &+ [5 - (\beta_0 + \beta_1 \cdot 2 + \beta_2 \cdot 4)]^2 \\ &+ [7 - (\beta_0 + \beta_1 \cdot 3 + \beta_2 \cdot 5)]^2 \\ &+ [10 - (\beta_0 + \beta_1 \cdot 4 + \beta_2 \cdot 8)]^2 \end{aligned}$$

syms beta_0 beta_1 beta_2 real

f = (6 - (beta_0 + beta_1 * 1 + beta_2 * 3))^2 ...
  + (5 - (beta_0 + beta_1 * 2 + beta_2 * 4))^2 ...
  + (7 - (beta_0 + beta_1 * 3 + beta_2 * 5))^2 ... 
  + (10 - (beta_0 + beta_1 * 4 + beta_2 * 8))^2;

f_expanded = expand(f);

disp(f_expanded);
>> 
4*beta_0^2 + 20*beta_0*beta_1 + 40*beta_0*beta_2 - 56*beta_0 + 30*beta_1^2 + 116*beta_1*beta_2 - 154*beta_1 + 114*beta_2^2 - 306*beta_2 + 210

Again taking the partial derivatives and setting each of them equal to zero:

$$\begin{aligned} \frac{\partial S(\beta_0, \beta_1, \beta_2)}{\partial \beta_0} &= 8 \beta_0 + 20 \beta_1 + 40 \beta_2 - 56 = 0 \\ \frac{\partial S(\beta_0, \beta_1, \beta_2)}{\partial \beta_1} &= 20 \beta_0 + 60 \beta_1 + 116 \beta_2 - 154 = 0 \\ \frac{\partial S(\beta_0, \beta_1, \beta_2)}{\partial \beta_2} &= 40 \beta_0 + 116 \beta_1 + 228 \beta_2 - 306 = 0 \end{aligned}$$

df_dbeta_0 = diff(f, beta_0);
df_dbeta_1 = diff(f, beta_1);
df_dbeta_2 = diff(f, beta_2);

disp(df_dbeta_0)
disp(df_dbeta_1)
disp(df_dbeta_2)
>> 
8*beta_0 + 20*beta_1 + 40*beta_2 - 56
20*beta_0 + 60*beta_1 + 116*beta_2 - 154
40*beta_0 + 116*beta_1 + 228*beta_2 - 306

Solving this system of three linear equations in three unknowns gives

$$\begin{aligned} \beta_0 &= 2 \\ \beta_1 &= -1 \\ \beta_2 &= 1.5 \end{aligned}$$

solutions = solve([df_dbeta_0 == 0, df_dbeta_1 == 0, df_dbeta_2 == 0], [beta_0, beta_1, beta_2]);

disp(solutions.beta_0);
disp(solutions.beta_1);
disp(solutions.beta_2);
>>
2
-1
3/2

Therefore the optimal regression model is $y = 2 - x_1 + 1.5 x_2$.
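The same cross-check works here (again just a sketch, separate from the derivation above): stacking the data into a design matrix and using the backslash operator should reproduce the parameters found above.

% Least squares fit of y = beta_0 + beta_1*x_1 + beta_2*x_2 in matrix form
x_1 = [1 2 3 4]';
x_2 = [3 4 5 8]';
y = [6 5 7 10]';

% Design matrix: a column of ones, then x_1, then x_2
X = [ones(size(x_1)) x_1 x_2];

% Backslash solves the overdetermined system in the least squares sense;
% this should reproduce beta_0 = 2, beta_1 = -1, beta_2 = 1.5
beta = X \ y;
disp(beta);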

x_1 = [1  2  3  4]';
x_2 = [3  4  5  8]';
y = [6  5  7  10]';

plot_x_1 = linspace(1,4,50);
plot_x_2 = linspace(3,8,50);
plot_y = 2 - 1 * plot_x_1 + 1.5 * plot_x_2;

figure()
subplot(2,2,1)
scatter3(x_1(1), x_2(1), y(1)); hold on;
scatter3(x_1(2), x_2(2), y(2));
scatter3(x_1(3), x_2(3), y(3));
scatter3(x_1(4), x_2(4), y(4));
plot3(plot_x_1, plot_x_2, plot_y);
xlabel("$x_1$", "Interpreter","latex", "FontSize",16);
ylabel("$x_2$", "Interpreter","latex", "FontSize",16);
zlabel("$y$", "Interpreter","latex", "FontSize",16);
grid on;

subplot(2,2,2)
scatter(x_1(1), y(1)); hold on;
scatter(x_1(2), y(2));
scatter(x_1(3), y(3));
scatter(x_1(4), y(4));
plot(plot_x_1, plot_y);
xlabel("$x_1$", "Interpreter","latex", "FontSize",16);
ylabel("$y$", "Interpreter","latex", "FontSize",16);
grid on;

subplot(2,2,3)
scatter(x_2(1), y(1)); hold on;
scatter(x_2(2), y(2));
scatter(x_2(3), y(3));
scatter(x_2(4), y(4));
plot(plot_x_2, plot_y);
xlabel("$x_2$", "Interpreter","latex", "FontSize",16);
ylabel("$y$", "Interpreter","latex", "FontSize",16);
grid on;

subplot(2,2,4)
scatter(x_1(1), x_2(1)); hold on;
scatter(x_1(2), x_2(2));
scatter(x_1(3), x_2(3));
scatter(x_1(4), x_2(4));
plot(plot_x_1, plot_x_2);
xlabel("$x_1$", "Interpreter","latex", "FontSize",16);
ylabel("$x_2$", "Interpreter","latex", "FontSize",16);
grid on;

(Figure: a 2-by-2 grid of plots produced by the code above: the data points and fitted model in 3D, $y$ against $x_1$, $y$ against $x_2$, and $x_2$ against $x_1$.)


