[Mathematical knowledge] The least squares method: starting from linear regression, with numerical examples, solving the regression model by least squares

Articles in this series:

  1. [Mathematical knowledge] Degrees of freedom and how to calculate them
  2. [Mathematical knowledge] Rigid bodies and the motion of rigid bodies
  3. [Mathematical knowledge] Basic motions of rigid bodies: translation and rotation
  4. [Mathematical knowledge] Vector multiplication: inner product, outer product, and a MATLAB implementation
  5. [Mathematical knowledge] Covariance: the covariance of random variables, for scalar and vector random variables respectively
  6. [Mathematical knowledge] Derivation of the rotation matrix based on vector rotation, while addressing the nonlinear constraints of the Euclidean transformation

I checked my notes and found that the least squares method was already mentioned when discussing the SG filter. For convenience, here is a link to that article: [UWB] Savitzky Golay filter Explanation of the principle of the SG filter.

The least squares method is a mathematical optimization technique used to minimize the difference between predicted values and true values (usually expressed as the sum of squared residuals).

The core idea of the least squares method is to find the best parameter values of the model by minimizing the sum of squares of the differences between the predicted values and the true values.

Because the least squares method is an optimization technique, discussing it purely in the abstract is dry. Applying it to linear regression in regression analysis, where it is used to solve for the optimal parameters of the regression model, makes the idea behind the method much easier to grasp.

The original purpose of regression analysis is to estimate the parameters of a model so as to achieve the best fit to the data. In many cases, especially in linear regression, the least squares method provides a closed-form, analytical solution for the optimal parameters. For this reason, the least squares method is often used to fit straight lines and polynomials to a set of data points in order to predict or explain the behavior of variables.
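For instance, MATLAB's built-in polyfit performs exactly this kind of least squares fit of a polynomial (a straight line when the degree is 1). A minimal sketch, using the same four data points that are worked through by hand later in this article:

% Least squares fit of a degree-1 polynomial (a straight line)
x = [1 2 3 4];
y = [6 5 7 10];

p = polyfit(x, y, 1);   % p(1) is the slope, p(2) is the intercept
disp(p);                % should be approximately [1.4  3.5]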


In what follows, linear regression is introduced first, together with its two forms: the simple and the multiple linear regression model. Then, concrete numerical examples are worked through, solving for the optimal model parameters with the least squares method, to show how the method is used to determine model parameters.


1. Linear regression

When discussing Linear Regression, let’s discuss it together with Regression Analysis.

Linear regression and regression analysis are both commonly used methods in statistics, but there are some key differences between them:

By definition,

  • Linear Regression: a prediction method that tries to find the linear function, of one explanatory variable (simple linear regression) or of several explanatory variables (multiple linear regression), that best describes the relationship between the variables.
  • Regression analysis: a broader term for the family of statistical methods that describe the relationship between one variable (or several variables) and one or more other variables. Linear regression is just one form of regression analysis.

In terms of type,

  • Linear Regression: Mainly focuses on linear relationships.
  • Regression analysis: can include linear regression, polynomial regression, logistic regression, ridge regression and other types.

In terms of purpose,

  • Linear regression: Predicts or explains the relationship between a response variable and one or more predictor variables.
  • Regression analysis: Exploring and modeling relationships between variables, which may be linear, nonlinear, or other relationships.

In terms of application,

  • Linear Regression: Linear regression is usually used when we believe that there is a linear relationship between variables.
  • Regression analysis: can be applied to various relationships, including but not limited to linear relationships.

Simply put, linear regression is a subset of regression analysis. Regression analysis includes a variety of methods for modeling and explaining relationships between variables, while linear regression focuses specifically on linear relationships.


Given a random sample $(y, x_1, x_2, \cdots)$, a linear regression model assumes that the relationship between the response variable (often called the dependent variable or target) $y$ and the explanatory variables (often called the independent variables or features) $x_1, x_2, \cdots$ is not determined by the explanatory variables alone, but is also affected by other factors. We therefore add an error term $\epsilon$ (also a random variable) to capture any influence on $y$ other than that of $x_1, x_2, \cdots$.


However, in this article we are mainly interested in learning the regression model itself, so we leave out the influence of the error term $\epsilon$.

At the same time, according to the number of explanatory variables, linear regression models are also divided into simple linear regression and multiple linear regression.


1. Simple linear regression model

Describes the relationship between a response variable $y$ and a single explanatory variable $x$.

Assume that this relationship is linear, that is, a linear function

$$y = \beta_0 + \beta_1 x$$

where $y$ is the response variable, $x$ is the explanatory variable, and $\beta_0$ and $\beta_1$ are the regression coefficients.


2. Multiple linear regression model

Describes the relationship between a response variable $y$ and two or more explanatory variables $x_1, x_2, \cdots$.

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$


2. From linear regression to the least squares method: numerical examples, solving the regression model by least squares

The core idea of the least squares method is to find model parameters that minimize the sum of squares of the prediction errors.

Specifically, consider a linear model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

Our goal is to find a set of $\beta$ values such that the difference between the predicted value $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$ and the actual value $y$ is as small as possible. This difference can be measured by the residual sum of squares (RSS):

$$\text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The goal of the least squares method is to minimize RSS, that is, to find the $\beta$ values that make RSS as small as possible.
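As a side note (a standard result that is not derived in this article): if the samples are stacked into a design matrix $X$, with one row per sample and a leading 1 in each row for the intercept, and into a vector $y$, then the $\beta$ that minimizes RSS has the well-known closed-form expression

$$\hat{\beta} = (X^\top X)^{-1} X^\top y,$$

provided $X^\top X$ is invertible. The worked examples below reach the same answer by expanding RSS and setting its partial derivatives to zero.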

The intuitive idea behind the least squares method is to try to find a model that makes the predicted value as close as possible to the observed value. This method is based on the squared error loss.
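To make the squared error loss concrete, here is a minimal MATLAB sketch (the data are the four points used in the example below; the candidate parameters are arbitrary, not optimal) that evaluates RSS for a simple linear model:

% Illustrative data (the same four points as the worked example below)
x = [1 2 3 4]';
y = [6 5 7 10]';

% An arbitrary, non-optimal candidate parameter pair
beta_0 = 3;
beta_1 = 1;

% Predicted values and residual sum of squares
y_hat = beta_0 + beta_1 * x;
RSS = sum((y - y_hat).^2);
disp(RSS);

Trying a few different candidate pairs by hand quickly shows that some choices give a much smaller RSS than others; the least squares method finds the pair with the smallest RSS of all.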

Next, we will observe the role of the least squares method by giving specific numerical examples.


1. Single explanatory variable, single response variable

This regression model with a single explanatory variable and a single response variable can be written as

$$y = \beta_0 + \beta_1 x$$


Assume a simple linear regression model $y = \beta_0 + \beta_1 x$ and four data points $(x, y)$: $(1, 6)$, $(2, 5)$, $(3, 7)$, $(4, 10)$. Find the best-fitting parameters $\beta_0, \beta_1$.

Substituting each of the data points into this linear regression model gives

$$\begin{aligned} \beta_0 + \beta_1 \cdot 1 &= 6 \\ \beta_0 + \beta_1 \cdot 2 &= 5 \\ \beta_0 + \beta_1 \cdot 3 &= 7 \\ \beta_0 + \beta_1 \cdot 4 &= 10 \end{aligned}$$

The least squares method tries to minimize the sum of the squared differences between the two sides of each equation, that is, to find the minimum of the following function:

$$\begin{aligned} S(\beta_0, \beta_1) &= [6 - (\beta_0 + \beta_1 \cdot 1)]^2 \\ &+ [5 - (\beta_0 + \beta_1 \cdot 2)]^2 \\ &+ [7 - (\beta_0 + \beta_1 \cdot 3)]^2 \\ &+ [10 - (\beta_0 + \beta_1 \cdot 4)]^2 \end{aligned}$$

% Initialize the symbolic variables
syms beta_0 beta_1 real

% Define the objective function
f = (6 - (beta_0 + beta_1 * 1))^2 ... 
  + (5 - (beta_0 + beta_1 * 2))^2 ... 
  + (7 - (beta_0 + beta_1 * 3))^2 ... 
  + (10 - (beta_0 + beta_1 * 4))^2;

% Expand the expression
f_expanded = expand(f);

% You can try to simplify it further, but whether it reduces to the desired form is not guaranteed
f_simplified = simplify(f_expanded);

% Display the results
disp(f_expanded);
disp(f_simplified);
>>
4*beta_0^2 + 20*beta_0*beta_1 - 56*beta_0 + 30*beta_1^2 - 154*beta_1 + 210
 
4*beta_0^2 + 20*beta_0*beta_1 - 56*beta_0 + 30*beta_1^2 - 154*beta_1 + 210

The minimum can be found by taking the partial derivatives of $S(\beta_0, \beta_1)$ with respect to $\beta_0$ and $\beta_1$ and setting each of them equal to zero.

$$\begin{aligned} \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_0} &= 8 \beta_0 + 20 \beta_1 - 56 = 0 \\ \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} &= 20 \beta_0 + 60 \beta_1 - 154 = 0 \end{aligned}$$

% Partial derivative with respect to beta_0
df_dbeta_0 = diff(f, beta_0);

% Partial derivative with respect to beta_1
df_dbeta_1 = diff(f, beta_1);

disp(df_dbeta_0)

disp(df_dbeta_1)
>> 
8*beta_0 + 20*beta_1 - 56
20*beta_0 + 60*beta_1 - 154

Solving this system of two linear equations in two unknowns gives

$$\begin{aligned} \beta_0 &= 3.5 \\ \beta_1 &= 1.4 \end{aligned}$$

% Set the partial derivatives to zero and solve
solutions = solve([df_dbeta_0 == 0, df_dbeta_1 == 0], [beta_0, beta_1]);

% Display the solutions
disp(solutions.beta_0);

disp(solutions.beta_1);
>>
7/2
 
7/5

Therefore the optimal regression model is $y = 3.5 + 1.4x$.
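As a quick cross-check of this hand-derived result (a sketch, not part of the derivation above), the same least squares problem can be written as an overdetermined linear system and handed to MATLAB's backslash operator, which solves it in the least squares sense:

% Least squares fit of y = beta_0 + beta_1 * x in matrix form
x = [1 2 3 4]';
y = [6 5 7 10]';

% Design matrix: a column of ones for beta_0, the x values for beta_1
X = [ones(size(x)) x];

% Backslash solves the overdetermined system in the least squares sense;
% this should reproduce beta_0 = 3.5 and beta_1 = 1.4
beta = X \ y;
disp(beta);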

x = [1  2  3  4]';
y = [6  5  7  10]';

plot_x = linspace(1,4,50);
plot_y = 3.5 + 1.4 * plot_x;

scatter(x(1), y(1)); hold on;
scatter(x(2), y(2));
scatter(x(3), y(3));
scatter(x(4), y(4));
plot(plot_x, plot_y);
xlabel("$x$", "Interpreter","latex", "FontSize",16);
ylabel("$y$", "Interpreter","latex", "FontSize",16);
grid on;

(Figure: the four data points and the fitted line $y = 3.5 + 1.4x$, as produced by the code above.)


2. Multiple explanatory variables, single response variable

This regression model with multiple explanatory variables and a single response variable can be written as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$


Based on the above regression model, take the number of explanatory variables to be $p = 2$, so the model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$. Four data points $(x_1, x_2, y)$ were obtained in an experiment: $(1, 3, 6)$, $(2, 4, 5)$, $(3, 5, 7)$, $(4, 8, 10)$. Find the best-fitting parameters $\beta_0, \beta_1, \beta_2$.

Substituting each of the data points into this linear regression model gives

$$\begin{aligned} \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 3 &= 6 \\ \beta_0 + \beta_1 \cdot 2 + \beta_2 \cdot 4 &= 5 \\ \beta_0 + \beta_1 \cdot 3 + \beta_2 \cdot 5 &= 7 \\ \beta_0 + \beta_1 \cdot 4 + \beta_2 \cdot 8 &= 10 \end{aligned}$$

The corresponding sum-of-squared-differences function is

$$\begin{aligned} S(\beta_0, \beta_1, \beta_2) &= [6 - (\beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 3)]^2 \\ &+ [5 - (\beta_0 + \beta_1 \cdot 2 + \beta_2 \cdot 4)]^2 \\ &+ [7 - (\beta_0 + \beta_1 \cdot 3 + \beta_2 \cdot 5)]^2 \\ &+ [10 - (\beta_0 + \beta_1 \cdot 4 + \beta_2 \cdot 8)]^2 \end{aligned}$$

syms beta_0 beta_1 beta_2 real

f = (6 - (beta_0 + beta_1 * 1 + beta_2 * 3))^2 ...
  + (5 - (beta_0 + beta_1 * 2 + beta_2 * 4))^2 ...
  + (7 - (beta_0 + beta_1 * 3 + beta_2 * 5))^2 ... 
  + (10 - (beta_0 + beta_1 * 4 + beta_2 * 8))^2;

f_expanded = expand(f);

disp(f_expanded);
>> 
4*beta_0^2 + 20*beta_0*beta_1 + 40*beta_0*beta_2 - 56*beta_0 + 30*beta_1^2 + 116*beta_1*beta_2 - 154*beta_1 + 114*beta_2^2 - 306*beta_2 + 210

Again taking the partial derivatives and setting each of them equal to zero:

$$\begin{aligned} \frac{\partial S(\beta_0, \beta_1, \beta_2)}{\partial \beta_0} &= 8 \beta_0 + 20 \beta_1 + 40 \beta_2 - 56 = 0 \\ \frac{\partial S(\beta_0, \beta_1, \beta_2)}{\partial \beta_1} &= 20 \beta_0 + 60 \beta_1 + 116 \beta_2 - 154 = 0 \\ \frac{\partial S(\beta_0, \beta_1, \beta_2)}{\partial \beta_2} &= 40 \beta_0 + 116 \beta_1 + 228 \beta_2 - 306 = 0 \end{aligned}$$

df_dbeta_0 = diff(f, beta_0);
df_dbeta_1 = diff(f, beta_1);
df_dbeta_2 = diff(f, beta_2);

disp(df_dbeta_0)
disp(df_dbeta_1)
disp(df_dbeta_2)
>> 
8*beta_0 + 20*beta_1 + 40*beta_2 - 56
20*beta_0 + 60*beta_1 + 116*beta_2 - 154
40*beta_0 + 116*beta_1 + 228*beta_2 - 306

Solving this system of three linear equations in three unknowns gives

$$\begin{aligned} \beta_0 &= 2 \\ \beta_1 &= -1 \\ \beta_2 &= 1.5 \end{aligned}$$

solutions = solve([df_dbeta_0 == 0, df_dbeta_1 == 0, df_dbeta_2 == 0], [beta_0, beta_1, beta_2]);

disp(solutions.beta_0);
disp(solutions.beta_1);
disp(solutions.beta_2);
>>
2
-1
3/2

Therefore the optimal regression model is $y = 2 - x_1 + 1.5 x_2$.
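The same cross-check works here (again just a sketch, separate from the derivation above): stacking the data into a design matrix and using the backslash operator should reproduce the parameters found above.

% Least squares fit of y = beta_0 + beta_1*x_1 + beta_2*x_2 in matrix form
x_1 = [1 2 3 4]';
x_2 = [3 4 5 8]';
y = [6 5 7 10]';

% Design matrix: a column of ones, then x_1, then x_2
X = [ones(size(x_1)) x_1 x_2];

% Backslash solves the overdetermined system in the least squares sense;
% this should reproduce beta_0 = 2, beta_1 = -1, beta_2 = 1.5
beta = X \ y;
disp(beta);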

x_1 = [1  2  3  4]';
x_2 = [3  4  5  8]';
y = [6  5  7  10]';

plot_x_1 = linspace(1,4,50);
plot_x_2 = linspace(3,8,50);
plot_y = 2 - 1 * plot_x_1 + 1.5 * plot_x_2;

figure()
subplot(2,2,1)
scatter3(x_1(1), x_2(1), y(1)); hold on;
scatter3(x_1(2), x_2(2), y(2));
scatter3(x_1(3), x_2(3), y(3));
scatter3(x_1(4), x_2(4), y(4));
plot3(plot_x_1, plot_x_2, plot_y);
xlabel("$x_1$", "Interpreter","latex", "FontSize",16);
ylabel("$x_2$", "Interpreter","latex", "FontSize",16);
zlabel("$y$", "Interpreter","latex", "FontSize",16);
grid on;

subplot(2,2,2)
scatter(x_1(1), y(1)); hold on;
scatter(x_1(2), y(2));
scatter(x_1(3), y(3));
scatter(x_1(4), y(4));
plot(plot_x_1, plot_y);
xlabel("$x_1$", "Interpreter","latex", "FontSize",16);
ylabel("$y$", "Interpreter","latex", "FontSize",16);
grid on;

subplot(2,2,3)
scatter(x_2(1), y(1)); hold on;
scatter(x_2(2), y(2));
scatter(x_2(3), y(3));
scatter(x_2(4), y(4));
plot(plot_x_2, plot_y);
xlabel("$x_2$", "Interpreter","latex", "FontSize",16);
ylabel("$y$", "Interpreter","latex", "FontSize",16);
grid on;

subplot(2,2,4)
scatter(x_1(1), x_2(1)); hold on;
scatter(x_1(2), x_2(2));
scatter(x_1(3), x_2(3));
scatter(x_1(4), x_2(4));
plot(plot_x_1, plot_x_2);
xlabel("$x_1$", "Interpreter","latex", "FontSize",16);
ylabel("$x_2$", "Interpreter","latex", "FontSize",16);
grid on;

(Figure: a 2-by-2 grid of plots produced by the code above: the data points and fitted model in 3D, $y$ against $x_1$, $y$ against $x_2$, and $x_2$ against $x_1$.)


