[Mathematical knowledge] Least squares method: the general linear case, the matrix representation, and the derivation of the optimal-parameter formula

Other articles in this series:

  1. [Mathematical knowledge] Degrees of freedom and how to calculate them
  2. [Mathematical knowledge] Rigid bodies and the motion of rigid bodies
  3. [Mathematical knowledge] Basic motions of a rigid body: translation and rotation
  4. [Mathematical knowledge] Vector multiplication: inner product, outer product, and a MATLAB implementation
  5. [Mathematical knowledge] Covariance: the covariance of random variables, for scalar and vector random variables
  6. [Mathematical knowledge] Deriving the rotation matrix from vector rotation, and the nonlinear limitation of the Euclidean transformation

Friendly reminder: please first read the article [Mathematical knowledge] Least squares method: starting from linear regression, with a numerical example and a least-squares solution of the regression model, and then read this one.

The previous article, [Mathematical knowledge] Least squares method: starting from linear regression, with a numerical example and a least-squares solution of the regression model, analyzed the linear regression model from the perspective of regression analysis, gave concrete numerical examples, and used the least squares method to solve for the optimal parameters of the model.

In that regression analysis, the examples used were:

  • Simple linear regression model (single explanatory variable, single response variable)
  • Multiple linear regression model (multiple explanatory variables, single response variable)

However, both have only a single response variable, which makes a matrix description inconvenient.

In practical applications, regression models with multiple explanatory variables and multiple response variables are more common, and matrix operations are used as well, which both saves computing resources and speeds up the calculation.

Therefore, this article starts from the case of multiple explanatory variables and multiple response variables, focusing on how to write the model in matrix form and on the derivations that follow from the matrix form.


1. Multiple explanatory variables and multiple response variables

Building on the earlier (single explanatory variable, single response variable) cases, we extend to the more general linear case.

This kind of regression model, with multiple explanatory variables and multiple response variables, can be written as

$$
\begin{aligned}
y_1 &= \beta_{0} + \beta_{1} x_{11} + \beta_{2} x_{12} + \cdots + \beta_{p} x_{1p} \\
y_2 &= \beta_{0} + \beta_{1} x_{21} + \beta_{2} x_{22} + \cdots + \beta_{p} x_{2p} \\
y_3 &= \beta_{0} + \beta_{1} x_{31} + \beta_{2} x_{32} + \cdots + \beta_{p} x_{3p} \\
&\;\;\vdots \\
y_m &= \beta_{0} + \beta_{1} x_{m1} + \beta_{2} x_{m2} + \cdots + \beta_{p} x_{mp}
\end{aligned}
$$

You can see that this regression model has, in total:

  • $m$ response variables $y_1, \cdots, y_m$
  • $m \times p$ explanatory variables $x_{11}, \cdots, x_{mp}$
  • $1$ parameter $\beta_{0}$
  • $p$ parameters $\beta_{1}, \cdots, \beta_{p}$


Next, we write this in matrix form. Let

$$
Y = \left[\begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{matrix}\right]_{m \times 1}, \quad
X = \left[\begin{matrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mp} \end{matrix}\right]_{m \times p}, \quad
\beta = \left[\begin{matrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{matrix}\right]_{p \times 1}
$$

Then

$$
Y = X \beta + \beta_0
$$

Although this form looks simple and clear, it still carries the extra term $\beta_0$. We can absorb $\beta_0$ into $\beta$ by rewriting the matrices. Let

$$
Y = \left[\begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{matrix}\right]_{m \times 1}, \quad
X = \left[\begin{matrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m1} & x_{m2} & \cdots & x_{mp} \end{matrix}\right]_{m \times (p+1)}, \quad
\beta = \left[\begin{matrix} \beta_{0} \\ \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{matrix}\right]_{(p+1) \times 1}
$$

Then the model becomes

$$
Y = X \beta
$$
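To make the matrix form concrete, here is a minimal NumPy sketch (the numbers are made up purely for illustration) that builds $Y$, the augmented design matrix $X$ with a leading column of ones, and evaluates $X\beta$ for a candidate $\beta$:

```python
import numpy as np

# Hypothetical data: m = 4 observations, p = 2 explanatory variables each.
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])               # shape (m, p)
Y = np.array([[3.1], [3.9], [6.2], [9.8]])   # shape (m, 1)

# Absorb beta_0 into beta by prepending a column of ones to X.
m = X_raw.shape[0]
X = np.hstack([np.ones((m, 1)), X_raw])      # shape (m, p + 1)

# A candidate parameter vector beta = [beta_0, beta_1, beta_2]^T.
beta = np.array([[0.5], [1.0], [1.2]])       # shape (p + 1, 1)

# The model in matrix form: Y ≈ X beta.
Y_hat = X @ beta
print(Y_hat.shape)   # (4, 1)
```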


2. Matrix operations to find partial derivatives

The next focus is to find the best $\beta$ that makes $\| X \beta - Y \|$ smallest, that is,

$$
\min_{\beta} \| X \beta - Y \|_2^2
$$

Here $X$ and $Y$ are known, and the unknown is $\beta$.
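Before deriving the closed-form answer, it may help to see that this is just an ordinary minimization problem. The sketch below (my own illustration with random data, not from the original article) hands the objective to a generic numerical optimizer; the closed-form formula derived later gives the same $\beta$:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up known data: X and Y are given, beta is unknown.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = rng.normal(size=(20,))

def J(beta):
    """Objective to minimize: || X beta - Y ||_2^2."""
    r = X @ beta - Y
    return r @ r

# Generic numerical minimization, with no knowledge of the closed form.
result = minimize(J, x0=np.zeros(3))
print(result.x)   # numerically optimal beta
```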

When using the least squares method to find the optimal parameter $\beta$, we often see only the final formula quoted, namely that the optimal $\beta$ is

$$
\beta = (X^\text{T} X)^{-1} X^\text{T} Y
$$

But the formula alone does not show the solution process, so how to arrive at this result is discussed next.


Let $J(\beta) = \| X \beta - Y \|_2^2$, which is just another way of writing the sum of squared residuals. Its corresponding matrix form is

$$
J(\beta) = (X \beta - Y)^\text{T} (X \beta - Y)
$$

Next we compute the partial derivative of $J(\beta)$ with respect to $\beta$ and set it equal to zero to find the value of $\beta$. Expanding the above equation, we have

$$
\begin{aligned}
J(\beta) &= (X \beta - Y)^\text{T} (X \beta - Y) \\
&= (\beta^\text{T} X^\text{T} - Y^\text{T}) (X \beta - Y) \\
&= \beta^\text{T} X^\text{T} X \beta - \beta^\text{T} X^\text{T} Y - Y^\text{T} X \beta + Y^\text{T} Y
\end{aligned}
$$
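As a quick sanity check on this expansion (not part of the original derivation, just a numerical confirmation with arbitrary data):

```python
import numpy as np

# Arbitrary data, only to check the algebraic expansion numerically.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = rng.normal(size=(5, 1))
beta = rng.normal(size=(3, 1))

# Left-hand side: || X beta - Y ||_2^2.
lhs = np.linalg.norm(X @ beta - Y) ** 2

# Right-hand side: the expanded matrix expression.
rhs = (beta.T @ X.T @ X @ beta
       - beta.T @ X.T @ Y
       - Y.T @ X @ beta
       + Y.T @ Y).item()

print(np.isclose(lhs, rhs))   # True
```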

To find the partial derivative, we have

$$
\frac{\partial J(\beta)}{\partial \beta} = \frac{\partial}{\partial \beta} \left( \beta^\text{T} X^\text{T} X \beta - \beta^\text{T} X^\text{T} Y - Y^\text{T} X \beta + Y^\text{T} Y \right)
$$

To evaluate this expression, we need the following rules of matrix calculus (written in the denominator-layout convention; a small numerical check of the third rule is given after the list). For an unknown vector $X$ and a constant matrix $A$:

  • $\frac{\text{d}}{\text{d}X}(A X) = A^\text{T}$
  • $\frac{\text{d}}{\text{d}X}(X^\text{T} A) = A$
  • $\frac{\text{d}}{\text{d}X}(X^\text{T} A X) = (A + A^\text{T}) X$
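Here is a minimal finite-difference sketch (my own check, not from the original article) that confirms the third rule numerically for a vector unknown:

```python
import numpy as np

# Numerical check of d/dX (X^T A X) = (A + A^T) X, denominator layout.
rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))      # constant matrix
x = rng.normal(size=(n, 1))      # the "unknown" vector

def f(v):
    return (v.T @ A @ v).item()  # scalar-valued quadratic form

# Central finite differences for the gradient of f at x.
eps = 1e-6
num_grad = np.zeros((n, 1))
for i in range(n):
    e = np.zeros((n, 1))
    e[i] = eps
    num_grad[i] = (f(x + e) - f(x - e)) / (2 * eps)

analytic_grad = (A + A.T) @ x    # the third rule
print(np.allclose(num_grad, analytic_grad, atol=1e-5))   # True
```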

According to the above rules, we have

  • $\frac{\partial}{\partial \beta} (\beta^\text{T} X^\text{T} X \beta) = \left( (X^\text{T} X) + (X^\text{T} X)^\text{T} \right) \beta = 2 X^\text{T} X \beta$
  • $\frac{\partial}{\partial \beta} (- \beta^\text{T} X^\text{T} Y) = -X^\text{T} Y$
  • $\frac{\partial}{\partial \beta} (- Y^\text{T} X \beta) = (-Y^\text{T} X)^\text{T} = -X^\text{T} Y$

Therefore, the partial derivative is

$$
\frac{\partial J(\beta)}{\partial \beta} = 2 X^\text{T} X \beta - 2 X^\text{T} Y
$$

Setting it equal to zero gives

$$
\begin{aligned}
2 X^\text{T} X \beta - 2 X^\text{T} Y &= 0 \\
X^\text{T} X \beta - X^\text{T} Y &= 0 \\
X^\text{T} X \beta &= X^\text{T} Y \\
\beta &= (X^\text{T} X)^{-1} X^\text{T} Y
\end{aligned}
$$

At this point, the solution formula for the optimal parameters of the least squares method has been obtained.
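As a final check, here is a small NumPy sketch (again with random made-up data, the same sizes as the earlier optimizer example) that applies the derived formula directly and compares it with NumPy's built-in least-squares solver. As a design note, in practice solving the normal equations with np.linalg.solve(X.T @ X, X.T @ Y) is preferred over forming an explicit inverse.

```python
import numpy as np

# Made-up known data.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = rng.normal(size=(20,))

# The derived closed form: beta = (X^T X)^{-1} X^T Y.
beta_closed_form = np.linalg.inv(X.T @ X) @ X.T @ Y

# Cross-check against NumPy's built-in least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(beta_closed_form, beta_lstsq))   # True
```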


