Friendly reminder: please first read the article [Mathematical Knowledge] Least Squares Method: Starting from Linear Regression, with Numerical Examples Solved by the Least Squares Method, and then read this article.
The previous article, [Mathematical Knowledge] Least Squares Method: Starting from Linear Regression, with Numerical Examples Solved by the Least Squares Method, analyzed the linear regression model from the perspective of regression analysis, gave concrete numerical examples, and used the least squares method to solve for the optimal parameters of the model.
The examples used in that regression analysis were:
- Simple linear regression model (single explanatory variable, single response variable)
- Multiple linear regression model (multiple explanatory variables, single response variable)
However, both involve a single response variable, which is inconvenient to describe in matrix form.
In practical applications, regression models with multiple explanatory variables and multiple response variables are more common, and matrix operations are used, which both save computing resources and speed up calculation.
Therefore, this article starts from the case of multiple explanatory variables and multiple response variables, focusing on how to write the model in matrix form, and on the derivations and operations that follow from the matrix form.
1. Multiple explanatory variables and multiple response variables
Building on the previous (single explanatory variable, single response variable) case, we extend it to a more general linear case.
Such a regression model, with multiple explanatory variables and multiple response variables, can be written as
$$
\begin{aligned}
y_1 &= \beta_{0} + \beta_{1} x_{11} + \beta_{2} x_{12} + \cdots + \beta_{p} x_{1p} \\
y_2 &= \beta_{0} + \beta_{1} x_{21} + \beta_{2} x_{22} + \cdots + \beta_{p} x_{2p} \\
y_3 &= \beta_{0} + \beta_{1} x_{31} + \beta_{2} x_{32} + \cdots + \beta_{p} x_{3p} \\
&\ \ \vdots \\
y_m &= \beta_{0} + \beta_{1} x_{m1} + \beta_{2} x_{m2} + \cdots + \beta_{p} x_{mp}
\end{aligned}
$$
You can see that this regression model has, in total:
- $m$ response variables $y_1, \cdots, y_m$,
- $m \times p$ explanatory variables $x_{11}, \cdots, x_{mp}$,
- $1$ parameter $\beta_{0}$,
- $p$ parameters $\beta_{1}, \cdots, \beta_{p}$.
Next, we put the model into matrix form. Let
$$
Y = \left[\begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{matrix}\right]_{m \times 1}, \quad
X = \left[\begin{matrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mp} \end{matrix}\right]_{m \times p}, \quad
\beta = \left[\begin{matrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{matrix}\right]_{p \times 1}
$$

so that the model can be written as
$$
Y = X \beta + \beta_0
$$
Although this form looks simple and clear, it carries the extra term $\beta_0$. We can absorb $\beta_0$ into $\beta$ in another form. Let
$$
Y = \left[\begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{matrix}\right]_{m \times 1}, \quad
X = \left[\begin{matrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m1} & x_{m2} & \cdots & x_{mp} \end{matrix}\right]_{m \times (p+1)}, \quad
\beta = \left[\begin{matrix} \beta_{0} \\ \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{matrix}\right]_{(p+1) \times 1}
$$
Then the model becomes

$$
Y = X \beta
$$
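As a quick concrete sketch (the data and variable names here are invented for illustration), the augmented design matrix with a leading column of ones can be built in NumPy as follows:

```python
import numpy as np

# Toy data: m = 4 observations, p = 2 explanatory variables (invented for illustration)
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])   # shape (m, p)

# Prepend a column of ones so the intercept beta_0 is absorbed into beta
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])   # shape (m, p + 1)

beta = np.array([1.0, 2.0, -0.5])   # [beta_0, beta_1, beta_2]
Y = X @ beta                        # Y = X beta, shape (m,)
print(Y)
```

With the intercept folded into the first column of $X$, a single matrix product reproduces every row of the system above.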
2. Matrix operations to find partial derivatives
The next focus is to find the $\beta$ that makes $X\beta - Y$ smallest, that is,

$$
\min_{\beta} \| X \beta - Y \|_2^2
$$
Here, $X$ and $Y$ are known; the unknown is $\beta$.
When using the least squares method to find the optimal parameter $\beta$, we often see the well-known closed-form result that the optimal $\beta$ is

$$
\beta = (X^\text{T} X)^{-1} X^\text{T} Y
$$

but the derivation is usually omitted. How to arrive at this result is discussed next.
Let $J(\beta) = \| X \beta - Y \|_2^2$, which is another way of expressing the residual sum of squares. Its corresponding matrix form is
$$
J(\beta) = (X \beta - Y)^\text{T} (X \beta - Y)
$$
Next, we compute the partial derivative of $J(\beta)$ with respect to $\beta$ and set it equal to zero to find the value of $\beta$. Expanding the expression above, we have
$$
\begin{aligned}
J(\beta) &= (X \beta - Y)^\text{T} (X \beta - Y) \\
&= (\beta^\text{T} X^\text{T} - Y^\text{T}) (X \beta - Y) \\
&= \beta^\text{T} X^\text{T} X \beta - \beta^\text{T} X^\text{T} Y - Y^\text{T} X \beta + Y^\text{T} Y
\end{aligned}
$$
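As a sanity check on this expansion, the two forms of $J(\beta)$ can be compared numerically on random invented data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 5, 3
X = rng.normal(size=(m, p))
Y = rng.normal(size=(m, 1))
beta = rng.normal(size=(p, 1))

# Direct definition: J(beta) = ||X beta - Y||_2^2
J_direct = float(np.linalg.norm(X @ beta - Y) ** 2)

# Expanded form: beta^T X^T X beta - beta^T X^T Y - Y^T X beta + Y^T Y
J_expanded = float(beta.T @ X.T @ X @ beta
                   - beta.T @ X.T @ Y
                   - Y.T @ X @ beta
                   + Y.T @ Y)
print(J_direct, J_expanded)
```

The two values agree to floating-point precision, confirming the algebra term by term.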
Taking the partial derivative, we have
$$
\frac{\partial J(\beta)}{\partial \beta} = \frac{\partial}{\partial \beta} \left( \beta^\text{T} X^\text{T} X \beta - \beta^\text{T} X^\text{T} Y - Y^\text{T} X \beta + Y^\text{T} Y \right)
$$
To evaluate this expression, we need the following rules of matrix calculus (in denominator layout): for an unknown vector $X$ and a constant matrix $A$,
- $\frac{\text{d}}{\text{d}X}(A X) = A^\text{T}$
- $\frac{\text{d}}{\text{d}X}(X^\text{T} A) = A$
- $\frac{\text{d}}{\text{d}X}(X^\text{T} A X) = (A+A^\text{T}) X$
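These rules can be verified numerically. The sketch below (with an arbitrary random matrix, invented for illustration) checks the quadratic-form rule $\frac{\text{d}}{\text{d}X}(X^\text{T} A X) = (A + A^\text{T}) X$ against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))   # constant matrix (not necessarily symmetric)
x = rng.normal(size=n)        # the "unknown" vector X from the rule

def f(v):
    return float(v @ A @ v)   # scalar x^T A x

# Central finite-difference gradient of f at x
eps = 1e-6
grad_fd = np.array([(f(x + eps * np.eye(n)[i]) - f(x - eps * np.eye(n)[i])) / (2 * eps)
                    for i in range(n)])

grad_rule = (A + A.T) @ x     # the stated rule: d/dx (x^T A x) = (A + A^T) x
print(np.max(np.abs(grad_fd - grad_rule)))
```

Note that the rule deliberately uses $A + A^\text{T}$; it reduces to $2AX$ only when $A$ is symmetric, as is the case for $A = X^\text{T}X$ below.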
Applying these rules term by term, we have
- $\frac{\partial}{\partial \beta} (\beta^\text{T} X^\text{T} X \beta) = ((X^\text{T} X) + (X^\text{T} X)^\text{T})\beta = 2 X^\text{T} X \beta$
- $\frac{\partial}{\partial \beta} (- \beta^\text{T} X^\text{T} Y) = -X^\text{T} Y$
- $\frac{\partial}{\partial \beta} (- Y^\text{T} X \beta) = (-Y^\text{T} X)^\text{T} = -X^\text{T} Y$
Therefore, the partial derivative is
$$
\frac{\partial J(\beta)}{\partial \beta} = 2 X^\text{T} X \beta - 2 X^\text{T} Y
$$
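The closed-form gradient can likewise be checked against a finite-difference approximation on random invented data:

```python
import numpy as np

rng = np.random.default_rng(2)
m, p = 6, 3
X = rng.normal(size=(m, p))
Y = rng.normal(size=m)
beta = rng.normal(size=p)

def J(b):
    return float(np.sum((X @ b - Y) ** 2))   # J(beta) = ||X beta - Y||_2^2

# Closed-form gradient from the derivation: 2 X^T X beta - 2 X^T Y
grad_formula = 2 * X.T @ X @ beta - 2 * X.T @ Y

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([(J(beta + eps * np.eye(p)[i]) - J(beta - eps * np.eye(p)[i])) / (2 * eps)
                    for i in range(p)])
print(np.max(np.abs(grad_formula - grad_fd)))
```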
Setting it to zero:
$$
\begin{aligned}
2 X^\text{T} X \beta - 2 X^\text{T} Y &= 0 \\
X^\text{T} X \beta - X^\text{T} Y &= 0 \\
X^\text{T} X \beta &= X^\text{T} Y \\
\beta &= (X^\text{T} X)^{-1} X^\text{T} Y
\end{aligned}
$$
At this point, the closed-form solution for the optimal least-squares parameters has been obtained (assuming $X^\text{T} X$ is invertible).
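Putting it all together, here is a minimal NumPy sketch (with invented data) that solves the normal equations and cross-checks the result against NumPy's own least-squares solver. Solving the linear system $X^\text{T} X \beta = X^\text{T} Y$ directly is numerically preferable to forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(3)
m, p = 10, 3
X_raw = rng.normal(size=(m, p))
X = np.hstack([np.ones((m, 1)), X_raw])          # augmented design matrix, shape (m, p+1)
beta_true = np.array([0.5, 1.0, -2.0, 3.0])
Y = X @ beta_true + 0.01 * rng.normal(size=m)    # observations with small noise

# Normal-equation solution: solve X^T X beta = X^T Y
# (equivalent to beta = (X^T X)^{-1} X^T Y, but without an explicit inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)
```

Both routes recover parameters close to `beta_true`, with the gap due only to the injected noise.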