Linear algebra---The application of least squares to straight line fitting and Gram-Schmidt orthogonalization (Part 2)

        In the previous article, an example illustrated the role of least squares in fitting a straight line, and a comparison of two figures clarified the close relationship between projection and minimizing the error e.

Linear algebra---The application of least squares to straight line fitting and Gram-Schmidt orthogonalization (Part 1): https://blog.csdn.net/daduzimama/article/details/129995583

        In this article, we again start from fitting a straight line to three data points. The difference is that the three observation times chosen this time are quite special, and this leads, step by step, to the idea of Gram-Schmidt orthogonalization and to why we need to orthogonalize the column vectors of the matrix A.

Example 2:

        In Example 1, the three moments t = -1, 1, and 2 were chosen as observation points, yielding the three measurements b = 1, 1, and 3. Here, we change the observation times to t = -2, 0, 2 (note that this is the most important change) and obtain another set of measurements, b = 1, 2, 4. Plotting these three points in the Cartesian coordinate system shows that they do not lie on one straight line. As before, this is a least squares straight-line fitting problem.

Using the equation b = C + Dt to describe a straight line through these points gives the following system of equations:

\large C - 2D = 1,\;\; C + 0\cdot D = 2,\;\; C + 2D = 4

 These three points do not lie on one straight line, so this system has no solution. Instead, we must solve the least squares problem via the normal equations A^{T}A\hat{x}=A^{T}b, where:

 \large A=\begin{bmatrix} 1 & -2\\ 1 & 0\\ 1 & 2 \end{bmatrix}                \large \hat{x}=\begin{bmatrix} \hat{C}\\ \hat{D} \end{bmatrix}                \large b=\begin{bmatrix} 1\\ 2\\ 4 \end{bmatrix}
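(A quick check, not part of the original article: the following NumPy sketch sets up this overdetermined system and confirms that no straight line passes through all three points.)

```python
import numpy as np

# Observation times and measurements from this example
t = np.array([-2.0, 0.0, 2.0])
b = np.array([1.0, 2.0, 4.0])

# Each row of A x = b is one equation C + D*t_i = b_i
A = np.column_stack([np.ones_like(t), t])

# rank(A) = 2 but rank([A | b]) = 3, so the system is inconsistent:
# no exact solution exists.
print(np.linalg.matrix_rank(A))                        # 2
print(np.linalg.matrix_rank(np.column_stack([A, b])))  # 3
```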

 The left side, A^{T}A:

\large A^{T}A=\begin{bmatrix} 1 &1 & 1\\ -2& 0 & 2 \end{bmatrix}\begin{bmatrix} 1 &-2 \\ 1&0 \\ 1 & 2 \end{bmatrix}=\begin{bmatrix} 3 & 0\\ 0 & 8 \end{bmatrix}

 The right side, A^{T}b:

\large A^{T}b=\begin{bmatrix} 1 &1 & 1\\ -2& 0 & 2 \end{bmatrix}\begin{bmatrix} 1 \\2 \\ 4 \end{bmatrix}=\begin{bmatrix} 7 \\ 6 \end{bmatrix}

 We get:

\large A^{T}A\hat{x}=A^{T}b\;\; \text{is} \;\;\begin{bmatrix} 3 &0 \\ 0 & 8 \end{bmatrix}\begin{bmatrix} \hat{C}\\ \hat{D} \end{bmatrix} = \begin{bmatrix} 7\\ 6 \end{bmatrix}

Finally, the optimal solution is obtained as:

\large \hat{x}=(A^{T}A)^{-1}A^{T}b=\begin{bmatrix} 1/3 &0 \\ 0& 1/8 \end{bmatrix} \begin{bmatrix} 7\\ 6 \end{bmatrix} = \begin{bmatrix} 7/3\\ 6/8 \end{bmatrix}

where:

\large \hat{C}=7/3,\; \hat{D}=6/8

 The corresponding best-fit straight line is:

\large f(t)=7/3+(6/8)t

 At the same time, we can find the projection vector p:

\large p=A(A^{T}A)^{-1}A^{T}b=A{\hat{x}}=\begin{bmatrix} 1 & -2\\ 1 &0 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 7/3\\ 6/8 \end{bmatrix} =\begin{bmatrix} 5/6\\ 7/3\\ 23/6 \end{bmatrix}
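(As a side note, the whole calculation above can be reproduced with a few lines of NumPy; this is only a verification sketch of the formulas already derived, not an additional step in the argument.)

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [1.0,  0.0],
              [1.0,  2.0]])
b = np.array([1.0, 2.0, 4.0])

AtA = A.T @ A                      # [[3, 0], [0, 8]]  (diagonal)
Atb = A.T @ b                      # [7, 6]

x_hat = np.linalg.solve(AtA, Atb)  # [7/3, 3/4] = [C_hat, D_hat]
p = A @ x_hat                      # projection [5/6, 7/3, 23/6]
e = b - p                          # error vector

print(x_hat, p)
print(A.T @ e)                     # ~[0, 0]: e is orthogonal to the columns of A
```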

The projections p1, p2, and p3 all lie on the fitted straight line, as shown in the figure below:


The benefit of changing the observation times:

        Now, let's look back at how the optimal solution was found in this example. So far, the calculation has followed exactly the same steps as Example 1 in the previous article: I directly applied the formulas for \hat{x} and for the projection p.

         But in fact, if we look carefully at the normal equations above, we can see that the optimal solution can be obtained by solving them directly. Because the three observation times were changed in this example, A^{T}A is a diagonal matrix, which lets us write down the solution of the system immediately: \hat{C}=7/3, \hat{D}=6/8.

        There are two ways to see why A^{T}A is a diagonal matrix. First, the elements of the vector t sum to 0; in other words, the measurements b1, b2, and b3 were taken at times symmetric about t = 0. Second, the inner product of the two column vectors [1,1,1] and [-2,0,2] of the matrix A is 0, so they are orthogonal to each other.

        If the three observation times t do not sum to 0, i.e., they are not symmetric about t = 0, then (in this example, because the first column of A is the all-ones vector) the inner product of the two columns will not be 0. In that case it is worth spending a little time first subtracting the mean \hat{t}=(t_1+t_2+\cdots+t_m)/m from each observation time, so that the shifted times sum to 0. Then \hat{x} can again be read directly from the normal equations.

        For example, when t = (1, 3, 5), the sum is not 0. The mean is \hat{t}=3; subtracting it from each element of t gives a new T=t-\hat{t}=t-3=(-2,0,2), whose elements again sum to 0. At the same time, the equation of the fitted line changes from \hat{C}+\hat{D}t to \hat{C}+\hat{D}T = \hat{C}+\hat{D}(t-\hat{t}) = \hat{C}+\hat{D}(t-3).
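(To illustrate the centering trick, here is a small NumPy sketch of my own; the measurement vector b below is a made-up example, since the article does not give measurements for t = (1, 3, 5).)

```python
import numpy as np

t = np.array([1.0, 3.0, 5.0])
b = np.array([1.0, 2.0, 4.0])          # hypothetical measurements, for illustration only

T = t - t.mean()                        # centered times: [-2, 0, 2], sum(T) = 0
A = np.column_stack([np.ones_like(T), T])

AtA = A.T @ A                           # diagonal again: [[3, 0], [0, 8]]
Atb = A.T @ b

# The diagonal system decouples into two one-unknown equations:
C_hat = Atb[0] / AtA[0, 0]
D_hat = Atb[1] / AtA[1, 1]

# Fitted line in the original variable: b ~ C_hat + D_hat * (t - 3)
print(C_hat, D_hat)
```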

         In this way, we no longer need the formula \hat{x}=(A^{T}A)^{-1}A^{T}b to find \hat{C} and \hat{D}; instead, we can obtain \hat{C} and \hat{D} directly from the normal equations.

        In fact, this particular example coincides with the idea of "Gram-Schmidt orthogonalization". That is, if the column vectors of the original matrix A are not orthogonal, first turn them into orthogonal vectors to obtain a new matrix A_{new}. Then the left side of the normal equation A^{T}A\hat{x}=A^{T}b, namely A_{new}^{T}A_{new}, becomes a diagonal matrix. Once A_{new}^{T}A_{new} is diagonal, solving for \hat{x} becomes very easy.

        We will see later that Gram-Schmidt orthogonalization not only turns A_{new}^{T}A_{new} into a diagonal matrix but, once the columns are normalized to unit length, into the identity matrix. In that case, solving the equation becomes even easier.
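(For readers who want to see this concretely, here is a minimal classical Gram-Schmidt sketch applied to the matrix A of this example; it is my own illustration, not code from the article. After orthonormalization, Q^{T}Q is the identity, and the projection can be computed without inverting anything.)

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt: orthonormal columns spanning col(A).
    Assumes the columns of A are linearly independent."""
    A = np.asarray(A, dtype=float)
    Q = np.zeros_like(A)
    for j in range(A.shape[1]):
        v = A[:, j].copy()
        for i in range(j):
            v -= (Q[:, i] @ A[:, j]) * Q[:, i]   # remove the components along earlier q_i
        Q[:, j] = v / np.linalg.norm(v)          # normalize to unit length
    return Q

A = np.array([[1.0, -2.0],
              [1.0,  0.0],
              [1.0,  2.0]])
b = np.array([1.0, 2.0, 4.0])

Q = gram_schmidt(A)
print(Q.T @ Q)          # identity matrix: the normal-equation matrix becomes I
print(Q @ (Q.T @ b))    # projection [5/6, 7/3, 23/6], computed without any inverse
```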

Summary:

        Least squares originally only requires the columns of the matrix A to be linearly independent (because only then is A^{T}A invertible). Now we require not only that the columns of A be linearly independent, but also that they be orthogonal to each other, which leads, step by step, to the beginnings of Gram-Schmidt orthogonalization.


 (End of article)

Author --- Panasonic J27

References (thanks):

1. Introduction to Linear Algebra, Fifth Edition --- Gilbert Strang (most of the illustrations in the text are from this book)

2. Graphing software: Graphing Calculator

Appreciation of classic lyrics:

        Don't make friends with yourself, the tree and the shadow are so awkward. Xinhe opened and twisted many buttons, like boiled polenta.

--- Excerpted from "Ancient Ships Trawling Nets, No Fast Boats" (Theme Song of " Ancient Ships, Women and Nets ")


Copyright statement: Some pictures, texts, or other materials in this article may come from many different websites and descriptions, so I cannot list them all here. If there is any infringement, please let me know and I will delete it immediately. Everyone is welcome to reprint, but if you quote or copy my article, you must indicate in your article that the pictures or text you use come from my article; otherwise, the infringement will be investigated. ----Panasonic J27
