[Linear Algebra 04] Projection matrix P and the orthonormal matrix Q

  Continuing with the MIT lecture notes: last time we discussed the solution of Ax=b, and I focused on sketching some pictures of the solution spaces. We then divided these subspaces into four parts, namely the column space, the row space and the null space of A, plus the null space of A^T; the relationship between these four spaces is shown in figure 4.2 of the textbook. In fact, the last example from the previous post already exhibits this kind of perpendicular relationship, and it should be pointed out that perpendicularity is a very good relationship to have. This time we mainly introduce the projection matrix P and the orthonormal matrix Q.


“Follow the rules”

  When the professor talked about orthogonal vectors and orthogonal subspaces, he asked whether the zero vector is orthogonal to every vector, and gave this piece of advice: "The one thing about math is you're supposed to follow the rules." The flavor of the advice is that whenever something is hard to decide, or even hard to believe, you should go back to the definition: if the definition allows it and it happens, then it is legitimate and you should face it squarely. So, is the zero vector orthogonal to every vector? The answer is "sure". Let us look at the definition of orthogonality (perpendicularity, in spatial terms):

If the inner product of two vectors in an inner product space is 0, then they are orthogonal.

[Figure 4.2: the four fundamental subspaces and the orthogonality between them]

  Taking figure 4.2 as an example, recall how the null space is defined:

The null space arises in the context of a linear map (that is, a matrix): it is the preimage of zero, i.e. $\{x \mid Ax = 0\}$.

  It can be seen that the definition of the null space has the definition of orthogonality built into it. Look at $Ax = 0$: the inner product of every vector $x$ in the null space of $A$ with every row of $A$ is 0, so the null space of $A$ is perpendicular to the row space of $A$, which is the left part of figure 4.2. Likewise, from $A^Ty = 0$, the null space of $A^T$ is perpendicular to the row space of $A^T$ (which, because of the transpose, is the column space of $A$), shown as the right part of figure 4.2.
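  As a quick sanity check (my own small example, not from the notes), we can verify this perpendicularity numerically in matlab:

% Every vector in the null space of A is orthogonal to every row of A
A = [1 2 3; 2 4 6];   % a rank-1 example, so the null space is 2-dimensional
N = null(A);          % the columns of N form an orthonormal basis of the null space
disp(A*N);            % each entry is (row of A) dot (null-space vector); all are ~0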


Projection matrix P

   What is a projection matrix? We illustrate this concept in conjunction with figure 4.6.

[Figure 4.6: projecting b onto a line (left) and onto a plane (right)]

Start with vectors

   Look at the left part first: a vector $\vec{b}$ is projected onto $\vec{a}$. Combining this with the definition of orthogonality, we have
$$\vec{e} \cdot \vec{a} = 0 \Rightarrow (\vec{b}-\vec{p}) \cdot \vec{a} = 0 \Rightarrow \vec{b} \cdot \vec{a} - \vec{p} \cdot \vec{a} = 0 \Rightarrow \vec{p} \cdot \vec{a} = \vec{b} \cdot \vec{a}$$
   Since $\vec{p}$ lies along $\vec{a}$, multiply both sides by $\vec{a}$. (Note that the dot product of vectors is not associative while matrix multiplication is; conversely, matrix multiplication is not commutative while the dot product is.) Then
$$\vec{p}\,|a|^2 = (\vec{b} \cdot \vec{a})\,\vec{a} \Rightarrow \vec{p} = \frac{\vec{b} \cdot \vec{a}}{|a|^2}\vec{a} = \frac{\vec{a} \cdot \vec{b}}{|a|^2}\vec{a}$$
   Now express this vector formula in matrix form: $a^Ta$ is the squared modulus of the column vector $\vec{a}$, and note that $a^Tb$ is also just a number, so:
$$p = \frac{a^Tb}{a^Ta}a = a\frac{a^Tb}{a^Ta} = \frac{aa^T}{a^Ta}b$$
   The projection matrix plays exactly this role: it picks out the component of the vector $\vec{b}$ along the vector $\vec{a}$, i.e. $p = Pb = a\hat{x}$. For the left part of the figure, the projection matrix is
$$P = \frac{aa^T}{a^Ta}$$

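   A minimal matlab sketch of this one-dimensional projection, with example vectors of my own choosing:

a = [1; 2; 2];
b = [3; 0; 3];
P = (a*a')/(a'*a);    % P = a a^T / (a^T a)
p = P*b;              % projection of b onto the line through a
e = b - p;            % the error vector
disp(a'*e);           % ~0: e is perpendicular to a
disp(P*P - P);        % ~0: projecting twice is the same as projecting once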
Look at the matrix again

   But looking back over this derivation, it is easy to see that the condition that $a^Ta$ and $a^Tb$ are numbers is a demanding one: it only holds because $a$ is a single column. For a general matrix, both of these products are matrices rather than numbers. So we continue with the right part of the figure and project b onto the plane spanned by two column vectors. Although the setup is more involved, the governing equation is the same: the inner product of $\vec{e}$ with each of the basis vectors of the plane is 0. For $A$, the plane spanned by its two columns is its column space, and regarding e as a column vector, the condition becomes
$$A^Te = 0$$
   We express e in terms of the projection matrix P:
$$e = b - p = b - Pb = b - A\hat{x}$$
   Substituting this back, we have:
$$A^T(b - A\hat{x}) = A^Tb - A^TA\hat{x} = 0 \Rightarrow A^TA\hat{x} = A^Tb \Rightarrow \hat{x} = (A^TA)^{-1}A^Tb \Rightarrow p = A\hat{x} = A(A^TA)^{-1}A^Tb$$
   which finally yields the projection matrix P:
$$P = A(A^TA)^{-1}A^T$$
   This derivation clearly rests on one assumption: "if A has independent columns, then $A^TA$ is invertible." In other words, A has a left inverse.
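   A sketch of the general case, again with made-up numbers (a tall A with independent columns):

A = [1 0; 1 1; 1 2];
b = [6; 0; 0];
P = A*inv(A'*A)*A';   % P = A (A^T A)^{-1} A^T, valid because the columns of A are independent
p = P*b;              % projection of b onto the column space of A
e = b - p;            % the error vector
disp(A'*e);           % ~0: e is perpendicular to every column of A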


Least squares method

   An interesting use of the projection matrix is to derive the least squares method, which is often used in linear regression. We can either let the computer iterate with the usual gradient descent method, or attack the problem directly with matrices. Consider a general example:
$$Ax = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \\ \vdots & \vdots \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \end{bmatrix} = b$$
   When the number of data points (the number of equations) in $Ax = b$ is much greater than the number of fitting parameters (here, 2 unknowns), the system either has a unique solution or, far more commonly, no solution at all. Following the idea of least squares, when faced with an unsolvable system we instead minimize the following function:
$$\min\ \ (a_{11}C+a_{12}D-b_1)^2+(a_{21}C+a_{22}D-b_2)^2+(a_{31}C+a_{32}D-b_3)^2+\cdots$$
   Taking partial derivatives with respect to C and D and setting them to zero (dropping the common factor of 2), we get
$$a_{11}(a_{11}C+a_{12}D-b_1)+a_{21}(a_{21}C+a_{22}D-b_2)+a_{31}(a_{31}C+a_{32}D-b_3)+\cdots = 0 \\ a_{12}(a_{11}C+a_{12}D-b_1)+a_{22}(a_{21}C+a_{22}D-b_2)+a_{32}(a_{31}C+a_{32}D-b_3)+\cdots = 0$$
  Looking at these two equations: in the first, the coefficient of C is $\sum a_{i1}^2$, the coefficient of D is $\sum a_{i1}a_{i2}$, and the right-hand side is $\sum a_{i1}b_i$; in the second they are $\sum a_{i1}a_{i2}$, $\sum a_{i2}^2$ and $\sum a_{i2}b_i$. These are exactly the two rows of the matrix equation
$$A^TA \begin{bmatrix} C \\ D \end{bmatrix} = A^Tb$$
   When A has a left inverse (that is, when $A^TA$ is invertible), we get:
$$\begin{bmatrix} C \\ D \end{bmatrix} = (A^TA)^{-1}A^Tb$$
  Then the question arises: how do we get this formula from P? We know that when P exists, we can project a vector b that is not in the column space of A into the column space of A, which guarantees that a solution exists. That is:
$$Pb = A\hat{x} \Rightarrow A(A^TA)^{-1}A^Tb = A\hat{x}$$
  So we have
$$\hat{x} = (A^TA)^{-1}A^Tb$$
  Seen this way, the error e produced by projecting the vector onto the plane is exactly the one that minimizes the sum of squared errors $\sum e_i^2$. We can use matlab to verify this: first solve with the polyfit function, and then check the result with the matrix formula.


% Fit the line with MATLAB's built-in polyfit function

x=[9,13,15,17,18.6,20,23,29,31.7,35];
y=[-8,-6.45,-5.1,-4,-3,-1.95,-1.5,-0.4,0.2,-0.75];
coefficient=polyfit(x,y,1);  % fit a first-degree polynomial; to fit degree n, set the last argument to n
y1=polyval(coefficient,x);
plot(x,y,'o',x,y1,'-');
legend('data points',['fitted line y=',num2str(coefficient(1)),'*x',num2str(coefficient(2))]);

% Verification via the normal equations
c = ones(1,length(x)); % column of ones for the constant term
A = [c;x]'; % the matrix A
est = inv(A'*A)*A'*y'; % least-squares estimate
text(15,-6,['intercept=',num2str(est(1)),'  slope=',num2str(est(2))]);

[Figure: scatter plot of the data together with the fitted line]

Orthonormal matrix Q

Properties

   The "standard" (normalized) part means that each vector in the set making up the matrix has modulus 1, and the "orthogonal" part means that the inner product of any two distinct vectors in the set is 0; this is exactly the "follow the rules" attitude recalled at the beginning. A matrix this special is bound to have some excellent properties. Since the inner product of any two distinct columns is 0, it is easy to see that the columns are all linearly independent. From the definition of an orthonormal matrix we can immediately deduce the following formula:
$$Q^TQ = I$$
   Each column of Q has inner product 1 with itself (its modulus is 1) and inner product 0 with every other column, so the product is the identity matrix. (Note that $QQ^T = I$ holds only when Q is square.) Going further, for a square Q we know the inverse must exist, since the columns are linearly independent, and the definition of the inverse is satisfied exactly by $Q^T$, so
$$Q^{-1} = Q^T$$
   The inverse of an orthonormal matrix is simply its transpose, a gift that is enough to make anyone happy.
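   These properties are easy to check numerically; as a small sketch, take a rotation matrix, which is a standard example of a (square) orthonormal Q:

theta = pi/6;
Q = [cos(theta) -sin(theta); sin(theta) cos(theta)];
disp(Q'*Q);           % the identity matrix: the columns are orthonormal
disp(inv(Q) - Q');    % ~0: for a square Q, the inverse is just the transpose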


Process

  Let's now show how to orthonormalize a matrix, that is, Gram-Schmidt orthogonalization. It takes two steps: the first is orthogonalization and the second is normalization. To understand the idea behind the orthogonalization step we can again refer to the figure for the projection matrix P, except that instead of seeking p, we now seek the orthogonal part e.

  With the help of figure 4.6, we know that e is obtained like this:
$$B = b - Pb = b - a(a^Ta)^{-1}a^Tb = b - \frac{a^Tb}{a^Ta}a$$
  Similarly, for a plane, we subtract the projections onto the two basis vectors that have already been orthogonalized, namely:
$$C = c - P_1c - P_2c = c - \frac{a^Tc}{a^Ta}a - \frac{B^Tc}{B^TB}B$$

  One can verify that $a^TB=0$, $a^TC=0$, $B^Ta=0$, $B^TC=0$, which confirms that our orthogonalization method is correct.
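  As a small sketch (with random vectors of my own, not the textbook example), the same formulas can be checked numerically in matlab:

rng(0);               % fix the random seed so the run is reproducible
a = randn(3,1); b = randn(3,1); c = randn(3,1);
B = b - (a'*b)/(a'*a)*a;
C = c - (a'*c)/(a'*a)*a - (B'*c)/(B'*B)*B;
disp([a'*B, a'*C, B'*C]);   % all ~0, as claimed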

  Using the diagram, let's focus on how column 3 is orthogonalized. From the above, the two terms subtracted from c are just the projections of c onto a and onto B. If instead we first add these two projections together, the result is the red diagonal ON lying in the blue plane, so the question is equivalent to:
$$\text{column 3} - \text{red diagonal ON} = \text{orthogonalized column 3}\,? \;\Rightarrow\; MN \perp \text{plane } OPNQ\,?$$
[Figure: the parallelogram OPNQ in the plane, its red diagonal ON, and the vector MN]
  Actually, this conclusion is obvious:
$$OP \perp PM,\ OP \perp PN\,(=OQ) \Rightarrow OP \perp MN \\ OQ \perp QM,\ OQ \perp QN\,(=OP) \Rightarrow OQ \perp MN \\ MN \perp OP,\ MN \perp OQ \Rightarrow MN \perp \text{plane } OPNQ$$
  From this it is reasonable to extrapolate that, for a hyperplane, the n-th vector should have its projections onto the n-1 already-orthogonalized basis vectors subtracted off. Of course, this viewpoint based on perpendicularity in space is cumbersome to carry out on a matrix. Just like the A = LU factorization of a matrix, the orthonormalization of a matrix also corresponds to a matrix R, that is, A = QR. It is easy to see that this R is an upper triangular matrix (verify it using the orthonormality of Q); take three-dimensional space as an example:
$$\begin{bmatrix} a & b & c \end{bmatrix} = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix}\begin{bmatrix} q_1^Ta & q_1^Tb & q_1^Tc \\ 0 & q_2^Tb & q_2^Tc \\ 0 & 0 & q_3^Tc \end{bmatrix}$$
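  As a sketch, matlab's built-in qr function computes this factorization directly; the signs of the columns of Q may differ from a hand Gram-Schmidt computation, but Q stays orthonormal and R stays upper triangular:

A = [1 3 1; 2 2 2; 3 1 1];   % the example matrix used in the next section
[Q, R] = qr(A);
disp(R);                     % upper triangular; its entries are the q_i^T a_j up to sign
disp(norm(Q*R - A));         % ~0: the factorization reproduces A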
  Getting from A (= a), B and C to $q_1$, $q_2$, $q_3$ is still one step away: normalization, which simply divides each orthogonalized vector by its modulus:
$$q_1 = \frac{A}{|A|}\ \ \ q_2 = \frac{B}{|B|}\ \ \ q_3 = \frac{C}{|C|}$$


Example

  To end with an example:
$$A = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 2 & 2 \\ 3 & 1 & 1 \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix}$$

  Let’s take a look at the space first:

% Plot the column vectors
quiver3(0,0,0,1,2,3,'m'); hold on; quiver3(0,0,0,3,2,1,'black'); % columns 1 and 2 (the basis of the plane)
hold on;  quiver3(0,0,0,1,2,1,'r');  % column 3
V1 = [1;2;3]; V2 = [3;2;1];

% Normal vector of the plane spanned by columns 1 and 2
Vn = cross(V1,V2);
Vn = Vn/norm(Vn);

% Plot the unit normal vector
hold on;quiver3(0,0,0,Vn(1),Vn(2),Vn(3),'g');
% Plot the plane
syms x1;syms x2;syms x3;
plane = -(x1*Vn(1)+x2*Vn(2))/Vn(3);
hold on;
fmesh(plane);
% Labels
legend('column 1','column 2','column 3','unit normal','plane of columns 1 and 2');

[Figure: the three columns of A, the unit normal vector, and the plane spanned by columns 1 and 2]
  First, orthogonalize:
$$A = a_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \\ B = a_2 - \frac{A^Ta_2}{A^TA}A = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} - \frac{10}{14}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 16/7 \\ 4/7 \\ -8/7 \end{bmatrix} \\ C = a_3 - \frac{A^Ta_3}{A^TA}A - \frac{B^Ta_3}{B^TB}B = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} - \frac{8}{14}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} - \frac{112}{336}\begin{bmatrix} 16/7 \\ 4/7 \\ -8/7 \end{bmatrix} = \begin{bmatrix} -1/3 \\ 2/3 \\ -1/3 \end{bmatrix}$$
  Then normalize,
$$q_1 = \frac{A}{|A|} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}/\sqrt{14} = \begin{bmatrix} 0.2673 \\ 0.5345 \\ 0.8018 \end{bmatrix} \\ q_2 = \frac{B}{|B|} = \begin{bmatrix} 16/7 \\ 4/7 \\ -8/7 \end{bmatrix}/\sqrt{336/49} = \begin{bmatrix} 0.8729 \\ 0.2182 \\ -0.4364 \end{bmatrix} \\ q_3 = \frac{C}{|C|} = \begin{bmatrix} -1/3 \\ 2/3 \\ -1/3 \end{bmatrix}/\sqrt{6/9} = \begin{bmatrix} -0.4082 \\ 0.8165 \\ -0.4082 \end{bmatrix}$$

  The programming implementation of the Gram-Schmidt orthogonalization above is given below. MATLAB's built-in orth function can also produce an orthonormal basis directly (it uses a different construction, so its vectors differ from ours, but they span the same column space) and serves as a sanity check on the result.

a = [1,3,1;2,2,2;3,1,1];

%% Gram-Schmidt orthogonalization
[m,n] = size(a);
if(m<n)
    error('fewer rows than columns; please transpose the input and try again');
end
b=zeros(m,n);
% Orthogonalize
b(:,1)=a(:,1);
for i=2:n
    for j=1:i-1
        b(:,i)=b(:,i)-dot(a(:,i),b(:,j))/dot(b(:,j),b(:,j))*b(:,j);
    end
    b(:,i)=b(:,i)+a(:,i);
end

% Normalize
for k=1:n
    b(:,k)=b(:,k)/norm(b(:,k));
end

%% Use the built-in orth function directly
result = orth(a);
b
result

b =

    0.2673    0.8729   -0.4082
    0.5345    0.2182    0.8165
    0.8018   -0.4364   -0.4082


result =

   -0.5494   -0.7071   -0.4451
   -0.6295   -0.0000    0.7770
   -0.5494    0.7071   -0.4451


Origin blog.csdn.net/weixin_47305073/article/details/126165315