1. Introduction
We know that if an $n \times n$ matrix $A$ has $n$ linearly independent eigenvectors, then $A$ can be diagonalized: there exists an invertible matrix $P$ such that $A = P\Lambda P^{-1}$, where $\Lambda$ is a diagonal matrix whose entries are the eigenvalues of $A$, and the columns of $P$ are the corresponding eigenvectors of $A$. The process of decomposing $A$ into $P\Lambda P^{-1}$ is called the eigendecomposition of the matrix. However, for an $m \times n$ matrix with $m \ne n$, this approach is unavailable. How, then, should we decompose such a matrix? This is where singular value decomposition (SVD) comes in.
2. Singular values
Let $A$ be an $m \times n$ matrix. We are already familiar with eigenvalues, so we derive the definition of singular values from them. What kind of matrix has eigenvalues? A square matrix. $A$ is not necessarily square, but there is a way to obtain a square matrix from it: $A^TA$ is an $n \times n$ square matrix. So we examine the eigenvalues of $A^TA$.
**Proposition 1** Every eigenvalue $\lambda$ of $A^TA$ satisfies $\lambda \ge 0$.

**Proof**: Suppose $A^TA\boldsymbol{x} = \lambda\boldsymbol{x}$, where $\boldsymbol{x} \ne \boldsymbol{0}$ is an eigenvector of $A^TA$. Multiplying by $\boldsymbol{x}^T$ on the left gives
$$\boldsymbol{x}^TA^TA\boldsymbol{x} = \lambda\boldsymbol{x}^T\boldsymbol{x}, \quad \text{i.e.} \quad \|A\boldsymbol{x}\|^2 = \lambda\|\boldsymbol{x}\|^2$$
Since $\|A\boldsymbol{x}\|^2 \ge 0$ and $\|\boldsymbol{x}\|^2 > 0$, we conclude $\lambda \ge 0$. ∎
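Proposition 1 is easy to check numerically. A minimal sketch with NumPy; the matrix `A` and the `-1e-10` round-off tolerance are my own choices for illustration:

```python
import numpy as np

# An arbitrary 4x3 example matrix (hypothetical, for illustration only)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 2.0],
              [1.0, 1.0, 1.0]])

# Eigenvalues of the symmetric matrix A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)

# Proposition 1: every eigenvalue of A^T A is >= 0
# (allow a tiny negative slack for floating-point round-off)
print(np.all(eigvals >= -1e-10))  # True
```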
Now let's define singular values.
**Definition 2** Let $\lambda_1, \lambda_2, \cdots, \lambda_n$ be the eigenvalues of $A^TA$, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$. Set $\sigma_i = \sqrt{\lambda_i}$; then $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$, and $\sigma_1, \sigma_2, \cdots, \sigma_n$ are called the singular values of $A$.
**Proposition 3** The number of nonzero singular values of $A$ equals the rank of $A$.

**Proof**: It suffices to show $r(A^TA) = r(A)$. Consider the homogeneous linear system $A^TA\boldsymbol{x} = \boldsymbol{0}$ and let $\boldsymbol{\xi}$ be any of its solutions, i.e. $A^TA\boldsymbol{\xi} = \boldsymbol{0}$. Then $\boldsymbol{\xi}^TA^TA\boldsymbol{\xi} = 0$, i.e. $\|A\boldsymbol{\xi}\|^2 = 0$, hence $A\boldsymbol{\xi} = \boldsymbol{0}$. This shows that every solution of $A^TA\boldsymbol{x} = \boldsymbol{0}$ is also a solution of $A\boldsymbol{x} = \boldsymbol{0}$. Conversely, every solution of $A\boldsymbol{x} = \boldsymbol{0}$ is obviously a solution of $A^TA\boldsymbol{x} = \boldsymbol{0}$. The two systems therefore have the same solution set, which implies $r(A^TA) = r(A)$. Since $A^TA$ is diagonalizable, its rank equals the number of its nonzero eigenvalues, which is the number of nonzero singular values of $A$. ∎
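A numerical check of Proposition 3; a sketch using a rank-deficient example matrix of my own choosing (its third column is the sum of the first two, so the rank is 2):

```python
import numpy as np

# A 4x3 matrix of rank 2: the third column equals column 1 + column 2
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 3.0, 5.0],
              [1.0, 1.0, 2.0]])

sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending
nonzero = int(np.sum(sigma > 1e-10))        # count the nonzero ones

print(nonzero)                   # 2
print(np.linalg.matrix_rank(A))  # 2
```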
We often encounter the following question: how are $\|\boldsymbol{x}\|$ and $\|A\boldsymbol{x}\|$ related? If we regard the matrix $A$ as a linear transformation acting on $\boldsymbol{x}$, it can change the length of $\boldsymbol{x}$; by what factor, at most, can that length grow? With singular values in hand, this question is easy to answer.
**Proposition 4** Let $A$ be an $m \times n$ matrix and $\boldsymbol{x}$ an $n \times 1$ vector. Then $\|A\boldsymbol{x}\| \le \sigma_1\|\boldsymbol{x}\|$, where $\sigma_1$ is the largest singular value of $A$, and equality holds exactly when $\boldsymbol{x}$ is an eigenvector of $A^TA$ corresponding to the eigenvalue $\sigma_1^2$.

**Proof**: Since $A^TA$ is a real symmetric matrix, it has an orthonormal set of eigenvectors $\boldsymbol{v}_1, \boldsymbol{v}_2, \cdots, \boldsymbol{v}_n$. Any $\boldsymbol{x} \in \mathbb{R}^n$ can be written as
$$\boldsymbol{x} = c_1\boldsymbol{v}_1 + c_2\boldsymbol{v}_2 + \cdots + c_n\boldsymbol{v}_n$$
where the scalars $c_1, c_2, \cdots, c_n$ satisfy $c_1^2 + c_2^2 + \cdots + c_n^2 = \|\boldsymbol{x}\|^2$. Now examine $\|A\boldsymbol{x}\|^2$:
$$\|A\boldsymbol{x}\|^2 = \boldsymbol{x}^TA^TA\boldsymbol{x} = \langle\boldsymbol{x}, A^TA\boldsymbol{x}\rangle = \left\langle\sum_{i=1}^n c_i\boldsymbol{v}_i, \sum_{i=1}^n c_i A^TA\boldsymbol{v}_i\right\rangle$$
Note that $\boldsymbol{v}_i$ is an eigenvector of $A^TA$ corresponding to the eigenvalue $\sigma_i^2$, so $A^TA\boldsymbol{v}_i = \sigma_i^2\boldsymbol{v}_i$. Therefore
$$\|A\boldsymbol{x}\|^2 = \left\langle\sum_{i=1}^n c_i\boldsymbol{v}_i, \sum_{i=1}^n c_i\sigma_i^2\boldsymbol{v}_i\right\rangle = \sum_{i=1}^n c_i^2\sigma_i^2 \le \sum_{i=1}^n c_i^2\sigma_1^2 = \sigma_1^2\|\boldsymbol{x}\|^2$$
Equality holds exactly when $c_1^2 = \|\boldsymbol{x}\|^2$ and $c_2 = c_3 = \cdots = c_n = 0$, in which case $\boldsymbol{x} = c_1\boldsymbol{v}_1$, so $\boldsymbol{x}$ is an eigenvector of $A^TA$ corresponding to the eigenvalue $\sigma_1^2$. ∎
If $\boldsymbol{x} \perp \boldsymbol{v}_1$, i.e. $c_1 = 0$, then the same argument shows $\|A\boldsymbol{x}\| \le \sigma_2\|\boldsymbol{x}\|$; if $\boldsymbol{x} \perp \boldsymbol{v}_1$ and $\boldsymbol{x} \perp \boldsymbol{v}_2$, i.e. $c_1 = c_2 = 0$, then $\|A\boldsymbol{x}\| \le \sigma_3\|\boldsymbol{x}\|$; and so on.
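The bound of Proposition 4 can be illustrated numerically; a sketch with a random example matrix and random test vectors of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
sigma = np.linalg.svd(A, compute_uv=False)  # sigma[0] is the largest

# ||Ax|| <= sigma_1 ||x|| for every x (small slack for round-off)
for _ in range(1000):
    x = rng.standard_normal(3)
    assert np.linalg.norm(A @ x) <= sigma[0] * np.linalg.norm(x) + 1e-12

# Equality is attained at the top eigenvector v_1 of A^T A,
# i.e. the first row of V^T returned by np.linalg.svd
_, _, Vt = np.linalg.svd(A)
v1 = Vt[0]
print(np.isclose(np.linalg.norm(A @ v1), sigma[0]))  # True
```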
3. Definition of singular value decomposition
Singular values have been introduced above. Next, we will introduce how to use singular values to decompose matrices.
Let $A$ be an $m \times n$ matrix with singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$. Let $r$ be the rank of $A$, which is the number of nonzero singular values of $A$.

**Definition 5** A singular value decomposition of $A$ is a factorization of the form
$$A = U\Sigma V^T$$
where
- $U$ is an $m \times m$ orthogonal matrix;
- $V$ is an $n \times n$ orthogonal matrix;
- $\Sigma$ is an $m \times n$ matrix that is nearly diagonal: its $i$-th diagonal entry is $\sigma_i$ for $i = 1, 2, \cdots, r$, and all of its other entries are $0$.
For example, when $A$ is a symmetric square matrix, its singular values are exactly the absolute values of its eigenvalues.
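This fact is easy to verify numerically; a sketch using a small symmetric example matrix of my own (chosen to have one negative eigenvalue):

```python
import numpy as np

# A symmetric matrix with one negative eigenvalue
A = np.array([[2.0,  1.0],
              [1.0, -1.0]])

eigvals = np.linalg.eigvalsh(A)             # eigenvalues, ascending
sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending

# Singular values equal the absolute values of the eigenvalues
print(np.allclose(sigma, np.sort(np.abs(eigvals))[::-1]))  # True
```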
4. How to perform singular value decomposition
**Lemma 6** Let $\boldsymbol{v}_1, \boldsymbol{v}_2, \cdots, \boldsymbol{v}_n$ be an orthonormal set of eigenvectors of $A^TA$, with $A^TA\boldsymbol{v}_i = \sigma_i^2\boldsymbol{v}_i$. Then

(1) $\|A\boldsymbol{v}_i\| = \sigma_i$;

(2) if $i \ne j$, then $A\boldsymbol{v}_i$ and $A\boldsymbol{v}_j$ are orthogonal.
**Proof**: $\langle A\boldsymbol{v}_i, A\boldsymbol{v}_j\rangle = \boldsymbol{v}_i^TA^TA\boldsymbol{v}_j = \boldsymbol{v}_i^T\sigma_j^2\boldsymbol{v}_j = \sigma_j^2\langle\boldsymbol{v}_i, \boldsymbol{v}_j\rangle$.

- If $i = j$, then since $\|\boldsymbol{v}_i\| = 1$ we get $\|A\boldsymbol{v}_i\|^2 = \sigma_i^2$;
- If $i \ne j$, then since $\boldsymbol{v}_i \perp \boldsymbol{v}_j$ we get $A\boldsymbol{v}_i \perp A\boldsymbol{v}_j$. ∎
**Theorem 7** Let $A$ be an $m \times n$ matrix. Then we can construct a singular value decomposition $A = U\Sigma V^T$ of $A$ as follows:

- the columns of $V$ are an orthonormal set of eigenvectors $\boldsymbol{v}_1, \boldsymbol{v}_2, \cdots, \boldsymbol{v}_n$ of $A^TA$, satisfying $A^TA\boldsymbol{v}_i = \sigma_i^2\boldsymbol{v}_i$;
- for $i \le r$ (so that $\sigma_i \ne 0$), the $i$-th column of $U$ is $\frac{1}{\sigma_i}A\boldsymbol{v}_i$. By Lemma 6 these $r$ columns are orthonormal; the remaining columns are obtained by extending them arbitrarily to an orthonormal basis of $\mathbb{R}^m$.
**Proof**: We only need to prove that with $U$ and $V$ defined as above, $A = U\Sigma V^T$. Rather than proving this identity directly, we show that $U\Sigma V^T\boldsymbol{x} = A\boldsymbol{x}$ for every $\boldsymbol{x} \in \mathbb{R}^n$. This suffices: if $A\boldsymbol{x} = B\boldsymbol{x}$ for all $\boldsymbol{x} \in \mathbb{R}^n$, then $(A - B)\boldsymbol{x} = \boldsymbol{0}$ for all $\boldsymbol{x}$, so the solution space of this homogeneous system is all of $\mathbb{R}^n$; hence $r(A - B) = 0$, i.e. $A - B = O$ and $A = B$. Consider
$$V^T\boldsymbol{x} = \begin{bmatrix}\boldsymbol{v}_1^T\\\boldsymbol{v}_2^T\\\vdots\\\boldsymbol{v}_n^T\end{bmatrix}\boldsymbol{x} = \begin{bmatrix}\boldsymbol{v}_1^T\boldsymbol{x}\\\boldsymbol{v}_2^T\boldsymbol{x}\\\vdots\\\boldsymbol{v}_n^T\boldsymbol{x}\end{bmatrix}$$
Then
$$\Sigma V^T\boldsymbol{x} = \begin{bmatrix}\sigma_1\boldsymbol{v}_1^T\boldsymbol{x}\\\sigma_2\boldsymbol{v}_2^T\boldsymbol{x}\\\vdots\\\sigma_r\boldsymbol{v}_r^T\boldsymbol{x}\\0\\\vdots\\0\end{bmatrix}$$
Multiplying by $U$ on the left gives
$$\begin{aligned} U\Sigma V^T\boldsymbol{x} &= (\sigma_1\boldsymbol{v}_1^T\boldsymbol{x})\frac{1}{\sigma_1}A\boldsymbol{v}_1 + (\sigma_2\boldsymbol{v}_2^T\boldsymbol{x})\frac{1}{\sigma_2}A\boldsymbol{v}_2 + \cdots + (\sigma_r\boldsymbol{v}_r^T\boldsymbol{x})\frac{1}{\sigma_r}A\boldsymbol{v}_r\\ &= A\boldsymbol{v}_1\boldsymbol{v}_1^T\boldsymbol{x} + A\boldsymbol{v}_2\boldsymbol{v}_2^T\boldsymbol{x} + \cdots + A\boldsymbol{v}_r\boldsymbol{v}_r^T\boldsymbol{x}\\ &= A\boldsymbol{v}_1\boldsymbol{v}_1^T\boldsymbol{x} + A\boldsymbol{v}_2\boldsymbol{v}_2^T\boldsymbol{x} + \cdots + A\boldsymbol{v}_n\boldsymbol{v}_n^T\boldsymbol{x}\\ &= A(\boldsymbol{v}_1\boldsymbol{v}_1^T + \boldsymbol{v}_2\boldsymbol{v}_2^T + \cdots + \boldsymbol{v}_n\boldsymbol{v}_n^T)\boldsymbol{x}\\ &= AVV^T\boldsymbol{x}\\ &= A\boldsymbol{x} \end{aligned}$$
Here the terms for $i > r$ can be inserted freely because $\|A\boldsymbol{v}_i\| = \sigma_i = 0$ by Lemma 6, i.e. $A\boldsymbol{v}_i = \boldsymbol{0}$. This proves $A = U\Sigma V^T$. ∎
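Theorem 7 is constructive, and its steps translate directly into NumPy. A sketch following the construction above; the matrix `A` and the `1e-10` rank cutoff are my own choices for illustration:

```python
import numpy as np

# Example matrix (any real matrix works)
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])
m, n = A.shape

# Step 1: V = orthonormal eigenvectors of A^T A, eigenvalues descending
lam, V = np.linalg.eigh(A.T @ A)          # eigh returns ascending order
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]
sigma = np.sqrt(np.clip(lam, 0.0, None))  # singular values sigma_i

r = int(np.sum(sigma > 1e-10))            # rank = number of nonzero sigma_i

# Step 2: the first r columns of U are (1/sigma_i) * A v_i
U = np.zeros((m, m))
U[:, :r] = (A @ V[:, :r]) / sigma[:r]
if r < m:
    # extend to an orthonormal basis of R^m: QR preserves the span of the
    # first r columns, and its remaining columns are orthogonal to them
    Q, _ = np.linalg.qr(np.hstack([U[:, :r], np.eye(m)]))
    U[:, r:] = Q[:, r:]

# Step 3: assemble Sigma and verify A = U Sigma V^T
Sigma = np.zeros((m, n))
np.fill_diagonal(Sigma, sigma)
print(np.allclose(U @ Sigma @ V.T, A))  # True
```

Note that `np.linalg.svd` computes the same decomposition more robustly (it never forms $A^TA$, which squares the condition number); the construction here mirrors the proof, not production practice.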
The singular value decomposition of matrices is widely used in machine learning; for example, it plays an important role in principal component analysis (PCA).
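As one illustration of that connection, PCA can be obtained from the SVD of a centered data matrix; a minimal sketch (the data here is random and the scaling matrix is arbitrary, purely for demonstration):

```python
import numpy as np

# Random data with very different variances along the three axes
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 3)) * np.array([2.0, 1.0, 0.1])

# Center the data, then take its SVD
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal directions; S**2 / (n-1) are the
# variances explained by each component (automatically descending)
explained_var = S**2 / (len(X) - 1)
scores = Xc @ Vt[:2].T  # project onto the top-2 principal components
print(scores.shape)     # (100, 2)
```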