[Linear Algebra/Machine Learning] Singular values of matrices and singular value decomposition (SVD)

1. Introduction

We know that if an $n\times n$ matrix $A$ has $n$ linearly independent eigenvectors, then $A$ is diagonalizable: there is an invertible matrix $P$ such that $A=P\Lambda P^{-1}$, where $\Lambda$ is the diagonal matrix of the eigenvalues of $A$ and the columns of $P$ are the corresponding eigenvectors. Decomposing $A$ into $P\Lambda P^{-1}$ is called the eigendecomposition of the matrix. However, for an $m\times n$ matrix with $m\ne n$, eigendecomposition is unavailable. How should we decompose such a matrix? This is where singular value decomposition (SVD) comes in.
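As a quick numerical check (a minimal numpy sketch; the matrix is an illustrative choice of mine, not from the original text), we can verify $A=P\Lambda P^{-1}$ for a small diagonalizable matrix:

```python
import numpy as np

# A small symmetric matrix, which is guaranteed to be diagonalizable.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of P are eigenvectors; lam holds the eigenvalues.
lam, P = np.linalg.eig(A)
Lam = np.diag(lam)

# Reconstruct A = P Lam P^{-1} and compare.
print(np.allclose(A, P @ Lam @ np.linalg.inv(P)))  # True
```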

2. Singular values

Let $A$ be an $m\times n$ matrix. We are already familiar with eigenvalues, so we derive the definition of singular values from them. What kind of matrix has eigenvalues? A square matrix. $A$ is not necessarily square, but there is a way to obtain a square matrix from it: $A^T A$ is an $n\times n$ square matrix. Next we examine the eigenvalues of $A^T A$.

Proposition 1 Every eigenvalue $\lambda$ of $A^T A$ satisfies $\lambda\ge 0$.

Proof: Suppose $A^T A\boldsymbol{x}=\lambda\boldsymbol{x}$, where $\boldsymbol{x}$ is an eigenvector of $A^T A$. Then
$$\boldsymbol{x}^T A^T A\boldsymbol{x}=\lambda \boldsymbol{x}^T\boldsymbol{x},\quad\text{i.e.}\quad \|A\boldsymbol{x}\|^2=\lambda\|\boldsymbol{x}\|^2.$$
Both $\|A\boldsymbol{x}\|^2$ and $\|\boldsymbol{x}\|^2$ are non-negative, and $\|\boldsymbol{x}\|^2>0$ because an eigenvector is nonzero, so $\lambda\ge 0$. ∎

Now let's define singular values.

Definition 2 Let $\lambda_1,\lambda_2,\cdots,\lambda_n$ be the eigenvalues of $A^T A$, ordered so that $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n\ge 0$. Setting $\sigma_i=\sqrt{\lambda_i}$, we have $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_n\ge 0$. The numbers $\sigma_i$ are called the singular values of $A$.

Proposition 3 The number of non-zero singular values of $A$ equals the rank of $A$.

Proof: Since $A^T A$ is real symmetric, it is diagonalizable, so its rank equals the number of its non-zero eigenvalues, which is the number of non-zero singular values of $A$. It therefore suffices to show $r(A^T A)=r(A)$. Consider the homogeneous linear system $A^T A\boldsymbol{x}=\boldsymbol{0}$, and let $\boldsymbol{\xi}$ be any solution, i.e. $A^T A\boldsymbol{\xi}=\boldsymbol{0}$. Then $\boldsymbol{\xi}^T A^T A\boldsymbol{\xi}=0$, i.e. $\|A\boldsymbol{\xi}\|^2=0$, hence $A\boldsymbol{\xi}=\boldsymbol{0}$. This shows that every solution of $A^T A\boldsymbol{x}=\boldsymbol{0}$ is also a solution of $A\boldsymbol{x}=\boldsymbol{0}$. Conversely, every solution of $A\boldsymbol{x}=\boldsymbol{0}$ is obviously a solution of $A^T A\boldsymbol{x}=\boldsymbol{0}$. The two systems therefore have the same solution set, which implies $r(A^T A)=r(A)$. ∎
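To make Definition 2 and Proposition 3 concrete, here is a small numpy sketch (the rank-1 matrix is an illustrative choice of mine) that computes the singular values as square roots of the eigenvalues of $A^T A$ and compares the number of non-zero ones with the rank:

```python
import numpy as np

# A 3x2 matrix of rank 1 (the second column is twice the first).
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

# Eigenvalues of A^T A are non-negative (Proposition 1); their square
# roots, sorted in descending order, are the singular values of A.
lam = np.linalg.eigvalsh(A.T @ A)              # ascending order
sigma = np.sqrt(np.clip(lam, 0.0, None))[::-1]
print(sigma)                                   # approx [8.3666, 0.]

# The number of non-zero singular values equals the rank (Proposition 3).
print(np.sum(sigma > 1e-10), np.linalg.matrix_rank(A))  # 1 1
```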

We often run into the following question: how are $\|\boldsymbol{x}\|$ and $\|A\boldsymbol{x}\|$ related? Viewing the matrix $A$ as a linear transformation acting on $\boldsymbol{x}$, it can change the length of $\boldsymbol{x}$; by what factor, at most, can that length grow? With singular values, this question is easy to answer.

Proposition 4 Let $A$ be an $m\times n$ matrix and $\boldsymbol{x}$ an $n\times 1$ vector. Then $\|A\boldsymbol{x}\|\le\sigma_1\|\boldsymbol{x}\|$, where $\sigma_1$ is the largest singular value of $A$, and equality holds exactly when $\boldsymbol{x}$ is an eigenvector of $A^T A$ corresponding to the eigenvalue $\sigma_1^2$.

Proof: Since $A^T A$ is a real symmetric matrix, it has an orthonormal set of eigenvectors $\boldsymbol{v}_1,\boldsymbol{v}_2,\cdots,\boldsymbol{v}_n$. Any $\boldsymbol{x}\in\mathbb{R}^n$ can be written as
$$\boldsymbol{x}=c_1\boldsymbol{v}_1+c_2\boldsymbol{v}_2+\cdots+c_n\boldsymbol{v}_n,$$
where the scalars $c_1,c_2,\cdots,c_n$ satisfy $c_1^2+c_2^2+\cdots+c_n^2=\|\boldsymbol{x}\|^2$. Now examine $\|A\boldsymbol{x}\|^2$:
$$\|A\boldsymbol{x}\|^2=\boldsymbol{x}^T A^T A\boldsymbol{x}=\langle\boldsymbol{x},A^T A \boldsymbol{x}\rangle=\left\langle\sum_{i=1}^n c_i\boldsymbol{v}_i,\sum_{i=1}^n c_i A^T A \boldsymbol{v}_i\right\rangle.$$
Since $\boldsymbol{v}_i$ is an eigenvector of $A^T A$ corresponding to the eigenvalue $\sigma_i^2$, we have $A^T A\boldsymbol{v}_i=\sigma_i^2\boldsymbol{v}_i$. Therefore
$$\|A\boldsymbol{x}\|^2=\left\langle\sum_{i=1}^n c_i\boldsymbol{v}_i,\sum_{i=1}^n c_i \sigma_i^2 \boldsymbol{v}_i\right\rangle=\sum_{i=1}^n c_i^2\sigma_i^2\le\sum_{i=1}^n c_i^2\sigma_1^2=\sigma_1^2\|\boldsymbol{x}\|^2.$$
Equality holds when $c_1^2=\|\boldsymbol{x}\|^2$ and $c_2=c_3=\cdots=c_n=0$, i.e. $\boldsymbol{x}=c_1\boldsymbol{v}_1$, so $\boldsymbol{x}$ is an eigenvector of $A^T A$ corresponding to the eigenvalue $\sigma_1^2$. ∎

If $\boldsymbol{x}\perp\boldsymbol{v}_1$, i.e. $c_1=0$, then the same argument shows $\|A\boldsymbol{x}\|\le\sigma_2\|\boldsymbol{x}\|$; if $\boldsymbol{x}\perp\boldsymbol{v}_1$ and $\boldsymbol{x}\perp\boldsymbol{v}_2$, i.e. $c_1=c_2=0$, then $\|A\boldsymbol{x}\|\le\sigma_3\|\boldsymbol{x}\|$; and so on.
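Proposition 4 is easy to sanity-check numerically. The sketch below (random data of my own choosing) compares $\|A\boldsymbol{x}\|$ with $\sigma_1\|\boldsymbol{x}\|$ for random vectors and verifies that the top eigenvector of $A^T A$ attains the bound:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
sigma1 = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value

# ||A x|| <= sigma_1 ||x|| for arbitrary x (Proposition 4).
for _ in range(5):
    x = rng.standard_normal(3)
    assert np.linalg.norm(A @ x) <= sigma1 * np.linalg.norm(x) + 1e-12

# Equality is attained by the eigenvector of A^T A for eigenvalue sigma_1^2.
lam, V = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
v1 = V[:, -1]                      # eigenvector of the largest eigenvalue
print(np.isclose(np.linalg.norm(A @ v1), sigma1))  # True
```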

3. Definition of singular value decomposition

Singular values have been introduced above. Next, we describe how to use them to decompose a matrix.

Let $A$ be an $m\times n$ matrix with singular values $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_n\ge 0$, and let $r$ be the rank of $A$, which by Proposition 3 is the number of non-zero singular values of $A$.

Definition 5 A singular value decomposition of $A$ is a decomposition of the form $A=U\Sigma V^T$, where

  • $U$ is an $m\times m$ orthogonal matrix;
  • $V$ is an $n\times n$ orthogonal matrix;
  • $\Sigma$ is an $m\times n$ matrix that is nearly diagonal: its $i$-th diagonal entry is $\sigma_i$ for $i=1,2,\cdots,r$, and all other entries of $\Sigma$ are $0$.

For example, when $A$ is a symmetric square matrix, its singular values are exactly the absolute values of its eigenvalues.
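In numpy this decomposition is available directly as `np.linalg.svd`. The short sketch below (the matrices are illustrative choices of mine) checks the shapes in Definition 5 and the remark about symmetric matrices:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])           # a 2x3 matrix

U, s, Vt = np.linalg.svd(A)               # full SVD: U is 2x2, Vt is 3x3
Sigma = np.zeros(A.shape)                 # m x n, nearly diagonal
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))     # True

# For a symmetric matrix, singular values = |eigenvalues|.
S = np.array([[1.0, 2.0],
              [2.0, -3.0]])
print(np.sort(np.abs(np.linalg.eigvalsh(S)))[::-1])  # matches the line below
print(np.linalg.svd(S, compute_uv=False))
```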

4. How to perform singular value decomposition

Lemma 6 Let $\boldsymbol{v}_1,\boldsymbol{v}_2,\cdots,\boldsymbol{v}_n$ be an orthonormal set of eigenvectors of $A^T A$ with $A^T A\boldsymbol{v}_i=\sigma_i^2\boldsymbol{v}_i$. Then:
(1) $\|A\boldsymbol{v}_i\|=\sigma_i$;
(2) if $i\ne j$, then $A\boldsymbol{v}_i$ and $A\boldsymbol{v}_j$ are orthogonal.

Proof:
$$\langle A\boldsymbol{v}_i,A\boldsymbol{v}_j\rangle=\boldsymbol{v}_i^T A^T A\boldsymbol{v}_j=\boldsymbol{v}_i^T\sigma_j^2\boldsymbol{v}_j=\sigma_j^2\langle\boldsymbol{v}_i,\boldsymbol{v}_j\rangle.$$

  • If $i=j$: since $\|\boldsymbol{v}_i\|=1$, we get $\|A\boldsymbol{v}_i\|^2=\sigma_i^2$, i.e. $\|A\boldsymbol{v}_i\|=\sigma_i$.
  • If $i\ne j$: since $\boldsymbol{v}_i\perp \boldsymbol{v}_j$, we get $A\boldsymbol{v}_i\perp A\boldsymbol{v}_j$. ∎

Theorem 7 Let $A$ be an $m\times n$ matrix. Then we can construct a singular value decomposition $A=U\Sigma V^T$ of $A$ as follows:

  • the columns of $V$ are an orthonormal set of eigenvectors $\boldsymbol{v}_1,\boldsymbol{v}_2,\cdots,\boldsymbol{v}_n$ of $A^T A$, satisfying $A^T A\boldsymbol{v}_i=\sigma_i^2 \boldsymbol{v}_i$;
  • for $i\le r$ (so that $\sigma_i\ne 0$), the $i$-th column of $U$ is $\frac{1}{\sigma_i}A\boldsymbol{v}_i$. By Lemma 6 these columns are orthonormal, and the remaining columns can be obtained by extending them, in any way, to an orthonormal basis of $\mathbb{R}^m$.

Proof: We need to show that with $U$ and $V$ defined as above, $A=U\Sigma V^T$. Rather than proving this directly, we prove that $U\Sigma V^T\boldsymbol{x}=A\boldsymbol{x}$ for every $\boldsymbol{x}\in\mathbb{R}^n$. This suffices: if $A\boldsymbol{x}=B\boldsymbol{x}$ for all $\boldsymbol{x}\in\mathbb{R}^n$, then $(A-B)\boldsymbol{x}=\boldsymbol{0}$ for all $\boldsymbol{x}\in\mathbb{R}^n$, so the solution space of this homogeneous system has dimension $n$, which forces $r(A-B)=0$, i.e. $A-B=O$ and $A=B$. Consider
$$V^T\boldsymbol{x}=\begin{bmatrix}\boldsymbol{v}_1^T\\\boldsymbol{v}_2^T\\\vdots\\\boldsymbol{v}_n^T\end{bmatrix}\boldsymbol{x}=\begin{bmatrix}\boldsymbol{v}_1^T\boldsymbol{x}\\\boldsymbol{v}_2^T\boldsymbol{x}\\\vdots\\\boldsymbol{v}_n^T\boldsymbol{x}\end{bmatrix},\qquad \Sigma V^T\boldsymbol{x}=\begin{bmatrix}\sigma_1\boldsymbol{v}_1^T\boldsymbol{x}\\\sigma_2\boldsymbol{v}_2^T\boldsymbol{x}\\\vdots\\\sigma_r\boldsymbol{v}_r^T\boldsymbol{x}\\0\\\vdots\\0\end{bmatrix}.$$
Multiplying by $U$ on the left gives
$$\begin{aligned} U\Sigma V^T\boldsymbol{x}&=(\sigma_1\boldsymbol{v}_1^T\boldsymbol{x})\frac{1}{\sigma_1}A\boldsymbol{v}_1+(\sigma_2\boldsymbol{v}_2^T\boldsymbol{x})\frac{1}{\sigma_2}A\boldsymbol{v}_2+\cdots+(\sigma_r\boldsymbol{v}_r^T\boldsymbol{x})\frac{1}{\sigma_r}A\boldsymbol{v}_r\\ &=A\boldsymbol{v}_1\boldsymbol{v}_1^T\boldsymbol{x}+A\boldsymbol{v}_2\boldsymbol{v}_2^T\boldsymbol{x}+\cdots+A\boldsymbol{v}_r\boldsymbol{v}_r^T\boldsymbol{x}\\ &=A\boldsymbol{v}_1\boldsymbol{v}_1^T\boldsymbol{x}+A\boldsymbol{v}_2\boldsymbol{v}_2^T\boldsymbol{x}+\cdots+A\boldsymbol{v}_n\boldsymbol{v}_n^T\boldsymbol{x}\\ &=A(\boldsymbol{v}_1\boldsymbol{v}_1^T+\boldsymbol{v}_2\boldsymbol{v}_2^T+\cdots+\boldsymbol{v}_n\boldsymbol{v}_n^T)\boldsymbol{x}\\ &=AVV^T\boldsymbol{x}\\ &=A\boldsymbol{x}. \end{aligned}$$
The terms with $i>r$ may be inserted in the third line because $\|A\boldsymbol{v}_i\|=\sigma_i=0$, hence $A\boldsymbol{v}_i=\boldsymbol{0}$, for $i>r$; and the last two steps use $\boldsymbol{v}_1\boldsymbol{v}_1^T+\cdots+\boldsymbol{v}_n\boldsymbol{v}_n^T=VV^T$ together with $VV^T=I$, which holds since $V$ is orthogonal. This proves $A=U\Sigma V^T$. ∎
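Theorem 7 is constructive, so we can follow it step by step in code. The sketch below (a pedagogical illustration of mine, assuming $A$ has full column rank so that every $\sigma_i\ne 0$) builds $V$ from the eigenvectors of $A^T A$, takes $\frac{1}{\sigma_i}A\boldsymbol{v}_i$ as the first $r$ columns of $U$, and extends to an orthonormal basis of $\mathbb{R}^m$:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])                # 3x2, rank 2, so r = n = 2

# Step 1: orthonormal eigenvectors of A^T A, eigenvalues sorted descending.
lam, V = np.linalg.eigh(A.T @ A)          # ascending order
lam, V = lam[::-1], V[:, ::-1]            # reorder so sigma_1 >= sigma_2
sigma = np.sqrt(lam)

# Step 2: the first r columns of U are (1/sigma_i) A v_i, which Lemma 6
# makes orthonormal; divide column i of A @ V by sigma_i.
U_r = (A @ V) / sigma                     # shape 3x2

# Extend to an orthonormal basis of R^3: Gram-Schmidt on a standard
# basis vector that is not in the column span of U_r.
u3 = np.eye(3)[:, 0]
u3 -= U_r @ (U_r.T @ u3)                  # remove components along U_r
u3 /= np.linalg.norm(u3)
U = np.column_stack([U_r, u3])

# Step 3: assemble Sigma and check A = U Sigma V^T.
Sigma = np.zeros(A.shape)
Sigma[:2, :2] = np.diag(sigma)
print(np.allclose(A, U @ Sigma @ V.T))    # True
```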

The singular value decomposition of matrices is widely used in machine learning; for example, it plays an important role in Principal Component Analysis (PCA).
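As a pointer in that direction, here is a minimal PCA-via-SVD sketch (my own illustrative example, not from the original post): center the data, take the SVD, and read the principal directions off $V$:

```python
import numpy as np

# Synthetic data: 100 samples, 3 features with very different scales.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3)) * np.array([2.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                   # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are principal directions; s**2/(n-1) are component variances.
explained_var = s**2 / (len(X) - 1)
X_reduced = Xc @ Vt[:2].T                 # project onto the top 2 components
print(explained_var, X_reduced.shape)     # variances, (100, 2)
```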

