[Linear Algebra 08] Singular Value Decomposition (SVD) and Its Applications

  This article revolves around SVD (Singular Value Decomposition): it first shows the basic procedure, and then explains how the SVD is used to find the pseudoinverse of a general matrix and how it powers PCA dimensionality reduction. Of course, these are only two typical applications of the SVD. With this article, the material related to the MIT 18.06 course comes to an end; I will catch up with 18.065 when I have time.


SVD

First look at the definition of SVD
$$A = U\Sigma V^*$$
That is to say, the singular value decomposition factors a matrix into 3 parts and achieves a diagonalization of $A$, much like the eigenvalue decomposition. As a reminder, a unitary matrix is the generalization of an orthogonal matrix to the complex field: the inverse of a real orthogonal matrix equals its transpose, while the inverse of a unitary matrix equals its conjugate transpose.
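
As a quick sanity check of the definition, here is a minimal Matlab sketch (the 2×2 complex matrix is just a hypothetical example): the U and V returned by svd should be unitary, and U*S*V' should reconstruct A.

A = [1+2i, 3; 0, 4-1i];      % hypothetical complex example
[U, S, V] = svd(A);
norm(U'*U - eye(2))          % ~0: U is unitary (inverse = conjugate transpose)
norm(V'*V - eye(2))          % ~0: V is unitary
norm(U*S*V' - A)             % ~0: A = U*Sigma*V^*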


geometric transformation

From the perspective of geometric transformation, it is a combination of two rotations and one stretch:
[Figure: the SVD as a sequence of rotation, stretching, and rotation]
the corresponding explanation is as follows

The SVD separates a matrix into three steps: (orthogonal) x (diagonal) x (orthogonal). Ordinary words can express the geometry behind it: (rotation) x (stretching) x (rotation)

When matrix multiplication is read as a spatial transformation, the factors act from right to left. So the singular value decomposition first performs a rotation, then a stretch, and finally another rotation, which is the process shown in the figure.
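
To make the rotation–stretch–rotation picture concrete, here is a small Matlab sketch (the 2×2 matrix is an arbitrary example, not taken from the text): it applies the three factors to the unit circle from right to left and plots the resulting ellipse.

A = [3, 1; 1, 2];                 % arbitrary example matrix
[U, S, V] = svd(A);
t = linspace(0, 2*pi, 200);
circle = [cos(t); sin(t)];        % points on the unit circle
step1 = V' * circle;              % first rotation (V^T)
step2 = S * step1;                % stretching into an ellipse (Sigma)
step3 = U * step2;                % final rotation (U); equals A * circle
figure; hold on; axis equal;
plot(circle(1,:), circle(2,:));
plot(step3(1,:), step3(2,:));
legend('unit circle', 'A * unit circle');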


solve

How do we find these three parts? Let's have a look at the outline:

The singular value theorem for $A$ is the eigenvalue theorem for $A^TA$ and $AA^T$.

In other words, the singular value theorem for $A$ is the eigenvalue theorem for $A^TA$ and $AA^T$. Since $U$ (of order $m$) and $V$ (of order $n$) are both unitary, forming $A^TA$ and $AA^T$ (symmetric matrices) effectively cancels out one of the unitary factors, i.e.
$$AA^T=U\Sigma V^T V\Sigma^T U^T=U\Sigma\Sigma^T U^T \\ A^TA=V\Sigma^T U^T U\Sigma V^T =V\Sigma^T\Sigma V^T$$
As can be seen, $AA^T$ is diagonalized by the unitary matrix $U$ and $A^TA$ is diagonalized by the unitary matrix $V$; the diagonal entries of both $\Sigma^T\Sigma$ and $\Sigma\Sigma^T$ are the $\sigma_i^2$, and every other entry is 0. So the problem reduces to an eigenvalue decomposition of $A^TA$ and $AA^T$, whose eigenvalues can be obtained from the characteristic determinant.


For example, let us work through the following example:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$
Compute $A^TA$:
$$A^TA= \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}= \begin{bmatrix} 17 & 22 & 27 \\ 22 & 29 & 36 \\ 27 & 36 & 45 \end{bmatrix}$$
Then set up the characteristic determinant:
$$|A^TA-\lambda E|=\begin{vmatrix} 17-\lambda & 22 & 27 \\ 22 & 29-\lambda & 36 \\ 27 & 36 & 45-\lambda \end{vmatrix}=0$$
After solving for the eigenvalues, substitute each of them into $(A^TA - \lambda E)x=0$ to obtain the eigenvectors, and finally assemble these eigenvectors into the unitary matrix $V$.
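
If the Symbolic Math Toolbox is available, the hand calculation above can be reproduced as follows; this is only a sketch of the determinant route, and the numerical verification with eig comes next.

A = [1, 2, 3; 4, 5, 6];
syms lambda
p = det(A'*A - lambda*eye(3));       % characteristic polynomial of A'*A
eigenvalues = solve(p == 0, lambda);
double(eigenvalues)                  % approximately 0, 0.5973 and 90.4027; their square roots are the singular values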


verify

We can verify the above process in Matlab, where the command for singular value decomposition is svd:

>> A = [1,2,3;4,5,6];
>> B = A'*A  % used to obtain the unitary matrix V

B =

    17    22    27
    22    29    36
    27    36    45

>> [V,D]= eig(B)

V =

   -0.4082   -0.8060    0.4287
    0.8165   -0.1124    0.5663
   -0.4082    0.5812    0.7039


D =

   -0.0000         0         0
         0    0.5973         0
         0         0   90.4027
         
>> C = A * A'  % used to obtain the unitary matrix U

C =

    14    32
    32    77

>> [U,D]= eig(C)

U =

   -0.9224    0.3863
    0.3863    0.9224


D =

    0.5973         0
         0   90.4027

>> [U,S,V]=svd(A)  % verify with svd

U =

   -0.3863   -0.9224
   -0.9224    0.3863

% if the order of the two singular values in Σ is swapped, the corresponding eigenvectors must be swapped as well; the sign of an eigenvector does not affect orthogonality

S =

    9.5080         0         0
         0    0.7729         0

% the singular values in S are the square roots of the eigenvalues found above

V =

   -0.4287    0.8060    0.4082
   -0.5663    0.1124   -0.8165
   -0.7039   -0.5812    0.4082

pseudoinverse

The pseudoinverse of a general matrix can be obtained through the singular value decomposition; this is exactly what we left unexplained in the discussion of the transpose and inverse of a matrix in [Linear Algebra 01]. Now let's talk about it in detail.


space picture

Let's first look at the space picture:

[Figure 7.6: the four subspaces, with $A$ mapping the row space to the column space and $A^+$ mapping it back]

Why does the inverse of a matrix sometimes not exist? There are two intuitive reasons: first, for an $m \times n$ rectangular matrix the inverse is simply not defined; second, an $n \times n$ matrix may fail to have full rank. In the space picture these two cases mean the same thing: the matrix acts as a rank-$r$ transformation between its row space and its column space. Specifically,

$$AA^+ = \text{projection matrix onto the column space of } A \qquad\qquad A^+A = \text{projection matrix onto the row space of } A$$

This is exactly what the caption of Figure 7.6 clarifies: the $m$-dimensional vector $Ax^+$ ($m \times 1$) lies in the column space of $A$ (here $x^+$ is taken in the row space of $A$, and $Ax^+$ is a linear combination of the columns of $A$, so it lies in the column space); multiplying it by $A^+$ brings it back to the row space of $A$, that is, $A^+Ax^+=x^+$, where $x^+$ has size $n \times 1$. So theoretically we should have:

Trying for $A^+A = I$ (after the projection, the product should look as much like the identity matrix as possible)

And this parallels the definition of the inverse: what $A^+A$ actually does is project a vector onto the row space of $A$.
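
These two projection statements can be checked numerically; here is a Matlab sketch using the same example matrix as before. A projection matrix should be symmetric and idempotent ($P^2 = P$); note that because this particular A has full row rank, A*pinv(A) is in fact the identity on the whole column space.

A = [1, 2, 3; 4, 5, 6];
P_col = A * pinv(A);               % 2x2 projection onto the column space of A
P_row = pinv(A) * A;               % 3x3 projection onto the row space of A
norm(P_col*P_col - P_col)          % ~0: idempotent
norm(P_row*P_row - P_row)          % ~0: idempotent
norm(P_row*A' - A')                % ~0: the rows of A are unchanged by P_row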


solve

Suppose the singular value decomposition of $A$ is
$$A = U\Sigma V^T$$
We construct the pseudoinverse $A^+$ as
$$A^+ = V\Sigma ^+ U^T$$
where $\Sigma^+$ is the pseudoinverse of $\Sigma$: first take the transpose of $\Sigma$; since it is (rectangular) diagonal, replace each non-zero element by its reciprocal and leave the zero elements as zero. That is,
$$\Sigma^+ = \begin{bmatrix} 1/\sigma_1 & & & \\ & 1/\sigma_2 & & \\ & & \ddots & \\ & & & 0 \end{bmatrix}_{n \times m}$$
Now look at $AA^+$ and $A^+A$:
$$AA^+ = U\Sigma V^T V\Sigma ^+ U^T=U\Sigma\Sigma^+ U^T=U \begin{bmatrix}I & \\ & 0 \end{bmatrix} _{m \times m}U^T=U D_mU^T$$
$$A^+A = V\Sigma ^+ U^T U\Sigma V^T= V\Sigma ^+ \Sigma V^T=V \begin{bmatrix}I & \\ & 0 \end{bmatrix} _{n \times n}V^T=VD_nV^T$$
The diagonal matrix $D$ in the middle is as close to the identity matrix $I$ as a rank-$r$ matrix can be; if $D$ were exactly $I$, the corresponding product would be $I$. This is why a matrix with full column rank ($r = n \le m$) has a left inverse ($D_n=I$, corresponding to $A^+A = I$); a matrix with full row rank ($r = m \le n$) has a right inverse ($D_m=I$, corresponding to $AA^+ = I$); a full-rank square matrix has a true inverse ($D_n=D_m=I$, corresponding to $A^+A = AA^+ = I$); and when the rank is deficient, the pseudoinverse still gives the best possible approximation to the identity.
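
A minimal Matlab sketch of the left/right inverse cases just discussed (the tall matrix is a made-up full-column-rank example, the wide one is the matrix used throughout):

A_tall = [1, 0; 0, 1; 1, 1];            % 3x2, full column rank (r = n <= m)
A_wide = [1, 2, 3; 4, 5, 6];            % 2x3, full row rank (r = m <= n)
pinv(A_tall) * A_tall                   % ~ eye(2): the pseudoinverse acts as a left inverse
A_wide * pinv(A_wide)                   % ~ eye(2): the pseudoinverse acts as a right inverse
pinv(A_wide) * A_wide                   % 3x3 of rank 2: only a projection, not the identity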


verify

Continuing with the same example
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$
its pseudoinverse is
$$A^+ = V\Sigma ^+ U^T$$
Borrowing the result of the svd decomposition from the first part, the Matlab verification is as follows (the signs and order of the columns differ slightly):

>> pA = V*[1/9.5080,0;0,1/0.7729;0,0]*U'

pA =

    0.4444    0.9444
    0.1111    0.1111
   -0.2222   -0.7222

>> pinv(A)

ans =

   -0.9444    0.4444
   -0.1111    0.1111
    0.7222   -0.2222

PCA

This is probably the most frequently mentioned application because of its wide use in machine learning. In Python's scikit-learn library, the PCA implementation is also built internally on the SVD. Let's look at the definition:

PCA (principal components analysis), i.e. principal component analysis, aims to use the idea of dimensionality reduction to convert many indicators into a few comprehensive indicators.

Faced with a huge amount of data, dimensionality reduction is often necessary. Like data compression in general, it keeps the most useful part of the original data for us. The key is to choose a suitable set of basis vectors to describe the data, so that the most significant components of the data can be expressed with the most convenient coefficients.


For an intuitive feel, let's first look at a set of data whose mean center (which must be established first) lies at the origin:

% reduce the 2-D data to 1-D

x1 = [3,-4,7,1,-4,-3]; % feature 1
x2 = [7,-6,8,-1,-1,-7]; % feature 2
scatter(x1,x2); % data points
hold on;
plot(mean(x1),mean(x2),'rd'); % mean center
A = [x1;x2]; % m rows (features), n columns (sample points)

[U,S,V] = svd(A); % SVD decomposition
hold on;syms x;
fplot(x*U(2,1)/U(1,1)); 
hold on;
fplot(x*U(2,2)/U(1,2)); 
a = [U(1,1);U(2,1)]; % direction 1 has the larger singular value
P = a*a'/(a'*a); % projection matrix
B = P*A;
hold on;
scatter(B(1,:),B(2,:),'+'); % projected points

[Figure: scatter of the data points, the two singular-vector directions, and the projections onto the first direction]
Give an explanation:

1.Data often comes in a matrix: n samples and m measurements per sample.
2.Center each row of the matrix A by subtracting the mean from each measurement.
3.The SVD finds combinations of the data that contain the most information.
4.Largest singular value $\sigma_1$ $\leftrightarrow$ greatest variance $\leftrightarrow$ most information in $\vec{u_1}$

That is to say, the purpose of PCA is to compute the covariance matrix $XX^T$ and retain the directions with the largest sample variance (the directions along which the samples differ the most, so that the main features are preserved).


More specifically, through the SVD we sort the singular values from large to small, select the largest $k$ of them, and use the corresponding $k$ singular vectors as the columns of a new basis matrix; the $n$-dimensional data are thereby converted into the new $k$-dimensional space spanned by those $k$ vectors. This agrees with the result of performing an eigenvalue decomposition directly on the covariance matrix (and the eigenvalue decomposition is restricted to square matrices).
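
As a sketch of this equivalence, the following Matlab snippet (reusing the centered data from the example above) compares the leading direction from svd(A) with the leading eigenvector of the covariance matrix A*A'; up to sign they coincide, and projecting onto it carries out the 2-D to 1-D reduction.

x1 = [3,-4,7,1,-4,-3];
x2 = [7,-6,8,-1,-1,-7];
A  = [x1; x2];                          % centered data: m features x n samples
k  = 1;                                 % keep only the largest direction
[U, S, ~] = svd(A);
W_svd = U(:, 1:k);                      % top-k left singular vectors
[Q, D] = eig(A*A');
[~, idx] = sort(diag(D), 'descend');
W_eig = Q(:, idx(1:k));                 % top-k eigenvectors of the covariance matrix
abs(W_svd' * W_eig)                     % ~1: the directions agree up to sign
Y = W_svd' * A;                         % 1 x n: the data expressed in the new k-dimensional space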


Origin blog.csdn.net/weixin_47305073/article/details/126326764