Understand the principle and application of Singular Value Decomposition (SVD) in one article

The following content is based on study notes from the Zhihu article "Singular Value Decomposition (SVD)" and the "Singular Value Decomposition (SVD)" lecture of Matrix Analysis, summarized as follows:

1 Introduction

Singular Value Decomposition (SVD) is an algorithm widely used in the field of machine learning. It is a powerful tool for extracting information: it provides a very convenient way of decomposing a matrix and can uncover interesting latent patterns in the data. This article introduces the principle of SVD and gives a practical use case.

2 Eigenvalues and eigenvectors

Before deriving the SVD, let us first recall the definition of eigenvalues and eigenvectors:

$$Ax = \lambda x$$

where $A$ is an $n \times n$ matrix and $x$ is an $n$-dimensional vector. Then $\lambda$ is an eigenvalue of the matrix $A$, and $x$ is the eigenvector corresponding to the eigenvalue $\lambda$.
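As a quick check of this definition, here is a minimal sketch (assuming NumPy is available) that computes the eigenvalues and eigenvectors of a small example matrix and verifies that $Av = \lambda v$ holds for each pair:

```python
import numpy as np

# A small example matrix, chosen only for illustration
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Eigendecomposition: the columns of vecs are the eigenvectors
vals, vecs = np.linalg.eig(A)

for i in range(len(vals)):
    v = vecs[:, i]
    # A v should equal lambda * v (up to floating-point error)
    print(np.allclose(A @ v, vals[i] * v))  # True
```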

What are the benefits of finding eigenvalues and eigenvectors? The benefit is that we can perform an eigendecomposition of the matrix A.

That is to say, the information in the matrix A can be represented by its eigenvalues and eigenvectors.

Suppose we find the $n$ eigenvalues of the matrix $A$, $\lambda_{1} \leq \lambda_{2} \leq \dots \leq \lambda_{n}$, together with the $n$ corresponding eigenvectors $w_{1}, w_{2}, \dots, w_{n}$.

Then the matrix $A$ can be represented by the following eigendecomposition:

$$A = W \Sigma W^{-1}$$

where $W$ is the $n \times n$ matrix whose columns are these $n$ eigenvectors, and $\Sigma$ is the $n \times n$ diagonal matrix with the $n$ eigenvalues on its main diagonal.

Generally, we normalize the $n$ eigenvectors in $W$ so that $\left\| w_{i} \right\|_{2} = 1$, or equivalently $w_{i}^{T}w_{i} = 1$. In this case the $n$ eigenvectors of $W$ form an orthonormal basis, satisfying $W^{T}W = I$, i.e. $W^{T} = W^{-1}$; in other words, $W$ is a unitary matrix.

An orthogonal matrix is a matrix whose transpose equals its inverse, $A^{T} = A^{-1}$; a unitary matrix is the generalization of the orthogonal matrix to the complex field.

In this way, our eigendecomposition expression can be written as

$$A = W \Sigma W^{T}$$
To summarize: eigendecomposition yields eigenvalues and eigenvectors, where an eigenvalue indicates how important a feature is and the corresponding eigenvector indicates what that feature is. However, eigendecomposition also has significant limitations; for example, the matrix being decomposed must be a square matrix.
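A minimal sketch of this eigendecomposition, assuming a real symmetric matrix so that the eigenvectors form an orthonormal basis:

```python
import numpy as np

# A real symmetric example matrix, so its eigenvectors are orthonormal
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is NumPy's eigendecomposition routine for symmetric/Hermitian matrices
vals, W = np.linalg.eigh(A)
Sigma = np.diag(vals)

print(np.allclose(W.T @ W, np.eye(2)))   # True: W^T W = I
print(np.allclose(A, W @ Sigma @ W.T))   # True: A = W Sigma W^T
```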

3 SVD decomposition

Eigendecomposition is a very good method for extracting matrix features, but it only applies to square matrices. In the real world, most of the matrices we encounter are not square. For example, if there are M students and each student has grades in N subjects, the resulting M×N matrix A is generally not square. How can we describe the important characteristics of such an ordinary matrix? Singular value decomposition is a decomposition method that can be applied to any matrix:
$$A = U \Sigma V^{T}$$
where $U$ is an $m \times m$ matrix, $\Sigma$ is an $m \times n$ matrix whose entries are all zero except those on the main diagonal (each element on the main diagonal is called a singular value), and $V$ is an $n \times n$ matrix. Both $U$ and $V$ are unitary matrices, satisfying:

$$U^{T}U = I, \quad V^{T}V = I$$

The following figure illustrates the definition of SVD given above:

[Figure: shapes of the matrices in the decomposition $A = U \Sigma V^{T}$]
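To see these shapes concretely, here is a small sketch (assuming NumPy) that decomposes a non-square matrix and checks the dimensions and orthogonality of $U$ and $V$:

```python
import numpy as np

# An arbitrary 4x3 (non-square) example matrix
A = np.random.rand(4, 3)

# full_matrices=True returns U as m x m and Vt as n x n
U, s, Vt = np.linalg.svd(A, full_matrices=True)

print(U.shape, s.shape, Vt.shape)        # (4, 4) (3,) (3, 3)
print(np.allclose(U.T @ U, np.eye(4)))   # True: U^T U = I
print(np.allclose(Vt @ Vt.T, np.eye(3))) # True: V^T V = I

# Rebuild the m x n Sigma with the singular values on its main diagonal
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))    # True: A = U Sigma V^T
```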
So how do we find the three matrices U, Σ, and V after SVD decomposition?

If we multiply the transpose of $A$ by $A$, we obtain the $n \times n$ square matrix $A^{T}A$. Since $A^{T}A$ is a square matrix, we can perform an eigendecomposition, and the resulting eigenvalues and eigenvectors satisfy:

$$(A^{T}A)\,v_{i} = \lambda_{i} v_{i}$$

In this way, we obtain the $n$ eigenvalues of the matrix $A^{T}A$ and the corresponding $n$ eigenvectors $v$. Assembling all the eigenvectors of $A^{T}A$ into an $n \times n$ matrix $V$ gives the $V$ matrix in our SVD formula. We generally call each eigenvector in $V$ a right singular vector.

Similarly, if we multiply $A$ by the transpose of $A$, we obtain the $m \times m$ square matrix $AA^{T}$. Since $AA^{T}$ is a square matrix, we can perform an eigendecomposition, and the resulting eigenvalues and eigenvectors satisfy:

$$(AA^{T})\,u_{i} = \lambda_{i} u_{i}$$

In this way, we obtain the $m$ eigenvalues of the matrix $AA^{T}$ and the corresponding $m$ eigenvectors $u$. Assembling all the eigenvectors of $AA^{T}$ into an $m \times m$ matrix $U$ gives the $U$ matrix in our SVD formula. We generally call each eigenvector in $U$ a left singular vector.

We have now obtained $U$ and $V$; only the singular value matrix $\Sigma$ remains. Since $\Sigma$ is zero everywhere except for the singular values on its diagonal, we only need to find each singular value $\sigma$.

We notice that:

$$A = U\Sigma V^{T} \;\Rightarrow\; AV = U\Sigma \;\Rightarrow\; Av_{i} = \sigma_{i} u_{i} \;\Rightarrow\; \sigma_{i} = Av_{i} / u_{i}$$

In this way we can find each singular value, and hence the singular value matrix $\Sigma$.

There is one more point not addressed above: we claimed that the eigenvectors of $A^{T}A$ form the $V$ matrix of the SVD, and that the eigenvectors of $AA^{T}$ form the $U$ matrix. What is the basis for this? It is actually easy to prove; let us take the $V$ matrix as an example.
$$A^{T}A = V\Sigma^{T} U^{T} U\Sigma V^{T} = V\Sigma^{T}\Sigma V^{T} = V\Sigma^{2} V^{T}$$

The proof above uses $U^{T}U = I$ and $\Sigma^{T}\Sigma = \Sigma^{2}$. It shows that the eigenvectors of $A^{T}A$ indeed form the $V$ matrix in our SVD. A similar argument shows that the eigenvectors of $AA^{T}$ form the $U$ matrix in our SVD.

Furthermore, we can see that the eigenvalue matrix equals the square of the singular value matrix, which means the eigenvalues and singular values satisfy the relationship

$$\sigma_{i} = \sqrt{\lambda_{i}}$$

In other words, we can either use $\sigma_{i} = Av_{i} / u_{i}$ to compute the singular values, or take the square roots of the eigenvalues of $A^{T}A$ to obtain them.
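As a sanity check on this derivation, the sketch below (assuming NumPy) compares the singular values returned by np.linalg.svd with the square roots of the eigenvalues of $A^{T}A$, and checks that the right singular vectors are eigenvectors of $A^{T}A$:

```python
import numpy as np

A = np.random.rand(5, 3)          # arbitrary example matrix
U, s, Vt = np.linalg.svd(A)

# Eigendecomposition of A^T A (symmetric, so use eigh)
lam, V_eig = np.linalg.eigh(A.T @ A)
lam = lam[::-1]                   # eigh returns ascending order; reverse to match SVD

# Singular values are the square roots of the eigenvalues of A^T A
print(np.allclose(s, np.sqrt(lam)))                  # True

# Each right singular vector (row of Vt) is an eigenvector of A^T A
for i, sigma in enumerate(s):
    v = Vt[i]
    print(np.allclose(A.T @ A @ v, sigma**2 * v))    # True
```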

4 SVD Calculation Example

For the singular values of an example matrix A, we can first compute them with WolframAlpha:

[Figure: singular values of the example matrix computed with WolframAlpha]

Manual calculation steps:

[Figure: manual calculation steps for the SVD of the example matrix]

Singular Value Decomposition (SVD) of Matrix Analysis
https://www.bilibili.com/video/av15971352/?p=5
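Since the example matrix above appears only in the figures, here is a comparable sketch with a small hypothetical matrix that follows the same manual procedure: eigendecompose $A^{T}A$ to obtain $V$ and the singular values, then compute the columns of $U$ from $u_{i} = Av_{i} / \sigma_{i}$:

```python
import numpy as np

# Hypothetical 3x2 matrix standing in for the example shown in the figures above
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Step 1: eigendecompose A^T A to get the right singular vectors and eigenvalues
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]           # sort eigenvalues in descending order
lam, V = lam[order], V[:, order]

# Step 2: singular values are the square roots of the eigenvalues
s = np.sqrt(lam)

# Step 3: left singular vectors from u_i = A v_i / sigma_i
U = A @ V / s

print(s)                                     # [1.732..., 1.0]
# Verify the (thin) decomposition A = U Sigma V^T
print(np.allclose(A, U @ np.diag(s) @ V.T))  # True
```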

As the figures above show, SVD expresses the matrix as a sum of several simpler matrices, each weighted by a singular value. Singular values are similar to the eigenvalues in an eigendecomposition: they are arranged from largest to smallest in the singular value matrix, and they decrease very quickly. In many cases, the sum of the largest 10% or even 1% of the singular values accounts for more than 99% of the total sum of all singular values.

In other words, we can use the largest k singular values and the corresponding left and right singular vectors to approximately describe the matrix:
$$A_{m \times n} \approx U_{m \times k}\,\Sigma_{k \times k}\, V^{T}_{k \times n}$$
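A small sketch of this rank-k approximation (assuming NumPy), which keeps only the largest k singular values and measures how close the reconstruction is:

```python
import numpy as np

A = np.random.rand(50, 30)        # example data matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10                            # number of singular values to keep
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Relative reconstruction error and share of "energy" captured by the top k values
err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
energy = s[:k].sum() / s.sum()
print(f"relative error: {err:.3f}, top-{k} singular value share: {energy:.3f}")
```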

Because of this important property, SVD can be used for PCA dimensionality reduction, for data compression and denoising, and also for recommendation algorithms: by decomposing the matrix of users and preferences, we can obtain implicit user preferences and make recommendations.

5 SVD for PCA

PCA dimensionality reduction requires finding the d largest eigenvectors of the sample covariance matrix $X^{T}X$, and then using the matrix formed by these d eigenvectors to project the data into a low-dimensional space. In this process the covariance matrix $X^{T}X$ must be computed, and when both the number of samples and the number of features are large, the amount of computation is very large.

Note that SVD can also produce the matrix formed by the d largest eigenvectors of the covariance matrix $X^{T}X$: some SVD implementations can compute the right singular matrix $V$ without first forming the covariance matrix $X^{T}X$.

In other words, our PCA algorithm can be implemented with SVD instead of an eigendecomposition of the covariance matrix. This approach is very effective when the sample size is large.

In fact, scikit-learn's PCA implementation uses SVD behind the scenes, rather than the brute-force eigendecomposition we might expect.
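A minimal sketch (assuming scikit-learn and NumPy are available) showing that the principal components found by sklearn.decomposition.PCA match the top right singular vectors of the centered data, up to sign:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 5)                 # 100 samples, 5 features
Xc = X - X.mean(axis=0)                    # PCA works on centered data

pca = PCA(n_components=2).fit(X)

# Right singular vectors of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Each principal component equals a top right singular vector, up to sign
for i in range(2):
    same = np.allclose(pca.components_[i], Vt[i]) or np.allclose(pca.components_[i], -Vt[i])
    print(same)                            # True
```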

On the other hand, notice that PCA only uses the right singular matrix of our SVD, not the left singular matrix, so what is the use of the left singular matrix?

Suppose our sample set is an $m \times n$ matrix $X$. If we find the $m \times d$ matrix $U$ formed by the d largest eigenvectors of the matrix $XX^{T}$ and perform the following transformation:

$$X'_{d \times n} = U_{m \times d}^{T}\, X_{m \times n}$$

we obtain a $d \times n$ matrix $X'$. Compared with our original $m \times n$ sample matrix $X$, the number of rows is reduced from m to d, so the rows have been compressed.

Left singular matrices can be used to compress the number of rows.

The right singular matrix can be used to compress the number of columns, that is, the feature dimension, which is our PCA dimensionality reduction.
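A sketch (assuming NumPy) illustrating both directions: the left singular vectors compress the rows (samples), and the right singular vectors compress the columns (features), as in PCA:

```python
import numpy as np

X = np.random.rand(100, 20)        # m = 100 samples, n = 20 features
U, s, Vt = np.linalg.svd(X, full_matrices=False)
d = 5

# Row compression with the left singular vectors: 100 x 20 -> 5 x 20
X_rows = U[:, :d].T @ X
print(X_rows.shape)                # (5, 20)

# Column (feature) compression with the right singular vectors: 100 x 20 -> 100 x 5
X_cols = X @ Vt[:d, :].T
print(X_cols.shape)                # (100, 5)
```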

6 Summary

1. SVD decomposes a matrix into a left singular vector matrix and a right singular vector matrix, with the singular value matrix in the middle representing the weights; by keeping only the important singular values and their vectors, we can restore the original matrix with almost no loss;

2. SVD decomposition can be regarded as a low-rank representation of the original matrix;

3. SVD can be used for data denoising;

4. SVD can be used for feature dimension compression;


Origin blog.csdn.net/lomodays207/article/details/88687126