Singular Value Decomposition (SVD)

Singular value decomposition comes up all the time in machine learning, so today I will go through it in detail. Throughout this article, "matrix" and "vector" mean real matrices and real vectors; we only consider the case of the real field.

Integers have prime factorizations, such as 12 = 2·2·3. Once 12 is written as 2·2·3, some information becomes easier to read off than from the number 12 alone: for example, 12 is not divisible by 5, and any multiple of 12 is divisible by both 2 and 3, and so on.

Can we decompose a matrix in a similar way, like the prime factorization of an integer, so that studying the factors tells us useful things about the matrix itself? The answer is yes: every matrix has a singular value decomposition, and that decomposition is very useful.

The content of this article is as follows:

Table of contents

Eigendecomposition

Eigenvectors and Eigenvalues

Properties of square matrices with n linearly independent eigenvectors, including geometric interpretation

What kind of matrix has n linearly independent eigenvectors

Singular Value Decomposition

left singular vector, right singular vector, singular value

Geometric Interpretation of Singular Value Decomposition

Compact SVD and Truncated SVD

Singular value decomposition and matrix approximation

Applications of Singular Value Decomposition


Before we talk about singular value decomposition, we need to talk about eigenvalue decomposition.

Eigendecomposition

Eigenvectors and Eigenvalues

First of all, eigendecomposition only works for square matrices.

Let us first define eigenvectors. If a non-zero vector  \boldsymbol{v} satisfies  \boldsymbol{A}\boldsymbol{v}=\lambda \boldsymbol{v}, then  \boldsymbol{v} is an eigenvector of  \boldsymbol{A}, and the scalar  \lambda is the corresponding eigenvalue.

A matrix  \boldsymbol{A}_{n\times n} may or may not have eigenvectors. If it does, it may have  n linearly independent eigenvectors, or fewer than  n of them.
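For instance (a minimal NumPy sketch; the two matrices below are examples I chose for illustration, not from the original post): a 2×2 shear matrix has a repeated eigenvalue but only one independent eigenvector direction, while a real symmetric 2×2 matrix has a full set of two.

```python
import numpy as np

# A shear matrix: eigenvalue 1 with algebraic multiplicity 2,
# but only one linearly independent eigenvector direction.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
vals, vecs = np.linalg.eig(A)
print(vals)   # [1. 1.]
print(vecs)   # up to numerical noise, both columns are multiples of (1, 0)

# A real symmetric matrix, by contrast, always has n independent eigenvectors.
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals_B, vecs_B = np.linalg.eig(B)
print(vecs_B)  # two orthogonal eigenvector columns
```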

Properties of square matrices with n linearly independent eigenvectors, including geometric interpretation

If a matrix  \boldsymbol{A}_{n\times n} has  n linearly independent eigenvectors, we can extract some useful information from them. What exactly can we deduce? The following:

1. Algebraic properties

Denote the  n linearly independent eigenvectors by \{\boldsymbol{v}^{(1)},...,\boldsymbol{v}^{(n)}\} and the corresponding eigenvalues by \{\lambda_{1},...,\lambda_{n}\}. Stacking the eigenvectors as columns gives the eigenvector matrix \boldsymbol{V}=[\boldsymbol{v}^{(1)},...,\boldsymbol{v}^{(n)}]. Similarly, collecting the corresponding eigenvalues into a vector \boldsymbol{\lambda }=[\lambda _{1},...,\lambda _{n}]^{T}, we get:

\boldsymbol{A}\boldsymbol{V}=\boldsymbol{V}diag(\boldsymbol{\lambda })

Since  \boldsymbol{V} is a square matrix of order n whose columns are linearly independent of each other, its inverse \boldsymbol{V}^{-1} exists, and we obtain:

\boldsymbol{A}=\boldsymbol{V}diag(\boldsymbol{\lambda })\boldsymbol{V}^{-1}  

If we can choose each column of  \boldsymbol{V} to be a unit vector orthogonal to the other columns (this orthonormal choice is possible, in particular, when  \boldsymbol{A} is real symmetric), then  \boldsymbol{V} becomes an orthogonal matrix  \boldsymbol{Q} . Since an orthogonal matrix satisfies  \boldsymbol{Q}^{-1}=\boldsymbol{Q}^{T}, we get:

\boldsymbol{A}=\boldsymbol{Q}diag(\boldsymbol{\lambda })\boldsymbol{Q}^{T}

This is a formula we see all the time, and it is very convenient for deriving other results.
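As a quick numerical check (a small NumPy sketch; the symmetric matrix below is an arbitrary example of mine), we can rebuild  \boldsymbol{A} from  \boldsymbol{Q}diag(\boldsymbol{\lambda })\boldsymbol{Q}^{T}:

```python
import numpy as np

# An arbitrary real symmetric matrix (so an orthonormal eigenvector basis exists).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh handles symmetric matrices: eigenvalues come back in ascending order,
# eigenvectors are the columns of an orthogonal matrix Q.
lam, Q = np.linalg.eigh(A)

# A = Q diag(lambda) Q^T
A_rebuilt = Q @ np.diag(lam) @ Q.T
print(np.allclose(A, A_rebuilt))        # True
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: Q is orthogonal
```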

2. Geometric properties

Everything above came from algebraic derivation. Is there also a geometric way to understand eigendecomposition? Let us start with an analysis in the two-dimensional plane.

Suppose  \boldsymbol{A}_{2\times 2} has 2 linearly independent eigenvectors  \boldsymbol{v}^{(1)} and  \boldsymbol{v}^{(2)} (assume we have already turned these two eigenvectors into orthogonal unit vectors), with corresponding eigenvalues  \lambda _{1} and  \lambda _{2}. Consider the points on the unit circle of the plane: write the coordinates of such a point in the basis \{\boldsymbol{v}^{(1)},\boldsymbol{v}^{(2)}\} as  (x,y) and its position vector as  \boldsymbol{u} , so that  \boldsymbol{u} = x\boldsymbol{v}^{(1)}+y\boldsymbol{v}^{(2)} and  x^{2}+y^{2}=1 .

Now left-multiply  \boldsymbol{u} by  \boldsymbol{A}_{2\times 2} to obtain the vector  \boldsymbol{Au} , and write the coordinates of the resulting point as  ({x}',{y}') . Then \boldsymbol{Au}=\boldsymbol{A}(x\boldsymbol{v}^{(1)}+y\boldsymbol{v}^{(2)})=x\boldsymbol{A}\boldsymbol{v}^{(1)}+y\boldsymbol{A}\boldsymbol{v}^{(2)}=x\lambda _{1}\boldsymbol{v}^{(1)}+y\lambda _{2}\boldsymbol{v}^{(2)}

From this formula we see that the point  \boldsymbol{Au} has coordinates  (x\lambda _{1},y\lambda _{2}) . Equating the two coordinate expressions for  \boldsymbol{Au} gives \left\{\begin{matrix} {x}'=x\lambda _{1}\\ {y}'=y\lambda _{2}\end{matrix}\right. . Because  x^{2}+y^{2}=1, we get \frac{{x}'^{2}}{\lambda _{1}^{2}}+\frac{{y}'^{2}}{\lambda _{2}^{2}}=1, which is an ellipse~. So the conclusion is: left-multiplying every point on a circle by  \boldsymbol{A} turns the circle into an ellipse, and the larger an eigenvector's eigenvalue is, the more the transformed vector leans toward that eigenvector, i.e. the smaller the angle between them becomes.

Generalizing from the unit circle to all circles in the plane (that is, to every point of the plane): left-multiplying a point's vector by  \boldsymbol{A} changes the vector (both its direction and its length). The transformed vector leans most toward the eigenvector of  \boldsymbol{A} with the larger eigenvalue (the angle between them becomes smaller), and the length of the transformed vector is dominated by the largest eigenvalue of  \boldsymbol{A}.
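This circle-to-ellipse picture is easy to verify numerically as well (again a sketch with a symmetric 2×2 matrix I picked for illustration): map the unit circle through  \boldsymbol{A} and check the ellipse equation in the eigenvector coordinates.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # symmetric, so its eigenvectors are orthogonal
lam, V = np.linalg.eigh(A)          # lam = [lambda_1, lambda_2]

theta = np.linspace(0, 2 * np.pi, 200)
circle = np.stack([np.cos(theta), np.sin(theta)])   # points u on the unit circle

mapped = A @ circle                 # points Au

# Coordinates of Au in the eigenvector basis: (x', y') = V^T (Au)
coords = V.T @ mapped
lhs = (coords[0] / lam[0]) ** 2 + (coords[1] / lam[1]) ** 2
print(np.allclose(lhs, 1.0))        # True: the image of the circle is an ellipse
```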

What kind of matrix has n linearly independent eigenvectors

A real symmetric matrix always has n linearly independent eigenvectors, but a matrix with n linearly independent eigenvectors is not necessarily real symmetric. I will not give the proof here; if you want it, you can look it up in a textbook~

Singular Value Decomposition

left singular vector, right singular vector, singular value

Only square matrices admit an eigendecomposition. For a general matrix, we can use the singular value decomposition instead. A general matrix can be decomposed as:

\boldsymbol{A}=\boldsymbol{U}\boldsymbol{D}\boldsymbol{V}^{T}   (with the dimensions of each matrix written out:  \boldsymbol{A}_{m\times n}=\boldsymbol{U}_{m\times m}\boldsymbol{D}_{m\times n}\boldsymbol{V}_{n\times n}^{T})

where:

1. \boldsymbol{U} is the eigenvector matrix of  \boldsymbol{A}\boldsymbol{A}^{T} (it is an orthogonal matrix); the columns of  \boldsymbol{U} are called the left singular vectors.

2. \boldsymbol{V} is the eigenvector matrix of  \boldsymbol{A}^{T}\boldsymbol{A} (it is an orthogonal matrix); the columns of  \boldsymbol{V} are called the right singular vectors.

3. \boldsymbol{D} is a (rectangular) diagonal matrix; the non-zero values on its diagonal are the square roots of the non-zero eigenvalues of  \boldsymbol{A}^{T}\boldsymbol{A}, which are also the square roots of the non-zero eigenvalues of \boldsymbol{A}\boldsymbol{A}^{T}. (The values on the diagonal of  \boldsymbol{D} are arranged in descending order; the number of non-zero values on the diagonal equals the rank of  \boldsymbol{A}, which is <= min(m,n).) The non-zero diagonal values of  \boldsymbol{D} are called the singular values.
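These relationships can be checked numerically (a sketch with NumPy on a random matrix of my choosing; note that numpy.linalg.svd returns the singular values as a vector in descending order):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))     # an arbitrary 4x3 matrix

U, s, Vt = np.linalg.svd(A)         # full SVD: U is 4x4, s has length 3, Vt is 3x3

# Rebuild the rectangular "D" matrix from the singular-value vector s.
D = np.zeros_like(A)
D[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ D @ Vt))   # True

# Singular values are the square roots of the eigenvalues of A^T A (and of A A^T).
eigvals = np.linalg.eigvalsh(A.T @ A)           # ascending order
print(np.allclose(np.sqrt(eigvals[::-1]), s))   # True

# U and V are orthogonal matrices.
print(np.allclose(U.T @ U, np.eye(4)), np.allclose(Vt @ Vt.T, np.eye(3)))
```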

For a proof of the fundamental theorem of singular value decomposition, see Chapter 15, "Singular Value Decomposition", of the second edition of Li Hang's Statistical Learning Methods~ the writing there is really clear! No proof is given here.

Geometric Interpretation of Singular Value Decomposition

The geometric interpretation of the eigendecomposition of a real symmetric matrix is: left-multiplying any vector  \boldsymbol{u} by a real symmetric matrix \boldsymbol{A} scales  \boldsymbol{u} within the same space (we derived this above).

We will not derive the singular value decomposition of a general matrix carefully; let us just build some intuition. The conclusion first: an m\times n matrix  \boldsymbol{A} represents a linear transformation from the n-dimensional space \boldsymbol{R}^{n} to the m-dimensional space \boldsymbol{R}^{m}.

Left-multiply a vector \boldsymbol{u} by an arbitrary matrix  \boldsymbol{A}_{m\times n}: \boldsymbol{A}\boldsymbol{u}=\boldsymbol{U}\boldsymbol{D}\boldsymbol{V}^{T}\boldsymbol{u}=\boldsymbol{U}(\boldsymbol{D}(\boldsymbol{V}^{T}\boldsymbol{u})). Reading from right to left: first \boldsymbol{u} is left-multiplied by \boldsymbol{V}^{T}, a rotation within the same n-dimensional space; then the result is left-multiplied by \boldsymbol{D}, which scales along the n coordinate directions and raises/lowers the dimension to m; finally it is left-multiplied by  \boldsymbol{U} , a rotation within the m-dimensional space.
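The three steps can be traced explicitly (a sketch; the matrix and vector are arbitrary examples of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))     # maps R^2 -> R^3
u = np.array([1.0, 2.0])

U, s, Vt = np.linalg.svd(A)
D = np.zeros((3, 2))
D[:2, :2] = np.diag(s)

step1 = Vt @ u          # rotate/reflect inside R^2 (length of u is preserved)
step2 = D @ step1       # scale by the singular values and embed into R^3
step3 = U @ step2       # rotate/reflect inside R^3

print(np.allclose(step3, A @ u))                              # True
print(np.isclose(np.linalg.norm(step1), np.linalg.norm(u)))   # True: V^T preserves length
```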

Compact SVD and Truncated SVD

The singular value decomposition formula above, \boldsymbol{A}=\boldsymbol{U}\boldsymbol{D}\boldsymbol{V}^{T}, is also called the full singular value decomposition of the matrix \boldsymbol{A}. In practice, to save storage, the compact form and the truncated form of the singular value decomposition are more commonly used. The compact singular value decomposition has the same rank as the original matrix, while the truncated singular value decomposition has a lower rank than the original matrix.

1. Compact singular value decomposition:

If the rank of a general matrix A_{m\times n} is rank(\boldsymbol{A}) = r, with r <= min(m,n), then the compact singular value decomposition of  \boldsymbol{A} is:

\boldsymbol{A_{m\times n}} = \boldsymbol{U}_{m\times r}\boldsymbol{D}_{r\times r}\boldsymbol{V}_{n\times r}^{T}

Note the equals sign here. In effect, all the zero entries of the original  \boldsymbol{D} are removed, leaving in \boldsymbol{D}_{r\times r} only the diagonal square matrix formed by the r non-zero singular values;  \boldsymbol{U}_{m\times r} is the first r columns of  \boldsymbol{U}, and  \boldsymbol{V}_{n\times r} is the first r columns of  \boldsymbol{V}.
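A compact SVD can be assembled directly from the full one (a sketch; I deliberately construct a rank-2 matrix so that r < min(m, n)):

```python
import numpy as np

rng = np.random.default_rng(2)
# A 5x4 matrix of rank 2, built as a product of rank-2 factors.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
r = np.linalg.matrix_rank(A)        # 2

U, s, Vt = np.linalg.svd(A)
U_r, D_r, Vt_r = U[:, :r], np.diag(s[:r]), Vt[:r, :]

# The compact SVD reproduces A exactly (an equality, not an approximation).
print(np.allclose(A, U_r @ D_r @ Vt_r))   # True
```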

2. Truncated singular value decomposition:

If the rank of a general matrix A_{m\times n} is rank(\boldsymbol{A}) = r, with r <= min(m,n), and 0 < k < r, then the truncated singular value decomposition of  \boldsymbol{A} is:

\boldsymbol{A_{m\times n}} \approx \boldsymbol{U}_{m\times k}\boldsymbol{D}_{k\times k}\boldsymbol{V}_{n\times k}^{T}

Note that this is only an approximate equality. Here \boldsymbol{D}_{k\times k} is the diagonal square matrix formed by the first k rows and first k columns of the original \boldsymbol{D};  \boldsymbol{U}_{m\times k} is the first k columns of  \boldsymbol{U}, and  \boldsymbol{V}_{n\times k} is the first k columns of  \boldsymbol{V}.
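And the truncated version with k < r simply keeps the k largest singular values (a sketch, with k = 1):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4))     # generically rank 4
U, s, Vt = np.linalg.svd(A)

k = 1                               # keep only the largest singular value
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.linalg.matrix_rank(A_k))   # 1
print(np.linalg.norm(A - A_k))      # nonzero: A_k is only an approximation of A
```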

Singular value decomposition and matrix approximation

Singular value decomposition can be viewed as a method of matrix approximation: the truncated decomposition gives the optimal approximation to the matrix, among matrices of the same rank k, in the sense of the Frobenius norm.

The Frobenius norm of a matrix \boldsymbol{A} is \left \| \boldsymbol{A} \right \|_{F}=\left ( \sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^{2} \right )^{\frac{1}{2}}.

The specific proof is a bit involved; see Chapter 15, "Singular Value Decomposition", of the second edition of Li Hang's Statistical Learning Methods.
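A quick numerical illustration of the approximation error (a sketch; the Frobenius-norm error of the rank-k truncation equals the square root of the sum of the squared discarded singular values):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius-norm error equals sqrt(sum of the squared discarded singular values).
err = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))   # True
```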

Applications of Singular Value Decomposition

As for applications of singular value decomposition, there are PCA (principal component computation), LSA, and so on; a tiny PCA sketch follows after the references below.

You can refer to:

Principal component analysis (PCA)

Latent semantic analysis (LSA)
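As a concrete illustration of the PCA connection (only a rough sketch of the usual recipe, with synthetic data of my own, not the derivation from those posts): center the data matrix and take the top right singular vectors as the principal directions.

```python
import numpy as np

rng = np.random.default_rng(5)
# 100 samples, 3 features, with very different variances along each feature.
X = rng.standard_normal((100, 3)) @ np.diag([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)              # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt[:2]                  # top-2 principal directions (rows)
scores = Xc @ components.T           # data projected onto those directions

# Variance explained by each component is sigma_i^2 / (n_samples - 1).
explained_var = s[:2] ** 2 / (len(X) - 1)
print(components.shape, scores.shape, explained_var)
```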

Phew, it's finally done. That is all for today's singular value decomposition. Feel free to leave a comment~
