The relationship between singular value decomposition (SVD), principal component analysis (PCA) for dimensionality reduction, and latent semantic analysis (LSA)

Singular value decomposition (SVD) is a method of matrix decomposition and data compression; it gives the optimal approximation to a matrix in the sense of the Frobenius norm (that is, the squared loss). For details, please refer to Singular Value Decomposition (SVD).

The complete singular value decomposition is as follows:

For a general matrix \boldsymbol{A}_{m\times n}, the complete singular value decomposition is:

\boldsymbol{A}_{m\times n}=\boldsymbol{U}_{m\times m}\boldsymbol{D}_{m\times n}\boldsymbol{V}_{n\times n}^{T}
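As a quick check on the formula, here is a minimal NumPy sketch (the matrix below is made-up example data, not from the original post):

```python
import numpy as np

# A small example matrix (made-up data), m = 4, n = 3
A = np.array([[2., 0., 1.],
              [1., 3., 0.],
              [0., 1., 4.],
              [2., 2., 1.]])
m, n = A.shape

# Complete SVD: U is m x m, V is n x n, and D is m x n with the singular values on its diagonal
U, s, Vt = np.linalg.svd(A, full_matrices=True)
D = np.zeros((m, n))
D[:len(s), :len(s)] = np.diag(s)

# A = U D V^T up to floating-point error
print(np.allclose(A, U @ D @ Vt))  # True
```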

The compact singular value decomposition is as follows:

If the rank of the general matrix \boldsymbol{A}_{m\times n} is rank(\boldsymbol{A}) = r, with r <= min(m, n), then the compact singular value decomposition of \boldsymbol{A} is:

\boldsymbol{A_{m\times n}} = \boldsymbol{U}_{m\times r}\boldsymbol{D}_{r\times r}\boldsymbol{V}_{n\times r}^{T}

Here \boldsymbol{D}_{r\times r} consists of the first r rows and first r columns of the original \boldsymbol{D}_{m\times n}, \boldsymbol{U}_{m\times r} consists of the first r columns of \boldsymbol{U}, and \boldsymbol{V}_{n\times r} consists of the first r columns of \boldsymbol{V}.
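Continuing the same sketch, the compact decomposition keeps only the first r = rank(A) columns of U and V (the example matrix happens to have full rank, so here r = min(m, n)):

```python
import numpy as np

A = np.array([[2., 0., 1.],
              [1., 3., 0.],
              [0., 1., 4.],
              [2., 2., 1.]])            # same made-up example matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.linalg.matrix_rank(A)            # rank(A) = r <= min(m, n)

U_r = U[:, :r]                          # first r columns of U
D_r = np.diag(s[:r])                    # leading r x r block of D
Vt_r = Vt[:r, :]                        # first r columns of V, transposed

# The compact decomposition still reconstructs A exactly (up to rounding)
print(np.allclose(A, U_r @ D_r @ Vt_r))  # True
```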

The truncated singular value decomposition is as follows:

If the rank of the general matrix \boldsymbol{A}_{m\times n} is rank(\boldsymbol{A}) = r, with r <= min(m, n), and 0 < k < r, then the truncated singular value decomposition of \boldsymbol{A} is:

\boldsymbol{A_{m\times n}} \approx \boldsymbol{U}_{m\times k}\boldsymbol{D}_{k\times k}\boldsymbol{V}_{n\times k}^{T}

Here \boldsymbol{D}_{k\times k} consists of the first k rows and first k columns of the original \boldsymbol{D}_{m\times n}, \boldsymbol{U}_{m\times k} consists of the first k columns of \boldsymbol{U}, and \boldsymbol{V}_{n\times k} consists of the first k columns of \boldsymbol{V}.
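The truncated decomposition, in contrast, only approximates A. A minimal sketch (same made-up matrix; k = 2 is an arbitrary choice), where by the Eckart-Young theorem the Frobenius-norm error equals the square root of the sum of the squared singular values that were dropped:

```python
import numpy as np

A = np.array([[2., 0., 1.],
              [1., 3., 0.],
              [0., 1., 4.],
              [2., 2., 1.]])            # same made-up example matrix

k = 2                                   # 0 < k < r: keep only the top-k singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(A - A_k, 'fro')    # error of the best rank-k approximation
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))  # True: error = sqrt of the dropped s_i^2
```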

For details on LSA, please refer to Latent Semantic Analysis (LSA).

LSA (latent semantic analysis) introduces topics to reduce dimensionality, addressing the problems of inaccurate similarity computation on sparse matrices, polysemy (one word with multiple meanings), and synonymy (multiple words with the same meaning).

To realize the idea of LSA with SVD, treat the matrix to be decomposed, \boldsymbol{A}_{m\times n}, as the word-text matrix, treat \boldsymbol{U}_{m\times k} as the word-topic matrix, and treat \boldsymbol{D}_{k\times k}\boldsymbol{V}_{n\times k}^{T} as the topic-text matrix; the word-text matrix is then decomposed into a word-topic matrix and a topic-text matrix. Because the truncated singular value decomposition is the optimal approximation to the matrix in the sense of the squared loss, and the matrices produced by the decomposition have reasonable interpretations under LSA, the idea of LSA can be realized directly with SVD. So if LSA uses SVD as its concrete solution method (there are other solution methods, which I will not go into here), then LSA does the same thing as SVD, namely the optimal approximation of a matrix; LSA merely gives the decomposed matrices a practical interpretation.
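A toy sketch of this idea (the vocabulary, the document counts, and the choice of k = 2 topics are all made up for illustration):

```python
import numpy as np

# Toy word-text (word-document) count matrix: rows = words, columns = documents (made-up counts)
words = ["ship", "boat", "ocean", "vote", "election"]
A = np.array([[1., 1., 0., 0.],
              [0., 1., 0., 0.],
              [1., 1., 1., 0.],
              [0., 0., 1., 1.],
              [0., 0., 1., 2.]])

k = 2                                     # number of latent topics
U, s, Vt = np.linalg.svd(A, full_matrices=False)

word_topic = U[:, :k]                     # U_{m x k}: word-topic matrix
topic_doc = np.diag(s[:k]) @ Vt[:k, :]    # D_{k x k} V^T: topic-text matrix

print(np.round(word_topic, 2))
print(np.round(topic_doc, 2))
# Document similarity can now be computed in the k-dimensional topic space
# (columns of topic_doc) instead of the sparse m-dimensional word space.
```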

PCA (principal component analysis) is a method of dimensionality reduction. For details, please refer to Principal Component Analysis (PCA).

Assume the matrix formed by m points in n-dimensional space is \boldsymbol{A}_{m\times n}, and suppose the dimension-reduced matrix we want to obtain is \boldsymbol{A^{'}}_{m\times k}, with k < n. Can we perform a singular value decomposition of \boldsymbol{A}_{m\times n} and treat the matrix formed by the left singular vectors, \boldsymbol{U}_{m\times k}, as \boldsymbol{A^{'}}_{m\times k}? I think we can. In that case \boldsymbol{D}_{k\times k}\boldsymbol{V}_{n\times k}^{T} is the transformation matrix between the k-dimensional and n-dimensional spaces. But there is a problem: the reduced dimension k must satisfy 0 < k < r (where rank(\boldsymbol{A}) = r, r <= min(m, n)), so if the rank of \boldsymbol{A}_{m\times n} is small, the range of choices for the reduced dimension k is also small.
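A short sketch of that idea, assuming the reduced dimension k satisfies 0 < k < rank(A) (same made-up matrix as above):

```python
import numpy as np

A = np.array([[2., 0., 1.],
              [1., 3., 0.],
              [0., 1., 4.],
              [2., 2., 1.]])            # m = 4 points in n = 3 dimensions (made-up data)

k = 2                                   # must satisfy 0 < k < rank(A)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

A_reduced = U[:, :k]                    # U_{m x k} used directly as the reduced matrix A'
back = np.diag(s[:k]) @ Vt[:k, :]       # D_{k x k} V^T: maps the k-dim coordinates back to n dims

# Reconstruction is only approximate when k < rank(A)
print(np.linalg.norm(A - A_reduced @ back, 'fro'))
```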

Our original requirement for dimensionality reduction is only that 0 < k < n. PCA provides a method that gives the reduced dimension this wider range of choices. It defines a linear transformation matrix \boldsymbol{G}_{n\times k} to be solved for, finds the optimal \boldsymbol{G}_{n\times k}, and then computes \boldsymbol{c}_{k\times 1}=\boldsymbol{G}_{n\times k}^{T}\boldsymbol{x}_{n\times 1}; the dimension k of the reduced vector satisfies 0 < k < n, which meets our initial need. How do we find \boldsymbol{G}_{n\times k}? PCA finds the optimal \boldsymbol{G}_{n\times k} by minimizing the Frobenius norm (that is, the squared loss) between the 'original matrix' and the 'matrix obtained after the original matrix is reduced in dimension (the linear transformation) and then mapped back up in dimension (the inverse linear transformation)'. In the end it can be shown that there are two ways to obtain \boldsymbol{G}_{n\times k}: one is to perform an eigendecomposition of the covariance matrix of \boldsymbol{A}_{m\times n}, and the other is to perform a singular value decomposition of the centered \boldsymbol{A}_{m\times n}.
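A minimal sketch of these two equivalent ways of obtaining \boldsymbol{G}_{n\times k} (the data is synthetic; the columns of the two results may differ in sign, which does not change the projection):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))            # m = 100 points in n = 5 dimensions (synthetic data)
k = 2

# Method 1: eigendecomposition of the covariance matrix of A
A_centered = A - A.mean(axis=0)
cov = A_centered.T @ A_centered / (A.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
G_eig = eigvecs[:, order[:k]]            # top-k eigenvectors as columns -> G_{n x k}

# Method 2: SVD of the centered matrix; the top-k right singular vectors give the same G
_, _, Vt = np.linalg.svd(A_centered, full_matrices=False)
G_svd = Vt[:k, :].T

# The two transforms agree up to the sign of each column
print(np.allclose(np.abs(G_eig), np.abs(G_svd)))   # True

# Dimensionality reduction: each n-dimensional point x becomes c = G^T x
A_reduced = A_centered @ G_svd                      # shape (100, k)
```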

Singular value decomposition is only an implementation method PCA uses to solve for the linear transformation matrix; it plays no part in the central idea of PCA. What SVD and PCA have in common is that, at their core, both aim to minimize the squared loss between the 'original matrix' and the 'approximating matrix'; the difference is that PCA specifies a dimensionality-reducing linear transformation in advance, while SVD does not.

If there is something wrong, please leave a message~


Origin: blog.csdn.net/qq_32103261/article/details/120612008