SL- Principal Component Analysis (PCA)

This article is a set of reading notes, referring to Liu Jianping's post on this topic and the book "Deep Learning".

Principal component analysis (Principal Components Analysis, PCA) is the most common dimensionality-reduction technique and also a classic unsupervised learning tool. The idea: given the data points in the original space, find a suitable subspace and represent the data with shorter vectors, while retaining as much of the original information as possible.

Assume the original design matrix is \(X \in R^{m \times n}\) and the mean of these m samples is 0. We want the compressed vectors to have length \(l\). Define the decoding matrix \(D \in R^{n \times l}\); the reconstructed vector is \(\hat x = Dw\), where \(w\) is the vector representing a sample after compression.

Before the computation, let us first see where the representation vector \(w\) comes from, which requires a closer look at \(D\): since \(\hat x = \sum^l w_i d_i\), the columns of D form a basis of the subspace we represent the data in (expressed in the coordinates of the original space). To make the representation unique, we require these basis vectors to be mutually orthogonal and of unit length, i.e. \(D'D = I\). So how do we obtain the representation vector \(w\)? Simply project x onto each basis vector: \(w_i = d_i'x\), i.e. \(w = D'x\).
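As a quick sanity check of the encode/decode step, here is a minimal sketch: D is just an arbitrary matrix with orthonormal columns built via a QR factorization, not yet the PCA-optimal choice, and all sizes are made up for illustration.

```python
import numpy as np

# Minimal encode/decode sketch. D is some matrix with orthonormal columns
# (built via QR), standing in for the decoding matrix above; sizes are made up.
rng = np.random.default_rng(0)
n, l = 3, 2
D, _ = np.linalg.qr(rng.normal(size=(n, l)))   # columns orthonormal, so D'D = I

x = rng.normal(size=n)   # one (already centered) sample
w = D.T @ x              # encode: w = D'x
x_hat = D @ w            # decode: x_hat = Dw = DD'x

print(np.allclose(D.T @ D, np.eye(l)))   # True
```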

Derivation 1: based on minimum projection distance

Obviously, we want the "information loss" to be as small as possible, i.e. \(\hat x\) and \(x\) should be as close as possible, measured with the 2-norm. In other words, we want to minimize
\[ \sum^m || \hat x_i - x_i ||^2_2 \]
Rearranging this expression,
\[ \sum^m || \hat x_i - x_i ||^2_2 = \sum^m || DD'x_i - x_i ||^2_2 = || XDD' - X ||^2_F = Tr\big( (XDD'-X)'(XDD'-X) \big) = -Tr(D'X'XD) + Tr(X'X) \]
Therefore
\[ D = \arg\min_D -Tr(D'X'XD) \quad s.t.\ D'D = I \]
This is a typical constrained optimization problem (Lagrange multipliers); concretely, the optimal D is the matrix whose columns are the eigenvectors of \(X'X\) corresponding to its \(l\) largest eigenvalues.
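That claim is easy to check numerically. A sketch with made-up data: compare the reconstruction error of the top-\(l\) eigenvectors of \(X'X\) against that of a random orthonormal basis.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, l = 200, 5, 2
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)                                  # center, as assumed in the text

eigvals, eigvecs = np.linalg.eigh(X.T @ X)
D_opt = eigvecs[:, np.argsort(eigvals)[::-1][:l]]    # eigenvectors of the l largest eigenvalues
D_rand, _ = np.linalg.qr(rng.normal(size=(n, l)))    # some other orthonormal basis

def recon_error(D):
    return np.linalg.norm(X @ D @ D.T - X, 'fro') ** 2   # ||XDD' - X||_F^2

print(recon_error(D_opt) <= recon_error(D_rand))     # True
```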

Derivation 2: based on maximum projection variance

One derivation was given above; below is another derivation, based on the idea of maximizing the projected variance.

First of all, our goal is for the projected points to be as spread out as possible. This dispersion of the sample points could be described by the covariance matrix of the projections, but that makes the optimization awkward. So instead we take the "variance" of each sample point: \((D'x - D'\bar x)'(D'x - D'\bar x) = x'DD'x\), since we have already centered \(x\). Thus, we want to maximize
\[ \sum^m Tr(D'x_ix_i'D) = Tr(D'X'XD) \quad s.t.\ D'D = I \]
which is exactly the same objective function as above.
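The equivalence can also be verified numerically (a sketch with made-up data): for any D with \(D'D = I\), the reconstruction error from Derivation 1 and the projected variance \(Tr(D'X'XD)\) sum to the constant \(Tr(X'X)\), so minimizing one is the same as maximizing the other.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, l = 100, 4, 2
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)

D, _ = np.linalg.qr(rng.normal(size=(n, l)))         # any orthonormal D
err = np.linalg.norm(X @ D @ D.T - X, 'fro') ** 2    # -Tr(D'X'XD) + Tr(X'X)
var = np.trace(D.T @ X.T @ X @ D)                    # Tr(D'X'XD)

print(np.isclose(err + var, np.trace(X.T @ X)))      # True
```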

PCA algorithm flow

  • First, center the sample points
  • Compute the sample covariance matrix \(X'X\)
  • Perform an eigendecomposition and take the eigenvectors corresponding to the \(l\) largest eigenvalues to form the matrix \(D\)
  • For each sample, compute \(w_i = D'x_i\)

Here the reduced dimension \(l\) must be specified in advance; it can be chosen from the distribution of the eigenvalues so that the information loss stays within an acceptable range, i.e. the selected \(l\) should satisfy
\[ {\sum^l \lambda_i \over \sum^n \lambda_i} \ge t \]
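Putting the four steps and the threshold rule together, a minimal NumPy sketch might look like this (the function name `pca` and the default `t = 0.95` are assumptions for illustration, not from the original post):

```python
import numpy as np

def pca(X, t=0.95):
    """Sketch of the PCA flow above: X is the (m, n) data matrix,
    t is the required fraction of retained eigenvalue mass."""
    # 1. center the sample points
    Xc = X - X.mean(axis=0)
    # 2. sample covariance matrix X'X (unnormalized, as in the text)
    C = Xc.T @ Xc
    # 3. eigendecomposition; sort eigenvalues in decreasing order
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    #    choose the smallest l with sum_{i<=l} lambda_i / sum_i lambda_i >= t
    ratio = np.cumsum(eigvals) / eigvals.sum()
    l = min(int(np.searchsorted(ratio, t)) + 1, len(eigvals))
    D = eigvecs[:, :l]
    # 4. compute w_i = D'x_i for every sample (rows of W)
    W = Xc @ D
    return D, W
```

Reconstructions are then \(\hat x_i = Dw_i\), i.e. `X_hat = W @ D.T` in this notation.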

A brief introduction to kernel principal component analysis (KPCA)

We return to the objective function derived above,
\[ D = \arg\min_D -Tr(D'X'XD) \quad s.t.\ D'D = I \]
Applying the Lagrange multiplier method and setting the derivative to zero, this rearranges to
\[ X'XD = \sum^m x_ix_i'D = \Lambda D \]
so the columns of D are eigenvectors of the sample covariance matrix, and \(\Lambda\) is the diagonal matrix of the corresponding eigenvalues.

Sometimes it is hard to find a hyperplane in the original space onto which the data project well, so we borrow the kernel-function idea from SVMs. The equation above then becomes
\[ \sum^m \phi(x_i)\phi(x_i)'D = \Lambda D \]
That is, the eigendecomposition of \(X'X\) is replaced by the decomposition of a covariance matrix in a higher-dimensional feature space. Of course, this also brings extra computational cost.
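For completeness, here is a minimal from-scratch sketch of kernel PCA with an RBF kernel (the kernel choice and the hyperparameter `gamma` are assumptions for illustration; in practice a library routine such as scikit-learn's `KernelPCA` would normally be used). Rather than forming the feature-space covariance matrix explicitly, it works with the m × m kernel matrix, centers it, and eigendecomposes that.

```python
import numpy as np

def rbf_kernel_pca(X, l=2, gamma=1.0):
    """Sketch of kernel PCA with an RBF kernel; returns the l-dimensional
    projections of the training samples."""
    m = X.shape[0]
    # kernel matrix K_ij = phi(x_i)'phi(x_j) = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # center the implicit feature map: K <- K - 1K - K1 + 1K1, with 1 = ones(m, m)/m
    one = np.ones((m, m)) / m
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigendecompose the centered kernel matrix and keep the top-l components
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:l]
    alphas, lambdas = eigvecs[:, order], eigvals[order]
    # projections of the training samples onto the kernel principal components
    return alphas * np.sqrt(np.maximum(lambdas, 0))
```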

Original post: www.cnblogs.com/easonshi/p/12510608.html