[ML-15] Principal component analysis (PCA)

Table of Contents

  1. The Idea of PCA
  2. Algorithm Derivation
  3. PCA Algorithm Flow
  4. Introduction to Kernel Principal Component Analysis (KPCA)
  5. PCA Algorithm Summary

Principal component analysis (hereinafter referred to as PCA) is one of the most important dimensionality reduction methods. It is widely used for data compression, where it eliminates redundancy, and for removing noise from data.

1. The Idea of PCA

PCA, as the name suggests, finds the most important aspects of the data and uses them in place of the original data. In other words, it tries to retain as much information as possible while reducing the number of dimensions.

For example, suppose our data set is n-dimensional and contains m samples $(x^{(1)}, x^{(2)}, \ldots, x^{(m)})$. We hope to reduce these m samples from n dimensions to $n'$ dimensions, and we want the resulting m $n'$-dimensional samples to represent the original data set as well as possible. Going from n dimensions to $n'$ dimensions will certainly lose some information, but we want that loss to be as small as possible. So how can the $n'$-dimensional data represent the original data as faithfully as possible?

Let us first look at the simplest case, n = 2 and $n' = 1$, that is, reducing the data from two dimensions to one. We hope to find a single direction that can represent the two-dimensional data on its own. The figure in the original post shows the sample points together with two candidate directions, $u_1$ and $u_2$; which of the two represents the original data set better? Intuitively, $u_1$ is better than $u_2$.

Why is $u_1$ better than $u_2$? There are two possible explanations. The first is that the sample points are close enough to this line, i.e., the projection distances are small. The second is that the projections of the sample points onto this line are spread out as much as possible, i.e., the projection variance is as large as possible.
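To make the variance explanation concrete, here is a small sketch (the data set and the two directions $u_1$, $u_2$ are hypothetical, chosen only for illustration) that compares the variance of the projections onto two candidate directions; the direction with the larger projection variance is the better one-dimensional representation:

```python
import numpy as np

# Hypothetical 2-D data set for illustration: points spread mostly along one diagonal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.5]]) + 0.3 * rng.normal(size=(100, 2))
X = X - X.mean(axis=0)            # center the samples

# Two candidate unit directions: u1 roughly along the spread, u2 across it.
u1 = np.array([2.0, 1.5]); u1 = u1 / np.linalg.norm(u1)
u2 = np.array([-1.5, 2.0]); u2 = u2 / np.linalg.norm(u2)

# The projection of a sample x onto a unit direction u is the scalar x . u;
# the variance of these scalars measures how well the direction preserves the spread.
print("projection variance along u1:", np.var(X @ u1))
print("projection variance along u2:", np.var(X @ u2))
```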

If we generalize from $n' = 1$ to an arbitrary $n'$, our criteria for the reduced representation become: the sample points should be close enough to the projection hyperplane, or the projections of the sample points onto the hyperplane should be spread out as much as possible.

Based on these two criteria, we can obtain two equivalent derivations of PCA.

2. Algorithm Derivation

The two ideas lead to the same result. The first minimizes the loss after projection (the reconstruction error caused by the projection), and the second maximizes the variance after projection.

2.1 Derivation of PCA Based on Minimum Projection Distance

We first look at the derivation under the first criterion, namely that the sample points should be as close as possible to the projection hyperplane. Assume the samples have been centered, let $X = (x^{(1)}, x^{(2)}, \ldots, x^{(m)})$ be the matrix whose columns are the centered samples, and let $W = (w_1, w_2, \ldots, w_{n'})$ be an orthonormal basis of the projection subspace, so that $W^T W = I$. Minimizing the total squared projection distance $\sum_{i=1}^{m} \| W W^T x^{(i)} - x^{(i)} \|_2^2$ is equivalent to maximizing $\operatorname{tr}(W^T X X^T W)$ under the constraint $W^T W = I$. Introducing Lagrange multipliers and organizing the resulting condition, we get:

$$X X^T W = W \Lambda .$$

In this way it can be seen more clearly that $W$ is a matrix formed by $n'$ eigenvectors of $X X^T$, and $\Lambda$ is the matrix formed by the corresponding eigenvalues, with the eigenvalues on the main diagonal and zeros elsewhere. When we reduce the data set from n dimensions to $n'$ dimensions, we need to find the eigenvectors corresponding to the $n'$ largest eigenvalues; the matrix $W$ formed by these $n'$ eigenvectors is the matrix we need. For the original data set, we only need to compute $z^{(i)} = W^T x^{(i)}$ to reduce it to the $n'$-dimensional data set with the smallest projection distance.
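For reference, the Lagrange-multiplier step that produces this eigenvalue equation can be written out explicitly (a compact restatement of the standard argument, assuming centered samples and a diagonal multiplier matrix $\Lambda$):

$$
\begin{aligned}
J(W) &= -\operatorname{tr}\!\left(W^{T} X X^{T} W\right) + \operatorname{tr}\!\left(\Lambda \left(W^{T} W - I\right)\right), \\
\frac{\partial J}{\partial W} &= -2\, X X^{T} W + 2\, W \Lambda = 0
\;\;\Longrightarrow\;\; X X^{T} W = W \Lambda ,
\end{aligned}
$$

so the stationary points are sets of eigenvectors of $X X^{T}$, and the objective is largest when the $n'$ eigenvectors with the largest eigenvalues are chosen.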

Note: if you work through the optimization for spectral clustering, you will find it very similar to PCA, except that spectral clustering looks for the eigenvectors corresponding to the k smallest eigenvalues, while PCA looks for the eigenvectors corresponding to the k largest eigenvalues.

2.2 Derivation of PCA Based on Maximum Projection Variance

The derivation based on maximizing the projection variance proceeds analogously and, as shown below, arrives at exactly the same optimization problem and eigenvalue equation as in 2.1.
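Written out, the maximum-variance criterion (again assuming centered samples) is:

$$
\max_{W}\ \operatorname{tr}\!\left(W^{T} X X^{T} W\right) \quad \text{s.t.} \quad W^{T} W = I ,
$$

which is the same problem reached in 2.1, so it leads to the same equation $X X^{T} W = W \Lambda$ and the same choice of the top $n'$ eigenvectors.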

3. PCA Algorithm Flow

The flow follows directly from the derivation above. Given an n-dimensional data set of m samples and a target dimension $n'$:

1) Center the samples by subtracting the mean.

2) Compute the covariance matrix $X X^T$ of the centered data.

3) Perform an eigenvalue decomposition of $X X^T$ and take the eigenvectors corresponding to the $n'$ largest eigenvalues to form the matrix $W$.

4) Project each sample, $z^{(i)} = W^T x^{(i)}$; the $z^{(i)}$ form the reduced $n'$-dimensional data set.
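Below is a minimal NumPy sketch of these steps (the function name `pca_reduce` is illustrative, not from the original post):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Minimal PCA sketch: reduce the m x n matrix X to n_components dimensions."""
    X_centered = X - X.mean(axis=0)                   # 1) center the samples
    cov = X_centered.T @ X_centered / X.shape[0]      # 2) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigh returns ascending eigenvalues
    W = eigvecs[:, ::-1][:, :n_components]            # 3) eigenvectors of the n' largest eigenvalues
    Z = X_centered @ W                                # 4) z^(i) = W^T x^(i), stacked as rows
    return Z, W

# Usage: project a random 5-dimensional data set down to 2 dimensions.
X = np.random.default_rng(1).normal(size=(200, 5))
Z, W = pca_reduce(X, 2)
print(Z.shape)   # (200, 2)
```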

4. Introduction to Kernel Principal Component Analysis (KPCA)

In the PCA algorithm above, we assumed that there is a linear hyperplane onto which the data can be projected. Sometimes, however, the data are not linear, and PCA cannot reduce the dimensionality directly. In that case we need to borrow the kernel-function idea used in support vector machines; the resulting method is called Kernelized PCA (hereinafter KPCA). Assume that the data in the high-dimensional feature space are generated from the data in the original n-dimensional space through a mapping $\phi$. The eigendecomposition that PCA performs in the n-dimensional space,

$$\sum_{i=1}^{m} x^{(i)} x^{(i)\,T} W = \lambda W ,$$

then becomes, in the feature space,

$$\sum_{i=1}^{m} \phi\!\left(x^{(i)}\right) \phi\!\left(x^{(i)}\right)^{T} W = \lambda W .$$

Since $\phi$ is never computed explicitly, KPCA carries out this decomposition through the kernel matrix $K_{ij} = \phi(x^{(i)})^T \phi(x^{(j)}) = k(x^{(i)}, x^{(j)})$.
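As a concrete sketch of this idea (a simplified illustration of the usual kernel PCA recipe, not the exact derivation of the original post; the function name `kernel_pca_rbf` and the `gamma` value are arbitrary choices), the following code builds an RBF kernel matrix, centers it in feature space, and eigendecomposes it to obtain the projections of the training samples:

```python
import numpy as np

def kernel_pca_rbf(X, n_components, gamma=1.0):
    """Minimal KPCA sketch with an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    m = X.shape[0]
    # Kernel (Gram) matrix from pairwise squared distances.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * sq_dists)
    # Center the kernel matrix in feature space (phi is never computed explicitly).
    one_m = np.full((m, m), 1.0 / m)
    K_centered = K - one_m @ K - K @ one_m + one_m @ K @ one_m
    # Eigendecompose and keep the components with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas, lambdas = eigvecs[:, idx], eigvals[idx]
    # Projection of training sample i onto component k is sqrt(lambda_k) * alpha_k[i].
    return alphas * np.sqrt(np.maximum(lambdas, 1e-12))

# Usage on a random 2-D data set (illustration only).
X = np.random.default_rng(2).normal(size=(100, 2))
X_kpca = kernel_pca_rbf(X, n_components=2, gamma=0.5)
print(X_kpca.shape)   # (100, 2)
```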

5. PCA Algorithm Summary

Here is a summary of the PCA algorithm. As an unsupervised dimensionality reduction method, it only requires an eigenvalue decomposition to compress and denoise the data, so it is widely used in practice. To overcome some of PCA's shortcomings, many variants have appeared, such as the KPCA of section 4 for nonlinear dimensionality reduction, Incremental PCA for working within memory limits by processing the data incrementally, and Sparse PCA for dimensionality reduction with sparse data.
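For readers who would rather not implement these variants by hand, scikit-learn ships estimators for each of the methods mentioned above; the snippet below is only a usage sketch with arbitrary parameter values:

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA, IncrementalPCA, SparsePCA

X = np.random.default_rng(3).normal(size=(300, 10))

Z_pca = PCA(n_components=3).fit_transform(X)                              # plain PCA
Z_kpca = KernelPCA(n_components=3, kernel="rbf").fit_transform(X)         # nonlinear (KPCA)
Z_ipca = IncrementalPCA(n_components=3, batch_size=50).fit_transform(X)   # fits in batches
Z_spca = SparsePCA(n_components=3).fit_transform(X)                       # sparse components
```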

Advantages:

1) The amount of information retained is measured only by variance, and it is not affected by factors outside the data set.

2) The orthogonality of the principal components eliminates interactions between the components of the original data.

3) The computation is simple: the main operation is an eigenvalue decomposition, which is easy to implement.

Disadvantages:

1) The meaning of each dimension of the principal components is somewhat ambiguous and not as interpretable as the original features.

2) Non-principal components with small variance may also contain important information about differences between samples, so discarding those dimensions may affect subsequent data processing.

The above is mainly from: <https://www.cnblogs.com/pinard/p/6239403.html>
