Principal component analysis (PCA) algorithm

PCA (Principal Component Analysis) is the most widely used dimensionality reduction method. Its main idea is to map n-dimensional features onto k dimensions. These k dimensions are new, mutually orthogonal features, also called principal components. They are constructed from the original n features, not obtained by simply keeping k of them.

PCA works by finding, one after another, a set of mutually orthogonal coordinate axes in the original space, and the choice of these new axes depends on the data itself. The first new axis is the direction of maximum variance in the original data. The second is the direction of maximum variance among all directions orthogonal to the first. The third is the direction of maximum variance among directions orthogonal to the first two, and so on, until n such axes are obtained. With axes chosen this way, most of the variance is contained in the first k axes, while the remaining axes carry almost none. We can therefore ignore the remaining axes and keep only the first k, which hold most of the variance. In effect, this keeps only the feature dimensions that carry most of the variance and discards those whose variance is almost zero, which reduces the dimensionality of the data.
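
A common way to decide how many of the first k axes to keep is a cumulative explained-variance threshold. The variance along each new axis equals the corresponding eigenvalue of the covariance matrix, so the choice can be made from the eigenvalues alone. This is a minimal sketch, assuming the eigenvalues are already computed; the function name `choose_k`, the 95% default threshold and the sample eigenvalues are hypothetical, not taken from the original text.

```python
def choose_k(eigenvalues, threshold=0.95):
    """Smallest k whose k largest eigenvalues explain at least `threshold` of the total variance."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    values = sorted(eigenvalues, reverse=True)  # variance along each axis, largest first
    total = sum(values)
    if total <= 0.0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, value in enumerate(values, start=1):
        cumulative += value
        if cumulative / total >= threshold:  # share of variance kept by the first k axes
            return k
    return len(values)  # guard against rounding when threshold == 1.0


variances = [5.0, 3.0, 1.5, 0.3, 0.2]  # hypothetical eigenvalues, total = 10.0
print(choose_k(variances, 0.90))  # 3  (the first three axes keep 95% of the variance)
print(choose_k(variances, 0.75))  # 2  (the first two keep 80%)
```

The threshold is a judgment call: a higher value keeps more axes and loses less information, a lower value compresses more.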

In short: PCA uses a smaller number of composite indicators to represent the information contained in the original variables. Principal component analysis and factor analysis both belong to this family of dimensionality reduction algorithms.

Dimensionality reduction

Dimensionality reduction is a preprocessing method for high-dimensional feature data. It keeps the most important features of the data and removes noise and unimportant features, which speeds up data processing. In real production and applications, dimensionality reduction can save a great deal of time and cost while keeping the information loss within an acceptable range. It has become a widely used data preprocessing method.

Dimensionality reduction has the following advantages:

  • 1) It makes the data set easier to use.

  • 2) It reduces the computational cost of algorithms.

  • 3) It removes noise.

  • 4) It makes the results easier to understand.

There are many dimensionality reduction algorithms, such as singular value decomposition (SVD), principal component analysis (PCA), factor analysis (FA), and independent component analysis (ICA).

PCA algorithm steps:

    (1) Center the data: subtract each feature's mean from that feature;

    (2) Compute the covariance matrix;

    (3) Compute the eigenvalues and eigenvectors of the covariance matrix;

    (4) Sort the eigenvalues in descending order;

    (5) Keep the eigenvectors corresponding to the k largest eigenvalues;

    (6) Project the data onto the new space spanned by these eigenvectors.
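
The six steps can be sketched in plain Python. This is a minimal teaching sketch, assuming the data is a list of samples (rows) with n numeric features and at least two samples. To stay dependency-free it solves step (3) with a hand-written cyclic Jacobi eigenvalue routine, which is a swapped-in technique for illustration; in practice one would normally call a library routine such as NumPy's `numpy.linalg.eigh`. The names `pca`, `covariance_matrix` and `jacobi_eigen` are hypothetical.

```python
import math


def covariance_matrix(centered):
    """Sample covariance of mean-centered rows (divides by m - 1; needs m >= 2)."""
    m, n = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (m - 1) for j in range(n)]
            for i in range(n)]


def jacobi_eigen(a, rel_tol=1e-22, max_sweeps=100):
    """Eigenvalues and unit eigenvectors of a symmetric matrix (cyclic Jacobi method)."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulated rotations
    scale = sum(x * x for row in a for x in row) or 1.0  # squared Frobenius norm
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * scale:  # matrix is (numerically) diagonal
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation that makes a[p][q] zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P, so the columns of V become eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    eigenvalues = [a[i][i] for i in range(n)]
    eigenvectors = [[v[k][i] for k in range(n)] for i in range(n)]  # i-th column of V
    return eigenvalues, eigenvectors


def pca(data, k):
    """Return (projected data, sorted eigenvalues, top-k components) for list-of-rows data."""
    m, n = len(data), len(data[0])
    # (1) Subtract each feature's mean from that feature.
    means = [sum(row[j] for row in data) / m for j in range(n)]
    centered = [[row[j] - means[j] for j in range(n)] for row in data]
    # (2) Covariance matrix of the features.
    cov = covariance_matrix(centered)
    # (3) Eigenvalues and eigenvectors of the covariance matrix.
    values, vectors = jacobi_eigen(cov)
    # (4) Sort eigenpairs by eigenvalue, largest first.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    # (5) Keep the eigenvectors of the k largest eigenvalues.
    components = [vectors[i] for i in order[:k]]
    # (6) Project the centered data onto the new axes.
    projected = [[sum(x * w for x, w in zip(row, comp)) for comp in components]
                 for row in centered]
    return projected, [values[i] for i in order], components


if __name__ == "__main__":
    data = [[2.0, 2.0], [-2.0, -2.0], [1.0, -1.0], [-1.0, 1.0]]
    projected, eigenvalues, components = pca(data, k=1)
    print(eigenvalues)  # approximately [5.333, 1.333]
    print(projected)    # approximately [[2.828], [-2.828], [0.0], [0.0]]
```

In the example the points spread mostly along the diagonal, so the first principal axis is (1, 1)/√2 with variance 16/3, and keeping only that axis turns each 2-D point into a single coordinate. Dividing the covariance by m instead of m − 1 would scale the eigenvalues but leave the axes unchanged.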

PCA dimensionality reduction criteria:

    (1) Nearest reconstruction: over all sample points in the data set, the distance between each reconstructed point and its original point is as small as possible.

    (2) Maximum separability: the projections of the samples in the low-dimensional space are spread as far apart as possible.
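
For centered data the two criteria lead to the same optimization problem. A short derivation, assuming notation not used in the original: m centered samples x_i in R^n stacked as the rows of X, and a projection matrix W in R^(n×k) with orthonormal columns, so that the reconstruction of x_i is W W^T x_i.

```latex
% reconstruction error (criterion 1), using W^T W = I_k
\sum_{i=1}^{m} \left\| x_i - W W^\top x_i \right\|^2
  = \sum_{i=1}^{m} \left( x_i^\top x_i - x_i^\top W W^\top x_i \right)
  = \sum_{i=1}^{m} \|x_i\|^2 - \operatorname{tr}\!\left( W^\top X^\top X W \right)

% variance of the projected samples (criterion 2), with S = X^\top X / (m - 1)
\frac{1}{m-1} \sum_{i=1}^{m} \left\| W^\top x_i \right\|^2
  = \operatorname{tr}\!\left( W^\top S W \right)

% the first term above is fixed by the data, so both criteria reduce to
\max_{W^\top W = I_k} \operatorname{tr}\!\left( W^\top S W \right)

% k = 1: setting the gradient of w^\top S w - \lambda (w^\top w - 1) to zero gives
S w = \lambda w, \qquad w^\top S w = \lambda
```

So the best single axis is the eigenvector of S with the largest eigenvalue, and the variance along it equals that eigenvalue. For general k the maximum, λ₁ + ⋯ + λ_k, is reached by the eigenvectors of the k largest eigenvalues, which is what steps (4) and (5) of the algorithm keep.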

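Advantages of PCA:

    (1) It makes the data set easier to use;

    (2) It reduces the computational cost of algorithms;

    (3) It removes noise;

    (4) It makes the results easier to understand;

    (5) It is completely free of parameter restrictions: the result is determined by the data alone, with no tuning parameters beyond the choice of k.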

Disadvantages of PCA:

    (1) If you have prior knowledge about the object being observed and already know some characteristics of the data, there is no way to bring that knowledge into the process through parameters, so the result may not be as expected and the efficiency may be low;

    (2) Eigenvalue decomposition has some limitations; for example, the matrix being decomposed must be square;

    (3) When the data are not Gaussian-distributed, the principal components found by PCA may not be optimal.
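
On item (2): in PCA the matrix that gets eigendecomposed is the n×n covariance matrix, which is always square. A common alternative that avoids forming it is the singular value decomposition of the non-square m×n centered data matrix. The relation below assumes X holds the centered samples as rows and uses its thin SVD X = UΣV^T (U^T U = I).

```latex
S = \frac{1}{m-1} X^\top X
  = \frac{1}{m-1} V \Sigma U^\top U \Sigma V^\top
  = V \, \frac{\Sigma^2}{m-1} \, V^\top
% so the columns of V are the principal axes and \lambda_i = \sigma_i^2 / (m - 1);
% projecting onto the first k axes:
X V_k = U_k \Sigma_k
```

This yields the same axes and eigenvalues as the covariance route, up to the sign of each axis.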

PCA applications:

    (1) Exploring and visualizing high-dimensional data sets.

    (2) Data compression.

    (3) Data preprocessing.

    (4) Image analysis and processing, and voice communication.

    (5) Dimensionality reduction (the most important use): removing data redundancy and noise.
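
To make the compression use concrete, here is a worked count of stored numbers for a hypothetical data set of m = 1000 samples with n = 100 features reduced to k = 10 components. The compressed form is assumed to store the projected data (m×k), the component matrix W (n×k) and the feature means μ (n), which are what an approximate reconstruction needs.

```latex
% hypothetical sizes: m = 1000, n = 100, k = 10
\text{original: } m n = 1000 \times 100 = 100\,000 \text{ numbers}
\text{compressed: } m k + n k + n = 10\,000 + 1\,000 + 100 = 11\,100 \text{ numbers}
\text{ratio: } 11\,100 / 100\,000 \approx 11\%
% approximate reconstruction of sample i from z_i = W^\top (x_i - \mu)
\hat{x}_i = \mu + W z_i
```

How much information the reconstruction loses depends on the variance carried by the discarded axes.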

Source: www.cnblogs.com/yanruizhe/p/11988286.html