Nine, Principal Component Analysis

Reference URL:

https://jakevdp.github.io/PythonDataScienceHandbook/05.09-principal-component-analysis.html

Principal component analysis (PCA) is an unsupervised algorithm. It is a very basic dimensionality reduction algorithm that is also useful for data visualization, noise filtering, feature extraction, and feature engineering.

1, Introduction to principal component analysis

  Principal component analysis is a fast and flexible unsupervised method for dimensionality reduction in data.

  When the fitted components are drawn as arrows over the data, these vectors represent the principal axes of the data. The length of each arrow represents the 'importance' of that axis in describing the input data, i.e., it is a measure of the variance of the data when projected onto that axis. The projections of each data point onto the principal axes are the 'principal components' of the data.
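  A minimal sketch of how the principal axes and their variances can be inspected, assuming scikit-learn's PCA class and a small synthetic 2-D dataset (the data, the random seed, and the scaling factor 3 used for the arrows are illustrative assumptions, not part of the original notes):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic, correlated 2-D data (an assumption for illustration).
rng = np.random.RandomState(1)
X = (rng.rand(2, 2) @ rng.randn(2, 200)).T  # shape (200, 2)

pca = PCA(n_components=2)
pca.fit(X)

# Each row of components_ is a unit-length principal axis.
print(pca.components_)
# Variance of the data projected onto each axis, largest first.
print(pca.explained_variance_)

# One arrow per axis, starting at pca.mean_: the axis direction scaled by
# a multiple of the standard deviation along it (the factor 3 is arbitrary).
arrows = pca.components_ * 3 * np.sqrt(pca.explained_variance_)[:, np.newaxis]
```

  The longer arrow belongs to the axis with the larger entry in explained_variance_, which is the sense in which arrow length measures 'importance'.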

  Plotting these principal components beside the original data gives the 'principal components' transformed result shown in the figure.

  This transformation from data axes to principal axes is an affine transformation, which is composed of three steps: translation, rotation, and uniform scaling.
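  The translation and rotation steps can be reproduced by hand. The sketch below assumes scikit-learn's PCA with default settings, where no scaling is applied, and reuses an illustrative synthetic dataset. Because the principal axes form an orthonormal basis, the second step is a rotation, possibly combined with a reflection:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(1)
X = (rng.rand(2, 2) @ rng.randn(2, 200)).T

pca = PCA(n_components=2).fit(X)

# Translation: move the data mean to the origin.
X_centered = X - pca.mean_
# Rotation (possibly with a reflection): express the centered points
# in the basis formed by the principal axes.
X_manual = X_centered @ pca.components_.T

X_pca = pca.transform(X)
print(np.allclose(X_manual, X_pca))  # True
```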

  1, Using PCA for dimensionality reduction

    Using PCA for dimensionality reduction means zeroing out one or more of the smallest principal components, which gives a lower-dimensional projection of the data that preserves the maximal data variance.

    The light points are the original data, and the dark points are the projected version.

    This shows what PCA dimensionality reduction means: the information along the least important principal axes is removed, leaving only the components of the data with the highest variance. The fraction of variance that is cut out is roughly a measure of how much 'information' is lost in the reduction.

    The reduced-dimension data is in some sense good enough to reflect the most important relationships between the points: although the dimension of the data is cut by 50%, the overall relationship between the data points is largely retained.
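    A minimal sketch of this reduction from two dimensions to one, again assuming scikit-learn's PCA and an illustrative synthetic dataset. inverse_transform maps the reduced coordinates back into the original space, which is what produces the 'projected version' of the points:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(1)
X = (rng.rand(2, 2) @ rng.randn(2, 200)).T

# Keep only the single direction of largest variance.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)  # (200, 2) -> (200, 1)

# Map the 1-D coordinates back into the original 2-D space; the points
# now lie on a line along the first principal axis.
X_projected = pca.inverse_transform(X_reduced)

# Fraction of the total variance kept and discarded by the reduction.
kept = pca.explained_variance_ratio_.sum()
print(f"variance kept: {kept:.3f}, discarded: {1 - kept:.3f}")
```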

  2, Using PCA for data visualization: handwritten digits

    The usefulness of dimensionality reduction may not be obvious in only two dimensions, but its value becomes clear for high-dimensional data.

    The full data is a 64-dimensional point cloud, and the plotted points are the projection of each data point along the directions with the largest variance.
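    A short sketch of this projection, assuming the digits dataset bundled with scikit-learn (load_digits) is the 64-dimensional data in question, as in the referenced handbook chapter:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
print(digits.data.shape)  # (1797, 64): 8x8 images flattened to 64 pixels

# Project from 64 dimensions down to 2 for plotting.
pca = PCA(n_components=2)
projected = pca.fit_transform(digits.data)
print(projected.shape)  # (1797, 2)

# projected[:, 0] and projected[:, 1] can be passed to a scatter plot,
# colored by digits.target, to visualize the point cloud.
```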

  3, What the components mean

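    The meaning of the components can be understood in terms of combinations of basis vectors: each data point is approximated as the mean plus a weighted sum of the principal axes, and the weights are its principal components.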
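    To make the basis-vector view concrete, the sketch below rebuilds one digit image by hand as the mean plus a weighted sum of principal axes. It assumes scikit-learn's PCA and digits dataset; the choice of 8 components and of the first image is arbitrary:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
pca = PCA(n_components=8).fit(digits.data)

x = digits.data[0]                        # one image, 64 pixel values
coeffs = pca.transform(x[np.newaxis])[0]  # its 8 principal components

# image is approximated by: mean + c_1 * axis_1 + ... + c_8 * axis_8
approx = pca.mean_ + coeffs @ pca.components_

# The hand-built sum matches the library's own reconstruction.
same = np.allclose(approx, pca.inverse_transform(coeffs[np.newaxis])[0])
print(same)  # True
```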

  4, Choosing the number of components

    In practice, correctly estimating how many components are needed to describe the data is a very important part of using PCA. The number of required components can be determined by looking at the cumulative explained variance ratio as a function of the number of components.
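    One way to read the number of components off the cumulative curve is sketched below, assuming scikit-learn's PCA and the digits dataset. The 90% threshold is an illustrative assumption, not a rule:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
pca = PCA().fit(digits.data)  # keep all 64 components

# Cumulative explained variance ratio as a function of component count.
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches 90%.
threshold = 0.90
n_needed = int(np.searchsorted(cumulative, threshold) + 1)
print(n_needed)
```

    Plotting cumulative against the component count gives the curve described above; the point where it flattens shows how much redundancy the 64 features contain.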

    

 

 

2, Using PCA as a noise filter

  PCA can also be used as a filtering method for noisy data. The idea is that any component with variance much larger than the variance of the noise should be relatively unaffected by the noise. Therefore, if the data is reconstructed using only the largest subset of principal components, the signal should be preferentially kept and the noise discarded.
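  A minimal sketch of this filtering idea, assuming scikit-learn's PCA and the digits dataset. The Gaussian noise level (standard deviation 4) and the 50% variance threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()

# Add Gaussian noise to the clean digits (noise scale chosen for illustration).
rng = np.random.RandomState(42)
noisy = rng.normal(digits.data, 4)

# A float n_components between 0 and 1 keeps just enough components
# to explain that fraction of the total variance.
pca = PCA(n_components=0.50).fit(noisy)
print(pca.n_components_)

# Project onto the retained components, then reconstruct in pixel space.
filtered = pca.inverse_transform(pca.transform(noisy))
print(filtered.shape)  # (1797, 64)
```

  How much noise is removed depends on the threshold: a lower value discards more noise but also more of the signal.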

  

 

   

 

 

  

Origin www.cnblogs.com/nuochengze/p/12535797.html