Principal Component Analysis (PCA) and the significance of eigenvalues and eigenvectors

https://blog.csdn.net/weixin_38314865/article/details/84190175

Definition:
Principal Component Analysis (PCA) is a statistical method. Through an orthogonal transformation, it converts a set of possibly correlated variables into a set of linearly uncorrelated variables; the transformed variables are called the principal components. The idea of PCA is to map the n-dimensional features onto k new dimensions (k < n), which are k mutually orthogonal features constructed from scratch. These k dimensions are called the principal components; they are newly built features, not simply the result of keeping k of the original n features and discarding the remaining n - k.
Simple Explanation:

Specifically, suppose we have a data set of m samples, each n-dimensional. We want to reduce these m samples from n dimensions down to k dimensions, and we hope the k-dimensional data represent the original data set as well as possible. We know that going from n dimensions down to k dimensions necessarily loses some information, but we want that loss to be as small as possible. So how do we make the k-dimensional data represent the original data as faithfully as possible?

Let's look at the simplest case, n = 2 and k = 1, that is, reducing the data from two dimensions to one. The data are shown in the figure below. We want to find a single direction that can represent these two-dimensional data. The figure shows two candidate direction vectors, u1 and u2. Which vector represents the original data set better? Intuitively, u1 is better than u2, because the sample points projected onto that direction have the maximum variance.
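A minimal sketch of this 2D-to-1D comparison, using synthetic correlated data and two hypothetical unit directions u1 and u2 (the data and numbers are illustrative, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
# correlated 2D samples, roughly stretched along the 45-degree direction
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 2.5], [2.5, 3.0]], size=500)

u1 = np.array([1.0, 1.0]) / np.sqrt(2)   # direction along the data's spread
u2 = np.array([1.0, -1.0]) / np.sqrt(2)  # direction across the spread

# variance of the sample points projected onto each candidate direction
print("variance along u1:", (X @ u1).var())
print("variance along u2:", (X @ u2).var())
# u1 gives the much larger projected variance, so it represents the data better
```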

The book "The Geometric Meaning of Linear Algebra" describes it this way: "Matrix multiplication corresponds to a transformation: it turns an arbitrary vector into another vector that usually differs in both direction and length. In this process the original vector is mainly rotated and scaled. If a matrix only scales certain vectors without rotating them, those vectors are called the eigenvectors of the matrix, and the scaling ratios are the eigenvalues."

From the mathematical derivation, we know that the eigenvectors are exactly the rotated coordinate axes we are looking for, and each eigenvalue equals the variance of the data along the corresponding dimension in the rotated coordinates.

That is, by directly computing the eigenvalues and eigenvectors of the matrix A (in PCA, the covariance matrix of the data), we can find the right rotation axes. This is a practical application of eigenvalues and eigenvectors: "finding the axes along which the data achieve the maximum discrimination in each dimension."
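A sketch of this claim, assuming centered synthetic data (the covariance values are illustrative): the eigenvectors of the covariance matrix give the rotated axes, and each eigenvalue equals the variance of the data along the corresponding axis.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[4.0, 1.5, 0.5],
                                 [1.5, 2.0, 0.3],
                                 [0.5, 0.3, 1.0]], size=2000)
Xc = X - X.mean(axis=0)                 # center the data
C = np.cov(Xc, rowvar=False)            # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)    # symmetric matrix, so eigh is appropriate

rotated = Xc @ eigvecs                  # coordinates of the data on the new axes
print(eigvals)                          # eigenvalues
print(rotated.var(axis=0, ddof=1))      # approximately the same values: variance per new axis
```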

Therefore, in data mining, the eigenvalue is used directly to describe how much information the corresponding eigenvector direction contains, and the value obtained by dividing one eigenvalue by the sum of all eigenvalues is called the variance contribution rate of that eigenvector: contribution rate of the i-th component = λ_i / (λ_1 + λ_2 + ... + λ_n). The contribution rate represents the proportion of the total variance, i.e. the amount of information, contained in that dimension.
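A small sketch of the contribution rate computation; the eigenvalues here are illustrative numbers, not from the post:

```python
import numpy as np

eigvals = np.array([4.9, 1.8, 0.8])      # example eigenvalues (illustrative)
contribution = eigvals / eigvals.sum()   # lambda_i divided by the sum of all eigenvalues
print(contribution)                      # share of total variance per component
```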

The variables obtained by transforming the data with the eigenvectors are usually called the principal components. When the cumulative variance contribution rate of the first m principal components reaches a high enough percentage (for example, 85% or more), we keep those m principal components of the data, achieving the goal of dimensionality reduction. This is the whole principle of the principal component analysis algorithm.
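A minimal end-to-end sketch of the procedure described above: sort the eigenvalues, keep enough principal components for the cumulative contribution rate to reach 85%, and project the data. The function name, test data, and threshold default are illustrative assumptions, not from the original post.

```python
import numpy as np

def pca_by_contribution(X, threshold=0.85):
    Xc = X - X.mean(axis=0)                          # center the data
    C = np.cov(Xc, rowvar=False)                     # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)             # ascending order
    order = np.argsort(eigvals)[::-1]                # sort descending by eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    ratio = eigvals / eigvals.sum()                  # contribution rate per component
    m = np.searchsorted(np.cumsum(ratio), threshold) + 1   # first m reaching the threshold
    return Xc @ eigvecs[:, :m], eigvals, ratio, m    # reduced data plus diagnostics

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0] * 4,
                            np.diag([5.0, 2.0, 0.5, 0.1]), size=1000)
X_reduced, eigvals, ratio, m = pca_by_contribution(X)
print(m, X_reduced.shape)                            # components kept, reduced data shape
```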

Origin: www.cnblogs.com/jwg-fendi/p/11098545.html