PCA (principal component analysis): interpreting the results

Principal component analysis (PCA) is a widely used tool for reducing the dimensionality of a feature space. Its main advantage is that the features it produces are uncorrelated, which can improve model performance.

PCA reduces the number of feature dimensions used to train a model by constructing so-called principal components (PCs) from multiple features.

The PCs are constructed so that PC1 points in the direction along which the features vary the most, and PC2 points in the direction, orthogonal to PC1, that captures as much of the remaining variation as possible. Together, PC1 and PC2 can often explain most of the variation in the features.

PCA also lets us visualize, on a two-dimensional plane, how well the data separate into categories.
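To make the construction concrete, here is a minimal sketch of PCA using NumPy's SVD. The data are synthetic and the function name `pca` is only illustrative; nothing here comes from the original post. It shows that the variances of the PCs come out in decreasing order and that the resulting PC scores are uncorrelated.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the mean-centered data matrix (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # rows are the PC directions
    scores = Xc @ components.T                    # samples projected onto the PCs
    explained_variance = S**2 / (X.shape[0] - 1)  # variance along every PC
    return scores, components, explained_variance

# Synthetic data (an assumption for this sketch): 200 samples, 3 features,
# the first two driven by one shared latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    3.0 * latent + rng.normal(scale=0.5, size=(200, 1)),
    2.0 * latent + rng.normal(scale=0.5, size=(200, 1)),
    rng.normal(scale=1.0, size=(200, 1)),
])

scores, components, explained_variance = pca(X, n_components=2)
pc_corr = np.corrcoef(scores, rowvar=False)[0, 1]

print("variance along each PC:", explained_variance)
print("correlation between PC1 and PC2 scores:", pc_corr)
```

The variances sum to the total variance of the original features, so each one can be read as a share of the overall variation.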

PCA = PC (principal components) + A (analysis)

1. Score plot

The score plot is the most commonly used chart in principal component analysis. When the results are good, points of the same type cluster together, and each cluster can be viewed as a whole. As shown in the figure, there are three such groups in total. The pink group lies far from the other two, so it differs significantly from them. The green and purple groups partially overlap, that is, they lie close to each other, so their sample points are highly similar and the difference between them is not significant.
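The idea of "far apart" versus "overlapping" groups can be checked numerically as well as by eye. The sketch below is only an illustration: the three groups, their colors and their centers are made-up data, not the samples from the original figure. It projects the samples onto the PC1-PC2 plane and compares the distances between group centroids.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: three groups of 30 samples with 4 features each.
centers = {
    "pink":   np.array([8.0, 8.0, 0.0, 0.0]),
    "green":  np.array([0.0, 0.0, 0.0, 0.0]),
    "purple": np.array([1.0, 0.5, 0.0, 0.0]),
}
labels = np.repeat(list(centers), 30)
X = np.vstack([c + rng.normal(scale=1.0, size=(30, 4)) for c in centers.values()])

# Scores: coordinates of each sample on the PC1-PC2 plane.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T

# Centroid of each group in the score plot.
centroids = {g: scores[labels == g].mean(axis=0) for g in centers}

def centroid_distance(a, b):
    """Euclidean distance between two group centroids in the score plot."""
    return float(np.linalg.norm(centroids[a] - centroids[b]))

for a, b in [("pink", "green"), ("pink", "purple"), ("green", "purple")]:
    print(f"{a} vs {b}: {centroid_distance(a, b):.2f}")
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by `labels`, gives the score plot itself. Centroid distance is only a rough summary and does not replace a proper significance test.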

2. Scree plot

One function of the scree plot is to help you decide early on which PCs to use, for example the combination of PC1 and PC2 or the combination of PC2 and PC3. The selection is based on the cumulative contribution rate, for which 60% is generally taken as the minimum standard. The contribution rate of a component is its variance divided by the sum of the variances of all principal components. The component with the largest variance, and therefore the largest contribution rate, is PC1, the second largest is PC2, and so on.
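The contribution rate and the 60% rule are simple arithmetic, sketched below in plain Python. The variances are hypothetical numbers chosen for illustration, and the helper names `contribution_rates` and `components_needed` are not from any library.

```python
from itertools import accumulate

def contribution_rates(variances):
    """Variance of each component divided by the sum of all variances."""
    total = sum(variances)
    return [v / total for v in variances]

def components_needed(variances, threshold=0.60):
    """Smallest number of leading PCs whose cumulative rate reaches the threshold."""
    rates = contribution_rates(sorted(variances, reverse=True))
    for k, cumulative in enumerate(accumulate(rates), start=1):
        if cumulative >= threshold:
            return k
    return len(rates)

# Hypothetical variances of PC1..PC5, already in decreasing order.
variances = [4.2, 2.1, 0.9, 0.5, 0.3]

rates = contribution_rates(variances)
cumulative = list(accumulate(rates))
k = components_needed(variances)

for i, (r, c) in enumerate(zip(rates, cumulative), start=1):
    print(f"PC{i}: contribution {r:.1%}, cumulative {c:.1%}")
print("components needed for 60%:", k)  # 2 for these numbers
```

With these numbers PC1 alone contributes 52.5%, which is below the threshold, while PC1 and PC2 together reach 78.75%. A scree plot is the same `rates` list drawn as a line or bar chart.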

Supplement: the number of principal components to keep is set by you in advance. For example, the data for three kinds of flowers may include dozens of indicators such as petal length, petal width, leaf length and leaf width, yet only 3 principal components are kept. In this example that number matches the number of flower types, but this is a choice made for the example and not something PCA requires. This may seem abstract, so consider why we use the method at all: the purpose of PCA is to reduce the dimensionality of the data. Running an analysis of variance on three flower types across dozens of indicators one by one would be a huge workload, and the results might still be unclear. In the plots, principal components are used as the horizontal and vertical axes, and each one is a linear combination of the original indicators that stands in for the raw values that would otherwise be on the axes.

3. Loading plot

Whether it is PC1 or PC2, each component is built from all of the measured indicators. In this example the number of PCs equals the number of flower types, which is how the example was set up rather than a general rule.

What the coordinates along the PC axes actually represent is shown in the figure: for each indicator, they are the coefficients of the linear combination that defines the PC. The plot can also be used to describe the correlation between indicators: the closer two indicators lie to each other, the stronger their positive correlation.
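The statement that the loadings are the coefficients of a linear combination can be checked directly. The sketch below uses synthetic data, and the feature names are hypothetical stand-ins for the flower indicators. "Loadings" here means the raw eigenvector coefficients; some software scales them by the square root of each component's variance, so values may differ between tools.

```python
import numpy as np

rng = np.random.default_rng(2)
feature_names = ["petal_length", "petal_width", "leaf_length"]  # hypothetical names

# Synthetic data: the two petal features share a latent factor, leaf_length does not.
size = rng.normal(size=(150, 1))
X = np.hstack([
    2.0 * size + rng.normal(scale=0.3, size=(150, 1)),  # petal_length
    1.5 * size + rng.normal(scale=0.3, size=(150, 1)),  # petal_width
    rng.normal(size=(150, 1)),                          # leaf_length
])

# Standardize so that indicators on different scales are comparable.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

_, _, Vt = np.linalg.svd(Z, full_matrices=False)
loadings = Vt[:2].T      # shape (n_features, 2): column j holds the coefficients of PC j+1
scores = Z @ loadings    # each PC score is a linear combination of all indicators

for name, (l1, l2) in zip(feature_names, loadings):
    print(f"{name}: PC1 coefficient {l1:+.2f}, PC2 coefficient {l2:+.2f}")

# Indicators that are positively correlated end up close together in the loading plot.
petal_gap = np.linalg.norm(loadings[0] - loadings[1])
leaf_gap = np.linalg.norm(loadings[0] - loadings[2])
print("petal_length to petal_width:", petal_gap)
print("petal_length to leaf_length:", leaf_gap)
```

Each row of `loadings` is one point (or arrow tip) in the loading plot. The overall sign of a PC is arbitrary, so only the relative positions of the indicators carry meaning.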


Origin blog.csdn.net/qq_72899974/article/details/129468010