A Brief Introduction to Principal Component Analysis

Principal component analysis

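**The goal is to transform many highly correlated variables into new variables that are mutually independent or uncorrelated,
and then select from them a few new variables, fewer than the original ones, that explain most of the information in the data (the principal components, i.e. composite indicators that summarize the data).**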

Steps

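  1. Standardize the original data (z-score normalization: each value is standardized using the mean and standard deviation of its variable).
    With n evaluation objects and m index variables, let $a_{ij}$ be the value of the j-th index variable for the i-th object;
    each $a_{ij}$ is converted into the standardized value $\tilde{a}_{ij} = (a_{ij} - \mu_j)/s_j$, where $\mu_j$ and $s_j$ are the mean and standard deviation of the j-th variable.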

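  2. Calculate the m × m correlation coefficient matrix $R = (r_{ij})$ of the standardized data, where $r_{ij}$ is the correlation between the i-th and j-th index variables.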

  3. Calculate the eigenvalues and eigenvectors
    (recall: if A is a square matrix of order n and there exist a number λ and an n-dimensional nonzero vector α such that
    $A\alpha = \lambda\alpha$,
    then λ is an eigenvalue of A, and α is an eigenvector of A corresponding to the eigenvalue λ).
    Compute the eigenvalues of the correlation coefficient matrix R, ordered as $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m \ge 0$,
    and the corresponding eigenvectors $u_1, u_2, \dots, u_m$, where $u_j = [u_{1j}, u_{2j}, \dots, u_{mj}]^T$.
    The eigenvectors define m new index variables
    $y_j = u_{1j}\tilde{x}_1 + u_{2j}\tilde{x}_2 + \dots + u_{mj}\tilde{x}_m$, $j = 1, 2, \dots, m$, where $\tilde{x}_k$ is the k-th standardized index variable.

  4. Select p (p ≤ m) principal components to calculate the comprehensive evaluation value.
    (1) Calculate the information contribution rate and the cumulative contribution rate of each eigenvalue $\lambda_j$ ($j = 1, 2, \dots, m$).
    (In statistics, a contribution rate generally measures how much one part contributes to the whole, i.e. the part's share of the whole;
    here it is each principal component's share of the total variance.)
    The information contribution rate of the principal component $y_j$ is the ratio of its eigenvalue to the sum of all eigenvalues: $b_j = \lambda_j / \sum_{k=1}^{m} \lambda_k$.
    The cumulative contribution rate of the principal components $y_1, y_2, \dots, y_p$ is the share of the first p eigenvalues in that sum: $\alpha_p = \sum_{k=1}^{p} \lambda_k / \sum_{k=1}^{m} \lambda_k$.
    When $\alpha_p$ is close to 1 (thresholds of 0.85, 0.90, or 0.95 are common), the first p new index variables $y_1, \dots, y_p$ are chosen as the p principal components in place of the original m index variables,
    and the comprehensive analysis is carried out on these p principal components.

  5. Calculate the comprehensive score
    $Z = \sum_{j=1}^{p} b_j y_j$, i.e. each selected principal component $y_j$ is weighted by its information contribution rate $b_j$ and the results are summed;
    the evaluation objects can then be assessed and ranked by their comprehensive scores.
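The five steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the helper name `pca_composite_score`, the 0.85 default threshold, and the use of the sample standard deviation (`ddof=1`) are choices made for illustration, and the input is assumed to be an n × m array (objects as rows, index variables as columns) with no constant columns.

```python
import numpy as np


def pca_composite_score(A, threshold=0.85):
    """Sketch of PCA-based comprehensive evaluation (hypothetical helper).

    A: (n, m) array, n evaluation objects (rows) x m index variables (columns).
    threshold: assumed cutoff for the cumulative contribution rate alpha_p.
    Returns (Z, b, alpha, p).
    """
    A = np.asarray(A, dtype=float)
    # Step 1: z-score standardization of each column (sample std, ddof=1).
    # Assumes no column is constant, otherwise this divides by zero.
    X = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

    # Step 2: m x m correlation coefficient matrix R (columns are variables).
    R = np.corrcoef(X, rowvar=False)

    # Step 3: eigen-decomposition of the symmetric matrix R,
    # reordered so that lambda_1 >= lambda_2 >= ... >= lambda_m.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop round-off negatives
    eigvecs = eigvecs[:, order]  # column j is u_j

    # Step 4: information contribution rates b_j and cumulative rates alpha_p;
    # p is the smallest count whose cumulative rate reaches the threshold.
    b = eigvals / eigvals.sum()
    alpha = np.cumsum(b)
    p = min(int(np.searchsorted(alpha, threshold)) + 1, len(alpha))

    # Principal component values y_j = X u_j for the first p components.
    Y = X @ eigvecs[:, :p]

    # Step 5: comprehensive score Z = sum_j b_j * y_j, one value per object.
    Z = Y @ b[:p]
    return Z, b, alpha, p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=50)
    # Hypothetical data: 50 objects, 3 indicators, the first two strongly correlated.
    A = np.column_stack([base, base + 0.1 * rng.normal(size=50), rng.normal(size=50)])
    Z, b, alpha, p = pca_composite_score(A, threshold=0.85)
    print("contribution rates:", np.round(b, 3))
    print("cumulative rates:", np.round(alpha, 3))
    print("p =", p)
```

`np.linalg.eigh` is used because R is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors. Eigenvector signs are arbitrary, so the sign of each $y_j$ (and therefore of Z) may come out flipped; one common remedy is to fix each sign, for example so that most entries of $u_j$ are positive, before ranking the objects.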

Origin blog.csdn.net/dangshan5/article/details/113030846