Evaluation Model (2): Principal component analysis, factor analysis, a comparison of the two, and their corresponding Python implementation code with worked examples

Mathematical modeling series of articles:

The following are some model algorithms and codes I compiled while preparing for the national mathematical modeling competition. I will update the content as time allows:
Evaluation Model (1): Analytic Hierarchy Process (AHP), entropy weight method, TOPSIS analysis, and their corresponding Python implementation code and worked examples
Evaluation Model (2): Principal component analysis, factor analysis, a comparison of the two, and their corresponding Python implementation code and worked examples
Optimization Model (0): Overview, classification, and analysis of various optimization models and general problem-solving steps
Optimization Model (1): Detailed explanation of linear programming with examples; solving linear programs with Python's PuLP library
Optimization Model (2): Detailed explanation of nonlinear programming with examples; solving nonlinear programs with scipy.optimize

1.4 Principal component analysis

Principal component analysis (PCA) explained in detail

Principal Component Analysis (PCA):

Principal Component Analysis (PCA) is a fundamental data dimensionality reduction method and an important part of multivariate statistics; it is widely used in data analysis, machine learning, and related fields. The purpose of PCA is to replace a larger number of original variables with fewer new variables that reflect most of the information in the original variables.

Principal component analysis is a statistical method that condenses multiple original variables into a few comprehensive indices. From a mathematical point of view, it is a dimensionality reduction technique.

The role of data dimensionality reduction:
  • Makes the data set easier to use and helps simplify the problem;

  • With fewer variables after dimensionality reduction, the computer has far less data to process, which shortens processing time;

  • Removes noise;

  • Makes the results easier to understand, for example by inspecting the weights of the principal components.

The idea of principal component analysis (PCA):

(Figures omitted. They defined the principal components as linear combinations z_i = l_i1 x_1 + l_i2 x_2 + … + l_ip x_p of the original variables, with z_1 the normalized linear combination of largest variance, z_2 the largest-variance combination uncorrelated with z_1, and so on.)

Illustration: the phrase above, "the linear combination with the largest variance", can be understood by rotating the coordinate axes.

For a sample containing n observations of p variables, the data can be represented as n points in p-dimensional space. For example, a sample with 2 variables (3 variables would give a three-dimensional picture) and 3 observations (1, 2), (2, 2), (3, 3) can be plotted as three points in the plane (figure omitted).

At the start of an experiment we usually propose many variables and collect data on all of them. These variables often have some correlation with one another, and such correlations mean that the dimensionality of the data can be reduced: the variables can be replaced with a smaller number of new variables.

Intuitively, principal component analysis can be thought of as rotating the coordinate axes so that, in the new coordinate system, the variance of the points projected onto each axis direction (each new variable) becomes as large as possible. The axis along which the variance is largest gives the first principal component, the next one the second principal component, and so on. As in the figure above, if the points have the largest variance along some line Z = aX + bY, then Z is the first principal component, analogous to z1 = l1 x1 + l2 x2 above.
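The following minimal numpy sketch illustrates this idea on a handful of made-up 2-D points: the leading eigenvector of the covariance matrix is the direction of maximum projected variance.

```python
import numpy as np

# Hypothetical 2-D data, for illustration only
X = np.array([[1.0, 2.0], [2.0, 2.0], [3.0, 3.0], [4.0, 3.5], [5.0, 4.8]])
Xc = X - X.mean(axis=0)                 # center the data

C = np.cov(Xc, rowvar=False)            # 2x2 sample covariance matrix
vals, vecs = np.linalg.eigh(C)          # eigh: eigenvalues in ascending order
v1 = vecs[:, -1]                        # eigenvector of the largest eigenvalue

# Variance of the projection onto v1 vs. onto the original x-axis
print(np.var(Xc @ v1, ddof=1))                    # maximal projection variance
print(np.var(Xc @ np.array([1.0, 0.0]), ddof=1))  # variance along the x-axis
```

The first printed value equals the largest eigenvalue of C and is never smaller than the variance of the projection onto any other direction.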

(Figure omitted: 8 points lying on a horizontal line.) For the 8 data points in that figure, the variance in the x-axis direction is large while the variance in the y-axis direction is 0. The x-coordinates of these points can therefore be taken as the first principal component, and selecting only this first principal component already meets the requirement (see the cumulative contribution rate below).

For the situation in the figures below (omitted), the data lie almost on a straight line, and the variances in both the x-axis and y-axis directions are fairly large. But if the coordinate axes are rotated by a suitable angle so that the variance of the data's projection onto one of the new axes is large, then the coordinate along that new axis can be used as the principal component.
In most cases each variable is approximately normally distributed, so with 2 variables the scatter of the data is roughly an ellipse, with 3 variables roughly an ellipsoid, and with p variables the data are roughly distributed in a hyperellipsoid. By rotating the coordinate system, the major axis of the hyperellipsoid is made to fall on one coordinate axis, and the other axes of the hyperellipsoid are also made to fall on coordinate axes as far as possible. The coordinates along each new axis are then the corresponding principal components.

You can rotate the coordinate system so that the two axes of the ellipse fall on the coordinate axes as much as possible.

In this way, the x-coordinates of the scatter points in the new coordinate system form the first principal component (because the variance in the x direction is the largest), and the y-coordinates form the second principal component.

Covariance matrix related concepts, properties, application significance and the use of matrix eigenvectors

The basic steps:


  1. Data standardization:
    Standardize each variable: x*_ij = (x_ij − x̄_j) / s_j, where x̄_j and s_j are the sample mean and standard deviation of variable j. (formula figures omitted)

  2. Calculate the covariance matrix of the standardized sample:
    For standardized data this equals the sample correlation matrix R = (r_ij), with r_ij = (1/(n−1)) Σ_k x*_ki x*_kj. (formula figure omitted)

  3. Compute the eigenvalues and eigenvectors of R:

  • The eigenvalues and eigenvectors can generally be computed directly with software; sort the eigenvalues in descending order, λ_1 ≥ λ_2 ≥ … ≥ λ_p ≥ 0, together with their unit eigenvectors. (figure omitted)
  4. Calculate the principal component contribution rate and cumulative contribution rate:

    The contribution rate of the i-th principal component is λ_i / (λ_1 + … + λ_p); the cumulative contribution rate of the first m components is (λ_1 + … + λ_m) / (λ_1 + … + λ_p). Retain enough components for the cumulative contribution rate to reach a chosen threshold (the code below uses 80%). (formula figure omitted)
  5. Write out the principal components and interpret their meaning from the coefficients:
    The i-th principal component is z_i = l_i1 x*_1 + l_i2 x*_2 + … + l_ip x*_p, where (l_i1, …, l_ip) is the unit eigenvector belonging to λ_i. (formula figure omitted)

  6. Use the principal component results for subsequent analysis:
  • Cluster analysis
  • Regression analysis
Code:
import numpy as np
import pandas as pd

df = pd.read_csv('corn.csv')  # read the data file
n, p = df.shape
X = (df - df.mean()) / df.std()  # standardize the data first
R = df.corr()  # sample correlation matrix (covariance matrix of the standardized data)
l, T = np.linalg.eigh(R.values)  # eigh suits symmetric matrices; eigenvalues ascending
l, T = l[::-1], T[:, ::-1]  # reorder eigenvalues (and column eigenvectors) descending
s = 0  # cumulative contribution rate
t = 0  # number of principal components retained
cr = []  # contribution rate of each component
for i in range(len(l)):
    t += 1
    contri = l[i] / np.sum(l)  # contribution rate of the i-th principal component
    cr.append(contri)
    s += contri
    if s >= 0.8:  # stop once the cumulative contribution rate reaches 80%
        break
pc = []  # principal component scores
for i in range(t):
    Y = np.dot(X, T[:, i])  # i-th principal component (eigenvectors are the columns of T)
    pc.append(Y)
factor_loading = []
for i in range(t):
    a = []
    for j in range(p):
        a.append(np.sqrt(l[i]) * T[j, i])  # loading of the i-th component on variable j
    factor_loading.append(a)
factor_loading = np.array(factor_loading)
print('Number of principal components:', t)
print('Principal components:', np.array(pc))
print('Contribution rates:', cr)
print('Cumulative contribution rate:', s)
print('Factor loadings:', factor_loading)
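As a cross-check, the same analysis can be sketched with scikit-learn (assuming the package is installed; corn.csv is the same hypothetical data file):

```python
import pandas as pd
from sklearn.decomposition import PCA

df = pd.read_csv('corn.csv')
X = (df - df.mean()) / df.std()      # standardize, so PCA works on the correlation matrix

pca = PCA(n_components=0.8)          # keep enough components for 80% explained variance
scores = pca.fit_transform(X)        # principal component scores

print('Number of components:', pca.n_components_)
print('Explained variance ratios:', pca.explained_variance_ratio_)
print('Component coefficients (rows = components):', pca.components_)
```

Note that scikit-learn may flip the sign of individual components relative to the manual eigenvector computation; the two solutions are otherwise equivalent.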
Additional notes and explanations:
  • The interpretation of principal components is generally somewhat vague, not as clear and precise as the meaning of the original variables. This is a price that has to be paid for dimensionality reduction.
  • The main difficulty of principal component analysis is giving a good interpretation of the components. If one of the extracted principal components cannot be interpreted, the entire principal component analysis fails.
  • Here are some examples of principal component interpretations:
  • First example: (figures omitted)

  • Second example: (figure omitted)

1.5 Factor analysis

Factor Analysis Examples

Brief description of factor analysis (FA) algorithm

Basic idea and principle:

Overview:

Factor analysis is a statistical technique for extracting common factors from groups of variables. It studies the correlation coefficient matrix of the original variables (the premise of factor analysis is that there are internal relationships among the variables X_i, so that they can be decomposed into factors) and reduces the intricate relationships among these variables to a few comprehensive factors. Because the number of factors obtained is smaller than the number of original variables, while still carrying most of their information, this analysis process is also called dimensionality reduction. Since factors are often easier to interpret than principal components, factor analysis is more likely to succeed than principal component analysis and therefore has wider applications. For example, if a student's English, math, and Chinese scores are all very good, the underlying common factor may be a high level of intelligence. The process of factor analysis is thus the process of finding the common factors and the specific factors and arriving at the best interpretation.

Note that factor analysis requires the correlation coefficient matrix to be a non-identity matrix: its premise is that there are internal relationships among the variables X_i, so that they can be decomposed into factors.

Origin:

Factor analysis was first proposed by Spearman in 1904; to some extent it can be seen as a generalization and extension of principal component analysis.

Key problems:

Factor analysis has two core problems: one is how to construct the factor variables; the other is how to name and interpret the factor variables.

Types:

Factor analysis comes in two types, R-type and Q-type, just as cluster analysis is divided into R-type and Q-type. R-type factor analysis factors the variables, while Q-type factor analysis factors the samples. R-type is the type commonly used in mathematical modeling, so this article mainly explains the R-type.

Principles of factor analysis:


Matrix form:
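The figure with the model equations did not survive; as a reconstruction, the standard orthogonal factor model this section describes is (p standardized variables, m < p common factors):

$$X = AF + \varepsilon, \qquad A = (a_{ij})_{p \times m},$$

where $X = (x_1, \dots, x_p)^T$ are the standardized variables, $F = (F_1, \dots, F_m)^T$ are the common factors, $A$ is the factor loading matrix, and $\varepsilon = (\varepsilon_1, \dots, \varepsilon_p)^T$ are the special factors. Under the assumptions stated below, the correlation matrix decomposes as $R = AA^T + D$ with $D = \operatorname{Cov}(\varepsilon)$ diagonal, which is what the loading-estimation methods exploit.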

To compute the matrix A, some assumptions are imposed:


Assumption: Common factors are uncorrelated with each other and have unit variance; special factors are uncorrelated with each other and with the common factors.

Solving the factor loading matrix:

  1. Principal component method:
    • Find the eigenvalues and eigenvectors of the correlation matrix; factors with eigenvalues greater than 1 are usually retained. This is the most common method and is the one used in the calculation steps below.
  2. Principal factor method
  3. Maximum likelihood estimation method
Summary of basic steps:
  1. Determine whether several original variables are suitable for factor analysis
    • correlation test
  2. Construct and solve for the factor variables
    • Input the original data X (n*p dimension), calculate the sample mean and variance, and standardize the data samples
    • Calculate the correlation matrix R of the sample
    • Find the eigenroots and eigenvectors of the correlation matrix R
    • Determine the number of common factors and the factor loading matrix A according to the required cumulative contribution rate and the scree plot
  3. Rotate the factors and interpret the common factors
    • Rotate the loading matrix to better explain the common factors
    • Interpret common factors
    • Select the indicators with the highest loading coefficients in each factor to comprehensively explain the role of the factors
  4. Calculate the score of a factor variable
    • Calculate component score coefficient matrix table
    • Factor weight analysis
    • Based on the above calculation results, find the factor scores and analyze the system.
Detailed step analysis:

1. Determine whether the original variables are suitable for factor analysis:

The basic logic of factor analysis is to construct a few representative factor variables from the original variables, which requires a relatively strong correlation between the original variables, and therefore requires a correlation test.

  • Correlation test: the KMO test and Bartlett's sphericity test are generally used to test the correlation of the original variables;

    • KMO test method

      • Proposed by Kaiser, Meyer, and Olkin, this test compares the relative sizes of the simple correlation coefficients and the partial correlation coefficients between the original variables; it is mainly used in factor analysis for multivariate statistics.
      • The KMO statistic lies between 0 and 1. The closer it is to 1, the stronger the correlations between variables and the more suitable the original variables are for factor analysis; when the sum of squares of the simple correlation coefficients between all variables is close to 0, the KMO value approaches 0, meaning the correlations between variables are weak and the original variables are unsuitable for factor analysis.
      • Kaiser gives the following KMO standard: KMO > 0.9, very suitable; 0.8 < KMO < 0.9, suitable; 0.7 < KMO < 0.8, average; 0.6 < KMO < 0.7, not very suitable; KMO < 0.5, not suitable.
    • Bartlett's test of sphericity

      • Bartlett's test of sphericity is based on the correlation coefficient matrix of the variables. Its null hypothesis is that the correlation coefficient matrix is an identity matrix (the indicators are uncorrelated, so the data are unsuitable for factor analysis and for dimensionality reduction).

      • The statistic of Bartlett's test is derived from the determinant of the correlation coefficient matrix. If the statistic is large and its p-value is smaller than the chosen significance level (generally 0.05), the null hypothesis is rejected: the correlation matrix is not an identity matrix, i.e., there are correlations among the original variables, and the data are suitable for factor analysis.
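Both tests can be run in Python with the third-party factor_analyzer package (a sketch assuming the package is installed via pip install factor_analyzer and reusing the hypothetical corn.csv file from the PCA example):

```python
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

df = pd.read_csv('corn.csv')  # any all-numeric data frame

# Bartlett's test of sphericity: H0 = the correlation matrix is the identity
chi_square, p_value = calculate_bartlett_sphericity(df)
print('Bartlett chi-square: %.2f, p-value: %.4g' % (chi_square, p_value))

# KMO test: the overall value should ideally exceed 0.6-0.7
kmo_per_variable, kmo_total = calculate_kmo(df)
print('Overall KMO:', kmo_total)
```

If the p-value is below 0.05 and the overall KMO is acceptable by Kaiser's standard above, factor analysis can proceed.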

2. Construct and solve for the factor variables:

  • Input the original data X n*p dimensions, calculate the sample mean and variance, and standardize the data samples;

    • Principal component analysis and factor analysis are designed for (roughly) Gaussian data, so the data need to be standardized first.
  • Calculate the correlation matrix R of the sample;

    • Software can generate the correlation matrix R of the sample variables in one click. Suppose that for a certain socio-economic system problem, its main characteristics can be expressed by 4 indicators: production, technology, transportation, and environment. Its correlation matrix is: (matrix figure omitted)

  • Find the eigenvalues and eigenvectors of the correlation matrix R;

    • The correlation matrix is: (matrix figure omitted)

      From the correlation matrix, the corresponding eigenvalues, their percentages of total variance, and the cumulative percentages are found: (table omitted)

      The eigenvector matrix corresponding to the eigenvalues (one eigenvalue corresponds to one column vector) is: (matrix figure omitted)

  • Determine the number of common factors and the factor loading matrix A according to the required cumulative contribution rate and the scree plot;

    • Generally, enough common factors are taken for the cumulative contribution rate to reach 80%. (figure omitted)

    • Scree test

      • The number of factors can also be determined by directly observing the changes in the eigenvalues. When an eigenvalue drops sharply compared with the previous one, is itself small, and the eigenvalues after it change little, the factors corresponding to those later eigenvalues add very little information; the eigenvalues before the drop therefore give the number of common factors to extract. (scree-plot figure omitted; a matplotlib sketch follows this list)
    • If the information reflected by the retained eigenvalues is required to account for more than 90% of the total, then, judging from the cumulative percentages, only the first two eigenvalues are needed, i.e., two common factors. Taking the eigenvectors of the first two eigenvalues yields the factor loading matrix A: (matrix figure omitted)
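A minimal matplotlib sketch of a scree plot, built from the eigenvalues of the sample correlation matrix (the data file is the same hypothetical corn.csv used earlier):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('corn.csv')
eigvals = np.linalg.eigvalsh(df.corr().values)[::-1]  # eigenvalues, descending

plt.plot(range(1, len(eigvals) + 1), eigvals, 'o-')
plt.axhline(1.0, linestyle='--')   # the common "eigenvalue greater than 1" cut-off
plt.xlabel('Factor number')
plt.ylabel('Eigenvalue')
plt.title('Scree plot')
plt.show()
```

The number of factors is read off at the "elbow", where the curve flattens out.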

3. Rotate the factors and interpret the common factors:

  • Rotate the loading matrix to interpret the common factors better;

    It is precisely because the factor loading matrix A is not unique that, in practice, we often take advantage of this: through factor rotation, the new factors can be made easier to interpret. This is why factor analysis is often easier to interpret than principal component analysis. (figure omitted)

    When we obtain an estimate of the factor loading matrix, several variables may have large loadings on the same factor, or one variable may have large loadings on several factors, and the factors are then difficult to interpret or name. In that case we rotate the factor loading matrix to obtain a new, simplified loading matrix in which the loadings are more sharply differentiated, which makes the factors easier to analyze and name.

  • Interpret the common factors

    In the example here, A is a 15×4 loading matrix. If a variable's loading on a factor is large, that variable contributes strongly to the factor. (matrix figure omitted)

  • Select the indicators with the highest loading coefficients on each factor to summarize what the factor represents:

    • Suppose n factors have been determined as above, and variables a, b, c, and d have relatively large loading coefficients on factor i; factor i can then be identified as a certain construct (and summarized and renamed accordingly).

    • For example, if the indicators with the highest loadings on factor 2 are computer use, walking, shopping, and daily exercise, this factor can obviously be summarized as a leisure-and-entertainment factor; the same applies to the other factors.

    • Another example is the following: (figures omitted)

  • The rightmost column is the communality (common factor variance): it states what proportion of each original variable the common factors can explain; the higher, the better. (table omitted)

  • A heat map of the factor loading matrix shows at a glance which variables each factor summarizes. (figure omitted)

4. Calculate the score of the factor variable:

  • Calculate the component score coefficient matrix;

    • The component score coefficient matrix is the rotated factor loading matrix; it gives the factor score coefficients (principal component loadings) contained in each component and is used to calculate component scores and derive the component formula. (table omitted)
  • Factor weight analysis: (table omitted)

    • The table shows the weight analysis based on the loading coefficients from the factor analysis. The calculation formula is: rotated variance explanation rate / cumulative variance explanation rate.
  • Based on the above calculation results, find the factor scores and analyze the system.
    (figure omitted)

    • Finally, four factors (representing the original 15 variables) were extracted (data dimensionality reduction).
    • Compute an overall score from the factors and their weights: (figure omitted)

Application example

(omitted)

Implementation

Factor analysis can be carried out with SPSSPRO.
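As a pure-Python alternative to SPSSPRO, the sketch below runs the whole pipeline with the factor_analyzer package (the file name and the factor count n_factors=2 are assumptions; in practice choose the count from the scree plot or the eigenvalue-greater-than-1 rule):

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv('corn.csv')                 # hypothetical data file

fa = FactorAnalyzer(n_factors=2, rotation='varimax')  # varimax-rotated solution
fa.fit(df)

print('Rotated factor loading matrix:\n', fa.loadings_)
print('Communalities:\n', fa.get_communalities())

variance, proportion, cumulative = fa.get_factor_variance()
print('Proportion of variance per factor:', proportion)
print('Cumulative variance explained:', cumulative)

# Factor scores per observation; the weights follow the formula in the text:
# weight_i = variance explanation rate / cumulative variance explanation rate
scores = fa.transform(df)
weights = proportion / cumulative[-1]
overall = scores @ weights                   # composite score per observation
print('Overall scores:', overall)
```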

1.6 Comparison between factor analysis and principal component analysis

  • Differences in thinking:
    • Factor analysis (FA) resembles factoring: find the common factors, then express all the variables x_i in terms of those common factors; the factors are mutually uncorrelated.
    • Principal component analysis (PCA) uses linear combinations of all the variables to form new, mutually uncorrelated variables, and then continues the analysis with the new variables. These new variables are no longer the original features. For this reason, a past national-competition problem that required selecting the 20 best features out of 729 could not use principal component analysis.


  • Other differences:
  1. Different assumptions: principal component analysis is just a numerical computation; no model needs to be constructed and there are almost no assumptions, whereas factor analysis requires constructing a factor model, accompanied by several key assumptions.

  2. Number of solutions: the principal component solution is unique, while a factor model admits many solutions, which can be rotated to find the most interpretable one. The chance of successfully interpreting factors is therefore much greater than for principal components.

  3. Different solution methods: principal component analysis is solved from the covariance matrix, while factor analysis can be solved by the principal component method, principal axis factor method, maximum likelihood method, least squares method, alpha factor extraction method, and so on;

  4. Different linear representations: factor analysis expresses the variables as linear combinations of the common factors; principal component analysis expresses the principal components as linear combinations of the variables.

  5. Different explanatory focus: principal component analysis focuses on explaining the total variance of the variables; factor analysis focuses on explaining the covariances between the variables.

  6. Algorithmic differences: in principal component analysis, the diagonal elements of the covariance matrix are the variances of the variables; in factor analysis, the diagonal elements used are not the variances but the communalities of the variables (the part of each variable's variance that is explained by the common factors).
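In symbols, point 4 above amounts to the two model equations already used in this article:

$$\text{PCA:}\quad z_i = l_{i1}x_1 + l_{i2}x_2 + \cdots + l_{ip}x_p, \qquad \text{FA:}\quad x_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + \varepsilon_i.$$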

  • Application route:
  • Like principal component analysis, factor analysis focuses on data dimensionality reduction, so it is rarely used alone; in most cases it is used in combination with other models. For example:
    (1) Factor analysis (or principal component analysis) + multiple regression analysis: regression prediction after diagnosing and resolving collinearity;
    (2) Factor analysis (or principal component analysis) + cluster analysis: cluster the dimension-reduced data and analyze its characteristics. Factor analysis is usually the better fit here, because clusters based on factors are easier to interpret, while clusters based on principal components are hard to interpret;
    (3) Factor analysis (or principal component analysis) + classification: classification prediction after dimensionality reduction (data compression), also a common combination.
  • Factor analysis achieves dimensionality reduction by finding common factors (it can also be used to analyze the intrinsic relationships between variables), while principal component analysis achieves it through the eigendecomposition of the data's correlation matrix.
  • The main functions of factor analysis:
    (1) Discover the basic structure of the data;
    (2) Describe many indicators with a small number of factors;
    (3) Simplify the data, i.e., reduce its dimensionality.


Reprinted from: blog.csdn.net/m0_63669388/article/details/132567197