Exploratory factor analysis process

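Exploratory factor analysis proceeds in six steps: standardizing the indicator data, testing the applicability of factor analysis, extracting common factors, naming and explaining the common factors, calculating factor scores, and calculating the comprehensive score.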

The following case demonstrates how each step of exploratory factor analysis is performed.

Case: To explore the railway transportation capacity of China's provinces, the relevant data were collected as follows:

Upload the data to the SPSSAU system. In the [Advanced Methods] module, select [Exploratory Factor Analysis], drag the variables into the analysis box on the right, check "Factor Score" and "Comprehensive Score", and keep the default "Maximum Variance Method" (varimax) as the rotation method. The operation is as follows:

1. Standardization of indicator data

Because the indicators differ in nature, order of magnitude, and units, analyzing the raw data directly can produce inaccurate or erroneous results. The original data are therefore standardized first. SPSSAU's factor analysis standardizes the data automatically, so no further preprocessing is required.

Standardization formula: (X - Mean) / Std
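The standardization formula can be sketched in a few lines of Python. This is only an illustration: the function name `standardize` and the sample values are made up, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the text does not say which variant SPSSAU applies.

```python
import statistics


def standardize(values):
    """Return z-scores: (X - Mean) / Std for each value in one indicator."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(values)
    return [(x - mean) / std for x in values]


# Hypothetical raw values for one indicator (not data from the case).
raw = [10.0, 20.0, 30.0, 40.0]
z_scores = standardize(raw)
print([round(z, 3) for z in z_scores])
```

After standardization each indicator has mean 0 and standard deviation 1, so indicators measured in different units become comparable.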

2. Factor analysis applicability test

Factor analysis presupposes that the data are suitable for the method, which is usually checked with the KMO test and Bartlett's sphericity test. The KMO statistic compares the simple correlations between variables with their partial correlations and ranges from 0 to 1. The closer the KMO value is to 1, the stronger the shared correlation among the variables. Generally, a value greater than 0.6 means factor analysis can be performed. Bartlett's sphericity test checks the null hypothesis that the variables are uncorrelated, that is, that the correlation matrix is an identity matrix. Generally, a significance (p value) below 0.05 rejects that hypothesis and indicates that the data are suitable for factor analysis.

For this case, the KMO and Bartlett sphericity test results output by SPSSAU are as follows:

The KMO value of 0.722 is greater than 0.6, so factor analysis can be performed. The p value of Bartlett's sphericity test is also less than 0.05, which confirms that the data are suitable for factor analysis.
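To make the two tests concrete, here is a minimal sketch using only the Python standard library. It follows the usual textbook definitions (overall KMO from simple and partial correlations; Bartlett's chi-square from the determinant of the correlation matrix) and is not taken from SPSSAU. The function names and the 3 x 3 correlation matrix are hypothetical; only the sample size of 31 comes from the case.

```python
import math


def invert_with_logdet(matrix):
    """Gauss-Jordan inverse and log|det| of a square, non-singular matrix."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    log_det = 0.0
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        if abs(pivot) < 1e-12:
            raise ValueError("correlation matrix is singular")
        log_det += math.log(abs(pivot))
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], log_det


def kmo(corr):
    """Overall KMO: sum r^2 / (sum r^2 + sum partial^2) over off-diagonal pairs."""
    inv, _ = invert_with_logdet(corr)
    p = len(corr)
    r2 = a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                a2 += partial ** 2
    return r2 / (r2 + a2)


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable (x > 0, integer df)."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def bartlett_sphericity(corr, n_samples):
    """Return (chi-square statistic, degrees of freedom, p value)."""
    p = len(corr)
    _, log_det = invert_with_logdet(corr)
    statistic = -(n_samples - 1 - (2 * p + 5) / 6.0) * log_det
    df = p * (p - 1) // 2
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical correlation matrix for three variables; 31 matches the case.
corr = [[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]]
print("KMO:", round(kmo(corr), 3))
print("Bartlett:", bartlett_sphericity(corr, n_samples=31))
```

If SciPy is available, `scipy.stats.chi2.sf` can replace the hand-written `chi2_sf` helper.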

3. Extract common factors

Common factors are extracted using the criterion that the eigenvalue (characteristic root) is greater than 1. The eigenvalue and variance explained rate of each factor, as output by SPSSAU, are shown in the table below:

The table shows that two factors have eigenvalues greater than 1. The variance explained rate of the first factor is 41.346% and that of the second factor is 37.462%, so the cumulative variance explained rate of the two common factors is 78.808%. In other words, the two extracted common factors represent 78.808% of the information in the original six railway transportation capacity indicators. Overall, little information is lost, and the factor analysis result is satisfactory.

In addition, the number of common factors to extract can be seen more intuitively from the scree plot of the eigenvalues. As the figure shows, the eigenvalues of the first two factors are both greater than 1 and the curve is steep there. The remaining four eigenvalues are all less than 1, and the curve gradually flattens out. That is, the first two factors represent most of the information in the original railway transportation indicators, which is consistent with the variance explained rates.
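The eigenvalue-greater-than-1 rule and the variance explained rates can be sketched as follows. The eigenvalues are placeholders chosen only so that they sum to 6 (the number of indicators); they are not the values from the SPSSAU table, and the function name `kaiser_summary` is made up for illustration.

```python
def kaiser_summary(eigenvalues):
    """List (factor, eigenvalue, rate %, cumulative %, retained) per factor."""
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    rows = []
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        rate = 100.0 * value / total
        cumulative += rate
        rows.append((k, value, rate, cumulative, value > 1.0))
    return rows


# Hypothetical eigenvalues for six indicators (NOT the case's actual output).
eigenvalues = [2.9, 1.8, 0.6, 0.4, 0.2, 0.1]
for k, value, rate, cumulative, retained in kaiser_summary(eigenvalues):
    print(f"Factor {k}: eigenvalue={value:.3f} rate={rate:.3f}% "
          f"cumulative={cumulative:.3f}% retained={retained}")
```

With these placeholder values, two factors pass the criterion, mirroring the structure of the case without reproducing its numbers.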

4. Naming and explanation of common factors

After the common factors are found, factor rotation is needed to understand their practical meaning and make the problem easier to analyze. A commonly used rotation method is the maximum variance method (varimax). The rotated factor loading matrix shows intuitively how strongly each variable is associated with each common factor: the greater the absolute value of a variable's loading coefficient on a common factor, the closer the relationship between that variable and the factor.

The table below shows the factor loading coefficients obtained after rotation with the maximum variance method:

The table shows that factor 1 has larger loadings on total railway freight volume, railway operating mileage, and total railway freight turnover, so these three variables are grouped together and named the Freight Factor (denoted F1). Factor 2 has larger loadings on railway passenger volume, railway passenger turnover, and the number of railway transportation employees, so these three variables are grouped together and named the Passenger Transport Factor (denoted F2).
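For exactly two factors, the rotation angle that maximizes the raw varimax criterion has a closed form, which makes the idea easy to sketch. The code below is an illustration under stated assumptions: it applies no Kaiser normalization (SPSSAU's implementation may differ), the function names are made up, and the unrotated loadings are hypothetical rather than taken from the case.

```python
import math


def varimax_two_factors(loadings):
    """Rotate a p x 2 loading matrix to maximize the raw varimax criterion."""
    p = len(loadings)
    u = [x * x - y * y for x, y in loadings]
    v = [2.0 * x * y for x, y in loadings]
    a, b = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
    # Closed-form angle: tan(4 * angle) = (d - 2ab/p) / (c - (a^2 - b^2)/p)
    angle = 0.25 * math.atan2(d - 2.0 * a * b / p, c - (a * a - b * b) / p)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(x * cos_a + y * sin_a, -x * sin_a + y * cos_a) for x, y in loadings]


def varimax_criterion(loadings):
    """Sum over factors of the variance-like spread of squared loadings."""
    p = len(loadings)
    total = 0.0
    for col in range(2):
        squares = [row[col] ** 2 for row in loadings]
        total += sum(s * s for s in squares) - sum(squares) ** 2 / p
    return total


# Hypothetical unrotated loadings for four variables (not from the case).
unrotated = [(0.70, 0.55), (0.65, 0.60), (0.60, -0.55), (0.72, -0.50)]
rotated = varimax_two_factors(unrotated)
for x, y in rotated:
    print(f"{x:7.3f} {y:7.3f}")
print("criterion before:", round(varimax_criterion(unrotated), 4))
print("criterion after: ", round(varimax_criterion(rotated), 4))
```

Rotation leaves each variable's communality (the sum of its squared loadings) unchanged. It only redistributes the loadings so that each variable loads mainly on one factor, which is what makes the factors easier to name.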

5. Calculate factor scores

After the factors are determined, the score of each factor is calculated. The component score coefficient matrix output by SPSSAU is as follows:

From the component score coefficient matrix, each common factor F can be written as a linear combination of the variables X, which gives the following factor score functions:

F1 = -0.203 * Railway passenger volume - 0.178 * Railway passenger turnover + 0.537 * Total railway freight volume + 0.294 * Railway operating mileage + 0.333 * Total railway freight turnover + 0.135 * Number of railway transportation employees

F2 = 0.506 * Railway passenger volume + 0.488 * Railway passenger turnover - 0.321 * Total railway freight volume + 0.025 * Railway operating mileage - 0.014 * Total railway freight turnover + 0.197 * Number of railway transportation employees

This calculation can be done by hand, but note that the standardized data, not the raw data, must be substituted into the formulas.

If [Factor Score] is checked before the analysis is run, SPSSAU automatically saves the common factor scores, as shown below:
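The hand calculation can be sketched in Python using the coefficients from the F1 and F2 formulas above. The function name `factor_scores` and the standardized input values are hypothetical and stand in for one province's z-scores.

```python
# Score coefficients copied from the F1 and F2 formulas in the text: (F1, F2).
COEFFICIENTS = {
    "Railway passenger volume": (-0.203, 0.506),
    "Railway passenger turnover": (-0.178, 0.488),
    "Total railway freight volume": (0.537, -0.321),
    "Railway operating mileage": (0.294, 0.025),
    "Total railway freight turnover": (0.333, -0.014),
    "Number of railway transportation employees": (0.135, 0.197),
}


def factor_scores(z_values):
    """Return (F1, F2) for one observation given its standardized indicators."""
    f1 = sum(coef[0] * z_values[name] for name, coef in COEFFICIENTS.items())
    f2 = sum(coef[1] * z_values[name] for name, coef in COEFFICIENTS.items())
    return f1, f2


# Hypothetical standardized values for one province (not data from the case).
example = {
    "Railway passenger volume": 0.5,
    "Railway passenger turnover": 0.3,
    "Total railway freight volume": 1.2,
    "Railway operating mileage": 0.8,
    "Total railway freight turnover": 1.0,
    "Number of railway transportation employees": 0.4,
}
f1, f2 = factor_scores(example)
print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")
```

Passing raw, unstandardized values to this function would give meaningless scores, which is the pitfall the text warns about.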

6. Calculate the comprehensive score

The comprehensive evaluation is carried out by substituting the indicator data into the factor expressions and calculating a comprehensive score. That is, the scores of the two common factors are combined in a linear weighted average, with each factor weighted by its variance explained rate, which gives the comprehensive score model:

F = (41.346% * F1 + 37.462% * F2) / 78.808%

Note: The numerator is the variance explanation rate after rotation of two common factors, and the denominator is the cumulative variance explanation rate after rotation.
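The weighting described in the note can be sketched as a small Python function. The rotated variance explained rates (41.346% and 37.462%) are the ones reported in the extraction step; the function name `comprehensive_score` and the example factor scores are hypothetical.

```python
# Rotated variance explained rates (%) of F1 and F2, as reported in the text.
ROTATED_VARIANCE_RATES = (41.346, 37.462)


def comprehensive_score(f1, f2, rates=ROTATED_VARIANCE_RATES):
    """Weighted average of factor scores; weights are rate / cumulative rate."""
    cumulative = sum(rates)  # 78.808 for the rates in the text
    return (rates[0] * f1 + rates[1] * f2) / cumulative


# Hypothetical factor scores for one province (not data from the case).
print(round(comprehensive_score(1.25, -0.40), 4))
```

Because the weights sum to 1 (about 0.525 for F1 and 0.475 for F2), the comprehensive score stays on the same scale as the factor scores.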

After checking [Comprehensive Score], SPSSAU will automatically save the comprehensive score. The results are shown in the figure below:

After obtaining the comprehensive scores, you can download the data to a local computer and sort the comprehensive scores in Excel. The resulting ranking reflects the railway transportation capacity of the 31 provinces and is compiled in the following table:
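The same sorting can be done in a script instead of Excel. The sketch below ranks observations by comprehensive score in descending order; the function name `rank_by_score`, the province labels, and the scores are placeholders, not results from the case.

```python
def rank_by_score(scores):
    """Return (rank, name, score) tuples sorted from highest to lowest score."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


# Placeholder labels and scores (not the case's actual results).
scores = {"Province A": 0.12, "Province B": 1.35, "Province C": -0.48}
for rank, name, score in rank_by_score(scores):
    print(f"{rank}. {name}: {score:.2f}")
```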

Analyzing the comprehensive score table of railway transportation capacity in 31 provinces, it can be seen that Hebei Province has the strongest railway transportation capacity and Hainan Province has the weakest railway transportation capacity...

At this point, factor analysis ends.

Origin blog.csdn.net/m0_37228052/article/details/134672902