R language 3.12 correspondence analysis

Correspondence analysis is a further extension of principal component and factor analysis.
Correspondence analysis was developed to address a limitation of factor analysis. Factor analysis comes in two forms: R-type factor analysis studies the correlations among variables (indicators), and Q-type factor analysis studies the correlations among samples. Sometimes, however, we care not only about the relationships among variables or among samples but also about how the variables correspond to the samples, and factor analysis cannot describe that.
Correspondence analysis is an effective method for analyzing the relationship between two or more groups of factors: when the data are discrete, a contingency table of the factors is built and then analyzed.
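A minimal sketch of building such a contingency table in base R with `table()`, assuming the raw data are stored as one categorical value per respondent. The `income` and `satisfaction` values below are hypothetical, made up for illustration, and are not the blog's data.

```r
# Hypothetical raw records (one entry per respondent); NOT the blog's survey data
income <- factor(c("low", "low", "mid", "high", "mid", "high", "low", "mid"),
                 levels = c("low", "mid", "high"))
satisfaction <- factor(c("no", "no", "yes", "yes", "no", "yes", "yes", "yes"),
                       levels = c("no", "yes"))

# Cross-tabulate the two discrete factors into a contingency table of counts
tab <- table(income, satisfaction)
print(tab)
```

Passing `levels` explicitly fixes the row and column order and keeps a category in the table even if no respondent chose it.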
Before performing correspondence analysis, check whether the factors are independent. If they are independent of each other, there is no association to describe and correspondence analysis is unnecessary. Therefore, a chi-square test of independence is run first.

d3.12 = read.table("clipboard", header = TRUE)  # read the table copied to the clipboard (Windows)
d3.12
X = data.frame(d3.12)
X
chisq.test(X)  # chi-square test of independence

The chi-square statistic is 118.1 with (5 - 1) × (4 - 1) = 12 degrees of freedom and p < 0.01, so independence is rejected: income and satisfaction are related.
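The degrees of freedom and the p-value can be checked by hand from the reported statistic. A minimal sketch using base R's `pchisq()` and `qchisq()`; only the value 118.1 and the 5 × 4 table size come from the text above.

```r
chi2   <- 118.1                        # chi-square statistic reported above
n_rows <- 5                            # income categories
n_cols <- 4                            # satisfaction categories
dof    <- (n_rows - 1) * (n_cols - 1)  # degrees of freedom: 12

# Upper-tail probability P(chi-square_dof >= chi2)
p_value <- pchisq(chi2, df = dof, lower.tail = FALSE)

# Critical value at the 1% significance level
crit_1pct <- qchisq(0.99, df = dof)

cat("df =", dof, " p-value =", format(p_value, digits = 3),
    " 1% critical value =", round(crit_1pct, 2), "\n")
```

Because 118.1 lies far beyond the 1% critical value (about 26.2 for 12 degrees of freedom), the p-value is far below 0.01.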
The basic principle of correspondence analysis:
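First, the data matrix $X = (x_{ij})$ is converted by a probability transformation into the probability matrix $P = (p_{ij})$:
$$p_{ij} = \frac{x_{ij}}{T}, \qquad T = \sum_{i}\sum_{j} x_{ij},$$
with row margins $p_{i\cdot} = \sum_{j} p_{ij}$ and column margins $p_{\cdot j} = \sum_{i} p_{ij}$.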
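A minimal base-R sketch of the probability transformation. The 4 × 3 table `X` below is hypothetical (made-up counts, not the blog's data), and the names `row_mass` and `col_mass` are illustrative.

```r
# Hypothetical 4 x 3 contingency table (made-up counts, NOT the blog's data)
X <- matrix(c(30, 10,  5,
              20, 20, 10,
              10, 20, 20,
               5, 10, 30), nrow = 4, byrow = TRUE)

total    <- sum(X)       # grand total T
P        <- X / total    # probability matrix: p_ij = x_ij / T
row_mass <- rowSums(P)   # row margins p_i.
col_mass <- colSums(P)   # column margins p_.j

print(round(P, 3))
print(round(row_mass, 3))
print(round(col_mass, 3))
```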
Next, the probability matrix is standardized to obtain the transition matrix $Z = (z_{ij})$:
$$z_{ij} = \frac{p_{ij} - p_{i\cdot}\,p_{\cdot j}}{\sqrt{p_{i\cdot}\,p_{\cdot j}}}.$$
The transition matrix has a convenient property: both covariance matrices can be computed directly from it. The covariance matrix of the variables (columns) is
$$A = Z^{\top} Z,$$
and the covariance matrix of the samples (rows) is
$$B = Z Z^{\top}.$$
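A base-R sketch of the standardization and the two covariance matrices, on a hypothetical 4 × 3 table (made-up counts, not the blog's data). The names `P`, `Z`, `A` and `B` follow the text; `row_mass`, `col_mass` and `E` are illustrative helper names.

```r
# Hypothetical 4 x 3 contingency table (made-up counts, NOT the blog's data)
X <- matrix(c(30, 10,  5,
              20, 20, 10,
              10, 20, 20,
               5, 10, 30), nrow = 4, byrow = TRUE)
P <- X / sum(X)
row_mass <- rowSums(P)   # p_i.
col_mass <- colSums(P)   # p_.j

# Expected proportions under independence: p_i. * p_.j
E <- outer(row_mass, col_mass)
# Transition matrix: z_ij = (p_ij - p_i. p_.j) / sqrt(p_i. p_.j)
Z <- (P - E) / sqrt(E)

A <- crossprod(Z)    # t(Z) %*% Z: covariance matrix of the variables (3 x 3)
B <- tcrossprod(Z)   # Z %*% t(Z): covariance matrix of the samples (4 x 4)

print(round(Z, 4))
```

`crossprod()` and `tcrossprod()` compute the same products as `t(Z) %*% Z` and `Z %*% t(Z)` without forming the transpose explicitly.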
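Theorem: $Z^{\top}Z$ and $ZZ^{\top}$ have the same nonzero eigenvalues. Moreover, if $u$ is an eigenvector of $Z^{\top}Z$ for an eigenvalue $\lambda \neq 0$, then $Zu$ is an eigenvector of $ZZ^{\top}$ for the same $\lambda$.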
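A numerical check of the theorem in base R, on a hypothetical 4 × 3 table (made-up counts, not the blog's data). The tolerance `1e-10` used to decide which eigenvalues count as nonzero is an arbitrary illustrative choice.

```r
# Hypothetical 4 x 3 contingency table (made-up counts, NOT the blog's data)
X <- matrix(c(30, 10,  5,
              20, 20, 10,
              10, 20, 20,
               5, 10, 30), nrow = 4, byrow = TRUE)
P <- X / sum(X)
E <- outer(rowSums(P), colSums(P))
Z <- (P - E) / sqrt(E)               # transition matrix
A <- crossprod(Z)                    # Z'Z
B <- tcrossprod(Z)                   # ZZ'

eA <- eigen(A, symmetric = TRUE)     # eigenvalues sorted in decreasing order
eB <- eigen(B, symmetric = TRUE)

# Keep only the numerically nonzero eigenvalues
tol  <- 1e-10
nz_A <- eA$values[eA$values > tol]
nz_B <- eB$values[eB$values > tol]
print(nz_A)
print(nz_B)

# Eigenvector mapping: if A u = lambda u, then B (Z u) = lambda (Z u)
u <- eA$vectors[, 1]
v <- Z %*% u
residual <- max(abs(B %*% v - eA$values[1] * v))
print(residual)
```

The mapping holds because $ZZ^{\top}(Zu) = Z(Z^{\top}Zu) = \lambda Zu$, and $Zu \neq 0$ whenever $\lambda \neq 0$.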
The complex problem is thus transformed into computing eigenvalues and eigenvectors, and because A and B share their nonzero eigenvalues, a single eigendecomposition yields both the R-type and the Q-type results.
Q-type and R-type factor analysis reflect different aspects of the data, so there must be an intrinsic connection between them. Correspondence analysis combines the two organically through a clever mathematical transformation.
That is, from the transition matrix Z one obtains the variable covariance matrix A and the sample covariance matrix B, and A and B have the same nonzero eigenvalues.
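The same link can be seen through the singular value decomposition. A base-R sketch on a hypothetical 4 × 3 table (made-up counts, not the blog's data): the squared singular values of Z are the eigenvalues of A (and the nonzero ones of B), their sum is the total inertia $\sum_{i,j} z_{ij}^2$, and multiplying the total inertia by the grand total T gives Pearson's chi-square statistic. The last identity follows from $x_{ij} = T p_{ij}$ and the expected count $T p_{i\cdot} p_{\cdot j}$.

```r
# Hypothetical 4 x 3 contingency table (made-up counts, NOT the blog's data)
X <- matrix(c(30, 10,  5,
              20, 20, 10,
              10, 20, 20,
               5, 10, 30), nrow = 4, byrow = TRUE)
P <- X / sum(X)
E <- outer(rowSums(P), colSums(P))
Z <- (P - E) / sqrt(E)                          # transition matrix

lambda <- svd(Z)$d^2                            # squared singular values of Z
eig_A  <- eigen(crossprod(Z), symmetric = TRUE)$values

total_inertia <- sum(Z^2)                       # equals sum(lambda)
chi2 <- unname(chisq.test(X)$statistic)         # Pearson chi-square on the counts

print(round(lambda, 6))
print(total_inertia * sum(X))                   # T * total inertia
print(chi2)
```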
Usage of the correspondence analysis function ca():
ca(X), where X is the data matrix, usually a frequency (contingency) table.

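library(ca)    # correspondence analysis package; install.packages("ca") if missing
ca1=ca(X)      # fit the correspondence analysis
summary(ca1)   # principal inertias (eigenvalues) and their percentages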

Reduced to two dimensions, the solution retains 99.8% of the information (total inertia).
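The "information" kept by the first k dimensions is the share of total inertia carried by the k largest eigenvalues. A minimal base-R sketch on a hypothetical 4 × 3 table (made-up counts, not the blog's data). Such a table has only min(4, 3) - 1 = 2 nonzero eigenvalues, so two dimensions retain 100% here, whereas the blog's 5 × 4 table has three and the first two keep 99.8%.

```r
# Hypothetical 4 x 3 contingency table (made-up counts, NOT the blog's data)
X <- matrix(c(30, 10,  5,
              20, 20, 10,
              10, 20, 20,
               5, 10, 30), nrow = 4, byrow = TRUE)
P <- X / sum(X)
E <- outer(rowSums(P), colSums(P))
Z <- (P - E) / sqrt(E)                     # transition matrix

lambda  <- svd(Z)$d^2                      # eigenvalues (principal inertias)
lambda  <- lambda[lambda > 1e-12]          # drop numerically zero eigenvalues
pct     <- 100 * lambda / sum(lambda)      # % of inertia per dimension
cum_pct <- cumsum(pct)                     # cumulative %

print(round(rbind(eigenvalue = lambda, percent = pct, cumulative = cum_pct), 4))
```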

ca1$rowcoord  # row coordinates
ca1$colcoord  # column coordinates
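A base-R sketch of how such coordinates can be derived from the SVD of Z, on a hypothetical 4 × 3 table (made-up counts, not the blog's data). It assumes, and this should be checked against the ca package documentation, that `rowcoord` and `colcoord` hold standard coordinates (singular vectors divided by the square root of the masses); principal coordinates are then standard coordinates multiplied by the singular values. Axis signs are arbitrary, so a column may differ from `ca()` output by a sign flip.

```r
# Hypothetical 4 x 3 contingency table (made-up counts, NOT the blog's data)
X <- matrix(c(30, 10,  5,
              20, 20, 10,
              10, 20, 20,
               5, 10, 30), nrow = 4, byrow = TRUE)
P <- X / sum(X)
row_mass <- rowSums(P)
col_mass <- colSums(P)
E <- outer(row_mass, col_mass)
Z <- (P - E) / sqrt(E)                     # transition matrix

s <- svd(Z)
k <- 2                                     # number of dimensions kept

# Standard coordinates: singular vectors rescaled by 1 / sqrt(mass)
row_std <- s$u[, 1:k] / sqrt(row_mass)
col_std <- s$v[, 1:k] / sqrt(col_mass)

# Principal coordinates: standard coordinates times the singular values
row_pc <- sweep(row_std, 2, s$d[1:k], "*")
col_pc <- sweep(col_std, 2, s$d[1:k], "*")

print(round(row_std, 4))
print(round(col_std, 4))
```

Standard coordinates have mass-weighted mean 0 and mass-weighted variance 1 on each axis; principal coordinates have mass-weighted variance equal to that axis's eigenvalue.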


plot(ca1,gap=0)

Reading the correspondence analysis map, the points fall into three groups:
Group 1: Variable (income): <10,000
Samples (satisfaction): somewhat dissatisfied, very dissatisfied
Group 2: Variables (income): 30,000, 30,000-50,000
Samples (satisfaction): fairly satisfied
Group 3: Variables (income): 50,000-100,000, >100,000
Samples (satisfaction): very satisfied

Points to watch when using correspondence analysis:
1. It cannot be used to test hypotheses about the association between factors; it is a descriptive method.
2. The number of dimensions is determined by the variable with the fewest categories (at most that number of categories minus one).
3. It is sensitive to extreme values.
4. The objects studied must be comparable.
5. The categories of each variable should cover all possible cases.
6. Different standardization methods give different results.
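Point 2 can be made concrete: each row of Z is orthogonal to the vector of square-rooted column masses and each column to the square-rooted row masses, so Z has rank at most min(r, c) - 1. A minimal base-R check on a hypothetical 4 × 3 table (made-up counts, not the blog's data); for the blog's 5 × 4 income-by-satisfaction table the same bound allows at most 3 dimensions.

```r
# Hypothetical 4 x 3 contingency table (made-up counts, NOT the blog's data)
X <- matrix(c(30, 10,  5,
              20, 20, 10,
              10, 20, 20,
               5, 10, 30), nrow = 4, byrow = TRUE)
P <- X / sum(X)
row_mass <- rowSums(P)
col_mass <- colSums(P)
E <- outer(row_mass, col_mass)
Z <- (P - E) / sqrt(E)                     # transition matrix

# Linear constraints that each remove one dimension
print(max(abs(Z %*% sqrt(col_mass))))      # rows of Z are orthogonal to sqrt(col_mass)
print(max(abs(t(Z) %*% sqrt(row_mass))))   # columns of Z are orthogonal to sqrt(row_mass)

max_dim <- min(dim(X)) - 1                 # upper bound on the number of CA dimensions
rank_Z  <- qr(Z)$rank                      # numerical rank of the transition matrix
cat("max dimensions:", max_dim, " rank of Z:", rank_Z, "\n")
```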

Origin blog.csdn.net/m0_46445293/article/details/104813568