Python implements factor analysis (FA)

1 Introduction to factor analysis

Factor analysis is the extension and development of principal component analysis and is a method of dimensionality reduction in multivariate statistical analysis. Factor analysis is to study the internal dependence of the correlation matrix or covariance matrix. It synthesizes multiple variables into a few factors to reproduce the correlation between the original variables and factors.

Corresponding concept reference:
https://blog.csdn.net/mengjizhiyou/article/details/83241469

General steps for factor analysis:

1) Standardize the original data

2) Whether it is suitable for factor analysis

Before factor analysis, first perform the KMO test and Bartley's sphere test:

  • KMO(Kaiser-Meyer-Olkin)

The test statistic is an indicator used to compare simple correlation coefficients and partial correlation coefficients between variables.

The KMO statistic takes a value between 0 and 1. When the sum of squares of simple correlation coefficients among all variables is much greater than the sum of squares of partial correlation coefficients:

The value is close to 1. The closer the KMO value is to 1, the stronger the correlation between variables, which is suitable for factor analysis/

The closer the value is to 0, the weaker the correlation between variables and is not suitable for factor analysis.

0.9 or above means very suitable; 0.8 means suitable; 0.7 means average; 0.6 means not very suitable; below 0.5 means extremely unsuitable.

  • Bartlett test of sphericity

The statistic of Bartlett's test of sphericity is obtained from the determinant of the correlation coefficient matrix.

If this value is large and its corresponding associated probability value is less than the significance level in the user's mind, then the null hypothesis should be rejected and it is considered that the correlation coefficient matrix cannot be the unit matrix, that is, there is a correlation between the original variables and it is suitable to make decisions. Ingredient analysis.

On the contrary, if the statistic is relatively small and its corresponding accompanying probability is greater than the significance level, the null hypothesis cannot be rejected and it is considered that the correlation coefficient matrix may be a unit matrix and is not suitable for factor analysis.

3) Calculate the correlation matrix R

4) Calculate the eigenvalues ​​and eigenvectors of the correlation matrix R

5) Determine the number of common factors k

The factor number is determined according to the variance contribution, >80%.

6) Construct the initial factor loading matrix and interpret it

7) Whether factor conversion is required

Perform a rotation transformation on the initial factor loading matrix A. The rotation transformation simplifies the structure of the initial factor loading matrix and makes the relationship clear, making the factor variables more interpretable. If the initial factors are not relevant, you can use varimax orthogonal rotation. If the initial For the correlation between factors, oblique rotation can be used. After rotation, a relatively ideal new factor loading matrix R' can be obtained.

8) Calculate factor scores

9) Calculate the total score

The total score is calculated based on the weight of the variance contribution rate after common factor rotation. The calculated total score can be used for ranking and evaluation analysis.

10) Build a model

Simply model the factor scores as variables.

2 python implementation

# 参考:https://cloud.tencent.com/developer/article/1642275

import numpy.linalg as nlg #导入nlg函数,linalg=linear+algebra
from math import sqrt
from numpy import eye, asarray, dot, sum, diag #导入eye,asarray,dot,sum,diag 函数
from numpy.linalg import svd #导入奇异值分解函数


"""构建数据集"""
data=pd.DataFrame(np.random.randint(50,100,size=(5, 10)))
data.columns=["特征1","特征2","特征3","特征4","特征5","特征6","特征7","特征8","特征9","特征10"]
data.index=['对象1','对象2','对象3','对象4','对象5']


"""将原始数据标准化处理 """
data=(data-data.mean())/data.std() # 0均值规范化

"""计算相关矩阵"""
C=data.corr() #相关系数矩阵

"""计算相关矩阵及其特征值和特征向量 """
eig_value,eig_vector=nlg.eig(C) #计算特征值和特征向量
eig=pd.DataFrame() #利用变量名和特征值建立一个数据框
eig['names']=data.columns#列名
eig['eig_value']=eig_value#特征值


"""确定公共因子个数k """
for k in range(1,11): #确定公共因子个数
    if eig['eig_value'][:k].sum()/eig['eig_value'].sum()>=0.8: #如果解释度达到80%, 结束循环
        print(k)
        break
        

"""构造初始因子载荷矩阵A"""
col0=list(sqrt(eig_value[0])*eig_vector[:,0]) #因子载荷矩阵第1列
col1=list(sqrt(eig_value[1])*eig_vector[:,1]) #因子载荷矩阵第2列
col2=list(sqrt(eig_value[2])*eig_vector[:,2]) #因子载荷矩阵第3列
A=pd.DataFrame([col0,col1,col2]).T #构造因子载荷矩阵A
A.columns=['factor1','factor2','factor3'] #因子载荷矩阵A的公共因子


"""因子转换"""
h=np.zeros(10) #变量共同度,反映变量对共同因子的依赖程度,越接近1,说明公共因子解释程度越高,因子分析效果越好
D=np.mat(np.eye(10))#特殊因子方差,因子的方差贡献度 ,反映公共因子对变量的贡献,衡量公共因子的相对重要性
A=np.mat(A) #将因子载荷阵A矩阵化

for i in range(10):
    a=A[i,:]*A[i,:].T #行平方和
    h[i]=a[0,0]  #计算变量X共同度,描述全部公共因子F对变量X_i的总方差所做的贡献,及变量X_i方差中能够被全体因子解释的部分
    D[i,i]=1-a[0,0] #因为自变量矩阵已经标准化后的方差为1,即Var(X_i)=第i个共同度h_i + 第i个特殊因子方差

def varimax(Phi, gamma = 1.0, q =10, tol = 1e-6): #定义方差最大旋转函数
    p,k = Phi.shape #给出矩阵Phi的总行数,总列数
    R = eye(k) #给定一个k*k的单位矩阵
    d=0
    for i in range(q):
        d_old = d
        Lambda = dot(Phi, R)#矩阵乘法

        u,s,vh = svd(dot(Phi.T,asarray(Lambda)**3 - (gamma/p) * dot(Lambda, diag(diag(dot(Lambda.T,Lambda)))))) #奇异值分解svd

        R = dot(u,vh)#构造正交矩阵R

        d = sum(s)#奇异值求和

    if d_old!=0 and d/d_old:
        return dot(Phi, R)#返回旋转矩阵Phi*R

rotation_mat=varimax(A)#调用方差最大旋转函数
rotation_mat=pd.DataFrame(rotation_mat)#数据框化


"""计算因子得分"""
data=np.mat(data) #矩阵化处理
factor_score=(data).dot(A) #计算因子得分
factor_score=pd.DataFrame(factor_score)#数据框化
factor_score.columns=['因子A','因子B','因子C'] #对因子变量进行命名
factor_score
#factor_score.to_excel(outputfile)#打印输出因子得分矩阵

You can call the package directly factor_analyzer, see for details:

https://cloud.tencent.com/developer/article/1897844?from=article.detail.1642275

reference:

https://cloud.tencent.com/developer/article/1642275
http://www.cdadata.com/7460

Guess you like

Origin blog.csdn.net/mengjizhiyou/article/details/127824294