lavaan简明教程 [中文翻译版]

译者注：此文档原作者为比利时Ghent大学的Yves Rosseel博士，lavaan亦为其开发，完全开源、免费。我在学习的时候顺手翻译了一下，向Yves的开源精神致敬。此翻译因偷懒部分删减，但也有增加，有错误请留言

「转载请注明出处」

在开始之前
安装lavaan包
模型语法
例1：验证性因子分析(CFA)
例2：结构方程(SEM)
更多关于语法的内容
6.1 固定参数
6.2 初值
6.3 参数标签
6.4 修改器
6.5 简单相等约束
6.6 非线性相等和不相等约束
引入平均值
结构方程的多组分析
8.1 在部分组中固定参数
8.2 约束一个参数使其在各组中相等
8.3 约束一组参数使其在各组中相等
8.4 测量不变性
增长曲线模型
使用分类变量
将协方差矩阵作为输入
估计方法，标准误差和缺失值
12.1 估计方法
12.2 最大似然估计
12.3 缺失值
间接效应和中介分析
修正指标
从拟合方程中提取信息

摘要

此教程首先介绍lavaan的基本组成部分：模型语法，拟合方程(CFA, SEM和growth)，用来呈现结果的主要函数(summary, coef, fitted, inspect)；
然后提供两个实例；
最后再讨论一些重要话题：均值结构模型(meanstructures)，多组模型(multiple groups)，增长曲线模型(growth curve models)，中介分析(mediation analysis)，分类数据(categorial data)等。

1. 在开始之前

在开始之前，有以下几点需要注意：

lavaan包需要安装 3.0.0或更新版本的R。
lavaan包仍处于未完成阶段，目前尚未实现的功能有：对多层数据的支持，对离散潜变量的支持，贝叶斯估计。
虽然目前是测试版，但是结果精确，语法成熟，可以放心使用。
lavaan包对R语言预备知识要求很低。
此教程不等于参考手册，相关文档正在准备。
lavaan包是完全免费开源的软件，不做任何承诺。
发现bug，到 https://groups.google.com/d/forum/lavaan/ 加组沟通。

2. 安装lavaan包

启动R，并输入：

install.packages("lavaan", dependencies = TRUE)  # 安装lavaan包
library(lavaan)   # 载入lavaan包

出现以下提示，表示载入成功。

3. 模型语法

lavaan包的核心是描述整个模型的“模型语法”。这部分简单介绍语法，更多细节在接下来的示例中可见。

R环境下的回归方程有如下形式：

y ~ x1 + x2 + x3 + x4  # ~左边为因变量y

在lavaan中，一个典型模型是一个回归方程组，其中可以包含潜变量，例如：

y ~ f1 + f2 + x1 + x2
f1 ~ f2 + f3
f2 ~ f3 + x1 + x2

我们必须通过指示符=~(measured by)来“定义”潜变量。例如，通过以下方式来定义潜变量f1, f2和f3:

f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
f3 =~ y7 + y8 + y9 + y10

方差和协方差表示如下：

y1 ~~ y1 # 方差
y1 ~~ y2 # 协方差
f1 ~~ f2 # 协方差

只有截距项的回归方程表达如下：

y1 ~ 1
f1 ~ 1

以上4种公式种类(~, ~~, =~, ~ 1)组合成完整的模型语法，用单引号表示如下：

myModel <- ' # 主要回归方程
             y1 + y2 ~ f1 + f2 + x1 + x2
                  f1 ~ f2 + f3
                  f2 ~ f3 + x1 + x2
             
             # 定义潜变量
             f1 =~ y1 + y2 + y3
             f2 =~ y4 + y5 + y6
             f3 =~ y7 + y8 + y9 + y10

             # 方差和协方差
             y1 ~~ y1
             y1 ~~ y2
             f1 ~~ f2
          
             # 截距项
             y1 ~ 1
             f1 ~ 1

如果模型很长，可以将模型(单引号之间的内容)储存入myModel.lav的txt文档中，用以下命令读取：

myModel <- readLines("/mydirectory/myModel.lav")  # 这里需要绝对路径

4. 例1：验证性因子分析(CFA)

lavaan包提供了一个内置数据集叫做HolzingerSwineford，输入以下命令查看数据集描述：

?HolzingerSwineford

数据形式是这样的：

此数据集包含了来自两个学校的七、八年级孩子的智力能力测验分数。在我们的版本里，只包含原有26个测试中的9个，这9个测试分数作为9个测量变量分别对应3个潜变量：

视觉因子(visual)对应x1，x2，x3
文本因子(textual)对应x4，x5，x6
速度因子(speed)对应x7，x8，x9

模型如下图所示：

建立模型代码如下：

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9'
# 然后拟合cfa函数，第一个参数是模型，第二个参数是数据集
fit <- cfa(HS.model, data = HolzingerSwineford1939)
# 再通过summary函数给出结果
summary(fit, fit.measure = TRUE)

结果展示如下：

# 前6行为头部，包含版本号，收敛情况，迭代次数，观测数，用来计算参数的估计量，模型检验统计量，自由度和相关的p值
lavaan (0.5-23.1097) converged normally after  35 iterations 

  Number of observations                           301

  Estimator                                         ML
  Minimum Function Test Statistic               85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

#参数fit.measure = TRUE会显示下面从model test baseline model到SRMR的部分
Model test baseline model:

  Minimum Function Test Statistic              918.852
  Degrees of freedom                                36
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.931
  Tucker-Lewis Index (TLI)                       0.896

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3737.745
  Loglikelihood unrestricted model (H1)      -3695.092

  Number of free parameters                         21
  Akaike (AIC)                                7517.490
  Bayesian (BIC)                              7595.339
  Sample-size adjusted Bayesian (BIC)         7528.739

Root Mean Square Error of Approximation:

  RMSEA                                          0.092
  90 Percent Confidence Interval          0.071  0.114
  P-value RMSEA <= 0.05                          0.001

Standardized Root Mean Square Residual:

  SRMR                                           0.065

#以下是参数估计部分
Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.554    0.100    5.554    0.000
    x3                0.729    0.109    6.685    0.000
  textual =~                                          
    x4                1.000                           
    x5                1.113    0.065   17.014    0.000
    x6                0.926    0.055   16.703    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.180    0.165    7.152    0.000
    x9                1.082    0.151    7.155    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.408    0.074    5.552    0.000
    speed             0.262    0.056    4.660    0.000
  textual ~~                                          
    speed             0.173    0.049    3.518    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.549    0.114    4.833    0.000
   .x2                1.134    0.102   11.146    0.000
   .x3                0.844    0.091    9.317    0.000
   .x4                0.371    0.048    7.779    0.000
   .x5                0.446    0.058    7.642    0.000
   .x6                0.356    0.043    8.277    0.000
   .x7                0.799    0.081    9.823    0.000
   .x8                0.488    0.074    6.573    0.000
   .x9                0.566    0.071    8.003    0.000
    visual            0.809    0.145    5.564    0.000
    textual           0.979    0.112    8.737    0.000
    speed             0.384    0.086    4.451    0.000

例1是一个定义模型、拟合模型、提取结果的过程。

5. 例2：结构方程(SEM)

例2中我们使用名为PoliticalDemocracy的数据集，图示如下：

数据形式大致如下(只显示前三行)：

变量	含义	变量	含义
ind60	1960年的非民主情况	y5	1965年专家对出版物自由的评价
dem60	1960年的民主情况	y6	1965年的反对党派自由
dem65	1965年的民主情况	y7	1965年选举的公平性
y1	1960年专家对出版物自由的评价	y8	1965年选举产生的立法机关效率
y2	1960年的反对党派自由	x1	1960年每单位资本GNP
y3	1960年选举的公平性	x2	1960年每单位资本的物质能量消费
y4	1960年选举产生的立法机关效率	x3	1960年工业劳动力占比

模型如下：

model <- '# measurement model 测量模型
          ind60 =~ x1 + x2 + x3
          dem60 =~ y1 + y2 + y3 + y4
          dem65 =~ y5 + y6 + y7 + y8
          
          # regressions 回归
          dem60 ~ ind60
          dem65 ~ ind60 + dem60
          
          # residual correlations 残余相关
          y1 ~~ y5
          y2 ~~ y4 + y6
          y3 ~~ y7
          y4 ~~ y8
          y6 ~~ y8'

# 拟合SEM
fit <- sem(model, data = PoliticalDemocracy)

# 提取结果
summary(fit, standardized = TRUE)  

#与上例不同，这里我们忽略了参数fit.measure = TRUE，用standardized = TRUE来标准化参数值）

结果如下：

lavaan (0.5-23.1097) converged normally after  68 iterations

  Number of observations                            75

  Estimator                                         ML
  Minimum Function Test Statistic               38.125
  Degrees of freedom                                35
  P-value (Chi-square)                           0.329

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  ind60 =~                                                              
    x1                1.000                               0.670    0.920
    x2                2.180    0.139   15.742    0.000    1.460    0.973
    x3                1.819    0.152   11.967    0.000    1.218    0.872
  dem60 =~                                                              
    y1                1.000                               2.223    0.850
    y2                1.257    0.182    6.889    0.000    2.794    0.717
    y3                1.058    0.151    6.987    0.000    2.351    0.722
    y4                1.265    0.145    8.722    0.000    2.812    0.846
  dem65 =~                                                              
    y5                1.000                               2.103    0.808
    y6                1.186    0.169    7.024    0.000    2.493    0.746
    y7                1.280    0.160    8.002    0.000    2.691    0.824
    y8                1.266    0.158    8.007    0.000    2.662    0.828

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  dem60 ~                                                               
    ind60             1.483    0.399    3.715    0.000    0.447    0.447
  dem65 ~                                                               
    ind60             0.572    0.221    2.586    0.010    0.182    0.182
    dem60             0.837    0.098    8.514    0.000    0.885    0.885

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .y1 ~~                                                                 
   .y5                0.624    0.358    1.741    0.082    0.624    0.296
 .y2 ~~                                                                 
   .y4                1.313    0.702    1.871    0.061    1.313    0.273
   .y6                2.153    0.734    2.934    0.003    2.153    0.356
 .y3 ~~                                                                 
   .y7                0.795    0.608    1.308    0.191    0.795    0.191
 .y4 ~~                                                                 
   .y8                0.348    0.442    0.787    0.431    0.348    0.109
 .y6 ~~                                                                 
   .y8                1.356    0.568    2.386    0.017    1.356    0.338

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1                0.082    0.019    4.184    0.000    0.082    0.154
   .x2                0.120    0.070    1.718    0.086    0.120    0.053
   .x3                0.467    0.090    5.177    0.000    0.467    0.239
   .y1                1.891    0.444    4.256    0.000    1.891    0.277
   .y2                7.373    1.374    5.366    0.000    7.373    0.486
   .y3                5.067    0.952    5.324    0.000    5.067    0.478
   .y4                3.148    0.739    4.261    0.000    3.148    0.285
   .y5                2.351    0.480    4.895    0.000    2.351    0.347
   .y6                4.954    0.914    5.419    0.000    4.954    0.443
   .y7                3.431    0.713    4.814    0.000    3.431    0.322
   .y8                3.254    0.695    4.685    0.000    3.254    0.315
    ind60             0.448    0.087    5.173    0.000    1.000    1.000
   .dem60             3.956    0.921    4.295    0.000    0.800    0.800
   .dem65             0.172    0.215    0.803    0.422    0.039    0.039

可以看出sem()和cfa()这两个函数很相似，实际上，这两个函数现在几乎是一样的，但在将来可能发生变化。

在添加了standardized参数后，出现了Std.lv， Std.all两列，前者只有潜变量被标准化了，后者为所有变量都标准化了，也被称为“完全标准化解”。

6. 更多关于语法的内容

6.1 固定参数

对于一个对应4个指标的潜变量，lavaan默认将第一个指标的因子载荷固定为1，其他指标为自由。

但如果你有一个很好的理由来让所有因子载荷都固定为1，可以照如下做法：

f =~ y1 + 1*y2 + 1*y3 + 1*y4

一般来说，你需要通过给相关变量预先乘以一个数字来固定公式的参数，这个叫做“预乘机制”。

回想例1中的模型，默认3个潜变量两两相关。如果你想将某对变量的相关系数设为0，你需要为其添加一个协方差公式并将参数设为0.

在下面的语法中，除visual，textual间的协方差自由外，其他均设为0。并且我们希望固定speed的因子为一个单位，所以不需要设定第一个指标x7为1，因此我们给x7乘以NA使x7的因子载荷自由且未知，整个模型如下：

# three-factor model
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ NA*x7 + x8 + x9

# orthogonal factors 正交因子
  visual  ~~ 0*speed
  textual ~~ 0*speed

# fix variance of speed factor
  speed ~~ 1*speed

你也可以用orthogonal = TRUE参数使所有潜变量正交，操作如下：

HS.model <- 'visual  =~ x1 + x2 + x3
             textual =~ x4 + x5 + x6
             speed   =~ x7 + x8 + x9'

fit.HS.ortho <- cfa(HS.model, data = HolzingerSwineford1939, orthogonal = TRUE)

相似的，你可以用std.lv = TRUE使所有潜变量的方差固定为一个单位：

HS.model <- 'visual  =~ x1 + x2 + x3
             textual =~ x4 + x5 + x6
             speed   =~ x7 + x8 + x9'

fit <- cfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE)

6.2 初值

lavaan包默认为自由参数自动生成初值，你也可以通过start()函数来自己设定初值，示例如下：

visual  =~ x1 + start(0.8)*x2 + start(1.2)*x3
textual =~ x4 + start(0.5)x5 + start(1.0)x6
speed   =~ x7 + start(0.7)x8 + start(1.8)x9

6.3 参数标签

我们也可以自己定义参数标签，标签不能以数字开头：

model <- '
  # latent variable definitions
    ind60 =~ x1 + x2 + myLabel*x3   #将ind60 =~ x3显示为myLabel
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
  # residual (co)variances
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8'
    
fit <- sem(model,
           data = PoliticalDemocracy)
coef(fit)

结果如下：

6.4 修改器

我们预乘机制所使用的固定参数，修改初始值和参数标签等操作，都可以称为修改器，比如：

f =~ y1 + y2 + myLabel*y3 + start(0.5)*y3 + y4
#虽然y3项出现了两次，过滤器仍然会将y3作为一个指标对待

6.5 简单相等约束

有些情况我们会预设变量参数相等，可以用相同的label达到效果，也可以用equal()函数：

visual  =~ x1 + v2*x2 + v2*x3 
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9

#或者用equal()
visual  =~ x1 + x2 + equal("visual=~x2")*x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9

6.6 非线性相等和不相等约束

假设有如下回归方程：

y ~ b1*x1 + b2*x2 + b3*x3

我们用随机数生成一个数据集，并用sem函数拟合：

set.seed(1234)
Data <- data.frame(y = rnorm(100),
                   x1 = rnorm(100),
                   x2 = rnorm(100),
                   x3 = rnorm(100))
model <- 'y ~ b1*x1 + b2*x2 + b3*x3'
fit <- sem(model, data = Data)
coef(fit)

结果为：

    b1     b2     b3   y~~y
-0.052  0.084  0.139  0.970

如果我们想要为b1增加两个约束：\(b_1=(b_2+b_3)^2\) 和 \(b_1\ge\exp(b_2+b_3)\)，操作如下：

model.constr <- '# model with labeled parameters
                 y ~ b1*x1 + b2*x2 + b3*x3
                 # constraints
                 b1 == (b2 + b3)^2
                 b1 > exp(b2 + b3)'
fit <- sem(model.constr, data = Data)
coef(fit)

结果如下：

    b1     b2     b3   y~~y
 0.495 -0.405 -0.299  1.610

7. 引入平均值

总的来说，结构方程用来模拟指标变量的协方差矩阵，但是在一些应用中，引入指标变量的均值也是有用的。一个方法是求助于截距项，使用如下所示的“截距公式”：

variable ~ 1

右边的1代表截距，比如在例1中，我们可以增加截距项到指标变量中：

# three-factor model
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
# intercepts
  x1 ~ 1
  x2 ~ 1 
  x3 ~ 1
  x4 ~ 1
  x5 ~ 1
  x6 ~ 1
  x7 ~ 1
  x8 ~ 1
  x9 ~ 1

然而还有一种更简单的方法：

fit <- cfa(HS.model,
           data = HolzingerSwineford1939,
           meanstructure = TRUE)
summary(fit)

结果如下：

lavaan (0.5-23.1097) converged normally after  35 iterations

  Number of observations                           301

  Estimator                                         ML
  Minimum Function Test Statistic               85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.554    0.100    5.554    0.000
    x3                0.729    0.109    6.685    0.000
  textual =~                                          
    x4                1.000                           
    x5                1.113    0.065   17.014    0.000
    x6                0.926    0.055   16.703    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.180    0.165    7.152    0.000
    x9                1.082    0.151    7.155    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.408    0.074    5.552    0.000
    speed             0.262    0.056    4.660    0.000
  textual ~~                                          
    speed             0.173    0.049    3.518    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.936    0.067   73.473    0.000
   .x2                6.088    0.068   89.855    0.000
   .x3                2.250    0.065   34.579    0.000
   .x4                3.061    0.067   45.694    0.000
   .x5                4.341    0.074   58.452    0.000
   .x6                2.186    0.063   34.667    0.000
   .x7                4.186    0.063   66.766    0.000
   .x8                5.527    0.058   94.854    0.000
   .x9                5.374    0.058   92.546    0.000
    visual            0.000                           
    textual           0.000                           
    speed             0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.549    0.114    4.833    0.000
   .x2                1.134    0.102   11.146    0.000
   .x3                0.844    0.091    9.317    0.000
   .x4                0.371    0.048    7.779    0.000
   .x5                0.446    0.058    7.642    0.000
   .x6                0.356    0.043    8.277    0.000
   .x7                0.799    0.081    9.823    0.000
   .x8                0.488    0.074    6.573    0.000
   .x9                0.566    0.071    8.003    0.000
    visual            0.809    0.145    5.564    0.000
    textual           0.979    0.112    8.737    0.000
    speed             0.384    0.086    4.451    0.000

如上图所示，模型包含观测变量和潜变量的截距参数。lavaan包默认cfa()和sem()函数将潜变量截距固定为0，否则模型不能被估计。要注意的是均值结构模型中卡方统计量和自由度和原方程一样，原因是我们引入了9个观测变量的均值，也同时加上了9个额外的结截距参数。新老两模型的估计结果是完全一样的。在实践中，我们增加截距公式的唯一原因是因为一些约束必须被规定。例如，我们必须设定x1，x2，x3，x4的截距都为0.5时，语法如下：

# three-factor model 
  visual  =~ x1 + x2 + x3 
  textual =~ x4 + x5 + x6 
  speed   =~ x7 + x8 + x9
# intercepts with fixed values
  x1 + x2 + x3 + x4 ~ 0.5*1

8. 结构方程的多组分析

lavaan包完全支持多组分析。为了进行多组分析，你需要在数据组中加上组变量的名称。默认情况下，所有组会拟合相同的模型。在以下示例中，我们针对两个学校(Pasteur和Grant-White，可以看到数据集中有关学校的变量名为school)拟合例1的模型：

HS.model <- 'visual  =~ x1 + x2 + x3
             textual =~ x4 + x5 + x6
             speed   =~ x7 + x8 + x9'
fit <- cfa(HS.model,
           data = HolzingerSwineford1939,
           group = "school")
summary(fit)

结果如下：

lavaan (0.5-23.1097) converged normally after  57 iterations

  Number of observations per group         
  Pasteur                                          156
  Grant-White                                      145

  Estimator                                         ML
  Minimum Function Test Statistic              115.851
  Degrees of freedom                                48
  P-value (Chi-square)                           0.000

Chi-square for each group:

  Pasteur                                       64.309
  Grant-White                                   51.542

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard


Group 1 [Pasteur]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.394    0.122    3.220    0.001
    x3                0.570    0.140    4.076    0.000
  textual =~                                          
    x4                1.000                           
    x5                1.183    0.102   11.613    0.000
    x6                0.875    0.077   11.421    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.125    0.277    4.057    0.000
    x9                0.922    0.225    4.104    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.479    0.106    4.531    0.000
    speed             0.185    0.077    2.397    0.017
  textual ~~                                          
    speed             0.182    0.069    2.628    0.009

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.941    0.095   52.249    0.000
   .x2                5.984    0.098   60.949    0.000
   .x3                2.487    0.093   26.778    0.000
   .x4                2.823    0.092   30.689    0.000
   .x5                3.995    0.105   38.183    0.000
   .x6                1.922    0.079   24.321    0.000
   .x7                4.432    0.087   51.181    0.000
   .x8                5.563    0.078   71.214    0.000
   .x9                5.418    0.079   68.440    0.000
    visual            0.000                           
    textual           0.000                           
    speed             0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.298    0.232    1.286    0.198
   .x2                1.334    0.158    8.464    0.000
   .x3                0.989    0.136    7.271    0.000
   .x4                0.425    0.069    6.138    0.000
   .x5                0.456    0.086    5.292    0.000
   .x6                0.290    0.050    5.780    0.000
   .x7                0.820    0.125    6.580    0.000
   .x8                0.510    0.116    4.406    0.000
   .x9                0.680    0.104    6.516    0.000
    visual            1.097    0.276    3.967    0.000
    textual           0.894    0.150    5.963    0.000
    speed             0.350    0.126    2.778    0.005


Group 2 [Grant-White]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.736    0.155    4.760    0.000
    x3                0.925    0.166    5.583    0.000
  textual =~                                          
    x4                1.000                           
    x5                0.990    0.087   11.418    0.000
    x6                0.963    0.085   11.377    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.226    0.187    6.569    0.000
    x9                1.058    0.165    6.429    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.408    0.098    4.153    0.000
    speed             0.276    0.076    3.639    0.000
  textual ~~                                          
    speed             0.222    0.073    3.022    0.003

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.930    0.095   51.696    0.000
   .x2                6.200    0.092   67.416    0.000
   .x3                1.996    0.086   23.195    0.000
   .x4                3.317    0.093   35.625    0.000
   .x5                4.712    0.096   48.986    0.000
   .x6                2.469    0.094   26.277    0.000
   .x7                3.921    0.086   45.819    0.000
   .x8                5.488    0.087   63.174    0.000
   .x9                5.327    0.085   62.571    0.000
    visual            0.000                           
    textual           0.000                           
    speed             0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.715    0.126    5.676    0.000
   .x2                0.899    0.123    7.339    0.000
   .x3                0.557    0.103    5.409    0.000
   .x4                0.315    0.065    4.870    0.000
   .x5                0.419    0.072    5.812    0.000
   .x6                0.406    0.069    5.880    0.000
   .x7                0.600    0.091    6.584    0.000
   .x8                0.401    0.094    4.249    0.000
   .x9                0.535    0.089    6.010    0.000
    visual            0.604    0.160    3.762    0.000
    textual           0.942    0.152    6.177    0.000
    speed             0.461    0.118    3.910    0.000

如果你想要固定参数，或者自己设定初值，只需要用到如下所示的参数数列。如果只用一个参数的话，这个参数会被用到所有组：

 HS.model <- 'visual  =~ x1 + 0.5*x2 + c(0.6, 0.8)*x3
              textual =~ x4 + start(c(1.2, 0.6))*x5 + a*x6
              speed   =~ x7 + x8 + x9'

注意如上所示的a只会被用于第一组，如果想为每组提供不同的标签，可以使用c(a1, a2)*x6，要注意不能使用c(a, a), 它们会被视为一个相同的参数，使得结果出现问题(请看8.2，这会导致两组中textual =~ x6的估计相等)。

8.1 在部分组中固定参数

操作如下：

f =~ item1 + c(1,NA,1,1)*item2 + item3

8.2 约束一个参数使其在各组中相等

操作如下：

 HS.model <- 'visual  =~ x1 + x2 + c(v3,v3)*x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9'

8.3 约束一组参数使其在各组中相等

一个更方便的办法，通过group.equal()来做“组间相等约束”。例如，约束所有因子载荷使其在各组中相等(观测变量对潜变量在各组中的对应系数都相等，即latent variables中的estimate）：

HS.model <- 'visual  =~ x1 + x2 + x3
             textual =~ x4 + x5 + x6
             speed   =~ x7 + x8 + x9'
fit <- cfa(HS.model,
           data = HolzingerSwineford1939,
           group = "school",
           group.equal = c("loadings"))
summary(fit)

结果如下：

lavaan (0.5-23.1097) converged normally after  42 iterations

  Number of observations per group         
  Pasteur                                          156
  Grant-White                                      145

  Estimator                                         ML
  Minimum Function Test Statistic              124.044
  Degrees of freedom                                54
  P-value (Chi-square)                           0.000

Chi-square for each group:

  Pasteur                                       68.825
  Grant-White                                   55.219

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard


Group 1 [Pasteur]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2      (.p2.)    0.599    0.100    5.979    0.000
    x3      (.p3.)    0.784    0.108    7.267    0.000
  textual =~                                          
    x4                1.000                           
    x5      (.p5.)    1.083    0.067   16.049    0.000
    x6      (.p6.)    0.912    0.058   15.785    0.000
  speed =~                                            
    x7                1.000                           
    x8      (.p8.)    1.201    0.155    7.738    0.000
    x9      (.p9.)    1.038    0.136    7.629    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.416    0.097    4.271    0.000
    speed             0.169    0.064    2.643    0.008
  textual ~~                                          
    speed             0.176    0.061    2.882    0.004

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.941    0.093   52.991    0.000
   .x2                5.984    0.100   60.096    0.000
   .x3                2.487    0.094   26.465    0.000
   .x4                2.823    0.093   30.371    0.000
   .x5                3.995    0.101   39.714    0.000
   .x6                1.922    0.081   23.711    0.000
   .x7                4.432    0.086   51.540    0.000
   .x8                5.563    0.078   71.087    0.000
   .x9                5.418    0.079   68.153    0.000
    visual            0.000                           
    textual           0.000                           
    speed             0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.551    0.137    4.010    0.000
   .x2                1.258    0.155    8.117    0.000
   .x3                0.882    0.128    6.884    0.000
   .x4                0.434    0.070    6.238    0.000
   .x5                0.508    0.082    6.229    0.000
   .x6                0.266    0.050    5.294    0.000
   .x7                0.849    0.114    7.468    0.000
   .x8                0.515    0.095    5.409    0.000
   .x9                0.658    0.096    6.865    0.000
    visual            0.805    0.171    4.714    0.000
    textual           0.913    0.137    6.651    0.000
    speed             0.305    0.078    3.920    0.000


Group 2 [Grant-White]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2      (.p2.)    0.599    0.100    5.979    0.000
    x3      (.p3.)    0.784    0.108    7.267    0.000
  textual =~                                          
    x4                1.000                           
    x5      (.p5.)    1.083    0.067   16.049    0.000
    x6      (.p6.)    0.912    0.058   15.785    0.000
  speed =~                                            
    x7                1.000                           
    x8      (.p8.)    1.201    0.155    7.738    0.000
    x9      (.p9.)    1.038    0.136    7.629    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.437    0.099    4.423    0.000
    speed             0.314    0.079    3.958    0.000
  textual ~~                                          
    speed             0.226    0.072    3.144    0.002

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                4.930    0.097   50.763    0.000
   .x2                6.200    0.091   68.379    0.000
   .x3                1.996    0.085   23.455    0.000
   .x4                3.317    0.092   35.950    0.000
   .x5                4.712    0.100   47.173    0.000
   .x6                2.469    0.091   27.248    0.000
   .x7                3.921    0.086   45.555    0.000
   .x8                5.488    0.087   63.257    0.000
   .x9                5.327    0.085   62.786    0.000
    visual            0.000                           
    textual           0.000                           
    speed             0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.645    0.127    5.084    0.000
   .x2                0.933    0.121    7.732    0.000
   .x3                0.605    0.096    6.282    0.000
   .x4                0.329    0.062    5.279    0.000
   .x5                0.384    0.073    5.270    0.000
   .x6                0.437    0.067    6.576    0.000
   .x7                0.599    0.090    6.651    0.000
   .x8                0.406    0.089    4.541    0.000
   .x9                0.532    0.086    6.202    0.000
    visual            0.722    0.161    4.490    0.000
    textual           0.906    0.136    6.646    0.000
    speed             0.475    0.109    4.347    0.000

group.equal()的其他的参数就不翻译了，太累。

intercepts: the intercepts of the observed variables
means: the intercepts/means of the latent variables
residuals: the residual variances of the observed variables
residual.covariances: the residual covariances of the observed variables
lv.variances: the (residual) variances of the latent variables
lv.covariances: the (residual) covariances of the latent varibles
regressions: all regression coefficients in the model

8.4 测量不变性

因子分析的测量不变性(Measurement Invariance)是指测验的观察变量与潜变量之间的关系在不同组别或时间点上保持不变。测量不变性的成立是进行有意义组间比较的重要前提。可以用semTools包的measurementInvariance()函数来按特定次序执行多组分析，每个模型都与基线模型比较并且对先前的模型进行卡方差异性检验，并且展示拟合度量差别。不过这个方法仍旧比较原始。

library(semTools)
measurementInvariance(HS.model,
                      data = HolzingerSwineford1939,
                      group = "school")

9. 增长曲线模型

潜变量模型的另一个重要种类是潜增长曲线模型。增长模型常用于分析纵向和发展数据。这种数据的结果度量基于几个场景，我们需要研究跨时间变化。在许多情况下，随时间变化的轨迹可以用简单线性或二次函数曲线来模拟。随机效应用来捕捉个体间差异，这种随机效应可以很方便地用潜变量表示，也叫做增长因子。

在以下例子中，我们使用人工数据Demo.growth()，包含4个时间点。为了在4个时间点上拟合线性增长模型，我们需要用到含有一个随机截距潜变量和一个随机斜率潜变量的模型。

模型如下：

# linear growth model with 4 timepoints
# intercept and slope with fixed coefficients
 i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
 s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4

数据格式如下：

代码如下：

model <- 'i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
          s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4'
fit <- growth(model, data = Demo.growth)
summary(fit)

结果如下：

lavaan (0.5-23.1097) converged normally after  29 iterations

  Number of observations                           400

  Estimator                                         ML
  Minimum Function Test Statistic                8.069
  Degrees of freedom                                 5
  P-value (Chi-square)                           0.152

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  i =~                                                
    t1                1.000                           
    t2                1.000                           
    t3                1.000                           
    t4                1.000                           
  s =~                                                
    t1                0.000                           
    t2                1.000                           
    t3                2.000                           
    t4                3.000                           

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  i ~~                                                
    s                 0.618    0.071    8.686    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .t1                0.000                           
   .t2                0.000                           
   .t3                0.000                           
   .t4                0.000                           
    i                 0.615    0.077    8.007    0.000
    s                 1.006    0.042   24.076    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .t1                0.595    0.086    6.944    0.000
   .t2                0.676    0.061   11.061    0.000
   .t3                0.635    0.072    8.761    0.000
   .t4                0.508    0.124    4.090    0.000
    i                 1.932    0.173   11.194    0.000
    s                 0.587    0.052   11.336    0.000

从技巧上来看，growth()函数与sem()几乎相同。但是其自动包含了均值结构假设，并默认观测截距固定为0，潜变量结局和均值自由估计。现在我们再加入两个变量x1，x2，发展出以下模型：

代码如下：

# a linear growth model with a time-varying covariate
model <- '# intercept and slope with fixed coefficients
          i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
          s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
          # regressions
          i ~ x1 + x2
          s ~ x1 + x2
          # time-varying covariates
          t1 ~ c1
          t2 ~ c2
          t3 ~ c3
          t4 ~ c4'
          
fit <- growth(model, data = Demo.growth)
summary(fit)

10. 使用分类变量

对于外生分类变量：直接转换成虚拟变量。

对于内生分类变量：可以直接处理，目前只有WLS方法可用。

 fit <- cfa(myModel, data = myData,
           ordered=c("item1","item2",
                     "item3","item4"))

11. 将协方差矩阵作为输入

在没有完整数据集但有协方差矩阵时可以使用，如下例：

# 输入协方差矩阵
lower <- '
 11.834
  6.947   9.364
  6.819   5.091  12.532
  4.783   5.028   7.495   9.986
 -3.839  -3.889  -3.841  -3.625  9.610
-21.899 -18.831 -21.748 -18.775 35.522 450.288'

# 用getCov建立完整协方差矩阵，输出是一个矩阵
wheaton.cov <- 
getCov(lower, names = c("anomia67", "powerless67",
                        "anomia71", "powerless71",
                        "education", "sei"))

# 模型描述
wheaton.model <- '
  # latent variables
    ses     =~ education + sei
    alien67 =~ anomia67 + powerless67
    alien71 =~ anomia71 + powerless71
  # regressions
    alien71 ~ alien67 + ses
    alien67 ~ ses
  # correlated residuals
    anomia67 ~~ anomia71
    powerless67 ~~ powerless71'
    
fit <- sem(wheaton.model,
           sample.cov = wheaton.cov,
           sample.nobs = 932)
summary(fit, standardized = TRUE)

结果如下：

lavaan (0.5-23.1097) converged normally after  73 iterations

  Number of observations                           932

  Estimator                                         ML
  Minimum Function Test Statistic                4.735
  Degrees of freedom                                 4
  P-value (Chi-square)                           0.316

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  ses =~                                                                
    education         1.000                               2.607    0.842
    sei               5.219    0.422   12.364    0.000   13.609    0.642
  alien67 =~                                                            
    anomia67          1.000                               2.663    0.774
    powerless67       0.979    0.062   15.895    0.000    2.606    0.852
  alien71 =~                                                            
    anomia71          1.000                               2.850    0.805
    powerless71       0.922    0.059   15.498    0.000    2.628    0.832

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  alien71 ~                                                             
    alien67           0.607    0.051   11.898    0.000    0.567    0.567
    ses              -0.227    0.052   -4.334    0.000   -0.207   -0.207
  alien67 ~                                                             
    ses              -0.575    0.056  -10.195    0.000   -0.563   -0.563

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .anomia67 ~~                                                           
   .anomia71          1.623    0.314    5.176    0.000    1.623    0.356
 .powerless67 ~~                                                        
   .powerless71       0.339    0.261    1.298    0.194    0.339    0.121

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .education         2.801    0.507    5.525    0.000    2.801    0.292
   .sei             264.597   18.126   14.597    0.000  264.597    0.588
   .anomia67          4.731    0.453   10.441    0.000    4.731    0.400
   .powerless67       2.563    0.403    6.359    0.000    2.563    0.274
   .anomia71          4.399    0.515    8.542    0.000    4.399    0.351
   .powerless71       3.070    0.434    7.070    0.000    3.070    0.308
    ses               6.798    0.649   10.475    0.000    1.000    1.000
   .alien67           4.841    0.467   10.359    0.000    0.683    0.683
   .alien71           4.083    0.404   10.104    0.000    0.503    0.503

12. 估计方法，标准误差和缺失值

12.1 估计方法

如果所有数据都是连续的，默认估计方法是最大似然法，还有一些其他方法：

"GLS": generalized least squares. 只能用于完整数据
"WLS": weighted least squares (sometimes called ADF estimation). 只能用于完整数据
"DWLS": diagonally weighted least squares
"ULS": unweighted least squares

许多估计方法都有“稳健”变体，意味着它们提供稳健标准误差和缩放的检验统计量。例如，对于最大似然，lavaan提供如下稳健变体：

"MLM": maximum likelihood estimation with robust standard errors and a Satorra-Bentler scaled test statistic. 只能用于完整数据
"MLMVS": maximum likelihood estimation with robust standard errors and a mean- and variance adjusted test statistic (aka the Satterthwaite approach). 只能用于完整数据
"MLMV": maximum likelihood estimation with robust standard errors and a mean- and variance adjusted test statistic (using a scale-shifted approach). 只能用于完整数据
"MLF": for maximum likelihood estimation with standard errors based on the first-order derivatives, and a conventional test statistic. 可以用于完整和非完整数据
"MLR": maximum likelihood estimation with robust (Huber-White) standard errors and a scaled test statistic that is (asymptotically) equal to the Yuan-Bentler test statistic. 可以用于完整和非完整数据

12.2 最大似然估计

lavaan默认使用有偏样本协方差矩阵，与Mplus所使用方法相似(“normal”)。我们也可以使用likelihood = "wishart"来使用无偏协方差用以接近EQS, LISREL或AMOS的计算结果(“Wishart”)：

fit <- cfa(HS.model,
           data = HolzingerSwineford1939,
fit

12.3 缺失值

默认为列表状态删除，在进行统计量的计算时，把含有缺失值的记录删除，这种方法可以用于计算全体无缺失值数据的均数、协方差和标准差。

13. 间接效应和中介分析

设想一个典型的中介分析，X->M->Y，随机生成人工数据，示例如下：

set.seed(1234)
X <- rnorm(100)
M <- 0.5*X + rnorm(100)
Y <- 0.7*M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M = M)
model <- '# direct effect
            Y ~ c*X
          # mediator
            M ~ a*X
            Y ~ b*M
          # indirect effect (a*b)
            ab := a*b
          # total effect
            total := c + (a*b)'
            
fit <- sem(model, data = Data)
summary(fit)

结果如下：

lavaan (0.5-23.1097) converged normally after  12 iterations

  Number of observations                           100

  Estimator                                         ML
  Minimum Function Test Statistic                0.000
  Degrees of freedom                                 0
  Minimum Function Value               0.0000000000000

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  Y ~                                                 
    X          (c)    0.036    0.104    0.348    0.728
  M ~                                                 
    X          (a)    0.474    0.103    4.613    0.000
  Y ~                                                 
    M          (b)    0.788    0.092    8.539    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Y                 0.898    0.127    7.071    0.000
   .M                 1.054    0.149    7.071    0.000

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    ab                0.374    0.092    4.059    0.000
    total             0.410    0.125    3.287    0.001

14. 修正指标

可以通过在summary()函数中增加参数modindices = TRUE，来得出修正指标，例如：

fit <- cfa(HS.model,
           data = HolzingerSwineford1939)
mi <- modindices(fit)
mi[mi$op == "=~",]

结果为：

       lhs op rhs     mi    epc sepc.lv sepc.all sepc.nox
25  visual =~  x4  1.211  0.077   0.069    0.059    0.059
26  visual =~  x5  7.441 -0.210  -0.189   -0.147   -0.147
27  visual =~  x6  2.843  0.111   0.100    0.092    0.092
28  visual =~  x7 18.631 -0.422  -0.380   -0.349   -0.349
29  visual =~  x8  4.295 -0.210  -0.189   -0.187   -0.187
30  visual =~  x9 36.411  0.577   0.519    0.515    0.515
31 textual =~  x1  8.903  0.350   0.347    0.297    0.297
32 textual =~  x2  0.017 -0.011  -0.011   -0.010   -0.010
33 textual =~  x3  9.151 -0.272  -0.269   -0.238   -0.238
34 textual =~  x7  0.098 -0.021  -0.021   -0.019   -0.019
35 textual =~  x8  3.359 -0.121  -0.120   -0.118   -0.118
36 textual =~  x9  4.796  0.138   0.137    0.136    0.136
37   speed =~  x1  0.014  0.024   0.015    0.013    0.013
38   speed =~  x2  1.580 -0.198  -0.123   -0.105   -0.105
39   speed =~  x3  0.716  0.136   0.084    0.075    0.075
40   speed =~  x4  0.003 -0.005  -0.003   -0.003   -0.003
41   speed =~  x5  0.201 -0.044  -0.027   -0.021   -0.021
42   speed =~  x6  0.273  0.044   0.027    0.025    0.025

再根据MI(一般以4为标准)和EPC值进行调整。

15. 从拟合方程中提取信息

summary()：结果概览
parameterEstimates(fit)：参数估计

 fit <- cfa(HS.model, data=HolzingerSwineford1939)
parameterEstimates(fit)

standardizedSolution()：标准化参数估计

 fit <- cfa(HS.model, data=HolzingerSwineford1939)
standardizedSolution(fit)

fitted()：拟合协方差矩阵和均值向量

 fit <- cfa(HS.model, data=HolzingerSwineford1939)
fitted(fit)

resid()：拟合模型残差，[观测协方差矩阵/均值向量]和[隐含协方差矩阵/均值向量]的差

 fit <- cfa(HS.model, data=HolzingerSwineford1939)
resid(fit, type="standardized")

fitMeasures()：返回所有拟合度量指标

 fit <- cfa(HS.model, data=HolzingerSwineford1939)
fitMeasures(fit)
fitMeasures(fit, "cfi") # 提取某个具体指标，例如cfi

inspect()：查看拟合的lavaan对象具体信息

基于R语言的结构方程：lavaan简明教程 [中文翻译版]

lavaan简明教程 [中文翻译版]

目录

摘要

1. 在开始之前

2. 安装lavaan包

3. 模型语法

4. 例1：验证性因子分析(CFA)

5. 例2：结构方程(SEM)

6. 更多关于语法的内容

6.1 固定参数

6.2 初值

6.3 参数标签

6.4 修改器

6.5 简单相等约束

6.6 非线性相等和不相等约束

7. 引入平均值

8. 结构方程的多组分析

8.1 在部分组中固定参数

8.2 约束一个参数使其在各组中相等

8.3 约束一组参数使其在各组中相等

8.4 测量不变性

9. 增长曲线模型

10. 使用分类变量

11. 将协方差矩阵作为输入

12. 估计方法，标准误差和缺失值

12.1 估计方法

12.2 最大似然估计

12.3 缺失值

13. 间接效应和中介分析

14. 修正指标

15. 从拟合方程中提取信息

猜你喜欢