How Stata's structural equation modeling (SEM) command deals with covariates with missing values

Original link: http://tecdat.cn/?p=6349

 

This week I was discussing with a friend how structural equation modeling (SEM) software handles covariates with missing values. My friend thought that some SEM packages can use so-called "full information maximum likelihood" to accommodate missingness in covariates automatically. Below I explore how Stata's sem command handles missing covariates.

 

To study how missing covariates are handled, I will consider the simplest possible case: an outcome Y and a single covariate X, where Y given X follows a simple linear regression model. First we simulate a large data set, so that we know the true parameter values:

clear
set obs 10000
gen x = rnormal()
gen y = x + rnormal()
 

Here the true intercept is 0, the true slope is 1, and the residual error variance is 1. Next, let's make some covariate values missing. To do this, we will use a missingness mechanism in which the probability of being missing depends on the (fully observed) outcome Y. This means the mechanism satisfies the so-called missing at random (MAR) assumption. Specifically, the probability that X is observed is calculated from a logistic regression model with Y as the only covariate:

gen rxb = -2 + 2*y
gen rpr = exp(rxb)/(1+exp(rxb))
gen r = (runiform() < rpr)
replace x = . if r==0
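To see why this missingness mechanism matters, here is a minimal Python sketch (my own illustration; the post itself uses Stata) that replicates the simulation and shows that a complete-case regression of Y on X gives a biased slope:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)

# P(observe x) follows a logistic model in y, so missingness is MAR
pr = 1 / (1 + np.exp(-(-2 + 2 * y)))
observed = rng.uniform(size=n) < pr

# complete-case OLS slope of y on x (dropping rows with missing x)
xc, yc = x[observed], y[observed]
slope = np.cov(xc, yc)[0, 1] / np.var(xc, ddof=1)
print(round(slope, 2))  # well below the true value of 1
```

Because the chance of observing X rises with Y, the complete cases are a selected sample, and the slope estimate is attenuated, much like the 0.62 estimate Stata produces below.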

Now we can fit the SEM using Stata's sem command:

sem (y <- x)
(7270 observations with missing values excluded)

Endogenous variables

Observed:  y

Exogenous variables

Observed:  x

Fitting target model:

Iteration 0:   log likelihood = -6732.1256  
Iteration 1:   log likelihood = -6732.1256  

Structural equation model                       Number of obs      =      2730
Estimation method  = ml
Log likelihood     = -6732.1256

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  y <-       |
           x |   .6179208   .0179671    34.39   0.000      .582706    .6531355
       _cons |    .999025   .0200306    49.88   0.000     .9597658    1.038284
-------------+----------------------------------------------------------------
     var(e.y)|   .6472101   .0175178                      .6137707    .6824714
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
 

When there are missing values, the sem command by default estimates the model parameters by maximum likelihood using only the complete cases.

But sem has another option, method(mlmv) (maximum likelihood with missing values, often called full information maximum likelihood), which allows us to use the observed data from all 10,000 records to fit the model. From the command line we can request it as follows:

sem (y <- x), method(mlmv)

 

*output cut

Structural equation model                       Number of obs      =     10000
Estimation method  = mlmv
Log likelihood     = -20549.731

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  y <-       |
           x |   .9804851   .0156235    62.76   0.000     .9498637    1.011107
       _cons |  -.0145543    .025363    -0.57   0.566    -.0642649    .0351562
-------------+----------------------------------------------------------------
      mean(x)|   .0032305   .0257089     0.13   0.900     -.047158    .0536189
-------------+----------------------------------------------------------------
     var(e.y)|    1.02696   .0324877                      .9652191     1.09265
       var(x)|   .9847265   .0314871                       .924907    1.048415
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .

The estimates are now unbiased.

Thus, we obtain unbiased estimates (for this data-generating setup) because Stata's sem command (here correctly) assumes joint normality of X and Y, and the missingness satisfies the MAR assumption.
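Intuitively, this works because the likelihood can be factored: since missingness depends only on the fully observed Y, the marginal distribution of Y can be estimated from all cases and the conditional distribution of X given Y from the complete cases; recombining them recovers the joint normal distribution, and hence the Y-on-X regression. Here is a Python sketch of this factored-likelihood idea (a moment-based illustration of the principle, not Stata's actual mlmv implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)

# missingness in x depends only on the fully observed y (MAR)
pr = 1 / (1 + np.exp(-(-2 + 2 * y)))
obs = rng.uniform(size=n) < pr

# marginal of y: estimable from all n cases, since y is fully observed
mu_y, var_y = y.mean(), y.var(ddof=1)

# conditional x | y: estimable from the complete cases, which is valid
# because missingness depends only on y
xc, yc = x[obs], y[obs]
beta = np.cov(xc, yc)[0, 1] / np.var(yc, ddof=1)
alpha = xc.mean() - beta * yc.mean()
sigma2 = np.mean((xc - alpha - beta * yc) ** 2)

# recombine into the implied joint normal, then read off the y-on-x regression
cov_xy = beta * var_y
var_x = beta ** 2 * var_y + sigma2
mean_x = alpha + beta * mu_y
slope_y_on_x = cov_xy / var_x
intercept = mu_y - slope_y_on_x * mean_x
print(round(slope_y_on_x, 2), round(intercept, 2))  # near the true values 1 and 0
```

This recovery only works because, under joint normality, the conditional distribution of X given Y really is linear with constant variance; that is exactly the assumption that fails in the next section.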

Non-normal X
Let us now re-run the simulation, but with X following a chi-squared distribution with one degree of freedom, generated as the square of a draw from rnormal():

clear
set seed 6812312
set obs 10000
gen x=(rnormal())^2
gen y=x+rnormal()

gen rxb = -2 + 2*y
gen rpr = exp(rxb)/(1+exp(rxb))
gen r = (runiform() < rpr)
replace x = . if r==0

Running sem with the method(mlmv) option, we get:

 

*output cut

Structural equation model                       Number of obs      =     10000
Estimation method  = mlmv
Log likelihood     = -25316.281

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  y <-       |
           x |   .8281994   .0066085   125.32   0.000      .815247    .8411518
       _cons |   .4792567   .0161389    29.70   0.000      .447625    .5108883
-------------+----------------------------------------------------------------
      mean(x)|   .5842649   .0224815    25.99   0.000     .5402019    .6283279
-------------+----------------------------------------------------------------
     var(e.y)|   .7537745   .0157842                      .7234643    .7853546
       var(x)|   3.073801   .0551011                       2.96768    3.183717
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .

Now our estimates are biased again, because the assumption of joint normality of Y and X is no longer valid. So when we have missing covariates and use this option, the joint normality assumption is crucial.

Missing completely at random

Let's run the simulation one last time, again with X following the chi-squared distribution, but now with X missing completely at random (MCAR):

 

clear
set obs 10000
gen x = (rnormal())^2
gen y = x + rnormal()
replace x = . if (runiform() < 0.5)

*output cut

Structural equation model                       Number of obs      =     10000
Estimation method  = mlmv
Log likelihood     = -25495.152

------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
  y <-       |
           x |   .9985166   .0093366   106.95   0.000     .9802173    1.016816
       _cons |  -.0092478   .0158659    -0.58   0.560    -.0403445    .0218488
-------------+----------------------------------------------------------------
      mean(x)|   .9738369   .0158113    61.59   0.000     .9428474    1.004826
-------------+----------------------------------------------------------------
     var(e.y)|   1.033884    .020162                      .9951133    1.074166
       var(x)|    1.83369   .0330307                       1.77008    1.899585
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .

Although the joint normality assumption is violated, the estimates are now again unbiased. I believe this is because when the data are MCAR, the mean and covariance structure can be estimated consistently even when the normality assumption is violated.
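As a quick check of that intuition, here is a Python sketch (again my own illustration, not the post's Stata code) showing that under MCAR even a simple complete-case regression recovers the slope despite the non-normal covariate:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.normal(size=n) ** 2          # chi-squared with 1 degree of freedom
y = x + rng.normal(size=n)

# drop x completely at random (MCAR) for about half the observations
observed = rng.uniform(size=n) < 0.5

# complete-case OLS slope of y on x
xc, yc = x[observed], y[observed]
slope = np.cov(xc, yc)[0, 1] / np.var(xc, ddof=1)
print(round(slope, 2))  # close to the true value of 1
```

Under MCAR the complete cases are a simple random subsample, so sample means and covariances, and hence the regression slope, remain consistent regardless of the shape of X's distribution.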

Thank you for reading this article; if you have any questions, please leave a comment below!

  

Big Data Tribe - a professional Chinese third-party data service provider offering customized one-stop data mining and statistical analysis consulting services

Statistical analysis and data mining consulting: y0.cn/teradat (for consulting services, please contact customer service on the official website)

QQ: 3025393450

[Service scenarios]

Research; corporate outsourcing; online and offline training; data collection; academic research; report writing; market research.

[Big Data Tribe] provides customized one-stop data mining and statistical analysis consulting

Welcome to enroll in our R language data analysis and mining courses!

 


 


Origin www.cnblogs.com/tecdat/p/11466303.html