Original link: http://tecdat.cn/?p=6349
This week I was discussing with a friend how to handle covariates with missing values in structural equation modeling (SEM) software. My friend thought that some SEM packages implement so-called "full information maximum likelihood" (FIML), which automatically accommodates missing values in covariates. In this post I explore how Stata's sem command handles missing covariates.
To study how missing covariates are handled, I consider the simplest case: an outcome Y and a single covariate X, where Y given X follows a simple linear regression model. First, we simulate a large data set, so that we know the true parameter values.
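The original listing is not reproduced here. A minimal Stata sketch of such a simulation might look as follows; the seed and the choice X ~ N(0,1) are assumptions made for illustration, not necessarily the author's values.

```stata
* Sketch: simulate 10,000 records from Y = X + e
* (the seed and the N(0,1) distribution of X are illustrative assumptions)
clear
set seed 12345
set obs 10000
* covariate X, standard normal
generate x = rnormal()
* outcome Y: intercept 0, slope 1, residual variance 1
generate y = x + rnormal()
* sanity check on the complete data: the slope should be close to 1
regress y x
```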
Here the true intercept is 0, the true slope is 1, and the residual variance is 1. Next, let's set some covariate values to missing. To do this, we use a missingness mechanism in which the probability of missingness depends on the (fully observed) outcome Y. This means the mechanism satisfies the so-called missing at random (MAR) assumption. Specifically, the probability that X is observed is calculated from a logistic regression model with Y as the only covariate.
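As a hedged sketch of this step, the following Stata code regenerates the data and then deletes X with a probability that depends on Y. The logistic coefficients (-2 and 2) and the indicator name r are hypothetical choices for illustration; the source does not state the values used.

```stata
* Sketch: impose MAR missingness on X, driven by the fully observed Y
* (seed, coefficients -2 and 2, and the name r are illustrative assumptions)
clear
set seed 12345
set obs 10000
generate x = rnormal()
generate y = x + rnormal()
* r = 1 if X is observed, with P(r = 1 | Y) = invlogit(-2 + 2*Y)
generate byte r = runiform() < invlogit(-2 + 2*y)
replace x = . if r == 0
* how many covariate values are now missing
count if missing(x)
```

Because the probability of observing X depends only on Y, which is never missing, the mechanism is MAR by construction.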
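Now we can use Stata's sem command to fit the SEM.
In the absence of missing values, the sem command by default estimates the model parameters by maximum likelihood. When values are missing, the default is to drop incomplete records, and because missingness here depends on the outcome Y, the resulting complete-case estimates are biased.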
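A minimal sketch of this fit is given below. It first rebuilds the simulated data with an illustrative missingness model (the seed and the coefficients -2 and 2 are assumptions, not the author's values) so that the block runs on its own.

```stata
* Sketch: default sem fit with a partially missing covariate
* (data-generation values are illustrative assumptions)
clear
set seed 12345
set obs 10000
generate x = rnormal()
generate y = x + rnormal()
replace x = . if runiform() >= invlogit(-2 + 2*y)
* linear regression of y on x expressed as an SEM path;
* the default method(ml) uses only records with x and y both observed
sem (y <- x)
```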
But sem has another option, which allows us to fit the model using the observed data from all 10,000 records. From the command line, it is selected by specifying method(mlmv), that is, maximum likelihood with missing values.
The estimates are now unbiased.
We obtain unbiased estimates (for this data-generating setup) because Stata's sem command assumes, here correctly, that X and Y are jointly normal, and because the missingness satisfies the MAR assumption.
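A hedged sketch of the corresponding call follows. The data setup repeats the illustrative simulation (the seed and missingness coefficients are assumptions), and only the method() option differs from a default fit.

```stata
* Sketch: sem using all records via maximum likelihood with missing values
* (data-generation values are illustrative assumptions)
clear
set seed 12345
set obs 10000
generate x = rnormal()
generate y = x + rnormal()
replace x = . if runiform() >= invlogit(-2 + 2*y)
* method(mlmv) keeps records with missing x and assumes joint normality
sem (y <- x), method(mlmv)
```

With method(mlmv), records with a missing x still contribute through their observed y, which is why all 10,000 observations enter the likelihood.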
Non-normal X
Let us now re-run the simulation, but this time let X follow a chi-squared distribution with one degree of freedom, generated by squaring a draw from rnormal().
We then run sem again with the missing-values option.
The estimates are now biased again, because the assumption that Y and X are jointly normal no longer holds. So if we use this option when covariates are missing, the joint normality assumption is crucial.
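The steps of this second simulation can be sketched as follows. As before, the seed and the missingness coefficients (-2 and 2) are illustrative assumptions rather than the author's values.

```stata
* Sketch: chi-squared(1) covariate, MAR missingness, sem with method(mlmv)
* (seed and missingness coefficients are illustrative assumptions)
clear
set seed 12345
set obs 10000
* X is the square of a standard normal draw, i.e. chi-squared with 1 df
generate x = rnormal()^2
generate y = x + rnormal()
* X is observed with probability invlogit(-2 + 2*Y)
replace x = . if runiform() >= invlogit(-2 + 2*y)
* joint normality of (X, Y) is now violated
sem (y <- x), method(mlmv)
```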
Missing completely at random
Let's run one last simulation, again with X chi-squared distributed, but now with X missing completely at random (MCAR).
Although the joint normality assumption is violated, the estimates are now unbiased again. I think this is because when the data are MCAR, the mean and covariance structure can be estimated consistently even when the normality assumption is violated.
Thank you for reading this article. If you have any questions, please leave a comment below!
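A minimal sketch of the MCAR variant is shown below. The 50% missingness probability and the seed are assumptions chosen for illustration; the source does not state the proportion used.

```stata
* Sketch: chi-squared(1) covariate with MCAR missingness
* (the seed and the 0.5 missingness probability are illustrative assumptions)
clear
set seed 12345
set obs 10000
generate x = rnormal()^2
generate y = x + rnormal()
* each X value is deleted with the same probability, regardless of X or Y
replace x = . if runiform() < 0.5
sem (y <- x), method(mlmv)
```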
Big Data Tribe: a professional third-party data service provider in China, offering customized one-stop data mining and statistical analysis consulting services
Statistical analysis and data mining consulting services: y0.cn/teradat (for consulting services, please contact customer service on the official website)
[Service scenarios]
Scientific research; company outsourcing; online and offline one-on-one training; data collection; academic research; report writing; market research.
[Big Data Tribe] provides customized one-stop data mining and statistical analysis consulting
You are welcome to take our R language course on essential data analysis and mining skills!