Regression in R: modeling heteroscedastic (non-constant) error variance

Original link: http://tecdat.cn/?p=10207


 

When we apply OLS regression in the social sciences, one assumption we make is homoscedasticity, which I prefer to call constant error variance. It means that the error variance shows no systematic pattern, so the model predicts with the same spread of error at all levels of the outcome.

Heteroscedasticity, non-constant error variance, is its complement. It does not bias the OLS coefficients, only their standard errors, so if, unlike most people in the social sciences, you do not care about p-values, heteroscedasticity may not be a problem for you.

Econometricians have developed a variety of heteroscedasticity-consistent standard errors, so that they can keep applying OLS while correcting for non-constant error variance. The Wikipedia page on these corrections lists a number of the alternative names these standard errors go by.

R offers functions for maximum likelihood estimation, notably mle() in the stats4 package and mle2() in the bbmle package, which I use below. We provide a likelihood function, and both functions will find the parameter values that maximize it.

Let's look at a simple example:

First, I draw 500 observations from a normal distribution with mean 3 and standard deviation 1.5, and save them in a data frame:

dat <- data.frame(y = rnorm(n = 500, mean = 3, sd = 1.5))

The sample mean and standard deviation are:

mean(dat$y)
[1] 2.999048

sd(dat$y)
[1] 1.462059

I can now ask: what mean and standard deviation parameters of a normal distribution maximize the likelihood of the observed data?

library(bbmle)  # provides mle2()

m.sd <- mle2(y ~ dnorm(mean = a, sd = exp(b)), data = dat,
             start = list(a = rnorm(1), b = rnorm(1)))

In the syntax above, the mean of the variable y is a constant, a, and the standard deviation of y is a constant, exp(b). The standard deviation is exponentiated to ensure it can never be negative. We provide starting values so the estimation has somewhere to begin before converging on the estimates that maximize the likelihood; random numbers are good enough as starting values here.

m.sd

Call:
mle2(minuslogl = y ~ dnorm(mean = a, sd = exp(b)), start = list(a = rnorm(1),
    b = rnorm(1)), data = dat)

Coefficients:
        a         b
2.9990478 0.3788449

Log-likelihood: -898.89

The coefficient a is very close to the sample mean. We have to exponentiate the coefficient b to obtain the standard deviation:

exp(coef(m.sd)[2])
       b
1.460596

This is close to the standard deviation we obtained above. Another convenience is that functions familiar from lm() objects, such as coef() and summary(), also work on mle2() objects.

The maximum likelihood estimation above is equivalent to OLS estimation of an intercept-only regression model:

coef(lm(y ~ 1, dat))
(Intercept)
   2.999048

sigma(lm(y ~ 1, dat))
[1] 1.462059

The intercept is the mean of the data, and the residual standard deviation is the standard deviation.

A heteroscedastic regression model

Consider the following study. We have two groups: a treatment group with 30 individuals and a control group with 100 individuals, and the groups are comparable on relevant covariates. We are interested in the treatment effect, and let us assume that a simple difference in means is adequate. As it happens, the treatment is not only effective but also homogenizing: for example, it brings treated subjects to similarly improved outcomes. The following data set fits this scheme:
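The original generating code and seed are not shown; a minimal sketch, assuming the group sizes and parameters described below:

set.seed(1234)  # hypothetical seed
dat <- data.frame(
  treat = c(rep(0, 100), rep(1, 30)),
  y = c(rnorm(100, mean = 0, sd = 1),     # control group
        rnorm(30, mean = 0.3, sd = 0.25)) # treatment group
)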


Treatment status is 0 (control group) for 100 participants, whose outcomes have mean 0 and standard deviation 1. Treatment status is 1 (treatment group) for 30 participants, whose outcomes have mean 0.3 and standard deviation 0.25.

This situation clearly violates the constant-variance assumption; nevertheless, we go ahead and estimate the treatment effect with OLS:
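Presumably the model was fit with lm(); a sketch (m.ols is referenced later in the post):

m.ols <- lm(y ~ treat, data = dat)
summary(m.ols)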


Call:
lm(formula = y ~ treat, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8734 -0.5055 -0.0287  0.4231  3.4097

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.03386    0.09298   0.364    0.716
treat        0.21733    0.19355   1.123    0.264

Residual standard error: 0.9298 on 128 degrees of freedom
Multiple R-squared:  0.009754,	Adjusted R-squared:  0.002018
F-statistic: 1.261 on 1 and 128 DF,  p-value: 0.2636

The treatment effect is 0.22 and is not statistically significant, p = .26, at an α level of .05. But we know the error variance is not constant, because we created the data, and a simple diagnostic plot of residuals versus fitted values confirms this:
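The original plotting code is not shown; a minimal sketch in base graphics:

plot(fitted(m.ols), resid(m.ols),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # reference line at zero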


[Figure: residuals versus fitted values from the OLS model; the treatment group's residuals are visibly less dispersed]

First, I recreate the OLS model using maximum likelihood:
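The call below is reconstructed from the model formula echoed in the likelihood ratio test output further down:

m.mle <- mle2(y ~ dnorm(mean = b_int + b_treat * treat, sd = exp(s1)),
              data = dat,
              start = list(b_int = rnorm(1), b_treat = rnorm(1), s1 = rnorm(1)))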


In this call, I model the mean of the outcome as a function of an intercept, b_int, and a coefficient for the treatment predictor, b_treat. The standard deviation is a constant, exponentiated. This model is equivalent to the linear model above.

However, we know that the variance is not constant but differs between the groups. We can specify the standard deviation as a function of group membership:
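A sketch of that specification, with random starting values:

m.het <- mle2(y ~ dnorm(mean = b_int + b_treat * treat,
                        sd = exp(s_int + s_treat * treat)),
              data = dat,
              start = list(b_int = rnorm(1), b_treat = rnorm(1),
                           s_int = rnorm(1), s_treat = rnorm(1)))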


Here we also give the standard deviation a model: an intercept, s_int, representing the control group, and a deviation from that intercept, s_treat, for the treatment group.

We can do better still: we can use the coefficients from the OLS model as starting values for b_int and b_treat. Running the model:
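The call, consistent with the one echoed in the output below:

m.het <- mle2(y ~ dnorm(mean = b_int + b_treat * treat,
                        sd = exp(s_int + s_treat * treat)),
              data = dat,
              start = list(b_int = coef(m.ols)[1], b_treat = coef(m.ols)[2],
                           s_int = rnorm(1), s_treat = rnorm(1)))
summary(m.het)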



Maximum likelihood estimation

Call:
mle2(minuslogl = y ~ dnorm(mean = b_int + b_treat * treat, sd = exp(s_int +
    s_treat * treat)), start = list(b_int = coef(m.ols)[1], b_treat = coef(m.ols)[2],
    s_int = rnorm(1), s_treat = rnorm(1)))

Coefficients:
         Estimate Std. Error  z value   Pr(z)    
b_int    0.033862   0.104470   0.3241 0.74584    
b_treat  0.217334   0.112249   1.9362 0.05285 .  
s_int    0.043731   0.070711   0.6184 0.53628    
s_treat -1.535894   0.147196 -10.4344 < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

-2 log L: 288.1408

The treatment effect is essentially unchanged, but it now has a p-value of .053, much smaller than the .26 from the analysis that assumed constant variance. The b_treat estimate is also much more precise: its standard error here, .11, is smaller than the earlier .19.

The model for the standard deviation implies that the standard deviation is:

exp(coef(m.het)[3])

   s_int
1.044701

1.045 for the control group, and:

exp(coef(m.het)[3] + coef(m.het)[4])

    s_int
0.2248858

.22 for the treatment group. These values are close to the values we know we simulated. We can confirm this with the sample statistics:
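The original code is not shown; aggregate() would produce output of the shape below:

aggregate(y ~ treat, data = dat, FUN = sd)  # SD of y by group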


  treat         y
1     0 1.0499657
2     1 0.2287307

We can easily compare the model that assumes constant variance with the model that allows heteroscedasticity:
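bbmle supplies an anova() method for mle2 objects, which is presumably what produced the likelihood ratio test below:

anova(m.mle, m.het)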


Likelihood Ratio Tests
Model 1: m.mle, y~dnorm(mean=b_int+b_treat*treat,sd=exp(s1))
Model 2: m.het, y~dnorm(mean=b_int+b_treat*treat,sd=exp(s_int+s_treat*treat))
  Tot Df Deviance  Chisq Df Pr(>Chisq)    
1      3   347.98                         
2      4   288.14 59.841  1  1.028e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The likelihood ratio test suggests that modeling the variance improved the model, χ²(1) = 59.84, p < .001.

So in this single example we can confirm that modeling the variance improves precision. When the effect is zero and we have heteroscedasticity, it is easy to write simulation code comparing the heteroscedastic MLE with the OLS estimate.

Working from the code above, I set the treatment-group mean to zero, so there is no mean difference between the two groups. I repeat this process 500 times, saving the treatment effect and its p-value from OLS, and the treatment effect and its p-value from the heteroscedastic MLE.
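The original simulation code is not shown; a sketch under the assumptions above (the seed, starting values, and bookkeeping are mine):

set.seed(2020)  # hypothetical seed
n.sims <- 500
res <- data.frame(ols.est = numeric(n.sims), ols.p = numeric(n.sims),
                  mle.est = numeric(n.sims), mle.p = numeric(n.sims))
for (i in seq_len(n.sims)) {
  d <- data.frame(
    treat = c(rep(0, 100), rep(1, 30)),
    y = c(rnorm(100, mean = 0, sd = 1),
          rnorm(30, mean = 0, sd = 0.25))  # treatment mean set to zero: null is true
  )
  fit.ols <- lm(y ~ treat, data = d)
  res$ols.est[i] <- coef(fit.ols)[2]
  res$ols.p[i]   <- summary(fit.ols)$coefficients["treat", "Pr(>|t|)"]
  fit.mle <- mle2(y ~ dnorm(mean = b_int + b_treat * treat,
                            sd = exp(s_int + s_treat * treat)),
                  data = d,
                  start = list(b_int = coef(fit.ols)[1],
                               b_treat = coef(fit.ols)[2],
                               s_int = 0, s_treat = 0))
  cf <- summary(fit.mle)@coef  # coefficient matrix from the bbmle summary
  res$mle.est[i] <- cf["b_treat", "Estimate"]
  res$mle.p[i]   <- cf["b_treat", "Pr(z)"]
}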

Then I plot the results:
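A sketch of plotting code, assuming the results are collected in res as in the sketch above:

par(mfrow = c(2, 2))
hist(res$ols.est, main = "OLS: treatment effect", xlab = "Estimate")
hist(res$mle.est, main = "MLE: treatment effect", xlab = "Estimate")
hist(res$ols.p,   main = "OLS: p-values", xlab = "p")
hist(res$mle.p,   main = "MLE: p-values", xlab = "p")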


par(mfrow = c(1, 1))  # reset the plotting layout

[Figure 1: distributions of the treatment effect estimates and p-values across the 500 null simulations, OLS versus heteroscedastic MLE]

The treatment effect estimates are similar for OLS and the heteroscedastic MLE. However, the p-values of the heteroscedastic MLE behave better: when the null is true, p-values should be uniformly distributed, while the OLS p-values pile up toward the high end.

This time, I repeat the process with a treatment-group mean of 0.15, so the null hypothesis of zero effect is false.

[Figure 2: distributions of the treatment effect estimates and p-values across 500 simulations with a true effect of 0.15, OLS versus heteroscedastic MLE]

The treatment effect estimates again have the same distribution under both methods. However, the heteroscedastic MLE's p-values are much smaller than OLS's: the heteroscedastic MLE has greater statistical power to detect the treatment effect.


We can also do the estimation by hand: first specify a negative log-likelihood function, then pass that function to mle2().
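A sketch of such a function, consistent with the call echoed in the output below; it reads dat from the enclosing environment:

ll <- function(b_int, b_treat, s_int, s_treat) {
  mu    <- b_int + b_treat * dat$treat          # group means
  sigma <- exp(s_int + s_treat * dat$treat)     # group standard deviations
  -sum(dnorm(dat$y, mean = mu, sd = sigma, log = TRUE))  # negative log-likelihood
}

m.ll <- mle2(minuslogl = ll,
             start = list(b_int = rnorm(1), b_treat = rnorm(1),
                          s_int = rnorm(1), s_treat = rnorm(1)))
summary(m.ll)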


Maximum likelihood estimation

Call:
mle2(minuslogl = ll, start = list(b_int = rnorm(1), b_treat = rnorm(1),
    s_int = rnorm(1), s_treat = rnorm(1)))

Coefficients:
         Estimate Std. Error  z value   Pr(z)    
b_int    0.033862   0.104470   0.3241 0.74584    
b_treat  0.217334   0.112249   1.9362 0.05285 .  
s_int    0.043733   0.070711   0.6185 0.53626    
s_treat -1.535893   0.147196 -10.4343 < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

-2 log L: 288.1408
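The output below is in the format of the glmmTMB package, which fits a model for the dispersion via its dispformula argument; a sketch of the call that would produce it:

library(glmmTMB)
m.tmb <- glmmTMB(y ~ treat, dispformula = ~ treat, data = dat)
summary(m.tmb)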

 


Family: gaussian  ( identity )
Formula:          y ~ treat
Dispersion:         ~treat
Data: dat

    AIC      BIC   logLik deviance df.resid
  296.1    307.6   -144.1    288.1      126


Conditional model:
           Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.03386    0.10447   0.324   0.7458  
treat        0.21733    0.11225   1.936   0.0528 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Dispersion model:
           Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.08746    0.14142   0.618    0.536    
treat       -3.07179    0.29439 -10.434   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In this case, the dispersion is modeled on the variance scale, so we have to exponentiate the dispersion coefficients and take the square root to recover the group standard deviations obtained above.
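For example, using the dispersion coefficients from the output above:

sqrt(exp(0.08746))            # control group SD, ~1.045
sqrt(exp(0.08746 - 3.07179))  # treatment group SD, ~0.225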
