R-squared for logistic regression in R

Original link: http://tecdat.cn/?p=6295

 

Not all outcomes / dependent variables can reasonably be modeled using linear regression. Perhaps the second most common regression model is logistic regression, which applies to binary outcome data. How can we calculate an R-squared for a logistic regression model?

 

McFadden's R-squared

In R, glm (generalized linear model) is the standard command for fitting logistic regression. As far as I know, a fitted glm object does not directly give you any pseudo R-squared value, but McFadden's measure can be calculated easily. To do this, we first fit the model we are interested in, then a null model containing only an intercept. We can then calculate McFadden's R-squared from the two models' fitted log-likelihood values:

mod <- glm(y ~ x, family = "binomial")
nullmod <- glm(y ~ 1, family = "binomial")
1 - logLik(mod) / logLik(nullmod)
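These three lines can be wrapped into a small convenience function. This is a sketch of my own, not part of any package; the name mcfadden_r2 is mine:

```r
# McFadden's pseudo R-squared: 1 - logLik(model) / logLik(null model).
# A convenience wrapper of my own (not from any package).
mcfadden_r2 <- function(mod, nullmod) {
  as.numeric(1 - logLik(mod) / logLik(nullmod))
}

# Example on simulated data where x weakly predicts y:
set.seed(1)
x <- 1 * (runif(1000) < 0.5)
y <- 1 * (runif(1000) < 0.4 + 0.2 * x)
mod <- glm(y ~ x, family = "binomial")
nullmod <- glm(y ~ 1, family = "binomial")
mcfadden_r2(mod, nullmod)
```

Because the fitted model always has a log-likelihood at least as large as the null model's, the value lies between 0 and 1.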

To get a feel for how strong a predictor must be to obtain a large McFadden R-squared, we simulate binary data with a single binary predictor X. We first try P(Y = 1 | X = 0) = 0.3 and P(Y = 1 | X = 1) = 0.7:

set.seed(63126)
n <- 10000
x <- 1 * (runif(n) < 0.5)
pr <- (x == 1) * 0.7 + (x == 0) * 0.3
y <- 1 * (runif(n) < pr)
mod <- glm(y ~ x, family = "binomial")
nullmod <- glm(y ~ 1, family = "binomial")
1 - logLik(mod) / logLik(nullmod)
'log Lik.' 0.1320256 (df=2)

So even though X has quite a strong effect on the probability that Y = 1, McFadden's R-squared is only 0.13. To increase it, we must make P(Y = 1 | X = 0) and P(Y = 1 | X = 1) more different:

set.seed(63126)
n <- 10000
x <- 1 * (runif(n) < 0.5)
pr <- (x == 1) * 0.9 + (x == 0) * 0.1
y <- 1 * (runif(n) < pr)
mod <- glm(y ~ x, family = "binomial")
nullmod <- glm(y ~ 1, family = "binomial")
1 - logLik(mod) / logLik(nullmod)
[1] 0.5539419

Even with X changing P(Y = 1) from 0.1 to 0.9, McFadden's R-squared is only 0.55. Finally, we try the values 0.01 and 0.99, what I would call a very strong effect!

set.seed(63126)
n <- 10000
x <- 1 * (runif(n) < 0.5)
pr <- (x == 1) * 0.99 + (x == 0) * 0.01
y <- 1 * (runif(n) < pr)
mod <- glm(y ~ x, family = "binomial")
nullmod <- glm(y ~ 1, family = "binomial")
1 - logLik(mod) / logLik(nullmod)
[1] 0.9293177

Now we have a value closer to 1. 
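The three simulations above can be collapsed into one loop, which makes the relationship between the separation of the two probabilities and McFadden's R-squared easy to see. This loop is my own restatement of the same scenarios; because the random streams differ from the individually seeded runs, the printed values will be close to, but not identical to, those above:

```r
set.seed(63126)
n <- 10000
for (p1 in c(0.7, 0.9, 0.99)) {
  # P(Y=1|X=1) = p1 and P(Y=1|X=0) = 1 - p1, as in the three examples.
  x <- 1 * (runif(n) < 0.5)
  pr <- (x == 1) * p1 + (x == 0) * (1 - p1)
  y <- 1 * (runif(n) < pr)
  mod <- glm(y ~ x, family = "binomial")
  nullmod <- glm(y ~ 1, family = "binomial")
  r2 <- as.numeric(1 - logLik(mod) / logLik(nullmod))
  cat("P(Y=1|X=1) =", p1, " McFadden R2 =", round(r2, 3), "\n")
}
```

The R-squared climbs towards 1 only as the predictor approaches perfect separation of the two groups.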

Grouped binomial data versus individual Bernoulli data

 

Logistic regression can also be fitted to grouped binomial data, where each row records the number of successes s and failures f at each level of the covariate x:

data <- data.frame(s = c(700, 300), f = c(300, 700), x = c(0, 1))
    s   f x
1 700 300 0
2 300 700 1

To fit a logistic regression model to these grouped data in R, we pass a two-column matrix of successes and failures as the response to the glm function:

mod1 <- glm(cbind(s, f) ~ x, family = "binomial", data = data)
summary(mod1)

Call:
glm(formula = cbind(s, f) ~ x, family = "binomial", data = data)

Deviance Residuals: 
[1]  0  0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.84730    0.06901   12.28   <2e-16 ***
x           -1.69460    0.09759  -17.36   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3.2913e+02  on 1  degrees of freedom
Residual deviance: 1.3323e-13  on 0  degrees of freedom
AIC: 18.371

Number of Fisher Scoring iterations: 2

We now expand the grouped binomial data into individual Bernoulli observations and fit the same logistic regression model.

individualData <- rbind(cbind(data, y = 0), cbind(data, y = 1))
individualData$freq <- individualData$s
individualData$freq[individualData$y == 0] <- individualData$f[individualData$y == 0]
mod2 <- glm(y ~ x, family = "binomial", data = individualData, weights = freq)
summary(mod2)

Call:
glm(formula = y ~ x, family = "binomial", data = individualData, 
    weights = freq)

Deviance Residuals: 
     1       2       3       4  
-26.88  -22.35   22.35   26.88  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.84730    0.06901   12.28   <2e-16 ***
x           -1.69460    0.09759  -17.36   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2772.6  on 3  degrees of freedom
Residual deviance: 2443.5  on 2  degrees of freedom
AIC: 2447.5

Number of Fisher Scoring iterations: 4

As expected, we obtain the same parameter estimates and inferences as from the grouped data frame. Now we calculate McFadden's R-squared for both models:

nullmod1 <- glm(cbind(s, f) ~ 1, family = "binomial", data = data)
nullmod2 <- glm(y ~ 1, family = "binomial", data = individualData, weights = freq)
1 - logLik(mod1) / logLik(nullmod1)
'log Lik.' 0.9581627 (df=2)
1 - logLik(mod2) / logLik(nullmod2)
'log Lik.' 0.1187091 (df=2)

We see that the R-squared for the grouped-data model is 0.96, while for the individual-data model it is only 0.12.
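Why such a large difference? The grouped binomial log-likelihood contains the combinatorial constants log C(1000, s_i), which the individual Bernoulli log-likelihood lacks, and these constants appear in both the fitted and the null grouped models. Since McFadden's measure is a ratio of log-likelihoods, adding the same constant to numerator and denominator changes the ratio. A quick check of this (rebuilding the example data from scratch):

```r
# Grouped binomial data and its individual (Bernoulli) expansion.
data <- data.frame(s = c(700, 300), f = c(300, 700), x = c(0, 1))
individualData <- rbind(cbind(data, y = 0), cbind(data, y = 1))
individualData$freq <- individualData$s
individualData$freq[individualData$y == 0] <- individualData$f[individualData$y == 0]

mod1 <- glm(cbind(s, f) ~ x, family = "binomial", data = data)
mod2 <- glm(y ~ x, family = "binomial", data = individualData, weights = freq)

# The grouped log-likelihood exceeds the individual one by exactly
# log(choose(1000, 700)) + log(choose(1000, 300)):
const <- lchoose(1000, 700) + lchoose(1000, 300)
as.numeric(logLik(mod1)) - as.numeric(logLik(mod2))
const
```

Likelihood-ratio tests and deviance comparisons are unaffected, because these constants cancel when two models are differenced; only likelihood *ratios* such as McFadden's pseudo R-squared are sensitive to them.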

Thank you for reading this article. If you have any questions, please leave a comment below!

 

 


Origin www.cnblogs.com/tecdat/p/11460094.html