The most common model for analyzing ordinal data is the ordinal logistic model. Essentially, the categorical outcome is treated as the manifestation of a continuous latent variable. Each predictor variable affects the outcome in only one way, so you obtain a single regression coefficient per predictor. However, the model has several intercepts; they represent the cut points on the latent variable that separate the observed response categories.
As in an ordinary regression model, each predictor affects the outcome in a proportional fashion; this is the proportional odds assumption or constraint. Alternatively, you can let each predictor have a different effect on the outcome at each cut point.
How can you fit this model with univariate GLM software? There is a UCLA idre page on multivariate random coefficient models that is relevant here, because they use nlme
(univariate linear mixed model software) to obtain the results of a multivariate model. The basic idea is to stack the data so they look like repeated measures, but to signal to the software that the outcomes are different, thereby requesting a different intercept and slope for each outcome.
So what we have to do is convert the data from wide to long. The model is then a conventional binomial model, but we need to tell it to estimate a different intercept for each level of the outcome. To account for the dependence between the stacked binary responses within a person, I use
generalized estimating equations (GEE) with an unstructured working correlation structure (Agresti, 2013).
Demonstration
library(ordinal) # For ordinal regression to check our results
library(geepack) # For GEE with binary data
I use the soup data set from the ordinal package:
soup <- ordinal::soup
soup$ID <- 1:nrow(soup) # Create a person ID variable
str(soup)
'data.frame': 1847 obs. of 13 variables:
$ RESP : Factor w/ 185 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ PROD : Factor w/ 2 levels "Ref","Test": 1 2 1 2 1 2 2 2 2 1 ...
$ PRODID : Factor w/ 6 levels "1","2","3","4",..: 1 2 1 3 1 6 2 4 5 1 ...
$ SURENESS: Ord.factor w/ 6 levels "1"<"2"<"3"<"4"<..: 6 5 5 6 5 5 2 5 5 2 ...
$ DAY : Factor w/ 2 levels "1","2": 1 1 1 1 2 2 2 2 2 2 ...
$ SOUPTYPE: Factor w/ 3 levels "Self-made","Canned",..: 2 2 2 2 2 2 2 2 2 2 ...
$ SOUPFREQ: Factor w/ 3 levels ">1/week","1-4/month",..: 1 1 1 1 1 1 1 1 1 1 ...
$ COLD : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
$ EASY : Factor w/ 10 levels "1","2","3","4",..: 7 7 7 7 7 7 7 7 7 7 ...
$ GENDER : Factor w/ 2 levels "Male","Female": 2 2 2 2 2 2 2 2 2 2 ...
$ AGEGROUP: Factor w/ 4 levels "18-30","31-40",..: 4 4 4 4 4 4 4 4 4 4 ...
$ LOCATION: Factor w/ 3 levels "Region 1","Region 2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
I use the SURENESS
variable as the outcome; it has six levels. The DAY
and GENDER
variables will be the predictors.
# Select variables to work with
soup <- dplyr::select(soup, ID, SURENESS, DAY, GENDER)
# I like dummy variables with recognizable names
soup$girl <- ifelse(soup$GENDER == "Female", 1, 0) # Make male reference group
soup$day2 <- ifelse(soup$DAY == "2", 1, 0) # Make day 1 reference group
The next step is to convert the ordinal outcome into five binary outcomes, one per threshold: each new variable indicates whether the response cleared that threshold (2 through 6).
Once this is done, we reshape the data long on these five new variables.
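The reshaping code is not shown in the original post; here is one way to do it in base R. The column names SURE, VAL and SURE.f match the printed data, though the exact row names will differ:

```r
# Stack five copies of the data, one per threshold k = 2, ..., 6.
# VAL indicates whether the response cleared threshold k (SURENESS >= k).
soup.long <- do.call(rbind, lapply(2:6, function(k) {
  d <- soup
  d$SURE <- k                                    # which threshold this copy encodes
  d$VAL <- as.integer(as.integer(d$SURENESS) >= k)
  d
}))
soup.long$SURE.f <- factor(soup.long$SURE)       # threshold index as a factor
soup.long <- soup.long[order(soup.long$ID), ]    # GEE needs clusters contiguous
```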
head(soup.long) # Let's look at the data
ID SURENESS DAY GENDER girl day2 SURE VAL SURE.f
1 1 6 1 Female 1 0 2 1 2
1848 1 6 1 Female 1 0 3 1 3
3695 1 6 1 Female 1 0 4 1 4
5542 1 6 1 Female 1 0 5 1 5
7389 1 6 1 Female 1 0 6 1 6
2 2 5 1 Female 1 0 2 1 2
Let's look at someone who did not choose the highest response category:
ID SURENESS DAY GENDER girl day2 SURE VAL SURE.f
22 22 4 1 Female 1 0 2 1 2
1869 22 4 1 Female 1 0 3 1 3
3716 22 4 1 Female 1 0 4 1 4
5563 22 4 1 Female 1 0 5 0 5
7410 22 4 1 Female 1 0 6 0 6
This person chose a SURENESS
of 4. VAL
is 1 for the first three thresholds and 0 for the last two, because a response of 4 clears the thresholds for categories 2, 3 and 4, but falls below the 4-5 and 5-6 thresholds.
The next step is to create dummy variables for the thresholds. These variables will serve as the model's intercepts.
Note that I multiply the dummies by -1. In ordinal regression, doing so makes interpretation easier: it ensures that a positive coefficient increases the odds of moving from a lower category (for example, 3) to a higher category (4), i.e. of responding in a higher category.
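The dummy-creation code is likewise not shown; a minimal sketch with model.matrix() (the column names SURE.f2 through SURE.f6 match the names that appear in the model output):

```r
# One dummy per threshold, multiplied by -1 so the fitted coefficients
# line up with the cut points reported by ordinal regression software
dummies <- -1 * model.matrix(~ 0 + SURE.f, data = soup.long)
colnames(dummies) <- paste0("SURE.f", 2:6)       # SURE.f2, ..., SURE.f6
soup.long <- cbind(soup.long, dummies)
```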
Now, we are ready to run the model with GEE, using an unstructured working correlation structure.
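The model call itself is not shown in the post; it was presumably something like the following, where the formula matches the one printed by the model comparison later, and id = ID ties together the five binary rows belonging to the same respondent:

```r
# Proportional odds model as a binary GEE; requires soup.long from the steps above
pom.bin <- geepack::geeglm(
  VAL ~ 0 + SURE.f2 + SURE.f3 + SURE.f4 + SURE.f5 + SURE.f6 + girl + day2,
  id = ID, data = soup.long, family = binomial, corstr = "unstructured")
summary(pom.bin)
```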
Next, I estimate the same model with standard ordinal regression:
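Again, the call is not shown; based on the formula reported in the model comparisons below, it would be:

```r
# Standard proportional odds (cumulative link) model on the original wide data
pom.ord <- ordinal::clm(SURENESS ~ girl + day2, data = soup, link = "logit")
summary(pom.ord)
```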
Let's compare the coefficients and standard errors (the Estimate/Std.err/Wald columns are from the GEE fit; the Estimate.1/Std. Error/z value columns are from the ordinal fit):
Estimate Estimate.1 Std.err Std. Error Wald z value Pr(>|W|) Pr(>|z|)
SURE.f2 -2.13244 -2.13155 0.10454 0.10450 416.0946 -20.3971 0.0000 0.0000
SURE.f3 -1.19345 -1.19259 0.09142 0.09232 170.4284 -12.9179 0.0000 0.0000
SURE.f4 -0.89164 -0.89079 0.08979 0.09011 98.5995 -9.8857 0.0000 0.0000
SURE.f5 -0.65782 -0.65697 0.08945 0.08898 54.0791 -7.3833 0.0000 0.0000
SURE.f6 -0.04558 -0.04477 0.08801 0.08789 0.2682 -0.5093 0.6046 0.6105
girl -0.04932 -0.04917 0.09036 0.09074 0.2980 -0.5419 0.5851 0.5879
day2 -0.26172 -0.26037 0.08584 0.08579 9.2954 -3.0351 0.0023 0.0024
We can see the results are very close.
However, estimating the same model with glm()
, which cannot account for the dependence between the binary outcomes belonging to the same person, produces different results.
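For reference, the naive glm() fit would look something like this (the object name naive.bin is my own; the output below is printed rounded):

```r
# Same formula as the GEE model, but ignoring within-person dependence
naive.bin <- glm(
  VAL ~ 0 + SURE.f2 + SURE.f3 + SURE.f4 + SURE.f5 + SURE.f6 + girl + day2,
  data = soup.long, family = binomial)
coef(summary(naive.bin))
```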
Estimate Std. Error z value Pr(>|z|)
SURE.f2 -2.15144 0.08255 -26.062 0.0000
SURE.f3 -1.21271 0.06736 -18.004 0.0000
SURE.f4 -0.91149 0.06472 -14.084 0.0000
SURE.f5 -0.67782 0.06327 -10.713 0.0000
SURE.f6 -0.06523 0.06178 -1.056 0.2911
girl -0.07326 0.04961 -1.477 0.1398
day2 -0.26898 0.04653 -5.780 0.0000
The estimates are close, but the standard errors are clearly too small.
We can easily relax the proportional odds constraint in pom.bin
. Let's fit a partial proportional odds model that relaxes the constraint on the day2
predictor. We do this by estimating interactions between the threshold dummies and the day2
predictor, in place of its main effect.
For comparison, I also run the same model in ordinal, treating day2
as a nominal effect.
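The calls are not shown in the post; based on the formulas printed in the model comparison and the ordinal output below, they would presumably be:

```r
# Binary GEE version: threshold dummies interact with day2 (no day2 main effect)
npom.bin <- geepack::geeglm(
  VAL ~ 0 + SURE.f2 + SURE.f3 + SURE.f4 + SURE.f5 + SURE.f6 + girl +
    SURE.f2:day2 + SURE.f3:day2 + SURE.f4:day2 + SURE.f5:day2 + SURE.f6:day2,
  id = ID, data = soup.long, family = binomial, corstr = "unstructured")

# Ordinal version: day2 moved to the nominal part of the model
npom.ord <- ordinal::clm(SURENESS ~ girl, nominal = ~ day2, data = soup)
```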
Estimate Estimate.1 Std.err Std. Error Wald z value Pr(>|W|) Pr(>|z|)
SURE.f2 -2.02982 -2.03106 0.11800 0.11834 295.8986 -17.1630 0.00000 0.00000
SURE.f3 -1.22087 -1.22213 0.09829 0.09857 154.2801 -12.3980 0.00000 0.00000
SURE.f4 -0.92773 -0.92899 0.09458 0.09443 96.2112 -9.8375 0.00000 0.00000
SURE.f5 -0.65744 -0.65870 0.09246 0.09188 50.5554 -7.1693 0.00000 0.00000
SURE.f6 -0.04733 -0.04859 0.08955 0.08965 0.2793 -0.5420 0.59714 0.58784
SURE.f2:day2 0.07359 0.07360 0.14148 0.14155 0.2705 0.5199 0.60298 0.60312
SURE.f3:day2 0.31691 0.31697 0.10607 0.10613 8.9270 2.9867 0.00281 0.00282
SURE.f4:day2 0.33301 0.33308 0.09970 0.09973 11.1551 3.3398 0.00084 0.00084
SURE.f5:day2 0.26330 0.26339 0.09618 0.09616 7.4938 2.7391 0.00619 0.00616
SURE.f6:day2 0.26741 0.26748 0.09347 0.09345 8.1842 2.8622 0.00423 0.00421
girl -0.04809 -0.04994 0.09048 0.09077 0.2825 -0.5502 0.59507 0.58221
The results are comparable.
Now, we can compare the binary model with the proportional odds constraint to the binary model without it, to test the constraint on the day2
variable. geepack
allows a Wald test of the two models via anova()
:
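Assuming pom.bin and npom.bin are the two GEE fits named above, the comparison is simply:

```r
# Wald test of the day2 proportional odds constraint (4 df)
anova(npom.bin, pom.bin)
```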
Analysis of 'Wald statistic' Table
Model 1 VAL ~ 0 + SURE.f2 + SURE.f3 + SURE.f4 + SURE.f5 + SURE.f6 + girl + SURE.f2:day2 + SURE.f3:day2 + SURE.f4:day2 + SURE.f5:day2 + SURE.f6:day2
Model 2 VAL ~ 0 + SURE.f2 + SURE.f3 + SURE.f4 + SURE.f5 + SURE.f6 + girl + day2
Df X2 P(>|Chi|)
1 4 6.94 0.14
The difference between the two models is not statistically significant, suggesting that the proportional odds constraint is adequate for the day2
variable.
With the ordinal
package, we can perform the same test either by comparing pom.ord
and npom.ord
with anova()
, or with nominal_test()
. Both are likelihood ratio tests, which are more trustworthy than the Wald test from the GEE above.
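Assuming pom.ord and npom.ord are the two cumulative link fits named above, the likelihood ratio test is:

```r
# Likelihood ratio test of the day2 proportional odds constraint
anova(pom.ord, npom.ord)
```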
Likelihood ratio tests of cumulative link models:
formula: nominal: link: threshold:
pom.ord SURENESS ~ girl + day2 ~1 logit flexible
npom.ord SURENESS ~ girl ~day2 logit flexible
no.par AIC logLik LR.stat df Pr(>Chisq)
pom.ord 7 5554 -2770
npom.ord 11 5555 -2766 6.91 4 0.14
nominal_test(pom.ord)
Tests of nominal effects
formula: SURENESS ~ girl + day2
Df logLik AIC LRT Pr(>Chi)
<none> -2770 5554
girl 4 -2766 5554 8.02 0.091 .
day2 4 -2766 5555 6.91 0.141
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Both tests arrive at the same result, and the Wald test comparing the GEE models gives virtually the same p-value, although its χ² statistic is slightly larger.
All things considered, it is of course much easier to use the ordinal package. Treating the model as binary may have some benefits, but for me this was curiosity rather than need. For a reason I have yet to figure out, when you use the fitted()
function to obtain predicted probabilities from the ordinal model, it returns only a single set of fitted probabilities; ideally, it would return a fitted probability for each threshold. With geepack
, you can directly obtain the predicted probability of clearing each threshold. This advantage is negligible, however.
Also, if you are familiar with maximum likelihood estimation, you can simply program the likelihood function yourself.
Here is what that looks like for the proportional odds model above:
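The likelihood code itself did not survive in this copy of the post. Below is a hypothetical reconstruction with stats4::mle(), chosen because it supports the coef(summary(res)) and logLik(res) calls used below and reproduces the parameter names in the output (a1-a5 for the cut points, bg and bd for girl and day2); the original may have been written differently.

```r
library(stats4)

# Cumulative-logit likelihood of the proportional odds model:
# logit P(Y <= j) = theta_j - (bg * girl + bd * day2)
nll <- function(a1, a2, a3, a4, a5, bg, bd) {
  theta <- c(-Inf, a1, a2, a3, a4, a5, Inf)    # padded for categories 1 and 6
  eta <- bg * soup$girl + bd * soup$day2
  y <- as.integer(soup$SURENESS)
  # P(Y = y) as a difference of adjacent cumulative probabilities
  p <- plogis(theta[y + 1] - eta) - plogis(theta[y] - eta)
  -sum(log(p))
}

res <- stats4::mle(nll, start = list(a1 = -2, a2 = -1, a3 = -0.5,
                                     a4 = -0.25, a5 = 0, bg = 0, bd = 0))
```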
coef(summary(res))
Estimate Std. Error
a1 -2.13155603 0.10450286
a2 -1.19259266 0.09232077
a3 -0.89079068 0.09010891
a4 -0.65697671 0.08898063
a5 -0.04477565 0.08788869
bg -0.04917604 0.09073602
bd -0.26037369 0.08578617
coef(summary(pom.ord))
Estimate Std. Error z value Pr(>|z|)
1|2 -2.13155281 0.10450291 -20.3970663 1.775532e-92
2|3 -1.19259171 0.09232091 -12.9178937 3.567748e-38
3|4 -0.89078590 0.09010896 -9.8856524 4.804418e-23
4|5 -0.65697465 0.08898068 -7.3833401 1.543671e-13
5|6 -0.04476553 0.08788871 -0.5093434 6.105115e-01
girl -0.04917245 0.09073601 -0.5419287 5.878676e-01
day2 -0.26037360 0.08578617 -3.0351465 2.404188e-03
The results are very similar. For a more definitive comparison, we can always compare the log-likelihoods:
logLik(res)
'log Lik.' -2769.784 (df=7)
logLik(pom.ord)
'log Lik.' -2769.784 (df=7)
- Agresti, A. (2013). Categorical Data Analysis. Wiley-Interscience.
If you have any questions, please leave a comment below.