Unordered multinomial logistic regression model [SPSS 080]

1. Teaching content

Unordered multinomial logistic regression can be used when the dependent variable is an unordered multi-category variable, or when the dependent variable is an ordered multi-category variable that does not satisfy the proportional odds assumption (test of parallel lines P<0.05). When the outcome variable is unordered and there is only one independent variable, which is categorical, the chi-square test can be used directly; when the outcome variable is ordered and there is only one categorical independent variable, a nonparametric test can be used directly.
Unordered multinomial logistic regression differs from ordered multinomial logistic regression. The ordered model uses the cumulative logit model, in which the logit transformation is applied to the cumulative probabilities of the ordered levels of the dependent variable. The unordered model uses the generalized logit model, in which the natural logarithm of the ratio of the probability of each level (except the reference level) to the probability of the reference level is used to build the model equations. When the number of levels is 2, the model is equivalent to binary logistic regression, so it can be regarded as an extension of the binary logistic regression model.
Suppose the dependent variable y is an unordered multi-category variable with n levels; unordered multinomial logistic regression then produces n-1 generalized logit models. Denote the probability of the reference level R by πR and the probability of the k-th level (k=1, 2, ..., n) by πk, so that π1+π2+…+πn=1. There are m independent variables x, and the coefficient of the i-th independent variable (i=1, 2, ..., m) in the model for the k-th level is βki.
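Written out in this notation, the n-1 generalized logit equations take the standard form below. This is a sketch of the usual textbook formulation, not a reproduction of the original figure; the intercept \beta_{k0} and the shorthand \eta_k are notation introduced here for illustration.

```latex
\begin{align}
\eta_k &= \ln\frac{\pi_k}{\pi_R}
        = \beta_{k0} + \beta_{k1}x_1 + \beta_{k2}x_2 + \dots + \beta_{km}x_m,
        \qquad k \neq R \\
\pi_k  &= \frac{\exp(\eta_k)}{1 + \sum_{j \neq R}\exp(\eta_j)},
        \qquad
\pi_R   = \frac{1}{1 + \sum_{j \neq R}\exp(\eta_j)}
\end{align}
```

The second line follows from the first together with the constraint that the n probabilities sum to 1.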

Obviously π1+π2+π3+π4=1 (here for a dependent variable with four levels). To compare levels 1 and 2, subtract the corresponding two equations to obtain the corresponding function. Levels 1 and 3, or 2 and 3, can be compared in the same way. Alternatively, the reference level can simply be changed.
Example: A researcher wanted to know whether the way adult residents obtain health knowledge differs between communities and between genders, and surveyed 314 adults in two communities. The results are shown in the table below. The variables are coded as follows: community (community A=0, community B=1), gender (male=0, female=1), way of obtaining health knowledge (traditional mass media=1, Internet=2, community publicity=3). Fit an unordered multinomial logistic regression model of the way residents obtain health knowledge on community and gender.
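The two points above (the probabilities sum to 1, and two non-reference levels are compared by subtracting their equations) can be checked numerically. The following is a minimal sketch; the logit values are hypothetical numbers chosen for illustration and do not come from the article.

```python
import math

def probabilities_from_logits(logits, reference):
    """Turn generalized logits ln(pi_k / pi_R) into category probabilities."""
    denom = 1.0 + sum(math.exp(v) for v in logits.values())
    probs = {level: math.exp(v) / denom for level, v in logits.items()}
    probs[reference] = 1.0 / denom  # the reference level has logit 0
    return probs

# Hypothetical logits of levels 1-3 against reference level 4 (illustration only).
logits = {1: 0.8, 2: -0.3, 3: 0.1}
probs = probabilities_from_logits(logits, reference=4)

# The four probabilities sum to 1.
total = sum(probs.values())

# Comparing level 1 with level 2: subtract the two logit equations.
logit_1_vs_2 = logits[1] - logits[2]
# The same quantity computed directly from the probabilities.
direct_1_vs_2 = math.log(probs[1] / probs[2])

print(total, logit_1_vs_2, direct_1_vs_2)
```

The subtraction works because the reference probability cancels: ln(π1/π4) - ln(π2/π4) = ln(π1/π2).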

1. Data entry

2. Data weighting: Data>>Weight Cases..., weight cases by [frequency]

3. Multinomial regression analysis: Analyze>>Regression>>Multinomial Logistic...
• Dependent variable: way of obtaining health knowledge
• Factors: community, gender
The dependent variable and the factors must be categorical variables. Covariates are independent explanatory variables that are not of direct interest to the researcher but that affect the outcome and need to be adjusted for; they can be categorical or continuous. Under [Reference Category...] for the [Dependent Variable], you can set the reference category and the category order. The default reference category is the last category, and the default category order is ascending. In ascending order, the smallest value of the dependent variable is the first category; in descending order, the smallest value is the last category.

[Model]: Specifies the model to be analyzed. The default analyzes main effects only; a full factorial model (main effects + interactions) or a custom model can also be specified. With Custom/Stepwise, in addition to customizing the model, you can also select variables, similar to Block and Method in binary logistic regression. This example uses the default main effects model.
[Statistics]: In addition to the default options, also select information criteria (outputs AIC and BIC), cell probabilities, classification table, and goodness-of-fit. By default, subpopulations are defined by all factors and covariates, and the cell probabilities and goodness-of-fit tests are computed for them.

[Convergence criteria]: Mainly sets the iteration parameters.
[Options]: Sets the entry and removal criteria and the tests used for them.
[Save]: Saves new variables: [estimated response probabilities], [predicted category], [predicted category probability], and [actual category probability].
4. Results
[Case Processing Summary]: Shows basic information about the cases included in the analysis.

[Model fitting information]: Compared with the initial model containing only the constant terms, the AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion), and -2 log likelihood (-2LL) of the final model all decrease. The -2LL value drops from 80.877 to 36.821, a decrease of 44.056 (the chi-square value). The likelihood ratio chi-square test is statistically significant (P<0.001), indicating that at least one partial regression coefficient of the gender and community variables in the model is not 0.
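The arithmetic behind this table can be reproduced as a rough check. The sketch below uses the two -2LL values quoted above; the degrees of freedom (4) are an assumption derived here from the model structure (2 logit equations x 2 binary factors) and are not quoted from the SPSS output. The helper chi2_sf_even_df is a hypothetical name, and its closed form holds only for even degrees of freedom.

```python
import math

neg2ll_null = 80.877   # -2LL of the intercept-only model (from the text)
neg2ll_final = 36.821  # -2LL of the final model (from the text)

# Likelihood ratio chi-square: the drop in -2LL.
lr_chi2 = neg2ll_null - neg2ll_final

# Assumed degrees of freedom: 2 logits x (community + gender) = 4.
df = 4

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(lr_chi2, df)
print(f"LR chi-square = {lr_chi2:.3f}, df = {df}, p = {p_value:.2e}")
```

Under these assumptions the p-value is far below 0.001, in line with the conclusion stated in the table.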

[Goodness of fit test]: Shows the results of the Pearson and Deviance goodness-of-fit tests. Both tests compare the values predicted by the current model with the observed sample values. Both P values are greater than 0.05, indicating a good fit. Note, however, that both tests require an adequate sample size in each combination of independent-variable values; their results are generally not used when there are many independent variables or when continuous variables are included.

[Pseudo R2]: Outputs three pseudo coefficients of determination. In the analysis of categorical data, low values of these pseudo coefficients of determination need not cause much concern.
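To give a feel for the scale of such statistics, the sketch below computes one of them, the Cox and Snell pseudo R2, from quantities already quoted in the text. It assumes the standard formula 1 - exp(-LR/n) and that all 314 weighted cases were valid; the result is not necessarily the value printed in the SPSS table, which the text does not quote. The other two statistics (Nagelkerke and McFadden) need the full log likelihoods and are not attempted here.

```python
import math

lr_chi2 = 80.877 - 36.821  # drop in -2LL reported in the model fitting table
n = 314                    # number of adults surveyed (assumed all valid cases)

# Cox and Snell pseudo R2 under the standard formula (an assumption here).
cox_snell_r2 = 1.0 - math.exp(-lr_chi2 / n)
print(f"Cox and Snell pseudo R2 = {cox_snell_r2:.3f}")
```

A value of this size would look small by linear regression standards, which is why low pseudo R2 values are usually not a cause for concern with categorical outcomes.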

[Likelihood Ratio Test]: The table shows the AIC, BIC, and -2LL values of the final model (consistent with the [Model fitting information] table) and of each reduced model (the model after removing the effect of one independent variable). The chi-square test statistic is the difference in -2LL between the reduced model and the final model. The results show that the contributions of both community and gender to the model are statistically significant.

[Parameter estimation]: By default, SPSS uses the highest level of the dependent variable as the reference level (community publicity in this example). To use another level as the reference, either recode the levels of the dependent variable in the data or specify it through [Reference Category...]. For independent variables entered as factors, the highest level is also the reference level by default, and it can be changed by recoding the levels of the independent variable. If a variable is entered as a covariate, the low level serves as the reference level. In this example, therefore, community B (community=1) and female (gender=1) are the reference levels; their parameter values are set to 0, and these are redundant parameters that are generally of no interest to the researcher.

In the results for traditional mass media versus community publicity, the regression coefficient of community A (community=0) is negative, with P=0.001<0.05 and OR=0.370, so the coefficient of community A differs significantly from 0 (the coefficient of community B is fixed at 0). The negative coefficient indicates that, relative to community publicity, residents of community A are less inclined than residents of community B to obtain health knowledge through traditional mass media; equivalently, community A is more inclined toward community publicity. OR=0.370 means that the odds of choosing traditional mass media rather than community publicity in community A are 0.37 times those in community B. Strictly speaking, the OR should be stated as follows: the ratio of the probability of choosing traditional mass media to the probability of choosing community publicity in community B is 2.70 times (1/0.370) the corresponding ratio in community A.
In the same way, relative to community publicity, men are more inclined than women to obtain health knowledge through traditional mass media (OR=3.410). For the Internet versus community publicity, there is no statistically significant difference between community A and community B (Wald χ²=1.7, P=0.192>0.05), but men are more inclined than women to obtain health knowledge through the Internet (Wald χ²=8.126, P=0.004<0.05, OR=2.213).
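The link between the sign of a coefficient, its OR, and the reciprocal OR used in the interpretation above can be sketched as follows. Only the ORs quoted in the text are used; the coefficients are recovered as ln(OR), so they may differ slightly from the values printed in the SPSS table because of rounding. Variable names are hypothetical.

```python
import math

# Traditional mass media vs community publicity (reference category).
or_community_a = 0.370  # community A vs community B, from the text
or_male = 3.410         # male vs female, from the text

# A regression coefficient is the natural log of its OR.
coef_community_a = math.log(or_community_a)  # negative, since OR < 1
coef_male = math.log(or_male)                # positive, since OR > 1

# Swapping the comparison groups inverts the OR: community B vs community A.
or_community_b_vs_a = 1.0 / or_community_a

print(f"b(community A) = {coef_community_a:.3f}")
print(f"b(male)        = {coef_male:.3f}")
print(f"OR(B vs A)     = {or_community_b_vs_a:.2f}")
```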

If you want to compare traditional mass media with the Internet, you can directly subtract the corresponding model equations.
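As a minimal sketch of this subtraction, the gender effect for traditional mass media versus the Internet can be derived from the two ORs quoted in the text, on the reading that OR=3.410 belongs to the equation for traditional mass media versus community publicity and OR=2.213 to the equation for the Internet versus community publicity. The community effect is not computed because the text does not quote its coefficient in the Internet equation. Variable names are hypothetical.

```python
import math

# Male vs female ORs from the two fitted equations (reference: community publicity).
or_media_vs_publicity = 3.410
or_internet_vs_publicity = 2.213

# Subtracting the two logit equations subtracts the coefficients ...
coef_media_vs_internet = math.log(or_media_vs_publicity) - math.log(or_internet_vs_publicity)

# ... which is the same as dividing the two ORs.
or_media_vs_internet = math.exp(coef_media_vs_internet)

print(f"b  (male, media vs Internet) = {coef_media_vs_internet:.3f}")
print(f"OR (male, media vs Internet) = {or_media_vs_internet:.3f}")
```

The subtraction yields only the point estimate. Its standard error depends on the covariance between the two coefficients, so a significance test requires refitting the model with the Internet as the reference category.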

Overall, it can be judged that, relative to the Internet, community A is less inclined toward traditional mass media (that is, more inclined toward the Internet) and men are more inclined toward traditional mass media, but whether these differences are statistically significant requires further testing. In the Multinomial Logistic Regression dialog box, the reference category can be set to the Internet (Custom, Value=2) through [Reference Category...]; the following results are then obtained, which are consistent with the calculation above. Their interpretation is omitted.

In addition, when an independent variable has more than two categories, its dummy variables should enter or leave the model together.
[Classification table]: Cross-tabulates the observed and predicted categories. Cells on the diagonal are the numbers of correct classifications, and off-diagonal cells are the numbers of misclassifications. The prediction accuracy here is only moderate and needs improvement.
[Observed and predicted frequencies]: The two are relatively close, indicating a good fit.


Origin blog.csdn.net/TIQCmatlab/article/details/112691561