Multi-factor analysis of variance [SPSS 081]

1. Teaching content

Multi-factor analysis of variance studies whether a dependent variable is affected by several independent variables (also called factors). It tests whether the means of the dependent variable differ significantly across the combinations of factor levels. Multi-factor analysis of variance can analyze the effect of each individual factor (main effects), the interaction between factors (interaction effects), and covariates (analysis of covariance), including the interactions between factors and covariates.
According to the number of observed variables (i.e., dependent variables), multi-factor analysis of variance is divided into univariate multi-factor analysis of variance (one dependent variable) and multivariate multi-factor analysis of variance (two or more dependent variables). This article focuses on the univariate case; the next article describes the multivariate case in detail.
Univariate multi-factor analysis of variance: there is only one dependent variable, and the influence of several independent variables on it is examined. For example, when analyzing the effects of different varieties and different fertilizer rates on crop yield, crop yield is the observed variable, and variety and fertilizer rate are the control variables. Multi-factor analysis of variance shows how variety and fertilizer rate affect crop yield, and which combination of variety and fertilizer rate is best for increasing yield.
01
Principle of Analysis
The test is an F test based on the F statistic. The F statistic is the ratio of the between-group mean square to the within-group mean square, where each mean square is a sum of squares divided by its degrees of freedom.

The total sum of squared deviations is denoted SST and is split into two parts: the variation caused by the control variable, denoted SSA (the between-group sum of squared deviations), and the variation caused by random factors, denoted SSE (the within-group sum of squared deviations). That is, SST = SSA + SSE.
The between-group sum of squared deviations SSA is the sum of squared deviations between the mean of each level and the overall mean, weighted by the number of observations at that level; it reflects the influence of the control variable. The within-group sum of squared deviations SSE is the sum of squared deviations of each observation from the mean of its own level; it reflects the sampling error in the data.
The F value shows this directly. If the different levels of the control variable have a significant impact on the observed variable, the between-group sum of squared deviations is large and so is the F value. Conversely, if the different levels of the control variable have no significant impact on the observed variable, the within-group sum of squared deviations is relatively large and the F value is relatively small.
SPSS also reports the probability value associated with F (sig, the p-value), computed from the F distribution. If sig is less than the significance level (commonly 0.05, 0.01, or 0.001), the population means of the observed variable at the different levels of the control variable are considered significantly different; otherwise they are not. For fixed degrees of freedom, the larger the F value, the smaller the sig value.
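Written out for the simplest case of a single control variable, the quantities above look as follows. This is only a sketch: the notation is assumed here rather than taken from SPSS, with k levels, n_i observations at level i, n observations in total, x_ij the j-th observation at level i, x̄_i the mean of level i, and x̄ the overall mean.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}\right)^2 \\
SSA &= \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{x}\right)^2 \\
SSE &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2 \\
F   &= \frac{SSA/(k-1)}{SSE/(n-k)}
\end{aligned}
```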
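To make the relationship between the two sums of squares and F concrete, here is a minimal Python sketch for a single control variable. The function name `one_way_f` and the numbers are made up for illustration; they are not the article's data, and SPSS does this calculation for you.

```python
from statistics import fmean

def one_way_f(groups):
    """Return (SSA, SSE, F) for a list of groups, one group per factor level."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k = len(groups)        # number of levels
    n = len(all_values)    # total number of observations
    # Between-group sum of squares: level means around the overall mean
    ssa = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own level mean
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    f_value = (ssa / (k - 1)) / (sse / (n - k))
    return ssa, sse, f_value

# Hypothetical data: the same nine numbers, grouped in two different ways.
separated = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]    # levels differ a lot
overlapping = [[10, 20, 30], [11, 21, 31], [12, 22, 32]]  # levels barely differ

for name, data in (("separated", separated), ("overlapping", overlapping)):
    ssa, sse, f_value = one_way_f(data)
    print(name, "SSA =", ssa, "SSE =", sse, "F =", round(f_value, 2))
```

Both groupings have the same SST (606), but in the first almost all of it is between groups (F = 300), while in the second almost all of it is within groups (F = 0.03).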
02
SPSS analysis case
Suppose we have a salary table for the employees of a company and want to see how two control variables, employee gender ("gender") and years of education ("edu"), affect the employee's "current salary". With multi-factor analysis of variance we consider the separate influence of "gender" and of "edu" on "current salary", called the main effects, and the joint influence of "gender*edu" on "current salary", called the interaction effect.
(1) Analysis step: after importing the data into SPSS, select Analyze > General Linear Model > Univariate.
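The split into two main effects and one interaction effect can be sketched in plain Python. This is an illustration under strong assumptions, not what SPSS runs internally: it only handles a balanced design (the same number of observations in every gender/edu cell), whereas the SPSS General Linear Model procedure also handles unbalanced data. The function name `two_way_f` and the salary figures are hypothetical.

```python
from statistics import fmean

def two_way_f(cells):
    """F values for a balanced two-factor design.

    cells maps (level_a, level_b) -> list of observations; every cell must
    hold the same number of observations (at least two).
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    na, nb = len(a_levels), len(b_levels)
    r = len(next(iter(cells.values())))  # observations per cell

    grand = fmean([x for obs in cells.values() for x in obs])
    mean_a = {a: fmean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    mean_b = {b: fmean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}
    mean_ab = {key: fmean(obs) for key, obs in cells.items()}

    # Main effects: marginal means around the overall mean
    ss_a = nb * r * sum((mean_a[a] - grand) ** 2 for a in a_levels)
    ss_b = na * r * sum((mean_b[b] - grand) ** 2 for b in b_levels)
    # Interaction: what is left of each cell mean after removing both main effects
    ss_ab = r * sum((mean_ab[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    # Error: observations around their own cell mean
    ss_e = sum((x - mean_ab[key]) ** 2 for key, obs in cells.items() for x in obs)
    ms_e = ss_e / (na * nb * (r - 1))

    return {
        "A": (ss_a / (na - 1)) / ms_e,
        "B": (ss_b / (nb - 1)) / ms_e,
        "A*B": (ss_ab / ((na - 1) * (nb - 1))) / ms_e,
    }

# Hypothetical salaries (in thousands); factor A is gender, factor B is edu.
salaries = {
    ("female", 12): [20, 22], ("male", 12): [21, 23],
    ("female", 16): [30, 32], ("male", 16): [31, 33],
}
print(two_way_f(salaries))
```

In this made-up table edu has by far the largest F value, gender a small one, and the interaction F is zero because the gender gap is identical at both edu levels. The sig value for each F would still have to come from the F distribution, which SPSS reports directly.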

(2) Select "current salary" as the dependent variable (that is, the observed variable), and select gender "gender" and years of education "edu" as fixed factors (that is, the control variables).

(3) Click "Model" in the "Univariate" dialog box and select "Full factorial", which means that the analysis of variance model includes the main effects of all factors as well as all interaction effects between the factors. Then click "Continue".

(4) Click "Plots" in the "Univariate" dialog box, select "gender" as the horizontal axis variable and "edu" as the separate-lines variable, and click "Add" to plot the interaction of the two factors, which is listed as "gender*edu".
In this example, "gender" has only two levels (male and female), while "edu" has several. If the main effect of a factor with more than two levels is significant, at least two of its levels differ significantly, and post hoc tests can then compare the means of the levels of that factor pairwise. This process is called multiple comparisons.
In practice, however, if both a main effect and the interaction effect are significant, we care more about how the factors jointly affect the dependent variable.
Therefore, if the interaction effect is significant, a simple effect test is usually required. A simple effect is the effect of one factor at a particular level of another factor. In our example, if gender and edu interact significantly, we can test the differences between the levels of edu when gender is "female", which is the simple effect of edu at the "female" level; the differences between the levels of edu when gender is "male" are the simple effect of edu at the "male" level.
In other words, a simple effect test fixes one independent variable at a certain level and examines the influence of the other independent variable on the dependent variable. In SPSS, the simple effect test is run with the "MANOVA" syntax command.
Similarly, with three independent variables, if their interaction is significant, a simple effect test is needed: the effect of one factor at each combination of levels of the other two factors.
That is, two factors are fixed at given levels and the influence of the third factor on the dependent variable is examined. This is also done with the "MANOVA" command. Whether a simple effect is significant is judged from the F value and the sig value: sig is compared with the chosen significance level (0.05, 0.01, or 0.001). If sig is greater than this level, the simple effect is not significant; if sig is less than this level, the simple effect is significant.
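The idea of fixing one factor and testing the other can be sketched in Python as follows. This is not the SPSS "MANOVA" command and its output may differ in detail; the sketch assumes a balanced design and assumes that each simple effect is tested against the pooled within-cell error of the full table. The function name `simple_effects` and the data are hypothetical.

```python
from statistics import fmean

def simple_effects(cells, fixed_levels, other_levels):
    """F value of the 'other' factor at each level of the 'fixed' factor.

    cells maps (fixed_level, other_level) -> list of observations, with the
    same number of observations (at least two) in every cell.
    """
    r = len(next(iter(cells.values())))  # observations per cell
    # Pooled within-cell error from the whole table (an assumption of this sketch)
    ss_e = sum((x - fmean(obs)) ** 2 for obs in cells.values() for x in obs)
    ms_e = ss_e / (len(cells) * (r - 1))

    result = {}
    for g in fixed_levels:
        cell_means = [fmean(cells[(g, b)]) for b in other_levels]
        level_mean = fmean(cell_means)  # valid because the design is balanced
        ss = r * sum((m - level_mean) ** 2 for m in cell_means)
        result[g] = (ss / (len(other_levels) - 1)) / ms_e
    return result

# Hypothetical salaries (in thousands) with a clear gender*edu interaction:
# edu matters for "male" but hardly at all for "female".
salaries = {
    ("female", 12): [20, 22], ("female", 16): [21, 23],
    ("male", 12): [20, 22], ("male", 16): [30, 32],
}
effects = simple_effects(salaries, ["female", "male"], [12, 16])
print(effects)
```

Here the simple effect of edu at the "male" level has a large F value (50) and the simple effect of edu at the "female" level a small one (0.5), which is the pattern a significant interaction produces.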

(5) Open the "Options" dialog box, move the three terms on the left (gender, edu, and gender*edu) into the "Display Means for" box on the right, and select "Descriptive statistics" and "Compare main effects".

(6) After clicking "OK", the results are displayed in the SPSS Viewer. The syntax at the top of the output is the command code for the steps we just performed in the dialog boxes. The tables below it are the results we want, and we draw our conclusions from them.

(7) In the "Tests of Between-Subjects Effects" table below, compare the F and sig values for gender, edu, and the gender*edu interaction. edu has the largest F value and the smallest sig value, with sig < 0.05. The sig values of gender and gender*edu are both greater than 0.05. We conclude that the main effect of "gender" is not significant, the main effect of "edu" is significant, and the interaction of gender and edu is not significant, so no simple effect test is required (one would be run only if the interaction were significant). Thus the education of the company's employees has a significant impact on their "current salary", while "gender" has no significant impact on "current salary".

(8) The figure below is the profile plot of means: the mean of the dependent variable, employee salary, for each combination of the two factors edu and gender. Generally, if the interaction effect is not significant, the lines in the plot are roughly parallel; if the interaction effect is significant, the lines are not parallel.
In this figure, gender "gender" is the horizontal axis variable and each line represents one value of years of education "edu", so the influence of edu on the dependent variable "current salary" can be read from the plot.
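What the plot shows can also be checked numerically: the lines are parallel exactly when the male-female difference in mean salary is the same at every edu value. The following Python sketch computes that difference per edu value. The function name `gender_gap_by_edu`, the record layout, and the numbers are hypothetical and are not the article's data.

```python
from statistics import fmean

def gender_gap_by_edu(records):
    """records: iterable of (gender, edu, salary) tuples.

    Returns {edu: mean male salary - mean female salary}.
    """
    by_cell = {}
    for gender, edu, salary in records:
        by_cell.setdefault((gender, edu), []).append(salary)
    # One mean per (gender, edu) cell: these are the points of the profile plot
    means = {cell: fmean(values) for cell, values in by_cell.items()}
    edu_levels = sorted({edu for _, edu in means})
    return {e: means[("male", e)] - means[("female", e)] for e in edu_levels}

# Hypothetical records (salary in thousands)
records = [
    ("female", 8, 20), ("female", 8, 22), ("male", 8, 30), ("male", 8, 32),
    ("female", 14, 27), ("female", 14, 27), ("male", 14, 28), ("male", 14, 28),
]
gaps = gender_gap_by_edu(records)
print(gaps)
```

In this made-up table the gap is 10 at 8 years of education and 1 at 14 years, so the two edu lines would not be parallel. Equal gaps at every edu value would give parallel lines.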

From the figure we conclude: with 20 years of education (generally graduate level), the salary difference between men and women is small; with 14 years (generally junior college level), the gender salary difference is also not obvious. With 8, 10, 12, or 17 years of education, however, the salary difference between men and women is relatively large, and it is especially pronounced at 8 and 17 years.


Origin blog.csdn.net/TIQCmatlab/article/details/112691564