SPSS one-way analysis of variance tutorial

Preface

These are self-study notes based on the SPSS one-way analysis of variance tutorial on Bilibili. Corrections and comments are welcome.

What is one-way analysis of variance

One-way ANOVA compares the mean values of several groups to see whether they differ. For example, testing whether the average age differs among three classes A, B and C is a typical one-way ANOVA problem: the only factor is class. A medical example is comparing the treatment effect among a mild group, a moderate group and a severe group.

The principle of one-way analysis of variance

ANOVA computes the ratio of the between-group variation to the within-group variation. Between-group variation is the difference among the mild, moderate and severe groups. Within-group variation is the spread among subjects in the same group: for example, if the severe group has 30 people, the differences among those 30 people make up the within-group variation. If the between-group variation is large relative to the within-group variation, the difference between the groups is considered significant.

One-way ANOVA is based on the F statistic, F = MS_between / MS_within. Each mean square is a sum of squares divided by its degrees of freedom. For k groups and N observations, these are k − 1 between groups and N − k within groups. The larger the between-group variation relative to the within-group variation, the larger F and the smaller the corresponding p value. If p < 0.05, the means of the groups in the study are considered significantly different. In short, the key quantity is how large the between-group variation is compared with the within-group variation.
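The ratio can be made concrete with a short Python sketch. It is not part of the SPSS workflow. It computes F from the between-group and within-group sums of squares and compares the result with SciPy's `scipy.stats.f_oneway`. It assumes NumPy and SciPy are installed. The helper `one_way_f` is hypothetical, and the group values are made-up numbers.

```python
import numpy as np
from scipy import stats


def one_way_f(*groups):
    """Return (F, p) for a one-way ANOVA computed from sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total observations
    grand_mean = np.concatenate(groups).mean()
    # Between-group variation: group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group variation: observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    f = ms_between / ms_within
    p = stats.f.sf(f, k - 1, n_total - k)             # upper tail of the F distribution
    return f, p


# Hypothetical treatment-effect scores (made-up numbers)
mild = [7.1, 6.8, 7.4, 7.0, 6.9]
moderate = [6.2, 6.5, 6.0, 6.4, 6.3]
severe = [5.1, 5.6, 5.3, 5.0, 5.4]

f, p = one_way_f(mild, moderate, severe)
print(f"F = {f:.2f}, p = {p:.3g}")
print(stats.f_oneway(mild, moderate, severe))  # SciPy's built-in for comparison
```

Here the group means are far apart compared with the spread inside each group. So F comes out large and p far below 0.05, which is the reasoning described above.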

Null hypothesis for one-way ANOVA

The population means of all groups are equal (there is no difference between the group means)

In the medical example, this means there is no difference in treatment effect among the severe, mild and moderate groups. If the calculated p value is greater than 0.05, the null hypothesis is not rejected. Otherwise it is rejected in favor of the alternative hypothesis.

Alternative hypotheses for one-way ANOVA

At least one group mean differs from the others

Note that the alternative hypothesis does not require every pair of groups to differ: one group that differs from the rest is enough for the result to be significant. Therefore, if the one-way ANOVA gives p < 0.05, post hoc multiple comparisons are carried out next to find out which pairs of groups differ significantly.

Application conditions of one-way analysis of variance

Four necessary conditions:

  • The dependent variable must be a continuous numerical variable, that is, one that can take any value within an interval of the number line. Categorical variables such as gender (male/female) are not continuous. Age, by contrast, can take any value within the normal human range. So when comparing age across groups, age is a continuous numerical variable and meets the first condition of ANOVA.
  • The variable follows a normal distribution within each group: for example, to compare ages among patient groups A, B and C, test the ages in each of the three groups for normality. One-way ANOVA can be used only if all three groups are normally distributed. This is the second condition.
  • The variances of the groups are equal (homogeneity of variance): the variances of groups A, B and C must be equal before one-way ANOVA can be performed.

In actual research, the two conditions of normal distribution and equal variance can be appropriately relaxed, and slight skewness is acceptable.

  • There are at least two groups: one-way ANOVA is mainly used for more than two groups. For exactly two groups, the independent-samples t test is more common.
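Conditions 2 and 3 can also be checked outside SPSS. The sketch below assumes SciPy is installed and uses made-up ages. It runs a Shapiro-Wilk test on each group and Levene's test across the groups.

Passing `center="mean"` gives the classic mean-centred Levene test. SciPy's default, `center="median"`, is the Brown-Forsythe variant, so the result may differ slightly from a given row of SPSS output.

```python
from scipy import stats

# Hypothetical ages for patient groups A/B/C (made-up numbers)
groups = {
    "A": [34, 41, 38, 45, 39, 42, 37, 40],
    "B": [44, 47, 43, 50, 46, 48, 45, 49],
    "C": [36, 39, 42, 38, 41, 40, 37, 43],
}

# Condition 2: Shapiro-Wilk normality test, separately for each group
for name, values in groups.items():
    w, p = stats.shapiro(values)
    verdict = "no evidence against normality" if p > 0.05 else "departs from normality"
    print(f"group {name}: W = {w:.3f}, p = {p:.3f} ({verdict})")

# Condition 3: Levene's test for equal variances across the groups.
# center="mean" is the classic Levene test; SciPy's default, center="median",
# is the Brown-Forsythe variant.
stat, p_levene = stats.levene(*groups.values(), center="mean")
print(f"Levene: statistic = {stat:.3f}, p = {p_levene:.3f}")
print("variances homogeneous" if p_levene > 0.05 else "variances unequal")
```

Both checks use the same decision rule as SPSS: p > 0.05 means the condition is not rejected.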

Data practice

Test for normal distribution

[Screenshot]
  • Option settings

    • Statistics: check Descriptives and Outliers
    [Screenshot]
    • Plots: check Histogram, and be sure to check Normality plots with tests
    [Screenshot]
    • Click OK and inspect the output. For demonstration purposes, the data here were replaced with a set that is not normally distributed. The Shapiro-Wilk test shows that groups 1 and 3 are not normally distributed, and the Outliers output shows which values are extreme.

    [Screenshot] [Screenshot]

    • Solution: if the data are not normally distributed, switch to a non-parametric test, as described next.
Parametric and non-parametric tests

[Screenshot]

Parametric test: assumes the data follow a particular distribution (usually normal) and tests the population parameter (μ) using sample estimates (x̄ ± s). Examples: t test, u test, analysis of variance. Typical data: continuous numerical variables.

Non-parametric test: makes no assumption about the form of the population distribution and tests the distribution of the data directly. Because no population parameters are involved, it is called a "non-parametric" test. Example: the chi-square test. Typical data: discrete or ordinal variables such as yes/no or beginner/intermediate/advanced, as well as continuous data that fail the normality assumption.

Satisfies normal distribution (parametric test)

Parameter settings
  • Here I use my own experimental data to compare feed intake among the four parities. These are the same data as in the normality test above; for convenience, this section assumes they are normally distributed.
[Screenshot]
  • Dependent List: feed intake, the variable being compared. Factor: the grouping variable (here, parity).

    • Contrasts: the Polynomial option is generally used for groups with an obvious order, such as the mild/moderate/severe patients mentioned above; here, parity is ordered. Checking it tells the software that the groups are graded, i.e. that I want to know whether feed intake changes as parity increases. I usually set Degree to 5th so that all terms from 1st to 5th are computed. For groups without an order, leave this option unchecked.
    [Screenshot]
    • Post Hoc (multiple comparisons): this time choose the options shown below
    [Screenshot]
    • Options: check Descriptive and Homogeneity of variance test
    [Screenshot]
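The polynomial contrast can be sketched in Python as a textbook trend contrast on the group means. The sketch uses the standard orthogonal coefficients for four equally spaced levels. It is my assumption, not something stated in the tutorial, that this corresponds to the Unweighted rows SPSS reports. The helper `polynomial_contrast` is hypothetical and the feed-intake numbers are made up.

```python
import numpy as np
from scipy import stats


def polynomial_contrast(groups, coeffs):
    """Test one contrast on the group means; returns (estimate, F, p)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    coeffs = np.asarray(coeffs, dtype=float)
    means = np.array([g.mean() for g in groups])
    ns = np.array([len(g) for g in groups])
    k, n_total = len(groups), ns.sum()
    # Pooled within-group mean square (the error term of the ANOVA)
    ms_within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n_total - k)
    estimate = coeffs @ means                         # contrast on unweighted means
    ss_contrast = estimate ** 2 / np.sum(coeffs ** 2 / ns)
    f = ss_contrast / ms_within                       # 1 numerator degree of freedom
    p = stats.f.sf(f, 1, n_total - k)
    return estimate, f, p


# Standard orthogonal polynomial coefficients for 4 equally spaced levels
CONTRASTS = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}

# Hypothetical feed intake for parities 1-4 (made-up numbers)
parity = [
    [5.1, 5.4, 5.0, 5.3],
    [5.6, 5.8, 5.5, 5.9],
    [6.0, 6.2, 5.9, 6.1],
    [6.3, 6.6, 6.4, 6.5],
]
for name, c in CONTRASTS.items():
    est, f, p = polynomial_contrast(parity, c)
    print(f"{name:9s} estimate = {est:.3f}, F = {f:.2f}, p = {p:.4f}")
```

Rescaling the coefficients (SPSS normalizes them) changes the estimate but not F or p. With equal group sizes, the three F values add up to (k − 1) times the overall ANOVA F, because the orthogonal contrasts split the between-group sum of squares. With four groups only three terms exist, which is why higher degrees cannot be estimated.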
Result analysis
  • Descriptive statistics: N is the sample size of each group. The red box marks the two values most often reported in papers: the mean and the standard error of the mean (SEM).
[Screenshot: Descriptives table]
  • Test of homogeneity of variances: p > 0.05 means the variances of the four groups are homogeneous. The prerequisite for one-way ANOVA is met, so you can go on to read the ANOVA results.
[Screenshot: Test of Homogeneity of Variances table]
  • ANOVA table (significance and the corresponding p value): for the demonstration data, the overall between-groups p value is 0.430.
    • If the experimental design implies no ordering among the groups (such as mild/moderate/severe or a dose gradient), simply report this overall p value of 0.430.
    • Although the 5th degree was selected, the ANOVA table (first screenshot below) only goes up to the cubic term. With four groups, at most three (k − 1) polynomial terms can be estimated.
    • The p values of all three terms (linear, quadratic, cubic) are greater than 0.05. None of these trends (linear, quadratic or cubic change) is therefore significant.
    • If you need to report trend terms, generally use the Unweighted row of each term. Linear and quadratic effects are common in papers, as in the article example in the second screenshot below.

[Screenshot: ANOVA table] [Screenshot: example from a published article]

  • Post hoc multiple comparisons: homogeneity of variances holds, so ignore the Tamhane's T2 results (use those when variances are unequal) and read the Bonferroni results directly. Here all p values are greater than 0.05, so feed intake does not differ significantly between any pair of parities. Which multiple-comparison method to read depends on whether the variances are equal; see Choice of test methods below.
[Screenshot: Multiple Comparisons table]
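Here is a minimal sketch of Bonferroni-corrected pairwise comparisons. It assumes the approach usually described for the Bonferroni option: t tests that share the pooled within-group mean square from the ANOVA, with each p value multiplied by the number of comparisons and capped at 1. The helper `bonferroni_posthoc`, the labels P1 to P4 and the data are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def bonferroni_posthoc(groups):
    """Pairwise t tests on the pooled ANOVA error term, Bonferroni-adjusted.

    Returns a list of (group_a, group_b, mean_difference, adjusted_p).
    """
    groups = {name: np.asarray(v, dtype=float) for name, v in groups.items()}
    n_total = sum(len(v) for v in groups.values())
    df_within = n_total - len(groups)
    ms_within = sum(((v - v.mean()) ** 2).sum() for v in groups.values()) / df_within
    pairs = list(combinations(groups, 2))
    results = []
    for a, b in pairs:
        ga, gb = groups[a], groups[b]
        diff = ga.mean() - gb.mean()
        se = np.sqrt(ms_within * (1 / len(ga) + 1 / len(gb)))
        t = diff / se
        p_raw = 2 * stats.t.sf(abs(t), df_within)    # unadjusted two-sided p
        p_adj = min(1.0, p_raw * len(pairs))         # Bonferroni: times m, capped at 1
        results.append((a, b, diff, p_adj))
    return results


# Hypothetical feed intake by parity (made-up numbers)
parity = {
    "P1": [5.1, 5.4, 5.0, 5.3],
    "P2": [5.6, 5.8, 5.5, 5.9],
    "P3": [6.0, 6.2, 5.9, 6.1],
    "P4": [6.3, 6.6, 6.4, 6.5],
}
for a, b, diff, p in bonferroni_posthoc(parity):
    print(f"{a} vs {b}: mean difference = {diff:+.3f}, Bonferroni p = {p:.4f}")
```

Multiplying by the number of comparisons keeps the overall error rate at or below 0.05. The cost is that the test is conservative when there are many groups.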
Choice of test methods
  • For choosing a specific comparison method, see the source [Study Notes] Comparison of differences between groups and summary of related issues. The original author's write-up is detailed and very helpful.

    [Figure]

    • Comparison among multiple groups: first test homogeneity of variances with Levene's test and normality with the Shapiro-Wilk test.
      • Equal sample sizes in every group: use Tukey or Duncan
      • Unequal sample sizes: use Bonferroni, Student-Newman-Keuls (SNK) or Scheffé

    Confirmatory research: the comparisons are planned at the design stage, based on the research purpose or subject knowledge. For example, suppose the design has one control group and n experimental groups. Once the data are in, only the comparisons between each experimental group and the control are of interest. Comparisons among the experimental groups fall outside the design and are not needed. In other words, the comparisons are fixed before the data are obtained, and the only question is whether the means of those specific groups differ. This is called an a priori (planned) comparison.

    Exploratory research: at the design stage it is not clear which groups should be compared, so the comparisons cannot be planned in advance. After the data are obtained, all pairs of groups are compared to find out which pairs differ. For example, suppose it is unknown at design time whether EEG signals differ among the normal, fatigue and sleep states. Then every pair of states must be compared once the data are collected. This is called a post hoc test. (Note: in practice, researchers sometimes save work by comparing only the pairs that look very different and skipping the pairs that look similar. However, those "seemingly different" pairs were selected from all possible pairs by eyeballing the data, not by a planned design. So all pairs have implicitly been considered, and the analysis is still a post hoc comparison.)
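In an exploratory design all pairs are compared, and Tukey's HSD is one of the methods listed above for that case. The sketch below assumes `scipy.stats.tukey_hsd` is available (SciPy 1.8 and later). The values for the EEG example's three states are made up.

```python
from scipy import stats

# Hypothetical EEG feature values for three states (made-up numbers)
normal = [24.5, 23.5, 26.4, 27.1, 29.9]
fatigue = [28.4, 34.2, 29.5, 32.2, 30.1]
sleep = [26.1, 28.3, 24.3, 26.2, 27.8]
labels = ["normal", "fatigue", "sleep"]

res = stats.tukey_hsd(normal, fatigue, sleep)   # requires SciPy >= 1.8
print(res)                                       # summary table of all pairs

# res.pvalue[i, j] is the p value for comparing group i with group j
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        print(f"{labels[i]} vs {labels[j]}: p = {res.pvalue[i, j]:.4f}")
```

Every pair is tested because nothing was planned in advance, and the procedure controls the error rate across all of those comparisons.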

Does not satisfy normal distribution (non-parametric test)

Again, here is an overview figure first:

[Screenshot]

After testing the data for normality, choose a non-parametric test if normality is not satisfied. For convenience of demonstration, a different data set is used below:

  • Run the normality test described above on the following data. It shows that LIP, TP, NH3L and SOD are not normally distributed, so use the non-parametric Kruskal-Wallis H test: Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples
[Screenshot]
  • Put LIP, TP, NH3L and SOD into the Test Variable List and parity into Grouping Variable, and define its range as 1 to 4. Select Kruskal-Wallis H as the test type, and check Descriptive under Options.

    [Screenshot]
  • The results show that TP and NH3L differ significantly between groups. To find out which pairs of groups differ, see the next section on pairwise comparisons after the Kruskal-Wallis rank sum test.

[Screenshot]
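The Kruskal-Wallis H test works in three steps: rank all observations together, compare the group rank sums, then apply the tie correction. The function below is a hypothetical helper, not an SPSS or SciPy API. It implements those steps and can be checked against `scipy.stats.kruskal`. The TP values for parities 1 to 4 are made up and include a few ties.

```python
import numpy as np
from scipy import stats


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with tie correction; p from the chi-square approximation."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    data = np.concatenate(arrays)
    n = len(data)
    ranks = stats.rankdata(data)                     # rank all observations together
    rank_term, start = 0.0, 0
    for arr in arrays:
        r_sum = ranks[start:start + len(arr)].sum()  # rank sum of this group
        rank_term += r_sum ** 2 / len(arr)
        start += len(arr)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    _, counts = np.unique(data, return_counts=True)
    h /= 1 - np.sum(counts ** 3 - counts) / (n ** 3 - n)
    p = stats.chi2.sf(h, len(arrays) - 1)            # df = number of groups - 1
    return h, p


# Hypothetical TP values for parities 1-4 (made-up numbers, with some ties)
tp = [
    [62.1, 60.4, 63.0, 61.5, 60.4],
    [64.2, 65.0, 63.8, 66.1, 64.9],
    [68.3, 67.5, 69.0, 66.1, 70.2],
    [65.5, 66.8, 64.9, 67.0, 65.7],
]
h, p = kruskal_wallis_h(*tp)
print(f"H = {h:.3f}, p = {p:.4f}")
print(stats.kruskal(*tp))  # SciPy's built-in for comparison
```

Only ranks enter the statistic, so no normality assumption is needed. If all values were distinct, the tie-correction factor would be exactly 1.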
How does the Kruskal-Wallis rank sum test perform pairwise comparisons?

This section continues from "Does not satisfy normal distribution (non-parametric test)" above. It is a written summary of the Bilibili video SPSS-non-parametric test 6-Kruskal-Wallis H test-multiple independent sample rank sum test-post-hoc pairwise comparison.

  • In the output window of the Kruskal-Wallis test, select Analyze > Nonparametric Tests > Independent Samples
[Screenshot]
  • In the dialog box that opens, only the Fields tab needs changing. Move the variables whose p value was below 0.05 into Test Fields, put the group variable into Groups, and click Run.
[Screenshot]
  • The output that appears (a Model Viewer object) still contains no pairwise comparisons.
    • Double-click it to open the Model Viewer.
    • Select the test field (TP, highlighted in yellow in the figure below).
    • In the View drop-down below the right-hand panel, choose Pairwise Comparisons. The pairwise comparisons then appear on the right.
    • For TP, for example, the p value between group 1 and group 3 is 0.016, which at face value indicates a significant difference between the two groups. The table also has an Adj. Sig. column with Bonferroni-adjusted values that account for the number of comparisons.
[Screenshot]
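As I understand it, SPSS bases these pairwise comparisons on Dunn's procedure with a Bonferroni adjustment, which would explain the Sig. and Adj. Sig. columns. The sketch below implements Dunn's z test on mean ranks with the usual tie correction. It illustrates that assumption and is not a replica of SPSS output. The helper `dunn_bonferroni` is hypothetical and the TP values are made up.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def dunn_bonferroni(groups):
    """Dunn's pairwise z tests on mean ranks, with Bonferroni-adjusted p values.

    Returns a list of (group_a, group_b, z, p_raw, p_adj).
    """
    names = list(groups)
    arrays = [np.asarray(groups[name], dtype=float) for name in names]
    data = np.concatenate(arrays)
    n = len(data)
    ranks = stats.rankdata(data)                     # joint ranking, ties get average ranks
    mean_rank, size = {}, {}
    start = 0
    for name, arr in zip(names, arrays):
        mean_rank[name] = ranks[start:start + len(arr)].mean()
        size[name] = len(arr)
        start += len(arr)
    # Tie-corrected variance term: N(N+1)/12 - sum(t^3 - t) / (12(N-1))
    _, counts = np.unique(data, return_counts=True)
    var_base = n * (n + 1) / 12 - np.sum(counts ** 3 - counts) / (12 * (n - 1))
    pairs = list(combinations(names, 2))
    results = []
    for a, b in pairs:
        se = np.sqrt(var_base * (1 / size[a] + 1 / size[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_raw = 2 * stats.norm.sf(abs(z))            # two-sided p value
        p_adj = min(1.0, p_raw * len(pairs))         # Bonferroni correction
        results.append((a, b, z, p_raw, p_adj))
    return results


# Hypothetical TP values for parities 1-4 (made-up numbers, with some ties)
tp = {
    "P1": [62.1, 60.4, 63.0, 61.5, 60.4],
    "P2": [64.2, 65.0, 63.8, 66.1, 64.9],
    "P3": [68.3, 67.5, 69.0, 66.1, 70.2],
    "P4": [65.5, 66.8, 64.9, 67.0, 65.7],
}
for a, b, z, p_raw, p_adj in dunn_bonferroni(tp):
    print(f"{a} vs {b}: z = {z:+.2f}, p = {p_raw:.4f}, adjusted p = {p_adj:.4f}")
```

A pair can look significant on the unadjusted p value yet not after adjustment. That is why the adjusted column is the safer one to report.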

Summary

The content of this article is summarized in the mind map below. It covers only what I used here; for a more comprehensive and detailed reference, see [Study Notes] Comparison of differences between groups and summary of related issues, cited above. I am not a statistics major, but I recently needed this tool and am learning it step by step. If you have any questions or suggestions, please point them out.

[Mind map: SPSS one-way analysis of variance]


Source: blog.csdn.net/twocanis/article/details/125192298