[Statistics Notes 9] Analysis of Variance (ANOVA)

Analysis of Variance (abbreviated ANOVA)


Analysis of variance (ANOVA), also called "variance analysis" or the "F test", was devised by R. A. Fisher. It is used to test the significance of differences among the means of two or more samples.

Because of various influencing factors, the data obtained in a study fluctuate.

The causes of this fluctuation fall into two categories: uncontrollable random factors, and controllable factors that are applied in the study and affect the results.

Definition of analysis of variance:

Analysis of variance tests whether the population means are equal, in order to judge whether a categorical independent variable has a significant effect on a numerical dependent variable.


The basic idea of analysis of variance

By analyzing how much each source of variation contributes to the total variation, ANOVA determines how strongly the controllable factors influence the study results.

Formally, ANOVA compares whether the means of several populations are equal, but in essence it studies the relationship between variables.

It is one of the main methods for studying the relationship between one or more categorical independent variables and a numerical dependent variable. It has much in common with regression analysis, but there are essential differences.

ANOVA not only makes testing more efficient; because it combines the information from all samples, it also increases the reliability of the analysis.

Why?

For example, suppose the means of four populations are $\mu_1, \mu_2, \mu_3, \mu_4$. An ordinary hypothesis test such as the t-test can only compare two samples at a time, so testing whether the four population means are equal requires pairwise comparisons, 6 tests in all:

Test $H_{01}$: $\mu_1 = \mu_2$

Test $H_{02}$: $\mu_1 = \mu_3$

Test $H_{03}$: $\mu_1 = \mu_4$

Test $H_{04}$: $\mu_2 = \mu_3$

Test $H_{05}$: $\mu_2 = \mu_4$

Test $H_{06}$: $\mu_3 = \mu_4$

Obviously, such a comparison is tedious. If $\alpha = 0.05$, the probability of a Type I error in each test is 0.05. Over several tests the probability of making at least one Type I error grows accordingly, so by the time all the tests are done it exceeds 0.05. For 6 consecutive tests (assumed independent), the probability of committing a Type I error is:

$1-\left(1-\alpha\right)^{6} = 1 - 0.95^{6} \approx 0.265$, so the confidence level drops to about 0.735 (i.e., $0.95^{6}$).
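The arithmetic above can be checked with a short Python sketch. It assumes, as the formula does, that the 6 tests are independent; the function name `familywise_error_rate` is made up for this illustration.

```python
from math import comb


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


n_groups = 4
n_pairs = comb(n_groups, 2)  # number of pairwise comparisons among 4 means
fwer = familywise_error_rate(0.05, n_pairs)

print(n_pairs)          # 6
print(round(fwer, 3))   # 0.265
```

The same function shows how quickly the error rate grows as more groups are compared pairwise.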

In general, as the number of individual significance tests increases, so does the chance that an observed difference is due to chance alone. ANOVA considers all the samples at once, which avoids this accumulation of error probability and so reduces the risk of rejecting a true null hypothesis.

 

Example: Three machines produce aluminum alloy sheet of the same specification. To test whether the sheets produced by the three machines have the same thickness, five sheets are sampled at random from the output of each machine. The measurements are as follows:

Question: Is there a significant difference in the thickness of the sheets produced by the three machines?

This problem can be solved with analysis of variance.

In Excel, the ANOVA table is generated for the sample data above (three machines, five samples each):

SS is the sum of squares, df the degrees of freedom, MS the mean square, F the test statistic, P-value the p-value of the test, and $F_{crit}$ the critical value at the given significance level $\alpha$.

To make a decision, the P-value in the ANOVA table can be compared with the significance level $\alpha$.

Using the ANOVA table, the decision rule is:

If $F > F_{\alpha}$, reject the null hypothesis $H_{0}$;

If $F < F_{\alpha}$, do not reject the null hypothesis $H_{0}$.

Alternatively, the decision can be made from the relationship between P and $\alpha$:

If $P < \alpha$, reject $H_{0}$;

If $P > \alpha$, do not reject $H_{0}$.

In this example, the ANOVA table shows:

Since $F_{0.05}(2, 12) = 3.89 < F = 32.92$, $H_{0}$ is rejected; that is, the thickness of the sheets produced by the machines differs significantly.
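As a minimal sketch of where the entries of an ANOVA table come from, the following Python code computes SS, df, MS and F for a one-way layout and applies the decision rule. The thickness values are hypothetical (they are not the measurements from the example, so the F value differs from 32.92). The critical value 3.89 for (2, 12) degrees of freedom is the one quoted above, and the function name `one_way_anova` is illustrative.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table as a dict."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group variation: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Hypothetical thickness measurements, 3 machines x 5 sheets (arbitrary units)
machines = [
    [10.0, 11.0, 12.0, 11.0, 11.0],
    [13.0, 14.0, 15.0, 14.0, 14.0],
    [15.0, 16.0, 17.0, 16.0, 16.0],
]

table = one_way_anova(machines)
F_CRIT = 3.89  # F_0.05(2, 12), taken from the text
reject_h0 = table["f"] > F_CRIT

print(table["df_between"], table["df_within"])  # 2 12
print(reject_h0)                                # True
```

With 3 groups of 5 observations the degrees of freedom are 2 and 12, matching the example, so the same critical value applies.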

Here we applied the principles of hypothesis testing and analysis of variance (ANOVA).


Types of analysis of variance

ANOVA

Depending on the design that produced the data, there are two methods of analysis of variance:

1. To compare the means of several groups under a completely randomized design, use the ANOVA for completely randomized designs, i.e. one-way ANOVA.

2. To compare the means of several groups under a randomized block design, use the ANOVA for block designs, i.e. two-way ANOVA.

One-way ANOVA

One-way ANOVA analyzes the results of a single-factor experiment to test whether the factor has a significant effect on the results.

One-way ANOVA extends the comparison of two sample means: it tests the differences among several means, and is a statistical method for determining whether a factor significantly affects the experimental results.

Two-way ANOVA (two-way analysis of variance)

Two-way ANOVA is a statistical method for analyzing whether the different levels of two factors have a significant effect on the results, and whether the two factors interact. In general, a two-way ANOVA first designs an experiment over the combinations of the levels of the two factors, and requires the same sample size for every combination.

In practical problems we sometimes need to consider the effect of two factors on the results. Take beverage sales: besides the color of the beverage, we may want to know whether the sales region affects sales. If sales differ significantly across regions, the reasons need to be analyzed so that different marketing strategies can be adopted. In regions where the brand is popular and its market share is high, the aim is to keep the lead; in regions where market share is low, promotion is stepped up so that more consumers get to know and accept the product. If beverage color is regarded as factor A affecting sales and sales region as factor B, analyzing factors A and B together is a two-way ANOVA. It tests which factors matter: whether one factor is at work, both factors are at work, or neither has a significant effect.

Analysis steps

The basic steps of the two kinds of ANOVA are the same; only the way the variation is decomposed differs. For data from a completely randomized design, the total variation is decomposed into between-group variation and within-group variation (random error): $SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}}$. For data from a randomized block design, the total variation includes block variation in addition to treatment variation and random error: $SS_{\text{total}} = SS_{\text{treatment}} + SS_{\text{block}} + SS_{\text{error}}$. The basic steps of ANOVA are as follows:

1. State the hypotheses;

H0: the population means of the groups are all equal;
H1: the population means of the groups are not all equal.
Significance level: 0.05.

2. Compute the value of the F test statistic;

3. Determine the P value and draw the inference.
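The block-design decomposition of the total sum of squares into treatment, block and error parts can be sketched in Python as follows. The data table is hypothetical (rows are blocks, columns are treatments), and the function name `block_anova` is invented for this illustration. The sketch covers the case of one observation per block and treatment, without an interaction term.

```python
def block_anova(table):
    """Randomized block ANOVA; table[i][j] is block i under treatment j."""
    n_blocks, n_treat = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_blocks * n_treat)
    block_means = [sum(row) / n_treat for row in table]
    treat_means = [sum(row[j] for row in table) / n_blocks
                   for j in range(n_treat)]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    # Error: what remains after removing the block and treatment effects
    ss_error = sum(
        (table[i][j] - block_means[i] - treat_means[j] + grand) ** 2
        for i in range(n_blocks) for j in range(n_treat)
    )

    df_treat, df_block = n_treat - 1, n_blocks - 1
    df_error = df_treat * df_block
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total, "ss_treat": ss_treat,
        "ss_block": ss_block, "ss_error": ss_error,
        "df_treat": df_treat, "df_block": df_block, "df_error": df_error,
        "f_treat": (ss_treat / df_treat) / ms_error,
        "f_block": (ss_block / df_block) / ms_error,
    }


# Hypothetical data: 4 blocks (rows) x 3 treatments (columns)
data = [
    [10.0, 12.0, 14.0],
    [11.0, 14.0, 15.0],
    [13.0, 15.0, 19.0],
    [12.0, 16.0, 17.0],
]

result = block_anova(data)
parts = result["ss_treat"] + result["ss_block"] + result["ss_error"]
print(abs(result["ss_total"] - parts) < 1e-9)  # True: the parts add up
```

Each F value would then be compared with the critical value for its own degrees of freedom, or converted to a P value, as in step 3.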


Conditions for applying analysis of variance

Before using ANOVA for statistical inference, check that its conditions are met, including:

1. Comparability. If the data in the groups are not comparable, ANOVA does not apply.

2. Normality. ANOVA does not apply to skewed data. For skewed data, consider a logarithmic, square-root, reciprocal, or arcsine square-root transformation to make the variable normal or approximately normal before running ANOVA.

3. Homogeneity of variance. If the variances of the groups are not homogeneous, ANOVA does not apply. Homogeneity of variance across several groups can be tested with Bartlett's method, which uses a chi-square statistic; judging the result requires looking up a chi-square table.
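For the homogeneity check in item 3, a minimal Python sketch of Bartlett's statistic is given below, using the standard textbook form of the statistic. The data are hypothetical, the function name `bartlett_statistic` is illustrative, and the critical value 5.991 is the usual chi-square table entry for 2 degrees of freedom at $\alpha = 0.05$.

```python
from math import log
from statistics import variance


def bartlett_statistic(groups):
    """Bartlett's chi-square statistic for k groups (df = k - 1)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # sample variances
    # Pooled variance, weighted by each group's degrees of freedom
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * log(pooled) - sum(
        (n - 1) * log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction


# Hypothetical samples; the middle group is more spread out than the others
samples = [
    [10.0, 11.0, 12.0, 11.0, 11.0],
    [13.0, 15.0, 17.0, 14.0, 16.0],
    [15.0, 16.0, 17.0, 16.0, 16.0],
]

stat = bartlett_statistic(samples)
CHI2_CRIT = 5.991  # chi-square table value, df = 2, alpha = 0.05
variances_differ = stat > CHI2_CRIT
print(round(stat, 3), variances_differ)
```

If the statistic exceeds the table value, homogeneity of variance is rejected and a transformation or a different method should be considered before running ANOVA.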

Analysis of variance is mainly used for:

1. Testing the significance of differences between means;

2. Separating the relevant factors and estimating their contribution to the total variation;

3. Analyzing interactions between factors;

4. Testing homogeneity of variance.

 

 


Origin blog.csdn.net/seagal890/article/details/105021319