Analysis of Variance (1): One-Way Analysis of Variance

Analysis of variance (ANOVA) is a statistical method that uses experimental data to infer whether changes in the levels of one or more factors have a significant effect on an experimental indicator. According to the number of factors that affect the indicator, it can be divided into one-way (single-factor) ANOVA, two-way (two-factor) ANOVA, and multi-factor ANOVA.

In mathematical statistics, the results of an experiment (such as product performance or output) are called experimental indicators, and the conditions that affect these indicators are called factors; the different states of a factor are called levels. Factors are usually denoted by capital letters $A, B, \dots$, and the levels of factor $A$ are written with subscripts as $A_1, A_2, \dots$.

One-Way ANOVA

If, in an experiment, only the level of one factor is changed while the levels of all other factors remain fixed, the experiment is called a single-factor experiment. Analysis of variance performed on a single-factor experiment is called one-way analysis of variance.
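As a quick illustration, a one-way ANOVA can be run directly with SciPy's `f_oneway`. The three groups below play the role of three levels $A_1, A_2, A_3$ of a factor; the data values are made up for this sketch.

```python
# One-way ANOVA on three illustrative treatment groups (levels of one factor).
from scipy.stats import f_oneway

group_a = [25.6, 24.4, 25.0, 25.9]  # observations at level A1
group_b = [27.1, 27.8, 26.5, 28.0]  # observations at level A2
group_c = [25.2, 24.9, 25.5, 25.1]  # observations at level A3

# f_oneway returns the F statistic and the p-value of the test
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

A small p-value indicates that the group means are not all equal, i.e. the factor has a significant effect; the rest of this article derives exactly what `f_oneway` computes.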

Mathematical model

Suppose factor $A$ has $r$ different levels $A_1, A_2, \dots, A_r$. At each level $A_i$, $n_i$ independent repeated experiments are carried out, yielding the results in the following table:

| Level | Sample | Sample mean |
| --- | --- | --- |
| $A_1$ | $X_{11},\ X_{12},\ \cdots,\ X_{1n_1}$ | $\bar{X}_1$ |
| $A_2$ | $X_{21},\ X_{22},\ \cdots,\ X_{2n_2}$ | $\bar{X}_2$ |
| $\vdots$ | $\cdots$ | $\vdots$ |
| $A_r$ | $X_{r1},\ X_{r2},\ \cdots,\ X_{rn_r}$ | $\bar{X}_r$ |

Assume that the population $X_i$ corresponding to each level $A_i$ follows the normal distribution $N(\mu_i, \sigma^2)$, and that the samples taken at different levels $A_i$ are mutually independent. In short, normal populations, homoscedasticity (equal variances), and independent samples are the three basic assumptions of analysis of variance.

Since $X_{ij} \sim N(\mu_i,\sigma^2)$, $j=1,2,\dots,n_i$, we have $X_{ij}-\mu_i \sim N(0,\sigma^2)$. Writing $\varepsilon_{ij}=X_{ij}-\mu_i$, then
$$\begin{cases}X_{ij}=\mu_i+\varepsilon_{ij}, & j=1,\cdots,n_i;\ i=1,\cdots,r\\ \varepsilon_{ij}\sim N(0,\sigma^2), & j=1,\cdots,n_i;\ i=1,\cdots,r\\ \varepsilon_{11},\cdots,\varepsilon_{rn_r}\ \text{mutually independent}\end{cases}$$
constitutes the mathematical model of one-way analysis of variance, where $\mu_i$ and $\sigma^2$ are the unknown parameters of the model.
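The model above can be made concrete by simulating from it: pick level means $\mu_i$, a common $\sigma$, and group sizes $n_i$, then draw $X_{ij}=\mu_i+\varepsilon_{ij}$ with i.i.d. normal errors. All numerical values here are arbitrary choices for illustration.

```python
# Simulating data from the one-way ANOVA model X_ij = mu_i + eps_ij,
# eps_ij ~ N(0, sigma^2) i.i.d.  The mu_i, sigma, and n_i are illustrative.
import numpy as np

rng = np.random.default_rng(0)
mu = [25.0, 27.0, 25.5]   # level means mu_i
sizes = [4, 5, 6]         # repetitions n_i per level (may differ)
sigma = 0.8               # common standard deviation (homoscedasticity)

samples = [m + sigma * rng.standard_normal(n) for m, n in zip(mu, sizes)]
for i, x in enumerate(samples, start=1):
    print(f"A{i}: sample mean = {x.mean():.3f} (true mu_i = {mu[i-1]})")
```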

The basic task of analysis of variance is to test, for the above model, the hypothesis
$$H_0:\mu_1=\mu_2=\cdots=\mu_r \quad\leftrightarrow\quad H_1:\mu_1,\mu_2,\dots,\mu_r\ \text{are not all equal.}$$
That is, by analyzing the experimental data, we test whether the means of these homoscedastic normal populations are equal, and thereby infer whether the factor has a significant effect on the experimental indicator.

For further analysis, introduce the following notation: let
$$n=\sum_{i=1}^r n_i,\quad \mu=\frac{1}{n}\sum_{i=1}^r n_i\mu_i,\quad \delta_i=\mu_i-\mu$$
Here $\mu$ is called the general mean, and $\delta_i$ is called the effect of level $A_i$; it reflects the influence of level $A_i$ of the factor on the experimental indicator. The effects $\delta_1,\dots,\delta_r$ satisfy $\sum_{i=1}^r n_i\delta_i = 0$.

With this notation, the model above can be rewritten as
$$\left.\begin{array}{l}X_{ij}=\mu+\delta_i+\varepsilon_{ij},\; j=1,2,\cdots,n_i;\; i=1,2,\cdots,r\\ \sum_{i=1}^r n_i\delta_i=0,\\ \varepsilon_{ij}\sim N(0,\sigma^2),\; j=1,2,\cdots,n_i;\; i=1,2,\cdots,r\\ \varepsilon_{11},\cdots,\varepsilon_{rn_r}\ \text{mutually independent}.\end{array}\right\}$$
For the above model, the hypothesis to be tested is
$$H_0:\delta_1=\delta_2=\cdots=\delta_r=0 \quad\leftrightarrow\quad H_1:\delta_1,\delta_2,\dots,\delta_r\ \text{are not all zero.}$$
In analysis of variance, the sum-of-squares decomposition is used to split the total sum of squared deviations of the whole data set into parts: one part reflects the effect of the factor and is called the effect sum of squares, while another reflects errors caused by random fluctuation and is called the error sum of squares. By examining the ratio of these two parts, the hypothesis test can be carried out.

Statistical Analysis

First, introduce the following notation:
$$\left.\begin{array}{l}\bar{\varepsilon}_i=\dfrac{1}{n_i}\sum_{j=1}^{n_i}\varepsilon_{ij},\; i=1,\cdots,r\\[6pt] \bar{\varepsilon}=\dfrac{1}{n}\sum_{i=1}^r\sum_{j=1}^{n_i}\varepsilon_{ij}=\dfrac{1}{n}\sum_{i=1}^r n_i\bar{\varepsilon}_i\\[6pt] \bar{X}_i=\dfrac{1}{n_i}\sum_{j=1}^{n_i}X_{ij},\; i=1,\cdots,r\\[6pt] \bar{X}=\dfrac{1}{n}\sum_{i=1}^r\sum_{j=1}^{n_i}X_{ij}=\dfrac{1}{n}\sum_{i=1}^r n_i\bar{X}_i\end{array}\right\}$$
From these expressions it follows that
$$\left.\begin{array}{l}\bar{\varepsilon}_i\sim N\Big(0,\dfrac{\sigma^2}{n_i}\Big),\; i=1,\cdots,r\\[6pt] \bar{\varepsilon}\sim N\Big(0,\dfrac{\sigma^2}{n}\Big),\\[6pt] \bar{X}_i=\mu+\delta_i+\bar{\varepsilon}_i\sim N\Big(\mu+\delta_i,\dfrac{\sigma^2}{n_i}\Big),\; i=1,\cdots,r\\[6pt] \bar{X}=\mu+\bar{\varepsilon}\sim N\Big(\mu,\dfrac{\sigma^2}{n}\Big)\end{array}\right\}$$
Next, introduce the total sum of squared deviations
$$Q_T=\sum_{i=1}^r\sum_{j=1}^{n_i}(X_{ij}-\bar{X})^2$$
Since $\bar{X}$ is the mean of the whole data set, $Q_T$ is $n$ times the sample variance of the whole data set; that is, $Q_T$ reflects the overall fluctuation of the data. For this reason $Q_T$ is called the total sum of squared deviations.

$Q_T$ can be decomposed as $Q_T=Q_A+Q_E$, where
$$Q_A=\sum_{i=1}^r n_i(\bar{X}_i-\bar{X})^2,\qquad Q_E=\sum_{i=1}^r\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_i)^2$$
$Q_A$ is called the effect sum of squares of factor $A$ (also the between-group sum of squares), and $Q_E$ is called the error sum of squares (also the within-group sum of squares).
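The decomposition $Q_T = Q_A + Q_E$ can be checked numerically from the formulas above. The data below are arbitrary, and the group sizes $n_i$ are deliberately unequal, as the model allows.

```python
# Numerical check of the sum-of-squares decomposition Q_T = Q_A + Q_E.
import numpy as np

groups = [np.array([25.6, 24.4, 25.0, 25.9]),
          np.array([27.1, 27.8, 26.5]),
          np.array([25.2, 24.9, 25.5, 25.1, 24.8])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()          # X-bar, the mean of all data

# Q_T: total sum of squared deviations from the grand mean
q_t = ((all_obs - grand_mean) ** 2).sum()
# Q_A: between-group (effect) sum of squares, weighted by n_i
q_a = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Q_E: within-group (error) sum of squares
q_e = sum(((g - g.mean()) ** 2).sum() for g in groups)

print(f"Q_T = {q_t:.6f}, Q_A + Q_E = {q_a + q_e:.6f}")  # the two agree
```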

When $H_0$ holds, $Q_A/\sigma^2\sim\chi^2(r-1)$ and $Q_E/\sigma^2\sim\chi^2(n-r)$ are mutually independent, so
$$F=\frac{\dfrac{Q_A/\sigma^2}{r-1}}{\dfrac{Q_E/\sigma^2}{n-r}}=\frac{Q_A/(r-1)}{Q_E/(n-r)}\sim F(r-1,n-r)$$
For a given significance level $\alpha$, the rejection region of $H_0$ is $W=\{F \ge F_\alpha(r-1,n-r)\}$.
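The test can be carried out by hand: compute $F = \frac{Q_A/(r-1)}{Q_E/(n-r)}$ and compare it with the critical value $F_\alpha(r-1, n-r)$, obtained here from SciPy's F-distribution quantile function. The data are illustrative.

```python
# Manual F test: compute F = (Q_A/(r-1)) / (Q_E/(n-r)) and compare with
# the critical value F_alpha(r-1, n-r) from the F distribution.
import numpy as np
from scipy.stats import f as f_dist

groups = [np.array([25.6, 24.4, 25.0, 25.9]),
          np.array([27.1, 27.8, 26.5, 28.0]),
          np.array([25.2, 24.9, 25.5, 25.1])]
r = len(groups)                      # number of levels
n = sum(len(g) for g in groups)      # total number of observations

grand_mean = np.concatenate(groups).mean()
q_a = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
q_e = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_stat = (q_a / (r - 1)) / (q_e / (n - r))
alpha = 0.05
f_crit = f_dist.ppf(1 - alpha, r - 1, n - r)   # F_alpha(r-1, n-r)
print(f"F = {f_stat:.3f}, critical value = {f_crit:.3f}")
if f_stat >= f_crit:
    print("Reject H0: the factor has a significant effect at level alpha.")
```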

The calculation results are usually listed in a variance analysis table:

| Source of variance | Sum of squares | Degrees of freedom | Mean square | $F$ |
| --- | --- | --- | --- | --- |
| Factor $A$ (between groups) | $Q_A=\sum_{i=1}^r n_i(\bar{x}_i-\bar{x})^2$ | $r-1$ | $\bar{Q}_A=\dfrac{Q_A}{r-1}$ | $F=\dfrac{\bar{Q}_A}{\bar{Q}_E}$ |
| Error $E$ (within groups) | $Q_E=\sum_{i=1}^r\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2$ | $n-r$ | $\bar{Q}_E=\dfrac{Q_E}{n-r}$ | |
| Total | $Q_T=\sum_{i=1}^r\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2$ | $n-1$ | | |
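The table above can be assembled directly from raw data; the sketch below prints it in the same layout (data values are illustrative).

```python
# Assembling the one-way ANOVA table from raw data.
import numpy as np

groups = [np.array([25.6, 24.4, 25.0, 25.9]),
          np.array([27.1, 27.8, 26.5, 28.0]),
          np.array([25.2, 24.9, 25.5, 25.1])]
r = len(groups)
n = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

q_a = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
q_e = sum(((g - g.mean()) ** 2).sum() for g in groups)
q_t = q_a + q_e
ms_a = q_a / (r - 1)    # mean square of factor A
ms_e = q_e / (n - r)    # mean square of error

print(f"{'Source':<10}{'SS':>10}{'df':>6}{'MS':>10}{'F':>10}")
print(f"{'Factor A':<10}{q_a:>10.3f}{r - 1:>6}{ms_a:>10.3f}{ms_a / ms_e:>10.3f}")
print(f"{'Error E':<10}{q_e:>10.3f}{n - r:>6}{ms_e:>10.3f}")
print(f"{'Total':<10}{q_t:>10.3f}{n - 1:>6}")
```

Note that the degrees of freedom add up just like the sums of squares: $(r-1) + (n-r) = n-1$.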

References

[1] Shi Yu, "Applied Mathematical Statistics", Xi'an Jiaotong University Press.

Origin blog.csdn.net/myDarling_/article/details/134799003