[Probability Theory] Final Review Notes: Hypothesis Testing


1. The basic concept of hypothesis testing

Hypothesis testing: propose a hypothesis about the population, then use a sample to test it.
Accepting the hypothesis: concluding that the hypothesis is correct.
Rejecting the hypothesis: concluding that the hypothesis is wrong.

Hypothesis tests divide into parametric tests (about unknown parameters) and distribution tests (about the form of the distribution).

1. Basic principles of hypothesis testing

Practical inference principle: a small-probability event is almost impossible to occur in a single trial.
The idea of hypothesis testing: for the hypothesis $H_0$, construct a suitable statistic (the test statistic). If $H_0$ holds, the test statistic satisfies some condition with very high probability. Now check whether the sample satisfies this condition: if it does, accept $H_0$; if not, reject $H_0$ (a small-probability event has occurred, so the null hypothesis is unlikely to hold; the reasoning resembles proof by contradiction).

2. Two types of errors

Rejection region $W$: when the observed value of the test statistic falls in $W$, reject $H_0$.
Acceptance region $\overline{W}$: when the observed value of the test statistic falls in $\overline{W}$, accept $H_0$.
Critical value: the boundary point between the rejection region and the acceptance region.

Two types of errors:
Type I error: $H_0$ is true but rejected.
Type II error: $H_0$ is false but accepted.

Significance level $\alpha$: the probability of making a Type I error, $P\{\text{reject } H_0 \mid H_0 \text{ true}\} = \alpha$; it reflects how convincing a rejection of $H_0$ is.
$\beta$: the probability of making a Type II error, $P\{\text{accept } H_0 \mid H_0 \text{ false}\} = \beta$.
For a fixed sample size $n$, decreasing $\alpha$ increases $\beta$.

Significance test: control the probability of making a Type I error so that it does not exceed a given value (the significance level), without regard to the Type II error.
A test at significance level $\alpha$: the probability of making a Type I error does not exceed $\alpha$, i.e. $P\{\text{reject } H_0 \mid H_0 \text{ true}\} \le \alpha$.
Among all tests at significance level $\alpha$, the one with the smallest Type II error probability $\beta$ is the best test.
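The statement that the Type I error probability equals $\alpha$ can be checked by simulation: when $H_0$ is true, a level-$\alpha$ test should reject in roughly a fraction $\alpha$ of repeated experiments. Below is a minimal, stdlib-only sketch with made-up settings (a two-sided $u$-test of $H_0: \mu = 0$ with known $\sigma = 1$):

```python
import random
from statistics import NormalDist

random.seed(42)
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # u_{alpha/2}, about 1.96
n, n_sim = 30, 2000
mu0, sigma = 0.0, 1.0

rejections = 0
for _ in range(n_sim):
    # draw a sample from a population where H0 is actually true
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    u = (n ** 0.5) * (xbar - mu0) / sigma      # test statistic under H0
    if abs(u) >= z_crit:                       # observed value in rejection region
        rejections += 1

type1_rate = rejections / n_sim
print(f"empirical Type I error rate: {type1_rate:.3f} (nominal alpha = {alpha})")
```

The empirical rejection rate should land close to the nominal $\alpha = 0.05$.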

3. General steps of hypothesis testing

(1) Fully consider and use the known background knowledge to put forward the null hypothesis $H_0$ and the alternative hypothesis $H_1$.

$H_1$ is the opposite of $H_0$ and is called the alternative hypothesis. $H_0$ is generally the hypothesis "to be protected" or "maintaining the status quo": wrongly rejecting $H_0$ leads to more serious consequences than wrongly rejecting $H_1$. In practice, $H_0$ should not be denied lightly, and rejecting it requires sufficient evidence.

(2) Determine the test statistic $Z$, and derive the probability distribution of $Z$ under the premise that $H_0$ holds, requiring that the distribution of $Z$ not depend on any unknown parameters.

Here $Z$ is similar to the pivotal quantity in parameter estimation: it is chosen to eliminate the interference of unknown parameters, so that a definite distribution can be used to find the rejection region.

(3) Determine the rejection region. First determine the form of the rejection region by intuitive analysis; then, from the given level $\alpha$ and the distribution of $Z$, use $P\{\text{reject } H_0 \mid H_0 \text{ true}\} = \alpha$ to determine the critical value, and hence the rejection region.

Determining the rejection region is similar to the process of determining confidence intervals in parameter estimation.

(4) Take a concrete sample and, from the obtained sample values and the rejection region determined above, decide whether to reject or accept $H_0$.

If the observed value of $Z$ falls in the rejection region $W$, reject the null hypothesis $H_0$ and accept the alternative hypothesis $H_1$; if it falls in the acceptance region $\overline{W}$, accept $H_0$ and reject $H_1$:
$$Z \in \overline{W} \longrightarrow \text{accept } H_0,\ \text{reject } H_1$$
$$Z \in W \longrightarrow \text{reject } H_0,\ \text{accept } H_1$$

4. $p$-value

$p$-value: the minimum significance level at which the sample value leads to rejecting the null hypothesis.
$$\alpha < p \;\longrightarrow\; \text{accept } H_0 \iff Z \text{ falls in the acceptance region}$$
$$\alpha \ge p \;\longrightarrow\; \text{reject } H_0 \iff Z \text{ falls in the rejection region}$$
The $p$-value is the minimum significance level for rejection: reject exactly when $\alpha$ is greater than or equal to $p$. Intuitively, the smaller $\alpha$ is, the harder it is to reject the null hypothesis $H_0$, and the more convincing a rejection of $H_0$ is; equivalently, the smaller $\alpha$ is, the "smaller" the rejection region.

For fixed $\alpha$, the larger $p$ is, the easier it is to accept $H_0$ (informally, "the larger $p$, the better" for $H_0$).

(figure: the relationship between $\alpha$ and the $p$-value)

An intuition: $\alpha$ is the "degree of doubt" we direct at $H_0$, i.e. our tolerance for rejecting it. The smaller $\alpha$ is, the less we doubt $H_0$ and the easier it is to accept it. The $p$-value is the minimum "degree of doubt" at which $H_0$ would be rejected. If $\alpha < p$, then at this $\alpha$ we are "more confident" in $H_0$ than $p$ requires, so we accept $H_0$. In short: the smaller $\alpha$ is, the more confidence we grant $H_0$.

The method of judging via the rejection region is called the critical-value method; the method using the $p$-value is called the $p$-value method.
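The two methods always agree, since comparing $|U|$ with $u_{\alpha/2}$ is equivalent to comparing the $p$-value $p = 2\left(1 - \Phi(|U|)\right)$ with $\alpha$. A minimal sketch for a two-sided $u$-test; the inputs ($\overline{x} = 10.5$, $\mu_0 = 10$, $\sigma = 1.2$, $n = 25$) are hypothetical:

```python
from statistics import NormalDist

std_normal = NormalDist()

def u_test_two_sided(xbar, mu0, sigma, n, alpha):
    """Two-sided u-test; returns the decision by both methods."""
    u = (n ** 0.5) * (xbar - mu0) / sigma
    # critical-value method: reject iff |U| >= u_{alpha/2}
    u_crit = std_normal.inv_cdf(1 - alpha / 2)
    reject_critical = abs(u) >= u_crit
    # p-value method: p = 2 * (1 - Phi(|U|)); reject iff alpha >= p
    p = 2 * (1 - std_normal.cdf(abs(u)))
    reject_pvalue = alpha >= p
    return u, p, reject_critical, reject_pvalue

# made-up observed values
u, p, r1, r2 = u_test_two_sided(xbar=10.5, mu0=10.0, sigma=1.2, n=25, alpha=0.05)
print(f"U = {u:.3f}, p = {p:.4f}, reject (critical) = {r1}, reject (p-value) = {r2}")
```

Whatever the inputs, `reject_critical` and `reject_pvalue` coincide.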

2. Hypothesis testing of normal population parameters

Test statistic follows a normal distribution $\to$ $u$-test
Test statistic follows a $\chi^2$ distribution $\to$ $\chi^2$-test
Test statistic follows a $t$ distribution $\to$ $t$-test
Test statistic follows an $F$ distribution $\to$ $F$-test

For a single population, let $X \sim N(\mu, \sigma^2)$. For two populations, let $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, where $X$ has sample size $n$ and sample variance $S_X^2$, and $Y$ has sample size $m$ and sample variance $S_Y^2$.

The following covers the single-population case ($X \sim N(\mu, \sigma^2)$).

$\sigma^2$ known: test the relationship between $\mu$ and $\mu_0$

Take the two-sided test
$$H_0: \mu = \mu_0 \qquad H_1: \mu \ne \mu_0$$
Recall that $\alpha$ is the probability of making a Type I error, i.e. when $H_0$ holds, $P\{\text{reject } H_0\} = \alpha$. Since the analysis can only proceed under the hypothesis $H_0$, we first assume $H_0$ holds, i.e. $\mu = \mu_0$, so $X \sim N(\mu_0, \sigma^2)$. We take the test statistic
$$U = \frac{\sqrt{n}\left(\overline{X} - \mu_0\right)}{\sigma} \sim N(0, 1).$$
When should $H_0$ be accepted? When the sample mean $\overline{X}$ is reasonably close to $\mu_0$; when the gap is too outrageous, reject. Having assumed $H_0$ holds, we only need the probability that the observed value of $U$ falls in the rejection region to be $\alpha$. As in interval estimation, since
$$P\{U \ge u_{\alpha/2} \text{ or } U \le -u_{\alpha/2}\} = \alpha,$$
we reject when $|U| \ge u_{\alpha/2}$. For a concrete sample, compute its value of $U$ and compare $|U|$ with $u_{\alpha/2}$. If $|U| \ge u_{\alpha/2}$, then under $H_0$ an event of probability only $\alpha$ has occurred, which suggests $H_0$ is unlikely to hold, so we reject $H_0$. Note that $U$ measures the discrepancy between $\overline{X}$ and $\mu_0$; if the discrepancy is too large, reject $H_0$. Dividing by $\sigma$ removes the dimension and standardizes the statistic.

Now change to the one-sided test
$$H_0: \mu = \mu_0 \qquad H_1: \mu > \mu_0$$
Because under the alternative the mean $\mu$ will not be less than $\mu_0$, the rejection region $U \le -u_{\alpha/2}$ does not exist; the rejection region only has the form $U \ge c$ for some value $c$. What is this value? Do not forget that the probability of $U$ falling in the rejection region must be $\alpha$, so clearly $c = u_\alpha$, and we reject when $U \ge u_\alpha$. Changing $H_0$ to $H_0: \mu \le \mu_0$ leaves the rejection region unchanged.

$\sigma^2$ unknown: test the relationship between $\mu$ and $\mu_0$

Test statistic: $$T = \frac{\sqrt{n}\left(\overline{X} - \mu_0\right)}{S} \sim t(n-1).$$
The rejection regions are analogous to the case of known $\sigma^2$: just replace $u_\alpha$ with $t_\alpha(n-1)$.
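A minimal sketch of this $t$-test on made-up data; the critical value $t_{0.025}(9) \approx 2.262$ is read from a $t$ table:

```python
from math import sqrt
from statistics import mean, stdev

# hypothetical measurements; test H0: mu = 5.0 vs H1: mu != 5.0
data = [5.2, 4.9, 5.6, 5.1, 4.8, 5.4, 5.3, 5.0, 5.5, 4.7]
mu0, alpha = 5.0, 0.05
n = len(data)
xbar, s = mean(data), stdev(data)      # sample mean and sample std S
t_stat = sqrt(n) * (xbar - mu0) / s    # T ~ t(n-1) under H0
t_crit = 2.262                         # t_{alpha/2}(n-1) = t_{0.025}(9), table value
reject = abs(t_stat) >= t_crit
print(f"T = {t_stat:.3f}, reject H0: {reject}")
```

Here $|T| < t_{0.025}(9)$, so $H_0$ is accepted for this sample.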

$\mu$ known: test the relationship between $\sigma^2$ and $\sigma_0^2$

Test statistic: $$\chi^2 = \frac{\sum\limits_{i=1}^n \left(X_i - \mu\right)^2}{\sigma_0^2} \sim \chi^2(n).$$

$\mu$ unknown: test the relationship between $\sigma^2$ and $\sigma_0^2$

Test statistic: $$\chi^2 = \frac{\sum\limits_{i=1}^n \left(X_i - \overline{X}\right)^2}{\sigma_0^2} = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2(n-1).$$

For example, the two-sided test $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$ has rejection region $$\left\{\chi^2 \le \chi^2_{1-\alpha/2}(n-1)\right\} \cup \left\{\chi^2 \ge \chi^2_{\alpha/2}(n-1)\right\}.$$
(figure: rejection regions of the chi-square test)
The one-sided test $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 > \sigma_0^2$ has rejection region $\chi^2 \ge \chi^2_\alpha(n-1)$.
The one-sided test $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 < \sigma_0^2$ has rejection region $\chi^2 \le \chi^2_{1-\alpha}(n-1)$.
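A sketch of the two-sided variance test with $\mu$ unknown, on made-up data; the table values $\chi^2_{0.975}(9) \approx 2.700$ and $\chi^2_{0.025}(9) \approx 19.023$ are assumed:

```python
from statistics import mean

# hypothetical data; mu unknown, test H0: sigma^2 = 0.04 vs H1: sigma^2 != 0.04
data = [5.2, 4.9, 5.6, 5.1, 4.8, 5.4, 5.3, 5.0, 5.5, 4.7]
sigma0_sq = 0.04
n = len(data)
xbar = mean(data)
ss = sum((x - xbar) ** 2 for x in data)   # (n-1) * S^2
chi2_stat = ss / sigma0_sq                # ~ chi^2(n-1) under H0
# table values chi^2_{1-alpha/2}(9) and chi^2_{alpha/2}(9) for alpha = 0.05
chi2_lo, chi2_hi = 2.700, 19.023
reject = chi2_stat <= chi2_lo or chi2_stat >= chi2_hi
print(f"chi2 = {chi2_stat:.3f}, reject H0: {reject}")
```

Here the statistic exceeds the upper critical value, so $H_0$ is rejected for this sample.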

$\sigma_1^2, \sigma_2^2$ known: test the relationship between $\mu_1 - \mu_2$ and $\Delta\mu$

Test statistic: $$U = \frac{\left(\overline{X} - \overline{Y}\right) - \Delta\mu}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \sim N(0, 1).$$

$\sigma_1^2 = \sigma_2^2$ unknown: test the relationship between $\mu_1 - \mu_2$ and $\Delta\mu$

Test statistic: $$T = \frac{\left(\overline{X} - \overline{Y}\right) - \Delta\mu}{S_W\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} \sim t(n+m-2), \qquad \text{where } S_W = \sqrt{\frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}}.$$
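A sketch of this pooled two-sample $t$-test on hypothetical data; the critical value $t_{0.025}(13) \approx 2.160$ is a table value:

```python
from math import sqrt
from statistics import mean, variance

# hypothetical samples; assume sigma1^2 = sigma2^2 (unknown); test H0: mu1 - mu2 = 0
x = [20.1, 19.8, 20.5, 20.3, 19.9, 20.2, 20.4, 20.0]
y = [19.5, 19.9, 19.7, 19.4, 19.8, 19.6, 20.0]
n, m = len(x), len(y)
delta_mu = 0.0
# pooled estimate S_W of the common standard deviation
sw = sqrt(((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2))
t_stat = (mean(x) - mean(y) - delta_mu) / (sw * sqrt(1 / n + 1 / m))
t_crit = 2.160                         # t_{0.025}(n+m-2) = t_{0.025}(13), table value
reject = abs(t_stat) >= t_crit
print(f"T = {t_stat:.3f}, reject H0: {reject}")
```

Here $|T| > t_{0.025}(13)$, so the hypothesis of equal means is rejected for these samples.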

$\mu_1, \mu_2$ known: test the relationship between $\frac{\sigma_1^2}{\sigma_2^2}$ and $c$

Test statistic: $$F = \left.\frac{\dfrac{1}{n}\sum\limits_{i=1}^n (X_i - \mu_1)^2}{\dfrac{1}{m}\sum\limits_{j=1}^m (Y_j - \mu_2)^2}\right/ c = \frac{1}{c}\cdot\frac{m\sum\limits_{i=1}^n (X_i - \mu_1)^2}{n\sum\limits_{j=1}^m (Y_j - \mu_2)^2} \sim F(n, m).$$

$\mu_1, \mu_2$ unknown: test the relationship between $\frac{\sigma_1^2}{\sigma_2^2}$ and $c$

Test statistic: $$F = \frac{1}{c}\cdot\frac{S_X^2}{S_Y^2} \sim F(n-1, m-1).$$

Note that $F_{1-\alpha}(n, m) = \dfrac{1}{F_\alpha(m, n)}$ (three things happen: ① take the reciprocal, ② change $1-\alpha$ to $\alpha$, ③ swap the order of $n, m$). The rejection regions are analogous to those of the $\chi^2$ tests.
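A sketch of the two-sided variance-ratio test with $c = 1$ on hypothetical data. The $F$ table values used below ($F_{0.025}(7,6) \approx 5.70$ and $F_{0.025}(6,7) \approx 5.12$) are approximate and worth double-checking against a table; the lower critical value is obtained through the reciprocal identity above:

```python
from statistics import variance

# hypothetical samples (mu1, mu2 unknown); test H0: sigma1^2 / sigma2^2 = 1
x = [20.1, 19.8, 20.5, 20.3, 19.9, 20.2, 20.4, 20.0]
y = [19.5, 19.9, 19.7, 19.4, 19.8, 19.6, 20.0]
c = 1.0
f_stat = (variance(x) / variance(y)) / c   # ~ F(n-1, m-1) = F(7, 6) under H0
# assumed table values: F_{0.025}(7,6) ~ 5.70 and F_{0.025}(6,7) ~ 5.12
f_hi = 5.70
f_lo = 1 / 5.12                            # F_{0.975}(7,6) = 1 / F_{0.025}(6,7)
reject = f_stat <= f_lo or f_stat >= f_hi
print(f"F = {f_stat:.3f}, rejection region: F <= {f_lo:.3f} or F >= {f_hi}")
```

Here $F$ lies between the two critical values, so the hypothesis of equal variances is accepted for these samples.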

How to memorize the test statistics

  1. Single population, testing the mean: to standardize the distribution, we must remove the influence of three quantities: the mean ($\mu$), the sample size ($n$), and the standard deviation ($\sigma$ or $S$); removing the standard deviation also makes the statistic dimensionless. Corresponding to known and unknown standard deviation respectively, we get the statistics $$U = \frac{\sqrt{n}\left(\overline{X} - \mu\right)}{\sigma}, \qquad T = \frac{\sqrt{n}\left(\overline{X} - \mu\right)}{S},$$ where the former follows $N(0, 1)$ and the latter follows $t(n-1)$.

  2. Single population, testing the variance: the statistic should follow a chi-square distribution. Note that the expectation of $\chi^2(n)$ is $n$, so we require our statistic to have expectation $n$ (or $n-1$ when the mean is unknown). Note that $$E\left[\sum\limits_{i=1}^n \left(X_i - \mu\right)^2\right] = nE\left[\left(X_1 - \mu\right)^2\right],$$ and since $\mu = E(X_1)$, by the definition of variance $E\left[\left(X_1 - \mu\right)^2\right] = \sigma^2$, hence $E\left[\sum\limits_{i=1}^n \left(X_i - \mu\right)^2\right] = n\sigma^2$. At the same time we know $E\left(S^2\right) = \sigma^2$, i.e. $E\left[\sum\limits_{i=1}^n \left(X_i - \overline{X}\right)^2\right] = (n-1)\sigma^2$. Therefore the two test statistics we use are $$\frac{\sum\limits_{i=1}^n \left(X_i - \mu\right)^2}{\sigma^2} \quad\text{and}\quad \frac{\sum\limits_{i=1}^n \left(X_i - \overline{X}\right)^2}{\sigma^2},$$ following chi-square distributions with $n$ and $n-1$ degrees of freedom respectively. Rule of thumb: in the test statistics we use, a quantity whose dimension is squared and whose expectation is $n$ follows $\chi^2(n)$.

  3. Two populations, testing the difference of means: note that $$\overline{X} - \overline{Y} \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}\right).$$ The known-variance case imitates the single-population mean test and is not repeated here. For the unknown-variance case we require $\sigma_1^2 = \sigma_2^2$, written $\sigma^2$ for convenience. Then $$D\left(\overline{X} - \overline{Y}\right) = \sigma^2\left(\frac{1}{n} + \frac{1}{m}\right),$$ so the test statistic should be $$\frac{\left(\overline{X} - \overline{Y}\right) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}}.$$ But $\sigma$ is unknown, so we must replace it with an estimate built from $S_X$ and $S_Y$. How? Call the estimator of $\sigma^2$ $S_W^2$. First, $S_W^2$ must have expectation $\sigma^2$; second, after dividing by $\sigma^2$ it must follow $\chi^2(n+m-2)$ (the degrees of freedom must be used in full). Now, what is $\sum\limits_{i=1}^n \left(X_i - \overline{X}\right)^2 + \sum\limits_{j=1}^m \left(Y_j - \overline{Y}\right)^2$? It is exactly $(n-1)S_X^2 + (m-1)S_Y^2$, and $$\frac{(n-1)S_X^2 + (m-1)S_Y^2}{\sigma^2} \sim \chi^2(n+m-2), \qquad E\left(\frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}\right) = \sigma^2.$$
So we use $$S_W = \sqrt{\frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}}$$ in place of $\sigma$ in the test statistic, obtaining $$T = \frac{\left(\overline{X} - \overline{Y}\right) - (\mu_1 - \mu_2)}{S_W\sqrt{\frac{1}{n} + \frac{1}{m}}}.$$

  4. Two populations, testing the ratio of variances: the test statistic has the form $$\frac{\text{measured variance ratio}}{\text{hypothesized variance ratio}}.$$ There is another way to see it: we know $$\chi^2_X = \frac{(n-1)S_X^2}{\sigma_1^2} \sim \chi^2(n-1), \qquad \chi^2_Y = \frac{(m-1)S_Y^2}{\sigma_2^2} \sim \chi^2(m-1),$$ and by the defining property of the $F$ distribution $$\frac{\chi^2_X/(n-1)}{\chi^2_Y/(m-1)} \sim F(n-1, m-1), \qquad \text{i.e.}\quad \frac{S_X^2/\sigma_1^2}{S_Y^2/\sigma_2^2} \sim F(n-1, m-1).$$

Hypothesis testing on paired data

Sometimes $X_1, X_2, \cdots, X_n$ and $Y_1, Y_2, \cdots, Y_n$ are not necessarily independent of each other: $X_i$ and $Y_i$ may be strongly correlated, while different pairs $(X_i, Y_i)$ are uncorrelated. How do we test the mean difference in this case?

Given $n$ mutually independent pairs of samples $(X_1, Y_1), (X_2, Y_2), \cdots, (X_n, Y_n)$, let $Z_i = Y_i - X_i\ (i = 1, 2, \cdots, n)$; clearly $Z_1, Z_2, \cdots, Z_n$ are independent. Assume $Z_1, Z_2, \cdots, Z_n$ is a sample from a population $N(\mu, \sigma^2)$; then we perform hypothesis tests on $\mu$. In general $\sigma^2$ is unknown, so the test statistic we choose is usually $$T = \frac{\sqrt{n}\left(\overline{Z} - \mu_0\right)}{S},$$ where $\overline{Z}$ and $S$ are the sample mean and sample standard deviation of the $Z_i$.
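A sketch of the paired test on made-up before/after data; $t_{0.025}(7) \approx 2.365$ is a table value:

```python
from math import sqrt
from statistics import mean, stdev

# hypothetical paired measurements (e.g. before/after on the same subjects)
x = [72, 75, 70, 80, 78, 74, 77, 73]    # before
y = [74, 79, 72, 83, 79, 77, 80, 74]    # after
z = [yi - xi for xi, yi in zip(x, y)]   # Z_i = Y_i - X_i, assumed N(mu, sigma^2)
n = len(z)
mu0 = 0.0                               # test H0: mu = 0 (no paired difference)
t_stat = sqrt(n) * (mean(z) - mu0) / stdev(z)   # ~ t(n-1) under H0
t_crit = 2.365                          # t_{0.025}(7), table value
reject = abs(t_stat) >= t_crit
print(f"T = {t_stat:.3f}, reject H0: {reject}")
```

Here $|T| > t_{0.025}(7)$, so the hypothesis of no paired difference is rejected for this data.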

3. Distribution hypothesis testing

1. The concept of distribution fitting test

Distribution fitting test: let $(X_1, X_2, \cdots, X_n)$ be a sample from the population $X$. Based on the sample, test the hypotheses $$H_0: \text{the distribution function of } X \text{ is } F(x), \qquad H_1: \text{the distribution function of } X \text{ is not } F(x).$$ Here $F(x)$ is a theoretical distribution function; it may be fully known, or of known form but containing unknown parameters. The distribution fitting test examines how well $F(x)$ fits the distribution of $X$.

2. Pearson's Theorem

Pearson's theorem: let the $r$ outcomes $A_1, A_2, \cdots, A_r$ of a random experiment form a mutually exclusive and exhaustive group of events, with probabilities $p_1, p_2, \cdots, p_r$ of occurring in one trial, where $p_i > 0\ (i = 1, 2, \cdots, r)$ and $\sum\limits_{i=1}^r p_i = 1$. Let $m_i$ denote the number of occurrences of $A_i$ in $n$ independent repeated trials. Then as $n \to \infty$, the distribution of the random variable $$\chi^2 = \sum\limits_{i=1}^r \frac{(m_i - np_i)^2}{np_i}$$ converges to the $\chi^2$ distribution with $r - 1$ degrees of freedom.

3. The $\chi^2$ fitting test method

(1) The case where the theoretical distribution $F(x)$ is fully known

$X_1, X_2, \cdots, X_n$ is a sample from the population $X$, and $F(x)$ is a fully known distribution function. At significance level $\alpha$, test the hypotheses $$H_0: \text{the distribution function of } X \text{ is } F(x), \qquad H_1: \text{the distribution function of } X \text{ is not } F(x).$$ The method is as follows:

  1. Partition the range of $X$ into $r$ mutually disjoint subsets (intervals) $A_1, A_2, \cdots, A_r$, which then form a mutually exclusive and exhaustive group of events ($r$ should not be large, $n$ should be large). The sample size should be large, commonly $n \ge 50$; when grouping, each interval should contain at least $m_i \ge 5$ sample points.
  2. Under the premise that $H_0$ holds, compute the probability of each event $A_i$: $P(A_i) = p_i$; the expected frequency of $A_i$ is then $np_i$.
  3. Count, among the sample values $x_1, x_2, \cdots, x_n$, the actual frequency $m_i$ with which event $A_i$ occurs.
  4. At significance level $\alpha$, consider the statistic $\chi^2 = \sum\limits_{i=1}^r \frac{(m_i - np_i)^2}{np_i}$. If the data really follow the distribution specified by $F(x)$, then $\chi^2$ should be as small as possible, so the rejection region has the form "$\chi^2$ at least some number". Therefore the rejection region is $\chi^2 \ge \chi^2_\alpha(r-1)$.
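The four steps can be sketched on a made-up example, testing whether a die is fair ($r = 6$ cells; $\chi^2_{0.05}(5) \approx 11.071$ from a table):

```python
# hypothetical example: 120 rolls of a die, test H0: the die is fair
observed = [18, 23, 16, 21, 25, 17]     # m_i for faces 1..6
n = sum(observed)                        # 120 rolls
p = [1 / 6] * 6                          # p_i under H0
expected = [n * pi for pi in p]          # np_i = 20 for each face
chi2_stat = sum((m - e) ** 2 / e for m, e in zip(observed, expected))
chi2_crit = 11.071                       # chi^2_{0.05}(r-1) = chi^2_{0.05}(5), table value
reject = chi2_stat >= chi2_crit
print(f"chi2 = {chi2_stat:.3f}, reject H0: {reject}")
```

Here $\chi^2 < \chi^2_{0.05}(5)$, so the hypothesis of a fair die is accepted for this data.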

(2) The theoretical distribution contains unknown parameters

Suppose the distribution function is $F(x; \theta_1, \theta_2, \cdots, \theta_l)$, where $\theta_1, \theta_2, \cdots, \theta_l$ are unknown parameters. We do two things before continuing with the procedure for a fully known theoretical distribution:

  1. From the sample $X_1, X_2, \cdots, X_n$, obtain the maximum likelihood estimates $\hat{\theta}_1, \hat{\theta}_2, \cdots, \hat{\theta}_l$ of $\theta_1, \theta_2, \cdots, \theta_l$, and treat $F(x; \hat{\theta}_1, \hat{\theta}_2, \cdots, \hat{\theta}_l)$ as a fully known distribution function;
  2. Do not forget to change the degrees of freedom of $\chi^2$ to $r - l - 1$!!! This is a modification of Pearson's theorem. The rejection region becomes $\chi^2 \ge \chi^2_\alpha(r - l - 1)$. Intuitively, $l$ degrees of freedom are "spent" on estimating the unknown parameters from the sample, so the final degrees of freedom become $r - l - 1$.
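A sketch with one estimated parameter ($l = 1$): made-up count data fitted to a Poisson distribution, with $\lambda$ estimated by its MLE (the sample mean) and degrees of freedom $r - l - 1$. For simplicity the "$\ge 4$" cell is assumed to contain exactly fours when computing the MLE; $\chi^2_{0.05}(3) \approx 7.815$ is a table value:

```python
from math import exp, factorial

# hypothetical data: events per unit interval, tabulated into r = 5 cells
counts = {0: 26, 1: 35, 2: 24, 3: 10, 4: 5}    # m_i for cells 0,1,2,3,>=4
n = sum(counts.values())
# step 1: the MLE of the Poisson parameter lambda is the sample mean
lam = sum(k * m for k, m in counts.items()) / n

def pois(k):
    """P(X = k) under the fitted Poisson(lam)."""
    return lam ** k * exp(-lam) / factorial(k)

# cell probabilities p_i; the last cell is P(X >= 4)
p = [pois(0), pois(1), pois(2), pois(3)]
p.append(1 - sum(p))
chi2_stat = sum((m - n * pi) ** 2 / (n * pi)
                for m, pi in zip(counts.values(), p))
# step 2: degrees of freedom r - l - 1 = 5 - 1 - 1 = 3 (one estimated parameter)
chi2_crit = 7.815                               # chi^2_{0.05}(3), table value
reject = chi2_stat >= chi2_crit
print(f"lambda_hat = {lam:.2f}, chi2 = {chi2_stat:.3f}, reject H0: {reject}")
```

Here $\chi^2$ is far below $\chi^2_{0.05}(3)$, so the fitted Poisson distribution is accepted for this data.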


Origin blog.csdn.net/qaqwqaqwq/article/details/128519204