[Probability Theory] Final Review Notes: Parameter Estimation

1. Point Estimation

1. The concept of estimators

Point estimation: Let the distribution function of the population $X$ be $F(x;\theta_1,\theta_2,\cdots,\theta_l)$, where $\theta_1,\theta_2,\cdots,\theta_l$ are the unknown parameters to be estimated. Let $(X_1,X_2,\cdots,X_n)$ be a sample from $X$ and $(x_1,x_2,\cdots,x_n)$ the corresponding sample values. The point estimation problem is to construct $l$ suitable statistics $\hat{\theta}_i(X_1,X_2,\cdots,X_n)\,(i=1,2,\cdots,l)$ and use the observed values $\hat{\theta}_i(x_1,x_2,\cdots,x_n)$ as estimates of the unknown parameters $\theta_i$.
Estimator: the statistic $\hat{\theta}_i(X_1,X_2,\cdots,X_n)$ used for estimation.
Estimate: the observed value $\hat{\theta}_i(x_1,x_2,\cdots,x_n)$ of the estimator. When no confusion arises, estimators and estimates are both abbreviated as $\hat{\theta}_i$.

The estimator is a function of the sample and is therefore a random variable; estimates obtained from different sample values are generally different.

2. Methods of finding estimators

Method of Moments

Suppose the first $l$ origin moments of the population $X$ exist: $\alpha_k=E\left(X^k\right)\,(k=1,2,\cdots,l)$. They are functions of $\theta_1,\theta_2,\cdots,\theta_l$, i.e. $\alpha_k=\alpha_k(\theta_1,\theta_2,\cdots,\theta_l)$. Replace the population origin moments by the sample origin moments ($\alpha_k\to A_k$) and the unknown parameters by their estimators ($\theta_i\to\hat{\theta}_i$); this gives the system
$$\begin{cases}\alpha_1\left(\hat{\theta}_1,\hat{\theta}_2,\cdots,\hat{\theta}_l\right)=A_1\\ \alpha_2\left(\hat{\theta}_1,\hat{\theta}_2,\cdots,\hat{\theta}_l\right)=A_2\\ \cdots\\ \alpha_l\left(\hat{\theta}_1,\hat{\theta}_2,\cdots,\hat{\theta}_l\right)=A_l\end{cases}$$
Solving this system yields $\hat{\theta}_1,\hat{\theta}_2,\cdots,\hat{\theta}_l$ (as functions of $A_1,A_2,\cdots,A_l$), which are taken as the moment estimators of $\theta_1,\theta_2,\cdots,\theta_l$. $A_1$ is usually written as $\overline{X}$.

The theoretical basis of the method of moments is the law of large numbers: when $n$ is sufficiently large, the sample moment $A_k$ falls near the population moment $\alpha_k$ with high probability, so $A_k$ can be used as an estimator of $\alpha_k$.

Example: Let $X\sim U(0,\theta)$. Find the moment estimator of $\theta$.
Solution: We know $\alpha_1=E(X)=\frac{\theta}{2}$. Replacing $\alpha_1$ with $A_1$ (i.e. $\overline{X}$) and $\theta$ with $\hat{\theta}$ gives $\overline{X}=\frac{\hat{\theta}}{2}$, so the moment estimator of $\theta$ is $\hat{\theta}=2\overline{X}$.
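
As a quick numerical check, here is a minimal simulation sketch of this moment estimator; the true $\theta$, the seed, and the sample size are arbitrary demo choices.

```python
# Minimal check of the moment estimator theta_hat = 2 * X_bar for X ~ U(0, theta).
import numpy as np

rng = np.random.default_rng(0)
theta_true = 5.0          # the "unknown" parameter (arbitrary demo value)
n = 1000                  # sample size

x = rng.uniform(0.0, theta_true, size=n)   # sample from U(0, theta)
theta_mom = 2 * x.mean()                   # alpha_1 = theta/2  =>  theta_hat = 2 * X_bar

print(f"moment estimate: {theta_mom:.4f} (true theta = {theta_true})")
```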

The method of moments does not require knowing the distribution of the population. Its advantage is that it is simple and direct; its disadvantage is that it uses only a few moments of the population and does not make full use of the information in the population distribution.

Maximum Likelihood Estimation

Idea: if there is a distribution under which the observed sample $(x_1,x_2,\cdots,x_n)$ has the largest probability of occurring, then we regard $(x_1,x_2,\cdots,x_n)$ as coming from that distribution.

Likelihood function: Suppose the population $X$ is a discrete or continuous random variable with distribution law $P\{X=x\}=p(x;\theta_1,\theta_2,\cdots,\theta_l)$ or probability density $f(x;\theta_1,\theta_2,\cdots,\theta_l)$, where $\theta_1,\theta_2,\cdots,\theta_l$ are unknown parameters taking values in the parameter space $\Theta$, and the variable $x$ ranges over the possible values of $X$. Let $(X_1,X_2,\cdots,X_n)$ be a sample from $X$. In the discrete case, define
$$\begin{aligned}L(x_1,x_2,\cdots,x_n;\theta_1,\theta_2,\cdots,\theta_l)&=P\{X_1=x_1,X_2=x_2,\cdots,X_n=x_n\}\\&=\prod_{i=1}^n p(x_i;\theta_1,\theta_2,\cdots,\theta_l);\end{aligned}$$
in the continuous case, define
$$L(x_1,x_2,\cdots,x_n;\theta_1,\theta_2,\cdots,\theta_l)=\prod_{i=1}^n f(x_i;\theta_1,\theta_2,\cdots,\theta_l).$$
With $(x_1,x_2,\cdots,x_n)$ fixed, $L$ regarded as a function of $\theta_1,\theta_2,\cdots,\theta_l$ defined on $\Theta$ is called the likelihood function of the parameters $\theta_1,\theta_2,\cdots,\theta_l$, abbreviated $L(\theta_1,\theta_2,\cdots,\theta_l)$. In short: the likelihood function is the distribution law/probability density of the sample, regarded as a function of the parameters.
Log-likelihood function: the logarithm $\ln L(\theta_1,\theta_2,\cdots,\theta_l)$ of the likelihood function.

Maximum likelihood estimation: after the sample values $(x_1,x_2,\cdots,x_n)$ are obtained, choose $\hat{\theta}_1,\hat{\theta}_2,\cdots,\hat{\theta}_l$ such that
$$L(\hat{\theta}_1,\hat{\theta}_2,\cdots,\hat{\theta}_l)=\max_{(\theta_1,\theta_2,\cdots,\theta_l)\in\Theta}L(\theta_1,\theta_2,\cdots,\theta_l).$$
The $\hat{\theta}_1,\hat{\theta}_2,\cdots,\hat{\theta}_l$ obtained in this way are functions of the sample values, $\hat{\theta}_i=\hat{\theta}_i(x_1,x_2,\cdots,x_n)$, and are called the maximum likelihood estimates of the parameters $\theta_i\,(i=1,2,\cdots,l)$; the corresponding statistics $\hat{\theta}_i=\hat{\theta}_i(X_1,X_2,\cdots,X_n)\,(i=1,2,\cdots,l)$ are called the maximum likelihood estimators of $\theta_i$.

Since $\ln x$ is a monotonically increasing function of $x$, $L$ and $\ln L$ attain their maxima at the same point, so we may instead examine the maximum of $\ln L$.

In many cases the partial derivatives of $L$ (or $\ln L$) with respect to the parameters $\theta_1,\theta_2,\cdots,\theta_l$ exist; then $\hat{\theta}_1,\hat{\theta}_2,\cdots,\hat{\theta}_l$ can be found from the likelihood equations
$$\begin{cases}\cfrac{\partial L(\theta_1,\theta_2,\cdots,\theta_l)}{\partial\theta_1}=0\\ \cfrac{\partial L(\theta_1,\theta_2,\cdots,\theta_l)}{\partial\theta_2}=0\\ \cdots\\ \cfrac{\partial L(\theta_1,\theta_2,\cdots,\theta_l)}{\partial\theta_l}=0\end{cases}$$
or the log-likelihood equations
$$\begin{cases}\cfrac{\partial\ln L(\theta_1,\theta_2,\cdots,\theta_l)}{\partial\theta_1}=0\\ \cfrac{\partial\ln L(\theta_1,\theta_2,\cdots,\theta_l)}{\partial\theta_2}=0\\ \cdots\\ \cfrac{\partial\ln L(\theta_1,\theta_2,\cdots,\theta_l)}{\partial\theta_l}=0\end{cases}$$
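
As an illustration of this approach, consider $X\sim N(\mu,\sigma^2)$: the log-likelihood equations have the closed-form solution $\hat{\mu}=\overline{X}$, $\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^n\left(X_i-\overline{X}\right)^2$. The sketch below (demo parameters are arbitrary choices) compares this closed form with a generic numerical maximizer of the log-likelihood.

```python
# Sketch: maximum likelihood for X ~ N(mu, sigma^2).
# Closed-form MLE: mu_hat = sample mean, sigma2_hat = (1/n) * sum (x_i - mean)^2.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=500)   # arbitrary "unknown" mu = 2, sigma = 3

mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)        # note: divides by n, not n-1

def neg_log_lik(params):
    mu, log_sigma = params                     # optimize log(sigma) so that sigma > 0
    return -norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)).sum()

res = minimize(neg_log_lik, x0=[0.0, 0.0])
print(mu_hat, np.sqrt(sigma2_hat))             # closed-form MLE
print(res.x[0], np.exp(res.x[1]))              # numerical MLE, should agree
```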

Example: Let $X\sim U(0,\theta)$. Find the maximum likelihood estimator of $\theta$.
Solution: The probability density of $X$ is
$$f(x;\theta)=\begin{cases}\frac{1}{\theta},&0\le x\le\theta\\ 0,&\text{otherwise}\end{cases}$$
so the joint density of the sample $(X_1,X_2,\cdots,X_n)$ is
$$f(x_1,x_2,\cdots,x_n;\theta)=\prod_{i=1}^n f(x_i;\theta)=\begin{cases}\frac{1}{\theta^n},&0\le x_1,x_2,\cdots,x_n\le\theta\\ 0,&\text{otherwise.}\end{cases}$$
Regarding this as a function of $\theta$ (with $x_1,x_2,\cdots,x_n$ known), the likelihood function of $\theta$ is
$$L(\theta)=\begin{cases}\frac{1}{\theta^n},&\theta\ge\max\{x_1,x_2,\cdots,x_n\}\\ 0,&\text{otherwise.}\end{cases}$$
We can maximize this function without differentiating. First, $L(\theta)$ is positive only when $\theta\ge\max\{x_1,x_2,\cdots,x_n\}$; second, subject to this condition, since $\theta^n$ is in the denominator we want $\theta$ as small as possible. Hence $L(\theta)$ attains its maximum at $\theta=\max\{x_1,x_2,\cdots,x_n\}$, and the maximum likelihood estimator of $\theta$ is $\hat{\theta}=X_{(n)}$. Note that this differs from the estimator obtained by the method of moments.
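
A small sketch comparing the two estimators on simulated data (the true $\theta$ and sample size are arbitrary choices):

```python
# Compare the two estimators of theta for X ~ U(0, theta):
# method of moments (2 * X_bar) vs maximum likelihood (sample maximum X_(n)).
import numpy as np

rng = np.random.default_rng(2)
theta_true, n = 5.0, 50          # arbitrary demo values

x = rng.uniform(0.0, theta_true, size=n)
print("MoM :", 2 * x.mean())     # moment estimator
print("MLE :", x.max())          # maximum likelihood estimator X_(n), always <= theta
```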

2. Criteria for Choosing Estimators

1. Unbiasedness

Unbiased estimator: Let $(X_1,X_2,\cdots,X_n)$ be a sample from the population $X$, let $\theta$ be an unknown parameter contained in the distribution of $X$ with range $\Theta$, and let $\hat{\theta}=\hat{\theta}(X_1,X_2,\cdots,X_n)$ be an estimator of $\theta$. If $E\left(\hat{\theta}\right)=\theta$ for all $\theta\in\Theta$, then $\hat{\theta}$ is called an unbiased estimator of $\theta$.
Biased estimator: an estimator that is not unbiased; its bias is defined as $E\left(\hat{\theta}\right)-\theta$.
Asymptotically unbiased estimator: if $E\left(\hat{\theta}\right)-\theta\ne0$, but $\lim\limits_{n\to\infty}\left[E\left(\hat{\theta}\right)-\theta\right]=0$ as the sample size $n\to\infty$, then $\hat{\theta}$ is called an asymptotically unbiased estimator of $\theta$.

Let $(X_1,X_2,\cdots,X_n)$ be a sample from the population $X$. Whatever distribution $X$ follows:
(1) if $E(X)=\mu$ exists, then the sample mean $\overline{X}$ is an unbiased estimator of $E(X)$;
(2) if $D(X)=\sigma^2$ exists, then the sample variance $S^2$ is an unbiased estimator of $\sigma^2$;
(3) if the $k$-th population origin moment $E\left(X^k\right)=\alpha_k$ exists, then the $k$-th sample origin moment $A_k=\frac{1}{n}\sum\limits_{i=1}^n X_i^k$ is an unbiased estimator of $\alpha_k$.

Example: It can be proved that if the population $X\sim U(0,\theta)$ with parameter $\theta>0$, then $2\overline{X}$ and $\frac{n+1}{n}X_{(n)}$ are both unbiased estimators of $\theta$.

Although $S^2$ is an unbiased estimator of $\sigma^2$, $S$ is not an unbiased estimator of $\sigma$; instead, $\sqrt{\frac{n-1}{2}}\,\cfrac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}\,S$ is an unbiased estimator of $\sigma$ (for a normal population). This shows that if $\hat{\theta}$ is an unbiased estimator of $\theta$, then in general $g\left(\hat{\theta}\right)$ is not an unbiased estimator of $g(\theta)$ unless $g$ is a linear function.
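
A simulation sketch of these facts; the population parameters and replication count are arbitrary demo choices.

```python
# S^2 is unbiased for sigma^2, S is biased for sigma, and the Gamma-corrected S
# is (approximately) unbiased, for a normal population.
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(3)
sigma, n, reps = 2.0, 10, 200_000      # arbitrary demo parameters

samples = rng.normal(0.0, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)       # sample variance S^2 (divides by n-1)
s = np.sqrt(s2)

# unbiasing factor sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2), via log-Gamma for stability
c = np.sqrt((n - 1) / 2) * np.exp(gammaln((n - 1) / 2) - gammaln(n / 2))

print("E[S^2] ~", s2.mean(), "(sigma^2 =", sigma**2, ")")
print("E[S]   ~", s.mean(), " (sigma   =", sigma, ")")    # noticeably below sigma
print("E[cS]  ~", (c * s).mean())                         # close to sigma
```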

2. Efficiency

An unbiased estimator is not necessarily unique, so among them we select the one whose values are most concentrated, that is, the one with the smallest variance, as the best estimator.

Efficiency: Let $\hat{\theta}_1$ and $\hat{\theta}_2$ both be unbiased estimators of $\theta$. If $D\left(\hat{\theta}_1\right)\le D\left(\hat{\theta}_2\right)$, then $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$.

Example: Let $X\sim U(0,\theta)$. Then $\hat{\theta}_2=\frac{n+1}{n}X_{(n)}$ is more efficient than $\hat{\theta}_1=2\overline{X}$, since $D\left(\hat{\theta}_1\right)=\frac{\theta^2}{3n}>D\left(\hat{\theta}_2\right)=\frac{\theta^2}{n(n+2)}$.
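
A simulation sketch checking both variances against these formulas (demo parameters are arbitrary):

```python
# Compare simulated variances of the two unbiased estimators of theta for
# X ~ U(0, theta) against the theory: theta^2/(3n) and theta^2/(n(n+2)).
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 5.0, 20, 200_000     # arbitrary demo parameters

x = rng.uniform(0.0, theta, size=(reps, n))
t1 = 2 * x.mean(axis=1)               # theta_hat_1 = 2 * X_bar
t2 = (n + 1) / n * x.max(axis=1)      # theta_hat_2 = (n+1)/n * X_(n)

print("var(t1) ~", t1.var(), " theory:", theta**2 / (3 * n))
print("var(t2) ~", t2.var(), " theory:", theta**2 / (n * (n + 2)))
```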

Minimum variance unbiased estimator: the unbiased estimator with the smallest variance among all unbiased estimators.

3. Consistency

Consistent estimator: Let $\hat{\theta}=\hat{\theta}(X_1,X_2,\cdots,X_n)$ be an estimator of the parameter $\theta$. If $\hat{\theta}$ converges in probability to $\theta$ as $n\to\infty$, i.e. $\forall\varepsilon>0,\ \lim\limits_{n\to\infty}P\left\{\left|\hat{\theta}-\theta\right|<\varepsilon\right\}=1$, then $\hat{\theta}$ is called a consistent estimator of $\theta$, written $(p)\lim\limits_{n\to\infty}\hat{\theta}=\theta$ or $\hat{\theta}\overset{P}{\longrightarrow}\theta\ (n\to\infty)$.
Mean-square consistent estimator: if $\hat{\theta}$ converges to $\theta$ in mean square as $n\to\infty$, i.e. $\lim\limits_{n\to\infty}E\left[\left(\hat{\theta}-\theta\right)^2\right]=0$, then $\hat{\theta}$ is called a mean-square consistent estimator of $\theta$, written $(\text{m.s.})\lim\limits_{n\to\infty}\hat{\theta}=\theta$ or $\hat{\theta}\overset{L^2}{\longrightarrow}\theta\ (n\to\infty)$.

Consistency is the most basic requirement for an estimator: it requires that, as the sample size increases without bound, the parameter can be estimated with arbitrarily small error.

It can be proved that common moment estimators are consistent (e.g. $A_k\to\alpha_k$, $\overline{X}\to E(X)$, $S^2\to\sigma^2$, $S\to\sigma$). A mean-square consistent estimator must be a consistent estimator, but not conversely.

An important fact: Let the distribution function of the population $X$ be $F(x)$, let $(X_1,X_2,\cdots,X_n)$ be a sample from $X$, and let $F_n(x)=\frac{1}{n}\sum\limits_{i=1}^n[X_i\le x]$ be the empirical distribution function. Then for any fixed $x$, $F_n(x)$ is an unbiased, consistent, and mean-square consistent estimator of $F(x)$.
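
A minimal sketch of this fact for a standard normal population, watching $F_n(x_0)$ approach $F(x_0)$ at one fixed point $x_0$ (an arbitrary choice):

```python
# The empirical distribution function F_n(x) approaches F(x) as n grows.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x0 = 0.5                                   # arbitrary fixed evaluation point
print("F(x0) =", norm.cdf(x0))

for n in (10, 100, 10_000, 1_000_000):
    sample = rng.normal(size=n)
    fn = np.mean(sample <= x0)             # F_n(x0) = (1/n) * #{X_i <= x0}
    print(f"n={n:>9}: F_n(x0) = {fn:.4f}")
```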

Summary

Unbiasedness: $E\left(\hat{\theta}\right)=\theta$
Efficiency: the smaller the variance, the better
Consistency: convergence in probability (when the sample size is large enough, the gap between the estimate and the true value can be made arbitrarily small)

3. Interval estimation

1. Two-sided interval estimation

If
$$P\left\{\hat{\theta}_1(X_1,X_2,\cdots,X_n)<\theta<\hat{\theta}_2(X_1,X_2,\cdots,X_n)\right\}=1-\alpha,$$
then the random interval $\left(\hat{\theta}_1,\hat{\theta}_2\right)$ is called a two-sided confidence interval for the parameter $\theta$ at confidence level $1-\alpha$.
$\hat{\theta}_1$: lower confidence limit
$\hat{\theta}_2$: upper confidence limit
$1-\alpha$: confidence level
$\alpha$: the probability that the interval $\left(\hat{\theta}_1,\hat{\theta}_2\right)$ does not contain $\theta$ (generally small)

At a given confidence level $1-\alpha$, the smaller the expected length $E\left(\hat{\theta}_2-\hat{\theta}_1\right)$ of the confidence interval, the better.

The concrete procedure for finding a two-sided confidence interval for an unknown parameter $\theta$ is:

(1) Find a pivotal quantity $Z=Z\left(X_1,X_2,\cdots,X_n,\theta\right)$: we must know the distribution of $Z$, and this distribution must not depend on $\theta$ or on any other unknown parameter.
(2) For the given confidence level $1-\alpha$, find two constants $k_1,k_2$ such that $P\{k_1<Z<k_2\}=1-\alpha$.
(3) Rewrite $k_1<Z<k_2$ in the form $\hat{\theta}_1<\theta<\hat{\theta}_2$; then $\left(\hat{\theta}_1,\hat{\theta}_2\right)$ is a confidence interval at confidence level $1-\alpha$.
(4) Compute the values of $\hat{\theta}_1,\hat{\theta}_2$ from the sample values.

Example: Let $X\sim N(\mu,\sigma^2)$ with $\sigma^2$ known and $\mu$ unknown. Find a confidence interval for $\mu$ at confidence level $1-\alpha$.
Solution: Take the pivotal quantity $U=\cfrac{\overline{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1)$; note that $N(0,1)$ does not depend on any parameter.
Now we find $k_1,k_2$ such that $P\left\{k_1<U<k_2\right\}=1-\alpha$; we usually take $k_1=-u_{\alpha/2}$ and $k_2=u_{\alpha/2}$. Noting that $-u_{\alpha/2}=u_{1-\alpha/2}$, we check:
$$\begin{aligned}P\{k_1<U<k_2\}&=1-P\{U\ge k_2\}-P\{U\le k_1\}\\&=1-\frac{\alpha}{2}-\left(1-P\left\{U>k_1\right\}\right)\\&=1-\frac{\alpha}{2}-\left[1-\left(1-\frac{\alpha}{2}\right)\right]\\&=1-\alpha.\end{aligned}$$
(Schematic diagram: the density of $N(0,1)$ with tail area $\alpha/2$ cut off on each side.)

From $P\left\{-u_{\alpha/2}<\cfrac{\overline{X}-\mu}{\sigma/\sqrt{n}}<u_{\alpha/2}\right\}=1-\alpha$ we obtain
$$\begin{aligned}P\left\{-\frac{\sigma}{\sqrt{n}}u_{\alpha/2}<\overline{X}-\mu<\frac{\sigma}{\sqrt{n}}u_{\alpha/2}\right\}&=1-\alpha,\\ P\left\{\overline{X}-\frac{\sigma}{\sqrt{n}}u_{\alpha/2}<\mu<\overline{X}+\frac{\sigma}{\sqrt{n}}u_{\alpha/2}\right\}&=1-\alpha,\end{aligned}$$
so a confidence interval for $\mu$ at confidence level $1-\alpha$ is
$$\left(\overline{X}-\frac{\sigma}{\sqrt{n}}u_{\alpha/2},\ \overline{X}+\frac{\sigma}{\sqrt{n}}u_{\alpha/2}\right).$$
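
A minimal sketch of this interval, following the four steps above (the population parameters, sample size, and $\alpha$ are arbitrary demo choices):

```python
# Two-sided z confidence interval for mu when sigma is known.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
mu_true, sigma, n, alpha = 10.0, 2.0, 40, 0.05

x = rng.normal(mu_true, sigma, size=n)
xbar = x.mean()
u = norm.ppf(1 - alpha / 2)            # upper alpha/2 quantile u_{alpha/2}
half = u * sigma / np.sqrt(n)

print(f"{1-alpha:.0%} CI for mu: ({xbar - half:.4f}, {xbar + half:.4f})")
```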

In fact, choosing a pivotal quantity is the process of removing the influence of the parameter $\theta$ from the distribution of $X$. The distribution of $X$ depends on $\theta$, and we need to eliminate this dependence, so we construct a statistic $Z$ whose distribution is completely determined; only then can we determine the constants $k_1,k_2$. If the distribution of $X$ is not determined, it is very hard to obtain a confidence interval.

2. One-sided interval estimation

If $P\left\{\underline{\theta}(X_1,X_2,\cdots,X_n)<\theta\right\}=1-\alpha$, then $\left(\underline{\theta},+\infty\right)$ is a one-sided confidence interval for $\theta$ at confidence level $1-\alpha$, and $\underline{\theta}$ is the lower confidence limit.
If $P\left\{\theta<\overline{\theta}(X_1,X_2,\cdots,X_n)\right\}=1-\alpha$, then $\left(-\infty,\overline{\theta}\right)$ is a one-sided confidence interval for $\theta$ at confidence level $1-\alpha$, and $\overline{\theta}$ is the upper confidence limit.

That is: $\left(\underline{\theta},+\infty\right)$ contains $\theta$ with probability $1-\alpha$, and $\left(-\infty,\overline{\theta}\right)$ contains $\theta$ with probability $1-\alpha$.

At a given confidence level $1-\alpha$, the larger the lower confidence limit the better, and the smaller the upper confidence limit the better.

4. Interval Estimation of Normal Population Parameters

For the single-population case we assume $X\sim N(\mu,\sigma^2)$; for the two-population case we assume $X\sim N(\mu_1,\sigma_1^2)$ and $Y\sim N(\mu_2,\sigma_2^2)$, where $X$ has sample size $n$ and sample variance $S_X^2$, and $Y$ has sample size $m$ and sample variance $S_Y^2$.

The following cases concern a single population ($X\sim N(\mu,\sigma^2)$).

**$\sigma^2$ known, examine $\mu$**

Pivotal quantity: $U=\cfrac{\sqrt{n}\left(\overline{X}-\mu\right)}{\sigma}\sim N(0,1)$

Note: $P\left\{-u_{\alpha/2}<U<u_{\alpha/2}\right\}=1-\alpha$,
$P\left\{U<u_\alpha\right\}=1-\alpha$,
$P\left\{U>-u_\alpha\right\}=1-\alpha$.

**$\sigma^2$ unknown, examine $\mu$**

Pivotal quantity: $T=\cfrac{\sqrt{n}\left(\overline{X}-\mu\right)}{S}\sim t(n-1)$

Note: $P\left\{-t_{\alpha/2}<T<t_{\alpha/2}\right\}=1-\alpha$,
$P\left\{T<t_\alpha\right\}=1-\alpha$,
$P\left\{T>-t_\alpha\right\}=1-\alpha$.
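
A minimal sketch of the corresponding $t$ interval (demo numbers are arbitrary):

```python
# Two-sided t confidence interval for mu when sigma is unknown,
# using the pivot T = sqrt(n) * (X_bar - mu) / S ~ t(n-1).
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(7)
x = rng.normal(10.0, 2.0, size=25)     # arbitrary demo population
n, alpha = len(x), 0.05

xbar, s = x.mean(), x.std(ddof=1)      # sample mean and sample standard deviation S
t_crit = t.ppf(1 - alpha / 2, df=n - 1)
half = t_crit * s / np.sqrt(n)

print(f"95% CI for mu: ({xbar - half:.4f}, {xbar + half:.4f})")
```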

**$\mu$ known, examine $\sigma^2$**

Pivotal quantity: $\chi^2=\cfrac{\sum\limits_{i=1}^n\left(X_i-\mu\right)^2}{\sigma^2}\sim\chi^2(n)$

**$\mu$ unknown, examine $\sigma^2$**

Pivotal quantity: $\chi^2=\cfrac{\sum\limits_{i=1}^n\left(X_i-\overline{X}\right)^2}{\sigma^2}=\cfrac{(n-1)S^2}{\sigma^2}\sim\chi^2(n-1)$
(Diagram of the $\chi^2$ density omitted; note the distribution is not symmetric.)
Note: $P\left\{\chi^2>\chi^2_{\alpha/2}(n-1)\right\}=\frac{\alpha}{2}$ and $P\left\{\chi^2>\chi^2_{1-\alpha/2}(n-1)\right\}=1-\frac{\alpha}{2}$, hence $P\left\{\chi^2_{1-\alpha/2}(n-1)<\chi^2<\chi^2_{\alpha/2}(n-1)\right\}=1-\alpha$.
$P\left\{\chi^2<\chi^2_{\alpha}(n-1)\right\}=1-\alpha$,
$P\left\{\chi^2>\chi^2_{1-\alpha}(n-1)\right\}=1-\alpha$.
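
A minimal sketch of the resulting two-sided interval for $\sigma^2$ (demo numbers are arbitrary):

```python
# Two-sided confidence interval for sigma^2 with mu unknown,
# from (n-1) * S^2 / sigma^2 ~ chi^2(n-1).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(8)
x = rng.normal(10.0, 2.0, size=25)     # arbitrary demo population, sigma^2 = 4
n, alpha = len(x), 0.05

s2 = x.var(ddof=1)
lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, df=n - 1)   # divide by upper alpha/2 quantile
hi = (n - 1) * s2 / chi2.ppf(alpha / 2, df=n - 1)       # divide by lower alpha/2 quantile

print(f"95% CI for sigma^2: ({lo:.4f}, {hi:.4f})")
```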

The following cases concern two populations ($X\sim N(\mu_1,\sigma_1^2)$, $Y\sim N(\mu_2,\sigma_2^2)$).

**$\sigma_1^2,\sigma_2^2$ known, examine $\mu_1-\mu_2$**

Pivotal quantity: $U=\cfrac{\left(\overline{X}-\overline{Y}\right)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n}+\frac{\sigma_2^2}{m}}}\sim N(0,1)$

Note that $D\left(\overline{X}-\overline{Y}\right)=\frac{\sigma_1^2}{n}+\frac{\sigma_2^2}{m}$.

**$\sigma_1^2=\sigma_2^2$ unknown, examine $\mu_1-\mu_2$**

Pivotal quantity: $T=\cfrac{\left(\overline{X}-\overline{Y}\right)-(\mu_1-\mu_2)}{S_W\sqrt{\frac{1}{n}+\frac{1}{m}}}\sim t(n+m-2)$, where $S_W=\sqrt{\cfrac{(n-1)S_X^2+(m-1)S_Y^2}{n+m-2}}$.

Note that $U=\cfrac{\left(\overline{X}-\overline{Y}\right)-(\mu_1-\mu_2)}{\sigma\sqrt{\frac{1}{n}+\frac{1}{m}}}\sim N(0,1)$ and $V=\cfrac{(n-1)S_X^2+(m-1)S_Y^2}{\sigma^2}\sim\chi^2(n+m-2)$, where $\sigma^2=\sigma_1^2=\sigma_2^2$ is the common variance.
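
A minimal sketch of the resulting interval for $\mu_1-\mu_2$ under the equal-variance assumption (demo numbers are arbitrary):

```python
# Confidence interval for mu1 - mu2 with equal but unknown variances,
# using the pooled estimator S_W and the t(n+m-2) pivot.
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(9)
x = rng.normal(5.0, 2.0, size=30)      # X ~ N(mu1, sigma^2)
y = rng.normal(3.0, 2.0, size=20)      # Y ~ N(mu2, sigma^2), same sigma
n, m, alpha = len(x), len(y), 0.05

sw = np.sqrt(((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2))
half = t.ppf(1 - alpha / 2, df=n + m - 2) * sw * np.sqrt(1 / n + 1 / m)
diff = x.mean() - y.mean()

print(f"95% CI for mu1 - mu2: ({diff - half:.4f}, {diff + half:.4f})")
```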

**$\mu_1,\mu_2$ known, examine $\frac{\sigma_1^2}{\sigma_2^2}$**

Pivotal quantity:
$$F=\cfrac{\left.\sum\limits_{i=1}^n\cfrac{(X_i-\mu_1)^2}{\sigma_1^2}\right/n}{\left.\sum\limits_{j=1}^m\cfrac{(Y_j-\mu_2)^2}{\sigma_2^2}\right/m}=\cfrac{\sigma_2^2}{\sigma_1^2}\cdot\cfrac{m\sum\limits_{i=1}^n(X_i-\mu_1)^2}{n\sum\limits_{j=1}^m(Y_j-\mu_2)^2}\sim F(n,m)$$

**$\mu_1,\mu_2$ unknown, examine $\frac{\sigma_1^2}{\sigma_2^2}$**

Pivotal quantity: $F=\cfrac{\sigma_2^2}{\sigma_1^2}\cdot\cfrac{S_X^2}{S_Y^2}\sim F(n-1,m-1)$

Note: $P\left\{F>F_{\alpha/2}(n-1,m-1)\right\}=\frac{\alpha}{2}$ and $P\left\{F>F_{1-\alpha/2}(n-1,m-1)\right\}=1-\frac{\alpha}{2}$, hence $P\left\{F_{1-\alpha/2}(n-1,m-1)<F<F_{\alpha/2}(n-1,m-1)\right\}=1-\alpha$.
$P\left\{F<F_{\alpha}(n-1,m-1)\right\}=1-\alpha$,
$P\left\{F>F_{1-\alpha}(n-1,m-1)\right\}=1-\alpha$.
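
A minimal sketch of the resulting interval for $\frac{\sigma_1^2}{\sigma_2^2}$ (demo numbers are arbitrary):

```python
# Confidence interval for sigma1^2 / sigma2^2 with mu1, mu2 unknown,
# from F = (sigma2^2/sigma1^2) * (S_X^2/S_Y^2) ~ F(n-1, m-1).
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(10)
x = rng.normal(0.0, 2.0, size=30)      # sigma1 = 2
y = rng.normal(0.0, 1.0, size=25)      # sigma2 = 1, so the true ratio is 4
n, m, alpha = len(x), len(y), 0.05

ratio = x.var(ddof=1) / y.var(ddof=1)  # S_X^2 / S_Y^2
lo = ratio / f.ppf(1 - alpha / 2, dfn=n - 1, dfd=m - 1)
hi = ratio / f.ppf(alpha / 2, dfn=n - 1, dfd=m - 1)

print(f"95% CI for sigma1^2/sigma2^2: ({lo:.4f}, {hi:.4f})")
```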


The $t$ distribution is similar to the standard normal distribution $N(0,1)$: both probability density curves are symmetric about $x=0$, so $u_{1-\alpha}=-u_\alpha$ and $t_{1-\alpha}(n)=-t_\alpha(n)$.
The $F$ distribution is similar to the $\chi^2$ distribution: the probability density is positive only for $x>0$.

Whatever distribution $X$ follows, we have $P\{q_{1-\alpha/2}<X<q_{\alpha/2}\}=1-\alpha$, where $q_\alpha$ denotes the upper $\alpha$ quantile of the distribution of $X$.

Regarding degrees of freedom, one can think of it as follows: if the population mean $\mu$ is known, the degrees of freedom equal the sample size; if the population mean $\mu$ is unknown and is replaced by the sample mean $\overline{X}$, one degree of freedom is lost. For the $t$ distribution used when examining the difference of sample means, the degrees of freedom are the sum of the degrees of freedom of the two samples.


Source: blog.csdn.net/qaqwqaqwq/article/details/128466245