Chapter 5 (Limit Theorems): The Central Limit Theorem

These are reading notes for *Introduction to Probability*.

The Central Limit Theorem

  • Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed random variables with mean $\mu$ and variance $\sigma^2$. We define
    $$Z_n=\frac{S_n-n\mu}{\sigma\sqrt n}=\frac{X_1+\cdots+X_n-n\mu}{\sigma\sqrt n}$$
    An easy calculation yields
    $$E[Z_n]=0,\qquad \mathrm{var}(Z_n)=1$$
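
For completeness, the "easy calculation" follows from $E[S_n]=n\mu$ and, by independence, $\mathrm{var}(S_n)=n\sigma^2$:
$$E[Z_n]=\frac{E[S_n]-n\mu}{\sigma\sqrt n}=0,\qquad \mathrm{var}(Z_n)=\frac{\mathrm{var}(S_n)}{\sigma^2 n}=\frac{n\sigma^2}{\sigma^2 n}=1$$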

  • The CDF of $Z_n$ converges to the standard normal CDF
    $$\Phi(z)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^z e^{-x^2/2}\,dx$$
    in the sense that
    $$\lim_{n\rightarrow\infty}P(Z_n\leq z)=\Phi(z),\qquad \text{for every } z$$
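
A small simulation (my own sketch, assuming NumPy and SciPy are available and using exponential $X_i$ as the example) illustrates this convergence by comparing the empirical CDF of $Z_n$ with $\Phi$ at a few points:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def empirical_Zn_cdf(z, n, trials=100_000):
    """Estimate P(Z_n <= z) when the X_i are Exponential(1), so mu = sigma = 1."""
    mu, sigma = 1.0, 1.0
    S = rng.exponential(scale=mu, size=(trials, n)).sum(axis=1)
    Z = (S - n * mu) / (sigma * np.sqrt(n))
    return np.mean(Z <= z)

for n in [2, 10, 100]:
    print(n, [round(empirical_Zn_cdf(z, n), 3) for z in (-1.0, 0.0, 1.0)])
print("Phi", [round(norm.cdf(z), 3) for z in (-1.0, 0.0, 1.0)])
# As n grows, the rows approach (0.159, 0.500, 0.841), the standard normal CDF values.
```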

  • The central limit theorem is surprisingly general. Besides independence, and the implicit assumption that the mean and variance are finite, it places no other requirement on the distribution of the $X_i$, which could be discrete, continuous, or mixed.
  • This theorem is of tremendous importance.
    • On the conceptual side, it indicates that the sum of a large number of independent random variables is approximately normal. As such, it applies to many situations in which a random effect is the sum of a large number of small but independent random factors. Noise in many natural or engineered systems has this property. In a wide array of contexts, it has been found empirically that the statistics of noise are well-described by normal distributions, and the central limit theorem provides a convincing explanation for this phenomenon.
    • On the practical side, the central limit theorem eliminates the need for detailed probabilistic models, and for tedious manipulations of PMFs and PDFs. Rather, it allows the calculation of certain probabilities by simply referring to the normal CDF table. Furthermore, these calculations only require the knowledge of means and variances.

Problem 12. Proof of the central limit theorem.
Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed zero-mean random variables with common variance $\sigma^2$ and associated transform $M_X(s)$. We assume that $M_X(s)$ is finite when $-d<s<d$, where $d$ is some positive number. Let
$$Z_n=\frac{X_1+\cdots+X_n}{\sigma\sqrt n}$$

  • (a) Show that the transform associated with $Z_n$ satisfies
    $$M_{Z_n}(s)=\Big(M_X\Big(\frac{s}{\sigma\sqrt n}\Big)\Big)^n$$
  • (b) Suppose that the transform $M_X(s)$ has a second order Taylor series expansion around $s=0$, of the form
    $$M_X(s)=a+bs+cs^2+o(s^2)$$
    where $o(s^2)$ is a function that satisfies $\lim_{s\rightarrow0}o(s^2)/s^2=0$. Find $a$, $b$, and $c$ in terms of $\sigma^2$.
  • (c) Combine the results of parts (a) and (b) to show that the transform $M_{Z_n}(s)$ converges to the transform associated with a standard normal random variable, that is,
    $$\lim_{n\rightarrow\infty}M_{Z_n}(s)=e^{s^2/2},\qquad\text{for all } s$$
    [Note: The central limit theorem follows from the result of part (c), together with the fact (whose proof lies beyond the scope of this text) that if the transforms $M_{Z_n}(s)$ converge to the transform $M_Z(s)$ of a random variable $Z$ whose CDF is continuous, then the CDFs $F_{Z_n}$ converge to the CDF of $Z$. In our case, this implies that the CDF of $Z_n$ converges to the CDF of a standard normal.]

SOLUTION

  • (a) By the definition of the transform and the independence of the $X_i$,
    $$M_{Z_n}(s)=E\big[e^{sZ_n}\big]=E\Big[e^{s(X_1+\cdots+X_n)/(\sigma\sqrt n)}\Big]=\prod_{i=1}^n E\Big[e^{sX_i/(\sigma\sqrt n)}\Big]=\Big(M_X\Big(\frac{s}{\sigma\sqrt n}\Big)\Big)^n$$
  • (b)
    $$a=M_X(0)=1$$
    $$b=\frac{d}{ds}M_X(s)\bigg|_{s=0}=E[X]=0$$
    $$c=\frac{1}{2}\cdot\frac{d^2}{ds^2}M_X(s)\bigg|_{s=0}=\frac{E[X^2]}{2}=\frac{\sigma^2}{2}$$
  • (c)
    $$\begin{aligned}M_{Z_n}(s)&=\Big(M_X\Big(\frac{s}{\sigma\sqrt n}\Big)\Big)^n=\Big(a+b\frac{s}{\sigma\sqrt n}+c\frac{s^2}{\sigma^2 n}+o\Big(\frac{s^2}{\sigma^2 n}\Big)\Big)^n\\&=\Big(1+\frac{s^2}{2n}+o\Big(\frac{s^2}{\sigma^2 n}\Big)\Big)^n\end{aligned}$$
    We now take the limit as $n\rightarrow\infty$ and use the identity
    $$\lim_{n\rightarrow\infty}\Big(1+\frac{c}{n}\Big)^n=e^c$$
    to obtain
    $$\lim_{n\rightarrow\infty}M_{Z_n}(s)=e^{s^2/2}$$
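
As a sanity check (not part of the book's solution), the convergence in part (c) can be verified numerically for a specific distribution. The sketch below assumes $X$ takes the values $\pm1$ with probability $1/2$ each, so that $M_X(s)=\cosh(s)$ and $\sigma=1$:

```python
import numpy as np

def M_Zn(s, n):
    """Transform of Z_n when X = +/-1 with probability 1/2 each (M_X(s) = cosh s, sigma = 1)."""
    return np.cosh(s / np.sqrt(n)) ** n

s = 1.5
for n in [1, 10, 100, 10_000]:
    print(n, M_Zn(s, n))
print("limit", np.exp(s**2 / 2))   # e^{1.125} ≈ 3.0802
```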

Approximations Based on the Central Limit Theorem

  • The central limit theorem allows us to calculate probabilities related to $Z_n$ as if $Z_n$ were normal. Since normality is preserved under linear transformations, this is equivalent to treating $S_n$ as a normal random variable with mean $n\mu$ and variance $n\sigma^2$:
    $$P(S_n\leq c)\approx\Phi\Big(\frac{c-n\mu}{\sigma\sqrt n}\Big)$$
    where $\Phi(z)$ is available from standard normal CDF tables.
  • The normal approximation is increasingly accurate as $n$ tends to infinity, but in practice we are generally faced with specific and finite values of $n$. It would be useful to know how large $n$ should be before the approximation can be trusted, but there are no simple and general guidelines. Much depends on whether the distribution of the $X_i$ is close to normal and, in particular, whether it is symmetric.
    • For example, if the $X_i$ are uniform, then $S_8$ is already very close to normal. But if the $X_i$ are, say, exponential, a significantly larger $n$ will be needed before the distribution of $S_n$ is close to a normal one. Furthermore, the normal approximation to $P(S_n\leq c)$ tends to be more faithful when $c$ is in the vicinity of the mean of $S_n$.
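
The uniform-versus-exponential contrast can be made concrete with a quick Monte Carlo experiment (my own illustration, assuming NumPy and SciPy), comparing the normal approximation of $P(S_n\leq c)$ with an empirical estimate for $n=8$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def compare(sampler, mu, sigma, n, c, trials=200_000):
    """Empirical P(S_n <= c) versus the CLT approximation Phi((c - n*mu) / (sigma*sqrt(n)))."""
    S = sampler(size=(trials, n)).sum(axis=1)
    empirical = np.mean(S <= c)
    approx = norm.cdf((c - n * mu) / (sigma * np.sqrt(n)))
    return empirical, approx

n = 8
# Uniform(0, 1): mu = 1/2, sigma^2 = 1/12 -- S_8 is already close to normal
print(compare(rng.uniform, 0.5, np.sqrt(1 / 12), n, c=4.5))
# Exponential(1): mu = sigma = 1 -- the skewed PDF makes the approximation rougher
print(compare(rng.exponential, 1.0, 1.0, n, c=9.0))
```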

Example 5.11. Polling.

  • We poll $n$ voters and record the fraction $M_n$ of those polled who are in favor of a particular candidate. If $p$ is the fraction of the entire voter population that supports this candidate, then
    $$M_n=\frac{X_1+\cdots+X_n}{n}$$
    where the $X_i$ are independent Bernoulli random variables with parameter $p$. In particular, $M_n$ has mean $p$ and variance $p(1-p)/n$. By the normal approximation, $X_1+\cdots+X_n$ is approximately normal, and therefore $M_n$ is also approximately normal.
  • We are interested in the probability $P(|M_n-p|\geq\epsilon)$ that the polling error is larger than some desired accuracy $\epsilon$. Because of the symmetry of the normal PDF around the mean, we have
    $$P(|M_n-p|\geq\epsilon)\approx2P(M_n-p\geq\epsilon)$$
    The variance $p(1-p)/n$ of $M_n-p$ depends on $p$ and is therefore unknown. We note that the probability of a large deviation from the mean increases with the variance. Thus, we can obtain an upper bound on $P(M_n-p\geq\epsilon)$ by assuming that $M_n-p$ has the largest possible variance, namely $1/(4n)$, which corresponds to $p=1/2$. To calculate this upper bound, we evaluate the standardized value
    $$z=\frac{\epsilon}{1/(2\sqrt n)}=2\epsilon\sqrt n$$
    and use the normal approximation
    $$P(M_n-p\geq\epsilon)\leq1-\Phi(z)=1-\Phi(2\epsilon\sqrt n)$$
  • For instance, suppose we wish our estimate $M_n$ to be within $0.01$ of $p$ with probability at least $0.95$. How large a sample size $n$ is needed? Assuming again the worst possible variance, we are led to the condition
    $$P(|M_n-p|\geq\epsilon)\approx2P(M_n-p\geq\epsilon)\leq2-2\Phi(2\epsilon\sqrt n)=2-2\Phi(2\cdot0.01\cdot\sqrt n)\leq0.05$$
    or
    $$\Phi(2\cdot0.01\cdot\sqrt n)\geq0.975$$
    From the normal tables, we see that $\Phi(1.96)=0.975$, which leads to
    $$2\cdot0.01\cdot\sqrt n\geq1.96,\qquad\text{i.e.,}\quad n\geq9604$$
    This is significantly better than the sample size of 50,000 that we found using Chebyshev's inequality (Example 5.5); the sketch below reproduces this calculation.
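
A minimal sketch of the same sample-size calculation, assuming SciPy's `norm.ppf` for the inverse standard normal CDF:

```python
import numpy as np
from scipy.stats import norm

eps = 0.01    # desired accuracy |M_n - p| < eps
alpha = 0.05  # allowed probability of a larger error

# Worst-case variance is 1/(4n), so we need 2 - 2*Phi(2*eps*sqrt(n)) <= alpha,
# i.e. 2*eps*sqrt(n) >= Phi^{-1}(1 - alpha/2) ≈ 1.96.
z = norm.ppf(1 - alpha / 2)
n = int(np.ceil((z / (2 * eps)) ** 2))
print(n)  # 9604
```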

De Moivre-Laplace Approximation to the Binomial

(This approximation applies not only to the binomial distribution, but also to other discrete random variables that take only integer values.)

  • A binomial random variable $S_n$ with parameters $n$ and $p$ can be viewed as the sum of $n$ independent Bernoulli random variables $X_1,\ldots,X_n$ with common parameter $p$:
    $$S_n=X_1+\cdots+X_n,\qquad\mu=E[X_i]=p,\qquad\sigma=\sqrt{p(1-p)}$$
  • According to the approximation suggested by the central limit theorem,
    $$P(k\leq S_n\leq l)\approx\Phi\Big(\frac{l-np}{\sqrt{np(1-p)}}\Big)-\Phi\Big(\frac{k-np}{\sqrt{np(1-p)}}\Big)$$
    where $k$ and $l$ are given integers.
  • An approximation of this form is equivalent to treating $S_n$ as a normal random variable with mean $np$ and variance $np(1-p)$. Figure 5.3 provides an illustration and indicates that a more accurate approximation may be possible if we replace $k$ and $l$ by $k-1/2$ and $l+1/2$, respectively.

  • When $p$ is close to $1/2$, in which case the PMF of the $X_i$ is symmetric, the above formula yields a very good approximation for $n$ as low as 40 or 50.
  • When $p$ is near 1 or near 0, the quality of the approximation drops, and a larger value of $n$ is needed to maintain the same accuracy. A sketch implementing the approximation with the half-unit correction follows below.
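
A small helper (my own sketch, not from the text) that implements the approximation with the half-unit correction:

```python
from math import sqrt
from scipy.stats import norm

def de_moivre_laplace(k, l, n, p):
    """Approximate P(k <= S_n <= l) for S_n ~ Binomial(n, p),
    treating S_n as normal with mean n*p and variance n*p*(1-p),
    and replacing k, l by k - 1/2, l + 1/2 (the half-unit correction)."""
    mean = n * p
    std = sqrt(n * p * (1 - p))
    return norm.cdf((l + 0.5 - mean) / std) - norm.cdf((k - 0.5 - mean) / std)

# With p close to 1/2 the approximation is already good for moderate n:
print(de_moivre_laplace(15, 21, 40, 0.5))
```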

Example 5.12.

  • Let $S_n$ be a binomial random variable with parameters $n=36$ and $p=0.5$. An exact calculation yields
    $$P(S_n\leq21)=\sum_{k=0}^{21}\binom{36}{k}(0.5)^{36}=0.8785$$
  • The central limit theorem approximation, without the above-discussed refinement, yields
    $$P(S_n\leq21)\approx\Phi\Big(\frac{21-np}{\sqrt{np(1-p)}}\Big)=\Phi(1)=0.8413$$
  • Using the proposed refinement, we have
    $$P(S_n\leq21)\approx\Phi\Big(\frac{21.5-np}{\sqrt{np(1-p)}}\Big)=\Phi(1.17)=0.879$$
    which is much closer to the exact value.
  • The de Moivre-Laplace formula also allows us to approximate the probability of a single value. For example,
    $$P(S_n=19)\approx\Phi\Big(\frac{19.5-np}{\sqrt{np(1-p)}}\Big)-\Phi\Big(\frac{18.5-np}{\sqrt{np(1-p)}}\Big)=0.6915-0.5675=0.124$$
    This is very close to the exact value, which is
    $$\binom{36}{19}(0.5)^{36}=0.1251$$
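
The numbers in this example can be reproduced with SciPy (a quick check using `scipy.stats.binom` and `norm`; small discrepancies with the text come from table rounding):

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 36, 0.5
mean, std = n * p, sqrt(n * p * (1 - p))      # mean = 18, std = 3

print(binom.cdf(21, n, p))                    # exact: 0.8785...
print(norm.cdf((21 - mean) / std))            # plain CLT: Phi(1) = 0.8413...
print(norm.cdf((21.5 - mean) / std))          # half-unit correction: ≈ 0.878 (text rounds to 0.879)
print(norm.cdf((19.5 - mean) / std)
      - norm.cdf((18.5 - mean) / std))        # P(S_n = 19) with the correction
print(binom.pmf(19, n, p))                    # exact: 0.1251...
```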
