Chapter 5 (Limit Theorems): The Central Limit Theorem

These are reading notes for *Introduction to Probability*.

The Central Limit Theorem

  • Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed random variables with mean $\mu$ and variance $\sigma^2$. We define
    $$Z_n=\frac{S_n-n\mu}{\sigma\sqrt n}=\frac{X_1+\cdots+X_n-n\mu}{\sigma\sqrt n}$$
    An easy calculation yields
    $$E[Z_n]=0,\qquad \mathrm{var}(Z_n)=1$$
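
For completeness, the "easy calculation" follows from $E[S_n]=n\mu$ and, by independence, $\mathrm{var}(S_n)=n\sigma^2$:
$$E[Z_n]=\frac{E[S_n]-n\mu}{\sigma\sqrt n}=0,\qquad \mathrm{var}(Z_n)=\frac{\mathrm{var}(S_n)}{\sigma^2 n}=\frac{n\sigma^2}{\sigma^2 n}=1$$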

  • The CDF of $Z_n$ converges to the standard normal CDF
    $$\Phi(z)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^z e^{-x^2/2}\,dx$$
    in the sense that
    $$\lim_{n\rightarrow\infty}P(Z_n\leq z)=\Phi(z),\qquad \text{for every } z$$
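
A small simulation (my own sketch, assuming NumPy and SciPy are available and using exponential $X_i$ as the example) illustrates this convergence by comparing the empirical CDF of $Z_n$ with $\Phi$ at a few points:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def empirical_Zn_cdf(z, n, trials=100_000):
    """Estimate P(Z_n <= z) when the X_i are Exponential(1), so mu = sigma = 1."""
    mu, sigma = 1.0, 1.0
    S = rng.exponential(scale=mu, size=(trials, n)).sum(axis=1)
    Z = (S - n * mu) / (sigma * np.sqrt(n))
    return np.mean(Z <= z)

for n in [2, 10, 100]:
    print(n, [round(empirical_Zn_cdf(z, n), 3) for z in (-1.0, 0.0, 1.0)])
print("Phi", [round(norm.cdf(z), 3) for z in (-1.0, 0.0, 1.0)])
# As n grows, the rows approach (0.159, 0.500, 0.841), the standard normal CDF values.
```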

  • The central limit theorem is surprisingly general. Besides independence, and the implicit assumption that the mean and variance are finite, it places no other requirement on the distribution of the $X_i$, which could be discrete, continuous, or mixed.
  • This theorem is of tremendous importance.
    • On the conceptual side, it indicates that the sum of a large number of independent random variables is approximately normal. As such, it applies to many situations in which a random effect is the sum of a large number of small but independent random factors. Noise in many natural or engineered systems has this property. In a wide array of contexts, it has been found empirically that the statistics of noise are well-described by normal distributions, and the central limit theorem provides a convincing explanation for this phenomenon.
    • On the practical side, the central limit theorem eliminates the need for detailed probabilistic models, and for tedious manipulations of PMFs and PDFs. Rather, it allows the calculation of certain probabilities by simply referring to the normal CDF table. Furthermore, these calculations only require the knowledge of means and variances.

Problem 12. Proof of the central limit theorem.
Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed zero-mean random variables with common variance $\sigma^2$ and associated transform $M_X(s)$. We assume that $M_X(s)$ is finite when $-d<s<d$, where $d$ is some positive number. Let
$$Z_n=\frac{X_1+\cdots+X_n}{\sigma\sqrt n}$$

  • (a) Show that the transform associated with $Z_n$ satisfies
    $$M_{Z_n}(s)=\Big(M_X\Big(\frac{s}{\sigma\sqrt n}\Big)\Big)^n$$
  • (b) Suppose that the transform $M_X(s)$ has a second order Taylor series expansion around $s=0$, of the form
    $$M_X(s)=a+bs+cs^2+o(s^2)$$
    where $o(s^2)$ is a function that satisfies $\lim_{s\rightarrow0}o(s^2)/s^2=0$. Find $a$, $b$, and $c$ in terms of $\sigma^2$.
  • (c) Combine the results of parts (a) and (b) to show that the transform $M_{Z_n}(s)$ converges to the transform associated with a standard normal random variable, that is,
    $$\lim_{n\rightarrow\infty}M_{Z_n}(s)=e^{s^2/2},\qquad\text{for all } s$$
    [Note: The central limit theorem follows from the result of part (c), together with the fact (whose proof lies beyond the scope of this text) that if the transforms $M_{Z_n}(s)$ converge to the transform $M_Z(s)$ of a random variable $Z$ whose CDF is continuous, then the CDFs $F_{Z_n}$ converge to the CDF of $Z$. In our case, this implies that the CDF of $Z_n$ converges to the CDF of a standard normal.]

SOLUTION

  • (a) By the definition of the transform and the independence of the $X_i$,
    $$M_{Z_n}(s)=E\big[e^{sZ_n}\big]=E\Big[e^{s(X_1+\cdots+X_n)/(\sigma\sqrt n)}\Big]=\prod_{i=1}^n E\Big[e^{sX_i/(\sigma\sqrt n)}\Big]=\Big(M_X\Big(\frac{s}{\sigma\sqrt n}\Big)\Big)^n$$
  • (b)
    $$a=M_X(0)=1$$
    $$b=\frac{d}{ds}M_X(s)\bigg|_{s=0}=E[X]=0$$
    $$c=\frac{1}{2}\cdot\frac{d^2}{ds^2}M_X(s)\bigg|_{s=0}=\frac{E[X^2]}{2}=\frac{\sigma^2}{2}$$
  • (c)
    $$\begin{aligned}M_{Z_n}(s)&=\Big(M_X\Big(\frac{s}{\sigma\sqrt n}\Big)\Big)^n=\Big(a+b\frac{s}{\sigma\sqrt n}+c\frac{s^2}{\sigma^2 n}+o\Big(\frac{s^2}{\sigma^2 n}\Big)\Big)^n\\&=\Big(1+\frac{s^2}{2n}+o\Big(\frac{s^2}{\sigma^2 n}\Big)\Big)^n\end{aligned}$$
    We now take the limit as $n\rightarrow\infty$ and use the identity
    $$\lim_{n\rightarrow\infty}\Big(1+\frac{c}{n}\Big)^n=e^c$$
    to obtain
    $$\lim_{n\rightarrow\infty}M_{Z_n}(s)=e^{s^2/2}$$
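
As a sanity check (not part of the book's solution), the convergence in part (c) can be verified numerically for a specific distribution. The sketch below assumes $X$ takes the values $\pm1$ with probability $1/2$ each, so that $M_X(s)=\cosh(s)$ and $\sigma=1$:

```python
import numpy as np

def M_Zn(s, n):
    """Transform of Z_n when X = +/-1 with probability 1/2 each (M_X(s) = cosh s, sigma = 1)."""
    return np.cosh(s / np.sqrt(n)) ** n

s = 1.5
for n in [1, 10, 100, 10_000]:
    print(n, M_Zn(s, n))
print("limit", np.exp(s**2 / 2))   # e^{1.125} ≈ 3.0802
```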

Approximations Based on the Central Limit Theorem

  • The central limit theorem allows us to calculate probabilities related to $Z_n$ as if $Z_n$ were normal. Since normality is preserved under linear transformations, this is equivalent to treating $S_n$ as a normal random variable with mean $n\mu$ and variance $n\sigma^2$:
    $$P(S_n\leq c)\approx\Phi\Big(\frac{c-n\mu}{\sigma\sqrt n}\Big)$$
    where $\Phi(z)$ is available from standard normal CDF tables.
  • The normal approximation is increasingly accurate as $n$ tends to infinity, but in practice we are generally faced with specific and finite values of $n$. It would be useful to know how large $n$ should be before the approximation can be trusted, but there are no simple and general guidelines. Much depends on whether the distribution of the $X_i$ is close to normal and, in particular, whether it is symmetric.
    • For example, if the $X_i$ are uniform, then $S_8$ is already very close to normal. But if the $X_i$ are, say, exponential, a significantly larger $n$ will be needed before the distribution of $S_n$ is close to a normal one. Furthermore, the normal approximation to $P(S_n\leq c)$ tends to be more faithful when $c$ is in the vicinity of the mean of $S_n$.
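
The uniform-versus-exponential contrast can be made concrete with a quick Monte Carlo experiment (my own illustration, assuming NumPy and SciPy), comparing the normal approximation of $P(S_n\leq c)$ with an empirical estimate for $n=8$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def compare(sampler, mu, sigma, n, c, trials=200_000):
    """Empirical P(S_n <= c) versus the CLT approximation Phi((c - n*mu) / (sigma*sqrt(n)))."""
    S = sampler(size=(trials, n)).sum(axis=1)
    empirical = np.mean(S <= c)
    approx = norm.cdf((c - n * mu) / (sigma * np.sqrt(n)))
    return empirical, approx

n = 8
# Uniform(0, 1): mu = 1/2, sigma^2 = 1/12 -- S_8 is already close to normal
print(compare(rng.uniform, 0.5, np.sqrt(1 / 12), n, c=4.5))
# Exponential(1): mu = sigma = 1 -- the skewed PDF makes the approximation rougher
print(compare(rng.exponential, 1.0, 1.0, n, c=9.0))
```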

Example 5.11. Polling.

  • We poll $n$ voters and record the fraction $M_n$ of those polled who are in favor of a particular candidate. If $p$ is the fraction of the entire voter population that supports this candidate, then
    $$M_n=\frac{X_1+\cdots+X_n}{n}$$
    where the $X_i$ are independent Bernoulli random variables with parameter $p$. In particular, $M_n$ has mean $p$ and variance $p(1-p)/n$. By the normal approximation, $X_1+\cdots+X_n$ is approximately normal, and therefore $M_n$ is also approximately normal.
  • We are interested in the probability $P(|M_n-p|\geq\epsilon)$ that the polling error is larger than some desired accuracy $\epsilon$. Because of the symmetry of the normal PDF around the mean, we have
    $$P(|M_n-p|\geq\epsilon)\approx2P(M_n-p\geq\epsilon)$$
    The variance $p(1-p)/n$ of $M_n-p$ depends on $p$ and is therefore unknown. We note that the probability of a large deviation from the mean increases with the variance. Thus, we can obtain an upper bound on $P(M_n-p\geq\epsilon)$ by assuming that $M_n-p$ has the largest possible variance, namely $1/(4n)$, which corresponds to $p=1/2$. To calculate this upper bound, we evaluate the standardized value
    $$z=\frac{\epsilon}{1/(2\sqrt n)}=2\epsilon\sqrt n$$
    and use the normal approximation
    $$P(M_n-p\geq\epsilon)\leq1-\Phi(z)=1-\Phi(2\epsilon\sqrt n)$$
  • For instance, suppose we wish our estimate $M_n$ to be within $0.01$ of $p$ with probability at least $0.95$. How large a sample size $n$ is needed? Assuming again the worst possible variance, we are led to the condition
    $$P(|M_n-p|\geq\epsilon)\approx2P(M_n-p\geq\epsilon)\leq2-2\Phi(2\epsilon\sqrt n)=2-2\Phi(2\cdot0.01\cdot\sqrt n)\leq0.05$$
    or
    $$\Phi(2\cdot0.01\cdot\sqrt n)\geq0.975$$
    From the normal tables, we see that $\Phi(1.96)=0.975$, which leads to
    $$2\cdot0.01\cdot\sqrt n\geq1.96,\qquad\text{i.e.,}\quad n\geq9604$$
    This is significantly better than the sample size of 50,000 that we found using Chebyshev's inequality (Example 5.5); the sketch below reproduces this calculation.
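
A minimal sketch of the same sample-size calculation, assuming SciPy's `norm.ppf` for the inverse standard normal CDF:

```python
import numpy as np
from scipy.stats import norm

eps = 0.01    # desired accuracy |M_n - p| < eps
alpha = 0.05  # allowed probability of a larger error

# Worst-case variance is 1/(4n), so we need 2 - 2*Phi(2*eps*sqrt(n)) <= alpha,
# i.e. 2*eps*sqrt(n) >= Phi^{-1}(1 - alpha/2) ≈ 1.96.
z = norm.ppf(1 - alpha / 2)
n = int(np.ceil((z / (2 * eps)) ** 2))
print(n)  # 9604
```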

De Moivre-Laplace Approximation to the Binomial

(This approximation applies not only to the binomial distribution, but also to other discrete random variables that take only integer values.)

  • A binomial random variable $S_n$ with parameters $n$ and $p$ can be viewed as the sum of $n$ independent Bernoulli random variables $X_1,\ldots,X_n$ with common parameter $p$:
    $$S_n=X_1+\cdots+X_n,\qquad\mu=E[X_i]=p,\qquad\sigma=\sqrt{p(1-p)}$$
  • According to the approximation suggested by the central limit theorem,
    $$P(k\leq S_n\leq l)\approx\Phi\Big(\frac{l-np}{\sqrt{np(1-p)}}\Big)-\Phi\Big(\frac{k-np}{\sqrt{np(1-p)}}\Big)$$
    where $k$ and $l$ are given integers.
  • An approximation of this form is equivalent to treating $S_n$ as a normal random variable with mean $np$ and variance $np(1-p)$. Figure 5.3 provides an illustration and indicates that a more accurate approximation may be possible if we replace $k$ and $l$ by $k-1/2$ and $l+1/2$, respectively.

  • When $p$ is close to $1/2$, in which case the PMF of the $X_i$ is symmetric, the above formula yields a very good approximation for $n$ as low as 40 or 50.
  • When $p$ is near 1 or near 0, the quality of the approximation drops, and a larger value of $n$ is needed to maintain the same accuracy. A sketch implementing the approximation with the half-unit correction follows below.
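
A small helper (my own sketch, not from the text) that implements the approximation with the half-unit correction:

```python
from math import sqrt
from scipy.stats import norm

def de_moivre_laplace(k, l, n, p):
    """Approximate P(k <= S_n <= l) for S_n ~ Binomial(n, p),
    treating S_n as normal with mean n*p and variance n*p*(1-p),
    and replacing k, l by k - 1/2, l + 1/2 (the half-unit correction)."""
    mean = n * p
    std = sqrt(n * p * (1 - p))
    return norm.cdf((l + 0.5 - mean) / std) - norm.cdf((k - 0.5 - mean) / std)

# With p close to 1/2 the approximation is already good for moderate n:
print(de_moivre_laplace(15, 21, 40, 0.5))
```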

Example 5.12.

  • Let $S_n$ be a binomial random variable with parameters $n=36$ and $p=0.5$. An exact calculation yields
    $$P(S_n\leq21)=\sum_{k=0}^{21}\binom{36}{k}(0.5)^{36}=0.8785$$
  • The central limit theorem approximation, without the above-discussed refinement, yields
    $$P(S_n\leq21)\approx\Phi\Big(\frac{21-np}{\sqrt{np(1-p)}}\Big)=\Phi(1)=0.8413$$
  • Using the proposed refinement, we have
    $$P(S_n\leq21)\approx\Phi\Big(\frac{21.5-np}{\sqrt{np(1-p)}}\Big)=\Phi(1.17)=0.879$$
    which is much closer to the exact value.
  • The de Moivre-Laplace formula also allows us to approximate the probability of a single value. For example,
    $$P(S_n=19)\approx\Phi\Big(\frac{19.5-np}{\sqrt{np(1-p)}}\Big)-\Phi\Big(\frac{18.5-np}{\sqrt{np(1-p)}}\Big)=0.6915-0.5675=0.124$$
    This is very close to the exact value, which is
    $$\binom{36}{19}(0.5)^{36}=0.1251$$
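
The numbers in this example can be reproduced with SciPy (a quick check using `scipy.stats.binom` and `norm`; small discrepancies with the text come from table rounding):

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 36, 0.5
mean, std = n * p, sqrt(n * p * (1 - p))      # mean = 18, std = 3

print(binom.cdf(21, n, p))                    # exact: 0.8785...
print(norm.cdf((21 - mean) / std))            # plain CLT: Phi(1) = 0.8413...
print(norm.cdf((21.5 - mean) / std))          # half-unit correction: ≈ 0.878 (text rounds to 0.879)
print(norm.cdf((19.5 - mean) / std)
      - norm.cdf((18.5 - mean) / std))        # P(S_n = 19) with the correction
print(binom.pmf(19, n, p))                    # exact: 0.1251...
```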
