All of Statistics Chapter 5

Contents of this chapter

  • 5.1 Introduction
  • 5.2 Types of convergence
  • 5.3 The Law of Large Numbers
  • 5.4 The Central Limit Theorem
  • 5.5 Delta method

Since translation may blur the meaning of some terms, the key terms of this chapter are listed here:

1. The Law of Large Numbers

2. The Central Limit Theorem

3. Large sample theory

4. Limit theory

5. Asymptotic theory

6. Slutzky's theorem

7. The Weak Law of Large Numbers (WLLN)

8. Multivariate central limit theorem

5.1 Introduction

One of the most interesting aspects of probability theory is the behavior of sequences of random variables. This part of probability theory is called large sample theory, limit theory, or asymptotic theory. The most basic question is: what is the limiting behavior of a sequence of random variables X1, X2, ...? Since statistics and data mining are all about collecting data, it is natural to ask what happens as we gather more and more data.

In calculus, we say that a sequence of real numbers x_n converges to a limit x if, for every \varepsilon > 0, there is a number N such that |x_n - x| < \varepsilon whenever n > N. In probability theory, convergence is a little more subtle. Going back to calculus for a moment: if x_n = x for all n, then obviously \lim_{n\rightarrow \infty}x_n = x. Now consider the probabilistic analogue of this example. Suppose X1, X2, ... is a sequence of independent random variables, each with a N(0,1) distribution. Since these random variables all have the same distribution, we are tempted to say that Xn "converges" to X \sim N(0,1). But this is not quite right, because for all n, \mathbb{P}(X_n = X) = 0 (the probability that two distinct continuous random variables are exactly equal is 0).

Here is another example. Consider X1, X2, ... where X_n \sim N(0,1/n). Intuitively, as n becomes large, Xn concentrates near 0, so we would like to say that Xn converges to 0. But \mathbb{P}(X_n = 0) = 0 for all n. Clearly we need a tool for discussing this kind of convergence in a rigorous way; this chapter develops the appropriate methods.

There are two main perspectives in this chapter, informally stated as follows:

  1. The law of large numbers states that the sample mean \bar{X}_n = n^{-1}\Sigma X_i converges in probability to the expectation \mu = \mathbb{E}(X_i), which means that \bar{X}_n is close to μ with high probability
  2. The central limit theorem states that \sqrt{n}(\bar{X}_n-\mu) converges in distribution to a normal distribution. This means that when n is large enough, the sample mean is approximately normally distributed

5.2 Types of convergence

Two main types of convergence are defined as follows:

5.1 Definition

Let X1, X2, ... be a sequence of random variables, and let X be another random variable. Let F_n be the CDF of Xn, and let F be the CDF of X.

  1. Xn converges to X in probability, denoted X_n \overset{P}{\to} X, if for any \varepsilon > 0, \mathbb{P}(|X_n-X|>\varepsilon) \to 0 as n \to \infty.
  2. Xn converges to X in distribution, denoted X_n \rightsquigarrow X, if \underset{n\to\infty}{\lim}F_n(t) =F(t) at every t at which F is continuous.

When the limiting random variable has a point mass distribution, we change the notation slightly. If \mathbb{P}(X=c) = 1 and X_n \overset P \to X, then we write X_n \overset P \to c. Similarly, for convergence in distribution we write X_n \rightsquigarrow c.

There is one more type of convergence, introduced mainly because it is very useful for proving convergence in probability.

5.2 Definition

If \mathbb{E}(X_n-X)^2 \to 0 as n \to \infty, then Xn is said to converge to X in quadratic mean (mean square), denoted X_n \overset{qm} \to X.

Similarly, if X has a point mass distribution at c, we write X_n \overset {qm} \to c.

5.3 Example

Suppose X_n \sim N(0,1/n). Intuitively, Xn becomes concentrated near 0, so we would like to say that Xn converges to 0. Let us check whether this is correct. Let F be the distribution function of a point mass at 0. Note that \sqrt{n}X_n\sim N(0,1), and let Z denote a standard normal random variable. For t<0, F_n(t) = \mathbb{P}(X_n<t) = \mathbb{P}(\sqrt nX_n < \sqrt n t) = \mathbb{P}(Z < \sqrt n t) \to 0, because \sqrt n t \to - \infty. For t>0, F_n(t)=\mathbb{P}(X_n < t)= \mathbb{P}(\sqrt n X_n < \sqrt n t) = \mathbb{P}(Z < \sqrt n t) \to 1, because \sqrt n t \to \infty.

Therefore, for all t \neq 0, F_n(t) \to F(t). So Xn converges to 0 in distribution.

Note that F_n(0)=1/2 \neq F(0)=1, so convergence fails at t=0. This does not matter: t = 0 is not a continuity point of F, and the definition of convergence in distribution only requires convergence at continuity points.

Now consider convergence in probability. For any \varepsilon > 0, Markov's inequality gives, as n \to \infty,

\mathbb{P}(|X_n|>\varepsilon) =\mathbb{P}(|X_n|^2>\varepsilon^2) \leq \frac{\mathbb{E}(X_n^2)}{\varepsilon^2}=\frac{1/n}{\varepsilon^2}\to 0

Therefore Xn converges to 0 in probability: X_n \overset{P} \to 0.
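The following is a minimal simulation sketch of this example (my own illustration, not from the text); the choice of \varepsilon = 0.1 and of the sample sizes is arbitrary. It estimates \mathbb{P}(|X_n| > \varepsilon) by Monte Carlo and shows the probability shrinking toward 0 as n grows, consistent with the Markov bound 1/(n\varepsilon^2).

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1
for n in [10, 100, 1000, 10000]:
    draws = rng.normal(loc=0.0, scale=np.sqrt(1.0 / n), size=100_000)  # X_n ~ N(0, 1/n)
    print(n, np.mean(np.abs(draws) > eps))  # Monte Carlo estimate of P(|X_n| > eps)
```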

The following theorem gives the relationships between these types of convergence.

5.4 Theorem

The following relationships hold:

  1. X_n \overset{qm} \to X implies X_n \overset{P} \to X
  2. X_n \overset{P} \to X implies that Xn converges to X in distribution
  3. If Xn converges to X in distribution and \mathbb{P}(X=c)=1, then Xn converges to X in probability: X_n \overset{P} \to X

In general the reverse implications do not hold, except as stated in the third point.

Proof. We begin with the first point. Suppose X_n \overset{qm} \to X and fix \varepsilon > 0. By Markov's inequality,

\mathbb{P}(|X_n-X| > \varepsilon) = \mathbb{P}(|X_n-X|^2>\varepsilon^2) \leq \frac{\mathbb{E}(|X_n-X|^2)}{\varepsilon^2} \to 0

Next, the second point. This argument is a little more involved and may be skipped on a first reading. Fix \varepsilon > 0 and let x be a continuity point of F. Then

F_n(x) = \mathbb{P}(X_n \leq x)\\\\ =\mathbb{P}(X_n\leq x,X \leq x + \varepsilon)+\mathbb{P}(X_n \leq x,X > x+\varepsilon) \\\\ \leq \mathbb{P}(X \leq x+\varepsilon) + \mathbb{P}(|X_n - X| > \varepsilon)\\\\ =F(x+\varepsilon)+\mathbb{P}(|X_n-X| > \varepsilon)

Also,

F(x-\varepsilon) =\mathbb{P}(X \leq x -\varepsilon)\\\\ =\mathbb{P}(X \leq x -\varepsilon,X_n \leq x )+\mathbb{P}(X \leq x -\varepsilon,X_n > x)\\\\ \leq F_n(x)+\mathbb{P}(|X_n-X| > \varepsilon)

Therefore,

F(x-\varepsilon) - \mathbb{P}(|X_n-X| > \varepsilon) \leq F_n(x) \leq F(x+\varepsilon) +\mathbb{P}(|X_n-X| > \varepsilon)

Taking the limit as n \to \infty gives F(x-\varepsilon) \leq \liminf_{n\to \infty} F_n(x) \leq \limsup_{n\to \infty} F_n(x) \leq F(x+\varepsilon)

This holds for every \varepsilon > 0. Letting \varepsilon \to 0 and using the continuity of F at x gives \lim_n F_n(x)=F(x).

Finally, the third point. Fix \varepsilon > 0. Then

\mathbb{P}(|X_n-c| > \varepsilon) =\mathbb{P}(X_n < c-\varepsilon)+\mathbb{P}(X_n > c+ \varepsilon)\\\\ \leq \mathbb{P}(X_n \leq c-\varepsilon)+\mathbb{P}(X_n > c+ \varepsilon)\\\\ =F_n(c-\varepsilon)+1-F_n(c+\varepsilon)\\\\ \to F(c-\varepsilon)+1-F(c+\varepsilon)\\\\ =0+1-1=0

We now show by counterexample that the reverse implications do not hold.

Convergence in probability does not imply convergence in quadratic mean: let U \sim Unif(0,1), and define X_n =\sqrt{n}I_{(0,1/n)}(U). Then

\mathbb{P}(|X_n| > \varepsilon) = \mathbb{P}(\sqrt n I_{(0,1/n)}(U) > \varepsilon) = \mathbb{P}(0 \leq U < 1/n) = 1/n \to 0. Thus X_n \overset{P} \to 0. But for all n, \mathbb{E}(X_n^2)=n\int_0^{1/n}du=1, so Xn does not converge in quadratic mean.

Convergence in distribution does not imply convergence in probability: let X \sim N(0,1) and set X_n =-X for n=1,2,3,\dots. Then X_n \sim N(0,1), so Xn and X have the same distribution function for every n. Therefore \lim _n F_n(x) = F(x) for all x, and Xn converges to X in distribution. But \mathbb{P}(|X_n-X| > \epsilon) = \mathbb{P}(|2X| > \epsilon) = \mathbb{P}(|X| > \epsilon/2) \neq 0, so Xn does not converge to X in probability.

Warning: one might think that X_n \overset{P} \to b implies \mathbb{E}(X_n) \to b. This is not true. Let X_n be a random variable with \mathbb{P}(X_n=n^2)=1/n and \mathbb{P}(X_n=0) = 1-(1/n). Now \mathbb{P}(|X_n| < \varepsilon) = \mathbb{P}(X_n = 0) =1-(1/n) \to 1, so X_n \overset{P} \to 0. But \mathbb{E}(X_n) = [n^2\times(1/n)]+[0\times (1-(1/n))] = n, so \mathbb{E}(X_n) \to \infty.
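A small simulation sketch of this warning (my own, not from the text; the threshold 0.01 and the sample sizes are arbitrary): the fraction of nonzero draws goes to 0, so X_n \overset{P}{\to} 0, while the sample mean tracks n.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.01
for n in [10, 100, 1000]:
    x_n = np.where(rng.random(1_000_000) < 1.0 / n, float(n) ** 2, 0.0)  # X_n = n^2 w.p. 1/n, else 0
    print(n, np.mean(np.abs(x_n) > eps), x_n.mean())  # P(|X_n| > eps) -> 0 while E(X_n) ~ n
```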

5.5 Theorem

Let Xn, X, Yn, Y be random variables and let g be a continuous function.

  1. If X_n \overset{P} \to X and Y_n \overset{P} \to Y, then X_n+Y_n \overset{P} \to X+Y
  2. If X_n \overset{qm} \to X and Y_n \overset{qm} \to Y, then X_n+Y_n \overset{qm} \to X+Y
  3. If Xn converges to X in distribution and Yn converges to c in distribution, then Xn+Yn converges to X+c in distribution
  4. If X_n \overset{P} \to X and Y_n \overset{P} \to Y, then X_nY_n\overset{P}\to XY
  5. If Xn converges to X in distribution and Yn converges to c in distribution, then XnYn converges to cX in distribution
  6. If X_n \overset{P} \to X, then g(X_n) \overset{P} \to g(X)
  7. If Xn converges to X in distribution, then g(Xn) converges to g(X) in distribution

Parts 3 and 5 are known as Slutzky's theorem. It is worth noting that Xn converging to X in distribution and Yn converging to Y in distribution do not, in general, imply that Xn+Yn converges to X+Y in distribution.
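A hedged numerical illustration of that last remark (my own standard counterexample, not from the text; Z is assumed standard normal): take X_n = Z and Y_n = -Z. Each converges in distribution to N(0,1), yet X_n + Y_n is identically 0 rather than the N(0,2) one might expect from adding independent limits, because convergence in distribution says nothing about the joint dependence.

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=100_000)
x_n, y_n = z, -z                 # X_n = Z and Y_n = -Z
print(np.var(x_n), np.var(y_n))  # both close to 1, so each sequence looks N(0,1)
print(np.var(x_n + y_n))         # exactly 0, not the variance 2 of an independent sum
```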

5.3 The Law of Large Numbers

Now we come to one of the crowning achievements of probability theory: the Law of Large Numbers. This theorem says that the mean of a large sample is close to the mean of the distribution. For example, if you toss a fair coin many times, the proportion of heads will be close to 1/2. Let us now state this more precisely.

Assume that X1, X2, ..., Xn are IID with \mu =\mathbb{E}(X_1) and \sigma^2=\mathbb{V}(X_1), and let \bar{X}_n=n^{-1}\Sigma X_i. Recall that \mathbb{E}(\bar{X}_n) = \mu and \mathbb{V}(\bar{X}_n)= \sigma^2/n.

5.6 Theorem

The Weak Law of Large Numbers (WLLN)

If X1, X2, ..., Xn are independent and identically distributed, then \bar{X}_n \overset{P} \to \mu.

Interpretation of the WLLN: as n increases, the distribution of \bar{X}_n concentrates around μ.

Proof: assume that \sigma < \infty. This assumption is not required, but it simplifies the proof. Using Chebyshev's inequality we get:

\mathbb{P}(|\bar{X}_n-\mu| > \varepsilon) \leq \frac{\mathbb{V}(\bar{X}_n)}{\varepsilon^2}=\frac{\sigma^2}{n\varepsilon^2}, which tends to 0 as n tends to infinity.

5.7 Example

Consider tossing a coin for which the probability of heads is p. Let X_i \in \{0,1\} be the outcome of the i-th toss. Then p=\mathbb{P}(X_i=1)=E(X_i), and the proportion of heads after n tosses is \bar{X}_n. According to the law of large numbers, \bar{X}_n converges to p in probability. This does not mean that \bar{X}_n will numerically equal p; it means that, when n is large, the distribution of \bar{X}_n is tightly concentrated around p. Suppose p=1/2. How large should n be so that \mathbb{P}(0.4 \leq \bar{X}_n \leq 0.6) \geq 0.7? First, \mathbb{E}(\bar{X}_n) = p = 1/2 and \mathbb{V}(\bar{X}_n)=\sigma^2/n=p(1-p)/n=1/(4n). From Chebyshev's inequality:

\mathbb{P}(0.4 \leq \bar{X}_n \leq 0.6) =\mathbb{P}(|\bar{X}_n-\mu| \leq 0.1)\\\\ =1-\mathbb{P}(|\bar{X}_n-\mu| > 0.1)\\\\ \geq 1-\frac{1}{4n(0.1)^2}\\\\ =1-\frac{25}{n}

Thus, if n = 84, the right-hand side is greater than 0.7.
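A quick check of this example (a sketch of my own, not from the text): with n = 84 fair-coin tosses, Chebyshev guarantees \mathbb{P}(0.4 \leq \bar{X}_n \leq 0.6) \geq 1 - 25/84 \approx 0.70, while simulation shows the true probability is much higher, since Chebyshev is a conservative bound.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 84
xbar = rng.binomial(n, 0.5, size=200_000) / n  # simulated sample proportions of heads
print(1 - 25 / n)                              # Chebyshev lower bound, about 0.702
print(np.mean((xbar >= 0.4) & (xbar <= 0.6)))  # simulated probability, well above the bound
```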

5.4 The Central Limit Theorem

The law of large numbers says that the distribution of \bar{X}_n piles up near \mu. But it does not help us approximate probability statements about \bar{X}_n; for that we need the central limit theorem.

Assume that X1, ..., Xn are IID with mean \mu and variance \sigma^2. The central limit theorem says that \bar{X}_n has a distribution that is approximately normal with mean \mu and variance \sigma^2/n. This is striking because it requires nothing more than the existence of a mean and a variance.

5.8 Theorem

The Central Limit Theorem (CLT). Let X1, ..., Xn be independent and identically distributed with mean \mu and variance \sigma^2. Let \bar{X}_n=n^{-1}\Sigma_{i=1}^nX_i. Then

Z_n=\frac{\bar{X}_n-\mu}{\sqrt{\mathbb{V}(\bar{X}_n)}}=\frac{\sqrt n(\bar{X}_n-\mu)}{\sigma} converges in distribution to Z \sim N(0,1).

In other words, \underset {n\to \infty }\lim \mathbb{P}(Z_n \leq z) = \Phi(z) = \int _{-\infty}^z \frac{1}{\sqrt{2\pi}}e^{-x^2/2}dx

Interpretation: probability statements about \bar{X}_n can be approximated using a normal distribution. It is the probability statements we are approximating, not the random variable itself.

Besides saying that the distribution of Zn converges to N(0,1), there are several equivalent notations for this statement; they all mean the same thing.

5.9 Example

Suppose that the number of errors per program follows a Poisson distribution with mean 5. We have 125 programs. Let X1, ..., X125 be the numbers of errors in these programs. We want to approximate \mathbb{P}(\bar{X}_n < 5.5).

Let \mu = E(X_1) = \lambda = 5 and \sigma^2 = \mathbb{V}(X_1) = \lambda =5. Then \mathbb{P}(\bar{X}_n < 5.5 ) = \mathbb{P}(\frac{\sqrt n (\bar{X}_n - \mu)}{\sigma} < \frac{\sqrt n (5.5 - \mu)}{\sigma} ) \approx \mathbb{P}(Z < 2.5) = 0.9938
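A hedged check of this example (an assumed helper script, not part of the text): compare the CLT approximation \mathbb{P}(Z < 2.5) \approx 0.9938 with a direct Monte Carlo estimate of \mathbb{P}(\bar{X}_n < 5.5), using the fact that a sum of 125 independent Poisson(5) counts is Poisson(625).

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam = 125, 5.0
total = rng.poisson(lam * n, size=200_000)  # sum of 125 iid Poisson(5) counts is Poisson(625)
xbar = total / n
print(np.mean(xbar < 5.5))                  # Monte Carlo estimate; the CLT approximation gives 0.9938
```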

The central limit theorem tells us that Z_n=\sqrt n (\bar{X}_n-\mu)/\sigma is approximately N(0,1). In practice, however, we rarely know \sigma. Later we will estimate \sigma using:

S_n^2=\frac{1}{n-1}\overset{n}{\underset {i=1}\Sigma}(X_i-\bar{X}_n)^2

This leads to the following question: does the central limit theorem still hold if we use S_n in place of \sigma? The answer is yes.

5.10 Theorem

Under the same conditions as the CLT,

\frac{\sqrt n (\bar{X}_n -\mu)}{S_n} converges in distribution to N(0,1)
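A minimal sketch of Theorem 5.10 (my own, with X_i assumed Exponential(1) so that \mu = 1): the studentized statistic \sqrt n(\bar{X}_n - \mu)/S_n behaves approximately like N(0,1) even though \sigma has been replaced by S_n.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 20_000
x = rng.exponential(1.0, size=(reps, n))                         # mu = 1 for Exponential(1)
t = np.sqrt(n) * (x.mean(axis=1) - 1.0) / x.std(axis=1, ddof=1)  # S_n uses the n - 1 denominator
print(t.mean(), t.var())                                         # roughly 0 and 1
print(np.mean(t <= 1.645))                                       # roughly Phi(1.645) = 0.95
```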

You may be wondering how accurate this normal approximation is. The answer is given by the Berry-Esseen theorem.

5.11 Theorem (The Berry-Esseen Inequality)

Assume \mathbb{E}|X_1|^3 < \infty. Then \underset z {\sup}\,|\mathbb{P}(Z_n \leq z)-\Phi(z)| \leq \frac{33}{4}\frac{\mathbb{E}|X_1 - \mu|^3}{\sqrt n\,\sigma ^3}
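A small numerical sketch of this bound (my own, assuming X_i \sim Bernoulli(0.5), so \mu = 0.5, \sigma = 0.5, and \mathbb{E}|X_1-\mu|^3 = 0.125): the bound works out to 8.25/\sqrt{n}, which illustrates that it is conservative and only becomes informative for fairly large n.

```python
import numpy as np

sigma = 0.5
third_abs_moment = 0.125  # E|X_1 - mu|^3 = 0.5^3 for Bernoulli(0.5)
for n in [100, 10_000, 1_000_000]:
    bound = (33 / 4) * third_abs_moment / (np.sqrt(n) * sigma**3)
    print(n, bound)       # 8.25 / sqrt(n)
```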

There is also a multivariate version of the central limit theorem

5.12 Theorem (Multivariate central limit theorem)

Let X1,...Xn be independent and identically distributed vectors, where Xi is:

X_i=\begin{pmatrix} X_{1i}\\ X_{2i}\\ \vdots\\ X_{ki} \end{pmatrix}

The mean μ is:

\mu=\begin{pmatrix} \mu_1\\ \mu_2\\ \vdots \\ \mu_k \end{pmatrix}=\begin{pmatrix} \mathbb{E}(X_{1i})\\ \mathbb{E}(X_{2i})\\ \vdots \\ \mathbb{E}(X_{ki}) \end{pmatrix}

and the variance matrix is Σ.

Let \bar{X} = \begin{pmatrix} \bar{X}_1\\ \bar{X}_2\\ \vdots\\ \bar{X}_k \end{pmatrix}, where \bar{X}_j=n^{-1}\overset n {\underset {i=1}\Sigma }X_{ji}. Then \sqrt n(\bar{X} -\mu) converges in distribution to N(0,\Sigma).

5.5 Delta method

If the limiting distribution of Yn is normal, the Delta method gives a way to find the limiting distribution of g(Y_n), where g is a smooth function.

5.13 Theorem (Delta method)

Suppose that \frac{\sqrt n (Y_n -\mu)}{\sigma} converges in distribution to N(0,1) and that g is a differentiable function with g'(\mu) \neq 0. Then \frac{\sqrt n( g(Y_n) - g(\mu))}{|g'(\mu)|\sigma} converges in distribution to N(0,1).

In other words, Y_n \approx N(\mu,\frac{\sigma^2}{n}) implies g(Y_n) \approx N(g(\mu),(g'(\mu))^2\ \frac{\sigma^2}{n})

5.14 Example

Let X1, ..., Xn be independent and identically distributed with finite mean μ and finite variance σ². According to the central limit theorem, \sqrt n (\bar X_n -\mu )/\sigma converges in distribution to N(0,1). Let W_n=e^{\bar X_n}. Then W_n=g(\bar X_n), where g(s)=e^s and g'(s)=e^s. According to the Delta method, W_n \approx N(e^\mu,e^{2\mu}\sigma^2/n).
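A simulation sketch of this example (my own, with X_i assumed Exponential(1) so that \mu = \sigma^2 = 1): the simulated center and spread of W_n = e^{\bar X_n} should be close to the delta-method values e^{\mu} and e^{2\mu}\sigma^2/n.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 200, 20_000
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)  # mu = 1, sigma^2 = 1 for Exponential(1)
w = np.exp(xbar)                                          # W_n = exp(Xbar_n)
print(w.mean(), np.e)                                     # center close to e^mu = e
print(w.var(), np.e**2 / n)                               # spread close to e^{2 mu} sigma^2 / n
```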

The delta method also has a multivariate version

5.15 Theorem

Let Y_n=(Y_{n1},\dots,Y_{nk}) be a sequence of random vectors such that

\sqrt n (Y_n -\mu ) converges in distribution to N(0,\Sigma).

Let g:\mathbb{R}^k \to \mathbb{R}, and define

\triangledown g(y)=\begin{pmatrix} \frac{\partial g}{\partial y_1}\\ \vdots\\ \frac{\partial g}{\partial y_k} \end{pmatrix}

Let \triangledown _\mu denote \triangledown g(y) evaluated at y=\mu, and assume that no element of \triangledown _\mu is 0. Then

\sqrt n (g(Y_n)-g(\mu)) converges in distribution to N(0,\triangledown _\mu^T\Sigma\triangledown _\mu)

5.16 Example

Let \begin{pmatrix} X_{11}\\ X_{21} \end{pmatrix},\begin{pmatrix} X_{12}\\ X_{22} \end{pmatrix},\dots, \begin{pmatrix} X_{1n}\\ X_{2n} \end{pmatrix} be IID random vectors with mean \mu=(\mu_1,\mu_2)^T and variance Σ. Let \bar X_1 = \frac{1}{n}\overset n {\underset{i=1}\Sigma}X_{1i} and \bar X_2 = \frac{1}{n}\overset n {\underset{i=1}\Sigma}X_{2i}, and define Y_n=\bar X_1 \bar X_2. Then Y_n=g(\bar X_1,\bar X_2), where g(s_1,s_2)=s_1s_2. According to the central limit theorem,

\sqrt n \begin{pmatrix} \bar X_1 - \mu_1\\ \bar X_2 - \mu_2 \end{pmatrix} converges in distribution to N(0,Σ).

Now \triangledown g(s)=\begin{pmatrix} \frac{\partial g}{\partial s_1}\\ \frac{\partial g}{\partial s_2} \end{pmatrix}=\begin{pmatrix} s_2\\ s_1 \end{pmatrix}, and \triangledown_\mu^T\Sigma\triangledown_\mu=(\mu_2\ \ \mu_1)\begin{pmatrix} \sigma_{11} & \sigma_{12}\\ \sigma_{21} & \sigma_{22} \end{pmatrix}\begin{pmatrix} \mu_2\\ \mu_1 \end{pmatrix}=\mu_2^2\sigma_{11}+2\mu_1\mu_2\sigma_{12}+\mu_1^2\sigma_{22}

Therefore, \sqrt n (\bar X_1 \bar X_2 - \mu_1\mu_2) converges in distribution to N(0,\mu_2^2\sigma_{11}+2\mu_1\mu_2\sigma_{12}+\mu_1^2\sigma_{22})
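A hedged sketch of this example (my own, with the data assumed bivariate normal and a particular \mu and \Sigma chosen for illustration): the simulated variance of \sqrt n(\bar X_1 \bar X_2 - \mu_1\mu_2) should be close to \mu_2^2\sigma_{11}+2\mu_1\mu_2\sigma_{12}+\mu_1^2\sigma_{22}.

```python
import numpy as np

rng = np.random.default_rng(7)
mu = np.array([2.0, 3.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
n, reps = 200, 10_000
data = rng.multivariate_normal(mu, Sigma, size=(reps, n))  # reps data sets of n bivariate draws
xbar = data.mean(axis=1)                                   # (reps, 2) array of sample mean vectors
stat = np.sqrt(n) * (xbar[:, 0] * xbar[:, 1] - mu[0] * mu[1])
predicted = mu[1]**2 * Sigma[0, 0] + 2 * mu[0] * mu[1] * Sigma[0, 1] + mu[0]**2 * Sigma[1, 1]
print(stat.var(), predicted)                               # simulated vs. delta-method variance
```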

End of this chapter

Not translated here: bibliographic remarks, appendix, and exercises.

Origin blog.csdn.net/xiaowanbiao123/article/details/133301048