Contents of this chapter
- 5.1 Introduction
- 5.2 Types of convergence
- 5.3 The Law of Large Numbers
- 5.4 The Central Limit Theorem
- 5.5 Delta method
Key terms in this chapter:
1. the law of large numbers
2. the central limit theorem
3. large sample theory
4. limit theory
5. asymptotic theory
6. Slutzky's theorem
7. the weak law of large numbers (WLLN)
8. the multivariate central limit theorem
5.1 Introduction
One of the most interesting aspects of probability theory is the behavior of sequences of random variables. This part of probability theory is called large sample theory, limit theory, or asymptotic theory. The basic question is: what can we say about the limiting behavior of a sequence of random variables $X_1, X_2, \ldots$? Since statistics and data mining are both concerned with gathering data, we naturally ask what happens as we gather more and more data.
In calculus we say that a sequence of real numbers $x_n$ converges to a limit $x$ if, for every $\epsilon > 0$, $|x_n - x| < \epsilon$ for all sufficiently large $n$. In probability theory, convergence is a little more subtle. Going back to calculus for a moment, suppose that $x_n = x$ for every $n$. Then, trivially, $\lim_n x_n = x$. Consider a probabilistic version of this example. Suppose that $X_1, X_2, \ldots$ is a sequence of independent random variables, each with a $N(0,1)$ distribution. Since these random variables all have the same distribution, we are tempted to say that $X_n$ "converges" to $X \sim N(0,1)$. But this is not quite right, since $P(X_n = X) = 0$ for all $n$ (the probability that two distinct continuous random variables are equal is 0).

Here is another example. Consider $X_1, X_2, \ldots$ where $X_n \sim N(0, 1/n)$. Intuitively, $X_n$ is concentrated more and more tightly around 0 as $n$ grows, so we would like to say that $X_n$ converges to 0. But $P(X_n = 0) = 0$ for all $n$. Clearly, we need a tool for discussing this kind of convergence in a rigorous way. This chapter develops the appropriate methods.
There are two main perspectives in this chapter, informally stated as follows:
- The law of large numbers says that the sample average $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$ converges in probability to the expectation $\mu = E(X_i)$; that is, $\bar{X}_n$ is close to $\mu$ with high probability.
- The central limit theorem says that $\sqrt{n}(\bar{X}_n - \mu)$ converges in distribution to a normal distribution; that is, when $n$ is large, the sample mean is approximately normally distributed.
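These two informal statements can already be explored by simulation. The sketch below (a minimal illustration, assuming a fair coin so that $\mu = 1/2$; the sample sizes, repetition count, and seed are arbitrary choices) shows the first bullet: the typical deviation of the sample average from $\mu$ shrinks as $n$ grows.

```python
import random

random.seed(0)

def sample_mean(n):
    """Average of n fair-coin tosses (Bernoulli(1/2) draws)."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

# Law of large numbers, informally: sample averages concentrate around 0.5.
reps = 200
avg_small = sum(abs(sample_mean(10) - 0.5) for _ in range(reps)) / reps
avg_large = sum(abs(sample_mean(1000) - 0.5) for _ in range(reps)) / reps
print(avg_small, avg_large)  # typical deviation is much smaller for n = 1000
```

The typical deviation for $n = 1000$ comes out roughly ten times smaller than for $n = 10$, consistent with the $1/\sqrt{n}$ rate that the central limit theorem will make precise.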
5.2 Types of convergence
Two main types of convergence are defined as follows:
5.1 Definition
Let $X_1, X_2, \ldots$ be a sequence of random variables, and let $X$ be another random variable. Let $F_n$ denote the CDF of $X_n$, and let $F$ denote the CDF of $X$.
- $X_n$ converges to $X$ in probability, written $X_n \xrightarrow{P} X$, if for every $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$.
- $X_n$ converges to $X$ in distribution, written $X_n \rightsquigarrow X$, if $\lim_{n \to \infty} F_n(t) = F(t)$ at all $t$ for which $F$ is continuous.
When the limiting random variable has a point mass distribution, we modify the notation slightly. If $P(X = c) = 1$ and $X_n \xrightarrow{P} X$, then we write $X_n \xrightarrow{P} c$. Similarly, we write $X_n \rightsquigarrow c$.
There is one more type of convergence, introduced mainly because it is very useful for proving convergence in probability.
5.2 Definition
$X_n$ converges to $X$ in quadratic mean (mean square), written $X_n \xrightarrow{qm} X$, if $E(X_n - X)^2 \to 0$ as $n \to \infty$.
As before, if $X$ has a point mass distribution at $c$, we write $X_n \xrightarrow{qm} c$.
5.3 Example
Let $X_n \sim N(0, 1/n)$. Intuitively, $X_n$ is concentrated near 0, so we would like to say that $X_n$ converges to 0. Let us see whether this is true in the senses just defined. Let $F$ be the distribution function of a point mass at 0. Note that $\sqrt{n} X_n \sim N(0,1)$, and let $Z$ denote a standard normal random variable. For $t < 0$, $F_n(t) = P(X_n < t) = P(\sqrt{n} X_n < \sqrt{n} t) = P(Z < \sqrt{n} t) \to 0$, because $\sqrt{n} t \to -\infty$. For $t > 0$, $F_n(t) = P(Z < \sqrt{n} t) \to 1$, because $\sqrt{n} t \to \infty$.
Therefore $F_n(t) \to F(t)$ for all $t \neq 0$, and so $X_n \rightsquigarrow 0$.
Note, however, that $F_n(0) = 1/2 \neq F(0) = 1$, so convergence fails at $t = 0$. This does not matter: $t = 0$ is not a continuity point of $F$, and the definition of convergence in distribution only requires convergence at continuity points.
Now consider convergence in probability. For any $\epsilon > 0$, using Markov's inequality,
$$P(|X_n| > \epsilon) = P(X_n^2 > \epsilon^2) \le \frac{E(X_n^2)}{\epsilon^2} = \frac{1}{n \epsilon^2} \to 0$$
as $n \to \infty$. Hence $X_n \xrightarrow{P} 0$.
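A quick Monte Carlo check of this example (a sketch; the choices of $\epsilon$, sample sizes, trial count, and seed are arbitrary): the estimated $P(|X_n| > \epsilon)$ shrinks toward 0 as $n$ grows, and for large $n$ it sits well below the $1/(n\epsilon^2)$ bound from Markov's inequality.

```python
import math
import random

random.seed(1)

def prob_exceeds(n, eps, trials=20000):
    """Monte Carlo estimate of P(|X_n| > eps) for X_n ~ N(0, 1/n)."""
    sd = 1 / math.sqrt(n)
    return sum(abs(random.gauss(0, sd)) > eps for _ in range(trials)) / trials

eps = 0.1
p10, p100, p1000 = (prob_exceeds(n, eps) for n in (10, 100, 1000))
print(p10, p100, p1000)       # decreases toward 0
print(1 / (1000 * eps ** 2))  # Markov bound 1/(n eps^2) at n = 1000
```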
The following theorem gives the relationships between the types of convergence.
5.4 Theorem
The following relationships hold:
- (a) $X_n \xrightarrow{qm} X$ implies $X_n \xrightarrow{P} X$.
- (b) $X_n \xrightarrow{P} X$ implies $X_n \rightsquigarrow X$.
- (c) If $X_n \rightsquigarrow X$ and $P(X = c) = 1$ for some real number $c$, then $X_n \xrightarrow{P} X$.
In general, none of the reverse implications hold, except for the special case in (c).
Proof. We begin with (a). Suppose that $X_n \xrightarrow{qm} X$, and fix $\epsilon > 0$. Then, using Markov's inequality,
$$P(|X_n - X| > \epsilon) = P(|X_n - X|^2 > \epsilon^2) \le \frac{E|X_n - X|^2}{\epsilon^2} \to 0.$$
Now we prove (b). This proof is a little more complicated and may be skipped on a first reading. Fix $\epsilon > 0$ and let $x$ be a continuity point of $F$. Then
$$F_n(x) = P(X_n \le x) = P(X_n \le x, X \le x + \epsilon) + P(X_n \le x, X > x + \epsilon) \le F(x + \epsilon) + P(|X_n - X| > \epsilon).$$
At the same time,
$$F(x - \epsilon) = P(X \le x - \epsilon) = P(X \le x - \epsilon, X_n \le x) + P(X \le x - \epsilon, X_n > x) \le F_n(x) + P(|X_n - X| > \epsilon).$$
Therefore,
$$F(x - \epsilon) - P(|X_n - X| > \epsilon) \le F_n(x) \le F(x + \epsilon) + P(|X_n - X| > \epsilon).$$
Taking the limit as $n \to \infty$, we get
$$F(x - \epsilon) \le \liminf_n F_n(x) \le \limsup_n F_n(x) \le F(x + \epsilon).$$
This holds for every $\epsilon > 0$; taking the limit as $\epsilon \to 0$ and using the fact that $F$ is continuous at $x$, we conclude that $\lim_n F_n(x) = F(x)$.
Finally, we prove (c). Fix $\epsilon > 0$. Then
$$P(|X_n - c| > \epsilon) = P(X_n < c - \epsilon) + P(X_n > c + \epsilon) \le F_n(c - \epsilon) + 1 - F_n(c + \epsilon) \to F(c - \epsilon) + 1 - F(c + \epsilon) = 0 + 1 - 1 = 0.$$
Now let us show that the reverse implications do not hold.
Convergence in probability does not imply convergence in quadratic mean: let $U \sim \mathrm{Unif}(0,1)$ and let $X_n = \sqrt{n}\, I_{(0, 1/n)}(U)$. Then $P(|X_n| > \epsilon) = P(\sqrt{n}\, I_{(0, 1/n)}(U) > \epsilon) = P(0 \le U < 1/n) = 1/n \to 0$. Hence $X_n \xrightarrow{P} 0$. But $E(X_n^2) = n \int_0^{1/n} du = 1$ for all $n$, so $X_n$ does not converge in quadratic mean.
Convergence in distribution does not imply convergence in probability: let $X \sim N(0,1)$ and let $X_n = -X$ for $n = 1, 2, 3, \ldots$; hence $X_n \sim N(0,1)$ as well. For all $n$, $X_n$ and $X$ have the same distribution function, so $X_n \rightsquigarrow X$. But $P(|X_n - X| > \epsilon) = P(2|X| > \epsilon) \neq 0$, so $X_n$ does not converge to $X$ in probability.
Warning: one might think that if $X_n \xrightarrow{P} b$, then $E(X_n) \to b$. This is incorrect. Let $X_n$ be a random variable with $P(X_n = n^2) = 1/n$ and $P(X_n = 0) = 1 - 1/n$. Now $P(|X_n| < \epsilon) = P(X_n = 0) = 1 - 1/n \to 1$, so $X_n \xrightarrow{P} 0$. However, $E(X_n) = n^2 \cdot \frac{1}{n} + 0 \cdot \left(1 - \frac{1}{n}\right) = n$; therefore $E(X_n) \to \infty$.
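The warning can be seen numerically. In the sketch below (the sample size, number of draws, and seed are arbitrary choices), almost all draws of $X_n$ are 0, yet the rare value $n^2$ keeps the sample mean near $n$ rather than near 0.

```python
import random

random.seed(2)

def draw_xn(n):
    """Draw X_n: equals n^2 with probability 1/n, otherwise 0."""
    return n * n if random.random() < 1 / n else 0

n = 1000
draws = [draw_xn(n) for _ in range(50000)]
frac_nonzero = sum(x != 0 for x in draws) / len(draws)  # about 1/n, so X_n is almost always 0
mean_hat = sum(draws) / len(draws)                       # about n^2 * (1/n) = n, far from 0
print(frac_nonzero, mean_hat)
```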
5.5 Theorem
Let $X_n, X, Y_n, Y$ be random variables, and let $g$ be a continuous function.
- (a) If $X_n \xrightarrow{P} X$ and $Y_n \xrightarrow{P} Y$, then $X_n + Y_n \xrightarrow{P} X + Y$.
- (b) If $X_n \xrightarrow{qm} X$ and $Y_n \xrightarrow{qm} Y$, then $X_n + Y_n \xrightarrow{qm} X + Y$.
- (c) If $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow c$, then $X_n + Y_n \rightsquigarrow X + c$.
- (d) If $X_n \xrightarrow{P} X$ and $Y_n \xrightarrow{P} Y$, then $X_n Y_n \xrightarrow{P} XY$.
- (e) If $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow c$, then $X_n Y_n \rightsquigarrow cX$.
- (f) If $X_n \xrightarrow{P} X$, then $g(X_n) \xrightarrow{P} g(X)$.
- (g) If $X_n \rightsquigarrow X$, then $g(X_n) \rightsquigarrow g(X)$.
Parts (c) and (e) are known as Slutzky's theorem. It is worth noting that $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow Y$ does not in general imply $X_n + Y_n \rightsquigarrow X + Y$.
5.3 Law of large numbers
Now we come to a crowning achievement of probability theory, the law of large numbers. This theorem says that the average of a large sample is close to the mean of the distribution. For example, if you toss a coin a large number of times, the proportion of heads will be close to 1/2. Let us now make this precise.
Let $X_1, X_2, \ldots$ be an IID sample, and let $\mu = E(X_1)$ and $\sigma^2 = V(X_1)$. Recall that the sample mean is $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$, and that $E(\bar{X}_n) = \mu$ and $V(\bar{X}_n) = \sigma^2/n$.
5.6 Theorem
The Weak Law of Large Numbers (WLLN)
If $X_1, X_2, \ldots, X_n$ are independent and identically distributed, then $\bar{X}_n \xrightarrow{P} \mu$.
Interpretation of the WLLN: as $n$ increases, the distribution of $\bar{X}_n$ becomes more and more concentrated around $\mu$.
Proof: Assume that $\sigma < \infty$. This assumption is not required, but it simplifies the proof. Using Chebyshev's inequality, we get
$$P(|\bar{X}_n - \mu| > \epsilon) \le \frac{V(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n \epsilon^2},$$
which tends to 0 as $n \to \infty$.
5.7 Example
Consider tossing a coin for which the probability of heads is $p$. Let $X_i$ denote the outcome of a single toss (0 or 1). Hence $p = P(X_i = 1) = E(X_i)$. The proportion of heads after $n$ tosses is $\bar{X}_n$. By the law of large numbers, $\bar{X}_n$ converges to $p$ in probability. This does not mean that $\bar{X}_n$ will numerically equal $p$; it means that, when $n$ is large, the distribution of $\bar{X}_n$ is tightly concentrated around $p$. Suppose $p = 1/2$. How large must $n$ be so that $P(0.4 \le \bar{X}_n \le 0.6) \ge 0.7$? First, $E(\bar{X}_n) = p = 1/2$ and $V(\bar{X}_n) = \sigma^2/n = p(1-p)/n = 1/(4n)$. By Chebyshev's inequality,
$$P(0.4 \le \bar{X}_n \le 0.6) = P(|\bar{X}_n - \mu| \le 0.1) \ge 1 - \frac{1}{4n(0.1)^2} = 1 - \frac{25}{n}.$$
The last expression is greater than 0.7 when $n = 84$.
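Chebyshev's inequality is conservative, as a simulation suggests. In the sketch below (trial count and seed are arbitrary choices), with $n = 84$ fair-coin tosses the event $0.4 \le \bar{X}_n \le 0.6$ occurs far more often than the guaranteed 70%.

```python
import random

random.seed(3)

n, trials = 84, 10000
inside = 0
for _ in range(trials):
    mean = sum(random.random() < 0.5 for _ in range(n)) / n
    if 0.4 <= mean <= 0.6:
        inside += 1
coverage = inside / trials
chebyshev_bound = 1 - 25 / n  # the bound 1 - 1/(4 n (0.1)^2) derived above
print(coverage, chebyshev_bound)
```

The empirical coverage comes out above 0.9, comfortably beyond the Chebyshev guarantee of about 0.70.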
5.4 Central limit theorem
The law of large numbers says that the distribution of $\bar{X}_n$ piles up near $\mu$. By itself this does not let us approximate probability statements about $\bar{X}_n$; for that we need the central limit theorem.
Suppose that $X_1, \ldots, X_n$ are IID with mean $\mu$ and variance $\sigma^2$. The central limit theorem (CLT) says that $\bar{X}_n$ has a distribution which is approximately normal with mean $\mu$ and variance $\sigma^2/n$. This theorem is remarkable because it requires nothing more than the existence of a mean and a variance.
5.8 Theorem
The Central Limit Theorem (CLT). Let $X_1, \ldots, X_n$ be independent and identically distributed with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$. Then
$$Z_n \equiv \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \rightsquigarrow Z,$$
where $Z \sim N(0,1)$. In other words,
$$\lim_{n \to \infty} P(Z_n \le z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx.$$
Interpretation: probability statements about $\bar{X}_n$ can be approximated using a normal distribution. It is the probability statements that we are approximating, not the random variable itself.
Besides $Z_n \rightsquigarrow N(0,1)$, there are several other ways of writing the conclusion of the CLT; they all mean the same thing:
$$\bar{X}_n \approx N\!\left(\mu, \frac{\sigma^2}{n}\right), \quad \bar{X}_n - \mu \approx N\!\left(0, \frac{\sigma^2}{n}\right), \quad \sqrt{n}(\bar{X}_n - \mu) \approx N(0, \sigma^2), \quad \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \approx N(0, 1).$$
5.9 Example
Suppose that the number of errors per computer program has a Poisson distribution with mean 5. We have 125 programs. Let $X_1, \ldots, X_{125}$ be the numbers of errors in these programs. We want to approximate $P(\bar{X}_n < 5.5)$.
Let $\mu = E(X_1) = 5$ and $\sigma^2 = V(X_1) = 5$. Then
$$P(\bar{X}_n < 5.5) = P\!\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} < \frac{\sqrt{n}(5.5 - \mu)}{\sigma}\right) \approx P(Z < 2.5) = 0.9938.$$
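The normal approximation in this example can be checked by simulation. The sketch below draws Poisson(5) counts with Knuth's method (the trial count and seed are arbitrary choices) and compares the empirical frequency of $\bar{X}_n < 5.5$ with $\Phi(2.5)$.

```python
import math
import random

random.seed(4)

def poisson(lam):
    """One Poisson(lam) draw via Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

n, mu, sigma = 125, 5.0, math.sqrt(5.0)
trials = 4000
hits = sum(sum(poisson(5.0) for _ in range(n)) / n < 5.5 for _ in range(trials))
p_hat = hits / trials

# Normal approximation from the CLT: Phi(sqrt(n)(5.5 - mu)/sigma) = Phi(2.5).
z = math.sqrt(n) * (5.5 - mu) / sigma
phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(p_hat, round(phi, 4))
```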
The central limit theorem tells us that $Z_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma$ is approximately $N(0,1)$. However, we rarely know $\sigma$. Later we will estimate it by
$$S_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2.$$
This raises the following question: does the central limit theorem still hold if we use $S_n$ in place of $\sigma$? The answer is yes.
5.10 Theorem
Assume the same conditions as the CLT. Then
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{S_n} \rightsquigarrow N(0, 1).$$
You may wonder how accurate the normal approximation is. The answer is given by the Berry-Esseen theorem.
5.11 Theorem (The Berry-Esseen Inequality)
Suppose that $E|X_1|^3 < \infty$. Then
$$\sup_z \left| P(Z_n \le z) - \Phi(z) \right| \le \frac{33}{4} \frac{E|X_1 - \mu|^3}{\sqrt{n}\, \sigma^3}.$$
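To get a feel for the bound, here is a sketch that evaluates it for fair-coin tosses, where $\mu = 1/2$, $\sigma = 1/2$, and $E|X_1 - \mu|^3 = 1/8$ (the sample sizes are arbitrary choices).

```python
import math

def berry_esseen_bound(n, third_abs_moment, sigma):
    """Upper bound on sup_z |P(Z_n <= z) - Phi(z)| from the Berry-Esseen theorem."""
    return (33 / 4) * third_abs_moment / (math.sqrt(n) * sigma ** 3)

# Fair coin: mu = 1/2, sigma = 1/2, E|X - mu|^3 = (1/2)^3 = 1/8.
for n in (100, 10000, 1000000):
    print(n, berry_esseen_bound(n, 1 / 8, 1 / 2))
```

Note that the bound decays only at rate $1/\sqrt{n}$: for the coin it is uninformative (larger than 1) until $n$ is around 70, and it guarantees two-digit accuracy only for very large $n$, even though the approximation is usually much better in practice.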
There is also a multivariate version of the central limit theorem
5.12 Theorem (Multivariate central limit theorem)
Let $X_1, \ldots, X_n$ be independent and identically distributed random vectors, where
$$X_i = \begin{pmatrix} X_{1i} \\ \vdots \\ X_{ki} \end{pmatrix},$$
with mean
$$\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_k \end{pmatrix}, \quad \mu_j = E(X_{ji}),$$
and variance matrix $\Sigma$. Let
$$\bar{X} = \begin{pmatrix} \bar{X}_1 \\ \vdots \\ \bar{X}_k \end{pmatrix}, \quad \text{where } \bar{X}_j = \frac{1}{n} \sum_{i=1}^n X_{ji}.$$
Then $\sqrt{n}(\bar{X} - \mu) \rightsquigarrow N(0, \Sigma)$.
5.5 Delta method
If $Y_n$ has a limiting normal distribution, then the delta method gives the limiting distribution of $g(Y_n)$, where $g$ is any differentiable function.
5.13 Theorem (Delta method)
Suppose that $\frac{\sqrt{n}(Y_n - \mu)}{\sigma} \rightsquigarrow N(0,1)$ and that $g$ is differentiable with $g'(\mu) \neq 0$. Then
$$\frac{\sqrt{n}\left(g(Y_n) - g(\mu)\right)}{|g'(\mu)|\,\sigma} \rightsquigarrow N(0, 1).$$
In other words,
$$Y_n \approx N\!\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{implies} \quad g(Y_n) \approx N\!\left(g(\mu), (g'(\mu))^2 \frac{\sigma^2}{n}\right).$$
5.14 Example
Let $X_1, \ldots, X_n$ be independent and identically distributed with finite mean $\mu$ and finite variance $\sigma^2$. By the central limit theorem, $\sqrt{n}(\bar{X}_n - \mu)/\sigma \rightsquigarrow N(0,1)$. Let $W_n = e^{\bar{X}_n}$. Thus $W_n = g(\bar{X}_n)$, where $g(s) = e^s$. Since $g'(s) = e^s$, the delta method gives
$$W_n \approx N\!\left(e^{\mu}, e^{2\mu} \frac{\sigma^2}{n}\right).$$
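A simulation check of this example (a sketch, taking $X_i \sim N(1,1)$; the values of $n$, the repetition count, and the seed are arbitrary choices): the sample standard deviation of $W_n = e^{\bar{X}_n}$ over many replications is close to the delta-method prediction $e^{\mu}\sigma/\sqrt{n}$.

```python
import math
import random

random.seed(5)

mu, sigma, n, reps = 1.0, 1.0, 400, 3000
w = []
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    w.append(math.exp(xbar))

w_mean = sum(w) / reps
w_sd = math.sqrt(sum((x - w_mean) ** 2 for x in w) / (reps - 1))
delta_sd = math.exp(mu) * sigma / math.sqrt(n)  # delta-method prediction e^mu * sigma / sqrt(n)
print(w_mean, w_sd, delta_sd)
```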
The delta method also has a multivariate version
5.15 Theorem
Let $Y_n = (Y_{n1}, \ldots, Y_{nk})$ be a sequence of random vectors such that
$$\sqrt{n}(Y_n - \mu) \rightsquigarrow N(0, \Sigma).$$
Let $g : \mathbb{R}^k \to \mathbb{R}$, and let
$$\nabla g(y) = \begin{pmatrix} \frac{\partial g}{\partial y_1} \\ \vdots \\ \frac{\partial g}{\partial y_k} \end{pmatrix}.$$
Let $\nabla_\mu$ denote $\nabla g(y)$ evaluated at $y = \mu$, and assume that no element of $\nabla_\mu$ is zero. Then
$$\sqrt{n}\left(g(Y_n) - g(\mu)\right) \rightsquigarrow N\!\left(0, \nabla_\mu^T \Sigma \nabla_\mu\right).$$
5.16 Example
Let
$$\begin{pmatrix} X_{11} \\ X_{21} \end{pmatrix}, \begin{pmatrix} X_{12} \\ X_{22} \end{pmatrix}, \ldots, \begin{pmatrix} X_{1n} \\ X_{2n} \end{pmatrix}$$
be IID random vectors with mean $\mu = (\mu_1, \mu_2)^T$ and variance matrix $\Sigma$. Let
$$\bar{X}_1 = \frac{1}{n}\sum_{i=1}^n X_{1i}, \quad \bar{X}_2 = \frac{1}{n}\sum_{i=1}^n X_{2i},$$
and define $Y_n = \bar{X}_1 \bar{X}_2$. Thus $Y_n = g(\bar{X}_1, \bar{X}_2)$, where $g(s_1, s_2) = s_1 s_2$. By the central limit theorem,
$$\sqrt{n}\left(\begin{pmatrix} \bar{X}_1 \\ \bar{X}_2 \end{pmatrix} - \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}\right) \rightsquigarrow N(0, \Sigma).$$
Now
$$\nabla g(s) = \begin{pmatrix} s_2 \\ s_1 \end{pmatrix}, \quad \nabla_\mu = \begin{pmatrix} \mu_2 \\ \mu_1 \end{pmatrix},$$
and so
$$\nabla_\mu^T \Sigma \nabla_\mu = (\mu_2 \ \ \mu_1) \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \begin{pmatrix} \mu_2 \\ \mu_1 \end{pmatrix} = \mu_2^2 \sigma_{11} + 2 \mu_1 \mu_2 \sigma_{12} + \mu_1^2 \sigma_{22}.$$
Therefore,
$$\sqrt{n}\left(\bar{X}_1 \bar{X}_2 - \mu_1 \mu_2\right) \rightsquigarrow N\!\left(0, \mu_2^2 \sigma_{11} + 2 \mu_1 \mu_2 \sigma_{12} + \mu_1^2 \sigma_{22}\right).$$
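A simulation check of this example, with independent components so that $\sigma_{12} = 0$ (the means, variances, $n$, repetition count, and seed are arbitrary choices): the empirical variance of $\sqrt{n}(\bar{X}_1\bar{X}_2 - \mu_1\mu_2)$ is close to $\mu_2^2\sigma_{11} + \mu_1^2\sigma_{22}$.

```python
import math
import random

random.seed(6)

mu1, mu2 = 2.0, 3.0
s11, s22 = 1.0, 4.0  # independent components, so sigma12 = 0
n, reps = 200, 4000

y = []
for _ in range(reps):
    x1bar = sum(random.gauss(mu1, math.sqrt(s11)) for _ in range(n)) / n
    x2bar = sum(random.gauss(mu2, math.sqrt(s22)) for _ in range(n)) / n
    y.append(math.sqrt(n) * (x1bar * x2bar - mu1 * mu2))

y_mean = sum(y) / reps
var_hat = sum((v - y_mean) ** 2 for v in y) / (reps - 1)
var_delta = mu2 ** 2 * s11 + mu1 ** 2 * s22  # mu2^2 s11 + 2 mu1 mu2 sigma12 + mu1^2 s22, with sigma12 = 0
print(var_hat, var_delta)
```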