Boole's inequality, Doob's inequality, Central Limit Theorem, Kolmogorov extension theorem, Lebesgue's dominated convergence theorem

1. Boole’s inequality

In probability theory, Boole’s inequality, also known as the union bound, says that for any finite or countable set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the individual events.

The inequality, proposed by George Boole, means that the probability of the union of a set of events is never greater than the sum of the probabilities of the individual events.

Formally, for a countable set of events A1, A2, A3, …, we have
P(A_1 ∪ A_2 ∪ A_3 ∪ ⋯) ≤ P(A_1) + P(A_2) + P(A_3) + ⋯
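As a quick sanity check, here is a small Monte Carlo sketch (the three interval events and the sample size are made up for illustration). Note that empirical frequencies themselves form a probability measure, so the union bound holds exactly on the sample, not just approximately:

```python
import random

# Hypothetical events on a uniform draw from [0, 1):
# A1 = [0, 0.3), A2 = [0.2, 0.5), A3 = [0.4, 0.6)
random.seed(0)
N = 100_000
events = [(0.0, 0.3), (0.2, 0.5), (0.4, 0.6)]

hits_union = 0
hits = [0, 0, 0]
for _ in range(N):
    u = random.random()
    in_any = False
    for i, (lo, hi) in enumerate(events):
        if lo <= u < hi:
            hits[i] += 1
            in_any = True
    hits_union += in_any

p_union = hits_union / N              # empirical P(A1 ∪ A2 ∪ A3)
sum_p = sum(h / N for h in hits)      # empirical P(A1) + P(A2) + P(A3)
assert p_union <= sum_p               # the union bound
print(p_union, sum_p)                 # ≈ 0.6 vs ≈ 0.8 (events overlap)
```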

1.1 Proof using induction

For n = 1 the statement P(A_1) ≤ P(A_1) is trivial. Suppose the bound holds for n events. Using P(A ∪ B) = P(A) + P(B) − P(A ∩ B),

P(A_1 ∪ ⋯ ∪ A_{n+1}) = P(A_1 ∪ ⋯ ∪ A_n) + P(A_{n+1}) − P((A_1 ∪ ⋯ ∪ A_n) ∩ A_{n+1}).

Since the last term is nonnegative, dropping it and applying the induction hypothesis gives

P(A_1 ∪ ⋯ ∪ A_{n+1}) ≤ P(A_1) + ⋯ + P(A_n) + P(A_{n+1}).

Note that induction only establishes the result for finite unions; the countable case then follows from the continuity of probability measures.

1.2 Proof without using induction

Define B_1 = A_1 and, for i > 1, B_i = A_i \ (A_1 ∪ ⋯ ∪ A_{i−1}). The sets B_i are pairwise disjoint, satisfy B_i ⊆ A_i, and have the same union as the A_i. By countable additivity and monotonicity,

P(⋃_i A_i) = P(⋃_i B_i) = ∑_i P(B_i) ≤ ∑_i P(A_i).
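The disjointification step in the proof can be illustrated on finite sets with counting measure (the three sets below are arbitrary examples):

```python
# Disjointification: B_i = A_i \ (A_1 ∪ … ∪ A_{i-1})
A = [{1, 2, 3}, {2, 3, 4}, {4, 5}]

B = []
seen = set()
for Ai in A:
    B.append(Ai - seen)   # keep only the "new" elements of A_i
    seen |= Ai

# The B_i are pairwise disjoint and have the same union as the A_i.
union_A = set().union(*A)
union_B = set().union(*B)
assert union_A == union_B
for i in range(len(B)):
    for j in range(i + 1, len(B)):
        assert B[i].isdisjoint(B[j])

# |union| = Σ|B_i| ≤ Σ|A_i| — the counting-measure analogue of Boole's inequality
assert len(union_A) == sum(len(b) for b in B) <= sum(len(a) for a in A)
```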

1.3 Generalization

Boole’s inequality may be generalized to find upper and lower bounds on the probability of finite unions of events. These bounds are known as Bonferroni inequalities, after Carlo Emilio Bonferroni;
Define S_1 = ∑_i P(A_i), S_2 = ∑_{i<j} P(A_i ∩ A_j), and in general

S_k = ∑_{i_1 < ⋯ < i_k} P(A_{i_1} ∩ ⋯ ∩ A_{i_k}).

Then, for odd k in {1, …, n},

P(A_1 ∪ ⋯ ∪ A_n) ≤ S_1 − S_2 + ⋯ + (−1)^{k−1} S_k,

and for even k in {2, …, n} the inequality is reversed:

P(A_1 ∪ ⋯ ∪ A_n) ≥ S_1 − S_2 + ⋯ + (−1)^{k−1} S_k.
Boole’s inequality is the initial case, k = 1. When k = n, then equality holds and the resulting identity is the inclusion–exclusion principle.
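A small sketch, using a made-up finite sample space with uniform measure, that checks the alternating upper/lower bounds numerically:

```python
from itertools import combinations

# Hypothetical sample space {0,…,9}, uniform measure, four events.
omega = set(range(10))
events = [{0, 1, 2, 3}, {2, 3, 4}, {3, 4, 5, 6}, {6, 7}]

def P(s):
    return len(s) / len(omega)

p_union = P(set().union(*events))

def S(k):
    # S_k: sum of P(intersection) over all k-subsets of the events
    return sum(P(set.intersection(*c)) for c in combinations(events, k))

n = len(events)
partial = 0.0
for k in range(1, n + 1):
    partial += (-1) ** (k - 1) * S(k)
    if k % 2 == 1:
        assert p_union <= partial + 1e-12   # odd truncation: upper bound (k=1 is Boole)
    else:
        assert partial - 1e-12 <= p_union   # even truncation: lower bound
# At k = n the alternating sum is exact: the inclusion–exclusion principle.
assert abs(partial - p_union) < 1e-12
```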

2. Doob’s inequality

In the theory of martingales, Doob's inequalities bound the running maximum of a (sub)martingale in terms of its final value. Let X_0, X_1, X_2, … be a submartingale, and write

X*_N = max_{0 ≤ n ≤ N} X_n

for its running maximum up to time N. Doob's maximal inequality states that

P(X*_N ≥ K) ≤ E[max(X_N, 0)] / K,

and, when the submartingale is nonnegative, Doob's L^p inequality states that

E[(X*_N)^p] ≤ (p / (p − 1))^p · E[(X_N)^p],

for every K > 0 and p > 1.

2.1 Proof of Doob’s inequalities

For the maximal inequality, let τ = min{ n ≤ N : X_n ≥ K }, with τ = N if the maximum never reaches K. On the event {X*_N ≥ K} we have X_τ ≥ K, so

K · P(X*_N ≥ K) ≤ E[X_τ · 1{X*_N ≥ K}] ≤ E[X_N · 1{X*_N ≥ K}] ≤ E[max(X_N, 0)],

where the middle step uses the optional sampling theorem for submartingales (τ ≤ N). For the L^p inequality, write

E[(X*_N)^p] = ∫_0^∞ p K^{p−1} P(X*_N ≥ K) dK,

insert the maximal inequality, exchange the order of integration (Fubini), obtaining E[(X*_N)^p] ≤ p/(p−1) · E[X_N (X*_N)^{p−1}], and finish with Hölder's inequality.
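A Monte Carlo sketch of the maximal inequality for the nonnegative submartingale |S_n|, where S_n is a simple symmetric random walk (the horizon N, threshold K, and trial count are arbitrary choices):

```python
import random

# Check P(max_{n<=N} |S_n| >= K) <= E[|S_N|] / K for a simple random walk.
random.seed(1)
N, K, trials = 100, 15, 20_000

exceed = 0
final_abs = 0.0
for _ in range(trials):
    s, m = 0, 0
    for _ in range(N):
        s += random.choice((-1, 1))
        m = max(m, abs(s))
    exceed += (m >= K)
    final_abs += abs(s)

lhs = exceed / trials              # empirical P(max |S_n| >= K)
rhs = (final_abs / trials) / K     # empirical E[|S_N|] / K
assert lhs <= rhs                  # Doob's maximal inequality (empirically)
print(lhs, rhs)
```

The bound is not tight here: the left side is roughly half the right side, which is typical of maximal inequalities used as worst-case guarantees.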

3. Central Limit Theorem

The central limit theorem says the following. Take a population with an arbitrary distribution, randomly draw n samples from it, and repeat this m times. Then compute the mean of each of the m groups of samples. The distribution of these m means is close to a normal distribution.

Let’s start with an example.

Suppose we want to estimate the average weight of everyone in the country. Surveying every single person is unrealistic, so instead we plan to survey 1,000 groups of 50 people each. We then compute the average weight of the first group, the average weight of the second group, and so on up to the last group. The central limit theorem says that these group averages are approximately normally distributed, and the approximation improves as the number of groups increases. Finally, if we average the 1,000 group averages, the result will be close to the national average weight.

Some points to note:

  • The population itself does not need to be normally distributed. In the above example,
    a person's weight happens to be roughly normal, but even if the experiment were rolling a die (a uniform distribution), the group averages would still form a normal distribution. (Remarkable!)

  • Each group of samples should be large enough, but it need not be huge. As a rule of
    thumb, 30 or more samples per group is enough for the central limit theorem to take effect.

The central limit theorem boils down to two statements:
1) The mean of any sample will be approximately equal to the mean of the population it is drawn from.
2) Regardless of the population's distribution, the sample means will be distributed approximately normally around the population mean.
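The die-rolling case can be sketched in a few lines: 1,000 groups of 50 rolls each, mirroring the weight example above (the random seed is arbitrary):

```python
import random
import statistics

# A fair die is uniform, decidedly non-normal, yet the means of many
# groups of 50 rolls cluster normally around the population mean 3.5.
random.seed(42)
group_means = [
    statistics.mean(random.randint(1, 6) for _ in range(50))
    for _ in range(1000)
]

grand_mean = statistics.mean(group_means)
# CLT prediction for the spread of the group means: sigma / sqrt(n),
# where sigma = sqrt(35/12) ≈ 1.708 is the std dev of one die roll.
predicted_se = (35 / 12) ** 0.5 / 50 ** 0.5
observed_se = statistics.stdev(group_means)

print(grand_mean, observed_se, predicted_se)
# grand_mean ≈ 3.5; observed_se ≈ predicted_se ≈ 0.24
```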


In real life we cannot directly know statistical parameters such as the mean and standard deviation of the objects we want to study. The central limit theorem guarantees, in theory, that sampling only a part of the population is enough to infer those parameters.

3.1 What is the use of the central limit theorem?

1) When we cannot obtain all the data of a population, we can use a sample to estimate it. If we know the mean and standard deviation of a correctly drawn sample, we can estimate the mean and standard deviation of the population. For example, suppose you are a leader of Xicheng District, Beijing, and want to evaluate the teaching quality of the district's schools. You do not trust each school's own unified-examination results, so you run a sample test at each school: randomly select 100 students and give them a test similar to the unified examination. Is it feasible to judge the teaching quality of a whole school from the results of 100 students? Yes. The central limit theorem tells us that a correctly drawn sample will not differ much from the group it represents; the sample result (the test scores of 100 randomly selected students) reflects the situation of the entire group (the performance of all students in the school). This is also how opinion polls operate: 1,200 Americans selected through a sound sampling plan can tell us, to a large extent, what the people of the entire country are thinking at the moment.

2) Given the mean and standard deviation of a population, we can judge whether a sample belongs to that population. If we have the parameters of a population and the data of a sample, we can infer whether the sample plausibly comes from that population: using the normal distribution supplied by the central limit theorem, we compute the probability of drawing such a sample from the population. If that probability is very low, we can confidently say the sample does not belong to the group.
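A minimal sketch of point 1), with a synthetic population of exam scores (the population size, mean, and spread are all made up for illustration):

```python
import random
import statistics

# Hypothetical "school" of 2000 exam scores; we draw 100 and use the
# sample mean ± standard error to bracket the true population mean.
random.seed(7)
population = [random.gauss(70, 12) for _ in range(2000)]
true_mean = statistics.mean(population)

sample = random.sample(population, 100)
m = statistics.mean(sample)
se = statistics.stdev(sample) / len(sample) ** 0.5  # ≈ sigma / sqrt(n)

# A rough 95% interval from the CLT: sample mean ± 1.96 standard errors.
lo, hi = m - 1.96 * se, m + 1.96 * se
print(true_mean, (lo, hi))
```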

Law of Large Numbers https://www.zhihu.com/question/19911209/answer/245487255

4. Kolmogorov extension theorem

In mathematics, the Kolmogorov extension theorem (also known as Kolmogorov existence theorem, the Kolmogorov consistency theorem or the Daniell-Kolmogorov theorem) is a theorem that guarantees that a suitably “consistent” collection of finite-dimensional distributions will define a stochastic process. It is credited to the English mathematician Percy John Daniell and the Russian mathematician Andrey Nikolaevich Kolmogorov.

4.1 Statement of the theorem

Let T denote some interval (thought of as "time"), and let n ∈ ℕ. For each k ∈ ℕ and each finite sequence of distinct times t_1, …, t_k ∈ T, let ν_{t_1…t_k} be a probability measure on (ℝ^n)^k. Suppose that these measures satisfy two consistency conditions:

1. (Permutation invariance) For all permutations π of {1, …, k} and measurable sets F_i ⊆ ℝ^n,
ν_{t_{π(1)}…t_{π(k)}}(F_{π(1)} × ⋯ × F_{π(k)}) = ν_{t_1…t_k}(F_1 × ⋯ × F_k);

2. (Marginalization) For all measurable sets F_i ⊆ ℝ^n and all m ∈ ℕ,
ν_{t_1…t_{k+m}}(F_1 × ⋯ × F_k × ℝ^n × ⋯ × ℝ^n) = ν_{t_1…t_k}(F_1 × ⋯ × F_k).

Then there exist a probability space (Ω, F, P) and a stochastic process X : T × Ω → ℝ^n such that

ν_{t_1…t_k}(F_1 × ⋯ × F_k) = P(X_{t_1} ∈ F_1, …, X_{t_k} ∈ F_k)

for all t_i ∈ T, k ∈ ℕ and measurable sets F_i ⊆ ℝ^n; that is, X has the ν_{t_1…t_k} as its finite-dimensional distributions.

In fact, it is always possible to take as the underlying probability space Ω = (ℝ^n)^T and to take for X the canonical process X : (t, Y) ↦ Y_t. Therefore, an alternative way of stating Kolmogorov’s extension theorem is that, provided that the above consistency conditions hold, there exists a (unique) measure ν on (ℝ^n)^T with marginals ν_{t_1…t_k} for any finite collection of times t_1, …, t_k. Kolmogorov’s extension theorem applies when T is uncountable, but the price to pay for this level of generality is that the measure ν is only defined on the product σ-algebra of (ℝ^n)^T, which is not very rich.

4.2 Explanation of the conditions

The first condition says that reordering the times merely reorders the coordinates of the corresponding measure; it generalizes the obvious fact that P(X_{t_1} ∈ F_1, X_{t_2} ∈ F_2) = P(X_{t_2} ∈ F_2, X_{t_1} ∈ F_1). The second says that integrating out the extra coordinates (by allowing them to take any value in ℝ^n) recovers the measure on the remaining ones, just as a marginal distribution is recovered from a joint distribution.

4.3 Implications of the theorem

Since the two conditions are trivially satisfied for any stochastic process, the power of the theorem is that no other conditions are required: For any reasonable (i.e., consistent) family of finite-dimensional distributions, there exists a stochastic process with these distributions.

The measure-theoretic approach to stochastic processes starts with a probability space and defines a stochastic process as a family of functions on this probability space. However, in many applications the starting point is really the finite-dimensional distributions of the stochastic process. The theorem says that provided the finite-dimensional distributions satisfy the obvious consistency requirements, one can always identify a probability space to match the purpose. In many situations, this means that one does not have to be explicit about what the probability space is. Many texts on stochastic processes do, indeed, assume a probability space but never state explicitly what it is.

The theorem is used in one of the standard proofs of existence of a Brownian motion, by specifying the finite dimensional distributions to be Gaussian random variables, satisfying the consistency conditions above. As in most of the definitions of Brownian motion it is required that the sample paths are continuous almost surely, and one then uses the Kolmogorov continuity theorem to construct a continuous modification of the process constructed by the Kolmogorov extension theorem.
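As a sketch of this construction, one can build samples from the finite-dimensional distributions of Brownian motion out of independent Gaussian increments (W_t − W_s ~ N(0, t − s)) and check numerically the covariance Cov(W_s, W_t) = min(s, t) that these distributions encode; the times and trial count below are arbitrary:

```python
import random

# Sample (W_s, W_t) from the two-dimensional Gaussian fdd of Brownian
# motion and estimate Cov(W_s, W_t), which should be min(s, t).
random.seed(3)
s, t, trials = 0.5, 1.0, 50_000

ws_sum = wt_sum = cross_sum = 0.0
for _ in range(trials):
    w_s = random.gauss(0, s ** 0.5)                # W_s ~ N(0, s)
    w_t = w_s + random.gauss(0, (t - s) ** 0.5)    # independent increment
    ws_sum += w_s
    wt_sum += w_t
    cross_sum += w_s * w_t

cov = cross_sum / trials - (ws_sum / trials) * (wt_sum / trials)
print(cov)   # ≈ min(s, t) = 0.5
```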

4.4 General form of the theorem

The Kolmogorov extension theorem gives us conditions for a collection of measures on Euclidean spaces to be the finite-dimensional distributions of some ℝ^n-valued stochastic process, but the assumption that the state space be ℝ^n is unnecessary. In fact, any collection of measurable spaces together with a collection of inner regular measures defined on the finite products of these spaces would suffice, provided that these measures satisfy a suitable compatibility relation.

This theorem has many far-reaching consequences; for example it can be used to prove the existence of the following, among others:

  • Brownian motion, i.e., the Wiener process,
  • a Markov chain taking values in a given state space with a given transition matrix,
  • infinite products of (inner-regular) probability spaces.

4.5 History

According to John Aldrich, the theorem was independently discovered by British mathematician Percy John Daniell in the slightly different setting of integration theory.


5. Lebesgue’s dominated convergence theorem

In measure theory, Lebesgue’s dominated convergence theorem provides sufficient conditions under which almost everywhere convergence of a sequence of functions implies convergence in the L1 norm. Its power and utility are two of the primary theoretical advantages of Lebesgue integration over Riemann integration.

In addition to its frequent appearance in mathematical analysis and partial differential equations, it is widely used in probability theory, since it gives a sufficient condition for the convergence of expected values of random variables.

5.1 Statement

Let (S, Σ, μ) be a measure space and let (f_n) be a sequence of complex-valued measurable functions on S. Suppose that

1. f_n(x) → f(x) for almost every x ∈ S, and
2. there is an integrable function g on S (i.e., ∫_S |g| dμ < ∞) such that |f_n(x)| ≤ g(x) for every n and almost every x ∈ S.

Then f is integrable and

lim_{n→∞} ∫_S |f_n − f| dμ = 0,

which in particular implies

lim_{n→∞} ∫_S f_n dμ = ∫_S f dμ.
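A concrete sketch with f_n(x) = x^n on [0, 1]: the sequence is dominated by g ≡ 1 (which is integrable on [0, 1]) and converges to 0 almost everywhere, so the theorem guarantees that the integrals converge to 0. The midpoint rule and step count below are just for numerical illustration:

```python
# f_n(x) = x^n on [0, 1]: dominated by g(x) = 1, f_n -> 0 a.e.,
# so DCT predicts ∫ f_n dμ -> ∫ 0 dμ = 0.
def integral_fn(n, steps=100_000):
    # midpoint rule for ∫_0^1 x^n dx (exact value: 1 / (n + 1))
    h = 1.0 / steps
    return sum(((i + 0.5) * h) ** n for i in range(steps)) * h

for n in (1, 5, 50, 500):
    approx = integral_fn(n)
    exact = 1.0 / (n + 1)
    assert abs(approx - exact) < 1e-3   # quadrature matches 1/(n+1)
    print(n, approx)
# the integrals shrink toward 0, as dominated convergence guarantees
```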

5.2 Proof

Without loss of generality, one can assume that f is real, because one can split f into its real and imaginary parts (remember that a sequence of complex numbers converges if and only if both its real and imaginary counterparts converge) and apply the triangle inequality at the end.

Lebesgue’s dominated convergence theorem is a special case of the Fatou–Lebesgue theorem. Below, however, is a direct proof that uses Fatou’s lemma as the essential tool.
Since |f_n| ≤ g for all n and f_n → f almost everywhere, we get |f| ≤ g almost everywhere, so f is integrable and |f_n − f| ≤ 2g. The functions 2g − |f_n − f| are nonnegative, so Fatou's lemma gives

∫ 2g dμ = ∫ liminf (2g − |f_n − f|) dμ ≤ liminf ∫ (2g − |f_n − f|) dμ = ∫ 2g dμ − limsup ∫ |f_n − f| dμ.

Since ∫ 2g dμ is finite, it can be cancelled, leaving limsup ∫ |f_n − f| dμ ≤ 0, i.e., ∫ |f_n − f| dμ → 0. The convergence of the integrals then follows from |∫ f_n dμ − ∫ f dμ| ≤ ∫ |f_n − f| dμ.

https://en.wikipedia.org/wiki/Boole%27s_inequality
https://planetmath.org/alphabetical.html
https://zhuanlan.zhihu.com/p/25241653
https://www.zhihu.com/question/22913867
https://en.wikipedia.org/wiki/Kolmogorov_extension_theorem
https://blog.csdn.net/weixin_44207974/article/details/111503988
https://blog.csdn.net/weixin_44207974/article/details/111602960
https://en.wikipedia.org/wiki/Dominated_convergence_theorem


Origin blog.csdn.net/Anne033/article/details/113706990