All of Statistics Chapter 1

Statistics (1) Probability

Update history:

  1. 2023-09-24: Changed "intersection" to "union" in the third axiom that a probability must satisfy.

To prepare for machine learning, I am relearning statistics and recording the whole process. This article is a study note, translated mainly from "All of Statistics" by Larry Wasserman.

Contents of this Chapter
1.1 Introduction
1.2 Sample space and events
1.3 Probability
1.4 Probability in finite sampling space
1.5 Independent events
1.6 Conditional probability
1.7 Bayes’ theorem

Because translating English terms into Chinese often blurs their meaning, the key terms below are listed by their English names; in fact, understanding is usually deeper if you work with the English terms directly.

Key terms:

  1. sample space
  2. event
  3. sample outcome
  4. realization
  5. element
  6. disjoint
  7. mutually exclusive
  8. indicator function
  9. probability distribution
  10. probability measure
  11. axiom
  12. frequency
  13. degree of belief
  14. frequentist
  15. Bayesian
  16. lemma
  17. theorem
  18. equally likely
  19. uniform probability distribution
  20. combinatorial methods
  21. independent events
  22. Venn diagram
  23. conditional probability
  24. expert system
  25. Bayes' nets (Bayesian networks)
  26. the law of total probability
  27. prior probability
  28. posterior probability

1.1 Introduction

Probability is a mathematical language for quantifying uncertainty. In this chapter we introduce the basic concepts of probability theory, starting with the sample space, the set of all possible outcomes.

1.2 Sample space and events

The sample space Ω is the set of all possible outcomes of a random experiment. A lowercase ω denotes a point in the sample space, called a sample outcome, a realization, or an element.

Subsets of the sample space Ω are called events.

1.1 Example

If we toss a coin twice, then Ω = {HH, HT, TH, TT}. The event that the first toss is heads is A = {HH, HT}.

1.2 Example

Let ω be the outcome of a measurement of some physical quantity, such as temperature. Then Ω = ℝ = (−∞, ∞). One might object that defining Ω as ℝ is inaccurate, since temperature has a lower bound. But taking the sample space larger than strictly necessary usually causes no problem.

The event A that the temperature is greater than 10 but less than or equal to 23 can then be written A = (10, 23].

1.3 Example


If we toss a coin forever, then the sample space is the infinite set

Ω = {ω = (ω1, ω2, ω3, …) : ωi ∈ {H, T}}

Let E be the event that the first head appears on the third toss. Then E can be written as:

E = {(ω1, ω2, ω3, …) : ω1 = T, ω2 = T, ω3 = H, ωi ∈ {H, T} for i > 3}

Symbols for complement, union, intersection and difference

Given an event A, its complement is

A^c = {ω ∈ Ω : ω ∉ A}

Formally, A^c can be read as "not A".

Then the complement of Ω is ∅.

The union of events A and B is:

A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B (or both)}

Read this as "A or B".

If A1, A2, … is a sequence of sets, then their union is:

⋃_{i=1}^∞ Ai = {ω ∈ Ω : ω ∈ Ai for at least one i}

The intersection of events A and B is:

A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}

Read this as "A and B". Sometimes the intersection is written AB or (A, B).

If A1, A2, … is a sequence of sets, then their intersection is:

⋂_{i=1}^∞ Ai = {ω ∈ Ω : ω ∈ Ai for all i}

The set difference is defined by:

A \ B = {ω : ω ∈ A, ω ∉ B}

If every element of A is also contained in B, we write

A ⊂ B

or, equivalently,

B ⊃ A

If A is a finite set, then |A| represents the number of elements in A

The following summarizes the notation:

Ω : sample space
ω : outcome (point or element)
A : event (subset of Ω)
A^c : complement of A (not A)
A ∪ B : union (A or B)
A ∩ B or AB : intersection (A and B)
A \ B : set difference (in A but not in B)
A ⊂ B : set inclusion
∅ : null event (always false)
Ω : true event (always true)

Disjoint and mutually exclusive

If Ai ∩ Aj = ∅ whenever i ≠ j, we say that A1, A2, … are disjoint, or mutually exclusive.

For example, A1=(0,1],A2=(1,2],A3=(2,3],… are disjoint.

A partition of Ω is a sequence of disjoint sets A1, A2, A3, … that satisfy

⋃_{i=1}^∞ Ai = Ω

Indicator function

The indicator function of A is defined by:

IA(ω) = 1 if ω ∈ A, and IA(ω) = 0 if ω ∉ A
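
As a small illustration (mine, not the book's), the indicator function translates directly into code:

```python
# A direct transcription of the indicator function of a set A.
def indicator(A):
    return lambda w: 1 if w in A else 0

I_A = indicator({2, 4, 6})   # A = the even outcomes of a die
print(I_A(4), I_A(3))        # 1 0
```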

Monotone increasing and monotone decreasing

If A1 ⊂ A2 ⊂ A3 ⊂ …, we say the sequence A1, A2, … is monotone increasing, and we define its limit as

lim_{n→∞} An = ⋃_{i=1}^∞ Ai

If A1 ⊃ A2 ⊃ A3 ⊃ …, the sequence is monotone decreasing, and we define

lim_{n→∞} An = ⋂_{i=1}^∞ Ai

In either case we write An → A, where A is the limit.

1.4 Example

Let Ω = ℝ and Ai = [0, 1/i) for i = 1, 2, …. Then ⋃_{i=1}^∞ Ai = [0, 1) and ⋂_{i=1}^∞ Ai = {0}. If instead Ai = (0, 1/i), then ⋃_{i=1}^∞ Ai = (0, 1) and ⋂_{i=1}^∞ Ai = ∅.

1.3 Probability

We will assign a real number P(A) to event A, called the probability of A. We call this P a probability distribution or probability measure.

1.5 Definition of probability distribution or probability measure

A function P that assigns a real number P(A) to each event A is a probability distribution or probability measure if it satisfies the following three axioms:

  1. Axiom 1: P(A) ≥ 0 for every A
  2. Axiom 2: P(Ω) = 1
  3. Axiom 3: If A1, A2, … are disjoint, then the probability of their union is the sum of their probabilities:

     P(⋃_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai)

There are many interpretations of P(A). The two most common are frequency and degree of belief. On the frequency interpretation, P(A) is the long-run proportion of repetitions of the experiment in which A occurs.

For example, saying that the probability of heads is 1/2 means that if we toss the coin many times, the proportion of heads tends to 1/2 as the number of tosses grows.

The idea of an infinitely long, unpredictable sequence of tosses in which the proportion of heads tends to a constant is an idealization, much like a straight line in geometry.
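
To make the frequency interpretation concrete, here is a minimal simulation sketch (mine, not the book's): the observed proportion of heads should drift toward 1/2 as the number of tosses grows.

```python
import random

# Estimate P(heads) by simulation: toss a fair coin n times and
# report the observed proportion of heads for increasing n.
random.seed(0)
for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"n={n:>9}: proportion of heads = {heads / n:.4f}")
```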

On the degree-of-belief interpretation, P(A) measures the strength of the observer's belief that A is true.

Whichever interpretation we adopt, we require P to satisfy the three axioms above. The difference between the interpretations will not matter much until we deal with statistical inference, where the two interpretations lead to two schools of inference: the frequentist school and the Bayesian school. We will discuss this in Chapter 11.

From the three axioms we can derive many properties of P, for example:

P(∅) = 0
A ⊂ B ⟹ P(A) ≤ P(B)
0 ≤ P(A) ≤ 1
P(A^c) = 1 − P(A)
A ∩ B = ∅ ⟹ P(A ∪ B) = P(A) + P(B)

A less obvious property is given in the following lemma.

1.6 Lemma

For any events A and B, they satisfy the following formula

P(A∪B) = P(A) + P(B) - P(AB)

Proof: Write A ∪ B as the disjoint union (AB^c) ∪ (AB) ∪ (A^cB). By Axiom 3,

P(A ∪ B) = P(AB^c) + P(AB) + P(A^cB)

Also, P(A) = P(AB^c) + P(AB) and P(B) = P(A^cB) + P(AB). Adding these two equalities and subtracting P(AB) gives the result. ∎

1.7 Example

Toss two coins; let H1 be the event that the first toss is heads and H2 the event that the second toss is heads. If all outcomes are equally likely, then P(H1 ∪ H2) = P(H1) + P(H2) − P(H1H2) = 1/2 + 1/2 − 1/4 = 3/4.
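
We can confirm this by brute-force enumeration. The sketch below (my own check, not the book's) counts outcomes in the four-point sample space.

```python
from itertools import product

# Enumerate two coin tosses and verify, by counting,
# P(H1 ∪ H2) = P(H1) + P(H2) − P(H1 ∩ H2) = 3/4.
omega = set(product("HT", repeat=2))        # {HH, HT, TH, TT}
H1 = {w for w in omega if w[0] == "H"}      # first toss is heads
H2 = {w for w in omega if w[1] == "H"}      # second toss is heads

def p(event):                               # uniform probability
    return len(event) / len(omega)

assert p(H1 | H2) == p(H1) + p(H2) - p(H1 & H2) == 0.75
print(p(H1 | H2))  # 0.75
```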

1.8 The continuity of probabilities (Theorem)

If An → A as n → ∞, then P(An) → P(A)

Proof: Suppose An is monotone increasing, so that A = lim_n An = ⋃_{i=1}^∞ Ai. Define B1 = A1, B2 = {ω ∈ A2 : ω ∉ A1}, B3 = {ω ∈ A3 : ω ∉ A1, ω ∉ A2}, and so on. The Bn are disjoint, An = ⋃_{i=1}^n Bi for each n, and ⋃_{i=1}^∞ Bi = ⋃_{i=1}^∞ Ai. By Axiom 3,

P(An) = P(⋃_{i=1}^n Bi) = Σ_{i=1}^n P(Bi)

and hence

lim_{n→∞} P(An) = lim_{n→∞} Σ_{i=1}^n P(Bi) = Σ_{i=1}^∞ P(Bi) = P(⋃_{i=1}^∞ Bi) = P(A) ∎

(The monotone decreasing case follows by taking complements.)

1.4 Probability on Finite Sample Spaces

Suppose the sample space is finite, Ω = {ω1, …, ωn}. For example, if we toss a die twice, the sample space has 36 elements: Ω = {(i, j) : i, j ∈ {1, …, 6}}. If each outcome is equally likely, then P(A) = |A|/36, where |A| denotes the number of elements in A.

The probability that the sum of the dice is 11 is 2/36, since only two outcomes, (5, 6) and (6, 5), correspond to this event.

If Ω is finite and every outcome is equally likely, then

P(A)=|A|/|Ω|

This is called the **uniform probability distribution**.
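
A small sketch of this rule (mine, not the book's): enumerating the two-dice sample space recovers the probability P(sum = 11) = 2/36 computed above.

```python
from itertools import product

# Uniform probability on a finite Ω: P(A) = |A| / |Ω|.
# Two dice give 36 equally likely outcomes; the event "sum is 11"
# contains exactly (5, 6) and (6, 5).
omega = list(product(range(1, 7), repeat=2))
A = [w for w in omega if sum(w) == 11]
print(f"P(A) = {len(A)}/{len(omega)} = {len(A) / len(omega):.4f}")  # 2/36
```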

To compute probabilities of this kind we need to count the number of points in an event A. Methods for counting points are called combinatorial methods. We need not delve into these in great detail, but a few counting facts will be useful later.

Given n objects, the number of ways to order (permute) them is n! = n(n−1)(n−2)⋯1, where we define 0! = 1.

We also define

(n choose k) = n! / (k!(n−k)!)

This is read "n choose k"; it counts the number of ways of choosing k objects from n objects.

For example, if a class of 20 people must choose 3 to serve on the class committee, the number of possible choices is

(20 choose 3) = 20! / (3!17!) = 1140

We also note the following properties:

(n choose 0) = (n choose n) = 1,  and  (n choose k) = (n choose n−k)
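
These identities are easy to check numerically. The sketch below uses Python's standard-library binomial function, math.comb, to verify the committee example and the properties above.

```python
from math import comb, factorial

# math.comb(n, k) computes n! / (k! (n−k)!) directly.
n, k = 20, 3
assert comb(n, k) == factorial(n) // (factorial(k) * factorial(n - k))
print(comb(n, k))                     # 1140 ways to choose the committee
assert comb(n, 0) == comb(n, n) == 1  # properties noted above
assert comb(n, k) == comb(n, n - k)
```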

1.5 Independent Events

If we toss a coin twice, the probability of two heads is (1/2) × (1/2) = 1/4. We multiply the probabilities because we regard the two tosses as independent.

The formal definition of an independent event is as follows:

1.9 Definition of independent events

If P(AB) = P(A)P(B), then A and B are independent, written A ∐ B.

More generally, a set of events {Ai : i ∈ I} is independent if

P(⋂_{i∈J} Ai) = ∏_{i∈J} P(Ai)

for every finite subset J of I.

If A and B are not independent, we write A ∐̸ B (the symbol ∐ with a slash through it).

Independence arises in two different ways. The first is to explicitly assume that two events are independent. For example, in two coin tosses we assume the tosses are independent, reflecting the fact that the coin has no memory of the first toss.

The second is to verify that P(AB) = P(A)P(B) and conclude that A and B are independent. For example, when rolling a fair die, let A = {2, 4, 6} and B = {1, 2, 3, 4}. Then A ∩ B = {2, 4}, so P(AB) = 2/6 = 1/3 and P(A)P(B) = (1/2)(2/3) = 1/3. Hence A and B are independent. Here we did not assume independence; it simply holds.
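
Here is a quick numerical check of this example (my own sketch), using exact rational arithmetic so the equality P(AB) = P(A)P(B) is tested exactly.

```python
from fractions import Fraction

# Verify P(AB) = P(A)P(B) exactly for the die events above,
# avoiding floating-point comparison issues.
omega = set(range(1, 7))
A, B = {2, 4, 6}, {1, 2, 3, 4}

def p(event):
    return Fraction(len(event), len(omega))

print(p(A & B), "=", p(A) * p(B))   # 1/3 = 1/3
assert p(A & B) == p(A) * p(B)      # hence A and B are independent
```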

Suppose A and B are disjoint and each has positive probability. Can they be independent?

No: P(A)P(B) > 0, yet P(AB) = P(∅) = 0, so the defining equality fails.

This example shows that, except in this special case, independence cannot be judged from a Venn diagram.

1.10 Example

Toss a fair coin 10 times. Let A = "at least one head", and let Tj be the event that the jth toss is tails. What is P(A)?

P(A) = 1 − P(A^c) = 1 − P(all tails) = 1 − P(T1T2⋯T10)
     = 1 − P(T1)P(T2)⋯P(T10)
     = 1 − (1/2)^10
     ≈ .999

1.11 Example

Two people take turns shooting at a target. Person 1 succeeds with probability 1/3 and person 2 with probability 1/4. What is the probability that person 1 succeeds before person 2?

Let E be the event that person 1 succeeds first, and assume the attempts are independent, with person 1 shooting on attempts 1, 3, 5, …. Person 1 wins in round j+1 exactly when both players fail j rounds and person 1 then succeeds. Since the probability that both fail in a given round is (2/3)(3/4) = 1/2,

P(E) = Σ_{j=0}^∞ (1/3)(1/2)^j = (1/3) · 1/(1 − 1/2) = 2/3
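
Since the closed-form answer above is easy to get wrong, here is a simulation sketch (mine, under the stated assumptions: person 1 shoots first, attempts independent); its output should hover near 2/3.

```python
import random

# Simulate the alternating-shots game and estimate
# P(person 1 succeeds before person 2).
random.seed(1)

def person1_wins() -> bool:
    while True:
        if random.random() < 1 / 3:   # person 1's attempt succeeds
            return True
        if random.random() < 1 / 4:   # person 2's attempt succeeds
            return False

trials = 200_000
wins = sum(person1_wins() for _ in range(trials))
print(wins / trials)  # ≈ 0.667
```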

A summary of independence:

  1. A and B are independent if and only if P(AB) = P(A)P(B).
  2. Independence is sometimes assumed and sometimes derived.
  3. Disjoint events, each with positive probability, are not independent.

1.6 Conditional Probability

If P(B) > 0, we can define the probability of A given that B has occurred, as follows.

1.12 Definition of conditional probability

If P(B) > 0, then the conditional probability of A given B is:

P(A|B) = P(AB)/P(B)

Think of P(A|B) as the fraction of times A occurs among those occasions on which B occurs.

For any fixed B with P(B) > 0, P(·|B) is a probability, that is, it satisfies the three axioms: P(A|B) ≥ 0, P(Ω|B) = 1, and if A1, A2, … are disjoint, then

P(⋃_{i=1}^∞ Ai | B) = Σ_{i=1}^∞ P(Ai | B)

But in general P(A|B ∪ C) ≠ P(A|B) + P(A|C): the rules of probability apply to the events on the left of the bar, not to those on the right.

It is also in general not true that P(A|B) = P(B|A), and people are often confused by this. For example, if you have measles, the probability that you have spots (erythema) is 1; but given that you have spots, the probability that you have measles is not 1.

In this case the difference between P(A|B) and P(B|A) is obvious, but in other situations it can be subtle. The confusion arises often enough in legal cases that it is sometimes called the prosecutor's fallacy.

1.13 Example

A test for a disease D reports either + or −, with the following joint probabilities:

        D       D^c
  +    .009    .099
  −    .001    .891

According to the definition of conditional probability, we have

P(+|D) = P(+ ∩ D)/P(D) = .009/(.009 + .001) = .9

P(−|D^c) = P(− ∩ D^c)/P(D^c) = .891/(.891 + .099) = .9

Evidently the test is fairly accurate: sick people test positive with probability .9, and healthy people test negative with probability .9.

Suppose you take the test and it comes back positive. What is the probability that you have the disease? Most people answer about 90%, but the correct answer is:

P(D|+)=P(D∩+)/P(+)=.009/(.009+.099)≈.08

The lesson here is to compute the answer numerically rather than trusting intuition.
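
Following that advice, the arithmetic is a short computation. This sketch (my own) recomputes P(D|+) from the joint table above.

```python
# Compute P(D | +) from the joint probability table above.
joint = {("+", "D"): .009, ("+", "Dc"): .099,
         ("-", "D"): .001, ("-", "Dc"): .891}

p_plus = joint[("+", "D")] + joint[("+", "Dc")]   # P(+) = .108
print(joint[("+", "D")] / p_plus)                 # P(D | +) ≈ 0.083
```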

The following lemma comes directly from the definition of conditional probability

1.14 Lemma

If A and B are independent events, then P(A|B) = P(A). Also, for any pair of events A and B,

P(AB)= P(A|B)P(B) = P(B|A)P(A)

The lemma gives another way to think about independence: B occurring does not change the probability of A. The formula P(AB) = P(A)P(B|A) is particularly useful for computing certain probabilities.

1.15 Example

Draw two cards from a deck, without replacement. Let A be the event that the first card is the Ace of Clubs, and let B be the event that the second card is the Queen of Diamonds. Then P(AB) = P(A)P(B|A) = (1/52) × (1/51).

A summary of conditional probability:

  1. If P(B) > 0, then P(A|B) = P(AB)/P(B).
  2. For fixed B, P(·|B) satisfies the axioms of probability; for fixed A, P(A|·) does not.
  3. In general, P(A|B) ≠ P(B|A).
  4. A and B are independent if and only if P(A|B) = P(A).

1.7 Bayes' Theorem

Bayes' theorem is the basis of expert systems and Bayes' nets, which are introduced in Chapter 17. First we need a preliminary result.

1.16 The Law of Total Probability

Let A1, …, Ak be a partition of Ω. Then, for any event B,

P(B) = Σ_{i=1}^k P(B|Ai)P(Ai)

Proof: Define Cj = B ∩ Aj and note that C1, …, Ck are disjoint with B = ⋃_{j=1}^k Cj. Hence

P(B) = Σ_j P(Cj) = Σ_j P(B ∩ Aj) = Σ_j P(B|Aj)P(Aj)

since P(B ∩ Aj) = P(B|Aj)P(Aj) by the definition of conditional probability. ∎

1.17 Bayes' Theorem

Let A1, …, Ak be a partition of Ω with P(Ai) > 0 for each i. If P(B) > 0, then for each i = 1, …, k,

P(Ai|B) = P(B|Ai)P(Ai) / Σ_{j=1}^k P(B|Aj)P(Aj)

1.18 Remark

We call P(Ai) the prior probability of Ai, and P(Ai|B) the posterior probability of Ai.

Proof: We apply the definition of conditional probability twice:

P(Ai|B) = P(AiB)/P(B) = P(B|Ai)P(Ai)/P(B)

and then apply the law of total probability to P(B). ∎

1.19 Example

I divide my email into three categories: A1 = "spam", A2 = "low priority", and A3 = "high priority". From previous experience I find P(A1) = .7, P(A2) = .2, P(A3) = .1.

Of course, .7 + .2 + .1 = 1. Let B be the event that an email contains the word "free". From previous experience, P(B|A1) = .9, P(B|A2) = .01, P(B|A3) = .01.

Note: .9 + .01 + .01 ≠ 1.

Now I receive an email with the word "free" in it. What is the probability that it is spam?

According to Bayes’ theorem:

P(A1|B) = (.9 × .7) / ((.9 × .7) + (.01 × .2) + (.01 × .1)) = .995
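
The same computation in code (a sketch; the category names are my own shorthand):

```python
# Bayes' theorem for the spam example: posterior P(A1 | B).
prior = {"spam": .7, "low": .2, "high": .1}          # P(Ai)
likelihood = {"spam": .9, "low": .01, "high": .01}   # P(B | Ai)

p_B = sum(prior[c] * likelihood[c] for c in prior)   # law of total probability
posterior = {c: prior[c] * likelihood[c] / p_B for c in prior}
print(round(posterior["spam"], 3))                   # 0.995
```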

End of Chapter 1.

The bibliographic remarks, appendix, and exercises have not been translated.


Origin blog.csdn.net/xiaowanbiao123/article/details/132861817