Statistics (2) Random Variables
Update history

| # | date | content |
|---|------|---------|
| 1 | 2023-9-19 | Correction: Example 2.46 |
| 2 | 2023-9-25 | Correction: the function should be described as unbounded |
Contents of this chapter
- Introduction
- Distribution Functions and Probability Functions
- Some Important Discrete Random Variables
- Some Important Continuous Random Variables
- Bivariate Distributions
- Marginal Distributions
- Independent Random Variables
- Conditional Distributions
- Multivariate Distributions and IID Samples
- Two Important Multivariate Distributions
- Transformations of Random Variables
- Transformations of Several Random Variables
Since some key terms may read differently in translation, they are collected here:
1. Distribution function
2. Probability function
3. Discrete random variable
4. Continuous random variable
5. Bivariate distribution
6. Marginal distribution
7. Conditional distribution
8. Multivariate distribution
9. Cumulative distribution function (CDF)
10. Normalized
11. Probability function
12. Probability mass function
13. Probability density function
14. Continuous
15. Quantile function
16. First quartile
17. Second quartile
18. Third quartile
19. Median
20. Equal in distribution
21. Point mass distribution
22. Discrete uniform distribution
23. Bernoulli distribution
24. Binomial distribution
25. Geometric distribution
26. Poisson distribution
27. Normal distribution
28. Gaussian distribution
29. Standard normal distribution
30. Exponential distribution
31. Gamma distribution
32. Beta distribution
33. Cauchy distribution
34. Marginal distributions
35. Marginal mass function
36. Marginal density function
37. Conditional probability mass function
38. Conditional probability density function
39. Random vector
40. Multinomial distribution
41. Multivariate normal distribution
2.1 Introduction
Statistics and data mining are concerned with data. How do we connect the sample space and events to data? That connection is provided by random variables.
2.1 Definition of random variables
A random variable is a mapping $X : \Omega \to \mathbb{R}$ that assigns a real number $X(\omega)$ to each outcome $\omega$.
At a certain point in most probability courses, the sample space is rarely mentioned anymore and one works directly with random variables. But you should keep in mind that the sample space really exists, lurking behind the random variables.
2.2 Example
Toss a coin ten times, let X(ω) be the number of heads in the sequence ω, for example, ω=HHTHHTHHTT, then X(ω)=6.
2.3 Example
Let $\Omega = \{(x, y) :\ x^2 + y^2 \le 1\}$ be the unit disc and pick a point at random from $\Omega$ (we will make this idea precise later); a typical outcome has the form $\omega = (x, y)$. Some examples of random variables are $X(\omega) = x$, $Y(\omega) = y$, $Z(\omega) = x + y$, or $W(\omega) = \sqrt{x^2 + y^2}$.
Given a random variable X and a subset A of the real line, define $X^{-1}(A) = \{\omega \in \Omega :\ X(\omega) \in A\}$ and let
$$P(X \in A) = P(X^{-1}(A)) = P(\{\omega \in \Omega :\ X(\omega) \in A\})$$
$$P(X = x) = P(X^{-1}(x)) = P(\{\omega \in \Omega :\ X(\omega) = x\}).$$
Note: $X$ denotes the random variable and $x$ denotes a particular value of $X$.
2.4 Example
Toss a coin twice and let X be the number of heads, then P(X=0)=P({TT})=1/4, P(X=1)=P({HT,TH})= 1/2, P(X=2)=P({HH})=1/4, random variables and their distribution can be summarized as follows:
| $\omega$ | $P(\{\omega\})$ | $X(\omega)$ |
|----------|-----------------|-------------|
| TT       | 1/4             | 0           |
| TH       | 1/4             | 1           |
| HT       | 1/4             | 1           |
| HH       | 1/4             | 2           |

| $x$ | $P(X=x)$ |
|-----|----------|
| 0   | 1/4      |
| 1   | 1/2      |
| 2   | 1/4      |
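The tables above can be reproduced by enumerating the sample space directly; a quick sketch using only the Python standard library:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two fair coin tosses; each outcome has probability 1/4.
omega = list(product("HT", repeat=2))
prob = Fraction(1, len(omega))

# X(w) = number of heads; accumulate P(X = x) over outcomes.
dist = Counter()
for w in omega:
    dist[w.count("H")] += prob

print(sorted(dist.items()))
# [(0, Fraction(1, 4)), (1, Fraction(1, 2)), (2, Fraction(1, 4))]
```

Using exact fractions rather than floats keeps the probabilities identical to the table entries.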
2.2 Distribution functions and Probability functions
Given a random variable X, we define its cumulative distribution function (or distribution function) as follows.
2.5 CDF definition
The cumulative distribution function, or CDF, is the function $F_X : \mathbb{R} \to [0,1]$ defined by
$$F_X(x) = P(X \le x).$$
Later, we will see that the CDF effectively contains all the information about the random variable. Sometimes we write $F$ instead of $F_X$.
2.6 Example
Toss a fair coin twice and let X be the number of heads. Then $P(X=0)=P(X=2)=1/4$ and $P(X=1)=1/2$. The distribution function is
$$F_X(x)=\begin{cases}0 & x<0\\ 1/4 & 0\le x<1\\ 3/4 & 1\le x<2\\ 1 & x\ge 2.\end{cases}$$
The corresponding function graph is as follows:
Although this example is very simple, study it carefully because the properties of CDF can be confusing.
Note that this function is right-continuous and non-decreasing, and that it is defined for all real numbers $x$, even though X only takes the values 0, 1, and 2. Do you see why $F_X(1.4) = 3/4$?
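To make right-continuity and the "defined for all reals" point concrete, here is a small sketch of this CDF as a plain Python function (the case boundaries follow the piecewise values $P(X=0)=1/4$, $P(X=1)=1/2$, $P(X=2)=1/4$):

```python
def cdf(x: float) -> float:
    """CDF of X = number of heads in two fair coin tosses."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.25
    if x < 2:
        return 0.75
    return 1.0

# Defined for every real x, not only the values X can take:
print(cdf(1.4))   # 0.75
print(cdf(-3.0))  # 0.0
print(cdf(1.0))   # 0.75  (right-continuous at the jump)
```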
The following theorem shows that CDF completely determines the distribution of random variables
2.7 Theorem
Suppose X has CDF F and Y has CDF G. If $F(x) = G(x)$ for all $x$, then $P(X \in A) = P(Y \in A)$ for all $A$.
Translator's Note: The above theorem can be regarded as CDF determining the probability distribution
2.8 Theorem
A function $F : \mathbb{R} \to [0,1]$ is the CDF of some probability P if and only if F satisfies the following three conditions:
- F is non-decreasing: if $x_1 < x_2$, then $F(x_1) \le F(x_2)$;
- F is normalized: $\lim_{x\to-\infty}F(x)=0$ and $\lim_{x\to\infty}F(x)=1$;
- F is right-continuous: $F(x) = F(x^+)$ for all $x$, where $F(x^+) = \lim_{y\downarrow x}F(y)$.
Proof:
Suppose F is a CDF; let us show that the third condition holds.
Let $x$ be a real number and let $y_1, y_2, \ldots$ be a sequence of real numbers satisfying $y_1 > y_2 > \cdots$ and $\lim_i y_i = x$.
Let $A_i = (-\infty, y_i]$ and $A = (-\infty, x]$. Then $A = \bigcap_{i=1}^{\infty} A_i$ and $A_1 \supset A_2 \supset \cdots$.
Because the sequence of events is monotone, $\lim_i P(A_i) = P(\bigcap_i A_i)$.
Therefore,
$$F(x) = P(A) = P\Big(\bigcap_i A_i\Big) = \lim_i P(A_i) = \lim_i F(y_i) = F(x^+).$$
This proves right-continuity; showing the first two conditions is similar.
To prove the other direction — that if F satisfies the three conditions then F is the CDF of some probability P — requires deeper tools from analysis. ∎
2.9 Definition of probability function or probability mass function
X is discrete if it takes countably many values $\{x_1, x_2, \ldots\}$. The probability function or probability mass function of X is defined as
$$f_X(x) = P(X = x).$$
Thus $f_X(x) \ge 0$ for all $x \in \mathbb{R}$ and $\sum_i f_X(x_i) = 1$. Sometimes we simply write $f$ instead of $f_X$.
The CDF is related to $f_X$ by
$$F_X(x) = P(X \le x) = \sum_{x_i \le x} f_X(x_i).$$
2.10 Example
The probability function for Example 2.6 is
$$f_X(x) = \begin{cases}1/4 & x=0\\ 1/2 & x=1\\ 1/4 & x=2\\ 0 & \text{otherwise.}\end{cases}$$
2.11 Definition of probability density function
A random variable X is continuous if there exists a function $f_X$ such that $f_X(x) \ge 0$ for all $x$, $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, and for every $a \le b$,
$$P(a < X < b) = \int_a^b f_X(x)\,dx.$$
The function $f_X$ is called the probability density function (PDF). We then have
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$$
and $f_X(x) = F'_X(x)$ at all points $x$ at which $F_X$ is differentiable.
Sometimes we write $\int f(x)\,dx$ or $\int f$ to mean $\int_{-\infty}^{\infty} f(x)\,dx$.
2.12 Example
Suppose the random variable X has PDF
$$f_X(x) = \begin{cases}1 & 0\le x\le 1\\ 0 & \text{otherwise.}\end{cases}$$
Clearly $f_X(x) \ge 0$ and $\int f_X(x)\,dx = 1$. A random variable with this PDF is said to have a Uniform(0,1) distribution; intuitively, it corresponds to picking a point at random from the interval [0,1]. The CDF is
$$F_X(x) = \begin{cases}0 & x<0\\ x & 0\le x\le 1\\ 1 & x>1.\end{cases}$$
As shown below:
2.13 Example
Suppose the random variable X has the PDF given below (the formula was not preserved in the source).
Because $\int f(x)\,dx = 1$, this is a well-defined PDF.
Warning: continuous random variables can be confusing.
First, note that if X is continuous then $P(X = x) = 0$ for every $x$. Do not try to think of $f(x)$ as $P(X = x)$; that only works for discrete random variables.
Second, note that a PDF can be greater than 1 (unlike a probability mass function). For example, $f(x) = 5$ for $x \in [0, 1/5]$ and 0 otherwise satisfies $f(x) \ge 0$ and $\int f(x)\,dx = 1$, so it is a well-defined PDF even though $f(x) = 5$ on part of its range. In fact, a PDF can even be unbounded: $f(x) = (2/3)\,x^{-1/3}$ for $0 < x < 1$ and 0 otherwise also satisfies $f(x) \ge 0$ and $\int f(x)\,dx = 1$, so it is a well-defined PDF, but it is an unbounded function.
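A quick numerical check of the unbounded example: the midpoint rule below never evaluates $f(x) = (2/3)x^{-1/3}$ at the singularity $x = 0$, and still recovers a total probability of 1 (the grid size is an arbitrary choice):

```python
# Integrate f(x) = (2/3) * x**(-1/3) over (0, 1) with the midpoint rule;
# the rule never touches x = 0, where f blows up.
n = 200_000
h = 1.0 / n
total = sum((2.0 / 3.0) * ((i + 0.5) * h) ** (-1.0 / 3.0) * h for i in range(n))
print(total)  # close to 1 even though f is unbounded near 0
```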
2.14 Example
Suppose $f(x) = 0$ for $x < 0$ and $f(x) = 1/(1+x)$ for $x \ge 0$. This is not a PDF because
$$\int f(x)\,dx = \int_0^{\infty}\frac{dx}{1+x} = \log(\infty) = \infty.$$
2.15 Lemma
Let F be the CDF of a random variable X. Then:
- $P(X = x) = F(x) - F(x^-)$, where $F(x^-) = \lim_{y\uparrow x}F(y)$;
- $P(x < X \le y) = F(y) - F(x)$;
- $P(X > x) = 1 - F(x)$;
- if X is continuous, then $$F(b)-F(a) = P(a<X<b) = P(a\le X<b) = P(a<X\le b) = P(a\le X\le b).$$
This is useful for defining the inverse function (or quantile function) of a CDF.
2.16 Definition of the inverse function or quantile function of CDF
Suppose X is a random variable with CDF F. The inverse CDF or quantile function is defined by
$$F^{-1}(q) = \inf\{x :\ F(x) > q\}, \quad q \in [0,1].$$
If F is strictly increasing and continuous, then $F^{-1}(q)$ is the unique real number $x$ such that $F(x) = q$.
We call $F^{-1}(1/4)$ the first quartile, $F^{-1}(1/2)$ the median (or second quartile), and $F^{-1}(3/4)$ the third quartile.
Two random variables X and Y are equal in distribution, written $X \stackrel{d}{=} Y$, if $F_X(x) = F_Y(x)$ for all $x$. This does not mean that X and Y are equal; it only means that they have the same distribution. For example, suppose $P(X=1)=P(X=-1)=1/2$ and let $Y=-X$. Then $P(Y=1)=P(Y=-1)=1/2$, so $X \stackrel{d}{=} Y$, but X and Y are not equal: in fact, $P(X=Y)=0$.
2.3 Some important discrete random variables
A warning about notation: we write $X \sim F$ to mean that the random variable X has distribution F. Read it as "X is distributed as F", not as "X is approximately F".
Point mass distribution: X has a point mass distribution at a, written $X \sim \delta_a$, if $P(X = a) = 1$. Then
$$F(x) = \begin{cases}0 & x<a\\ 1 & x\ge a\end{cases}$$
and the probability mass function is $f(x) = 1$ for $x = a$ and 0 otherwise.
Discrete uniform distribution: let $k > 1$ be an integer and suppose X has probability mass function
$$f(x) = \begin{cases}1/k & x = 1, \ldots, k\\ 0 & \text{otherwise.}\end{cases}$$
Then we say X has a uniform distribution on $\{1, \ldots, k\}$.
Bernoulli distribution: let X represent a coin flip with $P(X=1)=p$ and $P(X=0)=1-p$, where $p \in [0,1]$. We say X has a Bernoulli distribution, written $X \sim \text{Bernoulli}(p)$. Its probability function is $f(x) = p^x(1-p)^{1-x}$ for $x \in \{0,1\}$.
Binomial distribution: suppose a coin lands heads with probability p. Toss it n times and let X be the number of heads, assuming the tosses are independent. Let $f(x) = P(X = x)$ be its mass function. Then
$$f(x) = \begin{cases}\binom{n}{x}p^x(1-p)^{n-x} & x = 0, \ldots, n\\ 0 & \text{otherwise.}\end{cases}$$
A random variable with this mass function is called a binomial random variable, written $X \sim \text{Binomial}(n, p)$. If $X_1 \sim \text{Binomial}(n_1, p)$ and $X_2 \sim \text{Binomial}(n_2, p)$ are independent, then $X_1 + X_2 \sim \text{Binomial}(n_1 + n_2, p)$.
Warning: let us take this opportunity to prevent some confusion. X denotes a random variable and x denotes a particular value of that random variable; n and p are parameters, that is, fixed real numbers. The parameter p is usually unknown and must be estimated from data; that is what statistical inference is all about. In most statistical models there are both random variables and parameters, so do not confuse them.
Geometric distribution: X has a geometric distribution with parameter $p \in (0,1)$, written $X \sim \text{Geom}(p)$, if
$$P(X = k) = p(1-p)^{k-1}, \quad k = 1, 2, \ldots$$
We have
$$\sum_{k=1}^{\infty} P(X=k) = p\sum_{k=1}^{\infty}(1-p)^{k-1} = \frac{p}{1-(1-p)} = 1.$$
Think of X as the number of flips needed until the first head when flipping a coin.
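A simulation sketch of both facts (the value p = 0.3, the seed, and the sample sizes are arbitrary choices): the probabilities $p(1-p)^{k-1}$ sum to 1, and simulating "toss until the first heads" reproduces $P(X=1)=p$.

```python
import random

p = 0.3
# Partial sum of the geometric series; the truncated tail is negligible.
partial = sum(p * (1 - p) ** (k - 1) for k in range(1, 200))
print(partial)  # very close to 1

random.seed(0)

def first_heads() -> int:
    """Number of tosses until the first heads, heads probability p."""
    k = 1
    while random.random() >= p:  # this toss came up tails
        k += 1
    return k

n = 100_000
freq1 = sum(first_heads() == 1 for _ in range(n)) / n
print(freq1)  # close to p = 0.3
```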
Poisson distribution: X has a Poisson distribution with parameter λ > 0, written $X \sim \text{Poisson}(\lambda)$, if its probability mass function is
$$f(x) = e^{-\lambda}\frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
Note that
$$\sum_{x=0}^{\infty} f(x) = e^{-\lambda}\sum_{x=0}^{\infty}\frac{\lambda^x}{x!} = e^{-\lambda}e^{\lambda} = 1.$$
The Poisson distribution is often used as a model for counts of rare events, such as radioactive decay and traffic accidents. If $X_1 \sim \text{Poisson}(\lambda_1)$ and $X_2 \sim \text{Poisson}(\lambda_2)$ are independent, then $X_1 + X_2 \sim \text{Poisson}(\lambda_1 + \lambda_2)$.
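The additivity claim can be checked numerically by convolving two Poisson mass functions (the λ values and truncation point below are arbitrary choices):

```python
import math

def pois_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam1, lam2 = 2.0, 3.5
N = 60  # truncation; the Poisson(5.5) tail beyond k = 60 is negligible

# P(X1 + X2 = k) by direct convolution of the two mass functions
conv = [sum(pois_pmf(j, lam1) * pois_pmf(k - j, lam2) for j in range(k + 1))
        for k in range(N)]
direct = [pois_pmf(k, lam1 + lam2) for k in range(N)]

err = max(abs(a - b) for a, b in zip(conv, direct))
print(err)  # floating-point noise only
```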
Warning: we defined a random variable as a mapping from a sample space Ω to the reals ℝ, yet we made no mention of sample spaces in the distributions above. As we said earlier, the sample space often "disappears", but it is really there in the background. Let us construct a Bernoulli random variable explicitly. Let $\Omega = [0,1]$ and define P by $P([a,b]) = b - a$ for $0 \le a \le b \le 1$. Fix $p \in [0,1]$ and define
$$X(\omega) = \begin{cases}1 & \omega \le p\\ 0 & \omega > p.\end{cases}$$
Then $P(X=1) = P(\omega \le p) = P([0,p]) = p$ and $P(X=0) = 1-p$, so $X \sim \text{Bernoulli}(p)$. We could do this for all the distributions above. In practice we think of a random variable like a random number, but formally it is a mapping defined on a sample space.
2.4 Some important continuous random variables
Uniform distribution: X has a Uniform(a, b) distribution, written $X \sim \text{Uniform}(a,b)$, if
$$f(x) = \begin{cases}\dfrac{1}{b-a} & x \in [a,b]\\ 0 & \text{otherwise}\end{cases}$$
where $a < b$. The distribution function is
$$F(x) = \begin{cases}0 & x<a\\ \dfrac{x-a}{b-a} & x\in[a,b]\\ 1 & x>b.\end{cases}$$
Normal (Gaussian) distribution: X has a normal (Gaussian) distribution with parameters μ and σ, written $X \sim N(\mu, \sigma^2)$, if
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, \quad x \in \mathbb{R}$$
where $\mu \in \mathbb{R}$ and $\sigma > 0$.
The parameter μ is the center (or mean) of the distribution and σ is the spread (or standard deviation); mean and standard deviation will be defined in the next chapter. The normal distribution plays an important role in probability and statistics, and many phenomena in nature are approximately normal. Later we will study the Central Limit Theorem, which says that the distribution of a sum of random variables can be approximated by a normal distribution.
If μ=0 and σ=1, we say X has a standard normal distribution. Traditionally, a standard normal random variable is denoted Z, and its PDF and CDF are denoted $\phi(z)$ and $\Phi(z)$. The PDF is plotted below.
Some useful facts:
- If $X \sim N(\mu, \sigma^2)$, then $Z = (X-\mu)/\sigma \sim N(0,1)$.
- If $Z \sim N(0,1)$, then $X = \mu + \sigma Z \sim N(\mu, \sigma^2)$.
- If $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, n$, are independent, then
$$\sum_{i=1}^{n} X_i \sim N\Big(\sum_{i=1}^{n}\mu_i,\ \sum_{i=1}^{n}\sigma_i^2\Big).$$
From the first fact it follows that if $X \sim N(\mu, \sigma^2)$, then
$$P(a < X < b) = \Phi\Big(\frac{b-\mu}{\sigma}\Big) - \Phi\Big(\frac{a-\mu}{\sigma}\Big).$$
Therefore, as long as we can compute the standard normal CDF, we can compute any normal probability. All statistical packages can compute $\Phi(z)$ and $\Phi^{-1}(q)$; most statistics texts, including this one, have a table of values.
2.17 Example
Suppose $X \sim N(3, 5)$ and we want $P(X > 1)$. The solution is
$$P(X>1) = 1 - P(X<1) = 1 - \Phi\Big(\frac{1-3}{\sqrt 5}\Big) = 1 - \Phi(-0.8944) \approx 0.81.$$
Now suppose we want $q = F^{-1}(0.2)$; that means we need to find q satisfying $P(X<q)=0.2$. We solve
$$0.2 = P(X<q) = \Phi\Big(\frac{q-3}{\sqrt 5}\Big).$$
From the standard normal table, $\Phi(-0.8416) = 0.2$. Therefore $\frac{q-3}{\sqrt 5} = -0.8416$, and we get $q = 3 - 0.8416\sqrt 5 = 1.1181$.
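This computation can be sketched with only the standard library, using the identity $\Phi(z) = (1 + \mathrm{erf}(z/\sqrt 2))/2$ and a bisection search for the quantile (the bracket [-10, 10] is an assumption that comfortably contains any quantile of interest here):

```python
import math

def phi_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 3.0, math.sqrt(5.0)   # X ~ N(3, 5)

# P(X > 1) = 1 - Phi((1 - mu) / sigma)
p = 1.0 - phi_cdf((1.0 - mu) / sigma)
print(round(p, 2))  # 0.81

def phi_inv(q: float, lo: float = -10.0, hi: float = 10.0) -> float:
    """Quantile of the standard normal by bisection on phi_cdf."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if phi_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

q = mu + sigma * phi_inv(0.2)
print(round(q, 4))  # 1.1181
```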
Exponential distribution: X has an exponential distribution with parameter β > 0, written $X \sim \text{Exp}(\beta)$, if
$$f(x) = \frac{1}{\beta}e^{-x/\beta}, \quad x > 0.$$
The exponential distribution is used to model the life cycle of electronic components, as well as the waiting time between rare events.
Gamma distribution: for α > 0, the gamma function is defined by $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1}e^{-y}\,dy$. X has a gamma distribution with parameters α > 0 and β > 0, written $X \sim \text{Gamma}(\alpha, \beta)$, if
$$f(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}x^{\alpha-1}e^{-x/\beta}, \quad x > 0.$$
The exponential distribution is just a Gamma(1, β) distribution. If $X_i \sim \text{Gamma}(\alpha_i, \beta)$ are independent, then $\sum_i X_i \sim \text{Gamma}\big(\sum_i \alpha_i, \beta\big)$.
Beta distribution: X has a beta distribution with parameters α > 0 and β > 0, written $X \sim \text{Beta}(\alpha, \beta)$, if
$$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1.$$
t and Cauchy distribution: X has a t distribution with ν degrees of freedom, written $X \sim t_{\nu}$, if
$$f(x) = \frac{\Gamma\big(\frac{\nu+1}{2}\big)}{\Gamma\big(\frac{\nu}{2}\big)\sqrt{\nu\pi}}\Big(1 + \frac{x^2}{\nu}\Big)^{-(\nu+1)/2}.$$
The t distribution is similar to the normal but has thicker tails; in fact, the normal corresponds to the t distribution with ν = ∞. The Cauchy distribution is the special case with ν = 1. Its probability density function is
$$f(x) = \frac{1}{\pi(1+x^2)}.$$
To see that this is indeed a density function, compute its integral:
$$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{dx}{1+x^2} = \frac{1}{\pi}\big[\arctan x\big]_{-\infty}^{\infty} = \frac{1}{\pi}\Big(\frac{\pi}{2} + \frac{\pi}{2}\Big) = 1.$$
χ² distribution (chi-squared distribution): X has a χ² distribution with p degrees of freedom, written $X \sim \chi^2_p$, if
$$f(x) = \frac{1}{\Gamma(p/2)\,2^{p/2}}\,x^{p/2-1}e^{-x/2}, \quad x > 0.$$
If $Z_1, \ldots, Z_p$ are independent standard normal random variables, then $\sum_{i=1}^{p} Z_i^2 \sim \chi^2_p$.
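A simulation sketch of the last fact: sum squares of p standard normals and check the sample mean against p. (This uses the fact, covered in the next chapter, that the chi-squared mean equals its degrees of freedom; the seed and sizes are arbitrary.)

```python
import random

random.seed(1)
p = 4          # degrees of freedom
n = 100_000    # simulated draws

# Each draw is a sum of squares of p independent standard normals.
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(p)) for _ in range(n)]
mean = sum(draws) / n
print(mean)  # close to p = 4
```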
2.5 Bivariate Distributions
Given a pair of discrete random variables X and Y, define the joint mass function by $f(x,y) = P(X = x, Y = y)$. From now on we write $P(X = x, Y = y)$ simply as $f(x,y)$, and write $f_{X,Y}$ when more explicit notation is needed.
2.18 Example
Here is a bivariate distribution for two random variables X and Y, each taking the values 0 or 1.

|     | Y=0 | Y=1 |     |
|-----|-----|-----|-----|
| X=0 | 1/9 | 2/9 | 1/3 |
| X=1 | 2/9 | 4/9 | 2/3 |
|     | 1/3 | 2/3 | 1   |

Thus, $f(1,1)=P(X=1, Y=1)=4/9$.
2.19 PDF definition of two-dimensional random variables
In the continuous case, we call a function f(x, y) a PDF for the random variables (X, Y) if:
- $f(x,y) \ge 0$ for all $(x,y)$;
- $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$;
- for any set $A \subset \mathbb{R}\times\mathbb{R}$, $P((X,Y)\in A) = \iint_A f(x,y)\,dx\,dy$.
In both the discrete and continuous cases, we define the joint CDF as $F_{X,Y}(x,y) = P(X\le x, Y\le y)$.
2.20 Example
Suppose (X, Y) is uniform on the unit square, so that
$$f(x,y) = \begin{cases}1 & 0\le x\le 1,\ 0\le y\le 1\\ 0 & \text{otherwise.}\end{cases}$$
Find $P(X<1/2, Y<1/2)$.
The event $A=\{X<1/2, Y<1/2\}$ corresponds to a subset of the unit square. Integrating f over this subset amounts to computing the area of A, which is 1/4. So $P(X<1/2, Y<1/2)=1/4$.
2.21 Example
Suppose (X, Y) has the probability density function
$$f(x,y) = \begin{cases}x+y & 0\le x\le 1,\ 0\le y\le 1\\ 0 & \text{otherwise.}\end{cases}$$
Then
$$\int_0^1\!\!\int_0^1 (x+y)\,dx\,dy = \int_0^1\Big(\frac12 + y\Big)dy = \frac12 + \frac12 = 1,$$
which verifies that f(x, y) is a PDF.
2.22 Example
If the distribution is defined over a non-rectangular region, the calculations are a bit more complicated. Here is an example, cited in DeGroot and Schervish (2002). Suppose (X, Y) has density
$$f(x,y) = \begin{cases}c\,x^2 y & x^2 \le y \le 1\\ 0 & \text{otherwise.}\end{cases}$$
Note that $-1 \le x \le 1$. Now let us find the value of c.
The key is the range of integration. We choose one variable, say x, and let it vary over its range; then, for each fixed x, we let y vary over its range, namely $x^2 \le y \le 1$. The figure below may help.
Therefore,
$$1 = \iint f(x,y)\,dy\,dx = c\int_{-1}^{1}\int_{x^2}^{1} x^2 y\,dy\,dx = c\int_{-1}^{1} x^2\,\frac{1-x^4}{2}\,dx = \frac{4c}{21},$$
hence $c = 21/4$.
Now let us compute $P(X \ge Y)$. The corresponding set is $A=\{(x,y):\ 0\le x\le 1,\ x^2\le y\le x\}$. Therefore,
$$P(X\ge Y) = \frac{21}{4}\int_0^1\!\!\int_{x^2}^{x} x^2 y\,dy\,dx = \frac{21}{8}\int_0^1 x^2(x^2-x^4)\,dx = \frac{21}{8}\Big(\frac15-\frac17\Big) = \frac{3}{20}.$$
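Both numbers can be verified with a crude midpoint-rule double integral over the region $x^2 \le y \le 1$ (the grid size is an arbitrary choice):

```python
# Check that c = 21/4 normalizes f(x, y) = c * x**2 * y on x**2 <= y <= 1,
# and that the probability mass on {x >= y} is 3/20.
n = 1000
hx, hy = 2.0 / n, 1.0 / n   # x ranges over [-1, 1], y over [0, 1]
c = 21.0 / 4.0

total = prob = 0.0
for i in range(n):
    x = -1.0 + (i + 0.5) * hx
    for j in range(n):
        y = (j + 0.5) * hy
        if x * x <= y:
            w = c * x * x * y * hx * hy
            total += w
            if x >= y:
                prob += w

print(total)  # close to 1
print(prob)   # close to 3/20 = 0.15
```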
2.6 Marginal Distributions
2.23 Definition
If (X, Y) have a joint distribution with mass function $f_{X,Y}$, the marginal mass function of X is defined by
$$f_X(x) = P(X=x) = \sum_y P(X=x, Y=y) = \sum_y f(x,y)$$
and the marginal mass function of Y by
$$f_Y(y) = P(Y=y) = \sum_x P(X=x, Y=y) = \sum_x f(x,y).$$
2.24 Example
Suppose $f_{X,Y}$ is given by the table below. The marginal distribution of X is the row totals, and the marginal distribution of Y is the column totals.

|     | Y=0  | Y=1  |      |
|-----|------|------|------|
| X=0 | 1/10 | 2/10 | 3/10 |
| X=1 | 3/10 | 4/10 | 7/10 |
|     | 4/10 | 6/10 | 1    |

For example, $f_X(0) = 3/10$ and $f_X(1) = 7/10$.
2.25 Definition
For continuous random variables, the marginal densities are defined by
$$f_X(x) = \int f(x,y)\,dy, \qquad f_Y(y) = \int f(x,y)\,dx.$$
The corresponding marginal distribution functions are denoted $F_X$ and $F_Y$.
2.26 Example
Suppose $f_{X,Y}(x,y) = e^{-(x+y)}$ for $x, y \ge 0$. Then
$$f_X(x) = e^{-x}\int_0^{\infty} e^{-y}\,dy = e^{-x}.$$
2.27 Example
Suppose
$$f(x,y) = \begin{cases}x+y & 0\le x\le 1,\ 0\le y\le 1\\ 0 & \text{otherwise.}\end{cases}$$
Then
$$f_Y(y) = \int_0^1 (x+y)\,dx = \frac12 + y.$$
2.28 Example
Let (X, Y) have the density function
$$f(x,y) = \begin{cases}\dfrac{21}{4}x^2 y & x^2\le y\le 1\\ 0 & \text{otherwise.}\end{cases}$$
Then the marginal density of X is
$$f_X(x) = \int f(x,y)\,dy = \frac{21}{4}x^2\int_{x^2}^{1} y\,dy = \frac{21}{8}x^2(1-x^4), \quad -1\le x\le 1.$$
2.7 Independent Random Variable
2.29 Definition
Two random variables X and Y are independent if for every pair of sets A and B,
$$P(X\in A,\ Y\in B) = P(X\in A)\,P(Y\in B),$$
written X ⫫ Y. Otherwise, we say X and Y are dependent.
In principle, to check whether X and Y are independent we would need to check the equation above for all subsets A and B. Fortunately, we have the following results, stated for continuous random variables but valid for discrete ones as well.
2.30 Theorem
Suppose X and Y have joint PDF $f_{X,Y}$. Then X ⫫ Y if and only if $f_{X,Y}(x,y) = f_X(x)f_Y(y)$ for all values x and y.
2.31 Example
Suppose X and Y have the distribution

|     | Y=0 | Y=1 |     |
|-----|-----|-----|-----|
| X=0 | 1/4 | 1/4 | 1/2 |
| X=1 | 1/4 | 1/4 | 1/2 |
|     | 1/2 | 1/2 | 1   |

Then $f_X(0)=f_X(1)=1/2$ and $f_Y(0)=f_Y(1)=1/2$. X and Y are independent because $f_X(0)f_Y(0)=f(0,0)=1/4$, $f_X(0)f_Y(1)=f(0,1)=1/4$, $f_X(1)f_Y(0)=f(1,0)=1/4$, and $f_X(1)f_Y(1)=f(1,1)=1/4$.
Now suppose instead that X and Y have the distribution

|     | Y=0 | Y=1 |     |
|-----|-----|-----|-----|
| X=0 | 1/2 | 0   | 1/2 |
| X=1 | 0   | 1/2 | 1/2 |
|     | 1/2 | 1/2 | 1   |

Then X and Y are not independent, because $f_X(0)f_Y(1) = \frac12\cdot\frac12 = \frac14$ but $f(0,1) = 0$.
2.32 Example
Suppose X and Y are independent and have the same density
$$f(x) = \begin{cases}2x & 0\le x\le 1\\ 0 & \text{otherwise.}\end{cases}$$
Let us find $P(X+Y \le 1)$. Using independence, the joint density is
$$f(x,y) = f_X(x)f_Y(y) = \begin{cases}4xy & 0\le x\le 1,\ 0\le y\le 1\\ 0 & \text{otherwise.}\end{cases}$$
Hence,
$$P(X+Y\le 1) = \iint_{x+y\le 1} f(x,y)\,dy\,dx = 4\int_0^1 x\Big[\int_0^{1-x} y\,dy\Big]dx = 4\int_0^1 x\,\frac{(1-x)^2}{2}\,dx = \frac16.$$
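A simulation sketch of this answer (seed and sample size arbitrary): a variable with density 2x on [0,1] has CDF $x^2$, so inverse-transform sampling gives $X = \sqrt U$.

```python
import math
import random

random.seed(2)
n = 200_000

count = 0
for _ in range(n):
    x = math.sqrt(random.random())  # inverse transform for CDF x**2
    y = math.sqrt(random.random())
    if x + y <= 1.0:
        count += 1

print(count / n)  # close to 1/6
```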
The following result helps to verify independence.
2.33 Theorem
Suppose the range of X and Y is a (possibly infinite) rectangle. If $f(x,y) = g(x)h(y)$ for some functions g and h (not necessarily probability density functions), then X and Y are independent.
2.34 Example
Let X and Y have the density
$$f(x,y) = \begin{cases}2e^{-(x+2y)} & x>0,\ y>0\\ 0 & \text{otherwise.}\end{cases}$$
The range of X and Y is the rectangle $(0,\infty)\times(0,\infty)$, and we can write $f(x,y)=g(x)h(y)$ with $g(x)=2e^{-x}$ and $h(y)=e^{-2y}$. Therefore X ⫫ Y.
2.8 Conditional Distribution
If X and Y are discrete, we can compute the conditional distribution of X given that Y = y. Specifically, $P(X=x \mid Y=y) = P(X=x, Y=y)/P(Y=y)$. This leads us to define the conditional probability mass function as follows.
2.35 Definition of conditional probability mass function
The conditional probability mass function is
$$f_{X\mid Y}(x\mid y) = P(X=x\mid Y=y) = \frac{P(X=x,\ Y=y)}{P(Y=y)} = \frac{f_{X,Y}(x,y)}{f_Y(y)}$$
provided $f_Y(y) > 0$.
For continuous distributions, we use the same definition. The difference in explanation is: in the discrete case, the conditional probability mass function is the conditional probability. In the continuous case, the probability must be obtained by integration.
2.36 Definition of conditional probability density function
For continuous random variables, the conditional probability density function is defined as
$$f_{X\mid Y}(x\mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$$
assuming $f_Y(y) > 0$. Then probabilities are obtained by integration:
$$P(X\in A\mid Y=y) = \int_A f_{X\mid Y}(x\mid y)\,dx.$$
2.37 Example
Suppose X and Y have a joint uniform distribution on the unit square. Then $f_{X\mid Y}(x\mid y) = 1$ for $0\le x\le 1$ and 0 otherwise: given Y = y, X is Uniform(0,1). We can write $X\mid Y=y \sim \text{Uniform}(0,1)$.
From the definition of the conditional density we see that $f_{X,Y}(x,y) = f_{X\mid Y}(x\mid y)\,f_Y(y) = f_{Y\mid X}(y\mid x)\,f_X(x)$. This is useful in some cases, such as Example 2.39.
2.38 Example
Suppose
$$f(x,y) = \begin{cases}x+y & 0\le x\le 1,\ 0\le y\le 1\\ 0 & \text{otherwise.}\end{cases}$$
Find $P(X < 1/4 \mid Y = 1/3)$.
From Example 2.27 we have $f_Y(y) = y + \frac12$. Hence,
$$f_{X\mid Y}(x\mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{x+y}{y+\frac12}.$$
So,
$$P\Big(X<\frac14\,\Big|\,Y=\frac13\Big) = \int_0^{1/4} f_{X\mid Y}\Big(x\,\Big|\,\frac13\Big)\,dx = \int_0^{1/4}\frac{x+\frac13}{\frac13+\frac12}\,dx = \frac{11}{80}.$$
2.39 Example
Suppose $X \sim \text{Uniform}(0,1)$, and that after obtaining the value X = x we generate $Y\mid X=x \sim \text{Uniform}(x, 1)$. What is the marginal density of Y?
First, $f_X(x) = 1$ for $0\le x\le 1$ and
$$f_{Y\mid X}(y\mid x) = \begin{cases}\dfrac{1}{1-x} & x<y<1\\ 0 & \text{otherwise,}\end{cases}$$
and therefore
$$f_{X,Y}(x,y) = f_{Y\mid X}(y\mid x)\,f_X(x) = \begin{cases}\dfrac{1}{1-x} & 0<x<y<1\\ 0 & \text{otherwise.}\end{cases}$$
The marginal density of Y is then
$$f_Y(y) = \int_0^{y}\frac{dx}{1-x} = \big[-\log(1-x)\big]_0^{y} = -\log(1-y), \quad 0<y<1.$$
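A simulation sketch of this two-stage construction, checked against the derived marginal via the antiderivative of $-\log(1-y)$ (seed and sample size are arbitrary):

```python
import math
import random

random.seed(3)
n = 200_000

# X ~ Uniform(0,1); then Y | X = x ~ Uniform(x, 1).
ys = []
for _ in range(n):
    x = random.random()
    ys.append(x + (1.0 - x) * random.random())

# From f_Y(y) = -log(1 - y): P(Y <= t) = (1 - t)*log(1 - t) + t.
t = 0.5
theory = (1.0 - t) * math.log(1.0 - t) + t
emp = sum(y <= t for y in ys) / n
print(theory)  # about 0.1534
print(emp)     # close to the same value
```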
2.40 Example
Consider the density in Example 2.28 and find $f_{Y\mid X}(y\mid x)$.
For $x^2 \le y \le 1$,
$$f_{Y\mid X}(y\mid x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{\frac{21}{4}x^2 y}{\frac{21}{8}x^2(1-x^4)} = \frac{2y}{1-x^4}.$$
Now let us compute $P(Y \ge 3/4 \mid X = 1/2)$:
$$P\Big(Y\ge\frac34\,\Big|\,X=\frac12\Big) = \int_{3/4}^{1}\frac{2y}{1-(1/2)^4}\,dy = \frac{16}{15}\Big(1-\frac{9}{16}\Big) = \frac{7}{15}.$$
2.9 Multivariate Distributions And IID
Let $X=(X_1, \ldots, X_n)$, where $X_1, \ldots, X_n$ are random variables; we call X a random vector. Let $f(x_1,\ldots,x_n)$ denote its PDF (or mass function). Marginal distributions, conditional distributions, and so on are defined just as in the bivariate case.
We say $X_1, \ldots, X_n$ are independent if for every $A_1, \ldots, A_n$,
$$P(X_1\in A_1, \ldots, X_n\in A_n) = \prod_{i=1}^{n} P(X_i\in A_i).$$
It suffices to check that $f(x_1,\ldots,x_n) = \prod_{i=1}^{n} f_{X_i}(x_i)$.
2.41 IID definition
If $X_1, \ldots, X_n$ are independent and each has the same CDF F, we say $X_1, \ldots, X_n$ are IID (independent and identically distributed), written $X_1, \ldots, X_n \sim F$.
If F has density f, we also write $X_1, \ldots, X_n \sim f$. We also call $X_1, \ldots, X_n$ a random sample of size n from F.
Much of statistical theory and practice is based on independently identically distributed (IID) observational data, and we will look at this in detail when we discuss statistics.
2.10 Two important multidimensional distributions
Multinomial distribution: the multivariate version of the binomial is the multinomial. Consider drawing a ball from a box containing balls of k different colors, labeled color 1, color 2, ..., color k. Let $p=(p_1,\ldots,p_k)$, where $p_j\ge 0$ and $\sum_{j=1}^{k}p_j=1$, and let $p_j$ be the probability that a drawn ball has color j. Draw n times (independent draws with replacement) and let $X=(X_1,\ldots,X_k)$, where $X_j$ is the number of times color j appears. Then $\sum_j X_j = n$, and we say X has a Multinomial(n, p) distribution, written $X \sim \text{Multinomial}(n, p)$. Its probability function is
$$f(x) = \binom{n}{x_1\cdots x_k}p_1^{x_1}\cdots p_k^{x_k},$$
where
$$\binom{n}{x_1\cdots x_k} = \frac{n!}{x_1!\cdots x_k!}.$$
2.42 Lemma
Suppose $X \sim \text{Multinomial}(n, p)$, where $X=(X_1,\ldots,X_k)$ and $p=(p_1,\ldots,p_k)$. The marginal distribution of $X_j$ is $\text{Binomial}(n, p_j)$.
Multivariate normal distribution: the univariate normal has two parameters, μ and σ. In the multivariate version, μ is a vector and σ is replaced by a matrix Σ.
To begin, let
$$Z = (Z_1, \ldots, Z_k)^T$$
where $Z_1, \ldots, Z_k \sim N(0,1)$ are independent. The density of Z is
$$f(z) = \prod_{i=1}^{k} f(z_i) = \frac{1}{(2\pi)^{k/2}}\exp\Big\{-\frac12\sum_{j=1}^{k} z_j^2\Big\} = \frac{1}{(2\pi)^{k/2}}\exp\Big\{-\frac12 z^T z\Big\}.$$
We say Z has a standard multivariate normal distribution, written $Z \sim N(0, I)$, where 0 denotes a vector of k zeros and I is the k×k identity matrix.
More generally, a vector X has a multivariate normal distribution, written $X \sim N(\mu, \Sigma)$, if its density is
$$f(x;\mu,\Sigma) = \frac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}}\exp\Big\{-\frac12(x-\mu)^T\Sigma^{-1}(x-\mu)\Big\},$$
where $|\Sigma|$ denotes the determinant of Σ, μ is a vector of length k, and Σ is a k×k symmetric positive definite matrix. Setting μ = 0 and Σ = I recovers the standard multivariate normal.
Since Σ is symmetric and positive definite, there exists a matrix $\Sigma^{1/2}$ — called the square root of Σ — with the following properties:
- $\Sigma^{1/2}$ is symmetric;
- $\Sigma^{1/2}\Sigma^{1/2} = \Sigma$;
- $\Sigma^{1/2}\Sigma^{-1/2} = \Sigma^{-1/2}\Sigma^{1/2} = I$, where $\Sigma^{-1/2} = (\Sigma^{1/2})^{-1}$.
2.43 Theorem
If $Z \sim N(0, I)$ and $X = \mu + \Sigma^{1/2}Z$, then $X \sim N(\mu, \Sigma)$. Conversely, if $X \sim N(\mu, \Sigma)$, then $\Sigma^{-1/2}(X-\mu) \sim N(0, I)$.
Suppose we partition a random normal vector X as $X = (X_a, X_b)$. We can similarly partition $\mu = (\mu_a, \mu_b)$ and
$$\Sigma = \begin{pmatrix}\Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb}\end{pmatrix}.$$
2.44 Theorem
Let $X \sim N(\mu, \Sigma)$. Then:
- the marginal distribution of $X_a$ is $X_a \sim N(\mu_a, \Sigma_{aa})$;
- the conditional distribution of $X_b$ given $X_a = x_a$ is
$$X_b \mid X_a=x_a \sim N\big(\mu_b + \Sigma_{ba}\Sigma_{aa}^{-1}(x_a-\mu_a),\ \Sigma_{bb} - \Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}\big);$$
- if a is a vector, then $a^T X \sim N(a^T\mu,\ a^T\Sigma a)$.
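Theorem 2.43 can be checked by simulation. For a 2×2 covariance matrix the symmetric square root has the closed form $(\Sigma + \sqrt{\det\Sigma}\,I)/\sqrt{\operatorname{tr}\Sigma + 2\sqrt{\det\Sigma}}$ (a consequence of the Cayley–Hamilton theorem); the values of μ and Σ below are arbitrary examples:

```python
import math
import random

random.seed(4)

mu = (1.0, -2.0)
Sigma = ((2.0, 0.6),
         (0.6, 1.0))

# Symmetric square root of a 2x2 SPD matrix.
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
s = math.sqrt(Sigma[0][0] + Sigma[1][1] + 2.0 * math.sqrt(det))
R = ((Sigma[0][0] + math.sqrt(det)) / s, Sigma[0][1] / s,
     Sigma[1][0] / s, (Sigma[1][1] + math.sqrt(det)) / s)

# X = mu + Sigma^(1/2) Z with Z ~ N(0, I)   (Theorem 2.43)
n = 100_000
xs = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xs.append((mu[0] + R[0] * z1 + R[1] * z2,
               mu[1] + R[2] * z1 + R[3] * z2))

m0 = sum(x[0] for x in xs) / n
cov01 = sum((x[0] - mu[0]) * (x[1] - mu[1]) for x in xs) / n
print(m0)     # close to mu[0] = 1.0
print(cov01)  # close to Sigma[0][1] = 0.6
```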
2.11 Transformation of random variables
Suppose X is a random variable with CDF $F_X$ and PDF $f_X$, and let $Y = r(X)$ be a function of X, for example $Y = X^2$ or $Y = e^X$. We call $Y = r(X)$ a transformation of X. In the discrete case, the mass function of Y is
$$f_Y(y) = P(Y=y) = P(r(X)=y) = P(\{x:\ r(x)=y\}) = P(X\in r^{-1}(y)).$$
2.45 Example
Suppose P(X=-1)=P(X=1)=1/4 and P(X=0)=1/2. Let $Y=X^2$. Then $P(Y=0)=P(X=0)=1/2$ and $P(Y=1)=P(X=1)+P(X=-1)=1/2$. Summarizing:

| $x$ | $f_X(x)$ |
|-----|----------|
| -1  | 1/4      |
| 0   | 1/2      |
| 1   | 1/4      |

| $y$ | $f_Y(y)$ |
|-----|----------|
| 0   | 1/2      |
| 1   | 1/2      |

Y takes fewer values than X because the transformation is not one-to-one.
The continuous case is harder. There are three steps for finding $f_Y$:
1. For each y, find the set $A_y = \{x :\ r(x) \le y\}$.
2. Find the CDF
$$F_Y(y) = P(Y\le y) = P(r(X)\le y) = P(\{x:\ r(x)\le y\}) = \int_{A_y} f_X(x)\,dx.$$
3. The PDF is the derivative of the CDF: $f_Y(y) = F'_Y(y)$.
2.46 Example
Let $f_X(x) = e^{-x}$ for $x>0$. Then $F_X(x) = \int_0^x f_X(s)\,ds = 1 - e^{-x}$. Let $Y = r(X) = \log X$. Then $A_y = \{x:\ x \le e^y\}$ and
$$F_Y(y) = P(Y\le y) = P(\log X\le y) = P(X\le e^y) = F_X(e^y) = 1 - e^{-e^y}.$$
Therefore, $f_Y(y) = e^y e^{-e^y}$ for $y \in \mathbb{R}$.
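A simulation sketch: sample X ~ Exp(1) by inverse transform, set Y = log X, and compare $P(Y \le 0)$ with the value $F_Y(0) = 1 - e^{-1}$ implied by the derived CDF (seed and sample size are arbitrary):

```python
import math
import random

random.seed(5)
n = 200_000

# X = -log(1 - U) ~ Exp(1); then Y = log(X).
ys = [math.log(-math.log(1.0 - random.random())) for _ in range(n)]

theory = 1.0 - math.exp(-1.0)        # F_Y(0) = 1 - e**(-e**0)
emp = sum(y <= 0.0 for y in ys) / n
print(theory)  # about 0.6321
print(emp)     # close to the same value
```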
2.47 Example
Let $X \sim \text{Uniform}(-1, 3)$ and $Y = X^2$, so the PDF of X is
$$f_X(x) = \begin{cases}1/4 & -1<x<3\\ 0 & \text{otherwise.}\end{cases}$$
Y takes values in (0, 9). Consider two cases: (i) $0<y<1$ and (ii) $1\le y<9$.
For case (i), $A_y = [-\sqrt y, \sqrt y]$ and $F_Y(y) = \int_{A_y} f_X(x)\,dx = \frac12\sqrt y$.
For case (ii), $A_y = [-1, \sqrt y]$ and $F_Y(y) = \int_{A_y} f_X(x)\,dx = \frac14(\sqrt y + 1)$.
Differentiating F we get
$$f_Y(y) = \begin{cases}\dfrac{1}{4\sqrt y} & 0<y<1\\ \dfrac{1}{8\sqrt y} & 1<y<9\\ 0 & \text{otherwise.}\end{cases}$$
When r is strictly monotone increasing or strictly monotone decreasing, r has an inverse $s = r^{-1}$, and in that case the density can be written directly as
$$f_Y(y) = f_X(s(y))\,\Big|\frac{ds(y)}{dy}\Big|.$$
2.12 Transformation of Multiple Random Variables
In some cases we are interested in transformations of several random variables. For example, given random variables X and Y, we might want to know the distribution of X/Y, X+Y, or max{X, Y}. Let $Z = r(X, Y)$ be the function of interest. The steps for finding $f_Z$ are similar to before:
1. For each z, find the set $A_z = \{(x,y):\ r(x,y)\le z\}$.
2. Find the CDF
$$F_Z(z) = P(Z\le z) = P(r(X,Y)\le z) = \iint_{A_z} f_{X,Y}(x,y)\,dx\,dy.$$
3. Differentiate: $f_Z(z) = F'_Z(z)$.
2.48 Example
Let $X_1, X_2 \sim \text{Uniform}(0,1)$ be independent. Find the density of $Y = X_1 + X_2$.
The joint density of $(X_1, X_2)$ is
$$f(x_1,x_2) = \begin{cases}1 & 0<x_1<1,\ 0<x_2<1\\ 0 & \text{otherwise.}\end{cases}$$
Let $r(x_1,x_2) = x_1+x_2$; then
$$F_Y(y) = P(Y\le y) = P(r(X_1,X_2)\le y) = \iint_{A_y} f(x_1,x_2)\,dx_1\,dx_2.$$
Now comes the hard part: finding $A_y$.
First suppose $0 < y \le 1$. Then $A_y$ is the triangle with vertices (0,0), (y,0), and (0,y), as shown below. In this case $F_Y(y)$ is the area of that triangle, $y^2/2$.
Now suppose $1 < y < 2$. Then $A_y$ is everything in the unit square except the triangle with vertices (1, y-1), (1,1), and (y-1, 1). That triangle has area $(2-y)^2/2$, so $F_Y(y) = 1 - (2-y)^2/2$.
Differentiating, we get the PDF
$$f_Y(y) = \begin{cases}y & 0\le y\le 1\\ 2-y & 1\le y\le 2\\ 0 & \text{otherwise.}\end{cases}$$
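The triangular shape of this density is easy to confirm by simulation: the derived CDF gives $F_Y(1/2) = (1/2)^2/2 = 0.125$ and $F_Y(1) = 1/2$ (seed and sample size are arbitrary):

```python
import random

random.seed(6)
n = 200_000

count_half = count_one = 0
for _ in range(n):
    y = random.random() + random.random()  # X1 + X2, both Uniform(0,1)
    count_half += y <= 0.5
    count_one += y <= 1.0

print(count_half / n)  # close to 0.125
print(count_one / n)   # close to 0.5
```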
End of this chapter
Untranslated: Appendix, homework