All of Statistics Chapter 2

Statistics (2) Random Variables

Update history:
1. 2023-09-19: Correction to Example 2.46: F_X(x)=1-e^{-x}.
2. 2023-09-25: Correction to the Warning after Example 2.13: the last PDF there should be described as an unbounded function.

Contents of this chapter

  1. Introduction
  2. Distribution Functions and Probability Functions
  3. Some Important Discrete Random Variables
  4. Some Important Continuous Random Variables
  5. Bivariate Distributions
  6. Marginal Distributions
  7. Independent Random Variables
  8. Conditional Distributions
  9. Multivariate Distributions and IID Samples
  10. Two Important Multivariate Distributions
  11. Transformations of Random Variables
  12. Transformations of Several Random Variables

Some of the key terms are easy to mix up in translation, so they are collected here:

1. Distribution function
2. Probability function
3. Discrete random variable
4. Continuous random variable
5. Bivariate distribution
6. Marginal distribution
7. Conditional distribution
8. Multivariate distribution
9. Cumulative distribution function (CDF)
10. Normalized
11. Probability function
12. Probability mass function
13. Probability density function
14. Continuous
15. Quantile function
16. First quartile
17. Second quartile
18. Third quartile
19. Median
20. Equal in distribution
21. Point mass distribution
22. Discrete uniform distribution
23. Bernoulli distribution
24. Binomial distribution
25. Geometric distribution
26. Poisson distribution
27. Normal distribution
28. Gaussian distribution
29. Standard normal distribution
30. Exponential distribution
31. Gamma distribution
32. Beta distribution
33. Cauchy distribution
34. Marginal distribution
35. Marginal mass function
36. Marginal density function
37. Conditional probability mass function
38. Conditional probability density function
39. Random vector
40. Multinomial distribution
41. Multivariate normal distribution

2.1 Introduction

Statistics and data mining are concerned with data. How do we connect sample spaces and events to data? The link is provided by random variables.

2.1 Definition of random variables

A random variable is a mapping X: \Omega \rightarrow \mathbb{R} that assigns a real number X(\omega) to each outcome ω.

At a certain point in most probability courses, the sample space is rarely mentioned anymore and we work directly with random variables. But you should keep in mind that the sample space really is there, lurking behind the random variables.

2.2 Example

Toss a coin ten times, let X(ω) be the number of heads in the sequence ω, for example, ω=HHTHHTHHTT, then X(ω)=6.

2.3 Example

Let \Omega =\left \{ (x,y); x^{2} + y^{2} \leqslant 1\right \} be the unit disc, and pick a point from Ω at random (we will make this idea precise later); a typical outcome is of the form ω = (x, y). Some examples of random variables are X(ω) = x, Y(ω) = y, Z(ω) = x + y, and W(\omega )=\sqrt{x^{2}+y^{2}}.

Given a random variable X and a subset A of the real line, define X^{-1}(A)=\left \{ \omega \in \Omega : X(\omega) \in A \right \} and let

\mathbb{P}(X\in A) = \mathbb{P}(X^{-1}(A))=\mathbb{P}(\left \{ \omega \in \Omega; X(\omega) \in A \right \})

\mathbb{P}(X=x)=\mathbb{P}(X^{-1}(x))=\mathbb{P}(\left \{\omega \in \Omega;X(\omega)=x\right \})

Note: X denotes the random variable and x denotes a particular value that X may take.

2.4 Example

Toss a coin twice and let X be the number of heads. Then P(X=0)=P({TT})=1/4, P(X=1)=P({HT,TH})=1/2, and P(X=2)=P({HH})=1/4. The random variable and its distribution can be summarized as follows:

ω      P({ω})   X(ω)
TT     1/4      0
TH     1/4      1
HT     1/4      1
HH     1/4      2

x      P(X=x)
0      1/4
1      1/2
2      1/4

2.2 Distribution functions and Probability functions

Given a random variable X, we define its cumulative distribution function (or distribution function) as follows:

2.5 CDF definition

The cumulative distribution function, or CDF, is the function F_{X}:\mathbb{R}\rightarrow [0,1] defined by

F_{X}(x)=P(X\leqslant x)

Later we will see that the CDF effectively contains all the information about the random variable. Sometimes we write F instead of F_X.

2.6 Example

 Toss a coin twice, let X be the number of heads. Then \mathbb{P}(X=0)=\frac{1}{4},\mathbb{P}(X=1)=\frac{1}{2},\mathbb{P}(X=2)=\frac{1}{4}, the distribution function is:

F_X(x)=\left\{\begin{matrix} 0 & ,x<0\\ \frac{1}{4} & ,0\leqslant x < 1\\ \frac{3}{4} & ,1 \leqslant x < 2\\ 1 & ,x \geqslant 2 \end{matrix}\right.

(Figure: the graph of F_X for this example, a right-continuous step function.)

Although this example is simple, study it carefully; the properties of CDFs can be confusing.

Note that this function is right-continuous and non-decreasing, and that it is defined for all real numbers even though X only takes the values 0, 1, and 2. Do you see why F_X(1.4)=0.75?

The following theorem shows that CDF completely determines the distribution of random variables

2.7 Theorem

Suppose X has cumulative distribution function (CDF) F and Y has cumulative distribution function (CDF) G. If F(x)=G(x) for all x, then \mathbb{P}(X\in A) = \mathbb{P}(Y \in A) for all A.

Translator's note: this theorem says that the CDF determines the probability distribution.

2.8 Theorem

Let F be a mapping from \mathbb{R} to [0,1]. F is the CDF of some probability \mathbb{P} if and only if F satisfies the following three conditions:

  1. F is non-decreasing: if x_1 < x_2 then F(x_1) \leq F(x_2)
  2. F is normalized: \lim_{x\rightarrow -\infty }F(x)=0,\lim_{x\rightarrow \infty }F(x)=1
  3. F is right-continuous: for all x, F(x)=F(x^{+}), where F(x^+)=\lim_{\begin{matrix} y\to x\\ y> x \end{matrix}}F(y)

Proof:

Suppose that F is a CDF; let us show that condition 3 holds.

Let x be a real number and let y_1, y_2, \dots be a sequence of real numbers with y_1 > y_2 > \dots and \lim_i y_i=x.

Let A_i=(-\infty,y_i] and A=(-\infty,x].

Then A= \bigcap_{i=1}^{\infty}A_i and A_1\supset A_2\supset A_3 \supset \dots

Hence \lim_i\mathbb{P}(A_i)=\mathbb{P}(\bigcap_iA_i),

so F(x)=\mathbb{P}(A)=\mathbb{P}(\bigcap_iA_i)=\lim_i\mathbb{P}(A_i)=\lim_iF(y_i)=F(x^+).

This proves condition 3. The proofs of conditions 1 and 2 are similar.

Proving the other direction, namely that any F satisfying conditions 1, 2 and 3 is the CDF of some probability \mathbb{P}, requires deeper tools from analysis.

2.9 Definition of probability function or probability mass function

A random variable X is discrete if it takes countably many values. The probability function or probability mass function of X is defined as:

f_X(x)=P(X=x).

Thus f_X(x) \geq 0 for all x\in \mathbb{R} and \sum_if_X(x_i)=1. Sometimes we write f instead of f_X.

The CDF F_X and f_X are related by:

F_X(x)= \mathbb{P}(X\leq x) = \sum_{x_i\leq x}f_X(x_i)

2.10 Example

The probability function of the 2.6 example is

f_X(x)=\left\{\begin{matrix} 1/4 & x=0 \\ 1/2 & x=1\\ 1/4& x=2 \\ 0 & otherwise \end{matrix}\right.


2.11 Definition of probability density function

A random variable X is continuous if there exists a function f_X such that f_X(x)\geq 0 for all x, \int_{-\infty}^{\infty}f_X(x)dx=1, and for every a \leq b,

\mathbb{P}(a < X <b)=\int_a^bf_X(x)dx

The function f_X is called the probability density function (PDF). We have

F_X(x) = \int_{-\infty}^xf_X(t)dt

and f_X(x)={F}'_X(x) at all points x at which F_X is differentiable.

Sometimes we write \int f(x)dx or \int f to mean \int_{-\infty}^\infty f(x)dx.

2.12 Example

Let the probability density function PDF of the random variable X be as follows

f_X(x)=\left\{\begin{matrix} 1 & for 0 \leq x \leq 1 \\ 0 & otherwise \end{matrix}\right.

Clearly f_X(x)\geq 0 and \int f_X(x)dx=1. A random variable with this PDF is said to have a Uniform(0,1) distribution; it models picking a point at random from the interval [0,1].

The CDF is:

F_X(x)=\left\{\begin{matrix} 0 & x <0 \\ x & 0 \leq x \leq 1\\ 1 & x > 1 \end{matrix}\right.

(Figure: the CDF of the Uniform(0,1) distribution.)

2.13 Example 

 If the random variable X has the following PDF:

f(x)=\left\{\begin{matrix} 0 & for x < 0\\ \frac{1}{(1+x)^2} & otherwise \end{matrix}\right.

Since \int f(x)dx=1, this is a well-defined PDF.

Warning: continuous random variables can be confusing.

First, note that if X is continuous then \mathbb{P}(X=x)=0 for every x. Do not try to interpret f(x) as \mathbb{P}(X=x); that interpretation is only valid for discrete random variables.

Second, a PDF can be greater than 1 (unlike a probability mass function). For example, if

f(x)=\left\{\begin{matrix} 5 & x \in [0,1/5]\\ 0 & otherwise \end{matrix}\right.

then f(x)\geq 0 and \int f(x)dx =1, so this is a valid PDF even though f(x)=5 on part of its range. In fact, a PDF can even be unbounded. For example,

f(x)=\left\{\begin{matrix} \frac{2}{3}x^{-\frac{1}{3}} & 0 < x < 1\\ 0 & otherwise \end{matrix}\right.

satisfies \int f(x)dx =1, so it is also a valid PDF, but it is an unbounded function.
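Both claims in this Warning are easy to check numerically; here is a small sketch (assuming numpy/scipy are available) showing that each of the two densities above integrates to 1, even though the first exceeds 1 and the second is unbounded near 0.

```python
# Numerical check that both densities in the Warning integrate to 1;
# scipy is assumed to be available.
from scipy import integrate

f1 = lambda x: 5.0 if 0.0 <= x <= 0.2 else 0.0   # equals 5 on [0, 1/5]
f2 = lambda x: (2.0 / 3.0) * x ** (-1.0 / 3.0)   # unbounded as x -> 0+

v1, _ = integrate.quad(f1, 0, 1, points=[0.2])   # tell quad about the jump at 0.2
v2, _ = integrate.quad(f2, 0, 1)                 # integrable singularity at 0
print(v1, v2)                                    # both approximately 1.0
```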

2.14 Example

Suppose

f(x)=\left\{\begin{matrix} 0 & x < 0 \\ \frac{1}{(1+x)} & otherwise \end{matrix}\right.

This is not a PDF because

\int f(x)dx=\int_0^\infty \frac{dx}{1+x}=\int_1^\infty \frac{du}{u} = \log(\infty) = \infty

2.15 Lemma

Assume F is the CDF of random variable X, then:

  1. \mathbb{P}(X=x)=F(x)-F(x^-), where F(x^-)=\lim_{y \uparrow x}F(y)
  2. \mathbb{P}(x < X \leq y) =F(y) -F(x)
  3.  \mathbb{P}(X > x) = 1- F(x)
  4. If X is continuous, then F(b)-F(a)= \mathbb{P}(a < X < b) = \mathbb{P}(a \leq X < b)=\mathbb{P}(a < X \leq b)=\mathbb{P}(a \leq X \leq b)

 This is useful for defining the inverse function (or quantile function) of a CDF.

2.16 Definition of the inverse function or quantile function of CDF

Suppose X is a random variable with a cumulative distribution function F. Then the inverse function or quantile function of CDF is defined as:

F^{-1}(q)=\inf\left \{ x:F(x) > q \right \}, for q \in [0,1].

If F is strictly increasing and continuous, then F^{-1}(q) is the unique real number x such that F(x)=q.

We call F^{-1}(1/4) the first quartile, F^{-1}(1/2) the median (or second quartile), and F^{-1}(3/4) the third quartile.

Two random variables X and Y are equal in distribution, written X \overset{\text{d}}{=} Y, if F_X(x)=F_Y(x) for all x. This does not mean that X and Y are equal; it only means that they have the same distribution. For example, let \mathbb{P}(X=1)=\mathbb{P}(X=-1)=1/2 and let Y=-X. Then \mathbb{P}(Y=1)=\mathbb{P}(Y=-1)=1/2, so X \overset{d}{=} Y, but X and Y are not equal; in fact \mathbb{P}(X=Y)=0.
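To make the quantile function concrete, here is a small sketch (assuming scipy is available) using the Exponential(1) distribution, whose CDF F(x) = 1 - e^{-x} is strictly increasing, so F^{-1}(q) = -log(1 - q); scipy's ppf is exactly this inverse CDF.

```python
# Quartiles of the Exponential(1) distribution: F(x) = 1 - exp(-x),
# so F^{-1}(q) = -log(1 - q).
import numpy as np
from scipy.stats import expon

for q, name in [(0.25, "first quartile"), (0.5, "median"), (0.75, "third quartile")]:
    print(name, expon.ppf(q), -np.log(1 - q))   # the two values agree
```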

2.3 Some important discrete random variables

A warning about notation: X \sim F means that the random variable X has distribution F. When you see X \sim F, read it as "X has distribution F", not as "X is approximately F".

Point Mass Distribution: If X satisfies the following condition, then X has a point mass distribution at a, written X \sim \delta_a:

\mathbb{P}(X=a)=1,

Then F(x)=\left\{\begin{matrix} 0 & x < a\\ 1 & x \geq a \end{matrix}\right. , the probability mass function is

f(x)=\left\{\begin{matrix} 1 & x = a\\ 0 & otherwise \end{matrix}\right.

Discrete Uniform Distribution: Assume k>1 is an integer, and assume that X has the following probability mass function:

f(x)=\left\{\begin{matrix} 1/k & x=1,2,...k\\ 0 & otherwise \end{matrix}\right.

Then we say that X has a uniform distribution on {1,...,k}.

Bernoulli Distribution: Let X represent a coin flip with \mathbb{P}(X=1)=p and \mathbb{P}(X=0)=1-p, where p \in [0,1]. We say that X has a Bernoulli distribution, written X\sim Bernoulli(p). Its probability function is f(x)=p^x(1-p)^{1-x}, x \in \left \{ 0,1 \right \}.

Binomial Distribution: Suppose a coin lands heads with probability p, where 0 \leq p \leq 1. Flip the coin n times and let X be the number of heads; assume the flips are independent. Let f(x)=\mathbb{P}(X=x) be the mass function of X. It can be shown that:

f(x)=\left\{\begin{matrix} \binom{n}{x}p^x(1-p)^{n-x} & x=0,...n\\ 0 & otherwise \end{matrix}\right.

A random variable with such a mass function is called a binomial random variable, written X\sim Binomial(n,p). If X_1\sim Binomial(n_1,p) and X_2 \sim Binomial(n_2,p) are independent, then X_1 + X_2 \sim Binomial(n_1+n_2,p).

Warning: let us take this opportunity to head off some confusion. X denotes a random variable and x a particular value of that random variable; n and p are parameters, that is, fixed real numbers. The parameter p is usually unknown and must be estimated from the data; that is what statistical inference is about. In most statistical models there are both random variables and parameters, so be careful not to confuse them.

Geometric Distribution: X has a geometric distribution with parameter p, written X \sim Geom(p), if

\mathbb{P}(X=k)=p(1-p)^{k-1},\ k \geq 1

We have:

\sum_{k=1}^{\infty}\mathbb{P}(X=k)= p\sum_{k=1}^{\infty}(1-p)^{k-1}=\frac{p}{1-(1-p)}=1

Think of X as the number of flips needed until the first head when flipping a coin.

Poisson Distribution: X has a Poisson distribution with parameter λ, written X\sim Poisson( \lambda ), if its probability mass function is

f(x)=e^{- \lambda } \frac{\lambda ^ x}{x!},\ x \geq 0

Notice:

\sum_{x=0}^{\infty}f(x)=e^{-\lambda}\sum_{x=0}^{\infty}\frac{\lambda^x}{x!}=e^{-\lambda}e^{\lambda}=1

The Poisson distribution is often used as a model for counts of rare events, such as radioactive decay and traffic accidents. If X_1 \sim Poisson(\lambda_1) and X_2 \sim Poisson(\lambda_2) are independent, then X_1+X_2 \sim Poisson(\lambda_1+\lambda_2).
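The additivity property can be checked by simulation. Here is a small sketch (assuming numpy and scipy are available) comparing the empirical distribution of X_1 + X_2, with X_1 ∼ Poisson(1) and X_2 ∼ Poisson(2), against the Poisson(3) mass function.

```python
# Simulation check of Poisson additivity: Poisson(1) + Poisson(2) ~ Poisson(3).
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
n = 200_000
s = rng.poisson(1.0, n) + rng.poisson(2.0, n)

for k in range(6):
    empirical = np.mean(s == k)
    print(k, round(empirical, 4), round(poisson.pmf(k, 3.0), 4))  # should be close
```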

Warning: we defined a random variable as a mapping from a sample space Ω to the real numbers \mathbb{R}, but the sample space never appeared in the distributions above. As mentioned earlier, the sample space often "disappears", yet it is still there behind the scenes. Let us construct a Bernoulli random variable explicitly. Let Ω=[0,1] and define \mathbb{P} by \mathbb{P}([a,b])=b-a for 0 \leq a \leq b \leq 1. Fix p \in [0,1] and define:

X(\omega )=\left\{\begin{matrix} 1 & \omega \leq p\\ 0 & \omega > p \end{matrix}\right.

Then \mathbb{P}(X=1)=\mathbb{P}(\omega \leq p)=\mathbb{P}([0,p])=p and \mathbb{P}(X=0)=1-p, so X has a Bernoulli distribution, written X\sim Bernoulli(p). We could do this for every distribution above; in practice we treat a random variable simply as a random number, but formally it is a mapping defined on a sample space.

2.4 Some important continuous random variables

Uniform Distribution : If X has the following probability density function, then X satisfies the uniform distribution, written as X \sim Uniform (a,b):

f(x)=\left\{\begin{matrix} \frac{1}{b-a} & x \in [a,b]\\ 0 & otherwise \end{matrix}\right.

When a<b, the distribution function is:

F(x)=\left\{\begin{matrix} 0 & x < a\\ \frac{x-a}{b-a} & x \in [a,b]\\ 1 & x>b \end{matrix}\right.

Normal (Gaussian) Distribution: X has a Normal (or Gaussian) distribution with parameters μ and σ, written X \sim N(\mu,\sigma^2), if its probability density function is

f(x)=\frac{1}{\sigma \sqrt{2\pi}} exp\left \{ -\frac{1}{2\sigma^2}(x-\mu )^2 \right \}

Here \mu \in \mathbb{R} and σ > 0.

The parameter μ is the center (or mean) of the distribution and σ is its spread (or standard deviation); mean and standard deviation are defined in the next chapter. The Normal distribution plays an important role in probability and statistics, and many natural phenomena are approximately Normally distributed. Later we will study the Central Limit Theorem, which says that the distribution of a sum of random variables can be approximated by a Normal distribution.

If μ=0 and σ=1, we say that X has a standard Normal distribution. Traditionally a standard Normal random variable is denoted by Z, and its PDF and CDF are denoted by \phi (z) and \Phi (z) respectively. (Figure: the PDF of the standard Normal.)

Some useful conclusions are given below:

  1. If X\sim N(\mu,\sigma^2), thenZ=(X-\mu)/\sigma \sim N(0,1)
  2. If Z \sim N(0,1), thenX=\mu+\sigma Z \sim N(\mu,\sigma ^2)
  3. If X_i \sim N(\mu_i,\sigma_i^2), i=1,...,n are independent, then \sum_{i=1}^{n}X_i \sim N\left ( \sum_{i=1}^{n}\mu_i,\ \sum_{i=1}^{n}\sigma_i^2 \right )

It follows from (1) that

\mathbb{P}(a < X < b) =\mathbb{P}\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right) = \Phi \left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)

Thus, as long as we can compute the CDF of the standard Normal, we can compute any Normal probability. All statistical software packages can compute \Phi(z) and \Phi^{-1}(q). Most statistics texts, including this one, include a table of values of \Phi(z).

2.17 Example

Suppose X \sim N(3,5); find \mathbb{P}(X>1). The solution is:

\mathbb{P}(X>1) \\= 1- \mathbb{P}(X<1) \\= 1 - \mathbb{P}(Z<\frac{1-3}{\sqrt{5}})\\=1-\Phi(-0.8944)\\=0.81

Next, find the value q such that \mathbb{P}(X<q)=0.2, that is, q = F_X^{-1}(0.2). The solution is as follows:

0.2=P(X < q)\\=P(Z<\frac{q-\mu}{\sigma}) \\=\Phi(\frac{q-\mu}{\sigma})

From the standard Normal table, \Phi(-0.8416)=0.2. Therefore -0.8416=\frac{q-\mu}{\sigma}=\frac{q-3}{\sqrt{5}}, which gives q=1.1181.
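Example 2.17 can be reproduced with any statistics package; here is a minimal sketch assuming scipy is available.

```python
# Reproducing Example 2.17 with scipy: X ~ N(3, 5), i.e. mean 3 and variance 5.
import numpy as np
from scipy.stats import norm

mu, sigma = 3.0, np.sqrt(5.0)
print(1 - norm.cdf((1 - mu) / sigma))   # P(X > 1), approximately 0.81
print(mu + sigma * norm.ppf(0.2))       # q with P(X < q) = 0.2, approximately 1.1181
```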

Exponential Distribution: X has an exponential distribution with parameter β, written X \sim Exp(\beta), if its probability density function is

f(x)=\frac{1}{\beta}e^{-\frac{x}{\beta}},x>0,\beta > 0

The exponential distribution is used to model the lifetimes of electronic components and the waiting times between rare events.

Gamma Distribution: For α>0, the Gamma function is defined as \Gamma(\alpha)=\int_0^\infty y^{\alpha-1} e^{-y} dy. X has a Gamma distribution with parameters α and β, written X\sim Gamma(\alpha,\beta), if its probability density function is

f(x)=\frac{1}{\beta^\alpha \Gamma(\alpha)}x^{\alpha - 1}e^{-x/\beta},\alpha > 0,\beta > 0

The Exponential(β) distribution is the same as Gamma(1,β). If X_i\sim Gamma(\alpha_i,\beta) are independent, then \sum_{i=1}^nX_i \sim Gamma(\sum_{i=1}^n \alpha_i,\beta).

Beta Distribution: X has a Beta distribution with parameters α>0 and β>0, written X \sim Beta(\alpha,\beta), if

f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1},0 < x <1

t and Cauchy Distribution: X has a t distribution with v degrees of freedom, written X \sim t_v, if

f(x)=\frac{\Gamma(\frac{v+1}{2})}{\Gamma(\frac{v}{2})\sqrt{v\pi}}\frac{1}{(1+\frac{x^2}{v})^{(v+1)/2}}

The t distribution is similar to the Normal distribution but has thicker tails. In fact, the Normal distribution corresponds to the t distribution with v=∞, and the Cauchy distribution is the t distribution with v=1, with probability density function:

f(x)=\frac{1}{\pi(1+x^2)}

To see that this is indeed a density function, compute its integral:

\int_{-\infty}^\infty f(x) dx= \frac{1}{\pi}\int \frac{dx}{1+x^2} = \frac{1}{\pi}\left[\tan^{-1}(\infty)-\tan^{-1}(-\infty)\right]=\frac{1}{\pi}\left[\frac{\pi}{2}-\left(-\frac{\pi}{2}\right)\right]=1

\chi ^2 Distribution (chi-square distribution): X has a \chi^2 distribution with p degrees of freedom, written X \sim \chi_p^2, if

f(x)=\frac{1}{\Gamma(p/2)2^{p/2}}x^{(p/2)-1}e^{-x/2},x>0

If Z_1, Z_2, \dots, Z_p are independent standard Normal random variables, then \sum_{i=1}^p Z_i^2 \sim \chi_p^2.
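This last fact is easy to check by simulation; here is a minimal sketch assuming numpy and scipy are available.

```python
# Simulation check: the sum of p squared standard normals behaves like chi^2_p.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
p, n = 5, 200_000
v = (rng.standard_normal((n, p)) ** 2).sum(axis=1)   # sum of p squared N(0,1)'s

print(v.mean(), chi2.mean(p))                  # both about 5
print(np.mean(v <= 9.0), chi2.cdf(9.0, p))     # empirical vs exact CDF at 9
```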

2.5 Bivariate Distributions

Given a pair of discrete random variables X and Y, define the joint mass function by f(x,y)=\mathbb{P}(X=x \ and\ Y = y). From now on we write \mathbb{P}(X=x \ and \ Y=y) as \mathbb{P}(X=x,Y=y), and we write f_{X,Y} simply as f when no confusion arises.

2.18 Example

Here is a bivariate distribution for two random variables X and Y, each taking the values 0 or 1:

        Y=0    Y=1
X=0     1/9    2/9    1/3
X=1     2/9    4/9    2/3
        1/3    2/3

Therefore, f(1,1)=P(X=1,Y=1)=4/9

2.19 PDF definition of two-dimensional random variables

In the continuous case, we call a function f(x,y) a PDF for the random variables (X,Y) if it satisfies the following three conditions:

  1. For all (x,y), there aref(x,y)\geq 0
  2. \int_{-\infty}^\infty\int_{-\infty}^\infty f(x,y)dx dy = 1and
  3. For any set A \subset \mathbb{R} \times \mathbb{R}, \mathbb{P}((X,Y) \in A)=\int\int_A f(x,y)dxdy

In both the discrete and continuous cases we define the joint CDF as F_{X,Y}(x,y)=\mathbb{P}(X\leq x,Y\leq y).

2.20 Example

Suppose (X, Y) is uniform on the unit square. Then:

f(x,y)=\left\{\begin{matrix} 1 & 0 \leq x \leq 1, 0\leq y \leq 1\\ 0 & otherwise \end{matrix}\right.

Find P(X<1/2, Y<1/2).

The event A={X<1/2, Y<1/2} corresponds to a subset of the unit square. Integrating f over this subset amounts to computing the area of A, which is 1/4. Hence P(X<1/2, Y<1/2)=1/4.

2.21 Example

Assume (X,Y) has the following probability density function:

f(x,y)=\left\{\begin{matrix} x+y & 0 \leq x \leq 1, 0 \leq y \leq 1, \\ 0 & otherwise \end{matrix}\right.

Then:

\int_0^1 \int_0^1(x+y)dxdy=\int_0^1\left[\int_0^1xdx\right]dy+\int_0^1\left[\int_0^1ydx\right]dy=\int_0^1\frac{1}{2}dy+\int_0^1ydy=\frac{1}{2}+\frac{1}{2}=1

which verifies that f(x,y) is a PDF.

2.22 Example

When the distribution is defined over a non-rectangular region, the calculations are a bit more involved. Here is an example from DeGroot and Schervish (2002). Suppose (X, Y) has density:

f(x,y)=\left\{\begin{matrix} cx^2y & x^2 \leq y \leq 1 \\ 0& otherwise \end{matrix}\right.

Note that -1 \leq x \leq 1. Let us find the value of c.

The key is to be careful about the range of integration. We pick one variable, say x, and let it range over its values; then, for each fixed x, we let y vary over its range x^2 \leq y \leq 1. (A sketch of the region is helpful here.)

Therefore

1 =\int\int f(x,y)dydx = c \int_{-1}^1\int_{x^2}^1x^2y\,dy\,dx=c\int_{-1}^1x^2\left[\int_{x^2}^1y\,dy\right]dx=c\int_{-1}^1x^2\frac{1-x^4}{2}dx=\frac{4c}{21}

Hence c=21/4.

Now let us compute \mathbb{P}(X\geq Y). The corresponding set is A=\{(x,y): 0\leq x\leq 1,\ x^2\leq y\leq x\}. Therefore

\mathbb{P}(X\geq Y)=\frac{21}{4}\int_0^1\int_{x^2}^xx^2y\,dy\,dx=\frac{21}{4}\int_0^1x^2\left[\int_{x^2}^xy\,dy\right]dx=\frac{21}{4}\int_0^1x^2\frac{x^2-x^4}{2}dx=\frac{3}{20}
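Both calculations in this example can be double-checked symbolically; here is a small sketch assuming sympy is available.

```python
# Symbolic check of Example 2.22 using sympy: the normalizing constant c
# and P(X >= Y) for f(x, y) = c * x**2 * y on x**2 <= y <= 1.
import sympy as sp

x, y, c = sp.symbols("x y c", real=True)

total = sp.integrate(c * x**2 * y, (y, x**2, 1), (x, -1, 1))
print(sp.solve(sp.Eq(total, 1), c))        # [21/4]

p = sp.integrate(sp.Rational(21, 4) * x**2 * y, (y, x**2, x), (x, 0, 1))
print(p)                                   # 3/20
```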

2.6 Marginal Distributions

2.23 Definition

Suppose (X, Y) have a joint distribution with mass function f_{X,Y}. Then the marginal mass function of X is defined as:

f_X(x)=P(X=x)=\underset{y}\sum P(X=x,Y=y)=\underset{y}\sum f(x,y)

The marginal mass function for y is defined as:

f_Y(y)=P(Y=y)=\underset{x}\sum P(X=x,Y=y)=\underset{x}\sum f(x,y) 

2.24 Example 

Suppose f_{X,Y} is given by the table below. The marginal distribution of X corresponds to the row totals and the marginal distribution of Y to the column totals.

        Y=0     Y=1
X=0     1/10    2/10    3/10
X=1     3/10    4/10    7/10
        4/10    6/10

For example, f_X(0)=3/10 and f_X(1)=7/10.

2.25 Definition

For continuous random variables, the marginal density function is defined as:

f_X(x)=\int f(x,y)dy , f_Y(y)=\int f(x,y)dx

The corresponding marginal distribution functions are denoted F_X(x) and F_Y(y).

2.26 Example

Suppose f_{X,Y}(x,y)=e^{-(x+y)} for x,y\geq0. Then f_X(x)=e^{-x}\int_0^\infty e^{-y}dy = e^{-x}.

2.27 Example

If: f(x,y)=\left\{\begin{matrix} x+y & 0 \leq x \leq 1,0 \leq y \leq 1\\ 0 & otherwise \end{matrix}\right., then we can get

f_Y(y)=\int_0^1(x+y)dx=\int_0^1 xdx+\int_0^1ydx=\frac{1}{2}+y
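This marginal can also be obtained symbolically; here is a small sketch assuming sympy is available.

```python
# Symbolic check of Example 2.27: the marginal density of Y for f(x,y) = x + y
# on the unit square is f_Y(y) = y + 1/2.
import sympy as sp

x, y = sp.symbols("x y", real=True)
f_Y = sp.integrate(x + y, (x, 0, 1))
print(f_Y)   # y + 1/2
```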

2.28 Example

Let (X,Y) have the following density function

f(x,y)=\left\{\begin{matrix} \frac{21}{4}x^2y & x^2 \leq y \leq 1\\ 0 & otherwise \end{matrix}\right.

Therefore,

f_X(x)=\int f(x,y)dy = \frac{21}{4}x^2\int_{x^2}^1ydy=\frac{21}{8}x^2(1-x^4) 

2.7 Independent Random Variable

2.29 Definition

Two random variables X and Y are independent if, for every A and B, \mathbb{P}(X \in A,Y \in B)=\mathbb{P}(X \in A)\mathbb{P}(Y \in B), written X \coprod Y. Otherwise we say that X and Y are dependent.

In principle, to check whether X and Y are independent we would need to verify the equation above for all subsets A and B. Fortunately, we have the following results; although they are stated for continuous random variables, they apply to discrete random variables as well.

2.30 Theorem

Suppose X and Y have joint PDF f_{X,Y}. Then X \coprod Y if and only if f_{X,Y}(x,y)=f_X(x)f_Y(y) for all x and y.

2.31 Example

Suppose X and Y have the following joint distribution:

        Y=0    Y=1
X=0     1/4    1/4    1/2
X=1     1/4    1/4    1/2
        1/2    1/2    1

Then f_X(0)=f_X(1)=1/2 and f_Y(0)=f_Y(1)=1/2. X and Y are independent because f_X(0)f_Y(0)=f(0,0), f_X(0)f_Y(1)=f(0,1), f_X(1)f_Y(0)=f(1,0), and f_X(1)f_Y(1)=f(1,1).

Suppose instead that X and Y have the following distribution:

        Y=0    Y=1
X=0     1/2    0      1/2
X=1     0      1/2    1/2
        1/2    1/2    1

Then X and Y are not independent, because f_X(0)f_Y(1)=1/4 but f(0,1)=0.

2.32 Example 

Suppose that X and Y are independent and that both have the same density function:

f(x)=\left\{\begin{matrix} 2x & 0 \leq x \leq 1\\ 0 & otherwise \end{matrix}\right..

Let us find \mathbb{P}(X+Y \leq 1). Using independence, the joint density is:

f(x,y)=f_X(x)f_Y(y)=\left\{\begin{matrix} 4xy & 0 \leq x \leq 1, 0\leq y\leq 1\\ 0 & ,otherwise \end{matrix}\right. 

We get:

\mathbb{P}(X+Y \leq 1)=\int\int_{x+y \leq 1}f(x,y)dxdy=4 \int_0^1x\left[\int_0^{1-x}ydy\right]dx=4\int_0^1x\frac{(1-x)^2}{2}dx=\frac{1}{6}
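A Monte Carlo check of this probability (a sketch assuming numpy is available): each variable has CDF F(x) = x² on [0,1], so it can be simulated by inverse transform as X = √U with U ∼ Uniform(0,1).

```python
# Monte Carlo check of Example 2.32: P(X + Y <= 1) should be about 1/6.
# Each of X and Y has density 2x on [0, 1], so CDF x^2 and inverse CDF sqrt(u).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = np.sqrt(rng.uniform(size=n))    # inverse-transform sample from density 2x
y = np.sqrt(rng.uniform(size=n))
print(np.mean(x + y <= 1.0))        # approximately 0.1667
```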

The following result helps when checking independence.

2.33 Theorem

Suppose that the range of X and Y is a (possibly unbounded) rectangle. If f(x,y)=g(x)h(y) for some functions g and h (not necessarily probability density functions), then X and Y are independent.

2.34 Example

Let X and Y have the following density functions:

f(x,y)=\left\{\begin{matrix} 2e^{-(x+2y)} & ,x> 0,y>0 \\ 0 & ,otherwise \end{matrix}\right.

The range of X and Y is the rectangle (0,\infty)\times (0 ,\infty), and f(x,y) can be written as f(x,y)=g(x)h(y) with g(x)=2e^{-x} and h(y)=e^{-2y}. Therefore X \coprod Y.

2.8 Conditional Distribution

If X and Y are discrete, we can compute the conditional distribution of X given Y = y. Specifically, \mathbb{P}(X=x|Y=y)=\mathbb{P}(X=x,Y=y)/\mathbb{P}(Y=y). This leads us to define the conditional probability mass function as follows.

2.35 Definition of conditional probability mass function

If f_Y(y) > 0, the conditional probability mass function is defined as:

f_{X|Y}(x|y)=P(X=x|Y=y)=\frac{P(X=x,Y=y)}{P(Y=y)}=\frac{f_{X,Y}(x,y)}{f_Y(y)}

For continuous distributions we use the same definitions. The interpretation differs: in the discrete case, f_{X|Y}(x|y)=\mathbb{P}(X=x|Y=y) is an actual conditional probability, while in the continuous case probabilities must be obtained by integrating the conditional density.

2.36 Definition of conditional probability density function

For continuous random variables, the conditional probability density function is defined, for f_Y(y)>0, as:

f_{X|Y}(x|y)=\frac{f_{X,Y}(x,y)}{f_Y(y)}

 Then, the probability is:

P(X\in A | Y = y) = \int_A f_{X|Y}(x|y)dx

2.37 Example

Suppose that X and Y have a joint uniform distribution on the unit square. Then f_{X|Y}(x|y)=1 for 0 \leq x \leq 1 and 0 elsewhere. Given Y=y, X is Uniform(0,1); we write X|Y=y \sim Uniform(0,1).

From the definition of conditional density, f_{X,Y}(x,y)=f_{X|Y}(x|y)f_Y(y)=f_{Y|X}(y|x)f_X(x). This is very useful in some cases, such as Example 2.39.

2.38 Example

Suppose f(x,y)=\left\{\begin{matrix} x+y & 0 \leq x \leq 1 , 0 \leq y \leq 1\\ 0 & otherwise \end{matrix}\right. and find \mathbb{P}(X<1/4|Y=1/3).

From Example 2.27, f_Y(y)=y+(1/2). Therefore:

f_{X|Y}(x|y)=\frac{f_{X,Y}(x,y)}{f_Y(y)}=\frac{x+y}{y+\frac{1}{2}}

so, 

\mathbb{P}(X<\frac{1}{4}|Y=\frac{1}{3})\\\\=\int_0^{1/4}f_{X|Y}(x|\frac{1}{3})dx\\\\=\int_0^{1/4}\frac{x+\frac{1}{3}}{\frac{1}{3}+\frac{1}{2}}dx\\\\=\frac{\frac{1}{32}+\frac{1}{12}}{\frac{1}{3}+\frac{1}{2}}\\\\=\frac{11}{80} 

2.39 Example

Suppose X \sim Uniform(0,1). After observing X=x, we draw Y|X=x \sim Uniform(x,1). What is the marginal distribution of Y?

First, f_X(x)=\left\{\begin{matrix} 1 & ,0\leq x \leq 1\\ 0 & ,otherwise \end{matrix}\right. and f_{Y|X}(y|x)=\left\{\begin{matrix} \frac{1}{1-x} & , 0 < x< y< 1\\ 0 & ,otherwise \end{matrix}\right. Therefore,

f_{X,Y}(x,y)=f_{Y|X}(y|x)f_X(x)=\left\{\begin{matrix} \frac{1}{1-x} & ,0<x<y<1\\ 0 & ,otherwise \end{matrix}\right.

The marginal density of Y is then:

f_Y(y)=\int_0^y f_{X,Y}(x,y)dx=\int_0^y \frac{dx}{1-x} = - \int_1^{1-y}\frac{du}{u}=-\log(1-y),\ \ 0 < y<1

2.40 Example

Consider the density function in Example 2.28 and find f_{Y|X}(y|x).

From Example 2.28, f_X(x)=(21/8)x^2(1-x^4). Hence, for x^2 \leq y \leq 1,

f_{Y|X}(y|x)=\frac{f(x,y)}{f_X(x)}=\frac{(21/4)x^2y}{(21/8)x^2(1-x^4)}=\frac{2y}{1-x^4}

Now, \mathbb{P}(Y\geq 3/4 |X = 1/2)=\int_{3/4}^1f_{Y|X}(y|1/2)dy=\int_{3/4}^1\frac{32y}{15}dy=\frac{7}{15}
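The conditional density and the probability 7/15 can be checked symbolically; here is a small sketch assuming sympy is available.

```python
# Symbolic check of Example 2.40: f_{Y|X}(y | 1/2) = 32*y/15 on [1/4, 1],
# and P(Y >= 3/4 | X = 1/2) = 7/15.
import sympy as sp

x, y = sp.symbols("x y", positive=True)
joint = sp.Rational(21, 4) * x**2 * y          # f(x, y) on x**2 <= y <= 1
f_X = sp.integrate(joint, (y, x**2, 1))        # marginal of X
f_cond = sp.simplify(joint / f_X)              # f_{Y|X}(y | x) = 2y/(1 - x**4)

print(f_cond.subs(x, sp.Rational(1, 2)))       # 32*y/15
print(sp.integrate(f_cond.subs(x, sp.Rational(1, 2)),
                   (y, sp.Rational(3, 4), 1))) # 7/15
```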

2.9 Multivariate Distributions And IID

Let X=(X_1, \dots, X_n), where X_1, \dots, X_n are random variables; we call X a random vector. Let f(x_1,\dots,x_n) denote its PDF (or mass function). The definitions of the CDF, marginal distributions and conditional distributions are analogous to the bivariate case.

We say that X_1, \dots, X_n are independent if, for every A_1, \dots, A_n, \mathbb{P}(X_1 \in A_1,\dots,X_n \in A_n)= \prod_{i=1}^{n}\mathbb{P}(X_i \in A_i). It suffices to check that f(x_1,\dots,x_n)=\prod_{i=1}^{n}f_{X_i}(x_i).

2.41 IID definition

If X_1, \dots, X_n are independent and each has the same cumulative distribution function F, we say that X_1, \dots, X_n are IID (independent and identically distributed) and write X_1,\dots,X_n \sim F.

If F has density f, we also write X_1,\dots,X_n \sim f. We also call X_1,\dots,X_n a random sample of size n from F.

Much of statistical theory and practice is based on IID observations; we will study this in detail when we discuss statistical inference.

2.10 Two important multidimensional distributions

Multinomial Distribution: the multivariate version of the Binomial is called the Multinomial. Consider drawing a ball from an urn that contains balls of k different colors, labeled color 1, color 2, ..., color k. Let p=(p_1,\dots,p_k), where p_j\geq 0 and \sum_{j=1}^kp_j=1, and let p_j be the probability that a drawn ball has color j. Draw n times (independent draws with replacement) and let X=(X_1,\dots,X_k), where X_j is the number of times color j appears. Hence n=\sum_{j=1}^kX_j. We say that X \sim Multinomial(n,p), and its probability function is

f(x)=\binom{n}{x_1...x_k}p_1^{x_1}...p_k^{x_k}

where \binom{n}{x_1...x_k}=\frac{n!}{x_1!...x_k!}

2.42 Lemma

Suppose X \sim Multinomial(n,p), where X=(X_1,\dots,X_k) and p=(p_1,\dots,p_k). Then the marginal distribution of X_j is Binomial(n,p_j).
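A quick simulation sketch illustrating the lemma (assuming numpy and scipy are available): one coordinate of a Multinomial(n, p) draw is compared with the Binomial(n, p_j) mass function.

```python
# Simulation check of Lemma 2.42: the marginal of one coordinate of a
# Multinomial(n, p) draw is Binomial(n, p_j).
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n, p = 10, np.array([0.2, 0.5, 0.3])
draws = rng.multinomial(n, p, size=200_000)

x0 = draws[:, 0]                              # counts of color 1
for k in range(4):
    print(k, round(np.mean(x0 == k), 4), round(binom.pmf(k, n, p[0]), 4))
```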

Multivariate Normal Distribution: the univariate Normal has two parameters, μ and σ. In the multivariate version, μ is a vector and σ is replaced by a matrix Σ.

To begin, let

Z=\begin{pmatrix} Z_1\\ \vdots \\ Z_k \end{pmatrix}

where Z_1,\dots,Z_k \sim N(0,1) are independent. The density of Z is:

f(z)=\prod_{j=1}^{k}f(z_j)=\frac{1}{(2\pi)^{k/2}}exp\left \{ -\frac{1}{2} \sum_{j=1}^{k} z_j^2\right \}=\frac{1}{(2\pi)^{k/2}}exp\left \{ -\frac{1}{2}z^Tz \right \}

We say that Z has a standard multivariate Normal distribution, written Z\sim N(0,I), where 0 denotes a vector of k zeroes and I is the k\times k identity matrix.

More generally, a vector X has a multivariate Normal distribution, written X \sim N(\mu,\Sigma), if it has the following density function:

f(x;\mu,\Sigma)=\frac{1}{(2\pi)^{k/2}|(\Sigma)|^{1/2}}exp\left \{ -\frac{1}{2} (x-\mu)^T \Sigma^{-1}(x-\mu)\right \}

Here |\Sigma| denotes the determinant of Σ, μ is a vector of length k, and Σ is a k\times k symmetric, positive definite matrix. Setting μ = 0 and Σ = I gives back the standard multivariate Normal distribution.

Since Σ is symmetric and positive definite, there exists a matrix \Sigma^{1/2}, called the square root of Σ, with the following properties:

  1. \Sigma^{1/2} is symmetric
  2. \Sigma=\Sigma^{1/2}\Sigma^{1/2}
  3. \Sigma^{1/2}\Sigma^{-1/2}=\Sigma^{-1/2}\Sigma^{1/2}=I, where \Sigma^{-1/2}=(\Sigma^{1/2})^{-1}

2.43 Theorem

If Z \sim N(0,I) and X=\mu+\Sigma^{1/2}Z, then X\sim N(\mu,\Sigma). Conversely, if X \sim N(\mu,\Sigma), then \Sigma^{-1/2}(X-\mu) \sim N(0,I).
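Theorem 2.43 gives a recipe for simulating from N(μ, Σ); here is a minimal sketch assuming numpy and scipy are available.

```python
# Simulating X ~ N(mu, Sigma) via X = mu + Sigma^{1/2} Z, with Z ~ N(0, I),
# then checking the sample mean and covariance.
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
root = np.real(sqrtm(Sigma))           # symmetric square root of Sigma

Z = rng.standard_normal((100_000, 2))  # rows are independent N(0, I) vectors
X = mu + Z @ root                      # root is symmetric, so this applies it row-wise

print(X.mean(axis=0))                  # approximately mu
print(np.cov(X, rowvar=False))         # approximately Sigma
```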

Suppose we partition a Normal random vector X as X = (X_a, X_b). We can similarly partition μ = (μ_a, μ_b) and \Sigma=\begin{pmatrix} \Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}

2.44 Theorem

Suppose X \sim N(\mu,\Sigma), then

  1. The marginal distribution of Xa satisfies:X_a \sim N(\mu_a,\Sigma_{aa})
  2. The conditional distribution of Xb under the condition of Xa=xa is:X_b|X_a=x_a \sim N(\mu_b+\Sigma_{ba}\Sigma_{aa}^{-1}(x_a-\mu_a),\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab})
  3. If a is a vector, thena^TX \sim N(a^T\mu,a^T\Sigma a)
  4. V=(X-\mu)^T\Sigma^{-1}(X-\mu) \sim \chi _k^2

2.11 Transformation of random variables

Suppose X is a random variable with PDF f_X and CDF F_X. Let Y=r(X) be a function of X, for example Y=X^2 or Y=e^X. How do we compute the PDF and CDF of Y? In the discrete case the answer is easy: the mass function of Y is

f_Y(y)=\mathbb{P}(Y=y)=\mathbb{P}(r(X)=y)=\mathbb{P}(\left \{ x;r(x)=y \right \})\\\\=\mathbb{P}(X \in r^{-1}(y))

2.45 Example

Suppose \mathbb{P}(X=-1)=\mathbb{P}(X=1)=1/4 and \mathbb{P}(X=0)=1/2. Let Y=X^2. Then \mathbb{P}(Y=0)=\mathbb{P}(X=0)=1/2 and \mathbb{P}(Y=1)=\mathbb{P}(X=1)+\mathbb{P}(X=-1)=1/2. Summarizing:

x      f_X(x)
-1     1/4
0      1/2
1      1/4

y      f_Y(y)
0      1/2
1      1/2

Y takes fewer values than X because the transformation is not one-to-one.

The continuous case is more complicated. Here are three steps for finding f_Y:

  1. For each y, find the set A_y=\left \{ x;r(x) \leq y \right \}
  2. Then find the CDF:

 F_Y(y)=P(Y\leq y)=P(r(X) \leq y)=P(\left \{ x;r(x) \leq y \right \})\\\\=\int_{A_y}f_X(x)dx

    3. The PDF is the derivative of the CDF: f_Y(y)={F_Y}'(y)

2.46 Example 

Let f_X(x)=e^{-x} for x>0, so F_X(x)=\int_0^x f_X(s)ds= 1- e^{-x}. Let Y=r(X)=\log X. Then A_y=\left \{ x:x \leq e^y \right \} and

F_Y(y)=\mathbb{P}(Y \leq y)=\mathbb{P}(\log X \leq y)=\mathbb{P}(X \leq e^y)=F_X(e^y)=1-e^{-{e^y}}

Therefore f_Y(y)=e^ye^{-e^y} for y \in \mathbb{R}.

2.47 Example 

Let X \sim Uniform(-1,3) and let Y=X^2. Find the PDF of Y. The density of X is:

f_X(x)=\left\{\begin{matrix} 1/4 & , -1 < x< 3\\ 0 & ,otherwise \end{matrix}\right.

Y can only take values in (0,9). Consider two cases: (i) 0<y<1 and (ii) 1\leq y < 9.

For case (i), A_y=[-\sqrt{y},\sqrt{y}] and F_Y(y)=\int_{A_y}f_X(x)dx=\frac{1}{2}\sqrt{y}.

For case (ii), A_y=[-1,\sqrt{y}] and F_Y(y)=\int_{A_y}f_X(x)dx=\frac{1}{4}(\sqrt{y}+1).

Differentiating F_Y, we get:

f_Y(y)=\left\{\begin{matrix} \frac{1}{4\sqrt{y}} &, 0 < y< 1\\ \frac{1}{8\sqrt{y}} & , 1<y<9\\ 0 & ,otherwise \end{matrix}\right.

When r is strictly monotone increasing or strictly monotone decreasing, r has an inverse s=r^{-1}, and in this case the density of Y can be written as:

f_Y(y)=f_X(s(y))|\frac{ds(y)}{dy}| 
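This change-of-variables formula reproduces Example 2.46: with f_X(x) = e^{-x}, the map r(x) = log x is strictly increasing with inverse s(y) = e^y, so f_Y(y) = f_X(e^y)·e^y = e^y e^{-e^y}. Here is a small numerical sketch (assuming numpy is available) comparing the formula with a simulation.

```python
# Check of the monotone-transformation formula on Example 2.46:
# X ~ Exp(1), Y = log(X), so f_Y(y) = exp(y) * exp(-exp(y)).
import numpy as np

rng = np.random.default_rng(0)
y_grid = np.array([-2.0, -1.0, 0.0, 1.0])
formula = np.exp(y_grid) * np.exp(-np.exp(y_grid))

samples = np.log(rng.exponential(scale=1.0, size=1_000_000))
h = 0.05                                  # estimate the density with a small window
empirical = [np.mean(np.abs(samples - y) < h) / (2 * h) for y in y_grid]

print(formula)
print(np.round(empirical, 3))             # should be close to the formula values
```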

2.12 Transformation of Multiple Random Variables 

In some cases we are interested in transformations of several random variables. For example, given random variables X and Y, we may need the distribution of X/Y, X+Y, or max{X,Y}. Let Z=r(X,Y) be the function of interest. The steps for finding f_Z are similar to before:

  1. For each z, find the set A_z=\left \{ (x,y) :r(x,y) \leq z\right \}
  2. Find the CDF:

F_Z(z)=P(Z \leq z)=P(r(X,Y) \leq z)=P(\left \{ (x,y):r(x,y) \leq z \right \})\\\\=\int\int_{A_z}f_{X,Y}(x,y)dxdy

    3. Then differentiate: f_Z(z)={F_Z}'(z)

2.48 Example

Let X_1,X_2 \sim Uniform(0,1) be independent. Find the density of Y=X_1+X_2.

The joint density of (X_1,X_2) is:

f(x_1,x_2)=\left\{\begin{matrix} 1 &\ 0 \leq x_1 \leq 1,0 \leq x_2 \leq 1\\ 0 & otherwise \end{matrix}\right.

Let r(x_1,x_2)=x_1+x_2. Then:

F_Y(y)=P(Y \leq y)=P(r(X_1,X_2) \leq y)=P(\left \{ (x_1,x_2);r(x_1,x_2) \leq y \right \}) \\\\=\int\int_{A_y}f(x_1,x_2)dx_1dx_2

Now comes the hard part: finding A_y.

First suppose that 0 < y \leq 1. Then A_y is the triangle with vertices (0,0), (y,0), and (0,y). (A sketch of the region helps here.)

In this case, \int\int_{A_y}f(x_1,x_2)dx_1dx_2 is the area of this triangle, which is y^2/2.

Now suppose that 1 < y< 2. Then A_y is everything in the unit square except the triangle with vertices (1, y-1), (1, 1), and (y-1,1). That region has area 1-(2-y)^2/2. Therefore,

F_Y(y)=\left\{\begin{matrix} 0 & ,y <0\\ \frac{y^2}{2} & , 0 \leq y < 1\\ 1- \frac{(2-y)^2}{2} & ,1 \leq y < 2\\ 1 & ,y \geq 2 \end{matrix}\right.

Differentiating the CDF gives the PDF:

f_Y(y)=\left\{\begin{matrix} y &,0 \leq y \leq 1 \\ 2-y & ,1 \leq y \leq 2\\ 0 & ,otherwise \end{matrix}\right.
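A simulation check of this triangular density (a sketch assuming numpy is available):

```python
# Monte Carlo check of Example 2.48: the density of X1 + X2 for independent
# Uniform(0,1) variables is the triangular function derived above.
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(size=1_000_000) + rng.uniform(size=1_000_000)

for y in [0.25, 0.75, 1.0, 1.5]:
    h = 0.02
    est = np.mean(np.abs(s - y) < h) / (2 * h)     # local density estimate at y
    exact = y if y <= 1 else 2 - y
    print(y, round(est, 3), exact)
```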

End of this chapter

Untranslated: appendix and exercises.

                 

             

Source: blog.csdn.net/xiaowanbiao123/article/details/132867050