All of Statistics Chapter 3

Statistics (3) Expectation

This chapter includes:

  • 3.1 Expectation of random variables
  • 3.2 Properties of expectations
  • 3.3 Variance and Covariance
  • 3.4 Expectations and variances of several important random variables
  • 3.5 Conditional expectations
  • 3.6 Moment Generating Function

Since some of the translated terms may not convey the original meaning, the key terms are kept in English and listed below:

1. Expectation

2. Variance

3. Covariance

4. Mean

5. First moment

6. Integration by parts

7. Law of the unconscious statistician

8. Moment

9. Standard deviation

10. Sample mean

11. Sample variance

12. Correlation

13. Conditional expectation

14. The rule of iterated expectations

15. Conditional variance

16. Hierarchical model

17. Moment generating function

18. Laplace transform

3.1 Expectation of random variables 

The mean, or expectation, of a random variable X is the average value of X.

3.1 Definition

The expected value, or mean, or first moment of the random variable X is defined as follows:

\mathbb{E}(X)=\int x \, dF(x)=\left\{\begin{matrix} \sum_x xf(x) & if \ X\ is \ discrete\\ \int xf(x)dx & if \ X \ is \ continuous \end{matrix}\right.

where it is assumed that the sum (or integral) is well defined. We use the following notation to denote the expected value of X:

\mathbb{E}(X)=\mathbb{E}X=\int xdF(x)=\mu=\mu_X

Expectation is a one-number summary of a distribution. Think of \mathbb{E}(X) as the average \sum_{i=1}^nX_i/n of a large number of IID draws X_1,\dots,X_n. The fact that \mathbb{E}(X)\approx \sum_{i=1}^nX_i/n is actually a theorem, the law of large numbers, which will be introduced in Chapter 5.

The notation \int x \, dF(x) needs some explanation. We use it merely as a convenient unifying notation so that we do not have to write \sum_x xf(x) for discrete random variables and \int xf(x)dx for continuous ones, but you should be aware that this notation has a precise meaning that is discussed in real analysis courses.

To ensure that \mathbb{E}(X) is well defined, we say that \mathbb{E}(X) exists if \int |x|dF_X(x) < \infty. Otherwise, we say that the expectation does not exist.

3.2 Example

Suppose X \sim Bernoulli(p). Then \mathbb{E}(X)=\sum_{x=0}^1xf(x)=(0 \times (1-p))+(1 \times p) = p
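As a quick numerical check of this formula, here is a minimal simulation sketch (assuming NumPy; the value p = 0.3, the seed, and the sample size are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of E(X) = p for X ~ Bernoulli(p).
rng = np.random.default_rng(0)
p = 0.3
x = rng.binomial(n=1, p=p, size=100_000)  # Bernoulli(p) draws
print(x.mean(), p)                        # empirical mean vs. p
```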

3.3 Example

Toss a coin twice. Let X be the number of heads. Then

\mathbb{E}(X)=\int xdF_X(x) = \sum_x xf_X(x)=(0 \times f(0))+(1 \times f(1)) +(2 \times f(2))=(0 \times (1/4))+(1 \times (1/2))+(2 \times (1/4)) =1

3.4 Example

Suppose X \sim Uniform(-1,3). Then,

\mathbb{E}(X)=\int xdF_X(x)=\int xf_X(x)dx =\frac{1}{4}\int_{-1}^3xdx=1

3.5 Example

Recall that a random variable with a Cauchy distribution has the following density function:

f_X(x)=\left \{ \pi(1+x^2) \right \}^{-1}

Using integration by parts (with u=x, v=\tan^{-1}x), we get

\int |x|dF(x)=\frac{2}{\pi}\int_0^\infty\frac{x\,dx}{1+x^2}=[x\tan^{-1}x]_0^\infty-\int_0^\infty\tan^{-1}x\,dx=\infty

Therefore the mean does not exist. If you simulate draws from the Cauchy distribution many times and take the average, you will find that the average never settles down. This is because the Cauchy distribution has heavy tails, so extreme observations are common.
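The claim that the average never settles down is easy to see numerically. Here is a minimal sketch (assuming NumPy; the seed and sample size are arbitrary choices) that compares the running mean of Cauchy draws with that of standard normal draws, whose mean does exist:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

cauchy = rng.standard_cauchy(n)
normal = rng.standard_normal(n)

# Running mean after each draw: cumulative sum divided by the number of draws so far.
idx = np.arange(1, n + 1)
running_cauchy = np.cumsum(cauchy) / idx
running_normal = np.cumsum(normal) / idx

# The normal running mean converges to 0; the Cauchy running mean keeps jumping around.
print(running_normal[[999, 9_999, 99_999]])
print(running_cauchy[[999, 9_999, 99_999]])
```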

From now on, whenever we discuss expectations, we will assume that they exist.

Suppose Y=r(X). How do we compute \mathbb{E}(Y)? One way is to find f_Y(y) and then compute \mathbb{E}(Y)=\int yf_Y(y)dy. But there is an easier way.

3.6 Theorem

Law of the unconscious statistician. Suppose Y=r(X), then

\mathbb{E}(Y) = \mathbb{E}(r(X))=\int r(x)dF_X(x)

This result is intuitive. Think of playing a game in which you draw X at random and then I pay you Y=r(X). Your average payoff is r(x) times the probability that X=x, summed (or integrated) over x. Here is a special case. Let A be an event and let r(x)=I_A(x), where I_A(x)=1 if x \in A and I_A(x) = 0 if x \notin A. Then:

\mathbb{E}(I_A(X))=\int I_A(x)f_X(x)dx=\int_Af_X(x)dx=\mathbb{P}(X\in A)

In other words, probability is a special case of expectation.

3.7 Example

Suppose X \sim Unif(0,1) and Y=r(X)=e^X. Then \mathbb{E}(Y)=\int_0^1e^xf(x)dx=\int_0^1e^xdx=e-1.

Alternatively, you could find f_Y(y), which turns out to be f_Y(y)=1/y for 1 < y < e. Then

\mathbb{E}(Y)=\int_1^eyf(y)dy=e-1
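Both routes give e - 1 \approx 1.718. The following sketch (assuming NumPy; the seed and sample size are arbitrary choices) illustrates the rule of the unconscious statistician numerically: we average r(X) = e^X directly, without ever deriving f_Y:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=200_000)  # X ~ Unif(0, 1)

# E(e^X) by averaging r(X) = e^X over the draws of X.
print(np.exp(x).mean())  # approximately e - 1
print(np.e - 1)          # exact value, about 1.718
```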

3.8 Example 

Take a stick of unit length and break it at a random point. Let Y be the length of the longer piece. What is the mean of Y? If X \sim Unif(0,1) is the break point, then Y=r(X)=\max\left \{ X,1-X \right \}, so r(x)=1-x when 0 < x < 1/2 and r(x)=x when 1/2 \leq x <1. Hence

\mathbb{E}(Y)=\int r(x)dF(x)=\int _0^{1/2}(1-x)dx+\int_{1/2}^1xdx=\frac{3}{4}
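A short simulation sketch of this example (assuming NumPy; the break point is drawn from Unif(0,1)):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, size=200_000)  # random break point
longer = np.maximum(x, 1.0 - x)          # Y = max{X, 1 - X}, length of the longer piece
print(longer.mean())                     # approximately 3/4
```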

Functions of several variables are handled in the same way. For example, if Z=r(X,Y), then:

\mathbb{E}(Z)=\mathbb{E}(r(X,Y))=\int\int r(x,y)dF(x,y) 

3.9 Example 

Suppose (X, Y) has a jointly uniform distribution on the unit square, and let Z=r(X,Y)=X^2+Y^2. Then:

\mathbb{E}(Z)=\int\int r(x,y)dF(x,y)=\int_0^1\int_0^1(x^2+y^2)dxdy\\\\=\int_0^1x^2dx+\int_0^1y^2dy=\frac{2}{3}

The kth moment of X is defined as \mathbb{E}(X^k), assuming that \mathbb{E}(|X|^k) < \infty.

3.10 Theorem

If the kth moment exists and j < k, then the jth moment also exists.

Proof:

\mathbb{E}(|X|^j)=\int_{-\infty}^\infty|x|^jf_X(x)dx\\\\=\int_{|x|\leq1}|x|^jf_X(x)dx+\int_{|x| > 1}|x|^jf_X(x)dx\\\\ \leq \int_{|x|\leq1}f_X(x)dx+\int_{|x| > 1}|x|^kf_X(x)dx\\\\ \leq 1+\mathbb{E}(|X|^k) < \infty

The kth central moment is defined as \mathbb{E}((X-\mu)^k).

3.2 Properties of Expectations

3.11 Theorem

If X1,...Xn are random variables, and a1,...an are constants, then

\mathbb{E}(\underset{i}{\sum a_iX_i})=\underset{i}{\sum}a_i \mathbb{E}(X_i)

3.12 Example

Suppose X\sim Binomial(n,p). What is the mean of X? We could try to compute it directly from the definition:

\mathbb{E}(X)=\int x dF_X(x)=\sum xf(x)=\sum x\binom{n}{x}p^x(1-p)^{n-x}

Evaluating this sum directly is not easy. Instead, note that X=\sum_{i=1}^n X_i, where X_i=1 if the ith coin toss is heads and X_i=0 if it is tails. Then \mathbb{E}(X_i)=p\times 1+(1-p)\times 0=p, and therefore:

\mathbb{E}(X)=\mathbb{E}(\sum X_i)=\sum \mathbb{E}(X_i)=np
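The following sketch (assuming NumPy; n, p, the seed, and the number of repetitions are arbitrary choices) mirrors this argument by representing X as a sum of n Bernoulli indicators:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, reps = 10, 0.3, 100_000

# Each row is one experiment: n Bernoulli(p) coin tosses; X is their sum.
tosses = rng.binomial(n=1, p=p, size=(reps, n))
x = tosses.sum(axis=1)

print(x.mean(), n * p)  # empirical mean vs. np
```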

3.13 Theorem

Assume X1,...Xn are independent random variables, then:

\mathbb{E}(\overset{n}{\underset{i=1}\prod }X_i)=\underset{i}\prod\mathbb{E}(X_i)

Note: The above summation rule does not require the random variables to be independent. However, the product rule requires the random variables to be independent.

3.3 Variance and Covariance

The variance measures the "spread" of a distribution, that is, how concentrated or dispersed it is.

3.14 Definition

Let X be a random variable with mean μ. The variance of X, denoted by \sigma^2, \sigma_X^2, \mathbb{V}(X), or \mathbb{V}X, is defined as:

\sigma^2=\mathbb{E}(X-\mu)^2=\int (x-\mu)^2dF(x)

assuming this expectation exists. The standard deviation is sd(X)=\sqrt{\mathbb{V}(X)}, also denoted by \sigma or \sigma_X.

3.15 Theorem

Assuming the variance is well defined, it has the following properties:

1. \mathbb{V}(X)=\mathbb{E}(X^2)-\mu^2

2. If a and b are constants, then \mathbb{V}(aX+b)=a^2\mathbb{V}(X)

3. If X1..Xn are independent and a1,...,an are constants, then

\mathbb{V}(\overset{n}{\underset{i=1}{\sum}}a_iX_i)=\overset{n}{\underset{i=1}{\sum}}a_i^2\mathbb {V}(X_i)

3.16 Example

Suppose X \sim Binomial(n,p) and write X=\sum_iX_i, where \mathbb{P}(X_i=1)=p and \mathbb{P}(X_i=0)=1-p. Then

\mathbb{E}(X_i)=p\times 1+(1-p)\times 0=p

Now

\mathbb{E}(X_i^2)=p\times 1^2+(1-p)\times 0^2=p

Therefore \mathbb{V}(X_i)=\mathbb{E}(X_i^2)-p^2=p-p^2=p(1-p). Finally, \mathbb{V}(X)=\mathbb{V}(\sum_iX_i)=\sum_i\mathbb{V}(X_i)=\sum_ip(1-p)=np(1-p).

Note that if p=0 or p=1, then \mathbb{V}(X)=0.

If X1..Xn are random variables, then we can define the sample mean as:

\bar{X_n}=\frac{1}{n}\overset{n}{\underset{i=1}{\sum}}X_i

Sample variance is defined as:

S_n^2=\frac{1}{n-1}\overset{n}{\underset{i=1}{\sum}}(X_i - \bar{X_n})^2

3.17 Theorem

Suppose X1,...Xn are independent and identically distributed random variables with \mu=\mathbb{E}(X_i) and \sigma^2=\mathbb{V}(X_i). Then

\mathbb{E}(\bar{X}_n)=\mu,\mathbb{V}(\bar{X}_n)=\frac{\sigma^2}{n},\mathbb{E}(S_n^2)=\sigma^2
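A simulation sketch of Theorem 3.17 (assuming NumPy, with Uniform(0,1) data so that \mu = 1/2 and \sigma^2 = 1/12; the sample size and number of repetitions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 20, 50_000

# reps independent samples of size n from Uniform(0, 1): mu = 1/2, sigma^2 = 1/12.
samples = rng.uniform(0.0, 1.0, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)  # sample variance with the 1/(n-1) factor

print(xbar.mean(), 0.5)           # E(Xbar_n) is approximately mu
print(xbar.var(), (1 / 12) / n)   # V(Xbar_n) is approximately sigma^2 / n
print(s2.mean(), 1 / 12)          # E(S_n^2) is approximately sigma^2
```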

If X and Y are random variables, then the covariance and correlation between X and Y measure how strong the linear relationship between X and Y is.

3.18 Definition

Let X and Y be random variables with means \mu_X,\mu_Y and standard deviations \sigma_X,\sigma_Y. The covariance between X and Y is defined as follows:

Cov(X,Y)=\mathbb{E}((X-\mu_X)(Y-\mu_Y))

Correlation is defined as follows:

\rho =\rho_{X,Y}=\frac{Cov(X,Y)}{\sigma_X\sigma_Y}

3.19 Theorem

The covariance satisfies:

Cov(X,Y)=\mathbb{E}(XY)-\mathbb{E}(X)\mathbb{E}(Y)

The correlation satisfies:

-1 \leq \rho(X,Y) \leq 1

If a and b are constants and Y=aX+b, then \rho (X,Y) = 1 when a>0 and \rho (X,Y) = -1 when a<0.

If X and Y are independent, then Cov(X,Y)=\rho = 0. In general, the converse does not hold.
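To see that zero covariance does not imply independence, here is a small sketch (assuming NumPy): take X standard normal and Y = X^2. Y is a deterministic function of X, so the two are clearly dependent, yet Cov(X,Y)=\mathbb{E}(X^3)-\mathbb{E}(X)\mathbb{E}(X^2)=0.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(200_000)
y = x ** 2  # Y is a function of X, so X and Y are dependent

# For a symmetric distribution such as N(0,1), E(X^3) = 0, so Cov(X, Y) = 0.
print(np.cov(x, y)[0, 1])       # approximately 0
print(np.corrcoef(x, y)[0, 1])  # approximately 0, despite the (nonlinear) dependence
```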

3.20 Theorem

\mathbb{V}(X+Y)=\mathbb{V}(X)+\mathbb{V}(Y)+2Cov(X,Y); \mathbb{V}(X-Y)=\mathbb{V}(X)+\mathbb{V}(Y)-2Cov(X,Y), in a more general case, for multiple random variables X1..Xn:

\mathbb{V}(\sum_ia_iX_i)=\sum_ia_i^2\mathbb{V}(X_i)+2\sum\sum_{i<j}a_ia_jCov(X_i,X_j)

3.4 Expectations and variances of several important random variables

The following table lists the expectations and variances of several important random variables:

  • Point mass at a: mean a, variance 0
  • Bernoulli(p): mean p, variance p(1-p)
  • Binomial(n,p): mean np, variance np(1-p)
  • Geometric(p): mean 1/p, variance (1-p)/p^2
  • Poisson(\lambda): mean \lambda, variance \lambda
  • Uniform(a,b): mean (a+b)/2, variance (b-a)^2/12
  • Normal(\mu,\sigma^2): mean \mu, variance \sigma^2
  • Exponential(\beta): mean \beta, variance \beta^2
  • Gamma(\alpha,\beta): mean \alpha\beta, variance \alpha\beta^2
  • Beta(\alpha,\beta): mean \alpha/(\alpha+\beta), variance \alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))
  • Multinomial(n,p): mean np, variance given below
  • Multivariate Normal(\mu,\Sigma): mean \mu, variance \Sigma

We have previously derived the expectation and variance of the binomial distribution. For calculations of other distributions, please see the after-class exercises.

The last two entries in the table are multivariate models, which involve a random vector X of the form:

X=\begin{pmatrix} X_1\\ \vdots \\ X_n\end{pmatrix}

The mean of the random vector X is defined as follows:

\mu=\begin{pmatrix} \mu_1\\ \vdots \\ \mu_n \end{pmatrix}=\begin{pmatrix} \mathbb{E}(X_1)\\ \vdots \\ \mathbb{E}(X_n) \end{pmatrix}

The variance-covariance matrix Σ is defined as follows:

\mathbb{V}(X)=\begin{pmatrix} \mathbb{V}(X_1) & Cov(X_1,X_2) & \dots & Cov(X_1,X_k)\\ Cov(X_2,X_1)& \mathbb {V}(X_2) & \dots & Cov(X_2,X_k)\\ \vdots & \vdots & \vdots & \vdots \\ Cov(X_k,X_1) & Cov(X_k,X_2) & \dots & \mathbb {V}(X_k) \end{pmatrix}

If X \sim Multinomial(n,p), then \mathbb{E}(X)=np=n(p_1,...,p_k) and

\mathbb{V}(X)=\begin{pmatrix} np_1(1-p_1) & -np_1p_2 & \dots & -np_1p_k\\ -np_2p_1 & np_2(1-p_2) & \dots & -np_2p_k\\ \vdots & \vdots & \vdots & \vdots\\ -np_kp_1 & -np_kp_2 & \dots & np_k(1-p_k) \end{pmatrix}

To see this, note that the marginal distribution of any single component of the vector is binomial, X_i \sim Binomial(n,p_i). Therefore \mathbb{E}(X_i)=np_i and \mathbb{V}(X_i)=np_i(1-p_i). Note also that X_i+X_j \sim Binomial(n,p_i+p_j), so \mathbb{V}(X_i+X_j)=n(p_i+p_j)(1-[p_i+p_j]). On the other hand, using the formula for the variance of a sum, we get \mathbb{V}(X_i+X_j)=\mathbb{V}(X_i)+\mathbb{V}(X_j)+2Cov(X_i,X_j)=np_i(1-p_i)+np_j(1-p_j)+2Cov(X_i,X_j).

Equate this with \mathbb{V}(X_i+X_j)=n(p_i+p_j)(1-[p_i+p_j]) and solve to find Cov(X_i,X_j)=-np_ip_j.
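A numerical sketch of these formulas (assuming NumPy; n, p, and the number of draws are arbitrary choices): draw many Multinomial(n,p) vectors and compare the empirical covariance matrix with the matrix above.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
p = np.array([0.2, 0.3, 0.5])
draws = rng.multinomial(n, p, size=100_000)  # each row is one Multinomial(n, p) vector

# Theoretical covariance matrix: np_i(1 - p_i) on the diagonal, -np_i p_j off the diagonal.
theory = n * (np.diag(p) - np.outer(p, p))
empirical = np.cov(draws, rowvar=False)

print(np.round(theory, 2))
print(np.round(empirical, 2))
```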

Finally, here is a lemma for finding the mean and variance of a linear combination of the components of a random vector; it can be very useful in some situations.

3.21 Lemma

If a is a vector and X is a random vector with mean μ and variance Σ, then \mathbb{E}(a^TX)=a^T\mu and \mathbb{V}(a^TX)=a^T\Sigma a. If A is a matrix, then \mathbb{E}(AX)=A\mu and \mathbb{V}(AX)=A\Sigma A^T.
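A quick numerical sketch of this lemma (assuming NumPy; the mean vector, covariance matrix, and the vector a are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(8)
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.4],
                  [0.0, 0.4, 1.5]])
a = np.array([0.5, -1.0, 2.0])

x = rng.multivariate_normal(mu, sigma, size=200_000)  # rows are draws of the random vector X
y = x @ a                                             # a^T X for each draw

print(y.mean(), a @ mu)        # E(a^T X) = a^T mu
print(y.var(), a @ sigma @ a)  # V(a^T X) = a^T Sigma a
```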

3.5 Conditional Expectation

Suppose X and Y are random variables. What is the mean of X among the occasions when Y=y? The answer is that we compute the mean of X as before, but we substitute f_{X|Y}(x|y) for f_X(x) in the definition of expectation.

3.22 Definition

Given Y=y, the conditional expectation of X is defined as:

\mathbb{E}(X|Y=y)=\left\{\begin{matrix} \sum x f_{X|Y}(x|y) & ,discrete \ case\\ \int x f_{X|Y}(x|y)dx &,continuous \ case \end{matrix}\right.

If r(x,y) is a function of x and y, then

\mathbb{E}(r(X,Y)|Y=y)=\left\{\begin{matrix} \sum r(x,y)f_{X|Y}(x|y) & ,discrete \ case\\ \int r(x,y)f_{X|Y}(x|y)dx &,continuous \ case \end{matrix}\right.

Warning: there is a subtle point here. Whereas \mathbb{E}(X) is a number, \mathbb{E}(X|Y=y) is a function of y. Before we observe Y, we do not know the value of \mathbb{E}(X|Y=y); it is therefore a random variable, which we denote by \mathbb{E}(X|Y). In other words, \mathbb{E}(X|Y) is the random variable whose value is \mathbb{E}(X|Y=y) when Y=y. Similarly, \mathbb{E}(r(X,Y)|Y) is the random variable whose value is \mathbb{E}(r(X,Y)|Y=y) when Y=y. This is a very confusing point, so let us look at an example.

3.23 Example

Suppose X\sim Unif(0,1). Given X=x, suppose Y|X=x \sim Unif(x,1). Intuitively, we expect that \mathbb{E}(Y|X=x)=(1+x)/2. In fact, f_{Y|X}(y|x)=1/(1-x) for x < y < 1, so

\mathbb{E}(Y|X=x)=\int_x^1 yf_{Y|X}(y|x)dy=\frac{1}{1-x}\int _x^1 y dy= \frac{1+x}{2}

Therefore, \mathbb{E}(Y|X)=(1+X)/2. Note that \mathbb{E}(Y|X)=(1+X)/2 is a random variable whose value is \mathbb{E}(Y|X=x)=(1+x)/2 once X=x is observed.

3.24 Theorem (The Rule of Iterated Expectations)

For random variables X and Y, assuming that expectations exist, then we have:

\mathbb{E}[\mathbb{E}(Y|X)]=\mathbb{E}(Y),\mathbb{E}[\mathbb{E}(X|Y)]=\mathbb{E}(X)

In a more general case:

\mathbb{E}[\mathbb{E}(r(X,Y)|X)]=\mathbb{E}(r(X,Y))

Proof:

We prove the first equation, using the definition of conditional expectation and the fact that f(x,y)=f(x)f(y|x).

\mathbb{E}[\mathbb{E}(Y|X)]=\int \mathbb{E}(Y|X) f_X(x)dx = \int \int y f(y|x)dy f(x)dx =\int \int yf(y|x)f(x)dx dy= \int \int yf(x,y)dxdy = \mathbb{E}(Y)

3.25 Example

Consider Example 3.23. How can we compute \mathbb{E}(Y)? One method is to find the joint density f(x,y) and then compute \mathbb{E}(Y)=\int\int y f(x,y)dx dy. An easier way is to do this in two steps. First, we already know that \mathbb{E}(Y|X)=(1+X)/2, so

\mathbb{E}(Y)=\mathbb{E}\mathbb{E}(Y|X)=\mathbb{E}(\frac{1+X}{2})=\frac{1+\mathbb{E}(X)}{2}=\frac{1+(1/2)}{2}=3/4
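A simulation sketch of Examples 3.23 and 3.25 (assuming NumPy; the seed, sample size, and the window around x = 0.3 are arbitrary choices): draw X \sim Unif(0,1), then Y|X=x \sim Unif(x,1), and check that \mathbb{E}(Y) \approx 3/4 and that the conditional mean near a given x is approximately (1+x)/2.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500_000

x = rng.uniform(0.0, 1.0, size=n)
y = rng.uniform(x, 1.0)  # Y | X = x ~ Unif(x, 1), drawn elementwise

print(y.mean(), 3 / 4)   # E(Y) = E[E(Y|X)] = 3/4

# Conditional mean near x = 0.3: average Y over the draws whose X is close to 0.3.
near = np.abs(x - 0.3) < 0.01
print(y[near].mean(), (1 + 0.3) / 2)  # approximately 0.65
```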

3.26 Definition

Conditional variance is defined as follows:

\mathbb{V}(Y|X=x)=\int(y-\mu(x))^2 f(y|x) dy

where \mu(x)=\mathbb{E}(Y|X=x).

3.27 Theorem

For random variables X and Y:

\mathbb{V}(Y)=\mathbb{E}\mathbb{V}(Y|X)+\mathbb{V}\mathbb{E}(Y|X)

3.28 Example

Draw a county at random from the United States and then draw n people at random from that county. Let X be the number of these people who have a certain disease, and let Q be the proportion of people in that county with the disease; Q is a random variable because it varies from county to county. Given Q=q, we have X \sim Binomial(n,q). Therefore \mathbb{E}(X|Q=q)=nq and \mathbb{V}(X|Q=q)=nq(1-q). Assume that the random variable Q has a Uniform(0,1) distribution. A distribution constructed in stages like this is called a hierarchical model and can be written as:

Q \sim Uniform(0,1)

X|Q=q \sim Binomial(n,q)

Now \mathbb{E}(X)=\mathbb{E}\mathbb{E}(X|Q)=\mathbb{E}(nQ)=n\mathbb{E}(Q) = n/2. Let us calculate the variance of X.

Now \mathbb{V}(X)=\mathbb{E}\mathbb{V}(X|Q)+\mathbb{V}\mathbb{E}(X|Q), let's calculate these two terms

First, \mathbb{E}\mathbb{V}(X|Q)=\mathbb{E}(nQ(1-Q))=n\mathbb{E}(Q(1-Q))=n\int q(1-q)f(q)dq = n\int _0^1 q(1-q)dq = n/6

Next, \mathbb{V}\mathbb{E}(X|Q)=\mathbb{V}(nQ)=n^2\mathbb{V}(Q)=n^2\int_0^1 (q-(1/2))^2dq=n^2/12

Therefore, \mathbb{V}(X) =(n/6)+(n^2/12).
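A simulation sketch of this hierarchical model (assuming NumPy, with n = 10 and the number of repetitions as arbitrary choices): first draw Q \sim Uniform(0,1), then X|Q=q \sim Binomial(n,q), and compare the empirical mean and variance of X with n/2 and n/6 + n^2/12.

```python
import numpy as np

rng = np.random.default_rng(10)
n, reps = 10, 500_000

q = rng.uniform(0.0, 1.0, size=reps)  # Q ~ Uniform(0, 1)
x = rng.binomial(n, q)                # X | Q = q ~ Binomial(n, q), drawn elementwise

print(x.mean(), n / 2)               # E(X) = n/2 = 5
print(x.var(), n / 6 + n**2 / 12)    # V(X) = n/6 + n^2/12 = 10
```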

3.6 Moment Generating Function 

We now define the moment generating function, which is used for finding moments, for finding the distribution of sums of random variables, and in the proofs of certain theorems.

3.29 Definition

The moment generating function (MGF), or Laplace transform, of X is defined as follows:

\psi _X(t) = \mathbb{E}(e^{tX})=\int e^{tx}dF(x)

where t varies over the real numbers.

In what follows, we assume that the MGF is well defined for all t in some open interval around t=0.

When the MGF is well defined, it can be shown that we may interchange the operations of differentiation and "taking expectation." This leads to

\psi '(0)=\left[\frac{d}{dt}\mathbb{E}(e^{tX})\right]_{t=0}=\mathbb{E}\left[\frac{d}{dt}e^{tX}\right]_{t=0}=\mathbb{E}(Xe^{tX})_{t=0}=\mathbb{E}(X)

Taking k derivatives, we get \psi ^{(k)}(0)=\mathbb{E}(X^k). This gives us a method for computing the moments of a distribution.

3.30 Example

Suppose X follows the exponential distribution X \sim Exp(1). For any t<1, we have:

\psi_X(t)=\mathbb{E}(e^{tX})=\int_0^\infty e^{tx}e^{-x}dx=\int _0^\infty e^{(t-1)x}dx=\frac{1}{1-t}

The integral diverges if t \geq 1, so \psi _X(t) = 1/(1-t) only for t < 1. Now \psi'(0)=1 and \psi''(0)=2.

Therefore, \mathbb{E}(X)=1,\mathbb{V}(X)=\mathbb{E}(X^2)-\mu^2=2-1=1
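A small symbolic sketch of this calculation (assuming SymPy): differentiate \psi_X(t)=1/(1-t) at t=0 to recover the moments.

```python
import sympy as sp

t = sp.symbols('t')
psi = 1 / (1 - t)  # MGF of Exp(1), valid for t < 1

m1 = sp.diff(psi, t, 1).subs(t, 0)  # psi'(0)  = E(X)   = 1
m2 = sp.diff(psi, t, 2).subs(t, 0)  # psi''(0) = E(X^2) = 2
print(m1, m2, m2 - m1**2)           # mean 1, second moment 2, variance 1
```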

3.31 Lemma

The properties of MGF are:

1. If Y=aX+b, then \psi_Y(t)=e^{bt}\psi_X(at)

2. If X1,...Xn are independent, and Y=\sum_iX_i, then \psi_Y(t)=\prod _i\psi_i(t), where \psi_iis the MGF of Xi

3.32 Example

Suppose X \sim Binomial(n,p). We know that X=\sum_{i=1}^nX_i, where \mathbb{P}(X_i=1)=p and \mathbb{P}(X_i=0)=1-p. Now \psi_i(t)=\mathbb{E}(e^{X_it})=(p\times e^t)+((1-p)\times 1)=pe^t+q, where q=1-p.

Therefore, \psi_X(t)=\prod _i\psi_i(t)=(pe^t+q)^n.

Recall from earlier that if X and Y have the same distribution function, we write X \overset{d}{=} Y.

3.33 Theorem

Let X and Y be random variables. If \psi_X(t)=\psi_Y(t) for all t in an open interval around 0, then X \overset{d}{=} Y.

3.34 Example

Suppose X_1\sim Binomial(n_1,p) and X_2\sim Binomial(n_2,p) are independent. Let Y=X_1+X_2. Then:

\psi_Y(t) =\psi_1(t)\psi_2(t)=(pe^t+q)^{n_1}(pe^t+q)^{n_2}=(pe^t+q)^{n_1+n_2}

We recognize this as the MGF of a Binomial(n_1+n_2,p) distribution. Since the MGF characterizes the distribution (i.e., there is no other random variable with the same MGF), we conclude that Y\sim Binomial(n_1+n_2,p).

3.35 Example

Suppose Y_1 \sim Poisson(\lambda_1) and Y_2 \sim Poisson(\lambda_2) are independent. The moment generating function of Y=Y_1+Y_2 is \psi_Y(t)=\psi_{Y_1}(t)\psi_{Y_2}(t)=e^{\lambda_1(e^t-1)}e^{\lambda_2(e^t-1)}=e^{(\lambda_1+\lambda_2)(e^t-1)}, which is the moment generating function of a Poisson(\lambda_1+\lambda_2) random variable. Thus we have shown that the sum of two independent Poisson random variables has a Poisson distribution.
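A simulation sketch of this result (assuming NumPy and SciPy; \lambda_1 = 2, \lambda_2 = 3, and the number of draws are arbitrary choices): the empirical distribution of Y_1+Y_2 matches the Poisson(\lambda_1+\lambda_2) pmf.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
lam1, lam2, reps = 2.0, 3.0, 200_000

y = rng.poisson(lam1, reps) + rng.poisson(lam2, reps)  # Y = Y1 + Y2

# Compare empirical frequencies with the Poisson(lam1 + lam2) pmf for a few values.
for k in range(8):
    print(k, (y == k).mean(), stats.poisson.pmf(k, lam1 + lam2))
```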

End of this chapter

Untranslated: Appendix, homework


Origin blog.csdn.net/xiaowanbiao123/article/details/132926972