All of Statistics Chapter 4

Contents of this chapter:

  • 4.1 Probability Inequalities
  • 4.2 Expectation Inequalities

Since some translated terms may not convey the intended meaning, the key terms of this chapter are listed here:

1. Inequalities

2. Markov's Inequality

3. Chebyshev's Inequality

4. Hoeffding's Inequality

5. Confidence Interval

6. Cauchy-Schwarz Inequality

7. Mill's Inequality

8. Jensen's Inequality

4.1 Probability Inequalities

Inequalities are useful for bounding quantities that may be difficult to compute directly, providing upper and lower bounds. They will also be used in the next chapter on convergence theory. Our first inequality is Markov's inequality.

4.1 Theorem (Markov’s inequality)

Suppose X is a non-negative random variable and that \mathbb{E}(X) exists. Then for any t>0:

\mathbb{P}(X>t) \leq \frac{\mathbb{E}(X)}{t}

Proof:

Since X \geq 0:

\mathbb{E}(X) = \int_0^\infty xf(x)dx=\int_0^txf(x)dx+\int_t^\infty xf(x)dx \geq \int_t^\infty x f(x)dx \geq t\int_t^\infty f(x)dx = t\mathbb{P}(X>t)
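As a quick numerical sanity check (a sketch of my own, not from the text), we can compare the empirical tail probability with the Markov bound, using Exponential(1) samples so that \mathbb{E}(X)=1:

```python
import random

# Empirical check of Markov's inequality for X ~ Exponential(1),
# where E(X) = 1 (an illustrative choice of distribution).
random.seed(0)
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]
mean = sum(samples) / n

bounds_hold = True
for t in (1.0, 2.0, 5.0):
    tail = sum(x > t for x in samples) / n   # empirical P(X > t)
    bound = mean / t                         # Markov bound E(X)/t
    bounds_hold = bounds_hold and tail <= bound
    print(f"t={t}: P(X>t) ~ {tail:.4f} <= E(X)/t = {bound:.4f}")
```

The bound is loose (for the exponential the true tail is e^{-t}), which is typical of Markov's inequality: it uses only the mean.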

4.2 Theorem (Chebyshev’s inequality)

Suppose \mu = \mathbb{E}(X) and \sigma^2=\mathbb{V}(X). Then:

\mathbb{P}(|X-\mu| \geq t) \leq \frac{\sigma^2}{t^2}, \qquad \mathbb{P}(|Z|\geq k) \leq \frac{1}{k^2}

where Z=(X-\mu)/\sigma. In particular, \mathbb{P}(|Z| > 2) \leq \frac{1}{4} and \mathbb{P}(|Z| > 3) \leq \frac{1}{9}.

Proof:

We use Markov's inequality:

\mathbb{P}(|X-\mu| \geq t)=\mathbb{P}(|X-\mu|^2 \geq t^2) \leq \frac{\mathbb{E}(X-\mu)^2}{t^2}=\frac{\sigma^2}{t^2}

The second inequality follows by setting t = k\sigma.
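The standardized form can also be checked empirically. The following sketch (my addition) draws standard normal samples and compares the tail frequency with the 1/k^2 bound:

```python
import random

# Empirical comparison of P(|Z| >= k) with Chebyshev's bound 1/k^2
# for Z ~ N(0,1); the true normal tails are much smaller than the bound.
random.seed(1)
n = 200_000
z = [random.gauss(0.0, 1.0) for _ in range(n)]

results = {}
for k in (2, 3):
    results[k] = sum(abs(v) >= k for v in z) / n  # empirical tail
    print(f"k={k}: P(|Z|>={k}) ~ {results[k]:.4f} <= 1/k^2 = {1/k**2:.4f}")
```

For k=2 the true tail is about 0.046, far below the bound 0.25; Chebyshev holds for every distribution with finite variance, so it cannot be tight for any particular one.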

4.3 Example

Suppose we test a prediction method, such as a neural network, on a set of n new test examples. Let Xi = 1 if the prediction is wrong and Xi = 0 if it is correct. Then \bar{X}_n=n^{-1}\sum _{i=1}^nX_i is the observed error rate. Each Xi can be viewed as a Bernoulli random variable with unknown mean p, the true but unknown error rate. How likely is \bar{X}_n to be more than \varepsilon away from p?

We have \mathbb{V}(\bar{X}_n)=\mathbb{V}(X_1)/n=p(1-p)/n, so by Chebyshev's inequality:

\mathbb{P}(|\bar{X}_n-p| > \varepsilon ) \leq \frac{\mathbb{V}(\bar{X}_n)}{\varepsilon^2}=\frac{p(1-p)}{n\varepsilon^2} \leq \frac{1}{4n\varepsilon^2}

since p(1-p) \leq \frac{1}{4} for all p. For \varepsilon = 0.2 and n = 100, the bound equals 0.0625.
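Both steps of this calculation are easy to verify directly (a sketch of my own, checking p(1-p) \leq 1/4 on a grid and evaluating the bound at the example's values):

```python
# Grid check that p(1-p) <= 1/4 on [0,1], and evaluation of the
# Chebyshev bound 1/(4*n*eps^2) at n=100, eps=0.2 from the example.
n, eps = 100, 0.2
max_pq = max(p / 1000 * (1 - p / 1000) for p in range(1001))
bound = 1 / (4 * n * eps**2)
print(max_pq)          # 0.25, attained at p = 1/2
print(round(bound, 4)) # 0.0625
```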

Hoeffding's inequality is similar in spirit to Markov's inequality, but it is a sharper inequality. We present the result here in two parts.

4.4 Theorem (Hoeffding’s inequality)

Suppose Y_1, \ldots, Y_n are independent random variables satisfying \mathbb{E}(Y_i)=0 and a_i \leq Y_i \leq b_i. Let \varepsilon > 0. Then, for any t>0:

\mathbb{P}(\overset{n}{\underset{i=1}{\sum}}Y_i \geq \varepsilon) \leq e^{-t\varepsilon}\overset{n}{\underset{i =1}{\prod}} e^{t^2(b_i-a_i)^2/8}

4.5 Theorem (Hoeffding’s inequality)

Suppose X_1, \ldots, X_n \sim Bernoulli(p). Then, for any \varepsilon > 0:

\mathbb{P}(|\bar{X}_n-p| > \varepsilon ) \leq 2e^{-2n\varepsilon^2}

where \bar{X}_n=n^{-1}\sum_{i=1}^nX_i.
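Theorem 4.5 can be checked by Monte Carlo: simulate \bar{X}_n many times and compare the frequency of large deviations with the bound. This sketch is my own; n, p, \varepsilon, and the number of repetitions are illustrative choices:

```python
import math
import random

# Monte Carlo sketch of Theorem 4.5: simulate the sample mean of n
# Bernoulli(p) draws repeatedly and compare the empirical frequency of
# |xbar - p| > eps with the Hoeffding bound 2*exp(-2*n*eps^2).
random.seed(4)
n, p, eps, reps = 50, 0.5, 0.2, 5000
exceed = 0
for _ in range(reps):
    xbar = sum(random.random() < p for _ in range(n)) / n
    exceed += abs(xbar - p) > eps
freq = exceed / reps
bound = 2 * math.exp(-2 * n * eps**2)
print(f"empirical tail {freq:.4f} <= Hoeffding bound {bound:.4f}")
```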

4.6 Example

Suppose X_1, \ldots, X_n \sim Bernoulli(p) with n=100 and \varepsilon = 0.2. According to Chebyshev's inequality:

\mathbb{P}(|\bar{X}_n-p| > \varepsilon) \leq 0.0625

According to Hoeffding's inequality, we have

\mathbb{P}(|\bar{X}_n-p| > 0.2) \leq 2e^{-2(100)(0.2)^2}=0.00067
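The two bounds can be computed side by side (a sketch of my own): Hoeffding's bound decays exponentially in n, while Chebyshev's decays only as 1/n.

```python
import math

# The bounds from Examples 4.3 and 4.6 evaluated at n=100, eps=0.2.
n, eps = 100, 0.2
chebyshev = 1 / (4 * n * eps**2)            # Chebyshev bound
hoeffding = 2 * math.exp(-2 * n * eps**2)   # Hoeffding bound
print(round(chebyshev, 4))   # 0.0625
print(round(hoeffding, 5))   # 0.00067
```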

Hoeffding's inequality gives us a simple way to construct a confidence interval for the binomial parameter p. We will discuss confidence intervals in detail later (Chapter 6), but here is the basic idea. Fix a number \alpha > 0 and let

\varepsilon_n=\sqrt{\frac{1}{2n}\log{\frac{2}{\alpha}}}

By Hoeffding's inequality,

\mathbb{P}(|\bar{X}_n-p| > \varepsilon_n ) \leq 2e^{-2n\varepsilon_n^2}=\alpha

Let C=(\bar{X}_n-\varepsilon_n,\bar{X}_n+\varepsilon_n). Then \mathbb{P}(p \notin C ) = \mathbb{P}(|\bar{X}_n-p|>\varepsilon_n) \leq \alpha, so \mathbb{P}(p \in C) \geq 1-\alpha. That is, the random interval C contains the true parameter value p with probability 1-\alpha; we call C a 1-\alpha confidence interval. More on this later.
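The construction above fits in a few lines of code. This is a sketch of my own; the helper name `hoeffding_interval` and the choices of n, p, and \alpha are illustrative:

```python
import math
import random

def hoeffding_interval(xbar, n, alpha):
    """Return the 1-alpha Hoeffding interval C = (xbar - eps_n, xbar + eps_n)."""
    eps_n = math.sqrt(math.log(2 / alpha) / (2 * n))
    return xbar - eps_n, xbar + eps_n

# Simulate n Bernoulli(p) observations and build a 95% interval.
random.seed(2)
n, p, alpha = 1000, 0.3, 0.05
xbar = sum(random.random() < p for _ in range(n)) / n
lo, hi = hoeffding_interval(xbar, n, alpha)
half = (hi - lo) / 2
print(f"xbar = {xbar:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

Note the half-width depends only on n and \alpha, not on the data; this is what makes the interval valid for every p, at the cost of being conservative.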

The following inequality is useful for bounding tail probabilities of normal random variables.

4.7 Theorem (Mill’s inequality)

Suppose Z \sim N(0,1). Then \mathbb{P}(|Z| > t) \leq \sqrt{\frac{2}{\pi}}\frac{e^{-t^2/2}}{t}
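Mill's bound can be compared with the exact two-sided normal tail, \mathbb{P}(|Z| > t) = \mathrm{erfc}(t/\sqrt{2}) (a sketch of my own using the standard library's `math.erfc`):

```python
import math

# Compare Mill's bound with the exact normal tail at a few values of t.
checks = []
for t in (1.0, 2.0, 3.0):
    exact = math.erfc(t / math.sqrt(2))                        # P(|Z| > t)
    mill = math.sqrt(2 / math.pi) * math.exp(-t**2 / 2) / t    # Mill's bound
    checks.append(exact <= mill)
    print(f"t={t}: exact {exact:.5f} <= Mill {mill:.5f}")
```

Unlike Chebyshev's 1/k^2, Mill's bound has the correct e^{-t^2/2} decay, so it stays close to the true tail for large t.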

4.2 Expectation Inequalities

This section contains two inequalities about expected values.

4.8 Theorem (Cauchy-Schwarz Inequality)

If X and Y have finite variances, then \mathbb{E}(|XY|) \leq \sqrt{\mathbb{E}(X^2)\mathbb{E}(Y^2)}

Note: the definitions of concave and convex used below are the opposite of the convention in some textbooks (e.g., many Chinese textbooks).

Recall: a function g is convex if for every x, y and every \alpha \in [0,1]:

g(\alpha x+(1-\alpha)y) \leq \alpha g(x)+(1-\alpha)g(y)

If g is twice differentiable and g''(x) \geq 0 for all x, then g is convex. It can be shown that a convex function lies above any of its tangent lines. If g is concave, then -g is convex. Examples of convex functions include g(x) = x^2 and g(x) = e^x. Examples of concave functions include g(x) = -x^2 and g(x) = \log(x).
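The defining inequality is easy to test numerically on a grid, for instance for g(x) = x^2 (a sketch, not a proof; the small tolerance guards against floating-point error):

```python
# Grid check of the convexity inequality for g(x) = x**2:
# g(a*x + (1-a)*y) <= a*g(x) + (1-a)*g(y) for all x, y and a in [0,1].
def g(x):
    return x * x

pts = [i / 10 for i in range(-20, 21)]    # x, y grid over [-2, 2]
alphas = [i / 10 for i in range(11)]      # alpha grid over [0, 1]
convex_ok = all(
    g(a * x + (1 - a) * y) <= a * g(x) + (1 - a) * g(y) + 1e-12
    for x in pts for y in pts for a in alphas
)
print(convex_ok)  # True
```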

4.9 Theorem (Jensen’s inequality)

If g is a convex function, then \mathbb{E}g(X) \geq g(\mathbb{E}X). If g is a concave function, then \mathbb{E}g(X) \leq g(\mathbb{E}X)

Proof: Let L(x)=a+bx be the line tangent to g(x) at the point \mathbb{E}(X). Since g is convex, it lies above the line L(x), so:

\mathbb{E}g(X) \geq \mathbb{E}L(X)=\mathbb{E}(a+bX) = a + b\mathbb{E}(X)=L(\mathbb{E}(X))=g(\mathbb{E}X)

By Jensen's inequality, \mathbb{E}(X^2) \geq (\mathbb{E}X)^2, and if X is positive, then \mathbb{E}(1/X) \geq 1/\mathbb{E}(X). Since \log is concave, \mathbb{E}(\log X) \leq \log\mathbb{E}(X).
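All three consequences can be illustrated empirically (a sketch of my own, with X \sim Uniform(0.5, 2) as an arbitrary positive random variable; the sample versions of these inequalities hold exactly, by Jensen applied to the empirical distribution):

```python
import math
import random

# Empirical illustration of the Jensen consequences above.
random.seed(3)
xs = [random.uniform(0.5, 2.0) for _ in range(100_000)]
n = len(xs)
mean = sum(xs) / n
mean_sq = sum(x * x for x in xs) / n
mean_inv = sum(1 / x for x in xs) / n
mean_log = sum(math.log(x) for x in xs) / n
print(mean_sq >= mean**2)          # E(X^2) >= (E X)^2      -> True
print(mean_inv >= 1 / mean)        # E(1/X) >= 1/E(X)       -> True
print(mean_log <= math.log(mean))  # E(log X) <= log E(X)   -> True
```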

End of this chapter



Origin blog.csdn.net/xiaowanbiao123/article/details/133099466