Contents of this chapter:
- 4.1 Probability Inequalities
- 4.2 Expectation Inequalities
Since some translated terms may not convey their meaning clearly, the key terms of this chapter are listed below:
1. Inequalities
2. Markov's Inequality
3. Chebyshev's Inequality
4. Hoeffding's Inequality
5. Confidence Interval
6. Cauchy-Schwarz Inequality
7. Mill's Inequality
8. Jensen's Inequality
4.1 Probability Inequalities
Inequalities are useful for bounding quantities that might otherwise be hard to compute; they give upper and lower bounds on such quantities. They will also be used in the next chapter on convergence theory. Our first inequality is Markov's inequality.
4.1 Theorem (Markov's inequality)
Let $X$ be a non-negative random variable and suppose that $E(X)$ exists. For any $t > 0$,
$$P(X > t) \le \frac{E(X)}{t}.$$
Proof:
Since $X \ge 0$,
$$E(X) = \int_0^\infty x f(x)\,dx \ge \int_t^\infty x f(x)\,dx \ge t \int_t^\infty f(x)\,dx = t\,P(X > t).$$
Dividing both sides by $t$ gives the result.
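As a quick numerical check, the following Python sketch compares the empirical tail probability of an Exponential(1) random variable (so $E(X) = 1$) with the Markov bound $E(X)/t$; the distribution and the values of $t$ are arbitrary choices for illustration.

```python
import random

# Monte Carlo check of Markov's inequality for X ~ Exponential(1),
# where E(X) = 1. Illustrative sketch, not part of the original text.
random.seed(0)
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]

for t in [1.0, 2.0, 4.0]:
    empirical = sum(x > t for x in samples) / n
    bound = 1.0 / t  # E(X) / t with E(X) = 1
    print(f"t={t}: P(X > t) ~ {empirical:.4f} <= bound {bound:.4f}")
```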
4.2 Theorem (Chebyshev's inequality)
Let $\mu = E(X)$ and $\sigma^2 = V(X)$. Then
$$P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2} \quad\text{and}\quad P(|Z| \ge k) \le \frac{1}{k^2},$$
where $Z = (X - \mu)/\sigma$. In particular, $P(|Z| > 2) \le 1/4$ and $P(|Z| > 3) \le 1/9$.
Proof:
We use Markov's inequality:
$$P(|X - \mu| \ge t) = P(|X - \mu|^2 \ge t^2) \le \frac{E(X - \mu)^2}{t^2} = \frac{\sigma^2}{t^2}.$$
The second inequality follows by setting $t = k\sigma$.
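The standardized form of the bound can be checked numerically. The sketch below simulates a standard normal $Z$ (an arbitrary choice) and compares $P(|Z| \ge k)$ with $1/k^2$:

```python
import random

# Monte Carlo check of Chebyshev's inequality for Z ~ N(0, 1):
# P(|Z| >= k) <= 1 / k^2. Illustrative sketch.
random.seed(0)
n = 100_000
samples = [random.gauss(0.0, 1.0) for _ in range(n)]

for k in [2, 3]:
    empirical = sum(abs(z) >= k for z in samples) / n
    print(f"k={k}: P(|Z| >= {k}) ~ {empirical:.4f} <= 1/k^2 = {1 / k**2:.4f}")
```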
4.3 Example
Suppose we test a prediction method, for example a neural network, on a set of $n$ new test cases. Let $X_i = 1$ if the prediction is wrong and $X_i = 0$ if the prediction is correct. Then $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$ is the observed error rate. Each $X_i$ may be regarded as a Bernoulli random variable with unknown mean $p$, the true but unknown error rate. How likely is $\bar{X}_n$ to be far from $p$?
We have $V(\bar{X}_n) = V(X_1)/n = p(1 - p)/n$, so by Chebyshev's inequality,
$$P(|\bar{X}_n - p| > \epsilon) \le \frac{V(\bar{X}_n)}{\epsilon^2} = \frac{p(1 - p)}{n\epsilon^2} \le \frac{1}{4 n \epsilon^2},$$
since $p(1 - p) \le \frac{1}{4}$ for all $p$. With $\epsilon = 0.2$ and $n = 100$, the bound is $0.0625$.
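A small simulation makes the example concrete; the true error rate $p = 0.3$ below is an arbitrary assumption, since the bound holds for every $p$:

```python
import random

# Chebyshev bound 1 / (4 n eps^2) vs. the simulated probability that the
# observed error rate is more than eps away from the true p. Sketch only;
# p = 0.3 is an arbitrary choice for illustration.
random.seed(0)
n, eps, p, trials = 100, 0.2, 0.3, 20_000

far = 0
for _ in range(trials):
    errors = sum(random.random() < p for _ in range(n))
    if abs(errors / n - p) > eps:
        far += 1

print(f"simulated P(|Xbar - p| > {eps}) ~ {far / trials:.5f}")
print(f"Chebyshev bound: {1 / (4 * n * eps**2):.4f}")
```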
Hoeffding's inequality is similar in spirit to Markov's inequality, but it is a sharper inequality. We present the result here in two parts.
4.4 Theorem (Hoeffding's inequality)
Let $Y_1, \ldots, Y_n$ be independent observations such that $E(Y_i) = 0$ and $a_i \le Y_i \le b_i$. Let $\epsilon > 0$. Then, for any $t > 0$,
$$P\Big(\sum_{i=1}^n Y_i \ge \epsilon\Big) \le e^{-t\epsilon} \prod_{i=1}^n e^{t^2 (b_i - a_i)^2 / 8}.$$
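To illustrate the theorem, the sketch below takes $Y_i \sim \mathrm{Uniform}(-1, 1)$ (so $E(Y_i) = 0$, $a_i = -1$, $b_i = 1$; an arbitrary choice) and evaluates the bound at the value of $t$ that minimizes it, namely $t = 4\epsilon / (n (b - a)^2)$:

```python
import math
import random

# Numerical check of Theorem 4.4 with Y_i ~ Uniform(-1, 1). The bound
# exp(-t*eps) * prod exp(t^2 (b-a)^2 / 8) is minimized over t > 0 at
# t = 4*eps / (n*(b-a)^2). Illustrative sketch only.
random.seed(0)
n, eps, trials = 50, 10.0, 20_000

hits = 0
for _ in range(trials):
    s = sum(random.uniform(-1.0, 1.0) for _ in range(n))
    if s >= eps:
        hits += 1

t = 4 * eps / (n * 2**2)                     # optimal t for (b - a) = 2
bound = math.exp(-t * eps + n * t**2 * 2**2 / 8)
print(f"simulated P(sum Y_i >= {eps}) ~ {hits / trials:.5f}")
print(f"Hoeffding (4.4) bound at optimal t: {bound:.5f}")
```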
4.5 Theorem (Hoeffding's inequality)
Let $X_1, \ldots, X_n \sim \mathrm{Bernoulli}(p)$. Then, for any $\epsilon > 0$,
$$P(|\bar{X}_n - p| > \epsilon) \le 2 e^{-2 n \epsilon^2},$$
where $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$.
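Again a simulation can be compared against the bound; $p = 0.5$ below is an arbitrary worst-case choice:

```python
import math
import random

# Compare the Hoeffding bound 2*exp(-2*n*eps^2) with a simulated tail
# probability for Bernoulli(p) samples. Sketch; p = 0.5 is arbitrary.
random.seed(0)
n, eps, p, trials = 100, 0.2, 0.5, 20_000

far = sum(
    abs(sum(random.random() < p for _ in range(n)) / n - p) > eps
    for _ in range(trials)
)
print(f"simulated P(|Xbar - p| > {eps}) ~ {far / trials:.5f}")
print(f"Hoeffding (4.5) bound: {2 * math.exp(-2 * n * eps**2):.5f}")
```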
4.6 Example
Let $X_1, \ldots, X_n \sim \mathrm{Bernoulli}(p)$ with $n = 100$ and $\epsilon = 0.2$. Chebyshev's inequality (as in Example 4.3) gives
$$P(|\bar{X}_n - p| > 0.2) \le \frac{1}{4 n \epsilon^2} = 0.0625,$$
while Hoeffding's inequality gives the much smaller bound
$$P(|\bar{X}_n - p| > 0.2) \le 2 e^{-2(100)(0.2)^2} = 0.00067.$$
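The two bounds can be computed directly:

```python
import math

# Evaluate the two bounds from Example 4.6 side by side (sketch).
n, eps = 100, 0.2
chebyshev = 1 / (4 * n * eps**2)
hoeffding = 2 * math.exp(-2 * n * eps**2)
print(f"Chebyshev bound: {chebyshev:.4f}")   # 0.0625
print(f"Hoeffding bound: {hoeffding:.5f}")   # ~0.00067
```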
Hoeffding's inequality gives a simple way to construct a confidence interval for a binomial parameter $p$. We will discuss confidence intervals in detail later (Chapter 6), but here is the basic idea. Fix $\alpha > 0$ and let
$$\epsilon_n = \sqrt{\frac{1}{2n} \log\Big(\frac{2}{\alpha}\Big)}.$$
By Hoeffding's inequality,
$$P(|\bar{X}_n - p| > \epsilon_n) \le 2 e^{-2 n \epsilon_n^2} = \alpha.$$
Let $C = (\bar{X}_n - \epsilon_n, \bar{X}_n + \epsilon_n)$. Then $P(p \notin C) = P(|\bar{X}_n - p| > \epsilon_n) \le \alpha$, so $P(p \in C) \ge 1 - \alpha$. That is, the random interval $C$ contains the true parameter value $p$ with probability at least $1 - \alpha$; we call $C$ a $1 - \alpha$ confidence interval. More on this later.
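A coverage simulation illustrates the construction; the true value $p = 0.3$ and $\alpha = 0.05$ are arbitrary choices:

```python
import math
import random

# Coverage check of the Hoeffding confidence interval (sketch). With
# alpha = 0.05 the interval should contain the assumed true p = 0.3 in
# at least 95% of repetitions; since Hoeffding is conservative, the
# observed coverage is typically much higher.
random.seed(0)
n, alpha, p, trials = 100, 0.05, 0.3, 5_000
eps_n = math.sqrt(math.log(2 / alpha) / (2 * n))

covered = 0
for _ in range(trials):
    xbar = sum(random.random() < p for _ in range(n)) / n
    if xbar - eps_n < p < xbar + eps_n:
        covered += 1

print(f"eps_n = {eps_n:.4f}, coverage ~ {covered / trials:.4f} (target >= {1 - alpha})")
```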
The following inequality is useful for bounding probability statements about normal random variables.
4.7 Theorem (Mill's inequality)
Let $Z \sim N(0, 1)$. Then
$$P(|Z| > t) \le \sqrt{\frac{2}{\pi}}\,\frac{e^{-t^2/2}}{t}.$$
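Since the exact tail $P(|Z| > t) = \mathrm{erfc}(t/\sqrt{2})$ is available in the Python standard library, the bound is easy to check:

```python
import math

# Compare Mill's bound with the exact standard-normal tail probability
# P(|Z| > t) = erfc(t / sqrt(2)). Illustrative sketch.
for t in [1.0, 2.0, 3.0]:
    exact = math.erfc(t / math.sqrt(2))
    mill = math.sqrt(2 / math.pi) * math.exp(-t**2 / 2) / t
    print(f"t={t}: exact {exact:.5f} <= Mill bound {mill:.5f}")
```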
4.2 Expectation Inequalities
This section contains two inequalities about expected values.
4.8 Theorem (Cauchy-Schwarz inequality)
If $X$ and $Y$ have finite variances, then
$$E|XY| \le \sqrt{E(X^2)\,E(Y^2)}.$$
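A numerical spot-check, with an arbitrary pair of correlated variables:

```python
import random

# Numerical check of Cauchy-Schwarz: E|XY| <= sqrt(E(X^2) * E(Y^2)),
# using X ~ N(0, 1) and Y = X + N(0, 1). Sketch only.
random.seed(0)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]

lhs = sum(abs(x * y) for x, y in zip(xs, ys)) / n
rhs = (sum(x**2 for x in xs) / n * sum(y**2 for y in ys) / n) ** 0.5
print(f"E|XY| ~ {lhs:.4f} <= sqrt(E(X^2) E(Y^2)) ~ {rhs:.4f}")
```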
Note: the definitions of concave and convex below are the opposite of the convention used in some textbooks.
Recall that a function $g$ is convex if for each $x$, $y$ and each $\alpha \in [0, 1]$,
$$g(\alpha x + (1 - \alpha) y) \le \alpha g(x) + (1 - \alpha) g(y).$$
If $g$ is twice differentiable and $g''(x) \ge 0$ for all $x$, then $g$ is convex. It can be shown that if $g$ is convex, then $g$ lies above any line that is tangent to it. A function $g$ is concave if $-g$ is convex. Examples of convex functions are $g(x) = x^2$ and $g(x) = e^x$; examples of concave functions are $g(x) = -x^2$ and $g(x) = \log(x)$.
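The defining inequality is easy to spot-check numerically, here for $g(x) = e^x$ at random points:

```python
import math
import random

# Spot-check the convexity definition for g(x) = exp(x):
# g(a*x + (1-a)*y) <= a*g(x) + (1-a)*g(y). Sketch only.
random.seed(0)
g = math.exp
for _ in range(5):
    x, y, a = random.uniform(-2, 2), random.uniform(-2, 2), random.random()
    lhs = g(a * x + (1 - a) * y)
    rhs = a * g(x) + (1 - a) * g(y)
    print(f"{lhs:.4f} <= {rhs:.4f}: {lhs <= rhs}")
```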
4.9 Theorem (Jensen's inequality)
If $g$ is convex, then $E\,g(X) \ge g(E X)$. If $g$ is concave, then $E\,g(X) \le g(E X)$.
Proof: Let $L(x) = a + b x$ be a line tangent to $g(x)$ at the point $E(X)$. Since $g$ is convex, it lies above the line $L(x)$, so
$$E\,g(X) \ge E\,L(X) = a + b\,E(X) = L(E(X)) = g(E(X)).$$
From Jensen's inequality it follows that $E(X^2) \ge (E X)^2$ and, if $X$ is positive, that $E(1/X) \ge 1/E(X)$. Since $\log$ is concave, $E(\log X) \le \log E(X)$.
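These consequences can also be checked numerically; $X \sim \mathrm{Uniform}(0.5, 2)$ below is an arbitrary positive random variable:

```python
import random

# Numerical illustration of Jensen's inequality with X ~ Uniform(0.5, 2):
# E(X^2) >= (E X)^2 and E(1/X) >= 1/E(X). Sketch only.
random.seed(0)
n = 100_000
xs = [random.uniform(0.5, 2.0) for _ in range(n)]
mean = sum(xs) / n

print(f"E(X^2) ~ {sum(x**2 for x in xs) / n:.4f} >= (E X)^2 ~ {mean**2:.4f}")
print(f"E(1/X) ~ {sum(1 / x for x in xs) / n:.4f} >= 1/E(X) ~ {1 / mean:.4f}")
```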
End of this chapter
Not translated: references, appendix, and homework problems.