This article contains reading notes for *Introduction to Probability*.
Bayesian Least Mean Squares Estimation
- In this section, we discuss in more detail the conditional expectation estimator. In particular, we show that it results in the least possible mean squared error, which is why it is called the least mean squares (LMS) estimator.
- We start by considering the simpler problem of estimating $\Theta$ with a constant $\hat\theta$, in the absence of an observation $X$. The estimation error $\hat\theta-\Theta$ is random (because $\Theta$ is random), but the mean squared error $E[(\hat\theta-\Theta)^2]$ is a number that depends on $\hat\theta$, and can be minimized over $\hat\theta$.
$$E[(\hat\theta-\Theta)^2]=\mathrm{var}(\hat\theta-\Theta)+\big(E[\hat\theta-\Theta]\big)^2=\mathrm{var}(\Theta)+\big(E[\Theta]-\hat\theta\big)^2$$
It turns out that the best possible estimate is to set $\hat\theta$ equal to $E[\Theta]$.
- Suppose now that we use an observation $X$ to estimate $\Theta$, so as to minimize the mean squared error. Once we know the value $x$ of $X$, the situation is identical to the one considered earlier, except that we are now in a new "universe," where everything is conditioned on $X=x$. We can therefore adapt our earlier conclusion and assert that the conditional expectation $E[\Theta\mid X=x]$ minimizes the conditional mean squared error $E[(\hat\theta-\Theta)^2\mid X=x]$ over all constants $\hat\theta$.
- Generally, the (unconditional) mean squared estimation error associated with an estimator $g(X)$ is defined as
$$E\big[(\Theta-g(X))^2\big]$$
For any given value $x$ of $X$, $g(x)$ is a number, and therefore,
$$E\big[(\Theta-E[\Theta\mid X=x])^2\mid X=x\big]\leq E\big[(\Theta-g(x))^2\mid X=x\big]$$
Thus,
$$E\big[(\Theta-E[\Theta\mid X])^2\mid X\big]\leq E\big[(\Theta-g(X))^2\mid X\big]$$
which is now an inequality between random variables (functions of $X$). We take expectations of both sides, and use the law of iterated expectations, to conclude that
$$E\big[(\Theta-E[\Theta\mid X])^2\big]\leq E\big[(\Theta-g(X))^2\big]$$
If we view $E[\Theta\mid X]$ as an estimator (a function of $X$), the preceding analysis shows that out of all possible estimators, the mean squared estimation error is minimized when $g(X)=E[\Theta\mid X]$.
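The optimality above can be checked numerically. The following is a small Monte Carlo sketch of our own (not from the book): with $\Theta\sim N(0,1)$ and $X=\Theta+W$, $W\sim N(0,1)$ independent, the LMS estimator is $E[\Theta\mid X]=X/2$, and its empirical mean squared error is compared against two competing estimators.

```python
# Monte Carlo check (our own illustrative setup): the LMS estimator X/2
# should beat both g(X) = X and the constant estimator E[Theta] = 0.
import random

random.seed(0)
n = 200_000
mse_lms = mse_identity = mse_const = 0.0
for _ in range(n):
    theta = random.gauss(0, 1)
    x = theta + random.gauss(0, 1)
    mse_lms += (theta - x / 2) ** 2   # g(X) = E[Theta|X] = X/2
    mse_identity += (theta - x) ** 2  # g(X) = X
    mse_const += theta ** 2           # g(X) = E[Theta] = 0 (ignores X)
mse_lms /= n
mse_identity /= n
mse_const /= n

print(mse_lms, mse_identity, mse_const)
# mse_lms should be the smallest (theoretical value 1/2).
```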
Example 8.11.
- Let $\Theta$ be uniformly distributed over the interval $[4,10]$ and suppose that we observe $\Theta$ with some random error $W$. In particular, we observe the value of the random variable
$$X=\Theta+W$$
where we assume that $W$ is uniformly distributed over the interval $[-1,1]$ and independent of $\Theta$. What is the LMS estimate of $\Theta$?
SOLUTION
- To calculate $E[\Theta\mid X=x]$, we note that $f_\Theta(\theta)=1/6$ if $4\leq\theta\leq 10$, and $f_\Theta(\theta)=0$ otherwise. Conditioned on $\Theta$ being equal to some $\theta$, $X$ is uniformly distributed over the interval $[\theta-1,\theta+1]$. Thus, the joint PDF is given by
$$f_{\Theta,X}(\theta,x)=f_\Theta(\theta)f_{X\mid\Theta}(x\mid\theta)=\frac{1}{6}\cdot\frac{1}{2}=\frac{1}{12}$$
if $4\leq\theta\leq10$ and $\theta-1\leq x\leq\theta+1$, and is zero for all other values of $(\theta,x)$. The parallelogram in the right-hand side of Fig. 8.8 is the set of pairs $(\theta,x)$ for which $f_{\Theta,X}(\theta,x)$ is nonzero.
- Given that $X=x$, the posterior PDF $f_{\Theta\mid X}$ is uniform on the corresponding vertical section of the parallelogram. Thus $E[\Theta\mid X=x]$ is the midpoint of that section, which in this example happens to be a piecewise linear function of $x$.
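The midpoint rule above can be written out explicitly: the posterior of $\Theta$ given $X=x$ is uniform on $[\max(4,x-1),\,\min(10,x+1)]$, so the LMS estimate is the midpoint of that interval. A minimal sketch (the function name is our own):

```python
# LMS estimator for Example 8.11: Theta ~ U[4,10], X = Theta + W, W ~ U[-1,1].
def lms_estimate(x):
    """E[Theta | X = x]: midpoint of the posterior support."""
    lo = max(4.0, x - 1.0)   # lower end of the vertical section
    hi = min(10.0, x + 1.0)  # upper end of the vertical section
    return (lo + hi) / 2.0   # midpoint of a uniform posterior

print(lms_estimate(3.5))   # near left edge: midpoint of [4, 4.5] = 4.25
print(lms_estimate(7.0))   # interior: midpoint of [6, 8] = 7.0
print(lms_estimate(10.5))  # near right edge: midpoint of [9.5, 10] = 9.75
```

The kinks at $x=5$ and $x=9$, where one end of the section stops moving, are what make the estimator piecewise linear.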
Problem 13.
- (a) Let $Y_1,\ldots,Y_n$ be independent identically distributed random variables and let $Y=Y_1+\cdots+Y_n$. Show that
$$E[Y_1\mid Y]=\frac{Y}{n}$$
- (b) Let $\Theta$ and $W$ be independent zero-mean normal random variables, with positive integer variances $k$ and $m$, respectively. Use the result of part (a) to find $E[\Theta\mid\Theta+W]$.
- (c) Repeat part (b) for the case where $\Theta$ and $W$ are independent Poisson random variables with integer means $\lambda$ and $\mu$, respectively.
SOLUTION
- (a) By symmetry, we see that $E[Y_i\mid Y]$ is the same for all $i$. Furthermore,
$$E[Y_1+\cdots+Y_n\mid Y]=E[Y\mid Y]=Y$$
Therefore, $E[Y_1\mid Y]=\frac{Y}{n}$.
- (b) We can think of $\Theta$ and $W$ as sums of independent standard normal random variables:
$$\Theta=\Theta_1+\cdots+\Theta_k,\qquad W=W_1+\cdots+W_m$$
We identify $Y$ with $\Theta+W$ and use the result from part (a) to obtain
$$E[\Theta_i\mid\Theta+W]=\frac{\Theta+W}{k+m}$$
Thus,
$$E[\Theta\mid\Theta+W]=kE[\Theta_i\mid\Theta+W]=\frac{k}{k+m}(\Theta+W)$$
- (c) We recall that the sum of independent Poisson random variables is Poisson. Thus the argument in part (b) goes through, by thinking of $\Theta$ and $W$ as sums of $\lambda$ (respectively, $\mu$) independent Poisson random variables with mean one. We then obtain
$$E[\Theta\mid\Theta+W]=\frac{\lambda}{\lambda+\mu}(\Theta+W)$$
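Part (b) admits a quick Monte Carlo sanity check (our own verification, not from the book). Since $E[\Theta\mid S]=cS$ with $S=\Theta+W$, taking expectations of $\Theta S$ gives $E[\Theta S]=cE[S^2]$, so the slope $c=k/(k+m)$ equals the ratio $E[\Theta S]/E[S^2]$, which we estimate from samples:

```python
# Empirical check of E[Theta | Theta+W] = (k/(k+m)) * (Theta+W) for
# Theta ~ N(0, k), W ~ N(0, m) independent (here k = 2, m = 3).
import random

random.seed(1)
k, m = 2, 3
n = 200_000
sum_ts = sum_ss = 0.0
for _ in range(n):
    theta = random.gauss(0, k ** 0.5)
    s = theta + random.gauss(0, m ** 0.5)
    sum_ts += theta * s  # accumulates n * E[Theta * S] (both zero-mean)
    sum_ss += s * s      # accumulates n * E[S^2]
slope = sum_ts / sum_ss
print(slope)             # should be close to k/(k+m) = 0.4
```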
Some Properties of the Estimation Error
- Let us use the notation
$$\hat\Theta=E[\Theta\mid X],\qquad \tilde\Theta=\hat\Theta-\Theta$$
for the LMS estimator and the associated estimation error, respectively. The random variables $\hat\Theta$ and $\tilde\Theta$ have a number of useful properties, which were derived in Section 4.3.
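Two of these properties, that the error $\tilde\Theta$ has zero mean and is uncorrelated with the estimator $\hat\Theta$, can be illustrated empirically. The setup below is our own (the same $\Theta\sim N(0,1)$, $X=\Theta+N(0,1)$ model used earlier, for which $\hat\Theta=X/2$):

```python
# Empirical check: the LMS error has zero mean and zero covariance with
# the LMS estimator (our own illustrative Gaussian setup).
import random

random.seed(2)
n = 200_000
sum_err = sum_hat = sum_prod = 0.0
for _ in range(n):
    theta = random.gauss(0, 1)
    x = theta + random.gauss(0, 1)
    hat = x / 2              # LMS estimator E[Theta|X]
    err = hat - theta        # estimation error
    sum_err += err
    sum_hat += hat
    sum_prod += hat * err    # for the covariance estimate
mean_err = sum_err / n
cov_hat_err = sum_prod / n - (sum_hat / n) * mean_err

print(mean_err, cov_hat_err)  # both should be close to 0
```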
Example 8.14.
- Let us say that the observation $X$ is *uninformative* if the mean squared estimation error $E[\tilde\Theta^2]=\mathrm{var}(\tilde\Theta)$ is the same as $\mathrm{var}(\Theta)$, the unconditional variance of $\Theta$. When is this the case?
- Using the formula
$$\mathrm{var}(\Theta)=\mathrm{var}(\tilde\Theta)+\mathrm{var}(\hat\Theta)$$
we see that $X$ is uninformative if and only if $\mathrm{var}(\hat\Theta)=0$. The variance of a random variable is zero if and only if that random variable is a constant, equal to its mean. We conclude that $X$ is uninformative if and only if $\hat\Theta=E[\Theta]$, for every value of $X$.
- If $\Theta$ and $X$ are independent, we have $\hat\Theta=E[\Theta\mid X=x]=E[\Theta]$ for all $x$, and $X$ is indeed uninformative, which is quite intuitive. The converse, however, is not true: it is possible for $E[\Theta\mid X=x]$ to always equal the constant $E[\Theta]$ without $\Theta$ and $X$ being independent. (In fact, if $E[\Theta\mid X=x]=E[\Theta]$ for all $x$, it follows that $\Theta$ and $X$ are uncorrelated.)
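A concrete instance of the converse failing (our own example, not from the book): take $\Theta$ uniform on $\{-1,0,1\}$ and $X=\Theta^2$. Then $E[\Theta\mid X=x]=0=E[\Theta]$ for both $x=0$ and $x=1$, so $X$ is uninformative, yet $X$ is a deterministic function of $\Theta$ and the two are clearly dependent:

```python
# Exact computation with rational arithmetic: E[Theta | X = x] is the
# constant E[Theta] = 0 for every x, yet Theta and X are dependent.
from fractions import Fraction

# Joint PMF of (Theta, X) with Theta uniform on {-1, 0, 1} and X = Theta**2.
pmf = {(t, t * t): Fraction(1, 3) for t in (-1, 0, 1)}

cond_means = {}
for x in (0, 1):
    px = sum(p for (t, xx), p in pmf.items() if xx == x)  # P(X = x)
    cond_means[x] = sum(t * p for (t, xx), p in pmf.items() if xx == x) / px
print(cond_means)  # E[Theta | X = x] is 0 for both x = 0 and x = 1

# Dependence: P(Theta=1, X=0) = 0, but P(Theta=1) * P(X=0) = 1/3 * 1/3 = 1/9.
print(pmf.get((1, 0), Fraction(0)))
```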
The Case of Multiple Observations and Multiple Parameters
- The preceding argument and its conclusions apply even if $X$ is a vector of random variables, $X=(X_1,\ldots,X_n)$. Thus, the mean squared estimation error is minimized if we use $E[\Theta\mid X_1,\ldots,X_n]$ as our estimator:
$$E\big[(\Theta-E[\Theta\mid X_1,\ldots,X_n])^2\big]\leq E\big[(\Theta-g(X_1,\ldots,X_n))^2\big]$$
- This provides a complete solution to the general problem of LMS estimation, but it is often difficult to implement, for the following reasons:
- (a) In order to compute the conditional expectation $E[\Theta\mid X_1,\ldots,X_n]$, we need a complete probabilistic model, that is, the joint PDF $f_{\Theta,X_1,\ldots,X_n}$.
- (b) Even if this joint PDF is available, $E[\Theta\mid X_1,\ldots,X_n]$ can be a very complicated function of $X_1,\ldots,X_n$.
- As a consequence, practitioners often resort to approximations of the conditional expectation or focus on estimators that are not optimal but are simple and easy to implement.
- The most common approach, discussed in the next section, involves a restriction to linear estimators.
- Finally, let us consider the case where we want to estimate multiple parameters $\Theta_1,\ldots,\Theta_m$. It is then natural to consider the criterion
$$E[(\Theta_1-\hat\Theta_1)^2]+\cdots+E[(\Theta_m-\hat\Theta_m)^2]$$
and minimize it over all estimators $\hat\Theta_1,\ldots,\hat\Theta_m$. But this is equivalent to finding, for each $i$, an estimator $\hat\Theta_i$ that minimizes $E[(\Theta_i-\hat\Theta_i)^2]$, so that we are essentially dealing with $m$ decoupled estimation problems, one for each unknown parameter $\Theta_i$, yielding $\hat\Theta_i=E[\Theta_i\mid X_1,\ldots,X_n]$ for all $i$.
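The decoupling can be seen numerically in a small example of our own: with $\Theta_1,\Theta_2$ i.i.d. $N(0,1)$ and a single observation $X=\Theta_1+\Theta_2$, the component LMS estimators are $E[\Theta_i\mid X]=X/2$ (by Problem 13(a)), and minimizing each term separately also minimizes the summed criterion:

```python
# Monte Carlo comparison of the decoupled LMS pair (X/2, X/2) against an
# arbitrary alternative pair (X, 0) under the summed MSE criterion.
import random

random.seed(3)
n = 200_000
mse_lms = mse_alt = 0.0
for _ in range(n):
    t1, t2 = random.gauss(0, 1), random.gauss(0, 1)
    x = t1 + t2
    mse_lms += (t1 - x / 2) ** 2 + (t2 - x / 2) ** 2  # decoupled LMS pair
    mse_alt += (t1 - x) ** 2 + (t2 - 0) ** 2          # alternative pair
mse_lms /= n
mse_alt /= n

print(mse_lms, mse_alt)  # the LMS pair should have the smaller summed error
```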