Statistical Inference (3): Exponential Family

1. Exponential family

  • Definition

    • PDF: $p(y;x)=\exp\left(\lambda(x)^T t(y)-\alpha(x)+\beta(y)\right)$, written $y\sim \varepsilon\left(x;\lambda(\cdot),t(\cdot),\beta(\cdot)\right)$
    • natural statistic: $t(y)$
    • natural parameter: $\lambda(x)$
    • log-partition function: $\alpha(x)$
    • partition function: $Z(x)=\exp(\alpha(x))$
    • base measure: $\exp(\beta(y))$
  • Regularity condition (regular): the family is regular if the support of every distribution $p(y;x)$ in the family does not depend on $x$

    • In essence, this guarantees that differentiation and integration can be interchanged, as required by the CRB regularity condition:
      $$\mathbb{E}\left[\frac{\partial}{\partial x}\ln p(y;x)\right]=\int\frac{\partial}{\partial x}p(y;x)\,dy = \frac{\partial}{\partial x}\int_a^b p(y;x)\,dy = 0$$
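
As a quick numerical illustration of this zero-mean score property (a minimal sketch of my own, using the model $y\sim\mathcal{N}(x,1)$, whose score is $y-x$):

```python
import numpy as np

# For y ~ N(x, 1) the score is d/dx ln p(y; x) = y - x.
rng = np.random.default_rng(0)
x = 1.3
y = rng.normal(loc=x, scale=1.0, size=200_000)

print((y - x).mean())  # close to 0: E[d/dx ln p(y;x)] = 0
```
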
  • An exponential family can arise in several ways

    • Many distributions can themselves be written in exponential-family form

      • Bernoulli distribution: $y\sim \mathcal{B}(x)$ (a numerical check of this decomposition follows the Gaussian example below)

      $$p(y;x)=x^y (1-x)^{1-y} \\ \ln p(y;x)=\left(\ln\frac{x}{1-x}\right)y-\left(-\ln(1-x)\right)$$

      • Gaussian: $y=[y_1,y_2]^T$ with $y_i\sim \mathcal{N}(x,1)$ i.i.d.

      $$p(y;x)=\frac{1}{2\pi}\exp\left((y_1+y_2)x-x^2-\frac{y_1^2+y_2^2}{2}\right)$$
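
As noted above, here is a quick numerical check of the Bernoulli decomposition (a minimal sketch; the helper names `p_direct` and `p_expfam` are mine): the density computed directly and the one assembled from $\lambda(x)$, $t(y)$, $\alpha(x)$, $\beta(y)$ coincide.

```python
import numpy as np

def p_direct(y, x):
    # Bernoulli pmf written directly
    return x**y * (1 - x)**(1 - y)

def p_expfam(y, x):
    lam = np.log(x / (1 - x))   # natural parameter lambda(x)
    t = y                       # natural statistic t(y) = y
    alpha = -np.log(1 - x)      # log-partition alpha(x)
    beta = 0.0                  # beta(y) = 0 here
    return np.exp(lam * t - alpha + beta)

for x in (0.2, 0.5, 0.9):
    for y in (0, 1):
        assert np.isclose(p_direct(y, x), p_expfam(y, x))
```
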

    • Geometric mean of several distributions (shown here for two)
      $$p(y;x)=\frac{p_1^x(y)\,p_2^{1-x}(y)}{Z(x)} \\ \ln p(y;x)=x\ln\frac{p_1(y)}{p_2(y)}-\ln Z(x)+\ln p_2(y)$$

      • For example, $p_1(y)\sim \mathcal{B}\left(\frac{1}{1+e^{-1}}\right)$, $p_2(y)\sim \mathcal{B}(1/2)$:
        $$p(y;x)\propto\left(\frac{1}{1+e^{-1}}\right)^{xy}\left(\frac{e^{-1}}{1+e^{-1}}\right)^{x(1-y)}(1/2)^{1-x}\ \Rightarrow\ p(y;x)\sim \mathcal{B}\left(\frac{1}{1+e^{-x}}\right),\quad \frac{p(y=1;x)}{p(y=0;x)}=e^x$$
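
A small numerical check of this example (a minimal sketch; the function name is mine): normalizing the geometric mean of $\mathcal{B}(\frac{1}{1+e^{-1}})$ and $\mathcal{B}(1/2)$ indeed gives $\mathcal{B}(\frac{1}{1+e^{-x}})$, with odds ratio $e^x$.

```python
import numpy as np

def geometric_mixture(p1, p2, x):
    # Unnormalized p1(y)^x * p2(y)^(1-x), then normalize over y in {0, 1}
    unnorm = np.array([p1[y]**x * p2[y]**(1 - x) for y in (0, 1)])
    return unnorm / unnorm.sum()

p1 = {1: 1 / (1 + np.exp(-1)), 0: np.exp(-1) / (1 + np.exp(-1))}  # B(1/(1+e^{-1}))
p2 = {1: 0.5, 0: 0.5}                                             # B(1/2)

for x in (-2.0, 0.3, 1.7):
    p = geometric_mixture(p1, p2, x)
    assert np.isclose(p[1], 1 / (1 + np.exp(-x)))  # matches B(1/(1+e^{-x}))
    assert np.isclose(p[1] / p[0], np.exp(x))      # odds ratio e^x
```
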
    • Tilting
      $$p(y;x)=\frac{p(y)\,e^{xy}}{Z(x)} \\ \ln p(y;x)=xy-\ln Z(x)+\ln p(y)$$

      • For example, if $p(y)\sim \mathcal{N}(0,1)$, then $p(y;x)\sim \mathcal{N}(x,1)$
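
The Gaussian tilting example can be verified on a grid (a minimal sketch of my own; the grid bounds and step are arbitrary choices): multiplying the $\mathcal{N}(0,1)$ density by $e^{xy}$ and renormalizing reproduces the $\mathcal{N}(x,1)$ density.

```python
import numpy as np

def gauss_pdf(y, mean):
    return np.exp(-(y - mean)**2 / 2) / np.sqrt(2 * np.pi)

x = 0.8
y = np.linspace(-10, 10, 20001)
dy = y[1] - y[0]

tilted = gauss_pdf(y, 0.0) * np.exp(x * y)  # p(y) * e^{xy}
tilted /= tilted.sum() * dy                 # divide by a numerical Z(x)

assert np.allclose(tilted, gauss_pdf(y, x), atol=1e-6)
```
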
  • linear exponential family

    • Definition: $\lambda(x)=x$, i.e. $\ln p(y;x)=x\,t(y)-\alpha(x)+\beta(y)$
    • Properties: $\dot{\alpha}(x)=\mathbb{E}[t(y)],\quad \ddot{\alpha}(x)=\mathbb{E}[t^2(y)]-\mathbb{E}[t(y)]^2=\mathrm{Var}(t(y))=J_y(x)$

    Proof:
    $$\begin{aligned} Z(x) &= e^{\alpha(x)} = \int e^{x\,t(y)+\beta(y)}\,dy \\ \dot{\alpha}(x)\,e^{\alpha(x)} &= \int t(y)\,e^{x\,t(y)+\beta(y)}\,dy \ \Rightarrow\ \dot{\alpha}(x)=\int t(y)\,p(y;x)\,dy=\mathbb{E}[t(y)] \end{aligned}$$

    Differentiating once more,
    $$\ddot{\alpha}(x)=\int t(y)\,p(y;x)\,\big(t(y)-\dot{\alpha}(x)\big)\,dy=\mathrm{Var}(t(y)), \qquad J_y(x)=\mathbb{E}\left[-\frac{\partial^2}{\partial x^2}\ln p(y;x)\right]=\ddot{\alpha}(x)$$
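
These identities are easy to check numerically for a concrete linear family (a minimal sketch of my own, using the Bernoulli in its natural parameterization, where $\alpha(x)=\ln(1+e^x)$, $\mathbb{E}[t(y)]=\sigma(x)$ and $\mathrm{Var}(t(y))=\sigma(x)(1-\sigma(x))$ with $\sigma$ the sigmoid):

```python
import numpy as np

def alpha(x):
    # Log-partition of the Bernoulli in natural form: p(y; x) = exp(x*y - alpha(x))
    return np.log(1 + np.exp(x))

x, h = 0.7, 1e-4
mean = 1 / (1 + np.exp(-x))   # E[t(y)] = sigmoid(x)
var = mean * (1 - mean)       # Var(t(y)) = J_y(x)

alpha_dot = (alpha(x + h) - alpha(x - h)) / (2 * h)
alpha_ddot = (alpha(x + h) - 2 * alpha(x) + alpha(x - h)) / h**2

assert np.isclose(alpha_dot, mean, atol=1e-6)
assert np.isclose(alpha_ddot, var, atol=1e-5)
```
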

  • Exponential families and efficient estimators

    • Necessary condition: if an efficient estimator exists, then $p(y;x)$ can be written in exponential-family form, with
      $$\lambda(x)=\int^x J_y(u)\,du, \qquad \alpha(x)=\int^x u\,J_y(u)\,du$$

    Proof:
    $$\begin{aligned} \hat{x}_{eff}(y) &= x+\frac{1}{J_y(x)}\frac{\partial}{\partial x}\ln p(y;x) \quad\text{(CRB attained with equality)} \\ \frac{\partial}{\partial x}\ln p(y;x) &= J_y(x)\,\hat{x}_{eff}(y)-x\,J_y(x) \\ \ln p(y;x) &= \hat{x}_{eff}(y)\int^x J_y(u)\,du-\int^x u\,J_y(u)\,du+\beta(y) \end{aligned}$$
    which is of exponential-family form with $t(y)=\hat{x}_{eff}(y)$, $\lambda(x)=\int^x J_y(u)\,du$, $\alpha(x)=\int^x u\,J_y(u)\,du$.

    • Sufficient condition: for a linear exponential family, if $J_y(x)$ does not depend on $x$, i.e. $J_y(x)$ equals a constant, then an efficient estimator exists

    Proof: let $J_y(x)=J$. Then
    $$\ddot{\alpha}(x)=J, \qquad \dot{\alpha}(x)=Jx-c \\ \hat{x}_{eff}(y)=x+\frac{1}{J}\frac{\partial}{\partial x}\ln p(y;x)=x+\frac{1}{J}\big(t(y)-\dot{\alpha}(x)\big)=x+\frac{1}{J}(t(y)-Jx+c)=\frac{t(y)}{J}+\frac{c}{J}$$
    which does not depend on $x$, so it is a valid estimator. Since
    $$\frac{\partial}{\partial x}\ln p(y;x)\Big|_{x=\hat{x}_{ML}}=0=t(y)-\dot{\alpha}(x)\Big|_{x=\hat{x}_{ML}}$$

    substituting $t(y)=\dot{\alpha}(\hat{x}_{ML})$ gives
    $$\hat{x}_{eff}(y)=c/J+\frac{1}{J}\dot{\alpha}(x)\Big|_{x=\hat{x}_{ML}}=\hat{x}_{ML}(y)$$
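
A concrete instance of the sufficient condition (a minimal sketch under my own choice of model: $y_1,\dots,y_N$ i.i.d. $\mathcal{N}(x,1)$, for which $J_y(x)=N$ is constant in $x$): the sample mean is both the efficient and the ML estimate, and its variance attains the CRB $1/J=1/N$.

```python
import numpy as np

rng = np.random.default_rng(1)
x_true, N, trials = 2.0, 25, 200_000

# y_i ~ N(x, 1) i.i.d.; t(y) = sum(y_i), J_y(x) = N (constant in x)
y = rng.normal(loc=x_true, scale=1.0, size=(trials, N))
x_hat = y.mean(axis=1)  # efficient estimator = ML estimate = sample mean

print(x_hat.mean())  # ~ x_true (unbiased)
print(x_hat.var())   # ~ 1/N = 0.04, the Cramer-Rao bound
```
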

2. Sufficient statistics

2.1 Non-Bayesian case

  • Definition: $t(y)$ is a sufficient statistic for the family $p_{\mathsf{y}}(\cdot;x)$ if $p(y\mid t(y);x)$ does not depend on $x$

Theorem 1 (likelihood characterization):

$t(y)$ is sufficient w.r.t. $p(y;x)$ $\iff$ $\dfrac{p_{y}(y;x)}{p_t(t(y);x)}$ does not depend on $x$, for all $x$ and $y$

Proof: omitted.

Theorem 2 (Neyman factorization theorem):

$t(y)$ is sufficient w.r.t. $p(y;x)$ $\iff$ there exist $a(\cdot,\cdot)$ and $b(\cdot)$ such that $p(y;x)=a\left(t(y),x\right)\cdot b(y)$

Proof: omitted.

  • Minimal sufficient statistic: $t^*$ is minimal if for every other sufficient statistic $t$ there exists a function $g(\cdot)$ such that $t^*=g(t)$
  • Complete: $t^*$ is complete if for every function $\phi(\cdot)$, $\mathbb{E}[\phi(t^*(y))]=0\ \ \forall x \iff \phi(\cdot)\equiv 0$

Theorem: complete $\Longrightarrow$ minimal

Proof: suppose $t$ is complete and $s$ is minimal; then there exists $g$ such that $s=g(t)$, and by the tower property $\mathbb{E}[t]=\mathbb{E}\left[\mathbb{E}[t\mid s]\right]$.

Since $s$ is sufficient, $\mathbb{E}[t\mid s]=f(s)=f(g(t))=\tilde{f}(t)$ does not depend on $x$.

Let $\phi(t)=t-\tilde{f}(t)$; then $\mathbb{E}[\phi(t)]=0$ for all $x$.

By the definition of completeness, $\phi(t)\equiv 0 \Longrightarrow t=\tilde{f}(t)=f(s)$.

Hence $t$ is a function of the minimal statistic $s$, which in turn is a function of every sufficient statistic, so $t$ is also minimal.
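
To see the sufficiency definition in action before moving on (a minimal sketch of my own, for $N$ i.i.d. Bernoulli($x$) observations with $t(y)=\sum_i y_i$): conditioned on $t(y)$, every sequence with that sum is equally likely regardless of $x$, so $p(y\mid t(y);x)$ does not depend on $x$.

```python
import numpy as np
from itertools import product

def p_seq(y, x):
    # i.i.d. Bernoulli(x) probability of a binary sequence y
    k = sum(y)
    return x**k * (1 - x)**(len(y) - k)

N = 4
for x in (0.2, 0.7):
    for t in range(N + 1):
        seqs = [y for y in product((0, 1), repeat=N) if sum(y) == t]
        probs = np.array([p_seq(y, x) for y in seqs])
        cond = probs / probs.sum()  # p(y | t(y) = t; x)
        assert np.allclose(cond, 1 / len(seqs))  # uniform, independent of x
```
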

2.2 Bayesian case

  • Definition: $t(y)$ is a sufficient statistic for the joint distribution $p_{\mathsf{y,x}}(\cdot,\cdot)$ if $p_{\mathsf{y|t,x}}(y\mid t(y),x)=p_{\mathsf{y|t}}(y\mid t(y))$, i.e. it does not depend on $x$

Theorem (belief characterization):

$t(y)$ is sufficient w.r.t. $p(y,x)$ $\iff$ $p(x\mid y)=p(x\mid t(y))$, for all $x$ and $y$

Proof: omitted.

Theorem (Neyman factorization theorem):

$t(y)$ is sufficient w.r.t. $p(y,x)$ $\iff$ $p(y\mid x)=p(t(y)\mid x)\cdot p(y\mid t(y))$, for all $x$ and $y$

Proof: omitted.
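
A small numerical illustration of the belief characterization (a minimal sketch; the two-point prior on $x$ and the Bernoulli model are my own choices): the posterior over $x$ computed from the full sequence $y$ equals the posterior computed from $t(y)=\sum_i y_i$ alone.

```python
import numpy as np
from math import comb

xs = np.array([0.3, 0.8])    # two-point prior support for x
prior = np.array([0.5, 0.5])
y = (1, 0, 1, 1, 0)          # observed sequence; t(y) = 3, N = 5
k, N = sum(y), len(y)

# Posterior from the full sequence y
like_y = xs**k * (1 - xs)**(N - k)
post_y = prior * like_y / np.sum(prior * like_y)

# Posterior from t(y) alone (Binomial likelihood)
like_t = comb(N, k) * xs**k * (1 - xs)**(N - k)
post_t = prior * like_t / np.sum(prior * like_t)

assert np.allclose(post_y, post_t)  # p(x | y) = p(x | t(y))
```
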

3. Conjugate priors

  • Idea: given a model $p_\mathsf{y|x}$, look for a family of priors $p_\mathsf{x}$ such that the induced posterior $p_\mathsf{x|y}$ also lies in that family
  • Definition: a family of distributions $q(\cdot;\theta)$ is conjugate to a model $p_{y|x}$ if
    • $p_{y|x}(y_1,\dots,y_N\mid x) \propto q(x;\theta)$, viewed as a function of $x$, for some $\theta$
    • $q(x;\theta_1)\,q(x;\theta_2)\propto q(x;\theta_3)$ for some $\theta_3$
  • Theorem: if for sample size $N$ the joint model $p^N_{y|x}(\cdot)$ admits a sufficient statistic whose dimension does not depend on $N$, then a conjugate prior exists for this model
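
For instance (a minimal sketch of the standard Beta–Bernoulli case; the grid check and parameter values are my own), the Beta family is conjugate to the Bernoulli model: the likelihood of $N$ observations is proportional to a Beta density in $x$, so the posterior stays a Beta with updated parameters.

```python
import numpy as np
from scipy.stats import beta

# Prior x ~ Beta(a, b); observations y_i ~ Bernoulli(x) i.i.d.
a, b = 2.0, 3.0
y = np.array([1, 1, 0, 1, 0, 1, 1])
k, N = int(y.sum()), y.size

# Conjugate update: posterior is Beta(a + k, b + N - k)
a_post, b_post = a + k, b + N - k

# Compare with a direct grid computation of prior * likelihood, normalized
x = np.linspace(1e-6, 1 - 1e-6, 100_001)
dx = x[1] - x[0]
unnorm = beta.pdf(x, a, b) * x**k * (1 - x)**(N - k)
unnorm /= unnorm.sum() * dx

assert np.allclose(unnorm, beta.pdf(x, a_post, b_post), atol=1e-3)
```
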

For the other parts of this series, see:
Statistical Inference (1): Hypothesis Test
Statistical Inference (2): Estimation Problem
Statistical Inference (3): Exponential Family
Statistical Inference (4): Information Geometry
Statistical Inference (5): EM algorithm
Statistical Inference (6): Modeling
Statistical Inference (7): Typical Sequence
Statistical Inference (8): Model Selection
Statistical Inference (9): Graphical models
Statistical Inference (10): Elimination algorithm
Statistical Inference (11): Sum-product algorithm



Reposted from blog.csdn.net/weixin_41024483/article/details/104165233