Statistical Inference (8): Model Selection

1. Bayesian Approach

  • Consider a nested sequence of model classes
    $$\mathcal{P}_{1} \subset \mathcal{P}_{2} \subset \mathcal{P}_{3} \subset \cdots$$

  • ML decision rule:
    $$\hat{m}=\arg \max _{m}\left\{\max _{p \in \mathcal{P}_{m}} p(\boldsymbol{y})\right\}=\arg \max _{m}\left\{\max _{a} p_{\mathbf{y} | \mathbf{x}, \mathbf{H}}\left(\boldsymbol{y} | a, H_{m}\right)\right\}$$
    Because the classes are nested, the inner maximum is non-decreasing in $m$, so this rule always favors the largest class. This is what motivates the Bayesian treatment.
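To make the behavior of the ML rule concrete, here is a minimal sketch under assumptions that are not in the original notes: three nested Gaussian classes (fixed N(0, 1); free mean; free mean and variance) and a handful of made-up observations. The function names `max_log_likelihoods` and `ml_select` are hypothetical.

```python
import math

def max_log_likelihoods(y):
    """Maximized Gaussian log-likelihood for three nested classes.

    P1: N(0, 1)    (no free parameter)
    P2: N(a, 1)    (free mean)
    P3: N(a, s^2)  (free mean and variance)
    """
    n = len(y)
    mean = sum(y) / n
    ss_zero = sum(v * v for v in y)            # sum of squares about 0
    ss_mean = sum((v - mean) ** 2 for v in y)  # sum of squares about the ML mean
    var_ml = ss_mean / n                       # ML variance estimate
    const = -0.5 * n * math.log(2.0 * math.pi)
    ll1 = const - 0.5 * ss_zero
    ll2 = const - 0.5 * ss_mean
    ll3 = const - 0.5 * n * math.log(var_ml) - 0.5 * n
    return [ll1, ll2, ll3]

def ml_select(y):
    """Return the 1-based index m chosen by the ML decision rule."""
    lls = max_log_likelihoods(y)
    return max(range(len(lls)), key=lambda i: lls[i]) + 1

y = [0.3, -1.2, 0.8, 2.1, -0.5]  # made-up observations (assumption)
print(max_log_likelihoods(y))
print(ml_select(y))
```

The maximized log-likelihoods never decrease from one class to the next, so the rule lands on the richest class regardless of how the data were generated.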

2. Laplace’s Method

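  • Consider a continuous distribution whose density is known only up to its normalizing constant $Z_p$:
    $$p_{\mathbf{x}}(x)=\frac{p_{0}(x)}{Z_{p}}, \qquad Z_{p}=\int p_{0}(x) \,\mathrm{d} x$$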

  • Approximate $\ln p_0(x)$ with a second-order Taylor series around its mode $\hat{x}$, where the first-derivative term vanishes:
    $$\ln p_{0}(x) \approx \ln p_{0}(\hat{x})+\left.(x-\hat{x}) \frac{\mathrm{d}}{\mathrm{d} x} \ln p_{0}(x)\right|_{x=\hat{x}}+\left.\frac{1}{2}(x-\hat{x})^{2} \frac{\mathrm{d}^{2}}{\mathrm{d} x^{2}} \ln p_{0}(x)\right|_{x=\hat{x}}$$
    $$p_{0}(x) \approx p_{0}(\hat{x}) \exp \left[-\frac{1}{2} J_{\mathbf{y}=\boldsymbol{y}}(\hat{x})(x-\hat{x})^{2}\right]$$
    where $J_{\mathbf{y}=\boldsymbol{y}}(\hat{x})=-\left.\frac{\mathrm{d}^{2}}{\mathrm{d} x^{2}} \ln p_{0}(x)\right|_{x=\hat{x}}$. Integrating this Gaussian form gives $Z_{p} \approx p_{0}(\hat{x}) \sqrt{2 \pi J_{\mathbf{y}=\boldsymbol{y}}^{-1}(\hat{x})}$.
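The approximation can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original notes: the helper `laplace_log_normalizer` is hypothetical, the mode is assumed known, and the curvature is estimated by a central finite difference. The test case is the unnormalized Gamma-type density p_0(x) = x^a e^{-x}, whose exact normalizer is Γ(a+1), so Laplace's method reduces to Stirling's formula here.

```python
import math

def laplace_log_normalizer(log_p0, x_hat, h=1e-3):
    """Laplace estimate of ln Z_p, with Z_p = integral of p0(x) dx.

    log_p0 : callable returning ln p0(x)
    x_hat  : mode of p0, assumed known (first derivative vanishes there)
    h      : step of the central finite difference (an assumed default)
    """
    # J = -d^2/dx^2 ln p0(x) at x_hat, estimated numerically.
    second = (log_p0(x_hat + h) - 2.0 * log_p0(x_hat) + log_p0(x_hat - h)) / h**2
    J = -second
    if J <= 0:
        raise ValueError("x_hat is not a local maximum of log_p0")
    # ln Z_p ~= ln p0(x_hat) + 0.5 * ln(2*pi / J)
    return log_p0(x_hat) + 0.5 * math.log(2.0 * math.pi / J)

a = 20.0                                   # shape parameter (assumption)
log_p0 = lambda x: a * math.log(x) - x     # ln of x^a * exp(-x), mode at x = a
approx = laplace_log_normalizer(log_p0, x_hat=a)
exact = math.lgamma(a + 1.0)               # ln Gamma(a + 1)
print(approx, exact)
```

For a Gaussian-shaped p_0 the approximation is exact; for other densities the error depends on how quickly the mass concentrates around the mode.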

3. Bayes Information Criterion

  • MAP decision rule:
    $$\hat{m}=\arg \max _{m} p_{\mathbf{y} | \mathbf{H}}\left(\boldsymbol{y} | H_{m}\right)$$
    where
    $$p_{\mathbf{y} | \mathbf{H}}\left(\boldsymbol{y} | H_{m}\right)=\int p_{\mathbf{y} | \mathbf{x}, \mathbf{H}}\left(\boldsymbol{y} | x, H_{m}\right) p_{\mathbf{x} | \mathbf{H}}\left(x | H_{m}\right) \mathrm{d} x$$

    Define the unnormalized posterior
    $$q_{0}(x)=p_{\mathbf{y} | \mathbf{x}, \mathbf{H}}\left(\boldsymbol{y} | x, H_{m}\right) p_{\mathbf{x} | \mathbf{H}}\left(x | H_{m}\right) \propto p_{\mathbf{x} | \mathbf{y}, \mathbf{H}}\left(x | \boldsymbol{y}, H_{m}\right)$$
    Applying Laplace's method to $q_0$, with $\hat{x}$ its maximizer, gives
    $$p_{\mathbf{y} | \mathbf{H}}(\boldsymbol{y} | H_{m})=\int q_{0}(x) \mathrm{d} x \approx p_{\mathbf{y} | \mathbf{x}, \mathbf{H}}(\boldsymbol{y} | \hat{x}, H_{m}) \, p_{\mathbf{x} | \mathbf{H}}(\hat{x} | H_{m}) \sqrt{2 \pi J_{\mathbf{y}=\boldsymbol{y}}^{-1}(\hat{x})}$$
    where the last factor is the Occam's razor factor.
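As a worked illustration of the evidence approximation, consider a coin-flip example that is not in the original notes: H_1 fixes the Bernoulli parameter at 0.5, while H_2 leaves it free with a uniform prior on (0, 1), an assumption chosen so that the exact evidence is a Beta function and can be compared against the Laplace estimate. All function names are hypothetical.

```python
import math

def log_evidence_fixed(k, n, p=0.5):
    """ln p(y | H1): Bernoulli sequence with k ones out of n, fixed p."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def log_evidence_laplace(k, n):
    """Laplace estimate of ln p(y | H2) under a uniform prior on x.

    Returns (log_evidence, log_max_likelihood, log_occam_factor).
    Requires 0 < k < n so that the mode lies inside (0, 1).
    """
    x_hat = k / n  # maximizer of q0(x) = x^k (1 - x)^(n - k) * 1
    log_ml = k * math.log(x_hat) + (n - k) * math.log(1.0 - x_hat)
    J = n / (x_hat * (1.0 - x_hat))  # -d^2/dx^2 ln q0(x) at x_hat
    log_occam = 0.5 * math.log(2.0 * math.pi / J)
    # The uniform prior contributes ln p(x_hat | H2) = 0.
    return log_ml + log_occam, log_ml, log_occam

def log_evidence_exact(k, n):
    """Exact ln p(y | H2) = ln B(k + 1, n - k + 1)."""
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)

k, n = 60, 100  # made-up counts (assumption)
log_ev2, log_ml2, log_occam = log_evidence_laplace(k, n)
print("H1 evidence          :", log_evidence_fixed(k, n))
print("H2 max likelihood    :", log_ml2)
print("H2 evidence (Laplace):", log_ev2)
print("H2 evidence (exact)  :", log_evidence_exact(k, n))
print("ln Occam factor      :", log_occam)
```

The maximized likelihood of H_2 can never be below the likelihood under H_1, but the Occam factor is less than one here and shrinks like 1/sqrt(n), so the evidence penalizes the extra free parameter.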

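Other posts in this series:
Statistical Inference (1): Hypothesis Test
Statistical Inference (2): Estimation Problem
Statistical Inference (3): Exponential Family
Statistical Inference (4): Information Geometry
Statistical Inference (5): EM algorithm
Statistical Inference (6): Modeling
Statistical Inference (7): Typical Sequence
Statistical Inference (8): Model Selection
Statistical Inference (9): Graphical models
Statistical Inference (10): Elimination algorithm
Statistical Inference (11): Sum-product algorithm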

Reposted from blog.csdn.net/weixin_41024483/article/details/104165247