PRML Notes 2: Understanding the Prior on the Regression Parameters w

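Continuing from the previous post, we now place a prior on $\boldsymbol{w}$. Consider the simplest assumption: $\boldsymbol{w}$ follows a Gaussian distribution with zero mean and covariance matrix $\alpha^{-1}\boldsymbol{I}$.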
$$
\begin{aligned}
p(\boldsymbol{w}|\alpha)&=\mathcal{N}(\boldsymbol{w}|\boldsymbol{0},\alpha^{-1}\boldsymbol{I})\\
&=\left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp\left\{-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}\right\}
\end{aligned}
$$
Here $\boldsymbol{w}$ has $M+1$ elements. Let us work out, step by step, the probability of the parameters $\boldsymbol{w}$ given $(\boldsymbol{x},\boldsymbol{t},\alpha,\beta)$:
$$
\begin{aligned}
p(\boldsymbol{w}|\boldsymbol{t})&=\frac{p(\boldsymbol{t}|\boldsymbol{w})p(\boldsymbol{w})}{p(\boldsymbol{t})}\\
p(\boldsymbol{w}|\boldsymbol{t},\boldsymbol{x},\alpha,\beta)&=\frac{p(\boldsymbol{t}|\boldsymbol{w},\boldsymbol{x},\alpha,\beta)\,p(\boldsymbol{w}|\boldsymbol{x},\alpha,\beta)}{p(\boldsymbol{t}|\boldsymbol{x},\alpha,\beta)}
\end{aligned}
$$
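Given $\boldsymbol{w}$, the targets $\boldsymbol{t}$ do not depend on $\alpha$, so the likelihood in the expression above simplifies to $p(\boldsymbol{t}|\boldsymbol{w},\boldsymbol{x},\alpha,\beta)=p(\boldsymbol{t}|\boldsymbol{w},\boldsymbol{x},\beta)$. The prior on $\boldsymbol{w}$ is the one we have already assumed, that is $p(\boldsymbol{w}|\boldsymbol{x},\alpha,\beta)=p(\boldsymbol{w}|\alpha)$, and the denominator does not depend on $\boldsymbol{w}$. This gives the result in the book (this step is my own understanding):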
$$
p(\boldsymbol{w}|\boldsymbol{x},\boldsymbol{t},\alpha,\beta)\propto p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)\,p(\boldsymbol{w}|\alpha)
$$
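As a rough numerical illustration of this proportionality, the sketch below takes the simplest case one could pick, a constant model $y(x,\boldsymbol{w})=w$ with a single scalar weight, evaluates likelihood times prior on a grid, and normalizes the result. The synthetic data, the values of `alpha` and `beta`, and the function names are assumptions made for illustration; none of them come from the book.

```python
import numpy as np

def log_likelihood(w, t, beta):
    """ln p(t | w, beta) for the constant model y(x, w) = w (illustrative assumption)."""
    n = t.size
    resid = t[None, :] - np.atleast_1d(w)[:, None]  # shape (len(w), n)
    return 0.5 * n * np.log(beta / (2 * np.pi)) - 0.5 * beta * np.sum(resid**2, axis=1)

def log_prior(w, alpha):
    """ln N(w | 0, alpha^{-1}) for a single scalar weight."""
    w = np.atleast_1d(w)
    return 0.5 * np.log(alpha / (2 * np.pi)) - 0.5 * alpha * w**2

def grid_posterior(grid, t, alpha, beta):
    """Normalize likelihood * prior on a uniform grid to approximate the posterior."""
    log_joint = log_likelihood(grid, t, beta) + log_prior(grid, alpha)
    unnorm = np.exp(log_joint - log_joint.max())  # subtract the max for numerical stability
    dw = grid[1] - grid[0]
    return unnorm / (unnorm.sum() * dw)

# Hypothetical data: 10 noisy observations of the constant 1.0
rng = np.random.default_rng(0)
alpha, beta = 2.0, 25.0
t = 1.0 + rng.normal(scale=beta**-0.5, size=10)

grid = np.linspace(-1.0, 3.0, 4001)
post = grid_posterior(grid, t, alpha, beta)

w_grid_mode = grid[np.argmax(post)]
# Mode obtained by setting the derivative of ln(likelihood * prior) to zero
w_closed_form = beta * t.sum() / (t.size * beta + alpha)
print(w_grid_mode, w_closed_form)
```

The two printed values should agree to within the grid spacing. The normalizing denominator $p(\boldsymbol{t}|\boldsymbol{x},\alpha,\beta)$ only rescales the curve, so it does not move the location of the maximum.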
So maximizing the posterior with respect to $\boldsymbol{w}$ now amounts to maximizing the product of the likelihood $p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)$ and the prior $p(\boldsymbol{w}|\alpha)$. The likelihood is
$$
p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)=\prod_{n=1}^N\mathcal{N}(t_n|y(x_n,\boldsymbol{w}),\beta^{-1})=\prod_{n=1}^N\left(\frac{\beta}{2\pi}\right)^{1/2}\exp\left\{-\frac{\beta}{2}\left(t_n-y(x_n,\boldsymbol{w})\right)^2\right\}
$$
and the prior is
$$
\begin{aligned}
p(\boldsymbol{w}|\alpha)&=\mathcal{N}(\boldsymbol{w}|\boldsymbol{0},\alpha^{-1}\boldsymbol{I})\\
&=\left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp\left\{-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}\right\}
\end{aligned}
$$
Therefore
$$
p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)\,p(\boldsymbol{w}|\alpha)
=\left[\prod_{n=1}^N\left(\frac{\beta}{2\pi}\right)^{1/2}\exp\left\{-\frac{\beta}{2}\left(t_n-y(x_n,\boldsymbol{w})\right)^2\right\}\right]
\left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp\left\{-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}\right\}
$$
Taking the logarithm of both sides gives
$$
\begin{aligned}
\ln\left[p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)\,p(\boldsymbol{w}|\alpha)\right]
=&-\frac{\beta}{2}\sum_{n=1}^N\{y(x_n,\boldsymbol{w})-t_n\}^2+\frac{N}{2}\ln\beta-\frac{N}{2}\ln(2\pi)\\
&+\frac{M+1}{2}\ln\alpha-\frac{M+1}{2}\ln(2\pi)-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}
\end{aligned}
$$
We are looking for the most probable value of $\boldsymbol{w}$, so we only need the terms that depend on $\boldsymbol{w}$. Collecting everything else into a constant gives:
$$
\ln\left[p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)\,p(\boldsymbol{w}|\alpha)\right]=-\frac{\beta}{2}\sum_{n=1}^N\{y(x_n,\boldsymbol{w})-t_n\}^2-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}+\text{const}
$$
Maximizing this is equivalent to minimizing
$$
\frac{\beta}{2}\sum_{n=1}^N\{y(x_n,\boldsymbol{w})-t_n\}^2+\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}
$$
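Dividing this expression by $\beta$ shows that it is the regularized sum-of-squares error with regularization coefficient $\lambda=\alpha/\beta$. When $y(x,\boldsymbol{w})$ is linear in $\boldsymbol{w}$, the objective is quadratic, and setting its gradient to zero gives the linear system $(\beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}+\alpha\boldsymbol{I})\boldsymbol{w}=\beta\boldsymbol{\Phi}^T\boldsymbol{t}$. The following is a minimal sketch of that calculation, assuming a polynomial model $y(x,\boldsymbol{w})=\sum_{j=0}^{M}w_jx^j$ with design matrix $\boldsymbol{\Phi}$. The synthetic sine data, the values of `alpha`, `beta` and `M`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def design_matrix(x, M):
    """Columns x^0, x^1, ..., x^M, so that y(x_n, w) = Phi[n] @ w."""
    return np.vander(x, M + 1, increasing=True)

def neg_log_posterior(w, Phi, t, alpha, beta):
    """beta/2 * sum_n (y_n - t_n)^2 + alpha/2 * w^T w (only the terms that depend on w)."""
    resid = Phi @ w - t
    return 0.5 * beta * (resid @ resid) + 0.5 * alpha * (w @ w)

def map_weights(Phi, t, alpha, beta):
    """Solve (beta * Phi^T Phi + alpha * I) w = beta * Phi^T t, the zero-gradient condition."""
    A = beta * Phi.T @ Phi + alpha * np.eye(Phi.shape[1])
    return np.linalg.solve(A, beta * Phi.T @ t)

# Hypothetical data: noisy samples of sin(2*pi*x) on [0, 1]
rng = np.random.default_rng(1)
alpha, beta, M = 5e-3, 11.1, 9
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=beta**-0.5, size=x.size)

Phi = design_matrix(x, M)
w_map = map_weights(Phi, t, alpha, beta)

# Gradient of the objective at w_map; it should vanish up to rounding error
grad = beta * Phi.T @ (Phi @ w_map - t) + alpha * w_map
print(np.max(np.abs(grad)))
```

Because $\alpha>0$ makes the matrix on the left-hand side positive definite, the system has a unique solution even when $\boldsymbol{\Phi}^T\boldsymbol{\Phi}$ alone is ill-conditioned, which is one practical effect of the prior.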

Reposted from blog.csdn.net/zhuzheqing/article/details/129054447