Derivation of the EM Algorithm for pLSA Parameter Estimation

For background on pLSA, see: Probabilistic Latent Semantic Analysis (PLSA)

For the EM algorithm, see: Derivation of the EM (Expectation-Maximization) Algorithm with a Worked Example

For the method of Lagrange multipliers, see: How to Understand Lagrange Multipliers?

Now, let's work through a simple derivation.

First, a quick review of pLSA, taking the generative model as an example.

The generative model assumes that, conditioned on the topic z, the word w and the document d are conditionally independent, i.e.:

p(w,z|d)=p(z|d)p(w|d,z)=p(z|d)p(w|z)

Let the word set be W=\{ w_{1},w_{2},\cdots ,w_{M}\}, the document set D=\{ d_{1},d_{2},\cdots ,d_{N}\}, and the topic set Z=\{ z_{1},z_{2},\cdots ,z_{K}\}. The observed word-document co-occurrence data are T=\{n(w_{i},d_{j})\} , i=1,2,\cdots ,M , j=1,2,\cdots ,N , where n(w_{i},d_{j}) is the number of times word w_{i} occurs in document d_{j}.
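To make this setup concrete, here is a minimal Python sketch that builds the count matrix n(w_{i},d_{j}) from a hypothetical toy corpus (the corpus and variable names are illustrative assumptions, not from the original):

```python
import numpy as np

# Hypothetical toy corpus: N = 3 documents over a small vocabulary.
docs = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
    ["apple", "cherry", "cherry", "banana"],
]
vocab = sorted({w for d in docs for w in d})   # W = {w_1, ..., w_M}
M, N = len(vocab), len(docs)

# n[i, j] = n(w_i, d_j): count of word w_i in document d_j
n = np.zeros((M, N), dtype=int)
for j, d in enumerate(docs):
    for w in d:
        n[vocab.index(w), j] += 1

print(vocab)
print(n)
```

Each column of n sums to the length of the corresponding document, which is the n(d_{j}) that appears later in the derivation.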

We first write down the Q function for the pLSA setting, then maximize it to obtain the optimal parameters for the current iteration. The Q function is:

Q(\theta ,\theta ^{(i)})=\sum_{Z}^{}p(Z|Y,\theta ^{(i)})\ log\ [p(Y|Z,\theta )p(Z|\theta )]

E-step

E-step: we first expand the Q function. In the formulas below, n denotes the number of observed word-document pairs (w_{i},d_{j}).

Q(\theta ,\theta ^{(i)}) \\ \\ =\sum_{Z}^{}p(Z|Y,\theta ^{(i)})\ log\ p(Y|Z,\theta )p(Z|\theta ) \\ \\ =\sum_{Z}^{}\ log\ \{[p(Y|Z,\theta )p(Z|\theta )] ^{p(Z|Y,\theta ^{(i)})} \} \\ \\ =\sum_{Z}^{}\ log\ \prod_{j=1}^{n}\{[p(y_{j}|Z,\theta )p(Z|\theta )] ^{p(Z|y_{j},\theta ^{(i)})} \} \\ \\ =\sum_{Z}^{}\ \sum_{j=1}^{n}log\ \{[p(y_{j}|Z,\theta )p(Z|\theta )] ^{p(Z|y_{j},\theta ^{(i)})} \} \\ \\ =\sum_{j=1}^{n}\sum_{Z}^{}\ log\ \{[p(y_{j}|Z,\theta )p(Z|\theta )] ^{p(Z|y_{j},\theta ^{(i)})} \} \\ \\ =\sum_{j=1}^{n}\sum_{z}^{}\ p(z|y_{j},\theta ^{(i)})log\ [p(y_{j}|z,\theta )p(z|\theta )] \\ \\ =\sum_{j=1}^{n}\sum_{z}^{}\ p(z|y_{j},\theta ^{(i)})log\ [p(z,y_{j}|\theta )]

Each observation y_{j} is a word-document pair; grouping identical pairs (w_{i},d_{j}), each of which occurs n(w_{i},d_{j}) times, gives:

\\ \\ =\sum_{i=1}^{M}\sum_{j=1}^{N}n(w_{i},d_{j})\sum_{k=1}^{K}\ p(z_{k}|w_{i},d_{j})log\ [p(z_{k},w_{i},d_{j})] \\ \\ =\sum_{i=1}^{M}\sum_{j=1}^{N}n(w_{i},d_{j})\sum_{k=1}^{K}\ p(z_{k}|w_{i},d_{j})log\ [p(d_{j})p(z_{k},w_{i}|d_{j})] \\ \\ =\sum_{i=1}^{M}\sum_{j=1}^{N}n(w_{i},d_{j})\sum_{k=1}^{K}\ p(z_{k}|w_{i},d_{j})log\ [p(d_{j})p(z_{k}|d_{j})p(w_{i}|z_{k})]

=\sum_{i=1}^{M}\sum_{j=1}^{N}n(w_{i},d_{j})\sum_{k=1}^{K}\ p(z_{k}|w_{i},d_{j})log\ [p(z_{k}|d_{j})p(w_{i}|z_{k})]+\sum_{i=1}^{M}\sum_{j=1}^{N}n(w_{i},d_{j})\sum_{k=1}^{K}\ p(z_{k}|w_{i},d_{j})log\ [p(d_{j})]

The term p(z_{k}|w_{i},d_{j}) in the Q function can be computed via Bayes' rule:

p(z_{k}|w_{i},d_{j}) \\ \\=\frac{p(z_{k},w_{i},d_{j})}{p(w_{i},d_{j})} \\ \\ \\=\frac{p(d_{j})p(z_{k}|d_{j})p(w_{i}|z_{k})}{p(d_{j})\sum_{k'=1}^{K}p(z_{k'}|d_{j})p(w_{i}|z_{k'})} \\ \\ \\ =\frac{p(z_{k}|d_{j})p(w_{i}|z_{k})}{\sum_{k'=1}^{K}p(z_{k'}|d_{j})p(w_{i}|z_{k'})}

Here p(z_{k}|d_{j}) and p(w_{i}|z_{k}) are the values obtained in the previous iteration, so p(z_{k}|w_{i},d_{j}) can be treated as a constant.
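In matrix form, this Bayes computation is a single normalized elementwise product over a 3-D array. A minimal numpy sketch (the array names and random initialization are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 5, 4, 2                      # words, documents, topics

# Parameters from the previous iteration (random init for illustration):
# p_z_d[k, j] = p(z_k | d_j),  p_w_z[i, k] = p(w_i | z_k)
p_z_d = rng.random((K, N)); p_z_d /= p_z_d.sum(axis=0, keepdims=True)
p_w_z = rng.random((M, K)); p_w_z /= p_w_z.sum(axis=0, keepdims=True)

# E-step: p_z_wd[k, i, j] = p(z_k | w_i, d_j)
#        = p(z_k|d_j) p(w_i|z_k) / sum_{k'} p(z_k'|d_j) p(w_i|z_k')
joint = p_z_d[:, None, :] * p_w_z.T[:, :, None]   # shape (K, M, N)
p_z_wd = joint / joint.sum(axis=0, keepdims=True)

# Each posterior over topics sums to 1:
print(np.allclose(p_z_wd.sum(axis=0), 1.0))
```

The normalization along the topic axis is exactly the denominator of the Bayes-rule expression above.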

The estimate of p(d_{j}) can be read directly off the data. Since we only estimate p(z_{k}|d_{j}) and p(w_{i}|z_{k}), the second term of the Q function does not depend on the parameters. Denoting the first term by {Q}', we have:

{Q}'=\sum_{i=1}^{M}\sum_{j=1}^{N}n(w_{i},d_{j})\sum_{k=1}^{K}\ p(z_{k}|w_{i},d_{j})log\ [p(z_{k}|d_{j})p(w_{i}|z_{k})]

M-step

M-step: maximize the Q function, i.e., maximize the {Q}' function.

We maximize {Q}' by constrained optimization. Here p(z_{k}|d_{j}) and p(w_{i}|z_{k}) are the variables; since they form probability distributions, they satisfy the constraints:

\sum_{i=1}^{M}p(w_{i}|z_{k})=1 \ \ \ , k=1,2,\cdots ,K

\sum_{k=1}^{K}p(z_{k}|d_{j})=1 \ \ \ , j=1,2,\cdots ,N

Applying the method of Lagrange multipliers, introduce multipliers \tau _{k} and \rho _{j} and define the Lagrangian \Lambda :

\Lambda ={Q}'+\sum_{k=1}^{K}\tau _{k}(1-\sum_{i=1}^{M}p(w_{i}|z_{k}))+\sum_{j=1}^{N}\rho _{j}(1-\sum_{k=1}^{K}p(z_{k}|d_{j}))

Taking the partial derivatives of \Lambda with respect to p(z_{k}|d_{j}), p(w_{i}|z_{k}), \tau _{k} and \rho _{j}, and setting them to zero, we obtain the following system of equations.

Taking the partial derivative with respect to p(z_{k}|d_{j}) gives:

\sum_{i=1}^{M}\frac{n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{p(z_{k}|d_{j})}-\rho _{j}=0 \\ \\ \\\frac{ \sum_{i=1}^{M}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{p(z_{k}|d_{j})}-\rho _{j}=0 \\ \\ \sum_{i=1}^{M}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})=\rho _{j}p(z_{k}|d_{j}) \\ \\ p(z_{k}|d_{j})=\frac{\sum_{i=1}^{M}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{\rho _{j}}

Taking the partial derivative with respect to p(w_{i}|z_{k}) gives:

\sum_{j=1}^{N}\frac{n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{p(w_{i}|z_{k})}-\tau _{k}=0 \\ \\ \\ \frac{ \sum_{j=1}^{N}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{p(w_{i}|z_{k})}-\tau _{k}=0 \\ \\ \sum_{j=1}^{N}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})=\tau _{k}p(w_{i}|z_{k}) \\ \\ p(w_{i}|z_{k})=\frac{\sum_{j=1}^{N}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{\tau _{k}}

Setting the partial derivatives with respect to \tau _{k} and \rho _{j} to zero simply recovers the constraints:

\sum_{i=1}^{M}p(w_{i}|z_{k})=1 \ \ \ , k=1,2,\cdots ,K

\sum_{k=1}^{K}p(z_{k}|d_{j})=1 \ \ \ , j=1,2,\cdots ,N

Combining the four equations into one system:

\begin{Bmatrix} p(z_{k}|d_{j})=\frac{\sum_{i=1}^{M}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{\rho _{j}}\\ \\p(w_{i}|z_{k})=\frac{\sum_{j=1}^{N}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{\tau _{k}}\\ \\ \sum_{i=1}^{M}p(w_{i}|z_{k})=1\\ \\\sum_{k=1}^{K}p(z_{k}|d_{j})=1\end{Bmatrix}

Solving, we obtain:

\rho _{j}=\sum_{k=1}^{K}\sum_{i=1}^{M}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j}) \\ \\ \rho _{j}=\sum_{i=1}^{M}n(w_{i},d_{j})\sum_{k=1}^{K}p(z_{k}|w_{i},d_{j}) \\ \\ \rho _{j}=\sum_{i=1}^{M}n(w_{i},d_{j}) \\ \\ \rho _{j}=n(d_{j})

(Here n(d_{j}) denotes the total number of words in document d_{j}.)

\tau _{k}=\sum_{i=1}^{M}\sum_{j=1}^{N}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})

p(z_{k}|d_{j})=\frac{\sum_{i=1}^{M}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{n(d_{j})}

p(w_{i}|z_{k})=\frac{\sum_{j=1}^{N}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{\sum_{i=1}^{M}\sum_{j=1}^{N}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}
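The two closed-form M-step updates can be written in a few lines of numpy. A minimal sketch, continuing the array conventions above (n is the M×N count matrix, p_z_wd the K×M×N posterior; all names and the toy data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 5, 4, 2
n = rng.integers(1, 5, size=(M, N))               # n(w_i, d_j), toy counts >= 1
p_z_wd = rng.random((K, M, N))
p_z_wd /= p_z_wd.sum(axis=0, keepdims=True)       # valid posteriors p(z_k|w_i,d_j)

# M-step:
# p(z_k|d_j) = sum_i n(w_i,d_j) p(z_k|w_i,d_j) / n(d_j)
weighted = n[None, :, :] * p_z_wd                 # (K, M, N)
p_z_d = weighted.sum(axis=1) / n.sum(axis=0)[None, :]

# p(w_i|z_k) = sum_j n p(z|w,d) / sum_i sum_j n p(z|w,d)
num = weighted.sum(axis=2)                        # (K, M)
p_w_z = (num / num.sum(axis=1, keepdims=True)).T  # (M, K)

# Both updates produce valid probability distributions:
print(np.allclose(p_z_d.sum(axis=0), 1.0),
      np.allclose(p_w_z.sum(axis=0), 1.0))
```

Note that the denominator n(d_{j}) is just the column sum of the count matrix, matching the \rho _{j}=n(d_{j}) result above.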

Summary

To summarize, the EM algorithm for pLSA parameter estimation is:

(1) Initialize the parameters p(z_{k}|d_{j}) and p(w_{i}|z_{k}).

(2) Iterate the following E-step and M-step until convergence.

E-step:

p(z_{k}|w_{i},d_{j}) \\ \\=\frac{p(z_{k}|d_{j})p(w_{i}|z_{k})}{\sum_{k'=1}^{K}p(z_{k'}|d_{j})p(w_{i}|z_{k'})}

M-step:

p(z_{k}|d_{j})=\frac{\sum_{i=1}^{M}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{n(d_{j})}

p(w_{i}|z_{k})=\frac{\sum_{j=1}^{N}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}{\sum_{i=1}^{M}\sum_{j=1}^{N}n(w_{i},d_{j})p(z_{k}|w_{i},d_{j})}
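Putting steps (1) and (2) together, here is a minimal end-to-end sketch of the EM loop on toy data with random initialization (all names, sizes, and the iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
M, N, K, iters = 6, 4, 2, 50
n = rng.integers(1, 6, size=(M, N)).astype(float)  # n(w_i, d_j), toy counts

# (1) Initialize p(z_k|d_j) and p(w_i|z_k) as random distributions.
p_z_d = rng.random((K, N)); p_z_d /= p_z_d.sum(axis=0, keepdims=True)
p_w_z = rng.random((M, K)); p_w_z /= p_w_z.sum(axis=0, keepdims=True)

def log_likelihood():
    # sum_ij n(w_i,d_j) log sum_k p(z_k|d_j) p(w_i|z_k)
    return float((n * np.log(p_w_z @ p_z_d)).sum())

prev = log_likelihood()
for _ in range(iters):
    # (2) E-step: posterior p(z_k | w_i, d_j), shape (K, M, N)
    joint = p_z_d[:, None, :] * p_w_z.T[:, :, None]
    p_z_wd = joint / joint.sum(axis=0, keepdims=True)

    #     M-step: the closed-form updates derived above
    weighted = n[None, :, :] * p_z_wd
    p_z_d = weighted.sum(axis=1) / n.sum(axis=0)[None, :]
    num = weighted.sum(axis=2)
    p_w_z = (num / num.sum(axis=1, keepdims=True)).T

cur = log_likelihood()
print(cur >= prev)   # EM never decreases the log-likelihood
```

The monotonicity of the log-likelihood across iterations is a standard property of EM and serves as a quick sanity check of the implementation.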

That's all! If anything here is incorrect, feel free to leave a comment~


Reprinted from blog.csdn.net/qq_32103261/article/details/120910246