Problems with the GAN loss functions

Generator G(z)

Discriminator D(x)

p_r is the distribution of the real data

z is the noise input, drawn from p_z

p_g is the distribution induced by the generator

Discriminator loss function

Call this Equation (1):

loss_D = -E_{x \sim p_r(x)}[\log D(x)] - E_{z \sim p_z(z)}[\log(1 - D(G(z)))]
= -E_{x \sim p_r(x)}[\log D(x)] - E_{x \sim p_g(x)}[\log(1 - D(x))]
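As a side illustration (not part of the original derivation), Equation (1) is just binary cross-entropy with real samples labeled 1 and generated samples labeled 0. A minimal PyTorch-style sketch, where the tiny G, D, x_real, and z below are made-up placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy 1-D generator and discriminator, only to make the sketch self-contained.
G = nn.Sequential(nn.Linear(4, 1))                 # maps noise z (dim 4) to a sample
D = nn.Sequential(nn.Linear(1, 1), nn.Sigmoid())   # maps a sample to a probability in (0, 1)

x_real = torch.randn(16, 1)   # stand-in for a batch of real data, x ~ p_r
z = torch.randn(16, 4)        # a batch of noise, z ~ p_z

# Equation (1): loss_D = -E_{x~p_r}[log D(x)] - E_{z~p_z}[log(1 - D(G(z)))]
d_real = D(x_real)
d_fake = D(G(z).detach())     # detach: this loss should only update D
loss_D = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
       + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
print(loss_D.item())
```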

Generator loss function

First form

Call this Equation (2):

loss_G = E_{z \sim p_z(z)}[\log(1 - D(G(z)))] = E_{x \sim p_g(x)}[\log(1 - D(x))]

Second form

Call this Equation (3):

loss_G = E_{z \sim p_z(z)}[-\log D(G(z))] = E_{x \sim p_g(x)}[-\log D(x)]
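For comparison, a sketch of the two generator losses in the same made-up toy setup as above; Equation (3) is the commonly used "-log D" alternative (in practice it is often written as binary cross-entropy against the label 1):

```python
import torch
import torch.nn as nn

# Same toy setup as in the discriminator sketch above (illustration only).
G = nn.Sequential(nn.Linear(4, 1))
D = nn.Sequential(nn.Linear(1, 1), nn.Sigmoid())
z = torch.randn(16, 4)
d_fake = D(G(z))              # D(G(z)), z ~ p_z

# Equation (2), the "saturating" form: E_{z~p_z}[log(1 - D(G(z)))]
loss_G_eq2 = torch.log(1.0 - d_fake).mean()

# Equation (3), the "-log D" form: E_{z~p_z}[-log D(G(z))]
loss_G_eq3 = -torch.log(d_fake).mean()
print(loss_G_eq2.item(), loss_G_eq3.item())
```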

Optimal discriminator

Fix the generator, write loss_D as an integral over x, and minimize the integrand pointwise: taking the derivative of -p_r(x)\log D(x) - p_g(x)\log(1 - D(x)) with respect to D(x) and setting it to 0 gives

D^{*}(x) = \frac{p_r(x)}{p_r(x) + p_g(x)}
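A quick numerical check of this minimizer (my own illustration, not from the post): fix arbitrary density values p_r(x) and p_g(x) at a single point and scan the pointwise objective over t = D(x):

```python
import numpy as np

# Pointwise check: for fixed density values p_r(x), p_g(x), the objective
# f(t) = -p_r*log(t) - p_g*log(1 - t) is minimized at t = p_r / (p_r + p_g).
p_r, p_g = 0.7, 0.2                        # arbitrary density values at some x
t = np.linspace(1e-6, 1 - 1e-6, 100001)    # candidate values of D(x)
f = -p_r * np.log(t) - p_g * np.log(1 - t)
print(t[np.argmin(f)], p_r / (p_r + p_g))  # both are approximately 0.7778
```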

For Equation (2), add a term that does not depend on p_g (so the generator's optimum is unchanged):

E_{x \sim p_r(x)}[\log D(x)] + E_{x \sim p_g(x)}[\log(1 - D(x))]

Substitute the optimal discriminator:

E_{x \sim p_r(x)}[\log D^{*}(x)] + E_{x \sim p_g(x)}[\log(1 - D^{*}(x))]

= E_{p_r}\left[\log \frac{p_r(x)}{p_r(x) + p_g(x)}\right] + E_{p_g}\left[\log\left(1 - \frac{p_r(x)}{p_r(x) + p_g(x)}\right)\right]
= E_{p_r}\left[\log \frac{p_r(x)}{p_r(x) + p_g(x)}\right] + E_{p_g}\left[\log \frac{p_g(x)}{p_r(x) + p_g(x)}\right]

which is equivalent to

E_{p_r}\left[\log \frac{p_r(x)}{(p_r(x) + p_g(x))/2}\right] + E_{p_g}\left[\log \frac{p_g(x)}{(p_r(x) + p_g(x))/2}\right] - 2\log 2
= D_{KL}\left(p_r \,\Big\|\, \frac{p_r + p_g}{2}\right) + D_{KL}\left(p_g \,\Big\|\, \frac{p_r + p_g}{2}\right) - 2\log 2
= 2\,JSD(p_r \,\|\, p_g) - 2\log 2
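This identity can be sanity-checked numerically on an arbitrary pair of discrete distributions (illustration only; the distributions below are made up):

```python
import numpy as np

# Sanity check of:
#   E_{p_r}[log D*] + E_{p_g}[log(1 - D*)] = 2*JSD(p_r || p_g) - 2*log(2)
rng = np.random.default_rng(0)
p_r = rng.random(10); p_r /= p_r.sum()     # arbitrary discrete distribution
p_g = rng.random(10); p_g /= p_g.sum()     # another arbitrary distribution
m = (p_r + p_g) / 2
d_star = p_r / (p_r + p_g)

kl = lambda p, q: np.sum(p * np.log(p / q))
jsd = 0.5 * kl(p_r, m) + 0.5 * kl(p_g, m)

lhs = np.sum(p_r * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star))
rhs = 2 * jsd - 2 * np.log(2)
print(np.isclose(lhs, rhs))                # True
```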

So the longer the discriminator is trained, the closer it gets to this optimal discriminator.

And minimizing the generator loss against this optimal discriminator minimizes the JS divergence, so the generator's distribution p_g moves closer to the real distribution p_r.

But if p_g and the real distribution p_r have almost no overlap,

the generator's loss tends to a constant.

Proof:

      Because there is almost no overlap, for (almost) every x: where p_r(x) \neq 0 we have p_g(x) \rightarrow 0, and where p_g(x) \neq 0 we have p_r(x) \rightarrow 0.

      It follows that JSD(p_r \,\|\, p_g) \rightarrow \log 2, so the generator's objective 2\,JSD(p_r \,\|\, p_g) - 2\log 2 tends to the constant 0. Its gradient with respect to the generator is therefore 0, and training cannot proceed.
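The same effect, shown numerically (illustration only): for two made-up discrete distributions with disjoint supports, the JS divergence saturates at log 2, so the objective above is a constant no matter how p_g is arranged:

```python
import numpy as np

# Two discrete distributions with disjoint supports.
p_r = np.array([0.5, 0.5, 0.0, 0.0])
p_g = np.array([0.0, 0.0, 0.3, 0.7])
m = (p_r + p_g) / 2

def kl(p, q):
    mask = p > 0                      # treat 0 * log(0 / q) as 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

jsd = 0.5 * kl(p_r, m) + 0.5 * kl(p_g, m)
print(jsd, np.log(2))                 # JSD saturates at log(2) ~ 0.6931
print(2 * jsd - 2 * np.log(2))        # ~0.0: constant objective, zero gradient
```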

For Equation (3):

D_{KL}(p_g \,\|\, p_r)
= E_{p_g}\left[\log \frac{p_g(x)}{p_r(x)}\right]
= E_{p_g}\left[\log \frac{p_g(x)/(p_r(x) + p_g(x))}{p_r(x)/(p_r(x) + p_g(x))}\right]
= E_{p_g}\left[\log \frac{1 - D^{*}(x)}{D^{*}(x)}\right]
= E_{p_g}[\log(1 - D^{*}(x))] - E_{p_g}[\log D^{*}(x)]

Therefore

E_{p_g}[-\log D^{*}(x)]
= D_{KL}(p_g \,\|\, p_r) - E_{p_g}[\log(1 - D^{*}(x))]
= D_{KL}(p_g \,\|\, p_r) - 2\,JSD(p_r \,\|\, p_g) + 2\log 2 + E_{p_r}[\log D^{*}(x)]
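A numerical sanity check of this last decomposition, again on made-up discrete distributions (illustration only):

```python
import numpy as np

# Sanity check of:
#   E_{p_g}[-log D*] = KL(p_g||p_r) - 2*JSD(p_r||p_g) + 2*log(2) + E_{p_r}[log D*]
rng = np.random.default_rng(1)
p_r = rng.random(8); p_r /= p_r.sum()
p_g = rng.random(8); p_g /= p_g.sum()
m = (p_r + p_g) / 2
d_star = p_r / (p_r + p_g)

kl = lambda p, q: np.sum(p * np.log(p / q))
jsd = 0.5 * kl(p_r, m) + 0.5 * kl(p_g, m)

lhs = np.sum(p_g * (-np.log(d_star)))
rhs = kl(p_g, p_r) - 2 * jsd + 2 * np.log(2) + np.sum(p_r * np.log(d_star))
print(np.isclose(lhs, rhs))            # True
```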

When the generator is being trained, the last two terms of this decomposition (2\log 2 and E_{p_r}[\log D^{*}(x)]) do not involve the generated samples, so with the discriminator held fixed they act as constants,

and minimizing Equation (3) is therefore equivalent to minimizing the first two terms, D_{KL}(p_g \,\|\, p_r) - 2\,JSD(p_r \,\|\, p_g).

This objective tries to make the KL divergence small while at the same time making the JS divergence large (note the minus sign in front of the JSD term).

That is contradictory: it asks the two distributions to be made similar and to be pushed apart at the same time.

Reference

https://blog.csdn.net/Invokar/article/details/88917214

