Information Entropy, Relative Entropy (KL Divergence), and JS Divergence

Self-Information

-\log(p(x))

where p is the probability of the event.

Information Entropy

H(X) = -\sum_{x \in X} p(x)\log(p(x)) = -E_{x \sim p}[\log(p(x))]

The base of the logarithm is usually 2 or e.

Property: non-negative.
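As a quick numerical sketch of the definition (pure Python; the helper name `entropy` is mine, not from any particular library):

```python
import math

def entropy(p, base=2):
    """Shannon entropy H(X) = -sum_x p(x) log p(x), with 0*log(0) taken as 0."""
    return -sum(px * math.log(px, base) for px in p if px > 0)

print(entropy([0.5, 0.5]))  # 1.0 -- a fair coin carries exactly 1 bit
print(entropy([0.25] * 4))  # 2.0 -- uniform over 4 outcomes
```

A uniform distribution maximizes the entropy, and a point mass (a certain outcome) gives 0, matching the non-negativity property.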

Relative Entropy (KL Divergence)

D_{KL}(P \| Q) = \sum_{x \in X} p(x)\log\left(\frac{p(x)}{q(x)}\right)

It measures the "distance" from the distribution P (the true distribution) to Q (the learned distribution), although it is not a distance in the strict metric sense.

It is better viewed as the information lost when Q is used to approximate P.

The smaller the value, the more similar the two distributions.
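A minimal sketch of the definition in pure Python (the function name and the example distributions are illustrative; the last print also previews the asymmetry noted below):

```python
import math

def kl_divergence(p, q, base=2):
    """D_KL(P||Q) = sum_x p(x) log(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / qx, base) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]   # "true" distribution
q = [0.9, 0.1]   # "learned" distribution
print(kl_divergence(p, q))  # ≈ 0.737 bits lost by modelling P with Q
print(kl_divergence(p, p))  # 0.0 -- identical distributions
print(kl_divergence(q, p))  # ≈ 0.531 -- not equal to D_KL(P||Q)
```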

Properties

1. Non-negativity

Proof:

\log(x) \leq x - 1, with equality iff x = 1. (This inequality holds for the natural logarithm; any other base b > 1 merely rescales the KL divergence by the positive constant 1/\ln b, so non-negativity carries over.)

D_{KL}(P \| Q) \\= -\sum_{x \in X} p(x)\log\left(\frac{q(x)}{p(x)}\right) \\\geq -\sum_{x \in X} p(x)\left(\frac{q(x)}{p(x)} - 1\right) \\= -\sum_{x \in X} q(x) + \sum_{x \in X} p(x) \\= 0
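The non-negativity proved above can also be spot-checked numerically (a sketch; `random_dist` is an ad-hoc helper for drawing random discrete distributions, not a standard API):

```python
import math, random

def kl_divergence(p, q):
    # Natural-log KL divergence; assumes q(x) > 0 wherever p(x) > 0.
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def random_dist(n):
    # Draw a random discrete distribution by normalising positive weights.
    w = [random.random() + 1e-12 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

random.seed(0)
# Spot-check D_KL(P||Q) >= 0 on many random pairs of distributions.
print(all(kl_divergence(random_dist(5), random_dist(5)) >= 0
          for _ in range(1000)))  # True
```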

with equality iff P = Q.

It can also be proved with Jensen's inequality.

2. Asymmetry

In general D_{KL}(P \| Q) \neq D_{KL}(Q \| P): the definition is not symmetric in P and Q.

3. Range

\left[0, +\infty\right)

Cross-Entropy

H(P, Q) = -\sum_{x \in X} p(x)\log(q(x)) = H(P) + D_{KL}(P \| Q)

Since H(P) is the entropy of the true (training-set) distribution, it is a constant with respect to the model, so minimizing cross-entropy during training is effectively the same as minimizing the KL divergence.
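The identity H(P, Q) = H(P) + D_KL(P||Q) can be verified numerically (a sketch with illustrative helper names, reusing the definitions above):

```python
import math

def entropy(p):
    return -sum(px * math.log(px, 2) for px in p if px > 0)

def kl_divergence(p, q):
    return sum(px * math.log(px / qx, 2) for px, qx in zip(p, q) if px > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum_x p(x) log q(x)."""
    return -sum(px * math.log(qx, 2) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
# Check the identity H(P,Q) = H(P) + D_KL(P||Q) up to floating-point error.
print(abs(cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))) < 1e-12)  # True
```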

JS Divergence

JSD(P \| Q) = \frac{1}{2} D_{KL}\left(P \Big\| \frac{P+Q}{2}\right) + \frac{1}{2} D_{KL}\left(Q \Big\| \frac{P+Q}{2}\right)

Properties

1. Symmetry

Immediate from the definition, which treats P and Q symmetrically.

2. Range

\left[0, \log 2\right]

With base-2 logarithms the range is [0, 1]; with base e it is \left[0, \ln 2\right].

Proof

JSD(P \| Q) = \frac{1}{2} D_{KL}\left(P \Big\| \frac{P+Q}{2}\right) + \frac{1}{2} D_{KL}\left(Q \Big\| \frac{P+Q}{2}\right) \\= \frac{1}{2}\sum_{x \in X} p(x)\log\left(\frac{p(x)}{\frac{p(x)+q(x)}{2}}\right) + \frac{1}{2}\sum_{x \in X} q(x)\log\left(\frac{q(x)}{\frac{p(x)+q(x)}{2}}\right) \\= \frac{1}{2}\sum_{x \in X} p(x)\log 2 + \frac{1}{2}\sum_{x \in X} q(x)\log 2 + \frac{1}{2}\sum_{x \in X} p(x)\log\left(\frac{p(x)}{p(x)+q(x)}\right) + \frac{1}{2}\sum_{x \in X} q(x)\log\left(\frac{q(x)}{p(x)+q(x)}\right)

Since \sum_{x \in X} p(x) = 1 and \sum_{x \in X} q(x) = 1, this simplifies to

= \log 2 + \frac{1}{2}\sum_{x \in X} p(x)\log\left(\frac{p(x)}{p(x)+q(x)}\right) + \frac{1}{2}\sum_{x \in X} q(x)\log\left(\frac{q(x)}{p(x)+q(x)}\right)

When P = Q:

JSD(P \| Q) \\= \log 2 + \frac{1}{2}\sum_{x \in X} p(x)\log\frac{1}{2} + \frac{1}{2}\sum_{x \in X} q(x)\log\frac{1}{2} \\= \log 2 + \frac{1}{2}\log\frac{1}{2} + \frac{1}{2}\log\frac{1}{2} \\= \log 2 - \log 2 \\= 0

When the supports of P and Q are disjoint, i.e. for every x at most one of p(x) and q(x) is nonzero (with the convention 0 \log 0 = 0):

JSD(P \| Q) \\= \log 2 + \frac{1}{2}\sum_{x:\, p(x) > 0} p(x)\log\left(\frac{p(x)}{p(x)+q(x)}\right) + \frac{1}{2}\sum_{x:\, q(x) > 0} q(x)\log\left(\frac{q(x)}{p(x)+q(x)}\right) \\= \log 2

since q(x) = 0 wherever p(x) > 0 (and vice versa), every ratio inside the logarithms equals 1 and each log term vanishes.

Lower bound

JSD(P \| Q) \\= \frac{1}{2} D_{KL}\left(P \Big\| \frac{P+Q}{2}\right) + \frac{1}{2} D_{KL}\left(Q \Big\| \frac{P+Q}{2}\right) \\\geq 0 + 0 = 0

because KL divergence is non-negative.

Upper bound

JSD(P \| Q) \\= \log 2 + \frac{1}{2}\sum_{x \in X} p(x)\log\left(\frac{p(x)}{p(x)+q(x)}\right) + \frac{1}{2}\sum_{x \in X} q(x)\log\left(\frac{q(x)}{p(x)+q(x)}\right) \\\leq \log 2 + \frac{1}{2}\sum_{x \in X} p(x) \times 0 + \frac{1}{2}\sum_{x \in X} q(x) \times 0 \\= \log 2

(Since p and q are probabilities in [0, 1], \frac{p(x)}{p(x)+q(x)} \leq 1, so \log\left(\frac{p(x)}{p(x)+q(x)}\right) \leq 0.)
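The symmetry and both bounds derived above can be checked numerically (a sketch with illustrative names, building on the KL helper from earlier):

```python
import math

def kl_divergence(p, q, base=2):
    return sum(px * math.log(px / qx, base) for px, qx in zip(p, q) if px > 0)

def js_divergence(p, q, base=2):
    """JSD(P||Q) = 1/2 D_KL(P||M) + 1/2 D_KL(Q||M), where M = (P+Q)/2."""
    m = [(px + qx) / 2 for px, qx in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(abs(js_divergence(p, q) - js_divergence(q, p)) < 1e-12)  # True: symmetric
print(js_divergence(p, p))            # 0.0: lower bound, attained when P = Q
print(js_divergence([1, 0], [0, 1]))  # 1.0: upper bound log2(2), disjoint supports
```

With base-2 logarithms the disjoint-support case hits the upper bound of exactly 1, matching the range stated above.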



Reposted from blog.csdn.net/qq_39942341/article/details/104176397