2018-04-22 开胃 Learns Math Series - Cross Entropy

A short note on cross entropy. The previous post briefly covered plain information entropy.

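Often, though, we do not have the exact true distribution; we may only have an approximate (non-true) distribution. Cross entropy measures, given the true distribution, how much effort it takes to remove the system's uncertainty when we follow the strategy specified by the non-true distribution.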

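The lower the cross entropy, the better the strategy. The lowest possible cross entropy is the information entropy computed with the true distribution itself, because then p_k = q_k and cross entropy = information entropy. This is why classification algorithms in machine learning minimize cross entropy: a lower cross entropy means the strategy produced by the algorithm is closer to the optimal strategy, which indirectly shows that the non-true distribution the algorithm computes is closer to the true distribution.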
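As a small illustration of the classification case, here is a minimal sketch using only the Python standard library. The function name `cross_entropy`, the one-hot label, and the three predicted distributions are made up for this example; natural logarithms are assumed.

```python
import math

def cross_entropy(p, q):
    """Cross entropy -sum_k p_k * log(q_k) in nats.

    Terms with p_k == 0 contribute nothing; if p_k > 0 where q_k == 0
    the result is +infinity.
    """
    total = 0.0
    for pk, qk in zip(p, q):
        if pk > 0:
            if qk == 0:
                return math.inf
            total -= pk * math.log(qk)
    return total

# Hypothetical 3-class example: the true class is the second one.
label = [0.0, 1.0, 0.0]
poor_prediction = [0.5, 0.3, 0.2]
good_prediction = [0.1, 0.8, 0.1]
perfect_prediction = [0.0, 1.0, 0.0]

loss_poor = cross_entropy(label, poor_prediction)
loss_good = cross_entropy(label, good_prediction)
loss_perfect = cross_entropy(label, perfect_prediction)

# The loss shrinks as the prediction approaches the true distribution.
print(loss_poor, loss_good, loss_perfect)
```

The perfect prediction reaches the entropy of the one-hot label, which is zero, so no other prediction can score lower.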

Cross entropy is a measure of the incremental information in a distribution p relative to a prior distribution q.

The following is material I found online:

Properties of cross entropy

It is a measure of the lack of incremental information in p relative to q.
It reduces to regular entropy when the prior q is uniform.
It is maximized, with value 0, when p = q: there is no additional information. Proof: apply a Lagrange multiplier.
h(p|q) = −∞ if p_k > 0 for some k with q_k = 0: a new discovery adds an infinite amount of new information.
It is finite if p_k = 0 for some k with q_k > 0: there is not much value in disproving an existing theory.
Prior and posterior distributions are asymmetric, h(p|q) ≠ h(q|p), so "Kullback-Leibler distance" is actually a misnomer.
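These notes never write h(p|q) out. The listed properties (maximum 0 at p = q, −∞ when p puts mass where q has none) are consistent with h(p|q) = −Σ p_k log(p_k / q_k), the negative of the Kullback-Leibler divergence, so the sketch below assumes that form. The function names and the sample distributions are hypothetical.

```python
import math

def h(p, q):
    """Assumed form: h(p|q) = -sum_k p_k * log(p_k / q_k), with 0 * log 0 = 0."""
    total = 0.0
    for pk, qk in zip(p, q):
        if pk == 0:
            continue              # p_k = 0 adds nothing, even when q_k > 0
        if qk == 0:
            return -math.inf      # p puts mass where the prior q has none
        total -= pk * math.log(pk / qk)
    return total

def shannon_entropy(p):
    """Regular entropy -sum_k p_k * log(p_k) in nats."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
uniform = [1 / 3, 1 / 3, 1 / 3]

print(h(p, p))                    # the maximum, zero
print(h(p, q), h(q, p))           # both negative and not equal: asymmetric
# With a uniform prior, h differs from regular entropy only by the constant log(n).
print(h(p, uniform), shannon_entropy(p) - math.log(len(p)))
print(h([0.5, 0.5, 0.0], [1.0, 0.0, 0.0]))   # q_k = 0 with p_k > 0: -inf
print(h([1.0, 0.0, 0.0], [0.5, 0.5, 0.0]))   # p_k = 0 with q_k > 0: finite
```

Under this assumed form, "reduces to regular entropy" holds up to the additive constant −log(n), which does not affect which p maximizes it.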

Prior beliefs in the market

Prior beliefs are common in the market, e.g., the belief that stock returns are normally distributed.
Maximizing cross entropy is an ideal objective function for capturing prior beliefs:

it introduces only a minimal perturbation to the prior beliefs q,
while incorporating additional constraints on the distribution p.
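One way to picture "minimal perturbation" is a sketch under stated assumptions: take h(p|q) = −Σ p_k log(p_k / q_k) (an assumed form, not given in these notes) and maximize it subject to a single mean constraint Σ p_k x_k = target. The maximizer is then the exponentially tilted prior p_k ∝ q_k exp(λ x_k), and λ can be found by bisection. The states `x`, the prior `q`, the target mean, and all function names are hypothetical.

```python
import math

def h(p, q):
    """Assumed form: h(p|q) = -sum_k p_k * log(p_k / q_k); requires q_k > 0."""
    return -sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def tilt(q, x, lam):
    """Exponentially tilted prior: p_k proportional to q_k * exp(lam * x_k)."""
    weights = [qk * math.exp(lam * xk) for qk, xk in zip(q, x)]
    z = sum(weights)
    return [w / z for w in weights]

def mean(p, x):
    return sum(pk * xk for pk, xk in zip(p, x))

def closest_to_prior(q, x, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection on lam; the mean of the tilted distribution increases with lam."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(q, x, mid), x) < target:
            lo = mid
        else:
            hi = mid
    return tilt(q, x, 0.5 * (lo + hi))

x = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical outcomes
q = [0.1, 0.2, 0.4, 0.2, 0.1]       # prior belief, mean 0
p = closest_to_prior(q, x, target=0.5)

# Another distribution that also has mean 0.5 but ignores the prior's shape.
other = [0.0, 0.25, 0.25, 0.25, 0.25]

print(p, mean(p, x))
print(h(p, q), h(other, q))          # p stays closer to q than other does
```

Both p and `other` satisfy the constraint, but p has the higher h(p|q): it changes the prior only as much as the new constraint forces it to.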

Cross entropy optimization with bid/ask

Take the discrete form of cross entropy:
Because of the dimensionality and the constraints, this problem is hard to solve:

A is a matrix that computes benchmark prices from the distribution
b is the vector of observed mid prices of the benchmark instruments
e is the pricing error relative to the mid price
W is a diagonal penalty matrix; we usually choose W^-1 = αE
where E is a diagonal matrix of bid/ask spreads (but how is this actually written out?)
α controls the trade-off between fit quality and entropy
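The objective itself is not reproduced in these notes. As a rough sketch only, assume it has the penalized form "maximize h(p|q) − eᵀWe with e = Ap − b", where h(p|q) = −Σ p_k log(p_k / q_k). Both the form of h and the exact penalty term are assumptions here, and every number and function name below is hypothetical. With W^-1 = αE, the diagonal entries of W are 1 / (α · spread_i).

```python
import math

def h(p, q):
    """Assumed form: h(p|q) = -sum_k p_k * log(p_k / q_k); requires q_k > 0."""
    return -sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def pricing_error(A, p, b):
    """e = A p - b: model price minus observed mid price, one entry per instrument."""
    return [sum(a * pj for a, pj in zip(row, p)) - bi for row, bi in zip(A, b)]

def penalty_weights(spreads, alpha):
    """Diagonal of W when W^-1 = alpha * E, i.e. W_ii = 1 / (alpha * spread_i)."""
    return [1.0 / (alpha * s) for s in spreads]

def objective(p, q, A, b, spreads, alpha):
    """Assumed objective: entropy term minus the spread-weighted squared error."""
    e = pricing_error(A, p, b)
    w = penalty_weights(spreads, alpha)
    return h(p, q) - sum(wi * ei * ei for wi, ei in zip(w, e))

# Hypothetical setup: 3 states, 2 benchmark instruments (rows of A are payoffs).
q = [0.3, 0.4, 0.3]
A = [[0.0, 1.0, 2.0],
     [0.0, 0.0, 1.0]]
b = [1.10, 0.35]            # observed mid prices
spreads = [0.02, 0.10]      # bid/ask spreads, the diagonal of E

print(pricing_error(A, q, b))            # how far the prior is from the mids
print(penalty_weights(spreads, alpha=1.0))
print(objective(q, q, A, b, spreads, alpha=1.0))
print(objective(q, q, A, b, spreads, alpha=10.0))
```

In this reading, an instrument with a wide bid/ask spread gets a small weight, so its pricing error is tolerated more, and a larger α loosens the fit overall in favor of staying close to the prior.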

Open questions:

Prior beliefs: what does "only introduce minimal perturbation to the prior beliefs q" mean?
Cross entropy optimization with bid/ask: W is a diagonal penalty matrix, and we usually choose W^-1 = αE.

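Cross entropy is easily confused with relative entropy. The two are closely related, but they are not the same thing. Given two distributions p and q, their cross entropy over a given sample set is defined as:
$$\mathrm{CEH}(p, q) = E_p[-\log q] = -\sum_{x \in X} p(x) \log q(x) = H(p) + D_{KL}(p \| q)$$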
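The decomposition can be checked numerically. This is a minimal sketch with made-up distributions; it assumes natural logarithms and q(x) > 0 wherever p(x) > 0.

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) * log p(x)."""
    return -sum(px * math.log(px) for px in p if px > 0)

def cross_entropy(p, q):
    """CEH(p, q) = -sum_x p(x) * log q(x); assumes q(x) > 0 where p(x) > 0."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)), the relative entropy."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]
q = [0.4, 0.3, 0.3]

lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
print(lhs, rhs)                    # equal up to floating-point rounding
print(kl_divergence(p, q))         # the gap between cross entropy and entropy
```

Since H(p) does not depend on q, minimizing the cross entropy over q is the same as minimizing the relative entropy D_KL(p||q), which is why the two are so often mixed up.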