Entropy: the average information.
$$H(X) = E[I(X)] = E\left[\log \frac{1}{P(X)}\right] = \sum_{x \in \mathcal{X}} P(x) \log \frac{1}{P(x)}$$
Relative entropy (also called KL divergence):
$$D(P \| Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}$$
Cross entropy:
$$H(P \| Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{1}{Q(x)}$$
Minimizing relative entropy (over $Q$) is equivalent to minimizing cross entropy, since $D(P \| Q) = H(P \| Q) - H(P)$ and $H(P)$ does not depend on $Q$.
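To make the relation concrete, here is a minimal Python sketch (the distributions `p` and `q` are made-up example values, not from these notes) that computes all three quantities and checks the identity numerically:

```python
import numpy as np

def entropy(p):
    """H(P) = sum_x p(x) * log2(1 / p(x))  (entropy in bits)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * np.log2(1.0 / p)))

def cross_entropy(p, q):
    """H(P||Q) = sum_x p(x) * log2(1 / q(x))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log2(1.0 / q)))

def kl_divergence(p, q):
    """D(P||Q) = sum_x p(x) * log2(p(x) / q(x))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log2(p / q)))

# Illustrative distributions over a 3-symbol alphabet (made-up values).
p = [0.5, 0.25, 0.25]
q = [0.4, 0.4, 0.2]

# D(P||Q) = H(P||Q) - H(P), so minimizing either over Q is the same problem.
print(kl_divergence(p, q))               # ~0.0719 bits
print(cross_entropy(p, q) - entropy(p))  # same value
```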
Conditional entropy: how much uncertainty is left in one variable given the other. First recall conditional expectation: $E(Y \mid X = x)$ is a fixed number, while $E(Y \mid X)$ is a random variable (a function of $X$).
$H(Y \mid X = x)$ is analogous to the conditional expectation evaluated at a particular value $x$. $H(Y \mid X)$ is slightly different from the conditional expectation above: here we take a further expectation over $X$, so that
$$
\begin{aligned}
H(Y \mid X) &= \sum_{x \in \mathcal{X}} p(x) \, H(Y \mid X = x) \\
&= -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y \mid x) \log p(y \mid x) \\
&= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x) \, p(y \mid x) \log p(y \mid x) \\
&= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x) \\
&= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)} \\
&= \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x)}{p(x, y)}
\end{aligned}
$$
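As a numerical sanity check of the same identity, a small Python sketch with a made-up joint distribution `p_xy` (rows indexed by $x$, columns by $y$):

```python
import numpy as np

# Made-up joint distribution p(x, y): rows are values of X, columns values of Y.
p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])

p_x = p_xy.sum(axis=1)               # marginal p(x)
p_y_given_x = p_xy / p_x[:, None]    # conditional p(y | x)

# Form 1: H(Y|X) = sum_x p(x) * H(Y | X = x)
h_per_x = -np.sum(p_y_given_x * np.log2(p_y_given_x), axis=1)
form1 = float(np.sum(p_x * h_per_x))

# Form 2: H(Y|X) = -sum_{x,y} p(x, y) * log2 p(y | x)
form2 = float(-np.sum(p_xy * np.log2(p_y_given_x)))

print(form1, form2)  # both print the same number (~0.861 bits)
```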
Relation Between Joint and Conditional Entropy
$$H(X, Y) = H(X) + H(Y \mid X)$$
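One way to see this, expanding $p(x, y) = p(x) \, p(y \mid x)$ inside the joint entropy:
$$
\begin{aligned}
H(X, Y) &= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y) \\
&= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \big( p(x) \, p(y \mid x) \big) \\
&= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x) - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x) \\
&= -\sum_{x \in \mathcal{X}} p(x) \log p(x) + H(Y \mid X) \\
&= H(X) + H(Y \mid X)
\end{aligned}
$$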
Mutual information: $I(X; Y) = I(Y; X)$ by symmetry.
$I(X; Y) = H(X) - H(X \mid Y)$: the reduction in uncertainty of $X$ due to knowledge of $Y$.
$I(X; Y) = H(Y) - H(Y \mid X)$: the reduction in uncertainty of $Y$ due to knowledge of $X$.
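A quick check with the same kind of made-up joint distribution, computing $I(X; Y)$ from both expressions:

```python
import numpy as np

def entropy_bits(p):
    """Entropy of a discrete distribution, in bits."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log2(p)))

# Made-up joint distribution p(x, y); rows are X values, columns are Y values.
p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Conditional entropies via H(A|B) = -sum_{a,b} p(a,b) * log2 p(a|b).
h_x_given_y = float(-np.sum(p_xy * np.log2(p_xy / p_y[None, :])))
h_y_given_x = float(-np.sum(p_xy * np.log2(p_xy / p_x[:, None])))

# Both expressions for mutual information give the same value (~0.073 bits).
print(entropy_bits(p_x) - h_x_given_y)  # I(X;Y) = H(X) - H(X|Y)
print(entropy_bits(p_y) - h_y_given_x)  # I(X;Y) = H(Y) - H(Y|X)
```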