1. Derive the relationships among joint entropy, conditional entropy, and mutual information, and compare their magnitudes
Related definitions
Joint Entropy
The joint entropy $H(X,Y)$ of random variables $X$ and $Y$ measures the uncertainty of $X$ and $Y$ occurring together:
$$H(X,Y)=\sum_{x_{i}\in X}\sum_{y_{j}\in Y}p(x_{i},y_{j})I(x_{i},y_{j})=\sum_{x_{i}\in X}\sum_{y_{j}\in Y}p(x_{i},y_{j})\log\frac{1}{p(x_{i},y_{j})}$$
In shorthand:
$$H(X,Y)=-\sum_{x,y}p(x,y)\log p(x,y)$$
Conditional entropy
The conditional entropy $H(X|Y)$ of random variables $X$ and $Y$ measures the uncertainty that remains about $X$ after $Y$ is known:
$$\begin{aligned}H(X|Y)&=\sum_{y_{j}\in Y}p(y_{j})H(X|Y=y_{j})\\&=-\sum_{y_{j}\in Y}p(y_{j})\sum_{x_{i}\in X}p(x_{i}|y_{j})\log p(x_{i}|y_{j})\\&=-\sum_{y_{j}\in Y}\sum_{x_{i}\in X}p(y_{j})p(x_{i}|y_{j})\log p(x_{i}|y_{j})\\&=-\sum_{x_{i},y_{j}}p(x_{i},y_{j})\log p(x_{i}|y_{j})\end{aligned}$$
In shorthand:
$$H(X|Y)=-\sum_{x,y}p(x,y)\log p(x|y)$$
Mutual information
The mutual information $I(X;Y)$ of random variables $X$ and $Y$ measures how much the uncertainty about $X$ is reduced once $Y$ is known. Pointwise, it is defined as the logarithm of the ratio of the posterior probability to the prior probability:
$$I(x_{i};y_{j})=\log\frac{p(x_{i}|y_{j})}{p(x_{i})}$$
Averaging over the joint distribution gives:
$$I(X;Y)=\sum_{x,y}p(x,y)\log\frac{p(x|y)}{p(x)}$$
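As a quick illustration, the three definitions can be evaluated directly on a small joint distribution. The sketch below uses Python and an arbitrary made-up 2×2 joint pmf (not one from the text):

```python
import numpy as np

# A sketch: evaluate the three definitions directly on a toy joint pmf p(x, y).
p_xy = np.array([[0.10, 0.30],
                 [0.25, 0.35]])               # rows index x_i, columns index y_j
p_x = p_xy.sum(axis=1)                        # marginal p(x_i)
p_y = p_xy.sum(axis=0)                        # marginal p(y_j)
p_x_given_y = p_xy / p_y                      # p(x_i | y_j), column-wise division

H_XY = -np.sum(p_xy * np.log2(p_xy))                        # joint entropy
H_X_given_Y = -np.sum(p_xy * np.log2(p_x_given_y))          # conditional entropy
I_XY = np.sum(p_xy * np.log2(p_x_given_y / p_x[:, None]))   # mutual information

print(f"H(X,Y) = {H_XY:.4f} bits, H(X|Y) = {H_X_given_Y:.4f} bits, I(X;Y) = {I_XY:.4f} bits")
```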
Relationship derivation
Joint entropy and conditional entropy relationship
$H(X|Y)=H(X,Y)-H(Y)$. The derivation is as follows:
$$\begin{aligned}H(X|Y)&=-\sum_{x,y}p(x,y)\log p(x|y)\\&=-\sum_{x,y}p(x,y)\log\frac{p(x,y)}{p(y)}\\&=-\sum_{x,y}p(x,y)\log p(x,y)+\sum_{y}\Big(\sum_{x}p(x,y)\Big)\log p(y)\\&=-\sum_{x,y}p(x,y)\log p(x,y)+\sum_{y}p(y)\log p(y)\\&=H(X,Y)-H(Y)\end{aligned}$$
Similarly, $H(Y|X)=H(X,Y)-H(X)$.
Mutual information and conditional entropy relationship
$I(X;Y)=H(X)-H(X|Y)$. The derivation is as follows:
$$\begin{aligned}I(X;Y)&=\sum_{x,y}p(x,y)\log\frac{p(x|y)}{p(x)}\\&=-\sum_{x,y}p(x,y)\log p(x)+\sum_{x,y}p(x,y)\log p(x|y)\\&=-\sum_{x}p(x)\log p(x)+\sum_{x,y}p(x,y)\log p(x|y)\\&=H(X)-H(X|Y)\end{aligned}$$
Similarly, $I(Y;X)=H(Y)-H(Y|X)$.
Joint entropy and mutual information relationship
Combining the two relations derived above, we obtain
$$H(X|Y)=H(X,Y)-H(Y)=H(X)-I(X;Y)$$
Hence $H(X,Y)=H(X)+H(Y)-I(X;Y)$.
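These identities can be sanity-checked numerically. The sketch below (using NumPy; the joint pmf is randomly generated, not taken from the text) verifies all three relations on an arbitrary discrete joint distribution:

```python
import numpy as np

# A sketch: check the derived identities on a randomly drawn joint pmf.
rng = np.random.default_rng(0)
p_xy = rng.random((3, 4))
p_xy /= p_xy.sum()                 # normalize to a valid joint distribution

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_xy.ravel())
H_X_given_Y = -np.sum(p_xy * np.log2(p_xy / p_y))              # -Σ p(x,y) log p(x|y)
I_XY = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))       # Σ p(x,y) log p(x|y)/p(x)

assert np.isclose(H_X_given_Y, H_XY - H_Y)      # H(X|Y) = H(X,Y) - H(Y)
assert np.isclose(I_XY, H_X - H_X_given_Y)      # I(X;Y) = H(X) - H(X|Y)
assert np.isclose(H_XY, H_X + H_Y - I_XY)       # H(X,Y) = H(X) + H(Y) - I(X;Y)
print("all identities hold for this example")
```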
Magnitude comparison
This section uses a Venn diagram to illustrate the relative magnitudes of joint entropy, conditional entropy, and mutual information.
Joint entropy vs. mutual information
As shown in the Venn diagram, the shaded region on the left represents $H(X)$ and the shaded region on the right represents $H(Y)$. The union of the two regions corresponds to $H(X,Y)$ and their intersection to $I(X;Y)$, so it follows immediately that $H(X,Y)\geq I(X;Y)$.
Joint entropy vs. conditional entropy
Since $H(X|Y)=H(X,Y)-H(Y)$, which corresponds to the part of the $H(X)$ region lying outside $H(Y)$ in the Venn diagram, we immediately get $H(X,Y)\geq H(X|Y)$. Similarly, $H(X,Y)\geq H(Y|X)$.
Mutual information vs. conditional entropy
Their relative magnitude cannot be determined in general. For example, if $X$ and $Y$ are independent then $I(X;Y)=0<H(X|Y)=H(X)$ (whenever $H(X)>0$), whereas if $X=Y$ then $I(X;Y)=H(X)>H(X|Y)=0$.
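A minimal sketch of the two regimes, using hypothetical 2×2 joint distributions chosen purely for illustration:

```python
import numpy as np

# A sketch illustrating that neither I(X;Y) nor H(X|Y) dominates in general.
def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mi_and_cond_entropy(p_xy):
    """Return (I(X;Y), H(X|Y)) in bits for a joint pmf given as a 2-D array."""
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    H_X_given_Y = entropy(p_xy.ravel()) - entropy(p_y)   # H(X,Y) - H(Y)
    I_XY = entropy(p_x) - H_X_given_Y                    # H(X) - H(X|Y)
    return I_XY, H_X_given_Y

# Independent fair bits: I(X;Y) = 0 < H(X|Y) = 1
print(mi_and_cond_entropy(np.full((2, 2), 0.25)))
# Identical fair bits (X = Y): I(X;Y) = 1 > H(X|Y) = 0
print(mi_and_cond_entropy(np.array([[0.5, 0.0], [0.0, 0.5]])))
```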
2. Determine which distribution maximizes the differential entropy of a continuous random variable $X$ with given mean $\alpha$ and variance $\beta$, and find its probability density function
Let $X\sim p(x)$ be a continuous random variable. The problem can then be formulated as
$$\max_{p}\ H(p)=-\int_{-\infty}^{+\infty}p(x)\log p(x)\,dx$$
s.t.
$$\int_{-\infty}^{+\infty}p(x)\,dx=1$$
$$E(X)=\int_{-\infty}^{+\infty}x\,p(x)\,dx=\alpha$$
$$\mathrm{var}(X)=\int_{-\infty}^{+\infty}(x-\alpha)^{2}p(x)\,dx=\beta$$
Here the first constraint is normalization, the second fixes the mean, and the third fixes the variance. This naturally suggests solving with the method of Lagrange multipliers.
Proof. Introduce Lagrange multipliers $m$, $n$, $\gamma$. The Lagrangian is
$$\begin{aligned}L(p,m,n,\gamma)=&-\int_{-\infty}^{+\infty}p(x)\log p(x)\,dx+m\Big(\int_{-\infty}^{+\infty}p(x)\,dx-1\Big)\\&+n\Big(\int_{-\infty}^{+\infty}x\,p(x)\,dx-\alpha\Big)+\gamma\Big(\int_{-\infty}^{+\infty}(x-\alpha)^{2}p(x)\,dx-\beta\Big)\end{aligned}$$
Taking the derivative with respect to $p$ and setting it to zero gives
$$\frac{\partial L}{\partial p}=-\frac{\partial}{\partial p}\int_{-\infty}^{+\infty}\Big(p(x)\log p(x)-m\,p(x)-n\,x\,p(x)-\gamma(x-\alpha)^{2}p(x)\Big)dx=0$$
Let $W=p(x)\log p(x)-m\,p(x)-n\,x\,p(x)-\gamma(x-\alpha)^{2}p(x)$. Since $W$ is the integrand and depends only on $p(x)$ and $x$ (not on derivatives of $p$), stationarity requires $\frac{\partial W}{\partial p}=0$ pointwise, so:
$$p(x)=e^{-1+m+nx+\gamma(x-\alpha)^{2}}$$
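The step from this stationary form to the final density hides a short calculation. A sketch of it, assuming $\gamma<0$ so that the density is normalizable: the exponent is quadratic in $x$, so $p(x)$ is a Gaussian density whose parameters are fixed by the constraints.

$$\begin{aligned}
p(x)&=C\,e^{\,nx+\gamma(x-\alpha)^{2}},\qquad C=e^{-1+m},\\
nx+\gamma u^{2}&=\gamma\Big(u+\tfrac{n}{2\gamma}\Big)^{2}-\tfrac{n^{2}}{4\gamma}+n\alpha\qquad(u=x-\alpha),
\end{aligned}$$

so $p$ is a Gaussian with mean $\alpha-\tfrac{n}{2\gamma}$ and variance $-\tfrac{1}{2\gamma}$. The mean constraint forces $n=0$, the variance constraint gives $\gamma=-\tfrac{1}{2\beta}$, and normalization gives $C=\tfrac{1}{\sqrt{2\pi\beta}}$.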
Substituting these values of the multipliers back into the stationary form yields:
$$p(x)=\frac{1}{\sqrt{2\pi\beta}}\,e^{-\frac{(x-\alpha)^{2}}{2\beta}}$$
Therefore, when the mean $\alpha$ and variance $\beta$ of a continuous random variable $X$ are fixed, the differential entropy is maximized by the normal distribution $N(\alpha,\beta)$, whose probability density function is given above.
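As a numerical illustration (a sketch; the distributions and parameter values are chosen arbitrarily for comparison), one can compare the differential entropies of a Gaussian, a Laplace, and a uniform density sharing the same mean and variance; the Gaussian should come out largest:

```python
import numpy as np

# A sketch: compare differential entropies of three densities sharing the same
# mean alpha and variance beta. The Gaussian should have the largest value.
alpha, beta = 1.0, 4.0                      # example mean and variance

def differential_entropy(pdf, lo, hi, n=400_001):
    """Approximate H(p) = -∫ p(x) log p(x) dx on [lo, hi] by a Riemann sum."""
    x = np.linspace(lo, hi, n)
    p = pdf(x)
    integrand = np.where(p > 0, -p * np.log(p), 0.0)    # treat 0·log 0 as 0
    return np.sum(integrand) * (x[1] - x[0])

sigma = np.sqrt(beta)
b = np.sqrt(beta / 2.0)                     # Laplace scale: variance = 2 b^2
w = np.sqrt(12.0 * beta)                    # uniform width: variance = w^2 / 12

pdfs = {
    "gaussian": lambda x: np.exp(-(x - alpha) ** 2 / (2 * beta)) / np.sqrt(2 * np.pi * beta),
    "laplace":  lambda x: np.exp(-np.abs(x - alpha) / b) / (2 * b),
    "uniform":  lambda x: np.where(np.abs(x - alpha) <= w / 2, 1.0 / w, 0.0),
}

for name, pdf in pdfs.items():
    H = differential_entropy(pdf, alpha - 20 * sigma, alpha + 20 * sigma)
    print(f"{name:8s} H ≈ {H:.4f} nats")

# Closed form for the Gaussian: 0.5 * log(2 * pi * e * beta)
print(f"theory   H = {0.5 * np.log(2 * np.pi * np.e * beta):.4f} nats (Gaussian)")
```

With these parameter values the closed-form entropies are roughly 2.11 nats (Gaussian), 2.04 nats (Laplace), and 1.94 nats (uniform), consistent with the Gaussian being the maximizer.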