Probability theory | Representation, relationships, and magnitude comparisons among joint entropy, conditional entropy, and mutual information

1. Deduce the relationships and magnitude comparisons among joint entropy, conditional entropy, and mutual information

Related definitions

Joint Entropy
The joint entropy $H(X,Y)$ of random variables $X$ and $Y$ represents the uncertainty of the two occurring together:
$$H(X,Y)=\sum\limits_{x_{i}\in X}\sum\limits_{y_{j}\in Y}p(x_{i},y_{j})I(x_{i},y_{j})=\sum\limits_{x_{i}\in X}\sum\limits_{y_{j}\in Y}p(x_{i},y_{j})\log\frac{1}{p(x_{i},y_{j})}$$
This can be abbreviated as:
$$H(X,Y)=-\sum\limits_{x,y}p(x,y)\log p(x,y)$$
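As a quick numerical sketch of this definition (the 2×2 joint pmf below is made up purely for illustration; log base 2 gives entropy in bits):

```python
import numpy as np

# Hypothetical joint pmf p(x, y): rows index X, columns index Y; entries sum to 1.
p_xy = np.array([[0.25, 0.10],
                 [0.05, 0.60]])

# H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y); zero-probability cells would contribute 0.
nonzero = p_xy[p_xy > 0]
H_XY = -np.sum(nonzero * np.log2(nonzero))
print(H_XY)   # ≈ 1.49 bits of joint uncertainty
```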

Conditional entropy
The conditional entropy $H(X|Y)$ of random variables $X$ and $Y$ represents the uncertainty remaining in $X$ after $Y$ is known:
$$\begin{aligned}
H(X|Y)&=\sum\limits_{y_{j}\in Y}p(y_{j})H(X|Y=y_{j})\\
&=-\sum\limits_{y_{j}\in Y}p(y_{j})\sum\limits_{x_{i}\in X}p(x_{i}|y_{j})\log p(x_{i}|y_{j})\\
&=-\sum\limits_{y_{j}\in Y}\sum\limits_{x_{i}\in X}p(y_{j})p(x_{i}|y_{j})\log p(x_{i}|y_{j})\\
&=-\sum\limits_{x_{i},y_{j}}p(x_{i},y_{j})\log p(x_{i}|y_{j})
\end{aligned}$$
This can be abbreviated as:
$$H(X|Y)=-\sum\limits_{x,y}p(x,y)\log p(x|y)$$
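Using the same made-up 2×2 joint pmf, a minimal sketch of the conditional entropy:

```python
import numpy as np

p_xy = np.array([[0.25, 0.10],    # hypothetical joint pmf of (X, Y)
                 [0.05, 0.60]])
p_y = p_xy.sum(axis=0)            # marginal p(y), summing over x (rows)

# H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y), with p(x|y) = p(x,y) / p(y)
H_X_given_Y = -np.sum(p_xy * np.log2(p_xy / p_y[None, :]))
print(H_X_given_Y)   # ≈ 0.61 bits: the uncertainty left in X once Y is known
```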

Mutual information
The mutual information $I(X;Y)$ of random variables $X$ and $Y$ represents how much the uncertainty of $X$ is reduced once $Y$ is known. Pointwise, it is defined as the logarithm of the ratio of the posterior probability to the prior probability:
$$I(x_{i};y_{j})=\log\frac{p(x_{i}|y_{j})}{p(x_{i})}$$
Averaging over the joint distribution, the mutual information is written as:
$$I(X;Y)=\sum\limits_{x,y}p(x,y)\log\frac{p(x|y)}{p(x)}$$
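And a corresponding sketch of the mutual information on the same hypothetical pmf (using $\frac{p(x|y)}{p(x)}=\frac{p(x,y)}{p(x)p(y)}$):

```python
import numpy as np

p_xy = np.array([[0.25, 0.10],    # hypothetical joint pmf of (X, Y)
                 [0.05, 0.60]])
p_x = p_xy.sum(axis=1)            # marginal p(x)
p_y = p_xy.sum(axis=0)            # marginal p(y)

# I(X;Y) = sum_{x,y} p(x,y) log2 [ p(x|y) / p(x) ] = sum_{x,y} p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]
I_XY = np.sum(p_xy * np.log2(p_xy / (p_x[:, None] * p_y[None, :])))
print(I_XY)   # ≈ 0.32 bits: how much knowing Y reduces the uncertainty of X
```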

Relationship derivation

Joint entropy and conditional entropy relationship
$H(X|Y)=H(X,Y)-H(Y)$, derived as follows:
$$\begin{aligned}
H(X|Y)&=-\sum\limits_{x,y}p(x,y)\log p(x|y)\\
&=-\sum\limits_{x,y}p(x,y)\log \frac{p(x,y)}{p(y)}\\
&=-\sum\limits_{x,y}p(x,y)\log p(x,y)+\sum\limits_{y}\Big(\sum\limits_{x}p(x,y)\Big)\log p(y)\\
&=-\sum\limits_{x,y}p(x,y)\log p(x,y)+\sum\limits_{y}p(y)\log p(y)\\
&=H(X,Y)-H(Y)
\end{aligned}$$
Similarly, $H(Y|X)=H(X,Y)-H(X)$.

Mutual information and conditional entropy relationship
$I(X;Y)=H(X)-H(X|Y)$, derived as follows:
$$\begin{aligned}
I(X;Y)&=\sum\limits_{x,y}p(x,y)\log\frac{p(x|y)}{p(x)}\\
&=-\sum\limits_{x}p(x)\log p(x)+\sum\limits_{x,y}p(x,y)\log p(x|y)\\
&=H(X)-H(X|Y)
\end{aligned}$$
Similarly, $I(Y;X)=H(Y)-H(Y|X)$.
Joint entropy and mutual information relationship
Combining the two relationships above, we obtain
$$H(X|Y)=H(X,Y)-H(Y)=H(X)-I(X;Y)$$

$$H(X,Y)=H(X)+H(Y)-I(X;Y)$$
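These identities are easy to spot-check numerically; the sketch below draws an arbitrary strictly positive joint pmf and verifies each relationship in bits:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((3, 4))
p /= p.sum()                               # arbitrary strictly positive joint pmf p(x, y)

p_x, p_y = p.sum(axis=1), p.sum(axis=0)    # marginals
H = lambda q: -np.sum(q * np.log2(q))      # Shannon entropy of a positive pmf

H_XY = H(p.ravel())                                            # H(X,Y)
H_X_given_Y = -np.sum(p * np.log2(p / p_y[None, :]))           # H(X|Y)
I_XY = np.sum(p * np.log2(p / (p_x[:, None] * p_y[None, :])))  # I(X;Y)

assert np.isclose(H_X_given_Y, H_XY - H(p_y))         # H(X|Y) = H(X,Y) - H(Y)
assert np.isclose(I_XY, H(p_x) - H_X_given_Y)         # I(X;Y) = H(X) - H(X|Y)
assert np.isclose(H_XY, H(p_x) + H(p_y) - I_XY)       # H(X,Y) = H(X) + H(Y) - I(X;Y)
```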

Magnitude comparison

Below, a Venn diagram is used to illustrate the magnitude relationships among joint entropy, conditional entropy, and mutual information.
[Venn diagram: two overlapping circles representing $H(X)$ and $H(Y)$; their union is $H(X,Y)$ and their overlap is $I(X;Y)$]

Joint entropy vs. mutual information
As shown in the Venn diagram, one circle represents $H(X)$ and the other represents $H(Y)$. Since $H(X)\cup H(Y)=H(X,Y)$ and $H(X)\cap H(Y)=I(X;Y)$, it follows that $H(X,Y)\ge I(X;Y)$.
Joint entropy vs. conditional entropy
Since $H(X|Y)=H(X,Y)-H(Y)$, which corresponds to the part of the $H(X)$ circle that does not overlap $H(Y)$, it follows that $H(X,Y)\ge H(X|Y)$. Similarly, $H(X,Y)\ge H(Y|X)$.
Mutual information vs. conditional entropy
The relative magnitude of $I(X;Y)$ and $H(X|Y)$ cannot be determined in general; depending on the joint distribution, either quantity can be the larger one.
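Two hypothetical 2×2 joint pmfs make this concrete: when $X$ and $Y$ are strongly dependent the mutual information dominates, and when they are independent the conditional entropy does:

```python
import numpy as np

def mi_and_cond_entropy(p):
    """Return (I(X;Y), H(X|Y)) in bits for a joint pmf with positive entries."""
    p_x, p_y = p.sum(axis=1), p.sum(axis=0)
    mi = np.sum(p * np.log2(p / (p_x[:, None] * p_y[None, :])))
    h_cond = -np.sum(p * np.log2(p / p_y[None, :]))
    return mi, h_cond

almost_dependent = np.array([[0.49, 0.01],
                             [0.01, 0.49]])   # Y almost determines X
independent      = np.array([[0.25, 0.25],
                             [0.25, 0.25]])   # X and Y independent

print(mi_and_cond_entropy(almost_dependent))  # ≈ (0.86, 0.14): I(X;Y) > H(X|Y)
print(mi_and_cond_entropy(independent))       # =  (0.00, 1.00): I(X;Y) < H(X|Y)
```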

2. Prove which distribution of a continuous random variable $X$ maximizes the differential entropy when its mean $\alpha$ and variance (second central moment) $\beta$ are given, and find the corresponding probability density function

Let $X\sim p(x)$ be a continuous random variable. The problem can then be stated as
$$\max\limits_{p} H(p)=-\int_{-\infty}^{+\infty} p(x)\log p(x)\,dx$$

s.t.
$$\int_{-\infty}^{+\infty} p(x)\,dx=1$$

$$E(X)=\int_{-\infty}^{+\infty}x\,p(x)\,dx=\alpha$$

$$\mathrm{var}(X)=\int_{-\infty}^{+\infty}(x-\alpha)^{2}p(x)\,dx=\beta$$

The first constraint is the normalization constraint, the second is the mean constraint, and the third is the variance constraint. This naturally suggests solving with the Lagrange multiplier method:

Proof. Introduce Lagrange multipliers $m, n, \gamma$; the Lagrangian is
$$\begin{aligned}
L(p,m,n,\gamma)=&-\int_{-\infty}^{+\infty}p(x)\log p(x)\,dx+m\Big(\int_{-\infty}^{+\infty}p(x)\,dx-1\Big)\\
&+n\Big(\int_{-\infty}^{+\infty}x\,p(x)\,dx-\alpha\Big)+\gamma\Big(\int_{-\infty}^{+\infty}(x-\alpha)^{2}p(x)\,dx-\beta\Big)
\end{aligned}$$

Taking the (functional) derivative with respect to $p$ and setting it to zero gives
$$\frac{\partial L}{\partial p}=-\frac{\partial}{\partial p}\int_{-\infty}^{+\infty}\Big(p(x)\log p(x)-m\,p(x)-n\,x\,p(x)-\gamma(x-\alpha)^{2}p(x)\Big)dx=0$$

Let $W=p(x)\log p(x)-m\,p(x)-n\,x\,p(x)-\gamma(x-\alpha)^{2}p(x)$. Since the integrand $W$ depends only on $p(x)$ and $x$, the stationarity condition reduces to $\frac{\partial W}{\partial p}=0$, i.e. $\log p(x)+1-m-n\,x-\gamma(x-\alpha)^{2}=0$, hence:
$$p(x)=e^{-1+m+n\,x+\gamma(x-\alpha)^{2}}$$

Substituting this form back into the three constraints (normalization, mean, and variance) determines the multipliers; in particular the mean constraint forces $n=0$, and it follows that:
$$p(x)=\frac{1}{\sqrt{2\pi\beta}}\,e^{-\frac{(x-\alpha)^{2}}{2\beta}}$$

Therefore, when the mean $\alpha$ and variance $\beta$ of a continuous random variable $X$ are given, the distribution that maximizes the differential entropy is the normal distribution $N(\alpha,\beta)$, with the probability density function shown above.
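As a quick sanity check on this conclusion (not part of the proof), one can compare the closed-form differential entropies, in nats, of a few distributions constrained to the same variance $\beta$; the mean $\alpha$ only shifts the density and does not affect differential entropy. The variance value below is arbitrary:

```python
import numpy as np

beta = 2.0   # arbitrary common variance; any beta > 0 gives the same ordering

h_gaussian = 0.5 * np.log(2 * np.pi * np.e * beta)   # N(alpha, beta): 0.5 * ln(2*pi*e*beta)
h_laplace  = 1.0 + 0.5 * np.log(2 * beta)            # Laplace distribution scaled to variance beta
h_uniform  = 0.5 * np.log(12 * beta)                 # uniform interval with variance beta

print(h_gaussian, h_laplace, h_uniform)   # the Gaussian is the largest, as the proof predicts
```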


Origin blog.csdn.net/weixin_43427721/article/details/127434206