Computing Information Entropy and Conditional Entropy



Table 1. Fourteen training examples with target value PlayTennis

| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Hot | High | Weak | Yes |
| D4 | Rain | Mild | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |

As Table 1 shows, the target value is PlayTennis, that is, whether or not to play tennis.
Table 1 has four features: Outlook, Temperature, Humidity, and Wind.

1. Information Entropy

The information entropy of a discrete random variable $X$ is defined as

$$H(X) = - \sum_{x \in X} p(x) \log p(x)$$

Incidentally, if $X$ can take $n$ distinct values, then

$$0 \leq H(X) \leq \log n$$
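The bounds can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms and an illustrative helper named `entropy` (not from the original post); the three example distributions are made up for the demonstration.

```python
import math

def entropy(probs):
    """Shannon entropy (base 2) of a distribution given as a list of probabilities."""
    # Outcomes with p == 0 are skipped, following the convention 0 * log 0 = 0.
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

n = 4
certain = [1.0, 0.0, 0.0, 0.0]   # one outcome is certain: lower bound
uniform = [1.0 / n] * n          # all outcomes equally likely: upper bound
skewed = [0.7, 0.1, 0.1, 0.1]    # somewhere in between

h_certain = entropy(certain)
h_uniform = entropy(uniform)
h_skewed = entropy(skewed)

print(h_certain, h_skewed, h_uniform, math.log2(n))
```

The entropy is 0 when one outcome is certain and reaches $\log n$ when all $n$ outcomes are equally likely.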

Taking Table 1 as an example, let $Y$ be the random variable for whether tennis is played. Of the 14 examples, 9 are Yes and 5 are No, so

$$p(Y = \text{Yes}) = \frac{9}{14}$$

$$p(Y = \text{No}) = \frac{5}{14}$$

Therefore, taking logarithms to base 2,

$$
\begin{aligned}
H(Y) &= - \sum_{y \in Y} p(y) \log_2 p(y) \\
&= - \left( p(Y=\text{Yes}) \log_2 p(Y=\text{Yes}) + p(Y=\text{No}) \log_2 p(Y=\text{No}) \right) \\
&= - \left( \frac{9}{14} \log_2 \frac{9}{14} + \frac{5}{14} \log_2 \frac{5}{14} \right) \\
&\approx 0.9403
\end{aligned}
$$
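The same calculation can be reproduced in a few lines of Python. This is only a sketch: the list `play_tennis` is the PlayTennis column of Table 1 typed in by hand, and `entropy_of_labels` is an illustrative name, not something from the original post.

```python
import math
from collections import Counter

# PlayTennis column of Table 1, days D1..D14.
play_tennis = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
               "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

def entropy_of_labels(labels):
    """Base-2 entropy of the empirical distribution of a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    # p * log2(1/p) with p = c / total, summed over the observed label values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

h_y = entropy_of_labels(play_tennis)
print(f"H(Y) = {h_y:.4f}")  # prints H(Y) = 0.9403
```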


2. Conditional Entropy

The conditional entropy $H(Y|X)$ is the entropy of $Y$ given the condition $X$: the entropy of $Y$ for each value of $X$, averaged over the distribution of $X$.
The formula is:

$$H(Y|X) = \sum_{x \in X} p(x) H(Y|X=x)$$

In the Table 1 example, let Humidity be the random variable $X$. Then:

$$p(X=\text{High}) = \frac{7}{14} = \frac{1}{2}$$

$$p(X=\text{Normal}) = \frac{7}{14} = \frac{1}{2}$$

Therefore,

$$
\begin{aligned}
H(Y|X) &= \sum_{x \in X} p(x) H(Y|X=x) \\
&= p(X=\text{High}) \, H(Y|X=\text{High}) + p(X=\text{Normal}) \, H(Y|X=\text{Normal})
\end{aligned}
$$

Next, compute $H(Y|X=\text{High})$ and $H(Y|X=\text{Normal})$.

Of the 7 days with High humidity, 3 are Yes and 4 are No; of the 7 days with Normal humidity, 6 are Yes and 1 is No. Applying the entropy formula to these conditional distributions gives:

$$
\begin{aligned}
H(Y|X=\text{High}) &= - \sum_{y \in Y} p(y|X=\text{High}) \log_2 p(y|X=\text{High}) \\
&= - \big( p(Y=\text{Yes}|X=\text{High}) \log_2 p(Y=\text{Yes}|X=\text{High}) \\
&\qquad + p(Y=\text{No}|X=\text{High}) \log_2 p(Y=\text{No}|X=\text{High}) \big) \\
&= - \left( \frac{3}{7} \log_2 \frac{3}{7} + \frac{4}{7} \log_2 \frac{4}{7} \right) \\
&\approx 0.9852
\end{aligned}
$$

$$
\begin{aligned}
H(Y|X=\text{Normal}) &= - \sum_{y \in Y} p(y|X=\text{Normal}) \log_2 p(y|X=\text{Normal}) \\
&= - \big( p(Y=\text{Yes}|X=\text{Normal}) \log_2 p(Y=\text{Yes}|X=\text{Normal}) \\
&\qquad + p(Y=\text{No}|X=\text{Normal}) \log_2 p(Y=\text{No}|X=\text{Normal}) \big) \\
&= - \left( \frac{6}{7} \log_2 \frac{6}{7} + \frac{1}{7} \log_2 \frac{1}{7} \right) \\
&\approx 0.5917
\end{aligned}
$$
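The two per-value entropies can be checked by filtering the rows of Table 1 on the Humidity value. This is a rough sketch: `rows` holds only the (Humidity, PlayTennis) pairs copied from the table, and the helper names `entropy_of_labels` and `entropy_given` are illustrative assumptions.

```python
import math
from collections import Counter

# (Humidity, PlayTennis) pairs for days D1..D14 of Table 1.
rows = [
    ("High", "No"), ("High", "No"), ("High", "Yes"), ("High", "Yes"),
    ("Normal", "Yes"), ("Normal", "No"), ("Normal", "Yes"), ("High", "No"),
    ("Normal", "Yes"), ("Normal", "Yes"), ("Normal", "Yes"), ("High", "Yes"),
    ("Normal", "Yes"), ("High", "No"),
]

def entropy_of_labels(labels):
    """Base-2 entropy of the empirical distribution of a list of labels."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())

def entropy_given(rows, x_value):
    """H(Y | X = x_value): entropy of the labels restricted to rows with that X."""
    labels = [y for x, y in rows if x == x_value]
    return entropy_of_labels(labels)

h_high = entropy_given(rows, "High")
h_normal = entropy_given(rows, "Normal")
print(f"H(Y|X=High)   = {h_high:.4f}")    # prints 0.9852
print(f"H(Y|X=Normal) = {h_normal:.4f}")  # prints 0.5917
```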

Hence,

$$
\begin{aligned}
H(Y|X) &= \sum_{x \in X} p(x) H(Y|X=x) \\
&= p(X=\text{High}) \, H(Y|X=\text{High}) + p(X=\text{Normal}) \, H(Y|X=\text{Normal}) \\
&= \frac{1}{2} \cdot 0.9852 + \frac{1}{2} \cdot 0.5917 \\
&\approx 0.7885
\end{aligned}
$$
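The whole weighted sum can also be written as one general function. The sketch below assumes the two columns are supplied as parallel lists copied from Table 1; `conditional_entropy` and `entropy_of_labels` are illustrative names rather than anything defined in the original post.

```python
import math
from collections import Counter, defaultdict

# Humidity and PlayTennis columns of Table 1, days D1..D14.
humidity = ["High", "High", "High", "High", "Normal", "Normal", "Normal",
            "High", "Normal", "Normal", "Normal", "High", "Normal", "High"]
play_tennis = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
               "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

def entropy_of_labels(labels):
    """Base-2 entropy of the empirical distribution of a list of labels."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())

def conditional_entropy(xs, ys):
    """H(Y|X) = sum over x of p(x) * H(Y | X = x), estimated from paired samples."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)          # collect the Y labels for each X value
    total = len(xs)
    return sum(len(g) / total * entropy_of_labels(g) for g in groups.values())

h_y_given_x = conditional_entropy(humidity, play_tennis)
print(f"H(Y|X) = {h_y_given_x:.5f}")
```

Without rounding the intermediate entropies, the result is about 0.78845, which sits right at the boundary between 0.7884 and 0.7885 when reported to four decimal places.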




Reposted from blog.csdn.net/PursueLuo/article/details/95627975