[Machine Learning, Chapter 4: Decision Trees] - Homework

Decision Tree Homework

1. Problem

You are stranded on a deserted island. Mushrooms of various types grow wildly all over the island, but no other food is anywhere to be found. Some of the mushrooms have been determined to be poisonous and others not, through your former companions' trial and error. You are the only one remaining on the island. You have the following data to consider.

You know whether or not mushrooms A through H are poisonous, but you do not know about U through W. Build a decision tree to classify mushrooms as poisonous or not.

Questions

(a) What is the entropy of IsPoisonous?

(b) Which attribute should you choose as the root of a decision tree? Hint: You can figure this out by looking at the data without explicitly computing the information gain of all four attributes.

(c) What is the information gain of the attribute you chose in the previous question?

(d) Build a decision tree to classify mushrooms as poisonous or not.

(e) Classify mushrooms U, V, and W using this decision tree as poisonous or not poisonous.


| Example | IsHeavy | IsSmelly | IsSpotted | IsSmooth | IsPoisonous |
|---------|---------|----------|-----------|----------|-------------|
| A | 0 | 0 | 0 | 0 | 0 |
| B | 0 | 0 | 1 | 0 | 0 |
| C | 1 | 1 | 0 | 1 | 0 |
| D | 1 | 0 | 0 | 1 | 1 |
| E | 0 | 1 | 1 | 0 | 1 |
| F | 0 | 0 | 1 | 1 | 1 |
| G | 0 | 0 | 0 | 1 | 1 |
| H | 1 | 1 | 0 | 0 | 1 |
| U | 1 | 1 | 1 | 1 | ? |
| V | 0 | 1 | 0 | 1 | ? |
| W | 1 | 1 | 0 | 0 | ? |

2. Solution

2.1 What is the entropy of IsPoisonous?

Entropy formula (summing over the $c$ classes):

$$Entropy(t) = -\sum_{i=0}^{c-1} p(i \mid t)\log_2 p(i \mid t)$$

Information gain formula:

$$Gain(D, a) = Entropy(D) - \sum_{i=1}^{k} \frac{|D_i|}{|D|}\, Entropy(D_i)$$

Computing the entropy of the class label (5 poisonous, 3 not poisonous):

$$Entropy(IsPoisonous) = -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} = 0.954434002924965$$

For reference, the entropies of each attribute's own value distribution: IsHeavy, IsSmelly, and IsSpotted each split the samples 3 : 5, so each also has entropy 0.954434002924965, while IsSmooth splits them 4 : 4, giving

$$Entropy(IsSmooth) = -\frac{4}{8}\log_2\frac{4}{8} - \frac{4}{8}\log_2\frac{4}{8} = 1.0$$
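The entropy value can be verified with a few lines of Python. This is a minimal sketch: the `entropy` helper and the 0/1 label list are my own encoding of the table above.

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

# IsPoisonous column for mushrooms A-H: A, B, C are 0; D through H are 1
is_poisonous = [0, 0, 0, 1, 1, 1, 1, 1]
print(entropy(is_poisonous))  # ≈ 0.9544
```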

2.2 Which attribute should be chosen as the root of the decision tree?

There are 8 training samples in total. Splitting on each attribute gives:

| Attribute value | Samples | Not poisonous | Poisonous |
|-----------------|---------|---------------|-----------|
| Heavy | 3 | 1 | 2 |
| Not heavy | 5 | 2 | 3 |
| Smelly | 3 | 1 | 2 |
| Not smelly | 5 | 2 | 3 |
| Spotted | 3 | 1 | 2 |
| Not spotted | 5 | 2 | 3 |
| Smooth | 4 | 1 | 3 |
| Not smooth | 4 | 2 | 2 |
| All | 8 | 3 | 5 |

IsHeavy, IsSmelly, and IsSpotted all induce the identical 3-sample (1 : 2) / 5-sample (2 : 3) split, so they must share the same information gain; only IsSmooth splits differently, which is why the hint says you can pick the root without computing all four gains.

First, the information-gain calculation for the IsHeavy attribute.

For IsHeavy (overall: 5 poisonous, 3 not poisonous):

- Heavy (3 samples): 1 not poisonous, 2 poisonous
- Not heavy (5 samples): 2 not poisonous, 3 poisonous

The calculation is as follows:

$$
\begin{aligned}
Gain(IsHeavy) &= Entropy(IsPoisonous) - \frac{5}{8}\,Entropy(\text{not heavy}) - \frac{3}{8}\,Entropy(\text{heavy}) \\
&= 0.954434002924965 - \frac{5}{8}\left[-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right] - \frac{3}{8}\left[-\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3}\right] \\
&= 0.0032289436203635224
\end{aligned}
$$

Next, compute IsSmooth.

For IsSmooth (overall: 5 poisonous, 3 not poisonous):

- Smooth (C, D, F, G): 1 not poisonous, 3 poisonous
- Not smooth (A, B, E, H): 2 not poisonous, 2 poisonous

This gives:

$$Gain(IsSmooth) = 0.048794940695398636$$

$$Gain(IsSmelly) = 0.0032289436203635224$$

$$Gain(IsSpotted) = 0.0032289436203635224$$
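All four gains can be checked programmatically. This is a sketch rather than any reference implementation: the row tuples re-encode the data table, and the helper names `entropy` and `info_gain` are my own.

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(rows, attr):
    """Information gain of splitting on column `attr`; class label is the last column."""
    labels = [r[-1] for r in rows]
    g = entropy(labels)
    for v in set(r[attr] for r in rows):
        subset = [r[-1] for r in rows if r[attr] == v]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

# (IsHeavy, IsSmelly, IsSpotted, IsSmooth, IsPoisonous) for mushrooms A-H
rows = [(0,0,0,0,0), (0,0,1,0,0), (1,1,0,1,0), (1,0,0,1,1),
        (0,1,1,0,1), (0,0,1,1,1), (0,0,0,1,1), (1,1,0,0,1)]

for i, name in enumerate(["IsHeavy", "IsSmelly", "IsSpotted", "IsSmooth"]):
    print(f"Gain({name}) = {info_gain(rows, i):.6f}")  # IsSmooth is largest, ≈ 0.0488
```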

Since IsSmooth has the largest information gain, choose IsSmooth as the root.

2.3 What is the information gain?

The information gain of IsSmooth is 0.048794940695398636.

2.4 Build a decision tree to classify mushrooms as poisonous or not

First split into two branches on IsSmooth:

- IsSmooth = 0 (not smooth): {A, B, E, H}
- IsSmooth = 1 (smooth): {C, D, F, G}

Then split each branch further. For the subset {A, B, E, H}:

| Attribute value | Samples | Not poisonous | Poisonous |
|-----------------|---------|---------------|-----------|
| Heavy | 1 | 0 | 1 |
| Not heavy | 3 | 2 | 1 |
| Smelly | 2 | 0 | 2 |
| Not smelly | 2 | 2 | 0 |
| Spotted | 2 | 1 | 1 |
| Not spotted | 2 | 1 | 1 |
| All | 4 | 2 | 2 |

$$Gain(IsHeavy) = 0.31127812445913283$$

$$Gain(IsSmelly) = 1$$

$$Gain(IsSpotted) = 0$$
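These subset gains follow from the same formula applied only to the four not-smooth rows. A small self-contained check (my own encoding of rows A, B, E, H, reusing the helper names from before):

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(rows, attr):
    """Information gain of splitting on column `attr`; class label is the last column."""
    labels = [r[-1] for r in rows]
    g = entropy(labels)
    for v in set(r[attr] for r in rows):
        subset = [r[-1] for r in rows if r[attr] == v]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

# Not-smooth branch {A, B, E, H}: (IsHeavy, IsSmelly, IsSpotted, IsPoisonous)
abeh = [(0,0,0,0), (0,0,1,0), (0,1,1,1), (1,1,0,1)]
for i, name in enumerate(["IsHeavy", "IsSmelly", "IsSpotted"]):
    print(f"Gain({name}) = {info_gain(abeh, i):.4f}")  # IsSmelly splits perfectly
```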

So choose IsSmelly for this branch. IsSmelly also separates the smooth branch {C, D, F, G} perfectly: C (not poisonous) is the only smelly mushroom there, while D, F, and G (poisonous) are all not smelly. The resulting tree is:

- IsSmooth = 0 (not smooth) → {A, B, E, H}, split on IsSmelly:
  - IsSmelly = 0: {A, B} → not poisonous
  - IsSmelly = 1: {E, H} → poisonous
- IsSmooth = 1 (smooth) → {C, D, F, G}, split on IsSmelly:
  - IsSmelly = 0: {D, F, G} → poisonous
  - IsSmelly = 1: {C} → not poisonous

2.5 Classify mushrooms U, V, and W

Reading each unknown mushroom off the tree:

- U (IsSmooth = 1, IsSmelly = 1) → not poisonous
- V (IsSmooth = 1, IsSmelly = 1) → not poisonous
- W (IsSmooth = 0, IsSmelly = 1) → poisonous
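As a check on part (e), the finished tree (root IsSmooth, then IsSmelly on each branch) can be written out directly. The function name `classify` and the label strings are my own:

```python
def classify(is_smelly, is_smooth):
    """Decision tree from this exercise: split on IsSmooth, then IsSmelly."""
    if is_smooth == 0:
        # Not-smooth branch: {A, B} not poisonous, {E, H} poisonous
        return "poisonous" if is_smelly == 1 else "not poisonous"
    # Smooth branch: {D, F, G} poisonous, {C} not poisonous
    return "not poisonous" if is_smelly == 1 else "poisonous"

# Unknown mushrooms: (name, IsSmelly, IsSmooth) from the data table
for name, smelly, smooth in [("U", 1, 1), ("V", 1, 1), ("W", 1, 0)]:
    print(name, classify(smelly, smooth))
```

The same function also reproduces the training labels for all of A through H, so the tree is consistent with the data.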


Reposted from blog.csdn.net/wujing1_1/article/details/125091951