《机器学习实战》笔记--第三章：决策树

知识点1：python set()函数

set() 函数创建一个无序不重复元素集，可进行关系测试，删除重复数据，还可以计算交集、差集、并集等。

>>>x = set('runoob')
>>> y = set('google')
>>> x, y
(set(['b', 'r', 'u', 'o', 'n']), set(['e', 'o', 'g', 'l']))   # 重复的被删除
>>> x & y         # 交集
set(['o'])
>>> x | y         # 并集
set(['b', 'e', 'g', 'l', 'o', 'n', 'r', 'u'])
>>> x - y         # 差集
set(['r', 'b', 'u', 'n'])
>>>

知识点2：递归构建决策树

工作原理如下：得到原始数据集，然后基于最好的属性值划分数据集，由于特征值可能多于两个，因此可能存在大于两个分支的数据集划分。第一次划分之后，数据将被向下传递到树分支的下一个节点，在这个节点上，我们可以再次划分数据。因此我们可以采用递归的原则处理数据集。

递归结束的条件是：程序遍历完所有划分数据集的属性，或者每个分支下的所有实例都具有相同的分类。如果所有实例具有相同的分类，则得到一个叶子节点或者终止块。

知识点3：Python dict() 函数

dict() 函数用于创建一个字典。

>>>dict()                        # 创建空字典
{}
>>> dict(a='a', b='b', t='t')     # 传入关键字
{'a': 'a', 'b': 'b', 't': 't'}
>>> dict(zip(['one', 'two', 'three'], [1, 2, 3]))   # 映射函数方式来构造字典
{'three': 3, 'two': 2, 'one': 1} 
>>> dict([('one', 1), ('two', 2), ('three', 3)])    # 可迭代对象方式来构造字典
{'three': 3, 'two': 2, 'one': 1}
>>>

知识点4：annotate标注文字

  createPlot.ax1.annotate(nodeTxt,xy=parentPt,xycoords='axes fraction',xytext = centerPt,textcoords='axes fraction',\
                           va="center",ha="center",bbox=nodeType,arrowprops=arrow_args)

(1)annotate语法说明 ：annotate(s='str' ,xy=(x,y) ,xytext=(l1,l2) ,..)

s 为注释文本内容
xy 为被注释的坐标点
xytext 为注释文字的坐标位置
xycoords 参数如下:

figure points points from the lower left of the figure 点在图左下方
figure pixels pixels from the lower left of the figure 图左下角的像素
figure fraction fraction of figure from lower left 左下角数字部分
axes points points from lower left corner of axes 从左下角点的坐标
axes pixels pixels from lower left corner of axes 从左下角的像素坐标
axes fraction fraction of axes from lower left 左下角部分
data use the coordinate system of the object being annotated(default) 使用的坐标系统被注释的对象（默认）
polar(theta,r) if not native ‘data’ coordinates t

extcoords 设置注释文字偏移量

| 参数 | 坐标系 |

| 'figure points' | 距离图形左下角的点数量 |

| 'figure pixels' | 距离图形左下角的像素数量 |

| 'figure fraction' | 0,0 是图形左下角，1,1 是右上角 |

| 'axes points' | 距离轴域左下角的点数量 |

| 'axes pixels' | 距离轴域左下角的像素数量 |

| 'axes fraction' | 0,0 是轴域左下角，1,1 是右上角 |

| 'data' | 使用轴域数据坐标系 |

arrowprops #箭头参数,参数类型为字典dict

width the width of the arrow in points 点箭头的宽度
headwidth the width of the base of the arrow head in points 在点的箭头底座的宽度
headlength the length of the arrow head in points 点箭头的长度
shrink fraction of total length to ‘shrink’ from both ends 总长度为分数“缩水”从两端
facecolor 箭头颜色

bbox给标题增加外框，常用参数如下：

boxstyle方框外形
facecolor(简写fc)背景颜色
edgecolor(简写ec)边框线条颜色
edgewidth边框线条大小

bbox=dict(boxstyle='round,pad=0.5', fc='yellow', ec='k',lw=1 ,alpha=0.5) #fc为facecolor,ec为edgecolor,lw为lineweight

《机器学习实战》笔记--第三章：决策树

猜你喜欢