Machine learning in practice - decision tree (2)

Here, we mainly use matplotlib's annotate function, which was originally designed to annotate a point in a figure.

- Syntax

annotate syntax: annotate(s='str', xy=(x, y), xytext=(x, y), ...)

s is the annotation text content
xy is the coordinates of the point being annotated
xytext is the coordinate position of the annotation text
xycoords sets the coordinate system of xy. Its values are as follows:

  • figure points: points from the lower left of the figure
  • figure pixels: pixels from the lower left of the figure
  • figure fraction: fraction of the figure from the lower left
  • axes points: points from the lower left corner of the axes
  • axes pixels: pixels from the lower left corner of the axes
  • axes fraction: fraction of the axes from the lower left
  • data: use the coordinate system of the object being annotated (default)
  • polar: (theta, r) if not in native 'data' coordinates

textcoords sets the coordinate system of the annotation text. Its values are:

    | Parameter | Coordinate System |
    | --- | --- |
    | 'figure points' | points from the lower left corner of the figure |
    | 'figure pixels' | pixels from the lower left corner of the figure |
    | 'figure fraction' | (0, 0) is the lower left corner of the figure, (1, 1) is the upper right corner |
    | 'axes points' | points from the lower left corner of the axes |
    | 'axes pixels' | pixels from the lower left corner of the axes |
    | 'axes fraction' | (0, 0) is the lower left corner of the axes, (1, 1) is the upper right corner |
    | 'data' | use the axes data coordinate system |
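As a small sketch of how these coordinate systems combine, the following (hypothetical) example annotates a data point while placing the label in axes-fraction coordinates:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])

# xy is read in data coordinates; the label position in axes fractions
ann = ax.annotate("peak",
                  xy=(3, 9), xycoords="data",
                  xytext=(0.5, 0.5), textcoords="axes fraction",
                  arrowprops=dict(arrowstyle="->"))
fig.savefig("annotate_demo.png")
```

Because textcoords is "axes fraction", the label stays at the center of the axes no matter how the data limits change.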

arrowprops sets the arrow; the parameter type is a dictionary (dict):

  • width the width of the arrow in points
  • headwidth the width of the base of the arrow head in points
  • headlength the length of the arrow head in points
  • shrink fraction of total length to 'shrink' from both ends
  • facecolor arrow color

bbox adds a box around the annotation text. Common parameters are as follows:

  • boxstyle box shape
  • facecolor (abbreviated fc) background color
  • edgecolor (abbreviated ec) border line color
  • linewidth (abbreviated lw) border line width

bbox=dict(boxstyle='round,pad=0.5', fc='yellow', ec='k', lw=1, alpha=0.5)  # fc = facecolor, ec = edgecolor, lw = linewidth
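Putting the bbox and arrowprops dictionaries together, a minimal sketch (the text and coordinates here are arbitrary, chosen only for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# bbox draws a rounded, semi-transparent yellow box around the text;
# arrowprops uses the simple dict form (width/headwidth/headlength/shrink)
ann = ax.annotate("note",
                  xy=(0.2, 0.2), xytext=(0.7, 0.7),
                  xycoords="axes fraction", textcoords="axes fraction",
                  bbox=dict(boxstyle="round,pad=0.5", fc="yellow", ec="k",
                            lw=1, alpha=0.5),
                  arrowprops=dict(facecolor="black", shrink=0.05,
                                  width=2, headwidth=8, headlength=10))
fig.savefig("bbox_demo.png")
```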

import matplotlib.pyplot as plt

# Work around garbled Chinese characters in matplotlib
from pylab import mpl  
mpl.rcParams['font.sans-serif'] = ['SimHei'] 
# The decision tree as a nested dictionary
my_tree={'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}  
# feature labels
labels=["no surfacing","flippers"]

# Define the node styles; fc sets the box fill color (a grayscale value here)
decision_node=dict(boxstyle="sawtooth",fc="0.8")  # decision node, sawtooth box
leaf_node=dict(boxstyle="round4",fc="0.8") # leaf node, rounded box
arrow_args=dict(arrowstyle="<-")   # arrow style


# Draw a node
# center_point is the center of the node text (the point the arrow reaches);
# parent_point is the source point of the arrow;
# va = vertical alignment, ha = horizontal alignment
def plot_node(note_txt,center_point,parent_point,node_type):
    create_plot.ax1.annotate(note_txt,xy=parent_point,xycoords="axes fraction",
                             xytext=center_point,textcoords="axes fraction", 
                             va="center", ha="center",bbox=node_type,arrowprops=arrow_args)


# # Create the figure. This is the initial version; it is extended later.
# def create_plot_1():
#     fig=plt.figure(1,facecolor="white")
#     fig.clf()

#     create_plot.ax1=plt.subplot(111,frameon=False)  # plot_node reads create_plot.ax1
#     # Draw a decision node and a leaf node
#     plot_node("decision node",(0.5,0.1),(0.1,0.5),decision_node)
#     plot_node("leaf node",(0.8,0.1),(0.3,0.8),leaf_node)
#     plt.show()
# create_plot_1()
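A self-contained sketch of that initial test, with the axes passed explicitly instead of through the create_plot.ax1 attribute (the node styles are inlined so the snippet runs on its own):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

decision_node = dict(boxstyle="sawtooth", fc="0.8")
leaf_node = dict(boxstyle="round4", fc="0.8")
arrow_args = dict(arrowstyle="<-")

def plot_node(ax, note_txt, center_point, parent_point, node_type):
    # same annotate call as in the article, but with the axes made explicit
    ax.annotate(note_txt, xy=parent_point, xycoords="axes fraction",
                xytext=center_point, textcoords="axes fraction",
                va="center", ha="center", bbox=node_type, arrowprops=arrow_args)

fig = plt.figure(1, facecolor="white")
fig.clf()
ax = plt.subplot(111, frameon=False)
plot_node(ax, "decision node", (0.5, 0.1), (0.1, 0.5), decision_node)
plot_node(ax, "leaf node", (0.8, 0.1), (0.3, 0.8), leaf_node)
fig.savefig("nodes_demo.png")
```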

We can now draw an arrow from a parent node to a child node. The questions to consider now are:

How do we lay out a decision tree inside a 1x1 unit axes?

For that we need two very important pieces of information about the tree: its depth and its total number of leaf nodes.

The tree we have generated:

{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

# Compute the depth of the tree
def get_tree_depth(my_tree):
    depth=0
    keys_list=list(my_tree.keys())
    first_str=keys_list[0]           # the first decision feature
    second_dict=my_tree[first_str]

    for key in second_dict.keys():
        # If the subset produced by this split is still a dict, keep splitting,
        # i.e. call get_tree_depth() recursively
        if type(second_dict[key]).__name__=="dict":
            this_depth=1+get_tree_depth(second_dict[key])
        # If the subset is no longer a dict, no further split is needed,
        # and the current subtree has depth 1
        else:
            this_depth=1

        if this_depth>depth:
            depth=this_depth

    return depth



# Count the leaf nodes of the tree; the idea is similar to computing the depth.
def get_leafs_num(my_tree):
    nums=0
    keys_list=list(my_tree.keys())
    first_str=keys_list[0]           # the first decision feature
    second_dict=my_tree[first_str]

    for key in second_dict.keys():
        if type(second_dict[key]).__name__=="dict":
            nums+=get_leafs_num(second_dict[key])
        else:
            nums+=1

    return nums
print(my_tree)
print("Number of leaf nodes: "+str(get_leafs_num(my_tree)))
print("Tree depth: "+str(get_tree_depth(my_tree)))
{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
Number of leaf nodes: 3
Tree depth: 2
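As a sanity check, the same counting logic can be written more compactly with isinstance (an equivalent sketch, not the functions above) and run on a tree that also has a three-way split:

```python
# Equivalent sketch of the two helpers, using isinstance instead of
# type(...).__name__ == "dict"
def count_leafs(tree):
    root = next(iter(tree))  # the single decision feature at this level
    return sum(count_leafs(sub) if isinstance(sub, dict) else 1
               for sub in tree[root].values())

def tree_depth(tree):
    root = next(iter(tree))
    return max(1 + tree_depth(sub) if isinstance(sub, dict) else 1
               for sub in tree[root].values())

tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}, 2: 'maybe'}}
print(count_leafs(tree))  # 4 leaves: 'no', 'no', 'yes', 'maybe'
print(tree_depth(tree))   # 2
```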

### Now we need to control the layout in the image according to the depth of the tree and the number of leaf nodes, so the drawing functions above are slightly modified

The code is explained in detail below. Like the author of the blog this post draws on, I didn't understand this part when I first looked at the source code; after working through that explanation it suddenly made sense.

# Add the corresponding feature value at the midpoint of the arrow
def plot_mid_text(center_point,parent_point,txt_string):
    x_mid=(parent_point[0]-center_point[0])/2.0+center_point[0]
    y_mid=(parent_point[1]-center_point[1])/2.0+center_point[1]
    create_plot.ax1.text(x_mid,y_mid,txt_string)

# Draw the decision tree: the recursive layout logic
def plot_tree(my_tree,parent_point,node_txt):
    # compute the width and depth of this subtree
    leafs_num=get_leafs_num(my_tree)
    depth=get_tree_depth(my_tree)

    keys_list=list(my_tree.keys())
    first_str=keys_list[0]

    # Determine the position of the current node. Note: the root has no parent
    center_point=(plot_tree.xOff+(1+float(leafs_num))/2.0/plot_tree.totalW,plot_tree.yOff)
    # add the corresponding feature value at the midpoint of the arrow
    plot_mid_text(center_point,parent_point,node_txt)
    # first_str is always a decision feature,
    # so draw a decision node
    plot_node(first_str,center_point,parent_point,decision_node)

    # move down one level: decrease yOff accordingly
    second_dict=my_tree[first_str]
    plot_tree.yOff-=1.0/plot_tree.totalD

    for key in second_dict.keys():
        # if further splitting is needed, recurse with updated arguments
        if type(second_dict[key]).__name__=="dict":
            plot_tree(second_dict[key],center_point,str(key))
        # if it is a leaf node, draw it directly
        else:
            plot_tree.xOff+=1.0/plot_tree.totalW
            plot_node(second_dict[key],(plot_tree.xOff,plot_tree.yOff),center_point,leaf_node)
            plot_mid_text((plot_tree.xOff,plot_tree.yOff),center_point,str(key))
    plot_tree.yOff+=1.0/plot_tree.totalD   # move back up after finishing this subtree


# Actually draw the decision tree (the plotting entry point)
def create_plot(my_tree):  
    fig = plt.figure(1, facecolor='white')  
    fig.clf()  
    axprops = dict(xticks=[], yticks=[])  
    create_plot.ax1 = plt.subplot(111, frameon=False)    # no ticks  
    # totalW is the number of leaf nodes of the whole tree; totalD is its depth
    plot_tree.totalW = float(get_leafs_num(my_tree))  
    plot_tree.totalD = float(get_tree_depth(my_tree))  
    plot_tree.xOff = -0.5/plot_tree.totalW
    plot_tree.yOff = 1.0
    # The root has no incoming line, so the parent point coincides with the root itself;
    # by the position formula above, the root lands at (0.5, 1.0)
    plot_tree(my_tree, (0.5,1.0), '')  
    plt.show()  
# test
my_tree_1={'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}},2:'maybe'}}
create_plot(my_tree_1)
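To see why the root lands at x = 0.5, the position formula from plot_tree can be checked by hand for the original two-feature tree (3 leaves, depth 2):

```python
# Worked example of the root-position formula used in plot_tree
total_w = 3.0                  # totalW: leaf count of the whole tree
x_off = -0.5 / total_w         # initial xOff: half a leaf slot left of the axes

leafs_num = 3                  # the root's subtree contains all 3 leaves
root_x = x_off + (1 + float(leafs_num)) / 2.0 / total_w
print(root_x)                  # -1/6 + 4/6 = 1/2, so the root is centered
```

Each leaf occupies a slot of width 1/totalW, and xOff starts half a slot to the left, so a node's center always falls in the middle of the span of its leaves.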

(Figure: the rendered decision tree)

We have now rendered the decision tree from its nested-dictionary form as an actual tree diagram. In fact, the "display" step is optional: once the nested dictionary is constructed, it is already enough to classify the test set.

So the most critical part is the code in the decision_tree.py file.

Finally, I ran it on the contact lens dataset, and the result is quite good.
