python实现决策树可视化

一、准备阶段

下面是在jupyter notebook中实现决策树的可视化，所以需要先在Anaconda Powershell Prompt中使用

pip install graphviz

安装相应的包
在这里插入图片描述

然后实现可视化需要用到graphviz文件，可以点击下方进入官网

-官网下载

在这里插入图片描述
下载完成安装之后需要添加环境变量，将graphviz安装目录下的bin文件夹添加到Path环境变量中，步骤如下：

下面验证是否安装成功，同样在Anaconda Powershell Prompt下输入dot -version

需要注意的是，这里的命令行是根据自己使用的编辑环境而定，例如：如果使用的是python的IDLE进行编写的话，就使用windows的命令行。

在这里插入图片描述

二、代码：

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import numpy as np
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split


# to make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt


mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# 为了显示中文
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "decision_trees"

def image_path(fig_id):
    return os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID, fig_id)

def save_fig(fig_id, tight_layout=True):
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(image_path(fig_id) + ".png", format='png', dpi=300)

iris = load_iris()
X = iris.data[:, 2:] # petal length and width
y = iris.target

# 数据划分
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)
# 定义决策树分类器 
clf = DecisionTreeClassifier()
# 训练分类器
clf.fit(train_x, train_y)

# tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
# tree_clf.fit(X, y)

在这里插入图片描述

# 可视化决策树
from sklearn.tree import export_graphviz

export_graphviz(
        # 决策树
        clf,
        # 保存文件路径
        out_file=image_path("F:/人工智能与机器学习/1.dot"),
        # 特征名字
        feature_names=iris.feature_names[2:],
        # 类别名
        class_names=iris.target_names,
        # 绘制带有圆角的框
        rounded=True,
        # 当设置为“true”时，绘制节点以指示分类、回归极值或节点纯度用于多输出。
        filled=True
    )

import graphviz

with open(image_path("F:/人工智能与机器学习/1.dot")) as f:
    dot_graph = f.read()
dot=graphviz.Source(dot_graph)
dot.view()

‘Source.gv.pdf’

dot

在这里插入图片描述


#测试集上预测
predict = clf.predict(train_x)
sum = 0
for i in range(len(train_y)):
    if predict[i]==train_y[i]:
        sum +=1
print("sum:%s,len(train_y):%s"%(sum,len(train_y)))
print("预测率：%s"%(sum/len(train_y)))

sum:119,len(train_y):120
预测率：0.9916666666666667

#测试集上预测
predict2 = clf.predict(test_x)
sun = 0
for i in range(len(test_y)):
    if predict[i]==test_y[i]:
        sun +=1
print("sum:%s,len(test_y):%s"%(sum,len(test_y)))
print("预测率：%s"%(sum/len(test_y)))

sum:119,len(test_y):30
预测率：3.966666666666667

from sklearn.metrics import accuracy_score

y_pred = clf.predict(test_x)
print("测试集精确度：%s"%accuracy_score(test_y, y_pred))

测试集精确度：0.9666666666666667

y_pred2 = clf.predict(train_x)
print("训练集精确度：%s"%accuracy_score(train_y, y_pred2))

训练集精确度：0.9916666666666667

score = clf.score(test_x, test_y)
print("\n模型测试集准确率为：", score)

模型测试集准确率为： 0.9666666666666667

score = clf.score(train_x, train_y)
print("\n模型训练集准确率为：", score)

模型训练集准确率为： 0.9916666666666667

Time ??

原创文章 13 获赞 6 访问量 1487

关注私信