[Machine Learning] Python code to build a decision tree (Decision Tree) case

This case uses Python code to build a decision tree (Decision Tree) classifier and draw it with Graphviz.

Tools

  • Python+Jupyter
  • Graphviz 2.38 (open source graph visualization software)

Graphviz download address:

http://www.graphviz.org/download/

Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and visual interfaces for other technical fields.

The Graphviz layout programs take descriptions of graphs in a simple text language and produce diagrams in useful formats, such as images and SVG for web pages, PDF or PostScript for inclusion in other documents, or display in an interactive graph browser. Graphviz has many useful features for concrete diagrams, such as options for colors, fonts, tabular node layouts, line styles, hyperlinks, and custom shapes.
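
To make the "simple text language" concrete, here is a minimal sketch that builds a small graph description in Graphviz's DOT language as a plain Python string. The helper name to_dot and the node names are hypothetical and chosen only for illustration; nothing here calls Graphviz itself.

```python
def to_dot(edges, name="G"):
    """Build DOT source for a directed graph from (source, target) pairs."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        # Quote node names so that names containing spaces stay valid DOT
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-level tree, for illustration only
dot_source = to_dot([("root", "left leaf"), ("root", "right leaf")])
print(dot_source)
```

Text like this is what a layout program such as dot turns into an image, and it is the same kind of text that export_graphviz produces for a trained tree.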


Sample data source

https://raw.githubusercontent.com/ffzs/dataset/master/glass.csv

Here you can see what the data set looks like: https://github.com/ffzs/dataset/blob/master/glass.csv

This sample data has 215 rows.


The source code, run interactively in Jupyter, is as follows:

Input

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz
# Import the evaluation metrics
from sklearn.metrics import accuracy_score, auc, confusion_matrix, f1_score, precision_score, recall_score, roc_curve
# Import the table library
import prettytable
# Import the DOT helper library
import pydotplus
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Load the data
df = pd.read_csv('https://raw.githubusercontent.com/ffzs/dataset/master/glass.csv', usecols=['Na','Ca','Type'])

# Keep the tree diagram simple by using as few classes and features as possible
dfs = df[df.Type < 3]

# Feature values
X = dfs[dfs.columns[:-1]].values
# Label values (Type 1/2 shifted to 0/1)
y = dfs['Type'].values - 1

# Split the data 70/30 into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)
#### Model training ####
# Decision tree model
dt_model = DecisionTreeClassifier(random_state=2018)

# Train the model
dt_model.fit(X_train, y_train)

# Predict on the test set
pre_y = dt_model.predict(X_test)

#### Model evaluation ####
# Confusion matrix
confusion_m = confusion_matrix(y_test, pre_y)

df_confusion_m = pd.DataFrame(confusion_m, columns=['0', '1'], index=['0', '1'])

df_confusion_m.index.name = 'Real'
df_confusion_m.columns.name = 'Predict'

df_confusion_m

 Output:
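
The confusion matrix built above has one row per actual class (Real) and one column per predicted class (Predict). As a minimal sketch of how such a matrix is counted, the following uses made-up labels rather than the glass data; the function name confusion_counts is hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return a 2x2 matrix [[TN, FP], [FN, TP]] for binary labels 0/1.

    Rows are the actual class, columns are the predicted class.
    """
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Made-up labels, for illustration only
toy_true = [0, 0, 1, 1, 1, 0]
toy_pred = [0, 1, 1, 1, 0, 0]
toy_matrix = confusion_counts(toy_true, toy_pred)
print(toy_matrix)
```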

# Get the predicted class probabilities from the decision tree
y_score = dt_model.predict_proba(X_test)

# ROC (use the probability of class 1 as a 1-D score)
fpr, tpr, thresholds = roc_curve(y_test, y_score[:, 1])

# AUC
auc_s = auc(fpr, tpr)

# Accuracy
accuracy_s = accuracy_score(y_test, pre_y)

# Precision
precision_s = precision_score(y_test, pre_y)

# Recall
recall_s = recall_score(y_test, pre_y)

# F1 score
f1_s = f1_score(y_test, pre_y)

# Put the evaluation metrics into a table
df_metrics = pd.DataFrame([[auc_s, accuracy_s, precision_s, recall_s, f1_s]], columns=['auc', 'accuracy', 'precision', 'recall', 'f1'], index=['result'])

df_metrics

 Output:
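
Accuracy, precision, recall and F1 all follow from the four cells of a binary confusion matrix. The sketch below shows the standard formulas on made-up counts (not the results of this model); binary_metrics is a hypothetical helper name.

```python
def binary_metrics(tn, fp, fn, tp):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was truly positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts, for illustration only
toy_metrics = binary_metrics(tn=50, fp=10, fn=5, tp=35)
print(toy_metrics)
```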

#### Visualize the ROC curve ####
plt.figure(figsize=(8, 7))
plt.plot(fpr, tpr, label='ROC')  # Draw the ROC curve
plt.plot([0, 1], [0, 1], linestyle='--', color='k', label='random chance')  # Draw the random-guess baseline
plt.title('ROC')  # Plot title
plt.xlabel('false positive rate')  # X-axis label
plt.ylabel('true positive rate')  # Y-axis label
plt.legend(loc=0)
plt.savefig('x.png')

 Output:
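
The AUC reported earlier is the area under this curve, which sklearn's auc computes with the trapezoidal rule. A minimal sketch of that rule on made-up ROC points (not the points of this model) follows; trapezoid_auc is a hypothetical helper name.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a curve given by points (fpr[i], tpr[i]), fpr non-decreasing."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2
        area += width * mean_height
    return area


# Made-up ROC points, for illustration only
toy_fpr = [0.0, 0.0, 0.5, 1.0]
toy_tpr = [0.0, 0.5, 1.0, 1.0]
toy_auc = trapezoid_auc(toy_fpr, toy_tpr)

# The dashed "random chance" diagonal encloses half of the unit square
chance_auc = trapezoid_auc([0.0, 1.0], [0.0, 1.0])
print(toy_auc, chance_auc)
```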

 

import os
os.environ["PATH"] += os.pathsep + 'D:\\Program Files (x86)\\Graphviz2.38\\bin'  # Change this to your own Graphviz path
#### Save the decision tree diagram as PDF ####
# Export the decision tree rules as DOT source (out_file=None returns it as a string)
dot_data = export_graphviz(dt_model, out_file=None, max_depth=5, feature_names=list(dfs.columns[:-1]), filled=True, rounded=True)

# Parse the DOT source into a graph with pydotplus
graph = pydotplus.graph_from_dot_data(dot_data)

# Save the decision tree as a PDF file
graph.write_pdf('G:\\Data Scientist Learning\\tree.pdf')
# Save it as a JPG image
graph.write_jpg('G:\\Data Scientist Learning\\DecisionTree_sample.jpg')

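Note the following lines in the source code above. They append the Graphviz bin directory to PATH so that pydotplus can find the dot executable; change the path to match your own installation:

import os
os.environ["PATH"] += os.pathsep + 'D:\\Program Files (x86)\\Graphviz2.38\\bin'  # Change this to your own Graphviz path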
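
Before rendering, it can help to check whether the dot executable is actually reachable. The sketch below uses only the standard library; graphviz_on_path is a hypothetical helper, and the directory passed to it is whatever your own installation uses.

```python
import os
import shutil


def graphviz_on_path(extra_dir=None):
    """Return True if the Graphviz 'dot' executable can be found.

    extra_dir is an optional directory to search in addition to PATH.
    """
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        search_path += os.pathsep + extra_dir
    return shutil.which("dot", path=search_path) is not None


# True only if Graphviz is installed and visible on this machine
print(graphviz_on_path())
```

If this prints False, extend PATH as in the lines above, or pass the bin directory as extra_dir to confirm that it is the right one.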

Output:

Open the generated PDF file or JPG image to see the decision tree:

 

Interpretation
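
Each node in a tree drawn by export_graphviz lists the split condition, a gini value, the number of samples, and the per-class sample counts (value). DecisionTreeClassifier uses the Gini impurity as its default split criterion. As an aid to reading those boxes, here is a minimal sketch of how Gini impurity is computed from class counts; the counts are made up and are not taken from the tree in this article.

```python
def gini_impurity(class_counts):
    """Gini impurity 1 - sum(p_k^2) for a node with the given class counts."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Made-up nodes, for illustration only
mixed_node = gini_impurity([50, 50])    # evenly mixed two-class node
pure_node = gini_impurity([100, 0])     # node containing a single class
skewed_node = gini_impurity([30, 10])   # mostly one class
print(mixed_node, pure_node, skewed_node)
```

A value of 0 means the node holds a single class, and 0.5 is the maximum for two classes, so lower gini values further down the tree indicate cleaner splits.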

Origin blog.csdn.net/seagal890/article/details/105186780