[Data Mining] A First Application of Decision Trees

Preface

Data processing

Python code

# Basic decision tree workflow
import numpy as np
from sklearn import tree
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

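# Read the data: each line holds four ';'-separated fields
# (weather, two numeric readings, label)
data = []
labels = []
with open("file.txt", encoding="utf-8") as ifile:
    for line in ifile:
        tokens = line.strip().split(';')
        if len(tokens) < 4:
            continue  # skip blank or malformed lines
        # Encode the weather field as 1 for rain ('雨'), 0 otherwise
        bol = 1 if tokens[0] == '雨' else 0
        data_elem = [bol, float(tokens[1]), float(tokens[2])]
        data.append(data_elem)
        labels.append(tokens[3])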

x = np.array(data)  
labels = np.array(labels)  
y = np.zeros(labels.shape)  

# Convert the label strings to floats (one value per class)
y[labels=='[0,10)']=0
y[labels=='[10,60)']=1
y[labels=='[60,-)']=2
  
# Split into training and test sets (40% held out for testing)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.4)  
  
# Core step: train the decision tree using information entropy as the split criterion
clf = tree.DecisionTreeClassifier(criterion='entropy')   
clf.fit(x_train, y_train)

# Evaluate on the test set
answer = clf.predict(x_test) 
print(classification_report(y_test, answer))
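The `criterion='entropy'` setting above tells scikit-learn to pick, at each node, the split that most reduces the Shannon entropy of the class labels. Here is a minimal sketch of that calculation on made-up labels. The helper names `entropy` and `information_gain` and the toy arrays are illustrative assumptions, not part of scikit-learn or of the original data.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, mask):
    """Entropy reduction from splitting `labels` into mask / ~mask."""
    n = len(labels)
    left, right = labels[mask], labels[~mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Toy example (hypothetical values): 4 samples of class 0, 4 of class 1.
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

perfect_split = feature <= 4.5  # separates the two classes exactly
poor_split = feature <= 2.5     # leaves a mixed right-hand side

print(entropy(y_toy))                          # 1.0 bit for a 50/50 mix
print(information_gain(y_toy, perfect_split))  # 1.0: all uncertainty removed
print(information_gain(y_toy, poor_split))     # about 0.31
```

A tree trained with this criterion would prefer the first threshold, since it yields the larger gain.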

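Results

  • Test-set results
    Accuracy is not great. Maybe the algorithm is a poor fit, or maybe the features just don't have much to do with the label. Nothing much I can do about that.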
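One way to tell whether a weak score comes from the model or from the data is to compare the tree against a classifier that ignores the features entirely. The sketch below uses synthetic stand-in data (random features with unrelated labels, purely an assumption for illustration) because the original `file.txt` is not available. With the real `x` and `y`, the same two `cross_val_score` calls would apply.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for x and y (assumption): 3 features, 3 classes,
# with labels drawn independently of the features.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(300, 3))
y_demo = rng.integers(0, 3, size=300)

tree_clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
baseline = DummyClassifier(strategy="most_frequent")

# 5-fold cross-validated accuracy is less noisy than a single split.
tree_scores = cross_val_score(tree_clf, x_demo, y_demo, cv=5)
base_scores = cross_val_score(baseline, x_demo, y_demo, cv=5)

print("tree     mean accuracy:", tree_scores.mean())
print("baseline mean accuracy:", base_scores.mean())
```

If the tree's accuracy is close to the baseline's, as expected here because the labels are random, the features probably carry little information about the label, and swapping the algorithm is unlikely to help much.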

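  • Visualization
    Visualization requires the graphviz Python package:

    pip install graphviz

    You also need to install the Graphviz software itself, whose bin directory the code below adds to PATH.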

    import graphviz
    import os
    
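    # Visualize the decision tree; put this at the end of the script
    os.environ['PATH'] += os.pathsep + 'D:/Program Files (x86)/Graphviz/bin'
    dot_data = tree.export_graphviz(clf, out_file=None,
                                    # One name per feature column; "rain" (the
                                    # 0/1 weather flag) is a name added here
                                    feature_names=["rain", "PH", "elect"],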
                                    class_names=["[0,10)","[10,60)","[60,-)"],
                                    filled=True, rounded=True)
    graph = graphviz.Source(dot_data)
    graph.format = 'png'
    graph.render("water", view=True)
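If installing the Graphviz binaries is inconvenient, scikit-learn (version 0.21 or later) can also print a tree as plain text with `tree.export_text`. The sketch below is self-contained and trains on a small made-up dataset. The feature names and values are assumptions for illustration, not the original data.

```python
import numpy as np
from sklearn import tree

# Made-up data (assumption): columns are rain flag, pH, conductivity.
x_demo = np.array([
    [0, 6.5, 120.0],
    [1, 7.1, 300.0],
    [0, 6.8, 150.0],
    [1, 7.4, 420.0],
    [0, 6.2, 100.0],
    [1, 7.9, 510.0],
])
y_demo = np.array([0, 1, 0, 1, 0, 2])

clf_demo = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)
clf_demo.fit(x_demo, y_demo)

# export_text needs exactly one name per feature column.
text_tree = tree.export_text(clf_demo, feature_names=["rain", "PH", "elect"])
print(text_tree)
```

The output is an indented list of threshold tests ending in `class:` lines, which is often enough to inspect a small tree without rendering an image.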
    
Reposted from blog.csdn.net/qq_33446100/article/details/101925270