Decision Tree Application

1. Python machine learning library: scikit-learn

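   1.1 Features:

        Simple and efficient tools for data mining and machine learning analysis; open to all users and highly reusable across different needs; built on NumPy, SciPy, and matplotlib

   1.2 Problem areas covered:

       Classification; regression; clustering; dimensionality reduction; model selection; preprocessing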

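2. Using scikit-learn

  Install scikit-learn: pip, easy_install, or the Windows installer

 Install the required packages: numpy, scipy, and matplotlib. Anaconda can be used instead (it bundles numpy, scipy, and other packages commonly used in scientific computing)

I use PyCharm 2017 + Anaconda3 (which includes the scikit-learn machine learning library and the common packages; you can list them with cmd -> conda list)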
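
To confirm that the packages above are actually available in the interpreter PyCharm is using, a small check like the following can help. This is only a sketch: the helper name installed_versions is made up for this post, and it relies on importlib.metadata from the standard library (Python 3.8 or later).

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment
            versions[name] = None
    return versions


# Packages named in this post; "scikit-learn" is the distribution name of sklearn
for name, version in installed_versions(
    ["numpy", "scipy", "matplotlib", "scikit-learn"]
).items():
    print(name, version if version is not None else "NOT INSTALLED")
```

If any line prints NOT INSTALLED, the interpreter selected in PyCharm is probably not the Anaconda one.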

3. Example:

RID age income student credit_rating class:buys_computer
1 youth high no fair no
2 youth high no excellent no
3 middle_aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent no
7 middle_aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle_aged medium no excellent yes
13 middle_aged high yes fair yes
14 youth medium no excellent no
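
The script in this post builds the tree with criterion='entropy'. To get a feel for what that criterion measures, here is a rough hand calculation of entropy and information gain on the 14 rows above, using only the standard library. The function names entropy and information_gain are my own, and the calculation treats each attribute as multi-valued, whereas scikit-learn splits on the one-hot encoded binary columns, so the tree it builds will not follow these numbers exactly.

```python
from collections import Counter
from math import log2

HEADERS = ["RID", "age", "income", "student", "credit_rating", "class:buys_computer"]
DATA = """
1 youth high no fair no
2 youth high no excellent no
3 middle_aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent no
7 middle_aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle_aged medium no excellent yes
13 middle_aged high yes fair yes
14 youth medium no excellent no
"""
rows = [dict(zip(HEADERS, line.split())) for line in DATA.strip().splitlines()]
TARGET = "class:buys_computer"


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target=TARGET):
    """Entropy of the whole set minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


print("entropy of the class column:", round(entropy([r[TARGET] for r in rows]), 3))
gains = {a: information_gain(rows, a) for a in HEADERS[1:-1]}
for attribute, gain in gains.items():
    print(attribute, round(gain, 3))
```

With 9 yes and 5 no, the class column has an entropy of about 0.940 bits, and age gives the largest gain of the four attributes on this table.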

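Install Graphviz: http://www.graphviz.org/

Configure the environment variables (the Graphviz bin directory needs to be on PATH): cmd -> env

Convert the dot file to a PDF to visualize the decision tree: I installed Graphviz but never managed to get the conversion to work in cmd. Any pointers are welcome!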
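
One possible way around the cmd problem is to call Graphviz from Python. The sketch below assumes the usual Graphviz command line, dot -Tpdf input.dot -o output.pdf; the helper names build_dot_command and dot_to_pdf are invented for this post. Two likely causes of the failure are that dot is not on PATH (the helper checks for that), and that the folder name in the path contains a space, which cmd splits into two arguments unless the path is quoted. Passing the arguments as a list avoids the quoting issue.

```python
import shutil
import subprocess


def build_dot_command(dot_file, pdf_file):
    """Build the Graphviz command that renders a .dot file to PDF."""
    return ["dot", "-Tpdf", dot_file, "-o", pdf_file]


def dot_to_pdf(dot_file, pdf_file):
    """Run Graphviz if it is on PATH; return False when the dot executable is missing."""
    if shutil.which("dot") is None:
        # Graphviz is not installed, or its bin directory is not on PATH
        return False
    # A list of arguments means paths with spaces need no extra quoting
    subprocess.run(build_dot_command(dot_file, pdf_file), check=True)
    return True


print(build_dot_command("DecisionTree.dot", "DecisionTree.pdf"))
```

Calling dot_to_pdf with the path of the exported dot file should then produce the PDF, or return False if Graphviz cannot be found.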

Source code:


from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import preprocessing
from sklearn import tree

# Read the CSV file; the first row holds the column headers
featureList = []
labelList = []
with open(r'E:\PycharmProjects\python\Decesiopn Tree\Allelectronicsdate.csv', 'r', newline='') as allElectronicsData:
    reader = csv.reader(allElectronicsData)
    headers = next(reader)
    print(headers)

    for row in reader:
        # The last column is the class label
        labelList.append(row[-1])
        # The columns between RID and the label become one feature dict per row
        rowDict = {}
        for i in range(1, len(row) - 1):
            rowDict[headers[i]] = row[i]
        featureList.append(rowDict)
print(featureList)

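# Convert the data into the numeric form sklearn requires
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
print("dummyX:" + str(dummyX))
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(vec.get_feature_names_out())
print("labelList:" + str(labelList))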

lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
print("dummyY:" + str(dummyY))
# Fit the decision tree
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
print("clf:" + str(clf))
# Export the decision tree to a dot file
with open(r"E:\PycharmProjects\python\Decesiopn Tree\DecisionTree.dot", 'w') as f:
    tree.export_graphviz(clf, feature_names=list(vec.get_feature_names_out()), out_file=f)
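oneRowX = dummyX[0, :]
print("oneRowX:" + str(oneRowX))
# Predict the class of a new row
# Copy the row so that editing it does not change dummyX in place
newRowX = oneRowX.copy()
newRowX[0] = 1
newRowX[2] = 0
print("newRowX:" + str(newRowX))

# predict() expects a 2-D array of shape (n_samples, n_features)
predictedY = clf.predict(newRowX.reshape(1, -1))
print("predictedY:" + str(predictedY))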
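
For readers who do not have the CSV file, the same workflow can be tried with the table embedded directly in the script. This is a sketch rather than the exact code above: it swaps LabelBinarizer for LabelEncoder (so the labels stay one-dimensional and can be decoded with inverse_transform), fixes random_state for repeatability, and builds the new row as a dict instead of editing array positions by hand. The variable new_customer is made up for illustration and corresponds to the middle_aged, high income row that the original code creates with newRowX[0] = 1 and newRowX[2] = 0.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

HEADERS = ["RID", "age", "income", "student", "credit_rating", "class:buys_computer"]
DATA = """
1 youth high no fair no
2 youth high no excellent no
3 middle_aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent no
7 middle_aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle_aged medium no excellent yes
13 middle_aged high yes fair yes
14 youth medium no excellent no
"""
records = [line.split() for line in DATA.strip().splitlines()]
# Drop the RID column and the class column to get the feature dicts
features = [dict(zip(HEADERS[1:-1], r[1:-1])) for r in records]
labels = [r[-1] for r in records]

# One-hot encode the categorical features as a dense array
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(features)
print(vec.feature_names_)

# Encode "no"/"yes" as integers 0/1
enc = LabelEncoder()
y = enc.fit_transform(labels)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Encode a new row with the already fitted vectorizer, then predict
new_customer = {"age": "middle_aged", "income": "high",
                "student": "no", "credit_rating": "fair"}
new_X = vec.transform([new_customer])
predicted = enc.inverse_transform(clf.predict(new_X))
print("predicted:", predicted[0])
```

Using vec.transform on a dict keeps the column order consistent with training, which is easier to get right than remembering which index belongs to which one-hot column.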

The resulting decision tree:



Reposted from blog.csdn.net/weixin_41790863/article/details/80962175