"Introduction to Data Mining" experimental class - third experiment, data mining decision tree


First, the purpose of the experiment

1. Become familiar with the principles of decision trees.

2. Master the method of generating a decision tree.

Second, experimental tools

1. Anaconda

2. sklearn

3. pydotplus

Third, introduction to the experiment

A decision tree is a non-parametric supervised learning method, used mainly for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
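This sketch makes "learning decision rules from the data features" concrete. A tree is grown by repeatedly choosing the split that leaves the resulting groups as pure as possible. By default, scikit-learn's DecisionTreeClassifier measures purity with Gini impurity (criterion="gini"), defined as 1 minus the sum of the squared class proportions.

The sketch is a simplified illustration and makes two assumptions:
- There is a single numeric feature.
- Candidate thresholds are midpoints between consecutive distinct values.

The helper names `gini` and `best_threshold` are hypothetical and are not part of sklearn.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values of one feature.

    Returns (threshold, weighted child impurity); threshold is None if no
    split lowers the impurity of the parent node.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    n = len(labels)
    best = (None, gini(labels))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        # impurity of the two children, weighted by how many samples each gets
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


# Toy data: one feature, two classes that separate cleanly between 2 and 3
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 1]
print(gini(ys))                # 0.5
print(best_threshold(xs, ys))  # (2.5, 0.0)
```

A real tree applies this search to every feature at every node and recurses into the two children until the nodes are pure or a stopping rule is hit.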

Fourth, experiment content

1. Create at least two vectors of your own, each with at least one attribute and a class label. Build a decision tree from these vectors and use it to make a prediction. For example:

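```python
from sklearn import tree

X = [[0, 0], [1, 1]]  # two training vectors, each with two attributes
Y = [0, 1]            # class label of each vector
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
print(clf.predict([[2., 2.]]))        # predicted class label for a new vector
print(clf.predict_proba([[2., 2.]]))  # probability of belonging to each class
```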

The task requires randomly generating data, building a decision tree from it, and showing an example prediction.
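This is a minimal sketch of one way to satisfy this requirement, using NumPy's random generator to create the data. Several choices are arbitrary illustrations, not part of the assignment:
- the seed;
- the sample size of 20;
- the labeling rule (class 1 when the first attribute exceeds 0.5).

```python
import numpy as np
from sklearn import tree

rng = np.random.default_rng(seed=42)  # fixed seed so the run is reproducible

# 20 random vectors with 2 attributes each, drawn uniformly from [0, 1)
X = rng.random((20, 2))
# Hypothetical labeling rule: class 1 if the first attribute exceeds 0.5, else 0
y = (X[:, 0] > 0.5).astype(int)

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

new_point = [[0.9, 0.1]]
print("predicted class:", clf.predict(new_point))
print("class probabilities:", clf.predict_proba(new_point))
print("training accuracy:", clf.score(X, y))
```

Because the labels follow a threshold on the first attribute, the fitted tree should find its split on that attribute. Changing the labeling rule is an easy way to see how the learned tree adapts.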

2. Build a decision tree for the iris dataset.

(1) Load the data as follows:

```python
from sklearn.datasets import load_iris

iris = load_iris()  # load the iris data from sklearn's built-in datasets
```

(2) Build a decision tree on the iris data using sklearn's decision tree classifier.
(3) To visualize the resulting decision tree, install pydotplus as follows (writing the graph to a file also requires the Graphviz software to be installed):
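This is a minimal sketch of step (2). Holding out part of the data gives an estimate of how well the tree generalizes. The 30% test size and random_state=0 are arbitrary assumptions, not values from the original assignment.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Keep 30% of the samples aside for testing, preserving the class balance
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print("tree depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
# Map the first five predicted class indices back to species names
print("predicted species:", iris.target_names[clf.predict(X_test[:5])])
```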

```shell
pip install pydotplus
```

Using pydotplus:

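```python
import pydotplus
from sklearn import tree
from sklearn.datasets import load_iris

# Fit a tree on the iris data, as in step (2)
iris = load_iris()
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)

dot_data = tree.export_graphviz(clf, out_file=None)  # tree as a Graphviz DOT string
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("iris.pdf")  # write the graph to a PDF file (needs Graphviz)
```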

[Figure: screenshot of the code]

[Figure: the resulting decision tree, as rendered in iris.pdf]
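If Graphviz is not available, scikit-learn (version 0.21 and later) also offers sklearn.tree.export_text, which prints a fitted tree as indented text rules. This is a minimal sketch. The setting max_depth=3 is an assumption made only to keep the printout short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# Render the learned rules as text, labelled with the real feature names
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```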

(4) (Optional) Without using sklearn's decision tree, write your own decision tree builder (Python recommended) and use it to build a decision tree for the iris data.
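This is a minimal sketch of such a builder. It assumes a CART-style design:
- binary splits chosen by Gini impurity;
- leaves that predict the majority class;
- a hypothetical max_depth stopping rule.

It is one possible design, not the method the original author used. sklearn is used only to load the data. All function names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris  # used only to load the data


def gini(y):
    """Gini impurity of an array of integer class labels."""
    if len(y) == 0:
        return 0.0
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)


def best_split(X, y):
    """Return (feature, threshold) minimising weighted child Gini, or None."""
    best, best_score = None, gini(y)
    n = len(y)
    for f in range(X.shape[1]):
        values = np.unique(X[:, f])
        # candidate thresholds: midpoints between consecutive distinct values
        for t in (values[:-1] + values[1:]) / 2:
            mask = X[:, f] <= t
            score = (mask.sum() * gini(y[mask]) + (~mask).sum() * gini(y[~mask])) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=5):
    """Recursively grow a tree; leaves are dicts holding the majority class."""
    majority = int(np.bincount(y).argmax())
    if depth >= max_depth or len(np.unique(y)) == 1:
        return {"leaf": majority}
    split = best_split(X, y)
    if split is None:  # no split reduces impurity
        return {"leaf": majority}
    f, t = split
    mask = X[:, f] <= t
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(X[mask], y[mask], depth + 1, max_depth),
        "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth),
    }


def predict_one(node, x):
    """Follow the splits from the root until a leaf is reached."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def predict(root, X):
    return np.array([predict_one(root, x) for x in X])


iris = load_iris()
rng = np.random.default_rng(0)
idx = rng.permutation(len(iris.target))  # shuffle, then hold out 30% for testing
train, test = idx[:105], idx[105:]

my_tree = build_tree(iris.data[train], iris.target[train], max_depth=4)
acc = np.mean(predict(my_tree, iris.data[test]) == iris.target[test])
print("test accuracy:", acc)
```

Plain dictionaries keep the sketch short. The exhaustive search over midpoints costs roughly O(d · n²) per node for n samples and d features, which is fine for the 150 iris samples but slow on large data.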

Fifth, experiment summary (what was learned from this experiment, problems encountered, etc.)

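Through this experiment, I learned the functions for building decision trees in Python. By visualizing the tree built on the iris dataset, I was able to see the resulting decision tree diagram. The difficulty was that I did not fully understand the specific process by which a decision tree is constructed. After several trials, the automatically built decision tree was consistent with the patterns I had predicted myself, which shows the practicality of decision trees. The next step is to understand the construction process well enough to build a decision tree myself.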

Reproduced from: https://www.cnblogs.com/wonker/p/11062683.html


Origin: blog.csdn.net/weixin_34411563/article/details/93709073