A - Decision trees implemented in Scikit-Learn

 

Select the model class

In Scikit-Learn, the classification tree algorithm lives in the tree module; the class implementing it is DecisionTreeClassifier.

In [1]:
from sklearn.tree import DecisionTreeClassifier
In [2]:
DecisionTreeClassifier? # Check method parameters and output attributes
Some of the algorithm's parameters:

Parameter | Explanation
--- | ---
criterion | The impurity measure used to choose splits. Besides the default 'gini' (Gini index), 'entropy' can be passed to split on information gain.
splitter | The split strategy. The default 'best' chooses the split with the greatest impurity decrease; 'random' chooses among random candidate splits.
max_depth | The maximum depth of the tree. Setting it amounts to imposing an explicit stopping condition (limiting how many levels the tree grows). The default None grows the tree until every leaf is pure or holds fewer than min_samples_split samples.
min_samples_split | The minimum number of samples a node must contain before it can be split further. The default is 2; nodes below this size are not split.
min_samples_leaf | The minimum number of samples required at a leaf node. The default is 1; splits that would leave fewer samples in a leaf are not considered.
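
As a quick illustration (a sketch, not part of the original notebook; the parameter values here are arbitrary), tightening these stopping conditions at instantiation time could look like this:

In [ ]:
from sklearn.tree import DecisionTreeClassifier

# a sketch: non-default values for the parameters listed above
clf_limited = DecisionTreeClassifier(criterion='entropy',   # split on information gain instead of the Gini index
                                     max_depth=3,           # stop growing after 3 levels
                                     min_samples_split=4,   # a node needs at least 4 samples to be split
                                     min_samples_leaf=2)    # every leaf must keep at least 2 samples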

 

Some of the fitted attributes:

Attribute | Explanation
--- | ---
classes_ | The class labels of the dataset's target column.
feature_importances_ | The importance of each feature.
n_features_ | The number of features.

 

Select model hyperparameters

You may want to tighten the stopping conditions or switch the impurity measure; of course, you can also instantiate the model directly with the default parameter settings:

In [3]:
clf = DecisionTreeClassifier()
Scikit-Learn also ships with several built-in datasets; the iris dataset can be loaded from sklearn.datasets:
In [4]:
from sklearn.datasets import load_iris
iris = load_iris()
In [5]:
iris.data # view the feature columns
In [6]:
iris.target # view the label column
Out[6]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
 

Split the dataset

Scikit-Learn's dataset-splitting function lives in the model_selection module. Note that since the model expects arrays, the feature data and the label data of the training and test sets must each be kept separate:

In [7]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(iris.data, iris.target,
random_state=42)
Here random_state=42 fixes the random seed for the split so that it is reproducible. The test_size parameter can also be set to control the split ratio; if left at its default, the training set and test set are split roughly 3:1. See the description of train_test_split for the remaining parameters.
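
For example, a small sketch with an explicit 20% test split (test_size is a real train_test_split parameter; the value 0.2 is just an illustration):

In [ ]:
# hold out 20% of the samples for testing instead of the default 25%
xtrain, xtest, ytrain, ytest = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)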

 

Train the model

In [8]:
clf.fit(xtrain, ytrain)
score = clf.score(xtest, ytest) # return the prediction accuracy
score
Out[8]:
1.0
 

We can export the decision tree model with the Graphviz module. Graphviz has to be installed before first use; if you manage Python packages with conda, it can be installed directly from the command line with conda install python-graphviz.
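
(If you use pip instead of the conda route above, the Python bindings can be installed with pip install graphviz, but note that the Graphviz binaries themselves must then be installed separately through your system's package manager.)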

In [9]:
from sklearn import tree
import graphviz

export_graphviz renders the fitted model as a DOT-format object, a text format designed specifically for describing graphs to be drawn. It accepts a wide range of graphic settings, such as label formats and coloring of the decision tree; see the help file for the full list. The DOT data can then be converted into a Graph object, which can be displayed directly:

In [11]:
dot_data = tree.export_graphviz(clf, out_file=None)
graphviz.Source(dot_data)

The render method writes the final graphic to disk: it takes a file path as its parameter and saves the graph object under that name, producing a PDF file by default:

In [24]:
graph = graphviz.Source(dot_data)
graph.render("iris")
Out[24]:
'iris.pdf'
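
The output format can be changed as well; a small sketch, assuming the format attribute of the graphviz Python package (PDF is the default):

In [ ]:
graph.format = 'png' # switch the output format from the default PDF
graph.render("iris") # now writes iris.png instead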
In [12]:
# Beautify the graphic
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=['sepal length', 'sepal width', 'petal length', 'petal width'], # variable names
                                class_names=['setosa', 'versicolor', 'virginica'], # class names
                                filled=True, # color each node; the lighter the color, the higher the impurity
                                rounded=True, # rounded node boxes
                                special_characters=True)
graphviz.Source(dot_data)

 
The tree splits nodes in a binary fashion with the Gini index as the splitting criterion; among the features, petal length clearly dominates the classification, as the feature importances below confirm.
In [13]:
clf.feature_importances_ # importance of each feature column
Out[13]:
array([0.        , 0.01787567, 0.89974604, 0.08237829])
In [14]:
[*zip(['sepal length', 'sepal width', 'petal length', 'petal width'], clf.feature_importances_)]
Out[14]:
[('sepal length', 0.0),
 ('sepal width', 0.017875668342510573),
 ('petal length', 0.8997460415815836),
 ('petal width', 0.08237829007590591)]
The fitted attributes listed earlier can now be inspected directly, for example the class labels:

In [15]:
clf.classes_
Out[15]:
array([0, 1, 2])
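
The remaining attributes work the same way; a quick sketch (note that in recent scikit-learn versions the n_features_ attribute has been replaced by n_features_in_):

In [ ]:
clf.n_features_ # number of features seen during fit; use clf.n_features_in_ on newer scikit-learn versions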
 

Model predictions

In [16]:
y_clf = clf.predict(xtest)
In [17]:
y_clf
Out[17]:
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0])
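
Besides hard class labels, the fitted tree can also report per-class probabilities through the standard predict_proba method; a small sketch:

In [ ]:
clf.predict_proba(xtest[:3]) # probability of each of the three classes for the first three test samples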
 

Model Assessment

In [51]:
# Accuracy assessment
from sklearn.metrics import accuracy_score
accuracy_score(ytest, y_clf)
Out[51]:
1.0
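
Accuracy is a single summary number; as a sketch, sklearn.metrics also offers a confusion matrix and a per-class report:

In [ ]:
from sklearn.metrics import confusion_matrix, classification_report
print(confusion_matrix(ytest, y_clf)) # counts of (true class, predicted class) pairs
print(classification_report(ytest, y_clf)) # precision, recall and F1 for each class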
 

The complete workflow

In [53]:
from sklearn.tree import DecisionTreeClassifier # import the model
clf = DecisionTreeClassifier(criterion="entropy", random_state=30) # note that this time we adjusted the parameters
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(iris.data, iris.target, random_state=42) # split into training and test sets
clf.fit(xtrain, ytrain) # train the model
score = clf.score(xtest, ytest) # return the prediction accuracy
score
Out[53]:
0.9736842105263158
In [54]:
# Beautify the graphic
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=['sepal length', 'sepal width', 'petal length', 'petal width'], # variable names
                                class_names=['setosa', 'versicolor', 'virginica'], # class names
                                filled=True, # color each node; the lighter the color, the higher the impurity
                                rounded=True, # rounded node boxes
                                special_characters=True)
graphviz.Source(dot_data)
# Save the tree image
graph = graphviz.Source(dot_data)
graph.render("iris")
In [55]:
clf.feature_importances_ # importance of each feature column
[*zip(['sepal length', 'sepal width', 'petal length', 'petal width'], clf.feature_importances_)]

Out[55]:

[('sepal length', 0.0),
 ('sepal width', 0.04843665046047095),
 ('petal length', 0.3254214588163632),
 ('petal width', 0.6261418907231658)]
In [56]:
y_clf = clf.predict(xtest) # model prediction
In [57]:
# Accuracy assessment
from sklearn.metrics import accuracy_score
accuracy_score(ytest, y_clf)
Out[57]:
0.9736842105263158
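
A single train/test split can be optimistic or pessimistic depending on how the samples fall; as a sketch, cross_val_score from sklearn.model_selection averages the accuracy over several folds instead:

In [ ]:
from sklearn.model_selection import cross_val_score
scores = cross_val_score(clf, iris.data, iris.target, cv=10) # 10-fold cross-validated accuracy
scores.mean()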
 
