Select the model class
In scikit-learn, the classification tree algorithm lives in the tree module; the specific class is DecisionTreeClassifier.
from sklearn.tree import DecisionTreeClassifier
DecisionTreeClassifier?  # view the constructor parameters and model attributes
Parameter | Explanation
criterion: the impurity measure used to choose splits; the default is the 'gini' index, or pass 'entropy' to split on information gain
splitter: the splitting strategy; the default 'best' picks the split with the fastest decrease in impurity
max_depth: the maximum depth of the tree; setting it effectively imposes a stopping condition (it limits the number of tree levels). The default None grows the tree until every leaf is pure or contains fewer than min_samples_split samples
min_samples_split: the minimum number of samples a node must contain before it can be split further; the default is 2, and nodes below this size are not split
min_samples_leaf: the minimum number of samples required at a leaf node; the default is 1, and splits that would leave fewer samples are pruned away
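A minimal sketch of how these stopping parameters constrain the tree (the iris dataset and the parameter choices here are just illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# A constrained tree: entropy criterion, at most 2 levels deep,
# and every leaf must keep at least 5 samples.
shallow = DecisionTreeClassifier(criterion='entropy',
                                 max_depth=2,
                                 min_samples_leaf=5)
shallow.fit(iris.data, iris.target)

# An unconstrained tree with the default 'gini' criterion.
deep = DecisionTreeClassifier()
deep.fit(iris.data, iris.target)

# The constrained tree stops at depth 2; the default tree grows deeper.
print(shallow.get_depth(), deep.get_depth())
```

Tightening these parameters is the usual way to keep a single tree from overfitting the training data.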
Attribute description section:
Attribute | Explanation
classes_: the label values of the dataset's target column
feature_importances_: the importance score of each feature
n_features_: the number of features
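A small sketch of reading these attributes after fitting (note: in newer scikit-learn versions n_features_ has been replaced by n_features_in_, which is what this sketch uses):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

print(clf.classes_)              # label values seen during fit: [0 1 2]
print(clf.feature_importances_)  # one importance score per feature; they sum to 1
print(clf.n_features_in_)        # number of input features: 4
```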
Select model hyperparameters
You may tighten the stopping conditions or change the impurity measure; of course, you can also instantiate the model directly with the default parameter settings:
clf = DecisionTreeClassifier()
from sklearn.datasets import load_iris
iris = load_iris()
iris.data    # view the feature columns
iris.target  # view the label column
Split the dataset
In scikit-learn, the dataset-splitting function lives in the model_selection module. Note that because the model expects arrays, the feature data and label data of the training and test sets must still be kept separate.
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(iris.data, iris.target, random_state=42)
Here random_state=42 fixes the random seed so the split is reproducible. You can also pass the test_size parameter to set the split ratio of the dataset; if omitted, it defaults to roughly a 3:1 division between training set and test set. See the train_test_split documentation for the remaining parameters.
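A sketch of controlling the split explicitly; test_size and stratify are standard train_test_split parameters (stratify is an addition here, not in the original walkthrough, and keeps the class proportions equal in both splits):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
xtrain, xtest, ytrain, ytest = train_test_split(
    iris.data, iris.target,
    test_size=0.25,        # 25% test -> the default 3:1 train/test ratio, made explicit
    stratify=iris.target,  # preserve class proportions in both splits
    random_state=42)

print(len(xtrain), len(xtest))  # 112 38
```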
Train the model
clf.fit(xtrain, ytrain)
score = clf.score(xtest, ytest)  # returns the prediction accuracy
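A single train/test score can be noisy on a small dataset; as a hedged alternative, cross_val_score averages accuracy over several folds (the 5-fold choice here is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)

# 5-fold cross-validation: fit on 4/5 of the data, score on the held-out fold.
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print(scores.mean())  # average accuracy across the 5 folds
```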
We can use the graphviz module to export the decision tree model. Graphviz must be installed before first use; if you manage Python packages with conda, you can install it directly from the command line with conda install python-graphviz.
from sklearn import tree
import graphviz
The model is exported with export_graphviz as a DOT object, a format designed specifically for describing graphics. It supports a wide range of display settings, including label formats and coloring of the decision tree; more options are described in the help file. The DOT object can then be converted into a graphviz Graph object, which can be displayed directly.
dot_data = tree.export_graphviz(clf, out_file=None)
graphviz.Source(dot_data)
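If Graphviz is not installed, export_text (available in scikit-learn since version 0.21) prints a plain-text view of the same tree; this is an alternative to the DOT export, not part of the original walkthrough:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each "|---" line is one split or leaf of the fitted tree.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```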
To save the figure to disk, specify the output path and file name; the file is rendered in PDF format:
graph = graphviz.Source(dot_data)
graph.render("iris")
# Beautify the figure
dot_data = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=['sepal length', 'sepal width', 'petal length', 'petal width'],  # variable names
    class_names=['setosa', 'versicolor', 'virginica'],  # class names
    filled=True,             # color each node; the lighter the color, the higher the impurity
    rounded=True,            # draw nodes with rounded boxes
    special_characters=True)
graphviz.Source(dot_data)
clf.feature_importances_  # importance of each feature column
[*zip(['sepal length', 'sepal width', 'petal length', 'petal width'], clf.feature_importances_)]
classes_: the label values of the dataset's target column
feature_importances_: the importance score of each feature
n_features_: the number of features
clf.classes_
Model predictions
y_clf = clf.predict(xtest)
y_clf
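Besides hard labels, the tree can return per-class probabilities via predict_proba (the class fractions in the leaf each sample falls into); this is a standard scikit-learn method, shown here as a supplementary sketch:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
xtrain, xtest, ytrain, ytest = train_test_split(iris.data, iris.target, random_state=42)
clf = DecisionTreeClassifier(random_state=0).fit(xtrain, ytrain)

# One row per test sample, one column per class; each row sums to 1.
proba = clf.predict_proba(xtest)
print(proba[:3])
```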
Model Assessment
# Accuracy assessment
from sklearn.metrics import accuracy_score
accuracy_score(ytest, y_clf)
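Accuracy alone hides per-class behaviour; confusion_matrix and classification_report (both from sklearn.metrics) give a fuller picture. This is a supplementary sketch, not part of the original walkthrough:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
xtrain, xtest, ytrain, ytest = train_test_split(iris.data, iris.target, random_state=42)
clf = DecisionTreeClassifier(random_state=0).fit(xtrain, ytrain)
y_clf = clf.predict(xtest)

cm = confusion_matrix(ytest, y_clf)  # rows: true class, columns: predicted class
print(cm)
print(classification_report(ytest, y_clf, target_names=iris.target_names))
```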
Complete code listing
from sklearn.tree import DecisionTreeClassifier  # import the model
clf = DecisionTreeClassifier(criterion="entropy", random_state=30)  # note that we adjusted the parameters this time
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(iris.data, iris.target, random_state=42)  # split training and test sets
clf.fit(xtrain, ytrain)  # train the model
score = clf.score(xtest, ytest)  # returns the prediction accuracy
# Beautify the figure
dot_data = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=['sepal length', 'sepal width', 'petal length', 'petal width'],  # variable names
    class_names=['setosa', 'versicolor', 'virginica'],  # class names
    filled=True,             # color each node; the lighter the color, the higher the impurity
    rounded=True,            # draw nodes with rounded boxes
    special_characters=True)
graphviz.Source(dot_data)
# Save the tree image
graph = graphviz.Source(dot_data)
graph.render("iris")
clf.feature_importances_  # importance of each feature column
[*zip(['sepal length', 'sepal width', 'petal length', 'petal width'], clf.feature_importances_)]
y_clf = clf.predict(xtest)  # model prediction
# Accuracy assessment
from sklearn.metrics import accuracy_score
accuracy_score(ytest, y_clf)