decision tree
1. Classification tree
sklearn decision tree (classification tree); the corresponding code:
from sklearn import tree
tr=tree.DecisionTreeClassifier()
tr=tr.fit(x_train,y_train)  # train
re=tr.score(x_test,y_test)  # test; re is the accuracy on the test set
1. Classification tree parameters
- criterion determines how impurity is computed (that is, how the classification tree chooses the best split at each node)
entropy (information entropy): splits are scored by information gain, the difference between the entropy of the parent node and that of its child nodes
gini (Gini impurity)
the default is gini.
The code is as follows:
from sklearn import tree
from sklearn.datasets import load_wine as lw
from sklearn.model_selection import train_test_split as tts
wine=lw()
#wine.data    # the data
#wine.target  # the labels
import pandas as pd
pd.concat([pd.DataFrame(wine.data),pd.DataFrame(wine.target)],axis=1)
#wine.feature_names  # feature names
#wine.target_names   # label names
x_train,x_test,y_train,y_test=tts(wine.data,wine.target,test_size=0.3)
#x_train.shape  # number of training samples
# build the model
tr=tree.DecisionTreeClassifier(criterion="entropy")
tr=tr.fit(x_train,y_train)  # train
re=tr.score(x_test,y_test)  # test; re is the accuracy on the test set
print("score:",re)  # re is the accuracy value
The code to display (visualize) the classification tree is as follows:
import graphviz
feature_name=['jd','dmig','jsduhgv']  # one name per feature
dot_data=tree.export_graphviz(tr,feature_names=feature_name,class_names=[...],filled=True,rounded=True)  # fill in class_names yourself
graph=graphviz.Source(dot_data)
graph  # render the graph; print(graph) would only show the raw dot source
Check the importance of each feature:
tr.feature_importances_
# or, paired with the feature names:
[*zip(feature_name,tr.feature_importances_)]
- random_state & splitter
these control the randomness of the classification tree and make the model stable and reproducible.
splitter='random' branches more randomly, which helps prevent overfitting
splitter='best' gives priority to the more important features when branching
random_state=30 fixes the random seed so the results are reproducible
The code is as follows:
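A minimal sketch of these two parameters on the same wine data (the parameter values here, such as random_state=30, are illustrative):

```python
from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=30)

# Fixing random_state makes the tree (and its score) reproducible;
# splitter="random" adds randomness to the split choice to curb overfitting.
tr = tree.DecisionTreeClassifier(criterion="entropy",
                                 random_state=30,
                                 splitter="random")
tr = tr.fit(x_train, y_train)
print(tr.score(x_test, y_test))  # accuracy on the test set
```

Because random_state is fixed, running this twice gives exactly the same tree and the same score.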
2. Pruning
Pruning gives the decision tree better generalization.
- max_depth: all branches deeper than the set depth are cut off
- min_samples_leaf & min_samples_split: every leaf must contain at least min_samples_leaf samples, and a node must contain at least min_samples_split samples before it is allowed to split
- max_features & min_impurity_decrease: max_features limits the number of features considered when splitting; min_impurity_decrease only allows a split if it decreases the impurity (e.g. the information entropy) by at least that amount
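The pruning parameters above can be sketched together on the wine data (the specific values, e.g. max_depth=3, are illustrative only):

```python
from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

# Pruned tree: cap the depth, require a minimum number of samples per
# leaf and per split, and block splits whose impurity decrease is tiny.
pruned = tree.DecisionTreeClassifier(
    criterion="entropy",
    max_depth=3,
    min_samples_leaf=5,
    min_samples_split=10,
    min_impurity_decrease=0.01,
    random_state=0,
).fit(x_train, y_train)

print(pruned.get_depth())           # never exceeds max_depth
print(pruned.score(x_test, y_test)) # accuracy on the test set
```

Comparing this score with an unpruned tree's score is a quick way to check whether the pruning actually improved generalization.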