Decision tree
  1. Classification tree
    1. Classification tree parameters
    2. Pruning

Decision tree

1. Classification tree

sklearn decision tree: the classification tree.
The corresponding code:

from sklearn import tree

tr = tree.DecisionTreeClassifier()
tr = tr.fit(x_train, y_train)   # train
re = tr.score(x_test, y_test)   # test; re returns the accuracy on the test set
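(Here x_train, x_test, y_train, y_test are assumed to come from a train/test split like the one built in the full example below; re is the mean accuracy on the test set.)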

1. Classification tree parameters

  • criterion determines how impurity is calculated (that is, how the classification tree chooses the best node to branch on)
    entropy (information entropy): branches are scored by the difference between the information entropy of the parent node and that of its child nodes
    gini (Gini impurity)
    The default is gini.
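    For reference, the standard impurity definitions are (with p(i|t) the fraction of samples of class i at node t, and c the number of classes):

    Entropy(t) = -\sum_{i=0}^{c-1} p(i \mid t) \log_2 p(i \mid t)

    Gini(t) = 1 - \sum_{i=0}^{c-1} p(i \mid t)^2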
    The code is as follows:
from sklearn import tree
from sklearn.datasets import load_wine as lw
from sklearn.model_selection import train_test_split as tts

wine = lw()
# wine.data    # the data (features)
# wine.target  # the labels

import pandas as pd
pd.concat([pd.DataFrame(wine.data), pd.DataFrame(wine.target)], axis=1)

# wine.feature_names  # the feature names
# wine.target_names   # the label (class) names

x_train, x_test, y_train, y_test = tts(wine.data, wine.target, test_size=0.3)

# x_train.shape  # size of the training set: (n_samples, n_features)

# build the model
tr = tree.DecisionTreeClassifier(criterion="entropy")
tr = tr.fit(x_train, y_train)   # train
re = tr.score(x_test, y_test)   # test; re returns the accuracy on the test set

print("score:", re)  # re is the accuracy value

The code to display (visualize) the classification tree is as follows:

import graphviz
feature_name = wine.feature_names  # one name per feature (the wine data has 13 features)
dot_data = tree.export_graphviz(tr,
                                feature_names=feature_name,
                                class_names=list(wine.target_names),  # or write your own class names, one per class
                                filled=True,
                                rounded=True)
graph = graphviz.Source(dot_data)
graph.render("wine_tree")  # writes the rendered tree to disk; in a notebook, `graph` alone displays it
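Note that the graphviz Python package only generates and passes on the DOT source; actually rendering the image also requires the Graphviz system executables to be installed on the machine.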

Check the importance of each feature (attribute):

tr.feature_importances_
# or, paired with the feature names:
[*zip(feature_name, tr.feature_importances_)]
  • random_state & splitter
    control the randomness of the classification tree so that the model is stable (reproducible).
    splitter='random' branches more randomly, which helps prevent overfitting
    splitter='best' gives priority to the more important features when branching
    random_state=30 fixes the random seed
    The code is as follows (a minimal sketch, assuming the wine split built above):
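tr = tree.DecisionTreeClassifier(criterion="entropy",
                                 random_state=30,      # fix the seed so results are reproducible
                                 splitter="random")    # branch randomly to reduce overfitting
tr = tr.fit(x_train, y_train)
re = tr.score(x_test, y_test)   # accuracy on the test set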

2. Pruning

Pruning makes the decision tree generalize better.

  • max_depth: all branches beyond the set depth are cut off
  • min_samples_leaf & min_samples_split: min_samples_leaf requires every child node produced by a split to keep at least min_samples_leaf samples; min_samples_split requires a node to contain at least min_samples_split samples before it is allowed to split
  • max_features & min_impurity_decrease: max_features limits the number of features considered when branching; min_impurity_decrease only allows a branch if it reduces impurity (the information-entropy difference between parent and children) by at least the set amount
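A minimal sketch combining these pruning parameters (assuming the wine split built above; the specific values here are only illustrative):

tr = tree.DecisionTreeClassifier(criterion="entropy",
                                 random_state=30,
                                 max_depth=3,           # branches deeper than 3 are cut off
                                 min_samples_leaf=10,   # every leaf must keep at least 10 samples
                                 min_samples_split=25)  # a node needs at least 25 samples to split
tr = tr.fit(x_train, y_train)
re = tr.score(x_test, y_test)   # accuracy of the pruned tree on the test set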

Origin: blog.csdn.net/qq_53982314/article/details/131170338