Decision tree - from theory to practice

Decision tree

Basic knowledge

Basic concepts

Definition: A decision tree is a common machine learning method: a tree structure that describes how instances are classified. (A given instance has a set of features, and the outcome is decided according to those features.)

Description: A decision tree consists of nodes and directed edges. Nodes fall into three kinds: the root node, internal nodes, and leaf nodes.

Root node: the complete sample set

Internal node: a test on a feature (attribute)

Leaf node: a decision result (class label)

Decision test sequence: the path from the root node to each leaf node corresponds to a sequence of decision tests.

Purpose: to generate a decision tree with strong generalization ability, that is, one that handles unseen examples well.

Analysis and Thinking

Let's build intuition with the watermelon book's example of deciding whether a melon is good or bad:

Watermelon dataset:

[Figure: the watermelon dataset]

Watermelon decision tree diagram:

[Figure: the watermelon decision tree]

Looking at the dataset above and the decision tree generated from it, several questions come up:

  • Isn't this just if-else? Why bother with a special method?
  • Why does the watermelon decision tree test texture first?
  • Why does one branch of the tree test touch at the second level, while another branch only tests touch at the fourth level?
  • The dataset also contains the features knock sound and navel; why don't they appear in the decision tree?
  • How does the size of the watermelon dataset affect the decision tree?

Let's start with the first question: isn't this just a chain of simple if-else statements?

In a sense, yes. The decision tree really does originate from the if-else branch structure; the reason we organize it as a tree is that the tree structure brings efficiency gains and, more importantly, gives us something we can optimize to reach a good result.

Keep the questions above in mind as you read on, and I will do my best to help you understand decision trees.

Prerequisite knowledge for decision trees

Information gain, gain ratio, Gini index

On to the second question: why does the watermelon decision tree test texture first? It must be because texture best helps us distinguish good melons from bad ones.

The purpose of every decision is better classification. If every feature left us with 50% good melons and 50% bad melons, the decision would be meaningless: any watermelon is in one of only two states (good or bad), so from the root node down to a leaf node the uncertainty about the class must keep shrinking, until we can finally say whether the melon is good or bad.

From this we can see that the key to decision tree learning lies in how to select the optimal partitioning attribute.

Next, three classic attribute-selection criteria are introduced:

Information Gain (ID3)

To understand information gain, you need to know what information entropy is.

Information entropy

Definition: information entropy is one of the most commonly used measures of the purity of a sample set. Suppose the proportion of class-k samples in the current sample set D is p_k (k = 1, 2, ..., |y|). Then the information entropy of D is

Ent(D) = -\sum_{k=1}^{|y|} p_k \log_2 p_k

  • p_k is the proportion of samples belonging to class k, and |y| is the number of classes; for binary classification |y| = 2. In the watermelon example the two classes are good melon and bad melon.
  • unit: bits
  • The smaller the value of Ent(D), the higher the purity of D
  • If p = 0, then p log p is taken to be 0 (by convention).
  • The minimum value of Ent(D) is 0 and the maximum value is log_2 |y|.
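As a quick numerical check, here is a minimal sketch of Ent(D) in Python, assuming the watermelon dataset's 17 samples split into 8 good and 9 bad melons, as in the book's example:

import math

def ent(counts):
    # Information entropy of a node, given the per-class sample counts
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Watermelon dataset D: 8 good melons and 9 bad melons out of 17 samples
print(ent([8, 9]))   # ≈ 0.998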

Once information entropy is understood, information gain follows naturally.

Information gain:

Suppose a discrete attribute a has V possible values {a^1, a^2, ..., a^V}. Splitting D on a produces V branch nodes, where the v-th branch node contains all the samples in D whose value on attribute a is a^v; denote this subset D^v.

To put it plainly (using the watermelon example): the attribute color has three values, green (6/17 of the samples), jet black (6/17), and light white (5/17). Compute the information entropy of each of these three subsets, then sum them weighted by those proportions.

The information gain obtained by splitting the sample set D on attribute a is then

Gain(D, a) = Ent(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|} Ent(D^v)

Generally speaking, the greater the information gain, the greater the "purity improvement" obtained by splitting on attribute a. (Put plainly: the larger an attribute's information gain, the more likely we are to split on it first.)
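To make the formula concrete, here is a minimal sketch of Gain(D, a) using the color split just mentioned. The per-branch class counts (green 3 good/3 bad, jet black 4 good/2 bad, light white 1 good/4 bad) are illustrative values taken from the book's example:

import math

def ent(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gain(parent_counts, branch_counts):
    # Gain(D, a) = Ent(D) - sum_v |D^v|/|D| * Ent(D^v)
    n = sum(parent_counts)
    weighted = sum(sum(b) / n * ent(b) for b in branch_counts)
    return ent(parent_counts) - weighted

# Splitting the 17 watermelons by color: green, jet black, light white
branches = [[3, 3], [4, 2], [1, 4]]   # [good, bad] counts in each branch
print(gain([8, 9], branches))         # ≈ 0.109

In the book, texture turns out to have the largest information gain of all the attributes, which is exactly why it sits at the root of the tree.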

But there is a problem:

There is generally a "number" in the data set (you can scroll up to see some screenshots). If you also use it as a candidate division attribute, then its information gain will be greater than other attributes.

Each number (1, 2, 3, ..., 17) identifies exactly one sample, so splitting on "number" produces one branch per sample, and every branch is perfectly pure. The information entropy of each branch is therefore

Ent(D^v) = -(1 \log_2 1 + 0 \log_2 0) = 0

You can verify the calculation yourself (I won't go into details here). From the information gain formula, Gain(D, number) = Ent(D) - 0, which is the largest value any attribute can achieve.

Note: 0 is not in the domain of the logarithm; taking 0 log 0 = 0 is a convention, so don't worry too much about its mathematical meaning.

In other words, information gain is biased toward attributes with many possible values!
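Here is a minimal sketch of that bias, again assuming the 8-good/9-bad split: give every sample its own branch, as the "number" column does, and every branch becomes perfectly pure, so the gain equals Ent(D) itself:

import math

def ent(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Each of the 17 "number" values yields a single-sample, perfectly pure branch,
# so every Ent(D^v) = 0 and Gain(D, number) = Ent(D) - 0 = Ent(D), the maximum possible.
branch_entropies = [ent([1, 0]) for _ in range(8)] + [ent([0, 1]) for _ in range(9)]
print(all(e == 0.0 for e in branch_entropies))   # True
print(ent([8, 9]))                               # ≈ 0.998, i.e. Gain(D, number)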

To correct for this, a new criterion is introduced: the gain ratio.

Gain ratio (C4.5)

Formula:

Gain_ratio(D, a) = \frac{Gain(D, a)}{IV(a)}

IV(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|}

IV(a) is called the intrinsic value of attribute a; the more possible values attribute a has, the larger IV(a) usually is.

From the formula, the gain ratio criterion prefers attributes with fewer possible values.

C4.5 therefore uses a heuristic: from the candidate partitioning attributes, first pick out those whose information gain is above average, and then select the one with the highest gain ratio among them.
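Here is a minimal sketch of IV(a) and the gain ratio, reusing the illustrative color counts from before (branch sizes 6, 6 and 5):

import math

def ent(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def iv(branch_sizes):
    # Intrinsic value IV(a) = -sum_v |D^v|/|D| * log2(|D^v|/|D|)
    n = sum(branch_sizes)
    return -sum(s / n * math.log2(s / n) for s in branch_sizes)

def gain_ratio(parent_counts, branch_counts):
    n = sum(parent_counts)
    g = ent(parent_counts) - sum(sum(b) / n * ent(b) for b in branch_counts)
    return g / iv([sum(b) for b in branch_counts])

branches = [[3, 3], [4, 2], [1, 4]]   # color: green, jet black, light white
print(iv([6, 6, 5]))                  # ≈ 1.580
print(gain_ratio([8, 9], branches))   # ≈ 0.069

Note that a many-valued attribute such as "number" has a huge intrinsic value (17 singleton branches give IV = log2 17 ≈ 4.09), which is exactly what drags its gain ratio back down.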

Gini Index (CART)

Definition of the Gini value: suppose D contains K classes and the probability that a sample belongs to class k is p_k. The Gini value of this distribution is

Gini(D) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2

Interpretation: it is the probability that two samples drawn at random from D carry inconsistent class labels.

Note: The smaller the Gini(D), the higher the purity of the data set D

Given a dataset D, the Gini index of attribute a is defined as

Gini_index(D, a) = \sum_{v=1}^{V} \frac{|D^v|}{|D|} Gini(D^v)

Partition strategy: from the candidate attribute set A, select the attribute that yields the smallest Gini index after splitting as the optimal partitioning attribute.
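A minimal sketch of the Gini value and the Gini index, once more with the illustrative color counts:

def gini(counts):
    # Gini(D) = 1 - sum_k p_k^2
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_index(parent_counts, branch_counts):
    # Gini_index(D, a) = sum_v |D^v|/|D| * Gini(D^v)
    n = sum(parent_counts)
    return sum(sum(b) / n * gini(b) for b in branch_counts)

branches = [[3, 3], [4, 2], [1, 4]]   # color: green, jet black, light white
print(gini([8, 9]))                   # ≈ 0.498
print(gini_index([8, 9], branches))   # ≈ 0.427  (smaller is better)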

At this point many readers will already have the answer to the third question: why does the watermelon decision tree test touch at the second level in one branch, but only at the fourth level in another?

Because the optimal attribute chosen by any of the three criteria is recomputed at every node on the samples that remain there. In the branch where texture is slightly blurry, touch alone is enough to decide the melon's quality at the second level; in another decision sequence, other attributes have to be examined first before touch can accurately settle whether the melon is good or bad.

Pruning

  • Why pruning?

    Combating "overfitting" in decision tree learning algorithms.

  • What are the pruning strategies?

    • Pre-pruning: during decision tree generation, each node is evaluated before it is split. If splitting the current node does not improve the generalization performance of the tree, the split is abandoned and the node is marked as a leaf.
    • Post-pruning: first grow a complete decision tree from the training set, then examine the non-leaf nodes bottom-up. If replacing the subtree rooted at a node with a leaf improves the tree's generalization performance, the subtree is replaced with a leaf node.
  • How to judge the performance after pruning?

    • Hold-out method: Set aside a portion of the data as a "validation set" for performance evaluation.

Let's take a look at how it works in the watermelon book:

First, data preparation: the part above the double horizontal line is the training set, and the part below it is the hold-out validation set.

[Figure: watermelon data split into training and validation sets]

The decision tree generated according to the above figure is as follows:

[Figure: the unpruned decision tree]

Pre-pruning

[Figure: the pre-pruning process]

Looking at this figure you probably get the idea already: we simply run the validation set through the tree and compare its accuracy before and after each split. The steps are:

  • Work from the root toward the leaves.
  • First, tentatively label each child node with a definite class according to the training samples it holds (for example, if the "navel = sunken" branch holds 10 samples, 7 good and 3 bad, label that child node "good melon").
  • Then run the validation set through the tree: if accuracy before the split < accuracy after the split, keep the split; otherwise stop and mark the node as a leaf.
    • The pre-split accuracy at the root node is measured on the validation set by simply predicting the majority class.
    • The pre-split accuracy of a child node is the validation accuracy already achieved at its parent.

Effects:

  • Many branches are not "expanded", which reduces the risk of overfitting, and also reduces the training time overhead and test time overhead of the decision tree.
  • However, a split that brings no immediate improvement may enable later splits that would raise performance considerably; pre-pruning greedily cuts such branches off.

Disadvantage: pre-pruning therefore carries a risk of underfitting.
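sklearn does not expose the book's node-by-node validation check directly; constraints such as max_depth and min_samples_leaf play the role of pre-pruning instead. A minimal sketch (iris data for convenience, hyperparameter values chosen arbitrarily) comparing an unconstrained tree against a constrained one on a held-out validation set:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out part of the data as the validation set
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
limited = DecisionTreeClassifier(max_depth=2, min_samples_leaf=5,   # "pre-pruning"-style constraints
                                 random_state=0).fit(X_train, y_train)

print("unconstrained:", full.score(X_val, y_val))
print("constrained:  ", limited.score(X_val, y_val))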

Post-pruning

The process is as follows:
[Figure: the post-pruning process]

Steps:

  • Work from the leaves toward the root.
  • First, tentatively turn a non-leaf node into a leaf; its label is the majority class of the training samples that reach it along the decision sequence.
  • Then compare validation accuracy before and after pruning: if accuracy after pruning > accuracy before pruning, prune the subtree.
    • The baseline accuracy of the unpruned tree is measured on the validation set.
    • The class label given to a pruned node is determined by the training set (its majority class), while the accuracies being compared are always validation accuracies.

Effects:

  • More branches are preserved than pre-pruning.
  • The risk of underfitting is small, and the generalization ability is better than pre-pruning.
  • The training time overhead is much larger than for unpruned or pre-pruned trees.
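For post-pruning, what sklearn provides is cost-complexity pruning (the ccp_alpha parameter) rather than the book's accuracy-driven bottom-up replacement, but the workflow is analogous: grow the full tree, generate pruned candidates, and keep whichever scores best on the validation set. A sketch under those assumptions:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate pruning strengths from the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)    # guard against tiny negative values from floating point
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_val, y_val)  # validation accuracy of this pruned candidate
    if score > best_score:
        best_alpha, best_score = alpha, score

print("best ccp_alpha:", best_alpha, "validation accuracy:", best_score)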

The code

Pseudocode (basic algorithm)

Input:  training set D = {(x1, y1), (x2, y2), ..., (xm, ym)};
        attribute set A = {a1, a2, ..., ad}
Process: function TreeGenerate(D, A)
 generate node;
 if all samples in D belong to the same class C then
     mark node as a class-C leaf node; return
 end if
 if A is empty OR all samples in D take the same values on A then
     mark node as a leaf node labeled with the majority class in D; return
 end if
 select the optimal partitioning attribute a* from A;
 for each value a*' of a* do
     generate a branch for node; let Dv be the subset of samples in D whose value on a* is a*';
     if Dv is empty then
         mark the branch node as a leaf node labeled with the majority class in D; return
     else
         use TreeGenerate(Dv, A \ {a*}) as the branch node
     end if
 end for
Output: a decision tree rooted at node

As the pseudocode shows, building a decision tree is a recursive process, and the recursion returns in three situations:

  • 1. All samples at the current node belong to the same class; no further splitting is needed.
  • 2. The current attribute set is empty, or all samples take the same values on every attribute; the node cannot be split.
  • 3. The sample set at the current node is empty; the node cannot be split.

Using sklearn to implement a decision tree - taking the iris data set as an example

# Decision tree classifier
sklearn.tree.DecisionTreeClassifier(criterion=, max_depth=, random_state=)
# criterion: "gini" (the Gini index, the default) or "entropy" (information entropy)
# max_depth: maximum depth of the tree
# random_state: random seed
# Decision tree visualization: export the tree in DOT format
sklearn.tree.export_graphviz(estimator, out_file='*.dot', feature_names=[","])
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from six import StringIO
from IPython.display import Image   # for displaying the generated image in a notebook
import pydotplus

def decision_iris():
    iris = load_iris()
    x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=22)
    estimator = DecisionTreeClassifier(criterion="entropy")
    estimator.fit(x_train, y_train)

    y_predict = estimator.predict(x_test)
    print("y_predict:\n", y_predict)
    print("Compare true values with predictions:\n", y_test == y_predict)

    score = estimator.score(x_test, y_test)
    print("Accuracy:\n", score)

    # Visualization
    dot_data = StringIO()
    export_graphviz(estimator, out_file=dot_data)
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    # Save the result as a PNG image and a PDF file
    graph.write_png('diabetes.png')
    graph.write_pdf('diabetes.pdf')
    # Display it
    Image(graph.create_png())
    return None

decision_iris()

This generates a .dot file, converts it to a PNG image, and also writes a PDF file.

The generated png file is:

[Figure: the exported decision tree PNG]

Note: to generate the PDF and PNG files you must install Graphviz itself (e.g. windows_10_cmake_Release_graphviz-install-7.0.1-win64). Don't rely on pip or conda alone, because the Graphviz binaries still have to be installed and put on your PATH, so that route by itself won't work. Download link: the Graphviz download page.

During installation it is easiest to check the option that adds Graphviz to your PATH; I won't go into the installation process in more detail.

After installing, if you still get errors, first check that the environment variable (PATH) is set correctly. You can also run the conversion yourself from cmd: to convert a dot file to a PNG image, run dot -Tpng *.dot -o *.png.

The underlying implementation of the code

Through the book "Machine Learning in Practice" and referring to other people's code, a decision tree code was carried out.

The dataset is built directly into the code: when I read it in from files I kept getting "list indices must be integers or slices, not str" errors, so hard-coding the data avoids that.

The dataset has three features.

Dataset explanation:

The three features are 'temperature' (3 = hot, 2 = moderate, 1 = cold), 'number of people staying in bed', and 'hungry or not' (1 = hungry, 0 = not hungry); the goal is to predict whether or not you will stay in bed!

import numpy as np
import operator
import math

# Split the dataset on a given feature
# dataSet: dataset to split   axis: index of the splitting feature   value: feature value to keep
def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:                            # iterate over the samples
        if featVec[axis] == value:                     # keep matching samples, minus the used feature
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet

# Compute the information entropy (Shannon entropy) of a dataset
def calcShannonEnt(dataSet):
    numEntries = len(dataSet)                          # number of rows in the dataset
    labelCounts = {}                                   # dictionary counting occurrences of each label
    for featVec in dataSet:
        currentLabel = featVec[-1]                     # extract the label
        if currentLabel not in labelCounts.keys():     # add the label if it is not in the dictionary yet
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1                 # count the label
    shannonEnt = 0.0                                   # initialize the entropy
    for key in labelCounts:
        prob = float(labelCounts[key])/numEntries      # probability of this label
        shannonEnt -= prob*math.log(prob, 2)           # accumulate according to the entropy formula
    return shannonEnt                                  # return the empirical entropy
# Choose the best feature to split on (largest information gain)
def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1       # number of features
    baseEntropy = calcShannonEnt(dataSet)   # Shannon entropy of the whole dataset
    bestInfoGain = 0.0                      # best information gain so far
    bestFeature = -1                        # index of the best feature
    for i in range(numFeatures):            # iterate over all features
        featList = [example[i] for example in dataSet]          # all values of the i-th feature
        uniqueVals = set(featList)                              # unique feature values
        newEntropy = 0.0                                        # weighted entropy after the split
        for value in uniqueVals:                                # loop over the feature's values
            subDataSet = splitDataSet(dataSet, i, value)        # subset after splitting on this value
            prob = len(subDataSet) / float(len(dataSet))        # proportion of the subset
            newEntropy += prob * calcShannonEnt(subDataSet)
        infoGain = baseEntropy - newEntropy                     # information gain of this feature
        print("Information gain of feature %d is %.3f" % (i, infoGain))
        if (infoGain > bestInfoGain):                           # keep the largest information gain
            bestInfoGain = infoGain
            bestFeature = i                                     # record the index of the best feature
    return bestFeature                                          # return the index of the feature with the largest gain
# Return the class label that occurs most often
def majorityCnt(classList):
    classCount = {}  # count occurrences of each class label in classList
    for vote in classList:
        if vote not in classCount.keys():
            classCount[vote] = 0
        classCount[vote] += 1
    # sort by count in descending order (dict.iteritems() is Python 2 only; use items() in Python 3)
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]  # return the most frequent class label
# Build the decision tree recursively
def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]           # class labels of the samples
    if classList.count(classList[0]) == len(classList):        # stop if all samples share one class
        return classList[0]
    if len(dataSet[0]) == 1:                                   # all features used: return the majority class
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)               # choose the optimal feature
    bestFeatLabel = labels[bestFeat]                           # label of the optimal feature
    myTree = {bestFeatLabel: {}}                               # grow the tree from the optimal feature
    del(labels[bestFeat])                                      # remove the used feature label
    featValues = [example[bestFeat] for example in dataSet]    # all values of the optimal feature in the training set
    uniqueVals = set(featValues)                               # drop duplicate values
    for value in uniqueVals:                                   # build a subtree for each value
        subLabels = labels[:]                                  # copy the labels so recursion does not corrupt them
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree
import matplotlib
import matplotlib.pyplot as plt
 
# Define the box and arrow styles
decisionNode = dict(boxstyle="square", fc="0.8")  # boxstyle is the box shape, fc="0.8" the fill shade
leafNode = dict(boxstyle="round4", fc="0.8")      # leaf node style
arrow_args = dict(arrowstyle="<-")                # arrow style
 
# Draw an annotated node with an arrow pointing to it
def plotNode(nodeTxt, centerPt, parentPt, nodeType):
    # createPlot.ax1 is an attribute attached to the createPlot function
    createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction',xytext=centerPt,
                            textcoords='axes fraction',va="center", ha="center", bbox=nodeType, arrowprops=arrow_args)
 
# Count the leaf nodes of the tree
def getNumLeafs(myTree):
    numLeafs = 0                                      # initialize
    firstStr = list(myTree.keys())[0]                 # first key (the root of this subtree)
    secondDict = myTree[firstStr]                     # its value (the branches)
    for key in secondDict.keys():
        if type(secondDict[key]).__name__ == 'dict':  # a dict means an internal node
            numLeafs += getNumLeafs(secondDict[key])  # recurse
        else:
            numLeafs += 1
    return numLeafs
 
# Compute the depth of the tree
def getTreeDepth(myTree):
    maxDepth = 0                                           # initialize
    firstStr = list(myTree.keys())[0]                      # first key (the root of this subtree)
    secondDict = myTree[firstStr]                          # its value (the branches)
    for key in secondDict.keys():
        if type(secondDict[key]).__name__ == 'dict':       # a dict means an internal node
            thisDepth = 1 + getTreeDepth(secondDict[key])  # recurse
        else:
            thisDepth = 1
        if thisDepth > maxDepth:
            maxDepth = thisDepth
    return maxDepth
 
# Write text on the edge between a parent and a child node
def plotMidText(cntrPt, parentPt, txtString):
    xMid = (parentPt[0] - cntrPt[0]) / 2.0 + cntrPt[0]
    yMid = (parentPt[1] - cntrPt[1]) / 2.0 + cntrPt[1]
    createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center", rotation=30)
 
# Plot the tree
def plotTree(myTree, parentPt, nodeTxt):
    numLeafs = getNumLeafs(myTree)        # width of this subtree (number of leaves)
    depth = getTreeDepth(myTree)          # depth of this subtree
    firstStr = list(myTree.keys())[0]     # text label of this node
    # plotTree.totalW and plotTree.yOff are globals that track what has been drawn and where to place the next node
    cntrPt = (plotTree.xOff + (1.0 + float(numLeafs)) / 2.0 / plotTree.totalW, plotTree.yOff)
    plotMidText(cntrPt, parentPt, nodeTxt)                # annotate the branch leading to this node
    plotNode(firstStr, cntrPt, parentPt, decisionNode)
    secondDict = myTree[firstStr]
    plotTree.yOff = plotTree.yOff - 1.0 / plotTree.totalD  # move down one level
    for key in secondDict.keys():
        if type(secondDict[key]).__name__ == 'dict':
            plotTree(secondDict[key], cntrPt, str(key))
        else:
            plotTree.xOff = plotTree.xOff + 1.0 / plotTree.totalW
            plotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode)
            plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key))
    plotTree.yOff = plotTree.yOff + 1.0 / plotTree.totalD
 
# Draw the decision tree
def createPlot(inTree):
    fig = plt.figure(1, facecolor='white')                   # create a new figure
    fig.clf()                                                # clear the drawing area
    font = {'family': 'MicroSoft YaHei'}                     # font that can display the Chinese feature labels
    matplotlib.rc("font", **font)
    axprops = dict(xticks=[], yticks=[])
    createPlot.ax1 = plt.subplot(111, frameon=False, **axprops)
    plotTree.totalW = float(getNumLeafs(inTree))
    plotTree.totalD = float(getTreeDepth(inTree))
    plotTree.xOff = -0.5 / plotTree.totalW
    plotTree.yOff = 1.0
    plotTree(inTree, (0.5, 1.0), '')
    plt.show()
def createDataSet():
    dataSet = [[3, 2, 1,'yes'],                        # dataset
            [3, 1, 1, 'yes'],
            [3, 0, 1, 'no'],
            [3, 2, 0, 'yes'],
            [3, 1, 0, 'yes'],
            [2, 2, 1, 'yes'],
            [2, 2, 0, 'yes'],
            [2, 1, 0, 'no'],
            [2, 1, 1, 'yes'],
            [2, 0, 1, 'no'],
            [2, 0, 0, 'no'],
            [1, 0, 0, 'no'],
            [1, 1, 0, 'no'],
            [1, 0, 1, 'no'],
            [2, 3, 1, 'no'],
            [3, 3, 0, 'yes'],
            [1, 2, 0, 'yes'],
            [1, 2, 1, 'yes'],]
 
    # Temperature: 3 hot, 2 moderate, 1 cold; hunger: 1 hungry, 0 not hungry
    labels = ['气温', '赖床人数', '饿不饿']            # feature labels: temperature, number of people staying in bed, hungry or not
    return dataSet, labels                             # return the dataset and the feature labels
 
 
if __name__ == '__main__':
    dataSet, labels = createDataSet()
    myTree = createTree(dataSet, labels)
    print(myTree)
    createPlot(myTree)

The resulting decision tree:

[Figure: the resulting decision tree plot]

Error resolution:

Error: 'dict' object has no attribute 'iteritems'
Cause: dict.iteritems() only exists in Python 2; in Python 3 use classCount.items() instead. Separately, check the dataset for duplicated rows with identical features but different labels; those need to be fixed in the dataset itself.

Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
Fix: when reading the CSV, set encoding="unicode_escape".
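For example, a minimal sketch assuming the data is read with pandas (the file name is hypothetical):

import pandas as pd

# Hypothetical file name; the key point is the encoding argument
df = pd.read_csv("bed_dataset.csv", encoding="unicode_escape")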

Summary

Advantages and disadvantages

Advantages:

  • The result can be visualized, so the decision-making process is easy to see intuitively.
  • Computational complexity is not high, the output is easy to understand, it is insensitive to missing intermediate values, and it can handle irrelevant feature data.

Disadvantages:

  • The tree should not be built too complex, otherwise it easily overfits.

The decision tree is a machine learning algorithm that comes from everyday life and goes back to it. Everyone carries a decision tree of their own, assigning weights to different conditions by their own criteria; whichever side carries more weight is the choice we lean toward.


Origin blog.csdn.net/weixin_51961968/article/details/127839889