Knowledge points
Decision tree
Logical perspective: a combination of if-else statements
Geometric perspective: a partition of the feature space according to some criterion
Ultimate goal: divide the samples into increasingly pure subsets
The key to decision-tree learning is how to choose the optimal splitting attribute. Generally speaking, as the splitting process continues, we want the samples contained in each branch node to belong to the same class as far as possible, i.e. the "purity" of the nodes should keep increasing.
The purpose of decision-tree learning is to produce a decision tree with strong generalization ability, i.e. a strong ability to handle unseen examples. Its basic procedure follows a simple and intuitive "divide and conquer" strategy.
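The divide-and-conquer recursion above can be sketched as follows. This is a minimal illustration, assuming samples are (feature-dict, label) pairs and that an attribute-selection criterion is supplied by the caller; the names `tree_generate` and `choose` are illustrative, not from the book.

```python
from collections import Counter

def tree_generate(samples, attributes, choose):
    """samples: list of (features_dict, label); choose(samples, attributes) picks an attribute."""
    labels = [y for _, y in samples]
    # Stopping case 1: all samples belong to the same class -> leaf node.
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping case 2: no attributes left to split on -> majority-vote leaf.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Divide: split on the chosen attribute, then conquer each branch recursively.
    best = choose(samples, attributes)
    node = {best: {}}
    for v in sorted({x[best] for x, _ in samples}):
        subset = [(x, y) for x, y in samples if x[best] == v]
        remaining = [a for a in attributes if a != best]
        node[best][v] = tree_generate(subset, remaining, choose)
    return node

# Tiny usage example with a trivial criterion (always split on the first attribute):
data = [({"texture": "clear", "navel": "sunken"}, "good"),
        ({"texture": "clear", "navel": "flat"}, "bad"),
        ({"texture": "blurry", "navel": "flat"}, "bad")]
tree = tree_generate(data, ["texture", "navel"], lambda s, a: a[0])
```

Plugging in a criterion based on information gain, gain ratio, or the Gini index yields ID3-, C4.5-, or CART-style trees respectively.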
ID3 decision tree
Self-information (the information carried by a single outcome of a random variable): $I(x) = -\log_2 p(x)$
Information entropy (the expected self-information of sample set $D$, where $p_k$ is the proportion of class $k$): $\mathrm{Ent}(D) = -\sum_{k=1}^{|\mathcal{Y}|} p_k \log_2 p_k$
Conditional entropy (the entropy of $D$ after splitting on attribute $a$ with values $a^1, \dots, a^V$): $\sum_{v=1}^{V} \frac{|D^v|}{|D|} \mathrm{Ent}(D^v)$
Information gain: $\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|} \mathrm{Ent}(D^v)$
Generally speaking, the greater the information gain, the greater the "purity improvement" obtained by splitting on the attribute. We can therefore use information gain to select the splitting attribute of a decision tree; this is the criterion used by ID3.
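The entropy and information-gain formulas can be checked on a toy dataset. This is a sketch with made-up data, not the book's watermelon table; `info_gain` takes one attribute's values column and the label column.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Ent(D) = -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Gain(D, a) = Ent(D) - sum_v |D^v|/|D| * Ent(D^v)."""
    n = len(labels)
    cond = 0.0
    for v in set(values):
        sub = [y for x, y in zip(values, labels) if x == v]
        cond += len(sub) / n * entropy(sub)  # weighted conditional entropy
    return entropy(labels) - cond

labels = ["good", "good", "bad", "bad"]
texture = ["clear", "clear", "blurry", "blurry"]  # perfectly separates the classes
print(info_gain(texture, labels))  # 1.0: conditional entropy drops to zero
```

An attribute uncorrelated with the labels would instead leave the conditional entropy unchanged and give a gain of 0.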
C4.5 decision tree
In fact, the information gain criterion has a preference for attributes with many possible values. For example, if the sample ID is used as a candidate splitting attribute, its information gain reaches 0.998. To reduce the possible adverse effect of this preference, the C4.5 decision tree uses the gain ratio to select the optimal splitting attribute: $\mathrm{Gain\_ratio}(D, a) = \frac{\mathrm{Gain}(D, a)}{\mathrm{IV}(a)}$, where the intrinsic value $\mathrm{IV}(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|}$ grows with the number of possible values of $a$.
However, the gain-ratio criterion in turn prefers attributes with fewer possible values. C4.5 therefore does not simply pick the attribute with the highest gain ratio; it uses a heuristic: first keep the candidate attributes whose information gain is above average, then among those choose the one with the highest gain ratio.
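The penalizing effect of the intrinsic value can be seen by comparing an ID-like attribute with an ordinary one. A sketch on made-up data (the gain of 1.0 here is for this toy set, not the book's 0.998):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def intrinsic_value(values):
    """IV(a) = -sum_v |D^v|/|D| * log2(|D^v|/|D|); large when a has many values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(values, labels):
    """Gain_ratio(D, a) = Gain(D, a) / IV(a)."""
    n = len(labels)
    cond = 0.0
    for v in set(values):
        sub = [y for x, y in zip(values, labels) if x == v]
        cond += len(sub) / n * entropy(sub)
    return (entropy(labels) - cond) / intrinsic_value(values)

ids = ["1", "2", "3", "4"]  # one distinct value per sample, like an ID column
texture = ["clear", "clear", "blurry", "blurry"]
labels = ["good", "good", "bad", "bad"]
# Both attributes have information gain 1.0, but IV penalizes the many-valued ID:
print(gain_ratio(ids, labels), gain_ratio(texture, labels))  # 0.5 1.0
```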
CART decision tree
The CART decision tree uses the "Gini index" to select splitting attributes. The Gini value of sample set $D$ is $\mathrm{Gini}(D) = 1 - \sum_{k=1}^{|\mathcal{Y}|} p_k^2$, and the Gini index of attribute $a$ is $\mathrm{Gini\_index}(D, a) = \sum_{v=1}^{V} \frac{|D^v|}{|D|} \mathrm{Gini}(D^v)$.
Note: when a CART decision tree is actually constructed, the optimal splitting attribute is not selected strictly by this formula, mainly because CART builds a binary tree. The multi-way formula above cannot determine the optimal split point for the chosen attribute, so CART instead evaluates each candidate (attribute, split point) pair and keeps the one whose two branches have the lowest weighted Gini value.
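The binary split search can be sketched as follows, assuming a discrete attribute split into the two branches {a = v} and {a ≠ v}; the function names are illustrative and the data is made up.

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_index_binary(values, labels, split_value):
    """Weighted Gini of the two branches {a == v} and {a != v} of a binary split."""
    n = len(labels)
    left = [y for x, y in zip(values, labels) if x == split_value]
    right = [y for x, y in zip(values, labels) if x != split_value]
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_binary_split(values, labels):
    """Scan every candidate split point and keep the one with the lowest Gini index."""
    return min(set(values), key=lambda v: gini_index_binary(values, labels, v))

color = ["green", "green", "black", "white"]
labels = ["good", "good", "bad", "bad"]
print(best_binary_split(color, labels))  # green: {green} vs {black, white} is pure
```

A full CART builder would run this search over every attribute at every node and split on the overall minimizer.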
The construction process of a CART decision tree (watermelon-book example):