Differences between the ID3, C4.5, and CART decision trees

ID3 splits the samples on the attribute with the largest information gain. This way of choosing a split has a serious drawback: when an attribute takes a large number of distinct values, each value may correspond to only one or a few samples. Such a split yields a very high information gain, so ID3 considers the attribute well suited for classification, yet the resulting model generalizes poorly. This is the bias toward multi-valued attributes. For that reason C4.5 does not use information gain as the splitting criterion and uses the information gain ratio instead. The gain ratio reduces the problem but still does not solve it completely. CART was then introduced, which uses the Gini index (often called the Gini coefficient) as the criterion for splitting a node.
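The three criteria can be compared on a small example. The following is a minimal sketch, not taken from the original post: the toy data (`labels`, `sample_id`, `group`) and all function names are made up for illustration, and the Gini function here scores a split by the weighted impurity of its children, which is only one simplified reading of how CART evaluates a split (CART itself builds binary splits).

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def partition(values, labels):
    """Group the labels by the attribute value of each sample."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):
    """ID3 criterion: parent entropy minus weighted child entropy."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy(values)  # entropy of the attribute's own values
    if split_info == 0:
        return 0.0
    return information_gain(values, labels) / split_info


def gini(labels):
    """Gini impurity of a set of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(values, labels):
    """Weighted Gini impurity of the children produced by a split."""
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in partition(values, labels))


# Hypothetical toy data: 8 samples, 4 positive and 4 negative.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
sample_id = list(range(8))                        # unique value per sample
group = ["a", "a", "a", "a", "b", "b", "b", "b"]  # two-valued attribute

for name, attr in [("sample_id", sample_id), ("group", group)]:
    print(name,
          "gain=%.3f" % information_gain(attr, labels),
          "ratio=%.3f" % gain_ratio(attr, labels),
          "gini=%.3f" % weighted_gini(attr, labels))
```

On this toy data the unique `sample_id` attribute gets an information gain of 1.0 against roughly 0.19 for `group`, so ID3 would pick it even though it cannot generalize. With the gain ratio the gap shrinks to about 0.33 against 0.19, which matches the point above that C4.5 improves on the problem without removing it.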

Origin www.cnblogs.com/pacino12134/p/11221774.html