Comparison of the ID3, C4.5, and CART decision tree algorithms

(1) Formulas:
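
For reference, the split criteria compared in this post are usually written as follows. The notation is an assumption and is not taken from the original: D is the set of records at a node, p_k is the fraction of records in D that belong to class k, and D^v is the subset of D in which attribute a takes its v-th value.

```latex
% Entropy and information gain (ID3)
H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k
\qquad
\operatorname{Gain}(D, a) = H(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\, H(D^v)

% Information gain ratio (C4.5)
\operatorname{GainRatio}(D, a) = \frac{\operatorname{Gain}(D, a)}{\operatorname{IV}(a)},
\qquad
\operatorname{IV}(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|}

% Gini index (CART)
\operatorname{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^2,
\qquad
\operatorname{Gini}(D, a) = \sum_{v=1}^{V} \frac{|D^v|}{|D|}\, \operatorname{Gini}(D^v)
```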

 

(2) ID3 algorithm

Disadvantages:

  • The ID3 algorithm uses information gain as the criterion for choosing the split attribute at the root node and at each internal node. Information gain is biased toward attributes with many distinct values, and such attributes (a record ID, for example) often carry little useful information.
  • The ID3 algorithm can only build a decision tree for data sets whose descriptive attributes are all discrete.
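
The bias of information gain toward many-valued attributes can be illustrated with a minimal sketch. The toy data and the names `record_id` and `weather` are hypothetical and chosen only for illustration; they are not from the original post.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the size-weighted entropy after splitting on `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: six records, two classes.
labels = ["yes", "yes", "no", "no", "yes", "no"]
record_id = [1, 2, 3, 4, 5, 6]  # unique per record, useless for prediction
weather = ["sun", "sun", "rain", "rain", "sun", "sun"]

gain_id = information_gain(record_id, labels)      # every group is a single record
gain_weather = information_gain(weather, labels)   # two groups, one of them mixed

print(f"gain(record_id) = {gain_id:.3f}")
print(f"gain(weather)   = {gain_weather:.3f}")
```

Because every `record_id` group holds exactly one record, each group is trivially pure and the attribute receives the maximum possible gain, even though it cannot generalize to unseen records.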

(3) C4.5 algorithm

Improvements over ID3 (why C4.5 is preferred):

  • Uses the information gain ratio to select attributes
  • Can handle continuous numeric attributes
  • Uses a post-pruning method
  • Can handle missing values
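
The first improvement can be sketched in a few lines. The gain ratio divides the information gain by the entropy of the partition itself, so an attribute that shatters the data into many small groups is penalized. The toy data and the names `record_id` and `weather` are hypothetical; this is an illustration of the criterion, not the full C4.5 selection procedure.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete items."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: six records, two classes.
labels = ["yes", "yes", "no", "no", "yes", "no"]
record_id = [1, 2, 3, 4, 5, 6]
weather = ["sun", "sun", "rain", "rain", "sun", "sun"]

ratio_id = gain_ratio(record_id, labels)
ratio_weather = gain_ratio(weather, labels)

print(f"gain ratio(record_id) = {ratio_id:.3f}")
print(f"gain ratio(weather)   = {ratio_weather:.3f}")
```

On this data the many-valued `record_id` attribute has the higher raw information gain, but its large split information pushes its gain ratio below that of `weather`.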

Advantages and disadvantages of the C4.5 algorithm

Advantages:

  • The generated classification rules are easy to understand, and classification accuracy is high.

Disadvantages:

  • While the tree is being built, the data set must be scanned and sorted multiple times, which makes the algorithm inefficient.
  • C4.5 is only suitable for data sets that fit in memory; the program cannot run when the training set is too large to fit.

(4) CART algorithm

Compared with the classification approach of C4.5, the CART algorithm uses a simplified binary tree model, and its feature selection uses the Gini index, which approximates the entropy-based criteria without computing logarithms and so simplifies the calculation. A C4.5 tree is not necessarily binary, but a CART tree is always binary.
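
As a minimal sketch of why the Gini index is cheaper to compute, the snippet below evaluates it next to entropy on three hypothetical class distributions. The sample lists are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (no logarithm)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits, shown only for comparison."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


# Hypothetical class distributions at a node.
pure = ["a", "a", "a", "a"]
skewed = ["a", "a", "a", "b"]
mixed = ["a", "a", "b", "b"]

for name, sample in [("pure", pure), ("skewed", skewed), ("mixed", mixed)]:
    print(f"{name:7s} gini={gini(sample):.3f} entropy={entropy(sample):.3f}")
```

Both measures are zero for a pure node and largest for an evenly mixed node, so they rank these three nodes in the same order.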

(5) How to evaluate the quality of the split point

If a split point divides the records at the current node into two groups so that each group is "pure", that is, most of the records in each group belong to the same class, then it is a good split point.
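
One common way to put a number on this idea is the size-weighted Gini impurity of the two groups, where lower is better. The sketch below assumes a single numeric attribute and tries the midpoints between adjacent values as candidate split points. The data and the names `ages` and `buys` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_impurity(xs, ys, threshold):
    """Size-weighted Gini impurity of the groups x <= threshold and x > threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    total = len(ys)
    return sum(len(part) / total * gini(part) for part in (left, right) if part)


def best_threshold(xs, ys):
    """Return the midpoint between adjacent distinct values with the lowest impurity."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: split_impurity(xs, ys, t))


# Hypothetical toy data: one numeric attribute and a binary class.
ages = [22, 25, 30, 35, 40, 45]
buys = ["no", "no", "no", "yes", "yes", "yes"]

good = best_threshold(ages, buys)
print("best split point:", good)
print("impurity at best split:", split_impurity(ages, buys, good))
print("impurity at 23.5:", round(split_impurity(ages, buys, 23.5), 3))
```

The split at 32.5 yields two pure groups and an impurity of zero, while the split at 23.5 leaves one group mixed and scores worse.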

 

 

 


Origin blog.csdn.net/qq_39197555/article/details/115321798