02-22 C4.5 decision tree algorithm

More and newer "machine learning" articles are being added to the site, along with more on Python, Go, data structures and algorithms, web crawlers, and artificial intelligence: https://www.cnblogs.com/nickchen121/

C4.5 decision tree algorithm

To fix the shortcomings of the ID3 decision tree algorithm, Quinlan, the author of ID3, improved the ID3 decision tree algorithm based on those very shortcomings. Some may wonder: since the original algorithm is called ID3, why isn't the improved version called ID4 or ID5? Because decision trees were so popular at the time, the names ID4 and ID5 had already been used by other follow-up work, so the improved version of ID3 took a different naming path and was called C4; a later upgrade of C4 gave us today's C4.5 algorithm.

I. C4.5 decision tree algorithm learning objectives

  1. How the C4.5 algorithm discretizes continuous feature values
  2. Information gain ratio
  3. How the C4.5 algorithm weights missing feature values
  4. Steps of the C4.5 decision tree algorithm
  5. Advantages and disadvantages of the C4.5 decision tree algorithm

II. C4.5 decision tree algorithm in detail

The previous article on the ID3 decision tree algorithm listed four shortcomings, and the author improved the algorithm precisely around these four shortcomings, which gives us today's C4.5 algorithm.

Suppose we have a training set \(D\) and a feature set \(A\); the training set contains \(m\) samples, and each sample has \(n\) features. With this training set in hand, let us walk through the improvements the C4.5 algorithm makes.

2.1 Discretization of continuous feature values

The first shortcoming of ID3: it does not handle continuous feature values.

Suppose a feature \(F\) takes continuous values; sorting them in ascending order gives \(f_1, f_2, \ldots, f_m\). The C4.5 algorithm takes the average of every pair of adjacent values \(f_j, f_{j+1}\), which gives a total of \(m-1\) candidate split points, the \(j\)-th of which can be written as
\[s_j = \frac{f_j + f_{j+1}}{2}\]
For each of these \(m-1\) split points, compute the information gain ratio of the binary split it induces, and select the point with the largest information gain ratio as the threshold for discretizing this continuous feature, denoted \(F_t\). Samples whose value of the feature is smaller than \(F_t\) are assigned to class \(c_1\), and samples whose value is larger than \(F_t\) are assigned to class \(c_2\), thereby discretizing the continuous feature values.
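As a rough illustration of this split-point search, here is a minimal Python sketch (not the blog's own code; helper names such as `entropy` and `best_threshold` are made up for the example) that enumerates the \(m-1\) midpoints and keeps the one with the largest information gain ratio:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_threshold(values, labels):
    """Pick the binary split point F_t with the largest information gain ratio."""
    values, labels = np.asarray(values), np.asarray(labels)
    v = np.sort(np.unique(values))
    candidates = (v[:-1] + v[1:]) / 2.0          # the m-1 candidate points s_j
    base = entropy(labels)                       # H(D)
    best_s, best_ratio = None, -np.inf
    for s in candidates:
        left, right = labels[values <= s], labels[values > s]
        p_left = len(left) / len(labels)
        gain = base - p_left * entropy(left) - (1 - p_left) * entropy(right)
        split_info = entropy(values <= s)        # feature entropy of the binary split
        if split_info == 0:
            continue
        ratio = gain / split_info
        if ratio > best_ratio:
            best_s, best_ratio = s, ratio
    return best_s, best_ratio

x = np.array([1.0, 1.2, 2.8, 3.0, 3.5])
y = np.array([0, 0, 1, 1, 1])
print(best_threshold(x, y))   # picks the midpoint 2.0 with gain ratio 1.0
```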

2.2 Information gain ratio

The second shortcoming of ID3: using information gain as the criterion for splitting the training data tends to favor features with many distinct values.

Since information gain as a splitting criterion tends to favor features with more values, the information gain ratio can be used instead as the criterion for splitting a node. The concept of the information gain ratio was already introduced in the article "Entropy and Information Gain", so here we only give the formula
\[g_R(D, A) = \frac{g(D, A)}{H_A(D)}\]
Because a feature with more values has a larger feature entropy \(H_A(D)\), its information gain ratio \(g_R(D, A)\) becomes smaller, which corrects the tendency of information gain to favor features with many values.
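To make the formula concrete, a small sketch (with assumed helper names, not code from the original article) that computes \(g_R(D, A)\) for one discrete feature column might look like this:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain_ratio(feature, labels):
    """g_R(D, A) = g(D, A) / H_A(D) for one discrete feature column."""
    feature, labels = np.asarray(feature), np.asarray(labels)
    h_d = entropy(labels)                                  # H(D)
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    h_d_given_a = sum(w * entropy(labels[feature == v])    # H(D|A)
                      for v, w in zip(values, weights))
    gain = h_d - h_d_given_a                               # g(D, A)
    h_a = entropy(feature)                                 # feature entropy H_A(D)
    return gain / h_a if h_a > 0 else 0.0

# A feature with many distinct values gets a large H_A(D), which shrinks its ratio,
# even though both features below achieve the same information gain.
y = np.array([0, 0, 1, 1, 1, 1])
print(info_gain_ratio(np.array(["a", "a", "b", "b", "b", "b"]), y))  # ≈ 1.0
print(info_gain_ratio(np.array([1, 2, 3, 4, 5, 6]), y))              # ≈ 0.36
```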

2.3 Pruning

The third shortcoming of ID3: it does not address overfitting.

Pruning is the method commonly used by decision trees to combat overfitting; the specific ideas behind pruning will be covered in detail in the "CART tree" article.

2.4 Weighting of missing feature values

The fourth shortcoming of ID3: it does not handle features with missing values.

Suppose a feature \(F\) has two feature values \(f_1, f_2\), and a sample \(D_i\) is missing its value for \(F\). Before the split, \(D_i\)'s weight on each of feature \(F\)'s values, i.e., on \(f_1\) and \(f_2\), is 1. Suppose the numbers of samples with no missing value that take these \(2\) feature values are \(3\) and \(5\) respectively. When the samples are now re-partitioned by the feature values \(f_1, f_2\), the weight of sample \(D_i\) on \(f_1\) is adjusted to \(\frac{3}{8}\) and its weight on \(f_2\) is adjusted to \(\frac{5}{8}\); that is, sample \(D_i\)'s value for feature \(F\) becomes \(\frac{3}{8} * f_1\) and \(\frac{5}{8} * f_2\).

When the information gain ratio of feature \(F\) is computed for sample \(D_i\), the weighted values \(\frac{3}{8} * f_1\) and \(\frac{5}{8} * f_2\) are used in the calculation.
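A minimal sketch of this weighting, assuming the counts \(3\) and \(5\) from the example above (illustrative only, not the article's code):

```python
# Hypothetical illustration of how a C4.5-style tree spreads a sample D_i with a
# missing value for feature F over the branches f_1 and f_2.
non_missing_counts = {"f_1": 3, "f_2": 5}          # samples whose value of F is known
total = sum(non_missing_counts.values())           # 8

# D_i enters every branch, carrying a fractional weight instead of weight 1.
weights = {v: c / total for v, c in non_missing_counts.items()}
print(weights)   # {'f_1': 0.375, 'f_2': 0.625}, i.e. 3/8 and 5/8

# These fractional weights (3/8 of a sample in the f_1 branch, 5/8 in the f_2
# branch) are then used when the information gain ratio of F is computed.
```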

III. C4.5 decision tree algorithm flow

3.1 Input

Suppose we have a training set \(D\), a feature set \(A\), and a threshold \(\epsilon\).

3.2 Output

A C4.5 decision tree \(T\).

3.3 Process

  1. Initialize the information gain ratio threshold \(\epsilon\)
  2. If all samples in \(D\) belong to the same class \(C_k\), return the single-node tree \(T\) labeled with class \(C_k\)
  3. If \(A\) is the empty set, return the single-node tree \(T\) labeled with the class \(C_k\) that has the most samples in \(D\)
  4. Compute the information gain ratio of each feature in \(A\) with respect to \(D\), and select the feature \(A_g\) with the largest information gain ratio
  5. If the information gain ratio of \(A_g\) is less than the threshold \(\epsilon\), return the single-node tree \(T\) labeled with the class \(C_k\) that has the most samples in \(D\)
  6. If the information gain ratio of \(A_g\) is greater than the threshold \(\epsilon\), split \(D\) into several subsets \(D_i\) according to the different values \(A_{g_i}\) of feature \(A_g\), generate a child node for each subset with the corresponding feature value \(A_{g_i}\), and recursively call steps \(2\)-\(6\) on each child node to obtain the subtrees \(T_i\), which are then returned (a minimal sketch of this recursion is given after the list)
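The steps above can be condensed into a short recursive sketch. This is not the blog's implementation: it assumes a NumPy matrix of purely discrete feature values, ignores continuous values, missing values and pruning, and the helper names are made up for the example.

```python
import numpy as np
from collections import Counter

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(X, y, f):
    """Information gain ratio of discrete feature column f with respect to y."""
    h_d = entropy(y)
    values, counts = np.unique(X[:, f], return_counts=True)
    w = counts / counts.sum()
    h_d_given_a = sum(wi * entropy(y[X[:, f] == v]) for v, wi in zip(values, w))
    h_a = entropy(X[:, f])
    return (h_d - h_d_given_a) / h_a if h_a > 0 else 0.0

def build_tree(X, y, features, epsilon=1e-3):
    """Steps 2-6 of the flow above, for discrete features only."""
    # Step 2: every sample already has the same class -> single-node tree.
    if len(np.unique(y)) == 1:
        return y[0]
    # Step 3: no features left -> label with the majority class in D.
    if not features:
        return Counter(y).most_common(1)[0][0]
    # Step 4: pick the feature A_g with the largest information gain ratio.
    ratios = {f: gain_ratio(X, y, f) for f in features}
    a_g = max(ratios, key=ratios.get)
    # Step 5: best ratio below the threshold epsilon -> majority class.
    if ratios[a_g] < epsilon:
        return Counter(y).most_common(1)[0][0]
    # Step 6: split D by each value of A_g and recurse on the subsets D_i.
    tree = {a_g: {}}
    remaining = [f for f in features if f != a_g]
    for v in np.unique(X[:, a_g]):
        mask = X[:, a_g] == v
        tree[a_g][v] = build_tree(X[mask], y[mask], remaining, epsilon)
    return tree

X = np.array([["sunny", "hot"], ["sunny", "cool"], ["rain", "cool"], ["rain", "hot"]])
y = np.array(["no", "yes", "yes", "no"])
print(build_tree(X, y, features=[0, 1]))   # nested dict keyed by column index and value
```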

IV. Advantages and disadvantages of the C4.5 decision tree algorithm

4.1 Advantages

  1. The theory is clear and simple
  2. Strong learning ability

4.2 Disadvantages

  1. It can only be used for classification
  2. Because the C4.5 algorithm uses the concept of entropy, generating the decision tree requires a large number of entropy calculations, and if a feature takes continuous values, additional sorting is also required
  3. The multi-way tree structure it uses makes the model more complex

V. Summary

The flow of the C4.5 decision tree algorithm is not all that different from that of the ID3 decision tree algorithm; it only optimizes individual steps of the ID3 process. All in all, this approach treats the symptoms rather than the root cause, and C4.5 still cannot handle regression problems.

Next we will talk about a decision tree of transformative significance, the one currently used by scikit-learn's ensemble learning algorithms as the base tree: the CART decision tree algorithm.
