Summary of machine learning algorithms 5: Decision Tree

A decision tree is a basic classification and regression method. In classification problems it can be viewed as a collection of if-then rules, and it can also be viewed as a conditional probability distribution of the classes defined on the feature space.
Decision tree learning involves three steps: feature selection, tree generation, and tree pruning. The commonly used algorithms are ID3, C4.5, and CART.
Decision tree definition: a classification decision tree model is a tree structure that describes the classification of instances. The tree is composed of nodes and directed edges, and there are two types of nodes: internal nodes and leaf nodes. An internal node represents a feature or attribute, and a leaf node represents a class.
Decision tree learning is a recursive process: select the optimal feature, partition the training data according to that feature, and repeat so that each subset is classified as well as possible. The learning strategy is to minimize a loss function as the objective.
Tree pruning prevents overfitting and improves the generalization ability of the decision tree. Generating a decision tree corresponds to locally optimal model selection, while pruning corresponds to globally optimal model selection.
1. Feature Selection
Feature selection means selecting features that have the ability to classify the training data. The commonly used criteria are information gain and information gain ratio.
Information gain:
Entropy is a measure of the uncertainty of a random variable.
For a discrete random variable X with probability distribution P(X = x_i) = p_i, i = 1, 2, …, n, the entropy is defined as H(X) = −∑_i p_i log p_i.
In this definition the logarithm is taken to base 2 or base e (the natural logarithm), and the unit of entropy is then the bit or the nat, respectively. The larger the entropy, the greater the uncertainty of the random variable.
The conditional entropy H(Y|X) denotes the uncertainty of the random variable Y given that the random variable X is known. It is defined as follows:
H(Y|X) = ∑_i p_i H(Y | X = x_i), where p_i = P(X = x_i), i = 1, 2, …, n.
When the entropy and the conditional entropy are estimated from data (e.g., by maximum likelihood estimation), they are called the empirical entropy and the empirical conditional entropy.
Information gain indicates the degree to which knowing the information of feature X reduces the uncertainty of the information of class Y.
The information gain g(D, A) of feature A with respect to training set D is defined as g(D, A) = H(D) − H(D|A).
The difference between the entropy H(Y) and the conditional entropy H(Y|X) is called the mutual information. (Note: mutual information is a common method for feature selection.)
Clearly, a feature with larger information gain has stronger classification ability.
For a training set D with K classes C_1, …, C_K, and a feature A whose n distinct values split D into subsets D_1, …, D_n, the empirical entropy and empirical conditional entropy are
H(D) = −∑_k (|C_k| / |D|) log₂ (|C_k| / |D|)
H(D|A) = ∑_i (|D_i| / |D|) H(D_i)
and the information gain is g(D, A) = H(D) − H(D|A).
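To make these definitions concrete, here is a minimal Python sketch (standard library only; the list-based dataset format is an assumption for illustration) computing the empirical entropy, empirical conditional entropy, and information gain:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Empirical entropy H(D) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature_values, labels):
    """Empirical conditional entropy H(D|A): weighted entropy of the
    subsets of D induced by the values of feature A."""
    n = len(labels)
    subsets = {}
    for v, y in zip(feature_values, labels):
        subsets.setdefault(v, []).append(y)
    return sum((len(s) / n) * entropy(s) for s in subsets.values())

def information_gain(feature_values, labels):
    """g(D, A) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(feature_values, labels)

# Toy example: how much does this feature reduce class uncertainty?
age   = ["young", "young", "old", "old", "old"]
label = ["no", "no", "yes", "yes", "no"]
print(information_gain(age, label))   # ≈ 0.42 bits
```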
Information gain ratio:
The magnitude of the information gain is relative to the training set and has no absolute meaning. When the classification problem is difficult, that is, when the empirical entropy of the training data is large, information gain values tend to be large; conversely, they tend to be small. The information gain ratio corrects this problem.
g_R(D, A) = g(D, A) / H_A(D), where H_A(D) = −∑_i (|D_i| / |D|) log₂ (|D_i| / |D|) is the entropy of D with respect to the values of feature A.
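Continuing the sketch above, the gain ratio simply normalizes the gain by the entropy H_A(D) of the feature's own value distribution:

```python
def information_gain_ratio(feature_values, labels):
    """g_R(D, A) = g(D, A) / H_A(D), where H_A(D) is the entropy of the
    distribution of feature A's values (not of the class labels)."""
    split_info = entropy(feature_values)  # H_A(D): feature values as "labels"
    if split_info == 0:                   # feature has a single value: no split
        return 0.0
    return information_gain(feature_values, labels) / split_info
```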
2. Decision Tree Generation
Decision tree generation generally uses maximum information gain, maximum information gain ratio, or minimum Gini index as the feature selection criterion.
ID3 algorithm:
The core of ID3 is to apply the information gain criterion to select a feature at each node of the tree and to build the decision tree recursively.
Algorithm (ID3): starting from the root node, compute the information gain of every candidate feature on the current data set and select the feature with maximum information gain; create child nodes according to the different values of that feature; then apply the procedure recursively to each child. Recursion stops when every remaining feature's information gain is small (below a threshold ε) or no feature remains, in which case the node becomes a leaf labeled with the majority class.
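Putting the pieces together, here is a minimal recursive ID3 sketch that builds on the entropy and information-gain functions above; the data representation (rows as dicts, trees as nested dicts) and helper names such as build_id3 are illustrative choices, not a library API:

```python
def majority_class(labels):
    return Counter(labels).most_common(1)[0][0]

def build_id3(rows, labels, features, eps=1e-3):
    """rows: list of dicts {feature_name: value}; labels: class labels.
    Returns a nested dict tree, or a class label for a leaf."""
    if len(set(labels)) == 1:            # all samples in one class -> leaf
        return labels[0]
    if not features:                     # no features left -> majority vote
        return majority_class(labels)
    # Pick the feature with maximum information gain.
    gains = {f: information_gain([r[f] for r in rows], labels) for f in features}
    best = max(gains, key=gains.get)
    if gains[best] < eps:                # gain below threshold -> leaf
        return majority_class(labels)
    tree = {best: {}}
    remaining = [f for f in features if f != best]
    # One branch per value of the chosen feature, built recursively.
    for value in set(r[best] for r in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*sub)
        tree[best][value] = build_id3(list(sub_rows), list(sub_labels),
                                      remaining, eps)
    return tree
```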
C4.5 algorithm:
C4.5 is the same as ID3 during generation, except that it selects features by the information gain ratio.
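Since C4.5 differs from the ID3 sketch above only in the selection criterion, a single line changes (again only a sketch; the real C4.5 also handles continuous features and missing values, which are omitted here):

```python
# In build_id3 above, select by gain ratio instead of information gain:
gains = {f: information_gain_ratio([r[f] for r in rows], labels) for f in features}
```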
3. Decision Tree Pruning
The process of simplifying a generated decision tree is called pruning. Specifically, pruning cuts some subtrees or leaf nodes from the generated tree and turns their root node or parent node into a new leaf node, thereby simplifying the tree model.
Decision tree pruning is achieved by minimizing the overall loss function (cost function) of the tree.
Decision tree loss function:
C_α(T) = C(T) + α|T|, with C(T) = ∑_t N_t H_t(T), t = 1, …, |T|
where |T| is the number of leaf nodes of tree T, N_t is the number of training samples at leaf t (N_tk of them belonging to class k), and H_t(T) = −∑_k (N_tk / N_t) log (N_tk / N_t) is the empirical entropy of leaf t. C(T) measures how well the tree fits the training data, and |T| measures the model's complexity.
The parameter α ≥ 0 balances the model's fit to the training data against the model's complexity.
Algorithm (tree pruning): (1) compute the empirical entropy of every node; (2) recursively retract upward from the leaf nodes: if turning a group of sibling leaves back into their parent (which then becomes a leaf) does not increase the loss, i.e., C_α(tree after retracting) ≤ C_α(tree before), perform the pruning; (3) repeat step (2) until no further retraction reduces the loss, yielding the subtree with minimum loss.
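As an illustration of this criterion, here is a minimal sketch that evaluates C_α(T) for a tree summarized by its leaves, reusing the entropy function from the earlier sketch (the leaf-list representation is an assumption for illustration):

```python
def tree_loss(leaves, alpha):
    """C_alpha(T) = sum_t N_t * H_t(T) + alpha * |T|.
    leaves: list of label lists, one per leaf node."""
    fit_term = sum(len(labels) * entropy(labels) for labels in leaves)
    return fit_term + alpha * len(leaves)

# Pruning test: keep two sibling leaves, or merge them into their parent?
left, right = ["yes", "yes", "no"], ["no", "no"]
alpha = 0.5
keep  = tree_loss([left, right], alpha)
merge = tree_loss([left + right], alpha)
print("prune" if merge <= keep else "keep")   # here the split is worth keeping
```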
CART algorithm:
The classification and regression tree (CART) model can be used for both classification and regression. CART learns the conditional probability distribution of the output random variable Y given the input random variable X, and the decision tree it generates is binary.
Generation builds the binary decision tree recursively: a regression tree uses the squared-error minimization criterion for feature selection, while a classification tree uses the Gini index minimization criterion.
(1) Regression tree generation
A least-squares regression tree splits the input space recursively. For a splitting variable j and split point s, define the regions R_1(j, s) = {x | x^(j) ≤ s} and R_2(j, s) = {x | x^(j) > s}; choose j and s to minimize ∑_{x_i ∈ R_1} (y_i − c_1)² + ∑_{x_i ∈ R_2} (y_i − c_2)², where c_1 and c_2 are the mean outputs in the two regions. Each region is then split again in the same way until a stopping condition is met, and the prediction in each final region is its mean output.
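A minimal sketch of this split search for a single splitting variable, assuming numpy arrays x (inputs) and y (targets); a full regression tree would apply it recursively to each region:

```python
import numpy as np

def best_split(x, y):
    """Find the split point s minimizing the total squared error
    sum_{x_i <= s}(y_i - c1)^2 + sum_{x_i > s}(y_i - c2)^2,
    where c1, c2 are the mean targets of the two regions."""
    best_s, best_err = None, np.inf
    for s in np.unique(x)[:-1]:          # candidate split points
        left, right = y[x <= s], y[x > s]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best_err:
            best_s, best_err = s, err
    return best_s, best_err

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([4.5, 4.8, 5.0, 9.0, 9.2])
print(best_split(x, y))   # -> (3.0, ~0.15): the jump in y is found
```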
(2) Classification tree generation
The Gini index is defined as follows:
Gini(p) = ∑_k p_k (1 − p_k) = 1 − ∑_k p_k², k = 1, …, K
For a sample set D: Gini(D) = 1 − ∑_k (|C_k| / |D|)², where C_k is the subset of D belonging to class k.
If feature A splits D into two parts D_1 and D_2, then Gini(D, A) = (|D_1| / |D|) Gini(D_1) + (|D_2| / |D|) Gini(D_2).
The larger the Gini index, the greater the uncertainty of the sample set.
Algorithm (CART classification tree generation): starting from the root node, for each feature A and each of its possible values a, split D into D_1 (samples with A = a) and D_2 (the rest) and compute Gini(D, A); select the feature and split value with the minimum Gini index, generate the two child nodes, and distribute the training data to them; recurse on each child until a stopping condition is met (for example, the number of samples or the Gini index falls below a threshold, or no features remain).
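A minimal sketch of these Gini computations, using the same list-based dataset conventions (and the Counter import) from the earlier sketches:

```python
def gini(labels):
    """Gini(D) = 1 - sum_k (|C_k| / |D|)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(feature_values, labels, a):
    """Gini(D, A): weighted Gini index after the binary split A == a
    vs. A != a, as used by the CART classification tree."""
    d1 = [y for v, y in zip(feature_values, labels) if v == a]
    d2 = [y for v, y in zip(feature_values, labels) if v != a]
    n = len(labels)
    return (len(d1) / n) * gini(d1) + (len(d2) / n) * gini(d2)
```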
CART pruning:
CART pruning computes, for each internal node t of the fully grown tree T_0, the value g(t) = (C(t) − C(T_t)) / (|T_t| − 1), which measures how much the loss decreases per leaf when the subtree T_t rooted at t is pruned. Repeatedly pruning the subtree with the smallest g(t) yields a nested sequence of subtrees T_0, T_1, …, T_n, from which the best subtree is selected by cross-validation on an independent validation set.
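For CART trees, scikit-learn implements this cost-complexity pruning directly; a minimal sketch on the iris data, selecting α by cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The pruning path: the nested subtree sequence T0, T1, ..., Tn and
# the alpha value at which each subtree becomes optimal.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Select the alpha with the best cross-validated accuracy.
scores = [
    (cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                     X, y).mean(), a)
    for a in path.ccp_alphas
]
best_score, best_alpha = max(scores)
print(best_alpha, best_score)
```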
Decision tree advantages and disadvantages:
Advantages: the model is readable and easy to interpret, and classification is fast. Disadvantages: the tree easily overfits the training data, so pruning is usually necessary.
