AI Machine Learning - Decision Tree Algorithm - Concept and Learning Process

1. Concept

A decision tree classifies data through a series of rules, providing a rule-like method that tells you what value will be obtained under what conditions. Decision trees are divided into classification trees and regression trees: classification trees are decision trees for discrete target variables, and regression trees are decision trees for continuous target variables.

A classification decision tree model is a tree structure that describes the classification of instances. A decision tree consists of nodes and directed edges. Nodes come in two types: internal nodes and leaf nodes. An internal node represents a feature or attribute, and a leaf node represents a class.
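As a rough sketch of that structure (a toy Python representation of my own, not taken from any particular library), an internal node stores the feature it tests plus one outgoing edge per feature value, while a leaf node stores a class:

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    label: str                # leaf node: the class it predicts

@dataclass
class Internal:
    feature: str              # internal node: the feature/attribute it tests
    # directed edges: one branch per possible value of the feature
    branches: Dict[str, Union["Internal", Leaf]] = field(default_factory=dict)

# A two-leaf tree that classifies on a single feature
tree = Internal("handsome", {"yes": Leaf("meet"), "no": Leaf("don't meet")})
```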

Intuitively, a decision tree classifier is like a flow chart made of judgment modules and termination blocks. A termination block represents a classification result (a leaf of the tree), while a judgment module tests the value of a feature: a feature with several possible values gives the judgment module that many branches.

To quote an example that circulates on the Internet:

Mother: Let me introduce you to someone.

Daughter: How old is he?

Mother: 26.

Daughter: Is he handsome?

Mother: Very handsome.

Daughter: Is his income high?

Mother: Not very high; an average income.

Daughter: Is he a civil servant?

Mother: Yes, he works at the tax office.

Daughter: All right, I'll go meet him.

Represented as a decision tree:

[Figure: the conversation above drawn as a decision tree, with age, looks, income, and civil-servant status as internal nodes and "meet" / "don't meet" as leaves]

As programmers we type if, else if, else all day, and in doing so we have already been using the idea of a decision tree. But with so many conditions, have you ever thought about which conditional feature should be tested in the first if and which should come later? How to quantify and select this criterion precisely is the key question of the decision tree machine learning algorithm.
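To make the if/else analogy concrete, the dialogue above could be hard-coded as nested conditionals (the threshold of 30 and the argument names are invented for illustration):

```python
def will_meet(age, handsome, high_income, civil_servant):
    """The daughter's decision process written as nested if/else tests."""
    if age > 30:                      # judgment module: age
        return "don't meet"
    if not handsome:                  # judgment module: looks
        return "don't meet"
    if high_income:                   # judgment module: income
        return "meet"
    # moderate income: fall back on job stability
    return "meet" if civil_servant else "don't meet"

print(will_meet(age=26, handsome=True, high_income=False, civil_servant=True))  # meet
```

A decision tree algorithm has to answer the ordering question automatically instead of leaving it to the programmer's intuition.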


2. The learning process of a decision tree


The generation process of a decision tree is mainly divided into the following three parts:

Feature selection:

Feature selection means choosing one feature from the many features in the training data as the splitting criterion of the current node. There are many different quantitative evaluation criteria for making this choice, and they give rise to different decision tree algorithms (a sketch of one common criterion, information gain, follows the numbered points below).

1. Why feature selection is needed

With a limited number of samples, designing a classifier that uses a very large number of features is computationally expensive and tends to give poor classification performance.

2. What feature selection means

Samples in a high-dimensional space are first mapped or transformed into a lower-dimensional space to reduce the dimensionality; redundant and irrelevant features are then removed through feature selection to reduce it further.

3. The principle of feature selection

Obtain the smallest possible feature subset without significantly reducing classification accuracy or distorting the class distribution; the selected subset should also be stable and adaptable.
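As a minimal sketch of one such quantitative criterion, information gain (the criterion used by ID3; the helper names and the toy data are my own, not from the article):

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Entropy reduction obtained by splitting the data on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy data: (handsome, civil_servant) -> meet / don't meet
rows = [("yes", "yes"), ("yes", "no"), ("no", "yes"), ("no", "no")]
labels = ["meet", "meet", "don't meet", "don't meet"]
print(information_gain(rows, labels, 0))  # 1.0 -> splitting on 'handsome' separates the classes perfectly
print(information_gain(rows, labels, 1))  # 0.0 -> 'civil_servant' tells us nothing here
```

The feature with the highest gain is chosen as the split for the current node.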

Decision tree generation:

According to the chosen feature evaluation criterion, child nodes are generated recursively from top to bottom, and the decision tree stops growing once the data set at a node can no longer be split. In terms of tree structure, this recursive procedure is the easiest way to understand the generation process.
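A minimal recursive sketch (ID3-style, reusing the information_gain helper and the toy rows/labels from the previous snippet; the dictionary representation of a node is my own choice):

```python
from collections import Counter

def build_tree(rows, labels, features):
    """Grow a decision tree top-down by recursively picking the best split."""
    # Stop when the node is pure or no features are left: emit a leaf with the majority class.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Feature selection: the feature with the highest information gain splits this node.
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    node = {"feature": best, "branches": {}}
    # Decision tree generation: recurse into one child per observed value of the feature.
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        node["branches"][value] = build_tree(list(sub_rows), list(sub_labels),
                                             [f for f in features if f != best])
    return node

print(build_tree(rows, labels, features=[0, 1]))
# e.g. {'feature': 0, 'branches': {'yes': 'meet', 'no': "don't meet"}}
```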

Pruning:

Because of how the decision tree algorithm works, it can easily learn the features in too much detail, over-subdividing the data and hurting classification accuracy. For example, if a very specific feature is used as the criterion for a class, data that happen to lack that specific attribute are pushed outside the class. This situation is called overfitting: the fit is too refined, matching the training data a little too closely. To solve this problem, the decision tree must be simplified by removing features that are too fine-grained; reflected in the tree structure, this means removing some branches, an operation called pruning. There are two kinds of pruning techniques: pre-pruning and post-pruning.
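As a hedged illustration of the two styles, assuming scikit-learn is available (the toy data below is invented): pre-pruning constrains growth while the tree is being built, whereas post-pruning grows the tree fully and then cuts weak branches back, for example with cost-complexity pruning:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy features: [age, handsome, high_income, civil_servant]; 1 = meet, 0 = don't meet
X = [[26, 1, 0, 1], [35, 0, 1, 0], [28, 1, 1, 0], [40, 0, 0, 1],
     [24, 1, 0, 0], [31, 0, 1, 1], [27, 1, 1, 1], [33, 0, 0, 0]]
y = [1, 0, 1, 0, 0, 0, 1, 0]

# Pre-pruning: stop growth early with depth / leaf-size constraints.
pre_pruned = DecisionTreeClassifier(max_depth=2, min_samples_leaf=2).fit(X, y)

# Post-pruning: let the tree grow, then prune weak branches via cost-complexity pruning.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())
```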


