Principles of the ID3 decision tree algorithm

This post is based on my lecture notes; the content comes from Mr. Peng Liang's machine learning course.

1. The decision tree algorithm

The decision tree algorithm is a classic machine learning algorithm. A decision tree is a flowchart-like tree structure (see the figure below): each internal node represents a test on an attribute, each branch represents an outcome of that test, each leaf node represents a class (or a class distribution), and the topmost node is the root. So how should we construct a decision tree?

[Figure: example of a flowchart-like decision tree]

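To make the structure concrete, here is a minimal sketch in Python. The nested-dict representation and the `classify` helper are my own illustration, not code from the course, and the tree shown anticipates the buys-computer example discussed later in this post:

```python
# A decision tree as nested dicts: an internal node is {attribute: {value: subtree}},
# and a leaf is simply a class label.
tree = {"age": {
    "middle_aged": "yes",  # leaf: every sample on this branch has class "yes"
    "youth": {"student": {"no": "no", "yes": "yes"}},
    "senior": {"credit_rating": {"fair": "yes", "excellent": "no"}},
}}

def classify(node, sample):
    """Walk from the root to a leaf, following the branch matching the sample."""
    while isinstance(node, dict):
        attribute = next(iter(node))               # attribute tested at this node
        node = node[attribute][sample[attribute]]  # follow the matching branch
    return node                                    # reached a leaf: a class label

print(classify(tree, {"age": "senior", "credit_rating": "fair"}))  # -> "yes"
```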
2. Entropy

Before learning how a decision tree is constructed, we first need the concept of entropy.

In 1948, Shannon proposed the concept of "entropy" to solve the problem of quantitatively measuring information. The information entropy of a random variable $X$ measures the uncertainty of its outcomes and is calculated as follows:

$$H(X) = -\sum_{x} p(x)\log_2 p(x)$$

where $p(x)$ is the probability of outcome $x$. The greater the uncertainty of the variable, the greater the entropy, and the more information is needed to resolve it.
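As a quick illustration, entropy can be computed directly from a list of class labels. The sketch below (the function name is my own) uses base-2 logarithms, so the result is measured in bits:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy: H(X) = -sum over outcomes x of p(x) * log2(p(x))."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

# A fair coin has maximal uncertainty over two outcomes: exactly 1 bit.
print(entropy(["heads", "tails"]))        # 1.0
# 9 "yes" vs 5 "no", as in the buys_computer example below: ~0.940 bits.
print(entropy(["yes"] * 9 + ["no"] * 5))  # 0.940...
```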

3. Constructing a decision tree (using the ID3 algorithm as an example)

The dataset below, the classic "buys_computer" example, is used to predict whether or not a customer buys a computer:

| RID | age | income | student | credit_rating | buys_computer |
|-----|-----|--------|---------|---------------|---------------|
| 1 | youth | high | no | fair | no |
| 2 | youth | high | no | excellent | no |
| 3 | middle_aged | high | no | fair | yes |
| 4 | senior | medium | no | fair | yes |
| 5 | senior | low | yes | fair | yes |
| 6 | senior | low | yes | excellent | no |
| 7 | middle_aged | low | yes | excellent | yes |
| 8 | youth | medium | no | fair | no |
| 9 | youth | low | yes | fair | yes |
| 10 | senior | medium | yes | fair | yes |
| 11 | youth | medium | yes | excellent | yes |
| 12 | middle_aged | medium | no | excellent | yes |
| 13 | middle_aged | high | yes | fair | yes |
| 14 | senior | medium | no | excellent | no |

Step 1: Select the root node

Question: how should we choose the root node? Age? Income? Student? Or credit_rating?

Define the information gain of an attribute A: Gain(A) = Info(D) − Info_A(D), where Info(D) is the entropy of the class labels in dataset D, and Info_A(D) is the expected (size-weighted) entropy of the subsets produced by splitting D on A.
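In code, Info_A(D) is the size-weighted average entropy of the partitions of D by attribute A. Here is a minimal sketch building on the `entropy` function above (representing each sample as a dict of attribute values is my own choice):

```python
def info_gain(rows, attribute, labels):
    """Gain(A) = Info(D) - Info_A(D) for a categorical attribute A."""
    n = len(labels)
    # Partition the class labels by the value of the chosen attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Info_A(D): entropy of each subset, weighted by the subset's relative size.
    info_a = sum(len(subset) / n * entropy(subset)
                 for subset in partitions.values())
    return entropy(labels) - info_a
```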

According to Shannon's formula, we can calculate the information gain of the attribute age. The class label splits 9 "yes" to 5 "no", so

$$\mathrm{Info}(D) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940\ \text{bits}$$

Splitting on age gives three subsets: youth (5 samples: 2 yes, 3 no), middle_aged (4 samples: all yes), and senior (5 samples: 3 yes, 2 no). A 2-vs-3 subset has entropy 0.971, so

$$\mathrm{Info}_{age}(D) = \frac{5}{14}\times 0.971 + \frac{4}{14}\times 0 + \frac{5}{14}\times 0.971 = 0.694\ \text{bits}$$

giving Gain(age) = 0.940 − 0.694 = 0.246.

In the same way, we can calculate Gain(income) = 0.029, Gain(student) = 0.151, and Gain(credit_rating) = 0.048. Since Gain(age) is the largest, we choose age as the root node.
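These numbers can be reproduced with the two functions above; the snippet below encodes the table as a list of dicts (a verification sketch, not course code):

```python
data = [  # (age, income, student, credit_rating, buys_computer)
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]
attributes = ["age", "income", "student", "credit_rating"]
rows = [dict(zip(attributes, r[:4])) for r in data]
labels = [r[4] for r in data]

for a in attributes:
    print(a, round(info_gain(rows, a, labels), 3))
# age 0.246, income 0.029, student 0.151, credit_rating 0.048
```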

Step 2: Draw a tree diagram with age as the root node, as follows:

[Figure: partial tree with age at the root and branches for youth, middle_aged, and senior; the middle_aged subset is already pure ("yes")]

Step 3: For each child node, continue to calculate the information gain of each remaining attribute and select the attribute with the largest gain as the next splitting node, then repeat this process recursively (a compact sketch follows the list below). The recursion stops when:

  1. All samples at the given node belong to the same class (e.g., the middle_aged branch in the figure above); or
  2. There are no remaining attributes with which to further divide the samples, in which case the leaf is labeled by majority vote.
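Putting the pieces together, a compact recursive ID3 can be sketched as follows (building on the `entropy` and `info_gain` functions above; again, an illustration rather than the course's exact code):

```python
from collections import Counter

def id3(rows, labels, attributes):
    """Build an ID3 tree as nested dicts: {attribute: {value: subtree}}."""
    # Stopping condition 1: all samples at this node share the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping condition 2: no attributes left to split on -> majority vote.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise, split on the attribute with the largest information gain.
    best = max(attributes, key=lambda a: info_gain(rows, a, labels))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in subset],
                                [labels[i] for i in subset],
                                remaining)
    return tree

print(id3(rows, labels, attributes))
# -> {'age': {'middle_aged': 'yes',
#             'youth': {'student': {'no': 'no', 'yes': 'yes'}},
#             'senior': {'credit_rating': {'fair': 'yes', 'excellent': 'no'}}}}
```

On the buys_computer data this reproduces the tree sketched in Section 1: age at the root, a pure middle_aged leaf, and further splits on student and credit_rating.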

4. Advantages and disadvantages of decision trees

Advantages: intuitive, easy to understand, and effective on small datasets.

Disadvantages: it does not handle continuous variables well; when there are many classes, the error rate increases quickly; and it does not scale well to large datasets.

 
