Let's use the example of deciding whether to buy a house to introduce the decision tree algorithm. The data set is as follows (for demonstration only; it does not reflect real market conditions):
| Location | Near subway | Area (m²) | Unit price (10,000 CNY) | Buy? |
| --- | --- | --- | --- | --- |
| 3rd Ring | yes | 60 | 8 | yes |
| 3rd Ring | yes | 80 | 8 | no |
| 3rd Ring | no | 60 | 7 | yes |
| 3rd Ring | no | 80 | 7 | no |
| 5th Ring | yes | 60 | 7 | yes |
| 5th Ring | yes | 80 | 7 | no |
| 5th Ring | no | 60 | 6 | yes |
| 5th Ring | no | 80 | 6 | yes |
| 6th Ring | yes | 60 | 6 | yes |
| 6th Ring | yes | 80 | 5.5 | yes |
| 6th Ring | no | 60 | 5 | no |
| 6th Ring | no | 80 | 5 | no |
From the table above, 7 of the 12 samples are "buy" and 5 are "don't buy". Applying the information entropy formula, the entropy of this data set is:

H(D) = -(7/12)·log₂(7/12) - (5/12)·log₂(5/12) ≈ 0.980
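This value can be checked with a few lines of Python (a minimal sketch; the counts 7 and 5 come from the table above):

```python
from math import log2

# Shannon entropy of a label distribution given as class counts.
def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# 7 "buy" and 5 "don't buy" out of 12 samples.
print(round(entropy([7, 5]), 4))
```

The printed value should agree with the hand calculation above (≈ 0.980).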
Splitting by location (denote it A1), with values 3rd Ring (D1), 5th Ring (D2), and 6th Ring (D3), the information gain is g(D, A1) = H(D) - H(D|A1) ≈ 0.980 - 0.937 = 0.043.
Splitting by whether it is near the subway (A2), with values yes (D1) and no (D2), the information gain is g(D, A2) ≈ 0.980 - 0.959 = 0.021.
Splitting by area (A3), with values 60 m² (D1) and 80 m² (D2), the information gain is g(D, A3) ≈ 0.980 - 0.784 = 0.196.
Splitting by unit price (A4), with values 5 (D1), 5.5 (D2), 6 (D3), 7 (D4), and 8 (D5), the information gain is g(D, A4) ≈ 0.980 - 0.500 = 0.480.
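The four gain calculations above can be reproduced with a short script (a sketch; the row tuples mirror the table, and column indices 0–3 stand in for A1–A4):

```python
from collections import Counter
from math import log2

# The 12 rows of the table: (location, near_subway, area, unit_price, buy)
rows = [
    ("3rd Ring", "yes", 60, 8,   "yes"), ("3rd Ring", "yes", 80, 8,   "no"),
    ("3rd Ring", "no",  60, 7,   "yes"), ("3rd Ring", "no",  80, 7,   "no"),
    ("5th Ring", "yes", 60, 7,   "yes"), ("5th Ring", "yes", 80, 7,   "no"),
    ("5th Ring", "no",  60, 6,   "yes"), ("5th Ring", "no",  80, 6,   "yes"),
    ("6th Ring", "yes", 60, 6,   "yes"), ("6th Ring", "yes", 80, 5.5, "yes"),
    ("6th Ring", "no",  60, 5,   "no"),  ("6th Ring", "no",  80, 5,   "no"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_idx):
    labels = [r[-1] for r in rows]
    # Group the labels by the attribute's value, then subtract the
    # weighted average entropy of the groups from the overall entropy.
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_idx], []).append(r[-1])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Unit price should come out highest (≈ 0.48), subway lowest (≈ 0.02).
for name, idx in [("location (A1)", 0), ("near subway (A2)", 1),
                  ("area (A3)", 2), ("unit price (A4)", 3)]:
    print(f"{name}: {info_gain(rows, idx):.4f}")
```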
From these results, the reduction in information entropy (that is, the weight of each factor in the decision to buy a house) ranks, from high to low: unit price, area, location, and proximity to the subway.
The procedure above is the logic used by the ID3 decision tree algorithm.
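The whole ID3 procedure can be sketched as a short recursive function (illustrative only, under the assumption of categorical attributes as in the table; at each node it picks the attribute with the highest information gain, splits on its values, and recurses until a subset is pure):

```python
from collections import Counter
from math import log2

# Same 12 rows as the table: (location, near_subway, area, unit_price, buy)
rows = [
    ("3rd Ring", "yes", 60, 8,   "yes"), ("3rd Ring", "yes", 80, 8,   "no"),
    ("3rd Ring", "no",  60, 7,   "yes"), ("3rd Ring", "no",  80, 7,   "no"),
    ("5th Ring", "yes", 60, 7,   "yes"), ("5th Ring", "yes", 80, 7,   "no"),
    ("5th Ring", "no",  60, 6,   "yes"), ("5th Ring", "no",  80, 6,   "yes"),
    ("6th Ring", "yes", 60, 6,   "yes"), ("6th Ring", "yes", 80, 5.5, "yes"),
    ("6th Ring", "no",  60, 5,   "no"),  ("6th Ring", "no",  80, 5,   "no"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def id3(rows, attrs):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:          # pure subset -> leaf
        return labels[0]
    if not attrs:                      # no attributes left -> majority vote
        return Counter(labels).most_common(1)[0][0]
    def gain(a):
        groups = {}
        for r in rows:
            groups.setdefault(r[a], []).append(r[-1])
        cond = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return entropy(labels) - cond
    best = max(attrs, key=gain)        # attribute with the highest gain
    return {best: {v: id3([r for r in rows if r[best] == v],
                          [a for a in attrs if a != best])
                   for v in {r[best] for r in rows}}}

tree = id3(rows, [0, 1, 2, 3])         # column indices stand for A1..A4
print(tree)
```

On this data set the root splits on column 3 (unit price), matching the ranking above.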
Note: the figures are test data for demonstration only and do not represent a basis for real purchasing decisions.