Machine Learning · The ID3 (Iterative Dichotomiser 3) Decision Tree Algorithm


1. What is a decision tree

A decision tree is a decision-support tool that arranges decision nodes in a tree structure; drawing such a tree is a way of finding an optimal solution.
The figure below shows, read from left to right, a simple decision-support process using a decision tree.
[Figure: a simple decision-making process using a decision tree]

2. How do we construct a decision tree?

2.1. The basic approach

We determine a priority ordering over the different features, and split first on the feature with the highest priority. (In the figure above, the assumed priority is education > school > work experience, so we first choose education as the basis for classification, then school, and finally project experience.)

So the next question is: how do we determine the priority of a feature? In other words, what evaluation criterion do we use when judging a feature? This criterion is very important in a decision tree: an appropriate criterion lets the features be prioritized in a very reasonable way, so that we can build a better tree, while an inappropriate criterion leads to all sorts of problems in the final tree.

2.2. What is the evaluation criterion? / How do we quantify how good a feature is?

Different decision tree algorithms use slightly different criteria for judging how good a feature is. For example, in the ID3 algorithm discussed today, the criterion is called information gain. In another decision tree algorithm, C4.5, the criterion becomes the information gain ratio (gain ratio). Yet another mainstream decision tree algorithm, CART, uses the Gini index as its criterion. These will all be covered one by one in later articles; in this article we focus on information gain.

Before getting to the main topic, we need a few simple concepts. The first is entropy. Readers who studied physics in high school may remember it: entropy is one of the parameters characterizing the thermodynamic state of matter, denoted by the symbol S; its physical meaning is a measure of the degree of disorder of a system.

The concept of entropy originally came from thermodynamics. In 1948, Shannon introduced information entropy, defined in terms of the probabilities of occurrence of discrete random events. In information science, entropy is a measure of uncertainty: the more ordered a system is, the lower its information entropy; the more disordered a system is, the higher its entropy. Entropy can therefore be regarded as a measure of the degree of order of a system, and the concept is essentially the same as in thermodynamics.
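As a quick numerical illustration of this idea, here is a minimal sketch in Python using the base-2 Shannon entropy that the rest of this article relies on (the function name `entropy` is my own choice, not from the original article):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of outcomes."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

print(entropy(["YES"] * 8))               # 0.0 -> perfectly ordered system
print(entropy(["YES"] * 4 + ["NO"] * 4))  # 1.0 -> maximally disordered (two classes)
```

An all-YES system carries no uncertainty (entropy 0), while a 50/50 mix of YES and NO is the most disordered a two-class system can be (entropy of 1 bit).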

What does this have to do with today's topic, the information gain criterion? Imagine we are given a set of information like the left half of the figure: each record has several features (education, school, and so on) together with an overall result. The figure shows only four instances, A, B, C and D, but if we had thousands of instances, our first impression of such a table would probably be that it is chaotic. So many instances crammed side by side in one table really are confusing, and in the terms we just introduced: the information entropy of this set of information is too high. In such a situation, especially when we want to look something up in the table, it is very inconvenient. We would like some way to reduce the disorder of the system (reduce its entropy) without losing any information. Is there such a way? Of course: the decision tree algorithm we are discussing today.

Turning the table on the left into the view on the right, we can intuitively feel that the information has become less chaotic, and that the tree representation matches human thinking habits much better. Why does it feel so different? The reason is simple: starting from the root on the right, each time we split on a feature, the entropy of the system decreases, so the information system feels more and more organized. This leads naturally to the concept of information gain: the information gain of splitting an information system on a particular feature is the amount by which the overall information entropy of the system drops.

Now we can easily answer how to judge whether a feature is good: if splitting the information system on a feature reduces the entropy of the whole data system more than splitting on any other feature (that is, it has the maximum information gain), then it is a good feature, and we should use it first to split the system.

2.3. Computing information entropy and information gain

Here we first give the formula for information entropy:
Entropy(S) = -Σ p_i · log2(p_i), where p_i is the proportion of instances in S that belong to class i.
Knowing the formula for information entropy, the formula for information gain is also very easy to understand: take the entropy S of the current system; split the whole system according to a feature; multiply the entropy S_i of each subsystem by the proportion P_i that subsystem occupies in the whole system; add these up; and subtract the sum from S. That is: InfoGain = S − Σ (S_i × P_i).
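These two formulas can be written directly as a small Python sketch (the function names `entropy` and `information_gain` are my own; `subsets` stands for the partition of the label list produced by splitting on some feature):

```python
import math
from collections import Counter

def entropy(labels):
    """Base-2 Shannon entropy of a list of results (same as the earlier sketch)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, subsets):
    """InfoGain = S - sum(S_i * P_i): the entropy of the whole system minus
    the weighted entropies of the subsystems produced by a split.
    `subsets` is the partition of `labels` induced by some feature."""
    total = len(labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(labels) - weighted
```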

For example, suppose we have the information system shown below (the first four columns are four different features, and the last column is the result). How do we calculate the entropy of the whole system and the information gain corresponding to each feature?
[Figure: the example data set — four feature columns, one result column, 14 samples]
First, we can see from the result column that there are 14 samples in total, 9 positive and 5 negative. The current information entropy is then calculated as follows:
[Figure: calculating the entropy of the full data set from its 9 positive and 5 negative samples]
Above we calculated the entropy of the system, that is, the information entropy of the data before any split. Next we split the system on one feature, calculate the resulting information entropy of the whole system, and take the difference between the entropy before and after the split (the information gain). For example, we split on the first feature, Outlook:
[Figure: the data set partitioned by the Outlook feature]
After the split, the data falls into three parts, and the entropy of each branch (the information entropy of each subsystem) is calculated as follows:
[Figure: the entropy of each Outlook branch (each subsystem)]
We then multiply the information entropy of each subsystem by the proportion of the whole system that subsystem accounts for, and add the results to obtain the overall entropy of the system after the split. As shown below:
[Figure: the weighted sum of the subsystem entropies]
Finally we come to the most exciting step, calculating the information gain, as shown below:
[Figure: the information gain of splitting on Outlook]
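The figures themselves are not reproduced here, but the arithmetic can be followed end to end. If, as the feature name Outlook and the 9-positive/5-negative counts suggest, this is the classic 14-sample play-tennis data set commonly used to illustrate ID3, then the Outlook branches contain 2+/3- (Sunny), 4+/0- (Overcast) and 3+/2- (Rain) samples; under that assumption the whole calculation would look like this:

```python
import math

def entropy(pos, neg):
    """Entropy of a two-class subset given positive/negative counts."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

# Entropy of the full system: 9 positive, 5 negative out of 14.
s = entropy(9, 5)                      # ~0.940

# Entropy of each Outlook branch (assumed play-tennis counts).
sunny    = entropy(2, 3)               # ~0.971, 5/14 of the data
overcast = entropy(4, 0)               # 0.0,    4/14 of the data
rain     = entropy(3, 2)               # ~0.971, 5/14 of the data

# Weighted entropy after the split, then the information gain.
after = 5/14 * sunny + 4/14 * overcast + 5/14 * rain   # ~0.694
gain = s - after                                        # ~0.246
print(round(s, 3), round(after, 3), round(gain, 3))
```

The same procedure is then repeated for the other features, and the feature with the largest gain is chosen for the first split.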

2.4. How the decision tree is constructed

In the previous step we calculated the information gain corresponding to each feature; we select the feature with the largest information gain and use it first to split the whole system (the system is divided into several sub data sets / subtrees / subsystems).
In each subsystem we then recompute the priority of the features that have not yet been used, select the feature with the highest priority, and continue to split the data set and build subtrees. This recursive construction of subtrees continues until it is terminated for one of the reasons below.

2.5. Termination conditions / when do we stop building a subtree?

1. When we find that all results in a subtree are identical, for example all YES, there is no need to keep building that subtree (the construction of other subtrees does not necessarily terminate). The final result of that subtree is YES.

2. When we find that the results in a subtree are not all identical, but there are no features left to support further splitting (in short, all features have been used up), we can also stop building that subtree. In this case the final result of the subtree is the value that occurs most often in the result column of that subtree.

3. Algorithm summary

1. For each subtree, check whether every instance has the same result, or whether all features have been exhausted.
  Yes: construction of this subtree is complete; return the corresponding final result for the subtree.
  No: continue to build the subtree recursively.
If the answer is No:
  1.1. Compute the information entropy of the current subsystem.
  1.2. Compute the information gain of each feature that has not yet been used.
  1.3. Select the feature with the maximum information gain to split on, and remove that feature from the feature set.
  1.4. Build a subtree for each part produced by the split in 1.3, and go back to step 1 (see the sketch below).
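To make the summary concrete, here is a minimal self-contained sketch of this recursion in Python (the dictionary-based tree representation and all function and parameter names are my own assumptions, not from the original article):

```python
import math
from collections import Counter

def entropy(labels):
    """Base-2 Shannon entropy of a list of results."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def id3(rows, labels, features):
    """Recursively build a decision tree.

    rows     : list of dicts mapping feature name -> value
    labels   : the result for each row
    features : names of the features that have not been used yet
    """
    # Termination 1: every result in this subtree is identical.
    if len(set(labels)) == 1:
        return labels[0]
    # Termination 2: no features left -> return the majority result.
    if not features:
        return Counter(labels).most_common(1)[0][0]

    # Steps 1.1-1.2: information gain of each unused feature.
    def gain(feature):
        total = len(labels)
        weighted = 0.0
        for value in set(row[feature] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
            weighted += len(subset) / total * entropy(subset)
        return entropy(labels) - weighted

    # Step 1.3: split on the feature with the maximum gain and remove it.
    best = max(features, key=gain)
    remaining = [f for f in features if f != best]

    # Step 1.4: build one subtree per value of the chosen feature.
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        sub_rows = [row for row in rows if row[best] == value]
        sub_labels = [lab for row, lab in zip(rows, labels) if row[best] == value]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)
    return tree
```

Calling `id3(rows, labels, feature_names)` returns either a leaf value (a result) or a nested dictionary keyed first by the chosen feature and then by each of its values.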


Origin www.cnblogs.com/jsbia/p/11283275.html