Gini value and Gini index

The CART decision tree [Breiman et al., 1984] uses the "Gini index" to select partition attributes. CART is short for Classification and Regression Tree, a well-known decision tree learning algorithm that can be used for both classification and regression tasks.

1. Gini value and Gini index

Gini value Gini(D): the probability that two samples drawn at random from data set D carry inconsistent class labels. Therefore, the smaller Gini(D) is, the higher the purity of data set D.

The purity of data set D can be measured by the Gini value:

$$\mathrm{Gini}(D) = \sum_{k=1}^{|\mathcal{Y}|} \sum_{k' \neq k} p_k\, p_{k'} = 1 - \sum_{k=1}^{|\mathcal{Y}|} p_k^2$$

where $p_k$ ($k = 1, 2, \ldots, |\mathcal{Y}|$) is the proportion of samples in D that belong to the k-th class.
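As a small illustration (a sketch of mine, not code from the original post), the Gini value can be computed directly from the class counts:

```python
from collections import Counter

def gini(labels):
    """Gini value of a label list: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# A pure set has Gini value 0; a 50/50 two-class set has the maximum value 0.5.
print(gini(["yes", "yes", "yes", "yes"]))  # 0.0
print(gini(["yes", "yes", "no", "no"]))    # 0.5
```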

Gini index Gini_index(D, a): for a discrete attribute a with V possible values, partitioning D by a produces V subsets D^1, ..., D^V, and the Gini index is their size-weighted average Gini value:

$$\mathrm{Gini\_index}(D, a) = \sum_{v=1}^{V} \frac{|D^v|}{|D|}\, \mathrm{Gini}(D^v)$$

Generally, the attribute that yields the smallest Gini index after division is selected as the optimal splitting attribute.
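A minimal sketch of the formula above (function names and the sample data are my own, purely illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini value: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_index(partitions):
    """Size-weighted average of the Gini values of the subsets D^v of a split."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini(p) for p in partitions)

# Hypothetical binary split: one pure subset and one mixed subset.
left  = ["no", "no", "no"]        # samples with one attribute value
right = ["yes", "no", "yes"]      # samples with the other attribute value
print(gini_index([left, right]))  # 3/6 * 0 + 3/6 * (1 - (1/3)^2 - (2/3)^2) = 2/9
```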

2. Case

Please build a decision tree according to the Gini index, based on the data table in the figure below.

1. For each non-class-label attribute in the data set {whether there is a house, marital status, annual income}, calculate its Gini index, and take the attribute with the smallest Gini index as the root-node attribute of the decision tree.

2. Repeat the same calculation on each child node in the second round of the loop.

3. After the above process is repeated, the decision tree is fully constructed, as shown in the figure below:
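For a continuous attribute such as annual income, CART sorts the values and evaluates a binary split at each candidate threshold (typically the midpoint between adjacent sorted values). A sketch with made-up numbers, since the original data table is not reproduced here:

```python
from collections import Counter

def gini(labels):
    """Gini value: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Evaluate the Gini index of the split (value <= t) vs (value > t)
    at every midpoint of adjacent sorted values; return (best score, t)."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(len(pairs) - 1):
        t = (pairs[i][0] + pairs[i + 1][0]) / 2
        left  = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, t)
    return best

# Hypothetical annual incomes (in thousands) with "default" class labels:
incomes = [60, 70, 75, 85, 90, 95, 100, 120]
default = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
print(best_threshold(incomes, default))  # (0.0, 80.0): splitting at 80 is pure
```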

Now let us summarize the algorithm flow of CART:

while (the current node is "impure"):
    1. Try every possible split of every variable and find the best split point
    2. Split the node into two child nodes N1 and N2
end while
Repeat until every node is sufficiently "pure"
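The loop above can be sketched as a recursive Python function. This is a minimal classification-only illustration with names of my own choosing (`best_split`, `build_tree`); a real CART implementation also handles continuous thresholds, regression, and pruning:

```python
from collections import Counter

def gini(labels):
    """Gini value: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Try every (attribute, value) binary split; return the one with the
    smallest weighted Gini index, or None if no valid split exists."""
    best = None
    for attr in range(len(rows[0])):
        for value in {row[attr] for row in rows}:
            left  = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            right = [lab for row, lab in zip(rows, labels) if row[attr] != value]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best

def build_tree(rows, labels):
    """Recursively split until each node is 'pure' or no valid split remains."""
    split = best_split(rows, labels)
    if gini(labels) == 0.0 or split is None:
        # Leaf node: predict the majority class.
        return Counter(labels).most_common(1)[0][0]
    _, attr, value = split
    left  = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
    right = [(r, l) for r, l in zip(rows, labels) if r[attr] != value]
    l_rows, l_labels = zip(*left)
    r_rows, r_labels = zip(*right)
    return {"split": (attr, value),
            "left":  build_tree(list(l_rows), list(l_labels)),
            "right": build_tree(list(r_rows), list(r_labels))}

# Tiny hypothetical example: (has house, marital status) -> default?
rows = [("yes", "single"), ("no", "married"), ("no", "single"), ("yes", "married")]
labels = ["no", "no", "yes", "no"]
tree = build_tree(rows, labels)
```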

 

 

Origin blog.csdn.net/qq_39197555/article/details/115319647