
CART generation

Assume the CART decision tree is a binary tree in which every internal node tests a feature with the answers "yes" and "no": the left branch is taken when the answer is "yes" and the right branch when it is "no". Such a tree is equivalent to recursively splitting each feature in two.
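The yes/no structure described above can be sketched as a small data structure. This is only an illustration: the names `Node`, `predict`, the `marital_status` key, and the leaf labels are hypothetical and do not come from the original article.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """An internal node (has a question) or a leaf (has a prediction)."""
    question: Optional[Callable[[dict], bool]] = None  # yes/no test on a record
    yes: Optional["Node"] = None       # left branch: the answer is "yes"
    no: Optional["Node"] = None        # right branch: the answer is "no"
    prediction: Optional[str] = None   # set only on leaves


def predict(node: Node, record: dict) -> Optional[str]:
    """Walk from the root to a leaf, going left on "yes" and right on "no"."""
    while node.question is not None:
        node = node.yes if node.question(record) else node.no
    return node.prediction


# Hypothetical one-question tree, for illustration only.
tree = Node(
    question=lambda r: r["marital_status"] == "married",
    yes=Node(prediction="leaf_yes"),
    no=Node(prediction="leaf_no"),
)
```

Calling `predict(tree, {"marital_status": "single"})` follows the "no" branch and returns the label stored in the right leaf.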

A concrete example

Let's look at a specific example. We reuse the data set shown in Figs. 4-6 of "Top Ten Data Mining Algorithms: Decision Trees in Detail (1)"; for convenience in the discussion that follows, it is listed again below:

About overfitting and pruning

Decision trees are prone to overfitting: the tree fits the training set too closely and then performs poorly on the test set. To avoid this, we can either tighten the termination condition, using a threshold so that branches do not become too small, or prune the tree after it has been built. Another way to counter overfitting is to build a Random Forest, which is based on the bootstrap idea.
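As a rough illustration of the first option, a threshold-based termination check might look like the sketch below. The function names and the default values of `min_samples` and `max_depth` are assumptions chosen for the example, not settings taken from the article.

```python
from collections import Counter


def should_stop(labels, depth, min_samples=5, max_depth=3):
    """Return True when a node should become a leaf instead of being split."""
    if len(set(labels)) <= 1:       # node is already pure (or empty)
        return True
    if len(labels) < min_samples:   # too few records: the branch would be too small
        return True
    return depth >= max_depth       # the tree is already deep enough


def leaf_label(labels):
    """Majority class of the records that reached the leaf."""
    return Counter(labels).most_common(1)[0][0]
```

Raising `min_samples` or lowering `max_depth` stops the tree earlier, which trades some fit on the training set for a simpler tree.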

If we split on the marital status attribute, which has three possible values {married, single, divorced}, we compute the Gini coefficient gain for each of the following groupings:

{married} | {single, divorced}
{single} | {married, divorced}
{divorced} | {single, married}
When the grouping is {married} | {single, divorced}, $S_l$ denotes the group whose marital status is married, and $S_r$ denotes the group whose marital status is single or divorced.
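For reference, the gain evaluated in each case follows the usual pattern for a binary split. The symbols $S$ (all records at the node), $|S|$ (its size), and $p_k$ (the fraction of records in class $k$) are notation assumed here and do not appear in the original text:

$$
\operatorname{Gini}(S) = 1 - \sum_k p_k^2,
\qquad
\Delta = \operatorname{Gini}(S) - \frac{|S_l|}{|S|}\operatorname{Gini}(S_l) - \frac{|S_r|}{|S|}\operatorname{Gini}(S_r)
$$

In the calculations here, the leading 0.42 plays the role of $\operatorname{Gini}(S)$.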
$$
\Delta\{\text{marital status}\} = 0.42 - \frac{4}{10} \times 0 - \frac{6}{10} \times \left[1 - \left(\frac{3}{6}\right)^2 - \left(\frac{3}{6}\right)^2\right] = 0.12
$$

When the grouping is {single} | {married, divorced},

$$
\Delta\{\text{marital status}\} = 0.42 - \frac{4}{10} \times 0.5 - \frac{6}{10} \times \left[1 - \left(\frac{1}{6}\right)^2 - \left(\frac{5}{6}\right)^2\right] = 0.053
$$

When the grouping is {divorced} | {single, married},
[Delta] {
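The gains for the three groupings can be checked with a short script. This is a sketch under one stated assumption: the per-class counts for each marital status are inferred from the fractions that appear in the equations (for example 3/6 and 3/6, or 1/6 and 5/6) rather than quoted from the data set itself. The helper names are hypothetical.

```python
from fractions import Fraction


def gini(counts):
    """Gini impurity of a group, given its per-class record counts."""
    total = sum(counts)
    return 1 - sum(Fraction(c, total) ** 2 for c in counts)


def gini_gain(left, right):
    """Reduction in Gini impurity when the parent is split into left and right."""
    parent = [l + r for l, r in zip(left, right)]
    n = sum(parent)
    return (gini(parent)
            - Fraction(sum(left), n) * gini(left)
            - Fraction(sum(right), n) * gini(right))


# (minority class, majority class) counts per value. ASSUMPTION: inferred from
# the fractions in the equations, not copied from the original data table.
counts = {"married": (0, 4), "single": (2, 2), "divorced": (1, 1)}


def split_gain(left_values):
    """Gain for the grouping {left_values} | {all remaining values}."""
    left = [sum(counts[v][k] for v in left_values) for k in range(2)]
    right = [sum(counts[v][k] for v in counts if v not in left_values)
             for k in range(2)]
    return float(gini_gain(left, right))


gains = {v: split_gain([v]) for v in counts}
best = max(gains, key=gains.get)

for v, g in gains.items():
    print(f"{{{v}}} | rest: gain = {g:.3f}")
print("largest gain:", best)
```

Under these inferred counts the first two gains agree with the 0.12 and 0.053 computed in the text, and the {married} grouping gives the largest gain of the three.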


Origin blog.csdn.net/cpongo2/article/details/103400668