04-06 gradient boosting tree


Gradient boosting tree

The gradient boosting tree (gradient boosting decision tree, GBDT) is widely used in industry and is one of the most popular and practical algorithms. It can be seen as an optimized version of the boosting tree.

I. Gradient boosting tree learning objectives

  1. The gradient boosting tree and the boosting tree
  2. The regression gradient boosting tree procedure
  3. The advantages and disadvantages of the gradient boosting tree

II. Gradient boosting tree explained

2.1 Gradient boosting tree and boosting tree

The overall procedure of the gradient boosting tree is similar to that of the boosting tree; the difference lies in how the loss is fitted. The boosting tree fits the residuals given by the squared loss, whereas the gradient boosting tree uses the negative gradient of the loss function as an approximation of the loss in the current round and then fits a regression tree to it.

The negative gradient of the loss function for the \(i\)-th sample in the \(t\)-th round is
\[ r_{ti}=-{[\frac{\partial{L(y_i,f(x_i))}}{\partial{f(x_i)}}]}_{f(x)=f_{t-1}(x)} \]
Using \((x_i,r_{ti}),\quad (i=1,2,\cdots,m)\), we can fit a CART regression tree and obtain the \(t\)-th regression tree. Its leaf nodes correspond to regions \(R_{tj},\quad (j=1,2,\cdots,J)\), where \(J\) is the number of leaf nodes.
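
To make the fitting step concrete, here is a minimal sketch of a single boosting round, assuming the squared loss \(L(y,f(x))=\frac{1}{2}(y-f(x))^2\), under which the negative gradient is simply the residual \(y_i-f_{t-1}(x_i)\). It uses scikit-learn's DecisionTreeRegressor in the role of the CART regression tree; the function name fit_one_round and the toy data are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_one_round(X, y, f_prev):
    """Fit the t-th regression tree to the negative gradients r_ti."""
    r = y - f_prev                       # negative gradient under squared loss
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, r)                       # the tree's leaves are the regions R_tj
    return tree

# Toy usage: start from the mean prediction and perform one round.
X = np.random.rand(100, 3)
y = np.sin(6 * X[:, 0]) + 0.1 * np.random.randn(100)
f_prev = np.full_like(y, y.mean())
tree_1 = fit_one_round(X, y, f_prev)
```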

III. The regression gradient boosting tree procedure

3.1 Input

A training set \(T=\{(x_1,y_1),(x_2,y_2),\cdots,(x_m,y_m)\}\) with \(m\) samples and \(n\) features, and a loss function \(L(y,f(x))\).

3.2 Output

The regression tree \(\hat{f}(x)\).

3.3 Process

  1. Initialize
    \[ f_0(x) = \underbrace{arg\,min}_c\sum_{i=1}^mL(y_i,c) \]
  2. For \(t=1,2,\cdots,M\)
    1. For \(i=1,2,\cdots,m\), compute
      \[ r_{ti}=-{[\frac{\partial{L(y_i,f(x_i))}}{\partial{f(x_i)}}]}_{f(x)=f_{t-1}(x)} \]
    2. Fit a regression tree to the \(r_{ti}\), obtaining the leaf-node regions \(R_{tj},\quad j=1,2,\cdots,J\) of the \(t\)-th tree
    3. For \(j=1,2,\cdots,J\), compute
      \[ c_{tj} = \underbrace{arg\,min}_c\sum_{x_i\in{R_{tj}}}L(y_i,f_{t-1}(x_i)+c) \]
    4. Update
      \[ f_t(x)=f_{t-1}(x)+\sum_{j=1}^Jc_{tj}I(x\in{R_{tj}}) \]
  3. Obtain the regression tree
    \[ \hat{f}(x) = f_M(x) = \sum_{t=1}^M\sum_{j=1}^Jc_{tj}I(x\in{R_{tj}}) \]
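
Below is a minimal end-to-end sketch of the procedure above, again assuming the squared loss, so that \(f_0\) is the sample mean and the optimal leaf value \(c_{tj}\) is just the mean residual in each leaf, which is exactly what a fitted regression tree predicts. The class name SimpleGBDTRegressor is illustrative, and the learning_rate (shrinkage) factor is a common practical addition that is not part of the procedure itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class SimpleGBDTRegressor:
    def __init__(self, n_trees=100, max_depth=2, learning_rate=0.1):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.learning_rate = learning_rate

    def fit(self, X, y):
        # Step 1: f_0(x) = arg min_c sum_i L(y_i, c), i.e. the mean under squared loss.
        self.f0 = y.mean()
        f = np.full(len(y), self.f0)
        self.trees = []
        # Step 2: for t = 1, 2, ..., M
        for _ in range(self.n_trees):
            r = y - f                                   # 2.1 negative gradients r_ti
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, r)                              # 2.2 leaf regions R_tj
            # 2.3 / 2.4: for squared loss the optimal c_tj is the mean residual in
            # each leaf, which is what tree.predict already returns, so the update
            # is f_t(x) = f_{t-1}(x) + learning_rate * tree(x).
            f += self.learning_rate * tree.predict(X)
            self.trees.append(tree)
        return self

    def predict(self, X):
        # Step 3: f_M(x) = f_0 plus the sum of all the trees' leaf values.
        f = np.full(X.shape[0], self.f0)
        for tree in self.trees:
            f += self.learning_rate * tree.predict(X)
        return f
```

For a loss other than the squared loss, step 2.3 would have to re-solve for \(c_{tj}\) inside each leaf instead of reusing the tree's raw predictions.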

IV. Advantages and disadvantages of the gradient boosting tree

4.1 Advantages

  1. Compared with SVM, a reasonably accurate model can be obtained with relatively little parameter tuning
  2. Compared with the boosting tree, it uses the negative gradient of the loss function as an approximation of the residuals in the boosting tree algorithm, which gives a general way to fit the loss of the current model for both classification and regression problems (see the sketch after this list)
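
As a small illustration of this generality, the sketch below (the function name is illustrative) shows that switching losses only changes the pseudo-residual each round fits; everything else in the procedure stays the same.

```python
import numpy as np

def negative_gradient(y, f, loss="squared"):
    """Pseudo-residuals -dL/df for a couple of common regression losses."""
    if loss == "squared":       # L = (y - f)^2 / 2  ->  y - f (the ordinary residual)
        return y - f
    if loss == "absolute":      # L = |y - f|        ->  sign(y - f)
        return np.sign(y - f)
    raise ValueError(f"unknown loss: {loss}")
```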

4.2 Disadvantages

  1. Because the weak learners depend on each other, training cannot be parallelized

V. Summary

Although the gradient boosting tree, by using the negative gradient of the loss function as an approximation of the residuals in the boosting tree algorithm, to some extent resolves the boosting tree's need for different loss functions in classification and regression problems and improves model accuracy, it still cannot be trained in parallel. XGBoost, introduced next as an upgraded version of GBDT, addresses this problem.
