GBDT notes

GBDT is a boosting algorithm. The boosting method we are most familiar with is AdaBoost, but GBDT differs from the AdaBoost algorithm.

The differences are as follows: AdaBoost uses the error of the previous weak learner to update the weights of the samples, and then moves on to the next round of iteration.

GBDT also iterates, but its weak learners are required to be CART models,

and when training each model, GBDT requires that the loss of the model's predictions on the samples be as small as possible.

 

GBDT consists of three parts:

DT (Regression Decision Tree), GB (Gradient Boosting), and Shrinkage (decay).

GBDT is composed of multiple decision trees, and the results of all the trees are summed to give the final result. The difference from random forest lies in how the trees are built iteratively: random forest draws different samples to construct different subtrees, i.e., the construction of the m-th tree has no relationship with the results of the previous m-1 trees; GBDT, when constructing each decision tree iteratively, uses the residuals left by the previously built subtrees as the input data for building the next subtree, and at prediction time it evaluates the subtrees in the order they were constructed (serially) and adds up their prediction results.

Training process

We want the loss function to keep decreasing, and to decrease as quickly as possible.

1. Make the loss function descend along its gradient direction. This is the GB part.

2. Use the negative gradient of the loss function at the current model's predictions as an approximation of the residuals, and fit a regression tree to this regression problem. This is the DT part.

3. This way, each round of training reduces the loss function as quickly as possible, so the model converges to a local or global optimum as soon as possible.
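To make these three steps concrete, here is a minimal sketch of the training loop for the least-squares loss (where, as the theory section below shows, the negative gradient is exactly the residual). The function names gbdt_fit and gbdt_predict and the hyperparameter values are ours, chosen for illustration only:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(x, y, n_trees=100, learning_rate=0.1, max_depth=3):
    # Start from a constant model; the mean minimizes the squared loss
    f0 = y.mean()
    pred = np.full(y.shape, f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred                   # GB: negative gradient of the squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(x, residuals)                 # DT: fit a regression tree to the residuals
        pred = pred + learning_rate * tree.predict(x)  # Shrinkage: damped update
        trees.append(tree)
    return f0, trees

def gbdt_predict(x, f0, trees, learning_rate=0.1):
    # Serial prediction: add up the (shrunken) outputs of all subtrees
    pred = np.full(x.shape[0], f0)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(x)
    return pred

Note how the loop realizes all three parts named above: GB (the residuals), DT (the CART regressor), and Shrinkage (the learning_rate factor).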

Feature Selection

The details of how gbdt selects features are in fact the CART tree generation process. gbdt's weak classifier defaults to the CART tree. In fact, other weak classifiers can also be chosen (provided they have low variance and high bias, in keeping with the boosting framework). Generation of the CART tree (a binary tree):

The generation process of a CART tree is itself a feature selection process:

  1. Suppose we currently have M features in total.

  2. Select a feature j as the first node of the binary tree (selected using the Gini index as the metric).

  3. Select a cut point m among the values of feature j.

  4. Samples whose value on feature j is less than m are divided into one class, and samples whose value is greater than m into the other; this builds one node of the CART tree.

  5. The other nodes are generated iteratively by the same process.

In each round of selection, how do we choose this feature j, and how do we choose its cut point m? The original gbdt approach is quite brute-force: first traverse every feature, then for each feature traverse all of its possible cut points, and find the optimal feature j and its optimal cut point m. A sketch of this search follows.
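Below is a sketch of that brute-force search for a single regression-tree node, scoring each candidate split by total squared error (the regression counterpart of the Gini index mentioned above); best_split is a name made up for this illustration:

import numpy as np

def best_split(x, y):
    # Enumerate every feature j and every candidate cut point m,
    # keeping the pair with the lowest total squared error.
    best_j, best_m, best_err = None, None, np.inf
    for j in range(x.shape[1]):
        for m in np.unique(x[:, j]):
            left, right = y[x[:, j] < m], y[x[:, j] >= m]
            if left.size == 0 or right.size == 0:
                continue
            # Each side predicts its own mean; score the split by the summed error
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best_err:
                best_j, best_m, best_err = j, m, err
    return best_j, best_m

This double loop is why the original approach is called brute-force: the cost per node is proportional to the number of features times the number of candidate cut points.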

Algorithm theory

First, a number of training samples (X1, Y1), (X2, Y2), ..., (Xn, Yn) are given, each composed of an input vector X and an output variable Y. The goal is to find an approximating function F(X) that minimizes the loss function L(Y, F(X)). The loss function L generally uses the least-squares loss or the absolute-value loss:

$$L(y, F(x)) = \frac{1}{2}\,(y - F(x))^2 \qquad \text{or} \qquad L(y, F(x)) = |y - F(x)|$$

The optimal solution:

$$F^* = \arg\min_{F}\, E_{(X,Y)}\left[L(Y, F(X))\right]$$

Assume the final model F(x) is a weighted sum of a set of optimal basis functions f(x) (with each weight absorbed into its f_m):

$$F(x) = \sum_{m=0}^{M} f_m(x)$$

Using a greedy strategy, expand from the previous result F_{m-1}(X) and find the optimal f at each step:

$$F_m(X) = F_{m-1}(X) + \arg\min_{f} \sum_{i=1}^{n} L\big(y_i,\; F_{m-1}(x_i) + f(x_i)\big)$$

However, it is still difficult for the greedy method to select the optimal basis function f at each step, so gradient descent is used as an approximate calculation.

First, a constant function F0(X) is given:

$$F_0(X) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)$$

Solving by gradient descent, the negative gradient (pseudo-residual) of the loss under the current model is, for each sample:

$$\alpha_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(X) = F_{m-1}(X)}, \qquad i = 1, \ldots, n$$
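As a sanity check on the last two formulas, consider the standard least-squares case $L(y, F(x)) = \frac{1}{2}(y - F(x))^2$ (a textbook result):

$$F_0(X) = \arg\min_{c} \sum_{i=1}^{n} \tfrac{1}{2}(y_i - c)^2 = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \alpha_{im} = y_i - F_{m-1}(x_i)$$

That is, the initial model is just the mean of the targets, and the negative gradient is exactly the residual, which is why fitting the negative gradient is often described simply as fitting the residuals.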

Using the data (x_i, α_im) (i = 1, ..., n), fit a CART regression tree to these residuals, obtaining the m-th tree:

$$f_m(x) = \sum_{j=1}^{J} c_{mj}\, I\big(x \in R_{jm}\big)$$

where the R_{jm} are the leaf regions of the tree and c_{mj} is the fitted constant in leaf j.

Update the model (ν is the learning rate, which implements the Shrinkage part):

$$F_m(X) = F_{m-1}(X) + \nu\, f_m(X)$$
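To see the learning rate ν (the Shrinkage part) at work, here is a small sketch using sklearn's staged_predict, which yields the predictions F_1(X), F_2(X), ... after each update; the synthetic dataset and the particular values of ν are just for illustration:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

x, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=14)

for nu in (1.0, 0.1, 0.01):
    gbdt = GradientBoostingRegressor(n_estimators=100, learning_rate=nu, random_state=14)
    gbdt.fit(x, y)
    # staged_predict yields the model after each tree is added: F_1(X), ..., F_100(X)
    errors = [mean_squared_error(y, pred) for pred in gbdt.staged_predict(x)]
    print("nu = %.2f, training MSE after 100 trees: %.3f" % (nu, errors[-1]))

Smaller ν makes each tree contribute less, so more trees are needed to reach the same training loss, but it typically generalizes better; that trade-off is exactly what the Shrinkage part controls.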

Drawback: among the models in sklearn, GBDT has the slowest execution speed.

from sklearn.ensemble import GradientBoostingRegressor

# Usage is the same as AdaBoostRegressor's; the GBDT model supports only CART models as weak learners
# x_train, x_test, y_train, y_test are assumed to come from an earlier train/test split
gbdt = GradientBoostingRegressor(n_estimators=100, learning_rate=0.01, random_state=14)
gbdt.fit(x_train, y_train)
print("Training set R^2: %.5f" % gbdt.score(x_train, y_train))
print("Test set R^2: %.5f" % gbdt.score(x_test, y_test))


Source: www.cnblogs.com/TimVerion/p/11315698.html