GBDT interview focus

Introduction:

GBDT belongs to the boosting family of ensemble learning. Boosting iteratively trains base classifiers: one base classifier is trained per round, the samples are re-weighted in each round according to the previous round's classification results, and the final strong classifier is a weighted linear combination of the base classifiers.

Algorithmic process

The base classifiers are combined by linear weighting, and the error produced during training is continuously reduced.

The algorithm runs for several rounds of iteration, and each iteration produces a weak learner (a CART regression tree) that is trained on the residuals of the classifier obtained in the previous round. Suppose the strong learner obtained after round t-1 is f_{t-1}(x) and the loss function is L(y, f_{t-1}(x)); in round t we look for a CART tree h_t(x) that makes the value of the loss function for the current round even smaller.
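Written out in standard GBDT notation, the tree chosen in round t and the resulting update are:

$$h_t = \arg\min_{h}\sum_{i=1}^{N} L\big(y_i,\ f_{t-1}(x_i) + h(x_i)\big), \qquad f_t(x) = f_{t-1}(x) + h_t(x)$$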

How are features chosen (i.e., how is a CART tree generated)?

CART regression trees:

Split by minimizing the squared error: for a candidate feature A and split value a, compute the squared error of the resulting partition of the data set D, and select the feature and split value with the minimum error as the best cut point.
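A minimal sketch of that split rule in Python (the helper names best_split and squared_error are made up for this illustration):

```python
# CART regression split rule: for every feature and candidate threshold, partition D
# and measure the squared error around each side's mean prediction; keep the
# (feature, threshold) pair with the smallest total error.
import numpy as np

def squared_error(y):
    # Sum of squared deviations from the node's mean prediction.
    return 0.0 if len(y) == 0 else float(np.sum((y - y.mean()) ** 2))

def best_split(X, y):
    best = (None, None, np.inf)  # (feature index, threshold, error)
    for j in range(X.shape[1]):
        for a in np.unique(X[:, j]):
            left, right = y[X[:, j] <= a], y[X[:, j] > a]
            err = squared_error(left) + squared_error(right)
            if err < best[2]:
                best = (j, a, err)
    return best
```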

CART classification trees:

Split by the Gini index instead of entropy: for a candidate feature A and split value a, compute the Gini index of the data set D under that split, and select the feature and split value with the minimum Gini index as the best cut point.
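A corresponding sketch of the Gini criterion (gini and gini_split are illustrative names, not from the post):

```python
# CART classification split rule: prefer the split with the smallest weighted
# Gini index of the two child nodes D1 (feature <= threshold) and D2 (feature > threshold).
import numpy as np

def gini(y):
    # Gini impurity of one node: 1 - sum_k p_k^2 over the class proportions.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def gini_split(X, y, feature, threshold):
    # Weighted Gini index of the partition induced by (feature, threshold).
    mask = X[:, feature] <= threshold
    n, n1 = len(y), mask.sum()
    return (n1 / n) * gini(y[mask]) + ((n - n1) / n) * gini(y[~mask])
```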

How are features constructed with GBDT?

GBDT itself cannot produce new features, but it can be used to generate feature combinations. For example, suppose GBDT produces two sub-trees (CART regression trees) with five leaf nodes in total. A sample fed into the two sub-trees ends up in exactly one leaf of each tree, so two leaves produce an output for it. Set the output of those leaves to 1 and the others to 0; the resulting vector, e.g. [0, 1, 1, 0, 0], is the combined-feature vector. It is then concatenated with the original features and used to train a logistic regression model, which improves the final result and replaces manually designed feature combinations.
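A minimal sketch of this GBDT + LR pipeline with scikit-learn (the synthetic data from make_classification is only for the demo):

```python
# GBDT yields a leaf index per tree for every sample; one-hot encoding those
# indices gives the 0/1 combined-feature vector described above, which is then
# concatenated with the raw features and fed to logistic regression.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
gbdt.fit(X, y)

# apply() returns, for every sample, the index of the leaf it lands in for each tree.
leaves = gbdt.apply(X).reshape(X.shape[0], -1)
leaf_features = OneHotEncoder(handle_unknown="ignore").fit_transform(leaves)

# Concatenate the one-hot leaf features with the original features and train LR.
combined = np.hstack([X, leaf_features.toarray()])
lr = LogisticRegression(max_iter=1000).fit(combined, y)
print(lr.score(combined, y))
```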

How is the loss function fitted?

The negative gradient of the loss function is used to approximate the loss (residual) of the current round, and a CART tree is then fitted to it.

The negative gradient of the i-th sample in round t is:

$$r_{ti} = -\left[\frac{\partial L\big(y_i, f(x_i)\big)}{\partial f(x_i)}\right]_{f(x) = f_{t-1}(x)}$$

The negative gradient is computed for all samples, and the pairs (x_i, r_{ti}) — using the negative gradients as labels — are used to fit a CART regression tree. Once the tree is built, the data falling into each leaf region R_{tj} are used to evaluate the loss function and find the output value (residual) c_{tj} that minimizes it:

$$c_{tj} = \arg\min_{c} \sum_{x_i \in R_{tj}} L\big(y_i,\ f_{t-1}(x_i) + c\big)$$

This gives the decision function (residual-fitting function) for the current round:

$$h_t(x) = \sum_{j=1}^{J} c_{tj}\, I\big(x \in R_{tj}\big)$$

Adding this round's decision function to the previous strong classifier gives this round's strong classifier:

$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{tj}\, I\big(x \in R_{tj}\big)$$

(PS: if there are T rounds in total, the strong classifier obtained in round T is the final strong classifier.)
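Under squared loss the negative gradient is simply the residual, and each leaf's best output c_tj is the mean residual in that leaf; a small sketch of one such round (boosting_round is an illustrative helper, not from the post):

```python
# One GBDT round for squared loss: fit a CART regression tree to the negative
# gradients (residuals), compute each leaf's best constant output c_tj, and add
# the resulting decision function to the previous strong learner.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_round(X, y, prev_pred, max_depth=3):
    residual = y - prev_pred                              # negative gradient r_ti
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
    leaf_ids = tree.apply(X)                              # which region R_tj each sample falls in
    c = {j: residual[leaf_ids == j].mean() for j in np.unique(leaf_ids)}  # leaf values c_tj
    new_pred = prev_pred + np.array([c[j] for j in leaf_ids])             # f_t = f_{t-1} + h_t
    return new_pred, tree, c
```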

How does GBDT do regression and classification?

For regression:

1. Initialize the weak learner and start the first round of training;

2. Iterate: compute the negative gradients, fit a CART tree to them, and then find the best leaf output values (residuals) that minimize the loss function;

3. Update the strong classifier;

4. Obtain the final strong classifier.

The decision functions of all rounds are accumulated (added together):

$$f_T(x) = f_0(x) + \sum_{t=1}^{T} \sum_{j=1}^{J} c_{tj}\, I\big(x \in R_{tj}\big)$$
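A compact sketch of these steps for regression with squared loss, with a learning rate added for the usual shrinkage (the class name ToyGBDTRegressor is made up for this illustration):

```python
# Toy GBDT regressor following the steps above: initialize with the mean, then in
# every round fit a CART regression tree to the residuals (negative gradients of
# squared loss) and accumulate its shrunken predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class ToyGBDTRegressor:
    def __init__(self, n_rounds=100, learning_rate=0.1, max_depth=3):
        self.n_rounds, self.lr, self.max_depth = n_rounds, learning_rate, max_depth

    def fit(self, X, y):
        self.f0 = float(np.mean(y))               # step 1: initialize the weak learner
        self.trees = []
        pred = np.full(len(y), self.f0)
        for _ in range(self.n_rounds):            # step 2: iterate
            residual = y - pred                   # negative gradient of squared loss
            tree = DecisionTreeRegressor(max_depth=self.max_depth).fit(X, residual)
            pred += self.lr * tree.predict(X)     # step 3: update the strong classifier
            self.trees.append(tree)
        return self                               # step 4: final strong classifier

    def predict(self, X):
        # Accumulate the decision functions of all rounds, as in the formula above.
        return self.f0 + self.lr * sum(t.predict(X) for t in self.trees)
```

Usage would look like `ToyGBDTRegressor().fit(X_train, y_train).predict(X_test)`.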

Classification is different:

Since the outputs are discrete class labels, the error cannot be fitted directly from the category outputs.

Two methods: (1) use the exponential loss function, in which case GBDT degenerates into AdaBoost; (2) use a log-likelihood loss function similar to logistic regression, i.e., fit the difference between the predicted class probability and the true probability.

Log-likelihood loss function (binary case, with labels y ∈ {-1, +1}):

$$L\big(y, f(x)\big) = \log\big(1 + \exp(-y f(x))\big)$$

Exponential loss function:

$$L\big(y, f(x)\big) = \exp\big(-y f(x)\big)$$
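A small sketch of the negative gradients implied by these two losses, with labels y in {-1, +1} (function names are illustrative):

```python
# Negative gradients for GBDT binary classification.
# Log-likelihood loss  L = log(1 + exp(-y*f))  ->  -dL/df = y / (1 + exp(y*f)),
# which reflects the gap between the true label and the predicted probability.
# Exponential loss     L = exp(-y*f)           ->  -dL/df = y * exp(-y*f),
# which recovers AdaBoost-style behaviour.
import numpy as np

def neg_gradient_logloss(y, f):
    return y / (1.0 + np.exp(y * f))

def neg_gradient_exponential(y, f):
    return y * np.exp(-y * f)
```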

Advantages:

Handles various types of data flexibly;

High prediction accuracy with relatively little tuning time;

Using robust loss functions makes it very robust to outliers.

 


Origin www.cnblogs.com/pacino12134/p/11110113.html