A single decision tree is a weak learner, and the algorithm itself is relatively simple, but merging trees into an ensemble produces much better results. For example: tree + bagging = random forest; tree + boosting = boosting tree.
This post first gives a brief overview of ensemble algorithms, and then describes random forests and boosting trees.
Ensemble algorithms
Ensemble algorithms solve practical problems by combining multiple models.
The combined model is called the ensemble estimator, and each model that makes up the ensemble is called a base estimator.
The "combination" mainly happens in one of two ways: bagging and boosting.
In bagging and boosting the individual learners are all of the same type, so the ensemble is homogeneous; an ensemble built from different types of individual learners is heterogeneous.
Homogeneous ensembles can be divided into categories according to whether there are dependencies among the weak learners:
- Strong dependencies exist among the weak learners, so the weak learners essentially have to be generated serially (each weak learner builds on the results of the previous one); the representative family is the Boosting algorithms.
- No strong dependencies exist among the weak learners, so the weak learners can be generated in parallel (each is trained independently of the others); the representative family is the Bagging algorithms.
Bagging
Bagging samples the data randomly with replacement (bootstrap sampling); different samples produce different models, so T samplings produce T models, whose results are then combined by some combination strategy.
About 1/3 of the data is never drawn; in random forests these out-of-bag (OOB) samples can be used for testing, so no separate cross-validation set needs to be split off.
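A quick pure-Python sketch (illustrative only) of where the "about 1/3" comes from: the chance that a given row is never drawn in n draws with replacement is (1 − 1/n)^n ≈ 1/e ≈ 0.368:

```python
import random

random.seed(0)

n = 10_000
# Bootstrap sample: draw n row indices with replacement.
indices = [random.randrange(n) for _ in range(n)]
# Out-of-bag rows: never drawn, usable as a free validation set.
oob = set(range(n)) - set(indices)
print(len(oob) / n)  # close to 1/e ≈ 0.368
```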
Combination strategies
Averaging
Take the mean of the models' outputs as the final output (used for regression).
Voting
Take the majority vote (used for classification).
Learning
Stacking: the outputs of the M primary learners are used as new data and fed into a new model (the secondary learner) for learning, which produces the final result. For example, the outputs of several random forests can be fed into a linear regression, stacking multiple models to obtain the final result.
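The averaging and voting strategies are simple enough to sketch in a few lines of Python (the helper names here are made up for illustration):

```python
from collections import Counter

def average(predictions):
    """Averaging: combine regression outputs by taking their mean."""
    return sum(predictions) / len(predictions)

def majority_vote(predictions):
    """Voting: combine classification outputs by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

print(average([2.0, 3.0, 4.0]))          # 3.0
print(majority_vote([1, -1, 1, 1, -1]))  # 1
```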
Boosting
In bagging there is no dependency between the weak learners, while in boosting there is: the weak learners are generated serially, each one using the results of the previous step.
First a weak learner is trained; the samples it mispredicts (comparing predictions against the true values) are given larger weights, and the reweighted data is then used to train the next weak learner, and so on back and forth.
Comparison of Bagging and Boosting
Bagging: the ensemble has small variance (strong generalization) but large bias (the individual weak learners are not accurate), so to improve the fitting ability of the weak learners and reduce the bias, trees of larger depth are chosen (a deep tree overfits, i.e. it has small bias, which exactly makes up for bagging's weakness).
Boosting: the ensemble has small bias (the weak learners can be made accurate) but large variance (weaker generalization), so to reduce the variance, trees of smaller depth are chosen (a shallow tree underfits relative to a deeper one, i.e. it has small variance, which exactly makes up for boosting's weakness).
When tuning random forests and GBDT, the appropriate magnitude of the tree depth follows this same reasoning.
Random Forests
A random forest is the product of combining bagging with decision trees, but it adds some features of its own:
- Compared with bagging, which only randomly samples the data rows, a random forest additionally samples the features randomly.
- The decision tree used is CART. Because the features are randomly selected (the number of features fed into each model is smaller than the number of features in the original data), the CART trees in a random forest are smaller in scale than an ordinary CART tree, the results on the training and test sets are similar, and the robustness is better.
The random forest algorithm flow is:
The trees are not pruned here, a decision that follows from bagging's large-bias characteristic (the individual trees should fit fully).
For regression the results are averaged; for classification a majority vote is taken.
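The flow above can be sketched in pure Python, using depth-1 stumps as a stand-in for the CART trees (bootstrap the rows, pick a random feature subset per tree, majority-vote the trees); the toy data and all helper names are made up for illustration:

```python
import random

random.seed(1)

def train_stump(X, y, feat_ids):
    """Depth-1 CART stand-in: pick the (feature, threshold, direction)
    with the fewest errors, searching only the given feature subset."""
    best = None
    for f in feat_ids:
        vals = sorted({row[f] for row in X})
        for t in [(a + b) / 2 for a, b in zip(vals, vals[1:])]:
            for sign in (1, -1):
                err = sum((sign if row[f] <= t else -sign) != yi
                          for row, yi in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    if best is None:                      # degenerate bootstrap sample
        maj = 1 if sum(y) >= 0 else -1
        return lambda row: maj
    _, f, t, sign = best
    return lambda row: sign if row[f] <= t else -sign

def random_forest(X, y, n_trees=25, n_feats=1):
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]      # bootstrap the rows
        feats = random.sample(range(len(X[0])), n_feats)   # random feature subset
        trees.append(train_stump([X[i] for i in idx],
                                 [y[i] for i in idx], feats))
    # Majority vote over the trees.
    return lambda row: 1 if sum(t(row) for t in trees) >= 0 else -1

# Toy data: both features separate the two classes.
X = [[0, 1], [1, 0], [0, 0], [1, 1], [4, 5], [5, 4], [4, 4], [5, 5]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
forest = random_forest(X, y)
print([forest(row) for row in X])
```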
Advantages and disadvantages of random forests
1. Training can run in parallel, so it is fast.
2. It is based on CART trees, so it can do both classification and regression.
3. Since the features are randomly selected, not all features have to be used, which handles high-dimensional feature problems and additionally helps prevent over-fitting. But random feature selection cuts both ways: it can also affect the results.
AdaBoost
The AdaBoost algorithm: the samples misclassified by the previous base classifier are strengthened, and the whole reweighted sample set is used again to train the next base classifier. A new weak classifier is added in every round, until a predefined sufficiently small error rate or a predefined maximum number of iterations is reached.
- Additive model: the final strong classifier is a weighted sum of the weak classifiers.
- Forward stagewise algorithm: the algorithm proceeds round by round; the sample weights used to train the next weak learner are updated from the results of the previous round's weak learner.
The AdaBoost algorithm has to answer four questions:
1. How to compute the error rate e;
2. How to compute the weak learner's weight coefficient α;
3. How to update the sample weights D;
4. Which combination strategy to use.
AdaBoost loss function
Write the strong learners of round k−1 and round k as
f_{k−1}(x) = Σ_{i=1}^{k−1} α_i G_i(x),  f_k(x) = Σ_{i=1}^{k} α_i G_i(x),
from which:
f_k(x) = f_{k−1}(x) + α_k G_k(x).
Since AdaBoost's classification loss is the exponential function (the exponential function clearly separates the correctly and incorrectly classified cases), its loss function is
L(y, f(x)) = Σ_{i=1}^{m} exp(−y_i f_k(x_i)).
In the derivations, I(G(x) ≠ y) denotes the indicator function.
AdaBoost classification algorithm
1. Initialize the sample weights D_1 = (w_{11}, …, w_{1m}) with w_{1i} = 1/m;
2. For k = 1, 2, …, K:
3. Train the weak classifier G_k(x) under the weights D_k and compute its classification error rate on the training set, e_k = Σ_i w_{ki} I(G_k(x_i) ≠ y_i);
4. Compute the weight coefficient of the k-th weak classifier, α_k = ½ ln((1 − e_k) / e_k);
from this binary-classification weight coefficient it can be seen that the larger the error rate e_k, the smaller the corresponding weak classifier's weight coefficient α_k, i.e. weak classifiers with large error rates get small weights.
5. Update the training-data weights, w_{k+1,i} = w_{ki} · exp(−α_k y_i G_k(x_i)) / Z_k, where Z_k = Σ_i w_{ki} exp(−α_k y_i G_k(x_i)) is a normalization factor;
6. Combination strategy: f(x) = sign(Σ_k α_k G_k(x)).
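Steps 1–6 can be sketched in pure Python with one-dimensional decision stumps as the weak classifiers (the toy data and helper names are made up for illustration). Three rounds of boosting drive the training error to zero even though no single stump can get fewer than three points wrong:

```python
import math

def train_stump(x, y, w):
    """Weighted 1-D decision stump: minimize the weighted error rate
    e = sum_i w_i * I(G(x_i) != y_i) over (threshold, direction)."""
    best = None
    vals = sorted(set(x))
    for t in [(a + b) / 2 for a, b in zip(vals, vals[1:])]:
        for sign in (1, -1):
            e = sum(wi for xi, yi, wi in zip(x, y, w)
                    if (sign if xi <= t else -sign) != yi)
            if best is None or e < best[0]:
                best = (e, t, sign)
    return best  # (error rate e_k, threshold, direction)

def adaboost(x, y, rounds=3):
    m = len(x)
    w = [1 / m] * m                                  # 1. uniform sample weights
    model = []
    for _ in range(rounds):                          # 2. iterate
        e, t, sign = train_stump(x, y, w)            # 3. error rate e_k
        alpha = 0.5 * math.log((1 - e) / e)          # 4. classifier weight alpha_k
        pred = [sign if xi <= t else -sign for xi in x]
        w = [wi * math.exp(-alpha * yi * pi)
             for wi, yi, pi in zip(w, y, pred)]
        z = sum(w)
        w = [wi / z for wi in w]                     # 5. reweight and normalize
        model.append((alpha, t, sign))
    # 6. combination: sign of the alpha-weighted sum of the stumps.
    def f(xi):
        s = sum(a * (sg if xi <= t else -sg) for a, t, sg in model)
        return 1 if s >= 0 else -1
    return f

# Toy 1-D data that no single stump classifies perfectly.
x = list(range(10))
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
f = adaboost(x, y)
print(sum(f(xi) != yi for xi, yi in zip(x, y)))  # ensemble training errors
```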
AdaBoost regression algorithm
The regression flow is analogous; in each round, step 5 computes the error rate and the weight coefficient of the k-th weak regressor.
The above is the detailed derivation. For AdaBoost, remember:
- Model: an additive model, i.e. the final result is the weighted sum of several weak classifiers.
- Objective function: the exponential function.
- Learning algorithm: the forward stagewise algorithm, i.e. the sample weights for training the next weak learner are updated using the previous weak learner's results.
Two kinds of weights are involved: the weights α of the weak classifiers (computed from the error rate), and the sample weights of the misclassified samples. Error measure: classification uses the error rate; regression uses the mean squared error.
AdaBoost algorithm advantages and disadvantages
Boosting tree
A boosting tree is an additive model of CART trees: it keeps fitting residuals so as to approach the true values.
For example:
Boosting tree algorithm
The principle here is empirical risk minimization; different problems use different loss functions: regression uses the squared-error loss, classification uses the exponential loss, and general decision problems use a general loss function.
The forward stagewise steps of the regression boosting tree are:
For the regression problem, the parameters Θ_m of the m-th tree are solved by minimizing the squared error between the true values and the sum of the previous trees. Simplifying the equation shows that (r − T)² is minimized, i.e. the m-th tree is fitted so as to minimize the residual left by the previous trees.
Regression boosting tree flow
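Under the squared-error loss, the flow can be sketched in pure Python with regression stumps as the trees (a minimal sketch; the toy data and helper names are illustrative). Each tree is fitted to what the previous trees left over:

```python
def fit_stump(x, r):
    """Regression stump: split 1-D x at the threshold that minimizes the
    squared error of predicting the mean residual on each side."""
    best = None
    vals = sorted(set(x))
    for t in [(a + b) / 2 for a, b in zip(vals, vals[1:])]:
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        cl, cr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - cl) ** 2 for ri in left)
               + sum((ri - cr) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, cl, cr)
    _, t, cl, cr = best
    return lambda xi: cl if xi <= t else cr

def boosting_tree(x, y, m_trees=6):
    trees = []
    r = list(y)                          # residuals start as the targets themselves
    for _ in range(m_trees):
        tree = fit_stump(x, r)
        trees.append(tree)
        r = [ri - tree(xi) for xi, ri in zip(x, r)]  # each tree fits what is left
    return lambda xi: sum(tree(xi) for tree in trees)

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5.56, 5.70, 5.91, 6.40, 6.80, 7.05, 8.90, 8.70, 9.00, 9.05]
f = boosting_tree(x, y)
print(round(sum((f(xi) - yi) ** 2 for xi, yi in zip(x, y)), 2))  # loss after 6 trees
```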
Boosting tree vs. AdaBoost
- The AdaBoost algorithm uses the forward stagewise algorithm, i.e. the training-data weights are updated via the error rate of the previous weak learner;
- The boosting tree is also a forward stagewise algorithm, but the weak learners it uses can only be tree models (typically CART).
Advantages and disadvantages of boosting trees
For regression boosting trees, each step simply fits the residual of the current model.
Gradient boosting tree
GBRT is a boosting method applicable to any loss function that has a first-order derivative.
Gradient boosting
The boosting tree uses the additive model and the forward stagewise algorithm to optimize the learning process step by step. When the loss function is the squared loss or the exponential loss, each step of the optimization is simple; but each step is often not so easy to optimize for a general loss function L(y, f(x)).
For this problem, Friedman proposed the gradient boosting tree algorithm, an approximation method based on steepest descent; the key is to use the negative gradient of the loss function as an approximation of the residual in the boosting tree algorithm.
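For the squared loss this approximation is exact: the negative gradient of the loss with respect to the current model is precisely the residual, which is what connects gradient boosting back to the residual-fitting boosting tree:

```latex
L(y, f(x)) = \tfrac{1}{2}\,\bigl(y - f(x)\bigr)^2
\quad\Longrightarrow\quad
-\left[\frac{\partial L\bigl(y_i, f(x_i)\bigr)}{\partial f(x_i)}\right]_{f = f_{m-1}}
= y_i - f_{m-1}(x_i) = r_{mi}
```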
GBDT algorithm principle
GBDT regression case
The data:
Parameter settings:
1. Initialize the weak learner:
2. Iterate for m = 1, 2, …, M:
In the first step the weak learner is initialized, and the residual values to be fitted are computed.
Since the maximum depth is set to 3 and the tree so far only has depth 2, one more split is needed: the left and right nodes are each divided again.
Compute the result of fitting the first tree to the residuals.
Compute the predicted values of the tree.
Learning rate:
Because the number of iterations is 5, five different trees are generated.
These five trees are combined to obtain the final model.
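The combination with a learning rate can be sketched as follows (a hypothetical sketch: `f0` stands for the initial constant learner, and the five constant "trees" are stand-ins for the fitted regression trees of the example):

```python
def gbdt_predict(x, f0, trees, lr=0.1):
    """Shrinkage: f(x) = f0 + lr * sum_m tree_m(x); the learning rate
    scales each tree's contribution so later trees correct gently."""
    return f0 + lr * sum(tree(x) for tree in trees)

# Hypothetical stand-ins for the five fitted trees (constant outputs).
trees = [lambda x, c=c: c for c in (4.0, 3.6, 3.2, 2.9, 2.6)]
print(gbdt_predict(10, f0=7.3, trees=trees))
```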
GBDT classification case
(From: Machine Learning Algorithms — GBDT Interview Essentials Summary, Part I.)
The difference between GBDT and the boosting tree is that GBDT replaces the residual with the gradient, and each tree has a corresponding learning-rate weight.
Advantages and disadvantages of gradient boosting trees
These shortcomings can be addressed by using XGBoost.
References
https://blog.csdn.net/weixin_46032351/article/list/3
https://weizhixiaoyi.com/category/jqxx/2/
https://blog.csdn.net/u012151283/article/details/77622609
https://blog.csdn.net/zpalyq110/article/details/79527653