Data Mining: Model Selection - Ensemble Algorithms and Tree-Based Ensemble Models

The tree models introduced earlier are weak learners: a single tree is relatively simple, but combining it with an ensemble algorithm produces much better results. For example: tree + bagging = random forest; tree + boosting = boosting tree.
This post first gives a brief description of ensemble algorithms, then covers random forests and boosting in more detail.

Ensemble algorithms

An ensemble algorithm combines multiple models to solve a practical problem.
The model obtained by combining several models is called the ensemble estimator, and each individual model that makes up the ensemble is called a base estimator.
There are two main ways of "combining": bagging and boosting.
In both Bagging and Boosting, if the individual learners are all of the same type, the ensemble is homogeneous; if the individual learners are of different types, the ensemble is heterogeneous.

Homogeneous ensembles can be further classified by whether the weak learners depend on each other:

  • Strong dependencies exist between the weak learners, so they essentially have to be generated serially (each weak learner builds on the results of the previous one); the representative family is the Boosting algorithms.
  • No strong dependencies exist between the weak learners, so they can be generated in parallel (the weak learners are trained independently of each other); the representative family is the Bagging algorithms.


Bagging

Bagging draws bootstrap samples from the data (random sampling with replacement). Different samples produce different models: T samples produce T models, and the results of these models are then merged through some combination strategy.
About 1/3 of the data is never drawn into a given bootstrap sample; this is the out-of-bag (OOB) data. In a random forest the OOB data can be used for evaluation, so no separate validation set needs to be split off.
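As a quick check on the "about 1/3" figure: a bootstrap sample of size n leaves roughly (1 - 1/n)^n ≈ 1/e ≈ 36.8% of the points out of bag. A minimal NumPy sketch (illustrative only, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                                  # number of training samples
sample = rng.integers(0, n, size=n)         # bootstrap: draw n indices with replacement
oob_fraction = 1 - np.unique(sample).size / n
print(f"out-of-bag fraction ≈ {oob_fraction:.3f}")   # ≈ 0.368, i.e. about 1/3
```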

Combination strategies

Averaging

Take the mean of the base estimators' outputs as the final output.

Voting

Take the majority vote of the base estimators' outputs.

Learning (stacking)

Stacking: the predictions of the M primary learners are used as new input data and fed into another model (the secondary learner) for training, and the secondary learner's output is the final result. For example, feed the predictions of a random forest into a linear regression; by stacking several models like this, the final result is obtained.
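A minimal stacking sketch in the spirit of that example (random forest predictions fed into a linear regression), using scikit-learn's StackingRegressor; the dataset and parameter values are only illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# primary (base) learner: its out-of-fold predictions become the secondary learner's input
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=100, random_state=0))],
    final_estimator=LinearRegression(),   # secondary learner
    cv=5,                                 # out-of-fold predictions avoid target leakage
)
stack.fit(X_train, y_train)
print("R^2 on test set:", stack.score(X_test, y_test))
```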

Boosting

In Bagging there is no dependency between the weak learners; in Boosting there is, so the weak learners are generated serially, each one using the result of the previous step.
First a weak learner is trained and its predictions are compared with the true values; samples with larger prediction errors are given larger weights, and the reweighted data is used to train the next weak learner, and so on.

Comparison of Bagging and Boosting

Bagging: the ensemble mainly reduces variance (good generalization) but not bias (the individual weak learners are not very accurate). To improve the fitting ability of the weak learners and reduce bias, choose trees with larger depth: a deep tree tends to overfit, i.e. it has low bias and high variance, which exactly complements bagging's variance reduction.
Boosting: the ensemble mainly reduces bias (the weak learners become accurate) but not variance (weaker generalization). To reduce the variance of the weak learners, choose trees with smaller depth: a shallow tree underfits, i.e. it has high bias and low variance, which exactly complements boosting's bias reduction.
This is also reflected in how the decision tree depth is set when tuning random forests and GBDT.
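This depth rule of thumb shows up directly in typical parameter choices. A hedged illustration with scikit-learn (the values are common defaults, not taken from the original post):

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Bagging-style: deep (even fully grown) trees, low bias per tree;
# averaging over many trees then reduces the variance.
rf = RandomForestRegressor(n_estimators=200, max_depth=None, random_state=0)

# Boosting-style: shallow trees, low variance per tree;
# adding trees sequentially then reduces the bias.
gbdt = GradientBoostingRegressor(n_estimators=200, max_depth=3, learning_rate=0.1, random_state=0)
```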

Random Forests

A random forest is the product of combining bagging with decision trees, but it adds some features of its own:

  • Compared with bagging, which only randomly samples the data (the rows), a random forest also randomly samples the features.
  • The decision tree used is CART. Because the features are randomly sampled (each tree sees fewer features than the original data has), the CART trees in a random forest are smaller than ordinary CART trees, the training-set and test-set results are closer, and the model is more robust.

The random forest algorithm flow:
The trees are not pruned; this is determined by bagging's characteristic of not reducing bias, so each tree is grown to fit as well as possible.
For regression the results are averaged; for classification a majority vote is taken.
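A short scikit-learn sketch of this flow (bootstrap rows, a random subset of features at each split, unpruned trees, OOB evaluation); the data is synthetic and only illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",   # random subset of features considered at each split
    max_depth=None,        # trees are grown fully, i.e. not pruned
    oob_score=True,        # evaluate on out-of-bag samples, no separate validation split
    random_state=0,
)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)
```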

Advantages and disadvantages of random forests

1. The trees can be trained in parallel, so it is fast.
2. It is based on CART trees, so it can do both classification and regression.
3. Because features are randomly sampled, not all features are used by every tree, which makes high-dimensional feature spaces manageable and additionally helps prevent overfitting. However, random feature selection cuts both ways: the extra randomness can also affect the results.

AdaBoost

The AdaBoost algorithm: the samples misclassified by the previous base classifier have their weights increased, and the whole reweighted sample set is used to train the next base classifier. A new weak classifier is added in every round, until a predefined, sufficiently small error rate or a preset maximum number of iterations is reached.

  • Additive model: the final strong classifier is a weighted sum of the weak classifiers.
  • Forward stagewise algorithm: in each round, the results of the previous round's weak learner are used to update the sample weights with which the next weak learner is trained.

The AdaBoost algorithm has to answer four questions:
1. How to compute the error rate e
2. How to compute the weak learner's weight coefficient α
3. How to update the sample weights D
4. Which combination strategy to use

AdaBoost loss function

The strong learners of round k-1 and round k are related by the additive model: the round-k strong learner equals the round-(k-1) strong learner plus the newly added, weighted weak learner.
Since AdaBoost for classification uses an exponential loss function (the exponential function clearly separates correctly classified from misclassified samples), its loss function is the exponential loss; I(·) denotes the indicator function.
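For reference, the standard formulas that the omitted figures presumably showed (the usual AdaBoost derivation over m samples with labels y_i ∈ {-1, +1}):

```latex
f_k(x) = f_{k-1}(x) + \alpha_k G_k(x), \qquad
L\bigl(y, f(x)\bigr) = \exp\bigl(-y\, f(x)\bigr)

(\alpha_k, G_k) = \arg\min_{\alpha,\, G} \sum_{i=1}^{m} \exp\Bigl(-y_i \bigl(f_{k-1}(x_i) + \alpha\, G(x_i)\bigr)\Bigr)
                = \arg\min_{\alpha,\, G} \sum_{i=1}^{m} w'_{k,i}\, \exp\bigl(-y_i\, \alpha\, G(x_i)\bigr),
\qquad w'_{k,i} = \exp\bigl(-y_i\, f_{k-1}(x_i)\bigr)
```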

AdaBoost classification algorithm

1. Initialize the sample weights of the m training samples uniformly to 1/m.
2. For k = 1, 2, ..., K, train a weak classifier Gk(x) on the training set with the current sample weights.
3. Compute the classification error rate of the weak classifier Gk(x) on the training set.
4. Compute the weight coefficient of the k-th weak classifier Gk(x).
From the binary-classification weight coefficient it can be seen that the larger the classification error rate e, the smaller the corresponding weak classifier's weight coefficient α; that is, weak classifiers with larger error rates receive smaller weights.
5. Update the training data weights.
6. Apply the combination strategy: the weak classifiers are combined according to their weight coefficients.
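Since the formula figures are omitted, here are the standard binary AdaBoost quantities for steps 3-6 (sample weights w_{k,i}, labels y_i ∈ {-1, +1}):

```latex
e_k = \sum_{i=1}^{m} w_{k,i}\, I\bigl(G_k(x_i) \ne y_i\bigr)                        % step 3: weighted error rate
\alpha_k = \tfrac{1}{2} \ln \frac{1 - e_k}{e_k}                                     % step 4: classifier weight
w_{k+1,i} = \frac{w_{k,i}}{Z_k} \exp\bigl(-\alpha_k\, y_i\, G_k(x_i)\bigr), \qquad
Z_k = \sum_{i=1}^{m} w_{k,i} \exp\bigl(-\alpha_k\, y_i\, G_k(x_i)\bigr)             % step 5: weight update
f(x) = \operatorname{sign}\Bigl(\sum_{k=1}^{K} \alpha_k\, G_k(x)\Bigr)              % step 6: weighted combination
```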

AdaBoost regression algorithm

The flow is similar to the classification case (the formula figures are omitted); the key step is:
5. Compute the error rate of the k-th weak regressor and its weight coefficient, then update the sample weights.
The above is the detailed derivation. For AdaBoost, the key points to remember are:

  • Model: an additive model, i.e. the final result is the weighted sum of several weak classifiers.
  • Objective function: the exponential loss.
  • Learning algorithm: the forward stagewise algorithm, i.e. the sample weights are updated using the previous weak learner's results before the next weak learner is trained.
    Two kinds of weights are involved: the weak classifier weights α (computed from the error rate) and the sample weights, which are increased for misclassified samples. The error measure is the error rate for classification and the mean squared error for regression; a minimal code sketch follows this list.
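A compact from-scratch sketch of these points, using depth-1 decision trees (stumps) as the weak classifiers; this is a minimal illustration of the standard algorithm, not code from the original post:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)                      # AdaBoost uses labels in {-1, +1}

m, K = len(y), 50
w = np.full(m, 1.0 / m)                          # initialize sample weights uniformly
stumps, alphas = [], []

for _ in range(K):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    e = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)   # weighted error rate
    alpha = 0.5 * np.log((1 - e) / e)            # larger error -> smaller classifier weight
    w = w * np.exp(-alpha * y * pred)            # increase weights of misclassified samples
    w /= w.sum()                                 # normalize (the Z_k term)
    stumps.append(stump)
    alphas.append(alpha)

# final strong classifier: weighted vote of the weak classifiers (additive model)
F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", np.mean(np.sign(F) == y))
```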

Advantages and disadvantages of AdaBoost

(figure omitted: advantages and disadvantages of AdaBoost)

Boosting tree

A boosting tree is an additive model whose base models are CART trees; it keeps fitting the residuals of the current model so that the predictions approach the true values step by step.
A worked example is shown in the original figure (omitted).

Boosting tree algorithm

(algorithm figure omitted)
Each step performs empirical risk minimization; different problems use different loss functions: regression uses the squared error loss, classification uses the exponential loss, and general decision problems use a general loss function.

For a regression boosting tree, the forward stagewise step is as follows: in each round, the parameters θ of the new tree are found by minimizing the squared error between the true values and the current model plus the new tree. Simplifying the equation, the quantity to minimize becomes (r - T)^2, where r is the residual of the previously fitted trees; in other words, each new tree is fitted to the residuals left by the trees before it.
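A minimal residual-fitting sketch for a regression boosting tree (squared loss, shallow CART regression trees); data and parameters are illustrative only:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5, random_state=0)

M = 10
trees = []
f = np.zeros(len(y))                    # current model prediction, starts at 0

for _ in range(M):
    r = y - f                           # residual of the model built so far
    tree = DecisionTreeRegressor(max_depth=2).fit(X, r)   # the new tree fits the residual
    f += tree.predict(X)                # add the new tree to the additive model
    trees.append(tree)

print("training MSE after boosting:", np.mean((y - f) ** 2))
```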

Regression boosting tree flow

(figure omitted: regression boosting tree algorithm flow)

Boosting tree and AdaBoost algorithm

AdaBoost uses the forward stagewise algorithm and updates the training data weights through the error rate of the previous weak learner.
A boosting tree also uses the forward stagewise algorithm, but its weak learners are restricted to tree models (usually CART).

Advantages and disadvantages of boosting trees

For a regression boosting tree, each step simply fits the residuals of the current model.
(figure omitted: advantages and disadvantages of boosting trees)

Gradient boosting tree

GBRT is a boosting method that can be applied to any differentiable loss function.

Gradient boosting

A boosting tree uses the additive model and the forward stagewise algorithm to optimize the learning process step by step. When the loss function is the squared loss or the exponential loss, each optimization step is simple; but for a general loss function, each step is often not so easy to optimize.
For this problem Friedman proposed the gradient boosting tree algorithm, an approximation to steepest descent whose key idea is to use the negative gradient of the loss function to approximate the residual in the boosting tree algorithm.
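The key quantity is the pseudo-residual, the negative gradient of the loss evaluated at the current model; for the squared loss it reduces to the ordinary residual (standard formulas, shown here because the figure is omitted):

```latex
r_{mi} = -\left[\frac{\partial L\bigl(y_i, f(x_i)\bigr)}{\partial f(x_i)}\right]_{f = f_{m-1}}

\text{squared loss } L = \tfrac{1}{2}\bigl(y_i - f(x_i)\bigr)^2
\;\Rightarrow\; r_{mi} = y_i - f_{m-1}(x_i), \qquad
\text{absolute loss } L = \lvert y_i - f(x_i)\rvert
\;\Rightarrow\; r_{mi} = \operatorname{sign}\bigl(y_i - f_{m-1}(x_i)\bigr)
```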

GBDT algorithm principle

(figure omitted: GBDT algorithm)

GBDT regression case

The data and parameter settings are shown in the omitted figures.
1. Initialize the weak learner: for the squared loss, the initial prediction is a constant equal to the mean of the target values.
2. Iterate for m = 1, 2, ..., M:
In the first iteration, compute the residuals with respect to the initial weak learner; these are the values the first tree has to fit.
Since the maximum depth is set to 3 and the tree built so far only has depth 2, another split is needed: the left and right child nodes are each split further.
Then compute the residuals left after fitting the first tree, and compute the tree's predicted values.
Learning rate: each tree's predictions are multiplied by the learning rate before being added to the model, which shrinks the contribution of each individual tree.
Because the number of iterations is 5, five different trees are generated; integrating these five trees gives the final model.
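The setup described in this worked example (depth-3 trees, 5 iterations, a learning rate shrinking each tree's contribution) can be reproduced with scikit-learn; the data below is synthetic, since the post's actual table is in an omitted figure:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=5, random_state=0)

gbdt = GradientBoostingRegressor(
    n_estimators=5,      # number of iterations M, i.e. 5 trees
    max_depth=3,         # maximum depth of each regression tree
    learning_rate=0.1,   # shrinks each tree's contribution before it is added
    random_state=0,      # default loss is the squared error, so pseudo-residual = residual
)
gbdt.fit(X, y)
print(gbdt.predict(X[:5]))
```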

GBDT classification case

Taken from the article "Machine Learning Algorithm GBDT Interview Highlights Summary - Part I" (figure omitted).

The difference between GBDT and the boosting tree is that GBDT uses the negative gradient in place of the residual, and each base learner has a corresponding learned weight parameter.

Advantages and disadvantages of gradient boosting tree

The disadvantages (listed in the omitted figure) can be addressed with XGBoost.
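As a pointer, a minimal XGBoost usage sketch (assuming the xgboost package is installed; the parameter values are illustrative only):

```python
from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5, random_state=0)

model = XGBRegressor(
    n_estimators=100,
    max_depth=3,
    learning_rate=0.1,
    random_state=0,
)
model.fit(X, y)
print(model.predict(X[:5]))
```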

References

https://blog.csdn.net/weixin_46032351/article/list/3
https://weizhixiaoyi.com/category/jqxx/2/
https://blog.csdn.net/u012151283/article/details/77622609
https://blog.csdn.net/zpalyq110/article/details/79527653

Origin: blog.csdn.net/AvenueCyy/article/details/105142680