How to choose between Bagging and Boosting in ensemble learning

Ensemble learning

  • Not a single machine learning algorithm, but an approach that builds multiple weak learners and combines them to complete the learning task
  • By training several individual learners and combining them with a suitable strategy, a strong learner can be formed
  • It is mainly divided into two families of algorithms: Bagging and Boosting

Bagging (sampling with replacement)

  • In Bagging, the weak learners (base learners) have no dependencies on each other and can be trained in parallel
  • A typical algorithm is random forest

Boosting (boosted trees)

  • Generally sequential (serial) learning: after each round, the next round focuses more on the samples that were learned poorly
  • Typical algorithms are AdaBoost, XGBoost, LightGBM, and GBDT

When we train a model, both bias and variance must be taken into account.
For Bagging-type algorithms, each base learner is trained on a resampled training set and many different classifiers are trained in parallel (parallel learning), so the main purpose is to reduce variance. When more independent base learners are used, the variance of the ensemble becomes very small, but this alone can leave a relatively high bias. Therefore, the goal for each base learner is to reduce its own bias, so we use deep, or even unpruned, decision trees (increasing the degree of fit).
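A minimal scikit-learn sketch of this idea; the dataset and parameter values are illustrative assumptions, not from the original article:

# Minimal sketch: Bagging with deep (unpruned) decision trees to reduce variance.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Each base learner is a fully grown tree (max_depth=None, no pruning):
# low bias per tree, high variance, which averaging over many trees then reduces.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=None),  # "estimator" is "base_estimator" in older scikit-learn
    n_estimators=200,   # more independent base learners -> lower ensemble variance
    bootstrap=True,     # sample the training set with replacement
    n_jobs=-1,          # base learners are independent, so train them in parallel
    random_state=0,
)
print("Bagging CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())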
For Boosting-type algorithms, the training set does not change from round to round and each round generally learns from all samples. Each step builds on the previous round in order to better fit the original data (sequential learning: samples learned wrongly get larger weights and are learned more in the next round), so this type of algorithm can generally achieve a lower bias, but it also leads to a higher variance and is prone to overfitting. Therefore, for Boosting-type algorithms we need to consider how to choose classifiers with a smaller variance, that is, simpler classifiers (a simple classifier learns less, has small variance, and does not overfit easily), so the usual choice is a very shallow decision tree.
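Correspondingly, a minimal sketch of Boosting with very shallow base trees (again, dataset and parameter values are illustrative assumptions):

# Minimal sketch: AdaBoost with decision stumps (depth-1 trees) as base learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Each base learner is a very shallow tree: high bias, low variance.
# The sequential boosting rounds then drive the bias down step by step.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # "estimator" is "base_estimator" in older scikit-learn
    n_estimators=200,
    learning_rate=0.5,
    random_state=0,
)
print("Boosting CV accuracy:", cross_val_score(boosting, X, y, cv=5).mean())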
Therefore, we should decide which type of algorithm to use according to the application scenario and business requirements. For example, for tasks such as bank reconciliation, where high model accuracy is required, the Boosting algorithm is often considered; for general demand forecasting, where small variance (stable predictions) is required, the Bagging algorithm can be considered.

Boosting vs. Bagging

  1. Sample selection
    Boosting: The training set is unchanged in each round, but the weight of each sample in the training set changes; the weights are adjusted according to the classification results of the previous round.
    Bagging: The training set is sampled with replacement from the original data set; the training sets drawn in different rounds are independent of each other.
  2. Sample weights
    Boosting: Sample weights are continually adjusted according to the errors; misclassified samples get larger weights and are learned more in later rounds, which is why Boosting often achieves better classification accuracy than Bagging (a small sketch of both mechanisms follows after this list).
    Bagging: Uniform sampling is used, and every sample has equal weight.
  3. Prediction function
    Boosting: Each weak classifier has a corresponding weight, and classifiers with smaller classification error receive larger weights.
    Bagging: All prediction functions have equal weights.
  4. Parallel computing
    Boosting: The individual functions are generated sequentially, because the parameters of each model depend on the results of the previous round.
    Bagging: The individual functions can be generated in parallel; for very time-consuming learners such as neural networks, training the Bagging members in parallel can save a lot of time.
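
To make points 1 and 2 concrete, the following numpy sketch contrasts Bagging's sampling with replacement with a classical AdaBoost-style weight update; the labels and the "weak learner" output are toy stand-ins, purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
n = 10
y_true = rng.integers(0, 2, size=n)        # toy labels, illustrative only

# --- Bagging: each round draws a bootstrap sample with replacement;
#     every sample keeps the same weight.
bootstrap_idx = rng.choice(n, size=n, replace=True)
print("bootstrap sample indices:", bootstrap_idx)

# --- Boosting (classical AdaBoost-style update): the training set is fixed,
#     but the weights of misclassified samples grow after each round.
w = np.full(n, 1.0 / n)                    # start from uniform weights
y_pred = y_true.copy()
y_pred[:3] = 1 - y_pred[:3]                # pretend the weak learner misclassified 3 samples
miss = y_pred != y_true

err = w[miss].sum()                        # weighted error of this round (0.3 here)
alpha = 0.5 * np.log((1 - err) / err)      # this weak classifier's weight in the ensemble
w *= np.exp(alpha * np.where(miss, 1.0, -1.0))   # raise weights of misclassified samples
w /= w.sum()                               # renormalize
print("updated sample weights:", np.round(w, 3))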

Differences and connections between the two:

  • Both Bagging and Boosting can effectively improve classification accuracy; on most data sets, Boosting achieves higher accuracy than Bagging
  • On some data sets, Boosting degrades performance because of overfitting

Ensemble learning combination strategies:

  1. Voting
  2. Averaging
  3. Learning: stacking, i.e., the predictions of the weak learners on the training set are used as inputs, the original training labels are used as outputs, and another learner (the meta-learner) is retrained on them to obtain the final result (see the sketch after this list).
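
A minimal sketch of the stacking strategy using scikit-learn's StackingClassifier; the choice of base learners, meta-learner, and dataset is an illustrative assumption:

from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Level-0 weak learners: their cross-validated predictions on the training set
# become the input features for the level-1 (meta) learner.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gbdt", GradientBoostingClassifier(random_state=0)),
]

# The meta-learner is retrained on those predictions against the original
# training labels to produce the final result.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
print("Stacking CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())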

References
https://zhuanlan.zhihu.com/p/33700459
