Bagging
There are two main categories of ensemble learning algorithms: one is Boosting, whose representative algorithm is AdaBoost; the other is Bagging, of which the random forest introduced in this article is a variant.
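Bagging, also known as bootstrap aggregating, draws new training sets from the original data set by sampling with replacement, trains one base learner on each of them, and combines their predictions by voting or averaging.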
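To make sampling with replacement concrete, here is a minimal sketch in Python. The helper names `bootstrap_sample` and `make_bootstrap_sets` are hypothetical and chosen only for illustration; the sketch shows the resampling step alone, not a full Bagging implementation.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return rng.choices(data, k=len(data))

def make_bootstrap_sets(data, n_sets, seed=0):
    """Return n_sets bootstrap samples of data (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [bootstrap_sample(data, rng) for _ in range(n_sets)]

data = list(range(10))
for sample in make_bootstrap_sets(data, n_sets=3):
    # Each sample has the same size as data; some items repeat, others are absent.
    print(sorted(sample))
```

Each resampled set would then be used to train one base learner. Because items are drawn with replacement, a given sample typically omits some of the original items; for large data sets roughly 36.8% of them are left out of any one sample.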
Random forest
Random forest is a variant of Bagging. It builds a Bagging ensemble with decision trees as the base learners and, in addition, introduces random attribute selection into the training of each decision tree.
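The random attribute selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: attributes are categorical, rows are dictionaries, Gini impurity is assumed as the split criterion, and the names `choose_split_attribute` and `k` (the number of candidate attributes) are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(rows, labels, attr):
    """Weighted Gini impurity after splitting on the categorical attribute attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())

def choose_split_attribute(rows, labels, attributes, k, rng):
    """Pick the best attribute among k randomly chosen candidates."""
    candidates = rng.sample(attributes, k)  # random subset of the attributes
    return min(candidates, key=lambda a: weighted_gini(rows, labels, a))

rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
attributes = ["outlook", "windy"]

rng = random.Random(0)
# With k equal to the number of attributes, every attribute is a candidate.
print(choose_split_attribute(rows, labels, attributes, k=2, rng=rng))
# With a smaller k, only the randomly drawn candidates are considered.
print(choose_split_attribute(rows, labels, attributes, k=1, rng=rng))
```

Setting `k` to the full number of attributes makes the choice the same as in an ordinary decision tree, while a smaller `k` makes the individual trees differ more from one another.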
When choosing a split attribute, a traditional decision tree selects the optimal one from the attribute set of the current node (assuming that there are
Advantages and disadvantages of random forests
Advantages of random forests
- On many current datasets, it performs well and compares favorably with other algorithms.
- It can handle very high-dimensional data without feature selection.
- After training, it can report which features are more important.
- While the forest is being built, it produces an unbiased estimate of the generalization error, and the model generalizes well.
- Training is fast and easy to parallelize.
- During training, interactions between features can be detected.
- The implementation is relatively simple.
- For imbalanced datasets, it can balance the error.
- Accuracy can still be maintained even if a significant portion of the features are missing.
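The generalization-error estimate mentioned in the list above is commonly obtained from out-of-bag (OOB) samples: each training sample is scored only by the learners whose bootstrap sample did not contain it. The following is a minimal sketch of that idea, assuming a single numeric feature, binary labels 0 and 1, and a one-threshold "stump" standing in for a full decision tree. All function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """Fit a one-feature threshold classifier (a stand-in for a decision tree).

    Returns (threshold, label_at_or_below, label_above) with the fewest
    training errors.
    """
    best = None
    for t in sorted(set(xs)):
        for lo, hi in ((0, 1), (1, 0)):
            errors = sum((lo if x <= t else hi) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, lo, hi)
    return best[1:]

def predict_stump(stump, x):
    t, lo, hi = stump
    return lo if x <= t else hi

def oob_error(xs, ys, n_learners=25, seed=0):
    """Estimate the error rate using only out-of-bag votes."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [Counter() for _ in range(n)]  # OOB votes collected per sample
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
        stump = train_stump([xs[i] for i in idx], [ys[i] for i in idx])
        for i in set(range(n)) - set(idx):  # samples this learner never saw
            votes[i][predict_stump(stump, xs[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        raise ValueError("no sample was ever out of bag")
    wrong = sum(votes[i].most_common(1)[0][0] != ys[i] for i in scored)
    return wrong / len(scored)

xs = list(range(20))
ys = [0 if x < 10 else 1 for x in xs]
print(oob_error(xs, ys))
```

No separate validation set is needed here, because every sample acts as unseen data for the learners that did not draw it.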
Disadvantages of random forests
- Random forests have been shown to overfit on some noisy classification or regression problems.
- For data whose attributes have different numbers of distinct values, attributes with more distinct values have a greater influence on the random forest, so the attribute weights it produces on such data are not reliable.
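The bias toward attributes with many values can be illustrated with a small sketch. It assumes information gain as the split criterion, which is one common choice and not necessarily the one a given random forest uses, and it shows the bias of the criterion itself rather than a forest's actual importance computation. The attribute names `row_id` and `coin` are made up for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in label entropy after splitting on a categorical attribute."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

labels = [0, 1, 0, 1, 0, 1, 0, 1]
row_id = [0, 1, 2, 3, 4, 5, 6, 7]  # unique per row, no predictive meaning
coin = [0, 0, 1, 1, 0, 0, 1, 1]    # two values, unrelated to the label

print(information_gain(row_id, labels))
print(information_gain(coin, labels))
```

Neither attribute says anything useful about unseen data, yet `row_id` receives the maximum possible gain of 1 bit because every row falls into its own pure group, while `coin` receives a gain of 0. Scores built on such a criterion therefore tend to favor many-valued attributes.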