Ensemble Learning - Bagging Algorithm and Random Forest Algorithm

1. Ensemble learning

Ensemble learning is a method of combining multiple machine learning models, that is, achieving the effect of a strong learner by combining several weak learners into a whole. As the saying goes: three cobblers are better than one Zhuge Liang.

2. Bagging algorithm

2.1 Bootstrap sampling

Sampling with replacement is carried out on the sample set D (containing m samples), drawing m samples in total. Each sample has probability 1/m of being drawn on any single draw, so the probability that a given sample is never drawn in m draws is

$$\left(1-\frac{1}{m}\right)^m \xrightarrow{\; m \to \infty \;} \frac{1}{e} \approx 0.368$$

Taking this limit, about 36.8% of the samples in the data set D do not appear in the sampled data set D1. We use D1 as the training set and D - D1 as the test set. The advantage is that the model is still trained on m samples, so the size of the training set is not reduced, while roughly 1/3 of the original samples are available for testing and none of them appear in the training set.
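As a quick sanity check of this 36.8% figure (an illustrative sketch, not from the original post), the following NumPy snippet draws a bootstrap sample of size m and measures the fraction of samples that never appear in it:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000
D = np.arange(m)                          # indices of the original sample set D
D1 = rng.choice(D, size=m, replace=True)  # bootstrap sample, drawn with replacement
out_of_bag = np.setdiff1d(D, D1)          # samples in D that never appear in D1
print(len(out_of_bag) / m)                # ≈ 0.368, matching (1 - 1/m)^m → 1/e
```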

2.2 Bagging Algorithm

[Figure: bootstrap samples drawn from the training set, each used to train a classifier C]

As shown in the figure above, the training sets are drawn by bootstrap sampling with replacement, and each sampled set is used to train a classifier C. Random forest is a special case of the Bagging algorithm: it additionally uses a random feature subset to fit each individual decision tree.

The process of the Bagging algorithm is as follows:

(1) Use Bootstrap sampling with replacement to form a new training set of n samples (n < m);
(2) Repeat (1) T times to obtain T training sets $S_i$;
(3) Use some classification algorithm to train independently on each training set, obtaining T base classifiers $C_i$;
(4) For each test sample x, use the T classifiers to obtain T predictions $c_i(x)$;
(5) For each x, combine the T predictions into the final result $c^*(x)$; for classification this is the majority vote

$$c^*(x) = \arg\max_{y} \sum_{i=1}^{T} \mathbb{I}\big(c_i(x) = y\big)$$

The base classifier can be a decision tree, logistic regression, or another weak learner. This kind of ensemble learning is most useful for classifiers with poor stability, whose generalization error is reduced by majority voting; for stable classifiers the effect is not obvious. For classification problems a simple voting scheme is used; for regression problems the predictions are simply averaged.
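To make steps (1)-(5) concrete, here is a minimal from-scratch sketch with decision-tree base classifiers and majority voting; the helper names and parameter values are illustrative assumptions, not from the original post, and integer-encoded class labels in NumPy arrays are assumed:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bagging(X, y, T=25, seed=0):
    """Steps (1)-(3): draw T bootstrap training sets and fit one base classifier on each."""
    rng = np.random.default_rng(seed)
    m = len(X)
    classifiers = []
    for _ in range(T):
        idx = rng.integers(0, m, size=m)  # bootstrap sampling with replacement
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def predict_bagging(classifiers, X):
    """Steps (4)-(5): collect the T predictions c_i(x) and take a majority vote c*(x)."""
    preds = np.stack([clf.predict(X) for clf in classifiers])  # shape (T, n_samples)
    majority_vote = lambda column: np.bincount(column).argmax()
    return np.apply_along_axis(majority_vote, 0, preds)
```

For a regression task, the majority vote in `predict_bagging` would simply be replaced by the average of the T predictions.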

Bagging performance

  • It is an efficient ensemble learning algorithm;
  • It is widely applicable and can be used for multi-class classification and regression tasks without modification;
  • Thanks to Bootstrap sampling, the remaining ~36.8% of samples can be used as a validation set for "out-of-bag (OOB) evaluation" of generalization performance (see the sketch after this list);
  • Bagging mainly focuses on reducing variance, so its effect is most obvious on learners that are susceptible to sample perturbation, such as unpruned decision trees and neural networks.
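A minimal sketch of such out-of-bag evaluation, assuming scikit-learn's BaggingClassifier (the dataset and parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# oob_score=True evaluates each base tree on the bootstrap samples it never saw,
# giving a built-in estimate of generalization performance.
bag = BaggingClassifier(n_estimators=100, oob_score=True, random_state=0).fit(X, y)
print(bag.oob_score_)  # accuracy estimated from the ~36.8% out-of-bag samples
```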

3. Random Forest

Random forest is Bagging + decision trees, but random forest not only samples the training data, it also samples the features. This ensures the independence of the individual decision trees, so that the voting result is more accurate. Each decision tree is built with the CART algorithm without pruning; because random sample selection and random feature selection have already been applied, over-fitting does not occur.
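As a hedged illustration (not the original author's code), scikit-learn's RandomForestClassifier combines bootstrap sampling of the training data with a random feature subset considered at each split; the trees are unpruned CART trees by default:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# bootstrap=True resamples the training data for every tree (Bagging),
# max_features="sqrt" draws a random feature subset at each split,
# and max_depth=None leaves the CART trees unpruned.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            bootstrap=True, max_depth=None,
                            random_state=0).fit(X, y)
print(rf.score(X, y))
```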

  • Question 1: Why is the training set randomly sampled?

If the training sets were not randomly sampled, every tree would be trained on essentially the same data, all trees would produce the same classification result, and the final vote would be meaningless.

  • Question 2: Why is the sampling done with replacement?

If sampling were done without replacement, the training samples of each tree would be completely different, each tree would be trained with its own bias, and the trees' results would differ so much that a meaningful final vote could not be made.

Random Forest Summary

  • Classification results are more accurate
  • Handles high-dimensional features without explicit feature selection
  • Tolerates missing data while still maintaining high accuracy
  • Fast to train
  • Can rank feature importance (see the sketch after this list)
  • Supports parallel computation
  • Can detect interactions between features
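Two of the points above, feature importance and parallel computation, can be demonstrated directly; this is an illustrative scikit-learn snippet with made-up data, not code from the original post:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           random_state=0)

# n_jobs=-1 trains the trees in parallel on all available CPU cores.
rf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0).fit(X, y)

# feature_importances_ ranks each feature by its average impurity reduction.
for i, importance in enumerate(rf.feature_importances_):
    print(f"feature {i}: importance {importance:.3f}")
```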

Origin blog.csdn.net/gjinc/article/details/131914878