AdaBoost Bagging Random Forest - Study Notes

Notes from: Machine Learning (Zhou Zhihua) and Statistical Learning Methods (Li Hang)

Boosting & AdaBoost

Boosting is a family of algorithms that can boost weak learners into strong ones. The algorithms in this family share a similar working mechanism: first train a base learner from the initial training set, then adjust the probability distribution of the training samples (the weight distribution of the training data) according to the performance of that base learner, and invoke the weak learning algorithm again on the reweighted data. Repeating this process yields a series of weak classifiers, which are finally combined.

The boosting method must answer two questions:

1. How to change the weights or probability distribution of the training data in each round?

2. How to combine the weak classifiers into a strong classifier?

For the first question, AdaBoost increases the weights of the samples that were misclassified by the previous round's weak classifier and decreases the weights of those that were classified correctly. As a result, the misclassified data receive more attention from the next round's weak classifier because of their larger weights, and the classification problem is "divided and conquered" by a series of weak classifiers.

For the second question, AdaBoost uses weighted majority voting. Specifically, it increases the weight of a weak classifier with a small classification error rate so that it plays a larger role in the vote, and decreases the weight of a weak classifier with a large classification error rate so that it plays a smaller role in the vote.
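To make both answers concrete, here is a minimal sketch of binary AdaBoost (not the books' pseudocode) using decision stumps from scikit-learn as weak learners; labels are assumed to be in {-1, +1}, and names such as n_rounds and sample_w are my own.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Minimal binary AdaBoost sketch; y is assumed to take values in {-1, +1}."""
    m = len(y)
    sample_w = np.full(m, 1.0 / m)        # start from a uniform weight distribution
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)        # weak learner (decision stump)
        stump.fit(X, y, sample_weight=sample_w)
        pred = stump.predict(X)
        err = np.sum(sample_w * (pred != y))               # weighted classification error rate
        if err >= 0.5:                                     # no better than random guessing: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # classifier weight: small error -> large alpha
        sample_w *= np.exp(-alpha * y * pred)              # raise weights of misclassified samples
        sample_w /= sample_w.sum()                         # renormalize to a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted majority vote of the weak classifiers."""
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(score)
```

The weight update exp(-alpha * y * pred) multiplies the weight of every misclassified sample by a factor greater than one and every correctly classified sample by a factor less than one, which is exactly the reweighting rule described above.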

A characteristic of AdaBoost: it does not change the given training data itself, but repeatedly changes the weight distribution over the training data, so that the same training data play different roles in the learning of each base classifier.

Standard AdaBoost is only applicable to binary classification tasks, whereas Bagging can be applied to multi-class classification, regression, and other tasks without modification.

Bagging:

Bagging is the best-known representative of parallel ensemble learning methods. It is based directly on bootstrap sampling.

Bootstrap sampling: given a data set containing m samples, first randomly draw one sample and put it into the sampling set, then put that sample back into the initial data set, so it may still be selected in later draws. After m such random draws, we obtain a sampling set of m samples. Some samples from the initial training set appear multiple times in the sampling set, while others never appear. A simple calculation shows that about 63.2% of the samples in the initial training set appear in the sampling set.
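The 63.2% figure follows from a short limit argument; a sketch of the calculation, with m the number of samples:

```latex
% Probability that a fixed sample is never drawn in m draws with replacement:
P(\text{never drawn}) = \left(1 - \tfrac{1}{m}\right)^{m}
  \;\longrightarrow\; \tfrac{1}{e} \approx 0.368 \quad (m \to \infty),
% so the expected fraction of original samples that do appear in the sampling set is
1 - \tfrac{1}{e} \approx 0.632 .
```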

The basic process of Bagging:

Sample T sampling sets, each containing m training samples; train a base learner on each sampling set; then combine these base learners.

When combining the predicted outputs, Bagging typically uses simple voting for classification tasks and simple averaging for regression tasks. If two classes receive the same number of votes in classification, the simplest approach is to pick one at random; alternatively, the confidence of each learner's vote can be examined to determine the final winner.
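A minimal sketch of this basic process for a classification task, assuming decision trees as base learners, class labels encoded as non-negative integers, and simple majority voting; T, bagging_fit, and bagging_predict are illustrative names.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, T=25, random_state=0):
    """Train T base learners, each on a bootstrap sample of the m training examples."""
    rng = np.random.default_rng(random_state)
    m = len(y)
    learners = []
    for _ in range(T):
        idx = rng.integers(0, m, size=m)                  # m draws with replacement (bootstrap sample)
        learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    """Combine by simple (unweighted) majority voting over the base learners."""
    votes = np.stack([l.predict(X) for l in learners])    # shape (T, n_samples)
    # Most frequent label per column; ties are broken arbitrarily here
    # (the notes suggest random or confidence-based tie-breaking instead).
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```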

Random Forest

Random Forest (RF) is a variant of Bagging. On top of a Bagging ensemble built with decision trees as base learners, RF further introduces random attribute selection into the training of each decision tree. Specifically, a traditional decision tree selects the optimal attribute among all attributes of the current node (say there are d of them) when choosing an attribute to split on, whereas in RF, for each node of a base decision tree, a subset of k attributes is first randomly selected from the node's attribute set, and the optimal attribute for splitting is then chosen from that subset. The parameter k controls the degree of randomness introduced: if k = d, the construction of a base decision tree is identical to that of a traditional decision tree; if k = 1, a single attribute is chosen at random for splitting; in general, the recommended value is k = log2(d).
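For reference, this attribute-subset idea corresponds to the max_features option of scikit-learn's RandomForestClassifier; the sketch below contrasts it with plain Bagging of decision trees (the dataset and the n_estimators value are placeholders).

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier

X, y = load_iris(return_X_y=True)

# Random forest: at each node, only log2(d) randomly chosen attributes are split candidates.
rf = RandomForestClassifier(n_estimators=100, max_features="log2").fit(X, y)

# Plain Bagging (its default base learner is a decision tree): every split considers all d attributes.
bag = BaggingClassifier(n_estimators=100).fit(X, y)

print(rf.score(X, y), bag.score(X, y))
```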

Random forest makes only a small change to Bagging. In Bagging, the "diversity" of the base learners comes solely from sample perturbation (that is, from sampling the initial training set), whereas in random forests the diversity of the base learners comes from both sample perturbation and attribute perturbation. This allows the generalization performance of the final ensemble to be further improved by increasing the degree of difference between individual learners.

Diversity: refers to differences between learners.

In Bagging, diversity between learners is achieved through sampling of the training examples. Random forest adds a random attribute-selection mechanism to further ensure diversity, while AdaBoost ensures learner diversity by changing the weight distribution of the training data.

Ways to increase diversity:

Data sample perturbation, input attribute perturbation, output representation perturbation, algorithm parameter perturbation

The training efficiency of random forest is often better than that of Bagging, because when constructing individual decision trees, Bagging uses "deterministic" decision trees that must examine all of a node's attributes when choosing a split, whereas a "random" decision tree only needs to examine a subset of attributes.
