"Machine Learning in Practice" Chapter 7 adaboost meta-algorithm learning summary

       Both boosting and bagging combine several weak classifiers into one classifier; such methods are collectively called ensemble methods or meta-algorithms. Boosting builds each new classifier by focusing on the data that the previous classifiers misclassified, and the final classification result is a weighted sum of all the classifiers. In bagging the classifiers have equal weights, while in boosting the weights are unequal: each weight reflects how well its classifier performed in the previous round. The lower a classifier's error rate, the larger its weight, and the more it influences the prediction result.
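       In formula form, the boosting prediction is a weighted vote of the weak classifiers (here $h_m$ denotes the $m$-th weak classifier and $\alpha_m$ its weight; these symbols are my notation, not the original text's):

$$H(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m\, h_m(x)\right)$$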

      The representative algorithm of the boosting family is AdaBoost. AdaBoost is a combination algorithm that builds a strong classifier out of several weak classifiers. A weak classifier is one whose performance is only slightly better than random guessing; in the binary-classification case its error rate is just below 50%. In fact, any classifier can serve as a weak classifier, such as the kNN, decision tree, logistic regression, and SVM algorithms introduced in earlier chapters of the book. The weak classifier used in this chapter is the single-level decision tree (decision stump), a tree that makes its decision based on a single feature; it is the most popular weak classifier in AdaBoost. The strong classifier obtained by combining these weak classifiers has a much lower classification error rate, and it is this combination that the AdaBoost algorithm relies on to complete the final prediction.

      How AdaBoost runs: each sample in the training data is assigned a weight, and these weights form a weight vector D whose dimension equals the number of samples in the dataset. Initially all weights are equal. A weak classifier is first trained on the training set and its error rate is computed; the weak classifier is then trained again on the same dataset, but in the second round the weight of each sample is adjusted according to the results of the first classifier: the weights of correctly classified samples decrease and the weights of misclassified samples increase, while the sum of the weights always remains 1. In addition, the final classifier assigns each trained weak classifier a coefficient alpha based on its classification error rate; classifiers with lower error rates receive larger alpha values and therefore play a larger role in predicting the data.
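      Written out, the error rate, the classifier coefficient, and the weight update described above take the standard AdaBoost form (the notation $D_i^{(t)}$ for the weight of sample $i$ in round $t$ is mine):

$$\varepsilon = \frac{\text{number of misclassified samples}}{\text{total number of samples}}, \qquad \alpha = \frac{1}{2}\ln\left(\frac{1-\varepsilon}{\varepsilon}\right)$$

$$D_i^{(t+1)} = \frac{D_i^{(t)}\,e^{-\alpha}}{\mathrm{Sum}(D)}\ \text{if sample } i \text{ is classified correctly}, \qquad D_i^{(t+1)} = \frac{D_i^{(t)}\,e^{\alpha}}{\mathrm{Sum}(D)}\ \text{if it is misclassified.}$$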

       In other words, AdaBoost repeatedly trains weak classifiers on the same data until the training error rate drops to zero or the maximum number of iterations is reached. This is a serial training process: in each iteration the sample weights D and the classifier coefficient alpha are different, and the next round's weights depend on the classification results of the current classifier. The weights of correctly classified samples are reduced, while the weights of misclassified samples are increased, which makes the next round of training pay more attention to the samples that were classified incorrectly.

The algorithm flow is as follows:


In the diagram, the dataset is on the left, and the different widths of the bars represent the different weights of the individual examples. The predictions of each classifier are weighted by the alpha value shown in the corresponding triangle, and the weighted results output by all the triangles are summed in the circle to produce the final output.

1. Build a weak classifier

       In this book, the weak classifier is constructed as a single-level decision tree (decision stump). A decision stump can be regarded as the simplest possible decision tree: a root node directly connected to two leaf nodes, making its decision based on only one feature. The pseudocode for building a decision stump is as follows:
set minError to +infinity
for each feature in the dataset:
    for each step size (second-level loop):
        for each inequality sign:
            build a single-level decision tree and test it on the weighted dataset
            if the error rate is lower than minError, make the current tree the best single-level decision tree
return the best single-level decision tree

Note that the evaluation criterion here is no longer information entropy but the weighted error rate.
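A minimal NumPy sketch of this procedure is shown below; the function names loosely follow the book's stumpClassify and buildStump, but the code itself is my own reconstruction rather than the book's listing:

import numpy as np

def stump_classify(data, dim, thresh, ineq):
    """Classify every sample as +1 or -1 by comparing one feature to a threshold."""
    pred = np.ones(data.shape[0])
    if ineq == 'lt':
        pred[data[:, dim] <= thresh] = -1.0
    else:
        pred[data[:, dim] > thresh] = -1.0
    return pred

def build_stump(data, labels, D, num_steps=10):
    """Find the decision stump with the lowest weighted error for weight vector D."""
    m, n = data.shape
    best_stump, best_pred, min_error = {}, None, np.inf
    for dim in range(n):                                  # loop over features
        lo, hi = data[:, dim].min(), data[:, dim].max()
        step = (hi - lo) / num_steps
        for i in range(-1, num_steps + 1):                # loop over threshold steps
            for ineq in ('lt', 'gt'):                     # loop over inequality signs
                thresh = lo + i * step
                pred = stump_classify(data, dim, thresh, ineq)
                err = np.zeros(m)
                err[pred != labels] = 1.0
                weighted_error = np.dot(D, err)           # weighted error rate, not entropy
                if weighted_error < min_error:
                    min_error = weighted_error
                    best_pred = pred.copy()
                    best_stump = {'dim': dim, 'thresh': thresh, 'ineq': ineq}
    return best_stump, min_error, best_pred

The key point is the weighted_error line: it is this weighted error rate, computed with the current weight vector D, that decides which stump is best.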

2. Build AdaBoost

       Using the weak classifier obtained above, AdaBoost is constructed as follows (pseudocode):

For each iteration:
    Use buildStump() to find the best single-level decision tree
    Add the best single-level decision tree to the stump array
    Calculate the classifier coefficient alpha
    Calculate the new weight vector D
    Update the aggregate class estimate
    If the training error rate is 0.0, break out of the loop

Note that if we set the number of iterations to 9 but the training error rate of the combined classifier drops to 0 after the 4th iteration, the program automatically exits the loop; it does not need to run all 9 iterations.
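Putting the pieces together, here is a sketch of the training loop in the spirit of the book's adaBoostTrainDS, reusing build_stump and stump_classify from above (labels is assumed to be a NumPy array of +1/-1 values; again, this is my reconstruction, not the book's listing):

def adaboost_train(data, labels, num_iter=40):
    """Train an AdaBoost ensemble of decision stumps."""
    m = data.shape[0]
    D = np.ones(m) / m                     # start with equal sample weights
    agg_est = np.zeros(m)                  # running (aggregate) class estimate
    classifiers = []
    for _ in range(num_iter):
        stump, error, pred = build_stump(data, labels, D)
        # classifier coefficient; max() guards against division by zero
        alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-16))
        stump['alpha'] = alpha
        classifiers.append(stump)
        # decrease weights of correct samples, increase weights of wrong ones
        D = D * np.exp(-alpha * labels * pred)
        D = D / D.sum()                    # weights still sum to 1
        # accumulate the weighted vote and check the training error
        agg_est += alpha * pred
        error_rate = np.mean(np.sign(agg_est) != labels)
        if error_rate == 0.0:              # early exit once all samples are correct
            break
    return classifiers

def adaboost_classify(data, classifiers):
    """Classify samples with the weighted vote of all trained stumps."""
    agg_est = np.zeros(data.shape[0])
    for c in classifiers:
        pred = stump_classify(data, c['dim'], c['thresh'], c['ineq'])
        agg_est += c['alpha'] * pred
    return np.sign(agg_est)

After training, new samples can be classified with adaboost_classify, which accumulates the alpha-weighted votes of all the stumps and takes the sign, exactly as in the flow diagram described earlier.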


