Boosting Methods (Part 1): AdaBoost

There is a lot of research on boosting methods, and many algorithms have been proposed. The most representative one is the AdaBoost algorithm.

The boosting method starts from a weak learning algorithm and learns repeatedly to obtain a series of weak classifiers (also called basic classifiers), and then combines these weak classifiers into a strong classifier.

For the boosting method, there are two questions to answer: first, how to change the weights or probability distribution of the training data in each round; second, how to combine the weak classifiers into a strong classifier. For the first question, AdaBoost's approach is to increase the weights of the samples misclassified by the previous round's weak classifier and decrease the weights of the correctly classified samples. The data that have not yet been correctly classified therefore receive more attention from the weak classifier in the next round, and the classification problem is "divided and conquered" by a series of weak classifiers. For the second question, the combination of weak classifiers, AdaBoost uses weighted majority voting: it increases the weight of a weak classifier with a small classification error rate so that it plays a larger role in the vote, and decreases the weight of a weak classifier with a large classification error rate so that it plays a smaller role.

Algorithm explanation:

There are N training samples, and each sample x_i corresponds to a weight w_i, so there are N training data and N weights, i ∈ {1, ..., N}; the set of weights is denoted D. m indexes the round of iteration, m ∈ {1, ..., M}, and G denotes a classifier.

In step (1), all weights w are set to the same value, 1/N; this initial weight distribution is denoted D_1.
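As a minimal sketch in Python (assuming NumPy and an illustrative number of samples N), step (1) is just a uniform weight vector:

```python
import numpy as np

N = 10                    # number of training samples (illustrative value)
w = np.full(N, 1.0 / N)   # D1: every sample starts with the same weight 1/N
print(w)
```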

AdaBoost learns the basic classifiers repeatedly, performing the following steps (2)-(4) in each round m = 1, 2, ..., M.

In step (2), learn from the training data with the current weight distribution D_m to obtain the basic classifier G_m, which assigns each training sample x_i a label G_m(x_i).

According to the prediction G_m(x_i) ∈ {+1, -1} of the m-th round classifier G_m and the true label y_i, we measure how well G_m does. Generally the loss we would use is the plain misclassification rate, i.e. the fraction of samples with G_m(x_i) ≠ y_i, but here each x_i is weighted, so the real error should be recorded as

e_m = \sum_{i=1}^{N} w_{mi} \, I(G_m(x_i) \neq y_i)
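A small sketch of this weighted error, assuming `pred` holds the predictions G_m(x_i) and `y` the true labels, both in {+1, -1}; the values below are purely illustrative:

```python
import numpy as np

pred = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, -1])   # G_m(x_i), illustrative
y    = np.array([1, 1, 1, -1, -1, -1,  1,  1,  1, -1])   # true labels y_i
w    = np.full(10, 0.1)                                   # current weight distribution D_m

# Weighted error: the sum of the weights of the misclassified samples.
e_m = np.sum(w[pred != y])
print(e_m)   # ~0.3 here, since three samples of weight 0.1 are misclassified
```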

In step (3), with this error in hand, we can calculate how much weight the current classifier itself should be given.

Calculate its coefficient as

\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}

The logarithm here is the natural logarithm.

This gives the model after this round,

f_m(x) = f_{m-1}(x) + \alpha_m G_m(x) = \sum_{k=1}^{m} \alpha_k G_k(x)
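Continuing the sketch, the coefficient needs only the weighted error and the natural logarithm:

```python
import numpy as np

e_m = 0.3                                # weighted error from step (2), illustrative
alpha_m = 0.5 * np.log((1 - e_m) / e_m)  # np.log is the natural logarithm
print(round(alpha_m, 4))                 # 0.4236
```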

In step (4), with the coefficient α_m of this round's model, we go through the training data and compute the weight w_{m+1,i} that each sample x_i will carry in the next round m+1:

w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp(-\alpha_m y_i G_m(x_i)), \quad i = 1, 2, \ldots, N

where Z_m = \sum_{i=1}^{N} w_{mi} \exp(-\alpha_m y_i G_m(x_i)) is the normalization factor that makes the new weights sum to 1.

This gives the set of data weights for the next round:

D_{m+1} = (w_{m+1,1}, \ldots, w_{m+1,i}, \ldots, w_{m+1,N})
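A sketch of the weight update, reusing the illustrative `y`, `pred`, and `alpha_m` from the snippets above; misclassified samples are multiplied by exp(+alpha_m), correct ones by exp(-alpha_m), and the normalization by Z_m is just dividing by the sum:

```python
import numpy as np

y       = np.array([1, 1, 1, -1, -1, -1,  1,  1,  1, -1])   # true labels (illustrative)
pred    = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, -1])   # G_m(x_i)
w       = np.full(10, 0.1)                                   # D_m
alpha_m = 0.4236

# y_i * G_m(x_i) is +1 for correct predictions and -1 for mistakes,
# so mistakes get exp(+alpha_m) and correct samples get exp(-alpha_m).
w_next = w * np.exp(-alpha_m * y * pred)
w_next /= w_next.sum()    # divide by Z_m so that D_{m+1} sums to 1
print(w_next.round(4))    # the three misclassified samples end up with larger weights
```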

From here we return to step (2) for the next round, and repeat until all M rounds are done.

Step (5):

Finally, each weak classifier obtained over these M rounds is multiplied by its coefficient α_m, and the results are linearly combined into the final ensemble model:

f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)

So the final classifier is:

G(x) = sign(f(x)) = sign\left( \sum_{m=1}^{M} \alpha_m G_m(x) \right)
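A minimal sketch of the final combination, with hypothetical coefficients and stump classifiers standing in for the learned α_m and G_m:

```python
import numpy as np

def adaboost_predict(x, alphas, classifiers):
    # f(x) = sum_m alpha_m * G_m(x);  final label G(x) = sign(f(x))
    f = sum(a * g(x) for a, g in zip(alphas, classifiers))
    return np.sign(f)

# Illustrative usage with two hypothetical stumps and coefficients.
G1 = lambda x: np.where(x < 2.5, 1, -1)
G2 = lambda x: np.where(x < 8.5, 1, -1)
x = np.arange(10)
print(adaboost_predict(x, [0.42, 0.65], [G1, G2]))
```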

Example:

Example 8.1: Given the training data shown in Table 8.1, assume the weak classifiers are stumps of the form x < v or x > v, with the threshold v chosen so that the classifier has the lowest classification error rate on the training data set. Use the AdaBoost algorithm to learn a strong classifier.

Table 8.1
i:  1   2   3   4   5   6   7   8   9   10
x:  0   1   2   3   4   5   6   7   8   9
y:  1   1   1  -1  -1  -1   1   1   1  -1

Solution:

(1) Initialize the weight distribution of the training data:

D_1 = (w_{11}, w_{12}, \ldots, w_{1,10}), \quad w_{1i} = 1/10 = 0.1, \quad i = 1, 2, \ldots, 10

(2) For the first round, m = 1:

1. On the training data whose weight distribution is D1, the classification error rate is lowest when the threshold v is 2.5, so the basic classifier is G_1(x) = 1 if x < 2.5, and -1 if x > 2.5.

The error rate of G1(x) on the training data set is

e_1 = P(G_1(x_i) \neq y_i) = 0.3

(the three samples x = 6, 7, 8 are misclassified, each with weight 0.1).

2. Calculate the coefficient of G1(x): α_1 = (1/2) ln((1 - e_1)/e_1) = 0.4236.

3. Update the weight distribution of the training data:

D_2 = (0.07143, 0.07143, 0.07143, 0.07143, 0.07143, 0.07143, 0.16667, 0.16667, 0.16667, 0.07143)

f_1(x) = 0.4236 G_1(x)

The classifier sign[f_1(x)] has 3 misclassified points on the training data set.

(3) For the second round m=2:

1. On the training data with weight distribution D2, the classification error rate is lowest when the threshold v is 8.5, and the basic classifier is G_2(x) = 1 if x < 8.5, and -1 if x > 8.5.

2. The error rate e2 of G2(x) on the training data set is 0.2143.

3. Calculate the coefficient: α_2 = 0.6496.

4. Update the training data weight distribution:

D_3 = (0.0455, 0.0455, 0.0455, 0.1667, 0.1667, 0.1667, 0.1060, 0.1060, 0.1060, 0.0455)

f_2(x) = 0.4236 G_1(x) + 0.6496 G_2(x)

The classifier sign[f_2(x)] has 3 misclassified points on the training data set.

(4) For the third round m=3:

1. On the training data with weight distribution D3, the classification error rate is lowest when the threshold v is 5.5, and the basic classifier is G_3(x) = 1 if x > 5.5, and -1 if x < 5.5.

2. The error rate e3 of G3(x) on the training sample set is 0.1820.

3. Calculate the coefficient: α_3 = 0.7514.

4. Update the weight distribution of the training data:

D_4 = (0.125, 0.125, 0.125, 0.102, 0.102, 0.102, 0.065, 0.065, 0.065, 0.125)

This gives:

f_3(x) = 0.4236 G_1(x) + 0.6496 G_2(x) + 0.7514 G_3(x)

The number of misclassified points of the classifier sign[f_3(x)] on the training data set is 0.

(5) So the final classifier is:

G(x) = sign[f_3(x)] = sign[0.4236 G_1(x) + 0.6496 G_2(x) + 0.7514 G_3(x)]
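The whole example can be checked with a short script. This is a sketch that assumes the Table 8.1 data reconstructed above and brute-forces the best stump threshold in each round; it reproduces the thresholds and coefficients of the solution (the third round can differ slightly in the last decimals because the worked solution rounds the intermediate weights):

```python
import numpy as np

# Assumed Table 8.1 data: x = 0..9 with the labels used above.
x = np.arange(10)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
N = len(x)
w = np.full(N, 1.0 / N)                      # D1

alphas, stumps = [], []
for m in range(3):                           # M = 3 rounds, as in the solution
    # Brute-force the best stump: predict s for x < v and -s for x > v, s in {+1, -1}.
    best = None
    for v in np.arange(-0.5, 10.5, 1.0):
        for s in (1, -1):
            pred = np.where(x < v, s, -s)
            err = np.sum(w[pred != y])
            if best is None or err < best[0]:
                best = (err, v, s, pred)
    e_m, v, s, pred = best
    alpha_m = 0.5 * np.log((1 - e_m) / e_m)
    alphas.append(alpha_m)
    stumps.append((v, s))
    print(f"round {m + 1}: v = {v}, s = {s:+d}, e = {e_m:.4f}, alpha = {alpha_m:.4f}")
    w = w * np.exp(-alpha_m * y * pred)      # re-weight ...
    w /= w.sum()                             # ... and normalize by Z_m

# Final classifier G(x) = sign(sum_m alpha_m * G_m(x))
f = sum(a * np.where(x < v, s, -s) for a, (v, s) in zip(alphas, stumps))
print("misclassified by sign[f3(x)]:", int(np.sum(np.sign(f) != y)))
```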

 

 

 
