Ensemble learning: boosting and bagging

Idea: accomplish a learning task by building and combining multiple learners.

Strong learners: models such as neural networks require a large amount of data and substantial computing power, but generally achieve very high accuracy.
Weak learners: models such as decision trees or logistic regression are simple and generally achieve only average accuracy.
Ensemble learning improves model performance by combining multiple weak learners.

Issues requiring attention

1. How to train each individual learner
Let each weak learner see different data, for example by changing the weights or the probability distribution of the training data so that certain samples receive more attention during learning.
2. How to combine the weak learners (a sketch follows this list)
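For example, a minimal sketch of these two questions (assuming scikit-learn and a toy dataset; the estimator choices are arbitrary), combining a decision stump and a logistic regression by majority voting:

```python
# Minimal sketch: two different weak learners combined by hard (majority) voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Individual weak learners: a depth-1 decision stump and a logistic regression.
weak_learners = [
    ("stump", DecisionTreeClassifier(max_depth=1, random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
]

# Combine them by majority voting.
ensemble = VotingClassifier(estimators=weak_learners, voting="hard")
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```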

Boosting and bagging
boosting

Boosting: there are strong dependencies between the individual learners, which must be generated sequentially, one after another. (Serial)
How to change the learning weights of the training data: increase the weights of the samples that were misclassified by the weak classifier in the previous round, and decrease the weights of the samples that were classified correctly. In short, focus on the samples that each weak classifier gets wrong.
How to combine the weak classifiers: use an additive model. Examples: AdaBoost, GBDT, XGBoost, LightGBM, CatBoost.
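As a usage sketch only (a toy dataset, with scikit-learn's AdaBoostClassifier standing in for the general boosting idea; by default its weak learner is a depth-1 decision stump):

```python
# Usage sketch: n_estimators weak learners trained one after another (serial),
# then combined additively by the AdaBoost algorithm.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

boosted = AdaBoostClassifier(n_estimators=50, random_state=0)
boosted.fit(X, y)
print("training accuracy:", boosted.score(X, y))
```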

bagging

Bagging: there are no strong dependencies between the individual learners, which can be generated simultaneously. (Parallel)
How to change the learning weights of the training data:
Each round draws from the original sample set using bootstrapping (sampling with replacement, so the same sample may be drawn more than once). Each round draws n training samples (some samples may be drawn several times, others not at all). After k rounds of sampling we obtain k training sets, and these k training sets are independent of each other.
In random forests, a certain number of features are also randomly sampled.
Example:
Randomly draw a subset of features and samples (for instance, two features and two samples) to form each training set.
The k training sets are trained separately to obtain k models, which is where the parallelism comes from.
How to combine the weak classifiers:
Classification problem: the k models vote to produce the final class.
Regression problem: the mean of the k models' predictions is the final result.
Random forest is a typical example.
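A rough sketch of this procedure (bootstrap sampling plus majority voting, on an assumed toy dataset); for a regression problem the vote would be replaced by the mean of the k predictions:

```python
# Sketch of bagging: k bootstrap samples with replacement, one tree per sample,
# predictions combined by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(0)

k = 10                                   # number of rounds / models
n = len(X)
models = []
for _ in range(k):
    idx = rng.integers(0, n, size=n)     # sampling with replacement (bootstrap)
    models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Classification: majority vote over the k models (labels are 0/1 here).
votes = np.stack([m.predict(X) for m in models])     # shape (k, n)
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("bagged training accuracy:", (majority == y).mean())
```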

AdaBoost algorithm
Problem it solves

AdaBoost solves binary (two-class) classification problems.

Main idea

1. In each round of training, record which samples the current weak classifier classifies correctly and which it misclassifies, then increase the weights of the misclassified samples and decrease the weights of the correctly classified samples before training the next weak classifier. In this way, the data that has not yet been classified correctly receives more attention in the next round because of its increased weight.
How to understand sample weights:
Suppose there are three samples. Before any training, no weights have been adjusted, so each sample has weight 1/3. Suppose the first weak classifier learns to classify the second and third samples correctly but misclassifies the first sample; the weight of the first sample then needs to be increased.
This is equivalent to increasing the number of copies of the first sample.
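A tiny numeric illustration of this three-sample case, using the weight-update and $\alpha$ formulas given later in this post (the numbers are assumed, for intuition only):

```python
# Three samples start with equal weights 1/3; after the first weak classifier
# misclassifies sample 1, its weight grows and the others shrink.
import numpy as np

w = np.array([1/3, 1/3, 1/3])            # initial weights
correct = np.array([False, True, True])  # sample 1 misclassified by G_1

alpha = 0.5 * np.log((1 - 1/3) / (1/3))  # classifier weight when e_1 = 1/3
w = w * np.exp(np.where(correct, -alpha, alpha))
w = w / w.sum()                          # renormalize
print(w)                                 # -> [0.5, 0.25, 0.25]
```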

Mathematical expression

$f(x)=\sum_{m=1}^{M}\alpha_m G_m(x)=\alpha_1 G_1(x)+\dots+\alpha_M G_M(x)$
where $G_m(x)$ and $\alpha_m$ denote each weak classifier and its weight, respectively. $\alpha_m$ is determined by the classification error rate: a model with a small error rate gets a larger weight so that it plays a greater role in the vote, and a model with a large error rate gets a smaller weight so that it plays a smaller role.
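A minimal sketch of evaluating this additive model, with two made-up decision stumps standing in for trained weak classifiers:

```python
# Additive model f(x) = sum_m alpha_m * G_m(x); sign(f) gives the class.
def f(x, classifiers, alphas):
    """Weighted sum of weak classifier outputs (each returns +1 or -1)."""
    return sum(a * g(x) for g, a in zip(classifiers, alphas))

# Two hypothetical stumps on a 1-D input, with assumed weights.
G1 = lambda x: 1 if x > 0.5 else -1
G2 = lambda x: 1 if x > 1.5 else -1
alphas = [0.7, 0.3]

x = 1.0
score = f(x, [G1, G2], alphas)           # 0.7*1 + 0.3*(-1) ≈ 0.4
print("f(x) =", score)
print("predicted class:", 1 if score > 0 else -1)
```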

Algorithm process

Loop M times (a from-scratch sketch of the full loop follows this list):
1. Initialize/update the current weight distribution of the training data
For the first round, initialize: each sample gets weight $\frac{1}{N}$, where $N$ is the size of the dataset.
From the second round on, update the distribution $D_m$, which represents the sample weights used for the $m$-th weak classifier. The standard update is
$w_{m,i} = \frac{w_{m-1,i}}{Z_{m-1}} \exp\bigl(-\alpha_{m-1}\, y_i\, G_{m-1}(x_i)\bigr)$, where $Z_{m-1}$ is a normalization factor.
When a sample is classified correctly, the exponent carries a negative sign, so its weight decreases.
When a sample is classified incorrectly, the exponent carries a positive sign, so its weight increases.
2. Train the current base classifier $G_m(x)$ on the weighted data
3. Calculate the weight of the current base classifier, $\alpha_m$
$\alpha_m$ is determined by the classification error rate $e_m$, which is equal to the sum of the weights of the misclassified samples. $0 \le e_m \le 0.5$, because a classifier that is wrong more than half the time can simply be inverted.
$\alpha_m$ is inversely related to $e_m$:
$\alpha_m = \frac{1}{2}\log\frac{1-e_m}{e_m}$
4. Add $\alpha_m G_m(x)$ to the additive model $f(x)$
5. Determine whether the loop exit condition is satisfied
1. Whether the number of classifiers reaches M
2. Whether the error rate of the overall classifier f(x) has dropped below the preset threshold
Finally, f(x) produces a real number, and a threshold (typically its sign) determines whether the sample belongs to the positive or the negative class.
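Putting the whole loop together, a from-scratch sketch of the steps above (depth-1 decision stumps as weak learners, labels mapped to {-1, +1}; variable names are illustrative):

```python
# From-scratch AdaBoost sketch following the loop described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
y = np.where(y == 1, 1, -1)          # AdaBoost works with labels in {-1, +1}

N, M = len(X), 20                    # dataset size, number of rounds
w = np.full(N, 1 / N)                # step 1: initial weight distribution D_1
alphas, stumps = [], []

for m in range(M):
    # Step 2: train the current base classifier G_m on the weighted data.
    stump = DecisionTreeClassifier(max_depth=1, random_state=m)
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)

    # Step 3: weighted error e_m = sum of weights of misclassified samples,
    # and classifier weight alpha_m = 1/2 * log((1 - e_m) / e_m).
    e_m = w[pred != y].sum()
    if e_m >= 0.5:                   # no better than random: stop (or invert)
        break
    alpha_m = 0.5 * np.log((1 - e_m) / max(e_m, 1e-10))

    # Step 4: add alpha_m * G_m(x) to the additive model f(x).
    alphas.append(alpha_m)
    stumps.append(stump)

    # Step 1 of the next round: update the weight distribution.
    # Correct samples get exp(-alpha_m), misclassified ones get exp(+alpha_m).
    w = w * np.exp(-alpha_m * y * pred)
    w = w / w.sum()                  # normalize so the weights sum to 1

# Final classifier: the sign of the weighted sum f(x).
F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", (np.sign(F) == y).mean())
```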

Training method

Forward stagewise algorithm:
Optimizing all of the parameters at once with gradient descent would mean dealing with too many parameters. The forward stagewise algorithm instead optimizes one round at a time: at each round, $f_{m-1}(x_i)$ is treated as a fixed value, and only the new weak classifier and its weight are fit.
In contrast to plain gradient descent over all parameters, the forward stagewise algorithm makes full use of the result of the previous round.
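A short sketch of the forward stagewise idea with squared loss (an assumed regression example; with this loss, fitting the new weak learner while holding $f_{m-1}$ fixed reduces to fitting the residuals):

```python
# Forward stagewise fitting: f_m = f_{m-1} + new weak learner, one round at a time.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

f_prev = np.zeros(len(X))                # f_0(x) = 0
learners = []
for m in range(50):
    residual = y - f_prev                # what f_{m-1} has not yet explained
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    learners.append(stump)
    f_prev = f_prev + stump.predict(X)   # f_m = f_{m-1} + new weak learner

print("training MSE:", np.mean((y - f_prev) ** 2))
```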


Origin blog.csdn.net/qq_40920203/article/details/127941635