Transfer learning: an introduction to the TrAdaBoost algorithm

From: https://www.jianshu.com/p/8ed0703db2c7
Author: caokai1073
Source: Jianshu

See also: Analysis of TrAdaBoost Source Code

Why TrAdaBoost

Traditional machine learning rests on an assumption: the training set and the test set follow the same distribution. In practice this single-distribution assumption often does not hold. If a batch of new data arrives whose distribution differs from the old data the model was trained on, the accuracy of the algorithm can drop sharply. Simply training on the new data and discarding the old data raises two problems: first, the amount of new data may not be enough; second, the old data may still be valuable, and throwing it all away is wasteful. This is where transfer learning comes in: it extracts information from the old data and uses it to help train the new model.

TrAdaBoost is an instance-based transfer method: it selects the still-useful part of the old labeled data and combines it with the (possibly small amount of) new labeled data to build a model that is more accurate than one trained on the new labeled data alone.

TrAdaBoost algorithm

Take the distribution of the test data as the reference: the new data comes from the same distribution, in a space denoted X_s, while the old data comes from a different distribution, in a space denoted X_d. Assume a binary classification problem with label set Y = {0, 1}. The entire instance space is

X = X_s ∪ X_d,

so the task is to learn the mapping function c: X -> Y.
Test data set (unlabeled):

S = { x_i^t }, i = 1, ..., k, where x_i^t ∈ X_s and k is the number of test instances.

Training data set:

T ⊆ { X × Y }

The training set T can be divided into the data T_d from the different distribution and the data T_s from the same distribution:

T_d = { (x_i^d, c(x_i^d)) }, i = 1, ..., n, with x_i^d ∈ X_d
T_s = { (x_j^s, c(x_j^s)) }, j = 1, ..., m, with x_j^s ∈ X_s

so the full training data can be written as one sequence

x_i = x_i^d for i = 1, ..., n and x_i = x_{i-n}^s for i = n+1, ..., n+m,

where there are n examples from the X_d space and m examples from the X_s space.
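
To make the setup concrete, here is a small sketch in NumPy (the array names, sizes, and random data are purely illustrative, not part of the original post) of how the two parts are stacked into one training set, with the first n rows holding the old data and the last m rows the new data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n old (different-distribution) examples and m new (same-distribution) ones.
n, m, d = 1000, 100, 5
Xd, yd = rng.normal(size=(n, d)), rng.integers(0, 2, size=n)   # T_d
Xs, ys = rng.normal(size=(m, d)), rng.integers(0, 2, size=m)   # T_s

# Combined training set: indices 0..n-1 are old data, n..n+m-1 are new data.
X = np.vstack([Xd, Xs])
y = np.concatenate([yd, ys])

# One weight per training example; TrAdaBoost updates these weights in the steps below.
w = np.ones(n + m)
```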

The overall algorithm, from Dai et al. [1], is as follows. Input: the different-distribution training data T_d, the same-distribution training data T_s, the unlabeled test set S, a base learning algorithm (Learner), and the number of iterations N. Initialize the weight vector w^1 = (w_1^1, ..., w_{n+m}^1) over the n + m training examples, then repeat Steps 1-5 for t = 1, ..., N.

Step 1: Normalize the weights of the training examples so that they form a distribution:

p^t = w^t / ( Σ_{i=1}^{n+m} w_i^t )

Step 2: Call the weak classifier. The combined data of T_d and T_s is used as the training set, weighted by p^t, exactly as when AdaBoost trains a weak classifier; this is the step where the old data influences the model:

call Learner with the combined training set T, the distribution p^t, and the unlabeled set S, and obtain a hypothesis h_t: X -> Y.

Step 3: Calculate the error rate. Note that only the data from T_s, i.e., the new data, enters this calculation; the old data is not counted here. When computing the error rate, the weights of the T_s examples are renormalized:

ε_t = Σ_{i=n+1}^{n+m} ( w_i^t · |h_t(x_i) − c(x_i)| ) / Σ_{i=n+1}^{n+m} w_i^t

Step 4: Calculate the weight adjustment rates for T_s and T_d. Note that the rate for T_s changes at every iteration, while the rate for T_d stays the same across all iterations. As in AdaBoost, β_t measures how much say the t-th weak classifier gets in the final vote: the larger β_t, the smaller its say.

β_t = ε_t / (1 − ε_t)            (rate for T_s, recomputed each round)
β = 1 / (1 + sqrt( 2 ln n / N ))     (fixed rate for T_d)
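
As a quick numerical check of these two rates (the values of n, N, and ε_t below are made up for illustration):

```python
import math

# Hypothetical values: n = 1000 old examples, N = 20 boosting rounds,
# and a weak-classifier error rate of eps_t = 0.2 on the same-distribution data T_s.
n, N, eps_t = 1000, 20, 0.2

beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / N))  # fixed rate for T_d
beta_t = eps_t / (1.0 - eps_t)                          # per-round rate for T_s

print(round(beta, 3), beta_t)  # 0.546 0.25
```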
Step 5: Update the data weights. If an example in T_s is misclassified, its weight is increased, consistent with the traditional AdaBoost algorithm. The examples in T_d are handled the opposite way: if one is misclassified, its weight is decreased, because the misclassification is taken as a sign that this piece of old data is too far from the new data.

w_i^{t+1} = w_i^t · β^{|h_t(x_i) − c(x_i)|}          for i = 1, ..., n (old data in T_d)
w_i^{t+1} = w_i^t · β_t^{−|h_t(x_i) − c(x_i)|}       for i = n+1, ..., n+m (new data in T_s)
Output. Only the second half of the weak classifiers (rounds ⌈N/2⌉ through N) take part in the final vote:

h_f(x) = 1  if  Π_{t=⌈N/2⌉}^{N} β_t^{−h_t(x)} ≥ Π_{t=⌈N/2⌉}^{N} β_t^{−1/2},  and 0 otherwise.
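
Putting the five steps and the output rule together, the following is a minimal sketch of TrAdaBoost in Python. It assumes NumPy and uses a scikit-learn decision stump as the weak learner; the function name, parameter defaults, and the small clipping of ε_t are my own choices for illustration, not from the original post or the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost(Xd, yd, Xs, ys, N=20):
    """Minimal TrAdaBoost sketch.

    Xd, yd: old, different-distribution labeled data (n examples, T_d).
    Xs, ys: new, same-distribution labeled data (m examples, T_s).
    Returns a predict function producing labels in {0, 1}.
    """
    n, m = len(Xd), len(Xs)
    X = np.vstack([Xd, Xs])
    y = np.concatenate([yd, ys]).astype(float)

    w = np.ones(n + m)                                   # initial weight vector w^1
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / N))    # fixed rate for T_d
    learners, beta_ts = [], []

    for t in range(N):
        p = w / w.sum()                                   # Step 1: normalize to a distribution

        h = DecisionTreeClassifier(max_depth=1)           # Step 2: weak learner on T_d + T_s
        h.fit(X, y, sample_weight=p)
        pred = h.predict(X)

        err = np.abs(pred - y)                            # |h_t(x_i) - c(x_i)| in {0, 1}
        eps = np.sum(w[n:] * err[n:]) / np.sum(w[n:])     # Step 3: error measured on T_s only
        eps = float(np.clip(eps, 1e-10, 0.499))           # keep beta_t inside (0, 1)

        beta_t = eps / (1.0 - eps)                        # Step 4: per-round rate for T_s
        learners.append(h)
        beta_ts.append(beta_t)

        # Step 5: misclassified old data loses weight, misclassified new data gains weight
        w[:n] *= beta ** err[:n]
        w[n:] *= beta_t ** (-err[n:])

    def predict(Xq):
        # Output: vote only over the second half of the weak classifiers,
        # comparing prod beta_t^{-h_t(x)} with prod beta_t^{-1/2} in log space.
        start = int(np.ceil(N / 2)) - 1
        left = np.zeros(len(Xq))
        right = 0.0
        for h, bt in zip(learners[start:], beta_ts[start:]):
            left += -np.log(bt) * h.predict(Xq)
            right += -0.5 * np.log(bt)
        return (left >= right).astype(int)

    return predict
```

Any base classifier whose fit method accepts sample_weight could be substituted for the decision stump.
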
References:
[1] W. Dai, Q. Yang, G. Xue, and Y. Yu, “Boosting for Transfer Learning,” Proc. 24th Int'l Conf. Machine Learning, pp. 193-200, June 2007.

Origin: blog.csdn.net/qq_16488989/article/details/109110947