Boosting Methods (Part 2): Boosting Trees

The boosting method uses an additive model (i.e., a linear combination of basis functions) together with a forward stagewise algorithm. A boosting method whose basis functions are decision trees is called a boosting tree. For classification problems the decision tree is a binary classification tree; for regression problems it is a binary regression tree. The basic classifier x&lt;v or x&gt;v seen in the AdaBoost example can be viewed as a simple decision tree with a root node directly connected to two leaf nodes, the so-called decision stump.
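As a minimal sketch (the function name and the example threshold are illustrative, not from the original), a decision stump for regression is just one split point with a constant output on each side:

```python
def stump_predict(x, s, c1, c2):
    """A decision stump: a root node that splits at s, with leaf constants c1 and c2."""
    return c1 if x < s else c2

# e.g. a stump that outputs 6.24 for x < 6.5 and 8.91 otherwise
print(stump_predict(3.0, s=6.5, c1=6.24, c2=8.91))  # -> 6.24
```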

A boosting tree model can be expressed as an additive model of decision trees:

    f_M(x) = Σ_{m=1}^M T(x; Θ_m)

where T(x; Θ_m) denotes a decision tree, Θ_m denotes its parameters, and M is the number of trees.

Boosting tree algorithm:

The boosting tree algorithm uses a forward stagewise algorithm. First set the initial boosting tree f_0(x) = 0; the model at step m is

    f_m(x) = f_{m-1}(x) + T(x; Θ_m)

Here f_{m-1}(x) is the current model, and the parameters Θ_m of the next decision tree are determined by empirical risk minimization:

    Θ̂_m = arg min_{Θ_m} Σ_{i=1}^N L(y_i, f_{m-1}(x_i) + T(x_i; Θ_m))

The following discusses boosting tree learning algorithms for different problems. The main difference between them is the loss function used: the squared error loss for regression problems, the exponential loss for classification problems, and a general loss function for general decision problems.

For the binary classification problem, the boosting tree algorithm only needs to restrict the basic classifiers in the AdaBoost algorithm to binary classification trees; the boosting tree algorithm in this case is a special case of the AdaBoost algorithm and is not described in detail here. The boosting tree for regression problems is described below.

 

Boosting trees for regression problems use the following forward stagewise algorithm:

    f_0(x) = 0
    f_m(x) = f_{m-1}(x) + T(x; Θ_m),  m = 1, 2, …, M
    f_M(x) = Σ_{m=1}^M T(x; Θ_m)

Here a regression tree that partitions the input space into regions R_1, …, R_J with constant output c_j on region R_j can be written as T(x; Θ) = Σ_{j=1}^J c_j I(x ∈ R_j), with parameters Θ = {(R_1, c_1), …, (R_J, c_J)}.

In step m of the forward stagewise algorithm, given the current model f_{m-1}(x), we solve

    Θ̂_m = arg min_{Θ_m} Σ_{i=1}^N L(y_i, f_{m-1}(x_i) + T(x_i; Θ_m))

to obtain Θ̂_m, the parameters of the mth tree.

When the squared error loss function

    L(y, f(x)) = (y − f(x))²

is used, the loss becomes

    L(y, f_{m-1}(x) + T(x; Θ_m)) = [y − f_{m-1}(x) − T(x; Θ_m)]² = [r − T(x; Θ_m)]²

where r = y − f_{m-1}(x) is the residual of the current model on the data.

So the boosting tree algorithm for regression problems simply fits each new tree to the residuals of the current model. This makes the algorithm quite simple.
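As a rough sketch of this residual-fitting loop (assuming decision stumps as base learners, squared error loss, float NumPy arrays with distinct x values, and illustrative function names), the algorithm can be written as:

```python
import numpy as np

def fit_stump(x, r):
    """Fit a decision stump to targets r: search all midpoints between samples
    and use the region means as the leaf constants c1, c2."""
    candidates = (np.sort(x)[:-1] + np.sort(x)[1:]) / 2.0
    best = None
    for s in candidates:
        c1, c2 = r[x < s].mean(), r[x >= s].mean()
        loss = ((r[x < s] - c1) ** 2).sum() + ((r[x >= s] - c2) ** 2).sum()
        if best is None or loss < best[0]:
            best = (loss, s, c1, c2)
    return best[1:]                        # (s, c1, c2)

def boost_regression(x, y, M=6):
    """Forward stagewise boosting: each stump is fitted to the residuals."""
    f = np.zeros_like(y)                   # f_0(x) = 0
    trees = []
    for _ in range(M):
        r = y - f                          # residuals r_i = y_i - f_{m-1}(x_i)
        s, c1, c2 = fit_stump(x, r)
        trees.append((s, c1, c2))
        f = f + np.where(x < s, c1, c2)    # f_m(x) = f_{m-1}(x) + T(x; Theta_m)
    return trees, f
```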

Example:

Given the training data shown in Table 8.2, where x takes values in the interval [0.5, 10.5] and y takes values in the interval [5.0, 10.0], learn a boosting tree model for this regression problem, considering only decision stumps as basis functions.

Solution:

(1) First step: find f_1(x), i.e., the regression tree T_1(x).

First, solve for the split point s of the training data through the following optimization problem:

    min_s [ min_{c1} Σ_{x_i ∈ R_1} (y_i − c1)² + min_{c2} Σ_{x_i ∈ R_2} (y_i − c2)² ]

where R_1 = {x | x ≤ s} and R_2 = {x | x > s}.

The values c1 and c2 that minimize the squared error inside R_1 and R_2 are simply the averages of y_i over the two regions:

    c1 = (1/N_1) Σ_{x_i ∈ R_1} y_i,   c2 = (1/N_2) Σ_{x_i ∈ R_2} y_i

Here N_1 and N_2 are the numbers of sample points in R_1 and R_2.

Now find the split point of the training data.

According to the given data, consider the following cut points: 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5

For each split point, it is not difficult to find the corresponding R_1, R_2, c1, c2 and

    m(s) = min_{c1} Σ_{x_i ∈ R_1} (y_i − c1)² + min_{c2} Σ_{x_i ∈ R_2} (y_i − c2)²

For example, when s = 1.5, R_1 = {1}, R_2 = {2, 3, …, 10}, c1 = 5.56, c2 = 7.50, and m(1.5) follows from the formula above.

The computed values of m(s) for each split point s are listed in Table 8.3.

From Table 8.3 it can be seen that m(s) attains its minimum at s = 6.5, where R_1 = {1, 2, …, 6}, R_2 = {7, 8, 9, 10}, c1 = 6.24 and c2 = 8.91. The regression tree T_1(x) is therefore

    T_1(x) = 6.24 for x < 6.5,  8.91 for x ≥ 6.5

and f_1(x) = T_1(x).

The above is the standard procedure for fitting a regression tree.
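As an illustrative check of this step, the loop below recomputes m(s) over the candidate cut points. The y values are the ones commonly quoted for this example; they are consistent with the means c1 = 6.24 and c2 = 8.91 above but should be treated as an assumption, since Table 8.2 itself is not reproduced here.

```python
import numpy as np

# Assumed Table 8.2 data (x_i, y_i), i = 1, ..., 10
x = np.arange(1, 11, dtype=float)
y = np.array([5.56, 5.70, 5.91, 6.40, 6.80, 7.05, 8.90, 8.70, 9.00, 9.05])

for s in np.arange(1.5, 10.0, 1.0):         # candidate cut points 1.5, ..., 9.5
    c1, c2 = y[x < s].mean(), y[x >= s].mean()
    m = ((y[x < s] - c1) ** 2).sum() + ((y[x >= s] - c2) ** 2).sum()
    print(f"s = {s}: c1 = {c1:.2f}, c2 = {c2:.2f}, m(s) = {m:.2f}")

# The minimum of m(s) should occur at s = 6.5, which gives T_1(x).
```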

(2) The second round:

The residuals from fitting the training data with f_1(x) are shown in Table 8.4, where r_{2i} = y_i − f_1(x_i), i = 1, 2, …, 10.

 

The squared loss error of f_1(x) on the training data is

    L(y, f_1(x)) = Σ_{i=1}^{10} (y_i − f_1(x_i))²

Next, find T_2(x). The method is the same as for T_1(x), except that the data being fitted are the residuals in Table 8.4. This gives T_2(x), and the model is updated to f_2(x) = f_1(x) + T_2(x).

The squared loss error of f_2(x) on the training data is:

Continuing in the same way gives T_3(x), …, T_6(x) and the corresponding models f_3(x), …, f_6(x).

The squared loss error of f_6(x) on the training data is:

Assuming the error requirement has been met at this point, f(x) = f_6(x) is the desired boosting tree.
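Continuing the sketch above for six rounds (again using the assumed Table 8.2 data; the split points and losses printed here are whatever the residual fit produces, not values taken from the original tables):

```python
import numpy as np

# Assumed Table 8.2 data (see the note above)
x = np.arange(1, 11, dtype=float)
y = np.array([5.56, 5.70, 5.91, 6.40, 6.80, 7.05, 8.90, 8.70, 9.00, 9.05])

f = np.zeros_like(y)                            # f_0(x) = 0
for m in range(1, 7):                           # six boosting rounds
    r = y - f                                   # residuals of the current model
    best = None
    for s in np.arange(1.5, 10.0, 1.0):         # candidate cut points
        c1, c2 = r[x < s].mean(), r[x >= s].mean()
        m_s = ((r[x < s] - c1) ** 2).sum() + ((r[x >= s] - c2) ** 2).sum()
        if best is None or m_s < best[0]:
            best = (m_s, s, c1, c2)
    _, s, c1, c2 = best
    f = f + np.where(x < s, c1, c2)             # f_m(x) = f_{m-1}(x) + T_m(x)
    print(f"m = {m}: split at s = {s}, squared loss = {((y - f) ** 2).sum():.2f}")
```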

Gradient boosting algorithm:

Boosting trees use the additive model and the forward stagewise algorithm to carry out the optimization involved in learning. When the loss function is the squared loss or the exponential loss, each optimization step is simple; for a general loss function, however, each step is often not so easy to optimize. To address this problem, Friedman proposed the gradient boosting algorithm. It is an approximation method based on steepest descent: the key is to use the negative gradient of the loss function evaluated at the current model,

    r_{mi} = −[∂L(y_i, f(x_i)) / ∂f(x_i)] evaluated at f = f_{m-1},

as an approximation to the residual in the boosting tree algorithm for regression problems, and to fit a regression tree to it.

The gradient boosting algorithm proceeds in the following steps:

(1) Initialize the model by estimating a constant value c that minimizes the loss; this constant is equivalent to a tree with only a single root node:

    f_0(x) = arg min_c Σ_{i=1}^N L(y_i, c)

(2) For m = 1, 2, …, M, compute the negative gradient of the loss function at the current model and use it as an estimate of the residual:

    r_{mi} = −[∂L(y_i, f(x_i)) / ∂f(x_i)] evaluated at f = f_{m-1},  i = 1, 2, …, N

For the squared loss function this is exactly the usual residual; for a general loss function it is an approximation of the residual.

(3) Fit a regression tree to the residuals r_{mi}, that is, run the standard regression tree splitting procedure described earlier with the residuals as targets, obtaining the leaf regions R_{mj}, j = 1, 2, …, J. For each leaf region, estimate the constant output that minimizes the loss:

    c_{mj} = arg min_c Σ_{x_i ∈ R_{mj}} L(y_i, f_{m-1}(x_i) + c)

Adding this tree to the model from the previous rounds gives the model for the current round:

    f_m(x) = f_{m-1}(x) + Σ_{j=1}^J c_{mj} I(x ∈ R_{mj})

(4) After M rounds, the final regression tree model is obtained:

    f̂(x) = f_M(x) = Σ_{m=1}^M Σ_{j=1}^J c_{mj} I(x ∈ R_{mj})
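As a rough sketch of these four steps (assuming the absolute error loss L(y, f) = |y − f| as the example of a general loss function, stumps as the regression trees, float NumPy arrays with distinct x values, and illustrative names throughout):

```python
import numpy as np

def gradient_boost(x, y, M=10):
    """Gradient boosting with absolute error loss and stump trees.
    Negative gradient: sign(y - f); optimal leaf constant: median of y - f in the leaf."""
    f = np.full_like(y, np.median(y))       # (1) f_0(x): constant minimizing the loss
    trees = []
    for _ in range(M):
        r = np.sign(y - f)                  # (2) negative gradient at the current model
        # (3) fit a stump to r with least squares: search the best split point
        candidates = (np.sort(x)[:-1] + np.sort(x)[1:]) / 2.0
        best = None
        for s in candidates:
            loss = ((r[x < s] - r[x < s].mean()) ** 2).sum() + \
                   ((r[x >= s] - r[x >= s].mean()) ** 2).sum()
            if best is None or loss < best[0]:
                best = (loss, s)
        s = best[1]
        # leaf constants c_mj minimize the original loss over each region -> medians
        c1 = np.median((y - f)[x < s])
        c2 = np.median((y - f)[x >= s])
        trees.append((s, c1, c2))
        f = f + np.where(x < s, c1, c2)     # f_m(x) = f_{m-1}(x) + sum_j c_mj I(x in R_mj)
    return trees, f                         # (4) final model after M rounds
```

For the squared loss the negative gradient is simply y − f and the leaf constants reduce to region means, which recovers the residual-fitting boosting tree described above.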
