Statistical Learning Methods, Chapter 8: Boosting Methods

AdaBoost algorithm

  • AdaBoost is a special case of the forward stagewise additive algorithm, where the model is an additive model of basic classifiers and the loss function is the exponential loss.
  • Idea: combine multiple weak classifiers (whose classification rules are rough and simple) into one strong classifier.
  • Briefly: each weak classifier classifies the instance, and the results are combined by a weighted vote (like ordinary voting, except that each voter's ballot carries a different weight) to obtain the final result.
  • Two key questions: 1. How to obtain multiple weak classifiers? 2. How to weight the results of the multiple weak classifiers?
  • Steps:
    1. Initialize the weights of the training data (give each training example a weight, typically uniform at first).
    2. Train a classifier on the weighted data set:
    a. Compute the classifier's classification error rate on the training set.
    b. Compute the classifier's coefficient from that error rate (a classifier with a larger error rate gets a smaller coefficient).
    c. Update the weight distribution of the training data (the weights of misclassified points are increased).
    3. Repeat step 2 to iteratively obtain M classifiers.
    4. Combine the M classifiers, weighted by their coefficients, into the final classifier (a sketch follows this list).
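
The following is a minimal Python sketch of these steps, assuming NumPy, decision stumps as the weak classifiers, and labels in {-1, +1} (all illustrative choices, not fixed by the text above). The coefficient formula alpha = 0.5 * ln((1 - e) / e) and the exponential weight update are the standard AdaBoost rules.

```python
import numpy as np

def best_stump(X, y, w):
    """Pick the (feature, threshold, sign) decision stump with lowest weighted error."""
    best_err, best_params, best_pred = np.inf, None, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best_params, best_pred = err, (j, t, s), pred
    return best_params, best_pred

def adaboost(X, y, M=20):
    """AdaBoost with decision stumps; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # step 1: uniform initial weights
    stumps, alphas = [], []
    for _ in range(M):
        stump, pred = best_stump(X, y, w)          # step 2: weak classifier on weighted data
        err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)  # 2a: weighted error rate
        alpha = 0.5 * np.log((1 - err) / err)      # 2b: larger error -> smaller coefficient
        w *= np.exp(-alpha * y * pred)             # 2c: misclassified points gain weight
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    """Step 4: sign of the coefficient-weighted vote of the M stumps."""
    total = np.zeros(len(X))
    for (j, t, s), a in zip(stumps, alphas):
        total += a * np.where(X[:, j] <= t, s, -s)
    return np.sign(total)
```

Note how step 2c raises the weights of exactly the points the current stump misclassifies, so the next stump is forced to concentrate on them.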

Boosting tree

  • The boosting tree model can be expressed as an additive model of decision trees. Instead of having n trees vote on the result (as in a random forest), the input is fed into every tree and the outputs of all trees are summed. This means that, except for the first tree, each subsequently added tree should output a correction term (a difference).
  • When squared error loss is used, every tree after the first is fitted to the residuals.
  • Specific steps:
  1. Initialize the model with a single constant output for all inputs; call it a.
  2. Compute the squared difference between each sample's actual output and a, and sum over all samples to obtain the loss function. Find the a that minimizes this loss (under squared error it is the mean of the outputs).
  3. With a fixed, subtract a from every sample's output; the result, residual 1, becomes the fitting target for the new tree.
  4. Initialize the output of the second tree as b. Form the squared error loss between b and residual 1, and find the b that minimizes it.
  5. Repeat this process until the stopping requirement is met.
  6. The final model is a + b + ... (see the sketch below).
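
A minimal sketch of these steps, assuming squared error loss and scikit-learn's DecisionTreeRegressor as the base tree (an illustrative choice; the text does not fix a tree implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_tree(X, y, M=10, max_depth=2):
    """Boosting tree for regression under squared error loss."""
    a = float(np.mean(y))                # steps 1-2: the constant minimizing sum((y - a)^2)
    residual = y - a                     # step 3: residual 1, target for the first tree
    trees = []
    for _ in range(M):
        # steps 4-5: fit the next tree to the current residuals
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        residual = residual - tree.predict(X)   # what the model still gets wrong
        trees.append(tree)
    return a, trees

def predict(a, trees, X):
    """Step 6: the final model is a + b + ... (the sum of all trees' outputs)."""
    return a + sum(t.predict(X) for t in trees)
```

Under squared error the best constant a is simply the mean of y, and each later tree fits whatever the accumulated model still gets wrong, which is exactly the residual picture described above.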

Gradient Boosting Tree (GBDT)

  • Same as the boosting tree, except that the residual is replaced by the negative gradient of the loss function evaluated at the current model. With squared error loss the negative gradient is exactly the residual, so the boosting tree is a special case (see the sketch below).
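
A minimal sketch of that one change, using absolute-error loss as an illustrative example (the source does not fix a loss) and scikit-learn's DecisionTreeRegressor as the base learner: each tree is fitted to the negative gradient sign(y - f(x)) rather than the raw residual.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt(X, y, M=10, lr=0.1, max_depth=2):
    """Gradient boosting under absolute-error loss |y - f(x)|."""
    f0 = float(np.median(y))             # the constant minimizing absolute loss
    f = np.full(len(y), f0)              # current model outputs on the training set
    trees = []
    for _ in range(M):
        neg_grad = np.sign(y - f)        # negative gradient of the loss at the current model
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, neg_grad)
        f += lr * tree.predict(X)        # step in the negative-gradient direction
        trees.append(tree)
    return f0, trees

def predict(f0, trees, X, lr=0.1):
    """Final model: initial constant plus the scaled sum of all tree outputs."""
    return f0 + lr * sum(t.predict(X) for t in trees)
```

With squared error loss, the negative gradient of (1/2)(y - f)^2 with respect to f is y - f, i.e. the residual, which recovers the boosting tree above.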

Reference: https://blog.csdn.net/Smile_mingm/article/details/108441387?spm=1001.2014.3001.5501
