In-depth understanding of Xgboost in machine learning projects (2): Understanding the gradient and GB

Xgboost shares its lineage with GBDT, so what exactly does the gradient mean, and what do the G and B stand for?

1. Gradient

The gradient is a vector. For a function (for example, the loss function), the gradient at a point points in the direction in which the directional derivative is largest, that is, the direction along which the function changes fastest at that point, and its modulus equals that maximum rate of change.

2. The dimension of the gradient

        Take f\left ( x,y \right )=x^{2}+y^{2}, a paraboloid in space. At the point (x, y) the gradient is (2x, 2y), so its dimension here is 1 * 2.
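To make this concrete, here is a minimal sketch (not part of the original post) that computes the analytic gradient (2x, 2y) and checks it against a finite-difference approximation; the function names `f`, `grad_f`, `numeric_grad` and the test point (1, 2) are illustrative choices only.

```python
import numpy as np

def f(p):
    """f(x, y) = x**2 + y**2, the paraboloid from the text."""
    x, y = p
    return x ** 2 + y ** 2

def grad_f(p):
    """Analytic gradient (2x, 2y): a 1 x 2 vector."""
    x, y = p
    return np.array([2 * x, 2 * y])

def numeric_grad(func, p, eps=1e-6):
    """Central finite-difference check of the gradient."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = eps
        g[i] = (func(p + step) - func(p - step)) / (2 * eps)
    return g

p0 = np.array([1.0, 2.0])
print(grad_f(p0))            # [2. 4.]
print(numeric_grad(f, p0))   # ~[2. 4.], matching the analytic gradient
```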

        Assume the model in use is GBDT and that each sample's feature vector has dimension 1 * m, where m is the number of features; the solution space then has dimension 1 * m, so the corresponding gradient should also be 1 * m.
        This is my personal understanding; I would be glad to discuss it, so please contact me if you see a problem.

3. The connection between gradient and gradient descent

Definition of gradient descent: if the real-valued function F(x) is differentiable and defined at a point a, then F(x) decreases fastest when moving from a in the direction opposite to the gradient, that is, along -\bigtriangledown F(a).

Usually what we want to solve for is the minimum of the loss function, so the parameters are adjusted in the direction opposite to the gradient.
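A minimal gradient-descent sketch for the same paraboloid, again illustrative only (the learning rate, step count, and starting point are arbitrary assumptions): each step moves the parameters against the gradient.

```python
import numpy as np

def grad_f(p):
    """Gradient of f(x, y) = x**2 + y**2."""
    return 2 * p

def gradient_descent(grad, p0, lr=0.1, steps=200):
    """Repeatedly move against the gradient: p <- p - lr * grad(p)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = p - lr * grad(p)
    return p

print(gradient_descent(grad_f, [3.0, -4.0]))  # converges toward the minimizer (0, 0)
```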

4. The meaning of GB (Gradient Boosting)

        It is translated as "gradient boosting", but it is really the idea of gradient descent applied to boosting: multiple weak learners are combined to obtain a strong learner. Unlike AdaBoost, which increases the weight of misclassified samples at each round, GB fits each new learner to the residual of the previous rounds' predictions.

        Friedman proposed that the gradient boosting algorithm use a steepest-descent approximation: in the regression setting, the value of the negative gradient of the loss function at the current model is taken as an approximation of the residual in the boosting tree algorithm, and a regression tree is fit to it.
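        For the squared-error loss this approximation is exact (a standard derivation added here for clarity, not part of the original post): with L\left ( y,F(x) \right )=\frac{1}{2}\left ( y-F(x) \right )^{2}, the negative gradient at the current model is -\frac{\partial L}{\partial F(x)}=y-F(x), which is exactly the residual; for other loss functions it only approximates the residual and is often called the pseudo-residual.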

        Gradient boosting with tree models searches the function space for an optimal function, which is again a form of gradient descent; the "boosting" refers to the step-by-step improvement of the model's overall accuracy, as the sketch below illustrates.
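Below is a minimal sketch of gradient boosting for squared-error loss, using scikit-learn's DecisionTreeRegressor as the weak learner. It is not how Xgboost itself is implemented; the function names, hyperparameters (n_rounds, lr, max_depth), and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gb(X, y, n_rounds=100, lr=0.1, max_depth=2):
    """Minimal gradient boosting for squared loss: each round fits a small
    regression tree to the negative gradient, which here equals the residual y - F."""
    f0 = float(np.mean(y))                      # start from a constant model
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residual = y - pred                     # negative gradient of 1/2*(y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += lr * tree.predict(X)            # gradient-descent step in function space
        trees.append(tree)
    return f0, trees

def predict_gb(f0, trees, X, lr=0.1):
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += lr * tree.predict(X)
    return pred

# Toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)
f0, trees = fit_gb(X, y)
print(np.mean((y - predict_gb(f0, trees, X)) ** 2))  # training MSE after boosting
```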


Origin blog.csdn.net/xiao_ling_yun/article/details/128326014