[Machine Learning Core Summary] What is GBDT (Gradient Boosting Decision Tree)

Although GBDT also consists of many decision trees, it differs from Random Forest in many ways.

One difference is that every tree in GBDT is a regression tree. Decision trees come in two kinds, classification trees and regression trees, and telling them apart is simple: a tree that only sorts apples into good and bad is a classification tree, while a tree that assigns each apple a score according to its quality is a regression tree.

Another difference is that each tree in GBDT builds on the previous one.

Taking apple scoring as an example, we first train a tree to roughly predict each apple's score, then train a second tree to predict the gap between that prediction and the real score. If the two trees together still fall short of the real score, we train a third tree to predict the remaining gap. Repeating this process keeps reducing the error, and the sum of all the trees' predictions is the apple's final score.
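
The residual-fitting loop described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: each "tree" is a one-split stump, the loss is squared error, there is no learning rate, and the sweetness/score data and all function names (fit_stump, fit_gbdt, predict, mse) are made up for the example rather than taken from any library.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree; fall back to the mean if no split exists."""
    mean = sum(targets) / len(targets)
    best = (float("inf"), None, mean, mean)
    for t in sorted(set(xs))[:-1]:  # candidate thresholds between distinct x values
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if t is None or x <= t else right_mean


def fit_gbdt(xs, ys, n_trees=10):
    """Train trees one after another, each on the gap left by the previous ones."""
    trees = []
    preds = [0.0] * len(xs)  # before any tree exists, the prediction is 0
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # remaining gap to the real score
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + tree(x) for p, x in zip(preds, xs)]
    return trees


def predict(trees, x):
    """The final score is the sum of every tree's prediction."""
    return sum(tree(x) for tree in trees)


def mse(trees, xs, ys):
    return sum((y - predict(trees, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: apple sweetness (feature) and quality score (target).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 2.5, 3.0, 5.0, 5.5, 7.0, 8.5, 9.0]

one_tree = fit_gbdt(xs, ys, n_trees=1)
ten_trees = fit_gbdt(xs, ys, n_trees=10)
print(mse(one_tree, xs, ys), mse(ten_trees, xs, ys))
```

The first tree fits the scores themselves (the gap from zero), and every later tree fits whatever gap is left, so the training error with ten trees is lower than with one.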

Besides apples, the items being scored can be web pages, movies, or merchandise, which are then ranked by predicted relevance, click-through rate, or user preference. This is why GBDT is widely used in search, advertising, recommendation systems, and other fields. It can handle both categorical and numerical data, and its results are fairly interpretable. These are the advantages of GBDT.

However, because each tree depends on the trees before it, the trees have to be trained one after another, so training takes a long time. Since it uses multiple models together to solve a problem, GBDT naturally belongs to ensemble learning.

This approach, in which each model builds on the previous one and the models jointly approach the correct answer, is called Boosting, which is the B in GBDT.

By contrast, the approach used by Random Forest, in which the models are trained independently of one another and then vote together on the result, is called Bagging.
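
For comparison, here is a minimal Bagging sketch in plain Python. It assumes a regression setting, so the "vote" becomes an average of the trees' predictions, and each tree is a one-split stump trained on its own bootstrap sample. The data and the names fit_stump, fit_bagging, and predict_bagging are illustrative only. A real Random Forest also samples features at random, which this sketch leaves out.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree; fall back to the mean if no split exists."""
    mean = sum(ys) / len(ys)
    best = (float("inf"), None, mean, mean)
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if t is None or x <= t else right_mean


def fit_bagging(xs, ys, n_trees=25, seed=0):
    """Train every tree independently on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_bagging(trees, x):
    """Combine the independent trees by averaging their predictions."""
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical data: apple sweetness (feature) and quality score (target).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 2.5, 3.0, 5.0, 5.5, 7.0, 8.5, 9.0]

forest = fit_bagging(xs, ys)
print(predict_bagging(forest, 4))
```

Because no tree needs another tree's output, the loop in fit_bagging could run in parallel, which is not possible for the sequential loop in Boosting.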

A third approach is Stacking, which places a higher-level model on top of multiple base models: it takes the outputs of the base models as its input and produces the final prediction.
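
A rough Stacking sketch in plain Python follows. It rests on several assumptions: the two base models are a least-squares line and a one-split stump, and the higher-level model is a single blend weight w fitted on the base models' predictions for a separate hold-out set. Practical stacking usually uses cross-validated predictions and a richer higher-level model. All names and data here are made up for illustration.

```python
def fit_line(xs, ys):
    """Base model 1: simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return lambda x: my + slope * (x - mx)


def fit_stump(xs, ys):
    """Base model 2: one-split regression tree (mean if no split exists)."""
    mean = sum(ys) / len(ys)
    best = (float("inf"), None, mean, mean)
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if t is None or x <= t else rm


def fit_stacking(train_xs, train_ys, hold_xs, hold_ys):
    """Train the base models, then fit the higher-level blend on their outputs."""
    line, stump = fit_line(train_xs, train_ys), fit_stump(train_xs, train_ys)
    a = [line(x) for x in hold_xs]   # base outputs become the meta-model's inputs
    b = [stump(x) for x in hold_xs]
    denom = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # Closed-form least squares for the blend w * a + (1 - w) * b.
    w = (sum((ai - bi) * (y - bi) for ai, bi, y in zip(a, b, hold_ys)) / denom
         if denom else 0.5)
    return line, stump, w


def predict_stacking(model, x):
    line, stump, w = model
    return w * line(x) + (1 - w) * stump(x)


def sse(predict_fn, xs, ys):
    return sum((y - predict_fn(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: apple sweetness (feature) and quality score (target).
train_xs, train_ys = [1, 2, 3, 4, 5, 6], [2.0, 2.5, 3.0, 5.0, 5.5, 7.0]
hold_xs, hold_ys = [1.5, 3.5, 5.5, 7.0], [2.2, 4.1, 6.0, 8.4]

stacked = fit_stacking(train_xs, train_ys, hold_xs, hold_ys)
print(predict_stacking(stacked, 4.5))
```

On the hold-out set used to fit w, the blend can never do worse than either base model alone, because w = 1 and w = 0 reproduce the two base models exactly.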

Origin blog.csdn.net/RuanJian_GC/article/details/131544199