An illustrated guide to ensemble learning algorithms in machine learning

Overview

Machine learning has become a popular technology. Among the many machine learning algorithms, setting aside the newest directions such as deep learning and reinforcement learning, when the conversation turns to classic machine learning algorithms, ensemble learning is without question the deserving focus. This article gives a brief introduction to the classic ensemble learning algorithms.

Three families of ensemble learning algorithms

Ensemble learning, as the name suggests, integrates the results of multiple base learners through some fusion mechanism to obtain a more accurate and stable result. An important condition is hidden here: the outputs of the base learners must differ from one another; if the base learners all produced exactly the same result, no fusion strategy could yield a better ensemble. According to how the base learners are combined and how their results are fused, ensemble learning falls into three main families:

  • Bagging, short for bootstrap aggregating, trains multiple base learners independently in parallel and then fuses their results by voting or weighted averaging to obtain a more accurate prediction (a minimal voting sketch follows this list);

  • Boosting, literally "lifting" or "promoting", improves the learning effect step by step in a serial fashion: each new learner stands on the shoulders of the giants that came before it;

  • Stacking borrows the word "stack" from the data structure of the same name to describe how the learners are layered, which again carries an idea of sequence and progression. The difference from boosting is that stacking feeds the outputs of many base learners into a second round of training as input features, a process somewhat similar in spirit to a multi-layer neural network.
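
As a minimal, hedged illustration of the "fuse several different base learners" idea described above (not code from the original article), the sketch below combines three heterogeneous scikit-learn models with a simple majority vote; the dataset and the three base models are arbitrary assumptions for demonstration.

```python
# A minimal sketch of fusing several different base learners by voting.
# The dataset and the three base models are arbitrary choices for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three base learners that make *different* kinds of errors -- the
# precondition for an ensemble to beat its individual members.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("tree", DecisionTreeClassifier(max_depth=4)),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # majority vote; "soft" would average predicted probabilities
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```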

01 Bagging

The basic routine of bagging is strength in numbers: train multiple independent base learners in parallel, then merge the predictions of all base learners to obtain the final ensemble prediction. To guarantee that the ensemble beats any single base learner, the base learners must produce different predictions, and the usual way to achieve this is sampling: different training sets are obtained by row sampling (over samples) or column sampling (over features), so that each base learner is trained on different data and learns something different. Depending on how rows and columns are sampled, the bagging family splits into the following four methods (a scikit-learn sketch of all four follows the list):

  • Sample only along the sample dimension (row sampling), with replacement, which means the K samples drawn for each weak learner may contain duplicates. This corresponds to the bagging algorithm itself, where bagging = bootstrap aggregating. It is no surprise that the name of this specific algorithm coincides with the name of the family: bootstrap sampling is the classic sampling scheme of the family, so it lent its name to the whole group. Whether "bagging" refers to the specific algorithm or the whole family therefore depends on whether it is used in the narrow or the broad sense.

  • Sample only along the sample dimension, but without replacement, which means the K samples drawn for each weak learner are all distinct: each sample, once drawn, is set aside (passed over) rather than returned to the pool, and this algorithm is called pasting;

  • The randomness of the previous two methods comes from random sampling along the sample (row) dimension, so can we also sample randomly along the column direction? For example, every weak learner trains on all samples, but each learner trains on a different subset of features; the resulting learners naturally differ, which satisfies the requirement for an ensemble. The corresponding algorithm is called random subspaces, meaning subspaces of the feature dimension, a rather vivid name;

  • Since randomness can come from both the sample dimension and the feature dimension, it is natural to ask whether the two can be combined: each weak learner performs both row sampling and column sampling, and the resulting weak learners should be even more diverse. This algorithm is called random patches. Random forest essentially follows this idea of randomizing both rows and features, and it is in fact the most widely used ensemble algorithm in the bagging family.
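
The four row/column sampling combinations above map directly onto switches of scikit-learn's `BaggingClassifier`. The sketch below is a hedged illustration of that mapping, not code from the original article; the base estimator and parameter values are assumptions chosen for demonstration.

```python
# Sketch: the four bagging-family variants expressed with scikit-learn's
# BaggingClassifier switches. Parameter values are illustrative assumptions.
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

base = DecisionTreeClassifier()

# 1) bagging (narrow sense): row sampling WITH replacement
bagging = BaggingClassifier(base, n_estimators=50,
                            max_samples=0.8, bootstrap=True)

# 2) pasting: row sampling WITHOUT replacement
pasting = BaggingClassifier(base, n_estimators=50,
                            max_samples=0.8, bootstrap=False)

# 3) random subspaces: column (feature) sampling only
subspaces = BaggingClassifier(base, n_estimators=50,
                              max_samples=1.0, bootstrap=False,
                              max_features=0.5)

# 4) random patches: sample both rows and columns
patches = BaggingClassifier(base, n_estimators=50,
                            max_samples=0.8, bootstrap=True,
                            max_features=0.5)

# Random forest: the most widely used bagging-family algorithm; it bootstraps
# rows and additionally randomizes the features considered at each split.
forest = RandomForestClassifier(n_estimators=100)
```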

02 Boosting

If the basic routine of bagging is strength in numbers, then the basic routine of boosting is a relay of attacks: base learners are trained one after another in sequence, and each new learner compensates for the shortcomings of its predecessors, so the overall effect keeps improving. To decide what counts as a "shortcoming", boosting takes two main approaches:

  • AdaBoost: for the samples misclassified in the previous round, keep increasing the weight of these error-prone samples, concretely by assigning each sample its own weight coefficient. It is like a student who keeps a notebook of wrong answers from previous exams in order to improve in a targeted way;

  • GBDT: keep fitting the gap left by the previous rounds of training so that the ensemble gradually approximates the target. Concretely, the difference between the true value and the current prediction, which for squared-error loss equals the negative gradient, is what each new learner is trained to fit (a hand-rolled sketch of this residual-fitting loop follows the list).
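
To make the residual-fitting idea behind GBDT concrete, here is a hedged sketch (not from the original article) that hand-rolls a few boosting rounds on a toy regression problem: for squared-error loss, each new tree is fit to the current residuals, which are exactly the negative gradients. The AdaBoost line at the end simply shows the scikit-learn counterpart of the sample-reweighting approach; all parameter values are assumptions.

```python
# Sketch: GBDT's core loop for squared-error loss -- each round fits a small
# tree to the current residuals (the negative gradient) and adds a shrunken
# version of its prediction to the running model. Values are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # start from the mean prediction
trees = []

for _ in range(100):
    residual = y - prediction            # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("training MSE:", np.mean((y - prediction) ** 2))

# AdaBoost, by contrast, reweights misclassified samples round by round;
# scikit-learn wraps that reweighting logic in AdaBoostClassifier.
ada = AdaBoostClassifier(n_estimators=100)
```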

Both approaches target the weaknesses of the previous round of training and compensate for them so that the next learner is better. The methodology is the same, but their current popularity differs greatly: AdaBoost is now used in relatively limited settings and is gradually fading from view, while the GBDT idea has flourished and keeps spawning iterative improvements, such as the three most popular ensemble learning algorithms at the moment, all of which are improvements on GBDT (a usage sketch of all three follows this list):

  • XGBoost, the first major improvement on GBDT, created by Tianqi Chen (a graduate of the ACM class at Shanghai Jiao Tong University). Its biggest difference from GBDT lies in the objective function, essentially extending the first-order Taylor expansion to a second-order expansion, along with a number of differences in engineering implementation;

  • LightGBM, a further improvement built on XGBoost, from a Microsoft team. True to the "light" in its name, LightGBM mainly speeds up training while achieving results similar to XGBoost, thanks to three algorithms or mechanisms: ① along the value dimension, histogram-based binning simplifies data storage and split-point selection; ② along the sample dimension, Gradient-based One-Side Sampling (GOSS) avoids computation on low-gradient samples; ③ along the feature dimension, Exclusive Feature Bundling (EFB, admittedly a mouthful) merges mutually exclusive sparse features to reduce the number of features. In addition, another notable difference from XGBoost is the leaf-wise tree growth strategy, which makes each split count;

  • CatBoost, which debuted slightly later than LightGBM, comes from the Russian Internet giant Yandex and is likewise an improvement in the XGBoost/GBDT line. Its biggest differences are native support for categorical (cat) features without converting them to numeric values, and automatic handling of missing values. It is, however, not used all that often in competitions.
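
The three libraries expose very similar scikit-learn-style interfaces; the sketch below is a hedged illustration of typical usage, assuming the `xgboost`, `lightgbm`, and `catboost` packages are installed. The hyperparameter values and the column name `city` are made-up examples, not recommendations.

```python
# Sketch: near-identical scikit-learn-style APIs of the three GBDT libraries.
# Assumes the xgboost, lightgbm and catboost packages are installed;
# hyperparameter values are illustrative only.
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

xgb = XGBClassifier(n_estimators=500, learning_rate=0.05, max_depth=6)
lgb = LGBMClassifier(n_estimators=500, learning_rate=0.05, num_leaves=63)

# CatBoost's distinguishing feature: categorical columns can be passed
# directly, no manual encoding needed ("city" is a hypothetical column name).
cat = CatBoostClassifier(iterations=500, learning_rate=0.05, verbose=False)
# cat.fit(X_train, y_train, cat_features=["city"])
```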

Overall, the boosting family represented by XGBoost, LightGBM, and CatBoost is the current focus of attention in the machine learning field, and it largely represents the research direction of classic machine learning algorithms.

Viewed from another angle, these three algorithms can also be regarded as originating in China, the United States, and Russia respectively... well, great-power competition really is everywhere!

03 Stacking

The idea of stacking can be seen as a mixture of bagging and boosting: first train multiple base learners in parallel, then use their outputs as feature data to train a second-stage learner, in order to obtain a more accurate result. Following this recipe you could of course train a third and a fourth stage, which really does start to look like a deep neural network... Perhaps precisely because the idea is so close to a deep neural network, stacking-style ensemble learning has not been widely used either.
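
scikit-learn ships this two-stage idea as `StackingClassifier`; the sketch below is a hedged illustration with arbitrarily chosen base learners and a logistic-regression meta-learner, not a recommendation from the original article.

```python
# Sketch: stacking -- base learners' out-of-fold predictions become the
# training features of a second-stage (meta) learner. Choices of models
# and parameters here are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100)),
        ("svc", SVC(probability=True)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # second-stage learner
    cv=5,  # out-of-fold predictions avoid leaking training labels
)
stack.fit(X_train, y_train)
print("stacking accuracy:", stack.score(X_test, y_test))
```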


Origin blog.csdn.net/weixin_43841688/article/details/120031020