[sklearn classification & regression algorithms] Random Forest Introduction



Introduction to Random Forest

I. Overview

  • Random forest is an ensemble algorithm.

  • Ensemble learning is a very popular machine learning paradigm. It is not a standalone algorithm; instead, it builds multiple models on the data and integrates their results into a single prediction. Ensemble learning appears in virtually every area of machine learning and performs well in practice: it can be used to model marketing campaigns, analyze customer acquisition, retention, and churn, and to predict disease risk and patient susceptibility. In today's algorithm competitions, ensemble methods such as random forests, gradient boosting trees (GBDT), and XGBoost are seen everywhere, a testament to their strong performance and broad applicability.



II. Ensemble Algorithms

Goal: An ensemble algorithm considers the modeling results of multiple estimators and aggregates them into a single result, in order to achieve better regression or classification performance than any single model.
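As a quick illustration of that goal, the sketch below (my own example, not from the original post) compares a single decision tree against a random forest on scikit-learn's built-in breast-cancer dataset using 5-fold cross-validation; the aggregated model typically scores higher:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# One base estimator vs. an ensemble of 100 of them
tree_acc = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest_acc = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5).mean()

print(f"single decision tree: {tree_acc:.3f}")
print(f"random forest:        {forest_acc:.3f}")
```

The exact numbers depend on the dataset and random seed, but the ensemble reliably outperforms the single tree here.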

The model formed by combining multiple models is called an ensemble estimator (集成评估器), and each model that makes up the ensemble is called a base estimator (基评估器). Generally speaking, there are three types of ensemble algorithms: bagging, boosting, and stacking.


The core idea of the bagging method is to build multiple independent estimators and then use averaging or majority voting over their predictions to determine the result of the ensemble estimator. The representative model of the bagging method is the random forest.
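A minimal bagging sketch (my own example on synthetic data): BaggingClassifier fits its base estimator, a decision tree by default, on independent bootstrap samples and combines the predictions by majority vote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 10 independent trees, each trained on its own bootstrap sample;
# the final class is decided by majority vote among them.
bag = BaggingClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)
bag_acc = accuracy_score(y_test, bag.predict(X_test))
print(f"bagging accuracy: {bag_acc:.3f}")
```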

In the boosting method, the base estimators are correlated and built one by one, in order. The core idea is to combine the power of weak estimators by repeatedly predicting the samples that are hard to classify, gradually forming a strong estimator. The representative models of boosting are AdaBoost and gradient boosting trees.
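A boosting sketch along the same lines (again my own illustration on synthetic data): AdaBoostClassifier builds shallow trees one after another, with each round re-weighting the samples the previous trees got wrong.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 weak learners (decision stumps by default), fit sequentially;
# each round up-weights the samples misclassified so far.
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
ada_acc = accuracy_score(y_test, ada.predict(X_test))
print(f"AdaBoost accuracy: {ada_acc:.3f}")
```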



III. The Ensemble Module in sklearn

The ensemble algorithms in sklearn live in the ensemble module:

ensemble.AdaBoostClassifier AdaBoost classification
ensemble.AdaBoostRegressor AdaBoost regression
ensemble.BaggingClassifier Bagging classification
ensemble.BaggingRegressor Bagging regression
ensemble.ExtraTreesClassifier Extra-trees classification (extremely randomized trees)
ensemble.ExtraTreesRegressor Extra-trees regression
ensemble.GradientBoostingClassifier Gradient boosting classification
ensemble.GradientBoostingRegressor Gradient boosting regression
ensemble.IsolationForest Isolation forest
ensemble.RandomForestClassifier Random forest classification
ensemble.RandomForestRegressor Random forest regression
ensemble.RandomTreesEmbedding Ensemble of completely random trees
ensemble.VotingClassifier Soft voting / majority rule classifier for unfitted estimators
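To show one of these classes in action, here is a sketch of my own (not from the original post) using ensemble.VotingClassifier, which combines heterogeneous estimators by majority ("hard") vote:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Three different base estimators; the ensemble predicts the class
# that receives the majority of their votes.
vote = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",
)
vote_acc = cross_val_score(vote, X, y, cv=5).mean()
print(f"voting ensemble accuracy: {vote_acc:.3f}")
```

Switching voting to "soft" would average the predicted class probabilities instead, which requires every base estimator to implement predict_proba.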




Origin: blog.csdn.net/qq_45797116/article/details/113763093