Introduction to Random Forest
I. Overview
Random forest is an ensemble algorithm.
Ensemble learning is a very popular machine learning technique. It is not a standalone algorithm; instead, it builds multiple models on the data and integrates their modeling results. Ensemble learning appears in almost every area of machine learning, and it works remarkably well in practice: it can be used to model marketing campaigns, track customer acquisition, retention, and churn, and to predict disease risk and patient susceptibility. In today's algorithm competitions, ensemble algorithms such as random forests, gradient boosting trees (GBDT), and XGBoost are everywhere, a testament to their strong performance and wide applicability.
II. Ensemble Algorithms
Goal: an ensemble algorithm considers the modeling results of multiple estimators and aggregates them into one comprehensive result, in order to obtain better regression or classification performance than any single model.
The model formed by combining multiple models is called an ensemble estimator, and each model that makes up the ensemble is called a base estimator. Generally speaking, there are three types of ensemble algorithms: bagging, boosting, and stacking.
The core idea of bagging is to build multiple mutually independent estimators and then average their predictions (or take a majority vote) to determine the ensemble's result. The representative bagging model is the random forest.
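The bagging idea above can be seen directly in sklearn: a random forest is many independent decision trees whose votes are averaged. A minimal sketch on the built-in wine dataset (the dataset and hyperparameters here are illustrative choices, not from the original article):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

# 100 independently trained trees; the forest aggregates their votes
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy of the ensemble
score = cross_val_score(forest, X, y, cv=5).mean()
print(round(score, 3))
```

Because each tree sees a different bootstrap sample of the data, the trees disagree in different ways, and averaging their votes cancels out much of the individual error.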
In boosting, the base estimators are related and are built sequentially, one after another. The core idea is to combine the power of weak estimators by predicting the hard-to-fit samples again and again, gradually forming a strong estimator. Representative boosting models are AdaBoost and the gradient boosting tree.
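As a contrast with bagging, a boosting sketch: each new tree in a gradient boosting model is fit to the errors of the trees before it, so the estimators are built in order rather than independently. The dataset, split, and hyperparameters below are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Trees are added one at a time; each corrects the residual errors
# of the current ensemble, scaled by the learning rate
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=0)
gbt.fit(X_train, y_train)
print(round(gbt.score(X_test, y_test), 3))
```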
III. The Ensemble Module in sklearn
In sklearn, the ensemble algorithms live in the `ensemble` module:
| Class | Description |
| --- | --- |
| ensemble.AdaBoostClassifier | AdaBoost classification |
| ensemble.AdaBoostRegressor | AdaBoost regression |
| ensemble.BaggingClassifier | Bagging classifier |
| ensemble.BaggingRegressor | Bagging regressor |
| ensemble.ExtraTreesClassifier | Extra-trees classification (extremely randomized trees) |
| ensemble.ExtraTreesRegressor | Extra-trees regression |
| ensemble.GradientBoostingClassifier | Gradient boosting classification |
| ensemble.GradientBoostingRegressor | Gradient boosting regression |
| ensemble.IsolationForest | Isolation Forest |
| ensemble.RandomForestClassifier | Random forest classification |
| ensemble.RandomForestRegressor | Random forest regression |
| ensemble.RandomTreesEmbedding | Ensemble of completely random trees |
| ensemble.VotingClassifier | Soft voting / majority rule classifier for unfitted estimators |
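Since all of these classes share the standard sklearn estimator interface, they can be swapped in and out freely. A quick sketch comparing a few of the classifiers from the table on one dataset (the dataset and defaults are illustrative choices):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

# Each estimator exposes the same fit/predict API, so they can be
# evaluated in a uniform loop
for name, est in [
    ("Bagging", BaggingClassifier(random_state=0)),
    ("RandomForest", RandomForestClassifier(random_state=0)),
    ("AdaBoost", AdaBoostClassifier(random_state=0)),
]:
    score = cross_val_score(est, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```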