[SkLearn classification, regression algorithm] Random Forest Regressor RandomForestRegressor



Random Forest Regressor RandomForestRegressor

class sklearn.ensemble.RandomForestRegressor(n_estimators=100, *, criterion='mse', 
max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, 
max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, 
min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None,
verbose=0, warm_start=False, ccp_alpha=0.0, max_samples=None)

Almost all parameters, attributes, and interfaces are the same as those of the random forest classifier. The only differences come from using regression trees instead of classification trees: the impurity measure is different, so the criterion parameter takes different values.


① Important parameters, attributes and interfaces

criterion

  • criterion sets the measure a regression tree uses to evaluate the quality of a split. Three criteria are supported:

    • 1) Enter "mse" to use mean squared error (MSE); this value is named "squared_error" in scikit-learn 1.0 and later. The reduction in MSE from the parent node to its child nodes is the criterion for choosing a split, and this approach minimizes the L2 loss by predicting the mean of each leaf node.
      $$MSE = \frac{1}{N}\sum_{i=1}^{N}(f_i - y_i)^2$$
      where N is the number of samples, i indexes each data sample, f_i is the value predicted by the model, and y_i is the actual label of sample i. MSE therefore measures the squared gap between the true values and the regression results.
      In a regression tree, MSE is both the measure of split quality and the most commonly used metric for evaluating regression quality. When we evaluate a regression tree with cross-validation or other methods, we usually choose mean squared error as the metric (for a classification tree, the metric is the prediction accuracy returned by score). In regression, the smaller the MSE, the better. However, the score interface of a regression tree returns R², not MSE. R² is defined as follows:
      $$R^2 = 1 - \frac{u}{v}, \qquad u = \sum_{i=1}^{N}(f_i - y_i)^2, \qquad v = \sum_{i=1}^{N}(y_i - \bar{y})^2$$
      where u is the residual sum of squares (MSE * N), v is the total sum of squares, N, i, f_i, and y_i are as defined above, and \bar{y} is the mean of the true labels. R² can be positive or negative: when the residual sum of squares exceeds the total sum of squares, the model is worse than always predicting the mean, and R² is negative. MSE, by contrast, is never negative.
      ★ Note that although MSE is never negative, when it is used as an evaluation metric (scoring) in sklearn, the value computed is the negative mean squared error (neg_mean_squared_error). This is because sklearn treats MSE as a loss of the model, and its scoring values follow the convention that higher is better, so a loss is reported as a negative number. The true MSE is simply neg_mean_squared_error with the negative sign removed.

    • 2) Enter "friedman_mse" to use Friedman's mean squared error, which is mean squared error with Friedman's improvement score applied to potential splits.

    • 3) Enter "mae" to use mean absolute error (MAE); this value is named "absolute_error" in scikit-learn 1.0 and later. This criterion minimizes the L1 loss by predicting the median of each leaf node.

  • The most important attribute is still feature_importances_, and the core interfaces are still apply, fit, predict, and score.
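
The definitions of MSE and R² above can be checked by hand. The following is a minimal sketch, assuming a synthetic dataset from make_regression (not the data used in this post) and a small forest of 20 trees chosen only to keep it fast. It computes MSE and R² from the formulas, compares them with what sklearn returns, and touches the attribute and interfaces listed above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic regression data: an assumption for illustration only.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# fit and predict work exactly as they do for the classifier.
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)
f = model.predict(X)  # f_i in the formulas above

# MSE = (1/N) * sum((f_i - y_i)^2)
mse_manual = np.mean((f - y) ** 2)

# R^2 = 1 - u / v
u = np.sum((f - y) ** 2)         # residual sum of squares (MSE * N)
v = np.sum((y - y.mean()) ** 2)  # total sum of squares
r2_manual = 1 - u / v

print("MSE by hand:", mse_manual, "sklearn:", mean_squared_error(y, f))
print("R^2 by hand:", r2_manual, "score():", model.score(X, y))

# feature_importances_ has one value per feature.
importances = model.feature_importances_

# apply returns, for each sample, the index of the leaf it lands in for every tree.
leaves = model.apply(X)
print(importances.shape, leaves.shape)
```

Here the scores are computed on the training data purely to verify the formulas; they say nothing about how well the model generalizes.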



② Basic usage: cross-validating a random forest regressor on the Boston house price data

from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires an older release
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor

# Load the dataset
boston = load_boston()
x = boston.data
y = boston.target

# Build the model
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
# 10-fold cross-validation, scored with negative mean squared error
cross_score = cross_val_score(regressor, x, y, cv=10, scoring="neg_mean_squared_error")
cross_score

array([-10.72900447,  -5.36049859,  -4.74614178, -20.84946337,
       -12.23497347, -17.99274635,  -6.8952756 , -93.78884428,
       -29.80411702, -15.25776814])
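
Every value in this output is negative because the scoring is neg_mean_squared_error. As a minimal sketch, the fold scores printed above can be turned back into ordinary MSE values by flipping the sign. The second half repeats the idea on a synthetic make_regression dataset, which is an assumption made so that the snippet runs without load_boston, and compares it with the default scoring, which is the estimator's own score method (R²).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Fold scores copied from the output above.
cross_score = np.array([-10.72900447, -5.36049859, -4.74614178, -20.84946337,
                        -12.23497347, -17.99274635, -6.8952756, -93.78884428,
                        -29.80411702, -15.25776814])

# Removing the negative sign gives the true MSE of each fold.
mse_per_fold = -cross_score
print("mean MSE over the 10 folds:", mse_per_fold.mean())

# The same pattern on synthetic data (illustrative assumption, not the Boston data).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
regressor = RandomForestRegressor(n_estimators=20, random_state=0)

neg_mse = cross_val_score(regressor, X, y, cv=5, scoring="neg_mean_squared_error")
mse = -neg_mse

# With no scoring argument, cross_val_score falls back to regressor.score, i.e. R^2.
r2 = cross_val_score(regressor, X, y, cv=5)
print("MSE per fold:", mse)
print("R^2 per fold:", r2)
```

Note how one fold in the post's output (about 93.8) is far worse than the rest, which is easier to spot once the scores are read as plain MSE values.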

Supplement: list all available model evaluation metrics (values accepted by scoring)

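import sklearn.metrics  # import the submodule explicitly so sklearn.metrics is available

# SCORERS exists in the older scikit-learn releases this post targets;
# scikit-learn 1.0 and later provide sklearn.metrics.get_scorer_names() instead.
sorted(sklearn.metrics.SCORERS.keys())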

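
Because the way to list scoring names depends on the installed scikit-learn version, the following is a hedged sketch that tries get_scorer_names first and falls back to SCORERS on older releases. Filtering on the "neg_" prefix is only an illustrative way to pick out the metrics that sklearn reports as negated losses.

```python
try:
    # Available in scikit-learn 1.0 and later.
    from sklearn.metrics import get_scorer_names
    names = sorted(get_scorer_names())
except ImportError:
    # Older releases expose the SCORERS dictionary instead.
    from sklearn.metrics import SCORERS
    names = sorted(SCORERS.keys())

# Metrics that are losses are exposed with a "neg_" prefix.
neg_names = [name for name in names if name.startswith("neg_")]

print(len(names), "scoring names available")
print(neg_names)
```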




Origin blog.csdn.net/qq_45797116/article/details/113772178