Goodness of Fit for Regression

1. Evaluation Metrics

Let i be the sample index, y_{i} the true value, \widehat{y}_{i} the predicted value, and \overline{y} the mean of the true values. Then:

1) Mean Squared Error (MSE): MSE=\frac{1}{n}\sum_{i=1}^{n}\left ( y_{i}-\widehat{y_{i}} \right )^{2}
2) Root Mean Squared Error (RMSE): RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left ( y_{i}-\widehat{y_{i}} \right )^{2}}=\sqrt{MSE}
3) Mean Absolute Error (MAE): MAE=\frac{1}{n}\sum_{i=1}^{n}\left | y_{i}-\widehat{y_{i}} \right |

4) Sum of Squares for Regression (SSR), also called the Explained Sum of Squares (ESS):

SSR=\sum_{i=1}^{n}\left (\widehat{y_{i}}-\overline{y} \right )^{2}

5) Sum of Squares for Error (SSE), also called the Residual Sum of Squares (RSS):

SSE=\sum_{i=1}^{n}\left ( y_{i}-\widehat{y_{i}} \right )^{2}

6) Sum of Squares for Total (SST), also called the Total Sum of Squares (TSS):

SST=SSR+SSE=\sum_{i=1}^{n}\left ( y_{i}-\overline{y} \right )^{2}

7) Coefficient of determination R² (R-Square): R^{2}=\frac{SSR}{SST}=1-\frac{SSE}{SST}=\frac{\sum_{i=1}^{n}(\widehat{y_{i}}-\overline{y})^{2}}{\sum_{i=1}^{n}(y_{i}-\overline{y})^{2}}=1-\frac{\sum_{i=1}^{n}(y_{i}-\widehat{y_{i}})^{2}}{\sum_{i=1}^{n}(y_{i}-\overline{y})^{2}}

Note: for linear regression, R² lies in [0, 1]; the closer R² is to 1, the better the fit.
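The following is a minimal numerical sketch of these formulas, using NumPy and scikit-learn on made-up toy data (the values of X and y are hypothetical, chosen only for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Toy data (hypothetical values, for illustration only)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

y_hat = LinearRegression().fit(X, y).predict(X)   # fitted values from an OLS line

mse  = np.mean((y - y_hat) ** 2)          # MSE
rmse = np.sqrt(mse)                       # RMSE = sqrt(MSE)
mae  = np.mean(np.abs(y - y_hat))         # MAE

sse = np.sum((y - y_hat) ** 2)            # SSE / RSS
ssr = np.sum((y_hat - y.mean()) ** 2)     # SSR / ESS
sst = np.sum((y - y.mean()) ** 2)         # SST / TSS

print(np.isclose(sst, ssr + sse))                     # SST = SSR + SSE (holds for OLS with an intercept)
print(np.isclose(1 - sse / sst, r2_score(y, y_hat)))  # R^2 = 1 - SSE/SST
print(np.isclose(mse, mean_squared_error(y, y_hat)))
print(np.isclose(mae, mean_absolute_error(y, y_hat)))

Note that the decomposition SST = SSR + SSE is guaranteed only when the predictions come from a least-squares fit with an intercept; for arbitrary predictions, R² is computed as 1 - SSE/SST.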

2. Code Implementation

2.1 estimator.score

Use the estimator's score function to evaluate model performance. By default:

  • For classifiers it is the accuracy: sklearn.metrics.accuracy_score
  • For regressors it is the R² score: sklearn.metrics.r2_score

Take XGBoost regression as an example:

from xgboost import XGBRegressor as XGBR   # assumes Xtrain/Ytrain/Xtest/Ytest are an existing train/test split
reg = XGBR(n_estimators=100).fit(Xtrain, Ytrain)
reg.score(Xtest, Ytest)   # returns the R^2 score on the test set

Pressing Shift+Tab in a Jupyter notebook brings up the official docstring, which confirms that the score function computes R².
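As a quick cross-check (a sketch reusing the reg, Xtest and Ytest objects from above), the value returned by score matches sklearn.metrics.r2_score applied to the predictions:

from sklearn.metrics import r2_score

pred = reg.predict(Xtest)          # predictions on the test set
print(reg.score(Xtest, Ytest))     # estimator's default score
print(r2_score(Ytest, pred))       # same value, computed explicitly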

 

2.2 The scoring parameter of the cross-validation tools

The model-selection tools all have a "scoring" parameter, which specifies the metric used by grid search GridSearchCV(), the learning curve learning_curve(), and cross-validation cross_val_score() to evaluate the performance of an "estimator". By default this parameter is None, in which case the estimator's own "score" function is called; we can also pass a different performance metric via "scoring".

  • Learning curve (learning_curve):

       the relationship between training-set size and model score

train_sizes, train_scores, valid_scores = sklearn.model_selection.learning_curve(estimator, 
X, y, *, groups=None, train_sizes=array([0.1, 0.33, 0.55, 0.78, 1. ]), cv=None, 
scoring=None, exploit_incremental_learning=False, n_jobs=None, pre_dispatch='all', 
verbose=0, shuffle=False, random_state=None, error_score=nan, return_times=False)
  • Validation curve (validation_curve):

      the relationship between a single model parameter and model score

sklearn.model_selection.validation_curve(estimator, X, y, param_name, 
param_range, cv=None, scoring=None, n_jobs=1, pre_dispatch='all', verbose=0)
  • Cross-validation scores (cross_val_score):

sklearn.model_selection.cross_val_score(estimator, X, y=None, *, groups=None, scoring=None,
 cv=None, n_jobs=None, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', error_score=nan)
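A minimal usage sketch, assuming the reg, Xtrain and Ytrain objects from section 2.1: the same scoring string can be passed to cross_val_score, learning_curve, validation_curve or GridSearchCV (the available strings are listed in the scoring documentation linked below):

from sklearn.model_selection import cross_val_score

# scoring=None (default): the regressor's own score, i.e. R^2, is used
r2_scores = cross_val_score(reg, Xtrain, Ytrain, cv=5)

# Explicit scoring string: negative MSE (sklearn negates error metrics so that larger is always better)
neg_mse_scores = cross_val_score(reg, Xtrain, Ytrain, cv=5, scoring="neg_mean_squared_error")

print(r2_scores.mean(), (-neg_mse_scores).mean())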

Reference documentation for the scoring parameter:

https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
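To see which scoring strings your installed version accepts, you can query scikit-learn directly (a sketch; the exact helper depends on the version: get_scorer_names() exists in scikit-learn >= 1.0, while older releases expose a SCORERS dict):

import sklearn.metrics

try:
    names = sklearn.metrics.get_scorer_names()       # scikit-learn >= 1.0
except AttributeError:
    names = sorted(sklearn.metrics.SCORERS.keys())   # older scikit-learn versions
print(names)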

Reposted from blog.csdn.net/weixin_43217427/article/details/110260497