Building a Model for Overdue Financial Loans, Part 3: Model Evaluation

Contents

I. Evaluation Metrics

1. Basic Concepts
2. Accuracy
3. Precision
4. Recall
5. F1 Score
6. ROC Curve and AUC

II. Model Evaluation

1. Logistic Regression
2. SVM
3. Decision Tree
4. Random Forest
5. GBDT
6. XGBoost
7. LightGBM
8. Plotting

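Goal: record a score table of accuracy, precision, recall, F1 score, and AUC for seven models (logistic regression, SVM, decision tree, random forest, GBDT, XGBoost, and LightGBM), and plot their ROC curves.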

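I. Evaluation Metrics

1. Basic Concepts

For a binary classification problem, comparing the prediction with the true label gives four possible outcomes.

Actual \ Predicted	Positive	Negative
Positive	TP (True Positive)	FN (False Negative)
Negative	FP (False Positive)	TN (True Negative)

My mnemonic: look at the first letter first. T means the prediction was correct, F means it was wrong. Then look at the second letter: P means the predicted class is positive, N means the predicted class is negative.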
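As a small illustration of the four outcomes, the sketch below counts them with scikit-learn's `confusion_matrix`. The labels are made-up toy data (an assumption for illustration, not the loan data set), and passing `labels=[1, 0]` is a choice made here so that the matrix rows and columns follow the same order as the table above.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels: 1 = positive (overdue), 0 = negative (not overdue)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# labels=[1, 0] puts the positive class first, so the layout is
# [[TP, FN],
#  [FP, TN]]  (rows = actual class, columns = predicted class)
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
(tp, fn), (fp, tn) = cm

print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)
```

Without the `labels` argument, scikit-learn sorts the classes in ascending order, which places TN in the top-left corner instead.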

2. Accuracy

Accuracy is the share of all samples that are predicted correctly.
$accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$
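Accuracy alone can look good on imbalanced data, which is common for overdue-loan labels. The sketch below uses made-up labels (an assumption, not the data set of this series) in which only one sample in ten is positive; a model that always predicts the negative class is still right nine times out of ten.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: nine negatives, one positive
y_true = [0] * 9 + [1]
# A trivial "model" that predicts negative for every sample
y_pred = [0] * 10

acc = accuracy_score(y_true, y_pred)  # (TP + TN) / all = (0 + 9) / 10
rec = recall_score(y_true, y_pred)    # TP / (TP + FN) = 0 / 1

print("accuracy:", acc)
print("recall:", rec)
```

This is one reason the remaining metrics in this part are tracked alongside accuracy.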

3. Precision

Precision: of all samples predicted as positive, the share that are actually positive.
$precision = \dfrac{TP}{TP+FP}$

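4. Recall

Recall: of all samples that are actually positive, the share that are predicted as positive.
$recall = \dfrac{TP}{TP+FN}$

For example, recall is very important in medical applications, where missing a true case (a false negative) is costly.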
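To make the two formulas concrete, the sketch below counts TP, FP, and FN by hand on made-up labels (hypothetical toy data) and compares the result with scikit-learn's `precision_score` and `recall_score`.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical toy labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# Count the confusion-matrix cells directly from the label pairs
pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)
fp = sum(t == 0 and p == 1 for t, p in pairs)
fn = sum(t == 1 and p == 0 for t, p in pairs)

precision = tp / (tp + fp)  # share of predicted positives that are right
recall = tp / (tp + fn)     # share of actual positives that are found

print("by hand:", precision, recall)
print("sklearn:", precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```

Here the model is fairly precise but finds only half of the positives, the same pattern that shows up when a classifier is conservative about predicting the positive class.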

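5. F1 Score

F1 score: the harmonic mean of precision and recall; the larger, the better.
$\dfrac{2}{F_1} = \dfrac{1}{precision} + \dfrac{1}{recall}$
$\Rightarrow F_1 = \dfrac{2PR}{P + R} = \dfrac{2TP}{2TP+FP+FN}$, where $P$ is the precision and $R$ is the recall.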
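The two forms of the F1 formula can be checked numerically. The sketch below uses made-up labels (hypothetical toy data) with TP = 2, FP = 1, FN = 2, and compares both forms with scikit-learn's `f1_score`. It also computes the arithmetic mean to show that the harmonic mean is pulled toward the smaller of the two values.

```python
from sklearn.metrics import f1_score

# Hypothetical toy labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
tp, fp, fn = 2, 1, 2  # cell counts for the labels above

precision = tp / (tp + fp)
recall = tp / (tp + fn)

f1_harmonic = 2 * precision * recall / (precision + recall)  # 2PR / (P + R)
f1_counts = 2 * tp / (2 * tp + fp + fn)                       # 2TP / (2TP + FP + FN)
arithmetic_mean = (precision + recall) / 2

print("F1 (harmonic form):", f1_harmonic)
print("F1 (count form):   ", f1_counts)
print("F1 (sklearn):      ", f1_score(y_true, y_pred))
print("arithmetic mean:   ", arithmetic_mean)
```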

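6. ROC Curve and AUC

ROC curve: the receiver operating characteristic curve summarizes sensitivity and specificity together as the decision threshold varies; each point on the curve corresponds to one threshold. The figure below is an example of an ROC curve.
[Figure: example ROC curve]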

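x-axis: 1 - Specificity, the false positive rate (FPR), FPR = FP/(FP+TN): the share of all negative samples that are predicted as positive;

y-axis: Sensitivity, the true positive rate (TPR), TPR = TP/(TP+FN): the share of all positive samples that are predicted as positive.

In the ideal case TPR is close to 1 and FPR is close to 0, which is the point (0,1) in the plot. The closer the ROC curve gets to (0,1), and the further it departs from the 45-degree diagonal, the better.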

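AUC. AUC (Area Under Curve) is defined as the area under the ROC curve. It takes values in [0, 1]; 0.5 corresponds to random guessing, so a useful classifier falls between 0.5 and 1. The larger the AUC, the better the classifier ranks positive samples above negative ones.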
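The sketch below computes an ROC curve and its AUC with scikit-learn on made-up scores (hypothetical toy data). It also shows that AUC should be computed from continuous scores: passing hard 0/1 predictions to `roc_auc_score` collapses the curve to a single threshold and generally gives a different value.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical toy data: true labels and predicted probabilities of class 1
y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.1, 0.3, 0.45, 0.4, 0.6, 0.9]

# One (FPR, TPR) point per threshold, from the strictest threshold downwards
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# AUC from the continuous scores: the share of (positive, negative) pairs
# in which the positive sample gets the higher score
auc_from_scores = roc_auc_score(y_true, y_score)

# AUC from hard labels thresholded at 0.5: only one operating point is used
y_pred = [int(s >= 0.5) for s in y_score]
auc_from_labels = roc_auc_score(y_true, y_pred)

print("FPR:", fpr)
print("TPR:", tpr)
print("AUC from scores:", auc_from_scores)
print("AUC from labels:", auc_from_labels)
```

In this toy case 8 of the 9 positive/negative pairs are ranked correctly, so the score-based AUC is 8/9, while the label-based value is 5/6.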

II. Model Evaluation

Goal: examine the accuracy, precision, recall, F1 score, and AUC of each model, and plot the ROC curves.
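Since the same five metrics are computed for every model, they can be collected by one helper. The sketch below is one possible way to do this, not the code used in this series: the `evaluate` function, the synthetic data from `make_classification`, and the two models chosen are all assumptions made for illustration. Hard predictions feed accuracy, precision, recall, and F1, while the predicted probability of class 1 feeds AUC.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate(model, x_test, y_test):
    """Return the five metrics for an already fitted binary classifier."""
    y_pred = model.predict(x_test)               # hard 0/1 labels
    y_score = model.predict_proba(x_test)[:, 1]  # P(class 1), used for AUC
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
    }


# Synthetic stand-in for the loan data (hypothetical, roughly 25% positives)
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.75, 0.25], random_state=2018)
x_tr, x_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=2018)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=2018),
}

# Fit each model and collect one row of scores per model
table = {name: evaluate(m.fit(x_tr, y_tr), x_te, y_te)
         for name, m in models.items()}

for name, row in table.items():
    print(name, {k: round(v, 4) for k, v in row.items()})
```

The same helper could be applied to the seven fitted models of this part by passing `x_test_stand` and `y_test` instead of the synthetic split.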

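1. Logistic Regression

## Logistic Regression
## x_train_stand, x_test_stand, y_train, y_test are assumed to be the
## standardized train/test splits prepared in the earlier parts of this series.
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, roc_curve)

lr = LogisticRegression()
lr.fit(x_train_stand, y_train)
y_pre_lr = lr.predict(x_test_stand)                 # hard 0/1 predictions
y_score_lr = lr.predict_proba(x_test_stand)[:, 1]   # predicted probability of class 1
lr_accuracy = accuracy_score(y_test, y_pre_lr)
print('The accuracy of LR', lr_accuracy)
lr_precision = precision_score(y_test, y_pre_lr)
print('The precision of LR', lr_precision)
lr_recall = recall_score(y_test, y_pre_lr)
print('The recall of LR', lr_recall)
lr_f1_score = f1_score(y_test, y_pre_lr)
print('The F1 score of LR', lr_f1_score)
lr_roc_auc_score = roc_auc_score(y_test, y_score_lr)  # AUC uses scores, not hard labels
print('The AUC of LR', lr_roc_auc_score)
## ROC curve
test_fprs, test_tprs, test_thresholds = roc_curve(y_test, y_score_lr)
plt.plot(test_fprs, test_tprs, label="Test AUC: " + str(round(lr_roc_auc_score, 5)))
plt.plot([0, 1], [0, 1], "--")   # diagonal = random guessing
plt.title("ROC curve")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend(loc="lower right")
plt.show()

Output

The accuracy of LR 0.7876664330763841
The precision of LR 0.6609195402298851
The recall of LR 0.3203342618384401
The F1 score of LR 0.3203342618384401
The AUC of LR 0.6325454080727781

These figures are from the original run, in which the F1 line was computed with recall_score and the AUC from the hard predictions y_pre_lr. The F1 value therefore repeats the recall, and the AUC is that of the thresholded labels rather than of the probability scores.

[Figure: ROC curve of the logistic regression model on the test set]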

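2. SVM

## SVM
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, roc_curve)

## probability=True is required for predict_proba
svm = SVC(random_state=2018, probability=True)
svm.fit(x_train_stand, y_train)
y_pre_svm = svm.predict(x_test_stand)                 # hard 0/1 predictions
y_score_svm = svm.predict_proba(x_test_stand)[:, 1]   # predicted probability of class 1
svm_accuracy = accuracy_score(y_test, y_pre_svm)
print('The accuracy of SVM', svm_accuracy)
svm_precision = precision_score(y_test, y_pre_svm)
print('The precision of SVM', svm_precision)
svm_recall = recall_score(y_test, y_pre_svm)
print('The recall of SVM', svm_recall)
svm_f1_score = f1_score(y_test, y_pre_svm)
print('The F1 score of SVM', svm_f1_score)
svm_roc_auc_score = roc_auc_score(y_test, y_score_svm)  # AUC uses scores, not hard labels
print('The AUC of SVM', svm_roc_auc_score)
## ROC curve
test_fprs, test_tprs, test_thresholds = roc_curve(y_test, y_score_svm)
plt.plot(test_fprs, test_tprs, label="Test AUC: " + str(round(svm_roc_auc_score, 5)))
plt.plot([0, 1], [0, 1], "--")   # diagonal = random guessing
plt.title("ROC curve")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend(loc="lower right")
plt.show()

Output

The accuracy of SVM 0.7806587245970568
The precision of SVM 0.7017543859649122
The recall of SVM 0.22284122562674094
The F1 score of SVM 0.22284122562674094
The AUC of SVM 0.5955030098171158

These figures are from the original run, in which the F1 line was computed with recall_score and the AUC from the hard predictions y_pre_svm. The F1 value therefore repeats the recall, and the AUC is that of the thresholded labels rather than of the probability scores.

[Figure: ROC curve of the SVM model on the test set]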

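3. Decision Tree

## DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

dt = DecisionTreeClassifier(random_state=2018)
dt.fit(x_train_stand, y_train)
y_pre_dt = dt.predict(x_test_stand)                 # predict with the tree itself
y_score_dt = dt.predict_proba(x_test_stand)[:, 1]   # predicted probability of class 1
dt_accuracy = accuracy_score(y_test, y_pre_dt)
print('The accuracy of DecisionTree', dt_accuracy)
dt_precision = precision_score(y_test, y_pre_dt)
print('The precision of DecisionTree', dt_precision)
dt_recall = recall_score(y_test, y_pre_dt)
print('The recall of DecisionTree', dt_recall)
dt_f1_score = f1_score(y_test, y_pre_dt)
print('The F1 score of DecisionTree', dt_f1_score)
dt_roc_auc_score = roc_auc_score(y_test, y_score_dt)  # AUC uses scores, not hard labels
print('The AUC of DecisionTree', dt_roc_auc_score)

Output

The accuracy of DecisionTree 0.7806587245970568
The precision of DecisionTree 0.7017543859649122
The recall of DecisionTree 0.22284122562674094
The F1 score of DecisionTree 0.22284122562674094
The AUC of DecisionTree 0.5955030098171158

These figures are from the original run, in which the predictions were produced by svm.predict instead of dt.predict, so every value is identical to the SVM output and does not describe the decision tree. In the same run the F1 line was computed with recall_score and the AUC from hard labels.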

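4. Random Forest

## Random forest
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

rfc = RandomForestClassifier()
rfc.fit(x_train_stand, y_train)
y_pre_rf = rfc.predict(x_test_stand)                 # hard 0/1 predictions
y_score_rf = rfc.predict_proba(x_test_stand)[:, 1]   # predicted probability of class 1
rf_accuracy = accuracy_score(y_test, y_pre_rf)
print('The accuracy of Random Forest', rf_accuracy)
rf_precision = precision_score(y_test, y_pre_rf)
print('The precision of Random Forest', rf_precision)
rf_recall = recall_score(y_test, y_pre_rf)
print('The recall of Random Forest', rf_recall)
rf_f1_score = f1_score(y_test, y_pre_rf)
print('The F1 score of Random Forest', rf_f1_score)
rf_roc_auc_score = roc_auc_score(y_test, y_score_rf)  # AUC uses scores, not hard labels
print('The AUC of Random Forest', rf_roc_auc_score)

Output

The accuracy of Random Forest 0.7638402242466713
The precision of Random Forest 0.5846153846153846
The recall of Random Forest 0.2116991643454039
The F1 score of Random Forest 0.2116991643454039
The AUC of Random Forest 0.5805686832962974

These figures are from the original run, in which the F1 line was computed with recall_score and the AUC from the hard predictions y_pre_rf. The F1 value therefore repeats the recall, and the AUC is that of the thresholded labels rather than of the probability scores.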

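5. GBDT

## GBDT
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

gbdt = GradientBoostingClassifier()
gbdt.fit(x_train_stand, y_train)
y_pre_gbdt = gbdt.predict(x_test_stand)                 # hard 0/1 predictions
y_score_gbdt = gbdt.predict_proba(x_test_stand)[:, 1]   # predicted probability of class 1
gbdt_accuracy = accuracy_score(y_test, y_pre_gbdt)
print('The accuracy of GBDT', gbdt_accuracy)
gbdt_precision = precision_score(y_test, y_pre_gbdt)
print('The precision of GBDT', gbdt_precision)
gbdt_recall = recall_score(y_test, y_pre_gbdt)
print('The recall of GBDT', gbdt_recall)
gbdt_f1_score = f1_score(y_test, y_pre_gbdt)
print('The F1 score of GBDT', gbdt_f1_score)
gbdt_roc_auc_score = roc_auc_score(y_test, y_score_gbdt)  # AUC uses scores, not hard labels
print('The AUC of GBDT', gbdt_roc_auc_score)

Output

The accuracy of GBDT 0.7792571829011913
The precision of GBDT 0.6057692307692307
The recall of GBDT 0.35097493036211697
The F1 score of GBDT 0.35097493036211697
The AUC of GBDT 0.6370979520724442

These figures are from the original run, in which the F1 line was computed with recall_score and the AUC from the hard predictions y_pre_gbdt. The F1 value therefore repeats the recall, and the AUC is that of the thresholded labels rather than of the probability scores.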

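6. XGBoost

## XGBoost
## Import the class directly so that the variable xgb does not shadow the xgboost module
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

xgb = XGBClassifier()
xgb.fit(x_train_stand, y_train)
y_pre_xgb = xgb.predict(x_test_stand)                 # hard 0/1 predictions
y_score_xgb = xgb.predict_proba(x_test_stand)[:, 1]   # predicted probability of class 1
xgb_accuracy = accuracy_score(y_test, y_pre_xgb)
print('The accuracy of XGBoost', xgb_accuracy)
xgb_precision = precision_score(y_test, y_pre_xgb)
print('The precision of XGBoost', xgb_precision)
xgb_recall = recall_score(y_test, y_pre_xgb)
print('The recall of XGBoost', xgb_recall)
xgb_f1_score = f1_score(y_test, y_pre_xgb)
print('The F1 score of XGBoost', xgb_f1_score)
xgb_roc_auc_score = roc_auc_score(y_test, y_score_xgb)  # AUC uses scores, not hard labels
print('The AUC of XGBoost', xgb_roc_auc_score)

Output

The accuracy of XGBoost 0.7841625788367204
The precision of XGBoost 0.624390243902439
The recall of XGBoost 0.3565459610027855
The F1 score of XGBoost 0.3565459610027855
The AUC of XGBoost 0.642224291362816

These figures are from the original run, in which the F1 line was computed with recall_score and the AUC from the hard predictions y_pre_xgb. The F1 value therefore repeats the recall, and the AUC is that of the thresholded labels rather than of the probability scores.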

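7. LightGBM

## LightGBM
import lightgbm as lgb
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

gbm = lgb.LGBMClassifier()
gbm.fit(x_train_stand, y_train)
y_pre_gbm = gbm.predict(x_test_stand)                 # hard 0/1 predictions
y_score_gbm = gbm.predict_proba(x_test_stand)[:, 1]   # predicted probability of class 1
gbm_accuracy = accuracy_score(y_test, y_pre_gbm)
print('The accuracy of lightGBM', gbm_accuracy)
gbm_precision = precision_score(y_test, y_pre_gbm)
print('The precision of lightGBM', gbm_precision)
gbm_recall = recall_score(y_test, y_pre_gbm)
print('The recall of lightGBM', gbm_recall)
gbm_f1_score = f1_score(y_test, y_pre_gbm)
print('The F1 score of lightGBM', gbm_f1_score)
gbm_roc_auc_score = roc_auc_score(y_test, y_score_gbm)  # AUC uses scores, not hard labels
print('The AUC of lightGBM', gbm_roc_auc_score)

Output

The accuracy of lightGBM 0.7701471618780659
The precision of lightGBM 0.5688888888888889
The recall of lightGBM 0.3565459610027855
The F1 score of lightGBM 0.3565459610027855
The AUC of lightGBM 0.6328609954826662

These figures are from the original run, in which the F1 line was computed with recall_score and the AUC from the hard predictions y_pre_gbm. The F1 value therefore repeats the recall, and the AUC is that of the thresholded labels rather than of the probability scores.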

8. Plotting

import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

## Fitted models from the sections above, with the line color used for each
models = [
    ("LR", lr, "black"),
    ("SVM", svm, "red"),
    ("RF", rfc, "green"),
    ("DT", dt, "blue"),
    ("GBDT", gbdt, "yellow"),
    ("XGBoost", xgb, "brown"),
    ("GBM", gbm, "purple"),
]
## ROC curves
plt.figure(figsize=[6, 6])
for name, model, color in models:
    y_score = model.predict_proba(x_test_stand)[:, 1]   # predicted probability of class 1
    fpr, tpr, thresholds = roc_curve(y_test, y_score, pos_label=1)
    auc_value = roc_auc_score(y_test, y_score)           # AUC from the same scores as the curve
    plt.plot(fpr, tpr, color=color,
             label=name + " Test - AUC: " + str(round(auc_value, 5)))
plt.title("ROC curve")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend(loc="lower right")
plt.show()

Output
[Figure: ROC curves of the seven models on the test set]