Task 6

Model Fusion (Stacking)
1. Evaluation results of the seven models with default parameters (see the earlier Task 4 post for the code):
| Model | AUC | Accuracy | F1-score | Precision | Recall |
| --- | --- | --- | --- | --- | --- |
| Random Forest | Train: 90.82% / Test: 79.88% | Train: 84.33% / Test: 79.60% | Train: 58.07% / Test: 46.69% | Train: 43.59% / Test: 34.68% | Train: 86.96% / Test: 71.43% |
| GBDT | Train: 87.87% / Test: 79.15% | Train: 84.26% / Test: 78.78% | Train: 57.17% / Test: 44.01% | Train: 42.18% / Test: 32.37% | Train: 88.68% / Test: 68.71% |
| XGBoost | Train: 90.41% / Test: 79.28% | Train: 85.06% / Test: 79.23% | Train: 63.03% / Test: 49.18% | Train: 51.15% / Test: 39.02% | Train: 82.10% / Test: 66.50% |
| LightGBM | Train: 86.70% / Test: 79.53% | Train: 82.41% / Test: 78.93% | Train: 49.77% / Test: 41.41% | Train: 35.00% / Test: 28.90% | Train: 86.12% / Test: 72.99% |
| Logistic Regression | Train: 76.33% / Test: 78.34% | Train: 78.77% / Test: 78.70% | Train: 37.68% / Test: 38.89% | Train: 25.77% / Test: 26.30% | Train: 70.03% / Test: 74.59% |
| SVM | Train: 80.23% / Test: 74.26% | Train: 80.82% / Test: 77.96% | Train: 43.14% / Test: 34.80% | Train: 29.23% / Test: 22.83% | Train: 82.31% / Test: 73.15% |
| Decision Tree | Train: 76.63% / Test: 74.19% | Train: 79.29% / Test: 77.14% | Train: 46.41% / Test: 43.46% | Train: 36.03% / Test: 34.10% | Train: 65.20% / Test: 59.90% |
The four ensemble models (Random Forest, GBDT, XGBoost, LightGBM) and Logistic Regression clearly perform best. Random Forest, GBDT, Logistic Regression, and LightGBM are therefore used as the first-layer base models, with XGBoost as the second-layer (meta) model. Default parameters are used for now.
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# The seven candidate models, all with a fixed random_state for reproducibility
rnd_clf = RandomForestClassifier(random_state=2018)
gbdt = GradientBoostingClassifier(random_state=2018)
xgb = XGBClassifier(random_state=2018)
lgbm = LGBMClassifier(random_state=2018)
log = LogisticRegression(random_state=2018, max_iter=1000)
svc = SVC(random_state=2018, probability=True)
tree = DecisionTreeClassifier(random_state=2018)

# First layer: out-of-fold meta-features from the four chosen base models (k-fold, k=10)
base_models = [rnd_clf, gbdt, lgbm, log]
next_train, next_test = get_stacking_data(base_models, X_train, y_train, X_test, y_test, k=10)
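The get_stacking_data helper itself is defined in the earlier post. For readers landing here first, a minimal sketch of such an out-of-fold stacking helper might look like the following; the fold scheme, the use of predict_proba, and the unused y_test argument are assumptions, not the author's exact code:

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold

def get_stacking_data(models, X_train, y_train, X_test, y_test, k=10):
    """Meta-features: out-of-fold predictions for the training set and
    fold-averaged predictions for the test set, one column per base model.
    (y_test is kept only to match the call signature; it is not used.)"""
    X_train, y_train, X_test = np.asarray(X_train), np.asarray(y_train), np.asarray(X_test)
    next_train = np.zeros((X_train.shape[0], len(models)))
    next_test = np.zeros((X_test.shape[0], len(models)))
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=2018)
    for j, model in enumerate(models):
        test_fold_preds = np.zeros((X_test.shape[0], k))
        for i, (tr_idx, val_idx) in enumerate(skf.split(X_train, y_train)):
            clf = clone(model)
            clf.fit(X_train[tr_idx], y_train[tr_idx])
            # Out-of-fold probability of the positive class for the held-out training rows
            next_train[val_idx, j] = clf.predict_proba(X_train[val_idx])[:, 1]
            test_fold_preds[:, i] = clf.predict_proba(X_test)[:, 1]
        # Average the k fold models' predictions for the test set
        next_test[:, j] = test_fold_preds.mean(axis=1)
    return next_train, next_test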
2. Training and evaluating the stacking model
stacking_model = XGBClassifier(random_state=2018)
# Fit the second-layer model on the out-of-fold meta-features of the training set
stacking_model.fit(next_train, y_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
       n_jobs=1, nthread=None, objective='binary:logistic',
       random_state=2018, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
       seed=None, silent=True, subsample=1)
| Model | AUC | Accuracy | F1-score | Precision | Recall |
| --- | --- | --- | --- | --- | --- |
| Stacking model | Train: 64.89% / Test: 79.04% | Train: 78.58% / Test: 83.54% | Train: 39.06% / Test: 59.15% | Train: 27.56% / Test: 46.24% | Train: 66.98% / Test: 82.05% |
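The metrics in both tables can be computed with a small evaluation helper along these lines; the report function name and the exact output format are assumptions, not the author's original evaluation code:

from sklearn.metrics import (roc_auc_score, accuracy_score, f1_score,
                             precision_score, recall_score)

def report(model, X_tr, y_tr, X_te, y_te):
    """Print AUC / Accuracy / F1 / Precision / Recall on the train and test sets."""
    for name, X, y in [("train", X_tr, y_tr), ("test", X_te, y_te)]:
        proba = model.predict_proba(X)[:, 1]   # probabilities for AUC
        pred = model.predict(X)                # hard labels for the other metrics
        print(f"{name}: AUC={roc_auc_score(y, proba):.2%}, "
              f"Accuracy={accuracy_score(y, pred):.2%}, "
              f"F1={f1_score(y, pred):.2%}, "
              f"Precision={precision_score(y, pred):.2%}, "
              f"Recall={recall_score(y, pred):.2%}")

report(stacking_model, next_train, y_train, next_test, y_test)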
3. Summary
Compared with the individual models, the stacking model improves on every metric except AUC, and this is achieved with default parameters throughout. With hyperparameter tuning, the results should improve further.

Reposted from blog.csdn.net/weixin_41741008/article/details/88530082