Model Building for Loan Default Prediction, Part 2: Ensemble Models

Copyright notice: this is an original article by the blogger and may not be reposted without the blogger's permission. https://blog.csdn.net/u012736685/article/details/85088360

Task: model building

Build four models (random forest, GBDT, XGBoost, and LightGBM) and score each one. Any scoring metric may be used, for example accuracy or AUC.

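1. Installation resources

Tip: if pip is slow or times out during installation ==> switch to a mirror index. Note that --trusted-host takes a host name only, without the /simple path.

sudo pip install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com lightgbm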

2. Data loading + standardization

import pandas as pd
from sklearn.model_selection import train_test_split
import xgboost as xgb
import lightgbm as lgb
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingRegressor
import warnings
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings(action='ignore', category=DeprecationWarning)

## Load the data and split it into features and the 'status' label
data = pd.read_csv("data_all.csv")
x = data.drop(labels='status', axis=1)
y = data['status']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=2018)
print(len(x))  # 4754

## Standardize the features (fit on the training split only)
scaler = StandardScaler()
scaler.fit(x_train)
x_train_stand = scaler.transform(x_train)
x_test_stand = scaler.transform(x_test)
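
To illustrate what the standardization step does, here is a minimal sketch on synthetic data. The array shape, the seed, and all names ending in _demo are assumptions made for the example, not the loan dataset. The scaler learns the mean and standard deviation from the training split only and reuses them on the test split, so no test-set statistics leak into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix (hypothetical data, not data_all.csv)
rng = np.random.default_rng(2018)
x_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y_demo = rng.integers(0, 2, size=200)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=2018
)

scaler_demo = StandardScaler().fit(x_tr)  # statistics come from the training split only
x_tr_std = scaler_demo.transform(x_tr)
x_te_std = scaler_demo.transform(x_te)    # the test split reuses the training mean/std

print(x_tr_std.mean(axis=0).round(6))  # approximately 0 for every column
print(x_tr_std.std(axis=0).round(6))   # approximately 1 for every column
print(x_te_std.mean(axis=0).round(3))  # close to 0, but generally not exactly 0
```

The test split ends up only approximately centered, which is expected: it is scaled with the training statistics rather than its own.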

3. Random forest model

Idea: an algorithm that ensembles many trees through bagging; its base learner is the decision tree.

rfc = RandomForestClassifier()
rfc.fit(x_train, y_train)
rfc_score = rfc.score(x_test, y_test)
print("The score of RF:",rfc_score)

rfc1 = RandomForestClassifier()
rfc1.fit(x_train_stand, y_train)
rfc1_score = rfc1.score(x_test_stand, y_test)
print("The score of RF(with preprocessing):",rfc1_score)

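Output

The score of RF: 0.7638402242466713
The score of RF(with preprocessing): 0.7652417659425368

The two scores are nearly identical. Tree-based models are largely insensitive to feature scaling, and neither forest sets random_state, so the small gap is most likely run-to-run randomness rather than an effect of standardization.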
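
The bagging idea behind the random forest can be sketched by hand: train each tree on a bootstrap sample and combine the trees by majority vote. This is a simplified illustration on synthetic data, not the internals of RandomForestClassifier; the data, the number of trees, and all variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (hypothetical, for illustration only)
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a 0/1 majority vote cannot tie
trees = []
for i in range(n_trees):
    # Bootstrap: draw len(X_tr) row indices with replacement
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    # max_features="sqrt" adds the per-split feature subsampling used by random forests
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X_tr[idx], y_tr[idx]))

# Aggregate: labels are 0/1, so the mean over trees is the share of votes for class 1
votes = np.stack([t.predict(X_te) for t in trees])
y_pred = (votes.mean(axis=0) > 0.5).astype(int)
bagged_accuracy = (y_pred == y_te).mean()
print("Bagged trees accuracy:", bagged_accuracy)
```

One difference worth noting: scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities instead of counting hard votes.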

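4. GBDT model

GBDT stands for Gradient Boosting Decision Tree.
Idea: trees are added one at a time, and each new tree is fit to the negative gradient of the loss function with respect to the current model's predictions.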
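
The negative-gradient idea can be sketched for the simplest case, squared-error loss, where the negative gradient of 1/2 * (y - F)^2 with respect to the prediction F is just the residual y - F. The synthetic data, learning rate, tree depth, and number of rounds below are assumptions chosen for illustration; this is not how scikit-learn implements GBDT internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X_gb = rng.uniform(-3, 3, size=(300, 1))
y_gb = np.sin(X_gb[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
n_rounds = 50

# Start from a constant prediction (the mean minimizes squared error)
f0 = y_gb.mean()
pred = np.full_like(y_gb, f0)
stages = []
mse_history = [np.mean((y_gb - pred) ** 2)]

for _ in range(n_rounds):
    # Negative gradient of 1/2 * (y - F)^2 with respect to F is the residual y - F
    residual = y_gb - pred
    stage = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_gb, residual)
    # Take a small step in the direction the new tree suggests
    pred = pred + learning_rate * stage.predict(X_gb)
    stages.append(stage)
    mse_history.append(np.mean((y_gb - pred) ** 2))

print("Training MSE before boosting:", mse_history[0])
print("Training MSE after boosting: ", mse_history[-1])
```

For other losses (for example log loss in classification) only the residual line changes: it is replaced by the negative gradient of that loss.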

gbdt = GradientBoostingRegressor()
gbdt.fit(x_train, y_train)
gbdt_score = gbdt.score(x_test, y_test)
print("The score of GBDT:",gbdt_score)

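Output:

The score of GBDT: 0.18118075405980671

Note: GradientBoostingRegressor is a regressor, so score() returns the coefficient of determination (R²) rather than accuracy. This value is therefore not directly comparable with the random forest and XGBoost scores; GradientBoostingClassifier would report accuracy instead.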
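
To make the difference between R² and accuracy concrete, the sketch below fits a regressor to 0/1 labels on synthetic data and computes three metrics from the same predictions. The data, the 0.5 threshold, and all variable names are assumptions for illustration; none of the numbers relate to the loan dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import accuracy_score, r2_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic 0/1 labels (hypothetical data, not the loan dataset)
X_r, y_r = make_classification(n_samples=800, n_features=10, random_state=2018)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(X_r, y_r, test_size=0.3, random_state=2018)

reg = GradientBoostingRegressor(random_state=2018).fit(Xr_tr, yr_tr)
raw = reg.predict(Xr_te)  # continuous outputs, not class labels

r2 = reg.score(Xr_te, yr_te)           # for a regressor, score() is R^2
r2_check = r2_score(yr_te, raw)        # the same quantity computed explicitly
acc = accuracy_score(yr_te, (raw >= 0.5).astype(int))  # 0.5 threshold is an assumption
auc = roc_auc_score(yr_te, raw)        # AUC only needs a ranking score

print("R^2 from score():", r2)
print("R^2 from r2_score:", r2_check)
print("Accuracy at 0.5 threshold:", acc)
print("AUC:", auc)
```

The same predictions give three different numbers, which is why an R² value cannot be read as an accuracy.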

5. XGBoost model

# Use a separate variable name so the imported xgb module is not shadowed
xgb_clf = xgb.XGBClassifier()
xgb_clf.fit(x_train, y_train)
xgb_score = xgb_clf.score(x_test, y_test)
print("The score of XGBoost:", xgb_score)

Output

The score of XGBoost: 0.7855641205325858

Problems encountered

DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
  if diff:

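==> Searching online shows that this is a NumPy-related issue: truth-value checks on empty arrays have been deprecated. According to what I found, the issue has already been fixed upstream.
==> Solution: ignore the warning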

import warnings
warnings.filterwarnings(action ='ignore', category = DeprecationWarning)
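
A global filter hides every DeprecationWarning for the rest of the program. As an alternative, the standard library's warnings.catch_warnings can limit the suppression to a single block. The function noisy_call below is a hypothetical stand-in for the library call that emits the warning.

```python
import warnings


def noisy_call():
    # Hypothetical stand-in for a library call that emits a DeprecationWarning
    warnings.warn("truth value of an empty array is ambiguous", DeprecationWarning)
    return 42


# Suppress the warning only inside this block
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=DeprecationWarning)
    quiet_result = noisy_call()

# The previous filter state is restored on exit; here the warning is recorded instead
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loud_result = noisy_call()

print(quiet_result, loud_result, len(caught))
```

Wrapping only the fit and score calls this way keeps unrelated deprecation warnings visible elsewhere in the script.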

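6. LightGBM

Idea: LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages:
faster training, lower memory usage, higher accuracy, support for parallel learning, and the ability to handle large-scale data.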

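gbm = lgb.LGBMRegressor()
gbm.fit(x_train, y_train)
gbm_score = gbm.score(x_test, y_test)
# Print gbm_score here; the original printed gbdt_score by mistake
print("The score of LightGBM:", gbm_score)

Output (as originally published)

The score of LightGBM: 0.18118075405980671

Note: this value is identical to the GBDT score above because the original print statement passed gbdt_score instead of gbm_score, so the actual LightGBM score was not recorded. LGBMRegressor is also a regressor, so its score() returns R² rather than accuracy.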
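
Since the task asks for metrics such as accuracy and AUC, one way to score all models on the same footing is a small helper that works with any classifier exposing fit, predict, and predict_proba. The helper name evaluate, the synthetic data, and the model dictionary are assumptions for illustration. Only scikit-learn classifiers are used here so the sketch runs without extra packages; xgb.XGBClassifier() and lgb.LGBMClassifier() follow the same interface and could be added to the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit a classifier and return (accuracy, AUC) on the test split."""
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    # AUC uses the predicted probability of the positive class (column 1)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return acc, auc


# Synthetic binary data (hypothetical stand-in for data_all.csv)
X_c, y_c = make_classification(n_samples=800, n_features=10, random_state=2018)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(X_c, y_c, test_size=0.3, random_state=2018)

models = {
    "RF": RandomForestClassifier(random_state=2018),
    "GBDT": GradientBoostingClassifier(random_state=2018),
    # xgb.XGBClassifier() and lgb.LGBMClassifier() could be added here in the same way
}
results = {name: evaluate(m, Xc_tr, yc_tr, Xc_te, yc_te) for name, m in models.items()}
for name, (acc, auc) in results.items():
    print(f"{name}: accuracy={acc:.4f}, auc={auc:.4f}")
```

Fixing random_state in each model makes the comparison repeatable from run to run.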
