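ML / LoR: Developing a general-purpose credit risk scorecard with the LoR (logistic regression) algorithm on a credit card dataset, a full walkthrough using the toad framework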


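Contents

Developing a general-purpose credit risk scorecard with LoR (logistic regression) on a credit card dataset: a full walkthrough with the toad framework

# 1. Define the dataset

# 1.1 Load the German credit dataset

# 1.2 EDA of each variable

# 1.3 Output the mean, std, min, three quantiles and max of the continuous variables

# 2. Data preprocessing

# 2.1 Map the categorical target variable to a numeric variable

# 2.2 Analyze each feature's IV, Gini impurity, entropy, unique count, etc.

# 2.3 Screen features by IV, empty (missing) rate and correlation

# 2.4 Binning

# 2.5 Refine the bins with bad-rate plots

# 2.5.1 Example of custom bin adjustment

# 2.5.2 Plot each bin's sample share as bars and its bad rate as a line

# 2.5.3 Adjust the bins so that bad_rate shows a broadly monotonic trend

# 2.6 WOE-transform the binned data

# 2.7 Feature selection

# 3. Model building, training and evaluation

# 3.1 Split into training and test sets

# 3.2 Train the model

# 3.3 Evaluate the model: F1, KS, AUC

# 4. Evaluate the model for deployment and compute credit scores

# 4.1 Evaluate variable stability (PSI): training set vs. test set

# 4.2 Equal-frequency bucketing of training-set predictions to compare the groups

# 4.3 Scorecard score transformation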


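Developing a general-purpose credit risk scorecard with LoR (logistic regression) on a credit card dataset: a full walkthrough with the toad framework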

# 1. Define the dataset

# 1.1 Load the German credit dataset

Credit data in which debtors, each described by a set of attributes, are classified as good or bad credit risks. https://archive.ics.uci.edu/ml/datasets/Statlog+

status.of.existing.checking.account duration.in.month credit.history purpose credit.amount savings.account.and.bonds present.employment.since installment.rate.in.percentage.of.disposable.income personal.status.and.sex other.debtors.or.guarantors present.residence.since property age.in.years other.installment.plans housing number.of.existing.credits.at.this.bank job number.of.people.being.liable.to.provide.maintenance.for telephone foreign.worker creditability
0 ... < 0 DM 6 critical account/ other credits existing (not at this bank) radio/television 1169 unknown/ no savings account ... >= 7 years 4 male : divorced/separated none 4 real estate 67 none own 2 skilled employee / official 1 yes, registered under the customers name yes good
1 0 <= ... < 200 DM 48 existing credits paid back duly till now radio/television 5951 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 2 real estate 22 none own 1 skilled employee / official 1 none yes bad
2 no checking account 12 critical account/ other credits existing (not at this bank) education 2096 ... < 100 DM 4 <= ... < 7 years 2 male : divorced/separated none 3 real estate 49 none own 1 unskilled - resident 2 none yes good
3 ... < 0 DM 42 existing credits paid back duly till now furniture/equipment 7882 ... < 100 DM 4 <= ... < 7 years 2 male : divorced/separated guarantor 4 building society savings agreement/ life insurance 45 none for free 1 skilled employee / official 2 none yes good
4 ... < 0 DM 24 delay in paying off in the past car (new) 4870 ... < 100 DM 1 <= ... < 4 years 3 male : divorced/separated none 4 unknown / no property 53 none for free 2 skilled employee / official 2 none yes bad
5 no checking account 36 existing credits paid back duly till now education 9055 unknown/ no savings account 1 <= ... < 4 years 2 male : divorced/separated none 4 unknown / no property 35 none for free 1 unskilled - resident 2 yes, registered under the customers name yes good
6 no checking account 24 existing credits paid back duly till now furniture/equipment 2835 500 <= ... < 1000 DM ... >= 7 years 3 male : divorced/separated none 4 building society savings agreement/ life insurance 53 none own 1 skilled employee / official 1 none yes good
7 0 <= ... < 200 DM 36 existing credits paid back duly till now car (used) 6948 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 35 none rent 1 management/ self-employed/ highly qualified employee/ officer 1 yes, registered under the customers name yes good
8 no checking account 12 existing credits paid back duly till now radio/television 3059 ... >= 1000 DM 4 <= ... < 7 years 2 male : divorced/separated none 4 real estate 61 none own 1 unskilled - resident 1 none yes good
9 0 <= ... < 200 DM 30 critical account/ other credits existing (not at this bank) car (new) 5234 ... < 100 DM unemployed 4 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 28 none own 2 management/ self-employed/ highly qualified employee/ officer 1 none yes bad
10 0 <= ... < 200 DM 12 existing credits paid back duly till now car (new) 1295 ... < 100 DM ... < 1 year 3 male : divorced/separated none 1 car or other, not in attribute Savings account/bonds 25 none rent 1 skilled employee / official 1 none yes bad
11 ... < 0 DM 48 existing credits paid back duly till now business 4308 ... < 100 DM ... < 1 year 3 male : divorced/separated none 4 building society savings agreement/ life insurance 24 none rent 1 skilled employee / official 1 none yes bad
12 0 <= ... < 200 DM 12 existing credits paid back duly till now radio/television 1567 ... < 100 DM 1 <= ... < 4 years 1 male : divorced/separated none 1 car or other, not in attribute Savings account/bonds 22 none own 1 skilled employee / official 1 yes, registered under the customers name yes good
13 ... < 0 DM 24 critical account/ other credits existing (not at this bank) car (new) 1199 ... < 100 DM ... >= 7 years 4 male : divorced/separated none 4 car or other, not in attribute Savings account/bonds 60 none own 2 unskilled - resident 1 none yes bad
14 ... < 0 DM 15 existing credits paid back duly till now car (new) 1403 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 4 car or other, not in attribute Savings account/bonds 28 none rent 1 skilled employee / official 1 none yes good
15 ... < 0 DM 24 existing credits paid back duly till now radio/television 1282 100 <= ... < 500 DM 1 <= ... < 4 years 4 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 32 none own 1 unskilled - resident 1 none yes bad
16 no checking account 24 critical account/ other credits existing (not at this bank) radio/television 2424 unknown/ no savings account ... >= 7 years 4 male : divorced/separated none 4 building society savings agreement/ life insurance 53 none own 2 skilled employee / official 1 none yes good
17 ... < 0 DM 30 no credits taken/ all credits paid back duly business 8072 unknown/ no savings account ... < 1 year 2 male : divorced/separated none 3 car or other, not in attribute Savings account/bonds 25 bank own 3 skilled employee / official 1 none yes good
18 0 <= ... < 200 DM 24 existing credits paid back duly till now car (used) 12579 ... < 100 DM ... >= 7 years 4 male : divorced/separated none 2 unknown / no property 44 none for free 1 management/ self-employed/ highly qualified employee/ officer 1 yes, registered under the customers name yes bad
19 no checking account 24 existing credits paid back duly till now radio/television 3430 500 <= ... < 1000 DM ... >= 7 years 3 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 31 none own 1 skilled employee / official 2 yes, registered under the customers name yes good

# 1.2 EDA of each variable

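# Numeric variables: data type, missing rate, number of unique values, mean, standard deviation, quantiles, etc.
# Categorical variables: data type, missing rate, number of unique values, top1 (the most frequent category and its share), etc.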
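toad's `detect` function (in `toad.detector`) produces the summary below in one call. As a rough sketch of what it computes, the plain-Python function here summarizes a single column: size, missing rate and unique count for every column; mean, sample standard deviation, min/max and quantiles (linear interpolation, as in pandas) for numeric columns; and the most frequent categories with their shares for categorical ones. The function name and the way columns are passed in are illustrative assumptions, not toad's implementation.

```python
import math
import statistics
from collections import Counter


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (pandas' default rule)."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(column):
    """Summarize one column given as a list; None marks a missing value."""
    present = [v for v in column if v is not None]
    info = {
        "size": len(column),
        "missing": 1 - len(present) / len(column),
        "unique": len(set(present)),
    }
    if all(isinstance(v, (int, float)) for v in present):
        vals = sorted(present)
        info["mean"] = statistics.mean(vals)
        info["std"] = statistics.stdev(vals)  # sample std (ddof=1)
        info["min"], info["max"] = vals[0], vals[-1]
        for q in (0.01, 0.10, 0.50, 0.75, 0.90, 0.99):
            info[f"{q:.0%}"] = quantile(vals, q)  # keys like "1%", "50%"
    else:
        counts = Counter(present)
        # top categories with their share of the non-missing values
        info["top"] = [(k, c / len(present)) for k, c in counts.most_common(5)]
    return info
```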

type size missing unique mean_or_top1 std_or_top2 min_or_top3 1%_or_top4 10%_or_top5 50%_or_bottom5 75%_or_bottom4 90%_or_bottom3 99%_or_bottom2 max_or_bottom1
status.of.existing.checking.account category 1000 0.00% 4 no checking account:39.40% ... < 0 DM:27.40% 0 <= ... < 200 DM:26.90% ... >= 200 DM / salary assignments for at least 1 year:6.30% no checking account:39.40% ... < 0 DM:27.40% 0 <= ... < 200 DM:26.90% ... >= 200 DM / salary assignments for at least 1 year:6.30%
duration.in.month int64 1000 0.00% 33 20.903 12.05881445 4 6 9 18 24 36 60 72
credit.history category 1000 0.00% 5 existing credits paid back duly till now:53.00% critical account/ other credits existing (not at this bank):29.30% delay in paying off in the past:8.80% all credits at this bank paid back duly:4.90% no credits taken/ all credits paid back duly:4.00% existing credits paid back duly till now:53.00% critical account/ other credits existing (not at this bank):29.30% delay in paying off in the past:8.80% all credits at this bank paid back duly:4.90% no credits taken/ all credits paid back duly:4.00%
purpose object 1000 0.00% 10 radio/television:28.00% car (new):23.40% furniture/equipment:18.10% car (used):10.30% business:9.70% education:5.00% repairs:2.20% domestic appliances:1.20% others:1.20% retraining:0.90%
credit.amount int64 1000 0.00% 921 3271.258 2822.736876 250 425.83 932 2319.5 3972.25 7179.4 14180.39 18424
savings.account.and.bonds category 1000 0.00% 5 ... < 100 DM:60.30% unknown/ no savings account:18.30% 100 <= ... < 500 DM:10.30% 500 <= ... < 1000 DM:6.30% ... >= 1000 DM:4.80% ... < 100 DM:60.30% unknown/ no savings account:18.30% 100 <= ... < 500 DM:10.30% 500 <= ... < 1000 DM:6.30% ... >= 1000 DM:4.80%
present.employment.since category 1000 0.00% 5 1 <= ... < 4 years:33.90% ... >= 7 years:25.30% 4 <= ... < 7 years:17.40% ... < 1 year:17.20% unemployed:6.20% 1 <= ... < 4 years:33.90% ... >= 7 years:25.30% 4 <= ... < 7 years:17.40% ... < 1 year:17.20% unemployed:6.20%
installment.rate.in.percentage.of.disposable.income int64 1000 0.00% 4 2.973 1.118714674 1 1 1 3 4 4 4 4
personal.status.and.sex category 1000 0.00% 4 male : single:54.80% female : divorced/separated/married:31.00% male : married/widowed:9.20% male : divorced/separated:5.00% female : single:0.00% male : single:54.80% female : divorced/separated/married:31.00% male : married/widowed:9.20% male : divorced/separated:5.00% female : single:0.00%
other.debtors.or.guarantors category 1000 0.00% 3 none:90.70% guarantor:5.20% co-applicant:4.10% none:90.70% guarantor:5.20% co-applicant:4.10%
present.residence.since int64 1000 0.00% 4 2.845 1.103717896 1 1 1 3 4 4 4 4
property category 1000 0.00% 4 car or other, not in attribute Savings account/bonds:33.20% real estate:28.20% building society savings agreement/ life insurance:23.20% unknown / no property:15.40% car or other, not in attribute Savings account/bonds:33.20% real estate:28.20% building society savings agreement/ life insurance:23.20% unknown / no property:15.40%
age.in.years int64 1000 0.00% 53 35.546 11.37546857 19 20 23 33 42 52 67.01 75
other.installment.plans category 1000 0.00% 3 none:81.40% bank:13.90% stores:4.70% none:81.40% bank:13.90% stores:4.70%
housing category 1000 0.00% 3 own:71.30% rent:17.90% for free:10.80% own:71.30% rent:17.90% for free:10.80%
number.of.existing.credits.at.this.bank int64 1000 0.00% 4 1.407 0.577654468 1 1 1 1 2 2 3 4
job category 1000 0.00% 4 skilled employee / official:63.00% unskilled - resident:20.00% management/ self-employed/ highly qualified employee/ officer:14.80% unemployed/ unskilled - non-resident:2.20% skilled employee / official:63.00% unskilled - resident:20.00% management/ self-employed/ highly qualified employee/ officer:14.80% unemployed/ unskilled - non-resident:2.20%
number.of.people.being.liable.to.provide.maintenance.for int64 1000 0.00% 2 1.155 0.362085772 1 1 1 1 1 2 2 2
telephone category 1000 0.00% 2 none:59.60% yes, registered under the customers name:40.40% none:59.60% yes, registered under the customers name:40.40%
foreign.worker category 1000 0.00% 2 yes:96.30% no:3.70% yes:96.30% no:3.70%
creditability object 1000 0.00% 2 good:70.00% bad:30.00% good:70.00% bad:30.00%

# 1.3 Output the mean, std, min, three quantiles (25%, 50%, 75%) and max of the continuous variables

duration.in.month credit.amount installment.rate.in.percentage.of.disposable.income present.residence.since age.in.years number.of.existing.credits.at.this.bank number.of.people.being.liable.to.provide.maintenance.for
mean 20.903 3271.258 2.973 2.845 35.546 1.407 1.155
std 12.05881445 2822.736876 1.118714674 1.103717896 11.37546857 0.577654468 0.362085772
min 4 250 1 1 19 1 1
25% 12 1365.5 2 2 27 1 1
50% 18 2319.5 3 3 33 1 1
75% 24 3972.25 4 4 42 2 1
max 72 18424 4 4 75 4 2

# 2. Data preprocessing

# 2.1 Map the categorical target variable to a numeric variable
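The mapping code is not shown in the original. Judging from the WOE table in 2.6 (bad rows have creditability_map = 1), good is mapped to 0 and bad to 1; in pandas that is `df['creditability'].map({'good': 0, 'bad': 1})`. A minimal sketch on plain row dictionaries follows; unlike the run shown in this article, it also drops the raw text label so that the label cannot later enter the feature set as a copy of the target. The function name and default arguments are illustrative.

```python
def map_target(rows, label_col="creditability", new_col="creditability_map",
               mapping=None):
    """Return new row dicts with a 0/1 target and without the raw text label."""
    mapping = mapping or {"good": 0, "bad": 1}  # bad is the event to predict -> 1
    out = []
    for row in rows:
        # copy every feature except the original label
        new_row = {k: v for k, v in row.items() if k != label_col}
        new_row[new_col] = mapping[row[label_col]]
        out.append(new_row)
    return out
```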

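# 2.2 Analyze each feature's IV, Gini impurity, entropy, unique count, etc.

The gini and entropy columns are consistent with the conditional Gini impurity and conditional entropy (natural log) of the target given the feature, so lower values mean a more informative feature: with the 70%/30% good/bad split, the unconditional values are 0.42 and about 0.611, which the weakest features almost reach. The first row, creditability, is the original text label that was kept alongside creditability_map; its IV of about 12.23 and gini/entropy of 0 show that it is simply a copy of the target, so it should be dropped before feature screening rather than treated as a predictor.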

iv gini entropy unique
creditability 12.22649152 0 0 2
status.of.existing.checking.account 0.666011503 0.368037204 0.545196341 4
duration.in.month 0.354783574 0.406755043 0.609659161 33
credit.amount 0.351454966 0.408679834 0.610864302 921
credit.history 0.293233547 0.394089613 0.580630747 5
age.in.years 0.21119662 0.41433928 0.610863206 53
savings.account.and.bonds 0.196009557 0.40483845 0.591376694 5
purpose 0.169195066 0.405990292 0.593609415 10
property 0.112638262 0.410037788 0.599091068 4
present.employment.since 0.086433631 0.412285325 0.601782464 5
housing 0.083293434 0.412356067 0.602024467 3
other.installment.plans 0.057614542 0.414607541 0.604712572 3
foreign.worker 0.043877412 0.417170441 0.606828112 2
other.debtors.or.guarantors 0.032019322 0.417208946 0.607539261 3
installment.rate.in.percentage.of.disposable.income 0.02632209 0.417699747 0.60811103 4
number.of.existing.credits.at.this.bank 0.013266524 0.418878097 0.609493027 4
personal.status.and.sex 0.008839919 0.419238171 0.609944287 4
job 0.008762766 0.419208234 0.609937317 4
telephone 0.006377605 0.419441491 0.610196344 2
present.residence.since 0.003588773 0.419685295 0.610488269 4
number.of.people.being.liable.to.provide.maintenance.for 4.34E-05 0.419996182 0.61085975 2
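A sketch of the three statistics for one categorical feature, assuming a 0/1 target with 1 = bad and the WOE convention ln(bad share / good share), which matches the signs of the WOE values in 2.6. In toad these columns come from `toad.quality(data, target, iv_only=False)`. The sketch treats every distinct value as its own category (continuous features would need binning first) and applies no smoothing, so every category must contain both bad and good samples.

```python
import math
from collections import defaultdict


def _groups(xs, ys):
    """Map each feature value to the list of target values (0 good, 1 bad)."""
    g = defaultdict(list)
    for x, y in zip(xs, ys):
        g[x].append(y)
    return g


def iv(xs, ys):
    """Information value: sum over categories of (bad% - good%) * WOE."""
    total_bad = sum(ys)
    total_good = len(ys) - total_bad
    value = 0.0
    for members in _groups(xs, ys).values():
        bad_pct = sum(members) / total_bad
        good_pct = (len(members) - sum(members)) / total_good
        value += (bad_pct - good_pct) * math.log(bad_pct / good_pct)
    return value


def _impurity(ys, kind):
    """Gini impurity or entropy (nats) of a list of 0/1 labels."""
    p = sum(ys) / len(ys)
    probs = [q for q in (p, 1 - p) if q > 0]
    if kind == "gini":
        return 1 - sum(q * q for q in probs)
    return -sum(q * math.log(q) for q in probs)


def cond_impurity(xs, ys, kind="gini"):
    """Sample-weighted average impurity of the target within each feature value."""
    n = len(ys)
    return sum(len(m) / n * _impurity(m, kind) for m in _groups(xs, ys).values())
```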

# 2.3 Screen features by IV, empty (missing) rate and correlation (corr)

drop_cols: 
 {'empty': array([], dtype=float64), 'iv': array(['personal.status.and.sex', 'present.residence.since',
       'number.of.existing.credits.at.this.bank', 'job',
       'number.of.people.being.liable.to.provide.maintenance.for',
       'telephone'], dtype=object), 'corr': array([], dtype=object)}
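The drop dictionary above has one list per criterion, which matches what `toad.selection.select(..., return_drop=True)` returns. The plain-Python sketch below mimics the three steps in order: drop columns whose missing rate exceeds `empty`, then columns whose IV is below `iv`, then, for each surviving pair with absolute Pearson correlation above `corr`, the one with the lower IV. The function name and default thresholds are illustrative assumptions. An IV cut-off of 0.02 reproduces the six IV drops above, since the dropped features all have IV below 0.014 and the weakest kept one (installment rate) has 0.026.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    ma = sum(x for x, _ in pairs) / n
    mb = sum(y for _, y in pairs) / n
    cov = sum((x - ma) * (y - mb) for x, y in pairs)
    va = math.sqrt(sum((x - ma) ** 2 for x, _ in pairs))
    vb = math.sqrt(sum((y - mb) ** 2 for _, y in pairs))
    return cov / (va * vb)


def screen_features(columns, ivs, empty=0.9, iv=0.02, corr=0.7):
    """columns: name -> list of numeric values (None = missing); ivs: name -> IV.

    Returns the dropped names per criterion; each step only sees survivors.
    """
    drop = {"empty": [], "iv": [], "corr": []}
    for name, vals in columns.items():
        if sum(v is None for v in vals) / len(vals) > empty:
            drop["empty"].append(name)
    alive = [c for c in columns if c not in drop["empty"]]
    drop["iv"] = [c for c in alive if ivs[c] < iv]
    alive = [c for c in alive if c not in drop["iv"]]
    # visit pairs in descending IV order so that b always has the lower IV
    for a, b in combinations(sorted(alive, key=lambda c: -ivs[c]), 2):
        if a in drop["corr"] or b in drop["corr"]:
            continue
        if abs(pearson(columns[a], columns[b])) > corr:
            drop["corr"].append(b)
    return drop
```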

# 2.4 Binning

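Bin both the numeric and the categorical variables. The supported binning methods are chi-square (chi), decision tree, percentile, equal-frequency and equal-width binning.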

data_df_s2bins_dict: 
 {'status.of.existing.checking.account': [['no checking account'], ['... >= 200 DM / salary assignments for at least 1 year'], ['0 <= ... < 200 DM'], ['... < 0 DM']], 'duration.in.month': [9, 12, 13, 16, 36, 45], 'credit.history': [['critical account/ other credits existing (not at this bank)'], ['delay in paying off in the past', 'existing credits paid back duly till now'], ['all credits at this bank paid back duly', 'no credits taken/ all credits paid back duly']], 'purpose': [['retraining', 'car (used)'], ['radio/television'], ['furniture/equipment'], ['domestic appliances', 'business', 'repairs'], ['car (new)'], ['others', 'education']], 'credit.amount': [3556], 'savings.account.and.bonds': [['... >= 1000 DM', '500 <= ... < 1000 DM', 'unknown/ no savings account'], ['100 <= ... < 500 DM'], ['... < 100 DM']], 'present.employment.since': [['4 <= ... < 7 years'], ['... >= 7 years'], ['1 <= ... < 4 years'], ['unemployed'], ['... < 1 year']], 'installment.rate.in.percentage.of.disposable.income': [2, 3, 4], 'other.debtors.or.guarantors': [['guarantor', 'none', 'co-applicant']], 'property': [['real estate'], ['building society savings agreement/ life insurance'], ['car or other, not in attribute Savings account/bonds'], ['unknown / no property']], 'age.in.years': [26, 35, 37, 49], 'other.installment.plans': [['none'], ['stores', 'bank']], 'housing': [['own'], ['rent'], ['for free']], 'foreign.worker': [['no', 'yes']], 'creditability': [['good'], ['bad']]}
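Numeric features are described by sorted split points (for example duration.in.month: [9, 12, 13, 16, 36, 45]) and categorical features by groups of values. Chi-square and decision-tree binning are supervised and use the target; equal-frequency and equal-width binning are not. In toad the method is chosen through the `method` argument of `toad.transform.Combiner.fit` (for example `'chi'`, `'dt'`, `'quantile'`, `'step'`). The sketch shows the two unsupervised methods in plain Python; the rule that a value equal to a split point falls into the upper bin is an assumption of this sketch.

```python
from bisect import bisect_right


def equal_width_splits(values, n_bins):
    """Split points that cut [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


def equal_freq_splits(values, n_bins):
    """Split points that put roughly the same number of samples in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    splits = [ordered[n * i // n_bins] for i in range(1, n_bins)]
    return sorted(set(splits))  # heavily repeated values collapse into fewer bins


def assign_bins(values, splits):
    """Bin index per value; a value equal to a split goes to the upper bin."""
    return [bisect_right(splits, v) for v in values]
```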

# 2.5 Refine the bins with bad-rate plots

# 2.5.1 Example of custom bin adjustment

# 2.5.2 Plot each bin's sample share as bars and its bad rate as a line
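The plot itself did not survive in this copy. The numbers behind it are each bin's share of samples (the bars) and its bad rate (the line). The sketch computes both for a numeric feature and a list of split points, using the same upper-bin convention for values equal to a split; toad draws this kind of chart with `toad.plot.bin_plot` on binned data. The function name is illustrative.

```python
from bisect import bisect_right


def bin_stats(values, ys, splits):
    """Per-bin sample share and bad rate for the given split points (y: 1 = bad)."""
    n_bins = len(splits) + 1
    totals = [0] * n_bins
    bads = [0] * n_bins
    for v, y in zip(values, ys):
        i = bisect_right(splits, v)  # bin index of this value
        totals[i] += 1
        bads[i] += y
    n = len(values)
    return [
        {"bin": i,
         "share": totals[i] / n,
         "bad_rate": bads[i] / totals[i] if totals[i] else None}
        for i in range(n_bins)
    ]
```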


# 2.5.3 Adjust the bins so that bad_rate shows a broadly monotonic trend
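The exact adjustment used in the original is not shown. One common approach, sketched here, is to merge adjacent bins that break the trend until the bad rate rises (or falls) monotonically from the first bin to the last. Reading the direction from the first and last bin is a simplifying assumption, and the sketch assumes no bin is empty. Each merge removes one split point, so the loop always terminates.

```python
def make_monotonic(splits, bads, totals):
    """Merge adjacent bins until the bad rate is monotonic.

    splits: sorted split points (len = number of bins - 1);
    bads, totals: bad count and sample count per bin.
    """
    splits, bads, totals = list(splits), list(bads), list(totals)
    while len(bads) > 1:
        rates = [b / t for b, t in zip(bads, totals)]
        rising = rates[-1] >= rates[0]
        # first pair of neighbours that goes against the overall direction
        i = next(
            (k for k in range(len(rates) - 1)
             if (rates[k + 1] < rates[k] if rising else rates[k + 1] > rates[k])),
            None,
        )
        if i is None:
            break  # already monotonic
        # fold bin i+1 into bin i and drop the split point between them
        bads[i] += bads.pop(i + 1)
        totals[i] += totals.pop(i + 1)
        splits.pop(i)
    return splits, bads, totals
```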

# 2.6 WOE-transform the binned data

status.of.existing.checking.account duration.in.month credit.history purpose credit.amount savings.account.and.bonds present.employment.since installment.rate.in.percentage.of.disposable.income other.debtors.or.guarantors property age.in.years other.installment.plans housing foreign.worker creditability creditability_map
0 0.818098706 -1.280933845 -0.733740578 -0.410062817 -0.153492135 -0.762140052 -0.235566071 0.157300289 0 -0.461034959 -0.194156014 -0.121178625 -0.194156014 0 -5.703782475 0
1 0.401391783 1.134979933 0.087868755 -0.410062817 0.31563815 0.271357844 0.032103245 -0.155466469 0 -0.461034959 0.48083491 -0.121178625 -0.194156014 0 6.551080335 1
2 -1.176263223 -0.128416292 -0.733740578 0.587786665 -0.153492135 0.271357844 -0.394415272 -0.155466469 0 -0.461034959 -0.266352306 -0.121178625 -0.194156014 0 -5.703782475 0
3 0.818098706 0.524524468 0.087868755 0.095556516 0.31563815 0.271357844 -0.394415272 -0.155466469 0 0.028573372 -0.266352306 -0.121178625 0.472604411 0 -5.703782475 0
4 0.818098706 0.108688306 0.087868755 0.359200488 0.31563815 0.271357844 0.032103245 -0.064538521 0 0.586082361 -0.266352306 -0.121178625 0.472604411 0 6.551080335 1
5 -1.176263223 0.524524468 0.087868755 0.587786665 0.31563815 -0.762140052 0.032103245 -0.155466469 0 0.586082361 -0.044353168 -0.121178625 0.472604411 0 -5.703782475 0
6 -1.176263223 0.108688306 0.087868755 0.095556516 -0.153492135 -0.762140052 -0.235566071 -0.064538521 0 0.028573372 -0.266352306 -0.121178625 -0.194156014 0 -5.703782475 0
7 0.401391783 0.524524468 0.087868755 -0.805625164 0.31563815 0.271357844 0.032103245 -0.155466469 0 0.034191365 -0.044353168 -0.121178625 0.40444522 0 -5.703782475 0
8 -1.176263223 -0.128416292 0.087868755 -0.410062817 -0.153492135 -0.762140052 -0.394415272 -0.155466469 0 -0.461034959 -0.266352306 -0.121178625 -0.194156014 0 -5.703782475 0
9 0.401391783 0.108688306 -0.733740578 0.359200488 0.31563815 0.271357844 0.31923043 0.157300289 0 0.034191365 -0.044353168 -0.121178625 -0.194156014 0 6.551080335 1
10 0.401391783 -0.128416292 0.087868755 0.359200488 -0.153492135 0.271357844 0.470820289 -0.064538521 0 0.034191365 -0.044353168 -0.121178625 0.40444522 0 6.551080335 1
11 0.818098706 1.134979933 0.087868755 0.233288 0.31563815 0.271357844 0.470820289 -0.064538521 0 0.028573372 0.48083491 -0.121178625 0.40444522 0 6.551080335 1
12 0.401391783 -0.128416292 0.087868755 -0.410062817 -0.153492135 0.271357844 0.032103245 -0.251314428 0 0.034191365 0.48083491 -0.121178625 -0.194156014 0 -5.703782475 0
13 0.818098706 0.108688306 -0.733740578 0.359200488 -0.153492135 0.271357844 -0.235566071 0.157300289 0 0.034191365 -0.266352306 -0.121178625 -0.194156014 0 6.551080335 1
14 0.818098706 -0.665290226 0.087868755 0.359200488 -0.153492135 0.271357844 0.032103245 -0.155466469 0 0.034191365 -0.044353168 -0.121178625 0.40444522 0 -5.703782475 0
15 0.818098706 0.108688306 0.087868755 -0.410062817 -0.153492135 0.13955188 0.032103245 0.157300289 0 0.034191365 -0.044353168 -0.121178625 -0.194156014 0 6.551080335 1
16 -1.176263223 0.108688306 -0.733740578 -0.410062817 -0.153492135 -0.762140052 -0.235566071 0.157300289 0 0.028573372 -0.266352306 -0.121178625 -0.194156014 0 -5.703782475 0
17 0.818098706 0.108688306 1.234070835 0.233288 0.31563815 -0.762140052 0.470820289 -0.155466469 0 0.034191365 -0.044353168 0.477550835 -0.194156014 0 -5.703782475 0
18 0.401391783 0.108688306 0.087868755 -0.805625164 0.31563815 0.271357844 -0.235566071 0.157300289 0 0.586082361 -0.044353168 -0.121178625 0.472604411 0 6.551080335 1
19 -1.176263223 0.108688306 0.087868755 -0.410062817 -0.153492135 -0.762140052 -0.235566071 -0.064538521 0 0.034191365 -0.044353168 -0.121178625 -0.194156014 0 -5.703782475 0
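Each value in the table above is the WOE of the sample's bin, ln(bad share of the bin / good share of the bin). For example, no checking account has 46 bad and 348 good samples out of 300 bad and 700 good in total, so its WOE is ln((46/300)/(348/700)) ≈ -1.176, the first-column value of rows 2, 5, 6, 8, 16 and 19. A plain-Python sketch of fitting and applying that mapping follows (toad's `toad.transform.WOETransformer` plays this role in the library); the sketch assumes every bin contains both bad and good samples.

```python
import math
from collections import defaultdict


def fit_woe(bins, ys):
    """WOE per bin label: ln(bin's share of all bads / bin's share of all goods)."""
    total_bad = sum(ys)
    total_good = len(ys) - total_bad
    bad = defaultdict(int)
    good = defaultdict(int)
    for b, y in zip(bins, ys):
        if y:
            bad[b] += 1
        else:
            good[b] += 1
    return {
        b: math.log((bad[b] / total_bad) / (good[b] / total_good))
        for b in set(bins)
    }


def transform_woe(bins, woe_map):
    """Replace every bin label with its fitted WOE value."""
    return [woe_map[b] for b in bins]
```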

# 2.7 Feature selection

# Select features by forward, backward or bidirectional (stepwise) selection, using AIC, BIC, KS or AUC as the selection criterion

final_data: 
 (1000, 3)
final_data: 
 Index(['status.of.existing.checking.account', 'creditability',
       'creditability_map'],
      dtype='object')
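toad provides this as `toad.selection.stepwise` (with `estimator`, `direction` and `criterion` arguments). The core loop of forward selection is sketched below, with the criterion passed in as a function to minimize, for instance the AIC of a model fitted on the candidate set (AIC = 2k − 2 ln L̂, with k parameters and maximized likelihood L̂). Backward elimination runs the same loop by removing features, and bidirectional selection tries both moves at each step. The function name and the callable interface are illustrative assumptions.

```python
def forward_select(candidates, criterion):
    """Greedy forward selection that minimizes criterion(selected_features).

    criterion: any callable scoring a list of feature names, lower = better.
    """
    selected = []
    best = criterion(selected)
    remaining = list(candidates)
    while remaining:
        # score every one-feature extension of the current set
        scores = {f: criterion(selected + [f]) for f in remaining}
        f_best = min(scores, key=scores.get)
        if scores[f_best] >= best:
            break  # no candidate improves the criterion any more
        selected.append(f_best)
        remaining.remove(f_best)
        best = scores[f_best]
    return selected
```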

# 3. Model building, training and evaluation

# 3.1 Split into training and test sets
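The bucket table in 4.2 covers 292 + 125 + 106 + 48 + 78 + 101 = 750 training samples, which is consistent with holding out 25% of the 1,000 rows. Below is a seeded random split in plain Python; with scikit-learn the usual call is `train_test_split(X, y, test_size=0.25, random_state=...)`, optionally with `stratify=y` to keep the 70/30 good/bad ratio in both parts. The function name and seed are arbitrary choices for this sketch.

```python
import random


def split_train_test(rows, test_size=0.25, seed=0):
    """Shuffle row indices with a fixed seed and hold out a test fraction."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_size)
    test_idx = set(idx[:n_test])
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test
```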

# 3.2 Train the model
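The training code is not shown. Scorecards usually fit a logistic regression on the WOE features, for example scikit-learn's `LogisticRegression`, which applies L2 regularization by default. To show what such a fit does, the sketch minimizes the mean log-loss by plain batch gradient descent without regularization; the learning rate and number of epochs are arbitrary. Because WOE here is ln(bad/good) and the target is 1 for bad, an informative feature should receive a positive coefficient.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (intercept, coefs)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            # prediction error for one sample: p(bad) - label
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(X, b, w):
    """Predicted probability of bad for each row."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]
```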

# 3.3 Evaluate the model: F1, KS, AUC
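Minimal definitions of the three metrics on predicted bad probabilities: AUC as the probability that a random bad sample scores above a random good one (ties count one half), KS as the largest gap between the cumulative bad and good shares as the threshold moves down the scores, and F1 at a fixed 0.5 cut-off (the threshold is an assumption of this sketch). `toad.metrics` exposes `KS`, `F1` and `AUC` for the same purpose.

```python
def auc(scores, ys):
    """P(random bad scores above random good); ties count 1/2."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks(scores, ys):
    """Max |cum bad share - cum good share| over all score thresholds."""
    n_bad = sum(ys)
    n_good = len(ys) - n_bad
    pairs = sorted(zip(scores, ys), reverse=True)
    cum_bad = cum_good = 0
    best = 0.0
    for i, (s, y) in enumerate(pairs):
        cum_bad += y
        cum_good += 1 - y
        # evaluate only after the last sample with this score (ties share a cut)
        if i + 1 == len(pairs) or pairs[i + 1][0] != s:
            best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
    return best


def f1(scores, ys, threshold=0.5):
    """F1 score of the bad class after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, ys) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, ys) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, ys) if p == 0 and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```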

# 4. Evaluate the model for deployment and compute credit scores

# 4.1 Evaluate variable stability (PSI): training set vs. test set

cal PSI 0.012897491574571578
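PSI compares the share of samples per bin in an expected distribution E (here the training set) with an actual one A (the test set): PSI = Σ (A_i − E_i) · ln(A_i / E_i). toad computes it with `toad.metrics.PSI`. A common rule of thumb reads values below 0.1 as stable, so 0.0129 indicates no meaningful shift between the two sets. The sketch assumes both distributions use the same split points and that no bin is empty.

```python
import math
from bisect import bisect_right


def distribution(values, splits):
    """Share of values falling into each bin defined by the split points."""
    counts = [0] * (len(splits) + 1)
    for v in values:
        counts[bisect_right(splits, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected, actual):
    """Population stability index between two lists of bin shares."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))
```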

# 4.2 Equal-frequency bucketing of training-set predictions to compare the groups

min max bads goods total bad_rate good_rate odds bad_prop good_prop total_prop cum_bad_rate cum_bad_rate_rev cum_bads_prop cum_bads_prop_rev cum_goods_prop cum_goods_prop_rev cum_total_prop cum_total_prop_rev ks lift
0 0.000194976 0.000204106 0 292 292 0 1 0 0 0.5583174 0.389333333 0 0.302666667 0 1 0.5583174 1 0.389333333 1 0.5583174 1
1 0.000214122 0.000214122 0 125 125 0 1 0 0 0.239005736 0.166666667 0 0.495633188 0 1 0.797323136 0.4416826 0.556 0.610666667 0.797323136 1.637554585
2 0.000219486 0.000219486 0 106 106 0 1 0 0 0.202676864 0.141333333 0 0.681681682 0 1 1 0.202676864 0.697333333 0.444 1 2.252252252
3 0.999484936 0.99950797 48 0 48 1 0 inf 0.211453744 0 0.064 0.084063047 1 0.211453744 1 1 0 0.761333333 0.302666667 0.788546256 3.303964758
4 0.99953098 0.99953098 78 0 78 1 0 inf 0.343612335 0 0.104 0.194144838 1 0.555066079 0.788546256 1 0 0.865333333 0.238666667 0.444933921 3.303964758
5 0.999542439 0.999542439 101 0 101 1 0 inf 0.444933921 0 0.134666667 0.302666667 1 1 0.444933921 1 0 1 0.134666667 0 3.303964758
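A sketch of such a bucket table: sort by predicted bad probability, cut into equally sized buckets by rank, and report bads, goods, bad rate and the running KS (|cumulative bad share − cumulative good share|, the ks column above). Cutting by rank can split tied predictions across buckets; the unequal bucket sizes above suggest that the original cut on score values, which keeps ties together. toad's `toad.metrics.KS_bucket` builds the full table. Note also that every bucket above has a bad_rate of exactly 0 or 1: perfect separation like this is a symptom of target leakage, consistent with the raw creditability label appearing as a model input in the 4.3 scorecard.

```python
def ks_buckets(probs, ys, n_buckets=10):
    """Equal-size buckets by predicted bad probability, with running KS."""
    pairs = sorted(zip(probs, ys))
    n = len(pairs)
    total_bad = sum(ys)
    total_good = n - total_bad
    rows, cum_bad, cum_good = [], 0, 0
    for k in range(n_buckets):
        chunk = pairs[k * n // n_buckets:(k + 1) * n // n_buckets]
        if not chunk:
            continue
        bads = sum(y for _, y in chunk)
        goods = len(chunk) - bads
        cum_bad += bads
        cum_good += goods
        rows.append({
            "min": chunk[0][0], "max": chunk[-1][0],
            "bads": bads, "goods": goods, "total": len(chunk),
            "bad_rate": bads / len(chunk),
            "ks": abs(cum_bad / total_bad - cum_good / total_good),
        })
    return rows
```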

# 4.3 Scorecard score transformation

name value score
0 status.of.existing.checking.account no checking account 261.94
1 status.of.existing.checking.account ... >= 200 DM / salary assignments for at least 1 year 258.87
2 status.of.existing.checking.account 0 <= ... < 200 DM 255.66
3 status.of.existing.checking.account ... < 0 DM 254.01
4 creditability good 744.2
5 creditability bad -302.01
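The points follow the usual scaling convention: a reference good:bad odds value base_odds maps to base_score, and every doubling of the odds adds PDO points, so score = offset + factor · ln(good odds) with factor = PDO / ln 2 and offset = base_score − factor · ln(base_odds). The parameter values in the sketch (600, 50, 20) are illustrative, not the ones used in the article. Since the model's log-odds of bad is β0 + Σ βj·WOEj, the score splits into per-feature terms −factor·βj·WOEj plus a share of offset − factor·β0, which is how per-bin points like those in the table typically arise. toad's `ScoreCard` takes `base_score`, `base_odds` and `pdo` parameters for this purpose.

```python
import math


def score_scaler(base_score=600, base_odds=50, pdo=20):
    """Return a function mapping P(bad) to a score.

    Good:bad odds of base_odds give base_score; each doubling of the
    good:bad odds adds pdo points.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)

    def to_score(p_bad):
        good_odds = (1 - p_bad) / p_bad
        return offset + factor * math.log(good_odds)

    return to_score
```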


Reposted from blog.csdn.net/qq_41185868/article/details/125418213