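ML with LoR: a full walkthrough of developing a general-purpose credit risk scorecard model with the logistic regression (LoR) algorithm on a credit card dataset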


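Contents

A full walkthrough of developing a general-purpose credit risk scorecard model with the logistic regression (LoR) algorithm on a credit card dataset

# 1. Define the dataset

# 1.1 View a sample of the data

# 1.2 Summarize variable types, counts, and other information

# 2. Data preprocessing

# 2.1 Variable selection

# 2.2 WOE binning analysis

# T1. Automatic binning with the woebin() function

# T2. Manual binning with a custom breaks_list argument

# 2.3 Visualize the binned variables and check for monotonicity

# 2.4 Apply the WOE binning transformation to the variables

# 3. Model training

# 3.1 Split the dataset

# 3.2 Separate the independent and dependent variables

# 3.3 Build, train, and predict: a logistic regression model

# 3.4 Model evaluation

# 4. Model deployment and monitoring

# 4.1 Model inference: computing credit scores

# 4.2 Online model evaluation: score stability with PSI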


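A full walkthrough of developing a general-purpose credit risk scorecard model with the logistic regression (LoR) algorithm on a credit card dataset

# 1. Define the dataset

# Load the German credit dataset: credit data that classifies debtors, each described by a set of attributes, as good or bad credit risks.
Dataset source: UCI Machine Learning Repository

# 1.1 View a sample of the data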

status.of.existing.checking.account duration.in.month credit.history purpose credit.amount savings.account.and.bonds present.employment.since installment.rate.in.percentage.of.disposable.income personal.status.and.sex other.debtors.or.guarantors present.residence.since property age.in.years other.installment.plans housing number.of.existing.credits.at.this.bank job number.of.people.being.liable.to.provide.maintenance.for telephone foreign.worker creditability
0 ... < 0 DM 6 critical account/ other credits existing (not at this bank) radio/television 1169 unknown/ no savings account ... >= 7 years 4 male : divorced/separated none 4 real estate 67 none own 2 skilled employee / official 1 yes, registered under the customers name yes good
1 0 <= ... < 200 DM 48 existing credits paid back duly till now radio/television 5951 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 2 real estate 22 none own 1 skilled employee / official 1 none yes bad
2 no checking account 12 critical account/ other credits existing (not at this bank) education 2096 ... < 100 DM 4 <= ... < 7 years 2 male : divorced/separated none 3 real estate 49 none own 1 unskilled - resident 2 none yes good
3 ... < 0 DM 42 existing credits paid back duly till now furniture/equipment 7882 ... < 100 DM 4 <= ... < 7 years 2 male : divorced/separated guarantor 4 building society savings agreement/ life insurance 45 none for free 1 skilled employee / official 2 none yes good
4 ... < 0 DM 24 delay in paying off in the past car (new) 4870 ... < 100 DM 1 <= ... < 4 years 3 male : divorced/separated none 4 unknown / no property 53 none for free 2 skilled employee / official 2 none yes bad
5 no checking account 36 existing credits paid back duly till now education 9055 unknown/ no savings account 1 <= ... < 4 years 2 male : divorced/separated none 4 unknown / no property 35 none for free 1 unskilled - resident 2 yes, registered under the customers name yes good
6 no checking account 24 existing credits paid back duly till now furniture/equipment 2835 500 <= ... < 1000 DM ... >= 7 years 3 male : divorced/separated none 4 building society savings agreement/ life insurance 53 none own 1 skilled employee / official 1 none yes good
7 0 <= ... < 200 DM 36 existing credits paid back duly till now car (used) 6948 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 35 none rent 1 management/ self-employed/ highly qualified employee/ officer 1 yes, registered under the customers name yes good
8 no checking account 12 existing credits paid back duly till now radio/television 3059 ... >= 1000 DM 4 <= ... < 7 years 2 male : divorced/separated none 4 real estate 61 none own 1 unskilled - resident 1 none yes good
9 0 <= ... < 200 DM 30 critical account/ other credits existing (not at this bank) car (new) 5234 ... < 100 DM unemployed 4 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 28 none own 2 management/ self-employed/ highly qualified employee/ officer 1 none yes bad
10 0 <= ... < 200 DM 12 existing credits paid back duly till now car (new) 1295 ... < 100 DM ... < 1 year 3 male : divorced/separated none 1 car or other, not in attribute Savings account/bonds 25 none rent 1 skilled employee / official 1 none yes bad
11 ... < 0 DM 48 existing credits paid back duly till now business 4308 ... < 100 DM ... < 1 year 3 male : divorced/separated none 4 building society savings agreement/ life insurance 24 none rent 1 skilled employee / official 1 none yes bad
12 0 <= ... < 200 DM 12 existing credits paid back duly till now radio/television 1567 ... < 100 DM 1 <= ... < 4 years 1 male : divorced/separated none 1 car or other, not in attribute Savings account/bonds 22 none own 1 skilled employee / official 1 yes, registered under the customers name yes good
13 ... < 0 DM 24 critical account/ other credits existing (not at this bank) car (new) 1199 ... < 100 DM ... >= 7 years 4 male : divorced/separated none 4 car or other, not in attribute Savings account/bonds 60 none own 2 unskilled - resident 1 none yes bad
14 ... < 0 DM 15 existing credits paid back duly till now car (new) 1403 ... < 100 DM 1 <= ... < 4 years 2 male : divorced/separated none 4 car or other, not in attribute Savings account/bonds 28 none rent 1 skilled employee / official 1 none yes good
15 ... < 0 DM 24 existing credits paid back duly till now radio/television 1282 100 <= ... < 500 DM 1 <= ... < 4 years 4 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 32 none own 1 unskilled - resident 1 none yes bad
16 no checking account 24 critical account/ other credits existing (not at this bank) radio/television 2424 unknown/ no savings account ... >= 7 years 4 male : divorced/separated none 4 building society savings agreement/ life insurance 53 none own 2 skilled employee / official 1 none yes good
17 ... < 0 DM 30 no credits taken/ all credits paid back duly business 8072 unknown/ no savings account ... < 1 year 2 male : divorced/separated none 3 car or other, not in attribute Savings account/bonds 25 bank own 3 skilled employee / official 1 none yes good
18 0 <= ... < 200 DM 24 existing credits paid back duly till now car (used) 12579 ... < 100 DM ... >= 7 years 4 male : divorced/separated none 2 unknown / no property 44 none for free 1 management/ self-employed/ highly qualified employee/ officer 1 yes, registered under the customers name yes bad
19 no checking account 24 existing credits paid back duly till now radio/television 3430 500 <= ... < 1000 DM ... >= 7 years 3 male : divorced/separated none 2 car or other, not in attribute Savings account/bonds 31 none own 1 skilled employee / official 2 yes, registered under the customers name yes good

# 1.2 Summarize variable types, counts, and other information

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 21 columns):
 #   Column                                                    Non-Null Count  Dtype   
---  ------                                                    --------------  -----   
 0   status.of.existing.checking.account                       1000 non-null   category
 1   duration.in.month                                         1000 non-null   int64   
 2   credit.history                                            1000 non-null   category
 3   purpose                                                   1000 non-null   object  
 4   credit.amount                                             1000 non-null   int64   
 5   savings.account.and.bonds                                 1000 non-null   category
 6   present.employment.since                                  1000 non-null   category
 7   installment.rate.in.percentage.of.disposable.income       1000 non-null   int64   
 8   personal.status.and.sex                                   1000 non-null   category
 9   other.debtors.or.guarantors                               1000 non-null   category
 10  present.residence.since                                   1000 non-null   int64   
 11  property                                                  1000 non-null   category
 12  age.in.years                                              1000 non-null   int64   
 13  other.installment.plans                                   1000 non-null   category
 14  housing                                                   1000 non-null   category
 15  number.of.existing.credits.at.this.bank                   1000 non-null   int64   
 16  job                                                       1000 non-null   category
 17  number.of.people.being.liable.to.provide.maintenance.for  1000 non-null   int64   
 18  telephone                                                 1000 non-null   category
 19  foreign.worker                                            1000 non-null   category
 20  creditability                                             1000 non-null   object  
dtypes: category(12), int64(7), object(2)
memory usage: 84.0+ KB

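# 2. Data preprocessing

# 2.1 Variable selection

# Use the var_filter function to screen variables by missing rate, IV, and identical-value rate, with y specified as the target variable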

var_filter(dt, y, x=None, iv_limit=0.02, missing_limit=0.95,  
               identical_limit=0.95, var_rm=None, var_kp=None, 
               return_rm_reason=False, positive='bad|1')
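'''
Purpose: drop a variable when its IV is below iv_limit (0.02), its missing rate is above missing_limit (95%), or its identical-value rate (excluding nulls) is above identical_limit (95%).
Parameters (see the function's own documentation for the full list):
var_rm: variables to force-remove; empty by default;
var_kp: variables to force-keep; empty by default;
return_rm_reason: whether to return the reason each variable was removed; False by default;
positive: the value that marks a bad sample; "bad|1" by default.
'''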
age.in.years other.debtors.or.guarantors savings.account.and.bonds credit.amount installment.rate.in.percentage.of.disposable.income status.of.existing.checking.account credit.history present.employment.since purpose housing property other.installment.plans duration.in.month creditability
0 67 none unknown/ no savings account 1169 4 ... < 0 DM critical account/ other credits existing (not at this bank) ... >= 7 years radio/television own real estate none 6 0
1 22 none ... < 100 DM 5951 2 0 <= ... < 200 DM existing credits paid back duly till now 1 <= ... < 4 years radio/television own real estate none 48 1
2 49 none ... < 100 DM 2096 2 no checking account critical account/ other credits existing (not at this bank) 4 <= ... < 7 years education own real estate none 12 0
3 45 guarantor ... < 100 DM 7882 2 ... < 0 DM existing credits paid back duly till now 4 <= ... < 7 years furniture/equipment for free building society savings agreement/ life insurance none 42 0
4 53 none ... < 100 DM 4870 3 ... < 0 DM delay in paying off in the past 1 <= ... < 4 years car (new) for free unknown / no property none 24 1
5 35 none unknown/ no savings account 9055 2 no checking account existing credits paid back duly till now 1 <= ... < 4 years education for free unknown / no property none 36 0
6 53 none 500 <= ... < 1000 DM 2835 3 no checking account existing credits paid back duly till now ... >= 7 years furniture/equipment own building society savings agreement/ life insurance none 24 0
7 35 none ... < 100 DM 6948 2 0 <= ... < 200 DM existing credits paid back duly till now 1 <= ... < 4 years car (used) rent car or other, not in attribute Savings account/bonds none 36 0
8 61 none ... >= 1000 DM 3059 2 no checking account existing credits paid back duly till now 4 <= ... < 7 years radio/television own real estate none 12 0
9 28 none ... < 100 DM 5234 4 0 <= ... < 200 DM critical account/ other credits existing (not at this bank) unemployed car (new) own car or other, not in attribute Savings account/bonds none 30 1
10 25 none ... < 100 DM 1295 3 0 <= ... < 200 DM existing credits paid back duly till now ... < 1 year car (new) rent car or other, not in attribute Savings account/bonds none 12 1
11 24 none ... < 100 DM 4308 3 ... < 0 DM existing credits paid back duly till now ... < 1 year business rent building society savings agreement/ life insurance none 48 1
12 22 none ... < 100 DM 1567 1 0 <= ... < 200 DM existing credits paid back duly till now 1 <= ... < 4 years radio/television own car or other, not in attribute Savings account/bonds none 12 0
13 60 none ... < 100 DM 1199 4 ... < 0 DM critical account/ other credits existing (not at this bank) ... >= 7 years car (new) own car or other, not in attribute Savings account/bonds none 24 1
14 28 none ... < 100 DM 1403 2 ... < 0 DM existing credits paid back duly till now 1 <= ... < 4 years car (new) rent car or other, not in attribute Savings account/bonds none 15 0
15 32 none 100 <= ... < 500 DM 1282 4 ... < 0 DM existing credits paid back duly till now 1 <= ... < 4 years radio/television own car or other, not in attribute Savings account/bonds none 24 1
16 53 none unknown/ no savings account 2424 4 no checking account critical account/ other credits existing (not at this bank) ... >= 7 years radio/television own building society savings agreement/ life insurance none 24 0
17 25 none unknown/ no savings account 8072 2 ... < 0 DM no credits taken/ all credits paid back duly ... < 1 year business own car or other, not in attribute Savings account/bonds bank 30 0
18 44 none ... < 100 DM 12579 4 0 <= ... < 200 DM existing credits paid back duly till now ... >= 7 years car (used) for free unknown / no property none 24 1
19 31 none 500 <= ... < 1000 DM 3430 3 no checking account existing credits paid back duly till now ... >= 7 years radio/television own car or other, not in attribute Savings account/bonds none 24 0
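
The screening rule described for var_filter can be sketched in plain Python. This is only an illustration of the three thresholds, not the library's implementation: the helper names (`missing_rate`, `identical_rate`, `keep_variable`) are hypothetical, the IV is assumed to be computed elsewhere, and the toy columns are made up.

```python
from collections import Counter


def missing_rate(values):
    """Share of entries that are None."""
    return sum(v is None for v in values) / len(values)


def identical_rate(values):
    """Share of the most frequent value among the non-null entries."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0
    most_common_count = Counter(non_null).most_common(1)[0][1]
    return most_common_count / len(non_null)


def keep_variable(iv, values, iv_limit=0.02, missing_limit=0.95,
                  identical_limit=0.95):
    """Return True if the variable passes all three screening rules."""
    if iv < iv_limit:
        return False
    if missing_rate(values) > missing_limit:
        return False
    if identical_rate(values) > identical_limit:
        return False
    return True


# Toy columns (illustrative values, not taken from the dataset).
almost_constant = ["yes"] * 97 + ["no"] * 3
varied = ["a", "b", "c", "d"] * 25

print(keep_variable(0.10, almost_constant))  # False: identical rate 0.97 > 0.95
print(keep_variable(0.10, varied))           # True
print(keep_variable(0.01, varied))           # False: IV below 0.02
```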

# 2.2 WOE binning analysis

# T1. Automatic binning with the woebin() function

woebin(dt, y, x=None, 
           var_skip=None, breaks_list=None, special_values=None, 
           stop_limit=0.1, count_distr_limit=0.05, bin_num_limit=8, 
           # min_perc_fine_bin=0.02, min_perc_coarse_bin=0.05, max_num_bin=8, 
           positive="bad|1", no_cores=None, print_step=0, method="tree",
           ignore_const_cols=True, ignore_datetime_cols=True, 
           check_cate_num=True, replace_blank=True, 
           save_breaks_list=None, **kwargs)
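'''
Purpose: generate optimal bins for both numeric and categorical variables; method="tree" selects decision-tree binning and method="chimerge" selects chi-square (ChiMerge) binning.
Parameters (see the function's own documentation for the full list):
var_skip: variables to skip during binning;
breaks_list: list of break points, empty by default. If provided, bins are built from these break points;
special_values: values that should be placed in their own bins, empty by default;
count_distr_limit: minimum share of records in a bin; the accepted range is usually 0.01-0.2, default 0.05;
stop_limit: stop splitting when the growth rate of IV falls below stop_limit, or when the chi-square value falls below qchisq(1-stop_limit, 1). The accepted range is usually 0-0.5, default 0.1;
bin_num_limit: an integer giving the maximum number of bins;
positive: the label that marks the positive (bad) class, default "bad|1";
no_cores: number of CPU cores used for parallel computation;
print_step: a non-negative integer, documented as defaulting to 1 although the signature above shows 0. If print_step>0, variable names are printed as the iterations proceed. If print_step=0 or no_cores>1, nothing is printed;
method: binning method, either "tree" (decision tree) or "chimerge" (chi-square), default "tree";
ignore_const_cols: whether to ignore constant columns, default True;
ignore_datetime_cols: whether to ignore datetime columns, default True;
check_cate_num: whether to check if a categorical variable has more than 50 distinct values, default True. Too many distinct values slow the binning down;
replace_blank: whether to replace blank values with None, default True.
'''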

data_df_woebin['age.in.years']

       variable          bin  count  count_distr  good  bad      badprob           woe       bin_iv     total_iv  breaks  is_special_values
0  age.in.years  [-inf,26.0)    190         0.19   110   80  0.421052632   0.528844129  0.057921024  0.130498542      26              FALSE
1  age.in.years  [26.0,28.0)    101        0.101    74   27  0.267326733  -0.160930367  0.002528906  0.130498542      28              FALSE
2  age.in.years  [28.0,35.0)    257        0.257   172   85    0.3307393    0.14245464  0.005359008  0.130498542      35              FALSE
3  age.in.years  [35.0,37.0)     79        0.079    67   12  0.151898734  -0.872488109  0.048610052  0.130498542      37              FALSE
4  age.in.years   [37.0,inf)    373        0.373   277   96  0.257372654  -0.212371454  0.016079553  0.130498542     inf              FALSE
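
The woe and bin_iv columns of this table can be reproduced by hand from the good and bad counts. The sketch below assumes the convention WOE = ln(share of bads in the bin / share of goods in the bin), which matches the printed numbers; the function name `woe_iv` is only illustrative.

```python
import math

# Good and bad counts per bin, copied from the automatic binning
# of age.in.years shown in this section.
good = [110, 74, 172, 67, 277]
bad = [80, 27, 85, 12, 96]


def woe_iv(good, bad):
    """Return ([(woe, bin_iv), ...], total_iv) for one binned variable."""
    total_good, total_bad = sum(good), sum(bad)
    rows = []
    for g, b in zip(good, bad):
        bad_share = b / total_bad      # share of all bads that fall in this bin
        good_share = g / total_good    # share of all goods that fall in this bin
        woe = math.log(bad_share / good_share)
        bin_iv = (bad_share - good_share) * woe
        rows.append((woe, bin_iv))
    total_iv = sum(iv for _, iv in rows)
    return rows, total_iv


rows, total_iv = woe_iv(good, bad)
for woe, bin_iv in rows:
    print(f"woe={woe:.9f}  bin_iv={bin_iv:.9f}")
print(f"total_iv={total_iv:.9f}")
```

With this sign convention a positive WOE marks a bin that is riskier than the population average, which is why the youngest bin carries the largest value.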

# T2. Manual binning with a custom breaks_list argument

data_df_woebin_DIY['age.in.years']

       variable          bin  count  count_distr  good  bad      badprob           woe       bin_iv     total_iv  breaks  is_special_values
0  age.in.years  [-inf,25.0)    149        0.149    88   61  0.409395973    0.48083491  0.037321948  0.086291678      25              FALSE
1  age.in.years  [25.0,35.0)    399        0.399   268  131  0.328320802   0.131508203  0.007076394  0.086291678      35              FALSE
2  age.in.years  [35.0,45.0)    251        0.251   193   58  0.231075697  -0.354949318  0.029241063  0.086291678      45              FALSE
3  age.in.years   [45.0,inf)    201        0.201   151   50  0.248756219  -0.257958971  0.012652273  0.086291678     inf              FALSE

# 2.3 Visualize the binned variables and check for monotonicity

Plot the count distribution and the bad probability of each variable's bins.
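
Besides reading the plots, monotonicity can be checked numerically. The following is a small sketch, with a hypothetical helper `is_monotonic`, applied to the badprob columns of the two age.in.years binning tables in section 2.2.

```python
def is_monotonic(values):
    """True if the sequence never increases or never decreases."""
    pairs = list(zip(values, values[1:]))
    increasing = all(a <= b for a, b in pairs)
    decreasing = all(a >= b for a, b in pairs)
    return increasing or decreasing


# badprob per bin, copied from the two age.in.years tables in section 2.2.
auto_badprob = [0.421052632, 0.267326733, 0.3307393, 0.151898734, 0.257372654]
manual_badprob = [0.409395973, 0.328320802, 0.231075697, 0.248756219]

print(is_monotonic(auto_badprob))        # False
print(is_monotonic(manual_badprob))      # False: the last bin rises slightly
print(is_monotonic(manual_badprob[:3]))  # True: the first three bins decrease
```

Neither binning is strictly monotonic for this variable; the manual bins deviate only in the last bin, which may or may not be acceptable depending on the business interpretation.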

# 2.4 Apply the WOE binning transformation to the variables

creditability savings.account.and.bonds_woe housing_woe age.in.years_woe other.debtors.or.guarantors_woe purpose_woe credit.amount_woe credit.history_woe installment.rate.in.percentage.of.disposable.income_woe other.installment.plans_woe present.employment.since_woe property_woe status.of.existing.checking.account_woe duration.in.month_woe
0 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 -0.410062817 0.033661283 -0.733740578 0.157300289 -0.121178625 -0.235566071 -0.461034959 0.614203978 -1.312186389
1 1 0.271357844 -0.194156014 0.48083491 -0.000525072 -0.410062817 0.390539458 0.088318617 -0.190472769 -0.121178625 0.032103245 -0.461034959 0.614203978 1.134979933
2 0 0.271357844 -0.194156014 -0.257958971 -0.000525072 0.279920067 -0.258307464 -0.733740578 -0.190472769 -0.121178625 -0.394415272 -0.461034959 -1.176263223 -0.346624608
3 0 0.271357844 0.472604411 -0.257958971 0.005115101 0.279920067 0.390539458 0.088318617 -0.190472769 -0.121178625 -0.394415272 0.028573372 0.614203978 0.524524468
4 1 0.271357844 0.472604411 -0.257958971 -0.000525072 0.279920067 0.390539458 0.085157808 -0.064538521 -0.121178625 0.032103245 0.586082361 0.614203978 0.108688306
5 0 -0.762140052 0.472604411 -0.354949318 -0.000525072 0.279920067 0.390539458 0.088318617 -0.190472769 -0.121178625 0.032103245 0.586082361 -1.176263223 0.524524468
6 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 0.279920067 -0.258307464 0.088318617 -0.064538521 -0.121178625 -0.235566071 0.028573372 -1.176263223 0.108688306
7 0 0.271357844 0.40444522 -0.354949318 -0.000525072 -0.805625164 0.390539458 0.088318617 -0.190472769 -0.121178625 0.032103245 0.034191365 0.614203978 0.524524468
8 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 -0.410062817 -0.258307464 0.088318617 -0.190472769 -0.121178625 -0.394415272 -0.461034959 -1.176263223 -0.346624608
9 1 0.271357844 -0.194156014 0.131508203 -0.000525072 0.279920067 0.390539458 -0.733740578 0.157300289 -0.121178625 0.431137463 0.034191365 0.614203978 0.108688306
10 1 0.271357844 0.40444522 0.131508203 -0.000525072 0.279920067 0.033661283 0.088318617 -0.064538521 -0.121178625 0.431137463 0.034191365 0.614203978 -0.346624608
11 1 0.271357844 0.40444522 0.48083491 -0.000525072 0.279920067 0.390539458 0.088318617 -0.064538521 -0.121178625 0.431137463 0.028573372 0.614203978 1.134979933
12 0 0.271357844 -0.194156014 0.48083491 -0.000525072 -0.410062817 -0.7282385 0.088318617 -0.190472769 -0.121178625 0.032103245 0.034191365 0.614203978 -0.346624608
13 1 0.271357844 -0.194156014 -0.257958971 -0.000525072 0.279920067 0.033661283 -0.733740578 0.157300289 -0.121178625 -0.235566071 0.034191365 0.614203978 0.108688306
14 0 0.271357844 0.40444522 0.131508203 -0.000525072 0.279920067 -0.7282385 0.088318617 -0.190472769 -0.121178625 0.032103245 0.034191365 0.614203978 -0.346624608
15 1 0.13955188 -0.194156014 0.131508203 -0.000525072 -0.410062817 0.033661283 0.088318617 0.157300289 -0.121178625 0.032103245 0.034191365 0.614203978 0.108688306
16 0 -0.762140052 -0.194156014 -0.257958971 -0.000525072 -0.410062817 -0.258307464 -0.733740578 0.157300289 -0.121178625 -0.235566071 0.028573372 -1.176263223 0.108688306
17 0 -0.762140052 -0.194156014 0.131508203 -0.000525072 0.279920067 0.390539458 1.234070835 -0.190472769 0.477550835 0.431137463 0.034191365 0.614203978 0.108688306
18 1 0.271357844 0.472604411 -0.354949318 -0.000525072 -0.805625164 1.170071253 0.088318617 0.157300289 -0.121178625 -0.235566071 0.586082361 0.614203978 0.108688306
19 0 -0.762140052 -0.194156014 0.131508203 -0.000525072 -0.410062817 -0.258307464 0.088318617 -0.064538521 -0.121178625 -0.235566071 0.034191365 -1.176263223 0.108688306
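
The transformation replaces each raw value with the WOE of the bin it falls into. A minimal sketch for one numeric variable, using the manual age.in.years bins from section 2.2 (intervals closed on the left), could look like this; `age_to_woe` is an illustrative name, not a library function.

```python
from bisect import bisect_right

# Break points and per-bin WOE of the manual age.in.years binning:
# [-inf,25), [25,35), [35,45), [45,inf)
AGE_BREAKS = [25.0, 35.0, 45.0]
AGE_WOE = [0.48083491, 0.131508203, -0.354949318, -0.257958971]


def age_to_woe(age):
    """Map a raw age to the WOE of its left-closed bin."""
    return AGE_WOE[bisect_right(AGE_BREAKS, age)]


# Ages of the first rows of the dataset, see section 1.1.
for age in (67, 22, 49, 45, 35):
    print(age, age_to_woe(age))
```

The results agree with the age.in.years_woe column of the transformed table, for example -0.257958971 for age 67 and 0.48083491 for age 22.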

# 3. Model training

# 3.1 Split the dataset

The output of train2woe is shown below.

age.in.years_woe credit.amount_woe credit.history_woe creditability duration.in.month_woe housing_woe installment.rate.in.percentage.of.disposable.income_woe other.debtors.or.guarantors_woe other.installment.plans_woe present.employment.since_woe property_woe purpose_woe savings.account.and.bonds_woe status.of.existing.checking.account_woe
0 -0.257958971 0.033661283 -0.733740578 0 -1.312186389 -0.194156014 0.157300289 -0.000525072 -0.121178625 -0.235566071 -0.461034959 -0.410062817 -0.762140052 0.614203978
1 0.48083491 0.390539458 0.088318617 1 1.134979933 -0.194156014 -0.190472769 -0.000525072 -0.121178625 0.032103245 -0.461034959 -0.410062817 0.271357844 0.614203978
2 -0.257958971 -0.258307464 -0.733740578 0 -0.346624608 -0.194156014 -0.190472769 -0.000525072 -0.121178625 -0.394415272 -0.461034959 0.279920067 0.271357844 -1.176263223
6 -0.257958971 -0.258307464 0.088318617 0 0.108688306 -0.194156014 -0.064538521 -0.000525072 -0.121178625 -0.235566071 0.028573372 0.279920067 -0.762140052 -1.176263223
7 -0.354949318 0.390539458 0.088318617 0 0.524524468 0.40444522 -0.190472769 -0.000525072 -0.121178625 0.032103245 0.034191365 -0.805625164 0.271357844 0.614203978
8 -0.257958971 -0.258307464 0.088318617 0 -0.346624608 -0.194156014 -0.190472769 -0.000525072 -0.121178625 -0.394415272 -0.461034959 -0.410062817 -0.762140052 -1.176263223
11 0.48083491 0.390539458 0.088318617 1 1.134979933 0.40444522 -0.064538521 -0.000525072 -0.121178625 0.431137463 0.028573372 0.279920067 0.271357844 0.614203978
13 -0.257958971 0.033661283 -0.733740578 1 0.108688306 -0.194156014 0.157300289 -0.000525072 -0.121178625 -0.235566071 0.034191365 0.279920067 0.271357844 0.614203978
16 -0.257958971 -0.258307464 -0.733740578 0 0.108688306 -0.194156014 0.157300289 -0.000525072 -0.121178625 -0.235566071 0.028573372 -0.410062817 -0.762140052 -1.176263223
18 -0.354949318 1.170071253 0.088318617 1 0.108688306 0.472604411 0.157300289 -0.000525072 -0.121178625 -0.235566071 0.586082361 -0.805625164 0.271357844 0.614203978
19 0.131508203 -0.258307464 0.088318617 0 0.108688306 -0.194156014 -0.064538521 -0.000525072 -0.121178625 -0.235566071 0.034191365 -0.410062817 -0.762140052 -1.176263223

# 3.2 Separate the independent and dependent variables

# 3.3 Build, train, and predict: a logistic regression model

coef_: [[0.34206044 0.78274222 0.57196834 0.89780668 0.67956772 1.06219811
  0.         0.23090027 0.7965086  0.22792681 1.07066195 0.83836441
  0.72843684]]
intercept_: [-0.83437247]
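
To make the printed parameters concrete, here is a sketch of how a fitted logistic regression turns WOE features into a probability of being bad. The coefficients and intercept are the printed values; the order of the coefficients relative to the WOE columns is not shown in the output, so the example deliberately avoids assigning them to named variables.

```python
import math

# Printed model parameters (coef_ flattened into a list).
COEF = [0.34206044, 0.78274222, 0.57196834, 0.89780668, 0.67956772,
        1.06219811, 0.0, 0.23090027, 0.7965086, 0.22792681,
        1.07066195, 0.83836441, 0.72843684]
INTERCEPT = -0.83437247


def predict_bad_probability(woe_values, coef=COEF, intercept=INTERCEPT):
    """Sigmoid of the linear combination intercept + sum(coef_i * woe_i)."""
    z = intercept + sum(c * w for c, w in zip(coef, woe_values))
    return 1.0 / (1.0 + math.exp(-z))


# An applicant whose every WOE is 0 sits exactly at the intercept.
neutral = [0.0] * len(COEF)
p = predict_bad_probability(neutral)
print(round(p, 4))  # roughly 0.30, close to the 30% bad rate of the dataset
```

One coefficient is exactly 0, which suggests that a sparsity-inducing (L1) penalty was used, although the training call itself is not shown.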

# 3.4 Model evaluation

Evaluate the model with the perf_eva function.

perf_eva(label, pred, title=None, groupnum=None, plot_type=["ks", "roc"], 
 show_plot=True, positive="bad|1", seed=186)
'''
Purpose: evaluate the model with KS, AUC (ROC), Lift, and PR curves. plot_type accepts "ks", "lift", "roc", and "pr".
The perf_eva() function can ...
'''
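
For readers who want to see what the KS and AUC figures measure, the two statistics can be computed directly from labels and predicted probabilities. This is a plain-Python sketch on made-up toy data, not the perf_eva implementation; the function names are illustrative.

```python
def ks_statistic(labels, scores):
    """Maximum gap between cumulative bad rate and cumulative good rate."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all records tied at this score before measuring the gap.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


def auc(labels, scores):
    """Probability that a random bad scores higher than a random good."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = bad, 0 = good; scores are predicted bad probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(ks_statistic(labels, scores), 4))  # 0.6667
print(round(auc(labels, scores), 4))           # 0.8889
```

The pairwise AUC is quadratic in the sample size, which is fine for a demonstration but too slow for large portfolios.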

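# 4. Model deployment and monitoring

# 4.1 Model inference: computing credit scores

Use the scorecard function to map probabilities onto scorecard points. The result covers both each customer's final score and the points contributed by each variable.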

scorecard(bins, model, xcolumns, points0=600, odds0=1/19, pdo=50, basepoints_eq0=False)
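'''
Purpose: map probabilities onto scorecard points.
Parameters:
bins: binning information, as returned by woebin().
model: the fitted model object.
xcolumns: names of the x columns used to train the model.
points0: target points, default 600.
odds0: target odds (bad to good), default 1/19.
pdo: points to double the odds, default 50.
basepoints_eq0: if True, the base points are distributed across the variables.
'''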
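
The mapping from model parameters to points follows the usual scorecard scaling: factor = pdo / ln(2), offset = points0 + factor * ln(odds0), base points = offset - factor * intercept, and bin points = -factor * coefficient * WOE. The sketch below applies it to the manual age.in.years bins. It assumes that the first printed coefficient belongs to age.in.years_woe, which is not stated in the output but gives results consistent with the age.in.years entry of the printed card.

```python
import math


def scaling_constants(points0=600, odds0=1 / 19, pdo=50):
    """Return (offset, factor) of the linear log-odds-to-points mapping."""
    factor = pdo / math.log(2)
    offset = points0 + factor * math.log(odds0)
    return offset, factor


offset, factor = scaling_constants()

intercept = -0.83437247   # printed intercept_
age_coef = 0.34206044     # assumption: first printed coefficient is age.in.years_woe
age_woe = [0.48083491, 0.131508203, -0.354949318, -0.257958971]

base_points = round(offset - factor * intercept)
age_points = [round(-factor * age_coef * w) for w in age_woe]

print(base_points)  # 448
print(age_points)   # [-12, -3, 9, 6]
```

Because the odds are defined as bad to good, a higher WOE (a riskier bin) yields fewer points, so higher total scores mean lower risk.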

print('card_dict_age.in.years \n', card_dict['age.in.years'])
print('card_dict_credit.amount \n', card_dict['credit.amount'])
print('card_dict_credit.history \n', card_dict['credit.history'])
print('card_dict_duration.in.month \n', card_dict['duration.in.month'])
print('card_dict_housing \n', card_dict['housing'])

card_dict_age.in.years 
         variable          bin  points
10  age.in.years  [-inf,25.0)   -12.0
11  age.in.years  [25.0,35.0)    -3.0
12  age.in.years  [35.0,45.0)     9.0
13  age.in.years   [45.0,inf)     6.0
card_dict_credit.amount 
          variable              bin  points
31  credit.amount    [-inf,1400.0)    -2.0
32  credit.amount  [1400.0,1800.0)    41.0
33  credit.amount  [1800.0,4000.0)    15.0
34  credit.amount  [4000.0,9200.0)   -22.0
35  credit.amount     [9200.0,inf)   -66.0
card_dict_credit.history 
           variable                                                bin  points
17  credit.history  no credits taken/ all credits paid back duly%,...   -51.0
18  credit.history           existing credits paid back duly till now    -4.0
19  credit.history                    delay in paying off in the past    -4.0
20  credit.history  critical account/ other credits existing (not ...    30.0
card_dict_duration.in.month 
              variable          bin  points
23  duration.in.month   [-inf,8.0)    85.0
24  duration.in.month   [8.0,16.0)    22.0
25  duration.in.month  [16.0,34.0)    -7.0
26  duration.in.month  [34.0,44.0)   -34.0
27  duration.in.month   [44.0,inf)   -74.0
card_dict_housing 
    variable       bin  points
42  housing      rent   -20.0
43  housing       own    10.0
44  housing  for free   -23.0
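
Scoring an applicant then amounts to looking up the points of each variable's bin and adding them up. The sketch below does this for three of the printed variables only, so the result is a partial score that leaves out the base points and the remaining variables; the helper names are hypothetical.

```python
from bisect import bisect_right

# (break points, points per bin) copied from the printed card;
# numeric bins are closed on the left.
AGE = ([25.0, 35.0, 45.0], [-12.0, -3.0, 9.0, 6.0])
DURATION = ([8.0, 16.0, 34.0, 44.0], [85.0, 22.0, -7.0, -34.0, -74.0])
HOUSING = {"rent": -20.0, "own": 10.0, "for free": -23.0}


def numeric_points(value, breaks, points):
    """Points of the left-closed bin that contains value."""
    return points[bisect_right(breaks, value)]


def partial_score(applicant):
    """Sum of the points for age, duration, and housing only."""
    return (numeric_points(applicant["age.in.years"], *AGE)
            + numeric_points(applicant["duration.in.month"], *DURATION)
            + HOUSING[applicant["housing"]])


# First two rows of the dataset, see section 1.1.
row0 = {"age.in.years": 67, "duration.in.month": 6, "housing": "own"}
row1 = {"age.in.years": 22, "duration.in.month": 48, "housing": "own"}
print(partial_score(row0))  # 101.0
print(partial_score(row1))  # -76.0
```

The first applicant (good) collects far more points on these three variables than the second (bad), mainly because of the short loan duration.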

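# 4.2 Online model evaluation: score stability with PSI

# Use the scorecard_ply() function to compute credit scores for the train and test datasets

scorecard_ply(dt, card, only_total_score=True, print_step=0, replace_blank_na=True, 
 var_kp=None)
'''
Purpose: compute credit scores from the result of `scorecard`, mapping each record onto scorecard points.

dt: the original data.
card: the scorecard generated by `scorecard`.
only_total_score: boolean, default True. If True, the output contains only the total credit score; if False, it contains the total score and the score of each variable.
print_step: a non-negative integer, documented as defaulting to 1 although the signature above shows 0. If print_step>0, variable names are printed at every print_step-th iteration. If print_step=0, nothing is printed.
replace_blank_na: boolean, default True. Replace blank values with NA. This should match the corresponding setting used in woebin.
var_kp: names of variables to force-keep in the output, such as an id column. Default None.
'''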
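
Once train and test scores are available, their stability is usually summarized by the population stability index, PSI = sum over score bins of (actual share - expected share) * ln(actual share / expected share). The following sketch computes it from per-bin counts. The counts are invented for illustration, and the function assumes that no bin is empty.

```python
import math


def psi(expected_counts, actual_counts):
    """Population stability index between two binned score distributions."""
    total_expected = sum(expected_counts)
    total_actual = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        expected_share = e / total_expected
        actual_share = a / total_actual
        value += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return value


# Illustrative counts per score band (not taken from the dataset).
train_counts = [200, 300, 500]
test_counts = [30, 30, 40]

print(round(psi(train_counts, test_counts), 4))   # 0.0629
print(psi(train_counts, train_counts))            # 0.0
```

A common rule of thumb treats a PSI below 0.1 as stable and a value above 0.25 as a significant shift, but the thresholds are conventions rather than hard limits.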


Reposted from blog.csdn.net/qq_41185868/article/details/125400249