Building a Scorecard from 0 to 1: Model Building

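Scorecards are usually built with logistic regression. The probability the model outputs is mapped to a score, which yields a standard scorecard. For the logic behind this mapping, see my earlier article "Logistic Regression Scorecard Mapping Logic".
Picking up from the binned data, we now build the model. First, collect each variable's binning result into a single WOE result table.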

import pandas as pd

# Build the WOE result table for all variables
def woe_df_concat(bin_df):
    """
    bin_df: list of binning results, one DataFrame per variable,
            indexed by bin, with index.name set to the variable name

    return: WOE result table, one row per (variable, bin)
    """
    woe_df_list = []
    for df in bin_df:
        woe_df = (df.reset_index()
                    .rename(columns={df.index.name: 'bin'})
                    .assign(col=df.index.name))
        woe_df_list.append(woe_df)
    woe_result = pd.concat(woe_df_list, axis=0, ignore_index=True)
    # For readability, move the variable-name column to the front
    cols = ['col'] + [c for c in woe_result.columns if c != 'col']
    return woe_result[cols]

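# bin_df_cat / bin_df_num: binning results of the categorical / numeric variables from the binning step
df_woe_cat = woe_df_concat(bin_df_cat)
df_woe_num = woe_df_concat(bin_df_num)
df_woe = pd.concat([df_woe_cat, df_woe_num], axis=0, ignore_index=True)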

This step collects and summarizes the binning results so we can get an overall picture of how each variable was binned.

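The resulting table shows each variable's bins along with each bin's good/bad proportions, WOE, and IV. Note: ideally, no bin should have a WOE above 1. Here is code that checks whether any bin's WOE exceeds 1.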

import pandas as pd

# Check whether any bin of a variable has a WOE greater than 1
def woe_large(bin_df):
    """
    bin_df: list of binning results, one DataFrame per variable
            (index.name = variable name, with a 'woe' column)

    return:
    woe_large_col: list of variables with at least one bin whose WOE is greater than 1
    woe_judge_df: DataFrame with the check result for each variable
    """
    woe_large_col = []
    col_list = []
    woe_judge = []
    for woe_df in bin_df:
        col_name = woe_df.index.name
        has_large = bool((woe_df['woe'] > 1).any())
        col_list.append(col_name)
        woe_judge.append(has_large)
        if has_large:
            woe_large_col.append(col_name)
    woe_judge_df = pd.DataFrame({'col': col_list,
                                 'judge_large': woe_judge})
    return woe_large_col, woe_judge_df
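For reference, here is a minimal sketch of how the WOE and IV columns in such a table are usually computed from per-bin bad/good counts. It assumes the convention WOE = ln(bad share / good share); some implementations use the reverse ratio, which flips every sign. The counts are made up for illustration. Because WOE can be negative, the sketch also shows a stricter variant of the check that looks at |WOE|.

```python
import numpy as np

def woe_iv(bad, good):
    """Per-bin WOE and total IV from bad/good counts.

    Assumed convention: WOE_i = ln((bad_i / total_bad) / (good_i / total_good)).
    """
    bad = np.asarray(bad, dtype=float)
    good = np.asarray(good, dtype=float)
    bad_share = bad / bad.sum()     # share of all bads falling in each bin
    good_share = good / good.sum()  # share of all goods falling in each bin
    woe = np.log(bad_share / good_share)
    iv = float(np.sum((bad_share - good_share) * woe))
    return woe, iv

# Hypothetical counts for a 3-bin variable
woe, iv = woe_iv(bad=[10, 30, 60], good=[300, 400, 300])
print(np.round(woe, 3), round(iv, 3))
print('any |WOE| > 1:', bool((np.abs(woe) > 1).any()))
```

A bin with zero bads or zero goods makes the log infinite, so in practice such bins are merged or the counts are smoothed before computing WOE.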

Next, replace each variable's raw values with the corresponding WOE values to prepare them for modeling.

import numpy as np
import pandas as pd

# WOE transformation
def woe_transform(df, target, df_woe):
    """
    df: dataset
    target: name of the target column
    df_woe: WOE result table (columns include col, min_bin, max_bin, woe)

    return: dataset with every feature replaced by its WOE value
    """
    df2 = df.copy()
    for col in df2.drop([target], axis=1).columns:
        x = df2[col]
        bin_map = df_woe[df_woe['col'] == col]
        bin_res = np.zeros(x.shape[0], dtype=float)  # values matching no bin stay 0
        for _, row in bin_map.iterrows():
            lower, upper = row['min_bin'], row['max_bin']
            if lower == upper:
                # Single-value bin (e.g. a category or a special value)
                mask = (x == lower).to_numpy()
            else:
                # Interval bin, closed on both ends: [lower, upper]
                mask = ((x >= lower) & (x <= upper)).to_numpy()
            bin_res[mask] = row['woe']
        df2[col] = pd.Series(bin_res, index=x.index, name=col)
    return df2
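To see the mapping on concrete values, here is a small sketch using a hypothetical helper `apply_woe` and a made-up WOE table for a hypothetical `age` variable. It applies the same closed-interval rule as `woe_transform`, on a Series with a non-default index (as you would get after a train/test split); the boolean mask is applied by position, so the index does not matter.

```python
import numpy as np
import pandas as pd

def apply_woe(x, woe_table):
    """Hypothetical helper: map one column's raw values to WOE.

    woe_table: rows with min_bin, max_bin, woe for this column; bins are
    treated as closed intervals [min_bin, max_bin], as in woe_transform.
    """
    out = np.zeros(len(x), dtype=float)  # values outside every bin stay 0
    for _, row in woe_table.iterrows():
        mask = ((x >= row['min_bin']) & (x <= row['max_bin'])).to_numpy()
        out[mask] = row['woe']
    return pd.Series(out, index=x.index, name=x.name)

# Made-up WOE table for a hypothetical 'age' variable
table = pd.DataFrame({'col': ['age'] * 3,
                      'min_bin': [18, 30, 50],
                      'max_bin': [29, 49, 80],
                      'woe': [0.45, -0.10, -0.60]})
# Non-default index, as after a train/test split
ages = pd.Series([22, 35, 61, 29], index=[10, 11, 12, 13], name='age')
print(apply_woe(ages, table[table['col'] == 'age']))
```

With integer bin edges like these, a value such as 29.5 falls between bins and gets 0, which mirrors what `woe_transform` does for unmatched values.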

After this transformation, every variable's values have been replaced by their WOE values, and the data is ready for modeling.

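from sklearn.linear_model import LogisticRegression

# df_train is assumed to be the WOE-transformed training set (output of woe_transform)
feature_list = num_features + cat_features
x = df_train[feature_list]
y = df_train['y']

lr_model = LogisticRegression(C=0.1)  # C is the inverse regularization strength; smaller = stronger
lr_model.fit(x, y)
# Predicted probability of class y=1 (assumed here to be "bad")
df_train['prob'] = lr_model.predict_proba(x)[:, 1]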

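The modeling code is simple: these few lines complete the logistic regression. Next comes score mapping, and the code for it relies on the scorecard score-mapping logic.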
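The mapping works because logistic regression is linear in the log-odds: ln(p / (1 - p)) = b0 + Σ w_i·x_i, so any score defined as a linear function of ln(odds) is also linear in the WOE features. A minimal numpy sketch with hypothetical coefficients (not the fitted `lr_model`) shows that inverting the predicted probability recovers the linear term exactly.

```python
import numpy as np

# Hypothetical intercept and coefficients for two WOE features
b0 = -2.0
w = np.array([0.8, 1.2])
X = np.array([[0.45, -0.10],
              [-0.60, 0.30],
              [0.00, 0.00]])

linear = b0 + X @ w                    # model's log-odds for each row
prob = 1.0 / (1.0 + np.exp(-linear))   # sigmoid: what predict_proba(x)[:, 1] returns
log_odds = np.log(prob / (1 - prob))   # invert the sigmoid

print(np.allclose(log_odds, linear))
```

With scikit-learn, the same linear term for a binary model is available from `lr_model.decision_function(x)`, which should agree with `np.log(p / (1 - p))` up to floating-point error.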

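import numpy as np

# Scorecard scaling: compute A and B
def cal_scale(score, odds, PDO, model=None):
    """
    score: score assigned at the reference odds
    odds: reference odds (bad-to-good ratio)
    PDO: points to double the odds
    model: fitted logistic regression model (only used for the optional base score)

    return: A, B
    """
    B = PDO / np.log(2)
    A = score + B * np.log(odds)
    print('B: {:.2f}'.format(B))
    print('A: {:.2f}'.format(A))
    if model is not None:
        # Base score = contribution of the intercept: A - B * intercept
        base_score = A - B * model.intercept_[0]
        print('Base score: {:.2f}'.format(base_score))
    return A, B

A, B = cal_scale(50, 0.05, 10, lr_model)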

Assuming a score of 50 at bad-to-good odds of 5% and a PDO of 10 points, the two parameters come out as A = 6.78 and B = 14.43.
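These two numbers can be checked by hand from the definitions B = PDO / ln 2 and A = score + B·ln(odds), using the same assumed settings (score 50, odds 0.05, PDO 10):

```python
import math

score, odds, pdo = 50, 0.05, 10
B = pdo / math.log(2)           # points per doubling of the odds
A = score + B * math.log(odds)  # offset so that odds of 0.05 score exactly 50
print(round(A, 2), round(B, 2))
```

Since ln(0.05) is negative, A ends up well below 50: B·ln(0.05) ≈ -43.22.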

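import numpy as np

def Prob2Score(prob, A, B):
    # Convert a predicted bad probability to a score: A - B * ln(p / (1 - p))
    log_odds = np.log(prob / (1 - prob))
    return float(A - B * log_odds)

# A = 6.78, B = 14.43 from cal_scale above
df_train['score'] = df_train['prob'].map(lambda p: Prob2Score(p, 6.78, 14.43))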
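A vectorized variant avoids the row-by-row `map`. This sketch uses a hypothetical function name `prob_to_score` and keeps A and B at full precision, then confirms the two defining properties of the scaling: odds of 0.05 score 50, and doubling the odds lowers the score by exactly PDO = 10.

```python
import numpy as np

# Scaling from the assumed settings (score 50 at odds 0.05, PDO 10), full precision
B = 10 / np.log(2)
A = 50 + B * np.log(0.05)

def prob_to_score(prob, A, B):
    """Vectorized version of Prob2Score: A - B * ln(p / (1 - p))."""
    prob = np.asarray(prob, dtype=float)
    return A - B * np.log(prob / (1 - prob))

odds = np.array([0.05, 0.10])  # reference odds, then doubled
prob = odds / (1 + odds)       # bad probability implied by each odds value
scores = prob_to_score(prob, A, B)
print(scores.round(2))
print(scores[0] - scores[1])   # score drop when the odds double
```

On a DataFrame, `prob_to_score(df_train['prob'], A, B)` would return a NumPy array of the same length. A probability of exactly 0 or 1 gives an infinite score, so such values may need clipping first.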

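The score is computed as score = A - B*ln(odds), where odds = p/(1 - p). With that, the scorecard is complete. The model still needs to be evaluated and its scores monitored over time; I covered both in earlier articles.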
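Because ln(odds) = b0 + Σ w_i·WOE_i, the same formula splits into the points table of a standard scorecard: score = (A - B·b0) + Σ(-B·w_i·WOE_i). The first term is the base score, and each bin of variable i is worth -B·w_i·WOE_bin points. The sketch below uses hypothetical coefficients, variables, and WOE values (not the article's model) and checks that adding up table points gives the same score as the probability route.

```python
import numpy as np

B = 10 / np.log(2)
A = 50 + B * np.log(0.05)

# Hypothetical fitted model on two WOE features
b0 = -2.0
w = {'age': 0.8, 'income': 1.2}
# Hypothetical WOE per bin for each feature
woe_bins = {'age': {'18-29': 0.45, '30-49': -0.10, '50+': -0.60},
            'income': {'low': 0.30, 'high': -0.40}}

base_score = A - B * b0
points = {col: {b: -B * w[col] * v for b, v in bins.items()}
          for col, bins in woe_bins.items()}

# Score one applicant from the points table...
applicant = {'age': '30-49', 'income': 'low'}
table_score = base_score + sum(points[c][b] for c, b in applicant.items())

# ...and via the probability route (score = A - B * ln(odds))
log_odds = b0 + sum(w[c] * woe_bins[c][b] for c, b in applicant.items())
prob = 1 / (1 + np.exp(-log_odds))
prob_score = A - B * np.log(prob / (1 - prob))

print(round(table_score, 2), round(prob_score, 2))
```

In practice the per-bin points are usually rounded to integers for the published scorecard, which introduces a small difference from the exact probability-based score.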

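This scorecard series is my own step-by-step re-implementation of code and datasets I found online, written to record my learning process. Next, I plan to find some machine learning model code to practice on. As a beginner, I have surely misunderstood some points, so corrections are very welcome. Tracking my progress; let's learn and grow together.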

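[Author]: Labryant
[Original WeChat official account]: 风控猎人
[About]: Strategy analyst at a startup, motivated and working to improve. Nothing is settled yet; any of us could be a dark horse.
[Reposting]: Please credit the source when reposting. Thank you!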
