Credit scorecard (WOE and IV values): Supplement

Customer life cycle: describes the stages a customer goes through with a product or service, including the consideration stage, the purchase stage, and the post-purchase stage.

Customer lifetime value (CLV): the total value a customer is expected to generate over the entire customer life cycle. CLV can serve as an indicator of the strength of the customer relationship.

Customer information: the input for customer analysis. The purpose of customer analysis is to find the right perspective for formulating strategy, so as to acquire and retain customers as effectively as possible and to identify high-value customers. It falls into four types:

  • Descriptive information: basic customer attributes, such as gender, age, geographic location, and income.
  • Behavioral information: the general patterns customers show when using products and services, such as purchases, registration, browsing, and use of different devices.
  • Interaction information: how customers interact with the website, used to test how the website or software performs in practice (the most important metric is the conversion rate).
  • Attitudinal information: customer preferences, such as likes, choices, desires, brand recognition, and feelings.

FICO scores weight each scoring category according to its importance for the general population. For certain groups (such as people who have only just started using credit cards), the importance of each category may differ.

Credit scorecard process (data mining)
raw data -> extract subset -> read data -> clean data

In a credit scorecard, the target Y must be binary: 1 or 0.

WOE (weight of evidence): describes how much evidence a bin provides for the prediction. The larger the value, the stronger the evidence that rows in the bin should be predicted as 1; the smaller the value, the stronger the evidence that they should be predicted as 0. A value of 0 means the bin provides no evidence either way. WOE is mainly used during binning to judge how well the bins separate the two classes.
When computing WOE, abnormal values need to be handled: the WOE of every bin must be finite (not infinite).
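
The original formula image is not available, so the following is only a minimal sketch under one common convention that matches the description above: WOE_i = ln((share of all Y=1 rows in bin i) / (share of all Y=0 rows in bin i)). The function name `woe_per_bin`, the `smoothing` parameter, and the sample counts are hypothetical. Adding a small constant to an empty cell is one possible way to keep WOE finite, not necessarily the author's.

```python
import math

def woe_per_bin(counts, smoothing=0.5):
    """Compute WOE for each bin.

    counts: {bin_label: (n_y1, n_y0)}, the number of Y=1 and Y=0 rows per bin.
    smoothing: assumed constant added to both counts of a bin that has a
               zero cell, so that the logarithm stays finite.
    """
    total_y1 = sum(n1 for n1, _ in counts.values())
    total_y0 = sum(n0 for _, n0 in counts.values())
    woe = {}
    for label, (n1, n0) in counts.items():
        if n1 == 0 or n0 == 0:
            # Without this adjustment the ratio would be 0 or undefined.
            n1, n0 = n1 + smoothing, n0 + smoothing
        woe[label] = math.log((n1 / total_y1) / (n0 / total_y0))
    return woe

# Hypothetical bin counts, for illustration only.
counts = {"age<30": (30, 70), "30-50": (50, 150), "age>50": (20, 180)}
woe = woe_per_bin(counts)
for label, value in woe.items():
    print(f"{label}: {value:+.3f}")
```

Under this convention a positive WOE means the bin holds more than its share of Y=1 rows, and a negative WOE means it holds more than its share of Y=0 rows.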

A WOE plot can also be drawn from the WOE values. It shows the relationship between a single X and Y, and how Y trends as X changes.

Binning followed by WOE encoding turns a categorical variable into a continuous one: after binning, compute the WOE of each bin to obtain a new WOE column. This column can be treated as a continuous variable because it measures how well each bin discriminates between Y = 0 and Y = 1. The WOE column is what is finally fed into the logistic regression.
(Important) Every column fed into the final logistic regression of a credit scorecard is a column of WOE values.
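
As an illustration of the encoding step, here is a small sketch that bins a numeric column by fixed cut points and replaces each value with the WOE of its bin. The function `woe_encode`, the cut points, and the sample data are all hypothetical, and the WOE convention (share of Y=1 over share of Y=0) is an assumption. The sketch also assumes every bin contains at least one Y=1 and one Y=0 row.

```python
import math
from bisect import bisect_right

def woe_encode(x, y, edges):
    """Replace each value in x with the WOE of the bin it falls into.

    edges: sorted cut points; bin k holds values v with
           edges[k-1] <= v < edges[k] (bin 0 is everything below edges[0]).
    Returns (encoded_column, woe_by_bin).
    """
    bins = [bisect_right(edges, v) for v in x]
    total_y1 = sum(y)
    total_y0 = len(y) - total_y1
    n1, n0 = {}, {}
    for b, label in zip(bins, y):
        n1[b] = n1.get(b, 0) + label
        n0[b] = n0.get(b, 0) + (1 - label)
    woe_by_bin = {
        b: math.log((n1[b] / total_y1) / (n0[b] / total_y0)) for b in n1
    }
    return [woe_by_bin[b] for b in bins], woe_by_bin

# Hypothetical data, for illustration only.
ages = [22, 25, 28, 35, 40, 45, 55, 60, 65, 70]
y = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
age_woe, woe_by_bin = woe_encode(ages, y, edges=[30, 50])
print(age_woe)
```

The resulting `age_woe` list is the kind of column that would be passed to the logistic regression in place of the raw `ages` values.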

IV (information value): the information value of each bin. "Information" here means predictive power.
Note: this is different from the information discussed previously for decision trees. There, information measures the degree of disorder; here it refers to predictive power. The larger the value, the stronger the bin's predictive power. The sum of the IV values over all bins of a column gives the total predictive power of the variable, so the larger the IV, the more predictive information the variable carries and the more important it is.
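
The per-bin and total IV described above can be sketched as follows. This assumes the commonly used definition IV_i = (p1_i - p0_i) * WOE_i, where p1_i and p0_i are the bin's shares of all Y=1 and Y=0 rows. The function name and the counts are hypothetical, and the sketch assumes no bin has a zero count.

```python
import math

def information_value(counts):
    """Return (iv_per_bin, total_iv) for one variable.

    counts: {bin_label: (n_y1, n_y0)} with no zero cells.
    """
    total_y1 = sum(n1 for n1, _ in counts.values())
    total_y0 = sum(n0 for _, n0 in counts.values())
    iv_per_bin = {}
    for label, (n1, n0) in counts.items():
        p1 = n1 / total_y1          # share of all Y=1 rows in this bin
        p0 = n0 / total_y0          # share of all Y=0 rows in this bin
        woe = math.log(p1 / p0)
        iv_per_bin[label] = (p1 - p0) * woe
    return iv_per_bin, sum(iv_per_bin.values())

# Hypothetical bin counts, for illustration only.
counts = {"age<30": (30, 70), "30-50": (50, 150), "age>50": (20, 180)}
iv_per_bin, total_iv = information_value(counts)
print(iv_per_bin, round(total_iv, 3))
```

Because (p1 - p0) and ln(p1 / p0) always share the same sign, each bin's IV is non-negative, which is why the bin values can simply be summed.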

How to read the IV value:
  • IV < 0.02: almost no help for prediction
  • 0.02 <= IV < 0.1: some help
  • 0.1 <= IV < 0.3: considerable help
  • IV >= 0.3: very great help
  • IV > 0.5: treat with caution; the variable may be too good to be true
  • IV > 1: the variable must not be used
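
The thresholds above can be written as a small lookup. This is only a sketch: the function name `interpret_iv` and the wording of the labels are hypothetical, and the overlapping bands (IV >= 0.3, IV > 0.5, IV > 1) are resolved by checking the strictest condition first.

```python
def interpret_iv(iv):
    """Map an IV value to the reading given by the thresholds in the text."""
    if iv > 1.0:
        return "do not use"
    if iv > 0.5:
        return "treat with caution"
    if iv >= 0.3:
        return "very great help"
    if iv >= 0.1:
        return "considerable help"
    if iv >= 0.02:
        return "some help"
    return "almost no help"

for value in (0.01, 0.05, 0.2, 0.4, 0.7, 1.3):
    print(value, "->", interpret_iv(value))
```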

How can you check whether a variable is usable when IV > 0.5?
Test the variable on new data to see whether it still holds up. Note also that IV > 0.5 often appears when the variable has a large number of missing values.
In short, a ranking by IV can also serve as a variable importance ranking.
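
A ranking and filter based on IV might look like the sketch below. The variable names, the IV numbers, and the `low`/`high` cut-offs are hypothetical and chosen only to mirror the thresholds in the text. Variables above `high` are set aside for checking on new data rather than dropped outright.

```python
def rank_by_iv(iv_by_variable, low=0.02, high=0.5):
    """Rank variables by IV and split them into keep / review lists.

    low, high: assumed cut-offs; below `low` a variable is dropped,
               above `high` it is flagged for a check on new data.
    """
    ranked = sorted(iv_by_variable.items(), key=lambda kv: kv[1], reverse=True)
    keep = [name for name, iv in ranked if low <= iv <= high]
    review = [name for name, iv in ranked if iv > high]
    return ranked, keep, review

# Hypothetical IV values, for illustration only.
iv_by_variable = {
    "age": 0.31,
    "income": 0.12,
    "zip_code": 0.01,
    "overdue_count": 0.74,
}
ranked, keep, review = rank_by_iv(iv_by_variable)
print("ranking:", [name for name, _ in ranked])
print("keep:", keep)
print("review on new data:", review)
```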

Supplement: besides IV, a random forest can also produce a variable importance ranking.

Summary: WOE values are the inputs to the logistic regression; IV values rank variable importance so that variables can be filtered.

odds: P(good) / P(bad)
ln(odds) for one person/category: within a single category, the log of the ratio of good to bad customers
WOE: compares this category with the whole population, that is, how the category's good/bad mix differs from the overall mix

odds = P(good) / P(bad)

The log odds and the score are linearly related. With P = P(bad), so that odds = (1 - P) / P:
ln(odds) = -ln(P / (1 - P))
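
To make the relationship concrete, the sketch below converts a probability of being bad into log odds and then into a score. The linear form score = offset + factor * ln(odds) is a common scorecard scaling, but the specific `offset` of 600 and the "20 points to double the odds" factor are assumptions for illustration and do not come from the source.

```python
import math

def log_odds_good(p_bad):
    """ln(odds) with odds = P(good) / P(bad) = (1 - P) / P, for 0 < P < 1."""
    return -math.log(p_bad / (1.0 - p_bad))

def score(p_bad, offset=600.0, factor=20.0 / math.log(2)):
    """Linear map from log odds to a score.

    offset: assumed score at odds of 1:1.
    factor: assumed scaling so that doubling the odds adds 20 points.
    """
    return offset + factor * log_odds_good(p_bad)

for p in (0.5, 1 / 3, 0.1):
    print(f"P(bad)={p:.3f}  ln(odds)={log_odds_good(p):+.3f}  score={score(p):.1f}")
```

A lower probability of being bad gives higher odds and therefore a higher score, which is the direction the formula above implies.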


Origin blog.csdn.net/weixin_41636030/article/details/90269621