Understanding Scorecard Development in One Article: Determining Y (Vintage Analysis, Roll Rate Analysis, etc.)

Scorecards are used in production at major banks and companies, and many practitioners have already described them in detail. This article compares how the dependent variable Y is determined when building scorecards in different industries, taking payment and credit scorecards as examples. The goal is to give readers who want to understand scorecards a deeper grasp of the topic and to help them generalize from these examples to apply scorecards in more industries.
  
  
  

1. What is a scorecard?

  
In risk control, a scorecard measures a customer's risk in the form of a score. A familiar example is the Sesame Credit score, which ranges from 350 to 950 points: the higher the score, the better the credit.
  
Users with 350-550 points have poor credit and can hardly access Alipay's credit-based benefits.

Users with 550-600 points have medium credit and can enjoy some benefits, such as deposit-free hotel stays and access to the Huabei service.

Users with 600-650 points have good credit and can enjoy more benefits: in addition to those above, they can use some travel services without a deposit.

Users with 650-700 points have very good credit and can obtain higher credit limits on Huabei and Jiebei.

Users with 700-950 points have excellent credit and enjoy added convenience when applying for visas to some countries.
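The bands above amount to a lookup from score to credit tier. A minimal Python sketch, assuming each band includes its lower bound (the article's ranges share endpoints, so where exactly 550, 600, 650, and 700 fall is our assumption); the tier names are our own labels:

```python
from bisect import bisect_right

# Lower bound of each Sesame Credit band listed above. Treating the lower
# bound as inclusive is an assumption: the article's ranges share endpoints.
BAND_FLOORS = [350, 550, 600, 650, 700]
BAND_NAMES = ["poor", "medium", "good", "very good", "excellent"]


def credit_band(score: int) -> str:
    """Return the credit band for a Sesame Credit score in [350, 950]."""
    if not 350 <= score <= 950:
        raise ValueError(f"score {score} is outside the 350-950 range")
    # bisect_right counts how many floors are <= score; the band index is one less.
    return BAND_NAMES[bisect_right(BAND_FLOORS, score) - 1]


print(credit_band(580))  # medium
print(credit_band(700))  # excellent
```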
  
This article focuses on how to determine Y when building scorecards in the payment and credit fields. For scorecard principles and a Python implementation, see the article "Scorecard Principle and Python Implementation".

  
  

2. How to determine the dependent variable Y in the scorecard?

  
Different industries define Y differently in scorecard modeling, though the definitions share common ground. This article uses the payment field and the credit field as entry points to show how the dependent variable Y is determined.

  

1 Determining the dependent variable Y in the payment field

  
In the payment field, defining the dependent variable Y is relatively simple. Suppose a company holds onboarding and transaction data for 100 million merchants and wants to build a model that identifies which merchants carry gambling risk. A merchant can be labeled 1 (bad sample) if one of its historical closure reasons in the system contains the word "gambling" and its current status is closed.
  
Why must the merchant's current status be closed? Some merchants' transactions resemble gambling transactions, so audits can produce false positives. If a merchant flagged as gambling submits supporting materials in an appeal proving that it operates normally and within its registered business scope, its account is reopened and its status is reset to normal. When bad samples are plentiful, sometimes for computational convenience only merchants flagged as gambling and closed within the last two years are taken as bad samples.
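The bad-sample rule above can be expressed as a filter. This is a sketch only: the `Merchant` record, its field names, the status strings, and the English keyword "gambling" are hypothetical stand-ins for whatever the real merchant system stores, and the optional window approximates "the last two years" as 730 days.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Merchant:
    # Hypothetical merchant record; real field names depend on the payment system.
    merchant_id: str
    status: str  # current status, e.g. "closed" or "normal"
    closure_reasons: List[str] = field(default_factory=list)  # historical closure reasons
    closed_on: Optional[date] = None  # date of the most recent closure


def is_gambling_bad(m: Merchant, as_of: date, window_days: Optional[int] = None) -> bool:
    """Y = 1 if a historical closure reason mentions gambling AND the merchant is still closed.

    If window_days is given, only merchants closed within that many days before
    `as_of` count as bad (e.g. 730 for roughly the last two years).
    """
    if m.status != "closed":
        return False  # reopened after a successful appeal, so not a bad sample
    if not any("gambling" in reason.lower() for reason in m.closure_reasons):
        return False
    if window_days is None:
        return True
    return m.closed_on is not None and (as_of - m.closed_on).days <= window_days


today = date(2022, 6, 1)
print(is_gambling_bad(Merchant("m1", "closed", ["Suspected gambling"], date(2021, 9, 1)), today, 730))  # True
print(is_gambling_bad(Merchant("m2", "normal", ["Suspected gambling"], date(2021, 3, 1)), today))  # False
```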
  
After defining the bad sample, what is defined as 0 (good sample)?
  
There are two options. One is to label every merchant whose current status is normal as 0; the other is to label as 0 only merchants whose current status is normal and that have never been audited as gambling. Because normal merchants usually far outnumber bad ones, a subset of normal merchants is typically sampled in proportion to the number of bad samples, and matched to them in time, before building the model.
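A sketch of both good-sample definitions plus sampling matched to the bad samples in proportion and time. The dict keys (`status`, `ever_flagged_gambling`, `month`), the month-by-month matching, and the default ratio of 3 are illustrative assumptions, not a prescribed procedure.

```python
import random
from typing import Dict, List


def is_good(merchant: dict, strict: bool = False) -> bool:
    """Y = 0 under the two definitions in the text.

    strict=False: current status is normal.
    strict=True:  current status is normal AND never audited as gambling.
    Keys 'status' and 'ever_flagged_gambling' are hypothetical.
    """
    if merchant["status"] != "normal":
        return False
    return not (strict and merchant["ever_flagged_gambling"])


def sample_goods(goods: List[dict], bad_counts: Dict[str, int],
                 ratio: int = 3, seed: int = 0) -> List[dict]:
    """Draw up to `ratio` good merchants per bad merchant, month by month.

    `bad_counts` maps a month label (e.g. "2022-01") to the number of bad samples
    in that month; each good merchant carries the same hypothetical 'month' key,
    so the sample follows the bad samples' time distribution.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sampled: List[dict] = []
    for month, n_bad in sorted(bad_counts.items()):
        pool = [g for g in goods if g["month"] == month]
        sampled.extend(rng.sample(pool, min(len(pool), ratio * n_bad)))
    return sampled


merchants = [
    {"id": i, "status": "normal", "ever_flagged_gambling": i % 5 == 0,
     "month": "2022-01" if i < 50 else "2022-02"}
    for i in range(100)
]
goods = [m for m in merchants if is_good(m, strict=True)]
picked = sample_goods(goods, {"2022-01": 4, "2022-02": 2})
print(len(picked))  # 18
```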

  

2 Determining the dependent variable Y in the credit field

  
For a fraud model in the credit field, Y can usually be defined by whether the first installment goes overdue (first-installment delinquency). As with the payment scorecard, this definition is relatively simple. For a credit risk model in the credit field, however, determining Y is more complicated and generally requires combining roll rate analysis with vintage analysis.
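As a sketch of the first-installment label for a fraud model, the function below counts days overdue from the day after the due date (the DPD convention defined later in this article) and, by default, treats any overdue first installment as bad. The `min_dpd` threshold and the choice to return `None` while the outcome is still unknown are our assumptions.

```python
from datetime import date
from typing import Optional


def first_installment_label(due: date, repaid: Optional[date], stat_date: date,
                            min_dpd: int = 1) -> Optional[int]:
    """Fraud-model label from the first installment.

    Returns 1 if the first installment is at least `min_dpd` days overdue,
    0 if it was repaid without reaching that threshold, and None if it is
    still unpaid but has not (yet) reached the threshold.
    """
    end = repaid if repaid is not None else stat_date  # unpaid: use the statistics date
    dpd = max(0, (end - due).days)
    if dpd >= min_dpd:
        return 1
    return None if repaid is None else 0


print(first_installment_label(date(2021, 5, 12), date(2021, 5, 17), date(2021, 6, 30)))  # 1
print(first_installment_label(date(2021, 5, 12), date(2021, 5, 12), date(2021, 6, 30)))  # 0
```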
  
Roll rate analysis determines how severe a delinquency must be before an overdue customer is labeled bad, and vintage analysis determines how long a customer's performance must be observed before the customer can be included in the model. To make the determination of Y clearer, we first define some terms.

  

1. Definition of terms

  
For simplicity, take a single customer as an example. Suppose a person borrows a 10,000-yuan credit loan on an online platform at 10:08 am on April 12, 2021, and repays it over the next 12 months in equal monthly installments of principal and interest. To show the terms more clearly, they are placed in the following figure:


  
1. Observation point (obs_date): the moment the customer takes out the loan (10:08 am on April 12, 2021). Data from a period of time up to the loan application is used to predict how likely the customer is to become delinquent in the future.

2. Observation period: the time window used to generate customer characteristics (independent variables).

3. Performance period: the time window used to decide whether a customer is good or bad. Strictly speaking, a 12-installment customer can only be labeled good or bad after the loan is fully repaid. However, vintage analysis shows how long after disbursement customers typically go bad; beyond that point most remaining customers repay on time. The performance period can therefore be shortened, which increases the number of customers available for modeling.

4. Performance point: the point in time at which customers are labeled "good" or "bad".

5. MOB (Month on Book, account age): the number of months since the loan was disbursed.
  
MOB0: from the disbursement date to the end of that month; in the example, April 12, 2021 to April 30, 2021.

MOB1: the second month after disbursement; in the example, May 1, 2021 to May 31, 2021.

MOB2: the third month after disbursement; in the example, June 1, 2021 to June 30, 2021.

MOB3: the fourth month after disbursement; in the example, July 1, 2021 to July 31, 2021.

And so on.

MOB12: the 13th month after disbursement; in the example, April 1, 2022 to April 30, 2022.

If the product has 12 installments, the asset's life cycle is 12 periods and the maximum MOB is MOB12. If the product has 24 installments, the maximum MOB is MOB24.
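Because MOB counts calendar months from the disbursement month, it can be computed from the year and month alone. A minimal sketch (the function name is ours) reproducing the April 12, 2021 example:

```python
from datetime import date


def month_on_book(disbursed: date, stat_date: date) -> int:
    """MOB: 0 in the disbursement month, 1 in the following calendar month, and so on."""
    mob = (stat_date.year - disbursed.year) * 12 + (stat_date.month - disbursed.month)
    if mob < 0:
        raise ValueError("stat_date falls before the disbursement month")
    return mob


loan_date = date(2021, 4, 12)
print(month_on_book(loan_date, date(2021, 4, 30)))  # 0
print(month_on_book(loan_date, date(2021, 6, 15)))  # 2
print(month_on_book(loan_date, date(2022, 4, 30)))  # 12
```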
  
6. Overdue: If the customer fails to repay the monthly repayment amount in full on the repayment date, then the contract is overdue.
  
7. DPD (Days Past Due): if the customer has not repaid by the due date, the number of overdue days is counted from the day after the due date up to and including the actual repayment date. If the customer has not yet repaid, so there is no actual repayment date, the statistics date is used in its place.

Notation: DPDN+ means customers overdue for N days or more; for example, DPD60+ means customers overdue for at least 60 days.
  
Example:
  
  
If the customer fails to repay on the first due date (May 12, 2021), May 13, 2021 is the first day overdue. If the customer then repays on May 17, the first installment was 5 days overdue (May 13 through May 17 inclusive).
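The DPD definition translates directly into date arithmetic: subtracting the due date from the repayment date (or from the statistics date while the installment is unpaid) counts the days from the day after the due date through the end date inclusive. A sketch with illustrative function names:

```python
from datetime import date
from typing import Optional


def days_past_due(due: date, repaid: Optional[date], stat_date: date) -> int:
    """Days from the day after `due` through the repayment date, inclusive.

    If the installment is still unpaid, the statistics date stands in for the
    repayment date. Paying on or before the due date gives 0.
    """
    end = repaid if repaid is not None else stat_date
    return max(0, (end - due).days)


def is_dpd_plus(dpd: int, n: int) -> bool:
    """DPDN+: overdue for N days or more."""
    return dpd >= n


print(days_past_due(date(2021, 5, 12), date(2021, 5, 17), date(2021, 6, 30)))  # 5
print(is_dpd_plus(days_past_due(date(2021, 5, 12), None, date(2021, 7, 11)), 60))  # True
```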

Other notes:

① Any overdue-day threshold can be used in analysis as needed, such as 3, 7, 15, or 30 days overdue.

② Which overdue-day threshold to use depends on the collection approach and the recovery rate.

8. Overdue periods (delinquency buckets): overdue periods are derived directly from the number of overdue days in fixed 30-day steps. For example, 1-30 days overdue corresponds to M1, 31-60 days overdue to M2, and so on (note: different institutions may divide buckets differently). Definition: the number of installments overdue, counted from the day after the due date up to and including the actual repayment date. If the customer has not repaid the current installment and there is no actual repayment date, the statistics date is used instead.

Notation:

M0: normal asset, currently not overdue (also written C).
M1: 1-30 days overdue, one period overdue.
M2: 31-60 days overdue, two periods overdue.
M3: 61-90 days overdue, three periods overdue.
M4: 91-120 days overdue, four periods overdue.
M5: 121-150 days overdue, five periods overdue.
M6: 151-180 days overdue, six periods overdue.
Mn: (30n-29) to 30n days overdue, n periods overdue.

Similarly:

M3+: more than 90 days overdue, more than 3 periods overdue (M3 itself not included).
M4+: more than 120 days overdue, more than 4 periods overdue (M4 not included).
M6+: more than 180 days overdue, more than 6 periods overdue (M6 not included); also known as bad debt, after which the account is written off.
Mn+: more than 30n days overdue, more than n periods overdue (Mn not included).

9. Overdue rate calculation basis:

Order-count basis: overdue rate = number of overdue orders / total number of loan orders.
Amount basis: overdue rate = outstanding overdue principal / total loan principal.
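The bucket rules and both overdue-rate bases can be sketched as follows, assuming the 30-day convention above (institutions may divide buckets differently). The loan dicts and their keys `principal`, `outstanding_principal`, and `dpd` are hypothetical, and the four loans are made-up numbers for illustration.

```python
import math
from typing import Callable, List, Tuple


def m_bucket(dpd: int) -> str:
    """M0 when not overdue; otherwise Mn, where Mn covers (30n - 29) to 30n days."""
    return "M0" if dpd <= 0 else f"M{math.ceil(dpd / 30)}"


def is_mn_plus(dpd: int, n: int) -> bool:
    """Mn+: more than 30 * n days overdue (bucket Mn itself excluded)."""
    return dpd > 30 * n


def overdue_rates(loans: List[dict], is_overdue: Callable[[dict], bool]) -> Tuple[float, float]:
    """Return (order-count-basis rate, amount-basis rate).

    Count basis:  overdue orders / all loan orders.
    Amount basis: outstanding principal of overdue loans / total loan principal.
    """
    overdue = [loan for loan in loans if is_overdue(loan)]
    count_rate = len(overdue) / len(loans)
    amount_rate = (sum(loan["outstanding_principal"] for loan in overdue)
                   / sum(loan["principal"] for loan in loans))
    return count_rate, amount_rate


loans = [
    {"principal": 2000, "outstanding_principal": 1500, "dpd": 95},
    {"principal": 2000, "outstanding_principal": 1000, "dpd": 0},
    {"principal": 1000, "outstanding_principal": 800, "dpd": 10},
    {"principal": 1000, "outstanding_principal": 0, "dpd": 0},
]
print(overdue_rates(loans, lambda loan: is_mn_plus(loan["dpd"], 3)))  # (0.25, 0.25)
```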
  

  

  
  

2. Roll rate analysis

  
1. Purpose: for the risk control model to discriminate well, we need to decide how long a customer must be overdue before being labeled 1 (bad). Some customers who are a few days overdue probably just forgot to repay and paid after a reminder; they do not lack the willingness or ability to repay. If every customer with any overdue record were labeled 1, the definition of a bad customer would be blurred and the model's discriminative power would suffer. Roll rate analysis shows how customers move from one status to another across time periods, so the evolution of customers in different overdue statuses can be analyzed.

2. Definition: the transition from a customer's worst status during a period before observation point 1 (observation period 1) to the worst status during a period after observation point 1 (observation period 2).
  

  

3. Steps of roll rate analysis:

Step1: Choose observation point 1 as the cut-off. Using the repayment schedule, compute each customer's worst overdue status in observation period 1 (e.g., the past 6 months), and group customers by that worst status, such as C, M1, M2, M3, M3+.

Step2: Starting from observation point 1, compute each customer's worst overdue status in observation period 2 (e.g., the next 6 months), and group customers the same way, such as C, M1, M2, M3, M3+.

Step3: Cross-tabulate the customer counts into a transition matrix.

Step4: Convert the counts in the transition matrix into proportions.

Step5: Choose different observation points, repeat Step1 to Step4, and compare the roll rates.
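Steps 1 through 4 can be sketched in a few lines of Python. The sketch assumes each customer's DPD values in both observation periods are already available and uses the five statuses C, M1, M2, M3, M3+; Step 5 amounts to calling the same functions again for another observation point. Function names and the four example customers are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

STATES = ["C", "M1", "M2", "M3", "M3+"]


def worst_status(dpds: List[int]) -> str:
    """Worst status in a window, from the DPD values observed in it."""
    worst = max(dpds, default=0)
    if worst <= 0:
        return "C"
    if worst > 90:
        return "M3+"
    return f"M{(worst + 29) // 30}"  # 1-30 -> M1, 31-60 -> M2, 61-90 -> M3


def roll_rate_matrix(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """pairs: (worst status in period 1, worst status in period 2), one per customer.

    Step3 cross-counts the pairs; Step4 turns each row into proportions.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for before, after in pairs:
        counts[before][after] += 1
    matrix = {}
    for before in STATES:
        row = counts.get(before)
        if row:
            total = sum(row.values())
            matrix[before] = {after: row[after] / total for after in STATES}
    return matrix


customers = [([0, 0], [0]), ([15], [0]), ([15], [45]), ([75], [120])]
pairs = [(worst_status(p1), worst_status(p2)) for p1, p2 in customers]
print(roll_rate_matrix(pairs)["M1"])  # {'C': 0.5, 'M1': 0.0, 'M2': 0.5, 'M3': 0.0, 'M3+': 0.0}
```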
  
For example, choose 24:00 on June 30, 2021 as the observation point, take 20,000 customers as observation objects, and track how each customer's worst overdue status changes from observation period 1 to observation period 2. First, compute the following detail table of customers' overdue statuses (for illustrating the business logic only; not real data):
  
  

From the overdue status detail table, the following roll rate matrix is calculated:
  
  
The roll rate matrix shows that:

① Of the customers whose status was C (normal) in observation period 1, 95.29% remain normal over the next 6 months and 4.71% become overdue.

② Of the customers in M1 in observation period 1, 81.16% return to normal, i.e. the cure rate is 81.16%; 11.96% remain in M1 and 6.88% deteriorate further.

③ Of the customers in M2 in observation period 1, the cure rate is 25.96%; 6.41% move to M1, 26.12% remain in M2, and 41.51% deteriorate further.

④ Of the customers in M3 in observation period 1, the cure rate is 19.77%; 10.6% move to M1 or M2, 11.46% remain in M3, and 58.17% deteriorate further.

⑤ Of the customers in M3+ in observation period 1, the cure rate is 3.36%; 24.16% move to M1, M2, or M3, and 72.48% remain in M3+.
  
Judging by the share of customers who return to normal, customers whose overdue status reaches M3+ almost never recover. To give the risk control model good discriminative power, bad customers can therefore be defined as customers with an overdue status of M3+ (more than 90 days overdue).
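The reasoning above can be made explicit as a rule: pick the least severe status whose share of customers rolling back to normal (C) falls below some cut-off. The shares below are the ones read from the matrix discussion above; the 5% cut-off is an illustrative assumption, not a threshold stated in the article.

```python
from typing import Dict, Optional

# Share of customers rolling back to C, per starting status, from the roll rate analysis above.
CURE_RATES = {"M1": 0.8116, "M2": 0.2596, "M3": 0.1977, "M3+": 0.0336}
SEVERITY_ORDER = ["M1", "M2", "M3", "M3+"]


def bad_definition(cure_rates: Dict[str, float], max_cure: float = 0.05) -> Optional[str]:
    """Return the least severe status whose cure rate is at or below max_cure."""
    for status in SEVERITY_ORDER:
        if cure_rates[status] <= max_cure:
            return status
    return None  # no status is "hard to cure" under this cut-off


print(bad_definition(CURE_RATES))  # M3+
```

A looser cut-off moves the bad definition earlier; with `max_cure=0.2`, M3 would already qualify.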
  
In practice, constraints such as business scale and product launch date can leave few modeling samples and therefore few bad samples. In that case the label is sometimes set manually: customers more than n days overdue are labeled 1 (bad), customers never overdue are labeled 0 (good), and customers overdue for n days or fewer are treated as gray samples and discarded. Once there is a standard for how many overdue days make a customer bad, the next question is how long a customer's performance period must be before the customer can be included in modeling.
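A sketch of this good/bad/gray labeling, where `n` is the manually chosen threshold, `max_dpd` is a customer's worst DPD, and `None` marks a gray sample to be discarded (names and example values are illustrative):

```python
from collections import Counter
from typing import List, Optional


def label_with_gray_zone(max_dpd: int, n: int) -> Optional[int]:
    """1 if more than n days overdue, 0 if never overdue, None (gray) otherwise."""
    if max_dpd > n:
        return 1
    if max_dpd <= 0:
        return 0
    return None


def label_summary(max_dpds: List[int], n: int) -> Counter:
    """Count bad (1), good (0), and gray (None) samples for a candidate threshold n."""
    return Counter(label_with_gray_zone(d, n) for d in max_dpds)


summary = label_summary([0, 0, 3, 12, 45, 100], n=30)
print(summary[1], summary[0], summary[None])  # 2 2 2
```

Comparing the summary across candidate values of `n` shows how many bad samples each threshold leaves for modeling.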
  
Suppose a product has a 12-installment term. Must all 12 installments be observed before deciding whether a customer is bad? Strictly speaking, yes. Otherwise we can only say the customer has not gone bad so far; we cannot know whether they will become overdue and go bad in the remaining installments. Some accounts reach M3+ in the first few installments, while others reach it only in later ones. In practice, we need a performance period just long enough to capture most bad customers, and vintage analysis determines how long that performance period should be.

  

3. Vintage Analysis

  
1. Purpose: to track, for each month's newly issued loans, the overdue status at each MOB; to compare the overdue performance of loans issued in different months; to judge the effectiveness of strategies and models; and to analyze the risk maturity period of customers.

2. Representation: the horizontal axis of a vintage curve is MOB and the vertical axis is the overdue rate. The overdue rate can be calculated on an amount basis or an order-count basis.

3. Overdue rate calculation (amount basis):

overdue rate = outstanding overdue principal / total loan principal.

The denominator is the total principal issued in the loan month, i.e. the contract amount. It does not change over time (it is not reduced by settlement or write-off).

The numerator is the principal balance of the loans that reach the Bad overdue definition. Assuming Bad is defined as M3+, there are two ways to calculate the numerator.
  
  
4. Overdue rate calculation (order-count basis):

overdue rate = number of overdue orders / total number of loan orders.

The denominator is the total number of orders issued in the loan month. It does not change over time (it is not reduced by settlement or write-off).

The numerator is the number of orders that reach the Bad overdue definition. Assuming Bad is defined as M3+, there are again two ways to calculate the numerator.
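On the Ever basis used by the vintage chart later in this article, a cohort's rate at MOB k counts every order that has reached Bad by MOB k, over a denominator fixed at issuance. A count-basis sketch, assuming the MOB at which each order first reached M3+ is already known (`None` if never); the function name and the ten-order cohort are illustrative:

```python
from typing import List, Optional


def ever_bad_rate_by_mob(first_bad_mob: List[Optional[int]], max_mob: int) -> List[float]:
    """Cumulative (Ever) Bad rate at MOB 0..max_mob for one loan-month cohort.

    first_bad_mob[i] is the MOB at which order i first met the Bad definition,
    or None if it never did. The denominator is the cohort's order count at
    issuance and is not reduced by settlement or write-off.
    """
    total = len(first_bad_mob)
    return [
        sum(1 for m in first_bad_mob if m is not None and m <= k) / total
        for k in range(max_mob + 1)
    ]


cohort = [None, None, 3, None, 5, None, None, 3, None, None]
print(ever_bad_rate_by_mob(cohort, 6))  # [0.0, 0.0, 0.0, 0.2, 0.2, 0.3, 0.3]
```

Because orders are only ever added to the numerator, the resulting curve never decreases.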
  
  
5. Building the vintage table

Now assume a cash loan product offered to customers at a 36% rate, with a 12-installment term, an average loan of 2,000 yuan, about 10,000 loans issued per month, and equal-installment (principal plus interest) repayment. In general, if roll rate analysis shows that customers reaching Mn+ hardly ever recover, bad customers can be defined as Mn+ customers; this article assumes that roll rate analysis for this product shows customers reaching M3+ hardly ever recover. Counting the loan performance of this product from March 2021 to May 2022 (now) gives the following table:

Rearranging the orders of different loan months by MOB gives the following table:

Plotting the MOB-indexed table as a line chart gives the following vintage chart:
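The rearrangement from statistics month to MOB is a change of column index. A sketch, assuming the first table is stored as a dict keyed by (loan month, statistics month) strings in "YYYY-MM" form; the rates in the example are made up for illustration. Plotting each resulting row against MOB gives one vintage curve per loan month.

```python
from collections import defaultdict
from typing import Dict, Tuple


def months_between(start: str, end: str) -> int:
    """Calendar months from month `start` to month `end` ("YYYY-MM" strings)."""
    y1, m1 = map(int, start.split("-"))
    y2, m2 = map(int, end.split("-"))
    return (y2 - y1) * 12 + (m2 - m1)


def to_mob_table(by_stat_month: Dict[Tuple[str, str], float]) -> Dict[str, Dict[int, float]]:
    """Re-index {(loan_month, stat_month): rate} as {loan_month: {mob: rate}}."""
    table: Dict[str, Dict[int, float]] = defaultdict(dict)
    for (loan_month, stat_month), rate in by_stat_month.items():
        table[loan_month][months_between(loan_month, stat_month)] = rate
    return {loan: dict(sorted(row.items())) for loan, row in sorted(table.items())}


rates = {("2021-03", "2021-06"): 0.001, ("2021-03", "2021-07"): 0.004,
         ("2021-04", "2021-07"): 0.002}
print(to_mob_table(rates))  # {'2021-03': {3: 0.001, 4: 0.004}, '2021-04': {3: 0.002}}
```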
  
From the vintage chart:

① The horizontal axis represents the customer life cycle and reflects how risk develops as accounts mature.

② Reading vertically, the curves at the same MOB compare accounts of the same age issued in different months, showing how the default rate changes from one loan month to the next.

③ Since the product term is 12 installments, the maximum MOB (account age) is 12; products with other terms follow the same logic.

④ The statistic is the Ever M3+ overdue rate, so MOB1 and MOB2 are both 0: the first installment falls due in MOB1, so no account can be more than 90 days overdue by the end of MOB2.

⑤ The overdue rate of loans issued from March 2021 to November 2021 keeps declining, indicating that asset quality is improving. A possible explanation is that the risk control team has gained a more comprehensive understanding of the product's risk dimensions and its risk control capability keeps improving.

⑥ The M3+ overdue rate of loans issued in different months levels off after MOB9, indicating that the account maturity period is 9 months.

⑦ Because the statistic is the Ever M3+ overdue rate, the rate for a given loan month can only rise, never fall. From the vintage table, if we want to build an application scorecard (A-card), the loan months with complete performance (all 12 installments) are March 2021 to May 2021.
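Reading the maturity period off the chart is a judgment call. One way to make it repeatable, sketched here as an assumption rather than the article's method, is to take the first MOB at which a fully observed Ever curve has reached a chosen share (95% by default) of its final value. The curve in the example is illustrative, not the article's data.

```python
from typing import List


def maturity_mob(curve: List[float], share: float = 0.95) -> int:
    """First MOB at which a cumulative (Ever) curve reaches `share` of its final value."""
    final = curve[-1]
    for mob, rate in enumerate(curve):
        if rate >= share * final:
            return mob
    return len(curve) - 1


# Illustrative Ever M3+ curve for one fully observed loan month, MOB0..MOB12.
curve = [0, 0, 0, 0.002, 0.005, 0.008, 0.011, 0.013, 0.0145, 0.015, 0.0152, 0.0153, 0.0153]
print(maturity_mob(curve))  # 9
```

Running it on every fully observed loan month and taking the largest result gives a conservative maturity period.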
  
If only data with complete performance is used for modeling, samples can only come from customers whose loan month is March 2021 to May 2021. If data covering the 9-month account maturity period is used, samples can come from March 2021 to August 2021, adding three more months of sample data. Because the data in this vintage table is fabricated, the pattern looks relatively clean. In reality, loans from a particular month may suddenly show worse overdue performance due to factors such as traffic sources, the external environment, and adjustments to risk control strategy.
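Which loan months qualify follows from month arithmetic: a loan month is usable once the current month is at least the required number of MOBs after it. A sketch, assuming "now" is May 2022 and counting that month as observed, which reproduces both windows above (function names are ours):

```python
from typing import List


def month_index(month: str) -> int:
    """Convert "YYYY-MM" into a running month count."""
    year, mon = map(int, month.split("-"))
    return year * 12 + (mon - 1)


def eligible_loan_months(loan_months: List[str], now: str, required_mob: int) -> List[str]:
    """Loan months whose accounts have reached at least `required_mob` by `now`."""
    return [m for m in loan_months if month_index(now) - month_index(m) >= required_mob]


all_months = [f"2021-{m:02d}" for m in range(3, 13)] + [f"2022-{m:02d}" for m in range(1, 6)]
print(eligible_loan_months(all_months, "2022-05", 12))  # ['2021-03', '2021-04', '2021-05']
print(eligible_loan_months(all_months, "2022-05", 9))
# ['2021-03', '2021-04', '2021-05', '2021-06', '2021-07', '2021-08']
```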

For example, consider a cash loan product for an e-commerce customer segment with a 12-installment term, an average loan of 5,000 yuan, and a customer rate of 36%. Its vintage performance is as follows (the data has been processed):

From this product's vintage table, the overdue rate of loans issued in October 2018 rose sharply compared with the previous month, possibly due to factors such as traffic sources, the external environment, and adjustments to risk control strategy.
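Such jumps can be screened automatically by comparing each loan month with the previous one at the same MOB. A sketch in which the 1.5x ratio and the example rates are illustrative assumptions, not the product's actual figures:

```python
from typing import Dict, List


def flag_jumps(rates_at_mob: Dict[str, float], max_ratio: float = 1.5) -> List[str]:
    """Loan months whose rate at a fixed MOB exceeds the previous month's by more than max_ratio.

    Keys are "YYYY-MM" strings, so sorting them also sorts them chronologically.
    """
    months = sorted(rates_at_mob)
    return [
        cur for prev, cur in zip(months, months[1:])
        if rates_at_mob[prev] > 0 and rates_at_mob[cur] / rates_at_mob[prev] > max_ratio
    ]


mob6 = {"2018-08": 0.010, "2018-09": 0.011, "2018-10": 0.024, "2018-11": 0.012}
print(flag_jumps(mob6))  # ['2018-10']
```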

  

4. Determination of the dependent variable Y

  
1. Definition: the dependent variable Y is the label that marks a customer as good or bad.

2. Method: use roll rate analysis to define bad customers and vintage analysis to determine an appropriate performance period.

3. Steps:

Step1: Use roll rates to define bad customers. In the case above, customers whose overdue status reaches M3+ are defined as bad.

Step2: Using M3+ as the asset quality indicator, compile the vintage table, draw the vintage curves, and find the account maturity period. In the case above, the maturity period is 9 months.

Step3: Samples whose performance period is longer than the maturity period can be used for modeling. For samples whose performance period is shorter than the maturity period, Y cannot be determined reliably, so they are set aside for now.

4. Conclusion: in the case above, customers whose performance period exceeds 9 months and who have reached M3+ are labeled 1, customers whose performance period exceeds 9 months and who have never been overdue are labeled 0, and all other customers are discarded.
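The conclusion can be written as a labeling function. A sketch, assuming each customer's MOBs of performance, Ever M3+ flag, and any-overdue flag are already computed; reading "exceeds 9 months" as having reached MOB9 (consistent with the March to August 2021 window above) is our assumption:

```python
from typing import Optional

MATURITY_MOB = 9  # account maturity period found by the vintage analysis above


def label_y(performance_mob: int, ever_m3_plus: bool, ever_overdue: bool,
            maturity_mob: int = MATURITY_MOB) -> Optional[int]:
    """Y = 1 (bad), 0 (good), or None (discard) following the conclusion above."""
    if performance_mob < maturity_mob:
        return None  # performance window too short to judge
    if ever_m3_plus:
        return 1
    if not ever_overdue:
        return 0
    return None  # overdue at some point but never reached M3+: gray, discarded


print(label_y(10, ever_m3_plus=True, ever_overdue=True))    # 1
print(label_y(10, ever_m3_plus=False, ever_overdue=False))  # 0
print(label_y(10, ever_m3_plus=False, ever_overdue=True))   # None
print(label_y(5, ever_m3_plus=True, ever_overdue=True))     # None
```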
  
This completes the analysis of how the dependent variable Y is determined in the payment and credit fields. Feel free to share this article with friends who may find it useful.
  

Origin blog.csdn.net/qq_32532663/article/details/125461299