1. Determine the observation time window
Overdue-user information table, Data/CreditSampleWindow.csv:
- CID: user ID
- STAGE_BEF: overdue stage before this stage
- STAGE_AFT: overdue stage of this stage
- Meaning of the overdue stages: M0: 0-3 days overdue; M1: 3-30 days overdue; M2: 30-60 days overdue; M3: 60-90 days overdue; and so on
- START_DATE: time of entering this stage
- CLOSE_DATE: end time of this stage
The data covers all orders with an approval date from January 1, 2015 to October 31, 2017, together with the overdue details of those orders; the data cutoff date is May 31, 2018.
1.1 Import packages
1.2 Read the data and compute descriptive statistics
The descriptive statistics show that the minimum of the stage end time (CLOSE_DATE) is 0 and that the data contains missing values. The missing values need to be handled first, and then the outlier value 0.
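As a minimal sketch of this step, the snippet below reads the table with pandas and prints both the descriptive statistics and the per-column missing ratio. The file itself is not reproduced here, so a small in-memory CSV with made-up rows stands in for Data/CreditSampleWindow.csv; the integer-like date format is also an assumption, not something the source confirms.

```python
import io

import pandas as pd

# Hypothetical stand-in for Data/CreditSampleWindow.csv (made-up rows;
# the YYYYMMDD-style date format is an assumption for illustration).
sample_csv = io.StringIO(
    "CID,STAGE_BEF,STAGE_AFT,START_DATE,CLOSE_DATE\n"
    "1,M0,M1,20170101,20170201\n"
    "2,M1,M2,20170301,0\n"
    "3,,,,\n"
)

# With the real file this would be: pd.read_csv("Data/CreditSampleWindow.csv")
df = pd.read_csv(sample_csv)

# Summary statistics for the numeric columns (count, mean, min, max, ...)
summary = df.describe()

# Fraction of missing values in each column
missing_ratio = df.isnull().mean()

print(summary)
print(missing_ratio)
```

In the `min` row of `summary`, a value of 0 for CLOSE_DATE is the kind of outlier described above, and `missing_ratio` shows which columns contain missing values.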
1.3 Data Cleaning
1.3.1 Deduplication
drop_duplicates is the DataFrame deduplication function; its subset= parameter specifies the columns on which duplicates are judged.
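A small illustration of how `subset=` changes the result, using a made-up DataFrame rather than the real table (the rows below are hypothetical):

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "CID": [1, 1, 2, 2],
    "STAGE_BEF": ["M0", "M0", "M0", "M1"],
    "STAGE_AFT": ["M1", "M1", "M1", "M2"],
})

# Without subset, a row is a duplicate only if every column matches
dedup_all = df.drop_duplicates()

# With subset, only the listed columns are compared;
# keep="first" retains the first occurrence of each CID
dedup_cid = df.drop_duplicates(subset=["CID"], keep="first")

print(len(df), len(dedup_all), len(dedup_cid))
```

Whether full-row or per-column deduplication is appropriate depends on the data; for a stage-transition table like this one, a user can legitimately appear in several rows, so deduplicating on CID alone would likely discard valid records.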
1.3.2 Processing missing values
The last four columns have about the same proportion of missing values, roughly 0.08. If their missing values fall in the same rows, those rows can be deleted, so first verify whether the missing values in these columns occur in the same rows.
They are in the same rows, so delete those rows.
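The check and deletion described above could be sketched as follows. This is one possible approach, not necessarily the original code: count the missing values per row across the four columns and confirm that every row has either none or all four missing. The DataFrame is a made-up stand-in for the real table.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; rows 1 and 3 are missing all four fields
df = pd.DataFrame({
    "CID": [1, 2, 3, 4],
    "STAGE_BEF": ["M0", np.nan, "M1", np.nan],
    "STAGE_AFT": ["M1", np.nan, "M2", np.nan],
    "START_DATE": [20170101, np.nan, 20170301, np.nan],
    "CLOSE_DATE": [20170201, np.nan, 0, np.nan],
})

cols = ["STAGE_BEF", "STAGE_AFT", "START_DATE", "CLOSE_DATE"]

# Number of missing values per row across the four columns
na_count = df[cols].isnull().sum(axis=1)

# The missing values are "in the same rows" if each row has 0 or all 4 missing
same_rows = bool(na_count.isin([0, len(cols)]).all())

if same_rows:
    # Drop the rows in which all four columns are missing
    df_clean = df.dropna(subset=cols, how="all")
else:
    # Otherwise leave the data untouched and inspect the partial rows first
    df_clean = df

print(same_rows, len(df), len(df_clean))
```

Note that the row with CLOSE_DATE equal to 0 is kept here; it is an outlier rather than a missing value and is handled separately.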
1.3.3 Handling outliers