Project Summary: Credit Scorecard

1. Determine the observation window

User overdue information table, Data/CreditSampleWindow.csv:
- CID: user ID
- STAGE_BEF: overdue stage before this stage
- STAGE_AFT: overdue stage after this stage
- Meaning of the overdue stages: M0: 0-3 days overdue; M1: 3-30 days overdue; M2: 30-60 days overdue; M3: 60-90 days overdue; and so on
- START_DATE: time of entering this stage
- CLOSE_DATE: time this stage ended
The data covers all orders with an approval date between January 1, 2015 and October 31, 2017, together with the overdue details of those orders. The observation cutoff date is May 31, 2018.
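The stage definitions above can be expressed as a small lookup. This is a minimal sketch: `overdue_stage` is a hypothetical helper, and because the listed ranges share their endpoints (e.g. 3 days appears in both M0 and M1), it assumes half-open intervals where a boundary value belongs to the later stage. It also assumes "and so on" means each further 30-day band adds one stage.

```python
def overdue_stage(days_overdue: int) -> str:
    """Map days overdue to a stage label (hypothetical helper).

    Assumption: half-open intervals [0, 3), [3, 30), [30, 60), [60, 90), ...
    so a boundary value such as 3 or 30 falls into the later stage.
    """
    if days_overdue < 0:
        raise ValueError("days_overdue must be non-negative")
    if days_overdue < 3:
        return "M0"
    if days_overdue < 30:
        return "M1"
    # From 30 days on, each further 30-day band is one more stage (M2, M3, M4, ...).
    return f"M{days_overdue // 30 + 1}"


for d in (0, 3, 29, 30, 59, 60, 89):
    print(d, overdue_stage(d))
```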

1.1 Import packages

 
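The original import list was not preserved. A plausible minimal set for the pandas-based steps in this post is sketched below; the constant names `DATA_PATH` and `CUTOFF_DATE` are hypothetical, with values taken from the file name and cutoff date stated above.

```python
import numpy as np
import pandas as pd

# Hypothetical constant: file path as given in the text; adjust to your local layout.
DATA_PATH = "Data/CreditSampleWindow.csv"

# Hypothetical constant: observation cutoff date stated in the data description.
CUTOFF_DATE = pd.Timestamp("2018-05-31")

print(np.__name__, pd.__name__, DATA_PATH, CUTOFF_DATE.date())
```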

1.2 Read the data and compute descriptive statistics
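The reading step can be sketched with `pd.read_csv` and `DataFrame.describe`. So that the sketch runs without the real file, it reads a few hypothetical rows from an in-memory string; the numeric date format shown (e.g. `20160105`) is an assumption for illustration only, not the real file's format. In practice, pass the CSV path instead.

```python
import io

import pandas as pd


def load_and_describe(source):
    """Read the overdue table and return it together with its descriptive statistics."""
    df = pd.read_csv(source)
    # include="all" summarizes text columns (STAGE_BEF, STAGE_AFT) as well as numeric ones.
    stats = df.describe(include="all")
    return df, stats


# Hypothetical stand-in rows, NOT real data: one stage with CLOSE_DATE 0,
# and one row with START_DATE and CLOSE_DATE missing.
sample = io.StringIO(
    "CID,STAGE_BEF,STAGE_AFT,START_DATE,CLOSE_DATE\n"
    "1,M0,M1,20160105,20160120\n"
    "2,M1,M2,20160301,0\n"
    "3,M0,M0,,\n"
)
df, stats = load_and_describe(sample)
print(stats)
```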

 

 

 

 

The descriptive statistics show that the minimum of the stage end time (CLOSE_DATE) is 0 and that some values are missing. The missing values are handled first, and then the outlier value 0.
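Both findings can be checked directly: the share of missing values per column and the number of rows where CLOSE_DATE equals 0. A minimal sketch; the helper names and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd


def missing_ratio(df: pd.DataFrame) -> pd.Series:
    # Share of missing values per column (0.08 would mean 8% of rows are missing).
    return df.isnull().mean()


def count_zeros(df: pd.DataFrame, column: str) -> int:
    # Number of rows where the column equals 0 (NaN never compares equal to 0).
    return int((df[column] == 0).sum())


# Hypothetical toy frame for illustration only.
toy = pd.DataFrame({
    "CID": [1, 2, 3, 4],
    "CLOSE_DATE": [20160120, 0, np.nan, 20170301],
})
print(missing_ratio(toy))
print(count_zeros(toy, "CLOSE_DATE"))
```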

1.3 Data cleaning

1.3.1 Deduplication

drop_duplicates is the pandas DataFrame deduplication method; pass subset= to deduplicate based on one or more specified columns instead of all columns.
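The difference between deduplicating on all columns and on a subset of columns is easiest to see side by side. The rows below are hypothetical, and the choice of `CID` and `STAGE_BEF` as the key is only for illustration.

```python
import pandas as pd

# Hypothetical rows: rows 0 and 1 are exact duplicates; row 2 repeats
# CID and STAGE_BEF but has a different STAGE_AFT.
df = pd.DataFrame({
    "CID": [1, 1, 1, 2],
    "STAGE_BEF": ["M0", "M0", "M0", "M1"],
    "STAGE_AFT": ["M1", "M1", "M2", "M2"],
})

# Drop rows that are identical across all columns.
full_dedup = df.drop_duplicates()

# Drop rows that repeat only the listed columns, keeping the first occurrence.
subset_dedup = df.drop_duplicates(subset=["CID", "STAGE_BEF"], keep="first")

print(len(full_dedup), len(subset_dedup))
```

Deduplicating on a subset is stricter: it removes row 2 even though it differs in STAGE_AFT, so the key columns should be chosen to match what "the same record" means for this table.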

1.3.2 Handling missing values

Four columns have missing values, each with a missing proportion of about 0.08. If the missing values fall on the same rows, removing those rows is a reasonable option, so first verify whether the missing values in these columns occur in the same rows.
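One way to verify this: a row passes if it is missing either none or all of the columns in question. The helper `missing_in_same_rows` and the toy frames are hypothetical; the actual four column names are not listed in the text, so they are passed in as a parameter.

```python
import numpy as np
import pandas as pd


def missing_in_same_rows(df: pd.DataFrame, columns) -> bool:
    """True if every row missing any of `columns` is missing all of them."""
    part = df[columns].isnull()
    any_missing = part.any(axis=1)
    all_missing = part.all(axis=1)
    # Rows missing only some of the columns would make these two flags differ.
    return bool((any_missing == all_missing).all())


# Hypothetical toy frames for illustration only.
aligned = pd.DataFrame({"A": [1, np.nan, 3], "B": [4, np.nan, 6]})
misaligned = pd.DataFrame({"A": [1, np.nan, 3], "B": [np.nan, 5, 6]})
print(missing_in_same_rows(aligned, ["A", "B"]))     # True
print(missing_in_same_rows(misaligned, ["A", "B"]))  # False
```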

The missing values are in the same rows, so those rows are deleted.
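Deleting the rows can be sketched with `DataFrame.dropna(subset=...)`. The helper name and toy data are hypothetical. Because the missing values were found to occur together, `how="any"` and `how="all"` remove the same rows here.

```python
import numpy as np
import pandas as pd


def drop_rows_missing(df: pd.DataFrame, columns) -> pd.DataFrame:
    # Remove rows with missing values in the given columns and renumber the index.
    return df.dropna(subset=columns, how="any").reset_index(drop=True)


# Hypothetical toy frame: row 1 is missing both A and B.
toy = pd.DataFrame({
    "CID": [1, 2, 3, 4],
    "A": [10, np.nan, 30, 40],
    "B": [0.1, np.nan, 0.3, 0.4],
})
cleaned = drop_rows_missing(toy, ["A", "B"])
print(cleaned)
```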

1.3.3 Handling outliers
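The original treatment of the CLOSE_DATE value 0 was not preserved. Two common options are sketched below as assumptions, not as the author's method: mark 0 as missing so that a later step decides how to fill it, or drop the affected rows. The helper names and toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def zero_to_nan(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Option 1 (assumption): treat 0 in `column` as missing, on a copy."""
    out = df.copy()
    out[column] = out[column].replace(0, np.nan)
    return out


def drop_zero_rows(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Option 2 (assumption): drop rows whose `column` equals 0."""
    return df[df[column] != 0].reset_index(drop=True)


# Hypothetical toy frame for illustration only.
toy = pd.DataFrame({"CID": [1, 2, 3], "CLOSE_DATE": [20160120, 0, 20170301]})
print(zero_to_nan(toy, "CLOSE_DATE"))
print(drop_zero_rows(toy, "CLOSE_DATE"))
```

Which option fits depends on what 0 means in the source system; if it encodes a stage that had not closed by the cutoff date, neither deletion nor plain imputation may be appropriate.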

 


Source: www.cnblogs.com/lvzw/p/11613218.html