Course link: https://edu.csdn.net/course/play/26990/361139?utm_source=blogtoedu
Topic: data cleaning;
Tools: NumPy, pandas;
Outline:
Common tools (NumPy; pandas Series and DataFrame)
File operations (CSV, Excel, MySQL)
Data table processing (filtering, adding and deleting, sorting)
Data conversion (string, date, format conversion)
Statistics (grouping with groupby, aggregate functions, the apply function)
Data preprocessing (duplicate values, missing values, outliers, discrete data)
Problems that require data cleaning:
1. Missing data - attribute values are empty;
2. Noise - data values are unreasonable;
3. Inconsistency - the data contradict each other;
4. Redundancy - there are more records or attributes than the analysis requires;
5. Outliers (discrete points);
6. Duplicate data
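
Several of the problems listed above can be checked with a few pandas calls. The following is only a minimal sketch: the table, its column names, and the 1.5 * IQR outlier rule are assumptions made for illustration and are not taken from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cid", "Dee", "Eve"],
    "age": [25.0, 31.0, 31.0, np.nan, 28.0, 230.0],
    "city": ["Paris", "Rome", "Rome", "Oslo", "Oslo", "Rome"],
})

# Problem 1 (missing data): count empty values in each column.
missing_per_column = df.isna().sum()

# Problem 6 (duplicate data): drop rows that exactly repeat an earlier row.
deduped = df.drop_duplicates()

# Problems 2 and 5 (noise, outliers): flag ages outside the 1.5 * IQR fences.
# This rule is one common choice, assumed here; other rules are possible.
q1, q3 = deduped["age"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (deduped["age"] < q1 - 1.5 * iqr) | (deduped["age"] > q3 + 1.5 * iqr)

# Keep the non-outlier rows, then fill the remaining missing ages
# with the median of the ages that are left.
cleaned = deduped[~is_outlier].copy()
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

print(missing_per_column)
print(cleaned)
```

Removing duplicates before computing the quartiles keeps repeated rows from skewing the fences. Whether to drop or fill a missing value depends on the analysis, so the median fill here is just one option.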