These notes cover only selected parts of the video; for the details, or to download it, see Datacastle.
Elements of data cleaning:
Format conversion:
For example, times in Excel files and database records are stored as formatted strings. To do any arithmetic on them, you must first convert them with a Python package.
Missing data:
Arguably the most important issue in data cleaning.
So how should missing data be handled?
Fill the gaps with the mean or with the most frequently occurring value. (Handling missing data is a large research area in its own right.)
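A minimal sketch of the two fill strategies mentioned above, assuming a hypothetical `user` table with a numeric `age` column and a categorical `city` column (the column names and values are made up for illustration, not taken from the course data):

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
user = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Beijing", "Shanghai", None, "Beijing"],
})

# Numeric column: fill gaps with the mean of the observed values.
age_mean = user["age"].mean()  # NaN is skipped by default
user["age"] = user["age"].fillna(age_mean)

# Categorical column: fill gaps with the most frequent value (the mode).
city_mode = user["city"].mode().iloc[0]
user["city"] = user["city"].fillna(city_mode)

print(user)
```

Which fill value is appropriate depends on the column; the mean suits roughly symmetric numeric data, while the mode is the usual choice for categories.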
Abnormal data:
Values that contradict common sense.
Standardization:
Data cleansing practice
Required packages:
pandas: pip install pandas
seaborn: pip install seaborn
Methods introduced:
user.describe()
user.shape
user.loc, and so on
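As a rough illustration of these three, here is a sketch on a small stand-in for the `user` table (the columns `user_id` and `age` are assumptions, not the course's actual schema):

```python
import pandas as pd

# Hypothetical stand-in for the `user` table; the real data comes from the course.
user = pd.DataFrame({
    "user_id": [1, 2, 3],
    "age": [25, 31, 40],
})

# describe() is a method: count, mean, std, min, quartiles, max per numeric column.
summary = user.describe()

# shape is an attribute (a tuple), so it takes no parentheses.
n_rows, n_cols = user.shape

# loc selects by label or boolean mask: rows with age > 30, two named columns.
over_30 = user.loc[user["age"] > 30, ["user_id", "age"]]

print(summary)
print(n_rows, n_cols)
print(over_30)
```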
Data cleansing:
Use the to_datetime method to convert a column to a date type
Date subtraction:
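A minimal sketch of both steps, conversion and then subtraction, assuming two hypothetical string columns `register_date` and `last_login` (names and dates are made up for illustration):

```python
import pandas as pd

# Hypothetical columns; the names and dates are illustrative only.
user = pd.DataFrame({
    "register_date": ["2016-01-01", "2016-03-15"],
    "last_login": ["2016-01-31", "2016-03-20"],
})

# Strings do not support date arithmetic, so convert them first.
user["register_date"] = pd.to_datetime(user["register_date"])
user["last_login"] = pd.to_datetime(user["last_login"])

# Subtracting two datetime columns gives a timedelta column.
user["active_span"] = user["last_login"] - user["register_date"]

# .dt.days extracts the whole number of days from each timedelta.
user["active_days"] = user["active_span"].dt.days

print(user)
```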
Handling abnormal ages:
Use the dropna() method to remove NaN values
Average sketch:
Keep only people younger than 90.
Many recorded ages are below 10, which is not realistic, so also keep only ages above 10.
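The age clean-up described above could look roughly like this, assuming a hypothetical `age` column (the sample values are invented to include a missing entry and implausible ages):

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including a missing value and implausible entries.
user = pd.DataFrame({"age": [5.0, 23.0, np.nan, 35.0, 120.0, 61.0]})

# Drop rows whose age is NaN.
user = user.dropna(subset=["age"])

# Keep only ages strictly between 10 and 90.
user = user[(user["age"] > 10) & (user["age"] < 90)]

print(user)
print(user["age"].mean())
```

The bounds 10 and 90 are the ones used in these notes; a different dataset may call for different cut-offs.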
Histogram