Getting started as a data analyst - data preprocessing with Pandas: data cleaning and visualization

Only selected parts of the video are covered here; for the details, download and watch it on Datacastle.

Some elements of data cleaning:

Format Conversion:

For example, time values in Excel files and database records are usually saved as formatted strings. To do any time arithmetic on them, you first have to convert them with one of Python's packages.
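As a rough illustration of the conversion described above, the sketch below parses string timestamps with pandas. The column name `register_time`, the sample values, and the format string are assumptions made for this example, not taken from the video.

```python
import pandas as pd

# Hypothetical sample: timestamps stored as plain strings, as they often
# arrive from an Excel sheet or a database export.
records = pd.DataFrame(
    {"register_time": ["2019-01-05 08:30:00", "2019-03-17 21:15:00"]}
)

# Parse the strings into real datetime values. The format string is an
# assumption about how the data was saved.
records["register_time"] = pd.to_datetime(
    records["register_time"], format="%Y-%m-%d %H:%M:%S"
)

# Once the column is a datetime type, time operations are available
# through the .dt accessor.
records["register_month"] = records["register_time"].dt.month
print(records)
```

If the real data uses another layout, the `format` argument has to be adjusted to match.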

Missing Data:

Arguably the most important issue in data cleaning.

So how should missing data be handled?

Fill the gaps with the mean or with the most frequently occurring value. (This is a very large area of research.)
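A minimal sketch of both filling strategies, assuming a small made-up table named `user` with a numeric `age` column and a categorical `city` column (both names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
user = pd.DataFrame(
    {
        "age": [25.0, np.nan, 31.0, 40.0],
        "city": ["Beijing", "Shanghai", None, "Beijing"],
    }
)

# Numeric column: fill with the mean of the observed values
# (mean() skips NaN by default).
age_mean = user["age"].mean()
user["age"] = user["age"].fillna(age_mean)

# Categorical column: fill with the most frequent value.
# mode() returns a Series because ties are possible; take the first entry.
city_mode = user["city"].mode()[0]
user["city"] = user["city"].fillna(city_mode)

print(user)
```

Which strategy is appropriate depends on the column; the mean only makes sense for numeric data, while the mode also works for categories.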

Abnormal data:

Values that defy common sense appear in the data.

Standardization:

 


 

Data cleansing practice

Required packages:

pandas: pip install pandas

seaborn: pip install seaborn

Methods introduced:

user.describe()

user.shape

user.loc and other methods
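The three items above can be tried on a small table. This is only a sketch: the contents of `user` below are invented for illustration, since the dataset from the video is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for the user table from the video.
user = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4],
        "age": [25, 31, 40, 22],
        "city": ["Beijing", "Shanghai", "Beijing", "Shenzhen"],
    }
)

# shape is an attribute (no parentheses): (number of rows, number of columns).
print(user.shape)

# describe() is a method: summary statistics for the numeric columns.
print(user.describe())

# loc selects by label; note that the end of a label slice is included.
first_two = user.loc[0:1, ["user_id", "age"]]

# loc also accepts a boolean mask to select rows by condition.
over_30 = user.loc[user["age"] > 30]
print(first_two)
print(over_30)
```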

 

Data cleansing:

Use the to_datetime method to convert a column to a date type.

Date subtraction:
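A small sketch of date subtraction, assuming two hypothetical date columns, `register_time` and `last_login`. The original post showed its own example here, which is not preserved, so the names and values below are illustrative only.

```python
import pandas as pd

# Hypothetical date columns, already converted with to_datetime.
user = pd.DataFrame(
    {
        "register_time": pd.to_datetime(["2019-01-05", "2019-03-17"]),
        "last_login": pd.to_datetime(["2019-01-10", "2019-04-01"]),
    }
)

# Subtracting two datetime columns yields a timedelta column.
user["active_span"] = user["last_login"] - user["register_time"]

# The .dt.days accessor turns the timedelta into a whole number of days.
user["active_days"] = user["active_span"].dt.days
print(user)
```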

 

Handling abnormal ages:

Use the dropna() method to remove NaN values.
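For instance, rows with a missing age could be dropped roughly as follows. The `user` table and its column names are assumptions for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two users have no recorded age.
user = pd.DataFrame(
    {"user_id": [1, 2, 3, 4], "age": [25.0, np.nan, 31.0, np.nan]}
)

# dropna returns a new DataFrame; subset limits the NaN check to "age",
# so missing values in other columns would not cause a row to be dropped.
cleaned = user.dropna(subset=["age"])
print(cleaned)
```

Because dropna does not modify the original table by default, `user` still holds all four rows afterwards.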

 

 

 

 

Average sketch: 

First keep only people whose age is < 90.

Many of the remaining records have an age < 10, which is not realistic, so also keep only age > 10.
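The two age conditions can be combined in one boolean mask. The sample ages below are made up for illustration.

```python
import pandas as pd

# Hypothetical ages, including implausibly low and high values.
user = pd.DataFrame({"age": [3, 8, 25, 31, 47, 95, 120]})

# Keep rows where 10 < age < 90. Each condition needs its own
# parentheses because & binds more tightly than the comparisons.
plausible = user[(user["age"] > 10) & (user["age"] < 90)]
print(plausible)
```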

Histogram


Origin www.cnblogs.com/JasonPeng1/p/12118924.html