Python for Data Analysis, Chapter 7: Data Cleaning and Preparation

Study time: started Friday night, 2019/10/25 22:30.

Study goal: pages 188-217, 30 pages in total. The target is to finish in six days at five pages a day, with expected completion on 10/29.

Actual feedback: X focused study sessions, 1.5 hours per session covering 6 pages. Actual completion: XXX, taking N days and M hours.

Data preparation (loading, cleaning, transforming, and reshaping) often takes up 80 percent or more of an analyst's time! Learning efficient data cleaning and preparation is a sure way to boost productivity. This chapter discusses tools for handling missing data, duplicate data, string manipulation, and other analytical data transformations. The next chapter focuses on several ways to combine and rearrange data sets.

7.1 Handling Missing Data

The way pandas represents missing data is not perfect, but it works well for most users.

For numeric data, pandas uses the floating-point value NaN (Not a Number) to represent missing data. This is called a sentinel value, and it can be easily detected.
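As a minimal sketch of detecting the NaN sentinel, the example below builds a small numeric Series and calls `isnull()` on it. The variable names and sample values are made up for illustration and are not taken from the book.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with one missing entry marked by np.nan
numeric_data = pd.Series([1.2, -3.5, np.nan, 0.0])

# isnull() returns a boolean Series: True where the value is missing
missing_mask = numeric_data.isnull()
print(missing_mask)

# Summing the mask counts the missing entries
missing_count = int(missing_mask.sum())
print(missing_count)
```

Because the mask is an ordinary boolean Series, it can also be used to filter rows, for example `numeric_data[~missing_mask]` keeps only the observed values.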

In pandas, missing values are referred to as NA (a convention borrowed from the R language), which stands for "not available". NA data may be data that does not exist, or data that exists but was not observed. (When cleaning data, it is often worth analyzing the missing data itself, to identify data collection problems or potential biases that the missing data may cause.)

Python's built-in None value is also treated as NA in object arrays.
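A short sketch of this behavior, using an illustrative Series of strings (the values are assumptions chosen for the example): both `None` and `np.nan` are reported as missing by `isnull()`.

```python
import numpy as np
import pandas as pd

# Hypothetical string data containing both None and np.nan
string_data = pd.Series(["aardvark", None, np.nan, "avocado"])

# Both None and NaN are flagged as missing
none_mask = string_data.isnull()
print(none_mask)

# dropna() removes every entry that isnull() flags
observed = string_data.dropna()
print(observed)
```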