Python study notes 4: data cleaning and preparation

One, handling missing values

pandas uses the floating-point value NaN (Not a Number) to represent missing data. Missing values are referred to as NA (not available).

Common methods for handling NA:

dropna: filters axis labels according to whether the values for each label contain missing data, with a threshold that sets how much missing data is tolerated.

fillna: fills in missing data with a given value or with a fill method (e.g., 'ffill' or 'bfill').

isnull: returns Boolean values indicating which values are missing.

notnull: the inverse of isnull.
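As a quick illustration of isnull and notnull, here is a minimal sketch. The Series `s` and its values are made up for the example and are not from the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: None is converted to NaN in a float Series.
s = pd.Series([1.0, np.nan, 3.5, None])

# isnull marks missing entries with True; notnull is its element-wise inverse.
missing_mask = s.isnull()
present_mask = s.notnull()

# Summing a Boolean mask counts the True entries, i.e. the missing values.
n_missing = int(missing_mask.sum())

print(missing_mask)
print(present_mask)
print(n_missing)
```

The Boolean mask can be used directly for indexing, which is how the filtering in the next section works.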

1, filtering (data.dropna())

Delete rows containing missing values (the default): for a Series, data.dropna() is equivalent to data[data.notnull()]. For a DataFrame, data.dropna() by default deletes every row that contains at least one missing value.
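The default behavior can be sketched as follows. The data here is invented for illustration; only dropna and notnull come from the notes.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with missing values at positions 1 and 3.
s = pd.Series([1.0, np.nan, 3.5, np.nan, 7.0])

# For a Series, dropna() and Boolean indexing with notnull() give the same result.
dropped = s.dropna()
indexed = s[s.notnull()]
print(dropped.equals(indexed))

# Hypothetical DataFrame: only row 0 is free of missing values.
df = pd.DataFrame([[1.0, 6.5, 3.0],
                   [1.0, np.nan, np.nan],
                   [np.nan, np.nan, np.nan],
                   [np.nan, 6.5, 3.0]])

# By default, any row containing at least one NA is dropped.
cleaned = df.dropna()
print(cleaned)
```

Note that dropna returns a new object and leaves the original data unchanged.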

 

By passing parameters, you can:

Delete only the rows whose values are all NA: data.dropna(how='all')

Delete only the columns whose values are all NA: data.dropna(axis=1, how='all')

Keep only the rows that contain at least a certain number of non-NA observations: data.dropna(thresh=2)
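The three parameter variants above can be compared on one small table. This is a sketch on made-up data; the column names a to d are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame: row 2 is entirely NA, column "d" is entirely NA.
df = pd.DataFrame({
    "a": [1.0, 1.0, np.nan, np.nan],
    "b": [6.5, np.nan, np.nan, 6.5],
    "c": [3.0, np.nan, np.nan, 3.0],
    "d": [np.nan, np.nan, np.nan, np.nan],
})

# how='all' drops only rows in which every value is NA.
rows_all = df.dropna(how="all")

# axis=1 applies the same rule to columns instead of rows.
cols_all = df.dropna(axis=1, how="all")

# thresh=2 keeps rows that have at least two non-NA values.
at_least_two = df.dropna(thresh=2)

print(rows_all)
print(cols_all)
print(at_least_two)
```

Because column d is always NA, a plain df.dropna() would drop every row of this table, which is why the how and thresh parameters are useful.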

2, filling (data.fillna())
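A minimal sketch of common filling strategies, on invented data. Recent pandas versions deprecate fillna(method='ffill') in favor of the ffill() and bfill() methods, so the sketch uses those for forward and backward filling.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a gap of two missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

# Fill with a constant.
filled_const = s.fillna(0)

# Forward fill: propagate the last valid value.
filled_ffill = s.ffill()

# Backward fill: propagate the next valid value.
filled_bfill = s.bfill()

# Fill with a statistic of the data; mean() skips NA by default.
filled_mean = s.fillna(s.mean())

# For a DataFrame, a dict sets a different fill value per column.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})
filled_df = df.fillna({"a": 0.0, "b": -1.0})

print(filled_const.tolist(), filled_ffill.tolist())
print(filled_bfill.tolist(), filled_mean.tolist())
print(filled_df)
```

Like dropna, fillna returns a new object by default rather than modifying the data in place.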

 

Two, data transformation

1, deletion

 

2, conversion

 

3, replacement

 

Three, string manipulation

1, string object methods

 

2, regular expressions

 

3, vectorized string functions

 

Origin www.cnblogs.com/dlp-527/p/11825672.html