The df.dropna() method removes missing data from a DataFrame, that is, the rows or columns that contain NaN (or other missing markers such as None and NaT).
Official description (pandas documentation):
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Remove missing values.
See the User Guide for more on which values are considered missing,
and how to work with missing data.
Returns
DataFrame or None
DataFrame with NA entries dropped from it, or None if inplace=True.
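The quoted text leaves the definition of "missing" to the User Guide. The short check below is a sketch, assuming a reasonably recent pandas version; the one-column frame `s` is made up for illustration. It shows that None, np.nan and pd.NaT are all treated as missing, while an empty string and 0 are ordinary values that dropna() keeps.

```python
import numpy as np
import pandas as pd

# Illustrative data: an object column mixing missing markers with ordinary values.
s = pd.DataFrame({"value": [None, np.nan, pd.NaT, "", 0, "ok"]})

# isna() applies the same missing-value rules that dropna() uses.
flags = s["value"].isna().tolist()
print(flags)

# Only the rows holding "", 0 and "ok" survive.
kept = s.dropna()
print(kept.index.tolist())
```

If empty strings should also count as missing, they have to be converted (for example with replace) before calling dropna().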
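Parameter description:
Parameter | Description |
---|---|
axis | {0 or 'index', 1 or 'columns'}, default 0. 0 drops rows that contain missing values; 1 drops columns |
how | {'any', 'all'}, default 'any'. 'any': drop a row (or column) if it contains any NaN; 'all': drop it only if all of its values are NaN |
thresh | int, optional. Keep only rows (or columns) with at least this many non-NaN values |
subset | list of labels, optional. Labels along the other axis to check for missing values, e.g. the columns to consider when dropping rows |
inplace | bool, default False. If True, modify the DataFrame in place and return None instead of a new DataFrame |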
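The subset row describes the default, row-wise case. As a sketch (rebuilding the same three-row df used in the examples), subset always names labels on the axis that is not being dropped: column names when rows are dropped, and row labels when axis=1 drops columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# Default axis=0: subset lists the columns to check; rows are dropped.
rows_kept = df.dropna(subset=["toy"])

# axis=1: subset lists the row labels to check; columns are dropped.
# Rows 1 and 2 have a NaT in "born", so only that column goes.
cols_kept = df.dropna(axis=1, subset=[1, 2])

print(rows_kept.index.tolist())
print(cols_kept.columns.tolist())
```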
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
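Before dropping anything, it helps to see where the gaps are. This sketch rebuilds the same df and counts missing values per column and per row with isna(); those counts explain each of the dropna() results in the examples.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

missing = df.isna()                # boolean frame, True where a value is missing
per_column = missing.sum()         # name: 0, toy: 1, born: 2
per_row = missing.sum(axis=1)      # row 0: 2, row 1: 0, row 2: 1

print(per_column.tolist())
print(per_row.tolist())
```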
Drop the rows where at least one element is missing:
>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25
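To make the default rule concrete, here is a sketch that reproduces df.dropna() with an explicit boolean mask. It is an equivalent formulation for illustration, not a claim about how pandas implements the method internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# how='any' (the default): a row survives only if every value in it is present.
manual = df[df.notna().all(axis=1)]
expected = df.dropna()

print(manual.index.tolist())    # [1]
print(manual.equals(expected))  # True
```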
Drop the columns where at least one element is missing:
>>> df.dropna(axis=1)
       name
0    Alfred
1    Batman
2  Catwoman
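The column-wise version applies the same test down each column instead of across each row. A sketch of the equivalent mask, again for illustration only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# axis=1: a column survives only if every value in it is present.
keep_cols = df.notna().all(axis=0)   # name: True, toy: False, born: False
manual = df.loc[:, keep_cols]
expected = df.dropna(axis=1)

print(manual.columns.tolist())  # ['name']
print(manual.equals(expected))  # True
```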
Drop the rows where all elements are missing (no row here is entirely empty, so nothing is dropped):
>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
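Because the example above drops nothing, the effect of how='all' is easy to miss. This sketch adds a hypothetical fourth row (index 3) in which every value is missing; only that row is removed, and the same result can be obtained by keeping rows where at least one value is present.

```python
import numpy as np
import pandas as pd

# Same data as the example, plus an illustrative all-missing row at index 3.
df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman", np.nan],
    "toy": [np.nan, "Batmobile", "Bullwhip", np.nan],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT, pd.NaT],
})

only_empty_dropped = df.dropna(how="all")

# Equivalent mask: keep a row if any of its values is present.
manual = df[df.notna().any(axis=1)]

print(only_empty_dropped.index.tolist())  # [0, 1, 2]
print(manual.equals(only_empty_dropped))  # True
```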
Keep only the rows with at least 2 non-NA values:
>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
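thresh counts present values rather than missing ones. In this sketch, row 0 has only one present value (its name), so it falls below thresh=2, while rows 1 and 2 have three and two.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

counts = df.notna().sum(axis=1)   # non-missing values per row: 1, 3, 2
manual = df[counts >= 2]
expected = df.dropna(thresh=2)

print(counts.tolist())          # [1, 3, 2]
print(manual.equals(expected))  # True
```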
Look for missing values only in specific columns:
>>> df.dropna(subset=['name', 'born'])
     name        toy       born
1  Batman  Batmobile 1940-04-25
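With subset, the "any value missing" test is restricted to the listed columns; values in other columns (here toy) are ignored. A sketch of the equivalent mask:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

cols = ["name", "born"]
# Only the listed columns decide whether a row is kept.
manual = df[df[cols].notna().all(axis=1)]
expected = df.dropna(subset=cols)

print(manual.index.tolist())    # [1]
print(manual.equals(expected))  # True
```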
Modify the original DataFrame in place (the call itself returns None):
>>> df.dropna(inplace=True)
>>> df
     name        toy       born
1  Batman  Batmobile 1940-04-25
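The in-place call and the default, which returns a new object, end up with the same data. This sketch compares the two styles; make_df is a hypothetical helper used only to rebuild the example frame.

```python
import numpy as np
import pandas as pd


def make_df():
    """Hypothetical helper (not part of pandas) that rebuilds the example frame."""
    return pd.DataFrame({
        "name": ["Alfred", "Batman", "Catwoman"],
        "toy": [np.nan, "Batmobile", "Bullwhip"],
        "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
    })


# Style 1: mutate the existing object; the method returns None.
df_a = make_df()
ret = df_a.dropna(inplace=True)

# Style 2: keep the default inplace=False and rebind the name to the result.
df_b = make_df()
df_b = df_b.dropna()

print(ret)                # None
print(df_a.equals(df_b))  # True
```

A common slip is combining the two styles as df = df.dropna(inplace=True), which rebinds df to None.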