Pandas Explained, Part 10: Using dropna to Filter Out Missing Data

Import conventions:

import pandas as pd
import numpy as np
from numpy import nan as NaN

Filtering out missing data

One of the design goals of pandas is to make working with missing data easier. pandas uses NaN as its marker for missing numeric data.
The dropna method filters out rows or columns that contain these markers.

Working with DataFrame objects

DataFrame objects are more complicated to handle, because you may want to drop only the rows or columns that are entirely NaN, or every row or column that contains at least one NaN.

df1=pd.DataFrame([[1,2,3],[NaN,NaN,2],[NaN,NaN,NaN],[8,8,NaN]])

Code result:

>>> df1
     0    1    2
0  1.0  2.0  3.0
1  NaN  NaN  2.0
2  NaN  NaN  NaN
3  8.0  8.0  NaN

By default, dropna() drops every row that contains at least one NaN:

>>> df1.dropna()
     0    1    2
0  1.0  2.0  3.0
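
The default behavior can also be written as a boolean mask, which may make it clearer what dropna() checks. This is a minimal sketch that rebuilds the same df1 as above; the names dropped and masked are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3],
                    [np.nan, np.nan, 2],
                    [np.nan, np.nan, np.nan],
                    [8, 8, np.nan]])

# dropna() returns a new DataFrame; df1 itself is left unchanged
dropped = df1.dropna()

# Keep only the rows in which every value is non-missing
masked = df1[df1.notna().all(axis=1)]

print(dropped.equals(masked))
print(df1.shape)
```

Because the result is a new object, assign it back (df1 = df1.dropna()) if you want to keep the filtered frame.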

Pass how='all' to drop only the rows in which every value is NaN:

>>> df1.dropna(how='all')
     0    1    2
0  1.0  2.0  3.0
1  NaN  NaN  2.0
3  8.0  8.0  NaN
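
To compare the two values of how side by side, the following sketch runs both on the same df1 and prints which row labels survive. The variable names are illustrative, and how='any' is simply the default written out.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3],
                    [np.nan, np.nan, 2],
                    [np.nan, np.nan, np.nan],
                    [8, 8, np.nan]])

# how='any' (the default): drop a row if it contains at least one NaN
any_dropped = df1.dropna(how='any')

# how='all': drop a row only if every value in it is NaN
all_dropped = df1.dropna(how='all')

# The same selection as how='all', written as a boolean mask
all_masked = df1[df1.notna().any(axis=1)]

print(list(any_dropped.index))
print(list(all_dropped.index))
print(all_dropped.equals(all_masked))
```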

Pass axis=1 to drop columns instead of rows. Here a column 3 consisting entirely of NaN is added first, then removed with how='all':

>>> df1[3]=NaN
>>> df1
     0    1    2   3
0  1.0  2.0  3.0 NaN
1  NaN  NaN  2.0 NaN
2  NaN  NaN  NaN NaN
3  8.0  8.0  NaN NaN
>>> df1.dropna(axis = 1,how='all')
     0    1    2
0  1.0  2.0  3.0
1  NaN  NaN  2.0
2  NaN  NaN  NaN
3  8.0  8.0  NaN
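
Note that how matters just as much for columns as for rows. The sketch below, which rebuilds df1 with the extra all-NaN column 3, contrasts how='all' with the default on axis=1; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3],
                    [np.nan, np.nan, 2],
                    [np.nan, np.nan, np.nan],
                    [8, 8, np.nan]])
df1[3] = np.nan  # add a column that is entirely NaN

# how='all' drops only the columns in which every value is NaN
all_nan_cols_dropped = df1.dropna(axis=1, how='all')

# The default how='any' drops every column that contains at least one NaN;
# in this frame that is all four columns, so only the row index remains
any_nan_cols_dropped = df1.dropna(axis=1)

print(list(all_nan_cols_dropped.columns))
print(any_nan_cols_dropped.shape)
```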

Pass thresh=n to keep only the rows that have at least n non-NaN values:

>>> df1.dropna(thresh=1)
     0    1    2   3
0  1.0  2.0  3.0 NaN
1  NaN  NaN  2.0 NaN
3  8.0  8.0  NaN NaN
>>> df1.dropna(thresh=3)
     0    1    2   3
0  1.0  2.0  3.0 NaN
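
One way to read thresh is as a count of non-missing values per row. The sketch below rebuilds df1 with column 3 and checks that dropna(thresh=n) selects the same rows as an explicit count; n = 2 and the variable names are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3],
                    [np.nan, np.nan, 2],
                    [np.nan, np.nan, np.nan],
                    [8, 8, np.nan]])
df1[3] = np.nan  # the all-NaN column added earlier

n = 2  # illustrative threshold

# Keep the rows that have at least n non-NaN values
by_thresh = df1.dropna(thresh=n)

# The same selection, written as an explicit count of non-missing values
non_missing = df1.notna().sum(axis=1)
by_mask = df1[non_missing >= n]

print(list(by_thresh.index))
print(by_thresh.equals(by_mask))
```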

