Import conventions used in the examples:
import pandas as pd
import numpy as np
from numpy import nan as NaN
Filtering out missing data
One of the design goals of pandas is to make working with missing data as painless as possible. pandas uses the floating-point value NaN (Not a Number) as its marker for missing numeric data.
The dropna method filters out missing data and returns the result as a new object.
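As a minimal sketch of the simplest case, the snippet below applies dropna to a Series and compares the result with boolean indexing via notna(). The sample values and the variable names (s, dropped, masked) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two of the five values are missing.
s = pd.Series([1.0, np.nan, 3.5, np.nan, 7.0])

# dropna returns a new Series without the NaN entries;
# the surviving entries keep their original index labels.
dropped = s.dropna()

# Boolean indexing with notna() selects the same entries.
masked = s[s.notna()]

print(dropped)
print(dropped.equals(masked))
```

Because the index labels are kept, the result has gaps in its index; call reset_index(drop=True) on it if consecutive labels are needed.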
Working with DataFrame objects
Handling DataFrame objects is more complicated because you may want to drop only the rows or columns that are entirely NaN, or every row or column that contains any NaN at all.
df1 = pd.DataFrame([[1, 2, 3], [NaN, NaN, 2], [NaN, NaN, NaN], [8, 8, NaN]])
Code result:
>>> df1
     0    1    2
0  1.0  2.0  3.0
1  NaN  NaN  2.0
2  NaN  NaN  NaN
3  8.0  8.0  NaN
By default, dropna drops every row that contains at least one NaN:
>>> df1.dropna()
     0    1    2
0  1.0  2.0  3.0
Pass in how='all' to drop only the rows in which every value is NaN:
>>> df1.dropna(how='all')
     0    1    2
0  1.0  2.0  3.0
1  NaN  NaN  2.0
3  8.0  8.0  NaN
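The difference between the default behaviour (how='any') and how='all' can be sketched with boolean masks built from notna(). This is only an illustration of the semantics, not how pandas implements dropna internally. The names df, present, any_equiv and all_equiv are chosen for this example.

```python
import numpy as np
import pandas as pd

# Same data as df1 above, rebuilt so the snippet is self-contained.
df = pd.DataFrame([[1, 2, 3],
                   [np.nan, np.nan, 2],
                   [np.nan, np.nan, np.nan],
                   [8, 8, np.nan]])

# True where a value is present, False where it is NaN.
present = df.notna()

# Default (how='any'): keep rows in which every value is present.
any_equiv = df[present.all(axis=1)]

# how='all': keep rows in which at least one value is present.
all_equiv = df[present.any(axis=1)]

print(any_equiv.equals(df.dropna()))
print(all_equiv.equals(df.dropna(how='all')))
```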
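Pass in axis=1 to filter out columns instead of rows. Here a column that is entirely NaN is added first and then removed with how='all':
>>> df1[3] = NaN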
>>> df1
     0    1    2   3
0  1.0  2.0  3.0 NaN
1  NaN  NaN  2.0 NaN
2  NaN  NaN  NaN NaN
3  8.0  8.0  NaN NaN
>>> df1.dropna(axis=1, how='all')
     0    1    2
0  1.0  2.0  3.0
1  NaN  NaN  2.0
2  NaN  NaN  NaN
3  8.0  8.0  NaN
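Two details of the column-wise call are easy to miss: with the default how='any', axis=1 drops every column that holds even a single NaN, and dropna returns a new DataFrame rather than changing the original. The sketch below checks both on a copy of the data; the names df, no_columns and three_columns are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3],
                   [np.nan, np.nan, 2],
                   [np.nan, np.nan, np.nan],
                   [8, 8, np.nan]])
df[3] = np.nan  # add a column that is entirely NaN

# Every column holds at least one NaN, so the default how='any'
# drops all four columns and leaves only the row index.
no_columns = df.dropna(axis=1)

# how='all' drops only column 3, the one that is entirely NaN.
three_columns = df.dropna(axis=1, how='all')

print(no_columns.shape)
print(three_columns.shape)
print(df.shape)  # the original frame is not modified
```

To keep the filtered result, assign it back, for example df = df.dropna(axis=1, how='all').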
Pass in thresh=n to keep only the rows that contain at least n non-NaN values:
>>> df1.dropna(thresh=1)
     0    1    2   3
0  1.0  2.0  3.0 NaN
1  NaN  NaN  2.0 NaN
3  8.0  8.0  NaN NaN
>>> df1.dropna(thresh=3)
     0    1    2   3
0  1.0  2.0  3.0 NaN
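One way to see what thresh does is to count the non-NaN values per row and compare the count with the threshold. The sketch below does this for the four-column frame; the threshold n = 2 and the names non_missing, by_thresh and by_mask are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Same data as df1 after column 3 was added.
df = pd.DataFrame([[1, 2, 3, np.nan],
                   [np.nan, np.nan, 2, np.nan],
                   [np.nan, np.nan, np.nan, np.nan],
                   [8, 8, np.nan, np.nan]])

# Number of non-NaN values in each row.
non_missing = df.notna().sum(axis=1)
print(non_missing.tolist())

n = 2  # hypothetical threshold
by_thresh = df.dropna(thresh=n)   # rows with at least n non-NaN values
by_mask = df[non_missing >= n]    # the same selection written as a mask
print(by_thresh.equals(by_mask))
```

Row 1 has one non-NaN value and row 3 has two, which is why thresh=1 keeps both while thresh=3 keeps only row 0.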