pandas Data Analysis - Filtering Out and Filling In Missing Data

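By default, dropna drops any row that contains a missing value.

# Imports used by all the examples in this post
import numpy as np
from numpy import nan as NA
from pandas import DataFrame, Series

data = DataFrame([[1., 2., 3.], [NA, NA, NA],
                  [1., 3., NA], [1., 5., NA]])
clean = data.dropna()

print(clean)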

You may not want to discard every row or column that contains an NA. Passing how='all' drops only the rows in which every value is NA.

data = DataFrame([[1., 2., 3.], [NA, NA, NA],
                  [1., 3., NA], [1., 5., NA]])
clean = data.dropna(how='all')
print(clean)
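To make the difference between the two modes concrete, here is a minimal sketch that runs both on the same data and compares which row labels survive. The variable names (frame, any_dropped, all_dropped) are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Row 1 is entirely NA; rows 2 and 3 are missing only their last value.
frame = pd.DataFrame([[1., 2., 3.],
                      [np.nan, np.nan, np.nan],
                      [1., 3., np.nan],
                      [1., 5., np.nan]])

any_dropped = frame.dropna()           # how='any' is the default
all_dropped = frame.dropna(how='all')  # only fully empty rows are removed

print(any_dropped.index.tolist())  # [0]
print(all_dropped.index.tolist())  # [0, 2, 3]
```

Neither call changes frame itself; both return new objects.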

To drop columns in the same way, just pass axis=1.

data = DataFrame([[1., 2., NA], [NA, NA, NA],
                  [1., 3., NA], [1., 5., NA]])
# Column 2 is entirely NA, so it is the only column removed
clean = data.dropna(axis=1, how='all')
print(clean)
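As a small sketch of why how='all' matters when working on columns: with the default how='any', a single fully empty row is enough to remove every column. The names below are illustrative only.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[1., 2., np.nan],
                      [np.nan, np.nan, np.nan],
                      [1., 3., np.nan]])

# Only column 2 is entirely NA, so columns 0 and 1 are kept.
cols_all = frame.dropna(axis=1, how='all')

# Row 1 puts an NA in every column, so the default how='any' drops them all.
cols_any = frame.dropna(axis=1)

print(cols_all.columns.tolist())  # [0, 1]
print(cols_any.shape)             # (3, 0)
```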

Another way to filter DataFrame rows mostly concerns time series data. Suppose you want to keep only the rows that contain at least a certain number of observations. The thresh argument serves this purpose.

df = DataFrame(np.random.randn(6, 3))
# .loc slices by label and includes the end label (.ix was removed from pandas)
df.loc[:4, 1] = NA
df.loc[:2, 2] = NA
print(df)

df = DataFrame(np.random.randn(5, 3))
df.loc[:4, 1] = NA
df.loc[:2, 2] = NA
# Keep only the rows that have at least 2 non-NA values
print(df.dropna(thresh=2))
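Because the examples above use random data, the effect of thresh can be hard to see. The sketch below uses fixed values and counts the non-NA entries per row with count(axis=1), so the threshold can be checked by hand. The data and names are made up for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[1.0, np.nan, np.nan],   # 1 observed value
                      [2.0, np.nan, 5.0],      # 2 observed values
                      [3.0, 4.0, 6.0]])        # 3 observed values

# Number of non-NA values in each row.
non_na = frame.count(axis=1)

# thresh=2 keeps rows with at least 2 non-NA values.
kept = frame.dropna(thresh=2)

print(non_na.tolist())      # [1, 2, 3]
print(kept.index.tolist())  # [1, 2]
```

In other words, dropna(thresh=n) selects the same rows as frame[frame.count(axis=1) >= n].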

Rather than filtering out missing data, you may want to fill in the holes some other way. Calling fillna with a constant replaces the missing values with that constant.

df = DataFrame(np.random.randn(5, 3))
df.loc[:4, 1] = NA
df.loc[:2, 2] = NA
print(df.fillna(0))

Calling fillna with a dict lets you use a different fill value for each column.

df = DataFrame(np.random.randn(5, 3))
df.loc[:4, 1] = NA
df.loc[:2, 2] = NA
# Fill column 1 with 0.5 and column 2 with 1
print(df.fillna({1: 0.5, 2: 1}))
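One detail worth checking is what happens to columns that the dict does not mention. The following sketch, with hypothetical column names 'a', 'b' and 'c', suggests they are simply left alone.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'a': [1.0, np.nan],
                      'b': [np.nan, 2.0],
                      'c': [np.nan, np.nan]})

# Keys are column labels, values are the fill value for that column.
# Column 'c' is not listed, so its missing values stay missing.
filled = frame.fillna({'a': 0.5, 'b': 1.0})

print(filled['a'].tolist())        # [1.0, 0.5]
print(filled['b'].tolist())        # [1.0, 2.0]
print(filled['c'].isna().tolist()) # [True, True]
```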

fillna returns a new object by default, but you can also modify the existing object in place.

df.loc[:4, 1] = NA
df.loc[:2, 2] = NA
# inplace=True changes df itself instead of returning a filled copy
_ = df.fillna(0, inplace=True)
print(df)
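The difference between the two styles can be observed by counting missing values before and after each call. This is a minimal sketch with made-up data; the names frame, filled, before and after are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'x': [1.0, np.nan, 3.0]})

# Default behaviour: a new object comes back and frame is untouched.
filled = frame.fillna(0)
before = int(frame['x'].isna().sum())

# In-place behaviour: frame itself is modified.
frame.fillna(0, inplace=True)
after = int(frame['x'].isna().sum())

print(before, after)          # 1 0
print(filled['x'].tolist())   # [1.0, 0.0, 3.0]
```

In-place modification avoids keeping two copies around, but it also discards the original missing-value pattern, so the default copy-returning form is usually the safer choice while exploring data.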

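The interpolation methods that work with reindex, such as forward fill, can also be used to fill missing values. Older pandas versions spell this fillna(method='ffill'); recent versions deprecate the method argument in favor of the ffill() and bfill() methods, which the code below uses.

df = DataFrame(np.random.randn(6, 3))
df.loc[2:, 1] = NA
df.loc[4:, 2] = NA
print(df)
# Forward fill: carry the last observed value down each column
print(df.ffill())

df = DataFrame(np.random.randn(6, 3))
df.loc[2:, 1] = NA
df.loc[4:, 2] = NA
print(df)
# limit=2 fills at most 2 consecutive missing values
print(df.ffill(limit=2))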
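With random data it is not obvious which cells limit leaves unfilled. Here is a small deterministic sketch on a Series; the values are made up so the result can be verified by eye.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0, np.nan])

full = s.ffill()            # every gap is filled from the value above it
capped = s.ffill(limit=2)   # at most 2 consecutive NAs are filled per gap

print(full.tolist())           # [1.0, 1.0, 1.0, 1.0, 5.0, 5.0]
print(capped.isna().tolist())  # [False, False, False, True, False, False]
```

The third NA in the first gap stays missing because the limit was reached, while the single NA after 5.0 is filled as usual.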

fillna can do a good deal more. For example, you can pass the mean or median of a Series.

data = Series([1, NA, 2, NA, 3])
print(data.fillna(data.mean()))
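The same idea extends to a DataFrame: passing the Series returned by mean() fills each column with its own mean. The sketch below also shows the median variant on a Series; all data and names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0, np.nan, 9.0])
by_mean = s.fillna(s.mean())      # mean of 1, 2, 9 is 4.0
by_median = s.fillna(s.median())  # median of 1, 2, 9 is 2.0

frame = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                      'b': [np.nan, 10.0, 20.0]})
# frame.mean() is a Series indexed by column label, so each column
# is filled with its own mean (2.0 for 'a', 15.0 for 'b').
per_column = frame.fillna(frame.mean())

print(by_mean.tolist())          # [1.0, 4.0, 2.0, 4.0, 9.0]
print(by_median.tolist())        # [1.0, 2.0, 2.0, 2.0, 9.0]
print(per_column['b'].tolist())  # [15.0, 10.0, 20.0]
```

The median is less sensitive to outliers such as the 9.0 above, which is why the two fills differ here.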

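Argument  Description

value     Scalar value or dict-like object used to fill missing values

method    Interpolation method, such as 'ffill' or 'bfill'; recent pandas versions deprecate it in favor of the ffill() and bfill() methods

axis      Axis to fill along; default axis=0

inplace   Modify the calling object without producing a copy

limit     For forward and backward filling, the maximum number of consecutive values to fill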

 
 
 
 
 


Origin www.cnblogs.com/li98/p/10991229.html