pandas_cookbook Study Notes (Part 4)

Copyright notice: all rights reserved by https://blog.csdn.net/thfyshz. Original post: https://blog.csdn.net/thfyshz/article/details/83818797

Handling Missing Data

Working with missing values in a time series:

In [79]: df = pd.DataFrame(np.random.randn(6,1), index=pd.date_range('2013-08-01', periods=6, freq='B'), columns=list('A'))

In [80]: df.loc[df.index[3], 'A'] = np.nan

In [81]: df
Out[81]: 
                   A
2013-08-01 -1.054874
2013-08-02 -0.179642
2013-08-05  0.639589
2013-08-06       NaN
2013-08-07  1.906684
2013-08-08  0.104050

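# Forward-fill the reversed time series (the gap takes the value of the next date, i.e. a backward fill in the original order)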
In [82]: df.reindex(df.index[::-1]).ffill()
Out[82]: 
                   A
2013-08-08  0.104050
2013-08-07  1.906684
2013-08-06  1.906684
2013-08-05  0.639589
2013-08-02 -0.179642
2013-08-01 -1.054874
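
To make the effect of reversing before `ffill` easier to see, here is a minimal sketch comparing it with `bfill` on the original order. The fixed values and the names `df_demo`, `reversed_ffill`, and `bfilled` are assumptions for illustration, not part of the original session.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones, so the comparison is reproducible
# (an assumption made for this illustration).
idx = pd.date_range('2013-08-01', periods=6, freq='B')
df_demo = pd.DataFrame({'A': [1.0, 2.0, 3.0, np.nan, 5.0, 6.0]}, index=idx)

# Forward-fill on the reversed frame: the gap is filled from the later date.
reversed_ffill = df_demo.reindex(df_demo.index[::-1]).ffill()

# Back-fill in the original order fills the same gap from the same later date.
bfilled = df_demo.bfill()

# Put the reversed result back in ascending order before comparing.
same = reversed_ffill.sort_index().equals(bfilled)
print(same)
```

If only the filled values matter and the reversed ordering is not needed, `bfill()` is probably the more direct way to write it.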

Grouping

Using apply:

In [83]: df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
   ....:                    'size': list('SSMMMLL'),
   ....:                    'weight': [8, 10, 11, 1, 20, 12, 12],
   ....:                    'adult' : [False] * 5 + [True] * 2}); df
   ....: 
Out[83]: 
  animal size  weight  adult
0    cat    S       8  False
1    dog    S      10  False
2    cat    M      11  False
3   fish    M       1  False
4    dog    M      20  False
5    cat    L      12   True
6    cat    L      12   True

# For each animal, the size of the individual with the highest weight
In [84]: df.groupby('animal').apply(lambda subf: subf['size'][subf['weight'].idxmax()])
Out[84]: 
animal
cat     L
dog     M
fish    M
dtype: object
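
The same lookup can also be written without `apply`, by finding the row label of the heaviest animal per group and then selecting `size` at those labels. This is only a sketch of one possible alternative; the names `heaviest_rows` and `size_of_heaviest` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})

# Row label of the heaviest animal in each group (first occurrence on ties).
heaviest_rows = df.groupby('animal')['weight'].idxmax()

# Look up the size at those labels and label the result by animal.
size_of_heaviest = pd.Series(df.loc[heaviest_rows, 'size'].to_numpy(),
                             index=heaviest_rows.index)
print(size_of_heaviest)
```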

Using get_group:

In [85]: gb = df.groupby('animal')

# Get the rows that belong to the 'cat' group
In [86]: gb.get_group('cat')
Out[86]: 
  animal size  weight  adult
0    cat    S       8  False
2    cat    M      11  False
5    cat    L      12   True
6    cat    L      12   True
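
For comparison, `get_group` selects the same rows as a boolean mask on the grouping column. A small sketch, with `cats_from_group` and `cats_from_mask` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})

# Group by a single column name so that group keys are plain strings.
gb = df.groupby('animal')

# Rows of one group, fetched from the GroupBy object.
cats_from_group = gb.get_group('cat')

# The same rows selected with a boolean mask.
cats_from_mask = df[df['animal'] == 'cat']

# The available group keys can be listed from gb.groups.
group_keys = sorted(gb.groups)
print(group_keys)
```

`get_group` is convenient when the GroupBy object already exists; for a one-off selection the mask is usually enough.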

# Apply a function that treats the items within a group differently (weights are scaled by size)
In [87]: def GrowUp(x):
   ....:    avg_weight =  sum(x[x['size'] == 'S'].weight * 1.5)
   ....:    avg_weight += sum(x[x['size'] == 'M'].weight * 1.25)
   ....:    avg_weight += sum(x[x['size'] == 'L'].weight)
   ....:    avg_weight /= len(x)
   ....:    return pd.Series(['L',avg_weight,True], index=['size', 'weight', 'adult'])
   ....: 

In [88]: expected_df = gb.apply(GrowUp)

In [89]: expected_df
Out[89]: 
       size   weight  adult
animal                     
cat       L  12.4375   True
dog       L  20.0000   True
fish      L   1.2500   True
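
The `weight` column produced by `GrowUp` can be cross-checked with a vectorized calculation: map each size to its growth factor, scale the weights, and take the mean per animal. This is a sketch for checking the numbers, not a replacement for the function; `factor` and `projected` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})

# Growth factor per size, matching the multipliers used in GrowUp.
factor = df['size'].map({'S': 1.5, 'M': 1.25, 'L': 1.0})

# Scale each weight, then average within each animal group.
projected = (df['weight'] * factor).groupby(df['animal']).mean()
print(projected)
```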

Expanding apply:

In [90]: S = pd.Series([i / 100.0 for i in range(1,11)])

In [91]: def CumRet(x,y):
   ....:    return x * (1 + y)
   ....: 

# Red uses functools.reduce, so functools must be imported first (import functools)
In [92]: def Red(x):
   ....:    return functools.reduce(CumRet,x,1.0)
   ....: 

In [93]: S.expanding().apply(Red, raw=True)
Out[93]: 
0    1.010000
1    1.030200
2    1.061106
3    1.103550
4    1.158728
5    1.228251
6    1.314229
7    1.419367
8    1.547110
9    1.701821
dtype: float64
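
Since `Red` multiplies the factors `(1 + y)` over an expanding window, the result should match a cumulative product of `1 + S`. The sketch below checks this; `via_expanding`, `via_cumprod`, and `max_diff` are illustrative names.

```python
import functools

import pandas as pd

S = pd.Series([i / 100.0 for i in range(1, 11)])

def cum_ret(acc, r):
    # Compound the running value by one period's return.
    return acc * (1 + r)

# Expanding apply: each window holds all returns up to the current row.
via_expanding = S.expanding().apply(
    lambda window: functools.reduce(cum_ret, window, 1.0), raw=True)

# Cumulative product of the growth factors, computed in one pass.
via_cumprod = (1 + S).cumprod()

# Largest absolute difference between the two results.
max_diff = (via_expanding - via_cumprod).abs().max()
print(max_diff)
```

For this particular calculation `cumprod` is the simpler option; `expanding().apply` is more useful when the per-window function has no built-in cumulative equivalent.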

Reposted from blog.csdn.net/thfyshz/article/details/83818797