TSAP(4): Time Series Sampling [asfreq() vs resample()]

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/u014281392/article/details/83210952

TSAP: Time Series Analysis with Python

import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2011', periods=10, freq='H')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
# The series has an hourly frequency
ts
    2011-01-01 00:00:00   -1.065583
    2011-01-01 01:00:00   -0.586701
    2011-01-01 02:00:00   -0.554193
    2011-01-01 03:00:00   -0.316603
    2011-01-01 04:00:00    0.534045
    2011-01-01 05:00:00   -0.764800
    2011-01-01 06:00:00    0.196573
    2011-01-01 07:00:00    0.201643
    2011-01-01 08:00:00   -0.694384
    2011-01-01 09:00:00    0.555979
    Freq: H, dtype: float64
# Change the frequency to 45-minute intervals; 'pad' fills each new timestamp with the most recent earlier value
converted = ts.asfreq('45Min', method='pad')

converted
2011-01-01 00:00:00   -1.065583
2011-01-01 00:45:00   -1.065583
2011-01-01 01:30:00   -0.586701
2011-01-01 02:15:00   -0.554193
2011-01-01 03:00:00   -0.316603
2011-01-01 03:45:00   -0.316603
2011-01-01 04:30:00    0.534045
2011-01-01 05:15:00   -0.764800
2011-01-01 06:00:00    0.196573
2011-01-01 06:45:00    0.196573
2011-01-01 07:30:00    0.201643
2011-01-01 08:15:00   -0.694384
2011-01-01 09:00:00    0.555979
Freq: 45T, dtype: float64

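Changing the sampling frequency of the time points

asfreq() controls how missing values are filled through its method parameter.

  • method : {'backfill', 'bfill', 'pad', 'ffill', None}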
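To make the options easier to compare, here is a minimal sketch on a small made-up series (the name `s` and its values are hypothetical, not the random data above). It also uses the `fill_value` parameter of asfreq(), which the original post does not cover, and the lowercase 'h' alias, since recent pandas releases deprecate the uppercase 'H'.

```python
import pandas as pd

# Hypothetical data: three hourly observations with easy-to-read values
idx = pd.date_range('2011-01-01', periods=3, freq='h')
s = pd.Series([1.0, 2.0, 3.0], index=idx)

no_fill = s.asfreq('30min')                   # method=None: new timestamps become NaN
forward = s.asfreq('30min', method='ffill')   # like 'pad': copy the previous observation
backward = s.asfreq('30min', method='bfill')  # like 'backfill': copy the next observation
constant = s.asfreq('30min', fill_value=0.0)  # fill new timestamps with a constant

# Show the four variants side by side
print(pd.DataFrame({'None': no_fill, 'ffill': forward,
                    'bfill': backward, 'fill_value': constant}))
```

Only the timestamps introduced by the new 30-minute grid (00:30 and 01:30) differ between the columns; the original hourly observations are unchanged in all four.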
# backfill: each new timestamp takes the value of the next available observation
ts.asfreq('45Min', method='backfill') 

2011-01-01 00:00:00 -1.065583
2011-01-01 00:45:00 -0.586701
2011-01-01 01:30:00 -0.554193
2011-01-01 02:15:00 -0.316603
2011-01-01 03:00:00 -0.316603
2011-01-01 03:45:00 0.534045
2011-01-01 04:30:00 -0.764800
2011-01-01 05:15:00 0.196573
2011-01-01 06:00:00 0.196573
2011-01-01 06:45:00 0.201643
2011-01-01 07:30:00 -0.694384
2011-01-01 08:15:00 0.555979
2011-01-01 09:00:00 0.555979
Freq: 45T, dtype: float64

# bfill = backfill
ts.asfreq('45Min', method='bfill')

2011-01-01 00:00:00 -1.065583
2011-01-01 00:45:00 -0.586701
2011-01-01 01:30:00 -0.554193
2011-01-01 02:15:00 -0.316603
2011-01-01 03:00:00 -0.316603
2011-01-01 03:45:00 0.534045
2011-01-01 04:30:00 -0.764800
2011-01-01 05:15:00 0.196573
2011-01-01 06:00:00 0.196573
2011-01-01 06:45:00 0.201643
2011-01-01 07:30:00 -0.694384
2011-01-01 08:15:00 0.555979
2011-01-01 09:00:00 0.555979
Freq: 45T, dtype: float64

# ffill (same as pad): each new timestamp takes the most recent earlier value
# e.g. 01:30:00 is filled with the value at 01:00:00
ts.asfreq('45Min', method='ffill')

2011-01-01 00:00:00 -1.065583
2011-01-01 00:45:00 -1.065583
2011-01-01 01:30:00 -0.586701
2011-01-01 02:15:00 -0.554193
2011-01-01 03:00:00 -0.316603
2011-01-01 03:45:00 -0.316603
2011-01-01 04:30:00 0.534045
2011-01-01 05:15:00 -0.764800
2011-01-01 06:00:00 0.196573
2011-01-01 06:45:00 0.196573
2011-01-01 07:30:00 0.201643
2011-01-01 08:15:00 -0.694384
2011-01-01 09:00:00 0.555979
Freq: 45T, dtype: float64

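# Switch to a lower frequency (90 minutes). Every 90-minute timestamp already
# exists in converted, so ffill has nothing to fill here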
converted.asfreq('90Min', method = 'ffill')

2011-01-01 00:00:00 -1.065583
2011-01-01 01:30:00 -0.586701
2011-01-01 03:00:00 -0.316603
2011-01-01 04:30:00 0.534045
2011-01-01 06:00:00 0.196573
2011-01-01 07:30:00 0.201643
2011-01-01 09:00:00 0.555979
Freq: 90T, dtype: float64
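As far as I understand the pandas documentation, asfreq() is essentially a convenience wrapper that builds a date_range at the new frequency and reindexes onto it. The sketch below checks that on a made-up 45-minute series; the names `s`, `grid`, `via_asfreq` and `via_reindex` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical data: seven values on a 45-minute grid
idx = pd.date_range('2011-01-01', periods=7, freq='45min')
s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Convert to a 90-minute grid with asfreq ...
via_asfreq = s.asfreq('90min', method='ffill')

# ... and by reindexing onto an explicitly built 90-minute date_range
grid = pd.date_range(s.index.min(), s.index.max(), freq='90min')
via_reindex = s.reindex(grid, method='ffill')

print(via_asfreq)
print(via_asfreq.equals(via_reindex))
```

Because every 90-minute timestamp is already present in the 45-minute series, the result simply keeps every second value.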

resample() vs asfreq()

ts.asfreq('D').sum()
-1.0655834142614131
ts.resample('D').sum()

2011-01-01 -2.494026
Freq: D, dtype: float64
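The two sums differ because asfreq('D') keeps only the timestamps that fall on the daily grid, while resample('D') aggregates everything inside each day. A minimal sketch on a deterministic series (the name `s` and the values 1..10 are made up for illustration) makes the intermediate objects visible:

```python
import pandas as pd

# Hypothetical data: ten hourly values 1..10 (not the random series above)
idx = pd.date_range('2011-01-01', periods=10, freq='h')
s = pd.Series([float(v) for v in range(1, 11)], index=idx)

daily_points = s.asfreq('D')         # only 2011-01-01 00:00:00 lies on the daily grid
daily_total = s.resample('D').sum()  # all ten values fall into the same daily bin

print(len(daily_points))     # 1
print(daily_points.iloc[0])  # 1.0, the value at midnight
print(daily_total.iloc[0])   # 55.0, the sum of all ten values
```

So calling sum() after asfreq('D') adds up a single value, which is why the first result above equals the 00:00:00 entry of ts.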

ts.asfreq('2H')

2011-01-01 00:00:00 -1.065583
2011-01-01 02:00:00 -0.554193
2011-01-01 04:00:00 0.534045
2011-01-01 06:00:00 0.196573
2011-01-01 08:00:00 -0.694384
Freq: 2H, dtype: float64

ts.resample('2H').sum()
2011-01-01 00:00:00   -1.652284
2011-01-01 02:00:00   -0.870797
2011-01-01 04:00:00   -0.230756
2011-01-01 06:00:00    0.398216
2011-01-01 08:00:00   -0.138405
Freq: 2H, dtype: float64

What is the difference between .resample() and .asfreq()?

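  • asfreq(): picks the value at each time point on the new frequency grid
  • resample(): groups all values within each time period, ready to be aggregated (e.g. with sum())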
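One way to see the point-versus-period distinction is to compare asfreq() with different resample() aggregations on the same data. The following is a sketch under the assumption of a regular hourly series with made-up values; `picked`, `summed` and `first` are hypothetical names.

```python
import pandas as pd

# Hypothetical data: ten hourly values 1..10
idx = pd.date_range('2011-01-01', periods=10, freq='h')
s = pd.Series([float(v) for v in range(1, 11)], index=idx)

picked = s.asfreq('2h')           # value AT 00:00, 02:00, 04:00, ...
summed = s.resample('2h').sum()   # sum over [00:00, 02:00), [02:00, 04:00), ...
first = s.resample('2h').first()  # first value inside each 2-hour bin

print(picked.tolist())  # [1.0, 3.0, 5.0, 7.0, 9.0]
print(summed.tolist())  # [3.0, 7.0, 11.0, 15.0, 19.0]
print(first.tolist())   # [1.0, 3.0, 5.0, 7.0, 9.0]
```

For this regular series, taking the first value of each bin happens to match asfreq('2h'), because each bin starts exactly on a sampled time point. Any other aggregation (sum, mean, max) uses the values in between as well.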
