Contents

1. Function overview

Example 1: Basic usage

Supplement: Common aggregation methods

Example 2: Time series

Exploring Apple stock price data

References

Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html

1. Function overview

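DataFrame.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base=None, on=None, level=None, origin='start_day', offset=None)

Note: this is the signature from the pandas version the article was written against; loffset and base were removed in pandas 2.0.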

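Purpose: resamples time series data, converting the series from one frequency to another.

Typical use: resample sales records by month, then compute each month's total sales, each month's largest sale, and so on.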
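
As a minimal sketch of the sales scenario, assuming a hypothetical sales table with an amount column (the names and numbers are invented for illustration), a month-start rule combined with agg gives each month's total and largest sale:

```python
import pandas as pd

# Hypothetical sales records; dates and amounts are invented for illustration.
sales = pd.DataFrame(
    {"amount": [100, 250, 80, 300, 120]},
    index=pd.to_datetime(
        ["2020-01-03", "2020-01-20", "2020-02-05", "2020-02-18", "2020-03-09"]
    ),
)

# "MS" labels each bin with the first day of its month.
monthly = sales["amount"].resample("MS").agg(["sum", "max"])
print(monthly)
```

The result has one row per month and one column per aggregation, so further statistics can be added by extending the list passed to agg.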

rule ： DateOffset, Timedelta or str
The offset string or object representing target conversion.

The rule parameter defines the resampling rule, for example sampling once per week or once per year.

Common rule values:

Alias	Meaning
B	business day
C	custom business day (experimental)
D	calendar day
W	weekly
M	month end
BM	business month end
CBM	custom business month end
MS	month start
BMS	business month start
CBMS	custom business month start
Q	quarter end
BQ	business quarter end
QS	quarter start
BQS	business quarter start
A	year end
BA	business year end
AS	year start
BAS	business year start
BH	business hour
H	hourly
T	minutely
S	secondly
L	milliseconds
U	microseconds
N	nanoseconds
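
To make the table concrete, here is a small sketch that applies three rules to the same daily series. The data is hypothetical (one observation per calendar day in Q1 2020). The sketch deliberately uses only W, MS and QS: pandas 2.2 and later deprecate several aliases from the table (for example M in favor of ME), so check the documentation for the version you have installed before relying on the others.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the value 1 for every calendar day in Q1 2020 (91 days),
# so each resampled sum is simply the number of days that fell into the bin.
idx = pd.date_range("2020-01-01", "2020-03-31", freq="D")
daily = pd.Series(np.ones(len(idx), dtype="int64"), index=idx)

weekly = daily.resample("W").sum()      # weekly bins, labeled by the Sunday that ends them
monthly = daily.resample("MS").sum()    # monthly bins, labeled by month start
quarterly = daily.resample("QS").sum()  # quarterly bins, labeled by quarter start

print(monthly)
print(quarterly)
print(weekly.head())
```

Because every observation is 1, the monthly sums are the month lengths (31, 29, 31) and the single quarterly bin holds all 91 days.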

Example 1: Basic usage

Build the data -> resample it -> aggregate -> print the result

import pandas as pd
import numpy as np

time_index = pd.date_range('20200101', periods=12)
ts = pd.Series(np.arange(12), index=time_index)  # build a Series indexed by date
print(ts)  # print the original data

# Resample the data into 5-day bins
ts_re = ts.resample('5D')  # returns a DatetimeIndexResampler object
ts_re2 = ts_re.sum()  # aggregate (sum) within each bin
print(ts_re, type(ts_re))
print(ts_re2, type(ts_re2))  # print the resampled data

Output:

2020-01-01     0
2020-01-02     1
2020-01-03     2
2020-01-04     3
2020-01-05     4
2020-01-06     5
2020-01-07     6
2020-01-08     7
2020-01-09     8
2020-01-10     9
2020-01-11    10
2020-01-12    11
Freq: D, dtype: int32

DatetimeIndexResampler [freq=<5 * Days>, axis=0, closed=left, label=left, convention=start, base=0]
 <class 'pandas.core.resample.DatetimeIndexResampler'>

2020-01-01    10
2020-01-06    35
2020-01-11    21
Freq: 5D, dtype: int32 
<class 'pandas.core.series.Series'>
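
The output shows that resample itself returns a Resampler object rather than data; nothing is computed until an aggregation is called. One way to think about it, sketched here under the assumption that the same 12-day series is used, is as a time-based groupby: grouping with pd.Grouper(freq='5D') and summing should give the same bins and totals.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(12), index=pd.date_range("20200101", periods=12))

resampler = ts.resample("5D")   # only describes the bins; no aggregation yet
by_resample = resampler.sum()   # the aggregation runs here

# Equivalent formulation as a groupby keyed on 5-day time bins.
by_groupby = ts.groupby(pd.Grouper(freq="5D")).sum()

print(by_resample)
print(by_groupby)
```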

Supplement: Common aggregation methods

print(ts.resample('5D').mean(), '→ mean\n')
print(ts.resample('5D').max(), '→ maximum\n')
print(ts.resample('5D').min(), '→ minimum\n')
print(ts.resample('5D').median(), '→ median\n')
print(ts.resample('5D').first(), '→ first value\n')
print(ts.resample('5D').last(), '→ last value\n')
print(ts.resample('5D').ohlc(), '→ OHLC resampling\n')
# OHLC: a time series aggregation used in finance → open, high, low, close

Output:

2020-01-01     2.0
2020-01-06     7.0
2020-01-11    10.5
Freq: 5D, dtype: float64 → mean

2020-01-01     4
2020-01-06     9
2020-01-11    11
Freq: 5D, dtype: int32 → maximum

2020-01-01     0
2020-01-06     5
2020-01-11    10
Freq: 5D, dtype: int32 → minimum

2020-01-01     2.0
2020-01-06     7.0
2020-01-11    10.5
Freq: 5D, dtype: float64 → median

2020-01-01     0
2020-01-06     5
2020-01-11    10
Freq: 5D, dtype: int32 → first value

2020-01-01     4
2020-01-06     9
2020-01-11    11
Freq: 5D, dtype: int32 → last value

            open  high  low  close
2020-01-01     0     4    0      4
2020-01-06     5     9    5      9
2020-01-11    10    11   10     11 → OHLC resampling
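
When several statistics are needed at once, calling each method separately repeats the binning. A possible alternative, sketched on the same series, is to pass a list of function names to agg, which returns one column per statistic:

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(12), index=pd.date_range("20200101", periods=12))

# One pass over the 5-day bins, one output column per aggregation name.
summary = ts.resample("5D").agg(["sum", "mean", "max", "min"])
print(summary)
```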

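Example 2: Time series

Exploring Apple stock price data

Running this in Jupyter is recommended: execute the steps one at a time and inspect the result of each.

Data: Apple_stock.csv : https://pan.baidu.com/s/1QFdZ595dJcXEFf_CxAz2ig (extraction code: hdn7)

Step 1: Import the required libraries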

import pandas as pd
import numpy as np

# visualization
import matplotlib.pyplot as plt

%matplotlib inline

Step 2: Dataset path

path = 'Apple_stock.csv'

Step 3: Read the data into a DataFrame named apple

apple = pd.read_csv(path)
apple.head()

Out[320]:

	Date	Open	High	Low	Close	Volume	Adj Close
0	2014-07-08	96.27	96.80	93.92	95.35	65130000	95.35
1	2014-07-07	94.14	95.99	94.10	95.97	56305400	95.97
2	2014-07-03	93.67	94.10	93.20	94.03	22891800	94.03
3	2014-07-02	93.87	94.06	93.09	93.48	28420900	93.48
4	2014-07-01	93.52	94.07	93.13	93.52	38170200	93.52

Step 4: Check the data type of each column

In [321]:

apple.dtypes

Out[321]:

Date          object
Open         float64
High         float64
Low          float64
Close        float64
Volume         int64
Adj Close    float64
dtype: object

Step 5: Convert the Date column to datetime

In [322]:

apple.Date = pd.to_datetime(apple.Date)
apple['Date'].head()

Out[322]:

0   2014-07-08
1   2014-07-07
2   2014-07-03
3   2014-07-02
4   2014-07-01
Name: Date, dtype: datetime64[ns]
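
The conversion above assumes every string in the column is a valid date. If a file might contain malformed entries, one option is errors='coerce', which turns unparseable strings into NaT instead of raising. A small sketch with hypothetical input:

```python
import pandas as pd

# Hypothetical raw column with one malformed entry.
raw = pd.Series(["2014-07-08", "2014-07-07", "not a date"])

# Unparseable strings become NaT (missing) rather than raising an error.
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed)
print(parsed.isna().sum(), "value(s) could not be parsed")
```

Rows with NaT can then be inspected or dropped before the column is used as an index.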

Step 6: Set Date as the index

In [323]:

apple.set_index('Date', inplace=True)
apple.head()

Out[323]:

	Open	High	Low	Close	Volume	Adj Close
Date
2014-07-08	96.27	96.80	93.92	95.35	65130000	95.35
2014-07-07	94.14	95.99	94.10	95.97	56305400	95.97
2014-07-03	93.67	94.10	93.20	94.03	22891800	94.03
2014-07-02	93.87	94.06	93.09	93.48	28420900	93.48
2014-07-01	93.52	94.07	93.13	93.52	38170200	93.52

Step 7: Are there any duplicate dates?

In [324]:

apple.index.is_unique

Out[324]:

True
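
Here the index is already unique. If is_unique had returned False, the repeated dates would need handling before resampling. A minimal sketch on a hypothetical frame with one repeated date, keeping the first row for each date:

```python
import pandas as pd

# Hypothetical frame in which 2014-07-07 appears twice.
df = pd.DataFrame(
    {"Close": [95.97, 95.97, 94.03]},
    index=pd.to_datetime(["2014-07-07", "2014-07-07", "2014-07-03"]),
)

print(df.index.is_unique)  # False: one date is repeated

# duplicated() flags every repeat after the first; ~ inverts the mask.
deduped = df[~df.index.duplicated(keep="first")]
print(deduped)
```

Whether to keep the first row, the last row, or an aggregate of the duplicates depends on the data source; keep='first' is only one choice.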

Step 8: Sort the index in ascending order

apple.sort_index(inplace=True)
apple.head()

Out[325]:

	Open	High	Low	Close	Volume	Adj Close
Date
1980-12-12	28.75	28.87	28.75	28.75	117258400	0.45
1980-12-15	27.38	27.38	27.25	27.25	43971200	0.42
1980-12-16	25.37	25.37	25.25	25.25	26432000	0.39
1980-12-17	25.87	26.00	25.87	25.87	21610400	0.40
1980-12-18	26.63	26.75	26.63	26.63	18362400	0.41
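
Steps 3 through 8 can also be condensed into a single read. The sketch below is an assumption-laden stand-in: it uses an in-memory CSV containing only the five rows shown in the Step 3 output, and it assumes the real file is comma-separated with the same header. parse_dates and index_col do the conversion and indexing at load time.

```python
import io

import pandas as pd

# Stand-in for Apple_stock.csv: the five rows shown in the Step 3 output.
# The comma-separated layout is an assumption about the real file.
csv_text = """Date,Open,High,Low,Close,Volume,Adj Close
2014-07-08,96.27,96.80,93.92,95.35,65130000,95.35
2014-07-07,94.14,95.99,94.10,95.97,56305400,95.97
2014-07-03,93.67,94.10,93.20,94.03,22891800,94.03
2014-07-02,93.87,94.06,93.09,93.48,28420900,93.48
2014-07-01,93.52,94.07,93.13,93.52,38170200,93.52
"""

apple = pd.read_csv(
    io.StringIO(csv_text),  # replace with the path to the real file
    parse_dates=["Date"],   # convert Date to datetime while reading
    index_col="Date",       # and use it as the index
).sort_index()              # oldest row first

print(apple.index.is_unique)
print(apple.head())
```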

Step 9: Resample to the last business day of each month and compute the mean of each column

apple_month = apple.resample(rule='BM').mean()  # BM = business month end, see the rule table
apple_month.head()

Out[326]:

	Open	High	Low	Close	Volume	Adj Close
Date
1980-12-31	30.481538	30.567692	30.443077	30.443077	2.586252e+07	0.473077
1981-01-30	31.754762	31.826667	31.654762	31.654762	7.249867e+06	0.493810
1981-02-27	26.480000	26.572105	26.407895	26.407895	4.231832e+06	0.411053
1981-03-31	24.937727	25.016818	24.836364	24.836364	7.962691e+06	0.387727
1981-04-30	27.286667	27.368095	27.227143	27.227143	6.392000e+06	0.423333
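
Note that in the result above each row is labeled with the month's last business day, but the values are averages over all rows in that month, not the prices on that day. The sketch below, on hypothetical data, contrasts mean() with last(). It passes a pd.offsets.BMonthEnd() object as the rule instead of the string alias, since rule also accepts a DateOffset and the string alias for business month end has changed in newer pandas releases.

```python
import pandas as pd

# Hypothetical prices: 0, 1, 2, ... on each business day of Jan and Feb 2020.
idx = pd.bdate_range("2020-01-01", "2020-02-29")
prices = pd.Series(range(len(idx)), index=idx, dtype="float64")

bme = pd.offsets.BMonthEnd()  # offset object for "business month end"

monthly_mean = prices.resample(bme).mean()  # average of all rows in the month
monthly_last = prices.resample(bme).last()  # value from the month's final row

print(monthly_mean)
print(monthly_last)
```

Both results share the same labels (2020-01-31 and 2020-02-28); only the aggregation differs.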

Step 10: How many days separate the earliest and latest dates in the dataset?

(apple.index.max() - apple.index.min()).days

Out[327]:

Step 11: How many months are in the data?

In [328]:

apple_months = apple.resample('BM').mean()
len(apple_months.index)

Out[328]:
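
Steps 10 and 11 can be tried on a small hypothetical index. One subtlety worth knowing: resampling produces a bin for every month between the first and last date, including months with no rows, so its length counts calendar months spanned. Counting distinct monthly periods instead counts only months that actually contain data. The dates below are invented and deliberately skip February.

```python
import pandas as pd

# Hypothetical data with no observations in February 2020.
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2020-01-10", "2020-01-20", "2020-03-05"]),
)

# Step 10: subtracting two Timestamps gives a Timedelta; .days is its whole-day part.
span_days = (s.index.max() - s.index.min()).days

# Step 11, resample approach: one bin per month spanned (February is NaN).
months_spanned = len(s.resample(pd.offsets.BMonthEnd()).mean().index)

# Alternative: distinct year-month periods that actually contain rows.
months_with_data = s.index.to_period("M").nunique()

print(span_days, months_spanned, months_with_data)
```

For a stock series with trading days in every month the two counts agree; they differ only when whole months are missing.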

Step 12: Plot the Adj Close values in chronological order

In [329]:

# makes the plot and assign it to a variable
appl_open = apple['Adj Close'].plot(title = "Apple Stock")

[Figure: line plot of Adj Close over time, titled "Apple Stock"]

References

https://blog.csdn.net/lys_828/article/details/104661045
https://www.kesci.com/api/notebooks/5c69407b336a0d002c184f46/RenderedContent#%E7%BB%83%E4%B9%A0%E9%A2%98%E7%B4%A2%E5%BC%95
