Resampling time-series data with Python pandas

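Today my boss needed to process a batch of time-series data. The source files are CSVs with one row per second, and after processing they should have one row per 15 minutes.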

A sample of the source data:

               time     B00    B01      ...           RollMean2.5     RollMean10
2018-05-31 09:44:39  15.212  5.071      ...                  2.97           2.99
2018-05-31 09:44:40  17.202  4.047      ...                  2.90           3.08
2018-05-31 09:44:41  10.137  4.055      ...                  2.58           2.71
2018-05-31 09:44:42  11.961  1.994      ...                  2.39           2.49
2018-05-31 09:44:43  17.157  2.019      ...                  2.44           2.53
2018-05-31 09:44:44  12.972  3.991      ...                  2.44           3.29
2018-05-31 09:44:45  20.078  6.023      ...                  2.49           3.21

The steps are as follows:

(1) Read the CSV data:

f = pd.read_csv(os.path.join(path1, file))
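Steps (1) and (2) can also be combined into a single read_csv call. The sketch below assumes the time column is named `time`, as in the sample, and uses an in-memory copy of three sample rows (only the columns B00 and B01, since the other columns are elided above) through io.StringIO instead of a real file:

```python
import io

import pandas as pd

# Three rows copied from the sample above; only the visible columns B00 and B01 are kept.
csv_text = """time,B00,B01
2018-05-31 09:44:39,15.212,5.071
2018-05-31 09:44:40,17.202,4.047
2018-05-31 09:44:41,10.137,4.055
"""

# parse_dates converts the "time" column to datetimes and index_col turns it into
# the index, so there is no "time" column left to delete afterwards.
f = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"], index_col="time")
print(f.index)
```

With a real file, the StringIO object is simply replaced by the path, e.g. os.path.join(path1, file).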

(2) Convert the time column to a DatetimeIndex, use it as the index, and delete the time column:

f.index = pd.to_datetime(f.time.values)
del f['time']  # "del f.time" raises AttributeError; columns must be deleted with item syntax
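If the CSV has already been read, another option is to convert the column in place and promote it with set_index, which removes it from the columns in one step. A minimal sketch on two rows copied from the sample (the in-memory CSV is only for illustration):

```python
import io

import pandas as pd

csv_text = """time,B00,B01
2018-05-31 09:44:39,15.212,5.071
2018-05-31 09:44:40,17.202,4.047
"""
f = pd.read_csv(io.StringIO(csv_text))

# Parse the strings into datetimes, then move the column into the index.
# set_index("time") drops "time" from the columns, so no separate del is needed.
f["time"] = pd.to_datetime(f["time"])
f = f.set_index("time")
print(f)
```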

(3) Resample the data with resample():

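# '15min' means 15-minute bins ('15T' is the older alias, deprecated since pandas 2.2); see the offset-alias docs for other intervals
# sum() adds up the values in each bin; mean() averages them, and max(), min(), count() or agg() can be used the same way
resample = f.resample('15min').sum()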
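To see what resample actually produces, here is a small sketch on synthetic data: one hour of one-second rows starting at 09:44:39, with made-up values (B00 is all ones, so its 15-minute sum just counts the rows in each bin). It also shows a few aggregations besides sum():

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a real file: 3600 one-second rows (values are arbitrary).
idx = pd.date_range("2018-05-31 09:44:39", periods=3600, freq="s")
f = pd.DataFrame({"B00": np.ones(3600), "B01": np.arange(3600, dtype=float)}, index=idx)

r = f.resample("15min")

totals = r.sum()                        # sum of each 15-minute bin
means = r.mean()                        # average of each bin
stats = r.agg(["sum", "mean", "max"])   # several aggregations at once (MultiIndex columns)

print(totals["B00"])
```

With the default settings the bins are aligned to midnight and labelled by their left edge, so the first row is 09:30:00 and covers only the 21 seconds from 09:44:39 to 09:44:59; the last bin (10:30:00) is partial as well. Passing origin='start' to resample aligns the bins to the first timestamp instead. If a file has gaps, empty bins get 0 from sum(), while sum(min_count=1) returns NaN for them.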

(4) Write the resampled result to Excel:

resample.to_excel(os.path.join(path1, csvf[0] + '.xlsx'))

Complete code example:

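import os
import sys

import pandas as pd
# openpyxl is not called directly, but to_excel() needs it to write .xlsx files
import openpyxl  # noqa: F401

# Folder containing this script and the data sub-folders
path = os.path.dirname(os.path.abspath(sys.argv[0]))
# Names of the sub-folders in that folder
dirs = [x for x in os.listdir(path) if os.path.isdir(os.path.join(path, x))]
# Walk every sub-folder, resample each CSV file and save the result as .xlsx next to it
for dir_ in dirs:
    path1 = os.path.join(path, dir_)
    for file in os.listdir(path1):
        csvf = os.path.splitext(file)
        if csvf[1] == '.csv':
            f = pd.read_csv(os.path.join(path1, file))
            # Use the parsed time column as a DatetimeIndex and drop the column
            f.index = pd.to_datetime(f['time'].values)
            del f['time']
            # Sum every column over 15-minute bins
            resample = f.resample('15min').sum()
            print(csvf[0])
            resample.to_excel(os.path.join(path1, csvf[0] + '.xlsx'))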
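The loop in the script can also be split so that the resampling is separate from the file writing, which makes it easier to test. The sketch below is a hypothetical helper, `resample_csv_tree`, that assumes the same layout as the script (data sub-folders next to the script, each CSV with a `time` column) and returns the resampled frames instead of writing them:

```python
import os

import pandas as pd


def resample_csv_tree(root, rule="15min"):
    """Resample every CSV one folder level below `root` (hypothetical helper).

    Returns {csv_path: resampled DataFrame}; writing the results is left to the caller.
    Assumes each CSV has a "time" column, as in the sample data.
    """
    results = {}
    for entry in sorted(os.listdir(root)):
        folder = os.path.join(root, entry)
        if not os.path.isdir(folder):  # skip plain files next to the script
            continue
        for name in sorted(os.listdir(folder)):
            if os.path.splitext(name)[1].lower() != ".csv":
                continue
            csv_path = os.path.join(folder, name)
            f = pd.read_csv(csv_path, parse_dates=["time"], index_col="time")
            results[csv_path] = f.resample(rule).sum()
    return results


# Usage (writing .xlsx requires openpyxl):
# for csv_path, resampled in resample_csv_tree(os.path.dirname(os.path.abspath(__file__))).items():
#     resampled.to_excel(os.path.splitext(csv_path)[0] + ".xlsx")
```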

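Open question: Excel or CSV files sometimes store time as a decimal number. I have not yet worked out how to convert that representation directly into a DatetimeIndex; if anyone knows how, please share. Thanks!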
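One common case is an Excel serial date: the integer part counts days since 1899-12-30 (Excel's 1900 date system) and the fractional part is the time of day. Assuming the decimals in your file are of that kind, pd.to_datetime can convert them with unit='D' and a shifted origin. The two serial numbers below are hypothetical values constructed to match the first timestamps in the sample:

```python
import pandas as pd

# Hypothetical Excel-style serial numbers: whole days since 1899-12-30 plus
# the time of day as a fraction (35079 s = 09:44:39).
serials = pd.Series([43251 + 35079 / 86400, 43251 + 35080 / 86400])

# unit="D" reads the numbers as days, origin moves day 0 to Excel's epoch.
# Float rounding can leave a tiny error, so round to whole seconds.
times = pd.to_datetime(serials, unit="D", origin="1899-12-30").dt.round("s")
print(times)
```

For a whole column this would be f.index = pd.to_datetime(f['time'], unit='D', origin='1899-12-30'). Files that use Excel's 1904 date system need origin='1904-01-01' instead, and serial numbers before 1900-03-01 come out one day off because Excel treats 1900 as a leap year.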
