Python time series

1. Displaying the current date and time

import datetime
now=datetime.datetime.now()
print(now)

2. Computing the difference between two dates

# Integer literals with a leading zero (02, 01) are a syntax error in Python 3
delta=datetime.datetime(2019,2,22)-datetime.datetime(2019,1,1)
print(delta)
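
The result of subtracting two datetimes is a datetime.timedelta. As a small sketch (the variable names start, end and later are only illustrative), the same subtraction can be inspected and reversed like this:

```python
import datetime

start = datetime.datetime(2019, 1, 1)
end = datetime.datetime(2019, 2, 22)

delta = end - start              # a datetime.timedelta
print(delta)                     # 52 days, 0:00:00
print(delta.days)                # 52
print(delta.total_seconds())     # 4492800.0

# Adding a timedelta to a datetime gives another datetime
later = start + datetime.timedelta(days=12)
print(later)                     # 2019-01-13 00:00:00
```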

3. Converting between strings and datetime objects

1. strptime

value='2019-02-22'
# pd.datetime was removed in pandas 2.0; use the standard-library class directly
time_1=datetime.datetime.strptime(value,'%Y-%m-%d')
print(time_1)
print(type(time_1))

Output
2019-02-22 00:00:00
<class 'datetime.datetime'>

2. strftime

value=datetime.datetime(2019,2,22)
new_value=value.strftime('%Y-%m-%d')
print(new_value)
print(type(new_value))

Output
2019-02-22
<class 'str'>

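What is the difference between strptime and strftime?

First, the names differ by a single letter. The p in strptime stands for parse (read a time out of a string), and the f in strftime stands for format (write a time out as a string).

Second, the arguments differ. datetime.datetime.strptime takes two arguments: the string to convert and the format that string is written in. The strftime method of a datetime object takes one argument: the format of the string to produce.

Third, the direction differs. strptime converts a string to a datetime, while strftime converts a datetime to a string.

(How to remember it? Parsing comes first: you have to read a date in with strptime before there is anything to format with strftime.)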
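
A short round trip may make the direction of each function easier to remember. This is only a sketch; the sample string and the names text, fmt, parsed and formatted are made up for illustration.

```python
import datetime

text = '2019-02-22 08:30'
fmt = '%Y-%m-%d %H:%M'

parsed = datetime.datetime.strptime(text, fmt)   # str -> datetime
formatted = parsed.strftime(fmt)                 # datetime -> str
print(parsed)        # 2019-02-22 08:30:00
print(formatted)     # 2019-02-22 08:30

# A different format string changes only the text representation
print(parsed.strftime('%d/%m/%Y'))   # 22/02/2019
```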

# datetime.strptime

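The conversions above all require a format string. datetime.strptime is a good way to parse a date when the format is known, but writing a format code every time can be tedious. In that case, use the parser.parse method from the third-party dateutil package.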

from dateutil.parser import parse
print(parse('2011-01-03'))
print(parse('Jan 31, 1997 10:45 PM'))  # keep a space after the comma so 1997 is read as the year

Output
2011-01-03 00:00:00
1997-01-31 22:45:00

One more note

# In many locales the day is written before the month, so you can pass dayfirst=True to indicate this

print(parse('16/12/2018',dayfirst=True))

Output
2018-12-16 00:00:00
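
The effect of dayfirst is easiest to see on a string where both readings are valid dates. A minimal sketch, assuming dateutil is installed (the sample string is chosen for illustration):

```python
from dateutil.parser import parse

ambiguous = '06/12/2018'
month_first = parse(ambiguous)                 # read as month/day/year
day_first = parse(ambiguous, dayfirst=True)    # read as day/month/year
print(month_first)   # 2018-06-12 00:00:00
print(day_first)     # 2018-12-06 00:00:00
```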

Indexing, selection, and subsetting

1. Selecting data by index label

import numpy as np
import pandas as pd
dates=[datetime.datetime(2011,1,2),datetime.datetime(2011,1,5),datetime.datetime(2011,1,7),
       datetime.datetime(2011,1,8),datetime.datetime(2011,1,10),datetime.datetime(2011,1,12)]
ts=pd.Series(np.random.randn(6),index=dates)
print(ts)
stamp=ts.index[2]
print(ts[stamp])

ts:
2011-01-02 -0.564252
2011-01-05 -1.092965
2011-01-07 1.794356
2011-01-08 0.765545
2011-01-10 0.511482
2011-01-12 -0.180311
dtype: float64

ts[stamp]:
1.7943558404193065

2. Selecting data with a string that can be interpreted as a date

print(ts['1/10/2011'])

3. For long time series with thousands of entries, you can select data by passing just a year, or a year and month

longer_ts=pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000',periods=1000))
print(longer_ts)
print(longer_ts['2001'])
print(longer_ts['2001-05'])

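You can also slice with datetime objects. The slice bounds do not have to be timestamps that appear in the index.

# ts covers January 2011, so slice ts (longer_ts ends in 2002 and would return an empty result)
print(ts[datetime.datetime(2011,1,7):])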
print(ts['1/6/2011':'1/11/2011'])
print(ts.truncate(after='1/9/2011'))
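
Because the series above hold random values, the selections are hard to check by eye. The sketch below repeats the same kinds of selection on a deterministic series (the name s and the values 0..999 are assumptions for illustration) and uses .loc to make the label-based lookup explicit.

```python
import datetime
import numpy as np
import pandas as pd

# 1000 consecutive days starting 2000-01-01, with values 0..999
s = pd.Series(np.arange(1000), index=pd.date_range('2000-01-01', periods=1000))

year_2001 = s.loc['2001']          # every day of 2001
may_2001 = s.loc['2001-05']        # every day of May 2001
print(len(year_2001), len(may_2001))   # 365 31

# Label slices include both end points
window = s.loc['2000-01-03':'2000-01-05']
print(len(window))                 # 3

# Slicing with a datetime object, and truncate() as an alternative
tail = s.loc[datetime.datetime(2002, 9, 20):]
head = s.truncate(after='2000-01-09')
print(len(tail), len(head))        # 7 9
```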

Time series with duplicate indices

dates=pd.DatetimeIndex(['1/1/2000','1/2/2000','1/2/2000','1/2/2000','1/3/2000'])
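dup_ts=pd.Series(np.arange(5),index=dates)  # series with duplicated timestamps
print(dup_ts)
print(dup_ts.index.is_unique)
print(dup_ts['1/3/2000']) # not duplicated: returns a scalar
print(dup_ts['1/2/2000']) # duplicated: returns a slice of the series
# A point of confusion: how are unique and is_unique related?
print(dup_ts.unique())  # by default, returns the distinct values
print(dup_ts.index.unique()) # to get the distinct index labels, go through .index
'''
    Summary of the difference:
    is_unique: checks the whole collection (the values or the index; here, the index) and returns True if nothing is duplicated, otherwise False
    unique: works on the whole collection (the values or the index) and returns only the distinct entries
    
'''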

Output
2000-01-01 0
2000-01-02 1
2000-01-02 2
2000-01-02 3
2000-01-03 4
dtype: int32

False

4
2000-01-02 1
2000-01-02 2
2000-01-02 3
dtype: int32
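
To make the difference between is_unique and unique() concrete, here is a small sketch that applies both to the values and to the index of a series shaped like dup_ts. The names idx, s and first_only are illustrative; Index.duplicated is shown as one possible way to keep only the first row per timestamp.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02',
                        '2000-01-02', '2000-01-03'])
s = pd.Series(np.arange(5), index=idx)

print(s.index.is_unique)        # False: 2000-01-02 appears three times
print(s.is_unique)              # True: the values 0..4 are all distinct
print(len(s.index.unique()))    # 3 distinct timestamps
print(len(s.unique()))          # 5 distinct values

# Index.duplicated() marks every repeat after the first occurrence
first_only = s[~s.index.duplicated(keep='first')]
print(first_only.tolist())      # [0, 1, 4]
```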

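Suppose you want to aggregate data with non-unique timestamps. One way is to use groupby and pass level=0:

# Group by the (duplicated) timestamps in the index by passing level=0 to groupby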
grouped=dup_ts.groupby(level=0)
print(grouped.mean())
print(grouped.count())

Output:
2000-01-01 0
2000-01-02 2
2000-01-03 4
dtype: int32

2000-01-01 1
2000-01-02 3
2000-01-03 1
dtype: int64
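
As a self-contained sketch of the same aggregation (the names idx, s, means and counts are illustrative), grouping on level 0 collapses the repeated timestamps into one row each. Note that on recent pandas versions the mean of integer values is returned as floats.

```python
import pandas as pd

idx = pd.DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02',
                        '2000-01-02', '2000-01-03'])
s = pd.Series([0, 1, 2, 3, 4], index=idx)

grouped = s.groupby(level=0)      # level=0 means "group by the index"
means = grouped.mean()
counts = grouped.count()

print(means.tolist())             # [0.0, 2.0, 4.0]
print(counts.tolist())            # [1, 3, 1]
print(means.index.is_unique)      # True
```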

Date ranges, frequencies, and shifting

1. Generating date ranges

There are roughly three cases
#1. The start and end dates are known

index=pd.date_range('2012-04-01','2012-06-01')
print(index)

#2. The start date and the number of periods are known

index1=pd.date_range(start='2012-01-01',periods=20)
print(index1)

#3. The end date and the number of periods are known

index2=pd.date_range(end='2012-06-1',periods=5)
print(index2)
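
The three call styles describe the same kind of object, which the following sketch tries to show by building one daily range three ways (the names by_bounds, by_start and by_end are illustrative):

```python
import pandas as pd

by_bounds = pd.date_range('2012-04-01', '2012-06-01')        # start and end
by_start = pd.date_range(start='2012-04-01', periods=62)     # start and length
by_end = pd.date_range(end='2012-06-01', periods=62)         # end and length

print(len(by_bounds))                 # 62 (both end points are included)
print(by_bounds.equals(by_start))     # True
print(by_bounds.equals(by_end))       # True
```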

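If you need a date index containing the last business day of each month, pass the 'BM' frequency (business end of month). Only dates that fall on or inside the date interval are included.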

dd=pd.date_range('2000-01-01','2000-12-01',freq='BM')  # pandas 2.2 and later spell this alias 'BME'
print(dd)
d1=pd.date_range('2012-05-02 12:56:31',periods=5)
print(d1)
# normalize=True resets each generated timestamp to midnight
d2=pd.date_range('2012-05-02 12:56:31',periods=20,normalize=True)
print(d2)

You can also pass a frequency string that carries a multiple or combines several units

g1=pd.date_range('2000-01-01','2000-01-03 23:59',freq='4h')
print(g1)
g2=pd.date_range('2000-01-01',periods=10,freq='1h30min')
print(g2)
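
A quick way to check what a frequency string produces is to look at the number of entries and the spacing between them. A minimal sketch (the names every_4h and steps are illustrative):

```python
import pandas as pd

every_4h = pd.date_range('2000-01-01', '2000-01-03 23:59', freq='4h')
print(len(every_4h))        # 18
print(every_4h[-1])         # 2000-01-03 20:00:00

steps = pd.date_range('2000-01-01', periods=10, freq='1h30min')
print(steps[1] - steps[0])  # 0 days 01:30:00
print(steps[-1])            # 2000-01-01 13:30:00
```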

Week-of-month dates

(week of month)

# The third Friday of every month
g3=pd.date_range('2012-01-01','2012-12-31',freq='WOM-3FRI')
print(g3)

Output (this printout comes from a run over a 2019 date range, with the result wrapped in list())
[Timestamp('2019-01-18 00:00:00', freq='WOM-3FRI'),
 Timestamp('2019-02-15 00:00:00', freq='WOM-3FRI'),
 Timestamp('2019-03-15 00:00:00', freq='WOM-3FRI'),
 Timestamp('2019-04-19 00:00:00', freq='WOM-3FRI'),
 Timestamp('2019-05-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2019-06-21 00:00:00', freq='WOM-3FRI'),
 Timestamp('2019-07-19 00:00:00', freq='WOM-3FRI'),
 Timestamp('2019-08-16 00:00:00', freq='WOM-3FRI')]
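
One way to convince yourself what 'WOM-3FRI' selects is to check the weekday and day-of-month of every generated date. A small sketch for 2012 (the name third_fridays is illustrative):

```python
import pandas as pd

third_fridays = pd.date_range('2012-01-01', '2012-12-31', freq='WOM-3FRI')

print(len(third_fridays))                               # 12, one per month
print(all(d.weekday() == 4 for d in third_fridays))     # True (Monday is 0, Friday is 4)
print(all(15 <= d.day <= 21 for d in third_fridays))    # True: a third Friday falls on day 15..21
print(third_fridays[0].date())                          # 2012-01-20
```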

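Shifting dates (leading and lagging)

ts=pd.Series(np.random.randn(4),index=pd.date_range('2000-01-01',freq='M',periods=4))  # pandas 2.2 and later spell this alias 'ME'
print(ts)
# Move the values two periods later; the index is unchanged
print(ts.shift(2))
# Move the values two periods earlier; the index is unchanged
print(ts.shift(-2))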

Output

2000-01-31 0.120563
2000-02-29 1.195027
2000-03-31 -0.972080
2000-04-30 0.773595
Freq: M, dtype: float64

2000-01-31 NaN
2000-02-29 NaN
2000-03-31 0.120563
2000-04-30 1.195027
Freq: M, dtype: float64

2000-01-31 -0.972080
2000-02-29 0.773595
2000-03-31 NaN
2000-04-30 NaN
Freq: M, dtype: float64
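Use case:
shift is often used to compute the percent change of a time series, or of several time series held as DataFrame columns. For example:

# In words: period 2 / period 1 - 1, stored at period 2; similar in spirit to differencing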
print(ts/ts.shift(1)-1)
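
The same calculation on fixed numbers may be easier to follow. In this sketch the prices are made up, pd.offsets.MonthEnd() is used in place of a frequency string so that it does not depend on the alias spelling, and the built-in pct_change method is shown only for comparison.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 118.8],
                   index=pd.date_range('2000-01-31', periods=4,
                                       freq=pd.offsets.MonthEnd()))

manual = prices / prices.shift(1) - 1     # this period / previous period - 1
builtin = prices.pct_change()             # pandas' own percent-change method

print(manual.round(4).tolist())           # [nan, 0.1, -0.1, 0.2]
print(np.allclose(manual, builtin, equal_nan=True))   # True
```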

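# Because a plain shift leaves the index unchanged, some data is discarded. If the frequency is known, you can pass it to shift to advance the timestamps instead of the data.

print(ts.shift(1,freq='M'))
# In the output below, all four values are kept and every timestamp moves forward by one month end, so nothing is dropped and no NaN appears. (The values differ from the printout above, presumably because the series was regenerated with new random numbers.)

Output: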
2000-02-29 -1.267211
2000-03-31 -0.492248
2000-04-30 -0.278425
2000-05-31 -0.131732
Freq: M, dtype: float64
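
To contrast the two kinds of shift side by side, here is a sketch on a series with fixed values. The names month_end, s, naive and moved are illustrative, and an offset object is passed as freq so that the example does not depend on the alias spelling.

```python
import pandas as pd

month_end = pd.offsets.MonthEnd()
s = pd.Series([1.0, 2.0, 3.0, 4.0],
              index=pd.date_range('2000-01-31', periods=4, freq=month_end))

naive = s.shift(1)                      # values move, index stays
moved = s.shift(1, freq=month_end)      # index moves, values stay

print(naive.isna().sum())               # 1 (first slot is NaN, the value 4.0 is dropped)
print(moved.tolist())                   # [1.0, 2.0, 3.0, 4.0]
print(moved.index[0].date(), moved.index[-1].date())   # 2000-02-29 2000-05-31
```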