1. Time series basics
import numpy as np
import pandas as pd
np.random.seed(12345)
import matplotlib.pyplot as plt
plt.rc('figure', figsize=(10, 6))
PREVIOUS_MAX_ROWS = pd.options.display.max_rows
pd.options.display.max_rows = 20
np.set_printoptions(precision=4, suppress=True)
The most basic kind of time series object in pandas is a Series indexed by timestamps, which outside of pandas are usually represented as Python strings or datetime objects:
from datetime import datetime
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts
Under the hood, these datetime objects are stored in a DatetimeIndex:
ts.index
As with any other Series, arithmetic operations between differently indexed time series automatically align on the dates:
print(ts[::2])  # take every second element
ts + ts[::2]
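To make the alignment easier to see, here is a minimal sketch that uses fixed values instead of random ones. The names demo, every_other, and aligned_sum are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical six-point series with predictable values 0.0 .. 5.0.
idx = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07",
                      "2011-01-08", "2011-01-10", "2011-01-12"])
demo = pd.Series(np.arange(6, dtype=float), index=idx)

# Every second element keeps the dates at positions 0, 2 and 4.
every_other = demo[::2]

# Addition matches values by date label, not by position.
# Dates that are missing from one operand produce NaN in the result.
aligned_sum = demo + every_other
print(aligned_sum)
```

The result keeps all six dates; only the three dates present in both operands carry a number.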
pandas stores timestamps using NumPy's datetime64 data type at nanosecond resolution:
ts.index.dtype
Scalar values from a DatetimeIndex are pandas Timestamp objects:
stamp = ts.index[0]
stamp
A Timestamp can be used anywhere a datetime object is expected. In addition, it can store frequency information (if any) and knows how to perform time zone conversions and other operations. These are explained in detail later.
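The interchangeability of Timestamp and datetime can be checked directly. The following is a small sketch; the variable names and the choice of the "Asia/Tokyo" zone are assumptions made for illustration only.

```python
from datetime import datetime

import pandas as pd

stamp_demo = pd.Timestamp("2011-01-02")

# Timestamp is a subclass of datetime, so it compares equal to the
# matching datetime value.
is_datetime = isinstance(stamp_demo, datetime)
same_moment = stamp_demo == datetime(2011, 1, 2)

# Explicit conversion to a plain standard-library datetime.
as_py = stamp_demo.to_pydatetime()

# Attach a time zone, then convert to another one.
utc_stamp = stamp_demo.tz_localize("UTC")
tokyo_stamp = utc_stamp.tz_convert("Asia/Tokyo")
print(is_datetime, same_moment, as_py, tokyo_stamp)
```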
2. Indexing, selection, and subsetting
When you index and select data by label, a time series behaves like any other pandas.Series:
print(ts)
stamp = ts.index[2]
print(ts[stamp])  # label-based index
print(ts.iloc[2])  # integer position
There is also a more convenient option: pass a string that can be interpreted as a date:
print(ts['1/10/2011'])
print(ts['20110110'])
ts['2011-01-10']
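The three spellings above refer to the same label. A minimal sketch with a small hypothetical series (demo, spellings, and values are illustrative names) shows that each string resolves to the same entry:

```python
import pandas as pd

idx = pd.to_datetime(["2011-01-08", "2011-01-10", "2011-01-12"])
demo = pd.Series([1.5, 2.5, 3.5], index=idx)

# Three spellings of 10 January 2011. Note that '1/10/2011' is read
# month-first, so it means January 10 and not October 1.
spellings = ["1/10/2011", "20110110", "2011-01-10"]
values = [demo[s] for s in spellings]
print(values)
```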
For longer time series, you can select slices of data simply by passing a year or a year and month:
longer_ts = pd.Series(np.random.randn(1000),
                      index=pd.date_range('1/1/2000', periods=1000))  # daily frequency
longer_ts
longer_ts['2001']
Here, the string "2001" is interpreted as a year, and the corresponding time period is selected. Specifying a month works as well:
longer_ts['2001-05']
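A quick way to confirm what a partial date string selects is to count the rows it returns. This sketch rebuilds a 1000-day series with deterministic values; the names days, demo, year_2001, and may_2001 are illustrative.

```python
import numpy as np
import pandas as pd

days = pd.date_range("2000-01-01", periods=1000)  # daily frequency by default
demo = pd.Series(np.arange(1000), index=days)

year_2001 = demo["2001"]     # every row dated in 2001
may_2001 = demo["2001-05"]   # every row dated in May 2001
print(len(year_2001), len(may_2001))
```

Because 2001 is not a leap year and the range covers it completely, the year selection has 365 rows and the month selection has 31.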
Slicing with datetime objects works as well:
print(ts)
ts[datetime(2011, 1, 7):]
Because most time series data is ordered chronologically, you can also slice with timestamps that are not contained in the time series, which amounts to a range query:
ts['1/6/2011':'1/11/2011']
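The range query can be sketched with fixed values so the selected rows are easy to verify. The series demo and the name window are illustrative.

```python
import pandas as pd

idx = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07",
                      "2011-01-08", "2011-01-10", "2011-01-12"])
demo = pd.Series(range(6), index=idx)

# Neither 6 January nor 11 January is in the index. Because the index is
# sorted, the slice still selects every row between the two bounds.
window = demo["2011-01-06":"2011-01-11"]
print(window)
```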
As before, you can pass a string date, a datetime, or a Timestamp. Note that slicing in this way produces a view on the original time series (shared memory), just as slicing a NumPy array does.
This means that no data is copied, and modifications to the slice are reflected in the original data. With pandas' Copy-on-Write mode enabled (planned as the default from pandas 3.0), a write to the slice no longer propagates to the original.
There is also an equivalent instance method, truncate, that slices a Series between two dates:
ts.truncate(after='1/9/2011')
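truncate also accepts a before argument, and both bounds are inclusive. The following sketch, using an illustrative series named demo, compares it with the equivalent string slice:

```python
import pandas as pd

idx = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07",
                      "2011-01-08", "2011-01-10", "2011-01-12"])
demo = pd.Series(range(6), index=idx)

# Keep only rows dated from 6 January through 9 January.
middle = demo.truncate(before="2011-01-06", after="2011-01-09")

# A bound that matches an existing date is kept in the result.
up_to_8th = demo.truncate(after="2011-01-08")

# The same selection written as a label slice.
same_as_slice = demo["2011-01-06":"2011-01-09"]
print(middle)
```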
All of these operations also work on a DataFrame, where they index on its rows:
dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')  # weekly, anchored on Wednesdays
long_df = pd.DataFrame(np.random.randn(100, 4),
                       index=dates,
                       columns=['Colorado', 'Texas',
                                'New York', 'Ohio'])
long_df.loc['5-2001']
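As a sanity check on the row selection, this sketch builds a frame of the same shape with deterministic values and selects May 2001. The names weeks, frame, and may_rows are illustrative, and the month is written in ISO order ('2001-05') purely for readability.

```python
import numpy as np
import pandas as pd

# 100 weekly dates, each falling on a Wednesday.
weeks = pd.date_range("2000-01-01", periods=100, freq="W-WED")
frame = pd.DataFrame(np.arange(400).reshape(100, 4),
                     index=weeks,
                     columns=["Colorado", "Texas", "New York", "Ohio"])

# .loc with a partial date string selects rows, keeping all columns.
may_rows = frame.loc["2001-05"]
print(may_rows)
```

May 2001 contains five Wednesdays, so the selection has five rows and four columns.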
3. Time series with duplicate index values
In some applications, multiple observations may fall on the same timestamp. Here is an example:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)
dup_ts
By checking the is_unique property of the index, we can tell that it is not unique:
dup_ts.index.is_unique
Indexing into this time series produces either a scalar value or a slice, depending on whether the timestamp is duplicated:
print(dup_ts['1/3/2000'])  # not duplicated
dup_ts['1/2/2000'] # duplicated
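The difference between the two lookups can be checked programmatically. This is a minimal sketch on the same kind of data; the names demo, single, and repeated are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02",
                          "2000-01-02", "2000-01-03"])
demo = pd.Series(np.arange(5), index=dates)

single = demo["2000-01-03"]    # label occurs once: a scalar comes back
repeated = demo["2000-01-02"]  # label occurs three times: a Series comes back

print(np.ndim(single), type(repeated).__name__)
```

Code that consumes such lookups may want to handle both return types, or deduplicate the index first.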
Suppose you want to aggregate data that has non-unique timestamps. One way to do this is to use groupby and pass level=0:
grouped = dup_ts.groupby(level=0)
print(grouped.mean())
grouped.count()
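For reference, here is a self-contained sketch of the same aggregation with the results stored in variables. The names demo, means, and counts are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02",
                          "2000-01-02", "2000-01-03"])
demo = pd.Series(np.arange(5), index=dates)

# level=0 groups by the (only) index level, i.e. by timestamp.
grouped = demo.groupby(level=0)
means = grouped.mean()    # one row per distinct timestamp
counts = grouped.count()  # number of observations per timestamp
print(means)
print(counts)
```

The aggregated result has a unique index, with the three observations on 2 January collapsed into a single row.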