Time Series Analysis Basics in Python with pandas

  1. Time series basics

  import numpy as np

  import pandas as pd

  np.random.seed(12345)

  import matplotlib.pyplot as plt

  plt.rc('figure', figsize=(10, 6))

  PREVIOUS_MAX_ROWS = pd.options.display.max_rows

  pd.options.display.max_rows = 20

  np.set_printoptions(precision=4, suppress=True)

  The most basic kind of time series in pandas is a Series indexed by timestamps, which are often represented as Python strings or datetime objects:

  from datetime import datetime

  dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),

  datetime(2011, 1, 7), datetime(2011, 1, 8),

  datetime(2011, 1, 10), datetime(2011, 1, 12)]

  ts = pd.Series(np.random.randn(6), index=dates)

  ts

  Under the hood, these datetime objects are put into a DatetimeIndex:

  ts.index

  

 

  Like other Series, arithmetic operations between differently indexed time series automatically align on the dates:

  print(ts[::2])  # take every other element

  ts + ts[::2]
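  Dates that appear in only one of the two series come out as NaN, because the union of the indexes is used. As a side note of mine (not from the original text), Series.add with fill_value can treat the missing side as zero instead:

  ts.add(ts[::2], fill_value=0)  # align on dates, fill unmatched positions with 0 before adding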

  pandas stores timestamps using NumPy's datetime64 data type at nanosecond resolution:

  ts.index.dtype

  

 

  Scalar values taken from a DatetimeIndex are pandas Timestamp objects:

  stamp = ts.index[0]

  stamp

  

 

  A Timestamp can be converted to a datetime object whenever needed. In addition, it can store frequency information (if any) and knows how to perform time zone conversions and other kinds of manipulation. This will be explained in detail later.
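  A minimal sketch of these claims (my own illustration; the time zone names are assumptions, not from the original):

  stamp.to_pydatetime()  # convert the Timestamp back to a standard library datetime

  stamp.tz_localize('UTC').tz_convert('US/Eastern')  # time zone handling, covered in detail later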

  2. Indexing, selection, subsetting

  A time series behaves like any other pandas.Series when you index and select data by label:

  print(ts)

  stamp = ts.index[2]

  print(ts[stamp])  # label-based indexing

  print(ts[2])  # integer (positional) indexing
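  A caveat of my own (not in the original text): in recent pandas versions, plain integer indexing such as ts[2] on a labeled Series is deprecated, and explicit positional access goes through iloc:

  print(ts.iloc[2])  # unambiguous positional indexing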

  As a convenience, you can also pass a string that is interpretable as a date:

  print(ts['1/10/2011'])

  print(ts['20110110'])

  ts['2011-01-10']

  

 

  For longer time series, you can pass just a year, or a year and month, to easily select slices of data:

  longer_ts = pd.Series(np.random.randn(1000),

  index=pd.date_range('1/1/2000', periods=1000))  # daily frequency by default

  longer_ts

  longer_ts['2001']

  Here, the string '2001' is interpreted as a year, and the corresponding time interval is selected. Specifying the month works too:

  longer_ts['2001-05']

  Slicing with datetime objects works as well:

  print(ts)

  ts[datetime(2011, 1, 7):]

  Because most time series data is ordered chronologically, you can also slice with timestamps that are not contained in the time series, which effectively performs a range query:

  ts['1/6/2011':'1/11/2011']

  

 

  As before, you can pass either a string date, a datetime, or a Timestamp. Remember that slicing in this manner produces a view on the source time series (shared memory), just like slicing a NumPy array.

  This means that no data is copied, and modifications made on the slice will be reflected in the original data.
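  A minimal sketch (my own check, not from the original) that confirms the shared-memory behavior and shows how to break it when an independent copy is wanted:

  window = ts['1/6/2011':'1/11/2011']

  print(np.shares_memory(window.values, ts.values))  # True: the slice is a view on ts

  independent = window.copy()  # copy() if modifications should not touch the original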

  There is also an equivalent instance method, truncate, that slices a Series between two dates:

  ts.truncate(after='1/9/2011')
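  As a small usage note of mine (not in the original), truncate also accepts a before argument, so both ends can be trimmed in one call:

  ts.truncate(before='1/6/2011', after='1/9/2011')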

  

 

  All of this holds true for DataFrame as well, indexing on its rows:

  dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')  # weekly frequency, anchored on Wednesday

  long_df = pd.DataFrame(np.random.randn(100, 4),

  index=dates,

  columns=['Colorado', 'Texas',

  'New York', 'Ohio'])

  long_df.loc['5-2001']

  3. Time series with duplicate index values

  In some applications, there may be multiple data observations falling on the same point in time. Here is an example:

  dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',

  '1/2/2000', '1/3/2000'])

  dup_ts = pd.Series(np.arange(5), index=dates)

  dup_ts

  

 

  We can tell that the index is not unique by checking its is_unique property:

  dup_ts.index.is_unique

  

 

  Indexing into this time series will now either produce scalar values or slices, depending on whether the timestamp is duplicated:

  print(dup_ts['1/3/2000'])  # not duplicated

  dup_ts['1/2/2000'] # duplicated

  

 

  Suppose you want to aggregate the data having non-unique timestamps. One way to do this is to use groupby and pass level=0:

  grouped = dup_ts.groupby(level=0)

  print(grouped.mean())

  grouped.count()
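  As a small extension (my own sketch, not part of the original), both aggregations can be computed in a single call with agg:

  grouped.agg(['mean', 'count'])  # one row per unique timestamp, one column per aggregation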
