Time Series Processing in Pandas, Explained in Detail

  When analyzing data in Python, you often need to convert and process dates and times. This is especially true for time-related analysis and data mining: quantitative trading, for example, looks for changes in stock prices in historical data. Python ships with the datetime module for handling times, and NumPy offers corresponding functionality as well. Pandas, the go-to data analysis library in the Python ecosystem, also provides powerful date handling, and its main tool for this is the time series.

  1. Generating date sequences

  Pandas mainly provides two functions for this, pd.date_range() and pd.period_range(). Their parameters include a start time (start), an end time (end), the number of values to generate (periods), and a frequency (freq, e.g. 'M' for month, 'D' for day, 'W' for week, 'Y' for year).
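  As a minimal sketch of these parameters, a range can be bounded by start and end instead of periods; typically you give two of start, end and periods together with freq. The dates below are arbitrary examples, not taken from the original text.

```python
import pandas as pd

# Bounded by start and end instead of periods: every day from Jan 1 to Jan 5, inclusive.
daily = pd.date_range(start='2019-01-01', end='2019-01-05', freq='D')

# period_range accepts the same style of arguments; 'M' is the monthly alias for periods.
monthly = pd.period_range(start='2019-01', end='2019-06', freq='M')

print(daily)
print(monthly)
```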

  The main difference between the two is that pd.date_range() generates a DatetimeIndex, while pd.period_range() generates a PeriodIndex.
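  A short sketch of what that difference means in practice: each element of a DatetimeIndex is a Timestamp (a point in time), each element of a PeriodIndex is a Period (a span of time), and the two index types can be converted into each other. The three-day daily range is an arbitrary example.

```python
import pandas as pd

dt_idx = pd.date_range('2019-01-01', periods=3, freq='D')
per_idx = pd.period_range('2019-01-01', periods=3, freq='D')

print(type(dt_idx).__name__, type(per_idx).__name__)

first_point = dt_idx[0]   # a Timestamp: a single instant
first_span = per_idx[0]   # a Period: the whole day 2019-01-01

# Convert DatetimeIndex -> PeriodIndex and PeriodIndex -> DatetimeIndex (period start times).
as_periods = dt_idx.to_period('D')
as_timestamps = per_idx.to_timestamp()
```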

  The examples below generate monthly, weekly and hourly sequences with each function for comparison:

  import pandas as pd

  date_rng = pd.date_range('2019-01-01', freq='M', periods=12)  # pandas >= 2.2 prefers freq='ME' here

  print(f'month date_range():\n{date_rng}')

  

  month date_range():

  DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',

  '2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31',

  '2019-09-30', '2019-10-31', '2019-11-30', '2019-12-31'],

  dtype='datetime64[ns]', freq='M')

  

  period_rng = pd.period_range('2019/01/01', freq='M', periods=12)

  print(f'month period_range():\n{period_rng}')

  

  month period_range():

  PeriodIndex(['2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06',

  '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12'],

  dtype='period[M]', freq='M')

  

  date_rng = pd.date_range('2019-01-01', freq='W-SUN', periods=12)

  print(f'week date_range():\n{date_rng}')

  

  week date_range():

  DatetimeIndex(['2019-01-06', '2019-01-13', '2019-01-20', '2019-01-27',

  '2019-02-03', '2019-02-10', '2019-02-17', '2019-02-24',

  '2019-03-03', '2019-03-10', '2019-03-17', '2019-03-24'],

  dtype='datetime64[ns]', freq='W-SUN')

  

  period_rng = pd.period_range('2019-01-01', freq='W-SUN', periods=12)

  print(f'week period_range():\n{period_rng}')

  

  week period_range():

  PeriodIndex(['2018-12-31/2019-01-06', '2019-01-07/2019-01-13',

  '2019-01-14/2019-01-20', '2019-01-21/2019-01-27',

  '2019-01-28/2019-02-03', '2019-02-04/2019-02-10',

  '2019-02-11/2019-02-17', '2019-02-18/2019-02-24',

  '2019-02-25/2019-03-03', '2019-03-04/2019-03-10',

  '2019-03-11/2019-03-17', '2019-03-18/2019-03-24'],

  dtype='period[W-SUN]', freq='W-SUN')
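  The weekly output above shows the key idea: a date_range entry is the anchoring Sunday itself, while a period_range entry is the whole Monday-to-Sunday week. A minimal sketch inspecting one such weekly Period through its start_time and end_time attributes:

```python
import pandas as pd

# A weekly Period anchored on Sunday ('W-SUN') covers Monday through Sunday.
week = pd.Period('2019-01-01', freq='W-SUN')

print(week)             # the week label, start/end dates
print(week.start_time)  # first instant of the week
print(week.end_time)    # last instant of the week (just before the next Monday)
```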

  

  date_rng = pd.date_range('2019-01-01 00:00:00', freq='H', periods=12)  # pandas >= 2.2 prefers freq='h'

  print(f'hour date_range():\n{date_rng}')

  

  hour date_range():

  DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 01:00:00',

  '2019-01-01 02:00:00', '2019-01-01 03:00:00',

  '2019-01-01 04:00:00', '2019-01-01 05:00:00',

  '2019-01-01 06:00:00', '2019-01-01 07:00:00',

  '2019-01-01 08:00:00', '2019-01-01 09:00:00',

  '2019-01-01 10:00:00', '2019-01-01 11:00:00'],

  dtype='datetime64[ns]', freq='H')

  

  period_rng = pd.period_range('2019-01-01 00:00:00', freq='H', periods=12)

  print(f'hour period_range():\n{period_rng}')

  

  hour period_range():

  PeriodIndex(['2019-01-01 00:00', '2019-01-01 01:00', '2019-01-01 02:00',

  '2019-01-01 03:00', '2019-01-01 04:00', '2019-01-01 05:00',

  '2019-01-01 06:00', '2019-01-01 07:00', '2019-01-01 08:00',

  '2019-01-01 09:00', '2019-01-01 10:00', '2019-01-01 11:00'],

  dtype='period[H]', freq='H')

  

  2. Creating and converting Timestamp objects

  A Timestamp object can be created with either pd.Timestamp() or pd.to_datetime(), as follows:

  from datetime import datetime

  import pandas as pd

  ts = pd.Timestamp(2019, 1, 1)

  print(f'pd.Timestamp()-1:{ts}')

  #pd.Timestamp()-1:2019-01-01 00:00:00

  ts = pd.Timestamp(datetime(2019, 1, 1, hour=0, minute=1, second=1))

  print(f'pd.Timestamp()-2:{ts}')

  #pd.Timestamp()-2:2019-01-01 00:01:01

  ts = pd.Timestamp('2019-1-1 0:1:1')

  print(f'pd.Timestamp()-3:{ts}')

  #pd.Timestamp()-3:2019-01-01 00:01:01

  print(f'pd.Timestamp()-type:{type(ts)}')

  #pd.Timestamp()-type:<class 'pandas._libs.tslibs.timestamps.Timestamp'>

  # pd.to_datetime(2019, 1, 1) is not supported: unlike pd.Timestamp(), it does not take year, month and day as separate arguments

  dt = pd.to_datetime(datetime(2019, 1, 1, hour=0, minute=1, second=1))

  print(f'pd.to_datetime()-1:{dt}')

  #pd.to_datetime()-1:2019-01-01 00:01:01

  dt = pd.to_datetime('2019-1-1 0:1:1')

  print(f'pd.to_datetime()-2:{dt}')

  #pd.to_datetime()-2:2019-01-01 00:01:01

  print(f'pd.to_datetime()-type:{type(dt)}')

  #pd.to_datetime()-type:<class 'pandas._libs.tslibs.timestamps.Timestamp'>

  # pd.to_datetime() can also build a custom time series from a list of strings

  dtlist = pd.to_datetime(['2019-1-1 0:1:1', '2019-3-1 0:1:1'])

  print(f'pd.to_datetime()-list:{dtlist}')

  #pd.to_datetime()-list:DatetimeIndex(['2019-01-01 00:01:01', '2019-03-01 00:01:01'], dtype='datetime64[ns]', freq=None)

  # Convert the Timestamp to a monthly Period

  pr = ts.to_period('M')

  print(f'ts.to_period():{pr}')

  #ts.to_period():2019-01

  print(f'ts.to_period()-type:{type(pr)}')

  #ts.to_period()-type:<class 'pandas._libs.tslibs.period.Period'>
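  When strings come from real data, pd.to_datetime() also accepts an explicit format string and an errors argument. A minimal sketch, assuming slash-separated input; the sample strings (including the bad one) are made up for illustration:

```python
import pandas as pd

raw = ['2019/01/01', '2019/03/01', 'not a date']

# errors='coerce' turns entries that do not match the format into NaT instead of raising.
parsed = pd.to_datetime(raw, format='%Y/%m/%d', errors='coerce')
print(parsed)

# With the default errors='raise', the same input raises a ValueError.
try:
    pd.to_datetime(raw, format='%Y/%m/%d')
    strict_failed = False
except ValueError:
    strict_failed = True
```

  Coercing to NaT keeps the rest of the column usable and lets you find the bad rows afterwards with pd.isna().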

  3. Creating and converting Period objects

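  # Define a Period

  per = pd.Period('2019')

  print(f'pd.Period():{per}')

  #pd.Period():2019

  # Subtracting two Periods of the same frequency gives the gap between them; a Period can also be shifted by adding or subtracting an integer

  per_del = pd.Period('2019') - pd.Period('2018')

  print(f'Gap between 2019 and 2018: {per_del}')

  #Gap between 2019 and 2018: <YearEnd: month=12>   (a one-year offset; very old pandas versions returned the integer 1)

  # Convert the Period to a Timestamp

  print(per.to_timestamp(how='end'))  #2019-12-31 23:59:59.999999999 (older pandas versions printed 2019-12-31 00:00:00)

  print(per.to_timestamp(how='start'))  #2019-01-01 00:00:00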
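  The integer shifting mentioned in the comment above, plus conversion to a coarser frequency with Period.asfreq(), can be sketched as follows; the specific years and months are arbitrary examples:

```python
import pandas as pd

year = pd.Period('2019')        # an annual period
next_year = year + 1            # shift forward by one period
prev_year = year - 1            # shift back by one period

month = pd.Period('2019-03', freq='M')
later_month = month + 10        # ten months later, crossing into the next year
quarter = month.asfreq('Q')     # the quarter that contains March 2019

print(next_year, prev_year, later_month, quarter)
```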

  4. Creating Timedelta intervals

  # Create a Timedelta

  print(pd.Timedelta(days=5, minutes=50, seconds=20, milliseconds=10, microseconds=10, nanoseconds=10))

  #5 days 00:50:20.010010010
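  A Timedelta is also what you get when subtracting two Timestamps, and it can be parsed from a string. A minimal sketch; the two timestamps are made-up values chosen to be about 50 days apart:

```python
import pandas as pd

start = pd.Timestamp('2019-06-08 17:59:31')
end = pd.Timestamp('2019-07-28 18:00:00')

gap = end - start               # Timestamp - Timestamp -> Timedelta
print(gap)                      # 50 days 00:00:29
print(gap.days)                 # the whole-days component
print(gap.total_seconds())      # the full length in seconds

# A Timedelta can also be built from a string.
step = pd.Timedelta('1 days 2 hours')
```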

  # Get the current time (pd.datetime was removed in pandas 2.0; pd.Timestamp.now() works in all recent versions)

  now = pd.Timestamp.now()

  # Calculate the date 50 days after the current time

  later = now + pd.Timedelta(days=50)

  print(f'The current time is {now}; 50 days later it will be {later}')

  # The current time is 2019-06-08 17:59:31.726065; 50 days later it will be 2019-07-28 17:59:31.726065

  # Display only the date

  print(later.strftime('%Y-%m-%d'))  #2019-07-28
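  strftime() returns a string, which is fine for display but no longer usable in date arithmetic. As a sketch of two alternatives that keep a date type, normalize() resets the time to midnight while staying a Timestamp, and date() returns a plain datetime.date. The timestamp value is an arbitrary example:

```python
import pandas as pd

stamp = pd.Timestamp('2019-07-28 17:59:31.726065')

as_text = stamp.strftime('%Y-%m-%d')  # str, for display only
midnight = stamp.normalize()          # Timestamp at 00:00:00 on the same day
as_date = stamp.date()                # datetime.date, drops the time entirely
```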

  5. Frequency conversion and resampling

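  # asfreq: display the index values at quarterly frequency

  # A DatetimeIndex has no asfreq method ('DatetimeIndex' object has no attribute 'asfreq'), so the index is converted to a PeriodIndex first, either with to_period() or by building it with period_range()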

  date=pd.date_range('1/1/2018', periods=20, freq='D')

  tsdat_series=pd.Series(range(20),index=date)

  tsp_series=tsdat_series.to_period('D')

  print(tsp_series.index.asfreq('Q'))

  date=pd.period_range('1/1/2018', periods=20, freq='D')

  tsper_series=pd.Series(range(20),index=date)

  print(tsper_series.index.asfreq('Q'))  # prints the same result as the previous print()

  

  PeriodIndex(['2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1',

  '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1',

  '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1',

  '2018Q1', '2018Q1'],

  dtype='period[Q-DEC]', freq='Q-DEC')
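  Going the other way, from a coarse period to a finer one, asfreq() needs to know which end of the span to use, which is what its how argument controls. A minimal sketch, which also shows that DatetimeIndex.to_period('Q') reaches the quarterly view in a single step; the dates are arbitrary examples:

```python
import pandas as pd

q = pd.Period('2018Q1', freq='Q')

first_day = q.asfreq('D', how='start')  # first day of the quarter
last_day = q.asfreq('D', how='end')     # last day of the quarter

# A DatetimeIndex can be converted straight to quarterly periods.
quarters = pd.date_range('2018-01-01', periods=3, freq='D').to_period('Q')
print(first_day, last_day, quarters)
```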

  

  # resample: sum the values per quarter, then label the result with a quarterly PeriodIndex

  print(tsdat_series.resample('Q').sum().to_period('Q'))

  

  2018Q1 190

  Freq: Q-DEC, dtype: int64
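  With only 20 days of data, the quarterly total collapses into a single row. A sketch of the same pattern at weekly frequency, rebuilding the same 0 to 19 daily series so the block runs on its own, makes the binning easier to see. 2018-01-01 is a Monday, and with 'W' (Sunday-anchored weeks) each bin is labeled by the Sunday that ends it:

```python
import pandas as pd

date = pd.date_range('2018-01-01', periods=20, freq='D')
series = pd.Series(range(20), index=date)

weekly_sum = series.resample('W').sum()    # total per Monday-Sunday week
weekly_mean = series.resample('W').mean()  # average per week
print(weekly_sum)
```

  The last bin only holds six days (Jan 15 to Jan 20), which is why its mean is 16.5 rather than a whole number.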

  

  # groupby: average the values by day of the week (0 = Monday, 6 = Sunday)

  print(tsdat_series.groupby(lambda x: x.dayofweek).mean())

  

  0 7.0

  1 8.0

  2 9.0

  3 10.0

  4 11.0

  5 12.0

  6 9.5

  dtype: float64
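  The same grouping can be written without a lambda by passing an array derived from the index. As a sketch, day_name() gives readable group labels, while index.dayofweek reproduces the numeric result shown above:

```python
import pandas as pd

date = pd.date_range('2018-01-01', periods=20, freq='D')
series = pd.Series(range(20), index=date)

# Group by weekday name; the result is sorted alphabetically by name.
by_name = series.groupby(series.index.day_name()).mean()

# Group by weekday number (0 = Monday), matching the lambda version.
by_number = series.groupby(series.index.dayofweek).mean()
print(by_name)
```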

  


Reproduced from: https://juejin.im/post/5cfe0230f265da1b8d161155


Origin blog.csdn.net/weixin_34184158/article/details/91480357