HenryHub :
I have some random hourly time series data (let's make some up). How do I resample to get the daily max value, and also create a separate df column holding the hour at which that daily max was recorded?
import pandas as pd
import numpy as np
from numpy.random import randint
import os
np.random.seed(10) # added for reproducibility
rng = pd.date_range('10/9/2018 00:00', periods=1000, freq='1H')
df = pd.DataFrame({'Random_Number':randint(1, 100, 1000)}, index=rng)
df.index.name = 'Date'
Resample random value:
daily_summary = pd.DataFrame()
daily_summary['Random_Number_Resamp'] = df['Random_Number'].resample('D').max()
daily_summary.head()
And then an attempt at recording the hour at which the daily max value occurred:
daily_summary['Hour_Map'] = daily_summary.Random_Number_Resamp.index.strftime('%H').astype('int')
daily_summary
The code above doesn't throw an attribute error, but every value in Hour_Map
is zero. How can I populate Hour_Map in the same step that creates the
daily_summary df?
Quang Hoang :
You could use groupby:
df.groupby(df.index.normalize())['Random_Number'].agg(['idxmax', 'max'])
Output (head):
                        idxmax  max
Date
2018-10-09 2018-10-09 05:00:00   94
2018-10-10 2018-10-10 20:00:00   95
2018-10-11 2018-10-11 15:00:00   97
2018-10-12 2018-10-12 18:00:00   98
2018-10-13 2018-10-13 22:00:00   91
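To get the integer hour column the question asks for, the idxmax timestamps can be reduced to their hour. The attempt in the question returns zero because resample('D') labels each bin at midnight, so the hour of the resampled index is always 00. Below is a minimal sketch, assuming the sample data from the question; the column names Random_Number_Resamp and Hour_Map are taken from the question, and the pd.Timedelta frequency is my substitution for the '1H' alias so the sketch does not depend on which alias spelling the installed pandas accepts.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
np.random.seed(10)
rng = pd.date_range('2018-10-09 00:00', periods=1000, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'Random_Number': np.random.randint(1, 100, 1000)}, index=rng)
df.index.name = 'Date'

# Per calendar day: the timestamp of the max (idxmax) and the max itself.
daily_summary = (
    df.groupby(df.index.normalize())['Random_Number']
      .agg(['idxmax', 'max'])
      .rename(columns={'max': 'Random_Number_Resamp'})
)

# Hour of day at which the daily max occurred.
# pd.to_datetime is a guard in case the idxmax column comes back as object dtype.
daily_summary['Hour_Map'] = pd.to_datetime(daily_summary['idxmax']).dt.hour

print(daily_summary.head())
```

If the same max value occurs more than once in a day, idxmax reports the first occurrence, so Hour_Map holds the earliest hour for that day.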