Olivia :
import pandas as pd
import numpy as np
data = {'dateTimeGmt': {0: pd.Timestamp('2020-01-01 06:44:00'),
1: pd.Timestamp('2020-01-01 06:45:00'), 2: pd.Timestamp('2020-01-01 07:11:00'), 3: pd.Timestamp('2020-01-01 07:12:00'), 4: pd.Timestamp('2020-01-01 07:12:00'), 5: pd.Timestamp('2020-01-01 07:14:00'), 6: pd.Timestamp('2020-01-01 10:04:00'), 7: pd.Timestamp('2020-01-01 10:04:00'), 8: pd.Timestamp('2020-01-01 11:45:00'), 9: pd.Timestamp('2020-01-01 06:45:00')},
'id': {0: 4, 1: 4, 2: 4, 3: 5, 4: 5, 5: 5, 6: 5, 7: 6, 8: 6, 9: 6},
'name': {0: 'four', 1: 'four', 2: 'four', 3: 'five', 4: 'five', 5: 'five', 6: 'five', 7: 'six', 8: 'six', 9: 'six'}, 'a': {0: 1.0, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan, 6: np.nan, 7: 5.0, 8: np.nan, 9: np.nan}, 'b': {0: np.nan, 1: 3.0, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan, 6: np.nan, 7: np.nan, 8: np.nan, 9: 3.0}, 'c': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: 2.0, 5: np.nan, 6: np.nan, 7: np.nan, 8: 0.0, 9: np.nan}}
df = pd.DataFrame(data)
I would like to flatten my DataFrame such that all columns after name are grouped by the hour in dateTimeGmt and then by id/name.
I tried:
df2 = df.groupby([df.dateTimeGmt.dt.date, df.dateTimeGmt.dt.hour, df.id, df.name]).sum()
This seems to work, but it moves all of my grouping columns into the (Multi)Index.
df3 = df.groupby([df.dateTimeGmt.dt.date, df.dateTimeGmt.dt.hour, df.id, df.name], as_index = False).sum()
keeps id and name, but the dateTimeGmt data is lost.
How do I group my data without losing the columns that have been grouped by?
jezrael :
In your solution it is necessary to add rename for the date and hour keys to avoid duplicated column names (both Series are derived from dateTimeGmt and keep that name), and finally call DataFrame.reset_index:
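# select only a, b, c so the leftover datetime column dateTimeGmt is not summed
# (newer pandas versions raise a TypeError when summing datetime values)
df2 = (df.groupby([df.dateTimeGmt.dt.date.rename('date'),
                   df.dateTimeGmt.dt.hour.rename('h'), 'id', 'name'])[['a', 'b', 'c']]
         .sum()
         .reset_index())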
print (df2)
date h id name a b c
0 2020-01-01 6 4 four 1.0 3.0 0.0
1 2020-01-01 6 6 six 0.0 3.0 0.0
2 2020-01-01 7 4 four 0.0 0.0 0.0
3 2020-01-01 7 5 five 0.0 0.0 2.0
4 2020-01-01 10 5 five 0.0 0.0 0.0
5 2020-01-01 10 6 six 5.0 0.0 0.0
6 2020-01-01 11 6 six 0.0 0.0 0.0
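To see why the rename is needed, here is a minimal sketch on the first three rows of the question's data. Both derived keys keep the Series name dateTimeGmt, so turning the index back into columns fails with a ValueError (typically raised by reset_index when it tries to insert a second dateTimeGmt column), while the renamed keys go through. The variable names date_key, hour_key, error_raised and fixed are only illustrative.

```python
import numpy as np
import pandas as pd

# First three rows of the question's data
df = pd.DataFrame({
    'dateTimeGmt': pd.to_datetime(['2020-01-01 06:44', '2020-01-01 06:45',
                                   '2020-01-01 07:11']),
    'id': [4, 4, 4],
    'name': ['four', 'four', 'four'],
    'a': [1.0, np.nan, np.nan],
    'b': [np.nan, 3.0, np.nan],
    'c': [np.nan, np.nan, np.nan],
})

# Both derived keys inherit the Series name 'dateTimeGmt'
date_key = df['dateTimeGmt'].dt.date
hour_key = df['dateTimeGmt'].dt.hour
print(date_key.name, hour_key.name)  # dateTimeGmt dateTimeGmt

try:
    (df.groupby([date_key, hour_key, 'id', 'name'])[['a', 'b', 'c']]
       .sum()
       .reset_index())
    error_raised = False
except ValueError as exc:
    # Raised because two index levels share the name 'dateTimeGmt'
    print('ValueError:', exc)
    error_raised = True

# Renaming gives every index level a unique name, so reset_index succeeds
fixed = (df.groupby([date_key.rename('date'), hour_key.rename('h'), 'id', 'name'])
           [['a', 'b', 'c']]
           .sum()
           .reset_index())
print(fixed)
```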
Alternatively, it is possible to use pd.Grouper with an hourly frequency:
# 'h' = hourly; newer pandas versions deprecate the uppercase 'H' alias
df2 = (df.groupby([pd.Grouper(freq='h', key='dateTimeGmt'), 'id', 'name'])
         .sum()
         .reset_index())
print (df2)
dateTimeGmt id name a b c
0 2020-01-01 06:00:00 4 four 1.0 3.0 0.0
1 2020-01-01 06:00:00 6 six 0.0 3.0 0.0
2 2020-01-01 07:00:00 4 four 0.0 0.0 0.0
3 2020-01-01 07:00:00 5 five 0.0 0.0 2.0
4 2020-01-01 10:00:00 5 five 0.0 0.0 0.0
5 2020-01-01 10:00:00 6 six 5.0 0.0 0.0
6 2020-01-01 11:00:00 6 six 0.0 0.0 0.0
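If you prefer the as_index=False style from the question, one possible variant (a sketch, not part of the answer above) is to add the timestamp truncated to the hour as a real column first with assign and Series.dt.floor, then group by column names. Because the keys are column names, as_index=False keeps them as ordinary columns. The helper column name hour is an arbitrary choice; the result should match the Grouper output, with the timestamp column called hour.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The question's data, written as lists
df = pd.DataFrame({
    'dateTimeGmt': pd.to_datetime([
        '2020-01-01 06:44', '2020-01-01 06:45', '2020-01-01 07:11',
        '2020-01-01 07:12', '2020-01-01 07:12', '2020-01-01 07:14',
        '2020-01-01 10:04', '2020-01-01 10:04', '2020-01-01 11:45',
        '2020-01-01 06:45']),
    'id': [4, 4, 4, 5, 5, 5, 5, 6, 6, 6],
    'name': ['four'] * 3 + ['five'] * 4 + ['six'] * 3,
    'a': [1.0, nan, nan, nan, nan, nan, nan, 5.0, nan, nan],
    'b': [nan, 3.0, nan, nan, nan, nan, nan, nan, nan, 3.0],
    'c': [nan, nan, nan, nan, 2.0, nan, nan, nan, 0.0, nan],
})

# Helper column 'hour' (hypothetical name): timestamp truncated to the hour
out = (df.assign(hour=df['dateTimeGmt'].dt.floor('h'))
         # Grouping by column names lets as_index=False keep the keys as columns
         .groupby(['hour', 'id', 'name'], as_index=False)[['a', 'b', 'c']]
         .sum())
print(out)
```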