I have a dataframe that looks something like this:
                                 data
time                market
2020-03-02 00:00:00 Commercial   78.0
                    Residential  79.0
2020-03-02 04:45:15 Commercial   73.0
                    Residential  79.0
2020-03-02 06:45:29 Commercial   79.0
                    Residential  71.0
What I want to do is, if the user selects a different time format (e.g. %Y-%m-%d), apply it to the first index level. However, doing so produces duplicates in that level (e.g. there will be three 2020-03-02 values), which MultiIndex.set_levels does not accept. So I need to somehow group them together and sum the values while keeping the result sorted in time order.
Ideal Output
                        data
time       market
2020-03-02 Commercial   230.0
           Residential  229.0
My code
elem = df.index.get_level_values(0).sort_values().strftime("%Y-%m-%d")
df.index.set_levels(elem, level=0, inplace=True, verify_integrity=False)
df.groupby(['time', 'market']).sum()
This code results in duplicates in the time column as well as the market column, which is strange. It almost seems to be concatenating values in the market column.
Also, I really do not want to change the structure by flattening it or anything like that, so as not to restrict the user.
IIUC, you can groupby time by day and market. Also, you need to make sure that time is datetime type:
(df.groupby([df.index.get_level_values('time').normalize(), 'market'])
   .sum()
)
Output:
                        data
time       market
2020-03-02 Commercial   230.0
           Residential  229.0
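For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the frame from the question and runs the groupby above. It also includes a small helper, regroup, which is a hypothetical name and not part of pandas, for the case where the user picks an arbitrary strftime format rather than just the day. It assumes the time level can be parsed with pd.to_datetime and that summing is the aggregation you want. The helper sorts by time first and then groups with sort=False, so the groups stay in chronological order even for formats whose strings do not sort chronologically (e.g. %d-%m-%Y).

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
times = pd.to_datetime([
    "2020-03-02 00:00:00",
    "2020-03-02 04:45:15",
    "2020-03-02 06:45:29",
])
index = pd.MultiIndex.from_product(
    [times, ["Commercial", "Residential"]], names=["time", "market"]
)
df = pd.DataFrame({"data": [78.0, 79.0, 73.0, 79.0, 79.0, 71.0]}, index=index)

# Day-level grouping as in the answer: normalize() sets each timestamp to midnight.
daily = df.groupby(
    [df.index.get_level_values("time").normalize(), "market"]
).sum()


def regroup(frame, fmt):
    """Hypothetical helper: sum rows whose 'time' label formats to the same string."""
    # Sort chronologically first, while 'time' is still a real datetime level.
    frame = frame.sort_index(level="time")
    labels = frame.index.get_level_values("time").strftime(fmt).rename("time")
    # sort=False keeps groups in first-appearance order, i.e. time order here.
    return frame.groupby([labels, "market"], sort=False).sum()


by_day = regroup(df, "%Y-%m-%d")
print(daily)
print(by_day)
```

Both results keep the two-level (time, market) index, so nothing is flattened. The difference is that daily keeps real timestamps in the time level, while by_day holds formatted strings, which is only worth doing at display time.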