Pandas Multiindex Pivot Table Date Format Change with Sorting and Loss of Precision

Suyash Ganu :

I have a dataframe that looks something like this:

                                                 data
    time                       market
    2020-03-02 00:00:00        Commercial        78.0
                               Residential       79.0
    2020-03-02 04:45:15        Commercial        73.0
                               Residential       79.0
    2020-03-02 06:45:29        Commercial        79.0
                               Residential       71.0

What I want to do: if the user selects a different time format, e.g. %Y-%m-%d, apply it to the first column (the time index level). However, doing so produces duplicates in that column (e.g. three 2020-03-02 values), which MultiIndex.set_levels does not accept. So I need to group the duplicated rows together and sum their values, while keeping the result sorted in time order.

Ideal Output

                                        data
    time              market
    2020-03-02        Commercial        230.0
                      Residential       229.0

My code

elem = df.index.get_level_values(0).sort_values().strftime("%Y-%m-%d")
df.index.set_levels(elem, level=0, inplace=True, verify_integrity=False)
df.groupby(['time', 'market']).sum()

This code results in duplicates in the time column as well as in the market column, which is strange. It almost seems to be concatenating values in the market column.

Also, I really do not want to change the structure by flattening it or anything like that, so as not to restrict the user.
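
A possible explanation for the duplicates, offered as a sketch rather than a confirmed diagnosis: a MultiIndex stores each distinct label once in `levels` and maps rows to those labels through integer `codes`, whereas `get_level_values` returns one label per row. Passing the per-row array to `set_levels` therefore hands it more entries than there are distinct labels, and they are no longer unique. The frame below is rebuilt from the sample shown in the question; the variable names are illustrative only.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
times = pd.to_datetime([
    "2020-03-02 00:00:00", "2020-03-02 00:00:00",
    "2020-03-02 04:45:15", "2020-03-02 04:45:15",
    "2020-03-02 06:45:29", "2020-03-02 06:45:29",
])
markets = ["Commercial", "Residential"] * 3
index = pd.MultiIndex.from_arrays([times, markets], names=["time", "market"])
df = pd.DataFrame({"data": [78.0, 79.0, 73.0, 79.0, 79.0, 71.0]}, index=index)

# `levels` holds each distinct timestamp once; `codes` points rows at them.
level_values = df.index.levels[0]
codes = df.index.codes[0].tolist()

# `get_level_values` expands the level back out to one label per row.
row_values = df.index.get_level_values(0)

print(len(level_values), len(row_values), codes)
```

Here the level has three distinct timestamps while the per-row array has six entries, so after formatting with `%Y-%m-%d` every entry becomes the same string.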

Quang Hoang :

IIUC, you can group by day (taken from the time level) and by market. You also need to make sure that the time level has a datetime dtype:

# normalize() sets every timestamp to midnight, so all rows of a day share one key
(df.groupby([df.index.get_level_values('time').normalize(), 'market'])
   .sum()
)

Output:

                         data
time       market            
2020-03-02 Commercial   230.0
           Residential  229.0
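
If the user can pick an arbitrary format rather than just the day, the same idea can be sketched with `strftime` keys. This is a minimal sketch under a few assumptions: the helper name `regroup_by_format` is hypothetical, the frame is rebuilt from the sample in the question, and the time level is already a datetime dtype. The frame is sorted by time first and the groupby uses `sort=False`, so groups come out in order of first appearance (chronological) instead of being sorted as strings, which matters for formats such as `%d/%m/%Y`.

```python
import pandas as pd


def regroup_by_format(df, fmt):
    """Sum `df` per formatted time label and market, keeping time order.

    Hypothetical helper: assumes a MultiIndex with levels named
    'time' (datetime dtype) and 'market'.
    """
    # Sort chronologically while the level is still a real datetime.
    df = df.sort_index(level="time")
    # One formatted string per row; keep the level name 'time'.
    keys = df.index.get_level_values("time").strftime(fmt).rename("time")
    # sort=False keeps groups in order of first appearance.
    return df.groupby([keys, "market"], sort=False).sum()


# Sample frame rebuilt from the question.
times = pd.to_datetime([
    "2020-03-02 00:00:00", "2020-03-02 00:00:00",
    "2020-03-02 04:45:15", "2020-03-02 04:45:15",
    "2020-03-02 06:45:29", "2020-03-02 06:45:29",
])
markets = ["Commercial", "Residential"] * 3
index = pd.MultiIndex.from_arrays([times, markets], names=["time", "market"])
df = pd.DataFrame({"data": [78.0, 79.0, 73.0, 79.0, 79.0, 71.0]}, index=index)

result = regroup_by_format(df, "%Y-%m-%d")
print(result)
```

The result keeps the two-level index, so nothing is flattened. Note that the time level now holds strings, so the original frame should be kept around if the user may later switch to a finer format.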

Origin http://43.154.161.224:23101/article/api/json?id=198059&siteId=1