How to get weekly averages for column values and week number for the corresponding year based on daily data records with pandas

Baobab1988 :

I'm still learning python and would like to ask your help with the following problem:

I have a csv file with daily data and I'm looking for a way to sum it per calendar week. The mockup data below has rows spread over 2 weeks (week 14, the current week, and week 13, the past week). I need to group the rows per calendar week, recognize which year they belong to, and calculate the week sum and week average. In the example input there are only two different IDs, but in the actual data file I expect many more.

input.csv

id   date      activeMembers
1  2020-03-30       10
2  2020-03-30       1
1  2020-03-29       5
2  2020-03-29       6
1  2020-03-28       0
2  2020-03-28       15
1  2020-03-27       32
2  2020-03-27       10
1  2020-03-26       9
2  2020-03-26       3
1  2020-03-25       0
2  2020-03-25       0
1  2020-03-24       0
2  2020-03-24       65
1  2020-03-23       22
2  2020-03-23       12
...

desired output.csv

id   week   WeeklyActiveMembersSum   WeeklyAverageActiveMembers
1   202014            10                       1.4
2   202014             1                       0.1
1   202013            68                       9.7
2   202013           111                      15.9

my goal is to:

import pandas as pd

df = pd.read_csv('path/to/my/input.csv')

Here I'd need to group by the 'id' and 'date' columns (per calendar week; I'm not sure if this is possible) and create a 'week' column with the week number, then sum the 'activeMembers' values for each week and save them as a 'WeeklyActiveMembersSum' column in my output file, and finally calculate 'WeeklyAverageActiveMembers' for each week. I was experimenting with groupby and isin, but no luck so far. Would I have to go with something similar to this:

df.groupby('id', as_index=False).agg({'date': 'max',
                                      'activeMembers': 'sum'})

and finally save all as output.csv:

df.to_csv('path/to/my/output.csv', index=False)

Thanks in advance!

Quang Hoang :

It seems I'm getting a different week numbering than you do (strftime's %W starts counting weeks at the first Monday of the year, so for these dates it runs one week behind the ISO numbers):

# convert the date column from string to datetime type
df['date'] = pd.to_datetime(df['date'])

(df.groupby(['id',df.date.dt.strftime('%Y%W')], sort=False)
   .activeMembers.agg([('Sum','sum'),('Average','mean')])
   .add_prefix('activeMembers')
   .reset_index()
)

Output:

   id    date  activeMembersSum  activeMembersAverage
0   1  202013                10             10.000000
1   2  202013                 1              1.000000
2   1  202012                68              9.714286
3   2  202012               111             15.857143
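If you want labels that match the 202014/202013 numbering in the question, a sketch using the ISO calendar instead of %W (Series.dt.isocalendar() requires pandas >= 1.1), with the question's sample data inlined:

```python
import pandas as pd

# the sample input from the question
df = pd.DataFrame({
    'id':   [1, 2] * 8,
    'date': ['2020-03-30', '2020-03-30', '2020-03-29', '2020-03-29',
             '2020-03-28', '2020-03-28', '2020-03-27', '2020-03-27',
             '2020-03-26', '2020-03-26', '2020-03-25', '2020-03-25',
             '2020-03-24', '2020-03-24', '2020-03-23', '2020-03-23'],
    'activeMembers': [10, 1, 5, 6, 0, 15, 32, 10, 9, 3, 0, 0, 0, 65, 22, 12],
})
df['date'] = pd.to_datetime(df['date'])

# build a YYYYWW label from the ISO calendar, so Monday 2020-03-30
# lands in week 202014 as in the desired output
iso = df['date'].dt.isocalendar()
week = (iso['year'].astype(str) + iso['week'].astype(str).str.zfill(2)).rename('week')

out = (df.groupby(['id', week], sort=False)
         .activeMembers.agg([('Sum', 'sum'), ('Average', 'mean')])
         .add_prefix('activeMembers')
         .reset_index())
```

Note that 'mean' here averages over the rows present in each week; the question's 1.4 (= 10/7) suggests dividing the weekly sum by all 7 days instead, which you could get with a lambda such as `lambda s: s.sum() / 7`.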
