Joram :
I have the following dataframe:
     col1  col2     col3
0     tom     2     cash
1     tom     3      gas
2     tom     5   online
3   jerry     1   online
4   jerry     4   online
5   jerry     5      gas
6  scooby     8     cash
7  scooby     6  dogfood
8  scooby     1   cheese
It can be reproduced with:
import pandas as pd

data = {'col1': ['tom', 'tom', 'tom', 'jerry', 'jerry', 'jerry', 'scooby', 'scooby', 'scooby'],
        'col2': [2, 3, 5, 1, 4, 5, 8, 6, 1],
        'col3': ['cash', 'gas', 'online', 'online', 'online', 'gas', 'cash', 'dogfood', 'cheese']}
df = pd.DataFrame(data)
How would one group the data by col1 and then, as extra columns, get specific aggregates for specified values of col3?
As an example, say I want to group by col1 and, for everyone in col1, get the sum of col2 over the gas, cash and online rows, like this:
col1    gas_sum  cash_sum  online_sum
tom           3         2           5
jerry         5         0           5
scooby        0         8           0
I am relatively new to pandas, and the only way I can think of to do this is a for loop over all the data, since groupby's purpose seems to be giving the sum/mean of columns like col2 in my example.
Any help is appreciated.
Chris A :
Here is another way, using pivot_table. We'll also use reindex to keep only the values you're interested in (filling any that are missing with 0) and add_suffix to change your column names:
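# Values to sum
values = ['cash', 'gas', 'online']

df_out = (df.pivot_table(index='col1', columns='col3',   # one row per name, one column per col3 value
                         values='col2', aggfunc='sum',   # sum col2 within each cell
                         fill_value=0)                   # 0 where a name has no rows for that col3 value
            .reindex(columns=values, fill_value=0)       # keep only the wanted col3 values
            .add_suffix('_sum'))                         # cash -> cash_sum, etc.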
[out]
col3    cash_sum  gas_sum  online_sum
col1
jerry          0        5           5
scooby         8        0           0
tom            2        3           5
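If you need exactly the layout from the question (columns in gas, cash, online order, names in their original order, col1 as an ordinary column and no leftover col3 label), the result can be reshaped a bit further. The sketch below is one way to do it, assuming a reasonably recent pandas; the names `names` and `alt` are just illustrative. It also builds the same table with groupby plus unstack, which shows that groupby alone can do this without a loop once both col1 and col3 are used as keys.

```python
import pandas as pd

# Sample data from the question
data = {'col1': ['tom', 'tom', 'tom', 'jerry', 'jerry', 'jerry', 'scooby', 'scooby', 'scooby'],
        'col2': [2, 3, 5, 1, 4, 5, 8, 6, 1],
        'col3': ['cash', 'gas', 'online', 'online', 'online', 'gas', 'cash', 'dogfood', 'cheese']}
df = pd.DataFrame(data)

# col3 values to sum, in the column order shown in the question
values = ['gas', 'cash', 'online']
# Names in order of first appearance (tom, jerry, scooby) instead of sorted order
names = pd.Index(df['col1'].unique(), name='col1')

# pivot_table route, reshaped to match the requested layout
df_out = (df.pivot_table(index='col1', columns='col3',
                         values='col2', aggfunc='sum', fill_value=0)
            .reindex(index=names, columns=values, fill_value=0)
            .add_suffix('_sum')
            .rename_axis(columns=None)  # drop the leftover 'col3' columns label
            .reset_index())             # turn col1 back into an ordinary column

# groupby route: sum col2 per (col1, col3) pair, then spread col3 into columns
alt = (df[df['col3'].isin(values)]
         .groupby(['col1', 'col3'])['col2'].sum()
         .unstack(fill_value=0)
         .reindex(index=names, columns=values, fill_value=0)
         .add_suffix('_sum')
         .rename_axis(columns=None)
         .reset_index())

print(df_out)
```

Both routes give one row per name with gas_sum, cash_sum and online_sum columns; the reindex step is what forces the row and column order and inserts 0 for combinations that never occur.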