Create new columns from aggregates of other columns in pandas

Joram :

I have the following dataframe:

   col1    col2  col3
0   tom     2    cash
1   tom     3    gas
2   tom     5    online
3   jerry   1    online
4   jerry   4    online
5   jerry   5    gas
6   scooby  8    cash
7   scooby  6    dogfood
8   scooby  1    cheese

easily obtained with:

import pandas as pd

data = {'col1': ['tom', 'tom', 'tom', 'jerry', 'jerry', 'jerry', 'scooby', 'scooby', 'scooby'],
        'col2': [2, 3, 5, 1, 4, 5, 8, 6, 1],
        'col3': ['cash', 'gas', 'online', 'online', 'online', 'gas', 'cash', 'dogfood', 'cheese']}

df = pd.DataFrame(data)

How would one group the data by col1 and then, as extra columns, get specific aggregates for specified values of col3?

As an example, say I want to group by col1 and get the sum of col2 for the gas, cash and online values of col3, for each name in col1, like this:

col1    gas_sum    cash_sum    online_sum
tom        3          2             5
jerry      5          0             5
scooby     0          8             0

I am relatively new to pandas, and the only way I can think of to do this is a for loop over all the rows, since groupby seems intended more for giving the sum or mean of a column like col2 in my example.

Any help appreciated.

Chris A :

One way is to use pivot_table. We'll also use reindex to keep only the values you're interested in (with fill_value=0, any requested value that never occurs still gets a column of zeros) and add_suffix to change your column names:

# Values to sum
values = ['cash', 'gas', 'online']

df_out = (df.pivot_table(index='col1', columns='col3',
                         values='col2', aggfunc='sum',
                         fill_value=0)
 .reindex(columns=values, fill_value=0)
 .add_suffix('_sum'))

[out]

col3    cash_sum  gas_sum  online_sum
col1                                 
jerry          0        5           5
scooby         8        0           0
tom            2        3           5
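
If you want col1 back as an ordinary column, with the columns in the order shown in the question, the same result can probably be reached with groupby and unstack. This is only a sketch of an alternative and not part of the answer above; the name `flat` is made up for illustration, and the data is repeated so the snippet runs on its own:

```python
import pandas as pd

data = {'col1': ['tom', 'tom', 'tom', 'jerry', 'jerry', 'jerry', 'scooby', 'scooby', 'scooby'],
        'col2': [2, 3, 5, 1, 4, 5, 8, 6, 1],
        'col3': ['cash', 'gas', 'online', 'online', 'online', 'gas', 'cash', 'dogfood', 'cheese']}
df = pd.DataFrame(data)

# Values to sum, in the column order used in the question
values = ['gas', 'cash', 'online']

# `flat` is a hypothetical name chosen for this example
flat = (df.groupby(['col1', 'col3'])['col2'].sum()  # one sum per (col1, col3) pair
          .unstack(fill_value=0)                    # col3 values become columns, missing pairs -> 0
          .reindex(columns=values, fill_value=0)    # keep only the requested values
          .add_suffix('_sum')                       # gas -> gas_sum, etc.
          .rename_axis(None, axis=1)                # drop the leftover 'col3' columns label
          .reset_index())                           # turn the col1 index into a regular column

print(flat)
```

The rows come back sorted alphabetically by col1 (jerry, scooby, tom), because groupby sorts its keys by default.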
