Pandas DataFrame: how to group (pivot?) rows by the values of specified columns while keeping the original index?

fhfuih :

I am trying to implement the Variable Elimination algorithm using Pandas. For anyone familiar with the sum-out step: I have a table like the one below, where a, b, c are variables (not necessarily boolean-valued) and f is the value of some function (factor) for the given values of these variables.

       a      b      c      f
0   True   True   True  0.015
1   True   True  False  0.035
2   True  False   True  0.270
3   True  False  False  0.180
4  False   True   True  0.030
5  False   True  False  0.070
6  False  False   True  0.240
7  False  False  False  0.160

I want to sum f over all rows where (a,c)=(T,T), and likewise over all rows where (a,c)=(T,F), (F,T), (F,F). The result should look like

       a      c      f
0   True   True  0.285
1   True  False  0.215
2  False   True  0.270
3  False  False  0.230

Hence the name "sum-out (b)".

The closest I can get is pd.pivot_table(df, index=df.index.values, columns=['a', 'c'], values='f', aggfunc=np.sum, fill_value=0).sum(), which returns

a      c
False  False    xxx
       True     xxx
True   False    xxx
       True     xxx

Another unstack can give us

c      False  True
a
False   xxx   xxx
True    xxx   xxx

which is still not what I want.

Note that I can have arbitrarily many variables, and arbitrarily many variables to sum out (or to keep). So although in this case I can use pd.pivot_table(df, index=<some of the remaining variables, e.g. a>, columns=<the other remaining variables, e.g. c>, values='f', aggfunc=np.sum) to get the same result, in other cases there may be only one variable left, or too many.

The variables are not necessarily boolean, but they always have finite, discrete domains.

Also note that my index here is only a dummy, meaningless index. By "keeping the original index" I mean just leaving it as a dummy index and aggregating only along axis=0.

It would also be fine if anyone can propose a better multi-dimensional-array-like data structure for the job.

Boris :

You can use groupby with agg like this; reset_index() then turns the group keys a and c back into ordinary columns with a fresh dummy index:

df.groupby(['a','c'])['f'].agg('sum').reset_index()
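Since the question mentions an arbitrary number of variables to sum out, the one-liner above could be wrapped in a small helper. This is only a sketch: the function name sum_out and its value_col parameter are made up for illustration, and it assumes every column other than f is a variable. Passing as_index=False plays the same role as reset_index(), and sort=False keeps the groups in order of first appearance, which happens to match the row order shown in the question.

```python
import pandas as pd

# Factor table from the question: a, b, c are variables, f is the factor value.
df = pd.DataFrame({
    "a": [True, True, True, True, False, False, False, False],
    "b": [True, True, False, False, True, True, False, False],
    "c": [True, False, True, False, True, False, True, False],
    "f": [0.015, 0.035, 0.270, 0.180, 0.030, 0.070, 0.240, 0.160],
})


def sum_out(factor, variables, value_col="f"):
    """Hypothetical helper: sum `value_col` over the variables in the list `variables`.

    Assumes every column except `value_col` is a variable of the factor.
    """
    # Variables that survive the sum-out step, in their original column order.
    keep = [col for col in factor.columns
            if col != value_col and col not in variables]
    if not keep:
        # Nothing left to group by: the factor collapses to a single number.
        return pd.DataFrame({value_col: [factor[value_col].sum()]})
    # as_index=False keeps the group keys as ordinary columns with a 0..n-1 index;
    # sort=False preserves the order in which the key combinations first appear.
    return factor.groupby(keep, as_index=False, sort=False)[value_col].sum()


result = sum_out(df, ["b"])
print(result)
```

The same call works when only one variable is left, e.g. sum_out(df, ["b", "c"]), because groupby accepts a key list of any length.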
