I am trying to implement the Variable Elimination algorithm using pandas. For anyone familiar with the sum-out step: I have a table where a, b, c are variables (not necessarily boolean-valued) and f is a function (factor) of the values of these variables.
a b c f
0 True True True 0.015
1 True True False 0.035
2 True False True 0.270
3 True False False 0.180
4 False True True 0.030
5 False True False 0.070
6 False False True 0.240
7 False False False 0.160
I want to sum the f values of all rows where (a,c)=(T,T), and likewise the f values of all rows where (a,c)=(T,F), (F,T), and (F,F). The result looks like
a c f
0 True True 0.285
1 True False 0.215
2 False True 0.27
3 False False 0.23
Hence the name "sum-out (b)".
The closest I can get is using pd.pivot_table(df, index=df.index.values, columns=['a', 'c'], values='f', aggfunc=np.sum, fill_value=0).sum()
which returns
a c
False False xxx
True xxx
True False xxx
True xxx
A further unstack gives
c False True
a
False xxx xxx
True xxx xxx
which is still not what I want.
Note that I can have arbitrarily many variables, and arbitrarily many of them to sum out (or to keep). So although in this case I can do pd.pivot_table(df, index=<some of the var left, e.g. a>, columns=<other var left, e.g. c>, values='f', aggfunc=np.sum) to get the same result, in other cases there may be only one variable left, or too many to split between index and columns.
The variables may not be boolean type, but they should have finite & discrete domains.
Also note that the index here is just a meaningless dummy index. By "keeping the original index" I mean leaving it as a dummy index and aggregating only along axis=0.
I am also open to suggestions for a better multi-dimensional-array-like data structure for this job.
You can use groupby with an aggregation: group by the variables you want to keep and sum f over the rest.
df.groupby(['a','c'])['f'].agg('sum').reset_index()
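Since the question mentions an arbitrary number of variables to keep or sum out, the same idea can be wrapped in a small helper. This is only a sketch: the function name sum_out and the value_col parameter are made up for illustration, and it assumes the factor values live in a single column (here f) while every other column is a variable.

```python
import pandas as pd

def sum_out(factor, variables, value_col="f"):
    """Sum out `variables` from a factor stored as a DataFrame (hypothetical helper)."""
    drop = [variables] if isinstance(variables, str) else list(variables)
    # Every column that is neither the value column nor summed out is kept.
    keep = [c for c in factor.columns if c != value_col and c not in drop]
    if not keep:
        # All variables summed out: return a one-row table holding the total.
        return pd.DataFrame({value_col: [factor[value_col].sum()]})
    # sort=False keeps groups in order of first appearance;
    # as_index=False returns the kept variables as ordinary columns.
    return factor.groupby(keep, sort=False, as_index=False)[value_col].sum()

# The table from the question.
df = pd.DataFrame({
    "a": [True] * 4 + [False] * 4,
    "b": [True, True, False, False] * 2,
    "c": [True, False] * 4,
    "f": [0.015, 0.035, 0.270, 0.180, 0.030, 0.070, 0.240, 0.160],
})

result = sum_out(df, "b")
print(result)
```

Because the columns to keep are computed from whatever is left, the same call works when only one variable remains or when several are summed out at once, e.g. sum_out(df, ["a", "b"]). The variables do not need to be boolean; any hashable values with a finite domain should group the same way.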