Karim :
Is there a direct way to take the average over multiple DataFrames (for example, multiple runs of a simulation)? With 3 DataFrames (df1, df2, df3) I currently use the following, but it is not the most efficient approach when there are many DataFrames:
(df1+df2+df3)/3
Is there a way to just tell Python to do something more direct, like mean(df1, df2, df3)?
jezrael :
To avoid concat, you can convert all the data to NumPy arrays, take the mean along axis=0, and then pass the result to the DataFrame constructor:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'A': [4, 5, 4],
    'B': [7, 8, 90],
})
df2 = pd.DataFrame({
    'A': [4, 50, 4],
    'B': [7, 8, 9],
})
df3 = pd.DataFrame({
    'A': [40, 5, 4],
    'B': [7, 8, 9],
})
print ((df1+df2+df3)/3)
A B
0 16.0 7.0
1 20.0 8.0
2 4.0 36.0
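The elementwise sum from the question also generalizes to any number of frames without writing each one out. A minimal sketch, assuming the frames are collected in a list and all share the same index and columns (otherwise pandas aligns on labels and fills unmatched cells with NaN): Python's built-in sum starts from 0, and 0 + DataFrame returns a DataFrame, so dividing by the list length reproduces (df1+df2+df3)/3.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [4, 5, 4], 'B': [7, 8, 90]})
df2 = pd.DataFrame({'A': [4, 50, 4], 'B': [7, 8, 9]})
df3 = pd.DataFrame({'A': [40, 5, 4], 'B': [7, 8, 9]})

dfs = [df1, df2, df3]
# sum() starts at 0; 0 + df1 gives a DataFrame, then df2 and df3 are added
# elementwise, so this equals (df1 + df2 + df3) / 3 for a list of any length.
avg = sum(dfs) / len(dfs)
print(avg)
```

This keeps pandas label alignment, which is a safety net when frames might be ordered differently, at the cost of creating an intermediate frame per addition.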
dfs = [df1, df2, df3]
df = pd.DataFrame(np.array([x.to_numpy() for x in dfs]).mean(axis=0),
                  index=df1.index,
                  columns=df1.columns)
print (df)
A B
0 16.0 7.0
1 20.0 8.0
2 4.0 36.0
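For comparison, the concat-based approach mentioned at the start is not shown in the answer; one common form (a sketch, assuming every frame has the same unique index) stacks the frames vertically and averages the rows that share an index label. It should give the same numbers as the NumPy route; no claim is made here about which is faster.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [4, 5, 4], 'B': [7, 8, 90]})
df2 = pd.DataFrame({'A': [4, 50, 4], 'B': [7, 8, 9]})
df3 = pd.DataFrame({'A': [40, 5, 4], 'B': [7, 8, 9]})
dfs = [df1, df2, df3]

# Stack all frames vertically; each row keeps its original index label,
# so label 0 appears once per frame, label 1 once per frame, and so on.
stacked = pd.concat(dfs)
# Group the rows that share a label and average them column by column.
avg_concat = stacked.groupby(level=0).mean()

# The NumPy route from the answer, for comparison.
avg_numpy = pd.DataFrame(np.array([x.to_numpy() for x in dfs]).mean(axis=0),
                         index=df1.index,
                         columns=df1.columns)
print(avg_concat)
```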
For older pandas versions (before 0.24, where DataFrame.to_numpy was added), replace DataFrame.to_numpy with DataFrame.values:
df = pd.DataFrame(np.array([x.values for x in dfs]).mean(axis=0),
                  index=df1.index,
                  columns=df1.columns)
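To get closer to the mean(df1, df2, df3) call the question asks for, the NumPy approach can be wrapped in a small helper. This is a sketch: mean_frames is a hypothetical name, not a pandas function. Because to_numpy() discards labels, the helper first checks that all frames share the same index and columns, then uses np.stack to build an array of shape (n_frames, n_rows, n_cols) and averages over the first axis.

```python
import numpy as np
import pandas as pd


def mean_frames(*dfs):
    """Elementwise mean of several same-shaped DataFrames (hypothetical helper)."""
    if not dfs:
        raise ValueError("need at least one DataFrame")
    first = dfs[0]
    for df in dfs[1:]:
        # to_numpy() drops labels, so verify the frames really line up first.
        if not (df.index.equals(first.index) and df.columns.equals(first.columns)):
            raise ValueError("all DataFrames must share the same index and columns")
    # Shape (n_frames, n_rows, n_cols); averaging over axis 0 averages across frames.
    arr = np.stack([df.to_numpy() for df in dfs])
    return pd.DataFrame(arr.mean(axis=0), index=first.index, columns=first.columns)


df1 = pd.DataFrame({'A': [4, 5, 4], 'B': [7, 8, 90]})
df2 = pd.DataFrame({'A': [4, 50, 4], 'B': [7, 8, 9]})
df3 = pd.DataFrame({'A': [40, 5, 4], 'B': [7, 8, 9]})
print(mean_frames(df1, df2, df3))
```

With a list of frames, call it as mean_frames(*dfs).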