Average over dataframes

Karim :

Is there a direct way to take the average over multiple dataframes (for example, multiple runs of a simulation)? With 3 dataframes (df1, df2, df3) I currently use the following, but it does not scale well to a large number of dataframes:

(df1+df2+df3)/3

Is there a way to just tell Python to do something more direct like mean(df1,df2,df3)?

jezrael :

To avoid concat, you can convert all the data to NumPy arrays, take the mean along axis=0, and then pass the result to the DataFrame constructor:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
         'A':[4,5,4],
         'B':[7,8,90],
})

df2 = pd.DataFrame({
         'A':[4,50,4],
         'B':[7,8,9],
})

df3 = pd.DataFrame({
         'A':[40,5,4],
         'B':[7,8,9],
})

print((df1+df2+df3)/3)
      A     B
0  16.0   7.0
1  20.0   8.0
2   4.0  36.0

dfs = [df1, df2, df3]
# Stack the frames into one 3-D array and average across frames (axis=0)
df = pd.DataFrame(np.array([x.to_numpy() for x in dfs]).mean(axis=0),
                  index=df1.index,
                  columns=df1.columns)
print(df)
      A     B
0  16.0   7.0
1  20.0   8.0
2   4.0  36.0
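
The same idea can be wrapped in a small helper that accepts any number of runs. This is only a minimal sketch: `mean_of_frames` is a hypothetical name (not a pandas function), and it assumes every DataFrame has the same shape, index, and columns, which it checks before averaging by position.

```python
import numpy as np
import pandas as pd


def mean_of_frames(frames):
    """Element-wise mean of equally labelled DataFrames (hypothetical helper)."""
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one DataFrame")
    first = frames[0]
    for other in frames[1:]:
        # Averaging by position is only meaningful when the labels match.
        if not (other.index.equals(first.index)
                and other.columns.equals(first.columns)):
            raise ValueError("all DataFrames must share index and columns")
    # Stack into one (n_frames, n_rows, n_cols) array, then average over axis 0.
    stacked = np.stack([f.to_numpy() for f in frames])
    return pd.DataFrame(stacked.mean(axis=0),
                        index=first.index,
                        columns=first.columns)


runs = [
    pd.DataFrame({"A": [4, 5, 4], "B": [7, 8, 90]}),
    pd.DataFrame({"A": [4, 50, 4], "B": [7, 8, 9]}),
    pd.DataFrame({"A": [40, 5, 4], "B": [7, 8, 9]}),
]
avg = mean_of_frames(runs)
print(avg)
```

Raising an error on mismatched labels is a design choice for this sketch; the plain NumPy approach would otherwise average values by position without noticing that rows or columns differ.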

For older pandas versions (before 0.24, which introduced DataFrame.to_numpy), use DataFrame.values instead:

df = pd.DataFrame(np.array([x.values for x in dfs]).mean(axis=0), 
                  index=df1.index, 
                  columns=df1.columns)
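
For comparison, the concat-based route that the NumPy approach avoids might look like the sketch below. It assumes each DataFrame has unique index labels; rows are matched by label rather than by position, so it also works when the frames are ordered differently. The name `mean_df` is just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [4, 5, 4], "B": [7, 8, 90]})
df2 = pd.DataFrame({"A": [4, 50, 4], "B": [7, 8, 9]})
df3 = pd.DataFrame({"A": [40, 5, 4], "B": [7, 8, 9]})
dfs = [df1, df2, df3]

# Stack all rows into one long frame, then average the rows
# that share the same index label (level 0 of the index).
mean_df = pd.concat(dfs).groupby(level=0).mean()
print(mean_df)
```

This is usually slower than the NumPy version for many large frames, since it builds one long intermediate DataFrame and groups it, but it is shorter and aligns on labels.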
