Dean Power :
I have a dataframe:
df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
A B
0 1 2
1 1 3
2 4 6
I want to return a dataframe of the same size containing the mean of each column:
A B
0 2 3.666
1 2 3.666
2 2 3.666
Is there a simple way of doing this?
ALollz :
Recreate the DataFrame. Send the mean Series to a dict, then the index defines the number of rows.
pd.DataFrame(df.mean().to_dict(), index=df.index)
# A B
#0 2.0 3.666667
#1 2.0 3.666667
#2 2.0 3.666667
Same concept, but creating the full array first saves a decent amount of time.
pd.DataFrame(np.broadcast_to(df.mean(), df.shape),
index=df.index,
columns=df.columns)
Here are some timings. Of course this will depend slightly on the number of columns but you can see there are pretty large differences when you provide the entire array to begin with
import perfplot
import pandas as pd
import numpy as np
perfplot.show(
setup=lambda N: pd.DataFrame(np.random.randint(1,100, (N, 5)),
columns=[str(x) for x in range(5)]),
kernels=[
lambda df: pd.DataFrame(np.broadcast_to(df.mean(), df.shape), index=df.index, columns=df.columns),
lambda df: df.assign(**df.mean()),
lambda df: pd.DataFrame(df.mean().to_dict(), index=df.index)
],
labels=['numpy broadcast', 'assign', 'dict'],
n_range=[2 ** k for k in range(1, 22)],
equality_check=np.allclose,
xlabel="Len(df)"
)
Guess you like
Origin http://43.154.161.224:23101/article/api/json?id=361025&siteId=1