Replace column in Pandas dataframe with the mean of that column

Dean Power :

I have a dataframe:

import pandas as pd
df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

   A  B
0  1  2
1  1  3
2  4  6

I want to return a dataframe of the same size containing the mean of each column:

   A      B
0  2  3.666
1  2  3.666
2  2  3.666

Is there a simple way of doing this?

ALollz :

Recreate the DataFrame: convert the Series of means to a dict and pass the original index, so that each scalar mean is repeated for every row.

pd.DataFrame(df.mean().to_dict(), index=df.index)

#     A         B
#0  2.0  3.666667
#1  2.0  3.666667
#2  2.0  3.666667
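To make the role of the index concrete, here is a minimal sketch using the frame from the question. The variable names `means` and `result` are only illustrative. It also shows that a dict of scalars without an index is rejected, which is why `index=df.index` is needed.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

# One scalar mean per column, keyed by column name.
means = df.mean().to_dict()

# A dict of scalars alone gives pandas no row count, so it raises.
try:
    pd.DataFrame(means)
    needs_index = False
except ValueError:
    needs_index = True

# With an index, each scalar is repeated once per index label.
result = pd.DataFrame(means, index=df.index)
print(result)
```

The result has the same index and columns as `df`, with float columns because the means are floats.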

Same concept, but building the full array first with np.broadcast_to and handing it to the constructor in one go saves a decent amount of time.

import numpy as np

pd.DataFrame(np.broadcast_to(df.mean(), df.shape),
             index=df.index,
             columns=df.columns)
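A self-contained sketch of the broadcast version follows, again on the frame from the question. The explicit `.to_numpy()` call and the names `col_means`, `tiled` and `result` are my own additions for readability; the answer passes the Series straight to `np.broadcast_to`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

# 1-D array with one mean per column, shape (n_columns,).
col_means = df.mean().to_numpy()

# Stretch that row to the full (n_rows, n_columns) shape.
# np.broadcast_to returns a read-only view, so no data is repeated in memory here.
tiled = np.broadcast_to(col_means, df.shape)

# Wrap the array once, reusing the original labels.
result = pd.DataFrame(tiled, index=df.index, columns=df.columns)
print(result)
```

Because the whole array is built before pandas sees it, the constructor does not have to expand each column separately, which is presumably where the time saving comes from.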

Here are some timings. Of course these will depend somewhat on the number of columns, but there are pretty large differences when you provide the entire array to begin with.

import perfplot
import pandas as pd
import numpy as np

perfplot.show(
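    # Build an N x 5 frame of random integers with string column names.
    setup=lambda N: pd.DataFrame(np.random.randint(1, 100, (N, 5)),
                                 columns=[str(x) for x in range(5)]),
    kernels=[
        # Broadcast the means to the full shape, then construct once.
        lambda df: pd.DataFrame(np.broadcast_to(df.mean(), df.shape), index=df.index, columns=df.columns),
        # Overwrite every column with its scalar mean (needs string column names).
        lambda df: df.assign(**df.mean()),
        # Let the constructor expand a dict of scalars along the index.
        lambda df: pd.DataFrame(df.mean().to_dict(), index=df.index)
    ],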
    labels=['numpy broadcast', 'assign', 'dict'],
    n_range=[2 ** k for k in range(1, 22)],
    equality_check=np.allclose,
    xlabel="Len(df)"
)

[Figure: perfplot timings for 'numpy broadcast', 'assign' and 'dict' against Len(df)]
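If perfplot is not installed, a rough comparison can be sketched with the standard-library `timeit` module instead. The frame size (10,000 rows) and the repeat counts below are arbitrary assumptions, not the settings used for the timings above, and this measures a single size rather than a whole range.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed size: 10,000 rows by 5 columns, mirroring the benchmark's column layout.
df = pd.DataFrame(np.random.randint(1, 100, (10_000, 5)),
                  columns=[str(x) for x in range(5)])

# The same three approaches as in the perfplot call.
kernels = {
    'numpy broadcast': lambda: pd.DataFrame(np.broadcast_to(df.mean(), df.shape),
                                            index=df.index, columns=df.columns),
    'assign': lambda: df.assign(**df.mean()),
    'dict': lambda: pd.DataFrame(df.mean().to_dict(), index=df.index),
}

# Best of 5 repeats of 20 calls each, converted to seconds per call.
timings = {name: min(timeit.repeat(fn, number=20, repeat=5)) / 20
           for name, fn in kernels.items()}

for name, seconds in timings.items():
    print(f"{name:>16}: {seconds * 1e3:.3f} ms per call")
```

Absolute numbers will vary by machine and pandas version, so only the relative ordering is worth reading from this.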
