Better way to add the result of apply (multiple outputs) to an existing DataFrame with column names

Bram Vanroy :

I am applying a function on the rows of a dataframe in pandas. That function returns four values (meaning, four values per row). In practice, this means that the returned object from the apply function is a Series containing tuples. I want to add these to their own columns. I know that I can convert that output to a DataFrame and then concatenate with the old DataFrame, like so:

import pandas as pd


def some_func(i):
    return i+1, i+2, i+3, i+4

df = pd.DataFrame(range(10), columns=['start'])
res = df.apply(lambda row: some_func(row['start']), axis=1)

# convert to df and add column names
res_df = res.apply(pd.Series)
res_df.columns = ['label_1', 'label_2', 'label_3', 'label_4']

# concatenate with old df
df = pd.concat([df, res_df], axis=1)
print(df)

Is there a better way to do this? In particular, the res.apply(pd.Series) step seems redundant, but I don't know a better alternative. Performance is an important factor for me.


As requested, an example input DataFrame could look like this:

   start
0      0
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9

And the expected output, with the four added columns:

   start  label_1  label_2  label_3  label_4
0      0        1        2        3        4
1      1        2        3        4        5
2      2        3        4        5        6
3      3        4        5        6        7
4      4        5        6        7        8
5      5        6        7        8        9
6      6        7        8        9       10
7      7        8        9       10       11
8      8        9       10       11       12
9      9       10       11       12       13

Keval Dave :

Assigning the values directly to the DataFrame is faster than concatenating.

This is one way to do the assignment:

df = pd.DataFrame(range(10), columns=['start'])

df['label_1'], df['label_2'], df['label_3'], df['label_4'] = zip(*[some_func(x) for x in df['start']])

This is faster than res.apply(pd.Series), which constructs a new Series object for every row.
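To check the speed claim on your own data, a rough timing sketch along these lines may help. The helper names with_apply_series and with_zip, the LABELS constant, and the 1000-row test frame are made up for illustration; actual timings will depend on your pandas version and machine.

```python
import timeit

import pandas as pd

LABELS = ['label_1', 'label_2', 'label_3', 'label_4']  # illustrative constant


def some_func(i):
    return i + 1, i + 2, i + 3, i + 4


def with_apply_series(df):
    # Approach from the question: row-wise apply, then expand the tuples
    res = df.apply(lambda row: some_func(row['start']), axis=1)
    res_df = res.apply(pd.Series)
    res_df.columns = LABELS
    return pd.concat([df, res_df], axis=1)


def with_zip(df):
    # Approach from this answer: unzip the tuples and assign column by column
    out = df.copy()  # copy so the caller's frame is left untouched
    out['label_1'], out['label_2'], out['label_3'], out['label_4'] = zip(
        *[some_func(x) for x in out['start']]
    )
    return out


df = pd.DataFrame(range(1000), columns=['start'])  # hypothetical test size

result_apply = with_apply_series(df)
result_zip = with_zip(df)

# Both approaches should produce the same values (dtype differences ignored)
pd.testing.assert_frame_equal(result_apply, result_zip, check_dtype=False)

t_apply = timeit.timeit(lambda: with_apply_series(df), number=3)
t_zip = timeit.timeit(lambda: with_zip(df), number=3)
print(f'apply(pd.Series): {t_apply:.4f}s   zip assignment: {t_zip:.4f}s')
```

The equality check is there so the comparison stays honest: both functions have to return the same table before the timings mean anything.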

Refer to "adding multiple columns to pandas simultaneously" for more ways to assign multiple columns.
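As a rough sketch of what some of those other ways might look like for this example (not taken from the linked question; the names LABELS, new_cols, expanded, and out_a/out_b/out_c are illustrative only):

```python
import pandas as pd

LABELS = ['label_1', 'label_2', 'label_3', 'label_4']  # illustrative constant


def some_func(i):
    return i + 1, i + 2, i + 3, i + 4


df = pd.DataFrame(range(10), columns=['start'])

# Option A: build all new columns at once from a list of tuples, then join
new_cols = pd.DataFrame(
    [some_func(x) for x in df['start']],
    columns=LABELS,
    index=df.index,  # reuse the index so rows stay aligned
)
out_a = df.join(new_cols)

# Option B: let apply expand the tuples into columns itself
expanded = df.apply(
    lambda row: some_func(row['start']), axis=1, result_type='expand'
)
expanded.columns = LABELS
out_b = df.join(expanded)

# Option C: skip the Python-level function entirely. This only works here
# because some_func is plain arithmetic that can be written column-wise.
out_c = df.assign(**{f'label_{k}': df['start'] + k for k in range(1, 5)})

print(out_a)
```

Option B still calls the function once per row through apply, so it is mostly a tidier spelling of the original code. If the real function can be expressed with column operations as in Option C, that is usually the first thing to try when performance matters.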
