Creating new columns from pandas df column names and values

LizzAlice :

I have a dataframe of the form:

basket            fruit_apple   fruit_pear   fruit_cherry
basket_id_1       5             NaN          6
basket_id_2       NaN           1            NaN

and I want to create two new columns that should look like this:

basket          fruit_type   number
basket_id_1     apple        5
basket_id_1     pear         NaN
basket_id_1     cherry       6
basket_id_2     apple        NaN
basket_id_2     pear         1
basket_id_2     cherry       NaN

where the contents of fruit_type are determined automatically with a colname.split('_')[1] How can this be done in a concise way?

jezrael :

I think faster is change column names insted new column (because less number of values), so I suggest use rename in first step and reshape with DataFrame.melt, last change order by DataFrame.sort_values:

df = (df.rename(columns= lambda x: x.split('_')[-1])
        .melt('basket', var_name='fruit_type', value_name='number')
        .sort_values('basket'))
print (df)
        basket fruit_type  number
0  basket_id_1      apple     5.0
2  basket_id_1       pear     NaN
4  basket_id_1     cherry     6.0
1  basket_id_2      apple     NaN
3  basket_id_2       pear     1.0
5  basket_id_2     cherry     NaN

Another idea is use DataFrame.set_index with DataFrame.stack:

df = (df.set_index('basket')
        .rename(columns= lambda x: x.split('_')[-1])
        .rename_axis('fruit_type', axis=1)
        .stack(dropna=False)
        .reset_index(name='number')
        )
print (df)
        basket fruit_type  number
0  basket_id_1      apple     5.0
1  basket_id_1       pear     NaN
2  basket_id_1     cherry     6.0
3  basket_id_2      apple     NaN
4  basket_id_2       pear     1.0
5  basket_id_2     cherry     NaN

Guess you like

Origin http://10.200.1.11:23101/article/api/json?id=3960&siteId=1