Pandas - conditional column reference with regex

Josh :

In pandas, I'm creating a new column which conditionally comes directly from another column based on a regex match.

It might be easier to just an example:

HAVE:

+-------+------------+------------+-------------+-----------+
| Color | value_Gold | value_Blue | value_Green | value_Red |
+-------+------------+------------+-------------+-----------+
| Gold  |         12 |          9 |          11 |         8 |
| Gold  |          2 |          7 |          12 |        11 |
| Blue  |          1 |          5 |           7 |         1 |
| Red   |          6 |         12 |           3 |        10 |
| Green |         11 |         10 |          11 |         2 |
+-------+------------+------------+-------------+-----------+

Code so that you can recreate this dataframe:

df = pd.DataFrame(np.array([['Gold',12,9,11,8],['Gold',2,7,12,11],['Blue',1,5,7,1],['Red',6,12,3,10],['Green',11,10,11,2]]),
columns = ['Color','value_Gold','value_Blue','value_Green','value_Red'])

WANT:

+-------+------------+------------+-------------+-----------+------+
| Color | value_Gold | value_Blue | value_Green | value_Red | WANT |
+-------+------------+------------+-------------+-----------+------+
| Gold  |         12 |          9 |          11 |         8 |   12 |
| Gold  |          2 |          7 |          12 |        11 |    2 |
| Blue  |          1 |          5 |           7 |         1 |    5 |
| Red   |          6 |         12 |           3 |        10 |   10 |
| Green |         11 |         10 |          11 |         2 |   11 |
+-------+------------+------------+-------------+-----------+------+

At first, I was thinking I should focus on eliminating the values I don't want in the dataframe first, and then doing some sort of coalesce. But I'm really not sure what the most elegant way to do this is.

Ideas?

Edit: Another thought I had, was to transpose the value_ columns and use it for a lookup, but the truth is that my real dataset has hundreds of thousands of columns and so I didn't think this route was best. In reality, think 200K rows and 20 unique colors.

Quang Hoang :

You can, for example, use mask and bfill:

mask = df['Color'].values  == df.columns[1:].str.extract('_(.*)$').values
df['WANT'] = df.iloc[:,1:].where(mask.T).bfill(1).iloc[:,0]

Guess you like

Origin http://10.200.1.11:23101/article/api/json?id=8181&siteId=1