Merging Pandas DataFrames with keys in different columns

glpsx :

I'm trying to merge two Pandas DataFrames which are as follows:

import pandas as pd

df1 = pd.DataFrame({'PAIR': ['140-120', '200-280', '350-310', '410-480', '500-570'],
                    'SCORE': [99, 70, 14, 84, 50]})
print(df1)

      PAIR  SCORE
0  140-120     99
1  200-280     70
2  350-310     14
3  410-480     84
4  500-570     50

df2 = pd.DataFrame({'PAIR1': ['140-120', '280-200', '350-310', '480-410', '500-570'],
                    'PAIR2': ['120-140', '200-280', '310-350', '410-480', '570-500'],
                    'BRAND' : ['A', 'V', 'P', 'V', 'P']})
print(df2)

     PAIR1    PAIR2 BRAND
0  140-120  120-140     A
1  280-200  200-280     V
2  350-310  310-350     P
3  480-410  410-480     V
4  500-570  570-500     P

If you take a closer look, you will notice that each value in the PAIR column of df1 match either the value in PAIR1 or PAIR2 of df2. In df2, the keys are present in both ways (e.g. 140-120 and 120-140).

My goal is to merge the two DataFrames to obtain the following result:

      PAIR  SCORE BRAND
0  140-120     99     A
1  200-280     70     V
2  350-310     14     P
3  410-480     84     V
4  500-570     50     P

I tried to first merge df1 with df2 the following way:

df3 = pd.merge(left = df1, right = df2, how = 'left', left_on = 'PAIR', right_on = 'PAIR1')

Then, taking the resulting DataFrame df3 and merge it back with df2:

df4 = pd.merge(left = df3, right = df2, how = 'left', left_on = 'PAIR', right_on = 'PAIR2')

print(df4)

      PAIR  SCORE  PAIR1_x  PAIR2_x BRAND_x  PAIR1_y  PAIR2_y BRAND_y
0  140-120     99  140-120  120-140       A      NaN      NaN     NaN
1  200-280     70      NaN      NaN     NaN  280-200  200-280       V
2  350-310     14  350-310  310-350       P      NaN      NaN     NaN
3  410-480     84      NaN      NaN     NaN  480-410  410-480       V
4  500-570     50  500-570  570-500       P      NaN      NaN     NaN

This is not my desired result. I don't how else I can account for the fact that the correct key might be either in PAIR1 or PAIR2. Any help would be appreciated.

Peter Leimbigler :

Somewhat clumsy solution: build a Series that maps each pair in df2 to its corresponding brand, then pass this mapping to df1['PAIR'].map().

# Build a series whose index maps pairs to values
mapper = df2.melt(id_vars='BRAND').set_index('value')['BRAND']
mapper
value
140-120    A
280-200    V
350-310    P
480-410    V
500-570    P
120-140    A
200-280    V
310-350    P
410-480    V
570-500    P
Name: BRAND, dtype: object

# Use the mapper on df1['PAIR']
df1['BRAND'] = df1['PAIR'].map(mapper)
df1
      PAIR  SCORE BRAND
0  140-120     99     A
1  200-280     70     V
2  350-310     14     P
3  410-480     84     V
4  500-570     50     P

Guess you like

Origin http://10.200.1.11:23101/article/api/json?id=8171&siteId=1