Nested Joins using Python pandas Dataframe

nickwarters :

Apologies if the terminology is incorrect.

My Dataframe has 4 columns. Col1 is the personId (will hold multiples of the same value) Col2 is the event type Col3 is Boolean Col4 is Boolean

Only col3 OR col4 can be true at any one time.

Input:

personId      EventType      Col3       Col4
1             A              True       False
2             A              True       False
3             A              True       False
2             B              False      True
4             A              True       False
3             B              False      True

Desired output:

personId      EventType      Col3       Col4      Col5
1             A              True       False     False
2             A              True       False     True
3             A              True       False     True
2             B              False      True      False
4             A              True       False     False
3             B              False      True      False
4             C              False      True      False

As you can see, Col5 will be true where the Trues from Col3 relate to a personId which has also had EventType B, but not C.

I am utterly stuck. I am sure I am overlooking something extremely simple.

I have tried the following:

df = pd.merge(df, left_on='personId', right_on='personId')


df = pd.merge(df, df[['personId','Col4']], on='personId', how='left)

but neither achieve the desired results.

Scott Boston :

Better logic:

df['Col5'] = df.groupby('personId')['EventType'].transform(lambda x: (x=='B').any()) & df['Col3']

Output:

   personId EventType   Col3   Col4   Col5
0         1         A   True  False  False
1         2         A   True  False   True
2         3         A   True  False   True
3         2         B  False   True  False
4         4         A   True  False  False
5         3         B  False   True  False

Details: Use groupby personid and transform EventType column to True if any Event in that personid equals to 'B', then use boolean logic operator and, & with 'col3'.


IIUC, you try using groupby and some boolean logic:

df.join(df.groupby('personId', group_keys=False)
          .apply(lambda x: (x['EventType']=='B').any() & x['Col3'])
          .rename('Col5'))

Output:

   personId EventType   Col3   Col4   Col5
0         1         A   True  False  False
1         2         A   True  False   True
2         3         A   True  False   True
3         2         B  False   True  False
4         4         A   True  False  False
5         3         B  False   True  False

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=21284&siteId=1