Filtering a pandas dataframe comparing two columns

HegChin :

Consider a dataframe with various columns. I want to filter it by comparing the values of two of those columns. Below is a sample of the dataframe:

Machine   Position
M01       PB0
M02       PB0
M03       PB0
M04       PB0
M01       PB1
M02       PB1
M01       PB1
M01       PB1

As you can see above, all machines have position PB0, but only two machines have both PB0 and PB1. I would like to get a list of the machines that have both PB0 and PB1, i.e. machine = ['M01', 'M02']. Note that these two columns may contain many duplicate pairs.

Itamar Mushkin :

Let's define your dataframe:

import pandas as pd
df = pd.DataFrame({'Machine': {0: 'M01',
  1: 'M02',
  2: 'M03',
  3: 'M04',
  4: 'M01',
  5: 'M02',
  6: 'M01',
  7: 'M01'},
 'Position': {0: 'PB0',
  1: 'PB0',
  2: 'PB0',
  3: 'PB0',
  4: 'PB1',
  5: 'PB1',
  6: 'PB1',
  7: 'PB1'}})
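The nested-dict literal above is the shape that `df.to_dict()` returns by default (column → {row index → value}). A minimal sketch of an equivalent, shorter construction from plain lists, where pandas supplies the 0..7 row index itself; `nested` is just an illustrative name for the dict-based frame:

```python
import pandas as pd

# Nested-dict form, as printed by df.to_dict()
nested = pd.DataFrame({
    'Machine': {0: 'M01', 1: 'M02', 2: 'M03', 3: 'M04',
                4: 'M01', 5: 'M02', 6: 'M01', 7: 'M01'},
    'Position': {0: 'PB0', 1: 'PB0', 2: 'PB0', 3: 'PB0',
                 4: 'PB1', 5: 'PB1', 6: 'PB1', 7: 'PB1'},
})

# Same data from plain lists; the default index is 0..7
df = pd.DataFrame({
    'Machine':  ['M01', 'M02', 'M03', 'M04', 'M01', 'M02', 'M01', 'M01'],
    'Position': ['PB0', 'PB0', 'PB0', 'PB0', 'PB1', 'PB1', 'PB1', 'PB1'],
})

# equals() compares shape and values; index types may differ
print(df.equals(nested))  # True
```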

To get the positions of each machine, regardless of duplicates, we can use:

s = df.groupby('Machine')['Position'].apply(set)

Which looks like this:

Machine
M01    {PB1, PB0}
M02    {PB1, PB0}
M03         {PB0}
M04         {PB0}
Name: Position, dtype: object
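Because each set keeps a position only once, the duplicate PB1 rows for M01 do not matter. A different technique (an alternative, not what this answer uses) reaches the machines that have both positions without building sets: keep only the rows with a required position, then count distinct positions per machine with `nunique()`. The `required` set is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'Machine':  ['M01', 'M02', 'M03', 'M04', 'M01', 'M02', 'M01', 'M01'],
    'Position': ['PB0', 'PB0', 'PB0', 'PB0', 'PB1', 'PB1', 'PB1', 'PB1'],
})

required = {'PB0', 'PB1'}  # positions every returned machine must have

# Drop rows whose Position is not required, then count distinct
# required positions per machine; duplicate rows are counted once.
distinct = (df[df['Position'].isin(required)]
            .groupby('Machine')['Position']
            .nunique())

# A machine qualifies when it has every required position.
machines = distinct[distinct == len(required)].index.to_list()
print(machines)  # ['M01', 'M02']
```

This stays within built-in pandas operations instead of Python sets, which may be preferable on large frames; on this sample both approaches select the same machines.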

To get only the machines whose positions include both PB0 and PB1, we can use:

s[s.apply(lambda x: x.issuperset({'PB1','PB0'}))].index

which returns

Index(['M01', 'M02'], dtype='object', name='Machine')

(You can also append .to_list() if you prefer a plain list to a pd.Index.)
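If the required positions vary, the groupby/set/issuperset steps above can be wrapped in a small function. A minimal sketch; `machines_with_positions` and its keyword parameters are hypothetical names, and the column names default to `Machine` and `Position` as in the question:

```python
import pandas as pd


def machines_with_positions(df, required, machine_col='Machine', position_col='Position'):
    """Return machines whose positions include every value in `required`.

    Hypothetical helper wrapping the groupby/set/issuperset steps.
    """
    required = set(required)
    # One set of positions per machine; duplicate rows collapse.
    s = df.groupby(machine_col)[position_col].apply(set)
    # Keep machines whose set contains all required positions.
    return s[s.apply(lambda x: x.issuperset(required))].index.to_list()


df = pd.DataFrame({
    'Machine':  ['M01', 'M02', 'M03', 'M04', 'M01', 'M02', 'M01', 'M01'],
    'Position': ['PB0', 'PB0', 'PB0', 'PB0', 'PB1', 'PB1', 'PB1', 'PB1'],
})

print(machines_with_positions(df, ['PB0', 'PB1']))  # ['M01', 'M02']
print(machines_with_positions(df, ['PB0']))         # ['M01', 'M02', 'M03', 'M04']
```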
