Consider a dataframe with several columns. I want to filter it by comparing the values of two of those columns. Below is a sample of the dataframe:
Machine Position
M01 PB0
M02 PB0
M03 PB0
M04 PB0
M01 PB1
M02 PB1
M01 PB1
M01 PB1
As you can see above, all machines have position PB0, but only two machines have both PB0 and PB1. I would like to get a list of the machines that have both PB0 and PB1, i.e. machine = ['M01', 'M02'].
Note that the (Machine, Position) pairs can contain many duplicates.
Let's define your dataframe:
import pandas as pd
df = pd.DataFrame({'Machine': {0: 'M01',
1: 'M02',
2: 'M03',
3: 'M04',
4: 'M01',
5: 'M02',
6: 'M01',
7: 'M01'},
'Position': {0: 'PB0',
1: 'PB0',
2: 'PB0',
3: 'PB0',
4: 'PB1',
5: 'PB1',
6: 'PB1',
7: 'PB1'}})
To get the positions of each machine, regardless of duplicates, we can use:
s = df.groupby('Machine')['Position'].apply(set)
Which looks like this:
Machine
M01 {PB1, PB0}
M02 {PB1, PB0}
M03 {PB0}
M04 {PB0}
Name: Position, dtype: object
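An alternative way to summarize the positions of each machine, which is equally insensitive to duplicates, is a cross-tabulation with `pd.crosstab`. Duplicated rows only increase a count, so a machine qualifies when every required position has a count greater than zero. This is a sketch: the `required` list is an illustrative name, and the dataframe is re-created in list form so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Machine': ['M01', 'M02', 'M03', 'M04', 'M01', 'M02', 'M01', 'M01'],
    'Position': ['PB0', 'PB0', 'PB0', 'PB0', 'PB1', 'PB1', 'PB1', 'PB1'],
})

# Count rows per (Machine, Position) pair; duplicates only raise the counts
counts = pd.crosstab(df['Machine'], df['Position'])

# Positions every returned machine must have (illustrative choice)
required = ['PB0', 'PB1']

# reindex guards against a required position that never occurs in the data
mask = (counts.reindex(columns=required, fill_value=0) > 0).all(axis=1)
machines = mask[mask].index.to_list()
print(machines)  # ['M01', 'M02']
```

This scales to any number of required positions by extending `required`, and it avoids building a Python `set` per group.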
To get only the machines whose positions include both PB0 and PB1, we can use:
s[s.apply(lambda x: x.issuperset({'PB1','PB0'}))].index
which returns
Index(['M01', 'M02'], dtype='object', name='Machine')
(You can also append .to_list() if you prefer a list to a pd.Index.)
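When only two positions matter, a plainer alternative is to intersect the set of machines seen at PB0 with the set seen at PB1. Since the question is ultimately about filtering the dataframe, the sketch then keeps the original rows belonging to those machines with `Series.isin`. `sorted()` is used only to give a deterministic list order, and the dataframe is re-created so the snippet is self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    'Machine': ['M01', 'M02', 'M03', 'M04', 'M01', 'M02', 'M01', 'M01'],
    'Position': ['PB0', 'PB0', 'PB0', 'PB0', 'PB1', 'PB1', 'PB1', 'PB1'],
})

# Machines seen at each position; building sets discards duplicate rows
pb0 = set(df.loc[df['Position'] == 'PB0', 'Machine'])
pb1 = set(df.loc[df['Position'] == 'PB1', 'Machine'])

# Machines present at both positions, in a stable order
machines = sorted(pb0 & pb1)
print(machines)  # ['M01', 'M02']

# Keep only the rows of the original dataframe that belong to those machines
filtered = df[df['Machine'].isin(machines)]
print(filtered)
```

The result `filtered` keeps the original index labels (0, 1, 4, 5, 6, 7), so it can be joined back to other columns of the full dataframe if needed.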