I have a dataframe like this:
   Title                Participants
0  ShowA            B. Smith,C. Ball
1  ShowB                   T. Smooth
2  ShowC  K. Dulls,L. Allen,B. Smith
I'm splitting on a comma (,) in the Participants column and creating a list for each cell. Next, I check for specific participant(s) in each list. In this example, I'm checking for either B. Smith or K. Dulls:
for item in df['Participants']:
    listX = item.split(',')
    if 'B. Smith' in listX or 'K. Dulls' in listX:
        print(listX)
This returns:
['B. Smith', 'C. Ball']
['K. Dulls', 'L. Allen', 'B. Smith']
1) I'm guessing there is a cleaner way to check for multiple participants in my if statement. I'd love any suggestions.
2) This is where I've been spinning in circles: how do I return the Title associated with the list(s) I return?
In this example, I'd like to return:
ShowA
ShowC
Setup code:
import pandas as pd
df = pd.DataFrame(data={'Title': ['ShowA', 'ShowB', 'ShowC'],
                        'Participants': ['B. Smith,C. Ball', 'T. Smooth', 'K. Dulls,L. Allen,B. Smith']})
target_participants = ['B. Smith', 'K. Dulls']
get_dummies
You can use pandas.Series.str.get_dummies to create a dataframe with one boolean column per name, where each value indicates whether that name is present in the row.
dummies = df.Participants.str.get_dummies(',').astype(bool)
dummies
   B. Smith  C. Ball  K. Dulls  L. Allen  T. Smooth
0      True     True     False     False      False
1     False    False     False     False       True
2      True    False      True      True      False
Then we can select the titles of the rows where either name is present:
df.loc[dummies['B. Smith'] | dummies['K. Dulls'], 'Title']
0 ShowA
2 ShowC
Name: Title, dtype: object
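If you'd rather not hard-code each name, the same mask can probably be built from the target_participants list in the setup code. This is a minimal sketch, not the only way to do it; the reindex step is my own addition, there so that a target name that never appears in the data gives an all-False column instead of raising a KeyError.

```python
import pandas as pd

df = pd.DataFrame(data={
    'Title': ['ShowA', 'ShowB', 'ShowC'],
    'Participants': ['B. Smith,C. Ball', 'T. Smooth', 'K. Dulls,L. Allen,B. Smith'],
})
target_participants = ['B. Smith', 'K. Dulls']

# One boolean column per participant name
dummies = df['Participants'].str.get_dummies(',').astype(bool)

# Keep only the target columns; reindex (an added safeguard, not part of
# the original answer) fills in False for targets absent from the data
target_cols = dummies.reindex(columns=target_participants, fill_value=False)

# A row matches if any of the target names is present
mask = target_cols.any(axis=1)

titles = df.loc[mask, 'Title'].tolist()
print(titles)  # ['ShowA', 'ShowC']
```

This scales to any number of names without chaining | by hand.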
contains
Otherwise, you can use pandas.Series.str.contains. First, put the people you are looking for in a list, then join them with | to build a string to use as a regular expression.
people_to_look_for = ['B. Smith', 'K. Dulls']
pattern = '|'.join(people_to_look_for)
mask = df.Participants.str.contains(pattern)
df.loc[mask, 'Title']
0 ShowA
2 ShowC
Name: Title, dtype: object
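One caveat with the contains approach: the pattern is treated as a regular expression, so the . in B. Smith matches any character, and the match is a substring match (it would also hit a hypothetical "B. Smithson"). A hedged sketch of two possible refinements follows: escaping the names with re.escape, and, if whole-name matches are needed, splitting each cell and testing for overlap with a set. The names escaped_titles, exact_mask and exact_titles are mine, for illustration only.

```python
import re
import pandas as pd

df = pd.DataFrame(data={
    'Title': ['ShowA', 'ShowB', 'ShowC'],
    'Participants': ['B. Smith,C. Ball', 'T. Smooth', 'K. Dulls,L. Allen,B. Smith'],
})
people_to_look_for = ['B. Smith', 'K. Dulls']

# Refinement 1: escape regex metacharacters so '.' matches only a literal period
pattern = '|'.join(re.escape(name) for name in people_to_look_for)
mask = df['Participants'].str.contains(pattern)
escaped_titles = df.loc[mask, 'Title'].tolist()

# Refinement 2: exact whole-name matching via split and set overlap
targets = set(people_to_look_for)
exact_mask = df['Participants'].str.split(',').apply(
    lambda names: not targets.isdisjoint(names)
)
exact_titles = df.loc[exact_mask, 'Title'].tolist()

print(escaped_titles)  # ['ShowA', 'ShowC']
print(exact_titles)    # ['ShowA', 'ShowC']
```

On this sample data both variants give the same result as the original pattern; they only differ when names contain regex metacharacters or one name is a substring of another.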