Rahul rajan :
I have a data frame with a Description column, as shown below:
Description
Government entertainment people
Dinner with CFO
Commission to Agents government
I am trying to do a keyword search on the Description column, and I have the keywords in a list.
My current code finds only exact matches, not partial matches. If multiple keywords are present in a row, they should be joined with a delimiter and written to a new column.
My code:
import pandas as pd
data = pd.read_excel('path_to_datafile.xlsx')
keywords = ['dinner', 'govern', 'Agent', 'entertain']
keywords_lower = [item.lower() for item in keywords]
s = set(keywords_lower)
# intersects whole whitespace-separated words with the keyword set, so only exact word matches are found
data['Keyword'] = data['Description'].apply(lambda x: '/'.join(set(x.lower().split()).intersection(s)))
I need the code to find partial matches for the keyword list above, such as
1) entertain for row 1
2) govern for row 3
Currently the code works only for exact matches, not partial matches.
Expected output:
Description                      Keyword
Government entertainment people  Govern/entertain
Dinner with CFO                  Dinner
Commission to Agents government  Agent
How can this be done?
Serge Ballesta :
extractall will do the job, but you must first build the pattern:
...
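import re  # needed for the re.I (ignore case) flag
keywords_lower = [item.lower() for item in keywords]
# a single capture group containing an alternation of all keywords
pattern = '(' + '|'.join('(?:' + i + ')' for i in keywords_lower) + ')'
# extractall gives one row per match; column 0 is the capture group, joined per original row with '/'
df['Keyword'] = df['Description'].str.extractall(pattern, flags=re.I)[0].groupby(level=0).agg('/'.join)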
You would get:
Description Keyword
0 Government entertainment people Govern/entertain
1 Dinner with CFO Dinner
2 Commission to Agents government Agent/govern
(Here, pattern is '((?:dinner)|(?:govern)|(?:agent)|(?:entertain))'.)
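For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same extractall approach. It makes a few assumptions that are not in the answer above: the data frame is built inline from the three sample rows instead of being read from Excel, the keywords are passed through re.escape in case one of them contains a regex metacharacter, and rows without any match are filled with an empty string instead of NaN.

```python
import re

import pandas as pd

# Sample rows copied from the question (the original reads them from an Excel file).
df = pd.DataFrame({
    "Description": [
        "Government entertainment people",
        "Dinner with CFO",
        "Commission to Agents government",
    ]
})

keywords = ["dinner", "govern", "Agent", "entertain"]

# Assumption: re.escape is added here so that keywords are matched literally.
pattern = "(" + "|".join(re.escape(k.lower()) for k in keywords) + ")"

# extractall returns one row per match, indexed by (original row, match number).
matches = df["Description"].str.extractall(pattern, flags=re.IGNORECASE)

# Column 0 is the single capture group; join the matches of each original row.
df["Keyword"] = matches[0].groupby(level=0).agg("/".join)

# Assumption: rows with no keyword at all get an empty string rather than NaN.
df["Keyword"] = df["Keyword"].fillna("")

print(df)
```

Note that the matched text keeps the casing found in Description (for example Govern, not govern), and that a keyword is reported wherever it occurs as a substring, so the third row yields Agent/govern.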