Partial keyword match not working when creating a new column from a pandas DataFrame in Python

Rahul rajan :

I have a data frame with a Description column, as shown below:

  Description
  Government entertainment people
  Dinner with CFO
  Commission to Agents government

I am trying to do a keyword search on the Description column, using a list of keywords.

My current code finds only exact matches, not partial matches. If a row contains multiple keywords, they are joined with a delimiter and written to a new column.

My code:

import pandas as pd

data = pd.read_excel('path_to_datafile.xlsx')
keywords = ['dinner', 'govern', 'Agent', 'entertain']
keywords_lower = [item.lower() for item in keywords]
s = set(keywords_lower)
# Splits each description into whole words and keeps those that equal a keyword
data['Keyword'] = data['Description'].apply(lambda x: '/'.join(set(x.lower().split()).intersection(s)))

I need the code to find partial matches for the keyword list above, such as:

1) entertain for row 1

2) govern for row 3

Currently the code works only for exact matches, not partial matches.

Expected output:

Description                         Keyword
Government entertainment people     Govern/entertain
Dinner with CFO                     Dinner
Commission to Agents government     Agent

How can this be done?

Serge Ballesta :

extractall will do the job, but you must first build the pattern:

...
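import re  # needed for the re.I flag

keywords_lower = [item.lower() for item in keywords]
pattern = '(' + '|'.join('(?:' + i + ')' for i in keywords_lower) + ')'
# df is the data frame from the question (called data there);
# [0] selects the single capture group before joining the matches per row
df['Keyword'] = df['Description'].str.extractall(pattern, flags=re.I)[0].groupby(level=0).agg('/'.join)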

You would get:

                       Description           Keyword
0  Government entertainment people  Govern/entertain
1                  Dinner with CFO            Dinner
2  Commission to Agents government      Agent/govern

(Here the pattern is '((?:dinner)|(?:govern)|(?:agent)|(?:entertain))'.)
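
For reference, here is a self-contained sketch of the same extractall approach that can be run without the Excel file. It rests on a few assumptions that are not in the original post: the data frame is built inline from the three sample rows, a fourth row without any keyword is added for illustration, the keywords are passed through re.escape in case they contain regex metacharacters, and rows with no match are filled with an empty string.

```python
import re

import pandas as pd

# Sample rows copied from the question; the real data would come from read_excel.
df = pd.DataFrame({
    "Description": [
        "Government entertainment people",
        "Dinner with CFO",
        "Commission to Agents government",
        "Quarterly audit",  # hypothetical extra row that contains no keyword
    ]
})
keywords = ["dinner", "govern", "Agent", "entertain"]

# One capture group with all keywords as alternatives.
# re.escape is an added precaution for keywords containing regex metacharacters.
pattern = "(" + "|".join(re.escape(k) for k in keywords) + ")"

matches = (
    df["Description"]
    .str.extractall(pattern, flags=re.IGNORECASE)[0]  # column 0 holds the capture group
    .groupby(level=0)                                 # level 0 is the original row index
    .agg("/".join)                                    # join all matches found in a row
)

# Rows without any match are missing from `matches`, so align and fill them.
df["Keyword"] = matches.reindex(df.index).fillna("")
print(df)
```

Because the match is case-insensitive, the Keyword column keeps the casing found in the description (for example Govern rather than govern). A keyword that occurs twice in one description would also appear twice in the joined string.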
