Removing a word from a pandas Series of strings using regex

Bruno Mello :

Suppose I have the following pandas series:

import pandas as pd
x = pd.Series(['box abcd', 'abcd box abcd', 'abcd box', 'abcdboxabcd'])

I want to remove every occurrence of the word box (note that I don't want to remove every occurrence of the substring box). I have done it like this:

x.apply(lambda x: ' '.join([w for w in x.split(' ') if w != 'box']))

Which gives me what I expected:

0           abcd
1      abcd abcd
2           abcd
3    abcdboxabcd
dtype: object

I would like to know if there is a way to do this using regex, for instance:

x.str.replace(regex, '')

where regex is a pattern that matches the word box. I have searched a lot about regex but can't seem to find an answer. Is this possible, or is there no such regex?

Quang Hoang :

You want \b, which matches a word boundary, and then strip the extra spaces:

x.str.replace(r'\b(\s?box\s?)\b', ' ', regex=True).str.strip()

Output:

0           abcd
1      abcd abcd
2           abcd
3    abcdboxabcd
dtype: object
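
A variant of the same idea, as a rough sketch rather than the only way to do it: match just the whole word with \bbox\b, then collapse whatever whitespace is left over. The names removed and result are made up for illustration. Passing regex=True explicitly is an assumption about your pandas version; newer releases treat the pattern as a literal string unless told otherwise.

```python
import pandas as pd

x = pd.Series(['box abcd', 'abcd box abcd', 'abcd box', 'abcdboxabcd'])

# \bbox\b matches "box" only when it stands alone as a word,
# so the "box" inside "abcdboxabcd" is left untouched.
removed = x.str.replace(r'\bbox\b', '', regex=True)

# Removing the word can leave doubled or leading/trailing spaces:
# collapse whitespace runs to a single space, then trim both ends.
result = removed.str.replace(r'\s+', ' ', regex=True).str.strip()

print(result.tolist())
```

One difference from the split-based version in the question: \b also treats punctuation as a boundary, so a string like 'box, abcd' would lose its 'box' here, while splitting on spaces would keep the token 'box,' as is.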
