Pandas regex to return any string that contains U or UN with a digit

tomoc4 :

I'm trying to create a new column from the strings in another column. Specifically, I want the new column to hold the unit values.

The position of the units can vary.

Examples of my strings:

this is a string and we have 4U to use
this is another string 5UN
only 6U to use today

I need to extract the numbers that are joined to either U or UN, wherever they appear in the string.

df['test_units'] = df['ITEM_DESC'].str.get(r'\(*U.*?\)',)
df['test_units']

This is my regex, but it returns only NaN values.

How do I return just the number that's joined to a U or UN?

Wiktor Stribiżew :

You may use

df['test_units'] = df['ITEM_DESC'].str.extract(r'\b(\d+)UN?\b')

See the regex demo. Note the unescaped pair of parentheses that form a capturing group whose value is returned by Series.str.extract.

The regex matches:

  • \b - a word boundary
  • (\d+) - Group 1: one or more digits
  • U - U
  • N? - an optional N
  • \b - word boundary
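
To see how the pieces above behave, here is a minimal sketch using Python's built-in re module instead of pandas. The last two sample strings and the helper name first_unit are made up for illustration; they are not from the question.

```python
import re

# Same pattern as in the answer: captured digits, then U or UN, on word boundaries.
UNIT_RE = re.compile(r'\b(\d+)UN?\b')

samples = [
    'this is a string and we have 4U to use',
    'this is another string 5UN',
    'only 6U to use today',
    'no units here',         # hypothetical: no digits, so no match
    'part 7UNITS in stock',  # hypothetical: the trailing \b rejects "7UNITS"
]

def first_unit(text):
    """Return the digits of the first match as a string, or None if nothing matches."""
    m = UNIT_RE.search(text)
    return m.group(1) if m else None

results = [first_unit(s) for s in samples]
print(results)  # ['4', '5', '6', None, None]
```

The trailing word boundary is what keeps the pattern from matching a longer word that merely starts with U or UN.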

Pandas test:

import pandas as pd
cols={'ITEM_DESC': ['this is a string and we have 4U to use','this is another string 5UN','only 6U to use today']}
df = pd.DataFrame(cols)
df['test_units'] = df['ITEM_DESC'].str.extract(r'\b(\d+)UN?\b')

Output:

>>> df
                                ITEM_DESC test_units
0  this is a string and we have 4U to use  4        
1  this is another string 5UN              5        
2  only 6U to use today                    6        
>>> 
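
Note that the extracted values are strings. If numbers are needed, one possible follow-up (a sketch, not part of the original answer) is to pass expand=False so a single capturing group comes back as a Series, then convert with pd.to_numeric. The fourth row is a hypothetical string with no match, added to show that such rows become NaN.

```python
import pandas as pd

df = pd.DataFrame({'ITEM_DESC': [
    'this is a string and we have 4U to use',
    'this is another string 5UN',
    'only 6U to use today',
    'no units in this one',  # hypothetical row with no match
]})

# With expand=False, a single capturing group is returned as a Series.
units = df['ITEM_DESC'].str.extract(r'\b(\d+)UN?\b', expand=False)

# Convert the extracted strings to numbers; unmatched rows stay NaN,
# so the resulting column holds floats rather than integers.
df['test_units'] = pd.to_numeric(units)
print(df)
```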
