tomoc4 :
I'm trying to create a new column with values taken from another column's strings. Specifically, I want a new column containing the unit values.
The position of the units can vary.
Examples of my strings:
this is a string and we have 4U to use
this is another string 5UN
only 6U to use today
I need to extract the numbers attached to either U or UN, wherever they appear in the string.
df['test_units'] = df['ITEM_DESC'].str.get(r'\(*U.*?\)',)
df['test_units']
This is my regex, but it returns only NaN values.
How do I return just the number attached to a U or UN?
Wiktor Stribiżew :
You may use
df['test_units'] = df['ITEM_DESC'].str.extract(r'\b(\d+)UN?\b')
See the regex demo. Note the unescaped pair of parentheses: they form a capturing group, and the value of that group is what Series.str.extract returns.
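A small sketch of what the capturing group does for Series.str.extract (the sample Series below is made up for illustration): rows without a match get NaN, passing expand=False returns a Series rather than a one-column DataFrame when there is a single group, and a pattern with no group at all is rejected with a ValueError.

```python
import pandas as pd

# Hypothetical sample strings; the third has no unit on purpose
s = pd.Series(['we have 4U to use', 'another string 5UN', 'no units here'])

# One capturing group + expand=False -> a Series instead of a one-column DataFrame
units = s.str.extract(r'\b(\d+)UN?\b', expand=False)
print(units.tolist())

# Without the parentheses there is no group for str.extract to return
no_group_error = None
try:
    s.str.extract(r'\b\d+UN?\b')
except ValueError as err:
    no_group_error = str(err)
print(no_group_error)
```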
The regex matches:
\b - a word boundary
(\d+) - Group 1: one or more digits
U - a literal U
N? - an optional N
\b - a word boundary
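To see how the two word boundaries behave, here is a sketch that runs the same pattern through Python's re module on the question's strings plus two made-up edge cases (the last two samples are hypothetical, not from the question). Digits glued to a preceding letter, or a U/UN followed by more letters, are not matched.

```python
import re

# Same pattern as in the answer: digits captured, then U, then an optional N
PATTERN = re.compile(r'\b(\d+)UN?\b')

samples = [
    'this is a string and we have 4U to use',
    'this is another string 5UN',
    'only 6U to use today',
    'model X4U',     # hypothetical: no word boundary between X and 4
    'order 7UNITS',  # hypothetical: letters continue after UN, so no closing boundary
]

results = []
for text in samples:
    m = PATTERN.search(text)
    results.append(m.group(1) if m else None)
    print(repr(text), '->', results[-1])
```

The match is case-sensitive, so a lowercase "4u" would not be found; str.extract accepts a flags argument (for example re.IGNORECASE) if that is needed.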
Pandas test:
import pandas as pd
cols={'ITEM_DESC': ['this is a string and we have 4U to use','this is another string 5UN','only 6U to use today']}
df = pd.DataFrame(cols)
df['test_units'] = df['ITEM_DESC'].str.extract(r'\b(\d+)UN?\b')
Output:
>>> df
ITEM_DESC test_units
0 this is a string and we have 4U to use 4
1 this is another string 5UN 5
2 only 6U to use today 6
>>>
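Two follow-ups that may matter in practice, shown as a sketch with one extra hypothetical row: str.extract keeps only the first match per row and returns the digits as strings, so pd.to_numeric is used here to get numbers, while str.extractall returns every match, indexed by (original row, match number), which can then be aggregated per row.

```python
import pandas as pd

df = pd.DataFrame({'ITEM_DESC': [
    'this is a string and we have 4U to use',
    'this is another string 5UN',
    'only 6U to use today',
    'pack of 2U plus 10UN spare',  # hypothetical row containing two units
]})

# First match per row only; the captured text is a string, so convert it
df['test_units'] = df['ITEM_DESC'].str.extract(r'\b(\d+)UN?\b', expand=False)
df['test_units'] = pd.to_numeric(df['test_units'])

# Every match per row: column 0 holds the group, the index is (row, match)
all_units = df['ITEM_DESC'].str.extractall(r'\b(\d+)UN?\b')[0].astype(int)
total_units = all_units.groupby(level=0).sum()
print(df)
print(total_units)
```

If some rows have no unit, the converted column will contain NaN and therefore be float rather than int, and those rows will simply be absent from the extractall result.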