sygneto :
Input:
df=pd.DataFrame({'text':['value 123* 333','122* 666','722 888*']})
print(df)
             text
0  value 123* 333
1        122* 666
2        722 888*
I need to extract only the numeric values from df['text'], but not the ones labelled with a trailing *.
My code:
df.text.str.extract(r'([0-9]+|[0-9]+\.[0-9]+)')
But this code also returns the values that have the * char on the right.
Expected output:
text
333
666
722
Wiktor Stribiżew :
You may use
df['text'].str.extract(r'(?=([0-9]+(?:\.[0-9]+)?))\1(?!\*)')
See the regex demo. Or, you may also require a word boundary on the left with r'\b(?=([0-9]+(?:\.[0-9]+)?))\1(?!\*)'. See this regex demo.
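To see what the word boundary changes, here is a small sketch using the standard re module. The sample string is made up for illustration and is not from the question: without \b the match may start at digits glued to letters, while with \b it may not.

```python
import re

NO_BOUNDARY = r"(?=([0-9]+(?:\.[0-9]+)?))\1(?!\*)"
WITH_BOUNDARY = r"\b" + NO_BOUNDARY

# Hypothetical input: "42" is glued to letters, "7" is starred, "15" is clean.
sample = "id42 7* 15"

first_without = re.search(NO_BOUNDARY, sample).group(1)
first_with = re.search(WITH_BOUNDARY, sample).group(1)

print(first_without)  # 42 (digits inside "id42" are accepted)
print(first_with)     # 15 (there is no word boundary between "d" and "4")
```

Which variant fits depends on whether numbers embedded in identifiers should count.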
Regex details
(?=([0-9]+(?:\.[0-9]+)?)) - a positive lookahead that requires and captures into Group 1 the following sequence of patterns immediately on the right:
    [0-9]+ - 1+ digits
    (?:\.[0-9]+)? - an optional sequence of . and 1+ digits.
\1 - the value of Group 1
(?!\*) - a negative lookahead that fails the match if, immediately to the right, there is a * char.
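The reason for capturing inside a lookahead and then consuming with \1 may not be obvious. A plain [0-9]+ followed by (?!\*) can backtrack and give up the last digit, so a truncated number slips through. The lookahead does not give digits back once it has matched. A minimal comparison with the standard re module, using the first row of the question's data:

```python
import re

text = "value 123* 333"

# Naive attempt: the engine backtracks from "123" to "12", and "12" is
# followed by "3" rather than "*", so a truncated number is accepted.
naive = re.search(r"[0-9]+(?:\.[0-9]+)?(?!\*)", text).group()

# Lookahead + backreference: Group 1 is fixed as "123" (then "23", then "3"),
# each is followed by "*", so all are rejected and the search moves on to "333".
atomic = re.search(r"(?=([0-9]+(?:\.[0-9]+)?))\1(?!\*)", text).group(1)

print(naive)   # 12
print(atomic)  # 333
```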
See the Python test:
>>> import pandas as pd
>>> df=pd.DataFrame({'text':['value 123* 333','122* 666','722 888*']})
>>> df['text'].str.extract(r'(?=([0-9]+(?:\.[0-9]+)?))\1(?!\*)', expand=False)
0    333
1    666
2    722
Name: text, dtype: object