Replacing dataframe column values from a re.search loop

Lara :

How can I replace the values of an existing dataframe column with the values from the re.search loop?

This is my re.search loop.

import re

for i in dataset['col1']:
    clean = re.search('(nan|[0-9]{1,4})([,.][0-9]{1,4})?', i)
    print(clean.group())

This is the sample data set (dataset):

    year    col1
1    2001    10.563\D
2    2002    9.540\A
3    2003    4.674\G
4    2004    3.2754\u
5    2005    nan\x

Shubham Sharma :

You can use Series.apply to apply a custom function to dataset["col1"]. Better still, you can use Series.str.replace with a regular expression to strip the unwanted suffix (the backslash and everything after it).

Try this:

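import re

def func(i):
    # Return the matched number (or "nan"); keep the value unchanged if nothing matches
    clean = re.search(r'(nan|[0-9]{1,4})([,.][0-9]{1,4})?', i)
    return clean.group() if clean else i

dataset["col1"] = dataset["col1"].apply(func)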

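Or, better:

# regex=True is required on pandas 2.0+, where str.replace treats the pattern as a literal string by default
dataset["col1"] = dataset["col1"].str.replace(r'(.*?)(\\.*?$)', r"\1", regex=True)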

Output:

>>> print(dataset)

   year    col1
0  2001  10.563
1  2002   9.540
2  2003   4.674
3  2004  3.2754
4  2005     nan
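
As a third option, not part of the answer above, Series.str.extract can apply the regex from the question directly to the column. The following is only a minimal sketch: it assumes the sample data shown in the question, wraps the pattern in a single outer capture group so the whole match is returned, and adds a hypothetical col1_num column to show an optional conversion to floats.

```python
import pandas as pd

# Sample data mirroring the question (assumed); the suffixes are a literal
# backslash followed by a letter, hence the raw strings.
dataset = pd.DataFrame({
    "year": [2001, 2002, 2003, 2004, 2005],
    "col1": [r"10.563\D", r"9.540\A", r"4.674\G", r"3.2754\u", r"nan\x"],
})

# Same pattern as in the question, with one outer capture group and
# non-capturing inner groups, so str.extract returns the full match.
pattern = r"((?:nan|[0-9]{1,4})(?:[,.][0-9]{1,4})?)"

# expand=False returns a Series; rows without a match become NaN.
dataset["col1"] = dataset["col1"].str.extract(pattern, expand=False)

# Optional: normalise a decimal comma to a dot and convert to floats.
# "col1_num" is a hypothetical column name used only for illustration.
dataset["col1_num"] = pd.to_numeric(
    dataset["col1"].str.replace(",", ".", regex=False),
    errors="coerce",
)

print(dataset)
```

Unlike the apply version, this does not raise an error when a row has no match; such rows simply end up as NaN.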
