How to extract numbers from a string using a pattern?

melo777 :

I have the following rows in a pandas DataFrame. I want to extract the coordinates from each row, as in [49,49], [31,78] and so on.

I tried to use str.extract but I couldn't figure out the pattern.

This is what I tried, although I am not sure I understand how it works. Here b is the DataFrame and positions is the column:

b.positions.str.extract("""[{'y': (\d+), 'x': (\d+)}],""")

[{'y': 49, 'x': 49}, {'y': 78, 'x': 31}]
[{'y': 78, 'x': 31}, {'y': 75, 'x': 51}]
[{'y': 75, 'x': 51}, {'y': 71, 'x': 35}]
[{'y': 71, 'x': 35}, {'y': 95, 'x': 41}]
[{'y': 95, 'x': 41}, {'y': 88, 'x': 72}]
[{'y': 88, 'x': 72}, {'y': 75, 'x': 77}]
[{'y': 25, 'x': 23}, {'y': 15, 'x': 39}]
[{'y': 15, 'x': 39}, {'y': 20, 'x': 33}]
[{'y': 85, 'x': 61}, {'y': 80, 'x': 67}]
[{'y': 80, 'x': 67}, {'y': 61, 'x': 59}]
[{'y': 61, 'x': 59}, {'y': 45, 'x': 45}]

Valdi_Bo :

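In your pattern the unescaped square brackets are read as a character class, so the parentheses inside them are literal characters and no capture groups remain. Also, str.extract returns only the first match per row, while each of your rows holds two points.

Try str.extractall with named capturing groups instead. Assuming that the source column holding your strings is named col1, the code is: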

df.col1.str.extractall(r"'y': (?P<y>\d+), 'x': (?P<x>\d+)")

For your sample data, the result is:

           y   x
   match        
0  0      49  49
   1      78  31
1  0      78  31
   1      75  51
2  0      75  51
   1      71  35
3  0      71  35
   1      95  41
4  0      95  41
   1      88  72
5  0      88  72
   1      75  77
6  0      25  23
   1      15  39
7  0      15  39
   1      20  33
8  0      85  61
   1      80  67
9  0      80  67
   1      61  59
10 0      61  59
   1      45  45

The first level in the MultiIndex of the result (unnamed) is the index from the source row. The second level (named match) is the match number for the current row, starting from 0.
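
If you want one list of [x, y] pairs per source row, as in the question, you can group the matches on that first index level. The following is only a minimal sketch: it rebuilds the first three sample rows by hand, keeps the df and col1 names assumed above, and the variable names matches, coords and result are made up for illustration.

```python
import pandas as pd

# Hand-built copy of the first three sample rows; `df` and `col1`
# are the names assumed in the answer, not the asker's real names.
df = pd.DataFrame({"col1": [
    "[{'y': 49, 'x': 49}, {'y': 78, 'x': 31}]",
    "[{'y': 78, 'x': 31}, {'y': 75, 'x': 51}]",
    "[{'y': 75, 'x': 51}, {'y': 71, 'x': 35}]",
]})

# extractall yields one result row per match; the captured groups are
# strings, so convert them to integers.
matches = df.col1.str.extractall(r"'y': (?P<y>\d+), 'x': (?P<x>\d+)").astype(int)

# Group on the first index level (the source row) and collect [x, y] pairs.
coords = {
    row: group[["x", "y"]].values.tolist()
    for row, group in matches.groupby(level=0)
}

# Source rows without any match get an empty list.
result = [coords.get(row, []) for row in df.index]

print(result)
# [[[49, 49], [31, 78]], [[31, 78], [51, 75]], [[51, 75], [35, 71]]]
```

The pairs are written in [x, y] order because that is what the expected output in the question suggests ([31,78] for y 78, x 31). Swap the column list if you need [y, x].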
