How to get semicolons except in parentheses with regex

aria :

For the following piece of C source code:

for (j=0; j<len; j++) a = (s) + (4); test = 5;

I want to insert \n after every semicolon (;) except those inside parentheses, using Python's re module.

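The regex ;(?![^(]*\)) works on the first piece of code, but not on this one, where the parentheses are nested:

for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;

Here it also matches the two semicolons inside for (...), because the lookahead only rejects a ; when a ) follows it with no ( in between.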
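A quick way to see the difference is to run the question's lookahead on both snippets and count how many semicolons it matches. This is only an illustrative check; the name LOOKAHEAD is made up for this sketch. Each snippet contains exactly two semicolons outside parentheses, so a correct pattern should match two in both.

```python
import re

# The question's pattern: match ';' unless a ')' follows before any '('.
LOOKAHEAD = re.compile(r';(?![^(]*\))')

str1 = 'for (j=0; j<len; j++) a = (s) + (4); test = 5;'
str2 = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'

# Both snippets contain exactly two semicolons outside parentheses.
print(len(LOOKAHEAD.findall(str1)))  # 2: correct
print(len(LOOKAHEAD.findall(str2)))  # 4: the two inside for (...) also match
```

In str2, the ( of (len) and of (j++) stops [^(]* before any ) is reached, so the lookahead never vetoes the semicolons in the for header.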

usr2564301 :

Use a custom replacement function:

re.sub(pattern, repl, string, count=0, flags=0)
...
If repl is a function, it is called for every non-overlapping occurrence of pattern.

The function repl is called for every match, which is either a single ; (together with any whitespace after it) or a parenthesized expression. Because .* is greedy and re.sub does not find overlapping matches, the very first opening parenthesis triggers one match that runs all the way up to the last closing parenthesis, so every semicolon in between is returned unchanged.
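To see what repl actually receives, you can list the matches with re.findall. This is a diagnostic sketch rather than part of the answer's code; PATTERN is simply the answer's regex bound to a name. Since the pattern has one group, findall returns that group's text for each match.

```python
import re

PATTERN = r'(;\s*|\(.*\))'  # the same alternation the answer passes to re.sub
str1 = 'for (j=0; j<len; j++) a = (s) + (4); test = 5;'

# Non-overlapping matches, scanned left to right: the greedy \(.*\) branch
# runs from the first '(' to the last ')' in the line.
print(re.findall(PATTERN, str1))
# ['(j=0; j<len; j++) a = (s) + (4)', '; ', ';']
```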

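import re

def repl(m):
    contents = m.group(1)
    # A parenthesized span is returned unchanged, semicolons and all.
    if '(' in contents:
        return contents
    # Otherwise it is a bare ';' (plus trailing whitespace): end the line there.
    return ';\n'

str1 = 'for (j=0; j<len; j++) a = (s) + (4); test = 5;'
str2 = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'

# Alternation: a semicolon with its trailing whitespace, or a greedy (...) span.
print(re.sub(r'(;\s*|\(.*\))', repl, str1))
print(re.sub(r'(;\s*|\(.*\))', repl, str2))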

Result:

for (j=0; j<len; j++) a = (s) + (4);
test = 5;

for (j=0; j<(len); (j++)) a = (s) + (4);
test = 5;

Mission accomplished, for your (very little) sample data.

But wait!

A small, but valid, change to one of the examples

str1 = 'for (j=0; j<len; j++) test = 5; a = (s) + (4);'

breaks this, producing wrong output (there is no line break after test = 5):

for (j=0; j<len; j++) test = 5; a = (s) + (4);
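One might try to stop the greedy match by allowing only parentheses that contain no ( or ) themselves, \([^()]*\). The sketch below (INNERMOST and repl_inner are illustrative names, not from the answer) shows that this variant fixes the example above but breaks the nested str2, because a plain re pattern cannot count nesting depth.

```python
import re

# Only innermost groups: '(' ... ')' with no parentheses in between.
INNERMOST = r'(;\s*|\([^()]*\))'

def repl_inner(m):
    contents = m.group(1)
    # Keep parenthesized text as-is; end the line after a bare ';'.
    return contents if contents.startswith('(') else ';\n'

str2 = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'
str3 = 'for (j=0; j<len; j++) test = 5; a = (s) + (4);'

print(re.sub(INNERMOST, repl_inner, str3))  # split correctly after test = 5;
print(re.sub(INNERMOST, repl_inner, str2))  # wrongly splits inside for (...)
```

In str2 the outer ( of the for header never finds a ) without crossing another (, so the semicolons after j=0 and (len) are treated as top-level.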

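With Python's re module there is no way around it, since a regular expression cannot keep track of arbitrarily deep nesting of parentheses. You need a state machine that counts the depth instead: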

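def state_match(text):
    parentheses = 0     # current nesting depth of ( ... )
    drop_space = False  # True right after a top-level ';' was turned into ';\n'
    result = ''
    for character in text:
        if character == ' ' and drop_space:
            # skip the space after the semicolon; the newline replaces it
            drop_space = False
            continue
        # any other character clears the flag, so later spaces are kept
        drop_space = False
        if character == '(':
            parentheses += 1
        elif character == ')':
            parentheses -= 1
        elif character == ';' and parentheses == 0:
            # top-level semicolon: end the statement with a newline
            result += ';\n'
            drop_space = True
            continue
        result += character
    return result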

str1 = 'for (j=0; j<len; j++) a = (s) + (4); test = 5;'
str2 = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'
str3 = 'for (j=0; j<len; j++) test = 5; a = (s) + (4);'

print(state_match(str1))
print(state_match(str2))
print(state_match(str3))

results correctly in:

for (j=0; j<len; j++) a = (s) + (4);
test = 5;

for (j=0; j<(len); (j++)) a = (s) + (4);
test = 5;

for (j=0; j<len; j++) test = 5;
a = (s) + (4);
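The same depth counter can also collect the statements into a list instead of building one string, which is handy if each statement needs further processing. This is a variant sketch, not the answer's code: split_statements is a hypothetical name, it clamps the depth at zero to tolerate a stray ), and it keeps trailing text that has no final semicolon.

```python
def split_statements(text):
    """Split text at semicolons that sit outside any parentheses."""
    statements = []
    current = []
    depth = 0  # nesting level of ( ... )
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth = max(depth - 1, 0)  # tolerate an unmatched ')'
        current.append(ch)
        if ch == ';' and depth == 0:
            # a top-level semicolon closes the current statement
            statements.append(''.join(current).strip())
            current = []
    tail = ''.join(current).strip()
    if tail:
        statements.append(tail)  # trailing text without a final ';'
    return statements

code = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'
print('\n'.join(split_statements(code)))
```

Joining the list with '\n' gives the same layout as state_match, minus the final newline.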

