Error when separating tags with `|` using regex in Python

Akshay Godase :

I want to add a | before every tag. Below is the code I have used.

tags = ['XYZ', 'CREF', 'BREF', 'RREF', 'REF']

string_data = 'XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITYRREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY'

for each_tag in tags:
    result = string_data.replace(each_tag, "|" + each_tag)
    print(result)

How can I do this using a regex?

Input String:

XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITYRREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY

Actual result (wrong):

XYZ:MUMBAI UNIVERSITYC|REF:PUNE UNIVERSITYB|REF:DADAR UNIVERSITYR|REF:KOLHAPUR UNIVERCITY LLC|REF:SOLAPUR UNIVERSITY

Expected result:

|XYZ:MUMBAI UNIVERSITY|CREF:PUNE UNIVERSITY|BREF:DADAR UNIVERSITY|RREF:KOLHAPUR UNIVERCITY LLC|REF:SOLAPUR UNIVERSITY

Is there any way to do it using regex?

The fourth bird :

You could match an optional B or R before REF, or match a C that is not preceded by an L, using a negative lookbehind.

(?:[BR]?|(?<!L)C)REF|^(?!\|)

Explanation

  • (?: Non capture group
    • [BR]? Match an optional B or R
    • | Or
    • (?<!L)C Match a C and assert what is directly to the left is not L
  • ) Close group
  • REF Match literally
  • | Or
  • ^(?!\|) Assert the start of the string when it is not directly followed by a |, which prevents a double || if the string already starts with one
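
To see which alternative fires where, the matches can be listed with re.finditer. This is only a minimal sketch on the sample string from the question; the names pattern, sample and matches are chosen here for illustration.

```python
import re

pattern = re.compile(r"(?:[BR]?|(?<!L)C)REF|^(?!\|)")
sample = ("XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITY"
          "RREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY")

# Collect (start offset, matched text) for every match in the string.
matches = [(m.start(), m.group(0)) for m in pattern.finditer(sample)]
for start, text in matches:
    print(start, repr(text))
```

The first match is the empty match at the start of the string, followed by CREF, BREF, RREF and finally a plain REF: the C after LL is skipped because of the lookbehind.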


In the replacement, use the full match prepended with a pipe:

|\g<0>

For example

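import re

# Match CREF/BREF/RREF/REF (but not the C in LLCREF), or the start of the string
regex = r"(?:[BR]?|(?<!L)C)REF|^(?!\|)"
test_str = "XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITYRREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY"
# \g<0> refers to the whole match; prefix it with a pipe
subst = r"|\g<0>"
result = re.sub(regex, subst, test_str)

print(result)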

Output

|XYZ:MUMBAI UNIVERSITY|CREF:PUNE UNIVERSITY|BREF:DADAR UNIVERSITY|RREF:KOLHAPUR UNIVERCITY LLC|REF:SOLAPUR UNIVERSITY
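
The ^(?!\|) branch only matters when the input may already begin with a pipe. A small check of that case, assuming a hypothetical helper name add_pipes and shortened made-up inputs (neither is from the original answer):

```python
import re

PATTERN = re.compile(r"(?:[BR]?|(?<!L)C)REF|^(?!\|)")

def add_pipes(text):
    # Prefix every match (including the empty match at the start) with a pipe.
    return PATTERN.sub(r"|\g<0>", text)

without_pipe = add_pipes("XYZ:A UNIVERSITYCREF:B")   # no leading pipe yet
with_pipe = add_pipes("|XYZ:A UNIVERSITYCREF:B")     # leading pipe already present
print(without_pipe)
print(with_pipe)
```

Both calls should print the same string. Note that the guard covers only the start of the string: a tag inside the string that already has a pipe in front of it would still get a second one.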
