Modifying a PCRE Regex to a C# or Java Supported Regex

Prasad Kada :

Business requirement: an address needs to be parsed into Street, House number and Address line 2.

Example single-line addresses:

Bygholm Søpark 21B
Peder Skrams Gade 9  3. tv.
Willemoesgade 29  kid.

The PCRE regular expression below works for the business scenario above. I need to use it in a Java method that accepts an input parameter (a single-line address) and returns the output of the regex groups (Street, House number and Address line 2). Could anyone help me with this?

Regex:

/
\A\s*
(?: #########################################################################
    # Option A: [<Addition to address 1>] <House number> <Street name>      #
    # [<Addition to address 2>]                                             #
    #########################################################################
    (?:(?P<A_Addition_to_address_1>.*?),\s*)? # Addition to address 1
(?:No\.\s*)?
    (?P<A_House_number_1>\pN+[a-zA-Z]?(?:\s*[-\/\pP]\s*\pN+[a-zA-Z]?)*) # House number
\s*,?\s*
    (?P<A_Street_name_1>(?:[a-zA-Z]\s*|\pN\pL{2,}\s\pL)\S[^,#]*?(?<!\s)) # Street name
\s*(?:(?:[,\/]|(?=\#))\s*(?!\s*No\.)
    (?P<A_Addition_to_address_2>(?!\s).*?))? # Addition to address 2
|   #########################################################################
    # Option B: [<Addition to address 1>] <Street name> <House number>      #
    # [<Addition to address 2>]                                             #
    #########################################################################
    (?:(?P<B_Addition_to_address_1>.*?),\s*(?=.*[,\/]))? # Addition to address 1
    (?!\s*No\.)(?P<B_Street_name>\S\s*\S(?:[^,#](?!\b\pN+\s))*?(?<!\s)) # Street name
\s*[\/,]?\s*(?:\sNo\.)?\s+
    (?P<B_House_number>\pN+\s*-?[a-zA-Z]?(?:\s*[-\/\pP]?\s*\pN+(?:\s*[\-a-zA-Z])?)*|[IVXLCDM]+(?!.*\b\pN+\b))(?<!\s) # House number
\s*(?:(?:[,\/]|(?=\#)|\s)\s*(?!\s*No\.)\s*
    (?P<B_Addition_to_address_2>(?!\s).*?))? # Addition to address 2
)
\s*\Z

https://regex101.com/library/lU7gY7

Java method:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class regEx {
    public static void main( String args[] ) {
          // String to be scanned to find the pattern.
          String line = "Bygholm Søpark 21B";
          String pattern = "\\A\\s*\r\n" + 
                "(?: #########################################################################\r\n" + 
                "    # Option A: [<Addition to address 1>] <House number> <Street name>      #\r\n" + 
                "    # [<Addition to address 2>]                                             #\r\n" + 
                "    #########################################################################\r\n" + 
                "    (?:(?:P<A_Addition_to_address_1>.*?),\\s*)? # Addition to address 1\r\n" + 
                "(?:No\\.\\s*)?\r\n" + 
                "    (?:P<A_House_number_1>\\pN+[a-zA-Z]?(?:\\s*[-\\/\\pP]\\s*\\pN+[a-zA-Z]?)*) # House number\r\n" + 
                "\\s*,?\\s*\r\n" + 
                "    (?:P<A_Street_name_1>(?:[a-zA-Z]\\s*|\\pN\\pL{2,}\\s\\pL)\\S[^,#]*?(?<!\\s)) # Street name\r\n" + 
                "\\s*(?:(?:[,\\/]|(?=\\#))\\s*(?!\\s*No\\.)\r\n" + 
                "    (?:P<A_Addition_to_address_2>(?!\\s).*?))? # Addition to address 2\r\n" + 
                "|   #########################################################################\r\n" + 
                "    # Option B: [<Addition to address 1>] <Street name> <House number>      #\r\n" + 
                "    # [<Addition to address 2>]                                             #\r\n" + 
                "    #########################################################################\r\n" + 
                "    (?:(?:P<B_Addition_to_address_1>.*?),\\s*(?=.*[,\\/]))? # Addition to address 1\r\n" + 
                "    (?:!\\s*No\\.)(?:P<B_Street_name>\\S\\s*\\S(?:[^,#](?!\\b\\pN+\\s))*?(?:<!\\s)) # Street name\r\n" + 
                "\\s*[\\/,]?\\s*(?:\\sNo\\.)?\\s+\r\n" + 
                "    (?:P<B_House_number>\\pN+\\s*-?[a-zA-Z]?(?:\\s*[-\\/\\pP]?\\s*\\pN+(?:\\s*[\\-a-zA-Z])?)*|[IVXLCDM]+(?!.*\\b\\pN+\\b))(?<!\\s) # House number\r\n" + 
                "\\s*(?:(?:[,\\/]|(?=\\#)|\\s)\\s*(?!\\s*No\\.)\\s*\r\n" + 
                "    (?:P<B_Addition_to_address_2>(?!\\s).*?))? # Addition to address 2\r\n" + 
                ")\r\n" + 
                "\\s*\\Z";

          // Create a Pattern object
          Pattern r = Pattern.compile(pattern);

          // Now create a matcher object.
          Matcher m = r.matcher(line);
          if (m.find( )) {
             System.out.println("B_Street_name: " + m.group(1) );
             System.out.println("B_House_number: " + m.group(2) );
             System.out.println("B_Addition_to_address_2: " + m.group(3) );
          }else {
             System.out.println("NO MATCH");
          }
       }
}
Wiktor Stribiżew :

There are several things to bear in mind when porting this pattern to Java.

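  • Named capturing groups: their syntax in Java is (?<name>pattern), and the names may only consist of ASCII letters and digits, starting with a letter (see I can't use a group name like this "abc_def" using Patterns). Replace all (?P<name_parts>...) with (?<nameparts>...). Note that the (?:P<name_parts>...) form in your Java string is not a named group at all: it is a non-capturing group that tries to match the literal text P<name_parts>. Likewise, (?:!\s*No\.) and (?:<!\s) in Option B must be the lookarounds (?!\s*No\.) and (?<!\s).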
  • Use of #: in many flavors other than Java, free-spacing mode lets you use a literal # unescaped inside character classes. In Java, any meaningful whitespace and # MUST be escaped, EVEN inside character classes (write \\# in the Java string literal wherever a literal # is meant, both inside character classes and elsewhere in the pattern).
  • Free-spacing (comments) mode: in Java it is enabled by passing the Pattern.COMMENTS flag to Pattern.compile. Alternatively, add the inline flag (?x) at the start of the pattern.
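The three points above can be checked with a small sketch. `compiles` and `renameNamedGroups` are hypothetical helpers written for this illustration, not part of java.util.regex. `renameNamedGroups` assumes the PCRE group names consist only of ASCII letters, digits and underscores, and it does not detect name collisions or names that would start with a digit once the underscores are removed. `Matcher.replaceAll(Function)` requires Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PcreToJavaNotes {

    // Matches a PCRE-style named group opener such as "(?P<A_Street_name_1>".
    private static final Pattern PCRE_GROUP = Pattern.compile("\\(\\?P<([A-Za-z0-9_]+)>");

    /** Returns true if the regex compiles with the given flags, false on PatternSyntaxException. */
    public static boolean compiles(String regex, int flags) {
        try {
            Pattern.compile(regex, flags);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /**
     * Hypothetical helper: rewrites "(?P<name_parts>" to "(?<nameparts>" by dropping underscores.
     * It does not check for name collisions or names that end up starting with a digit.
     */
    public static String renameNamedGroups(String pcre) {
        Matcher m = PCRE_GROUP.matcher(pcre);
        return m.replaceAll(r -> Matcher.quoteReplacement("(?<" + r.group(1).replace("_", "") + ">"));
    }

    public static void main(String[] args) {
        // 1. PCRE group syntax and underscores are rejected; ASCII letters/digits are accepted.
        System.out.println(compiles("(?P<AStreet>x)", 0));   // false
        System.out.println(compiles("(?<A_Street>x)", 0));   // false
        System.out.println(compiles("(?<AStreet>x)", 0));    // true

        // 2. In COMMENTS mode an unescaped # starts a comment even inside [...],
        //    which swallows the closing bracket.
        System.out.println(compiles("[^,#]", Pattern.COMMENTS));    // false
        System.out.println(compiles("[^,\\#]", Pattern.COMMENTS));  // true

        // 3. Spaces are only ignored in free-spacing mode; (?x) behaves like Pattern.COMMENTS.
        String regex = "(?<street>\\S+) \\s+ (?<number>\\d+)";
        System.out.println(Pattern.compile(regex).matcher("Gade 9").matches());                   // false
        System.out.println(Pattern.compile(regex, Pattern.COMMENTS).matcher("Gade 9").matches()); // true
        System.out.println(Pattern.compile("(?x)" + regex).matcher("Gade 9").matches());          // true

        System.out.println(renameNamedGroups("(?P<A_Street_name_1>\\S+)")); // (?<AStreetname1>\S+)
    }
}
```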

Here is the fixed code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegEx {
    public static void main(String[] args) {
        String line = "Bygholm Søpark 21B";
        String pattern = "\\A\\s*\r\n" +
            "(?: #########################################################################\r\n" +
            "    # Option A: [<Addition to address 1>] <House number> <Street name>      #\r\n" +
            "    # [<Addition to address 2>]                                             #\r\n" +
            "    #########################################################################\r\n" +
            "    (?:(?<AAdditiontoaddress1>.*?),\\s*)?         # Addition to address 1\r\n" +
            "(?:No\\.\\s*)?\r\n" +
            "    (?<AHousenumber1>\\pN+[a-zA-Z]?(?:\\s*[-/\\pP]\\s*\\pN+[a-zA-Z]?)*) # House number\r\n" +
            "\\s*,?\\s*\r\n" +
            "    (?<AStreetname1>(?:[a-zA-Z]\\s*|\\pN\\pL{2,}\\s\\pL)\\S[^,\\#]*?(?<!\\s)) # Street name\r\n" +
            "\\s*(?:(?:[,/]|(?=\\#))\\s*(?!\\s*No\\.)\r\n" +
            "    (?<AAdditiontoaddress2>(?!\\s).*?))?              # Addition to address 2\r\n" +
            "|   #########################################################################\r\n" +
            "    # Option B: [<Addition to address 1>] <Street name> <House number>      #\r\n" +
            "    # [<Addition to address 2>]                                             #\r\n" +
            "    #########################################################################\r\n" +
            "    (?:(?<BAdditiontoaddress1>.*?),\\s*(?=.*[,/]))?   # Addition to address 1\r\n" +
            "    (?!\\s*No\\.)(?<BStreetname>\\S\\s*\\S(?:[^,\\#](?!\\b\\pN+\\s))*?(?<!\\s)) # Street name\r\n" +
            "\\s*[/,]?\\s*(?:\\sNo\\.)?\\s+\r\n" +
            "    (?<BHousenumber>\\pN+\\s*-?[a-zA-Z]?(?:\\s*[-/\\pP]?\\s*\\pN+(?:\\s*[-a-zA-Z])?)*|[IVXLCDM]+(?!.*\\b\\pN+\\b))(?<!\\s) # House number\r\n" +
            "\\s*(?:(?:[,/]|(?=\\#)|\\s)\\s*(?!\\s*No\\.)\\s*\r\n" +
            "    (?<BAdditiontoaddress2>(?!\\s).*?))? # Addition to address 2\r\n" +
            ")\r\n" +
            "\\s*\\Z";

        // Create a Pattern object with free-spacing (comments) mode enabled
        Pattern r = Pattern.compile(pattern, Pattern.COMMENTS);
        // Now create a matcher object.
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("B_Street_name: " + m.group("BStreetname"));
            System.out.println("B_House_number: " + m.group("BHousenumber"));
            System.out.println("B_Addition_to_address_2: " + m.group("BAdditiontoaddress2"));
        } else {
            System.out.println("NO MATCH");
        }
    }
}

See the Java demo online.

Output:

B_Street_name: Bygholm Søpark
B_House_number: 21B
B_Addition_to_address_2: null
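The question asked for a method that takes a single-line address and returns the street, the house number and address line 2. Below is a minimal sketch that wraps the fixed pattern, not a definitive implementation. The `AddressParser` class, the `Address` holder and the `parse` method are hypothetical names; the pattern is the one from the fixed code with the free-spacing comments dropped (the pieces are concatenated without whitespace, so `Pattern.COMMENTS` changes nothing here except keeping `\\#` consistent with the answer); "Address line 2" is assumed to correspond to the Addition to address 2 group, and Addition to address 1 is ignored.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressParser {

    /** Hypothetical result type; addressLine2 is null when the address has no addition. */
    public static final class Address {
        public final String street;
        public final String houseNumber;
        public final String addressLine2;

        Address(String street, String houseNumber, String addressLine2) {
            this.street = street;
            this.houseNumber = houseNumber;
            this.addressLine2 = addressLine2;
        }

        @Override
        public String toString() {
            return "Address[street=" + street + ", houseNumber=" + houseNumber
                    + ", addressLine2=" + addressLine2 + "]";
        }
    }

    // Same regex as in the fixed code above, without the free-spacing comments.
    private static final Pattern ADDRESS = Pattern.compile(
            "\\A\\s*(?:"
            // Option A: [<Addition to address 1>] <House number> <Street name> [<Addition to address 2>]
            + "(?:(?<AAdditiontoaddress1>.*?),\\s*)?"
            + "(?:No\\.\\s*)?"
            + "(?<AHousenumber1>\\pN+[a-zA-Z]?(?:\\s*[-/\\pP]\\s*\\pN+[a-zA-Z]?)*)"
            + "\\s*,?\\s*"
            + "(?<AStreetname1>(?:[a-zA-Z]\\s*|\\pN\\pL{2,}\\s\\pL)\\S[^,\\#]*?(?<!\\s))"
            + "\\s*(?:(?:[,/]|(?=\\#))\\s*(?!\\s*No\\.)"
            + "(?<AAdditiontoaddress2>(?!\\s).*?))?"
            + "|"
            // Option B: [<Addition to address 1>] <Street name> <House number> [<Addition to address 2>]
            + "(?:(?<BAdditiontoaddress1>.*?),\\s*(?=.*[,/]))?"
            + "(?!\\s*No\\.)(?<BStreetname>\\S\\s*\\S(?:[^,\\#](?!\\b\\pN+\\s))*?(?<!\\s))"
            + "\\s*[/,]?\\s*(?:\\sNo\\.)?\\s+"
            + "(?<BHousenumber>\\pN+\\s*-?[a-zA-Z]?(?:\\s*[-/\\pP]?\\s*\\pN+(?:\\s*[-a-zA-Z])?)*|[IVXLCDM]+(?!.*\\b\\pN+\\b))(?<!\\s)"
            + "\\s*(?:(?:[,/]|(?=\\#)|\\s)\\s*(?!\\s*No\\.)\\s*"
            + "(?<BAdditiontoaddress2>(?!\\s).*?))?"
            + ")\\s*\\Z",
            Pattern.COMMENTS);

    /** Parses a single-line address; returns Optional.empty() when neither option matches. */
    public static Optional<Address> parse(String line) {
        Matcher m = ADDRESS.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        // The house number group is mandatory in each option, so it tells us which branch matched.
        if (m.group("BHousenumber") != null) {
            return Optional.of(new Address(m.group("BStreetname"),
                    m.group("BHousenumber"), m.group("BAdditiontoaddress2")));
        }
        return Optional.of(new Address(m.group("AStreetname1"),
                m.group("AHousenumber1"), m.group("AAdditiontoaddress2")));
    }

    public static void main(String[] args) {
        System.out.println(parse("Bygholm S\u00f8park 21B"));
        System.out.println(parse("Peder Skrams Gade 9  3. tv."));
        System.out.println(parse("")); // Optional.empty
    }
}
```

Returning `Optional` makes the "NO MATCH" case explicit for callers instead of exposing the raw `Matcher` groups.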

