Java Regex to get 11 words before and after specific word recursively

AndroidHacker :

I am trying to get the 11 words before and after a specific word in a String.

For example:

and WINSOCK 2.0 in Visual Studio 2012/2013, compiled as Release for use on 64-bit and 32-bit Windows Servers. Client application discovers and validates qualifying Windows Server product

The challenge is to match a word like 32 that is joined to the word bit by a hyphen. If I change 32-bit to 32+bit, the regex matches and gets me the 11 words before and after it.

My regex looks like this:

Pattern pattern = Pattern.compile("(?<!-)\\b(?<!&)(" + "\\b" + word + "\\b" + ")(?!&)\\b(?!-)(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,11}");

Any help is appreciated.

PS: I am not able to match words that are attached with a hyphen.

Solution (thanks to @Wiktor):

"\\b(?<!&)\\b" + word + "\\b(?!&)(?:[^a-zA-Z']+[a-zA-Z'-]+){0,11}"

Thanks.

Wiktor Stribiżew :

You may "take out" the hyphen from the regex, i.e. drop the (?<!-) and (?!-) lookarounds and remove - from the negated character class:

"\\b(?<!&)" + word + "\\b(?!&)(?:[^a-zA-Z']+[a-zA-Z'-]+){0,11}"

Or, if the word may start/end with special chars:

"(?<![&\\w])" + Pattern.quote(word) + "(?![&\\w])(?:[^a-zA-Z']+[a-zA-Z'-]+){0,11}"

See the regex demo
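As a rough illustration of the difference, the sketch below runs the pattern from the question and the first pattern above against the sample sentence with the word 32. The class name TrailingWordsDemo and its helper methods are made up for this example; only the two regex strings come from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original question or answer.
class TrailingWordsDemo {
    // Sample sentence from the question.
    static final String SAMPLE = "and WINSOCK 2.0 in Visual Studio 2012/2013, "
            + "compiled as Release for use on 64-bit and 32-bit Windows Servers. "
            + "Client application discovers and validates qualifying Windows Server product";

    // Pattern from the question: (?!-) rejects "32" because a hyphen follows it.
    static Pattern original(String word) {
        return Pattern.compile("(?<!-)\\b(?<!&)(\\b" + word + "\\b)(?!&)\\b(?!-)"
                + "(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,11}");
    }

    // Pattern from the answer: no hyphen lookarounds, and '-' may act as a separator.
    static Pattern fixed(String word) {
        return Pattern.compile("\\b(?<!&)" + word + "\\b(?!&)"
                + "(?:[^a-zA-Z']+[a-zA-Z'-]+){0,11}");
    }

    // Returns the first match, or null when the pattern does not match at all.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // prints: null
        System.out.println(firstMatch(original("32"), SAMPLE));
        // prints: 32-bit Windows Servers. Client application discovers and validates qualifying Windows Server
        System.out.println(firstMatch(fixed("32"), SAMPLE));
    }
}
```

Note that the match contains the word plus the up to 11 letter sequences that follow it ("bit" counts as the first one here); the patterns in this thread do not capture the words in front of the match.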

Details

  • \b(?<!&) - a word boundary that is not preceded by &
  • word - a variable word (note you may need to escape it with Pattern.quote(word), or even replace "\\b(?<!&)" + word + "\\b(?!&)" with "(?<![&\\w])" + word + "(?![&\\w])" if the word may start/end with special chars)
  • \b(?!&) - a word boundary that is not followed by &
  • (?:[^a-zA-Z']+[a-zA-Z'-]+){0,11} - 0 to 11 sequences of:
    • [^a-zA-Z']+ - 1+ chars other than ASCII letters or '
    • [a-zA-Z'-]+ - 1+ ASCII letters, ' or -.
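To see why the lookaround form matters when the word ends with special chars, here is a small sketch using a made-up input sentence and the word C++. The class QuotedWordDemo, its method names and the test sentence are assumptions for illustration only; Pattern.quote and the regex fragments are the ones discussed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the input text below is invented for this example.
class QuotedWordDemo {
    static final String TAIL = "(?:[^a-zA-Z']+[a-zA-Z'-]+){0,11}";

    // \b-based variant with the word quoted so that '+' is treated literally.
    static Pattern withBoundaries(String word) {
        return Pattern.compile("\\b(?<!&)" + Pattern.quote(word) + "\\b(?!&)" + TAIL);
    }

    // Lookaround-based variant: the word must not touch '&' or a word char.
    static Pattern withLookarounds(String word) {
        return Pattern.compile("(?<![&\\w])" + Pattern.quote(word) + "(?![&\\w])" + TAIL);
    }

    // Returns the first match, or null when the pattern does not match at all.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "We use C++ and Java here";
        // No word boundary exists between '+' and ' ', so this prints: null
        System.out.println(firstMatch(withBoundaries("C++"), text));
        // prints: C++ and Java here
        System.out.println(firstMatch(withLookarounds("C++"), text));
    }
}
```

The \b after the quoted word fails because both + and the following space are non-word characters, which is the case the (?<![&\w]) / (?![&\w]) replacement is meant to handle.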
