RegEx for combining multiple sequences

user2557930 :

Like many people, I am struggling with what seems to be a "trivial" regex issue. In a given text, whenever I encounter a word within {} brackets, I need to extract it. At first I used

"\\{-?(\\w{3,})\\}"

and it worked OK, as long as the word didn't contain any whitespace or a special character like '. For example, {Project} returns Project, but {Project Test} or {Project D'arce} don't return anything. I know that for whitespace characters I need to use \s, but it is not at all clear to me how to add it to the pattern above. I tried:

"%\\{-?(\\w(\\s{3,})\\)\\}"))

but it is not working. Also, what if I want to match words containing special characters like '? It's really frustrating.

The fourth bird :

You could use a character class [\w\s'] and add to it whatever you want to allow to match:

\{-?([\w\s']{3,})}

In Java

String regex = "\\{-?([\\w\\s']{3,})}";
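
As a minimal sketch of how that pattern could be used to pull the values out of a text (the class name BraceWords and the helper extract are made up for illustration, they are not from the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class BraceWords {
    // Character-class pattern from above: word chars, whitespace chars or '
    // repeated 3 or more times between { and }, with an optional leading hyphen.
    private static final Pattern BRACED = Pattern.compile("\\{-?([\\w\\s']{3,})}");

    // Collect the content of capture group 1 for every match in the text.
    static List<String> extract(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = BRACED.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Project, Project Test, Project D'arce]
        System.out.println(extract("{Project} and {Project Test} and {Project D'arce}"));
    }
}
```

Note that because \s is inside the character class, a value such as {   } (three spaces) would also match with this pattern.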

If you want to prevent matching a value that consists only of whitespace chars (for example 3 spaces), you could use a repeating group:

\{-?\h*([\w']{3,}(?:\h+[\w']+)*)\h*}

About the pattern

  • \{ Match { char
  • -? Optional hyphen
  • \h* Match 0+ times a horizontal whitespace char
  • ( Capture group 1
  • [\w']{3,} Match 3 or more times either a word char or '
  • (?:\h+[\w']+)* Repeat 0+ times matching 1+ horizontal whitespace chars followed by 1+ times what is listed in the character class
  • ) Close group 1
  • \h* Match 0+ times a horizontal whitespace char
  • } Match }

In Java

String regex = "\\{-?\\h*([\\w']{3,}(?:\\h+[\\w']+)*)\\h*}";
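
A small sketch to compare the behaviour of this stricter pattern on a few inputs (the class name StrictBraceWords and the helper firstMatch are hypothetical, chosen only for this example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class StrictBraceWords {
    // Repeating-group pattern from above: the value has to start with 3 or more
    // word chars or ', optionally followed by more words separated by
    // horizontal whitespace. Whitespace right inside the braces is not captured.
    static final Pattern STRICT =
            Pattern.compile("\\{-?\\h*([\\w']{3,}(?:\\h+[\\w']+)*)\\h*}");

    // Return capture group 1 of the first match, or null when nothing matches.
    static String firstMatch(String text) {
        Matcher m = STRICT.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("{   }"));             // null
        System.out.println(firstMatch("{ Project Test }"));  // Project Test
        System.out.println(firstMatch("{-Project D'arce}")); // Project D'arce
    }
}
```

One consequence of the {3,} quantifier being on the first word only: a value like {An apple} does not match, because its first word has fewer than 3 characters. If that should be allowed, the quantifier would have to be adjusted.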
