I'm trying to figure out a regexp that matches all occurrences of *this kind of strings*. Two additional rules unfortunately made this more complicated than I thought:
- A tagged string should start with * followed by a non-whitespace character (so "* this one*" should not be matched).
- A tagged string should end with a non-whitespace character followed by * followed by whitespace (so "*this one *" and "*this o*ne" should not be matched).
I started with the simplest regexp, \*\S([^\*]+)?\*, which for my test string:
*foo 1 * 2 bar* foo *b* azz *qu **ux*
matches the parts in square brackets:
[*foo 1 *] 2 bar* foo [*b*] azz [*qu *][*ux*]
and this is what I'd like to achieve:
[*foo 1 * 2 bar*] foo [*b*] azz [*qu **ux*]
so 2 problems appear:
- How do I express the second rule in a regexp: "search until the first non-whitespace character followed by * followed by whitespace"? With a positive lookahead?
- How do I match the whitespace from the second rule without including it in the result, which \*\S([^\*]+)?\*\s would do?
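On the second point, the difference between consuming the trailing whitespace and only asserting it can be seen on a tiny input. This is a minimal sketch, not the full pattern: the class name LookaheadDemo, the helper firstMatch and the input "azz *b* foo" are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "azz *b* foo";
        // A trailing \s is consumed, so the space becomes part of the match.
        System.out.println("[" + firstMatch("\\*b\\*\\s", input) + "]");     // prints [*b* ]
        // A lookahead only asserts the whitespace; the match ends at the '*'.
        System.out.println("[" + firstMatch("\\*b\\*(?=\\s)", input) + "]"); // prints [*b*]
    }
}
```

A trailing \s also fails when the closing * is the last character of the input, which is why an alternative such as (?=\s|$) is worth considering.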
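import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        // Opening *, a non-whitespace char, a lazy run, then a closing * that
        // follows non-whitespace and precedes whitespace or the end of the input.
        Pattern pattern = Pattern.compile("\\*\\S.*?(?<!\\s)\\*(?=\\s|$)");
        Matcher matcher = pattern.matcher("*foo 1 * 2 bar* foo *b* azz *qu **ux*");
        int i = 1;
        while (matcher.find()) {
            System.out.printf("%d: %s%n", i++, matcher.group());
        }
    }
}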
\*\S : a literal * followed by a non-whitespace character.
.*? : consume characters non-greedily.
(?<!\s)\* : a literal * that is not preceded by whitespace. The negative lookbehind only checks the preceding character; it does not consume it.
(?=\s|$) : positive lookahead. The * must be followed by whitespace or the end of the input.
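To check the pattern against the counterexamples from the question as well as the test string, it can be wrapped in a small helper. This is only a sketch: the class name TaggedStrings and the method findTagged are hypothetical names, and the pattern is the one shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaggedStrings {
    // Same pattern as above, compiled once.
    private static final Pattern TAGGED =
            Pattern.compile("\\*\\S.*?(?<!\\s)\\*(?=\\s|$)");

    // Hypothetical helper: collects every tagged string found in the input.
    static List<String> findTagged(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TAGGED.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The three counterexamples from the question: none should match.
        System.out.println(findTagged("* this one*")); // prints []
        System.out.println(findTagged("*this one *")); // prints []
        System.out.println(findTagged("*this o*ne"));  // prints []
        // The test string from the question.
        System.out.println(findTagged("*foo 1 * 2 bar* foo *b* azz *qu **ux*"));
        // prints [*foo 1 * 2 bar*, *b*, *qu **ux*]
    }
}
```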