Regexp to match simple markdown

Michal :

I'm trying to figure out the regexp to match all the occurrences of *this kind of strings*. Unfortunately, two additional rules made this more complicated than I thought:

  1. a tagged string should start with * followed by a non-whitespace character (so * this one* should not be matched)
  2. a tagged string should end with a non-whitespace character followed by * followed by whitespace (so *this one * and *this o*ne should not be matched)

I started with the simplest regexp, \*\S([^\*]+)?\*, which for my test string:

*foo 1 * 2 bar* foo *b* azz *qu **ux*

matches places in square brackets:

[*foo 1 *] 2 bar* foo [*b*] azz [*qu *][*ux*]

and this is what I'd like to achieve:

[*foo 1 * 2 bar*] foo [*b*] azz [*qu **ux*]

So two problems appear:

  • How can rule 2 be expressed in a regexp, i.e. "search until the first non-whitespace character followed by * followed by whitespace appears"? With a positive lookahead?
  • How can the whitespace from rule 2 be matched without including it in the result, which \*\S([^\*]+)?\*\s would do?
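For reference, the behaviour of the first attempt can be reproduced with a short sketch. Java is assumed here only because the answer in this thread uses it; the class name FirstAttempt and the helper matches are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstAttempt {

    // The first attempt from the question: the body may not contain '*',
    // so every match ends at the first '*' after the opening one.
    static final Pattern FIRST_ATTEMPT = Pattern.compile("\\*\\S([^\\*]+)?\\*");

    // Collects every match of the first-attempt pattern, in order.
    static List<String> matches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = FIRST_ATTEMPT.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String s : matches("*foo 1 * 2 bar* foo *b* azz *qu **ux*")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Because [^\*] can never cross a *, this pattern cannot produce a match such as *foo 1 * 2 bar*, which contains an inner asterisk.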

yavuzkavus :

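import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        // "*" + non-whitespace, a lazy body, then "*" that is not preceded by
        // whitespace and is followed by whitespace or the end of the input.
        Pattern pattern = Pattern.compile("\\*\\S.*?(?<!\\s)\\*(?=\\s|$)");
        Matcher matcher = pattern.matcher("*foo 1 * 2 bar* foo *b* azz *qu **ux*");
        int i = 1;
        while (matcher.find()) {
            System.out.printf("%d: %s%n", i++, matcher.group());
        }
    }
}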

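\*\S : a literal * followed by a non-whitespace character.

.*? : consumes characters non-greedily (as few as possible).

(?<!\s)\* : a literal * that is not preceded by whitespace. This is a negative lookbehind, so it checks the preceding character without consuming it.

(?=\s|$) : positive lookahead. The * must be followed by whitespace or the end of the input; the whitespace itself is not included in the match.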
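To check the pattern against both rules from the question, here is a minimal sketch that runs it on the test string and on the three strings that should be rejected. The class name TaggedStrings and the helper findTagged are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TaggedStrings {

    // Same pattern as in the answer: opening "*" + non-whitespace, lazy body,
    // closing "*" not preceded by whitespace and followed by whitespace or end of input.
    static final Pattern TAGGED = Pattern.compile("\\*\\S.*?(?<!\\s)\\*(?=\\s|$)");

    // Returns every tagged string found in the input, in order of appearance.
    static List<String> findTagged(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = TAGGED.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "*foo 1 * 2 bar* foo *b* azz *qu **ux*", // test string from the question
            "* this one*",                           // breaks rule 1
            "*this one *",                           // breaks rule 2
            "*this o*ne"                             // breaks rule 2
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + findTagged(input));
        }
    }
}
```

The first input yields the three desired matches and the other three yield empty lists. Note that . does not match line terminators by default, so a tagged string will not span several lines unless Pattern.DOTALL is passed to Pattern.compile.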
