Danilo Piazzalunga :
I am trying to use java.util.Scanner to tokenize an arithmetic expression, where the delimiters can be either:
- Whitespace (\s+ or \p{Space}+), which should be discarded
- Punctuation (\p{Punct}), which should be returned as tokens
Example
Given this expression:
12 + (ab-bc*3)
I would like Scanner to return these tokens:
12
+
(
ab
-
bc
*
3
)
Code
So far, I have only been able to:
- Eat up all of the punctuation characters (not what I wanted):
new Scanner("12 + (ab-bc*3)").useDelimiter("\\p{Space}+|\\p{Punct}").tokens().collect(Collectors.toList())
- Result:
"12", "", "", "", "ab", "bc", "3"
- Achieve partial success using a positive lookahead:
new Scanner("12 + (ab-bc*3)").useDelimiter("\\p{Space}+|(?=\\p{Punct})").tokens().collect(Collectors.toList())
- Result:
"12", "+", "(ab", "-bc", "*3", ")"
But now I am stuck.
Wiktor Stribiżew :
A matching approach allows you to use a much simpler regex here:
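import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Matcher.results() requires Java 9 or later
String text = "12 + (ab-bc*3)";
List<String> results = Pattern.compile("\\p{Punct}|\\w+").matcher(text)
    .results()
    .map(MatchResult::group)
    .collect(Collectors.toList());
System.out.println(results);
// => [12, +, (, ab, -, bc, *, 3, )]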
See Java demo.
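If you would rather keep java.util.Scanner in the picture, the same matching idea can probably be expressed with Scanner.findAll (Java 9+), which ignores the delimiter and returns a stream of matches. The following is only a sketch: the class name ScannerFindAllDemo and the tokenize helper are made up for illustration.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.stream.Collectors;

// Hypothetical demo class (not from the original answer); needs Java 9+.
class ScannerFindAllDemo {

    // Returns every punctuation character or run of word characters as a token.
    // Whitespace matches neither alternative, so it is skipped.
    static List<String> tokenize(String text) {
        try (Scanner scanner = new Scanner(text)) {
            // The stream is fully collected before the scanner is closed.
            return scanner.findAll("\\p{Punct}|\\w+")
                    .map(MatchResult::group)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12 + (ab-bc*3)"));
        // prints [12, +, (, ab, -, bc, *, 3, )]
    }
}
```

Since findAll searches for matches rather than splitting on delimiters, no useDelimiter call is needed here.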
The regex matches
- \p{Punct} - punctuation and symbol chars
- | - or
- \w+ - 1+ letters, digits or _ chars.
See the regex demo (converted to PCRE for the demo purpose).
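One detail of this pattern worth checking is that _ belongs to both \p{Punct} and \w. Because the punctuation alternative is tried first, a leading underscore comes out as its own token, while an underscore inside a word stays part of that word. The sketch below illustrates this with a plain find() loop, which also works on Java 8 where Matcher.results() is not available. The class name FindLoopTokenizer and the second input string are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original answer); runs on Java 8+.
class FindLoopTokenizer {

    // At each position the punctuation alternative is tried first,
    // then a run of word characters.
    private static final Pattern TOKEN = Pattern.compile("\\p{Punct}|\\w+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12 + (ab-bc*3)")); // [12, +, (, ab, -, bc, *, 3, )]
        System.out.println(tokenize("_x + a_b"));       // [_, x, +, a_b]
    }
}
```

If identifiers with a leading underscore should stay whole, swapping the alternatives (\w+|\p{Punct}) is one possible adjustment.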