By default, a Scanner splits input tokens along whitespace, but we can also specify our own delimiter pattern in the form of a regular expression.
example 1:
// strings/ScannerDelimiter.java
// (c)2017 MindView LLC: see Copyright.txt
// We make no guarantees that this code is fit for any purpose.
// Visit http://OnJava8.com for more book information.
import java.util.*;

public class ScannerDelimiter {
  public static void main(String[] args) {
    Scanner scanner = new Scanner("12, 42, 78, 99, 42");
    scanner.useDelimiter("\\s*,\\s*");
    while (scanner.hasNextInt()) {
      System.out.println(scanner.nextInt());
    }
  }
}
/* Output:
12
42
78
99
42
*/
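To see why the custom delimiter matters here, the sketch below reads the same input twice, once with the default whitespace delimiter and once with the comma pattern. The class name `DelimiterComparison` and the helper `readInts` are made up for this illustration and are not part of the book's examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterComparison {
  // Collects ints until the next token is not an int.
  // A null delimiter means "keep the default whitespace delimiter".
  static List<Integer> readInts(String input, String delimiter) {
    List<Integer> result = new ArrayList<>();
    try (Scanner scanner = new Scanner(input)) {
      if (delimiter != null) {
        scanner.useDelimiter(delimiter);
      }
      while (scanner.hasNextInt()) {
        result.add(scanner.nextInt());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String input = "12, 42, 78, 99, 42";
    // Default delimiter: the first token is "12," which is not an int,
    // so the loop ends before reading anything.
    System.out.println(readInts(input, null));
    // Custom delimiter: commas and the whitespace around them are skipped.
    System.out.println(readInts(input, "\\s*,\\s*"));
  }
}
```

With the default delimiter the trailing comma stays attached to each token, so `hasNextInt()` fails on the very first one.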
example 2:
This example reads several items in from a string:
String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt());
System.out.println(s.nextInt());
System.out.println(s.next());
System.out.println(s.next());
s.close();
prints the following output:
1
2
red
blue
The same output can be generated with this code, which uses a regular expression to parse all four tokens at once:
String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input);
s.findInLine("(\\d+) fish (\\d+) fish (\\w+) fish (\\w+)");
MatchResult result = s.match();
for (int i = 1; i <= result.groupCount(); i++) {
  System.out.println(result.group(i));
}
s.close();
The default whitespace delimiter used by a scanner is as recognized by Character.isWhitespace. The reset() method will reset the value of the scanner's delimiter to the default whitespace delimiter regardless of whether it was previously changed.
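The effect of reset() can be sketched as follows. The class name `ScannerResetDemo` and the helper `tokens` are hypothetical names chosen for this illustration; only `useDelimiter()`, `reset()`, `hasNext()` and `next()` come from the Scanner API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerResetDemo {
  // Sets a custom delimiter, optionally calls reset() before reading,
  // then collects every remaining token.
  static List<String> tokens(String input, String delimiter, boolean resetFirst) {
    List<String> result = new ArrayList<>();
    try (Scanner scanner = new Scanner(input)) {
      scanner.useDelimiter(delimiter);
      if (resetFirst) {
        // Discards the custom delimiter and goes back to whitespace.
        scanner.reset();
      }
      while (scanner.hasNext()) {
        result.add(scanner.next());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String input = "a b,c d";
    // Comma delimiter: two tokens, "a b" and "c d".
    System.out.println(tokens(input, ",", false));
    // After reset(): split on whitespace again, giving "a", "b,c", "d".
    System.out.println(tokens(input, ",", true));
  }
}
```

According to the Scanner documentation, reset() also discards changes made with useLocale() and useRadix(), so it is a convenient way to return a scanner to a known state.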
example 3:
// strings/ThreatAnalyzer.java
// (c)2017 MindView LLC: see Copyright.txt
// We make no guarantees that this code is fit for any purpose.
// Visit http://OnJava8.com for more book information.
import java.util.*;
import java.util.regex.*;

public class ThreatAnalyzer {
  static String threatData =
    "58.27.82.161@08/10/2015\n"
    + "204.45.234.40@08/11/2015\n"
    + "58.27.82.161@08/11/2015\n"
    + "58.27.82.161@08/12/2015\n"
    + "58.27.82.161@08/12/2015\n"
    + "[Next log section with different data format]";
  public static void main(String[] args) {
    Scanner scanner = new Scanner(threatData);
    String pattern = "(\\d+[.]\\d+[.]\\d+[.]\\d+)@"
      + "(\\d{2}/\\d{2}/\\d{4})";
    while (scanner.hasNext(pattern)) {
      scanner.next(pattern);
      MatchResult match = scanner.match();
      String ip = match.group(1);
      String date = match.group(2);
      System.out.format("Threat on %s from %s%n", date, ip);
    }
  }
}
/* Output:
Threat on 08/10/2015 from 58.27.82.161
Threat on 08/11/2015 from 204.45.234.40
Threat on 08/11/2015 from 58.27.82.161
Threat on 08/12/2015 from 58.27.82.161
Threat on 08/12/2015 from 58.27.82.161
*/
Regular expression
| POSIX | Non-standard | Perl/Tcl | Vim | Java | ASCII | Description |
|---|---|---|---|---|---|---|
| | [:ascii:] | | | \p{ASCII} | [\x00-\x7F] | ASCII characters |
| [:alnum:] | | | | \p{Alnum} | [A-Za-z0-9] | Alphanumeric characters |
| | [:word:] | \w | \w | \w | [A-Za-z0-9_] | Alphanumeric characters plus "_" |
| | | \W | \W | \W | [^A-Za-z0-9_] | Non-word characters |
| [:alpha:] | | | \a | \p{Alpha} | [A-Za-z] | Alphabetic characters |
| [:blank:] | | | \s | \p{Blank} | [ \t] | Space and tab |
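The Java column of the table above can be checked with a few calls to `Pattern.matches`. This is a minimal sketch using the default matching mode (no `UNICODE_CHARACTER_CLASS` flag); the class name `CharClassDemo` and the helper `matches` are made up for the illustration.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
  // True only if the whole text matches the regex.
  static boolean matches(String regex, String text) {
    return Pattern.matches(regex, text);
  }

  public static void main(String[] args) {
    // \p{Alnum} covers letters and digits but not the underscore.
    System.out.println(matches("\\p{Alnum}+", "abc123"));   // true
    System.out.println(matches("\\p{Alnum}+", "abc_123"));  // false
    // \w is \p{Alnum} plus "_".
    System.out.println(matches("\\w+", "abc_123"));         // true
    // \p{Alpha} covers letters only.
    System.out.println(matches("\\p{Alpha}+", "abc"));      // true
    // \p{Blank} is a space or a tab, not a newline.
    System.out.println(matches("\\p{Blank}+", " \t"));      // true
    System.out.println(matches("\\p{Blank}+", "\n"));       // false
    // \p{ASCII} rejects anything above \x7F, such as e-acute.
    System.out.println(matches("\\p{ASCII}+", "caf\u00e9")); // false
  }
}
```

The same classes can be used inside a Scanner delimiter, for example `useDelimiter("\\p{Blank}+")` to split on spaces and tabs while leaving newlines inside tokens.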
references:
1. On Java 8 - Bruce Eckel
2. https://github.com/wangbingfeng/OnJava8-Examples/blob/master/strings/ScannerDelimiter.java
3. https://docs.oracle.com/javase/8/docs/api/java/util/Scanner.html
4. https://github.com/wangbingfeng/OnJava8-Examples/blob/master/strings/ThreatAnalyzer.java