strings - Scanner Delimiters

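Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. Reposts must include a link to the original source and this notice.
Article link: https://blog.csdn.net/wangbingfengf98/article/details/93419835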

By default, a Scanner splits its input into tokens on whitespace, but we can also specify our own delimiter pattern as a regular expression.

example 1:

// strings/ScannerDelimiter.java
// (c)2017 MindView LLC: see Copyright.txt
// We make no guarantees that this code is fit for any purpose.
// Visit http://OnJava8.com for more book information.

import java.util.*;

public class ScannerDelimiter {
  public static void main(String[] args) {
    Scanner scanner = new Scanner("12, 42, 78, 99, 42");
    scanner.useDelimiter("\\s*,\\s*");
    while (scanner.hasNextInt()) {
      System.out.println(scanner.nextInt());
    }
  }
}
/* Output:
12
42
78
99
42
*/
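
To see why the custom delimiter matters here, the sketch below reads the same input twice: once with the "\\s*,\\s*" delimiter and once with the Scanner default. The class DelimiterComparison and the helper readInts are hypothetical names introduced only for this illustration. With the default whitespace delimiter the first token is "12," (comma included), so hasNextInt() fails immediately and nothing is read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterComparison {
  // Hypothetical helper: reads ints until the next token is not an int.
  // A null delimiter means "keep the Scanner default".
  static List<Integer> readInts(String input, String delimiter) {
    List<Integer> values = new ArrayList<>();
    try (Scanner scanner = new Scanner(input)) {
      if (delimiter != null) {
        scanner.useDelimiter(delimiter);
      }
      while (scanner.hasNextInt()) {
        values.add(scanner.nextInt());
      }
    }
    return values;
  }

  public static void main(String[] args) {
    String input = "12, 42, 78, 99, 42";
    // Comma surrounded by optional whitespace: every number is a clean token.
    System.out.println(readInts(input, "\\s*,\\s*")); // [12, 42, 78, 99, 42]
    // Default delimiter: the first token is "12,", which is not an int.
    System.out.println(readInts(input, null)); // []
  }
}
```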

example 2:

This example reads several items in from a string:


     String input = "1 fish 2 fish red fish blue fish";
     Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
     System.out.println(s.nextInt());
     System.out.println(s.nextInt());
     System.out.println(s.next());
     System.out.println(s.next());
     s.close();
 

prints the following output:


     1
     2
     red
     blue
 

The same output can be generated with this code, which uses a regular expression to parse all four tokens at once:


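     // Requires: import java.util.Scanner; import java.util.regex.MatchResult;
     String input = "1 fish 2 fish red fish blue fish";
     Scanner s = new Scanner(input);
     // Match the whole line at once; each parenthesized group captures one token.
     s.findInLine("(\\d+) fish (\\d+) fish (\\w+) fish (\\w+)");
     MatchResult result = s.match();
     for (int i = 1; i <= result.groupCount(); i++) {
         System.out.println(result.group(i));
     }
     s.close();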
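
The findInLine snippet above assumes the pattern is always found. findInLine returns null when there is no match in the current line, and match() throws IllegalStateException when no match result is available, so a more defensive version checks the return value first. This is a minimal sketch; the class FindInLineSketch and the helper groups are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;

class FindInLineSketch {
  // Hypothetical helper: returns the captured groups, or an empty list
  // when the pattern does not occur in the current line.
  static List<String> groups(String input, String regex) {
    List<String> captured = new ArrayList<>();
    try (Scanner s = new Scanner(input)) {
      // Only ask for the MatchResult when findInLine actually found something.
      if (s.findInLine(regex) != null) {
        MatchResult result = s.match();
        for (int i = 1; i <= result.groupCount(); i++) {
          captured.add(result.group(i));
        }
      }
    }
    return captured;
  }

  public static void main(String[] args) {
    String regex = "(\\d+) fish (\\d+) fish (\\w+) fish (\\w+)";
    System.out.println(groups("1 fish 2 fish red fish blue fish", regex)); // [1, 2, red, blue]
    System.out.println(groups("no numbers here", regex)); // []
  }
}
```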
 

By default a scanner treats as delimiters the whitespace characters recognized by Character.isWhitespace. The reset() method restores this default whitespace delimiter regardless of whether it was previously changed; it also discards any locale or radix set with useLocale() or useRadix().
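
The effect of reset() on the delimiter can be sketched as follows. The class ResetDelimiterSketch and the helper firstToken are hypothetical names for this illustration. To keep the behavior easy to follow, reset() is called before any token is read.

```java
import java.util.Scanner;

class ResetDelimiterSketch {
  // Hypothetical helper: installs a comma delimiter, optionally resets the
  // scanner, and returns the first token it then produces.
  static String firstToken(String input, boolean resetBeforeReading) {
    try (Scanner scanner = new Scanner(input)) {
      scanner.useDelimiter(",");
      if (resetBeforeReading) {
        scanner.reset(); // back to the default whitespace delimiter
      }
      return scanner.next();
    }
  }

  public static void main(String[] args) {
    String input = "red green,blue yellow";
    System.out.println(firstToken(input, false)); // red green
    System.out.println(firstToken(input, true));  // red
  }
}
```

With the comma delimiter the first token runs up to the comma; after reset() the scanner splits on whitespace again, so the first token is just the first word.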

example 3:

// strings/ThreatAnalyzer.java
// (c)2017 MindView LLC: see Copyright.txt
// We make no guarantees that this code is fit for any purpose.
// Visit http://OnJava8.com for more book information.

import java.util.*;
import java.util.regex.*;

public class ThreatAnalyzer {
  static String threatData =
      "58.27.82.161@08/10/2015\n"
          + "204.45.234.40@08/11/2015\n"
          + "58.27.82.161@08/11/2015\n"
          + "58.27.82.161@08/12/2015\n"
          + "58.27.82.161@08/12/2015\n"
          + "[Next log section with different data format]";

  public static void main(String[] args) {
    Scanner scanner = new Scanner(threatData);
    String pattern = "(\\d+[.]\\d+[.]\\d+[.]\\d+)@" + "(\\d{2}/\\d{2}/\\d{4})";
    while (scanner.hasNext(pattern)) {
      scanner.next(pattern);
      MatchResult match = scanner.match();
      String ip = match.group(1);
      String date = match.group(2);
      System.out.format("Threat on %s from %s%n", date, ip);
    }
  }
}
/* Output:
Threat on 08/10/2015 from 58.27.82.161
Threat on 08/11/2015 from 204.45.234.40
Threat on 08/11/2015 from 58.27.82.161
Threat on 08/12/2015 from 58.27.82.161
Threat on 08/12/2015 from 58.27.82.161
*/
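
Note that hasNext(pattern) tests the whole next token, which is still split on the current delimiter (whitespace by default). The loop therefore ends at the first token that does not match, even if matching records follow it. The sketch below illustrates this with made-up input; the class PatternTokenSketch and the helper parse are hypothetical names, not part of the book's example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;

class PatternTokenSketch {
  // Hypothetical helper: collects "date ip" strings until a token fails to match.
  static List<String> parse(String data) {
    String pattern = "(\\d+[.]\\d+[.]\\d+[.]\\d+)@(\\d{2}/\\d{2}/\\d{4})";
    List<String> threats = new ArrayList<>();
    try (Scanner scanner = new Scanner(data)) {
      while (scanner.hasNext(pattern)) {
        scanner.next(pattern);
        MatchResult match = scanner.match();
        threats.add(match.group(2) + " " + match.group(1));
      }
    }
    return threats;
  }

  public static void main(String[] args) {
    // Illustrative data only: the second line is deliberately not a record.
    String data = "58.27.82.161@08/10/2015\n"
        + "not-a-record\n"
        + "204.45.234.40@08/11/2015";
    // Parsing stops at "not-a-record", so the third line is never read.
    System.out.println(parse(data)); // [08/10/2015 58.27.82.161]
  }
}
```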

Regular expression

A "-" marks a column with no entry.

POSIX      Non-standard  Perl/Tcl  Vim  Java       ASCII           Description
-          [:ascii:]     -         -    \p{ASCII}  [\x00-\x7F]     ASCII characters
[:alnum:]  -             -         -    \p{Alnum}  [A-Za-z0-9]     Alphanumeric characters
-          [:word:]      \w        \w   \w         [A-Za-z0-9_]    Alphanumeric characters plus "_"
-          -             \W        \W   \W         [^A-Za-z0-9_]   Non-word characters
[:alpha:]  -             -         \a   \p{Alpha}  [A-Za-z]        Alphabetic characters
[:blank:]  -             -         \s   \p{Blank}  [ \t]           Space and tab
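
The Java column of the table above can be checked directly with String.matches. This is a small sketch using default regex flags (no UNICODE_CHARACTER_CLASS); the class name CharacterClassSketch is hypothetical.

```java
class CharacterClassSketch {
  public static void main(String[] args) {
    // \p{Alnum} covers [A-Za-z0-9] only.
    System.out.println("abc123".matches("\\p{Alnum}+"));  // true
    System.out.println("abc_123".matches("\\p{Alnum}+")); // false: "_" is not alphanumeric
    // \w adds the underscore.
    System.out.println("abc_123".matches("\\w+"));        // true
    // \p{Blank} is space or tab.
    System.out.println(" \t".matches("\\p{Blank}+"));     // true
    // \p{ASCII} rejects characters outside \x00-\x7F, such as U+00E9.
    System.out.println("caf\u00e9".matches("\\p{ASCII}+")); // false
  }
}
```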

references:

1. On Java 8 - Bruce Eckel

2. https://github.com/wangbingfeng/OnJava8-Examples/blob/master/strings/ScannerDelimiter.java

3. https://docs.oracle.com/javase/8/docs/api/java/util/Scanner.html

4. https://github.com/wangbingfeng/OnJava8-Examples/blob/master/strings/ThreatAnalyzer.java

5. https://en.wikipedia.org/wiki/Regular_expression
