20190906 On Java 8, Chapter 18: Strings

Chapter 18: Strings

Overloading + vs. StringBuilder

The + and += operators for String are the only overloaded operators in Java; Java does not allow programmers to overload any other operator. The compiler automatically optimizes String concatenation by using the java.lang.StringBuilder class.

StringBuilder was introduced in Java SE5; before that, StringBuffer was used. The latter is thread-safe, so it is more expensive. String operations are faster with StringBuilder.

You can use the javap tool that comes with the JDK to disassemble the class file:

javap -c Concatenation.class
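
As a rough sketch of what this means in practice, the two methods below build the same string, once with += and once with an explicit StringBuilder. The class and method names are made up for this note. Compiling it with a Java 8 javac and running javap -c on the result should show StringBuilder calls inside the loop of implicit(); newer JDKs may compile concatenation differently.

```java
class ConcatenationSketch {
    // Implicit concatenation: a Java 8 javac compiles each += into
    // StringBuilder calls, building a new StringBuilder on every iteration.
    static String implicit(String[] fields) {
        String result = "";
        for (String field : fields) {
            result += field;
        }
        return result;
    }

    // Explicit concatenation: one StringBuilder is reused for the whole loop.
    static String explicit(String[] fields) {
        StringBuilder result = new StringBuilder();
        for (String field : fields) {
            result.append(field);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"abc", "def", "ghi"};
        System.out.println(implicit(fields));
        System.out.println(explicit(fields));
    }
}
```

Both calls print the same text; the difference is only in how many temporary objects are created, which matters mostly inside loops.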

Formatted output

printf()

Java SE5 introduced formatted output in the style of the C language printf() function.

System.out.format()

The format() method mimics C's printf(). format() and printf() are equivalent.

Formatter class

In Java, all formatting functionality is handled by the java.util.Formatter class. You can think of Formatter as a translator that converts your format string and data into the desired result.

The Formatter constructor is overloaded to support a range of output destinations, but the most commonly used are PrintStream, OutputStream, and File.
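
A small sketch of the three entry points mentioned so far: printf(), format(), and a Formatter object. The class name and the sample values are invented for illustration; the StringBuilder destination is just one of the overloaded constructors, chosen here so the result can be returned as a String.

```java
import java.util.Formatter;

class FormatterSketch {
    // Formats into a StringBuilder by handing it to the Formatter constructor.
    static String describe(String name, int x, int y) {
        StringBuilder sb = new StringBuilder();
        try (Formatter f = new Formatter(sb)) {
            f.format("%s is at (%d,%d)", name, x, y);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // printf() and format() on a PrintStream produce the same output.
        System.out.printf("Row: [%d %s]%n", 5, "five");
        System.out.format("Row: [%d %s]%n", 5, "five");

        // A Formatter that writes to System.out (a PrintStream).
        // It is flushed, not closed, so System.out stays usable.
        Formatter out = new Formatter(System.out);
        out.format("%s is at (%d,%d)%n", "Tommy", 1, 2);
        out.flush();

        System.out.println(describe("Terry", 3, 4));
    }
}
```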

Format specifiers

Syntax:

%[argument_index$][flags][width][.precision]conversion

The most common use is to control the minimum size of a field, which is done by specifying width.

The counterpart of width is precision, which specifies a maximum. width applies to every data conversion type and behaves the same way for each. precision does not: not every type accepts it, and its meaning depends on the type being converted. Applied to a String, precision is the maximum number of characters to print. Applied to a floating-point number, it is the number of decimal places to display (the default is 6), rounding if there are too many digits and padding with trailing zeros if there are too few. Since integers have no fractional part, precision does not apply to them; specifying a precision for an integer conversion throws an exception.
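
The following sketch tries width and precision on a String, a floating-point number, and an integer. The helper names fmt and precisionRejected are hypothetical, and Locale.ROOT is passed only so the decimal separator is a dot on every machine.

```java
import java.util.IllegalFormatPrecisionException;
import java.util.Locale;

class WidthPrecisionSketch {
    // Locale.ROOT keeps the decimal separator a '.' regardless of the default locale.
    static String fmt(String pattern, Object... args) {
        return String.format(Locale.ROOT, pattern, args);
    }

    // Returns true if formatting fails because a precision is not allowed here.
    static boolean precisionRejected(String pattern, Object arg) {
        try {
            fmt(pattern, arg);
            return false;
        } catch (IllegalFormatPrecisionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fmt("[%10s]", "abc"));      // width 10, padded on the left
        System.out.println(fmt("[%-10s]", "abc"));     // the '-' flag pads on the right
        System.out.println(fmt("[%.2s]", "abcdef"));   // precision: at most 2 characters
        System.out.println(fmt("[%f]", 3.14159));      // default of 6 decimal places
        System.out.println(fmt("[%10.3f]", 3.14159));  // width 10, 3 decimal places
        System.out.println(precisionRejected("%.2d", 42)); // precision on an integer
    }
}
```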

Formatter Conversion

Type | Meaning
--- | ---
d | Integer (decimal)
c | Unicode character
b | Boolean value
s | String
f | Floating point (decimal)
e | Floating point (scientific notation)
x | Integer (hex)
h | Hash code (hex)
% | Literal "%"
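
To see the conversions from the table side by side, here is a minimal sketch. The class name and the sample values are made up; the last call shows that a conversion applied to an argument of the wrong type throws an exception rather than guessing.

```java
import java.util.IllegalFormatConversionException;
import java.util.Locale;

class ConversionSketch {
    // Formats a single argument; Locale.ROOT keeps '.' as the decimal separator.
    static String fmt(String pattern, Object arg) {
        return String.format(Locale.ROOT, pattern, arg);
    }

    public static void main(String[] args) {
        System.out.println(fmt("d: %d", 255));
        System.out.println(fmt("c: %c", 'a'));
        System.out.println(fmt("b: %b", true));
        System.out.println(fmt("s: %s", "text"));
        System.out.println(fmt("f: %f", 179.543));
        System.out.println(fmt("e: %e", 179.543));
        System.out.println(fmt("x: %x", 255));
        System.out.println(fmt("h: %h", "text"));
        System.out.println(String.format("literal: 100%%"));
        try {
            fmt("%d", "text"); // d does not accept a String
        } catch (IllegalFormatConversionException e) {
            System.out.println("d cannot format a String");
        }
    }
}
```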

String.format()

String.format() is a static method that takes the same arguments as Formatter.format() but returns a String. It is handy when you only need to call format() once.

String string = String.format("(t%d, q%d) %s", transactionID, queryID, message);

Regular Expressions

Regular expressions are a powerful and flexible text-processing tool.

Basics

In other languages, \\ means "I want to insert a plain (literal) backslash in the regular expression; don't give it any special meaning." In Java, \\ means "I am inserting a regular expression backslash, so the following character has special meaning." For example, to indicate a single digit, the regular expression string is \\d. To insert a literal backslash, write \\\\. Newlines, tabs, and the like only need a single backslash: \n \t.

In a regular expression, parentheses group an expression and the vertical bar | means OR.

\\W means a non-word character (the lowercase version, \\w, means a word character).

The simplest way to apply regular expressions is to use the functionality built into the String class:

boolean matches = "-1234".matches("-?\\d+");
String[] split = knights.split(regex);
String replaceFirst = s.replaceFirst("f\\w+", "located");
String replaceAll = s.replaceAll("shrubbery|tree|herring", "banana");
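
The fragments above assume variables such as knights, s, and regex that are defined elsewhere. Here is a self-contained sketch of the same String methods; the class name and the sample sentence in TEXT are assumptions made for this note.

```java
import java.util.Arrays;

class StringRegexSketch {
    static final String TEXT = "You found the shrubbery near a tree";

    public static void main(String[] args) {
        // An optional minus sign followed by one or more digits.
        System.out.println("-1234".matches("-?\\d+"));
        System.out.println("+911".matches("-?\\d+"));
        // Parentheses group, | means OR, and a literal + is written \\+.
        System.out.println("+911".matches("(-|\\+)?\\d+"));
        // A literal backslash in the input needs \\\\ in the Java source.
        System.out.println("a\\b".matches("a\\\\b"));
        // Split on runs of non-word characters.
        System.out.println(Arrays.toString(TEXT.split("\\W+")));
        // Replace the first word that starts with f.
        System.out.println(TEXT.replaceFirst("f\\w+", "located"));
        // Replace every occurrence of any of the three words.
        System.out.println(TEXT.replaceAll("shrubbery|tree|herring", "banana"));
    }
}
```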

Creating regular expressions

For the complete list of regular expression constructs, see the Pattern class in the java.util.regex package in the JDK documentation.

Expression | Meaning
--- | ---
B | The specific character B
\xhh | Character with hex value 0xhh
\uhhhh | The Unicode character with hex representation 0xhhhh
\t | Tab
\n | Newline
\r | Carriage return
\f | Form feed
\e | Escape

The following are some typical ways to create character classes, along with some of the predefined classes:

Expression | Meaning
--- | ---
. | Any character
[abc] | Any of the characters a, b, or c
[^abc] | Any character except a, b, and c (negation)
[a-zA-Z] | Any character from a through z or from A through Z (range)
[abc[hij]] | Any of the characters a, b, c, h, i, j (union)
[a-z&&[hij]] | Any of h, i, or j (intersection)
\s | A whitespace character (space, tab, newline, form feed, carriage return)
\S | A non-whitespace character ([^\s])
\d | A digit ([0-9])
\D | A non-digit ([^0-9])
\w | A word character ([a-zA-Z_0-9])
\W | A non-word character ([^\w])
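
A quick way to check the character classes in the table is to test single characters against them. This is only a sketch; the helper name ok is invented here, and String.matches() requires the whole input to match.

```java
class CharacterClassSketch {
    // True if the whole input matches the regular expression.
    static boolean ok(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(ok("[abc]", "b"));          // one of a, b, c
        System.out.println(ok("[^abc]", "d"));         // anything but a, b, c
        System.out.println(ok("[a-zA-Z]", "Q"));       // range
        System.out.println(ok("[abc[hij]]", "h"));     // union
        System.out.println(ok("[a-z&&[hij]]", "h"));   // intersection
        System.out.println(ok("[a-z&&[hij]]", "a"));   // in a-z, but not in hij
        System.out.println(ok("\\s", "\t"));           // whitespace
        System.out.println(ok("\\d", "7"));            // digit
        System.out.println(ok("\\w", "_"));            // underscore is a word character
        System.out.println(ok("\\W", "!"));            // non-word character
        System.out.println(ok(".", "\n"));             // by default . skips line terminators
    }
}
```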

Logical operator | Meaning
--- | ---
XY | X followed by Y
X\|Y | X or Y
(X) | A capturing group. You can refer to the i-th captured group later in the expression with \i

Boundary matcher | Meaning
--- | ---
^ | Beginning of a line
$ | End of a line
\b | Word boundary
\B | Non-word boundary
\G | End of the previous match
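
A short sketch of the logical operators and a few of the boundary matchers. The class name, the helper found(), and the sample strings are assumptions for illustration; found() uses Matcher.find(), so the pattern may match anywhere in the input.

```java
import java.util.regex.Pattern;

class OperatorBoundarySketch {
    // True if the regular expression matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("cat|dog", "hot dog"));              // X or Y
        // \1 refers back to whatever the first group captured.
        System.out.println("hello hello".matches("(\\w+) \\1"));
        System.out.println("hello world".matches("(\\w+) \\1"));
        System.out.println(found("^a", "ba"));                        // a is not at the start
        System.out.println(found("a$", "ba"));                        // a is at the end
        // \b keeps the match from firing inside a longer word.
        System.out.println("concat cat".replaceAll("\\bcat\\b", "dog"));
        System.out.println(found("\\Bcat", "concat"));                // cat inside a word
        System.out.println(found("\\Bcat", "cat"));                   // cat at a word boundary
    }
}
```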

The goal is not to write the most obscure regular expression, but the simplest one that gets the job done. Once you start using regular expressions for real, you will find that you often refer back to code you have already written when writing a new expression.

Quantifiers

A quantifier describes the way that a pattern absorbs input text:

  • Greedy: Quantifiers are greedy unless otherwise altered. A greedy expression finds as many possible matches for the pattern as it can. A typical cause of problems is to assume that your pattern will match only the first possible group of characters, when it is actually greedy and keeps going until it has matched the largest possible string.
  • Reluctant: Specified with a question mark, this quantifier matches the minimum number of characters necessary to satisfy the pattern. It is also called lazy, minimal matching, non-greedy, or ungreedy.
  • Possessive: The book describes this type as currently available only in Java (not in other languages). It is more advanced, so you probably won't use it right away. As a regular expression is applied to a String, it generates many states so that it can backtrack if the match fails. Possessive quantifiers do not keep those intermediate states, and so they prevent backtracking. They can be used to keep a regular expression from running away and to make it execute more efficiently.

Greedy | Reluctant | Possessive | Matches
--- | --- | --- | ---
X? | X?? | X?+ | X, one or none
X* | X*? | X*+ | X, zero or more
X+ | X+? | X++ | X, one or more
X{n} | X{n}? | X{n}+ | X, exactly n times
X{n,} | X{n,}? | X{n,}+ | X, at least n times
X{n,m} | X{n,m}? | X{n,m}+ | X, at least n but not more than m times
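
The difference between the three kinds of quantifier is easiest to see on one input. In this sketch (class name, helper first(), and the input are all made up), the greedy form takes as much as it can, the reluctant form takes as little as it can, and the possessive form refuses to give back the final > it has already consumed, so it finds nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {
    // Returns the first match of regex in input, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        System.out.println(first("<.+>", input));   // greedy: <a><b>
        System.out.println(first("<.+?>", input));  // reluctant: <a>
        System.out.println(first("<.++>", input));  // possessive: null, no backtracking

        // Without parentheses the quantifier applies only to the last character.
        System.out.println("abcabc".matches("abc+"));
        System.out.println("abcabc".matches("(abc)+"));
    }
}
```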

Keep in mind that the expression X will often need to be surrounded by parentheses for it to work the way you want.

Regular expressions can be confusing, because they are a new language on top of Java.

CharSequence

The CharSequence interface establishes a generalized definition of a character sequence, abstracted from the CharBuffer, String, StringBuffer, and StringBuilder classes. Many regular expression operations take CharSequence arguments.

Pattern and Matcher

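Import the java.util.regex package, then compile your regular expression with the static Pattern.compile() method. It produces a Pattern object from your String regular expression. Next, pass the string you want to search to the matcher() method of the Pattern object. The matcher() method produces a Matcher object, which offers a range of operations.

The Pattern class also provides a static method: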

static boolean matches(String regex, CharSequence input)

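A compiled Pattern object also provides a split() method, which splits the input string around matches of the regex and returns the resulting substrings as a String array.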
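
A minimal sketch of the calls described so far: Pattern.compile(), matcher(), the static Pattern.matches(), and split(). The class name, the DIGITS constant, and the inputs are assumptions for this note. Because matcher() takes a CharSequence, a StringBuilder works as well as a String.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternSketch {
    // Compile once and reuse; a Pattern can create many Matcher objects.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in any CharSequence, or null if there is none.
    static String firstNumber(CharSequence input) {
        Matcher m = DIGITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 66"));
        System.out.println(firstNumber(new StringBuilder("room 101")));

        // The static shortcut compiles the regex and matches the whole input.
        System.out.println(Pattern.matches("-?\\d+", "-1234"));

        // split() breaks the input around matches of the pattern.
        Pattern separator = Pattern.compile("!!");
        System.out.println(Arrays.toString(separator.split("a!!b!!c")));
        // The second argument limits the number of pieces.
        System.out.println(Arrays.toString(separator.split("a!!b!!c", 2)));
    }
}
```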

By calling Pattern.matcher() with a string argument, we obtain a Matcher object. Using the methods on Matcher, we can determine whether various kinds of matches succeed:

boolean matches() 
boolean lookingAt() 
boolean find() 
boolean find(int start)

find()

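Matcher.find() can be used to find multiple matches in a CharSequence.

find() moves forward through the input string like an iterator. The overloaded find() takes an integer argument that gives the character position at which the search starts; each call resets the search position to the value of that argument.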
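
Both versions of find() can be sketched as follows. The class and method names are hypothetical, and the input string is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindSketch {
    // Collects every match by calling find() repeatedly, like an iterator.
    static List<String> allMatches(String regex, CharSequence input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Returns the first match found at or after index start, or null.
    static String matchFrom(String regex, CharSequence input, int start) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(start) ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "one two three";
        System.out.println(allMatches("\\w+", input));   // [one, two, three]
        System.out.println(matchFrom("\\w+", input, 4)); // two
        System.out.println(matchFrom("\\w+", input, 5)); // wo
    }
}
```

Note that find(int) starts in the middle of a word if the index points there, which is why the last call yields only part of "two".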

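Groups

Groups are regular expressions set off by parentheses that can be referred to by their group number. Group 0 indicates the whole expression, group 1 is the first parenthesized group, and so on.

The Matcher object provides methods that return information about groups:

  • public int groupCount() returns the number of groups in this matcher's pattern. Group 0 is not included.
  • public String group() returns group 0 (the entire match) from the previous match operation (for example, find()).
  • public String group(int i) returns the given group from the previous match operation. If the match succeeded but the specified group did not match any part of the input string, null is returned.
  • public int start(int group) returns the start index of the group found in the previous match operation.
  • public int end(int group) returns the index of the last character, plus one, of the group found in the previous match operation.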
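
A small sketch of the group methods listed above. The class name, the helper firstMatch(), and the mail-like input are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSketch {
    // Returns a Matcher positioned on the first match, or throws if there is none.
    static Matcher firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no match for " + regex);
        }
        return m;
    }

    public static void main(String[] args) {
        Matcher m = firstMatch("(\\w+)@(\\w+)\\.com", "mail bob@example.com now");
        System.out.println(m.groupCount());  // 2, group 0 is not counted
        System.out.println(m.group());       // bob@example.com
        System.out.println(m.group(1));      // bob
        System.out.println(m.group(2));      // example
        System.out.println(m.start(1) + " to " + m.end(1)); // 5 to 8

        // The match succeeds through the second alternative,
        // so group 1 took no part in it and is null.
        Matcher alt = firstMatch("(a)|(b)", "b");
        System.out.println(alt.group(1));
        System.out.println(alt.group(2));
    }
}
```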

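start() and end()

After a successful match, start() returns the start index of the previous match and end() returns the index of the last matched character plus one.

Note that find() locates the regular expression anywhere in the input, whereas lookingAt() and matches() succeed only if the regular expression starts matching at the very beginning of the input. matches() succeeds only if the entire input matches the regular expression, while lookingAt() succeeds as long as the first part of the input matches.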
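
The three kinds of match can be compared on a single input. In this sketch the class name, the INPUT sentence, and the examine() helper are invented; examine() reports which operations succeed together with the start() and end() of each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchKindsSketch {
    static final String INPUT = "Java now has regular expressions";

    // Lists which of find(), lookingAt() and matches() succeed for the regex,
    // with the start and end index of each successful match.
    static String examine(String regex) {
        Matcher m = Pattern.compile(regex).matcher(INPUT);
        List<String> hits = new ArrayList<>();
        if (m.find()) {
            hits.add("find " + m.start() + "-" + m.end());
        }
        if (m.lookingAt()) {
            hits.add("lookingAt " + m.start() + "-" + m.end());
        }
        if (m.matches()) {
            hits.add("matches " + m.start() + "-" + m.end());
        }
        return String.join(", ", hits);
    }

    public static void main(String[] args) {
        System.out.println(examine("Java"));     // at the start, but not the whole input
        System.out.println(examine("regular"));  // only somewhere in the middle
        System.out.println(examine("Java.*"));   // the whole input
    }
}
```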

Pattern flags

The compile() method of the Pattern class has another version that accepts a flag argument to adjust matching behavior:

Pattern Pattern.compile(String regex, int flag)

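where flag is one of the following constants of the Pattern class:

Compile flag | Effect
--- | ---
Pattern.CANON_EQ | Two characters are considered to match if and only if their full canonical decompositions match. For example, with this flag the expression a\u030A matches the string \u00E5. By default, matching does not take canonical equivalence into account
Pattern.CASE_INSENSITIVE (?i) | By default, case-insensitive matching assumes that only characters in the US-ASCII character set are being matched. This flag lets the pattern match without regard to case (upper or lower). Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag together with this flag
Pattern.COMMENTS (?x) | In this mode, whitespace is ignored, and comments starting with # are ignored until the end of the line. Comments mode can also be enabled via the embedded flag expression
Pattern.DOTALL (?s) | In dotall mode, the expression . matches any character, including a line terminator. By default, . does not match line terminators
Pattern.MULTILINE (?m) | In multiline mode, the expressions ^ and $ match the beginning and end of a line, respectively. ^ also matches the beginning of the input string, and $ also matches the end of the input string. By default, these expressions match only at the beginning and end of the entire input string
Pattern.UNICODE_CASE (?u) | When this flag is specified and CASE_INSENSITIVE is also enabled, case-insensitive matching is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII character set are being matched
Pattern.UNIX_LINES (?d) | In this mode, only the \n line terminator is recognized in the behavior of ., ^, and $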

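Among these flags, Pattern.CASE_INSENSITIVE, Pattern.MULTILINE, and Pattern.COMMENTS (helpful for clarity and documentation) are particularly useful. Note that you can use most of these flags directly in the regular expression by inserting the parenthesized characters shown in the table above at the point where you want the mode to take effect. You can also combine multiple flags with the OR (|) operator.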

Pattern p = Pattern.compile("^java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
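
To see what the flags change, the sketch below counts matches of ^java in a four-line input with and without them. The class name, the INPUT text, and the count() helper are assumptions for this note; passing 0 as the flag argument means no flags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagSketch {
    static final String INPUT = "java one\nJava two\nJAVA three\nnot java";

    // Counts how many times the regex, compiled with the given flags, is found in INPUT.
    static int count(String regex, int flags) {
        Matcher m = Pattern.compile(regex, flags).matcher(INPUT);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Flags combined with |: three lines start with java in some casing.
        System.out.println(count("^java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE));
        // The same two flags embedded in the expression itself.
        System.out.println(count("(?im)^java", 0));
        // Without MULTILINE, ^ matches only at the start of the whole input.
        System.out.println(count("^java", Pattern.CASE_INSENSITIVE));
        // DOTALL lets . match the line terminator between the first two lines.
        System.out.println(count("one.Java", Pattern.DOTALL));
        System.out.println(count("one.Java", 0));
        // COMMENTS ignores the whitespace and the # comment inside the pattern.
        System.out.println(count("^java   # keyword at line start", Pattern.COMMENTS));
    }
}
```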

reset()

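The reset() method applies an existing Matcher object to a new character sequence. Calling reset() without arguments sets the Matcher back to the beginning of the current sequence.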
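
A minimal sketch of reusing one Matcher for two inputs. The class name, the helper methods, and the sample sentences are made up for this note.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetSketch {
    // Collects all remaining matches from the matcher's current position.
    static List<String> drain(Matcher m) {
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Scans the first input, then points the same Matcher at the second input.
    static List<String> matchesAfterReset(String regex, String first, String second) {
        Matcher m = Pattern.compile(regex).matcher(first);
        drain(m);         // consume the matches in the first input
        m.reset(second);  // reuse the Matcher on a new character sequence
        return drain(m);
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("[frb][aiu][gx]").matcher("fix the rug with bags");
        System.out.println(drain(m));
        m.reset("fix the rig with rags");
        System.out.println(drain(m));
        m.reset(); // back to the start of the current sequence
        System.out.println(drain(m));
    }
}
```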

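Scanning input

The Scanner constructor can take just about any kind of input object, including a File object, an InputStream, a String, or a Readable. Readable is an interface introduced in Java SE5 to describe "something that has a read() method."

Every primitive type (except char) has a corresponding next method, as do BigDecimal and BigInteger. All of the next methods return only after a complete token has been found. Scanner also has corresponding hasNext methods that return true if the next input token is of the required type.

By default, a Scanner splits input into tokens on whitespace, but you can specify your own delimiter as a regular expression. Use useDelimiter() to set the delimiter; there is also a delimiter() method that returns the Pattern currently used as the delimiter.

In addition to scanning for primitive types, you can scan with your own regular expressions. When next() is used with a specific pattern, that pattern is matched against the next input token, and the result is available through the match() method. One caveat when scanning with regular expressions: the pattern is matched against the next input token only, so if your pattern contains a delimiter it will never match.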
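
The points above can be sketched in one small class: a custom delimiter, the delimiter() accessor, scanning with a regular expression, and the caveat about delimiters inside the pattern. The class name, the method names sum() and ipOf(), and the sample inputs are assumptions made for illustration.

```java
import java.util.Scanner;
import java.util.regex.MatchResult;

class ScannerSketch {
    // Sums integers separated by commas, with optional whitespace around them.
    static int sum(String input) {
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter("\\s*,\\s*");
            int total = 0;
            while (sc.hasNextInt()) {
                total += sc.nextInt();
            }
            return total;
        }
    }

    // Matches the next token against an "address@date" pattern and returns
    // the address part, or null if the next token does not match.
    static String ipOf(String input) {
        String pattern = "(\\d+[.]\\d+[.]\\d+[.]\\d+)@(\\d{2}/\\d{2}/\\d{4})";
        try (Scanner sc = new Scanner(input)) {
            if (!sc.hasNext(pattern)) {
                return null;
            }
            sc.next(pattern);
            MatchResult match = sc.match();
            return match.group(1);
        }
    }

    public static void main(String[] args) {
        System.out.println(sum("12, 42, 78, 99"));
        System.out.println(ipOf("58.27.82.161@02/10/2005"));
        try (Scanner sc = new Scanner("12 34")) {
            // The current delimiter is available as a Pattern.
            System.out.println(sc.delimiter());
            // The pattern contains the delimiter (a space), so no single token can match it.
            System.out.println(sc.hasNext("\\d+ \\d+"));
        }
    }
}
```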

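The StringTokenizer class

Before regular expressions (J2SE 1.4) and the Scanner class (Java SE5) were introduced, the usual way to split a string into parts was to tokenize it with StringTokenizer. Now, with regular expressions and Scanner, the same job can be done more simply and concisely.

With regular expressions or a Scanner object, we can also split a string using more complex patterns, which is difficult with StringTokenizer. It seems safe to say that StringTokenizer is obsolete.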
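
For comparison, here is a sketch that splits the same sentence three ways. The class and method names are invented for this note; for a simple whitespace split all three approaches give the same tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.StringTokenizer;

class TokenizerSketch {
    // The older approach: StringTokenizer splits on whitespace by default.
    static List<String> withTokenizer(String input) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // A regular expression (here a single space) with String.split().
    static List<String> withSplit(String input) {
        return Arrays.asList(input.split(" "));
    }

    // A Scanner, which also tokenizes on whitespace by default.
    static List<String> withScanner(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "But I'm not dead yet! I feel happy!";
        System.out.println(withTokenizer(input));
        System.out.println(withSplit(input));
        System.out.println(withScanner(input));
    }
}
```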

Origin www.cnblogs.com/huangwenjie/p/11478638.html