String utilities

Joiner

Joining a sequence of strings with a separator is a routine operation, but it gets awkward when the sequence contains nulls. Joiner makes this simple.

Joiner joiner = Joiner.on("; ").skipNulls();
return joiner.join("Harry", null, "Ron", "Hermione");

This returns the string "Harry; Ron; Hermione". Alternatively, instead of using skipNulls, you can specify a string to use in place of null with useForNull(String).
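A minimal sketch of the three null-handling behaviors, assuming Guava is on the classpath. The class `JoinerNulls`, its method names, and the `"(none)"` placeholder are illustrative only; the last case relies on a plain Joiner (neither skipNulls nor useForNull configured) rejecting null elements with a NullPointerException.

```java
import com.google.common.base.Joiner;

// Illustrative demo class (hypothetical name); requires Guava on the classpath.
class JoinerNulls {

  // skipNulls(): null elements are dropped entirely.
  static String skipping() {
    return Joiner.on("; ").skipNulls().join("Harry", null, "Ron", "Hermione");
  }

  // useForNull(String): each null is replaced by the given placeholder.
  static String substituting() {
    return Joiner.on("; ").useForNull("(none)").join("Harry", null, "Ron", "Hermione");
  }

  // Neither option configured: a null element is rejected.
  static boolean defaultRejectsNull() {
    try {
      Joiner.on("; ").join("Harry", null, "Ron");
      return false;
    } catch (NullPointerException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(skipping());           // Harry; Ron; Hermione
    System.out.println(substituting());       // Harry; (none); Ron; Hermione
    System.out.println(defaultRejectsNull()); // true
  }
}
```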

You can also use Joiner on arbitrary objects: each element is converted with its toString() method and then joined.

Joiner.on(",").join(Arrays.asList(1, 5, 7)); // returns "1,5,7"

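Warning: joiner instances are always immutable. The joiner configuration methods always return a new Joiner, which you must use to get the desired semantics. This makes any Joiner thread safe and usable as a static final constant.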
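To illustrate why the return value of a configuration method must be kept, the sketch below (hypothetical class name, Guava assumed on the classpath) stores a configured joiner as a static final constant and, separately, shows that calling skipNulls() while discarding its result leaves the original joiner still rejecting nulls.

```java
import com.google.common.base.Joiner;

// Illustrative demo class (hypothetical name); requires Guava on the classpath.
class JoinerImmutability {

  // Safe to share across threads: a configured Joiner never changes.
  static final Joiner COMMA_SKIPPING_NULLS = Joiner.on(", ").skipNulls();

  // Ignoring the result of skipNulls() has no effect on the original joiner.
  static boolean discardedConfigurationHasNoEffect() {
    Joiner joiner = Joiner.on(", ");
    joiner.skipNulls(); // returns a NEW Joiner, which is thrown away here
    try {
      joiner.join("a", null, "b");
      return false;
    } catch (NullPointerException expected) {
      return true; // the original joiner still rejects null
    }
  }

  public static void main(String[] args) {
    System.out.println(COMMA_SKIPPING_NULLS.join("a", null, "b")); // a, b
    System.out.println(discardedConfigurationHasNoEffect());       // true
  }
}
```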

Splitter

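The built-in Java utilities for splitting strings have some quirky behaviors. For example, String.split silently discards trailing separators, and StringTokenizer respects exactly five whitespace characters and nothing else.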

Quiz: What does ",a,,b,".split(",") return?

"", "a", "", "b", ""
null, "a", null, "b", null
"a", null, "b"
"a", "b"
None of the above

Answer: the fifth option, "None of the above". The call returns "", "a", "", "b". Only trailing empty strings are discarded.
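The quiz answer can be checked with plain JDK code, no Guava required. This sketch (illustrative class name) also shows the standard JDK workaround: passing a negative limit to String.split(String, int) keeps the trailing empty strings.

```java
import java.util.Arrays;

// Pure JDK; the class name is illustrative.
class SplitQuiz {

  // Default split: leading and interior empty strings are kept, trailing ones removed.
  static String[] defaultSplit() {
    return ",a,,b,".split(",");
  }

  // A negative limit tells String.split to keep trailing empty strings too.
  static String[] keepTrailing() {
    return ",a,,b,".split(",", -1);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(defaultSplit())); // [, a, , b]
    System.out.println(Arrays.toString(keepTrailing())); // [, a, , b, ]
  }
}
```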

Splitter gives you complete control over all this confusing behavior using a reassuringly straightforward fluent pattern:

Splitter.on(',')
    .trimResults()
    .omitEmptyStrings()
    .split("foo,bar,, qux");

returns an Iterable<String> containing "foo", "bar", "qux". A Splitter may be set to split on any Pattern, char, String, or CharMatcher.
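A sketch of the example above as a reusable constant, assuming Guava is on the classpath and that the release in use provides splitToList. It contrasts the Iterable<String> returned by split with the List<String> returned by splitToList; the class and method names are hypothetical.

```java
import com.google.common.base.Splitter;
import java.util.List;

// Illustrative demo class (hypothetical name); requires Guava on the classpath.
class SplitterBasics {

  // Trim each piece, then drop pieces that end up empty.
  static final Splitter COMMA = Splitter.on(',').trimResults().omitEmptyStrings();

  // split(...) returns an Iterable<String>.
  static Iterable<String> asIterable(String input) {
    return COMMA.split(input);
  }

  // splitToList(...) returns a List<String>.
  static List<String> asList(String input) {
    return COMMA.splitToList(input);
  }

  public static void main(String[] args) {
    for (String piece : asIterable("foo,bar,,   qux")) {
      System.out.println(piece); // foo, then bar, then qux (one per line)
    }
    System.out.println(asList("foo,bar,,   qux")); // [foo, bar, qux]
  }
}
```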

Base Factories

Method	Description	Example
`Splitter.on(char)`	Split on occurrences of a specific, individual character.	`Splitter.on(';')`
`Splitter.on(CharMatcher)`	Split on occurrences of any character in some category.	`Splitter.on(CharMatcher.breakingWhitespace())` `Splitter.on(CharMatcher.anyOf(";,."))`
`Splitter.on(String)`	Split on a literal `String`.	`Splitter.on(", ")`
`Splitter.on(Pattern)` `Splitter.onPattern(String)`	Split on a regular expression.	`Splitter.onPattern("\r?\n")`
`Splitter.fixedLength(int)`	Splits strings into substrings of the specified fixed length. The last piece can be smaller than `length`, but will never be empty.	`Splitter.fixedLength(3)`
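Several of the factories from the table can be exercised side by side. In this sketch (Guava assumed on the classpath; class and method names hypothetical) each method builds a Splitter with a different factory and collects the pieces with splitToList.

```java
import com.google.common.base.CharMatcher;
import com.google.common.base.Splitter;
import java.util.List;

// Illustrative demo class (hypothetical name); requires Guava on the classpath.
class SplitterFactories {

  // Pieces of 3 chars; the last piece may be shorter but is never empty.
  static List<String> fixed(String input) {
    return Splitter.fixedLength(3).splitToList(input);
  }

  // Regular-expression separator: accepts both "\n" and "\r\n" line endings.
  static List<String> lines(String input) {
    return Splitter.onPattern("\r?\n").splitToList(input);
  }

  // Any one of ';', ',' or '.' acts as a separator.
  static List<String> anyPunctuation(String input) {
    return Splitter.on(CharMatcher.anyOf(";,.")).splitToList(input);
  }

  public static void main(String[] args) {
    System.out.println(fixed("abcdefgh"));              // [abc, def, gh]
    System.out.println(lines("one\r\ntwo\nthree"));     // [one, two, three]
    System.out.println(anyPunctuation("a;b,c.d"));      // [a, b, c, d]
  }
}
```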

Modifiers


Method	Description	Example
`omitEmptyStrings()`	Automatically omits empty strings from the result.	`Splitter.on(',').omitEmptyStrings().split("a,,c,d")` returns `"a", "c", "d"`
`trimResults()`	Trims whitespace from the results; equivalent to `trimResults(CharMatcher.whitespace())`.	`Splitter.on(',').trimResults().split("a, b, c, d")` returns `"a", "b", "c", "d"`
`trimResults(CharMatcher)`	Trims characters matching the specified `CharMatcher` from results.	`Splitter.on(',').trimResults(CharMatcher.is('_')).split("_a ,_b_ ,c__")` returns `"a ", "b_ ", "c"`.
`limit(int)`	Stops splitting after the specified number of strings have been returned.	`Splitter.on(',').limit(3).split("a,b,c,d")` returns `"a", "b", "c,d"`
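Modifiers can be combined, and their interaction with limit is easy to get wrong. The inputs and expected results below follow the examples given in Guava's Javadoc for limit(int): empty strings removed by omitEmptyStrings() do not count toward the limit, and trimResults() also trims the final, unsplit piece. Guava is assumed on the classpath; the class name is illustrative.

```java
import com.google.common.base.Splitter;
import java.util.List;

// Illustrative demo class (hypothetical name); requires Guava on the classpath.
class SplitterLimit {

  // Omitted empty strings are not counted toward the limit of 3.
  static List<String> limitWithOmit() {
    return Splitter.on(',').limit(3).omitEmptyStrings().splitToList("a,,,b,,,c,d");
  }

  // Trimming applies to every piece, including the unsplit remainder.
  static List<String> limitWithTrim() {
    return Splitter.on(',').limit(3).trimResults().splitToList(" a , b , c , d ");
  }

  public static void main(String[] args) {
    System.out.println(limitWithOmit()); // [a, b, c,d]
    System.out.println(limitWithTrim()); // [a, b, c , d]
  }
}
```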

If you want a List rather than an Iterable, use splitToList() instead of split().

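Warning: splitter instances are always immutable. The splitter configuration methods always return a new Splitter, which you must use to get the desired semantics. This makes any Splitter thread safe and usable as a static final constant.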

Map Splitters

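You can also use a splitter to deserialize a map by specifying a second delimiter with withKeyValueSeparator(). The resulting MapSplitter splits the input into entries using the splitter's separator, then splits each entry into a key and a value using the given key-value separator, and returns a Map<String, String>.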
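A small round trip, assuming Guava is on the classpath: a MapSplitter parses a query-string-like input (a hypothetical format with entries separated by '&' and keys and values by '='), and Joiner's counterpart, the MapJoiner returned by Joiner.withKeyValueSeparator, writes it back. Guava documents that the returned map preserves the order of entries in the input, which is why the round trip reproduces the original string. The class and method names are illustrative.

```java
import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
import java.util.Map;

// Illustrative demo class (hypothetical name); requires Guava on the classpath.
class MapSplitterDemo {

  // Entries are separated by '&'; each entry's key and value by '='.
  static final Splitter.MapSplitter QUERY = Splitter.on('&').withKeyValueSeparator("=");

  static Map<String, String> parse(String input) {
    return QUERY.split(input); // unmodifiable map, in input order
  }

  // The inverse operation: join entries back with the same two separators.
  static String serialize(Map<String, String> map) {
    return Joiner.on('&').withKeyValueSeparator("=").join(map);
  }

  public static void main(String[] args) {
    Map<String, String> map = parse("name=guava&lang=java");
    System.out.println(map.get("name")); // guava
    System.out.println(serialize(map));  // name=guava&lang=java
  }
}
```

An entry without a key-value separator (for example, the input "broken") is rejected with an IllegalArgumentException rather than silently skipped.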

CharMatcher

In the past, our StringUtil class grew unchecked and accumulated many methods like these:

allAscii
collapse
collapseControlChars
collapseWhitespace
lastIndexNotOf
numSharedChars
removeChars
removeCrLf
retainAllChars
strip
stripAndCollapse
stripNonDigits

They represent a partial cross product of two notions:

what constitutes a "matching" character?
what to do with those "matching" characters?

To simplify this morass, we developed CharMatcher.

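You can think of a CharMatcher as representing a particular class of characters, such as digits or whitespace. In practice, a CharMatcher is just a boolean predicate on characters (CharMatcher does implement Predicate<Character>), but because it is so common to refer to "all whitespace characters" or "all lowercase letters," Guava provides this specialized syntax and API for characters.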

The utility of a CharMatcher lies in the operations it lets you perform on occurrences of the specified class of characters: trimming, collapsing, removing, retaining, and more.

String noControl = CharMatcher.javaIsoControl().removeFrom(string); // remove control characters
String theDigits = CharMatcher.digit().retainFrom(string); // only the digits
String spaced = CharMatcher.whitespace().trimAndCollapseFrom(string, ' ');
  // trim whitespace at ends, and replace/collapse whitespace into single spaces
String noDigits = CharMatcher.javaDigit().replaceFrom(string, "*"); // star out all digits
String lowerAndDigit = CharMatcher.javaDigit().or(CharMatcher.javaLowerCase()).retainFrom(string);
  // eliminate all characters that aren't digits or lowercase
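The snippets above operate on an undefined `string` variable. The sketch below wraps the same calls in methods and applies them to a concrete sample string; the class name and sample text are illustrative, and Guava is assumed on the classpath.

```java
import com.google.common.base.CharMatcher;

// Illustrative demo class (hypothetical name); requires Guava on the classpath.
class CharMatcherOps {

  static String noControl(String s) {
    return CharMatcher.javaIsoControl().removeFrom(s); // drops '\t', '\n', etc.
  }

  static String theDigits(String s) {
    return CharMatcher.digit().retainFrom(s); // keeps only the digits
  }

  static String spaced(String s) {
    // Trim whitespace at both ends, then collapse each inner run into one space.
    return CharMatcher.whitespace().trimAndCollapseFrom(s, ' ');
  }

  static String noDigits(String s) {
    return CharMatcher.javaDigit().replaceFrom(s, "*"); // each digit becomes '*'
  }

  static String lowerAndDigit(String s) {
    // Keep only characters that are digits or lowercase letters.
    return CharMatcher.javaDigit().or(CharMatcher.javaLowerCase()).retainFrom(s);
  }

  public static void main(String[] args) {
    String sample = "  Hello\tWorld 42  ";
    System.out.println("[" + noControl(sample) + "]"); // [  HelloWorld 42  ]
    System.out.println(theDigits(sample));             // 42
    System.out.println(spaced(sample));                // Hello World 42
    System.out.println(noDigits("a1b22"));             // a*b**
    System.out.println(lowerAndDigit(sample));         // elloorld42
  }
}
```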

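Note: CharMatcher deals only with char values; it does not understand supplementary Unicode code points in the range 0x10000 to 0x10FFFF. Such logical characters are encoded into a String using surrogate pairs, and a CharMatcher treats them as two separate characters.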
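To make the surrogate-pair caveat concrete, this sketch (illustrative class name; Guava assumed on the classpath) uses U+1F600, written as the surrogate pair \uD83D\uDE00. Java's String reports two chars but one code point, and a CharMatcher covering the surrogate range finds two matches, one for each half.

```java
import com.google.common.base.CharMatcher;

// Illustrative demo class (hypothetical name); requires Guava on the classpath.
class SupplementaryChars {

  // U+1F600 (grinning face) encoded as its UTF-16 surrogate pair.
  static final String EMOJI = "\uD83D\uDE00";

  // Number of UTF-16 code units (chars).
  static int utf16Units() {
    return EMOJI.length();
  }

  // Number of Unicode code points.
  static int codePoints() {
    return EMOJI.codePointCount(0, EMOJI.length());
  }

  // CharMatcher sees each surrogate half as a separate char.
  static int surrogateMatches() {
    return CharMatcher.inRange(Character.MIN_SURROGATE, Character.MAX_SURROGATE).countIn(EMOJI);
  }

  public static void main(String[] args) {
    System.out.println(utf16Units());       // 2
    System.out.println(codePoints());       // 1
    System.out.println(surrogateMatches()); // 2
  }
}
```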

Obtaining CharMatchers

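Many needs can be satisfied by the provided CharMatcher factory methods, such as `any()`, `none()`, `whitespace()`, `breakingWhitespace()`, `digit()`, `javaLetter()`, `javaLetterOrDigit()`, `javaIsoControl()`, `javaLowerCase()`, `javaUpperCase()`, and `ascii()`.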

Other common ways to obtain a CharMatcher include:

Method	Description
`anyOf(CharSequence)`	Specify all the characters you wish matched. For example, `CharMatcher.anyOf("aeiou")` matches lowercase English vowels.
`is(char)`	Specify exactly one character to match.
`inRange(char, char)`	Specify a range of characters to match, e.g. `CharMatcher.inRange('a', 'z')`.

Additionally, CharMatcher has negate(), and(CharMatcher), and or(CharMatcher). These provide simple boolean operations on CharMatcher.
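A sketch of the boolean combinators, assuming Guava on the classpath; the matcher and method names are hypothetical. negate() inverts a matcher, or() takes the union of two matchers, and and() takes their intersection.

```java
import com.google.common.base.CharMatcher;

// Illustrative demo class (hypothetical name); requires Guava on the classpath.
class CharMatcherComposition {

  // or(): union of two ranges.
  static final CharMatcher ASCII_LETTER =
      CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('A', 'Z'));

  static final CharMatcher VOWEL = CharMatcher.anyOf("aeiou");

  // negate(): remove everything that is NOT an ASCII letter.
  static String lettersOnly(String s) {
    return ASCII_LETTER.negate().removeFrom(s);
  }

  // and(): keep lowercase vowels that also fall in the range 'a'..'m'.
  static String earlyVowels(String s) {
    return VOWEL.and(CharMatcher.inRange('a', 'm')).retainFrom(s);
  }

  public static void main(String[] args) {
    System.out.println(lettersOnly("a1-B2"));     // aB
    System.out.println(earlyVowels("education")); // eai
  }
}
```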

Using CharMatchers

CharMatcher provides a wide variety of methods to operate on occurrences of the specified characters in any CharSequence. There are more methods provided than we can list here, but some of the most commonly used are:

Method	Description
`collapseFrom(CharSequence, char)`	Replace each group of consecutive matched characters with the specified character. For example, `CharMatcher.whitespace().collapseFrom(string, ' ')` collapses each run of whitespace down to a single space.
`matchesAllOf(CharSequence)`	Test if this matcher matches all characters in the sequence. For example, `CharMatcher.ascii().matchesAllOf(string)` tests if all characters in the string are ASCII.
`removeFrom(CharSequence)`	Removes matching characters from the sequence.
`retainFrom(CharSequence)`	Removes all non-matching characters from the sequence.
`trimFrom(CharSequence)`	Removes leading and trailing matching characters.
`replaceFrom(CharSequence, CharSequence)`	Replace matching characters with a given sequence.

(Note: all of these methods return a String, except for matchesAllOf, which returns a boolean.)
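Applied to small literal inputs, the methods in the table behave as follows. This is a sketch with illustrative names, assuming a Guava release that provides the whitespace() and ascii() factory methods. Note that collapseFrom, unlike trimAndCollapseFrom, does not trim the ends.

```java
import com.google.common.base.CharMatcher;

// Illustrative demo class (hypothetical name); requires Guava on the classpath.
class CharMatcherMethods {

  // Each run of whitespace becomes a single space (ends are not trimmed).
  static String collapse(String s) {
    return CharMatcher.whitespace().collapseFrom(s, ' ');
  }

  // True only if every char is in the ASCII range.
  static boolean allAscii(String s) {
    return CharMatcher.ascii().matchesAllOf(s);
  }

  // Strip leading and trailing '-' but keep interior ones.
  static String trimDashes(String s) {
    return CharMatcher.is('-').trimFrom(s);
  }

  // Replace every lowercase vowel with '*'.
  static String starVowels(String s) {
    return CharMatcher.anyOf("aeiou").replaceFrom(s, "*");
  }

  public static void main(String[] args) {
    System.out.println(collapse("a  b\t\tc"));      // a b c
    System.out.println(allAscii("hello"));          // true
    System.out.println(allAscii("h\u00E9llo"));     // false
    System.out.println(trimDashes("--a-b--"));      // a-b
    System.out.println(starVowels("guava"));        // g**v*
  }
}
```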

Charsets

Don't do this:

try { 
   bytes = string.getBytes("UTF-8"); 
} catch (UnsupportedEncodingException e) { 
   // how can this possibly happen? 
   throw new AssertionError(e);
}

Do this instead:

bytes = string.getBytes(Charsets.UTF_8);

Charsets provides constant references to the six standard Charset implementations guaranteed to be supported by all Java platform implementations. Use them instead of referring to charsets by their names.


(Note: If you're using JDK 7 or later, you should use the constants in StandardCharsets instead.)
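On JDK 7 and later the same fix needs no library at all. A minimal sketch (illustrative class name) that encodes and decodes with StandardCharsets.UTF_8; Guava's Charsets.UTF_8 could be used in exactly the same positions.

```java
import java.nio.charset.StandardCharsets;

// Pure JDK; the class name is illustrative.
class CharsetsDemo {

  // getBytes(Charset) declares no checked exception, so no try/catch is needed.
  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // new String(byte[], Charset) likewise declares no checked exception.
  static String decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] bytes = utf8("h\u00E9llo");
    System.out.println(bytes.length);  // 6 ('\u00E9' takes two bytes in UTF-8)
    System.out.println(decode(bytes)); // héllo
  }
}
```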

CaseFormat

CaseFormat is a handy little class for converting between ASCII case conventions, such as the naming conventions of programming languages. Supported formats include:

Format	Example
`LOWER_CAMEL`	`lowerCamel`
`LOWER_HYPHEN`	`lower-hyphen`
`LOWER_UNDERSCORE`	`lower_underscore`
`UPPER_CAMEL`	`UpperCamel`
`UPPER_UNDERSCORE`	`UPPER_UNDERSCORE`

Using it is relatively straightforward:

CaseFormat.UPPER_UNDERSCORE.to(CaseFormat.LOWER_CAMEL, "CONSTANT_NAME"); // returns "constantName"

We find this especially useful, for example, when writing programs that generate other programs.
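A few more conversions, as a sketch assuming Guava on the classpath; the helper names are hypothetical. Each call names the source format on the left and passes the target format to to().

```java
import com.google.common.base.CaseFormat;

// Illustrative demo class (hypothetical name); requires Guava on the classpath.
class CaseFormatDemo {

  // e.g. a Java field name -> a constant name
  static String toConstant(String lowerCamel) {
    return CaseFormat.LOWER_CAMEL.to(CaseFormat.UPPER_UNDERSCORE, lowerCamel);
  }

  // e.g. a hyphenated name -> a Java class name
  static String toClassName(String lowerHyphen) {
    return CaseFormat.LOWER_HYPHEN.to(CaseFormat.UPPER_CAMEL, lowerHyphen);
  }

  // e.g. a snake_case column name -> a Java field name
  static String toFieldName(String lowerUnderscore) {
    return CaseFormat.LOWER_UNDERSCORE.to(CaseFormat.LOWER_CAMEL, lowerUnderscore);
  }

  public static void main(String[] args) {
    System.out.println(toConstant("constantName")); // CONSTANT_NAME
    System.out.println(toClassName("lower-hyphen")); // LowerHyphen
    System.out.println(toFieldName("user_id"));     // userId
  }
}
```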

Strings

A limited number of general-purpose String utilities reside in the Strings class.
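A sketch of a few of these utilities in use, assuming Guava on the classpath. The helpers are hypothetical: zero-padding with padStart, treating null and "" alike with isNullOrEmpty, finding a shared prefix with commonPrefix, and building a separator line with repeat.

```java
import com.google.common.base.Strings;

// Illustrative helpers (hypothetical names); require Guava on the classpath.
class StringsDemo {

  // Left-pads the decimal id with '0' to at least 5 chars; longer values are unchanged.
  static String paddedId(int id) {
    return Strings.padStart(Integer.toString(id), 5, '0');
  }

  // Treats null and "" the same way and substitutes the fallback for both.
  static String orDefault(String value, String fallback) {
    return Strings.isNullOrEmpty(value) ? fallback : value;
  }

  // Longest prefix shared by both arguments.
  static String sharedPrefix(String a, String b) {
    return Strings.commonPrefix(a, b);
  }

  // A line of n dashes.
  static String ruler(int n) {
    return Strings.repeat("-", n);
  }

  public static void main(String[] args) {
    System.out.println(paddedId(42));                     // 00042
    System.out.println(orDefault(null, "n/a"));           // n/a
    System.out.println(sharedPrefix("foobar", "foobaz")); // fooba
    System.out.println(ruler(3));                         // ---
  }
}
```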
