How to use the String.split function

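A friend asked me a question about split today, and I suddenly realized that I had only ever used its default form. I had no idea that split can actually take two parameters.

So let's take this chance to study split properly.

split has two overloads: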

public String[] split(String regex)

public String[] split(String regex, int limit)

The first form can be seen as the second form with the second parameter defaulting to 0. This default case is also the one we use most often.
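To make that equivalence concrete, here is a minimal sketch in Java. The class name, the helper methods, and the sample input "a,b,," are made up for illustration; only String.split itself comes from the JDK.

```java
import java.util.Arrays;

class SplitDefaultLimit {
    // One-argument form.
    static String[] oneArg(String s, String regex) {
        return s.split(regex);
    }

    // Two-argument form with an explicit limit of 0.
    static String[] zeroLimit(String s, String regex) {
        return s.split(regex, 0);
    }

    public static void main(String[] args) {
        String s = "a,b,,"; // illustrative input with two trailing delimiters
        System.out.println(Arrays.toString(oneArg(s, ",")));    // [a, b]
        System.out.println(Arrays.toString(zeroLimit(s, ","))); // [a, b]
    }
}
```

Both calls return the same array, and in both the trailing empty strings are dropped.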

Since we want to understand split properly, let's start by looking at the source code:

/**
  * Splits this string around matches of the given regular expression.
  *
  * Annotation: the string is cut apart at every match of the regex.
  * 
  * <p> The array returned by this method contains each substring of this
  * string that is terminated by another substring that matches the given
  * expression or is terminated by the end of the string.  The substrings in
  * the array are in the order in which they occur in this string.  If the
  * expression does not match any part of the input then the resulting array
  * has just one element, namely this string.
  *
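  * Annotation: each element of the returned array is a piece of this string
  * that ends either at a match of the expression or at the end of the string,
  * in order of occurrence. With no match at all, the array holds only this string.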
  *
  * <p> When there is a positive-width match at the beginning of this
  * string then an empty leading substring is included at the beginning
  * of the resulting array. A zero-width match at the beginning however
  * never produces such empty leading substring.
  *
  * Annotation: a positive-width match at the very start yields an empty first
  * element; a zero-width match at the start never does.
  *
  * <p> The {@code limit} parameter controls the number of times the
  * pattern is applied and therefore affects the length of the resulting
  * array.  If the limit <i>n</i> is greater than zero then the pattern
  * will be applied at most <i>n</i>&nbsp;-&nbsp;1 times, the array's
  * length will be no greater than <i>n</i>, and the array's last entry
  * will contain all input beyond the last matched delimiter.  If <i>n</i>
  * is non-positive then the pattern will be applied as many times as
  * possible and the array can have any length.  If <i>n</i> is zero then
  * the pattern will be applied as many times as possible, the array can
  * have any length, and trailing empty strings will be discarded.
  *
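  * Annotation: limit controls how many times the pattern is applied. For n > 0
  * the pattern is applied at most n-1 times, the array has at most n elements,
  * and the last element holds everything after the last matched delimiter. For
  * n < 0 the pattern is applied as often as possible and trailing empty strings
  * are kept. For n == 0 it is also applied as often as possible, but trailing
  * empty strings are discarded.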
  */
  public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
      ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
      (regex.length() == 2 &&
        regex.charAt(0) == '\\' &&
        (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
        ((ch-'a')|('z'-ch)) < 0 &&
        ((ch-'A')|('Z'-ch)) < 0)) &&
      (ch < Character.MIN_HIGH_SURROGATE ||
        ch > Character.MAX_LOW_SURROGATE))
    {
      int off = 0;
      int next = 0;
      boolean limited = limit > 0;
      ArrayList<String> list = new ArrayList<>();
      while ((next = indexOf(ch, off)) != -1) {
        if (!limited || list.size() < limit - 1) {
          list.add(substring(off, next));
          off = next + 1;
        } else {    // last one
          //assert (list.size() == limit - 1);
          list.add(substring(off, value.length));
          off = value.length;
          break;
        }
      }
      // If no match was found, return this
      if (off == 0)
        return new String[]{this};

      // Add remaining segment
      if (!limited || list.size() < limit)
        list.add(substring(off, value.length));

      // Construct result
      int resultSize = list.size();
      if (limit == 0) {
        while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
          resultSize--;
        }
      }
      String[] result = new String[resultSize];
      return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
  }

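The comments in the source focus on the second parameter: different values of limit lead to different behavior. In what follows I analyze split according to the value of limit.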
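Before moving on, the fastpath condition in the quoted source is worth unpacking, because its range checks use a sign-bit trick: if ch lies outside [lo, hi], one of (ch - lo) and (hi - ch) is negative, so their bitwise OR is negative too. The sketch below restates that condition in a more readable form. The class and method names are hypothetical, and it mirrors only the listing quoted here; other JDK versions may use a different condition.

```java
class FastpathCheck {
    // True when lo <= ch <= hi. Outside the range, one of the two differences
    // is negative, so the OR of both has its sign bit set.
    static boolean inRange(char ch, char lo, char hi) {
        return ((ch - lo) | (hi - ch)) >= 0;
    }

    // Hypothetical helper: would this regex take the fastpath in the quoted listing?
    static boolean isFastpath(String regex) {
        char ch;
        if (regex.length() == 1) {
            // Case (1): a single char that is not a regex metacharacter.
            ch = regex.charAt(0);
            if (".$|()[{^?*+\\".indexOf(ch) != -1) return false;
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            // Case (2): backslash followed by a char that is not an ASCII digit or letter.
            ch = regex.charAt(1);
            if (inRange(ch, '0', '9') || inRange(ch, 'a', 'z') || inRange(ch, 'A', 'Z')) return false;
        } else {
            return false;
        }
        // Surrogate chars are left to the regex engine.
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }

    public static void main(String[] args) {
        System.out.println(isFastpath(","));   // true
        System.out.println(isFastpath("\\*")); // true  (escaped literal '*')
        System.out.println(isFastpath("*"));   // false (bare metacharacter)
        System.out.println(isFastpath("\\d")); // false (character class)
    }
}
```

Under this reading, the "\\*" delimiter used in the examples in this post takes the fastpath, so those splits never reach Pattern.compile.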

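1. limit greater than 0

When limit is greater than 0, regex is matched at most limit-1 times, so the string is split into at most limit substrings. Empty strings produced along the way are kept (they appear when two matches are adjacent, or when a match sits at the start or end of the string). Splitting stops once the match cap is reached, and the rest of the input becomes the last element.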

    val str = "a*b*c"
    val res = str.split("\\*",2)
    println(res.toSeq)//Array(a, b*c)


    val str = "a*b*c"
    val res = str.split("\\*",4)
    println(res.toSeq)//Array(a, b, c)


    val str = "*a*b*c*"
    val res = str.split("\\*",3)
    println(res.toSeq)//Array(, a, b*c*)


    val str = "*a*b*c**"
    val res = str.split("\\*",6)
    println(res.toSeq)//Array(, a, b, c, , )


    val str = "*a*b*c**"
    val res = str.split("\\*",5)
    println(res.toSeq)//Array(, a, b, c, *)
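The same behavior can be seen in one pass by sweeping limit over a single input. This is a small Java sketch; the class and helper names are made up for illustration, and the input is the "*a*b*c**" string from the snippets above.

```java
import java.util.Arrays;

class SplitPositiveLimit {
    // Splits on a literal '*' with the given limit.
    static String[] splitOnStar(String s, int limit) {
        return s.split("\\*", limit);
    }

    public static void main(String[] args) {
        String s = "*a*b*c**";
        for (int limit = 1; limit <= 7; limit++) {
            String[] parts = splitOnStar(s, limit);
            System.out.println(limit + " -> " + parts.length + " " + Arrays.toString(parts));
        }
        // 1 -> 1 [*a*b*c**]
        // 2 -> 2 [, a*b*c**]
        // 3 -> 3 [, a, b*c**]
        // 4 -> 4 [, a, b, c**]
        // 5 -> 5 [, a, b, c, *]
        // 6 -> 6 [, a, b, c, , ]
        // 7 -> 6 [, a, b, c, , ]
    }
}
```

With limit 1 the pattern is never applied, so the whole string comes back as the only element. Once limit exceeds the number of pieces available (6 here), raising it further changes nothing.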

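2. limit equal to 0

When limit is 0, split matches regex as many times as possible but drops the empty strings at the end of the result. One special case: when the string being split is itself empty, the result is still an array containing a single empty string.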

    val str = "a*b*c"
    val res = str.split("\\*",0)
    println(res.toSeq)//Array(a, b, c)


    val str = "*a*b*c**"
    val res = str.split("\\*",0)
    println(res.toSeq)//Array(, a, b, c)


    val str = ""
    val res = str.split("\\*",0)
    println(res.toSeq)//Array(), i.e. one element: the empty string (res.length is 1)
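Because an array holding one empty string and a truly empty array print almost identically, checking the length is clearer. The Java sketch below does that; the class and helper names are hypothetical, and the "***" input is an extra case added for contrast.

```java
class SplitZeroLimit {
    // Splits on a literal '*' with limit 0 (same as the one-argument form).
    static String[] splitOnStar(String s) {
        return s.split("\\*", 0);
    }

    public static void main(String[] args) {
        System.out.println(splitOnStar("*a*b*c**").length); // 4: leading "" kept, trailing "" dropped
        System.out.println(splitOnStar("").length);          // 1: no match, so the input itself is returned
        System.out.println(splitOnStar("***").length);       // 0: every piece is empty and gets stripped
    }
}
```

So the empty input gives an array of length 1, while an input made only of delimiters gives an array of length 0. The first follows from the "no match found, return this" branch in the source.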

3. limit less than 0

When limit is negative, split matches regex as many times as possible and keeps the trailing empty strings.

    val str = "*a*b**c***"
    val res = str.split("\\*",-1)
    println(res.toSeq)//Array(, a, b, , c, , , )
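A quick way to see the difference from limit 0 is to compare element counts on the same input. In this Java sketch the class and helper names are made up for illustration, and -5 is included only to show that the magnitude of a negative limit does not matter.

```java
class SplitNegativeLimit {
    // Number of elements produced when splitting on a literal '*'.
    static int count(String s, int limit) {
        return s.split("\\*", limit).length;
    }

    public static void main(String[] args) {
        String s = "*a*b**c***";
        System.out.println(count(s, -1)); // 8: all trailing empty strings kept
        System.out.println(count(s, -5)); // 8: any negative limit behaves the same
        System.out.println(count(s, 0));  // 5: the three trailing empty strings are dropped
    }
}
```

A negative limit is the usual choice when the number of fields matters, for example when parsing delimited records whose last columns may be empty.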

Reposted from blog.csdn.net/big_data1/article/details/82218855