Using Java's String.split() method: not as simple as you think

Look at the method below and try to predict the length of the array produced by splitting the string on commas. At a glance many people would say 8, but the result is unexpected: it is 5, not 8.


    private static void testSplit() {
        String ss = ",aa,bb,cc,dd,,,";
        String[] array = ss.split(",");

        System.out.println(array.length); // result is 5, not the expected 8
        for (int i = 0; i < array.length; i++) {
            System.out.println(array[i]);
        }
    }

Here is a screenshot of the output:

[Image: screenshot of the console output]



As you can see, the result really is 5, not the 8 we guessed. Why is that?


The reason:


Step into the source code of split and debug it, as shown below:

[Image: debugger screenshot of the split source code]



As you can see, at the beginning the list really does hold eight split pieces, at indices 0 through 7. The last three are empty strings (""), not null, so there is no need to worry about a null pointer exception.


Stepping further, look at the if statement. It starts checking from the tail of the list: while the string at the end of the list has length 0, resultSize is decremented, and the while loop keeps going until the condition no longer holds. As a result, the values at indices 5, 6, and 7 are discarded, the new string array is created with the reduced size, and the array length becomes 5.


list.subList(0, resultSize).toArray(result)


The statement above is the one that builds the final array.
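The trimming step just described can be sketched roughly as follows. This is a minimal sketch based on the behavior described here, not a copy of the JDK source; the class name `TrailingTrimSketch` and the helper `trimTrailingEmpty` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrailingTrimSketch {

    // Hypothetical helper: drops empty strings from the tail of the list only,
    // following the while loop described in the text.
    static String[] trimTrailingEmpty(List<String> list) {
        int resultSize = list.size();
        while (resultSize > 0 && list.get(resultSize - 1).isEmpty()) {
            resultSize--;
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }

    public static void main(String[] args) {
        // The eight pieces that exist before trimming ",aa,bb,cc,dd,,,"
        List<String> pieces = new ArrayList<>(
                Arrays.asList("", "aa", "bb", "cc", "dd", "", "", ""));
        String[] result = trimTrailingEmpty(pieces);
        System.out.println(result.length);            // 5
        System.out.println(Arrays.toString(result));  // [, aa, bb, cc, dd]
    }
}
```

Note that the empty string at index 0 survives, because the loop only walks backwards from the end and stops at the first non-empty element.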


It only removes the empty strings at the end. Empty strings at the beginning or in the middle are left untouched.
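If the trailing empty strings are actually wanted, the two-argument overload `split(regex, limit)` with a negative limit keeps them. The short example below illustrates this; the class name `SplitKeepAllDemo` and the helper `splitKeepAll` are hypothetical, chosen only for this sketch.

```java
import java.util.Arrays;

public class SplitKeepAllDemo {

    // Hypothetical helper: a negative limit tells split to keep every piece,
    // including trailing empty strings.
    static String[] splitKeepAll(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        String ss = ",aa,bb,cc,dd,,,";

        // Default behavior: trailing empty strings are dropped.
        System.out.println(ss.split(",").length);         // 5

        // Negative limit: all eight pieces are returned.
        String[] all = splitKeepAll(ss, ",");
        System.out.println(all.length);                   // 8
        System.out.println(Arrays.toString(all));         // [, aa, bb, cc, dd, , , ]

        // Empty strings in the middle are kept either way.
        System.out.println("a,,b".split(",").length);     // 3
    }
}
```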


Once you understand this principle, you will know how to handle the problem when you meet it in the future. Hopefully you have learned something.


 


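Here is a further update.


Suppose we now have the following strings:


"aa12sas32sasa223sas12as12wqe" // remove the digits, then turn the rest into an array

"aa,,sas,,sasa,,,,sasas,,," // remove the commas, however many there are

"aa  sas sa sa     sas  as  " // remove the spaces, again however many there are

We want to split each of them into an array while discarding the unneeded extra information.


The implementation is as follows: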


    // Requires: import java.util.Arrays;
    private static void testSplitPlus() {
        // Split on runs of one or more digits
        String ss = "aa12sas32sasa223sas12as12wqe";
        String[] array = ss.split("[\\d]+");
        System.out.println(Arrays.toString(array));

        // Split on runs of one or more commas
        ss = "aa,,sas,,sasa,,,,sasas,,,";
        array = ss.split("[,]+");
        System.out.println(Arrays.toString(array));

        // Split on runs of one or more whitespace characters
        ss = "aa  sas sa sa     sas  as  ";
        array = ss.split("[\\s]+");
        System.out.println(Arrays.toString(array));
    }

Now let's take another look at the source code of this split method.


    public String[] split(String regex) {
        return split(regex, 0);
    }
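The one-argument method simply delegates to `split(regex, limit)` with a limit of 0. As a small illustration of what that second parameter does, the sketch below compares a limit of 0 with a positive limit, which caps the number of pieces. The class name `SplitOverloadDemo` and the helper `show` are hypothetical, used only for this example.

```java
import java.util.Arrays;

public class SplitOverloadDemo {

    // Hypothetical helper: splits with the given limit and formats the pieces.
    static String show(String s, String regex, int limit) {
        return Arrays.toString(s.split(regex, limit));
    }

    public static void main(String[] args) {
        String s = "a,b,c,,";

        // split(regex) and split(regex, 0) behave the same way.
        System.out.println(Arrays.toString(s.split(",")));  // [a, b, c]
        System.out.println(show(s, ",", 0));                // [a, b, c]

        // A positive limit caps the number of pieces; the last piece
        // holds the rest of the input, delimiters included.
        System.out.println(show(s, ",", 2));                // [a, b,c,,]
    }
}
```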

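Did you notice the parameter name, regex? That stands for regular expression.


So this method supports regular expressions, which makes the code above easy to explain.


The first pattern is [\\d]+. In a Java string literal, \\ is an escaped backslash, so the regex engine sees \d, which matches a digit. The brackets form a character class, and the plus sign after them means one or more occurrences. That explains the output.


The second pattern, [,]+, works the same way: it matches a comma occurring one or more times.


The third pattern uses \\s, which matches any whitespace character, including spaces, tabs, form feeds, and so on. In Java it is equivalent to [ \t\n\x0B\f\r].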


Now let's look at the results.

[Image: screenshot of the console output]

This is exactly the result we wanted.
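One caveat ties back to the first half of this post: the "one or more" patterns absorb repeated delimiters and the trailing empty strings are trimmed, but a run of delimiters at the very start of the input still yields an empty first element. The sketch below shows this and one possible workaround. The class name `RegexSplitDemo` and the helper `splitIgnoringLeading` are hypothetical, and the helper assumes the delimiter is passed as a simple regex fragment such as "," or "\\s".

```java
import java.util.Arrays;

public class RegexSplitDemo {

    // Hypothetical helper: strips leading delimiters first, so the result
    // never starts with an empty string.
    static String[] splitIgnoringLeading(String s, String delimiterRegex) {
        String run = "(?:" + delimiterRegex + ")+";
        String stripped = s.replaceFirst("^" + run, "");
        if (stripped.isEmpty()) {
            return new String[0];   // nothing left but delimiters
        }
        return stripped.split(run);
    }

    public static void main(String[] args) {
        String ss = ",,aa,,bb,,";

        // The leading run of commas produces an empty first element.
        System.out.println(Arrays.toString(ss.split("[,]+")));                 // [, aa, bb]

        // After stripping the leading delimiters, only the real tokens remain.
        System.out.println(Arrays.toString(splitIgnoringLeading(ss, ",")));    // [aa, bb]
    }
}
```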



Origin blog.51cto.com/14028890/2422416