I misunderstood the String#substring method

Disclaimer: This is the blogger's original article, licensed under the CC 4.0 BY-SA copyright agreement. If you reproduce it, please attach the original source link and this statement.
This link: https://blog.csdn.net/lidelin10/article/details/102764626

We usually use substring to extract part of a string. This is especially common in pattern matching: we obtain the start and end of a match, then call str.substring(start, end) to extract the substring in the range [start, end). Recently I have been working on sensitive-word filtering for our business. I need to use forward longest matching to find the longest string that exists in the dictionary, and then desensitize (mask) it. Part of the code is as follows:

public CharSequence searchNextMatch(String sourceText, int start, int end) {
    // Get the match result; it contains the start index and the (exclusive) end index of the match
    TireTreeFindResult findResult = tireTree.find(sourceText, start, end);
    return sourceText.substring(findResult.getMatchTextStart(), findResult.getMatchTextEnd());
}
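The forward-longest-match step can be sketched as follows. This is a minimal illustration, not the author's code: TireTree and TireTreeFindResult are not shown in the post, so the hypothetical SensitiveWordMasker below uses a small HashMap-based trie, and its hypothetical longestMatchEnd method returns only the exclusive end index of the match. Masking with '*' is an assumed desensitization policy.

```java
import java.util.HashMap;
import java.util.Map;

class SensitiveWordMasker {

    // Minimal trie node; a hypothetical stand-in for the author's TireTree.
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWordEnd;
    }

    private final Node root = new Node();

    void addWord(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isWordEnd = true;
    }

    // Returns the exclusive end index of the longest dictionary word starting
    // at start, or -1 if none. Walks the trie char by char, so no substring
    // is created while searching.
    int longestMatchEnd(CharSequence text, int start) {
        Node node = root;
        int matchEnd = -1;
        for (int i = start; i < text.length(); i++) {
            node = node.children.get(text.charAt(i));
            if (node == null) {
                break;
            }
            if (node.isWordEnd) {
                matchEnd = i + 1; // remember the longest match seen so far
            }
        }
        return matchEnd;
    }

    // Scans left to right and replaces each forward-longest match with '*'.
    String mask(String text) {
        StringBuilder out = new StringBuilder(text);
        int i = 0;
        while (i < text.length()) {
            int end = longestMatchEnd(text, i);
            if (end == -1) {
                i++;
            } else {
                for (int j = i; j < end; j++) {
                    out.setCharAt(j, '*');
                }
                i = end; // continue after the matched word
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        SensitiveWordMasker masker = new SensitiveWordMasker();
        masker.addWord("bad");
        masker.addWord("badword");
        System.out.println(masker.mask("a badword and a bad one")); // a ******* and a *** one
    }
}
```

Because the search only uses charAt, a String (or a view) is needed only for the final match, which is exactly where the choice between substring and a view matters.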

I used String#substring to return the result, which is not a big problem in itself. But when I looked at the source code of substring, I found that it first copies the requested range of the backing array (a char[] in JDK 7/8, a byte[] since JDK 9's compact strings) and then creates a new String object from that copy. I had always assumed that substring returned a view, the way subList does, but that is not how it works. This raised some questions: won't this be slow? Why did Java choose to implement the API this way?
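A small check of the "new String object" behavior, as a sketch: the identity comparisons below reflect OpenJDK's implementation (7u6 and later), not a guarantee in the String Javadoc, and SubstringCopyDemo is just a hypothetical wrapper class.

```java
class SubstringCopyDemo {
    public static void main(String[] args) {
        String source = "sensitive word filter";

        // Each call builds a new String over a freshly copied range,
        // so two equal substrings are still two distinct objects.
        String a = source.substring(0, 9);
        String b = source.substring(0, 9);
        System.out.println(a.equals(b)); // true
        System.out.println(a == b);      // false

        // The one shortcut: asking for the whole range returns the original object.
        System.out.println(source.substring(0, source.length()) == source); // true
    }
}
```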

I searched online for substring and found some clues. In JDK 6 and early JDK 7 releases, substring was implemented as a view that shared the parent's character array; starting with JDK 7u6 it was changed to copy the characters and create a new string. The new implementation is clearly less efficient, but the change was made deliberately.

Suppose a crawler fetches the HTML source of a page as htmlString, and we then use regex matching plus substring to collect the matches into a List<String>. Under the old view-based implementation, every String in that list shares htmlString.value as its character array. Once we have extracted the information we need, we would expect to discard htmlString and let the JVM reclaim it. But because every element of the List<String> still references htmlString.value, the large character array cannot be garbage-collected, which causes a memory leak.
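To make the retention concrete, here is a toy model. OldStyleString is a hypothetical, heavily simplified imitation of the pre-7u6 layout (a shared char[] plus offset and count), not the real JDK class; the page content and the <title> regex are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SharedArrayDemo {

    // Hypothetical model of the old String layout: a substring shares the
    // parent's char[] and only records offset/count.
    static final class OldStyleString {
        final char[] value;
        final int offset;
        final int count;

        OldStyleString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        static OldStyleString of(String s) {
            return new OldStyleString(s.toCharArray(), 0, s.length());
        }

        // View-based substring: no copy, but the whole parent array stays reachable.
        OldStyleString substring(int begin, int end) {
            return new OldStyleString(value, offset + begin, end - begin);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a large crawled page.
        StringBuilder page = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            page.append("<p>filler</p>");
        }
        page.append("<title>hello</title>");
        String html = page.toString();
        OldStyleString oldHtml = OldStyleString.of(html);

        List<OldStyleString> oldMatches = new ArrayList<>();
        List<String> newMatches = new ArrayList<>();
        Matcher m = Pattern.compile("<title>(.*?)</title>").matcher(html);
        while (m.find()) {
            oldMatches.add(oldHtml.substring(m.start(1), m.end(1))); // shares the big array
            newMatches.add(html.substring(m.start(1), m.end(1)));    // copies 5 chars
        }

        OldStyleString match = oldMatches.get(0);
        // prints: hello: logical length 5, retained array length 130020
        System.out.println(match + ": logical length " + match.count
                + ", retained array length " + match.value.length);
        System.out.println(newMatches.get(0) + ": independent String of length "
                + newMatches.get(0).length());
    }
}
```

As long as oldMatches is reachable, all 130,020 characters stay alive to keep 5 of them; the copying version lets the page be collected.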

Sometimes, for the sake of efficiency, we can implement a view-based CharSequence class ourselves. The code is as follows:

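/**
 * Uses a view to reduce the overhead of substring.
 */
private static class SubstringView implements CharSequence {
    private final String sourceString;
    private final int start;  // inclusive index into sourceString
    private final int end;    // exclusive index into sourceString
    private final int length;

    public SubstringView(String sourceString, int start, int end) {
        checkBounds(sourceString, start, end);
        this.sourceString = sourceString;
        this.start = start;
        this.end = end;
        this.length = end - start;
    }

    @Override
    public int length() {
        return this.length;
    }

    @Override
    public char charAt(int index) {
        // index is relative to the view, so valid values are [0, length)
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(String.valueOf(index));
        }
        return sourceString.charAt(start + index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // start/end are relative to this view; translate them into source indexes
        if (start < 0 || start > end || end > length) {
            throw new IndexOutOfBoundsException("start " + start + ", end " + end + ", length " + length);
        }
        return new SubstringView(sourceString, this.start + start, this.start + end);
    }

    private static void checkBounds(String string, int start, int end) {
        if (start < 0) {
            throw new IndexOutOfBoundsException("start < 0");
        }
        if (start > end) {
            throw new IndexOutOfBoundsException("start > end");
        }
        if (end > string.length()) {
            throw new IndexOutOfBoundsException("end is greater than source string length");
        }
    }
}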

Note that although String offers APIs that accept a CharSequence parameter, internally they may simply call its toString method. A class that implements CharSequence without overriding toString falls back to Object's default implementation, which returns the fully qualified class name followed by @ and the hash code in hexadecimal.

String str1 = "1234abc111";
String str2 = "abc";
CharSequence sequence = new SubstringView(str1, 4, 4 + 3);
System.out.println(str2.contains(sequence));

This example prints false because contains is implemented internally as indexOf(s.toString()) > -1, which calls SubstringView#toString, and we have not implemented toString. So how should toString be implemented? It has to return a real String, which since JDK 7u6 means copying the characters, so the simplest way is to call String#substring.
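Following that reasoning, here is a sketch of the view with toString delegating to String#substring. It is a compact, self-contained copy of the SubstringView above, wrapped in a hypothetical SubstringViewDemo class so it can run on its own; only toString is new.

```java
class SubstringViewDemo {

    static final class SubstringView implements CharSequence {
        private final String source;
        private final int start;  // inclusive index into source
        private final int end;    // exclusive index into source

        SubstringView(String source, int start, int end) {
            if (start < 0 || start > end || end > source.length()) {
                throw new IndexOutOfBoundsException("start " + start + ", end " + end);
            }
            this.source = source;
            this.start = start;
            this.end = end;
        }

        @Override
        public int length() {
            return end - start;
        }

        @Override
        public char charAt(int index) {
            if (index < 0 || index >= length()) {
                throw new IndexOutOfBoundsException(String.valueOf(index));
            }
            return source.charAt(start + index);
        }

        @Override
        public CharSequence subSequence(int from, int to) {
            if (from < 0 || from > to || to > length()) {
                throw new IndexOutOfBoundsException("from " + from + ", to " + to);
            }
            return new SubstringView(source, start + from, start + to);
        }

        // Producing a real String means copying, so the copy is deferred to here.
        @Override
        public String toString() {
            return source.substring(start, end);
        }
    }

    public static void main(String[] args) {
        String str1 = "1234abc111";
        CharSequence sequence = new SubstringView(str1, 4, 4 + 3);
        System.out.println("abc".contains(sequence));      // true
        System.out.println(sequence.subSequence(1, 3));    // bc
        System.out.println("abc".contentEquals(sequence)); // true
    }
}
```

The copy now happens only when toString is actually called, so code that reads the view through length and charAt never pays for it. Not every String method goes through toString: in the OpenJDK sources, String#contentEquals(CharSequence) compares a generic CharSequence character by character via charAt, so it works even without the override.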

References:

  1. A blog post that analyzes this in detail, comparing the old (array-sharing view) and new (copying) implementations and the safety of reusing the array: https://www.cnblogs.com/antineutrino/p/4213268.html

  2. java.lang.String#substring source code

  3. java.lang.StringBuilder#substring source code
