We generally use substring to extract part of a string. In pattern matching in particular, we obtain the start and end indices of the match and then call str.substring(start, end) to extract the substring in the range [start, end). Recently I have been working on sensitive-word filtering for our business: scanning forward, I want to find the longest substring that exists in the dictionary and then mask it. Part of the code is as follows:
public CharSequence searchNextMatch(String sourceText, int start, int end) {
    // Get the match result; it contains the start index and the end index (exclusive) of the match
    TireTreeFindResult findResult = tireTree.find(sourceText, start, end);
    return sourceText.substring(findResult.getMatchTextStart(), findResult.getMatchTextEnd());
}
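To make the "forward longest match, then mask" idea concrete, here is a minimal sketch. It is not the author's code: the article uses a trie (TireTree), while this sketch swaps in a plain HashSet dictionary and tries the longest candidate first (forward maximum matching). The class ForwardMaxMatchDemo, the method mask and the maxWordLength parameter are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ForwardMaxMatchDemo {

    /**
     * Hypothetical helper: masks every dictionary word found by forward maximum matching.
     * A HashSet stands in for the trie used in the article (an assumption for brevity).
     */
    static String mask(String text, Set<String> dictionary, int maxWordLength) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int matchEnd = -1;
            // Try the longest candidate first, then shrink; [i, end) is half-open, like substring
            for (int end = Math.min(text.length(), i + maxWordLength); end > i; end--) {
                if (dictionary.contains(text.substring(i, end))) {
                    matchEnd = end;
                    break;
                }
            }
            if (matchEnd == -1) {
                // No word starts here: keep the character and move on
                out.append(text.charAt(i));
                i++;
            } else {
                // Replace the whole matched word with '*' and continue after it
                for (int k = i; k < matchEnd; k++) {
                    out.append('*');
                }
                i = matchEnd;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("bad", "badword"));
        System.out.println(mask("a badword here", dictionary, 7)); // a ******* here
    }
}
```

Every dictionary lookup in this sketch calls substring and therefore allocates a new String, which is exactly the kind of overhead the rest of this post is concerned with.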
I use String's substring method to return the result, which works fine. But when I looked at the source code of substring at the time, I found that it first copies the requested range of the char array and then creates a new String object from that copy. I had always assumed that substring returned a view, the way subList does, but that is not the case. This raised some questions: isn't this slow? Why did the Java API choose this implementation?
I searched online for information about substring and found some clues. Before JDK 7, substring was implemented as a view that shared the original character array; in JDK 7 (from update 6 onward) it was changed to copy the characters into a new array and create a new string. The new implementation is clearly less efficient, but the change was made for a specific reason.
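The two strategies can be contrasted with a toy model. This is a sketch, not the real java.lang.String: ToyString, substringShared and substringCopied are hypothetical names, and the value/offset/count fields only loosely mirror the pre-JDK 7 design. The shared version runs in constant time but keeps the whole backing array alive; the copied version costs time proportional to the substring length but retains only the characters it needs.

```java
import java.util.Arrays;

/** Toy model (hypothetical, not java.lang.String) of the two substring strategies. */
public class ToyString {
    final char[] value; // backing array
    final int offset;   // index of this string's first char within value
    final int count;    // number of chars in this string

    ToyString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static ToyString of(String s) {
        return new ToyString(s.toCharArray(), 0, s.length());
    }

    // Old, view-like strategy: O(1), but shares (and keeps alive) the whole backing array
    ToyString substringShared(int begin, int end) {
        checkRange(begin, end);
        return new ToyString(value, offset + begin, end - begin);
    }

    // Newer strategy: O(end - begin), copies only the requested range into a fresh array
    ToyString substringCopied(int begin, int end) {
        checkRange(begin, end);
        char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
        return new ToyString(copy, 0, copy.length);
    }

    private void checkRange(int begin, int end) {
        if (begin < 0 || begin > end || end > count) {
            throw new IndexOutOfBoundsException("begin " + begin + ", end " + end + ", length " + count);
        }
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        ToyString html = ToyString.of("<html>...a very large page...</html>");
        ToyString shared = html.substringShared(0, 6);
        ToyString copied = html.substringCopied(0, 6);
        System.out.println(shared + " " + (shared.value == html.value)); // <html> true
        System.out.println(copied + " " + (copied.value == html.value)); // <html> false
        // The shared view retains the whole page; the copy retains only 6 chars
        System.out.println(shared.value.length + " vs " + copied.value.length);
    }
}
```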
Suppose we fetch the HTML source of a page with a crawler as htmlString, then use a regular expression and substring to collect the matches we want into a List<String>. Under the old view-based implementation, every String in that List<String> points to htmlString.value as its character array. After extracting the information we need, we would normally discard htmlString and let the JVM reclaim it, but because every element of the List<String> still references htmlString.value, the large character array cannot be garbage-collected. This is effectively a memory leak.
Sometimes, for efficiency, we can write our own view class that implements CharSequence. The code is as follows:
/**
 * Uses a view to reduce the overhead of substring
 */
private static class SubstringView implements CharSequence {
    private final String sourceString;
    private final int start;
    private final int end;
    private final int length;

    public SubstringView(String sourceString, int start, int end) {
        checkBounds(sourceString, start, end);
        this.sourceString = sourceString;
        this.start = start;
        this.end = end;
        this.length = end - start;
    }

    @Override
    public int length() {
        return this.length;
    }

    @Override
    public char charAt(int index) {
        // index is relative to the view, so the valid range is [0, length)
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(String.valueOf(index));
        }
        return sourceString.charAt(start + index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // start and end are relative to this view; translate them into source-string indices
        if (start < 0 || start > end || end > length) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end + ", length: " + length);
        }
        return new SubstringView(sourceString, this.start + start, this.start + end);
    }

    private static void checkBounds(String string, int start, int end) {
        if (start < 0) {
            throw new IndexOutOfBoundsException("start is negative");
        }
        if (start > end) {
            throw new IndexOutOfBoundsException("start > end");
        }
        if (end > string.length()) {
            throw new IndexOutOfBoundsException("end is greater than source string length");
        }
    }
}
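The JDK also ships a ready-made view of this kind: java.nio.CharBuffer.wrap(CharSequence csq, int start, int end) returns a read-only CharBuffer, which itself implements CharSequence. In the OpenJDK implementation it wraps the sequence rather than copying it, and its length, charAt and subSequence are all relative to the wrapped range, just like SubstringView. The class name CharBufferViewDemo is only for illustration.

```java
import java.nio.CharBuffer;

public class CharBufferViewDemo {
    public static void main(String[] args) {
        String source = "1234abc111";
        // Read-only view over source[4, 7); the wrapped string is not copied here
        CharBuffer view = CharBuffer.wrap(source, 4, 7);
        System.out.println(view.length());  // 3
        System.out.println(view.charAt(0)); // a   (index is relative to the view)
        // subSequence is relative to the view and returns another view
        CharSequence bc = view.subSequence(1, 3);
        System.out.println(bc.length());    // 2
        // toString materializes the chars into a new String, i.e. it copies
        System.out.println(view.toString()); // abc
    }
}
```

Because it is a buffer, reading from it with get() advances its position and changes what charAt and length see; treating it purely as a CharSequence avoids that.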
Note that although String provides APIs that take a CharSequence parameter, internally they may simply call its toString method. A class implementing CharSequence that does not override toString inherits Object's default implementation, which returns the fully qualified class name followed by @ and the hash code in hexadecimal.
String str1 = "1234abc111";
String str2 = "abc";
CharSequence sequence = new SubstringView(str1, 4, 4 + 3);
System.out.println(str2.contains(sequence));
This example prints false. The reason is that contains internally checks indexOf(s.toString()) > -1, which calls SubstringView#toString, and we have not implemented toString. So how should we implement toString? Whatever we do, it has to produce a new String, which means copying the characters; the straightforward way is to call String#substring, so there is no way around the copy.
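A sketch of the fix: the class below is a trimmed-down, hypothetical copy of SubstringView (nested in an illustrative ViewToStringDemo class so it is self-contained) with toString implemented via String#substring, as the paragraph above suggests. With it, the earlier contains example prints true. The second check uses String.contentEquals, which in OpenJDK compares a generic CharSequence char by char through charAt, so it works on the view without calling toString at all.

```java
public class ViewToStringDemo {

    // Hypothetical minimal view: only what is needed to show the toString issue
    static final class SubstringView implements CharSequence {
        private final String source;
        private final int start;
        private final int end;

        SubstringView(String source, int start, int end) {
            if (start < 0 || start > end || end > source.length()) {
                throw new IndexOutOfBoundsException("start " + start + ", end " + end);
            }
            this.source = source;
            this.start = start;
            this.end = end;
        }

        @Override
        public int length() {
            return end - start;
        }

        @Override
        public char charAt(int index) {
            if (index < 0 || index >= length()) {
                throw new IndexOutOfBoundsException(String.valueOf(index));
            }
            return source.charAt(start + index);
        }

        @Override
        public CharSequence subSequence(int s, int e) {
            // s and e are relative to this view
            if (s < 0 || s > e || e > length()) {
                throw new IndexOutOfBoundsException("start " + s + ", end " + e);
            }
            return new SubstringView(source, start + s, start + e);
        }

        // Materializing the view as a String necessarily copies the chars
        @Override
        public String toString() {
            return source.substring(start, end);
        }
    }

    public static void main(String[] args) {
        String str1 = "1234abc111";
        String str2 = "abc";
        CharSequence sequence = new SubstringView(str1, 4, 4 + 3);
        System.out.println(str2.contains(sequence));      // true
        System.out.println(str2.contentEquals(sequence)); // true
    }
}
```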
References:
- This blog post analyzes the topic in detail, comparing the old and new implementations from the perspectives of array reuse and safety: https://www.cnblogs.com/antineutrino/p/4213268.html
- java.lang.String#substring source code
- java.lang.StringBuilder#substring source code