Optimizing Java String Performance in Practice

When writing Java programs, there is no need to allocate and release memory manually as in C: the JVM manages memory entirely, which improves development efficiency. However, careless coding can still waste memory and hurt performance. This article uses strings as an example, because String is the most commonly used data type and strings in Java are immutable:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];
    ... ...
}

The benefit of immutability is that strings are inherently thread-safe in a multithreaded environment. It also has a cost: when strings are concatenated or sliced, the underlying char arrays cannot be shared, so each operation produces additional String instances. More instances occupy more memory and increase the burden on JVM garbage collection. The following sections use JMH benchmarks to compare the performance of several string operations.
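Before turning to the measurements, here is a minimal illustration of immutability. It is only a sketch, and the class name ImmutabilityDemo is made up for this example: concat and substring leave the original string untouched and hand back a new instance.

```java
public class ImmutabilityDemo {

    public static void main(String[] args) {
        String s = "hello";
        String t = s.concat(" world");   // s itself is not modified
        String u = t.substring(0, 5);    // a new String with the same characters as s

        System.out.println(s);           // hello
        System.out.println(t);           // hello world
        System.out.println(u.equals(s)); // true: same content
        System.out.println(u == s);      // false: a different instance
    }
}
```

Every intermediate result here is a separate object that the garbage collector eventually has to reclaim.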

1. String concatenation

Test code:

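import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)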
@Warmup(iterations = 3)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Threads(8)
@Fork(2)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class StringBuilderBenchmark {
	
    @Benchmark
    public void testStringAdd() {
        String a = "";
        for (int i = 0; i < 10; i++) {
            a += i;
        }
        print(a);
    }
	
    @Benchmark
    public void testStringBuilderAdd() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append(i);
        }
        print(sb.toString());
    }
	
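    // Empty sink for the result. JMH's guidance is to return the value or pass it
    // to a Blackhole so that the JIT cannot eliminate the computation as dead code.
    private void print(String a) {
    }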
	
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(StringBuilderBenchmark.class.getSimpleName())
                .output("./StringBuilderBenchmark.log")
                .build();
        new Runner(options).run();
    }
}

Test Results:

Benchmark                                     Mode  Cnt      Score      Error   Units
StringBuilderBenchmark.testStringAdd         thrpt   20  22163.429 ±  537.729  ops/ms
StringBuilderBenchmark.testStringBuilderAdd  thrpt   20  43400.877 ± 2447.492  ops/ms

The results show that StringBuilder delivers roughly twice the throughput of direct string concatenation in this test.
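To see where the extra cost comes from: with javac versions before JDK 9, a += i inside a loop is compiled to roughly the code in concatDesugared below, so every iteration allocates a fresh StringBuilder and a fresh String. Newer compilers use a different strategy, but each iteration still yields a new String. This is an illustrative sketch; the class and method names are invented.

```java
public class ConcatDemo {

    // Roughly what "a += i" in a loop amounts to with pre-JDK 9 javac:
    // a new StringBuilder and a new String on every iteration.
    public static String concatDesugared(int n) {
        String a = "";
        for (int i = 0; i < n; i++) {
            a = new StringBuilder().append(a).append(i).toString();
        }
        return a;
    }

    // One builder shared by all iterations; a single String is created at the end.
    public static String concatWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatDesugared(10));   // 0123456789
        System.out.println(concatWithBuilder(10)); // 0123456789
    }
}
```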

2. String splitting

Test code:

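import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)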
@Warmup(iterations = 3)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Threads(8)
@Fork(2)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class StringSplitBenchmark {
	
    private static final String regex = "\\.";
	
    private static final char CHAR = '.';
    
    private static final Pattern pattern = Pattern.compile(regex);
	
    private String[] strings;
	
    @Setup
    public void prepare() {
        strings = new String[20];
        for(int i=0;i<strings.length;i++) {
            strings[i] = System.currentTimeMillis() + ".aaa.bbb.ccc.ddd" + Math.random();
        }
    }
	
    @Benchmark
    public void testStringSplit() {
        for(int i=0;i<strings.length;i++) {
            strings[i].split(regex);
        }
    }
	
    @Benchmark
    public void testPatternSplit() {
        for(int i=0;i<strings.length;i++) {
            pattern.split(strings[i]);
        }
    }
	
    @Benchmark
    public void testCharSplit() {
        for(int i=0;i<strings.length;i++) {
            split(strings[i], CHAR, 6);
        }
	
    }
	
    public static List<String> split(final String str, final char separatorChar, int expectParts) {
        if (null == str) {
            return null;
        }
        final int len = str.length();
        if (len == 0) {
            return Collections.emptyList();
        }
        final List<String> list = new ArrayList<String>(expectParts);
        int i = 0;
        int start = 0;
        boolean match = false;
        while (i < len) {
            if (str.charAt(i) == separatorChar) {
                if (match) {
                    list.add(str.substring(start, i));
                    match = false;
                }
                start = ++i;
                continue;
            }
            match = true;
            i++;
        }
        if (match) {
            list.add(str.substring(start, i));
        }
        return list;
    }
	
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(StringSplitBenchmark.class.getSimpleName())
                .output("./StringSplitBenchmark.log")
                .build();
        new Runner(options).run();
    }
}

Test Results:

Benchmark                               Mode  Cnt    Score     Error   Units
StringSplitBenchmark.testCharSplit     thrpt   20  872.048 ±  63.872  ops/ms
StringSplitBenchmark.testPatternSplit  thrpt   20  534.371 ±  28.275  ops/ms
StringSplitBenchmark.testStringSplit   thrpt   20  814.661 ± 115.653  ops/ms

The results show that testCharSplit and testStringSplit perform similarly and that testPatternSplit is the slowest, which is not what we expected. String.split takes a regular expression, and a precompiled Pattern would normally be expected to outperform compiling the expression on every call. To find out why that is not the case here, look at the implementation of String.split:

    public String[] split(String regex) {
        return split(regex, 0);
    }
    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
         (1)one-char String and this character is not one of the
            RegEx's meta characters ".$|()[{^?*+\\", or
         (2)two-char String and the first char is the backslash and
            the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        if ((
           (regex.value.length == 1 && ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
           (regex.length() == 2 && regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[]{this};

            // Add remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, value.length));

            // Construct result
            int resultSize = list.size();
            if (limit == 0) {
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }
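The guard at the top of split(String, int) decides whether the regular expression engine is bypassed. As a rough sketch, it can be isolated into a predicate and tried against a few separators. The name isFastPath and the helper are invented here, and the logic is transcribed from the JDK source quoted above, so other JDK versions may differ.

```java
public class SplitFastPathDemo {

    private static final String META = ".$|()[{^?*+\\";

    private static boolean isAsciiLetterOrDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Mirrors the fast-path condition of String.split as quoted above.
    public static boolean isFastPath(String regex) {
        char ch;
        if (regex.length() == 1 && META.indexOf(regex.charAt(0)) == -1) {
            ch = regex.charAt(0);              // a single non-meta character
        } else if (regex.length() == 2 && regex.charAt(0) == '\\'
                && !isAsciiLetterOrDigit(regex.charAt(1))) {
            ch = regex.charAt(1);              // an escaped literal such as \.
        } else {
            return false;
        }
        // Surrogate characters are always handled by the regex engine.
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }

    public static void main(String[] args) {
        System.out.println(isFastPath("\\."));  // true: escaped dot
        System.out.println(isFastPath(","));    // true: plain character
        System.out.println(isFastPath("."));    // false: regex metacharacter
        System.out.println(isFastPath("\\d"));  // false: character class
        System.out.println(isFastPath("\\s+")); // false: longer than two characters
    }
}
```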

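It turns out that String.split is optimized and does not always use a regular expression. When the argument is a single character that is not a regex metacharacter, or a backslash followed by a character that is not an ASCII letter or digit (as with the escaped dot used here), it splits with indexOf and substring directly. This explains why testCharSplit and testStringSplit perform similarly.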
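One caveat when comparing the three variants: the hand-written split in the benchmark skips empty tokens, whereas String.split keeps empty tokens in the middle of the input, so the two are not drop-in replacements for each other. The sketch below uses a condensed version of the same loop (the class and method names are made up) to show the difference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitSemanticsDemo {

    // Condensed version of the benchmark's split helper: empty tokens are skipped.
    public static List<String> charSplit(String str, char sep) {
        List<String> list = new ArrayList<>();
        int start = 0;
        boolean match = false;
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) == sep) {
                if (match) {
                    list.add(str.substring(start, i));
                    match = false;
                }
                start = i + 1;
            } else {
                match = true;
            }
        }
        if (match) {
            list.add(str.substring(start));
        }
        return list;
    }

    public static void main(String[] args) {
        String input = "a..b";
        System.out.println(Arrays.asList(input.split("\\."))); // [a, , b]
        System.out.println(charSplit(input, '.'));              // [a, b]
    }
}
```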

3. String replacement

Test code:

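import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)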
@Warmup(iterations = 3)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Threads(8)
@Fork(2)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class StringReplaceAllBenchmark {
	
    private static final String EMPTY = "";
	
    private static final String regex = "\\.";
	
    private static final String CHAR = ".";
    private static final Pattern pattern = Pattern.compile(regex);
	
    private String[] strings;
	
    @Setup
    public void prepare() {
        strings = new String[20];
        for (int i = 0; i < strings.length; i++) {
            strings[i] = System.currentTimeMillis() + ".aaa.bbb.ccc.ddd." + Math.random();
        }
    }
	
    @Benchmark
    public void testStringReplaceAll() {
        for (int i = 0; i < strings.length; i++) {
            strings[i].replaceAll(regex, EMPTY);
        }
    }
	
    @Benchmark
    public void testPatternReplaceAll() {
        for (int i = 0; i < strings.length; i++) {
            pattern.matcher(strings[i]).replaceAll(EMPTY);
        }
    }
	
    @Benchmark
    public void testCustomReplaceAll() {
        for (int i = 0; i < strings.length; i++) {
            replaceAll(strings[i], CHAR, EMPTY);
        }
	
    }
	
	
    public static String replaceAll(final String str, final String remove, final String replacement) {
        if (null == str) {
            return null;
        }
        final int len = str.length();
        if (len == 0) {
            return str;
        }
        final StringBuilder res = new StringBuilder(len);
        int offset = 0;
        int index;
        while (true) {
            index = str.indexOf(remove, offset);
            if (index == -1) {
                break;
            }
            res.append(str, offset, index);
            if(null != replacement && replacement.length() >0) {
                res.append(replacement);
            }
            offset = index + remove.length();
        }
        if(offset < len) {
            res.append(str, offset, len);
        }
        return res.toString();
    }
	
    public static void main(String[] args) throws RunnerException {
        String str = System.currentTimeMillis() + ".aaa.bbb.ccc.ddd." + Math.random();
        String str1 = str.replaceAll(regex, EMPTY);
        String str2 = pattern.matcher(str).replaceAll(EMPTY);
        String str3 = replaceAll(str, CHAR, EMPTY);
	
        System.out.println(str1);
        System.out.println(str2);
        System.out.println(str3);
        Options options = new OptionsBuilder()
                .include(StringReplaceAllBenchmark.class.getSimpleName())
                .output("./StringReplaceAllBenchmark.log")
                .build();
        new Runner(options).run();
    }
}

Test Results:

Benchmark                                         Mode  Cnt     Score    Error   Units
StringReplaceAllBenchmark.testCustomReplaceAll   thrpt   20  1167.891 ± 39.699  ops/ms
StringReplaceAllBenchmark.testPatternReplaceAll  thrpt   20   438.079 ±  1.859  ops/ms
StringReplaceAllBenchmark.testStringReplaceAll   thrpt   20   353.060 ± 11.177  ops/ms

Both testPatternReplaceAll and testStringReplaceAll use regular expressions, so their throughput is of the same order (the precompiled Pattern is somewhat faster), while the hand-written replacement achieves roughly three times their throughput. Regular expressions are convenient for complex cases, but where performance matters and a simple literal replacement will do, avoid them.
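When the text to be replaced is a literal rather than a pattern, the JDK's own String.replace(CharSequence, CharSequence) avoids writing a regular expression at all. It was not part of the benchmark above, so no claim is made here about its speed, which depends on the JDK version. The sketch below only shows that the three approaches give the same result for a literal dot; the class and method names are made up.

```java
import java.util.regex.Pattern;

public class ReplaceDemo {

    private static final Pattern DOT = Pattern.compile("\\.");

    // Compiles the regex on every call.
    public static String viaReplaceAll(String s) {
        return s.replaceAll("\\.", "");
    }

    // Reuses a precompiled Pattern.
    public static String viaPattern(String s) {
        return DOT.matcher(s).replaceAll("");
    }

    // Literal replacement: the dot is not treated as a regex metacharacter.
    public static String viaLiteralReplace(String s) {
        return s.replace(".", "");
    }

    public static void main(String[] args) {
        String input = "1.aaa.bbb";
        System.out.println(viaReplaceAll(input));     // 1aaabbb
        System.out.println(viaPattern(input));        // 1aaabbb
        System.out.println(viaLiteralReplace(input)); // 1aaabbb
    }
}
```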

4. Optimization in practice: a desensitization (data-masking) utility

The code before optimization is as follows:

public class DesensitizeUtils {
	
    /**
     * Masks the value according to its length (by splitting it)
     * @param value
     * @return
     */
    public static String desensitizeByLengthOld(String value) {
        if (value.length() == 2) {
            value = value.substring(0, 1) + "*";
        } else if (value.length() == 3) {
            value = value.substring(0, 1) + "*" + value.substring(value.length() - 1);
        } else if (value.length() > 3 && value.length() <= 5) {
            value = value.substring(0, 1) + "**" + value.substring(value.length() - 2);
        } else if (value.length() > 5 && value.length() <= 7) {
            value = value.substring(0, 2) + "***" + value.substring(value.length() - 2);
        } else if (value.length() > 7) {
            String str = "";
            for (int i = 0; i < value.length() - 6; i++) {
                str += "*";
            }
            value = value.substring(0, 3) + str + value.substring(value.length() - 3);
        }
        return value;
    }
	
	
    /**
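     * Masking strategy for Chinese names:
     * 0. One character or fewer: return as is
     * 1. Two characters: hide the surname
     * 2. Three or more characters: keep only the first and the last; replace the rest with asterisks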
     * @param fullName
     * @return
     */
    public static String desensitizeChineseNameOld(final String fullName) {
        if (StringUtils.isBlank(fullName)) {
            return "";
        }
        if (fullName.length() <= 1) {
            return fullName;
        } else if (fullName.length() == 2) {
            final String name = StringUtils.right(fullName, 1);
            return StringUtils.leftPad(name, StringUtils.length(fullName), "*");
        } else {
            return StringUtils.left(fullName, 1).concat(StringUtils.removeStart(StringUtils.leftPad(StringUtils.right(fullName, 1), StringUtils.length(fullName), "*"), "*"));
        }
    }
	
}

Next, optimize the code above step by step.

1. Use constants where possible, and keep their number small

1). Wherever "*", "**", or "***" appears in the code above, use a single '*' char constant instead.

public class DesensitizeUtils {
	private static final char DESENSITIZE_CODE = '*';
}

2). Similarly, replace return "" in desensitizeChineseNameOld with return StringUtils.EMPTY, a constant provided by the StringUtils class:

if (StringUtils.isBlank(fullName)) {
   return StringUtils.EMPTY;
}

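String literals are already interned by the JVM, so the literals themselves are not re-created on each call. The benefit of the constant is that the masking character is defined in one place and can be appended to a StringBuilder as a single char, instead of concatenating string fragments, which does produce new String instances on every call and adds up under high concurrency.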

2. Use local variables to reduce function calls

Extract the length into a local variable to avoid calling length() repeatedly:

if (value.length() == 2) { 
	
} else if (value.length() == 3) {
  
} else if (value.length() > 3 && value.length() <= 5) {
   
} else if (value.length() > 5 && value.length() <= 7) {
   
} else if (value.length() > 7) {
   
}

Optimized:

int length = value.length(); 
if (length == 2) {
           
} else if (length == 3) {
   
} else if (length > 3 && length <= 5) {
   
} else if (length > 5 && length <= 7) {
    
} else if (length > 7) { 
  
}

The optimized code is more concise. String.length() itself is cheap, but if the method being called repeatedly were expensive, its cost would be multiplied by the number of calls.

3. Pay close attention to the libraries you use

To reuse code and save effort, we all rely to some degree on class libraries written by others. Before using them, we should understand how they work and choose the approach that fits our actual situation, to avoid pitfalls.

1). The substring method

The substring method is a convenient way to extract part of a string, but because strings are immutable it returns a new string on every call. The following line therefore creates several String instances:

value = value.substring(0, 2) + "***" + value.substring(length - 2);

Use StringBuilder's append(CharSequence s, int start, int end) method to optimize:

public AbstractStringBuilder append(CharSequence s, int start, int end) {
    if (s == null)
        s = "null";
    if ((start < 0) || (start > end) || (end > s.length()))
        throw new IndexOutOfBoundsException(
            "start " + start + ", end " + end + ", s.length() "
            + s.length());
    int len = end - start;
    ensureCapacityInternal(count + len);
    for (int i = start, j = count; i < end; i++, j++)
        value[j] = s.charAt(i);
    count += len;
    return this;
}

This method copies the characters one by one in a for loop, which is not the most efficient approach. It would be better if the JDK optimized it further. One possible optimization (a suggestion, not actual JDK code) is a String-specific overload:

public AbstractStringBuilder append(String s, int start, int end) {
    if (s == null)
        s = "null";
    if ((start < 0) || (start > end) || (end > s.length()))
        throw new IndexOutOfBoundsException(
            "start " + start + ", end " + end + ", s.length() "
            + s.length());
    int len = end - start;
    ensureCapacityInternal(count + len);
    s.getChars(start, end, value, count); // replaces the for loop above
    count += len;
    return this;
}

The masking line after optimization:

StringBuilder str = new StringBuilder(length);
str.append(value, 0, 2).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(value, length - 2, length);    
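The two styles can be placed side by side to confirm that they produce the same output. This is a minimal sketch for the 7-character case only; the class and method names are invented, and MASK stands in for DESENSITIZE_CODE.

```java
public class AppendRangeDemo {

    private static final char MASK = '*';

    // Each substring call creates an intermediate String before the pieces are joined.
    public static String maskWithSubstring(String value) {
        int length = value.length();
        return value.substring(0, 2) + "***" + value.substring(length - 2);
    }

    // Copies the character ranges directly into one pre-sized builder.
    public static String maskWithAppend(String value) {
        int length = value.length();
        StringBuilder sb = new StringBuilder(length);
        sb.append(value, 0, 2);
        sb.append(MASK).append(MASK).append(MASK);
        sb.append(value, length - 2, length);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskWithSubstring("abcdefg")); // ab***fg
        System.out.println(maskWithAppend("abcdefg"));    // ab***fg
    }
}
```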

2). The leftPad method used in the code above delegates to another leftPad overload and also relies on substring and concat, which generate redundant instances, so it is not recommended here:

public static String leftPad(final String str, final int size, String padStr) {
        if (str == null) {
            return null;
        }
        if (isEmpty(padStr)) {
            padStr = SPACE;
        }
        final int padLen = padStr.length();
        final int strLen = str.length();
        final int pads = size - strLen;
        if (pads <= 0) {
            return str; // returns original String when possible
        }
        if (padLen == 1 && pads <= PAD_LIMIT) {
            return leftPad(str, size, padStr.charAt(0));
        }

        if (pads == padLen) {
            return padStr.concat(str);
        } else if (pads < padLen) {
            return padStr.substring(0, pads).concat(str);
        } else {
            final char[] padding = new char[pads];
            final char[] padChars = padStr.toCharArray();
            for (int i = 0; i < pads; i++) {
                padding[i] = padChars[i % padLen];
            }
            return new String(padding).concat(str);
        }
    }
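For the single-character padding needed in the masking code, a small helper can write straight into one pre-sized StringBuilder. The following is only a sketch of that idea; leftPadChar is a hypothetical helper and not part of the StringUtils class shown above.

```java
public class PadDemo {

    // Left-pads str with padChar up to size characters, using a single builder.
    public static String leftPadChar(String str, int size, char padChar) {
        if (str == null) {
            return null;
        }
        int pads = size - str.length();
        if (pads <= 0) {
            return str; // already long enough: return the original instance
        }
        StringBuilder sb = new StringBuilder(size);
        for (int i = 0; i < pads; i++) {
            sb.append(padChar);
        }
        return sb.append(str).toString();
    }

    public static void main(String[] args) {
        System.out.println(leftPadChar("7", 3, '*'));   // **7
        System.out.println(leftPadChar("abc", 2, '*')); // abc
    }
}
```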

4. Use of StringBuilder

1). As the benchmark above shows, prefer StringBuilder over "+" when building strings in a loop; the details are not repeated here.

2). Try to set capacity for StringBuilder

When the final length is predictable, set the initial capacity of the StringBuilder. If the string is shorter than the default capacity (16 characters), this avoids allocating more memory than needed; if it is longer, it avoids the cost of expanding StringBuilder's internal char array.
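The effect can be observed through StringBuilder.capacity(). The sketch below (class and method names are made up) fills a builder with mask characters and reports the capacity afterwards. The exact size after growth depends on the JDK's expansion policy, so only the pre-sized case is annotated with a value.

```java
public class CapacityDemo {

    // Capacity of a builder created with the no-argument constructor.
    public static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Appends n mask characters and returns the capacity afterwards.
    public static int capacityAfterFill(int initialCapacity, int n) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (int i = 0; i < n; i++) {
            sb.append('*');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());         // 16
        System.out.println(capacityAfterFill(40, 40)); // 40: sized up front, no expansion
        System.out.println(capacityAfterFill(16, 40)); // larger than 16: the array was expanded
    }
}
```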

3). StringBuilder has many append overloads, and it pays to understand what each one is for, such as append(CharSequence s, int start, int end), used above in place of substring.
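One detail worth knowing when choosing between overloads: for a CharSequence the third argument is an exclusive end index, while for a char[] it is a length. A short sketch with invented class and method names:

```java
public class AppendOverloadsDemo {

    // append(CharSequence, start, end): the last argument is an exclusive END index.
    public static String rangeFromString(String s, int start, int end) {
        return new StringBuilder().append(s, start, end).toString();
    }

    // append(char[], offset, len): the last argument is a LENGTH.
    public static String rangeFromChars(char[] chars, int offset, int len) {
        return new StringBuilder().append(chars, offset, len).toString();
    }

    public static void main(String[] args) {
        System.out.println(rangeFromString("abcdef", 2, 4));              // cd
        System.out.println(rangeFromChars("abcdef".toCharArray(), 2, 4)); // cdef
    }
}
```

Passing the same numbers to both overloads yields different substrings, which is an easy mistake to make when switching between them.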

5. The optimized code is as follows:


public class DesensitizeUtils {
	
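    private static final char DESENSITIZE_CODE = '*';

    /**
     * Masks the value according to its length (by splitting it).
     *
     * @param value the string to mask
     * @return the masked string; it has the same length as the input, except that
     *         inputs of length 4 and 6 come out one character longer
     */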
    public static String desensitizeByLength(String value) {
        if (StringUtils.isBlank(value)) {
            return StringUtils.EMPTY;
        }
        int length = value.length();
        if (length == 1) {
            return value;
        }
        StringBuilder str = new StringBuilder(length);
        switch (length) {
            case 2:
                str.append(value, 0, 1).append(DESENSITIZE_CODE);
                break;
            case 3:
                str.append(value, 0, 1).append(DESENSITIZE_CODE).append(value, length - 1, length);
                break;
            case 4:
            case 5:
                str.append(value, 0, 1).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(value, length - 2, length);
                break;
            case 6:
            case 7:
                str.append(value, 0, 2).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(DESENSITIZE_CODE).append(value, length - 2, length);
                break;
            default:
                str.append(value, 0, 3);
                for (int i = 0; i < length - 6; i++) {
                    str.append(DESENSITIZE_CODE);
                }
                str.append(value, length - 3, length);
                break;
        }
        return str.toString();
    }


    /**
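     * Masking strategy for Chinese names:
     * 0. One character or fewer: return as is
     * 1. Two characters: hide the surname
     * 2. Three or more characters: keep only the first and the last; replace the rest with asterisks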
     *
     * @param fullName
     * @return
     */
    public static String desensitizeChineseName(final String fullName) {
        if (StringUtils.isBlank(fullName)) {
            return StringUtils.EMPTY;
        }
        int length = fullName.length();
        switch (length) {
            case 1:
                return fullName;
            case 2:
                StringBuilder str = new StringBuilder(2);
                return str.append(DESENSITIZE_CODE).append(fullName, length - 1, length).toString();
            default:
                str = new StringBuilder(length);
                str.append(fullName, 0, 1);
                for (int i = 0; i < length - 2; i++) {
                    str.append(DESENSITIZE_CODE);
                }
                str.append(fullName, length - 1, length);
                return str.toString();
        }
    }
}

6. Performance comparison:

Test code:

    private static final String testString = "akkadmmajkkakkajjk";
    @Benchmark
    public void testDesensitizeByLengthOld() {
        desensitizeByLengthOld(testString);
    }

    @Benchmark
    public void testDesensitizeChineseNameOld() {
        desensitizeChineseNameOld(testString);
    }

    @Benchmark
    public void testDesensitizeByLength() {
        desensitizeByLength(testString);
    }

    @Benchmark
    public void testDesensitizeChineseName() {
        desensitizeChineseName(testString);
    }


    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(DesensitizeUtilsBenchmark.class.getSimpleName())
                .output("./DesensitizeUtilsBenchmark.log")
                .build();
        new Runner(options).run();
    }

Test Results:

Benchmark                                                 Mode  Cnt       Score      Error   Units
DesensitizeUtilsBenchmark.testDesensitizeByLength        thrpt   20   61460.601 ± 7262.830  ops/ms
DesensitizeUtilsBenchmark.testDesensitizeByLengthOld     thrpt   20   11700.417 ± 1402.169  ops/ms
DesensitizeUtilsBenchmark.testDesensitizeChineseName     thrpt   20  117560.449 ±  731.851  ops/ms
DesensitizeUtilsBenchmark.testDesensitizeChineseNameOld  thrpt   20   39682.513 ±  463.306  ops/ms

The test cases above are few and do not cover every situation, and the benchmarks as configured here do not show how the optimizations affect GC. They are offered only as ideas for reference.
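One inexpensive way to widen coverage is to check that the old and new implementations agree over a range of input lengths. The sketch below does this for the length-based masking only. It uses condensed copies of the two methods with the blank-input check left out, and all names (MaskEquivalenceCheck, maskOld, maskNew, firstMismatch) are invented for illustration.

```java
public class MaskEquivalenceCheck {

    private static final char MASK = '*';

    // Condensed copy of desensitizeByLengthOld.
    public static String maskOld(String value) {
        int length = value.length();
        if (length == 2) {
            return value.substring(0, 1) + "*";
        } else if (length == 3) {
            return value.substring(0, 1) + "*" + value.substring(length - 1);
        } else if (length > 3 && length <= 5) {
            return value.substring(0, 1) + "**" + value.substring(length - 2);
        } else if (length > 5 && length <= 7) {
            return value.substring(0, 2) + "***" + value.substring(length - 2);
        } else if (length > 7) {
            String stars = "";
            for (int i = 0; i < length - 6; i++) {
                stars += "*";
            }
            return value.substring(0, 3) + stars + value.substring(length - 3);
        }
        return value;
    }

    // Condensed copy of desensitizeByLength (blank-input handling omitted).
    public static String maskNew(String value) {
        int length = value.length();
        if (length <= 1) {
            return value;
        }
        StringBuilder sb = new StringBuilder(length);
        switch (length) {
            case 2:
                sb.append(value, 0, 1).append(MASK);
                break;
            case 3:
                sb.append(value, 0, 1).append(MASK).append(value, 2, 3);
                break;
            case 4:
            case 5:
                sb.append(value, 0, 1).append(MASK).append(MASK)
                  .append(value, length - 2, length);
                break;
            case 6:
            case 7:
                sb.append(value, 0, 2).append(MASK).append(MASK).append(MASK)
                  .append(value, length - 2, length);
                break;
            default:
                sb.append(value, 0, 3);
                for (int i = 0; i < length - 6; i++) {
                    sb.append(MASK);
                }
                sb.append(value, length - 3, length);
        }
        return sb.toString();
    }

    // Returns the first length in 1..maxLength where the versions disagree, or -1 if none.
    public static int firstMismatch(int maxLength) {
        StringBuilder input = new StringBuilder();
        for (int len = 1; len <= maxLength; len++) {
            input.append((char) ('a' + (len - 1) % 26));
            String s = input.toString();
            if (!maskOld(s).equals(maskNew(s))) {
                return len;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMismatch(50)); // -1: no mismatch found
    }
}
```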

