Practical tips | The key to better performance: what attention to code details can deliver

Foreword

As we all know, code is the core of a project, and even a small piece of code can affect the experience of the whole project. A project's path from 0 to 1, and from growth to maturity, depends on careful polishing of its code. Details determine success or failure, and that is especially true of an excellent open source project. This article takes the performance improvements in ShardingSphere 5.1.0 as an example to show what attention to code details can deliver.

Wu Weijie, SphereEx Infrastructure R&D Engineer, Apache ShardingSphere Committer. Currently focusing on the research and development of Apache ShardingSphere and its sub-project ElasticJob.

Optimizations

Use Optional correctly

java.util.Optional, introduced in Java 8, can make code more elegant, for example by avoiding methods that return null directly. Optional has two commonly used methods:

public T orElse(T other) {
    return value != null ? value : other;
}

public T orElseGet(Supplier<? extends T> other) {
    return value != null ? value : other.get();
}

The ShardingSphere class org.apache.shardingsphere.infra.binder.segment.select.orderby.engine.OrderByContextEngine contained the following code that uses Optional:

Optional<OrderByContext> result = // code omitted...
return result.orElse(getDefaultOrderByContextWithoutOrderBy(groupByContext));

With orElse used as above, the method passed as the argument to orElse is called even when result is not empty. If that method performs modifications, unexpected things may happen. When the argument is a method call, the code should be changed to the following:

Optional<OrderByContext> result = // code omitted...
return result.orElseGet(() -> getDefaultOrderByContextWithoutOrderBy(groupByContext));

A lambda supplies a Supplier to orElseGet, so the method inside orElseGet is called only when result is empty.

Related PR: https://github.com/apache/shardingsphere/pull/11459/files
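
The difference between the two calls can be observed with a small, self-contained sketch. The class OptionalFallbackDemo and the method createDefault are illustrative names invented for this example, not ShardingSphere code; createDefault simply stands in for a costly or side-effecting fallback such as getDefaultOrderByContextWithoutOrderBy.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

final class OptionalFallbackDemo {

    // Counts how often the (hypothetical) fallback factory runs.
    static final AtomicInteger FALLBACK_CALLS = new AtomicInteger();

    // Stand-in for an expensive or side-effecting default value factory.
    static String createDefault() {
        FALLBACK_CALLS.incrementAndGet();
        return "default";
    }

    // Eager: createDefault() is evaluated before orElse is entered, even if a value is present.
    static String eager(final Optional<String> value) {
        return value.orElse(createDefault());
    }

    // Lazy: the supplier is invoked only when the Optional is empty.
    static String lazy(final Optional<String> value) {
        return value.orElseGet(OptionalFallbackDemo::createDefault);
    }

    public static void main(final String[] args) {
        eager(Optional.of("present"));
        System.out.println("fallback calls with orElse: " + FALLBACK_CALLS.get());
        FALLBACK_CALLS.set(0);
        lazy(Optional.of("present"));
        System.out.println("fallback calls with orElseGet: " + FALLBACK_CALLS.get());
    }
}
```

When the fallback is a constant or an already computed value, orElse remains perfectly fine; the lazy form matters only when producing the fallback costs something or has side effects.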

Avoid high-frequency concurrent calls to computeIfAbsent of Java 8 ConcurrentHashMap

java.util.concurrent.ConcurrentHashMap is a Map commonly used in concurrent scenarios. Compared with java.util.Hashtable, which synchronizes all operations, ConcurrentHashMap provides better performance while still ensuring thread safety. However, in the Java 8 implementation, computeIfAbsent of ConcurrentHashMap still obtains the value inside a synchronized block even when the key already exists, which greatly hurts concurrent performance when computeIfAbsent is called frequently on the same key.

Reference: https://bugs.openjdk.java.net/browse/JDK-8161372

This problem was solved in Java 9, but to preserve concurrent performance on Java 8 we changed the ShardingSphere code to avoid it.

Take a frequently called class of ShardingSphere as an example: org.apache.shardingsphere.infra.executor.sql.prepare.driver.DriverExecutionPrepareEngine

    // Some code omitted...
    private static final Map<String, SQLExecutionUnitBuilder> TYPE_TO_BUILDER_MAP = new ConcurrentHashMap<>(81);
    // Some code omitted...
    public DriverExecutionPrepareEngine(final String type, final int maxConnectionsSizePerQuery, final ExecutorDriverManager<C, ?, ?> executorDriverManager,
                                        final StorageResourceOption option, final Collection<ShardingSphereRule> rules) {
        super(maxConnectionsSizePerQuery, rules);
        this.executorDriverManager = executorDriverManager;
        this.option = option;
        sqlExecutionUnitBuilder = TYPE_TO_BUILDER_MAP.computeIfAbsent(type,
                key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
    }

In the code above, type has only two possible values, and almost every SQL execution passes through this code. That means computeIfAbsent is called concurrently and frequently on the same key, which limits concurrent performance. We work around the problem as follows:

SQLExecutionUnitBuilder result;
if (null == (result = TYPE_TO_BUILDER_MAP.get(type))) {
    result = TYPE_TO_BUILDER_MAP.computeIfAbsent(type, key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
}
return result;

Related PR: https://github.com/apache/shardingsphere/pull/13275/files
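
The same "read first, compute only on a miss" idea can be written as a small reusable helper. This is a sketch under assumptions: ReadFirstCache and getOrCompute are hypothetical names for illustration and are not part of ShardingSphere or the JDK.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class ReadFirstCache {

    private ReadFirstCache() {
    }

    // ConcurrentHashMap.get() does not lock, so the common "key already present" case stays cheap.
    // computeIfAbsent is reached only on a miss and still ensures a single stored value per key.
    static <K, V> V getOrCompute(final ConcurrentHashMap<K, V> cache, final K key, final Function<? super K, ? extends V> loader) {
        V result = cache.get(key);
        if (null == result) {
            result = cache.computeIfAbsent(key, loader);
        }
        return result;
    }

    public static void main(final String[] args) {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        AtomicInteger loads = new AtomicInteger();
        // Hypothetical loader standing in for an expensive service lookup.
        Function<String, String> loader = type -> {
            loads.incrementAndGet();
            return type + "-builder";
        };
        getOrCompute(cache, "JDBC.STATEMENT", loader);
        getOrCompute(cache, "JDBC.STATEMENT", loader);
        System.out.println("loader invocations: " + loads.get());
    }
}
```

On Java 9 and later the extra get() is no longer needed for this purpose, but it is harmless, so the same code can be kept across JDK versions.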

Avoid frequent calls to java.util.Properties

java.util.Properties is a class commonly used in ShardingSphere configuration. Properties extends java.util.Hashtable, so frequent calls to Properties methods in concurrent situations should be avoided.

We found that org.apache.shardingsphere.sharding.algorithm.sharding.inline.InlineShardingAlgorithm, a class related to ShardingSphere's data sharding algorithms, contained logic that called getProperty at high frequency, which limited concurrent performance. Our fix was to move the logic that calls Properties methods into the init method of InlineShardingAlgorithm, so that the sharding calculation itself no longer pays that concurrency cost.

Related PR: https://github.com/apache/shardingsphere/pull/13282/files
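
A minimal sketch of this pattern follows. The class PrefixShardingAlgorithm, the property key table-prefix, and its default value are all hypothetical and exist only for illustration; the real InlineShardingAlgorithm is more involved. The point is that Properties is read once in init and the hot path only touches a plain field.

```java
import java.util.Properties;

final class PrefixShardingAlgorithm {

    // Hypothetical property key, used only in this illustration.
    static final String PREFIX_KEY = "table-prefix";

    private final Properties props = new Properties();

    // Cached copy of the property value, assigned once in init().
    private String tablePrefix;

    Properties getProps() {
        return props;
    }

    // Called once after configuration; the only place that reads from Properties.
    void init() {
        tablePrefix = props.getProperty(PREFIX_KEY, "t_order_");
    }

    // Hot path: uses the cached field instead of calling Properties.getProperty on every invocation.
    String doSharding(final long shardingValue, final int shardCount) {
        return tablePrefix + Math.floorMod(shardingValue, (long) shardCount);
    }

    public static void main(final String[] args) {
        PrefixShardingAlgorithm algorithm = new PrefixShardingAlgorithm();
        algorithm.getProps().setProperty(PREFIX_KEY, "t_user_");
        algorithm.init();
        System.out.println(algorithm.doSharding(7L, 4));
    }
}
```

The trade-off is that changes made to the Properties object after init are no longer picked up, which is acceptable when configuration is fixed at initialization time.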

Avoid Collections.synchronizedMap

While investigating threads in the Monitor Blocked state in ShardingSphere, we found that the class org.apache.shardingsphere.infra.metadata.schema.model.TableMetaData wrapped Maps that are read at high frequency with Collections.synchronizedMap, which hurt concurrent performance. Analysis showed that the wrapped Maps are modified only during initialization and that all later operations are reads, so we simply removed the Collections.synchronizedMap wrapper.

Related PR: https://github.com/apache/shardingsphere/pull/13264/files
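
The following sketch illustrates the idea with a hypothetical class, TableColumns, which is not the real TableMetaData. The Map is filled only in the constructor and stored in a final field, so later reads need no synchronization. Wrapping it with Collections.unmodifiableMap is an extra guard added for this example, not something taken from the ShardingSphere change.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class TableColumns {

    // Populated only inside the constructor and never modified afterwards.
    private final Map<String, Integer> columnIndexes;

    TableColumns(final Collection<String> columnNames) {
        Map<String, Integer> indexes = new LinkedHashMap<>();
        int index = 0;
        for (String each : columnNames) {
            indexes.put(each.toLowerCase(), index++);
        }
        // Final field plus no later writes: readers in other threads need no lock.
        columnIndexes = Collections.unmodifiableMap(indexes);
    }

    // Read-only lookup; no monitor is acquired here. Returns -1 for unknown columns.
    int indexOf(final String columnName) {
        Integer result = columnIndexes.get(columnName.toLowerCase());
        return null == result ? -1 : result;
    }

    public static void main(final String[] args) {
        TableColumns columns = new TableColumns(Arrays.asList("id", "Name"));
        System.out.println(columns.indexOf("NAME"));
    }
}
```

This only works when the Map really is never modified after construction; if writes can happen later, a concurrent Map is the safer choice.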

Use string concatenation instead of unnecessary String.format

The ShardingSphere class org.apache.shardingsphere.sql.parser.sql.common.constant.QuoteCharacter contained the following logic:

public String wrap(final String value) {
    return String.format("%s%s%s", startDelimiter, value, endDelimiter);
}

Obviously, the logic above is just string concatenation, but String.format costs more than concatenating the strings directly. We changed it as follows:

public String wrap(final String value) {
    return startDelimiter + value + endDelimiter;
}

We ran a simple test with JMH. The results:

# JMH version: 1.33
# VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
Benchmark                           Mode  Cnt          Score         Error  Units
StringConcatBenchmark.benchFormat  thrpt    9   28490416.644 ± 1377409.528  ops/s
StringConcatBenchmark.benchPlus    thrpt    9  163475708.153 ± 1748461.858  ops/s

The results show that concatenating strings with String.format costs more than concatenating them with +: in this run, + reached roughly 5.7 times the throughput of String.format. In addition, the performance of direct string concatenation has been optimized since Java 9. This shows how much choosing an appropriate way to concatenate strings matters.

Related PR: https://github.com/apache/shardingsphere/pull/11291/files

Use for-each instead of high-frequency streams

ShardingSphere 5.x code makes heavy use of java.util.stream.Stream.

In an earlier stress test of ShardingSphere-JDBC + openGauss with BenchmarkSQL (a Java implementation of the TPC-C test), we found that replacing all the high-frequency streams identified during the test with for-each loops clearly improved the performance of ShardingSphere-JDBC.

* Note: ShardingSphere-JDBC and openGauss use Bisheng JDK 8 on two 128-core aarch64 machines respectively.

The results above may also be related to the aarch64 platform and the JDK. Still, a stream carries some overhead of its own, and its performance varies greatly between scenarios. For logic that is called frequently, where it is not certain that a stream improves performance, we prefer a for-each loop.

Related PR: https://github.com/apache/shardingsphere/pull/13845/files
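
To make the kind of rewrite concrete, here is a small sketch showing the same logic as a stream pipeline and as a for-each loop. The class StreamVersusLoop and its methods are invented for this example and do not come from the ShardingSphere code base. The sketch shows only that the two forms are equivalent in behavior; it makes no claim about their relative speed, which should be measured in the actual scenario.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

final class StreamVersusLoop {

    private StreamVersusLoop() {
    }

    // Stream version: concise, but builds a pipeline on every call.
    static List<String> upperCaseNonEmptyWithStream(final Collection<String> names) {
        return names.stream().filter(each -> !each.isEmpty()).map(String::toUpperCase).collect(Collectors.toList());
    }

    // for-each version: same result with a plain loop and a pre-sized list.
    static List<String> upperCaseNonEmptyWithLoop(final Collection<String> names) {
        List<String> result = new ArrayList<>(names.size());
        for (String each : names) {
            if (!each.isEmpty()) {
                result.add(each.toUpperCase());
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        List<String> names = Arrays.asList("t_order", "", "t_user");
        System.out.println(upperCaseNonEmptyWithStream(names).equals(upperCaseNonEmptyWithLoop(names)));
    }
}
```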

Avoid unnecessary (repeated) calls to logic

There are many cases in which unnecessary repeated calls can be avoided:

hashCode calculation

ShardingSphere has a class, org.apache.shardingsphere.sharding.route.engine.condition.Column, that implements the equals and hashCode methods:

@RequiredArgsConstructor
@Getter
@ToString
public final class Column {

    private final String name;

    private final String tableName;

    @Override
    public boolean equals(final Object obj) {...}

    @Override
    public int hashCode() {
        return Objects.hashCode(name.toUpperCase(), tableName.toUpperCase()); 
    } 
}

Obviously, the class above is immutable, yet its hashCode method recomputes the hash code on every call. If such objects are frequently accessed in a Map or Set, a lot of unnecessary computation results.

After the adjustment:

@Getter
@ToString
public final class Column {

    private final String name;

    private final String tableName;

    private final int hashCode;

    public Column(final String name, final String tableName) {
        this.name = name;
        this.tableName = tableName;
        hashCode = Objects.hash(name.toUpperCase(), tableName.toUpperCase());
    }

    @Override
    public boolean equals(final Object obj) {...}

    @Override
    public int hashCode() {
        return hashCode;
    } 
}

Related PR: https://github.com/apache/shardingsphere/pull/11760/files

Use lambdas instead of reflection to call methods

In the ShardingSphere source code, the following scenarios need to record method calls together with their arguments and replay them on a specified object when needed:

  1. Send statements such as begin to ShardingSphere-Proxy;

  2. Use ShardingSpherePreparedStatement to set parameters for placeholders at specified locations.

Take the following code as an example. Before refactoring, reflection was used to record method calls and replay them. Reflection has a certain performance overhead, and the code was not very readable:

@Override
public void begin() {
    recordMethodInvocation(Connection.class, "setAutoCommit", new Class[]{boolean.class}, new Object[]{false});
}

After refactoring, the overhead of calling the method using reflection is avoided:

@Override
public void begin() {
    connection.getConnectionPostProcessors().add(target -> {
        try {
            target.setAutoCommit(false);
        } catch (final SQLException ex) {
            throw new RuntimeException(ex);
        }
    });
}

Related PRs:

https://github.com/apache/shardingsphere/pull/10466/files

https://github.com/apache/shardingsphere/pull/11415/files
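
The record-and-replay idea can be sketched in isolation as follows. ReplayDemo, FakeConnection and replayOn are hypothetical names used only for this illustration, and java.util.function.Consumer stands in for whatever post-processor type the real code uses. Each recorded call is stored as a lambda and later applied to the target object directly, with no reflective lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ReplayDemo {

    // Hypothetical stand-in for a real connection object.
    static final class FakeConnection {

        private boolean autoCommit = true;

        void setAutoCommit(final boolean autoCommit) {
            this.autoCommit = autoCommit;
        }

        boolean isAutoCommit() {
            return autoCommit;
        }
    }

    // Recorded calls, each stored as a lambda instead of a method name plus argument arrays.
    private final List<Consumer<FakeConnection>> postProcessors = new ArrayList<>();

    // Records the call; nothing is executed yet.
    void begin() {
        postProcessors.add(target -> target.setAutoCommit(false));
    }

    // Replays every recorded call on the given target through ordinary method invocation.
    void replayOn(final FakeConnection target) {
        for (Consumer<FakeConnection> each : postProcessors) {
            each.accept(target);
        }
    }

    public static void main(final String[] args) {
        ReplayDemo demo = new ReplayDemo();
        demo.begin();
        FakeConnection connection = new FakeConnection();
        demo.replayOn(connection);
        System.out.println("autoCommit after replay: " + connection.isAutoCommit());
    }
}
```

Besides avoiding reflection, this form lets the compiler check the method name and argument types, which a string-based record cannot do.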

Netty Epoll support for aarch64

Netty's Epoll implementation has supported Linux on the aarch64 architecture since 4.1.50.Final. In an aarch64 Linux environment, using the Netty Epoll API can improve performance compared with the Netty NIO API.

Reference: https://stackoverflow.com/a/23465481/7913731

5.1.0 and 5.0.0 ShardingSphere-Proxy TPC-C performance test comparison

We benchmarked ShardingSphere-Proxy with TPC-C to verify the results of the performance optimizations. Because earlier versions of ShardingSphere-Proxy had limited PostgreSQL support and could not run the TPC-C test, versions 5.0.0 and 5.1.0 are used for the comparison.

In order to highlight the performance loss of ShardingSphere-Proxy itself, this test will use ShardingSphere-Proxy with data sharding (1 shard) to compare with PostgreSQL 14.2.

The test follows "BenchmarkSQL Performance Test ( https://shardingsphere.apache.org/document/current/cn/reference/test/performance-test/benchmarksql-test/ )" in the official documentation, with the configuration reduced from 4 shards to 1 shard.

test environment

Test parameters

BenchmarkSQL parameters:

  • warehouses=192 (data volume)
  • terminals=192 (number of concurrency)
  • terminalWarehouseFixed=false
  • Running time 30 mins

PostgreSQL JDBC parameters:

  • defaultRowFetchSize=50
  • reWriteBatchedInserts=true

ShardingSphere-Proxy JVM parameters (partial):

  • -Xmx16g
  • -Xms16g
  • -Xmn12g
  • -XX:AutoBoxCacheMax=4096
  • -XX:+UseNUMA
  • -XX:+DisableExplicitGC
  • -XX:LargePageSizeInBytes=128m
  • -XX:+SegmentedCodeCache
  • -XX:+AggressiveHeap

Test Results

Conclusions for the environment and scenario used in this article:

  • Taking ShardingSphere-Proxy 5.0.0 + PostgreSQL as the baseline, 5.1.0 improves performance by about 26.8%.
  • Taking a direct connection to PostgreSQL as the baseline, ShardingSphere-Proxy 5.1.0 reduces the performance loss by about 15 percentage points compared with 5.0.0, from 42.7% to 27.4%.

Since the code details are optimized throughout each module of ShardingSphere, the above test results do not cover all optimization points.

How to view performance issues

From time to time, someone may ask, "How is the performance of ShardingSphere? How much is the loss?"

In my opinion, the performance can meet the needs. Performance is a complex issue, affected by many factors. In different environments and scenarios, the performance loss of ShardingSphere may be less than 1% or as high as 50%. We cannot give the answer without the environment and scenario. In addition, as an infrastructure, the performance of ShardingSphere is one of the key considerations in the research and development process. Teams and individuals in the ShardingSphere community will continue to exert their craftsmanship to push the performance of ShardingSphere to the extreme.


Origin my.oschina.net/u/5137513/blog/5481410