Foreword
As we all know, code is the core of a project, and even a small piece of code can affect the experience of the whole project. A project's journey from 0 to 1, and from growth to maturity, is inseparable from careful polishing of its code. Details determine success or failure, and this is especially true of an excellent open source project. This article uses the performance improvements in ShardingSphere 5.1.0 as an example to show what attention to code details can deliver, and how small changes add up to a leap in performance.
Wu Weijie, SphereEx Infrastructure R&D Engineer, Apache ShardingSphere Committer. Currently focusing on the research and development of Apache ShardingSphere and its sub-project ElasticJob.
Optimizations
Use Optional correctly
java.util.Optional, introduced in Java 8, can make code more elegant, for example by avoiding methods that return null directly. Optional has two commonly used methods:
public T orElse(T other) {
    return value != null ? value : other;
}
public T orElseGet(Supplier<? extends T> other) {
    return value != null ? value : other.get();
}
The ShardingSphere class org.apache.shardingsphere.infra.binder.segment.select.orderby.engine.OrderByContextEngine contained the following code that uses Optional:
Optional<OrderByContext> result = // code omitted...
return result.orElse(getDefaultOrderByContextWithoutOrderBy(groupByContext));
In the code above, the method passed as the argument to orElse is called even when result is not empty. If that method performs modifications, unexpected things may happen. When the argument is a method call, the code should be changed to the following:
Optional<OrderByContext> result = // code omitted...
return result.orElseGet(() -> getDefaultOrderByContextWithoutOrderBy(groupByContext));
Using a lambda to provide a Supplier to orElseGet means the method inside is only called when result is empty.
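The difference is easy to observe in isolation. The following is a minimal sketch, not ShardingSphere code: the class OptionalDefaultDemo and the method expensiveDefault are hypothetical names, and a counter records how often the default value is computed.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

final class OptionalDefaultDemo {

    // Counts how many times the (hypothetical) default factory runs.
    static final AtomicInteger CALLS = new AtomicInteger();

    static String expensiveDefault() {
        CALLS.incrementAndGet();
        return "default";
    }

    static String eager(final Optional<String> value) {
        // The argument is evaluated before orElse is invoked,
        // so expensiveDefault() runs even when a value is present.
        return value.orElse(expensiveDefault());
    }

    static String lazy(final Optional<String> value) {
        // The supplier is only invoked when the Optional is empty.
        return value.orElseGet(OptionalDefaultDemo::expensiveDefault);
    }

    public static void main(final String[] args) {
        Optional<String> present = Optional.of("value");
        eager(present);
        System.out.println("default computed after orElse: " + CALLS.get());
        lazy(present);
        System.out.println("default computed after orElseGet: " + CALLS.get());
    }
}
```

With a present value, the counter increases during the eager call and stays unchanged during the lazy call.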
Related PR: https://github.com/apache/shardingsphere/pull/11459/files
Avoid high-frequency concurrent calls to computeIfAbsent of Java 8 ConcurrentHashMap
java.util.concurrent.ConcurrentHashMap is a Map that we commonly use in concurrent scenarios. Compared with java.util.Hashtable, which synchronizes all operations, ConcurrentHashMap provides better performance while ensuring thread safety. However, in the Java 8 implementation, computeIfAbsent of ConcurrentHashMap still obtains the value inside a synchronized block when the key already exists, which greatly affects concurrency performance when computeIfAbsent is called frequently on the same key.
Reference: https://bugs.openjdk.java.net/browse/JDK-8161372
This problem was solved in Java 9, but to preserve concurrent performance on Java 8, we changed how ShardingSphere calls computeIfAbsent to avoid it.
Take a frequently called class of ShardingSphere as an example: org.apache.shardingsphere.infra.executor.sql.prepare.driver.DriverExecutionPrepareEngine
// some code omitted...
private static final Map<String, SQLExecutionUnitBuilder> TYPE_TO_BUILDER_MAP = new ConcurrentHashMap<>(8, 1);
// some code omitted...
public DriverExecutionPrepareEngine(final String type, final int maxConnectionsSizePerQuery, final ExecutorDriverManager<C, ?, ?> executorDriverManager,
                                    final StorageResourceOption option, final Collection<ShardingSphereRule> rules) {
    super(maxConnectionsSizePerQuery, rules);
    this.executorDriverManager = executorDriverManager;
    this.option = option;
    sqlExecutionUnitBuilder = TYPE_TO_BUILDER_MAP.computeIfAbsent(type,
            key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
}
The type passed into the code above has only two possible values, and this code is on the path of most SQL executions. This means computeIfAbsent is called concurrently and frequently on the same key, which limits concurrency performance. We work around the problem as follows:
SQLExecutionUnitBuilder result;
if (null == (result = TYPE_TO_BUILDER_MAP.get(type))) {
    result = TYPE_TO_BUILDER_MAP.computeIfAbsent(type, key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
}
return result;
Related PR: https://github.com/apache/shardingsphere/pull/13275/files
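The same get-then-computeIfAbsent pattern can be written as a small generic helper. This is only a sketch under assumptions: the class ComputeIfAbsentDemo, the helper getOrCompute and the loader load are hypothetical names introduced for illustration and are not part of ShardingSphere.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class ComputeIfAbsentDemo {

    static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Counts how often the (hypothetical) loader actually runs.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Try the plain get first; only fall back to computeIfAbsent on a miss,
    // so hits on an existing key never reach computeIfAbsent.
    static <K, V> V getOrCompute(final Map<K, V> map, final K key, final Function<? super K, ? extends V> loader) {
        V result = map.get(key);
        if (null == result) {
            result = map.computeIfAbsent(key, loader);
        }
        return result;
    }

    static String load(final String type) {
        LOADS.incrementAndGet();
        return "builder-for-" + type;
    }

    public static void main(final String[] args) {
        System.out.println(getOrCompute(CACHE, "typeA", ComputeIfAbsentDemo::load));
        System.out.println(getOrCompute(CACHE, "typeA", ComputeIfAbsentDemo::load));
        System.out.println("loader invocations: " + LOADS.get());
    }
}
```

The helper keeps the atomicity of computeIfAbsent for the first insertion of a key, while later lookups of that key go through get only.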
Avoid frequent calls to java.util.Properties
java.util.Properties is a commonly used class in ShardingSphere configuration. Properties inherits from java.util.Hashtable, so frequent calls to Properties methods in concurrent situations should be avoided.
We found that org.apache.shardingsphere.sharding.algorithm.sharding.inline.InlineShardingAlgorithm, a class related to the data sharding algorithm in ShardingSphere, contained logic that called getProperty at high frequency, which limited concurrency performance. Our approach is to move the logic that calls Properties methods into the init method of InlineShardingAlgorithm, so that it no longer affects the concurrent performance of the sharding calculation.
Related PR: https://github.com/apache/shardingsphere/pull/13282/files
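The idea of reading Properties once during initialization can be sketched as follows. This is not the actual InlineShardingAlgorithm: the class CachedPropertyAlgorithm, the property key and the trivial sharding rule are assumptions made up for this illustration.

```java
import java.util.Properties;

final class CachedPropertyAlgorithm {

    // Hypothetical property key, used only in this sketch.
    private static final String PREFIX_KEY = "table-prefix";

    private final Properties props = new Properties();

    // Cached copy of the configured value, written once in init().
    private volatile String tablePrefix;

    Properties getProps() {
        return props;
    }

    // Called once after configuration: the only place that touches Properties.
    void init() {
        String value = props.getProperty(PREFIX_KEY);
        if (null == value) {
            throw new IllegalStateException("Missing property: " + PREFIX_KEY);
        }
        tablePrefix = value;
    }

    // Hot path: uses the cached field and never calls into Properties.
    String doSharding(final long shardingValue) {
        return tablePrefix + "_" + (shardingValue % 2);
    }

    public static void main(final String[] args) {
        CachedPropertyAlgorithm algorithm = new CachedPropertyAlgorithm();
        algorithm.getProps().setProperty(PREFIX_KEY, "t_order");
        algorithm.init();
        System.out.println(algorithm.doSharding(3L));
    }
}
```

The trade-off is that changes made to the Properties object after init are not picked up, which is acceptable when configuration is fixed after initialization.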
Avoid Collections.synchronizedMap
While investigating Monitor Blocked states in ShardingSphere, we found that the class org.apache.shardingsphere.infra.metadata.schema.model.TableMetaData used Maps wrapped with Collections.synchronizedMap that were read at high frequency, which affected concurrent performance. Analysis showed that the wrapped Maps are only modified during the initialization phase and that all subsequent operations are reads, so the Collections.synchronizedMap wrapper can simply be removed.
Related PR: https://github.com/apache/shardingsphere/pull/13264/files
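A minimal sketch of a map that is populated once and then only read is shown below. The class ReadOnlyColumns and its fields are hypothetical and much simpler than TableMetaData. The unmodifiable wrapper is an extra safeguard chosen for this sketch; the source only states that the synchronized wrapper was removed.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class ReadOnlyColumns {

    // Populated in the constructor only; read-only afterwards.
    private final Map<String, Integer> columnIndexes;

    ReadOnlyColumns(final Collection<String> columnNames) {
        Map<String, Integer> indexes = new LinkedHashMap<>();
        int index = 0;
        for (String each : columnNames) {
            indexes.put(each.toLowerCase(), index++);
        }
        // No synchronized wrapper: the map is complete before it is
        // published through a final field, and it is never modified again.
        columnIndexes = Collections.unmodifiableMap(indexes);
    }

    Map<String, Integer> getColumnIndexes() {
        return columnIndexes;
    }

    public static void main(final String[] args) {
        ReadOnlyColumns columns = new ReadOnlyColumns(Arrays.asList("ID", "Name"));
        System.out.println(columns.getColumnIndexes());
    }
}
```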
String concatenation instead of unnecessary String.format
The class org.apache.shardingsphere.sql.parser.sql.common.constant.QuoteCharacter in ShardingSphere had the following logic:
public String wrap(final String value) {
    return String.format("%s%s%s", startDelimiter, value, endDelimiter);
}
Obviously, the logic above is just a string concatenation, but the overhead of String.format is greater than that of direct string concatenation. We changed it as follows:
public String wrap(final String value) {
    return startDelimiter + value + endDelimiter;
}
We ran a simple test with JMH. The results:
# JMH version: 1.33
# VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
Benchmark                           Mode  Cnt          Score         Error  Units
StringConcatBenchmark.benchFormat  thrpt    9   28490416.644 ± 1377409.528  ops/s
StringConcatBenchmark.benchPlus    thrpt    9  163475708.153 ± 1748461.858  ops/s
It can be seen that the overhead of concatenating strings with String.format is greater than that of concatenating them with +, and direct string concatenation has been further optimized since Java 9. This shows the importance of choosing an appropriate string concatenation method.
Related PR: https://github.com/apache/shardingsphere/pull/11291/files
Use for-each instead of high-frequency streams
ShardingSphere 5.x code makes heavy use of java.util.stream.Stream.
In a BenchmarkSQL (a Java implementation of the TPC-C test) stress test of ShardingSphere-JDBC + openGauss, we found that replacing all the high-frequency streams identified during the test with for-each loops noticeably improved the performance of ShardingSphere-JDBC.
* Note: ShardingSphere-JDBC and openGauss use Bisheng JDK 8 on two 128-core aarch64 machines respectively.
The above test results may also be related to the aarch64 platform and the JDK. However, streams themselves carry a certain overhead, and their performance varies greatly between scenarios. For logic that is called frequently and where it is not certain that a stream improves performance, we prefer a for-each loop.
Related PR: https://github.com/apache/shardingsphere/pull/13845/files
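As an illustration of the kind of rewrite involved, the sketch below shows a stream pipeline and an equivalent for-each loop. The class and method names are hypothetical and the logic is not taken from ShardingSphere. It shows that the two forms produce the same result and makes no claim about their relative speed, which depends on the workload, platform and JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

final class ForEachVersusStream {

    // Stream version: filter and map through a pipeline.
    static List<String> upperNonEmptyWithStream(final Collection<String> names) {
        return names.stream().filter(each -> !each.isEmpty()).map(String::toUpperCase).collect(Collectors.toList());
    }

    // for-each version: same result, without building a stream pipeline on each call.
    static List<String> upperNonEmptyWithForEach(final Collection<String> names) {
        List<String> result = new ArrayList<>(names.size());
        for (String each : names) {
            if (!each.isEmpty()) {
                result.add(each.toUpperCase());
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        List<String> names = Arrays.asList("t_order", "", "t_order_item");
        System.out.println(upperNonEmptyWithStream(names));
        System.out.println(upperNonEmptyWithForEach(names));
    }
}
```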
Avoid unnecessary (repeated) calls of logic
There are many cases where unnecessary repeated calls of logic can be avoided:
hashCode calculation
ShardingSphere has a class org.apache.shardingsphere.sharding.route.engine.condition.Column that implements the equals and hashCode methods:
@RequiredArgsConstructor
@Getter
@ToString
public final class Column {
    private final String name;
    private final String tableName;
    @Override
    public boolean equals(final Object obj) {...}
    @Override
    public int hashCode() {
        return Objects.hashCode(name.toUpperCase(), tableName.toUpperCase());
    }
}
Obviously, the class above is immutable, but the hash code is recalculated on every call to the hashCode method. If this object is frequently accessed in a Map or Set, there will be a lot of unnecessary computational overhead.
After adjustment:
@Getter
@ToString
public final class Column {
    private final String name;
    private final String tableName;
    private final int hashCode;
    public Column(final String name, final String tableName) {
        this.name = name;
        this.tableName = tableName;
        hashCode = Objects.hash(name.toUpperCase(), tableName.toUpperCase());
    }
    @Override
    public boolean equals(final Object obj) {...}
    @Override
    public int hashCode() {
        return hashCode;
    }
}
Related PR: https://github.com/apache/shardingsphere/pull/11760/files
Use lambdas instead of reflection to call methods
The ShardingSphere source code has scenarios such as the following, where method calls and their arguments need to be recorded and then replayed on a specified object when needed:
- Sending statements such as begin to ShardingSphere-Proxy;
- Using ShardingSpherePreparedStatement to set parameters for placeholders at specified positions.
Take the following code as an example. Before refactoring, reflection was used to record method calls and replay them. Reflective invocation has a certain performance overhead, and the code is hard to read:
@Override
public void begin() {
    recordMethodInvocation(Connection.class, "setAutoCommit", new Class[]{boolean.class}, new Object[]{false});
}
After refactoring, the overhead of calling the method using reflection is avoided:
@Override
public void begin() {
    connection.getConnectionPostProcessors().add(target -> {
        try {
            target.setAutoCommit(false);
        } catch (final SQLException ex) {
            throw new RuntimeException(ex);
        }
    });
}
Related PRs:
https://github.com/apache/shardingsphere/pull/10466/files
https://github.com/apache/shardingsphere/pull/11415/files
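The record-and-replay idea can be sketched without any JDBC dependency. Everything in this example is hypothetical: FakeConnection stands in for a real connection, and PostProcessor and replayOn are names invented for illustration rather than ShardingSphere APIs.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplayDemo {

    // Hypothetical stand-in for a JDBC Connection.
    static final class FakeConnection {

        boolean autoCommit = true;

        boolean readOnly;
    }

    // A recorded call is simply a lambda that is applied to the target later.
    interface PostProcessor {

        void process(FakeConnection target);
    }

    private final List<PostProcessor> postProcessors = new ArrayList<>();

    // Record the call instead of executing it immediately.
    void begin() {
        postProcessors.add(target -> target.autoCommit = false);
    }

    void setReadOnly(final boolean readOnly) {
        postProcessors.add(target -> target.readOnly = readOnly);
    }

    // Replay all recorded calls, in order, on a concrete target.
    void replayOn(final FakeConnection target) {
        for (PostProcessor each : postProcessors) {
            each.process(target);
        }
    }

    public static void main(final String[] args) {
        ReplayDemo recorder = new ReplayDemo();
        recorder.begin();
        recorder.setReadOnly(true);
        FakeConnection connection = new FakeConnection();
        recorder.replayOn(connection);
        System.out.println(connection.autoCommit + " " + connection.readOnly);
    }
}
```

Because each recorded call is an ordinary lambda, the compiler checks method names and argument types, which the reflection-based version could only verify at runtime.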
Netty Epoll support for aarch64
Since 4.1.50.Final, Netty's Epoll implementation supports Linux environments on the aarch64 architecture. In an aarch64 Linux environment, using the Netty Epoll API can improve performance compared to the Netty NIO API.
Reference: https://stackoverflow.com/a/23465481/7913731
5.1.0 and 5.0.0 ShardingSphere-Proxy TPC-C performance test comparison
We benchmark ShardingSphere-Proxy using TPC-C to verify the performance optimization results. Since earlier versions of ShardingSphere-Proxy have limited support for PostgreSQL, TPC-C testing cannot be performed on them, so versions 5.0.0 and 5.1.0 are used for comparison.
In order to highlight the performance loss of ShardingSphere-Proxy itself, this test uses ShardingSphere-Proxy with data sharding (1 shard) and compares it with PostgreSQL 14.2.
The test is performed according to the "BenchmarkSQL Performance Test ( https://shardingsphere.apache.org/document/current/cn/reference/test/performance-test/benchmarksql-test/ )" in the official documentation, with the configuration reduced from 4 shards to 1 shard.
Test environment
Test parameters
BenchmarkSQL parameters:
- warehouses=192 (data volume)
- terminals=192 (number of concurrency)
- terminalWarehouseFixed=false
- Running time 30 mins
PostgreSQL JDBC parameters:
- defaultRowFetchSize=50
- reWriteBatchedInserts=true
Partial ShardingSphere-Proxy JVM parameters:
- -Xmx16g
- -Xms16g
- -Xmn12g
- -XX:AutoBoxCacheMax=4096
- -XX:+UseNUMA
- -XX:+DisableExplicitGC
- -XX:LargePageSizeInBytes=128m
- -XX:+SegmentedCodeCache
- -XX:+AggressiveHeap
Test Results
Conclusions obtained in the environment and scenarios of this article:
- Taking ShardingSphere-Proxy 5.0.0 + PostgreSQL as the baseline, the performance of 5.1.0 is improved by about 26.8%.
- Taking the direct connection to PostgreSQL as the baseline, ShardingSphere-Proxy 5.1.0 reduces the loss by about 15 percentage points compared with 5.0.0, from 42.7% to 27.4%.
Since the code details are optimized throughout each module of ShardingSphere, the above test results do not cover all optimization points.
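The two conclusions can be cross-checked with simple arithmetic. The sketch below assumes that "loss" means the relative throughput reduction compared with a direct PostgreSQL connection, normalized to 1.0. That normalization and the names used are assumptions for this illustration; only the two loss percentages come from the results above.

```java
final class LossArithmetic {

    // Throughput relative to a direct connection (direct = 1.0), given the loss in percent.
    static double relativeThroughput(final double lossPercent) {
        return 1.0 - lossPercent / 100.0;
    }

    // Improvement of the new version over the old one, in percent.
    static double improvementPercent(final double oldLossPercent, final double newLossPercent) {
        return (relativeThroughput(newLossPercent) / relativeThroughput(oldLossPercent) - 1.0) * 100.0;
    }

    public static void main(final String[] args) {
        // Loss figures reported for 5.0.0 and 5.1.0.
        double improvement = improvementPercent(42.7, 27.4);
        System.out.printf("improvement of 5.1.0 over 5.0.0: %.1f%%%n", improvement);
    }
}
```

Under this assumption the computed improvement is roughly 26.7%, which is in line with the reported figure of about 26.8% given that the loss percentages are rounded.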
How to view performance issues
From time to time, someone asks, "How is the performance of ShardingSphere? How much is the loss?"
In my opinion, it is enough that the performance meets the requirements. Performance is a complex issue, affected by many factors. In different environments and scenarios, the performance loss of ShardingSphere may be less than 1% or as high as 50%, so we cannot give an answer without knowing the environment and scenario. In addition, because ShardingSphere is infrastructure, its performance is one of the key considerations in the research and development process. Teams and individuals in the ShardingSphere community will continue to apply their craftsmanship to push the performance of ShardingSphere to the extreme.