Performance pitfall | Use Java 8 ConcurrentHashMap's computeIfAbsent with care


Preface

Let's start with some code. When working with a Map, you might write something like this:

Map<String, Value> map;
// ...
Value result = map.get(key);
if (null == result) {
	result = this.calculateValue(key);
	map.put(key, result);
}
return result;

Java 8 added the computeIfAbsent method to java.util.Map, which simplifies the code above:

Map<String, Value> map;
// ...
return map.computeIfAbsent(key, this::calculateValue);

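Besides being concise, this form has another benefit when the map is a java.util.concurrent.ConcurrentHashMap: under concurrent calls, calculateValue is not invoked more than once for the same key, because the operation is atomic.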
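The at-most-once behavior can be checked with a small experiment. The following is a minimal sketch rather than code from ShardingSphere: the class name ComputeOnceDemo, the key, and the thread count are made up for illustration. Several threads are released at the same moment, all call computeIfAbsent with the same key, and a counter records how often the mapping function actually runs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the original article.
class ComputeOnceDemo {

    /** Runs concurrent computeIfAbsent calls on one key and returns how often the mapping function ran. */
    static int countMappingCalls(int threads) {
        Map<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // hold every worker until all of them are queued
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                map.computeIfAbsent("key", k -> {
                    calls.incrementAndGet();
                    return "value for " + k;
                });
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("mapping function calls: " + countMappingCalls(8));
    }
}
```

With a plain HashMap and the get/put sequence from the first snippet, the same experiment could run the calculation several times, and the map itself would not be safe to share between threads.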

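However, a while ago I ran into a problem while benchmarking Apache ShardingSphere-Proxy: when BenchmarkSQL connected to ShardingSphere-Proxy with a large number of Terminals, the execution latency of one very simple insert SQL increased considerably. With the help of async-profiler, it turned out that computeIfAbsent of the Java 8 ConcurrentHashMap has a performance pitfall.

Readers unfamiliar with Apache ShardingSphere can refer to github.com/apache/shar…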

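Investigation

The symptom during the benchmark was that the higher the BenchmarkSQL concurrency (Terminals), the longer a simple, repeatedly executed insert SQL in the New Order transaction took. Yet the CPU of the machine running ShardingSphere-Proxy was not saturated, which suggested a bottleneck in the Proxy code itself, so I sampled the Proxy JVM under load with async-profiler.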

./profiler.sh -e lock --lock 1ms -d 180 -o jfr -f output.jfr $PID

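For more on async-profiler see github.com/jvm-profili…; I may write some follow-up articles about it.

Opening the resulting JFR file in IDEA showed more than three million Java Monitor Blocked events!

Following the stack traces led to this piece of ShardingSphere code that uses computeIfAbsent. An excerpt: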

    // ...
    private static final Map<String, SQLExecutionUnitBuilder> TYPE_TO_BUILDER_MAP = new ConcurrentHashMap<>(8, 1);
    // ...
    public DriverExecutionPrepareEngine(final String type, final int maxConnectionsSizePerQuery, final ExecutorDriverManager<C, ?, ?> executorDriverManager, 
                                        final StorageResourceOption option, final Collection<ShardingSphereRule> rules) {
        super(maxConnectionsSizePerQuery, rules);
        this.executorDriverManager = executorDriverManager;
        this.option = option;
        sqlExecutionUnitBuilder = TYPE_TO_BUILDER_MAP.computeIfAbsent(type, 
        	key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
    }
    // ...

github.com/apache/shar…

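This code runs before every interaction between the Proxy and the database, so every CRUD operation executed through the Proxy passes through it. There are currently only two values of type, JDBC.STATEMENT and JDBC.PREPARED_STATEMENT, so under high concurrency a large number of threads call computeIfAbsent with the same key.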


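My assumption was that when the key already exists, computeIfAbsent has nothing to modify and can simply return the value the way get does. Is that really the case? Here is the implementation of computeIfAbsent (Oracle JDK 8u311), excerpted and annotated: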

    public V computeIfAbsent(K key, Function<? super K, ? extends V> mappingFunction) {
        if (key == null || mappingFunction == null)
            throw new NullPointerException();
        int h = spread(key.hashCode());
        V val = null;
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                // Initialize the table
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & h)) == null) {
                // The key is absent and the bin for its hash is still empty
                Node<K,V> r = new ReservationNode<K,V>();
                synchronized (r) {
                    // Initialize the bin, insert the key-value pair, and so on
                }
            }
            else if ((fh = f.hash) == MOVED)
                // The map is being resized
                tab = helpTransfer(tab, f);
            else {
                // The bin for this hash already holds a linked list or a red-black tree
                boolean added = false;
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            // Search the linked list for the key
                        }
                        else if (f instanceof TreeBin) {
                            // Search the red-black tree for the key
                        }
                    }
                }
                // Some code omitted
            }
        }
        // Some code omitted
        return val;
    }

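As far as I understand the source, even when the key exists, computeIfAbsent enters a synchronized block to look for it. Doesn't that hurt performance compared with ConcurrentHashMap's lock-free get?

Searching for the topic turned up bugs.openjdk.java.net/browse/JDK-…: the issue was reported long ago and was addressed in JDK 9. As of this writing, JDK 17 has already been released.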
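The monitor contention can also be observed without a profiler. The sketch below is my own illustration, not part of the original investigation: the class name BlockedCountProbe and its parameters are hypothetical. Each worker thread calls computeIfAbsent on a key that already exists and then reports its own blocked count through the standard ThreadMXBean API. On JDK 8 I would expect a clearly non-zero total under contention, and on JDK 9 or later a total at or near zero, but the exact numbers depend on the machine and the JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical probe class, not part of the original article.
class BlockedCountProbe {

    /** Hammers computeIfAbsent on one existing key and returns how often the workers blocked on a monitor. */
    static long totalBlockedCount(int threads, int callsPerThread) {
        ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
        map.put("key", new Object()); // the key exists before any worker starts
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        AtomicLong blocked = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < callsPerThread; n++) {
                    map.computeIfAbsent("key", k -> new Object());
                }
                // Thread info is only available while the thread is alive,
                // so each worker reads its own counter before it exits.
                long id = Thread.currentThread().getId();
                blocked.addAndGet(threadBean.getThreadInfo(id).getBlockedCount());
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return blocked.get();
    }

    public static void main(String[] args) {
        System.out.println("monitor blocked count: " + totalBlockedCount(16, 1_000_000));
    }
}
```

Running the same class on two JDK versions gives a quick sanity check before reaching for async-profiler or JMH.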

Fix

With JDK 8 still widely used, it is worth working around the problem, which led to this change: github.com/apache/shar…

SQLExecutionUnitBuilder result;
if (null == (result = TYPE_TO_BUILDER_MAP.get(type))) {
    result = TYPE_TO_BUILDER_MAP.computeIfAbsent(type, key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
}
return result;

github.com/apache/shar…

Before every lookup, check with get first and call computeIfAbsent only when the value is absent. Because ConcurrentHashMap's computeIfAbsent is atomic, there is no need to add synchronized or double-checked locking yourself.

Problem solved.
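If the same pattern is needed in several places, it can be wrapped in a small helper. This is a sketch under my own naming (GetFirstCache and getOrCompute are hypothetical, not ShardingSphere APIs): the helper takes the lock-free get path when the key is present and falls back to computeIfAbsent only on a miss.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper class, not part of the original article.
class GetFirstCache {

    /** Returns the cached value via lock-free get when present, otherwise computes it atomically. */
    static <K, V> V getOrCompute(ConcurrentHashMap<K, V> map, K key,
                                 Function<? super K, ? extends V> mappingFunction) {
        V existing = map.get(key);
        if (null != existing) {
            return existing;
        }
        // On a miss, computeIfAbsent still guarantees a single computation per key.
        return map.computeIfAbsent(key, mappingFunction);
    }

    /** Looks the same key up repeatedly and returns how often the mapping function ran. */
    static int countMappingCalls(int lookups) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        for (int i = 0; i < lookups; i++) {
            getOrCompute(map, "JDBC.STATEMENT", key -> {
                calls.incrementAndGet();
                return "builder for " + key;
            });
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("mapping function calls: " + countMappingCalls(5));
    }
}
```

The extra get costs one additional lookup on a miss, which is negligible when the map is populated once and then read on every request.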

Appendix: JMH benchmark

Test environment

[Image: test environment]

Benchmark code

package icu.wwj.jmh.dangling;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Fork(3)
@Warmup(iterations = 3, time = 5)
@Measurement(iterations = 3, time = 5)
@Threads(16)
@State(Scope.Benchmark)
public class ConcurrentHashMapBenchmark {
    
    private static final String KEY = "key";
    
    private static final Object VALUE = new Object();
    
    private final Map<String, Object> concurrentMap = new ConcurrentHashMap<>(1, 1);
    
    @Setup(Level.Iteration)
    public void setup() {
        concurrentMap.clear();
    }
    
    @Benchmark
    public Object benchGetBeforeComputeIfAbsent() {
        Object result = concurrentMap.get(KEY);
        if (null == result) {
            result = concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
        }
        return result;
    }
    
    @Benchmark
    public Object benchComputeIfAbsent() {
        return concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
    }
}

JDK 8 results

# JMH version: 1.33
# VM version: JDK 1.8.0_311, Java HotSpot(TM) 64-Bit Server VM, 25.311-b11
# VM invoker: /usr/local/java/jdk1.8.0_311/jre/bin/java
# VM options: -Dvisualvm.id=172855224679674 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=38763:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent

# Run progress: 0.00% complete, ETA 00:03:00
# Fork: 1 of 3
# Warmup Iteration   1: 11173878.242 ops/s
# Warmup Iteration   2: 8471364.065 ops/s
# Warmup Iteration   3: 8766401.960 ops/s
Iteration   1: 8776260.796 ops/s
Iteration   2: 8632907.974 ops/s
Iteration   3: 8557264.788 ops/s

# Run progress: 16.67% complete, ETA 00:02:33
# Fork: 2 of 3
# Warmup Iteration   1: 7757506.431 ops/s
# Warmup Iteration   2: 8176991.807 ops/s
# Warmup Iteration   3: 8795107.589 ops/s
Iteration   1: 8668883.337 ops/s
Iteration   2: 8866318.073 ops/s
Iteration   3: 8848517.540 ops/s

# Run progress: 33.33% complete, ETA 00:02:02
# Fork: 3 of 3
# Warmup Iteration   1: 8154698.571 ops/s
# Warmup Iteration   2: 8317945.491 ops/s
# Warmup Iteration   3: 8884286.732 ops/s
Iteration   1: 8912555.062 ops/s
Iteration   2: 8894750.001 ops/s
Iteration   3: 8780504.227 ops/s


Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent":
  8770884.644 ±(99.9%) 210678.797 ops/s [Average]
  (min, avg, max) = (8557264.788, 8770884.644, 8912555.062), stdev = 125371.573
  CI (99.9%): [8560205.847, 8981563.442] (assumes normal distribution)


# JMH version: 1.33
# VM version: JDK 1.8.0_311, Java HotSpot(TM) 64-Bit Server VM, 25.311-b11
# VM invoker: /usr/local/java/jdk1.8.0_311/jre/bin/java
# VM options: -Dvisualvm.id=172855224679674 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=38763:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent

# Run progress: 50.00% complete, ETA 00:01:31
# Fork: 1 of 3
# Warmup Iteration   1: 1881091972.510 ops/s
# Warmup Iteration   2: 1843432746.197 ops/s
# Warmup Iteration   3: 2353506882.860 ops/s
Iteration   1: 2389458285.091 ops/s
Iteration   2: 2391001171.657 ops/s
Iteration   3: 2387181602.010 ops/s

# Run progress: 66.67% complete, ETA 00:01:01
# Fork: 2 of 3
# Warmup Iteration   1: 1872514017.315 ops/s
# Warmup Iteration   2: 1855584197.510 ops/s
# Warmup Iteration   3: 2342392977.207 ops/s
Iteration   1: 2378551289.692 ops/s
Iteration   2: 2374081014.168 ops/s
Iteration   3: 2389909613.865 ops/s

# Run progress: 83.33% complete, ETA 00:00:30
# Fork: 3 of 3
# Warmup Iteration   1: 1880210774.729 ops/s
# Warmup Iteration   2: 1804266170.900 ops/s
# Warmup Iteration   3: 2337740394.373 ops/s
Iteration   1: 2363741084.192 ops/s
Iteration   2: 2372565304.724 ops/s
Iteration   3: 2388015878.515 ops/s


Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent":
  2381611693.768 ±(99.9%) 16356182.057 ops/s [Average]
  (min, avg, max) = (2363741084.192, 2381611693.768, 2391001171.657), stdev = 9733301.586
  CI (99.9%): [2365255511.711, 2397967875.825] (assumes normal distribution)


# Run complete. Total time: 00:03:03

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                                  Mode  Cnt           Score          Error  Units
ConcurrentHashMapBenchmark.benchComputeIfAbsent           thrpt    9     8770884.644 ±   210678.797  ops/s
ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent  thrpt    9  2381611693.768 ± 16356182.057  ops/s
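
The two approaches differ by more than two orders of magnitude: calling computeIfAbsent directly reaches millions of operations per second, while checking with get first reaches billions, and that is with only 16 threads.

As for resources, CPU utilization stayed at around 20% during benchComputeIfAbsent, whereas it stayed at 100% during benchGetBeforeComputeIfAbsent.

JDK 17 results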

# JMH version: 1.33
# VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
# VM invoker: /usr/local/java/jdk-17.0.1/bin/java
# VM options: -Dvisualvm.id=173221627574053 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=33189:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent

# Run progress: 0.00% complete, ETA 00:03:00
# Fork: 1 of 3
# Warmup Iteration   1: 1544327446.565 ops/s
# Warmup Iteration   2: 1475077923.449 ops/s
# Warmup Iteration   3: 1565544222.606 ops/s
Iteration   1: 1564346089.698 ops/s
Iteration   2: 1560062375.891 ops/s
Iteration   3: 1552569020.412 ops/s

# Run progress: 16.67% complete, ETA 00:02:33
# Fork: 2 of 3
# Warmup Iteration   1: 1617143507.004 ops/s
# Warmup Iteration   2: 1433136907.916 ops/s
# Warmup Iteration   3: 1527623176.866 ops/s
Iteration   1: 1522331660.180 ops/s
Iteration   2: 1524798683.186 ops/s
Iteration   3: 1522686827.744 ops/s

# Run progress: 33.33% complete, ETA 00:02:02
# Fork: 3 of 3
# Warmup Iteration   1: 1671732222.173 ops/s
# Warmup Iteration   2: 1462966231.429 ops/s
# Warmup Iteration   3: 1553792663.545 ops/s
Iteration   1: 1549840468.944 ops/s
Iteration   2: 1549245571.349 ops/s
Iteration   3: 1554801575.735 ops/s


Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent":
  1544520252.571 ±(99.9%) 27953594.118 ops/s [Average]
  (min, avg, max) = (1522331660.180, 1544520252.571, 1564346089.698), stdev = 16634735.479
  CI (99.9%): [1516566658.453, 1572473846.689] (assumes normal distribution)


# JMH version: 1.33
# VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
# VM invoker: /usr/local/java/jdk-17.0.1/bin/java
# VM options: -Dvisualvm.id=173221627574053 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=33189:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent

# Run progress: 50.00% complete, ETA 00:01:31
# Fork: 1 of 3
# Warmup Iteration   1: 1813078468.960 ops/s
# Warmup Iteration   2: 1944438216.902 ops/s
# Warmup Iteration   3: 2232703681.960 ops/s
Iteration   1: 2233727123.664 ops/s
Iteration   2: 2233657163.983 ops/s
Iteration   3: 2229008772.953 ops/s

# Run progress: 66.67% complete, ETA 00:01:01
# Fork: 2 of 3
# Warmup Iteration   1: 1767187585.805 ops/s
# Warmup Iteration   2: 1900420998.518 ops/s
# Warmup Iteration   3: 2175122268.840 ops/s
Iteration   1: 2180409680.029 ops/s
Iteration   2: 2181398523.091 ops/s
Iteration   3: 2176454597.329 ops/s

# Run progress: 83.33% complete, ETA 00:00:30
# Fork: 3 of 3
# Warmup Iteration   1: 1822355551.990 ops/s
# Warmup Iteration   2: 1832618832.110 ops/s
# Warmup Iteration   3: 2225265888.631 ops/s
Iteration   1: 2240765668.888 ops/s
Iteration   2: 2225847700.599 ops/s
Iteration   3: 2232257415.965 ops/s


Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent":
  2214836294.056 ±(99.9%) 45190341.578 ops/s [Average]
  (min, avg, max) = (2176454597.329, 2214836294.056, 2240765668.888), stdev = 26892047.412
  CI (99.9%): [2169645952.478, 2260026635.633] (assumes normal distribution)


# Run complete. Total time: 00:03:03

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                                  Mode  Cnt           Score          Error  Units
ConcurrentHashMapBenchmark.benchComputeIfAbsent           thrpt    9  1544520252.571 ± 27953594.118  ops/s
ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent  thrpt    9  2214836294.056 ± 45190341.578  ops/s

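On JDK 17, computeIfAbsent is still somewhat slower than calling get first, but the two are at least in the same order of magnitude, and the CPU was fully loaded during both benchmarks.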
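To put numbers on the gap, the scores from the two JMH summary tables can simply be divided. The small sketch below (ThroughputRatio is a hypothetical class name) does that arithmetic: the get-first variant comes out roughly 270 times faster on JDK 8 and roughly 1.4 times faster on JDK 17.

```java
// Hypothetical helper class, not part of the original article.
class ThroughputRatio {

    /** Returns how many times faster the get-first variant ran than plain computeIfAbsent. */
    static double speedup(double getFirstOpsPerSec, double computeIfAbsentOpsPerSec) {
        return getFirstOpsPerSec / computeIfAbsentOpsPerSec;
    }

    public static void main(String[] args) {
        // Scores copied from the JMH summary tables (ops/s).
        double jdk8 = speedup(2381611693.768, 8770884.644);
        double jdk17 = speedup(2214836294.056, 1544520252.571);
        System.out.printf("JDK 8:  get-first is %.1fx faster%n", jdk8);
        System.out.printf("JDK 17: get-first is %.2fx faster%n", jdk17);
    }
}
```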

Summary

  • If you use ConcurrentHashMap on Java 8, check whether computeIfAbsent may be called concurrently for the same key. If it may, try get first:
Object result = concurrentMap.get(KEY);
if (null == result) {
    result = concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
}
return result;
  • Or simply upgrade to Java 11 or Java 17.


Reposted from juejin.im/post/7094561581631012878