[High Concurrency Series] 24. All About JMH Performance Testing

JMH official site

JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targetting the JVM.

JMH (Java Microbenchmark Harness) is a Java microbenchmarking framework developed as part of the OpenJDK project, built specifically for performance testing.

The official site provides a series of code samples; see the list -> JMH Samples

1. Add the JMH jars: download them from the official site, or declare the dependencies with Maven:

<!-- JMH -->
<dependency>
	<groupId>org.openjdk.jmh</groupId>
	<artifactId>jmh-core</artifactId>
	<version>1.21</version>
</dependency>
<dependency>
	<groupId>org.openjdk.jmh</groupId>
	<artifactId>jmh-generator-annprocess</artifactId>
	<version>1.21</version>
	<scope>provided</scope>
</dependency>

2. The first sample from the official site:

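import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS) // matches the us/op units in the output below
public class JMHSample_01_HelloWorld {
    @Benchmark
    public void wellHelloThere() {
        // this method was intentionally left blank.
    }
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
            .include(JMHSample_01_HelloWorld.class.getSimpleName())
            .forks(1)
            .build();
        new Runner(opt).run();
    }
}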

Here wellHelloThere() is the benchmark method, i.e. the code being measured.

Running it directly fails with the following error:

Exception in thread "main" java.lang.RuntimeException: ERROR: Unable to find the resource: /META-INF/BenchmarkList
	at org.openjdk.jmh.runner.AbstractResourceReader.getReaders(AbstractResourceReader.java:98)
	at org.openjdk.jmh.runner.BenchmarkList.find(BenchmarkList.java:122)
	at org.openjdk.jmh.runner.Runner.internalRun(Runner.java:263)
	at org.openjdk.jmh.runner.Runner.run(Runner.java:209)
	at com.freedom.chapter03.jmh.JMHSample_01_HelloWorld.main(JMHSample_01_HelloWorld.java:25)

3. Install and configure the Maven plugin

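JMH generates the actual benchmark code from your benchmark classes at compile time, through the Java annotation processing (APT) mechanism; this step also writes /META-INF/BenchmarkList, the resource the error above complains about. In Eclipse, install the m2e-apt plugin from the Eclipse Marketplace so that the processor runs during the IDE build.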
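If the error persists, one way to check whether the annotation processor actually ran is to look for the generated resource on the classpath. The sketch below uses only the JDK; the class name BenchmarkListCheck is hypothetical, and the resource path is the one from the error message above.

```java
import java.net.URL;

// Hypothetical helper (not part of JMH): checks whether the annotation
// processor has generated /META-INF/BenchmarkList on the classpath.
class BenchmarkListCheck {

    // Same resource path that the JMH Runner reports in its error message.
    static final String RESOURCE = "/META-INF/BenchmarkList";

    static boolean hasBenchmarkList() {
        // Leading "/" makes the lookup absolute on the class loader's classpath.
        URL url = BenchmarkListCheck.class.getResource(RESOURCE);
        return url != null;
    }

    public static void main(String[] args) {
        if (hasBenchmarkList()) {
            System.out.println("Found " + RESOURCE + ": the annotation processor ran.");
        } else {
            System.out.println("Missing " + RESOURCE
                    + ": rebuild with jmh-generator-annprocess active (e.g. mvn clean package).");
        }
    }
}
```

Run it from the same project and classpath as the benchmarks; if it reports the resource as missing, the benchmark classes were compiled without the JMH annotation processor.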

Once it is installed, set the APT mode to automatic configuration:

The benchmark can now be run. If the error above still appears, run the following command from Eclipse and then run the benchmark again:

mvn clean package

The output is as follows:

# JMH version: 1.20
# VM version: JDK 1.8.0_131, VM 25.131-b11
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre/bin/java
# VM options: -Dfile.encoding=UTF-8
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.freedom.chapter03.jmh.JMHSample_01_HelloWorld.wellHelloThere

# Run progress: 0.00% complete, ETA 00:00:40
# Fork: 1 of 1
# Warmup Iteration   1: ≈ 10⁻³ us/op
# Warmup Iteration   2: ≈ 10⁻³ us/op
...
# Warmup Iteration  19: ≈ 10⁻³ us/op
# Warmup Iteration  20: ≈ 10⁻³ us/op
Iteration   1: ≈ 10⁻³ us/op
Iteration   2: ≈ 10⁻³ us/op
...
Iteration  19: ≈ 10⁻³ us/op
Iteration  20: ≈ 10⁻³ us/op

Result "com.freedom.chapter03.jmh.JMHSample_01_HelloWorld.wellHelloThere":
  ≈ 10⁻³ us/op

# Run complete. Total time: 00:00:40

Benchmark                               Mode  Cnt   Score    Error  Units
JMHSample_01_HelloWorld.wellHelloThere  avgt   20  ≈ 10⁻³           us/op

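The report first lists the basic settings of the run: JMH version, JDK version, number and duration of warmup iterations, number and duration of measurement iterations, timeout, thread information, benchmark mode, and the benchmark method.

Next come the warmup iterations. Warmup gives the JVM the chance to fully JIT-compile and optimize the code under test; these results are not included in the final statistics.

Then come the measurement iterations, each showing the time taken per operation, i.e. how fast the code under test runs.

Finally, the overall result: about 10⁻³ us (roughly 1 ns) per operation.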

4. Basic JMH concepts

4.1 Mode

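/**
 * Benchmark mode.
 */
public enum Mode {
    // Throughput: how many calls can be executed per unit of time
    Throughput("thrpt", "Throughput, ops/time"),
    // Average time per call
    AverageTime("avgt", "Average time, time/op"),
    // Samples the time of individual calls and reports the distribution (percentiles)
    SampleTime("sample", "Sampling time"),
    // Times a single invocation per iteration; useful for cold-start performance
    SingleShotTime("ss", "Single shot invocation time"),
    // Runs all of the modes above, one after another
    All("all", "All benchmark modes");
    ...
}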

In JMH, throughput and average time are the most commonly used modes. The following examples come from the second official sample.

4.1.1 Throughput example

@Benchmark
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public void measureThroughput() throws InterruptedException {
    TimeUnit.MILLISECONDS.sleep(100);
}

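The result shows that measureThroughput() runs about 9.807 times per second, slightly below the ideal 10 ops/s because each call sleeps 100 ms plus a little overhead: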

Benchmark                                       Mode  Cnt  Score   Error  Units
JMHSample_02_BenchmarkModes.measureThroughput  thrpt   20  9.807 ± 0.040  ops/s

4.1.2 Average time example

@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void measureAvgTime() throws InterruptedException {
    TimeUnit.MILLISECONDS.sleep(100);
}

The result shows that each call to measureAvgTime() takes about 102 milliseconds (102056.917 us/op):

Benchmark                                   Mode  Cnt       Score     Error  Units
JMHSample_02_BenchmarkModes.measureAvgTime  avgt   20  102056.917 ± 572.546  us/op
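For a single benchmark thread, average time is roughly the reciprocal of throughput, so the two results can be cross-checked. A minimal sketch, assuming one thread (with several threads, JMH reports throughput summed across threads); ThroughputVsAvgTime is a hypothetical name and its inputs are the two scores reported above.

```java
// Sketch: with a single benchmark thread, average time per operation is
// roughly the reciprocal of throughput.
class ThroughputVsAvgTime {

    // Converts single-thread throughput (ops/s) to average time (us/op).
    static double microsPerOp(double opsPerSecond) {
        return 1_000_000.0 / opsPerSecond;
    }

    public static void main(String[] args) {
        double thrpt = 9.807;             // measureThroughput score, ops/s
        double measuredAvgt = 102056.917; // measureAvgTime score, us/op
        double derived = microsPerOp(thrpt);
        System.out.printf("1 / %.3f ops/s = %.0f us/op (measured: %.0f us/op)%n",
                thrpt, derived, measuredAvgt);
    }
}
```

The derived value is about 101968 us/op, within roughly 0.1% of the measured 102057 us/op; a small gap is expected because the two scores come from separate benchmarks.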

4.1.3 Sampling example

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void measureSamples() throws InterruptedException {
    TimeUnit.MILLISECONDS.sleep(100);
}

The result shows that for measureSamples() the average execution time is 101852.119 microseconds; 50% of calls completed within 100794.368 us, 95% within 104988.672 us, and every sampled call within 106692.608 us:

Benchmark                                                            Mode  Cnt       Score     Error  Units
JMHSample_02_BenchmarkModes.measureSamples                         sample  200  101852.119 ± 432.919  us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.00    sample       100007.936            us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.50    sample       100794.368            us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.90    sample       104726.528            us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.95    sample       104988.672            us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.99    sample       106157.834            us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.999   sample       106692.608            us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.9999  sample       106692.608            us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p1.00    sample       106692.608            us/op
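The percentile rows are order statistics over the sampled call durations. The sketch below computes a mean and nearest-rank percentiles over a made-up set of durations; it is an illustration only, and JMH's own percentile estimator may interpolate between samples differently. SamplePercentiles and its sample data are hypothetical.

```java
import java.util.Arrays;

// Sketch: summarizing sampled call durations as a mean plus percentiles,
// using the simple nearest-rank method.
class SamplePercentiles {

    // Nearest-rank percentile for p in (0, 1]; p <= 0 returns the minimum.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        if (p <= 0) {
            return sorted[0];
        }
        int rank = (int) Math.ceil(p * sorted.length); // 1-based rank
        return sorted[Math.min(rank, sorted.length) - 1];
    }

    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Hypothetical durations in microseconds, not taken from the report.
        double[] us = {100_100, 100_800, 100_200, 104_900, 100_700,
                       100_300, 106_600, 100_900, 104_700, 100_500};
        System.out.printf("mean=%.1f p50=%.1f p95=%.1f p100=%.1f%n",
                mean(us), percentile(us, 0.50), percentile(us, 0.95), percentile(us, 1.0));
    }
}
```

As in the JMH report, p1.00 is simply the slowest sampled call, and a few slow outliers pull the mean above the median.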

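4.2 Iteration

An iteration is JMH's unit of measurement: within one iteration the benchmark method is invoked repeatedly. In most modes (all except SingleShotTime) an iteration is time-based; in the run above each one lasts 1 s.

4.3 Warmup

Because of the JVM's JIT compiler, the same method can run at different speeds before and after it has been JIT-compiled. Warmup iterations run the code first so that the measurement iterations see the compiled code; their own results are discarded.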
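The split between warmup and measurement can be illustrated with a toy loop. This is a simplified sketch, not a substitute for JMH: it uses a fixed number of operations per iteration (JMH iterations are time-based), runs in the current JVM instead of a forked one, and only guards against dead-code elimination with a volatile sink. ToyHarness and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongUnaryOperator;

// Toy illustration of the iteration structure: warmup iterations run the
// code but are discarded, measurement iterations are recorded.
class ToyHarness {

    static volatile long sink; // publishes results so the JIT cannot drop the work

    // Runs `ops` calls per iteration and returns the average ns/op of each
    // measurement iteration; warmup iterations are executed but not recorded.
    static List<Double> run(LongUnaryOperator work, int warmupIterations,
                            int measurementIterations, int ops) {
        List<Double> results = new ArrayList<>();
        for (int i = 0; i < warmupIterations + measurementIterations; i++) {
            long acc = 0;
            long start = System.nanoTime();
            for (int j = 0; j < ops; j++) {
                acc += work.applyAsLong(j);
            }
            long elapsed = System.nanoTime() - start;
            sink = acc;
            if (i >= warmupIterations) {
                results.add((double) elapsed / ops);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        LongUnaryOperator work = x -> Long.toString(x).hashCode();
        List<Double> nsPerOp = run(work, 5, 5, 100_000);
        System.out.println("ns/op per measurement iteration: " + nsPerOp);
    }
}
```

Printing every iteration, warmup included, usually shows the first iterations running noticeably slower than later ones, which is exactly the effect warmup keeps out of the final score.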

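4.4 State

Specifies the scope of a state object:

  • Thread scope (Scope.Thread): each instance is accessed by only one thread;
  • Benchmark scope (Scope.Benchmark): all threads share one instance;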
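The difference between the two scopes can be pictured with plain threads: Scope.Thread behaves like a per-thread instance (similar to a ThreadLocal), Scope.Benchmark like a single shared object. This is only an analogy for the semantics, not how JMH implements states; StateScopeDemo is a hypothetical name.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Analogy for @State scopes using plain threads:
//   Scope.Thread    ~ one instance per thread (like a ThreadLocal)
//   Scope.Benchmark ~ one instance shared by all threads
class StateScopeDemo {

    // Stand-in for a @State class; it does not override equals(), so the
    // set below compares instances by identity.
    static class Counter {
        volatile double x = Math.PI;
    }

    // Returns how many distinct Counter instances `threads` threads used.
    static int distinctInstances(int threads, boolean perThread) {
        Counter shared = new Counter();
        ThreadLocal<Counter> perThreadState = ThreadLocal.withInitial(Counter::new);
        Set<Counter> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                Counter c = perThread ? perThreadState.get() : shared;
                c.x++; // same operation as the JMHSample_03 benchmarks (not atomic when shared)
                seen.add(c);
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("Scope.Thread-like:    " + distinctInstances(4, true));
        System.out.println("Scope.Benchmark-like: " + distinctInstances(4, false));
    }
}
```

With four threads, the per-thread variant uses 4 distinct instances and the shared variant uses 1.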

The third official sample declares a Thread-scoped and a Benchmark-scoped state class and then accesses each:

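import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class JMHSample_03_States {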
    @State(Scope.Benchmark)
    public static class BenchmarkState {
        volatile double x = Math.PI;
    }
    @State(Scope.Thread)
    public static class ThreadState {
        volatile double x = Math.PI;
    }
    @Benchmark
    public void measureUnshared(ThreadState state) {
        state.x++;
    }
    @Benchmark
    public void measureShared(BenchmarkState state) {
        state.x++;
    }
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_03_States.class.getSimpleName())
                .threads(4)
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}

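For measureUnshared(), each benchmark thread has its own copy of the data, whereas for measureShared() all benchmark threads share a single copy. The results differ accordingly: the shared version is roughly 6x slower, because the four threads keep writing the same volatile field and contend for the same cache line: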

Benchmark                             Mode  Cnt          Score         Error  Units
JMHSample_03_States.measureShared    thrpt   20   51055114.592 ±  510090.663  ops/s
JMHSample_03_States.measureUnshared  thrpt   20  302956301.034 ± 1267510.555  ops/s

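4.5 Options/OptionsBuilder configuration

Parameters are set before the run, for example which benchmarks to run (include), the number of threads (threads), the number of forked JVMs (forks), and the number of warmup iterations (warmupIterations).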

5. JMH benchmarks of HashMap, Collections.synchronizedMap(new HashMap()) and ConcurrentHashMap

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark) // @Setup requires a @State class; one instance shared by all threads
public class JMH_Map {
    static Map<String, String> hashMap = new HashMap<>();
    static Map<String, String> syncHashMap = Collections.synchronizedMap(new HashMap<>());
    static Map<String, String> concurrentHashMap = new ConcurrentHashMap<>();
    @Setup
    public void setup() {
        for (int i = 0; i < 10000; i++) {
            hashMap.put(String.valueOf(i), String.valueOf(i));
            syncHashMap.put(String.valueOf(i), String.valueOf(i));
            concurrentHashMap.put(String.valueOf(i), String.valueOf(i));
        }
    }
    @Benchmark
    public void hashMapGet() {
        hashMap.get("4");
    }
    @Benchmark
    public void syncHashMapGet() {
        syncHashMap.get("4");
    }
    @Benchmark
    public void concurrentHashMapGet() {
        concurrentHashMap.get("4");
    }
    @Benchmark
    public void hashMapSize() {
        hashMap.size();
    }
    @Benchmark
    public void syncHashMapSize() {
        syncHashMap.size();
    }
    @Benchmark
    public void concurrentHashMapSize() {
        concurrentHashMap.size();
    }
}

Results on JDK 8, single thread:

Benchmark                       Mode  Cnt     Score   Error   Units
JMH_Map.concurrentHashMapGet   thrpt   20   118.105 ± 0.324  ops/us
JMH_Map.concurrentHashMapSize  thrpt   20   877.783 ± 2.223  ops/us
JMH_Map.hashMapGet             thrpt   20   161.768 ± 0.361  ops/us
JMH_Map.hashMapSize            thrpt   20  1534.111 ± 8.070  ops/us
JMH_Map.syncHashMapGet         thrpt   20    39.468 ± 0.202  ops/us
JMH_Map.syncHashMapSize        thrpt   20    39.047 ± 0.069  ops/us

Results on JDK 8, two threads:

Benchmark                       Mode  Cnt     Score     Error   Units
JMH_Map.concurrentHashMapGet   thrpt   20   239.697 ±   4.031  ops/us
JMH_Map.concurrentHashMapSize  thrpt   20  1700.468 ±  53.626  ops/us
JMH_Map.hashMapGet             thrpt   20   300.296 ±   6.823  ops/us
JMH_Map.hashMapSize            thrpt   20  2879.248 ± 187.412  ops/us
JMH_Map.syncHashMapGet         thrpt   20    16.074 ±   0.343  ops/us
JMH_Map.syncHashMapSize        thrpt   20    17.348 ±   0.161  ops/us

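With two threads, throughput would ideally double:

  • HashMap, which does nothing for thread safety, scales almost linearly (sharing it is safe here only because the benchmark never writes after setup);
  • ConcurrentHashMap also roughly doubles;
  • synchronizedMap actually gets slower, because the two threads now contend for the same lock;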
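The synchronizedMap slowdown follows from its design: Collections.synchronizedMap wraps every method, including read-only ones like get() and size(), in a block synchronized on one mutex, whereas ConcurrentHashMap's get() takes no lock. The class below is a simplified, hypothetical sketch of that single-lock idea (LockedMap is not a JDK class).

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of a single-lock map wrapper: every operation, including
// read-only ones, synchronizes on the same mutex, so concurrent readers
// serialize on that lock instead of running in parallel.
class LockedMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final Object mutex = new Object(); // one lock for all operations

    V get(K key) {
        synchronized (mutex) {
            return map.get(key);
        }
    }

    V put(K key, V value) {
        synchronized (mutex) {
            return map.put(key, value);
        }
    }

    int size() {
        synchronized (mutex) {
            return map.size();
        }
    }

    public static void main(String[] args) {
        LockedMap<String, String> m = new LockedMap<>();
        for (int i = 0; i < 10_000; i++) {
            m.put(String.valueOf(i), String.valueOf(i));
        }
        System.out.println(m.get("4") + " " + m.size());
    }
}
```

Functionally it behaves like a HashMap (the example prints "4 10000"); the cost only shows up when several threads call it at once.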

6. JMH benchmarks of CopyOnWriteArrayList and ConcurrentLinkedQueue

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class JMH_List {
	CopyOnWriteArrayList<Object> smallCopyOnWriteList = new CopyOnWriteArrayList<>();
	ConcurrentLinkedQueue<Object> smallConcurrentList = new ConcurrentLinkedQueue<>();
	CopyOnWriteArrayList<Object> bigCopyOnWriteList = new CopyOnWriteArrayList<>();
	ConcurrentLinkedQueue<Object> bigConcurrentList = new ConcurrentLinkedQueue<>();
	@Setup
	public void setup() {
		for (int i = 0; i < 10; i++) {
			smallCopyOnWriteList.add(new Object());
			smallConcurrentList.add(new Object());
		}
		for (int i = 0; i < 1000; i++) {
			bigCopyOnWriteList.add(new Object());
			bigConcurrentList.add(new Object());
		}
	}
	@Benchmark
	public void copyOnWriteGet() {
		smallCopyOnWriteList.get(0);
	}
	@Benchmark
	public void copyOnWriteSize() {
		smallCopyOnWriteList.size();
	}
	@Benchmark
	public void concurrentListGet() {
		smallConcurrentList.peek();
	}
	@Benchmark
	public void concurrentListSize() {
		smallConcurrentList.size();
	}
	@Benchmark
	public void smallCopyOnWriteWrite() {
		smallCopyOnWriteList.add(new Object());
		smallCopyOnWriteList.remove(0);
	}
	@Benchmark
	public void smallConcurrentListWrite() {
		smallConcurrentList.add(new Object());
		// poll() removes the head; remove(0) would call remove(Object) and remove nothing
		smallConcurrentList.poll();
	}
	@Benchmark
	public void bigCopyOnWriteWrite() {
		bigCopyOnWriteList.add(new Object());
		bigCopyOnWriteList.remove(0);
	}
	@Benchmark
	public void bigConcurrentListWrite() {
		bigConcurrentList.add(new Object());
		// poll() removes the head; remove(0) would call remove(Object) and remove nothing
		bigConcurrentList.poll();
	}
	public static void main(String[] args) throws RunnerException {
		Options opt = new OptionsBuilder()
			.include(JMH_List.class.getSimpleName())
			.threads(4)
			.forks(1)
			.build();
		new Runner(opt).run();
	}
}
Results with 4 threads:

Benchmark                           Mode  Cnt     Score     Error   Units
JMH_List.bigConcurrentListWrite    thrpt   20     0.002 ±   0.001  ops/us
JMH_List.bigCopyOnWriteWrite       thrpt   20     0.565 ±   0.035  ops/us
JMH_List.concurrentListGet         thrpt   20  1499.011 ± 181.468  ops/us
JMH_List.concurrentListSize        thrpt   20   162.075 ±   2.613  ops/us
JMH_List.copyOnWriteGet            thrpt   20  1743.510 ±   9.192  ops/us
JMH_List.copyOnWriteSize           thrpt   20  2349.314 ± 179.757  ops/us
JMH_List.smallConcurrentListWrite  thrpt   20     0.002 ±   0.001  ops/us
JMH_List.smallCopyOnWriteWrite     thrpt   20     4.831 ±   0.177  ops/us
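
  • CopyOnWriteArrayList improves concurrency through copy-on-write: reads take no lock, while every write copies the underlying array;
  • ConcurrentLinkedQueue is a lock-free queue that relies on CAS operations instead of locks;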
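Copy-on-write explains both the cheap reads and the expensive writes in the table: readers just read the current array, while every write copies the whole array under a lock, so the cost of a write grows with the number of elements. A simplified, hypothetical sketch (TinyCopyOnWriteList is not the JDK class, which has many more methods and details):

```java
import java.util.Arrays;

// Simplified copy-on-write list: readers see an immutable snapshot array,
// writers build a new array under a lock and publish it via a volatile field.
class TinyCopyOnWriteList<E> {
    private final Object lock = new Object();
    private volatile Object[] array = new Object[0];

    @SuppressWarnings("unchecked")
    E get(int index) {
        return (E) array[index]; // no lock: one volatile read of the snapshot
    }

    int size() {
        return array.length;     // O(1): the array length is the size
    }

    void add(E e) {
        synchronized (lock) {
            Object[] copy = Arrays.copyOf(array, array.length + 1); // O(n) copy
            copy[copy.length - 1] = e;
            array = copy;        // publish the new snapshot
        }
    }

    E remove(int index) {
        synchronized (lock) {
            Object[] old = array;
            @SuppressWarnings("unchecked")
            E removed = (E) old[index];
            Object[] copy = new Object[old.length - 1]; // O(n) copy without the element
            System.arraycopy(old, 0, copy, 0, index);
            System.arraycopy(old, index + 1, copy, index, old.length - index - 1);
            array = copy;
            return removed;
        }
    }

    public static void main(String[] args) {
        TinyCopyOnWriteList<String> list = new TinyCopyOnWriteList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println(list.remove(0) + " " + list.size() + " " + list.get(0));
    }
}
```

The example prints "a 2 b". Both add() and remove() copy the array, which is why the 1,000-element list writes far more slowly than the 10-element one.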

The results show that write throughput is far lower than read throughput.

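For writes, when the CopyOnWriteArrayList holds 1,000 elements, the cost of copying makes its write throughput far lower than that of the small list, yet it still appears faster than ConcurrentLinkedQueue. Treat that last comparison with caution: the reported ConcurrentLinkedQueue write numbers were measured with remove(0), and since ConcurrentLinkedQueue has no remove(int), that call resolves to remove(Object), scans the whole queue, and removes nothing. The queue therefore grows on every call, which explains the ≈0.002 ops/us; poll() is the size-preserving way to remove the head.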
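The queue growth can be reproduced without JMH. Because ConcurrentLinkedQueue has no remove(int), remove(0) autoboxes 0 and calls remove(Object), which finds no matching element among the plain Objects; poll() removes the head instead. QueueRemovePitfall is a hypothetical name.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Compares add(...) + remove(0) with add(...) + poll() on a ConcurrentLinkedQueue.
class QueueRemovePitfall {

    static int sizeAfterWrites(boolean usePoll, int writes) {
        ConcurrentLinkedQueue<Object> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 10; i++) {
            queue.add(new Object());
        }
        for (int i = 0; i < writes; i++) {
            queue.add(new Object());
            if (usePoll) {
                queue.poll();    // removes the head: size stays constant
            } else {
                queue.remove(0); // remove(Object) looking for Integer 0: removes nothing
            }
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println("remove(0): size " + sizeAfterWrites(false, 1000));
        System.out.println("poll():    size " + sizeAfterWrites(true, 1000));
    }
}
```

Starting from 10 elements, 1,000 writes leave 1010 elements with remove(0) and 10 with poll(); in the benchmark, each remove(0) also scans an ever-longer queue.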

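For reads (get only, no writes), both perform well. Because of implementation differences, ConcurrentLinkedQueue's size() is noticeably slower than CopyOnWriteArrayList's: the queue has to traverse all of its nodes to count them, whereas the list simply returns the length of its array.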

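Conclusion: in high-concurrency scenarios, even with a few writes, as long as the total number of elements is small, CopyOnWriteArrayList outperforms ConcurrentLinkedQueue in the vast majority of cases.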


Reprinted from blog.csdn.net/hellboy0621/article/details/87888820