Performance optimization in three languages (C++, Java, C#) as seen through a few simple test cases

       As virtual machine technology matures, the compute-intensive performance of Java, .NET, and other virtual machine platforms has in some cases come close to that of C++, and in individual cases even surpassed it. This article analyzes the performance of several test cases to explore the reasons behind this.

       Let us start with two simple test cases. Each one runs a loop over a block of memory of len = 1000000 elements 5000 times in succession and measures the execution time. test1 is on the left, test2 on the right.
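
       The original listings are not preserved in this copy of the article. As a rough illustration only, the Java sketch below shows one loop whose writes are independent of each other and one where each write depends on the previous element. The class name, the method names, and the loop bodies are assumptions, not the author's code; only the array length and the number of rounds come from the text.

```java
import java.util.function.Consumer;

// Hypothetical reconstruction: names and loop bodies are assumptions.
final class LoopKinds {
    // test1-style: each element depends only on its own index,
    // so iterations can in principle be computed in parallel.
    static void fillIndependent(int[] data) {
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
    }

    // test2-style: each element depends on the one before it
    // (a loop-carried dependency), so iterations must run in order.
    static void fillDependent(int[] data) {
        for (int i = 1; i < data.length; i++) {
            data[i] = data[i - 1] + 1;
        }
    }

    // Runs the kernel `rounds` times over the same array and
    // returns the elapsed wall-clock time in seconds.
    static double time(Consumer<int[]> kernel, int[] data, int rounds) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            kernel.accept(data);
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];   // len = 1000000, as in the article
        System.out.printf("test1: %.3fs%n", time(LoopKinds::fillIndependent, data, 5000));
        System.out.printf("test2: %.3fs%n", time(LoopKinds::fillDependent, data, 5000));
    }
}
```

       Timings from a sketch like this depend on the JVM version, flags, and hardware, so they should not be expected to match the figures quoted in this article.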

       An equivalent program was tested under .NET Core 3.0 Preview 6.

       The test results compare as follows:

       As we can see, for test1 the C++ version is much faster; for test2 the C# version performs on par with the C++ version, or is even slightly faster.

       Why does this happen? A detailed analysis follows.

       The assignments in test1's loop are position-independent, so the compiler can optimize them with SIMD or other parallel instructions. The assignments in test2's loop are position-dependent, which makes it hard for the compiler to apply SIMD or similar parallel instructions. From the results above we can guess that the VC compiler parallelized test1, while .NET Core 3.0 Preview 6 did not.

       Let us verify this guess. .NET Core 3.0 provides support for SIMD instructions. Below, test1 is parallelized by hand and its performance is measured:

 

       The result is 0.633s, close to the 0.441s of the C++ version. Compared with the 2.289s measured before the optimization, this is more than three times faster.

       Running the same program on Java 8 gave surprising results:

 

       test1 takes 0.654s, close to the hand-parallelized .NET Core version, so evidently the JVM applied the parallel optimization itself. test2 takes 1.755s, faster than both the C++ version and the .NET Core version, and by a huge margin!

 

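       Clearly, the JVM gives special treatment to cases like test2. To understand this, we need a deeper look at how the Java virtual machine works. The HotSpot VM has two built-in JIT compilers, the Client Compiler and the Server Compiler, known as C1 and C2 for short. C1 compiles bytecode to native code with simple, reliable optimizations and, where necessary, adds performance-monitoring logic. C2 enables optimizations that take longer to compile and even performs some aggressive ones.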

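       According to the literature, by default C2 compilation is triggered when the method invocation count plus the loop back-edge count exceeds 10000, the loop counter is of a simple type such as int, and the stride is a constant. test2 satisfies exactly these three conditions!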

       Next, we design one more experiment: change the stride to a variable and look at the test results.

       The test shows that once the stride is changed to a variable, the result is 6.163s, close to the C++ and .NET Core results.
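
       The listing for this experiment is also not preserved. A minimal sketch of what "changing the stride to a variable" could look like in Java is given below, under the same assumed loop body as before. The class, the method names, and the `step` parameter are hypothetical.

```java
// Hypothetical sketch: names and loop bodies are assumptions.
final class VariableStride {
    // Constant stride: the increment `i++` is known at compile time.
    static void fillConstantStride(int[] data) {
        for (int i = 1; i < data.length; i++) {
            data[i] = data[i - 1] + 1;
        }
    }

    // Variable stride: the increment is a run-time value, so the JIT
    // cannot treat it as a compile-time constant.
    static void fillVariableStride(int[] data, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        for (int i = step; i < data.length; i += step) {
            data[i] = data[i - step] + 1;
        }
    }
}
```

       Calling `fillVariableStride(data, 1)` does the same work as `fillConstantStride(data)`, so timing the two against each other isolates the effect of the stride being a variable.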

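       For this test case, we can guess that C2 unrolled the loop during optimization. Below, we unroll the loop by hand under .NET Core and measure the performance to verify this guess: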

 

       The result is 1.983s, close to the 1.755s of Java 8. The guess is confirmed.
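
       The hand-unrolled .NET Core listing is not preserved here. For readers unfamiliar with the technique, the following Java sketch shows a four-way manual unroll of a dependent loop. The unroll factor, the names, and the loop body are assumptions and not the author's code.

```java
// Hypothetical sketch of manual loop unrolling; names and the
// unroll factor of 4 are assumptions.
final class UnrolledLoop {
    static void fillDependentUnrolled(int[] data) {
        int n = data.length;
        int i = 1;
        // Main body: four dependent assignments per iteration, which
        // cuts the number of loop-condition checks and branches.
        for (; i + 3 < n; i += 4) {
            data[i]     = data[i - 1] + 1;
            data[i + 1] = data[i] + 1;
            data[i + 2] = data[i + 1] + 1;
            data[i + 3] = data[i + 2] + 1;
        }
        // Tail: handle the remaining 0 to 3 elements one at a time.
        for (; i < n; i++) {
            data[i] = data[i - 1] + 1;
        }
    }
}
```

       The unrolled version computes the same values as the simple loop. The dependency between neighbouring elements remains, so any gain comes from reduced loop overhead rather than from parallel execution.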

 ----

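       Summary: as virtual machine technologies such as the JVM and .NET develop, language features have less and less influence on high-performance computing, while an understanding of computer architecture, compiler principles, and the virtual machine's compilation mechanisms matters more and more. The JVM's automatic optimization is very powerful. .NET Core still lags considerably in this respect, but the gap can be closed with manual optimization.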

Origin www.cnblogs.com/xiaotie/p/perf-3langs.html