Cause
One of my projects is written in Kotlin. It is a multidimensional database application, so it operates on int arrays very frequently. One section of the program needs to clear arrays hundreds of millions of times.
Arrays.fill(target, 0);
Arrays.fill here is the JDK's implementation. It is very simple: a for loop that fills in the data.
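For reference, a plain fill loop of that kind looks roughly like the following. This is a sketch written from memory rather than copied from any particular JDK version, and the class name `SimpleFill` is made up for illustration.

```java
import java.util.Arrays;

final class SimpleFill {
    // Plain loop fill: one store per element, with the length read once up front.
    static void fill(int[] a, int val) {
        for (int i = 0, len = a.length; i < len; i++) {
            a[i] = val;
        }
    }

    public static void main(String[] args) {
        int[] target = {1, 2, 3, 4, 5, 6, 7, 8};
        fill(target, 0);
        System.out.println(Arrays.toString(target)); // prints [0, 0, 0, 0, 0, 0, 0, 0]
    }
}
```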
So I wanted to improve on it by writing a dedicated implementation for each commonly used array length. For example, the method for clearing an array of length 8 is as follows:
fun clear8(target: IntArray) {
    if (target.size < 8) {
        throw IndexOutOfBoundsException()
    }
    target[0] = 0
    target[1] = 0
    target[2] = 0
    target[3] = 0
    target[4] = 0
    target[5] = 0
    target[6] = 0
    target[7] = 0
}
Don't doubt your eyes; writing it this way is usually effective. A good compiler will optimize code like this. Of course, a better compiler would also optimize a simple for loop over the array.
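The idea of giving each common length its own implementation could be wired up with a small dispatcher. The sketch below is in Java; `IntArrayClear`, `clear`, `clear4` and the choice of lengths 4 and 8 are illustrative assumptions, not code from the project.

```java
import java.util.Arrays;

final class IntArrayClear {
    // Unrolled clear for arrays of exactly 4 elements.
    static void clear4(int[] t) {
        t[0] = 0; t[1] = 0; t[2] = 0; t[3] = 0;
    }

    // Unrolled clear for arrays of exactly 8 elements.
    static void clear8(int[] t) {
        t[0] = 0; t[1] = 0; t[2] = 0; t[3] = 0;
        t[4] = 0; t[5] = 0; t[6] = 0; t[7] = 0;
    }

    // Picks the unrolled version for the common lengths (assumed here to be 4 and 8)
    // and falls back to Arrays.fill for every other length.
    static void clear(int[] target) {
        switch (target.length) {
            case 4:
                clear4(target);
                break;
            case 8:
                clear8(target);
                break;
            default:
                Arrays.fill(target, 0);
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 9, 9, 9, 9, 9, 9, 9};
        clear(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Whether the extra branch on the length pays for itself is something only a measurement on the real workload can tell.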
Then let's test it.
import java.util.*
import kotlin.system.measureNanoTime

fun main() {
    test3()
}

private fun test3() {
    val size = 8
    val time2 = measureNanoTime {
        val target = IntArray(size)
        for (i in 0 until 10_0000_0000) {
            IntArrays.clear8(target)
        }
    }
    println("fill$size $time2")
    val time1 = measureNanoTime {
        val target = IntArray(size)
        for (i in 0 until 10_0000_0000) {
            Arrays.fill(target, 0)
        }
    }
    println("Arrays.fill$size $time1")
    println()
}

internal object IntArrays {
    fun clear8(target: IntArray) {
        if (target.size < 8) {
            throw IndexOutOfBoundsException()
        }
        target[0] = 0
        target[1] = 0
        target[2] = 0
        target[3] = 0
        target[4] = 0
        target[5] = 0
        target[6] = 0
        target[7] = 0
    }
}
Test results (in nanoseconds):
fill8 55,408,200
Arrays.fill8 2,262,171,100
As you can see, the unrolled version takes about 55 ms, roughly 40 times faster than the 2.2 seconds taken by the Arrays.fill that ships with Java!
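To put those numbers in per-call terms, the arithmetic can be checked with a few lines of Java. The iteration count and timings are the ones reported above; the class and method names are made up for this sketch.

```java
final class Timings {
    // Average cost of one call, in nanoseconds.
    static double nanosPerCall(long totalNanos, long calls) {
        return (double) totalNanos / calls;
    }

    // How many times faster the fast run is than the slow run.
    static double speedup(long slowNanos, long fastNanos) {
        return (double) slowNanos / fastNanos;
    }

    public static void main(String[] args) {
        long calls = 1_000_000_000L;      // loop count used in the test
        long unrolled = 55_408_200L;      // reported fill8 time
        long arraysFill = 2_262_171_100L; // reported Arrays.fill8 time
        System.out.println("unrolled ns per call:    " + nanosPerCall(unrolled, calls));
        System.out.println("Arrays.fill ns per call: " + nanosPerCall(arraysFill, calls));
        System.out.println("speedup:                 " + speedup(arraysFill, unrolled));
    }
}
```

That works out to roughly 0.055 ns per call for the unrolled version against about 2.26 ns for Arrays.fill, a ratio of about 41. On a 3.59 GHz CPU one clock cycle is about 0.28 ns, so the unrolled figure is well under one cycle per call. This suggests, though it does not prove, that the JIT removed or merged most of the repeated stores instead of executing eight stores a billion times.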
Performance comparison with Java
I marveled at how strong Kotlin's compiler is, but on reflection that cannot be right. Kotlin runs on the JVM, so the credit should go to Java's very powerful virtual machine runtime. In that case, if this program were converted to Java and written directly, it should be even faster, or at least perform the same. So let's do it.
// IntArrays.java
final class IntArrays {
    static void clear8(int[] target) {
        /* if (target.length < 8) {
            throw new IndexOutOfBoundsException();
        } */
        target[0] = 0;
        target[1] = 0;
        target[2] = 0;
        target[3] = 0;
        target[4] = 0;
        target[5] = 0;
        target[6] = 0;
        target[7] = 0;
    }
}

// IntArraysDemoJava.java
import java.util.Arrays;

public final class IntArraysDemoJava {
    public static void main(String[] var0) {
        test1();
    }

    private static void test1() {
        long count = 1000000000;
        long start = System.nanoTime();
        final int[] target = new int[8];
        for (int i = 0; i < count; i++) {
            IntArrays.clear8(target);
        }
        long time2 = System.nanoTime() - start;
        System.out.println("fill8 " + time2);
        start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            Arrays.fill(target, 0);
        }
        long time1 = System.nanoTime() - start;
        System.out.println("Arrays.fill8 " + time1);
        System.out.println();
    }
}
The test results are as follows:
fill8 2,018,500,800
Arrays.fill8 2,234,306,500
Oh my god, this optimization has almost no effect in Java. I could not find anything like a release-mode compilation parameter. At most there is debug = false, which I set in Gradle:
compileJava { options.debug = false }
So does that mean the bytecode generated by Kotlin is better than the bytecode generated by Java?
Java        Kotlin
ALOAD 0     ALOAD 1
ICONST_0    ICONST_0
ICONST_0    ICONST_0
IASTORE     IASTORE
ALOAD 0     ALOAD 1
ICONST_1    ICONST_1
ICONST_0    ICONST_0
IASTORE     IASTORE
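The bytecode differs only slightly: the array sits in local variable slot 0 in the static Java method and in slot 1 in the Kotlin object method, where slot 0 holds this. If you ask me why the performance is so different, I have no idea...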
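One hypothesis, untested here, is that the gap comes from how the JIT compiles each timing loop rather than from the bytecode of clear8 itself. A harness that warms up first and gives every call real work to undo might help separate the two. The following is only a minimal sketch: `ClearBench`, `runUnrolled`, `runArraysFill` and the iteration count are assumptions made for illustration, and a dedicated benchmark tool such as JMH would be a more reliable choice.

```java
import java.util.Arrays;

final class ClearBench {
    static void clear8(int[] t) {
        t[0] = 0; t[1] = 0; t[2] = 0; t[3] = 0;
        t[4] = 0; t[5] = 0; t[6] = 0; t[7] = 0;
    }

    // Writes a non-zero value into one slot before each clear, then reads the
    // slot back into a checksum so the result of every clear is observed.
    static long runUnrolled(int[] target, int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            target[i & 7] = i | 1;      // non-zero for every i >= 0
            clear8(target);
            checksum += target[i & 7];  // 0 if the clear worked
        }
        return checksum;
    }

    // Same loop shape, but clearing with Arrays.fill.
    static long runArraysFill(int[] target, int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            target[i & 7] = i | 1;
            Arrays.fill(target, 0);
            checksum += target[i & 7];
        }
        return checksum;
    }

    public static void main(String[] args) {
        int[] target = new int[8];
        int iterations = 100_000_000;

        // Warm-up pass so both methods are likely compiled before timing starts.
        runUnrolled(target, iterations);
        runArraysFill(target, iterations);

        long start = System.nanoTime();
        long c1 = runUnrolled(target, iterations);
        long t1 = System.nanoTime() - start;

        start = System.nanoTime();
        long c2 = runArraysFill(target, iterations);
        long t2 = System.nanoTime() - start;

        System.out.println("unrolled    " + t1 + " ns, checksum " + c1);
        System.out.println("Arrays.fill " + t2 + " ns, checksum " + c2);
    }
}
```

Both checksums should be 0. If the two timings come out close under this harness, that would point at the timing loop, not the clearing code, as the source of the difference.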
Comparison with C#
As a die-hard .NET fan, I naturally wondered whether C# would be faster, especially since .NET Core 3 has had a lot of performance optimization work.
using System;

class Program
{
    static void Main(string[] args)
    {
        Test3.test1();
    }
}

class Test3
{
    public static void test1()
    {
        long count = 1000000000;
        var watch = System.Diagnostics.Stopwatch.StartNew();
        int[] target = new int[8];
        for (int i = 0; i < count; i++)
        {
            Clear8(target);
        }
        watch.Stop();
        Console.WriteLine("fill8 " + watch.Elapsed);
        watch.Restart();
        for (int i = 0; i < count; i++)
        {
            Array.Clear(target, 0, 8);
        }
        watch.Stop();
        Console.WriteLine("Array.Clear8 " + watch.Elapsed);
        Console.WriteLine();
    }

    static void Clear8(int[] target)
    {
        /* if (target.Length < 8)
        {
            throw new IndexOutOfRangeException();
        } */
        target[0] = 0;
        target[1] = 0;
        target[2] = 0;
        target[3] = 0;
        target[4] = 0;
        target[5] = 0;
        target[6] = 0;
        target[7] = 0;
    }
}
Test results:
fill8 00:00:02.7462676
Array.Clear8 00:00:08.4920514
It is even slower than Java, and the built-in Array.Clear is slower still. I could not accept that, so I tried Span.Fill(0), but the result was even worse.
Performance compared to Nim
By now my interest was piqued, so I tried to implement it in C... and could not get it written; I am too clumsy. Then I tried Rust and did not manage that either, even following the tutorial step by step.
Finally I got a Nim environment set up. Well, at least that one is simple.
import times, strutils

proc clear8*[int](target: var seq[int]) =
  target[0] = 0
  target[1] = 0
  target[2] = 0
  target[3] = 0
  target[4] = 0
  target[5] = 0
  target[6] = 0
  target[7] = 0

proc clear*[int](target: var seq[int]) =
  for i in 0..<target.len:
    target[i] = 0

proc test3() =
  const size = 8
  var start = epochTime()
  var target = newSeq[int](size)
  for i in 0..<10_0000_0000:
    target.clear8()
  let elapsedStr = (epochTime() - start).formatFloat(format = ffDecimal, precision = 3)
  echo "fill8 ", elapsedStr
  start = epochTime()
  for i in 0..<10_0000_0000:
    target.clear()
  let elapsedStr2 = (epochTime() - start).formatFloat(format = ffDecimal, precision = 3)
  echo "Arrays.fill ", elapsedStr2

test3()
Test results, in seconds (remember to add the --release parameter when compiling):
fill8 3.499
Arrays.fill 5.825
Disappointing, very disappointing.
Remarks
All tests were conducted on my desktop computer and the configuration is as follows:
AMD Ryzen 5 3600, 6 cores, 3.59 GHz
8 GB RAM
Windows 10 Professional, 64-bit
All tests were compiled in release mode.