Performance test of an array-clearing method in Kotlin, Java, C# and Nim

Background

One of my projects is written in Kotlin. It is a multi-dimensional database application, so it operates on int arrays very frequently. One part of the program has to clear arrays hundreds of millions of times, using:

Arrays.fill(target, 0);

Arrays.fill is implemented in the JDK and is very simple: it is just a for loop that fills in the data.
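To make the starting point concrete, here is a minimal sketch of such a fill loop. FillSketch and its fill method are hypothetical stand-ins written for illustration, not the JDK source.

```java
import java.util.Arrays;

final class FillSketch {
    // Hypothetical stand-in for the loop described above; this is not the JDK source.
    static void fill(int[] a, int value) {
        for (int i = 0; i < a.length; i++) {
            a[i] = value;
        }
    }

    public static void main(String[] args) {
        int[] target = {1, 2, 3, 4, 5, 6, 7, 8};
        fill(target, 0);
        System.out.println(Arrays.toString(target)); // prints [0, 0, 0, 0, 0, 0, 0, 0]
    }
}
```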

So I wanted to improve on it by writing a dedicated implementation for each common array length. For example, the method for clearing an array of length 8 is as follows:

fun clear8(target: IntArray) {
    if(target.size < 8){
        throw IndexOutOfBoundsException()
    }
    target[0] = 0
    target[1] = 0
    target[2] = 0
    target[3] = 0
    target[4] = 0
    target[5] = 0
    target[6] = 0
    target[7] = 0
}

Don't doubt your eyes; code written this way is usually effective, because a good compiler will optimize it. Of course, an even better compiler would also optimize a simple for loop over an array.
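The idea of one dedicated implementation per common length could be wired together roughly as follows. This is only a sketch in Java: clear4 and the dispatching clear method are hypothetical names that do not appear in my project, and whether the dispatch pays off would have to be measured.

```java
import java.util.Arrays;

final class UnrolledClear {
    // Hypothetical unrolled variant for length 4.
    static void clear4(int[] t) {
        t[0] = 0; t[1] = 0; t[2] = 0; t[3] = 0;
    }

    // Unrolled variant for length 8, mirroring clear8 above.
    static void clear8(int[] t) {
        t[0] = 0; t[1] = 0; t[2] = 0; t[3] = 0;
        t[4] = 0; t[5] = 0; t[6] = 0; t[7] = 0;
    }

    // Hypothetical dispatcher: common lengths get an unrolled version,
    // every other length falls back to the generic Arrays.fill loop.
    static void clear(int[] t) {
        switch (t.length) {
            case 4: clear4(t); break;
            case 8: clear8(t); break;
            default: Arrays.fill(t, 0);
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        clear(a);
        System.out.println(Arrays.toString(a)); // prints [0, 0, 0, 0, 0, 0, 0, 0]
    }
}
```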

Then let's test it.

import java.util.*
import kotlin.system.measureNanoTime

fun main() {
    test3()
}


private fun test3() {
    val size = 8
    val time2 = measureNanoTime {
        val target = IntArray(size)
        for (i in 0 until 10_0000_0000) {
            IntArrays.clear8(target)
        }
    }
    println("fill$size          $time2")

    val time1 = measureNanoTime {
        val target = IntArray(size)
        for (i in 0 until 10_0000_0000) {
            Arrays.fill(target, 0)
        }
    }
    println("Arrays.fill$size   $time1")
    println()
}

internal object IntArrays {
    fun clear8(target: IntArray) {
        if(target.size < 8){
            throw IndexOutOfBoundsException()
        }
        target[0] = 0
        target[1] = 0
        target[2] = 0
        target[3] = 0
        target[4] = 0
        target[5] = 0
        target[6] = 0
        target[7] = 0
    }
}

Test results (in nanoseconds):

fill8           55,408,200
Arrays.fill8    2,262,171,100

As you can see, the unrolled method is about 40 times faster than the Arrays.fill that ships with Java, which took 2.2 seconds!
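The arithmetic behind that ratio can be checked with a few lines of Java. SpeedupCheck and its two helpers are hypothetical names; the inputs are the two measurements above, the loop count of one billion, and the 3.59 GHz clock listed in the Remarks section.

```java
import java.util.Locale;

final class SpeedupCheck {
    // Ratio of the slower measurement to the faster one.
    static double speedup(long slowNanos, long fastNanos) {
        return (double) slowNanos / fastNanos;
    }

    // Average time of a single call, in nanoseconds.
    static double nanosPerCall(long totalNanos, long calls) {
        return (double) totalNanos / calls;
    }

    public static void main(String[] args) {
        long calls = 1_000_000_000L;       // loop count used in the test
        long unrolledNanos = 55_408_200L;  // "fill8" measurement
        long fillNanos = 2_262_171_100L;   // "Arrays.fill8" measurement
        double cycleNanos = 1.0 / 3.59;    // length of one clock cycle at 3.59 GHz

        double perCall = nanosPerCall(unrolledNanos, calls);
        System.out.printf(Locale.ROOT, "speedup         %.1f%n", speedup(fillNanos, unrolledNanos));
        System.out.printf(Locale.ROOT, "ns per call     %.4f%n", perCall);
        System.out.printf(Locale.ROOT, "cycles per call %.2f%n", perCall / cycleNanos);
    }
}
```

The ratio comes out at roughly 40.8. The unrolled call averages about 0.055 ns, which is around a fifth of one clock cycle for eight stores. One possible reading, and it is only a guess that this test does not verify, is that the JIT compiler removed or hoisted part of the work in the Kotlin loop.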

Performance comparison with Java

I was about to marvel at how strong Kotlin's compiler is, but on reflection that is not right. Kotlin runs on the JVM, so the credit should go to the Java virtual machine's powerful runtime. If the program were converted to Java and written there directly, it should be even faster, or at least perform the same. So let's just do it.

//IntArrays.java
import java.util.Arrays;

final class IntArrays {
    static void clear8(int[] target) {
/*        if (target.length < 8){
            throw new IndexOutOfBoundsException();
        }*/
        target[0] = 0;
        target[1] = 0;
        target[2] = 0;
        target[3] = 0;
        target[4] = 0;
        target[5] = 0;
        target[6] = 0;
        target[7] = 0;
    }
}

// IntArraysDemoJava.java
import java.util.Arrays;

public final class IntArraysDemoJava {
    public static void main(String[] var0) {
        test1();
    }

    private static void test1() {
        long count = 1000000000;
        long start = System.nanoTime();
        final int[] target = new int[8];

        for(int i = 0; i < count; i++) {
            IntArrays.clear8(target);
        }
        long time2 = System.nanoTime() - start;
        System.out.println("fill8          " + time2);

        start = System.nanoTime();
        for(int i = 0; i < count; i++) {
            Arrays.fill(target, 0);
        }

        long time1 = System.nanoTime() - start;
        System.out.println("Arrays.fill8   " + time1);
        System.out.println();
    }
}

The test results are as follows (in nanoseconds):

fill8           2,018,500,800
Arrays.fill8    2,234,306,500

Oh my god, this optimization has almost no effect in Java. I could not find anything like a release-mode compiler option; at most there is debug = false, which I added to my Gradle build:

compileJava {
    options.debug = false
}
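On the JVM, most optimization happens at run time in the JIT compiler rather than in javac, so a timing loop is sensitive to warm-up. Below is a minimal sketch of a slightly more careful measurement, not the code I actually ran: WarmBench, timeNanos and the reduced round count are hypothetical, and a dedicated harness such as JMH is the usual recommendation for serious measurements.

```java
import java.util.Arrays;

final class WarmBench {
    static void clear8(int[] t) {
        t[0] = 0; t[1] = 0; t[2] = 0; t[3] = 0;
        t[4] = 0; t[5] = 0; t[6] = 0; t[7] = 0;
    }

    // Runs the task `rounds` times and returns the elapsed nanoseconds.
    static long timeNanos(Runnable task, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int[] target = new int[8];
        final int rounds = 10_000_000; // hypothetical count, kept small so the sketch runs quickly

        // Warm-up passes: give the JIT a chance to compile both code paths before timing.
        timeNanos(() -> clear8(target), rounds);
        timeNanos(() -> Arrays.fill(target, 0), rounds);

        long unrolled = timeNanos(() -> clear8(target), rounds);
        long fill = timeNanos(() -> Arrays.fill(target, 0), rounds);

        System.out.println("fill8          " + unrolled);
        System.out.println("Arrays.fill8   " + fill);
        // Read the array afterwards so the stores have an observable effect.
        System.out.println("checksum       " + Arrays.stream(target).sum());
    }
}
```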

So does that mean the bytecode generated by Kotlin is better than the bytecode generated by Java?

Java            Kotlin
ALOAD 0         ALOAD 1
ICONST_0        ICONST_0
ICONST_0        ICONST_0
IASTORE         IASTORE

ALOAD 0         ALOAD 1
ICONST_1        ICONST_1
ICONST_0        ICONST_0
IASTORE         IASTORE

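The bytecode is only slightly different: the Kotlin function is an instance method of the IntArrays object, so the array parameter sits in local slot 1 instead of slot 0. If you ask me why the timings differ so much, I have no idea...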

Comparison with C#

As a die-hard .NET fan, I naturally wondered whether C# would be faster, especially since .NET Core 3 has received a lot of performance optimization.

using System;

class Program {
   static void Main(string[] args) {
       Test3.test1();
   }
}

class Test3
{
    public static void test1()
    {
        long count = 1000000000;
        var watch = System.Diagnostics.Stopwatch.StartNew();
        int[] target = new int[8];

        for (int i = 0; i < count; i++)
        {
            Clear8(target);
        }
        watch.Stop();
        Console.WriteLine("fill8          " + watch.Elapsed);

        watch.Restart();
        for (int i = 0; i < count; i++)
        {
            Array.Clear(target, 0,8);
        }

        watch.Stop();
        Console.WriteLine("Array.Clear8   " + watch.Elapsed);
        Console.WriteLine();
    }

    static void Clear8(int[] target)
    {
        /* if (target.Length < 8)
        {
            throw new IndexOutOfRangeException();
        }*/
        target[0] = 0;
        target[1] = 0;
        target[2] = 0;
        target[3] = 0;
        target[4] = 0;
        target[5] = 0;
        target[6] = 0;
        target[7] = 0;
    }
}

Test results:

fill8           00:00:02.7462676
Array.Clear8    00:00:08.4920514

This is even slower than Java, and the built-in Array.Clear is slower still. I could not accept that, so I tried Span<int>.Fill(0), but the result was even worse.

Performance compared to Nim

By now my curiosity was piqued, so I tried to implement the test in C... and could not get it written; I am not good enough at it. Then I tried Rust, following the tutorial step by step, and still could not get it done...

Finally I set up a Nim environment, and that turned out to be simple enough.

import times, strutils

proc clear8*(target: var seq[int]) =
    target[0] = 0
    target[1] = 0
    target[2] = 0
    target[3] = 0
    target[4] = 0
    target[5] = 0
    target[6] = 0
    target[7] = 0

proc clear*(target: var seq[int]) =
    for i in 0..<target.len:
        target[i] = 0


proc test3() =
    const size = 8
    var start = epochTime()
    var target = newseq[int](size)
    for i in 0..<10_0000_0000:
        target.clear8()
    
    let elapsedStr = (epochTime() - start).formatFloat(format = ffDecimal, precision = 3)
    echo "fill8         ", elapsedStr

    start = epochTime()
    for i in 0..<10_0000_0000:
        target.clear()
    
    let elapsedStr2 = (epochTime() - start).formatFloat(format = ffDecimal, precision = 3)
    echo "Arrays.fill   ", elapsedStr2

test3()

For these test results, note that the program must be compiled in release mode (Nim's -d:release switch). Times are in seconds.

fill8          3.499
Arrays.fill    5.825

Disappointing, extremely disappointing.

Remarks

All tests were run on my desktop computer, which is configured as follows:

AMD Ryzen 5 3600, 6 cores, 3.59 GHz

8 GB RAM

Windows 10 Professional, 64-bit

All tests were compiled in release mode.


Origin www.cnblogs.com/tansm/p/12684664.html