Direct ByteBuffer relative vs absolute read performance

Vladimir G. :

While testing the read performance of a direct java.nio.ByteBuffer, I noticed that absolute reads are on average about 2x faster than relative reads. When I compare the source code of the relative and absolute reads, it is pretty much the same, except that the relative read maintains an internal counter. Why do I see such a considerable difference in speed?

Below is the source code of my JMH benchmark:

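import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class DirectByteBufferReadBenchmark {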

    private static final int OBJ_SIZE = 8 + 4 + 1;
    private static final int NUM_ELEM = 10_000_000;

    @State(Scope.Benchmark)
    public static class Data {

        private ByteBuffer directByteBuffer;

        @Setup
        public void setup() {
            directByteBuffer = ByteBuffer.allocateDirect(OBJ_SIZE * NUM_ELEM);
            for (int i = 0; i < NUM_ELEM; i++) {
                directByteBuffer.putLong(i);
                directByteBuffer.putInt(i);
                directByteBuffer.put((byte) (i & 1));
            }
        }
    }



    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public long testReadAbsolute(Data d) throws InterruptedException {
        long val = 0L;
        for (int i = 0; i < NUM_ELEM; i++) {
            int index = OBJ_SIZE * i;
            val += d.directByteBuffer.getLong(index);
            d.directByteBuffer.getInt(index + 8);
            d.directByteBuffer.get(index + 12);
        }
        return val;
    }

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public long testReadRelative(Data d) throws InterruptedException {
        d.directByteBuffer.rewind();

        long val = 0L;
        for (int i = 0; i < NUM_ELEM; i++) {
            val += d.directByteBuffer.getLong();
            d.directByteBuffer.getInt();
            d.directByteBuffer.get();
        }

        return val;
    }

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
            .include(DirectByteBufferReadBenchmark.class.getSimpleName())
            .warmupIterations(5)
            .measurementIterations(5)
            .forks(3)
            .threads(1)
            .build();

        new Runner(opt).run();
    }
}

And these are the results of my benchmark run:

Benchmark                                        Mode  Cnt   Score   Error  Units
DirectByteBufferReadBenchmark.testReadAbsolute  thrpt   15  88.605 ± 9.276  ops/s
DirectByteBufferReadBenchmark.testReadRelative  thrpt   15  42.904 ± 3.018  ops/s

The test was run on a MacBook Pro (2.2 GHz Intel Core i7, 16 GB DDR3) with JDK 1.8.0_73.

UPDATE

I ran the same test with JDK 9-ea b134. Both tests show a ~10% speed increase, but the speed difference between the two remains similar.

# JMH 1.13 (released 45 days ago)
# VM version: JDK 9-ea, VM 9-ea+134
# VM invoker: /Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home/bin/java
# VM options: <none>


Benchmark                                        Mode  Cnt    Score    Error  Units
DirectByteBufferReadBenchmark.testReadAbsolute  thrpt   15  102.170 ± 10.199  ops/s
DirectByteBufferReadBenchmark.testReadRelative  thrpt   15   45.988 ±  3.896  ops/s

apangin :

JDK 8 indeed generates worse code for the loop with relative ByteBuffer access.

JMH has a built-in perfasm profiler that prints the generated assembly code for the hottest regions. I've used it to compare the compiled testReadAbsolute and testReadRelative, and here are the main differences:

1. The relative getLong / getInt / get calls update the position field of the ByteBuffer. The VM does not optimize these updates away: there are 3 memory writes on each loop iteration.
2. The position range check is not eliminated: the conditional branches on each loop iteration remain in the compiled code.
3. Since the redundant field updates and range checks make the loop body longer, the VM unrolls only 2 iterations of the loop. The compiled version of the loop with absolute access has 16 iterations unrolled.

testReadAbsolute is compiled very well: the main loop just reads 16 longs, sums them up and jumps to the next iteration if index < 10_000_000 - 16. The state of directByteBuffer is not updated. However, the JVM is not that smart for testReadRelative: it seems unable to optimize away field accesses on an object that comes from outside.
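The difference in buffer state described above can be observed without JMH. The following is only a minimal sketch, not the benchmark itself: the class name RelativeVsAbsoluteSketch and its helper methods are made up for illustration, and it uses a small buffer because it demonstrates behaviour, not speed. It shows that both access styles compute the same sum, but only the relative reads move the buffer's position.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of the original benchmark.
public class RelativeVsAbsoluteSketch {
    static final int OBJ_SIZE = 8 + 4 + 1; // long + int + byte, as in the question

    // Fills a direct buffer with the same record layout as the benchmark's setup().
    static ByteBuffer fill(int numElem) {
        ByteBuffer buf = ByteBuffer.allocateDirect(OBJ_SIZE * numElem);
        for (int i = 0; i < numElem; i++) {
            buf.putLong(i);
            buf.putInt(i);
            buf.put((byte) (i & 1));
        }
        return buf;
    }

    // Absolute reads: the index is computed by the caller, position is never touched.
    static long sumAbsolute(ByteBuffer buf, int numElem) {
        long val = 0L;
        for (int i = 0; i < numElem; i++) {
            val += buf.getLong(OBJ_SIZE * i);
        }
        return val;
    }

    // Relative reads: every get advances the buffer's internal position field.
    static long sumRelative(ByteBuffer buf, int numElem) {
        buf.rewind();
        long val = 0L;
        for (int i = 0; i < numElem; i++) {
            val += buf.getLong();
            buf.getInt();
            buf.get();
        }
        return val;
    }

    public static void main(String[] args) {
        int n = 1000;
        ByteBuffer buf = fill(n);
        buf.rewind();
        long abs = sumAbsolute(buf, n);
        System.out.println("absolute sum=" + abs + " position=" + buf.position());
        long rel = sumRelative(buf, n);
        System.out.println("relative sum=" + rel + " position=" + buf.position());
    }
}
```

After the absolute loop the position is still 0, while after the relative loop it has advanced to the end of the data. Those position updates are the writes that the JDK 8 compiler left inside the loop.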

A lot of work went into optimizing ByteBuffer in JDK 9. I've run the same test on JDK 9-ea b134 and verified that testReadRelative no longer has the redundant memory writes and range checks. It now runs almost as fast as testReadAbsolute.

// JDK 1.8.0_92, VM 25.92-b14

Benchmark                                        Mode  Cnt   Score   Error  Units
DirectByteBufferReadBenchmark.testReadAbsolute  thrpt   10  99,727 ± 0,542  ops/s
DirectByteBufferReadBenchmark.testReadRelative  thrpt   10  47,126 ± 0,289  ops/s

// JDK 9-ea, VM 9-ea+134

Benchmark                                        Mode  Cnt    Score   Error  Units
DirectByteBufferReadBenchmark.testReadAbsolute  thrpt   10  109,369 ± 0,403  ops/s
DirectByteBufferReadBenchmark.testReadRelative  thrpt   10   97,140 ± 0,572  ops/s

UPDATE

To help the JIT compiler with optimization, I've introduced a local variable

ByteBuffer directByteBuffer = d.directByteBuffer;

in both benchmarks. Otherwise the extra level of indirection does not allow the compiler to eliminate the ByteBuffer.position field updates.
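For reference, the shape of that change might look like the sketch below. It is an assumption-laden illustration rather than the exact benchmark code: LocalBufferSketch and its nested Data class are hypothetical stand-ins for the JMH @State class, and the JMH annotations are left out so that it runs as plain Java. It shows the pattern only and does not measure anything.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; mirrors the benchmark's loop without JMH.
public class LocalBufferSketch {
    static final int OBJ_SIZE = 8 + 4 + 1;

    // Stand-in for the benchmark's @State holder (assumption: same record layout).
    static class Data {
        final ByteBuffer directByteBuffer;

        Data(int numElem) {
            directByteBuffer = ByteBuffer.allocateDirect(OBJ_SIZE * numElem);
            for (int i = 0; i < numElem; i++) {
                directByteBuffer.putLong(i);
                directByteBuffer.putInt(i);
                directByteBuffer.put((byte) (i & 1));
            }
        }
    }

    static long readRelative(Data d, int numElem) {
        // Read the field once into a local, so the loop works on the local
        // reference instead of going through d.directByteBuffer on every call.
        ByteBuffer directByteBuffer = d.directByteBuffer;
        directByteBuffer.rewind();

        long val = 0L;
        for (int i = 0; i < numElem; i++) {
            val += directByteBuffer.getLong();
            directByteBuffer.getInt();
            directByteBuffer.get();
        }
        return val;
    }

    public static void main(String[] args) {
        System.out.println(readRelative(new Data(100), 100));
    }
}
```

Whether this alone closes the gap depends on the JVM version; the numbers above suggest it was the JDK 9 build that removed the redundant writes and range checks.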
