Find the start of a cache line for a Java byte array

Thomas Mueller:

For a high performance blocked bloom filter, I would like to align data to cache lines. (I know it's easier to do such tricks in C, but I would like to use Java.)

I do have a solution, but I'm not sure if it's correct, or if there is a better way. My solution tries to find the start of the cache line using the following algorithm:

  • for each possible offset o (0..63; I assume a cache line length of 64 bytes)
  • start a thread that reads from data[o] and writes that value to data[o + 8]
  • in the main thread, write 1 to data[o] and wait until that value shows up in data[o + 8] (that is, wait for the other thread)
  • repeat that, incrementing data[o] each time

Then I measure how fast this was: how many increments the main thread completes during a loop of 1 million iterations (each thread runs 1 million iterations). My reasoning is that the handoff is slower if data[o] and data[o + 8] are in different cache lines.
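To make the expected pattern concrete: with 64-byte lines and the two bytes 8 apart, data[o] and data[o + 8] are in different lines exactly when data[o] is one of the last 8 bytes of its line. So 8 of the 64 offsets should be slow, and the line starts right after the last slow one. A small sketch of that arithmetic (the class and method names are hypothetical; the 64-byte line size is the question's assumption):

```java
import java.util.ArrayList;
import java.util.List;

class StraddlePattern {
    static final int LINE = 64;    // assumed cache line size in bytes
    static final int DISTANCE = 8; // data[o] vs. data[o + 8], as in the question

    // True if data[o] and data[o + DISTANCE] fall into different cache lines,
    // assuming a cache line starts at array index lineStart (0..LINE-1).
    static boolean straddles(int o, int lineStart) {
        int posInLine = Math.floorMod(o - lineStart, LINE);
        return posInLine + DISTANCE >= LINE;
    }

    // Offsets 0..LINE-1 that should be slow for the given line start.
    static List<Integer> slowOffsets(int lineStart) {
        List<Integer> slow = new ArrayList<>();
        for (int o = 0; o < LINE; o++) {
            if (straddles(o, lineStart)) {
                slow.add(o);
            }
        }
        return slow;
    }

    // Inverse step: the line starts at the first fast offset that follows a slow one.
    static int lineStart(boolean[] isSlow) {
        for (int i = 0; i < LINE; i++) {
            if (isSlow[Math.floorMod(i - 1, LINE)] && !isSlow[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(slowOffsets(16)); // [8, 9, 10, 11, 12, 13, 14, 15]
        boolean[] isSlow = new boolean[LINE];
        for (int o : slowOffsets(16)) {
            isSlow[o] = true;
        }
        System.out.println(lineStart(isSlow)); // 16
    }
}
```

lineStart mirrors the last loop of tryGetCacheLineOffset, which looks for a fast offset directly preceded by a slow one.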

Here is my code:

import java.util.Arrays;
import java.util.concurrent.Semaphore;

public class CacheLineFinder {

    public static void main(String... args) {
        for (int i = 0; i < 20; i++) {
            // random array size, so the arrays end up at different addresses
            int size = (int) (1000 + Math.random() * 1000);
            byte[] data = new byte[size];
            int cacheLineOffset = getCacheLineOffset(data);
            System.out.println("offset: " + cacheLineOffset);
        }
    }

    // Retries with more test rounds until the measurement gives a clear answer.
    private static int getCacheLineOffset(byte[] data) {
        for (int i = 0; i < 10; i++) {
            int x = tryGetCacheLineOffset(data, i + 3);
            if (x != -1) {
                return x;
            }
        }
        System.out.println("Cache line start not found");
        return 0;
    }

    private static int tryGetCacheLineOffset(byte[] data, int testCount) {
        // assume synchronization between two threads is faster(?)
        // if each thread works on the same cache line
        int[] counters = new int[64]; // completed handoffs per offset (higher = faster)
        int testOffset = 8;           // distance between the two bytes being passed back and forth
        for (int test = 0; test < testCount; test++) {
            for (int offset = 0; offset < 64; offset++) {
                final int o = offset;
                final Semaphore sema = new Semaphore(0);
                // helper thread: repeatedly copy data[o] to data[o + testOffset]
                Thread t = new Thread(() -> {
                    try {
                        sema.acquire();
                    } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                    }
                    for (int i = 0; i < 1000000; i++) {
                        data[o + testOffset] = data[o];
                    }
                });
                t.start();
                sema.release();
                // main thread: each time the expected value shows up in data[o + testOffset],
                // increment data[o] and wait for the new value.
                // These are plain (non-volatile) accesses, so the Java memory model does not
                // guarantee that the other thread's writes become visible here.
                data[o] = 1;
                int counter = 0;
                byte waitfor = 1;
                for (int i = 0; i < 1000000; i++) {
                    byte x = data[o + testOffset];
                    if (x == waitfor) {
                        data[o]++;
                        counter++;
                        waitfor++;
                    }
                }
                try {
                    t.join();
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
                counters[offset] += counter;
            }
        }
        // reset the bytes used by the test
        Arrays.fill(data, 0, testOffset + 64, (byte) 0);
        int low = Integer.MAX_VALUE, high = Integer.MIN_VALUE;
        for (int i = 0; i < 64; i++) {
            // average of 3 neighboring offsets (wrapping around), to smooth out noise
            int avg3 = (counters[(i - 1 + 64) % 64] + counters[i] + counters[(i + 1) % 64]) / 3;
            low = Math.min(low, avg3);
            high = Math.max(high, avg3);
        }
        if (low * 1.1 > high) {
            // no significant difference between low and high
            return -1;
        }
        // an offset is "low" (slow) if data[o] and data[o + 8] are likely in different lines
        int lowCount = 0;
        boolean[] isLow = new boolean[64];
        for (int i = 0; i < 64; i++) {
            if (counters[i] < (low + high) / 2) {
                isLow[i] = true;
                lowCount++;
            }
        }
        if (lowCount != 8) {
            // unclear: with 64-byte lines and a distance of 8, exactly 8 offsets should be slow
            return -1;
        }
        // the cache line starts right after the last slow offset
        for (int i = 0; i < 64; i++) {
            if (isLow[(i - 1 + 64) % 64] && !isLow[i]) {
                return i;
            }
        }
        return -1;
    }
}
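One caveat about the measurement loops: they use plain array reads and writes, and the Java memory model does not require those to become visible to the other thread, so in principle the JIT may keep a value in a register. Below is a sketch of the same handoff for a single offset using volatile element access through a VarHandle (Java 9+), which does guarantee visibility. The names HandoffProbe and roundTrips are hypothetical, and volatile accesses have their own cost, so the absolute counts are not comparable with the code above:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class HandoffProbe {
    // Volatile access to individual byte[] elements (Java 9+).
    private static final VarHandle BYTES = MethodHandles.arrayElementVarHandle(byte[].class);

    // Counts round trips completed within `iterations` polling steps: this thread writes
    // a value to data[o], a helper thread copies it to data[o + distance], and this thread
    // waits for it to appear there before writing the next value.
    static int roundTrips(byte[] data, int o, int distance, int iterations)
            throws InterruptedException {
        BYTES.setVolatile(data, o + distance, (byte) 0);
        BYTES.setVolatile(data, o, (byte) 1);
        Thread copier = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                byte v = (byte) BYTES.getVolatile(data, o);
                BYTES.setVolatile(data, o + distance, v);
            }
        });
        copier.start();
        byte waitFor = 1;
        int count = 0;
        for (int i = 0; i < iterations; i++) {
            if ((byte) BYTES.getVolatile(data, o + distance) == waitFor) {
                waitFor++; // wraps around after 127, like data[o]++ in the question
                BYTES.setVolatile(data, o, waitFor);
                count++;
            }
        }
        copier.join();
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        byte[] data = new byte[256];
        int[] counts = new int[64];
        for (int o = 0; o < 64; o++) {
            counts[o] = roundTrips(data, o, 8, 1_000_000);
        }
        // Hardware-dependent; under the question's hypothesis, the 8 offsets
        // whose two bytes straddle a line boundary should show lower counts.
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```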

My code prints, for example:

offset: 16
offset: 24
offset: 0
offset: 40
offset: 40
offset: 8
offset: 24
offset: 40
...

So arrays in Java seem to be aligned to 8 bytes: every offset found is a multiple of 8, but it varies from array to array.
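This observation is consistent with simple address arithmetic: if every object starts at an 8-byte aligned address and data[0] sits a multiple of 8 bytes after the start of the array object, then data[0] is 8-byte aligned and only multiples of 8 can be cache line starts. A sketch that enumerates this, assuming HotSpot's default 8-byte object alignment, a 16-byte byte[] header (typical for 64-bit HotSpot with compressed class pointers) and 64-byte lines; other JVMs or settings may differ, and the names are hypothetical:

```java
import java.util.TreeSet;

class PossibleOffsets {
    // Collects every index i at which data[i] can start a cache line, over many possible
    // array addresses, given the object alignment and the offset of data[0] in the object.
    static TreeSet<Integer> lineStartIndexes(int objectAlignment, int baseOffset, int line) {
        TreeSet<Integer> result = new TreeSet<>();
        for (long addr = 0; addr < 64L * line; addr += objectAlignment) {
            long first = addr + baseOffset;                   // address of data[0]
            int index = (int) ((line - first % line) % line); // first index on a line boundary
            result.add(index);
        }
        return result;
    }

    public static void main(String[] args) {
        // assumed: 8-byte object alignment, 16-byte byte[] header, 64-byte lines
        System.out.println(lineStartIndexes(8, 16, 64)); // [0, 8, 16, 24, 32, 40, 48, 56]
    }
}
```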

maaartinus:

Keep in mind that the GC can move objects, so your perfectly aligned array may become misaligned later.

I'd try a ByteBuffer; I'd guess a direct one is aligned much more strictly (maybe to a page boundary).

Unsafe can give you the address, and with JNI you can get an array pinned.
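Since Java 9, ByteBuffer also has alignmentOffset(index, unitSize) and alignedSlice(unitSize), so you do not have to rely on how allocateDirect happens to align its memory. Direct buffer memory lives outside the Java heap and is not moved by the GC, so the alignment stays valid; for heap buffers, the JDK implementation throws UnsupportedOperationException for unit sizes above 8. A minimal sketch, assuming 64-byte cache lines and one bloom filter block per line (the class and helper names are hypothetical):

```java
import java.nio.ByteBuffer;

class AlignedBlocks {
    static final int CACHE_LINE = 64; // assumed cache line size in bytes

    // Returns a direct buffer of blocks * CACHE_LINE bytes whose index 0
    // lies on a CACHE_LINE boundary (alignedSlice requires Java 9+).
    static ByteBuffer allocateAligned(int blocks) {
        int size = blocks * CACHE_LINE;
        // Over-allocate so that an aligned region of `size` bytes always fits.
        ByteBuffer raw = ByteBuffer.allocateDirect(size + CACHE_LINE - 1);
        ByteBuffer aligned = raw.alignedSlice(CACHE_LINE);
        aligned.limit(size);    // the aligned slice may be larger than needed
        return aligned.slice(); // capacity == size, still starting at the aligned address
    }

    // Sets bit `bit` (0..511) of block `block`; each block occupies one cache line.
    static void setBit(ByteBuffer buf, int block, int bit) {
        int index = block * CACHE_LINE + (bit >>> 3);
        buf.put(index, (byte) (buf.get(index) | (1 << (bit & 7))));
    }

    static boolean getBit(ByteBuffer buf, int block, int bit) {
        int index = block * CACHE_LINE + (bit >>> 3);
        return (buf.get(index) & (1 << (bit & 7))) != 0;
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocateAligned(1024);
        System.out.println("capacity: " + buf.capacity()); // capacity: 65536
        System.out.println("alignment offset: " + buf.alignmentOffset(0, CACHE_LINE)); // alignment offset: 0
        setBit(buf, 3, 100);
        System.out.println(getBit(buf, 3, 100) + " " + getBit(buf, 3, 101)); // true false
    }
}
```

The buffer is over-allocated by 63 bytes so that an aligned region of the requested size always fits, and every bit of a block stays inside a single cache line.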
