Java - cache coherence between successive parallel streams?

Kenny Wong :

Consider the following piece of code (which isn't quite what it seems at first glance).

import java.util.ArrayList;
import java.util.List;

static class NumberContainer {

    int value = 0;

    void increment() {
        value++;
    }

    int getValue() {
        return value;
    }
}

public static void main(String[] args) {

    List<NumberContainer> list = new ArrayList<>();
    int numElements = 100000;
    for (int i = 0; i < numElements; i++) {
        list.add(new NumberContainer());
    }

    int numIterations = 10000;
    for (int j = 0; j < numIterations; j++) {
        list.parallelStream().forEach(NumberContainer::increment);
    }

    list.forEach(container -> {
        if (container.getValue() != numIterations) {
            System.out.println("Problem!!!");
        }
    });
}

My question is: In order to be absolutely certain that "Problem!!!" won't be printed, does the "value" variable in the NumberContainer class need to be marked volatile?

Let me explain how I currently understand this.

  • In the first parallel stream, NumberContainer-123 (say) is incremented by ForkJoinWorker-1 (say). So ForkJoinWorker-1 will have an up-to-date cache of NumberContainer-123.value, which is 1. (Other fork-join workers, however, will have out-of-date caches of NumberContainer-123.value - they will store the value 0. At some point, these other workers' caches will be updated, but this doesn't happen straight away.)

  • The first parallel stream finishes, but the common fork-join pool worker threads aren't killed. The second parallel stream then starts, using the very same common fork-join pool worker threads.

  • Suppose, now, that in the second parallel stream, the task of incrementing NumberContainer-123 is assigned to ForkJoinWorker-2 (say). ForkJoinWorker-2 will have its own cached value of NumberContainer-123.value. If a long period of time has elapsed between the first and second increments of NumberContainer-123, then presumably ForkJoinWorker-2's cache of NumberContainer-123.value will be up-to-date, i.e. the value 1 will be stored, and everything is good. But what if the time elapsed between the first and second increments of NumberContainer-123 is extremely short? Then perhaps ForkJoinWorker-2's cache of NumberContainer-123.value might be out of date, storing the value 0, causing the code to fail!

Is my description above correct? If so, can anyone please tell me what kind of time delay between the two incrementing operations is required to guarantee cache consistency between the threads? Or if my understanding is wrong, then can someone please tell me what mechanism causes the thread-local caches to be "flushed" in between the first parallel stream and the second parallel stream?

alf :

No delay is needed. By the time you're out of the parallel stream's forEach, all the tasks have finished. That establishes a happens-before relation between each increment and the end of forEach. All the forEach calls are ordered by being invoked from the same thread, and the final check, similarly, happens-after all of the forEach calls.

int numIterations = 10000;
for (int j = 0; j < numIterations; j++) {
    list.parallelStream().forEach(NumberContainer::increment);
    // here, everything is "flushed", i.e. the ForkJoinTask is finished
}

Back to your question about the threads: the trick is that the threads are irrelevant. The memory model hinges on the happens-before relation, and the fork-join task ensures a happens-before relation both between the call to forEach and the operation body, and between the operation body and the return from forEach (even if the returned value is Void).
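As a minimal sketch of the guarantee being relied on here (the class and field names are made up for the example), a plain, non-volatile write performed inside a fork-join task is visible to the thread that joins the task, with no volatile and no delay, because the task's completion happens-before the join returns:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class HappensBeforeDemo {

    // Deliberately plain (non-volatile), just like NumberContainer.value.
    static int plainValue = 0;

    public static void main(String[] args) {
        // The write happens on a common-pool worker thread.
        ForkJoinTask<?> task = ForkJoinPool.commonPool()
                .submit(() -> { plainValue = 42; });

        // Completion of the task happens-before join() returns...
        task.join();

        // ...so this read is guaranteed to see 42.
        System.out.println(plainValue);
    }
}

parallelStream().forEach(...) gives you the same edge internally: it does not return until all of its fork-join tasks have completed and been joined.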

See also Memory visibility in Fork-join

As @erickson mentions in comments,

If you can't establish correctness through happens-before relationships, no amount of time is "enough." It's not a wall-clock timing issue; you need to apply the Java memory model correctly.

Moreover, thinking about it in terms of "flushing" the memory is wrong, as there are many more things that can affect you. Flushing, for instance, is trivial: I have not checked, but I would bet there's just a memory barrier on task completion. Yet you can still read wrong data because the compiler decided to optimise non-volatile reads away (the variable is not volatile, and it is not changed in this thread, so it's not going to change, so we can allocate it to a register, et voilà), to reorder the code in any way allowed by the happens-before relation, and so on.
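A classic illustration of that compiler effect (a hand-written sketch, not taken from the question above): a busy-wait on a plain boolean flag may be hoisted into a register, so the loop below is allowed to spin forever no matter how much wall-clock time passes after the flag is set.

public class HoistedReadDemo {

    // Plain (non-volatile) field, so the spinning thread's reads of it
    // may be collapsed into a single read, exactly as described above.
    static boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread spinner = new Thread(() -> {
            // With no happens-before edge (volatile, lock, join, ...),
            // the JIT may treat 'stop' as loop-invariant and never
            // re-read it, so this loop may legitimately run forever.
            while (!stop) { }
            System.out.println("stopped");
        });
        spinner.start();

        Thread.sleep(1000); // plenty of wall-clock time - still no guarantee
        stop = true;        // plain write; may never become visible to spinner
        spinner.join();     // can therefore hang on some JVMs
    }
}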

Most importantly, all those optimizations can and will change over time, so even if you went to the generated assembly (which may vary depending on the load pattern) and checked all the memory barriers, that would not guarantee that your code works unless you can prove that your reads happen-after your writes, in which case the Java Memory Model is on your side (assuming there's no bug in the JVM).

As for the great pain, making this synchronization trivial is the very goal of ForkJoinTask, so enjoy. It appears to be achieved by marking java.util.concurrent.ForkJoinTask#status volatile, but that's an implementation detail you should not care about or rely upon.
