Parallel data stream processing

The method below takes a number n as a parameter and returns the sum of all the numbers from 1 to n.

    public static int intSum(int n) {
        return Stream.iterate(1, i -> i + 1)
                .limit(n)
                .reduce(0, Integer::sum);
    }

This is a simple sequential stream. If n is large, running on a single thread obviously becomes a bottleneck, which is where parallel streams come in.

    public static int parallelIntSum(int n) {
        return Stream.iterate(1, i -> i + 1)
                .limit(n)
                .parallel()
                .reduce(0, Integer::sum);
    }


Can we mix sequential and parallel steps in one pipeline? No! The following is an example of this mistake.

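    public static int complexSum(int n) {
        return IntStream.iterate(0, i -> i + 1)
                .limit(n)                 // bound the stream; without this it never terminates
                .parallel()
                .filter(i -> i % 2 == 0)
                .sequential()
                .map(x -> x * 2)
                .parallel()               // the last call decides the mode of the whole pipeline
                .reduce(0, Integer::sum);
    }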

In fact, only the last call takes effect: parallel() and sequential() set a single flag for the whole pipeline, so this example runs entirely in parallel.
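
A small sketch can make this visible by asking the stream for its mode with isParallel(). The class and method names here (StreamModeDemo, endsParallel, endsSequential) are made up for illustration and are not part of the original example.

```java
import java.util.stream.IntStream;

class StreamModeDemo {

    // Same shape as complexSum: the final parallel() call wins.
    public static boolean endsParallel() {
        IntStream stream = IntStream.rangeClosed(1, 10)
                .parallel()
                .filter(i -> i % 2 == 0)
                .sequential()
                .map(x -> x * 2)
                .parallel();
        return stream.isParallel();
    }

    // Here the final call is sequential(), so the earlier parallel() is ignored.
    public static boolean endsSequential() {
        IntStream stream = IntStream.rangeClosed(1, 10)
                .parallel()
                .filter(i -> i % 2 == 0)
                .sequential();
        return stream.isParallel();
    }

    public static void main(String[] args) {
        System.out.println(endsParallel());   // true
        System.out.println(endsSequential()); // false
    }
}
```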

How are threads allocated for this parallel execution?

Parallel streams internally use the default ForkJoinPool (the common pool). Its default number of threads is derived from the number of processors on your machine, a value obtained from Runtime.getRuntime().availableProcessors().
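
To check these values on a given machine, something like the following sketch may help. The class name PoolSizeDemo is hypothetical. Note that the common pool often reports one less than the processor count, since the thread that starts the stream also takes part in the work.

```java
import java.util.concurrent.ForkJoinPool;

class PoolSizeDemo {

    // Number of processors the JVM reports as available.
    public static int processors() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the common pool used by parallel streams.
    public static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        System.out.println("processors: " + processors());
        System.out.println("common pool parallelism: " + commonPoolParallelism());
    }
}
```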

You can change the size of this pool through the system property java.util.concurrent.ForkJoinPool.common.parallelism, for example:

    System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "12");

This is a global setting, so it affects every parallel stream in the code. Making the ForkJoinPool size equal to the number of processors is a good default, and unless you have a good reason, we recommend that you do not change it.

Is the parallel example shown earlier really faster? No, because two problems slow it down:

First, iterate generates boxed objects, which have to be unboxed before they can be added.

Second, iterate is difficult to split into independent chunks that can run in parallel, because each element depends on the previous one.

Use a more specialized method instead:

    public static int parallelIntSum(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .reduce(0, Integer::sum);
    }

You do not need to memorize this as a special rule: just as in everyday statically typed programming, it pays to pick the stream type that matches the data. I think the library could handle this under the hood, but the choice may have been left to the caller so that it stays explicit and under the caller's control.

Parallelism also has overhead: the stream has to be split, the substreams assigned to threads, and the partial results merged. So it is best to measure and confirm that there is a real performance gain, rather than switching to a parallel stream blindly.
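
One way to run such a measurement is a naive timing harness along these lines. This is only a sketch: the class SumTiming and its helper fastestMillis are hypothetical names, and for serious measurements a dedicated benchmarking tool would be more reliable. The value of n is kept small enough that the int sum does not overflow.

```java
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class SumTiming {

    // Boxed, hard-to-split source.
    public static int boxedParallelSum(int n) {
        return Stream.iterate(1, i -> i + 1)
                .limit(n)
                .parallel()
                .reduce(0, Integer::sum);
    }

    // Primitive, easily split source.
    public static int rangeParallelSum(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .reduce(0, Integer::sum);
    }

    // Runs the function several times and returns the fastest run in milliseconds.
    public static long fastestMillis(IntUnaryOperator sum, int n, int runs) {
        long fastest = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sum.applyAsInt(n);
            long elapsed = (System.nanoTime() - start) / 1_000_000;
            fastest = Math.min(fastest, elapsed);
        }
        return fastest;
    }

    public static void main(String[] args) {
        int n = 50_000;
        System.out.println("iterate:     " + fastestMillis(SumTiming::boxedParallelSum, n, 10) + " ms");
        System.out.println("rangeClosed: " + fastestMillis(SumTiming::rangeParallelSum, n, 10) + " ms");
    }
}
```

The absolute numbers depend on the machine and on JIT warm-up, so only the relative difference between the two lines is of interest.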

When using a parallel stream, make sure the code does not modify shared state.

    public static class Accumulator {
        private long total = 0;
        public void add(long value) {
            total += value;
        }
    }

    public static long sideEffectParallelSum(long n) {
        Accumulator accumulator = new Accumulator();
        LongStream.rangeClosed(1, n).parallel().forEach(accumulator::add);
        return accumulator.total;
    }

This is completely wrong: total is shared by all threads, and the update total += value is not an atomic operation, so concurrent additions can be lost and the result is unpredictable.
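
For contrast, here is a sketch of two ways to get a correct result. The first avoids shared state altogether by letting the stream do the reduction; the second keeps a shared accumulator but uses the thread-safe LongAdder. The class name SafeParallelSum and its method names are hypothetical.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.LongStream;

class SafeParallelSum {

    // Preferred: no shared mutable state, the stream combines partial sums itself.
    public static long reducedSum(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .sum();
    }

    // If a shared accumulator cannot be avoided, use one that is thread-safe.
    public static long adderSum(long n) {
        LongAdder total = new LongAdder();
        LongStream.rangeClosed(1, n).parallel().forEach(total::add);
        return total.sum();
    }

    public static void main(String[] args) {
        System.out.println(reducedSum(1_000_000)); // 500000500000
        System.out.println(adderSum(1_000_000));   // 500000500000
    }
}
```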

Turning a sequential stream into a parallel one now looks very simple, but make sure it is really necessary. One factor to consider, for example, is how well the data structure underlying the stream supports splitting.
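
How well a source can be split shows up in its Spliterator. The sketch below compares the two sources used earlier: a stream built with iterate does not know its size, while a range does and can be cut into parts. The class name SplitDemo and its methods are hypothetical.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class SplitDemo {

    // iterate has no known size, so it cannot be divided into balanced chunks.
    public static boolean iterateIsSized() {
        return Stream.iterate(1, i -> i + 1)
                .spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    // A range knows exactly how many elements it holds.
    public static boolean rangeIsSized() {
        return IntStream.rangeClosed(1, 100)
                .spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    // Number of elements the range spliterator expects to traverse.
    public static long rangeEstimate() {
        return IntStream.rangeClosed(1, 100).spliterator().estimateSize();
    }

    public static void main(String[] args) {
        System.out.println(iterateIsSized()); // false
        System.out.println(rangeIsSized());   // true
        System.out.println(rangeEstimate());  // 100
    }
}
```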

Fork/join framework

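import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

import static lambdasinaction.chap7.ParallelStreamsHarness.FORK_JOIN_POOL;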


public class ForkJoinSumCalculator extends RecursiveTask<Long> {

    public static final long THRESHOLD = 10_000;

    private final long[] numbers;
    private final int start;
    private final int end;

    public ForkJoinSumCalculator(long[] numbers) {
        this(numbers, 0, numbers.length);
    }

    private ForkJoinSumCalculator(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        int length = end - start;
        // Small enough: sum this slice directly instead of splitting further.
        if (length <= THRESHOLD) {
            return computeSequentially();
        }
        ForkJoinSumCalculator leftTask = new ForkJoinSumCalculator(numbers, start, start + length/2);
        // Schedule the left half asynchronously on the pool.
        leftTask.fork();
        ForkJoinSumCalculator rightTask = new ForkJoinSumCalculator(numbers, start + length/2, end);
        // Compute the right half in the current thread, then wait for the left half.
        Long rightResult = rightTask.compute();
        Long leftResult = leftTask.join();
        return leftResult + rightResult;
    }

    private long computeSequentially() {
        long sum = 0;
        for (int i = start; i < end; i++) {
            sum += numbers[i];
        }
        return sum;
    }

    public static long forkJoinSum(long n) {
        long[] numbers = LongStream.rangeClosed(1, n).toArray();
        ForkJoinTask<Long> task = new ForkJoinSumCalculator(numbers);
        return FORK_JOIN_POOL.invoke(task);
    }
}

This is a hand-written implementation on top of the framework. The pool itself is kept as a singleton, because it is not something callers should be free to change. In essence, this is divide and conquer implemented at the thread level.
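
Since FORK_JOIN_POOL comes from an external harness class, a self-contained variant may be easier to try out. The sketch below follows the same divide-and-conquer pattern but submits the task to the shared ForkJoinPool.commonPool() instead. The class name RangeSumTask is hypothetical, and using the common pool is an assumption made here, not what the original code does.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

class RangeSumTask extends RecursiveTask<Long> {

    private static final int THRESHOLD = 10_000;

    private final long[] numbers;
    private final int start;
    private final int end;

    RangeSumTask(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        // Below the threshold, sum the slice sequentially.
        if (end - start <= THRESHOLD) {
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += numbers[i];
            }
            return sum;
        }
        int mid = start + (end - start) / 2;
        RangeSumTask left = new RangeSumTask(numbers, start, mid);
        left.fork();                          // run the left half asynchronously
        RangeSumTask right = new RangeSumTask(numbers, mid, end);
        long rightResult = right.compute();   // run the right half in this thread
        long leftResult = left.join();        // join only after both halves have started
        return leftResult + rightResult;
    }

    // Sums 1..n on the common pool, the singleton pool shared across the JVM.
    public static long sum(long n) {
        long[] numbers = LongStream.rangeClosed(1, n).toArray();
        return ForkJoinPool.commonPool().invoke(new RangeSumTask(numbers, 0, numbers.length));
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000_000)); // 500000500000
    }
}
```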

One point to watch when using it: join blocks the caller until the subtask's result is ready. So call join only after both subtasks have been started, otherwise the blocking causes performance problems.

In practice there are many more tasks than threads. But in a real program the split shown by the algorithm is rarely ideal, and some threads finish their share earlier than others, which is why the framework uses work stealing.

end
