Learning Java Threads (11): Fork/Join, Java's Framework for Parallel Computation

Parallel computing is no longer a new term in today's big-data era, and it now covers everything from a single multi-core machine to clusters of many machines. Note that the word here is parallel, not concurrent. Strictly speaking, parallel means that multiple tasks are executing in the system at the same instant, whereas concurrent means that multiple tasks exist in the system and take turns running in time slices. Because the switches are very fast, the tasks merely appear to run at the same time.
JDK 7 added the Fork/Join parallel computing framework to Java, which can help with the performance of large computations in our systems. Fork/Join is based on divide and conquer: fork splits a large task into several subtasks that are computed separately, and join waits for the subtasks' results, which are then combined. The process is recursive. Efficiency is highest when the subtasks are assigned to different cores. In pseudocode:
 

Result solve(Problem problem) {
    if (problem is small)
        directly solve problem
    else {
        split problem into independent parts
        fork new subtasks to solve each part
        join all subtasks
        compose result from subresults
    }
}

The core class of the Fork/Join framework is ForkJoinPool, which accepts a ForkJoinTask and returns its result.
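As a minimal sketch of handing a task to a ForkJoinPool, the example below shows two ways to get a result back: invoke(), which blocks and returns the result, and submit(), which returns a ForkJoinTask handle to join later. The class PoolUsageDemo, the task RangeSum, and the cutoff of 1,000 are hypothetical names and values chosen only for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

public class PoolUsageDemo {

    // Hypothetical task for illustration: sums the integers in [from, to], both inclusive.
    static class RangeSum extends RecursiveTask<Long> {
        private final long from;
        private final long to;

        RangeSum(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            // Small ranges are summed directly (the cutoff of 1,000 is an arbitrary assumption).
            if (to - from < 1_000) {
                long sum = 0;
                for (long i = from; i <= to; i++) {
                    sum += i;
                }
                return sum;
            }
            long mid = from + (to - from) / 2;
            RangeSum left = new RangeSum(from, mid);
            RangeSum right = new RangeSum(mid + 1, to);
            left.fork();                          // schedule the left half asynchronously
            return right.compute() + left.join(); // run the right half in this thread, then wait
        }
    }

    static long sumWithInvoke(long from, long to) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            // invoke() blocks until the task completes and returns its result.
            return pool.invoke(new RangeSum(from, to));
        } finally {
            pool.shutdown();
        }
    }

    static long sumWithSubmit(long from, long to) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            // submit() returns immediately; join() waits for the result.
            ForkJoinTask<Long> handle = pool.submit(new RangeSum(from, to));
            return handle.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumWithInvoke(1, 1_000_000));
        System.out.println(sumWithSubmit(1, 1_000_000));
    }
}
```

invoke() is the simpler choice when the caller has nothing else to do while the task runs. submit() is useful when the caller wants to continue with other work before collecting the result.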

ForkJoinTask has two subclasses that we normally extend: RecursiveTask (returns a result) and RecursiveAction (returns no result). To define our own task, we simply extend one of the two.

[Figure: class diagram of ForkJoinTask, RecursiveTask, and RecursiveAction]
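For the case with no return value, here is a minimal sketch of a RecursiveAction that increments every element of an array in place. The names IncrementDemo and IncrementAction and the threshold of 10,000 are assumptions made for this illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class IncrementDemo {

    // Hypothetical action for illustration: adds 1 to every element in [low, high).
    static class IncrementAction extends RecursiveAction {
        private static final int THRESHOLD = 10_000; // assumed cutoff, not a tuned value

        private final long[] array;
        private final int low;  // inclusive
        private final int high; // exclusive

        IncrementAction(long[] array, int low, int high) {
            this.array = array;
            this.low = low;
            this.high = high;
        }

        @Override
        protected void compute() {
            if (high - low <= THRESHOLD) {
                // Small enough: do the work directly.
                for (int i = low; i < high; i++) {
                    array[i]++;
                }
                return;
            }
            int mid = (low + high) >>> 1;
            // invokeAll forks both halves and returns once both have completed.
            invokeAll(new IncrementAction(array, low, mid),
                      new IncrementAction(array, mid, high));
        }
    }

    static void incrementAll(long[] array) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new IncrementAction(array, 0, array.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[100_000]; // all zeros
        incrementAll(data);
        System.out.println(data[0] + " " + data[data.length - 1]);
    }
}
```

Because the two halves touch disjoint index ranges, the subtasks need no locking.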

Consider the following example, which sums all the elements of a large array. The code is as follows:

import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/**
 * Sums the elements of a long[] with the Fork/Join framework.
 *
 * @author: shuang.gao  Date: 2015/7/14 Time: 8:16
 */
public class SumTask extends RecursiveTask<Long> {

    private static final long serialVersionUID = -6196480027075657316L;

    // Ranges of at most this many elements are summed directly.
    private static final int THRESHOLD = 500000;

    private final long[] array;

    // Start of the range, inclusive.
    private final int low;

    // End of the range, exclusive.
    private final int high;

    public SumTask(long[] array, int low, int high) {
        this.array = array;
        this.low = low;
        this.high = high;
    }

    @Override
    protected Long compute() {
        if (high - low <= THRESHOLD) {
            // Below the threshold: compute directly.
            // Note: the sum wraps around silently if it overflows long.
            long sum = 0;
            for (int i = low; i < high; i++) {
                sum += array[i];
            }
            return sum;
        }

        // 1. Split the large task into two subtasks: [low, mid) and [mid, high).
        int mid = (low + high) >>> 1;
        SumTask left = new SumTask(array, low, mid);
        SumTask right = new SumTask(array, mid, high);

        // 2. Compute them separately: fork the left half and
        //    run the right half in the current thread.
        left.fork();
        long rightSum = right.compute();

        // 3. Merge the results.
        return left.join() + rightSum;
    }

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        long[] array = genArray(1000000);

        // 1. Create the task over the whole array, [0, array.length).
        SumTask sumTask = new SumTask(array, 0, array.length);

        long begin = System.currentTimeMillis();

        // 2. Create the thread pool.
        ForkJoinPool forkJoinPool = new ForkJoinPool();

        // 3. Submit the task to the pool.
        forkJoinPool.submit(sumTask);

        // 4. Get the result (blocks until the task has finished).
        Long result = sumTask.get();

        long end = System.currentTimeMillis();

        forkJoinPool.shutdown();

        System.out.println(String.format("Result %s, elapsed %sms", result, end - begin));
    }

    private static long[] genArray(int size) {
        // Reuse one Random instead of creating a new one per element.
        Random random = new Random();
        long[] array = new long[size];
        for (int i = 0; i < size; i++) {
            array[i] = random.nextLong();
        }
        return array;
    }
}

If we adjust the threshold (THRESHOLD), we find that the elapsed time changes. In practice, if the tasks to be split have a fixed size, the best threshold can be found by testing. If the size is not fixed, a scalable algorithm is needed to compute the threshold dynamically. Splitting into a very large number of subtasks does not necessarily improve efficiency, because every subtask adds creation and scheduling overhead.
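One possible way to compute a threshold dynamically is to derive it from the input size and the pool's parallelism, so that each worker gets a handful of subtasks. The sketch below is only a heuristic under stated assumptions: the class DynamicThresholdDemo, the method chooseThreshold, and the constants 8 (tasks per thread) and 1,000 (minimum chunk size) are illustrative guesses, not values from the article or from the JDK, and should be checked by measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class DynamicThresholdDemo {

    // Hypothetical heuristic: aim for about tasksPerThread leaf tasks per worker,
    // but never split below minChunk elements. Both constants are assumptions.
    static int chooseThreshold(int length, int parallelism) {
        final int tasksPerThread = 8;
        final int minChunk = 1_000;
        return Math.max(minChunk, length / (parallelism * tasksPerThread));
    }

    // Same divide-and-conquer sum as before, but the threshold is passed in.
    static class ThresholdSumTask extends RecursiveTask<Long> {
        private final long[] array;
        private final int low;       // inclusive
        private final int high;      // exclusive
        private final int threshold; // ranges up to this size are summed directly

        ThresholdSumTask(long[] array, int low, int high, int threshold) {
            this.array = array;
            this.low = low;
            this.high = high;
            this.threshold = threshold;
        }

        @Override
        protected Long compute() {
            if (high - low <= threshold) {
                long sum = 0;
                for (int i = low; i < high; i++) {
                    sum += array[i];
                }
                return sum;
            }
            int mid = (low + high) >>> 1;
            ThresholdSumTask left = new ThresholdSumTask(array, low, mid, threshold);
            ThresholdSumTask right = new ThresholdSumTask(array, mid, high, threshold);
            left.fork();
            return right.compute() + left.join();
        }
    }

    static long parallelSum(long[] array) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            int threshold = chooseThreshold(array.length, pool.getParallelism());
            return pool.invoke(new ThresholdSumTask(array, 0, array.length, threshold));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] array = new long[1_000_000];
        for (int i = 0; i < array.length; i++) {
            array[i] = i;
        }
        System.out.println(parallelSum(array));
    }
}
```

The minimum chunk size keeps small inputs from being split at all, which reflects the point that many tiny subtasks can cost more than they save.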
To be continued...

 

References
http://gee.cs.oswego.edu/dl/papers/fj.pdf 
https://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html 
https://www.ibm.com/developerworks/cn/java/j-lo-forkjoin/
http://www.ibm.com/developerworks/cn/java/j-jtp11137.html
 


Origin blog.csdn.net/yuhaibao324/article/details/93150322