[Concurrent programming] ForkJoinPool thread pool

Java 7 introduced a new concurrency framework, the Fork/Join framework, along with a new thread pool: ForkJoinPool. (The shared instance returned by ForkJoinPool.commonPool() followed in Java 8.)

@sun.misc.Contended
public class ForkJoinPool extends AbstractExecutorService {
}

This article covers where ForkJoinPool is applicable, how it works, and some sample code.

Key points up front

Call it a preface or a conclusion; either way, here are the main takeaways:

  1. ForkJoinPool does not replace ExecutorService; it complements it. In some scenarios it performs better than a conventional ExecutorService.
  2. ForkJoinPool is mainly used to implement divide-and-conquer algorithms, especially ones that recurse after splitting, such as quicksort.
  3. ForkJoinPool is best suited to compute-intensive tasks. If a task performs I/O, synchronizes with other threads, calls sleep(), or otherwise blocks a thread for a long time, it is best to combine it with ManagedBlocker.

Usage

First, how to use the Fork/Join framework, which is what most readers care about. As the running example, we will sum all the elements of an integer array.

Problem: calculate the sum of the positive integers from 1 to 10000000.

  • Option 1: a plain for loop
    The simplest approach uses no parallelism at all, just a straightforward for loop.

To program against an interface, we first define the calculation as an interface, so that each solution can be written as a separate implementation.

public interface Calculator {

    /**
     * Sums all of the given numbers.
     *
     * @param numbers the values to add up
     * @return the total
     */
    long sumUp(long[] numbers);
}

Here is an implementation using a for loop. There is nothing surprising in this code, so it needs little explanation.

/**
 * Computes the total with an ordinary for loop; the logic is very simple.
 */
public class ForLoopCalculator implements Calculator {

    @Override
    public long sumUp(long[] numbers) {
        long total = 0;
        for (long i : numbers) {
            total += i;
        }
        return total;
    }
}

A main method for testing:

    // Requires java.time.Duration, java.time.Instant and java.util.stream.LongStream.
    public static void main(String[] args) {
        long[] numbers = LongStream.rangeClosed(1, 10000000).toArray();

        Instant start = Instant.now();
        Calculator calculator = new ForLoopCalculator();
        long result = calculator.sumUp(numbers);
        Instant end = Instant.now();
        System.out.println("Elapsed: " + Duration.between(start, end).toMillis() + "ms");

        System.out.println("Result: " + result);
    }

Output:
Elapsed: 10ms
Result: 50000005000000

  • Option 2: multi-threading with ExecutorService
    Since ExecutorService was introduced in Java 1.5, creating Thread objects directly has generally been discouraged in favor of ExecutorService. Its interface is far easier to use than raw threads, not to mention the thread pools, the Future class, the Lock classes, and the other convenient tools in java.util.concurrent.

Since the code above is designed around an interface, we only need to add an implementation class that uses ExecutorService:

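import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Multi-threaded summation using ExecutorService.
 */
public class ExecutorServiceCalculator implements Calculator {

    private final int parallelism;
    private final ExecutorService pool;

    public ExecutorServiceCalculator() {
        parallelism = Runtime.getRuntime().availableProcessors(); // default to the number of CPU cores
        pool = Executors.newFixedThreadPool(parallelism);
    }

    // The task that sums one slice of the array
    private static class SumTask implements Callable<Long> {
        private final long[] numbers;
        private final int from;
        private final int to;

        public SumTask(long[] numbers, int from, int to) {
            this.numbers = numbers;
            this.from = from;
            this.to = to;
        }

        @Override
        public Long call() {
            long total = 0;
            for (int i = from; i <= to; i++) {
                total += numbers[i];
            }
            return total;
        }
    }

    @Override
    public long sumUp(long[] numbers) {
        List<Future<Long>> results = new ArrayList<>();

        // Split the work into n equal parts, one per thread (4 cores means 4 parts),
        // and hand each part to the pool as a SumTask.
        int part = numbers.length / parallelism;
        for (int i = 0; i < parallelism; i++) {
            int from = i * part; // start index
            int to = (i == parallelism - 1) ? numbers.length - 1 : (i + 1) * part - 1; // end index (inclusive)

            // Submit the slice to the thread pool
            results.add(pool.submit(new SumTask(numbers, from, to)));
        }

        // Add up the partial results; note that get() blocks.
        // Possible improvement: use CompletableFuture (added in JDK 1.8).
        long total = 0L;
        try {
            for (Future<Long> f : results) {
                total += f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            // Fail loudly instead of silently returning a partial sum
            throw new IllegalStateException("summation failed", e);
        } finally {
            // Without this the non-daemon pool threads keep the JVM alive;
            // as a consequence this calculator is single-use.
            pool.shutdown();
        }

        return total;
    }
}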

The main method changes to:

    public static void main(String[] args) {
        long[] numbers = LongStream.rangeClosed(1, 10000000).toArray();

        Instant start = Instant.now();
        Calculator calculator = new ExecutorServiceCalculator();
        long result = calculator.sumUp(numbers);
        Instant end = Instant.now();
        System.out.println("Elapsed: " + Duration.between(start, end).toMillis() + "ms");

        System.out.println("Result: " + result); // prints 50000005000000
    }

Output:
Elapsed: 30ms
Result: 50000005000000
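
The comment in the code above mentions CompletableFuture as a possible improvement. The following is a minimal sketch of that idea, not the author's implementation: the class name CompletableFutureSum and the parts parameter are made up for illustration. Each slice is summed with supplyAsync on a fixed thread pool, and the partial results are chained with thenCombine instead of a loop over blocking get() calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.LongStream;

// Hypothetical class for illustration; not part of the original article's code.
final class CompletableFutureSum {

    // Sums the array by splitting it into `parts` slices (parts must be >= 1).
    static long sumUp(long[] numbers, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<CompletableFuture<Long>> futures = new ArrayList<>();
            int chunk = numbers.length / parts;
            for (int i = 0; i < parts; i++) {
                final int from = i * chunk;
                // The last slice absorbs the remainder; `to` is exclusive here.
                final int to = (i == parts - 1) ? numbers.length : from + chunk;
                futures.add(CompletableFuture.supplyAsync(() -> {
                    long total = 0;
                    for (int j = from; j < to; j++) {
                        total += numbers[j];
                    }
                    return total;
                }, pool));
            }
            // Chain the partial sums together; only the final join() waits.
            CompletableFuture<Long> total = CompletableFuture.completedFuture(0L);
            for (CompletableFuture<Long> f : futures) {
                total = total.thenCombine(f, Long::sum);
            }
            return total.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] numbers = LongStream.rangeClosed(1, 10_000_000).toArray();
        int parts = Runtime.getRuntime().availableProcessors();
        System.out.println(sumUp(numbers, parts)); // 50000005000000
    }
}
```

The splitting logic is the same as in ExecutorServiceCalculator; only the way results are collected changes.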

  • Option 3: ForkJoinPool (Fork/Join)
    The previous implementations were covered in some detail mainly so that the coding effort can be compared. Now for the focus of this article: the ForkJoinPool implementation.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/**
 * Computes the sum with Fork/Join.
 * @date 2018/11/5 15:09
 */
public class ForkJoinCalculator implements Calculator {

    private final ForkJoinPool pool;

    // Task types: RecursiveTask returns a value; RecursiveAction does not
    private static class SumTask extends RecursiveTask<Long> {
        private final long[] numbers;
        private final int from;
        private final int to;

        public SumTask(long[] numbers, int from, int to) {
            this.numbers = numbers;
            this.from = from;
            this.to = to;
        }

        // compute() is the core of Fork/Join: it splits the task,
        // and how well it splits determines how efficient the result is
        @Override
        protected Long compute() {

            // When the range is small enough (to - from < 6), sum it directly with a for loop
            if (to - from < 6) {
                long total = 0;
                for (int i = from; i <= to; i++) {
                    total += numbers[i];
                }
                return total;
            } else { // Otherwise split the task in two, recursively; how finely to split depends on the workload
                int middle = (from + to) / 2;
                SumTask taskLeft = new SumTask(numbers, from, middle);
                SumTask taskRight = new SumTask(numbers, middle + 1, to);
                taskLeft.fork();
                taskRight.fork();
                return taskLeft.join() + taskRight.join();
            }
        }
    }

    public ForkJoinCalculator() {
        // The shared pool can be used instead:
        // pool = ForkJoinPool.commonPool()
        pool = new ForkJoinPool();
    }

    @Override
    public long sumUp(long[] numbers) {
        Long result = pool.invoke(new SumTask(numbers, 0, numbers.length - 1));
        pool.shutdown(); // after this call the calculator cannot be reused
        return result;
    }
}
Output:
Elapsed: 390ms
Result: 50000005000000

As you can see, all of the ForkJoinPool logic is concentrated in the compute() method, which implements the whole calculation in only 14 lines. Notably, the code never explicitly assigns tasks to threads; it only decomposes the work, and the mapping of tasks to threads is left to ForkJoinPool.
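
A common variation on compute(), also used in the Fibonacci example in the RecursiveTask Javadoc, forks only one half and computes the other half in the current thread. The sketch below shows that pattern with a much coarser cutoff. The class name ForkOnceSum and the THRESHOLD value of 10000 are assumptions for illustration, and the threshold would need tuning by measurement; a tiny cutoff such as 6 creates millions of very small tasks, which plausibly contributes to the 390 ms above.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

// Hypothetical class for illustration; not part of the original article's code.
final class ForkOnceSum {

    // Assumed cutoff below which a range is summed sequentially.
    static final int THRESHOLD = 10_000;

    static final class SumTask extends RecursiveTask<Long> {
        private final long[] numbers;
        private final int from; // inclusive
        private final int to;   // inclusive

        SumTask(long[] numbers, int from, int to) {
            this.numbers = numbers;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from < THRESHOLD) {
                long total = 0;
                for (int i = from; i <= to; i++) {
                    total += numbers[i];
                }
                return total;
            }
            int middle = (from + to) >>> 1;
            SumTask left = new SumTask(numbers, from, middle);
            SumTask right = new SumTask(numbers, middle + 1, to);
            left.fork();                        // make the left half available to other workers
            long rightResult = right.compute(); // run the right half in the current thread
            return left.join() + rightResult;
        }
    }

    static long sum(long[] numbers) {
        if (numbers.length == 0) {
            return 0;
        }
        return ForkJoinPool.commonPool().invoke(new SumTask(numbers, 0, numbers.length - 1));
    }

    public static void main(String[] args) {
        long[] numbers = LongStream.rangeClosed(1, 10_000_000).toArray();
        System.out.println(sum(numbers)); // 50000005000000
    }
}
```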

  • Option 4: parallel streams (the recommended practice since JDK 8)

    public static void main(String[] args) {

        Instant start = Instant.now();
        long result = LongStream.rangeClosed(0, 10000000L).parallel().reduce(0, Long::sum);
        Instant end = Instant.now();
        System.out.println("Elapsed: " + Duration.between(start, end).toMillis() + "ms");

        System.out.println("Result: " + result); // prints 50000005000000

    }

Output:
Elapsed: 130ms
Result: 50000005000000

Parallel streams are built on the Fork/Join framework, with well-optimized task splitting.

A note on the timings: Fork/Join and parallel streams only show their advantage when the amount of computation is very large. If the computation is relatively small, or the task is not CPU-intensive, parallel processing is not recommended.
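
One way to see that a parallel stream runs on the Fork/Join common pool is to record which threads execute its elements. This is only a sketch, and the class name ParallelStreamThreads is made up for illustration. The expectation, based on the worker name prefix visible in makeCommonPool, is that every recorded name is either the calling thread or starts with ForkJoinPool.commonPool-worker-.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.LongStream;

// Hypothetical class for illustration; not part of the original article's code.
final class ParallelStreamThreads {

    // Returns the names of all threads that processed at least one element.
    static Set<String> threadNames(long n) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        LongStream.rangeClosed(1, n)
                  .parallel()
                  .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // The exact set of names depends on the machine and on scheduling.
        threadNames(1_000_000).forEach(System.out::println);
    }
}
```

The calling thread usually shows up in the list as well, because it takes part in the computation instead of only waiting.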

Principle

I have always thought that the best way to understand how something works is to try to implement it yourself. From the sample code above, fork() and join() are clearly the key to the "magic" of the Fork/Join framework. Going by the method names alone, we might guess what they do:

  • fork(): start a new thread (or reuse an idle thread in the pool) and hand the task to it.
  • join(): wait for the thread processing the task to finish, then get its return value.

Question: if that were true, then as tasks are split ever more finely, wouldn't the number of threads required keep growing, with most of them just waiting?

But suppose we add the following line to the sample code above:

System.out.println(pool.getPoolSize());

This prints the current size of the thread pool. On my machine it is 4, meaning there are only 4 worker threads. Even if we set the number of threads to 1 when creating the pool, the program still works without any problem, except that it becomes a serial program.

public ForkJoinCalculator() {
    pool = new ForkJoinPool(1);
}

This contradicts our guess, so the guess must be wrong: not every fork() creates a new thread, and not every join() blocks a thread. The Fork/Join framework's algorithm is less "obvious" than that; it uses a more sophisticated approach called the work-stealing algorithm.
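
The experiment is easy to reproduce. The sketch below runs the same fork-both, join-both pattern in pools created with different parallelism values and prints the pool size after each run. The class name SingleWorkerSum and the cutoff of 1000 are assumptions for illustration, and the printed pool size can differ between machines and JDK versions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical class for illustration; not part of the original article's code.
final class SingleWorkerSum {

    // Sums the integers in [from, to] with the same fork/fork/join/join shape as above.
    static final class SumTask extends RecursiveTask<Long> {
        private final long from;
        private final long to;

        SumTask(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from < 1_000) { // assumed cutoff
                long total = 0;
                for (long i = from; i <= to; i++) {
                    total += i;
                }
                return total;
            }
            long middle = (from + to) / 2;
            SumTask left = new SumTask(from, middle);
            SumTask right = new SumTask(middle + 1, to);
            left.fork();
            right.fork();
            return left.join() + right.join();
        }
    }

    static long sumWith(int parallelism, long n) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            long result = pool.invoke(new SumTask(1, n));
            System.out.println("parallelism=" + pool.getParallelism()
                    + ", poolSize=" + pool.getPoolSize());
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumWith(1, 1_000_000)); // 500000500000
        System.out.println(sumWith(4, 1_000_000)); // 500000500000
    }
}
```

With parallelism 1 the computation still finishes, even though about a thousand tasks call join(); that would be impossible if each join() parked the only worker thread.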


  1. Each ForkJoinPool worker thread maintains its own work queue (WorkQueue), a double-ended queue (deque) whose elements are tasks (ForkJoinTask).
  2. When a worker produces a new task while running (usually by calling fork()), the task is pushed onto the tail of its own work queue. The worker processes its own queue in LIFO order; that is, it always takes the next task from the tail.
  3. When a worker runs out of work in its own queue, it tries to steal a task, either one just submitted to the pool or one from another worker's queue. A stolen task is taken from the head of the other worker's queue, so stealing follows FIFO order.
  4. On join(), if the task being joined has not finished, the worker processes other tasks first while waiting for it to complete.
  5. When a worker has no tasks of its own and nothing to steal, it goes to sleep.
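
The two access patterns from points 2 and 3 can be modelled with an ordinary ArrayDeque. This is a single-threaded toy model, not the real (lock-free) WorkQueue, and the class name WorkStealingToy is made up for illustration: the owner pushes and pops at the tail, while a thief takes from the head.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model for illustration only; the real ForkJoinPool.WorkQueue is far more involved.
final class WorkStealingToy {

    static String demo() {
        Deque<String> queue = new ArrayDeque<>();

        // The owner forks three tasks; each one goes to the tail.
        queue.addLast("A");
        queue.addLast("B");
        queue.addLast("C");

        // The owner takes its next task from the tail (LIFO): the newest task.
        String ownerTakes = queue.pollLast();

        // A thief steals from the head (FIFO): the oldest task.
        String thiefTakes = queue.pollFirst();

        return "owner=" + ownerTakes + ", thief=" + thiefTakes + ", left=" + queue;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // owner=C, thief=A, left=[B]
    }
}
```

In a divide-and-conquer workload the oldest task in a queue is usually the largest one, so a thief gets a big piece of work per steal, while the owner keeps working on the small, recently created tasks.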

This article will not go into the source-level details of fork() and join().

There is no essential difference between submit() and fork(), except that a submitted task goes into a submission queue (with some extra synchronization and initialization). Like the other work queues, the submission queue is something worker threads steal from. So when a worker successfully steals a task from it, the submitted task actually starts executing.

commonPool parameter configuration

commonPool is a thread pool built into ForkJoinPool, and several JDK 8 features use it. Where does it come from? It is created by the static method makeCommonPool in ForkJoinPool:

   private static ForkJoinPool makeCommonPool() {
        int parallelism = -1;
        ForkJoinWorkerThreadFactory factory = null;
        UncaughtExceptionHandler handler = null;
        try {  // ignore exceptions in accessing/parsing properties
            String pp = System.getProperty
                ("java.util.concurrent.ForkJoinPool.common.parallelism");
            String fp = System.getProperty
                ("java.util.concurrent.ForkJoinPool.common.threadFactory");
            String hp = System.getProperty
                ("java.util.concurrent.ForkJoinPool.common.exceptionHandler");
            if (pp != null)
                parallelism = Integer.parseInt(pp);
            if (fp != null)
                factory = ((ForkJoinWorkerThreadFactory)ClassLoader.
                           getSystemClassLoader().loadClass(fp).newInstance());
            if (hp != null)
                handler = ((UncaughtExceptionHandler)ClassLoader.
                           getSystemClassLoader().loadClass(hp).newInstance());
        } catch (Exception ignore) {
        }
        if (factory == null) {
            if (System.getSecurityManager() == null)
                factory = defaultForkJoinWorkerThreadFactory;
            else // use security-managed default
                factory = new InnocuousForkJoinWorkerThreadFactory();
        }
        if (parallelism < 0 && // default 1 less than #cores
            (parallelism = Runtime.getRuntime().availableProcessors() - 1) <= 0)
            parallelism = 1;
        if (parallelism > MAX_CAP)
            parallelism = MAX_CAP;
        return new ForkJoinPool(parallelism, factory, handler, LIFO_QUEUE,
                                "ForkJoinPool.commonPool-worker-");
    }

Parameters and how to customize commonPool

If you set a parameter in code, it must be set before commonPool is initialized (that is, before the first parallel stream runs; usually right after the application starts), otherwise it has no effect. Startup parameters have no such restriction and are therefore safer.

  • parallelism (the number of worker threads in the pool)
    Configured through java.util.concurrent.ForkJoinPool.common.parallelism. The value cannot exceed MAX_CAP, which is 32767:

    static final int MAX_CAP = 0x7fff; // 32767

    If not specified, the default is Runtime.getRuntime().availableProcessors() - 1, with a minimum of 1.

    To customize it in code (again, before commonPool is initialized):

    System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "8");
    // or specify it as a startup parameter
    -Djava.util.concurrent.ForkJoinPool.common.parallelism=8

  • threadFactory: defaults to defaultForkJoinWorkerThreadFactory when there is no SecurityManager.
  • exceptionHandler: defaults to null if not set.
  • WorkQueue: controls whether a worker processes its own queue in FIFO or LIFO order; commonPool is created with LIFO_QUEUE. As described earlier, each worker thread maintains a work queue (a deque of ForkJoinTask objects). Tasks created with fork() are pushed onto the tail, and the owner takes tasks from the tail (LIFO), while other workers steal from the head (FIFO).
  • queue capacity: the capacity of each work queue.
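
The same parameters can be passed explicitly when creating your own pool through the public four-argument ForkJoinPool constructor. The sketch below is an illustration under assumptions: the class name PoolConfigSketch and the helper newAsyncPool are made up, and the property call in main only takes effect if nothing in the JVM has initialized commonPool yet.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical class for illustration; not part of the original article's code.
final class PoolConfigSketch {

    // Builds a private pool with explicit settings instead of relying on defaults.
    static ForkJoinPool newAsyncPool(int parallelism) {
        return new ForkJoinPool(
                parallelism,
                ForkJoinPool.defaultForkJoinWorkerThreadFactory, // the default factory
                null,   // no UncaughtExceptionHandler
                true);  // asyncMode = true: workers process their own queue in FIFO order
    }

    public static void main(String[] args) {
        // Must run before anything touches commonPool, otherwise it is ignored.
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "8");
        System.out.println("commonPool parallelism: " + ForkJoinPool.getCommonPoolParallelism());

        ForkJoinPool pool = newAsyncPool(2);
        System.out.println("custom parallelism: " + pool.getParallelism()
                + ", asyncMode: " + pool.getAsyncMode());
        pool.shutdown();
    }
}
```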

More on ForkJoinPool

After creating a ForkJoinPool instance, call its submit(ForkJoinTask task) or invoke(ForkJoinTask task) method to execute a task.
ForkJoinTask represents a task that can be split, run in parallel, and merged. It is an abstract class with two abstract subclasses, RecursiveAction and RecursiveTask:

  • RecursiveTask represents a task that returns a value.
  • RecursiveAction represents a task that has no return value.
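
The sketch below contrasts the two task types and the two ways of starting them. The names ActionVsTask, DoubleAction and MaxTask, as well as the cutoff of 1000, are made up for illustration. DoubleAction modifies an array in place and returns nothing; MaxTask returns a value. invoke() blocks until the task is done, while submit() returns a ForkJoinTask immediately.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.RecursiveTask;

// Hypothetical classes for illustration; not part of the original article's code.
final class ActionVsTask {

    static final int THRESHOLD = 1_000; // assumed cutoff

    // RecursiveAction: no result; doubles every element of [from, to) in place.
    static final class DoubleAction extends RecursiveAction {
        private final long[] data;
        private final int from;
        private final int to;

        DoubleAction(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from <= THRESHOLD) {
                for (int i = from; i < to; i++) {
                    data[i] *= 2;
                }
                return;
            }
            int mid = (from + to) >>> 1;
            // invokeAll forks the subtasks and returns when both have completed.
            invokeAll(new DoubleAction(data, from, mid), new DoubleAction(data, mid, to));
        }
    }

    // RecursiveTask: returns the maximum of [from, to).
    static final class MaxTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int from;
        private final int to;

        MaxTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long max = Long.MIN_VALUE;
                for (int i = from; i < to; i++) {
                    max = Math.max(max, data[i]);
                }
                return max;
            }
            int mid = (from + to) >>> 1;
            MaxTask left = new MaxTask(data, from, mid);
            left.fork();
            long right = new MaxTask(data, mid, to).compute();
            return Math.max(left.join(), right);
        }
    }

    static long doubleThenMax(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new DoubleAction(data, 0, data.length)); // blocks until finished
            ForkJoinTask<Long> pending = pool.submit(new MaxTask(data, 0, data.length)); // returns at once
            return pending.join(); // wait for the result when it is needed
        } finally {
            pool.shutdown();
        }
    }
}
```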


Like ThreadPoolExecutor, ForkJoinPool implements the Executor and ExecutorService interfaces. It uses an unbounded queue to hold the tasks waiting to be executed, and the number of threads is passed in through the constructor. If no thread count is given, the number of CPUs available on the machine is used as the default.

ForkJoinPool is mainly used to solve problems with divide-and-conquer algorithms. A typical application is quicksort.

The key point is that ForkJoinPool needs only a relatively small number of threads to process a large number of tasks.

For example:

To sort 10 million items, the task is split into two sorting tasks of 5 million items each, plus a task that merges the two sorted halves. Each 5-million-item task is split the same way, and so on, until a threshold says the data is small enough to stop splitting. For example, when fewer than 10 elements remain, stop splitting and sort them with insertion sort instead.
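
The procedure just described can be sketched as a RecursiveAction. This is a minimal illustration, not a tuned sort: the class name ParallelMergeSort is made up, the threshold of 10 is taken from the text (a real implementation would likely use a much larger cutoff), and a single scratch array shared by all tasks is an assumption made to keep the code short.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical class for illustration; not part of the original article's code.
final class ParallelMergeSort {

    static final int THRESHOLD = 10; // below this size, use insertion sort

    static final class SortTask extends RecursiveAction {
        private final int[] data;
        private final int[] scratch; // shared buffer; each task only touches [from, to)
        private final int from;
        private final int to;        // exclusive

        SortTask(int[] data, int[] scratch, int from, int to) {
            this.data = data;
            this.scratch = scratch;
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from < THRESHOLD) {
                insertionSort(data, from, to);
                return;
            }
            int mid = (from + to) >>> 1;
            // Sort both halves as subtasks; the merge runs only after both have finished.
            invokeAll(new SortTask(data, scratch, from, mid),
                      new SortTask(data, scratch, mid, to));
            merge(data, scratch, from, mid, to);
        }
    }

    static void insertionSort(int[] a, int from, int to) {
        for (int i = from + 1; i < to; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= from && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    // Merges the sorted runs [from, mid) and [mid, to) of a, using scratch as a copy.
    static void merge(int[] a, int[] scratch, int from, int mid, int to) {
        System.arraycopy(a, from, scratch, from, to - from);
        int i = from;
        int j = mid;
        int k = from;
        while (i < mid && j < to) {
            a[k++] = (scratch[i] <= scratch[j]) ? scratch[i++] : scratch[j++];
        }
        while (i < mid) {
            a[k++] = scratch[i++];
        }
        while (j < to) {
            a[k++] = scratch[j++];
        }
    }

    static void sort(int[] data) {
        ForkJoinPool.commonPool().invoke(new SortTask(data, new int[data.length], 0, data.length));
    }
}
```

Every parent task here has to wait for its two children before it can merge, which is exactly the parent-child dependency discussed next.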

In the end there will be roughly 2 million or more tasks in total. The crux is that a task can finish only after all of its subtasks have completed. Therefore:

  • With ThreadPoolExecutor, divide and conquer is a problem: a thread in a ThreadPoolExecutor cannot put a new task on the queue and then set its current task aside until that new task completes; it can only block while it waits.
  • With ForkJoinPool, a thread can create new tasks and suspend the current one, and then pick subtasks from the queue to execute.

What is the performance difference between ThreadPoolExecutor and ForkJoinPool?

  • ForkJoinPool can complete a large number of parent-child tasks with a limited number of threads, for example more than 2 million tasks with 4 threads.
  • ThreadPoolExecutor cannot, because its threads cannot choose to run child tasks first. Each parent occupies a thread while it waits for its children, so completing 2 million tasks with parent-child relationships could require up to 2 million threads, which is clearly not feasible.

This is the advantage of the work-stealing model.
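
The difference shows up even with a single parent and a single child. In the sketch below (the class name NestedTaskDemo and the 200 ms timeout are assumptions for illustration), a parent task running in a one-thread executor submits a child to the same executor and waits for it; the wait times out because the only thread is busy running the parent. The same shape in a ForkJoinPool with one worker completes, because the worker runs the child itself during join().

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical class for illustration; not part of the original article's code.
final class NestedTaskDemo {

    // Returns true if the parent could not get its child's result in time.
    static boolean parentStarvesOnExecutor() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Future<Boolean> parent = pool.submit(() -> {
                Future<Integer> child = pool.submit(() -> 42);
                try {
                    child.get(200, TimeUnit.MILLISECONDS);
                    return false; // the child ran, so there was no starvation
                } catch (TimeoutException e) {
                    return true;  // the only thread is occupied by the waiting parent
                }
            });
            return parent.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    // The same parent/child shape expressed with fork() and join().
    static final class Parent extends RecursiveTask<Integer> {
        @Override
        protected Integer compute() {
            RecursiveTask<Integer> child = new RecursiveTask<Integer>() {
                @Override
                protected Integer compute() {
                    return 42;
                }
            };
            child.fork();
            return child.join(); // the worker executes the child instead of blocking
        }
    }

    static int parentOnForkJoin() {
        ForkJoinPool pool = new ForkJoinPool(1);
        try {
            return pool.invoke(new Parent());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("executor starved: " + parentStarvesOnExecutor()); // executor starved: true
        System.out.println("fork/join result: " + parentOnForkJoin());        // fork/join result: 42
    }
}
```

Without the timeout, the executor version would wait forever, which is why a fixed-size ThreadPoolExecutor needs roughly one thread per waiting parent.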

Summary

Once you understand how the Fork/Join framework works, the reasons behind many of its usage guidelines follow from the principle. For example: why is it best to avoid blocking operations such as I/O inside a ForkJoinTask? This is left for the reader to think through.

Some further reading, mentioned only briefly here:

  1. ForkJoinPool has an async mode, in which worker threads also process their local tasks in FIFO order. A pool in this mode behaves more like a message queue than a tool for recursive tasks.
  2. When you need to block a worker thread, you can use ManagedBlocker.
  3. The CompletableFuture class added in Java 1.8 supports promise chains similar to those in JavaScript; by default its asynchronous methods run on ForkJoinPool.commonPool().
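
As a pointer for item 2, here is a minimal sketch of ManagedBlocker, modelled on the QueueTaker example in the ForkJoinPool.ManagedBlocker Javadoc. The demo class ManagedBlockerDemo and its method takeInsidePool are made up for illustration. Wrapping a blocking call in managedBlock() tells the pool that the worker is about to block, so the pool may activate a spare worker to keep its parallelism.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.LinkedBlockingQueue;

// Wraps a blocking queue.take() so that it can be announced to the pool.
final class QueueTaker<E> implements ForkJoinPool.ManagedBlocker {
    private final BlockingQueue<E> queue;
    private volatile E item;

    QueueTaker(BlockingQueue<E> queue) {
        this.queue = queue;
    }

    @Override
    public boolean block() throws InterruptedException {
        if (item == null) {
            item = queue.take(); // the actual blocking call
        }
        return true; // no further blocking is necessary
    }

    @Override
    public boolean isReleasable() {
        // Try a non-blocking poll first; blocking is skipped if it succeeds.
        return item != null || (item = queue.poll()) != null;
    }

    E getItem() {
        return item;
    }
}

// Hypothetical demo class for illustration.
final class ManagedBlockerDemo {

    static String takeInsidePool(BlockingQueue<String> queue) {
        ForkJoinPool pool = new ForkJoinPool(1);
        try {
            return pool.submit(() -> {
                QueueTaker<String> taker = new QueueTaker<>(queue);
                ForkJoinPool.managedBlock(taker); // the pool may compensate while this worker waits
                return taker.getItem();
            }).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        new Thread(() -> {
            try {
                Thread.sleep(100);
                queue.put("done");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
        System.out.println(takeInsidePool(queue)); // done
    }
}
```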

 

 


Origin blog.csdn.net/qq_41893274/article/details/112769307