9. Analysis of the principles of the Future and Fork/Join framework

Task types

CPU-intensive (CPU-bound)

CPU-intensive, also called compute-intensive, means that the system's hard disk and memory perform much better than its CPU. I/O completes in a short time, but the CPU still has a lot of computation to process, so the CPU load is very high.
In a multiprogramming system, programs that spend most of their time on calculations, logic decisions, and other CPU work are called CPU bound. For example, a program that calculates pi to a thousand or more decimal places spends most of its execution time on trigonometric functions and square roots, so it is CPU bound. CPU-bound programs generally show high CPU usage. This may be because the task itself does not need to access I/O devices, or because the program is multi-threaded and the threads hide the time spent waiting for I/O.
The number of threads is generally set as: number of threads = number of CPU cores + 1 (modern CPUs support hyperthreading)

IO-intensive (I/O bound)

IO-intensive means that the system's CPU performs much better than its hard disk and memory. In this case the CPU spends most of its time waiting for I/O (hard disk/memory) read and write operations, so the CPU load is not high.
When an I/O-bound program reaches its performance limit, CPU usage is still low. This may be because the task itself requires a lot of I/O operations, the work is poorly pipelined, and the processor's capacity is not fully used. The number of threads is generally set as:
number of threads = ((thread waiting time + thread CPU time) / thread CPU time) * number of CPUs
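The two sizing rules above can be written down directly. The following is a minimal sketch only: the class name PoolSizing, its method names, and the 90 ms / 10 ms timings are assumptions made for illustration, and real waiting and CPU times would have to be measured.

```java
// Sketch only: PoolSizing and its method names are hypothetical.
final class PoolSizing {

    /** CPU-bound rule of thumb from the text: number of cores + 1. */
    static int cpuBoundThreads(int cores) {
        return cores + 1;
    }

    /** I/O-bound rule from the text: ((wait + cpu) / cpu) * cores, rounded up. */
    static int ioBoundThreads(int cores, double waitMillis, double cpuMillis) {
        if (cpuMillis <= 0) {
            throw new IllegalArgumentException("cpuMillis must be positive");
        }
        return (int) Math.ceil((waitMillis + cpuMillis) / cpuMillis * cores);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores);
        System.out.println("cpu-bound threads=" + cpuBoundThreads(cores));
        // Assumed timings: each task waits 90 ms on I/O and uses 10 ms of CPU.
        System.out.println("io-bound threads=" + ioBoundThreads(cores, 90, 10));
    }
}
```

With 8 cores and the assumed timings, the I/O-bound rule gives (90 + 10) / 10 * 8 = 80 threads, while the CPU-bound rule gives 9.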

CPU intensive vs IO intensive

We can divide tasks into compute-intensive and IO-intensive. Compute-intensive tasks perform a large amount of calculation and consume CPU resources, such as calculating pi or decoding high-definition video; they all rely on the computing power of the CPU. Such tasks can also be run as multiple concurrent tasks, but the more tasks there are, the more time is spent switching between them and the less efficiently the CPU executes them. Therefore, to use the CPU most efficiently, the number of compute-intensive tasks running at the same time should equal the number of CPU cores.
Since compute-intensive tasks mainly consume CPU resources, the execution efficiency of the code matters a great deal. Scripting languages such as Python run comparatively slowly and are poorly suited to compute-intensive tasks, which are best written in C.
The second type of task is IO-intensive. Tasks involving network or disk IO are IO-intensive. They consume little CPU, and most of their time is spent waiting for IO operations to complete (because IO is much slower than the CPU and memory). For IO-intensive tasks, running more tasks raises CPU efficiency, but only up to a limit. Most common tasks are IO-intensive, web applications for example.
While an IO-intensive task runs, around 99% of the time is spent on IO and very little on the CPU. Replacing a scripting language such as Python with extremely fast C code therefore does almost nothing to improve running efficiency. For IO-intensive tasks, the most suitable language is the one with the highest development efficiency (the least code): a scripting language is the first choice, and C is the worst.

1. What is the Fork/Join framework?

The Fork/Join framework, introduced in Java 7, executes tasks in parallel. It divides a large task into several small tasks and then combines the results of the small tasks to obtain the result of the large task.
Fork divides a large task into several subtasks that execute in parallel, and Join merges the results of these subtasks into the result of the large task. For example, the calculation of 1+2+...+10000 can be divided into 10 subtasks, each summing 1000 numbers, whose 10 results are then added together. As shown in the figure below:
[Figure: a large task forked into subtasks whose results are joined]
Fork/Join features:

ForkJoinPool is not intended to replace ExecutorService but to complement it, and it performs better than ExecutorService in some application scenarios. (See Java Tip: When to use ForkJoinPool vs ExecutorService)
ForkJoinPool is mainly used to implement "divide and conquer" algorithms, especially those that call themselves recursively after dividing, such as quick sort.
ForkJoinPool is best suited to compute-intensive tasks. If a task involves I/O, inter-thread synchronization, sleep(), or anything else that blocks a thread for a long time, it is best to use it together with ManagedBlocker.
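As an illustration of the last point, here is a minimal sketch of a blocking queue read wrapped in a ForkJoinPool.ManagedBlocker, so the pool is told that a worker is about to block. The class names QueueTaker and ManagedBlockerDemo and the "payload" value are assumptions for this sketch, not code from the article.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: QueueTaker and ManagedBlockerDemo are illustrative names.
final class QueueTaker<E> implements ForkJoinPool.ManagedBlocker {
    private final BlockingQueue<E> queue;
    private volatile E item; // set once an element has been obtained

    QueueTaker(BlockingQueue<E> queue) {
        this.queue = queue;
    }

    @Override
    public boolean block() throws InterruptedException {
        if (item == null) {
            item = queue.take(); // may block the calling thread
        }
        return true; // no further blocking is necessary
    }

    @Override
    public boolean isReleasable() {
        // Try a non-blocking poll first; blocking is only needed if it fails.
        return item != null || (item = queue.poll()) != null;
    }

    E getItem() {
        return item;
    }
}

final class ManagedBlockerDemo {
    /** Takes one element from the queue from inside a ForkJoinPool worker. */
    static String takeInsidePool(BlockingQueue<String> queue) {
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            return pool.submit(() -> {
                QueueTaker<String> taker = new QueueTaker<>(queue);
                // Lets the pool compensate for the blocked worker if needed.
                ForkJoinPool.managedBlock(taker);
                return taker.getItem();
            }).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        new Thread(() -> {
            try {
                Thread.sleep(100); // simulated slow producer
                queue.put("payload");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
        System.out.println(takeInsidePool(queue));
    }
}
```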

2. Work-stealing algorithm

The work-stealing algorithm refers to a thread stealing tasks from other threads' queues and executing them.
Suppose we have a relatively large task that can be divided into several independent subtasks. To reduce contention between threads, we put the subtasks into different queues and create a separate thread for each queue to execute the tasks in it, so that threads and queues correspond one to one. For example, thread A is responsible for the tasks in queue A. Some threads will finish the tasks in their own queues first, while tasks are still waiting in the queues of other threads. Rather than wait, a thread that has finished its work might as well help the others, so it goes to another thread's queue and steals a task to execute. The two threads now access the same queue, so to reduce contention between the stealing thread and the thread being stolen from, a double-ended queue (deque) is usually used: the thread being stolen from always takes tasks from the head of the deque, and the stealing thread always takes tasks from the tail. (Which end is called the head and which the tail is only a naming convention; what matters is that the owner and the thief work at opposite ends.)

The advantage of the work-stealing algorithm is that it makes full use of threads for parallel computation and reduces contention between threads. The disadvantage is that contention still occurs in some cases, for example when a deque holds only one task, and that it consumes more system resources, such as creating multiple threads and multiple deques.


Each worker thread of ForkJoinPool maintains a work queue (WorkQueue), which is a double-ended queue (Deque), and the objects stored in it are tasks (ForkJoinTask).
When a worker thread generates a new task while running (usually because fork() is called), the task is placed at the end of its work queue. When the worker thread processes its own work queue it uses LIFO order, that is, it always takes the next task from the end of the queue.
While processing its own work queue, each worker thread also tries to steal tasks (either tasks just submitted to the pool or tasks in the work queues of other worker threads). A stolen task is taken from the head of the other thread's work queue, that is, a worker thread steals from other worker threads in FIFO order.
When join() is encountered and the task to be joined has not yet completed, the worker processes other tasks first while waiting for it to complete.
When a worker has neither tasks of its own nor tasks to steal, it goes dormant.
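The LIFO and FIFO access pattern described above can be modelled with an ordinary deque. This is only a single-threaded sketch of the ordering: the real WorkQueue inside ForkJoinPool is a specialised array-based structure, and the class StealOrderDemo and the task names t1 to t3 are made up for illustration.

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;

// Sketch only: models the access order, not the real ForkJoinPool.WorkQueue.
final class StealOrderDemo {

    /** The owner pushes newly forked tasks at the end of its own queue. */
    static void fork(Deque<String> workQueue, String task) {
        workQueue.addLast(task);
    }

    /** The owner takes its next task from the end of the queue (LIFO). */
    static String ownerTake(Deque<String> workQueue) {
        return workQueue.pollLast();
    }

    /** A thief takes from the head of somebody else's queue (FIFO). */
    static String steal(Deque<String> victimQueue) {
        return victimQueue.pollFirst();
    }

    public static void main(String[] args) {
        Deque<String> queue = new ConcurrentLinkedDeque<>();
        fork(queue, "t1");
        fork(queue, "t2");
        fork(queue, "t3");
        System.out.println("owner runs:   " + ownerTake(queue)); // newest task, t3
        System.out.println("thief steals: " + steal(queue));     // oldest task, t1
    }
}
```

Because the owner and the thief work at opposite ends, they only touch the same element when the queue is nearly empty. In a divide-and-conquer computation the oldest task is usually also the largest one, so a single steal tends to move a big chunk of work.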

3. Using fork/join

ForkJoinTask: To use the ForkJoin framework, we must first create a ForkJoin task. It provides the mechanism for performing fork() and join() operations within a task. Usually we do not extend the ForkJoinTask class directly but one of its subclasses. The Fork/Join framework provides the following subclasses:
RecursiveAction: used for tasks that return no result. (A RecursiveAction can divide its work into smaller pieces so that they can be executed by independent threads or CPUs.)
RecursiveTask: used for tasks that return a result. (It can divide its own work into several smaller tasks and merge the results of these subtasks into a collective result. There can be several levels of splitting and merging.)
CountedCompleter: after the task completes, it triggers the execution of a custom hook function.
ForkJoinPool: a ForkJoinTask must be executed by a ForkJoinPool. Subtasks split off from a task are pushed onto the deque maintained by the current worker thread, at the same end the owner takes its own work from. When a worker thread's own queue is empty, it picks another worker thread at random and takes a task from the opposite end of that thread's queue.
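For the case with no return value, a RecursiveAction might look like the following minimal sketch. The class names IncrementAction and IncrementActionDemo, the threshold of 1,000 and the array size are assumptions chosen for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Sketch only: adds 1 to every element of an array, in place, with no result.
final class IncrementAction extends RecursiveAction {
    static final int THRESHOLD = 1_000; // assumed cut-off for sequential work

    private final long[] data;
    private final int lo;
    private final int hi;

    IncrementAction(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            for (int i = lo; i < hi; i++) {
                data[i]++;
            }
            return;
        }
        int mid = lo + (hi - lo) / 2;
        // invokeAll forks the subtasks and waits until both have completed.
        invokeAll(new IncrementAction(data, lo, mid),
                  new IncrementAction(data, mid, hi));
    }
}

final class IncrementActionDemo {
    static void incrementAll(long[] data) {
        ForkJoinPool.commonPool().invoke(new IncrementAction(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        incrementAll(data);
        System.out.println("first=" + data[0] + ", last=" + data[data.length - 1]);
    }
}
```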

Scenario example:
Define a fork/join task as in the following example: randomly generate 20 million values in an array, then sum them.

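import java.util.concurrent.RecursiveTask;

/**
 * RecursiveTask: parallel computation that returns a result synchronously.
 * Most tasks handled by the ForkJoin framework could also be written as plain recursion
 * (computing Fibonacci numbers, for example), but plain recursion has two drawbacks:
 *    first, it only uses a single thread;
 *    second, too many levels of recursion can overflow the stack.
 * ForkJoin addresses both problems: it processes subtasks concurrently on multiple threads,
 * making full use of computing resources to improve efficiency, while avoiding stack overflow.
 * Of course, a small problem like Fibonacci is simpler to solve with a linear algorithm,
 * and there is no need to use the ForkJoin framework for it in practice.
 * ForkJoin is a nuclear weapon meant for the big targets, such as sorting a huge array.
 * Best use case: multi-core, plenty of memory, compute-intensive tasks that can be split and then merged.
 */
class LongSum extends RecursiveTask<Long> {

    static final int SEQUENTIAL_THRESHOLD = 1000;

    int low;
    int high;
    int[] array;

    LongSum(int[] arr, int lo, int hi) {
        array = arr;
        low = lo;
        high = hi;
    }

    /**
     * fork(): puts the task into a queue and arranges for asynchronous execution. A task should
     *         call fork() only once, unless it has completed and been reinitialized.
     * tryUnfork(): tries to take the task back out of the queue to process it separately; may fail.
     * join(): waits for the computation to complete and returns its result.
     * isCompletedAbnormally(): tells whether the task's computation threw an exception (or was cancelled).
     */
    @Override
    protected Long compute() {
        if (high - low <= SEQUENTIAL_THRESHOLD) {
            // small enough: sum the range [low, high) directly
            long sum = 0;
            for (int i = low; i < high; ++i) {
                sum += array[i];
            }
            return sum;
        } else {
            int mid = low + (high - low) / 2;
            LongSum left = new LongSum(array, low, mid);
            LongSum right = new LongSum(array, mid, high);
            // fork one half and compute the other half in the current thread,
            // so the current worker does useful work instead of only waiting
            left.fork();
            long rightAns = right.compute();
            long leftAns = left.join();
            return leftAns + rightAns;
        }
    }
}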

Execute fork/join tasks

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class LongSumMain {

	// number of logical processors
	static final int NCPU = Runtime.getRuntime().availableProcessors();

	static long calcSum;

	public static void main(String[] args) throws Exception {
		// Utils.buildRandomIntArray is the author's helper; its source is not shown in this article
		int[] array = Utils.buildRandomIntArray(20000000);
		System.out.println("cpu-num:" + NCPU);
		// compute the sum of the array on a single thread
		calcSum = seqSum(array);
		System.out.println("seq sum=" + calcSum);

		// split the summation into fork/join subtasks and merge the results at the end
		LongSum ls = new LongSum(array, 0, array.length);
		ForkJoinPool fjp = new ForkJoinPool(NCPU); // number of threads to use
		ForkJoinTask<Long> task = fjp.submit(ls);

		// wait for completion without throwing, so that the failure check below is reachable
		task.quietlyJoin();
		if (task.isCompletedAbnormally()) {
			System.out.println(task.getException());
		} else {
			System.out.println("forkjoin sum=" + task.join());
		}

		fjp.shutdown();
	}

	static long seqSum(int[] array) {
		long sum = 0;
		for (int i = 0; i < array.length; ++i)
			sum += array[i];
		return sum;
	}
}

Execution result:

cpu-num:12
seq sum=990070147
forkjoin sum=990070147
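The Utils.buildRandomIntArray helper called in main is not shown in the article. To run the example, a stand-in along the following lines could be used. It is purely hypothetical: the value range [0, 100) is an assumption, chosen only because it is consistent with the reported sum (about 49.5 per element over 20 million elements).

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for the article's Utils class, whose source is not shown.
final class Utils {

    /** Builds an array of the given size filled with random ints in [0, 100). */
    static int[] buildRandomIntArray(int size) {
        int[] array = new int[size];
        for (int i = 0; i < size; i++) {
            // The bound 100 is an assumption, not taken from the original helper.
            array[i] = ThreadLocalRandom.current().nextInt(100);
        }
        return array;
    }

    public static void main(String[] args) {
        int[] sample = buildRandomIntArray(5);
        System.out.println(java.util.Arrays.toString(sample));
    }
}
```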

4. Principle of fork/join framework

1. Exception handling

A ForkJoinTask may throw an exception while it runs. The exception is raised on a worker thread, so the main thread cannot catch it at the point where it occurs. ForkJoinTask therefore provides the isCompletedAbnormally() method to check whether the task threw an exception or was cancelled, and the getException() method to retrieve the exception. (join() and get() also report the failure by throwing when they are called.) For example:

if (task.isCompletedAbnormally()) {
    System.out.println(task.getException());
}

The getException() method returns a Throwable object, or a CancellationException if the task was cancelled. It returns null if the task has not yet completed or if no exception was thrown.
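The behaviour described above can be observed with a task that always fails. This is a minimal sketch; FailingTask and ExceptionDemo are made-up names, and quietlyJoin() is used so that waiting for the task does not itself throw.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

// Sketch only: a task whose computation always throws.
final class FailingTask extends RecursiveTask<Integer> {
    @Override
    protected Integer compute() {
        throw new IllegalStateException("boom");
    }
}

final class ExceptionDemo {
    /** Runs the failing task and returns what getException() reports. */
    static Throwable runAndInspect() {
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            ForkJoinTask<Integer> task = pool.submit(new FailingTask());
            task.quietlyJoin(); // waits for completion without rethrowing
            return task.isCompletedAbnormally() ? task.getException() : null;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("task failed with: " + runAndInspect());
    }
}
```

The returned Throwable has the same type as the one thrown in compute(); depending on the JDK version it may be a re-created instance rather than the original object, so the sketch does not rely on its message.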

2. ForkJoinPool constructor

Its full (private) constructor is as follows:

private ForkJoinPool(int parallelism,
                     ForkJoinWorkerThreadFactory factory,
                     UncaughtExceptionHandler handler,
                     int mode,
                     String workerNamePrefix) {
    this.workerNamePrefix = workerNamePrefix;
    this.factory = factory;
    this.ueh = handler;
    this.config = (parallelism & SMASK) | mode;
    long np = (long)(-parallelism); // offset ctl counts
    this.ctl = ((np << AC_SHIFT) & AC_MASK) | ((np << TC_SHIFT) & TC_MASK);
}

Important parameter explanation

parallelism: the parallelism level. By default it matches the number of CPUs of the machine; Runtime.getRuntime().availableProcessors() returns the number of CPUs available to the running JVM.
factory: the factory used to create new threads. By default ForkJoinPool.defaultForkJoinWorkerThreadFactory is used.
handler: the handler invoked when a thread fails (Thread.UncaughtExceptionHandler). It performs some processing when a worker thread terminates because of an unforeseen error while executing a task. The default is null.
asyncMode (passed to the constructor above as mode): this parameter deserves attention. In a ForkJoinPool each worker thread has its own task queue, and asyncMode determines how the worker schedules the tasks in that queue: first-in-first-out (FIFO) or last-in-first-out (LIFO). If it is true, FIFO scheduling is used; the default is false, which means LIFO.
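From application code these parameters are normally supplied through the public four-argument constructor. The sketch below assumes a hypothetical helper class PoolConfigDemo; the handler that prints to System.err is only an example.

```java
import java.util.concurrent.ForkJoinPool;

// Sketch only: builds a pool with every parameter spelled out explicitly.
final class PoolConfigDemo {

    static ForkJoinPool newAsyncPool(int parallelism) {
        return new ForkJoinPool(
                parallelism,                                     // parallelism level
                ForkJoinPool.defaultForkJoinWorkerThreadFactory, // default thread factory
                (thread, error) ->                               // uncaught exception handler
                        System.err.println(thread.getName() + " failed: " + error),
                true);                                           // asyncMode: FIFO scheduling
    }

    public static void main(String[] args) {
        ForkJoinPool pool = newAsyncPool(Runtime.getRuntime().availableProcessors());
        System.out.println("parallelism=" + pool.getParallelism()
                + ", asyncMode=" + pool.getAsyncMode());
        pool.shutdown();
    }
}
```

FIFO mode is generally meant for event-style tasks that are submitted but never joined; for recursive divide-and-conquer work the default LIFO mode is the usual choice.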

3. ForkJoinTask fork method

fork() does only one thing: it pushes the task onto a work queue. If the caller is a ForkJoinWorkerThread, that is the current worker thread's own work queue; otherwise the task is pushed to the common pool (ForkJoinPool.common) through externalPush. The source code is:

public final ForkJoinTask<V> fork() {
    Thread t;
    if ((t = Thread.currentThread()) instanceof ForkJoinWorkerThread)
        ((ForkJoinWorkerThread)t).workQueue.push(this);
    else
        ForkJoinPool.common.externalPush(this);
    return this;
}
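The else branch means that fork() also works when it is called from an ordinary thread such as main: the task is then handed to the common pool. A minimal sketch, with the made-up class names Square and ExternalForkDemo:

```java
import java.util.concurrent.RecursiveTask;

// Sketch only: a trivial task used to show fork() from a non-worker thread.
final class Square extends RecursiveTask<Integer> {
    private final int n;

    Square(int n) {
        this.n = n;
    }

    @Override
    protected Integer compute() {
        return n * n;
    }
}

final class ExternalForkDemo {
    static int squareViaFork(int n) {
        Square task = new Square(n);
        // The caller is not a ForkJoinWorkerThread, so the task is
        // pushed to the common pool instead of a worker's own queue.
        task.fork();
        // join() waits for the result. The task may be run by a common-pool
        // worker or by the calling thread itself while it waits.
        return task.join();
    }

    public static void main(String[] args) {
        System.out.println(squareViaFork(7));
    }
}
```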

4. ForkJoinTask join method

join() does much more work, which is how it can avoid blocking worker threads, unlike the Thread.join() of the same name.

(1) Check whether the thread calling join() is a ForkJoinWorkerThread. If not (the main thread, for example), block the current thread and wait for the task to complete. If it is, do not block.
(2) Check the completion status of the task; if it has already completed, return the result directly.
(3) If the task has not yet completed but is in the caller's own work queue, run it.
(4) If the task has been stolen by another worker thread, steal tasks from the thief's work queue (in FIFO order) and execute them, to help the thief finish the task being joined as soon as possible.
(5) If the thief has finished all of its own tasks and is itself waiting for a task it needs to join, find the thief's thief and help it complete its tasks.
(6) Step (5) is applied recursively.
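One visible consequence of these steps is that nested join() calls still complete on a pool whose parallelism is 1, because the joining worker runs pending tasks from its own queue instead of simply blocking. The sketch below uses made-up class names Fib and JoinHelpsDemo; recursive Fibonacci is used only because it produces many nested joins, not because it is an efficient way to compute the numbers.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch only: deliberately naive Fibonacci to generate many nested joins.
final class Fib extends RecursiveTask<Integer> {
    private final int n;

    Fib(int n) {
        this.n = n;
    }

    @Override
    protected Integer compute() {
        if (n <= 1) {
            return n;
        }
        Fib left = new Fib(n - 1);
        left.fork();                         // queued on this worker's own queue
        int right = new Fib(n - 2).compute(); // computed directly
        // If nobody stole 'left', join() runs it here rather than blocking.
        return left.join() + right;
    }
}

final class JoinHelpsDemo {
    static int fibOnSingleWorker(int n) {
        ForkJoinPool pool = new ForkJoinPool(1); // parallelism of one
        try {
            return pool.invoke(new Fib(n));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("fib(15)=" + fibOnSingleWorker(15));
    }
}
```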

5. ForkJoinPool.submit method

public <T> ForkJoinTask<T> submit(ForkJoinTask<T> task) {
    if (task == null)
        throw new NullPointerException();
    // submit to the work queue
    externalPush(task);
    return task;
}

ForkJoinPool itself also has work queues, which receive tasks submitted by external threads (threads that are not ForkJoinWorkerThreads). These work queues are called submitting queues.
There is no essential difference between submit() and fork(), except that the target of the push becomes a submitting queue (along with some synchronization and initialization work). Like the other work queues, a submitting queue is something worker threads "steal" from, so when a worker thread successfully steals a task from it, the submitted task actually enters its execution phase.

6. Fork/Join framework execution process

[Figure: Fork/Join framework execution process]