What is Fork/Join?

The Fork/Join framework is a framework for executing tasks in parallel, introduced in Java 7. The idea is to decompose a large task into smaller tasks, which can themselves be decomposed further; the result of each small task is computed separately, and the partial results are then combined into the result of the large task. The idea is very similar to MapReduce. When dividing the work, the subtasks must be independent of one another, so that they can run in parallel without affecting each other.

The operation flow chart of Fork/Join is as follows:

[Figure: Fork/Join operation flow chart]

The name Fork/Join describes the framework quite literally. Fork means to split: the large task is broken down into small tasks that run in parallel. Join means to connect or combine: the results of all the parallel small tasks are aggregated.

The work-stealing algorithm
Fork/Join uses a work-stealing algorithm: when a worker thread's own task queue is empty, it takes tasks from the queues of other worker threads and runs them instead of sitting idle. To support this, each worker thread maintains a double-ended queue (deque). The stealing thread takes tasks from the tail of the deque, while the thread that owns the deque takes tasks from the head. Because the two normally work at opposite ends, idle threads are put to use and contention between threads is reduced. The drawback is that when only one task is left in a deque, the owner and the thief may both try to take it, and that contention wastes some resources.
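
The two-ended access described above can be sketched with a plain java.util.ArrayDeque. This is a single-threaded illustration only, not how ForkJoinPool implements its queues; the class name WorkStealingSketch, its two helper methods, and the task labels are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class WorkStealingSketch {

    // The owning worker takes its next task from the head of its own deque.
    static String ownerTake(Deque<String> deque) {
        return deque.pollFirst();
    }

    // An idle worker steals from the opposite end (the tail).
    static String thiefSteal(Deque<String> deque) {
        return deque.pollLast();
    }

    public static void main(String[] args) {
        Deque<String> deque = new ArrayDeque<>();
        deque.addLast("task-1");
        deque.addLast("task-2");
        deque.addLast("task-3");

        System.out.println("owner runs " + ownerTake(deque));     // task-1
        System.out.println("thief steals " + thiefSteal(deque));  // task-3
        System.out.println("left in deque: " + deque);            // [task-2]
    }
}
```

With three tasks queued, the owner and the thief never touch the same element. Only when a single task remains do both ends refer to the same task, which is the contention case mentioned above.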

The operation flow chart of work stealing is as follows:

[Figure: work-stealing flow chart]

Fork/Join core class
The Fork/Join framework consists mainly of two parts: the subtasks and the task scheduler. The class hierarchy diagram is as follows.

[Figure: Fork/Join class hierarchy diagram]

ForkJoinPool

ForkJoinPool is the task scheduler of the Fork/Join framework. Like ThreadPoolExecutor, it implements its own thread pool, and it provides three kinds of methods for scheduling tasks:

execute: runs the given task asynchronously; no result is returned;
invoke, invokeAll: run the given task or tasks and block the caller until they complete; invoke returns the task's result, and invokeAll returns a list of Future objects;
submit: runs the given task asynchronously and immediately returns a ForkJoinTask, which is also a Future;
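
The difference between the three methods is easier to see side by side. The following is a minimal sketch; the class PoolMethodsSketch and its trivial Answer task are made up for the example, and the task just returns a constant so that the focus stays on the pool calls.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

class PoolMethodsSketch {

    // Hypothetical trivial task that returns a constant.
    static class Answer extends RecursiveTask<Integer> {
        @Override
        protected Integer compute() {
            return 42;
        }
    }

    // Runs the same task through invoke, submit and execute and collects the results.
    static int[] runAll() {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            // invoke: blocks the caller until the task is done, then returns its result.
            int viaInvoke = pool.invoke(new Answer());

            // submit: returns immediately; the returned ForkJoinTask is also a Future.
            ForkJoinTask<Integer> pending = pool.submit(new Answer());
            int viaSubmit = pending.join();

            // execute: returns nothing; the task reference is kept here only
            // so that the demo can read its result afterwards.
            Answer background = new Answer();
            pool.execute(background);
            int viaExecute = background.join();

            return new int[] {viaInvoke, viaSubmit, viaExecute};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] results = runAll();
        System.out.println(results[0] + " " + results[1] + " " + results[2]);
    }
}
```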

ForkJoinTask

ForkJoinTask is the class of the tasks that are actually executed in the Fork/Join framework. It has the following two commonly used subclasses; in most cases you extend one of them.

RecursiveAction: used for subtasks with no results returned;
RecursiveTask: used for subtasks with results returned;
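
The larger example later in this article uses RecursiveTask, so here is a small sketch of the other case, RecursiveAction, which changes an array in place and returns nothing. The class name DoubleInPlace, the helper doubled, and the tiny threshold of 4 are assumptions chosen only so that the demo actually splits.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class DoubleInPlace extends RecursiveAction {
    private static final int THRESHOLD = 4; // deliberately tiny for the demo
    private final int[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    DoubleInPlace(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work directly.
            for (int i = from; i < to; i++) {
                data[i] *= 2;
            }
            return;
        }
        int mid = (from + to) >>> 1;
        // invokeAll forks the two halves and waits until both have completed.
        invokeAll(new DoubleInPlace(data, from, mid), new DoubleInPlace(data, mid, to));
    }

    // Returns a doubled copy of the input, leaving the input untouched.
    static int[] doubled(int[] input) {
        int[] copy = Arrays.copyOf(input, input.length);
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new DoubleInPlace(copy, 0, copy.length));
        } finally {
            pool.shutdown();
        }
        return copy;
    }

    public static void main(String[] args) {
        int[] result = doubled(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
        System.out.println(Arrays.toString(result)); // [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
    }
}
```

The two halves write to disjoint index ranges, which is what makes them independent in the sense required above.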

Fork/Join framework in practice

The following is a small example of Fork/Join. It computes 1+2+…+1 billion. Each task sums a range directly only if the range is at or below a threshold of 1000 numbers; a larger range is automatically split into smaller tasks that are processed in parallel. The program compares the time taken with and without Fork/Join.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums the integers in [start, end] by recursively splitting the range.
public class ForkJoinTask extends RecursiveTask<Long> {

	private static final long MAX = 1000000000L;
	// Ranges with end - start at or below this value are summed directly.
	private static final long THRESHOLD = 1000L;
	private final long start;
	private final long end;

	public ForkJoinTask(long start, long end) {
		this.start = start;
		this.end = end;
	}

	public static void main(String[] args) {
		test();
		System.out.println("--------------------");
		testForkJoin();
	}

	// Serial version: a plain loop on the calling thread.
	private static void test() {
		System.out.println("test");
		long start = System.currentTimeMillis();
		// Note: sum is a boxed Long, so every += unboxes and boxes again.
		Long sum = 0L;
		for (long i = 0L; i <= MAX; i++) {
			sum += i;
		}
		System.out.println(sum);
		System.out.println(System.currentTimeMillis() - start + "ms");
	}

	// Parallel version: hand the whole range to a ForkJoinPool.
	private static void testForkJoin() {
		System.out.println("testForkJoin");
		long start = System.currentTimeMillis();
		ForkJoinPool forkJoinPool = new ForkJoinPool();
		Long sum = forkJoinPool.invoke(new ForkJoinTask(1, MAX));
		System.out.println(sum);
		System.out.println(System.currentTimeMillis() - start + "ms");
	}

	@Override
	protected Long compute() {
		long sum = 0;
		if (end - start <= THRESHOLD) {
			// Small enough: sum the range directly.
			for (long i = start; i <= end; i++) {
				sum += i;
			}
			return sum;
		} else {
			// Too large: split into two halves and run them as subtasks.
			long mid = (start + end) / 2;

			ForkJoinTask task1 = new ForkJoinTask(start, mid);
			task1.fork();

			ForkJoinTask task2 = new ForkJoinTask(mid + 1, end);
			task2.fork();

			// join blocks until each subtask has finished and returns its result.
			return task1.join() + task2.join();
		}
	}

}

The result of the calculation is needed here, so the task extends RecursiveTask and implements its compute method. compute first checks whether the range is at or below the threshold of 1000; if it is, the range is summed directly. Otherwise the range is split into two subtasks. Calling fork on a subtask schedules it for asynchronous execution in the pool; when the subtask runs, its own compute method is entered and decides again whether to split further into grandchild tasks or to sum its range and return the result. Calling join blocks until the subtask has completed and returns its result.
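
A common variation of the compute method above forks only one half and computes the other half in the current thread, so that the current worker does useful work instead of only waiting for two children. The sketch below assumes the same splitting rule and threshold as the example; the class name RangeSum and the helper sum are made up, and this is one possible arrangement rather than the only correct one.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class RangeSum extends RecursiveTask<Long> {
    private static final long THRESHOLD = 1000L;
    private final long start; // inclusive
    private final long end;   // inclusive

    RangeSum(long start, long end) {
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            long sum = 0;
            for (long i = start; i <= end; i++) {
                sum += i;
            }
            return sum;
        }
        long mid = start + (end - start) / 2;
        RangeSum left = new RangeSum(start, mid);
        RangeSum right = new RangeSum(mid + 1, end);
        left.fork();                      // schedule the left half asynchronously
        long rightSum = right.compute();  // compute the right half in this thread
        return left.join() + rightSum;    // then wait for the left half
    }

    // Convenience wrapper: sums the integers in [from, to] on a fresh pool.
    static long sum(long from, long to) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new RangeSum(from, to));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 1_000_000)); // 500000500000
    }
}
```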

Program output:

test
500000000500000000
4992ms
--------------------
testForkJoin
500000000500000000
508ms

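In this run the parallel version took far less time than the serial version, which illustrates the advantage of parallel tasks. The exact figures depend on the machine, and part of the gap here is likely due to the serial loop accumulating into a boxed Long, so the numbers should not be read as a pure serial-versus-parallel comparison.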

Nevertheless, be careful when using Fork/Join, and do not use it blindly:

If tasks are split too deeply, a very large number of small tasks accumulates (and the pool may add threads when workers block in join), which can seriously degrade system performance;
if the call stack of the recursive decomposition becomes very deep, it can cause a stack overflow;
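
To get a feel for how far the splitting in the example goes, the halving rule can be replayed without forking anything. The following is only a counting sketch; the class SplitCount and its two methods are made up for illustration, and they reuse the same condition (end - start <= threshold) and midpoint as the compute method above.

```java
class SplitCount {

    // Number of leaf tasks produced by the halving rule.
    static long leaves(long start, long end, long threshold) {
        if (end - start <= threshold) {
            return 1;
        }
        long mid = (start + end) / 2;
        return leaves(start, mid, threshold) + leaves(mid + 1, end, threshold);
    }

    // Maximum number of splits between the root task and any leaf task.
    static int depth(long start, long end, long threshold) {
        if (end - start <= threshold) {
            return 0;
        }
        long mid = (start + end) / 2;
        return 1 + Math.max(depth(start, mid, threshold), depth(mid + 1, end, threshold));
    }

    public static void main(String[] args) {
        long max = 1_000_000_000L;
        for (long threshold : new long[] {1_000L, 1_000_000L}) {
            System.out.println("threshold " + threshold
                    + ": leaves=" + leaves(1, max, threshold)
                    + ", depth=" + depth(1, max, threshold));
        }
    }
}
```

For the range 1 to 1 billion, a threshold of 1000 gives 1048576 leaf tasks at depth 20, while a threshold of 1000000 gives 1024 leaf tasks at depth 10. Halving keeps the depth small, so in this example it is mainly the number of tasks that grows as the threshold shrinks.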
