Concurrent programming 5: How to execute tasks?

Table of contents

1. The way to execute tasks in threads

2. Executor framework

2.1 - Thread execution strategy

2.2 - Thread pool

2.3 - Executor life cycle

2.4 - Deferred tasks and periodic tasks

3. Find out the available parallelism - code example

3.1 - Single-threaded I/O operations

3.2 - Callable and Future carrying task results (important)

3.3 - Using Future to implement page renderer

3.5 - CompletionService: Executor and BlockingQueue

Most concurrent applications are organized around "task execution": tasks are abstract, discrete units of work. Dividing an application's work into tasks simplifies the organization of the program, provides natural transaction boundaries that make error recovery easier, and provides a natural structure for parallelizing work, which improves concurrency. // Goal: split a job into multiple tasks with clear boundaries and execute them concurrently (independent tasks are easier to run concurrently)

1. The way to execute tasks in threads

Serial execution: In server applications, serial processing mechanisms are generally not capable of high throughput or responsiveness. // Only one request can be executed at a time, the main thread is blocked

Parallel execution: Requests are served by multiple threads, which improves responsiveness. // Multi-threaded execution does not block the main thread (tasks are separated from the main thread) -> better responsiveness and higher throughput

Note that the overhead of the thread life cycle is high (creating a thread in Java requires support from the operating system kernel). Active threads also consume system resources, especially memory (TCB).

Therefore, up to a certain point, adding threads can improve the throughput of the system. Beyond that point, creating more threads only slows the program down, and if too many threads are created the entire application may crash. To avoid this danger, limit the number of threads your application can create, and test the application thoroughly to make sure it does not run out of resources even when that limit is reached. // Choose an appropriate number of threads and manage thread resources (avoid creating threads without bound)

2. Executor framework

Executor is based on the producer-consumer pattern. The code that submits tasks acts as the producer, and the threads that execute tasks act as the consumers. If you want to implement a producer-consumer design in your program, using an Executor is usually the easiest way.

In TaskExecutionWebServer, using an Executor decouples the submission of the request-handling task from its actual execution. The code is as follows: // Executor is an interface; you can use an implementation from the Java class library or write your own

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class TaskExecutionWebServer {
    private static final int NTHREADS = 100;
    private static final Executor exec = Executors.newFixedThreadPool(NTHREADS);

    public static void main(String[] args) throws IOException {
        ServerSocket socket = new ServerSocket(80);
        while (true) {
            final Socket connection = socket.accept();
            // The task: handle one client connection
            Runnable task = () -> handleRequest(connection);
            // Hand the task to the Executor for execution
            exec.execute(task);
        }
    }

    private static void handleRequest(Socket connection) {
        // request-handling logic here
    }
}
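Because Executor is an interface with a single execute(Runnable) method, the execution policy can be changed by swapping the implementation. The following is a minimal sketch of that idea. The class names DirectExecutor, ThreadPerTaskExecutor and ExecutorPolicyDemo are made up for this illustration and are not part of the JDK.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: runs each task synchronously in the caller's thread.
final class DirectExecutor implements Executor {
    @Override
    public void execute(Runnable task) {
        task.run();
    }
}

// Illustrative only: starts a new thread for every submitted task.
final class ThreadPerTaskExecutor implements Executor {
    @Override
    public void execute(Runnable task) {
        new Thread(task).start();
    }
}

final class ExecutorPolicyDemo {
    // Submits one task and returns the name of the thread that ran it.
    static String threadThatRuns(Executor executor) throws InterruptedException {
        AtomicReference<String> name = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        executor.execute(() -> {
            name.set(Thread.currentThread().getName());
            done.countDown();
        });
        done.await(); // wait until the task has actually run
        return name.get();
    }
}
```

The submitting code in threadThatRuns is identical for both executors; only the implementation passed in decides which thread does the work.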

2.1 - Thread execution strategy

By decoupling task submission from execution, execution policies can be specified and modified for certain types of tasks without too much difficulty. The "What, Where, When, How" and other aspects of task execution are defined in the execution policy, including:

In what thread is the task executed?
In what order are tasks executed (FIFO, LIFO, priority)?
How many tasks can be executed concurrently?
How many tasks are waiting to be executed in the queue?
If the system needs to reject a task due to overload, which task should be chosen? Also, how to notify the application that a task was rejected?
What actions should be performed before or after performing a task?

Execution policies are a resource management tool, and the best policy depends on the available computing resources and the required quality of service. By limiting the number of concurrent tasks, you can ensure that the application does not fail from resource exhaustion or suffer serious performance problems from contention for scarce resources. Decoupling task submission from the execution policy also makes it easier to choose, at deployment time, the policy that best matches the available hardware. // The execution policy describes how tasks are executed
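Several of the policy questions above map directly onto the constructor arguments of ThreadPoolExecutor. The sketch below is one possible configuration, not a recommendation: the class name BoundedPolicyDemo and the chosen sizes are assumptions for illustration. It limits concurrency, bounds the waiting queue, and rejects work on overload by throwing RejectedExecutionException.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPolicyDemo {
    // Submits blocking tasks until the pool rejects one and
    // returns how many tasks were accepted before that happened.
    static int acceptedBeforeRejection(int threads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads,                        // at most `threads` tasks run concurrently
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), // bounded FIFO queue of waiting tasks
                new ThreadPoolExecutor.AbortPolicy());   // on overload: throw RejectedExecutionException
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        try {
            while (true) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the task busy so that the pool saturates
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            }
        } catch (RejectedExecutionException overloaded) {
            return accepted;
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
    }
}
```

With 2 threads and a queue capacity of 3, the call accepts 2 running tasks plus 3 queued tasks and rejects the next one.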

2.2 - Thread pool

A thread pool is a resource pool that manages a group of homogeneous worker threads. The thread pool is closely tied to a work queue, which holds all the tasks waiting to be executed. The job of a worker thread is simple: take a task from the work queue, execute it, and then go back to waiting for the next task. // Thread pool -> holds threads, work queue -> holds tasks

The Java class library provides a flexible thread pool with some useful default configurations. A thread pool can be created by calling one of the static factory methods in Executors:

newFixedThreadPool. newFixedThreadPool creates a fixed-size thread pool. A thread is created each time a task is submitted until the maximum pool size is reached, after which the size of the pool no longer changes (if a thread terminates because of an unexpected exception, the pool adds a new thread to replace it). // Limits the number of threads

newCachedThreadPool. newCachedThreadPool creates a cached thread pool. If the current size of the pool exceeds the demand for processing, idle threads are reclaimed. When demand increases, new threads are added. There is no limit on the size of the pool. // Unlimited number of threads

newSingleThreadExecutor. newSingleThreadExecutor is a single-threaded Executor that creates one worker thread to execute tasks. If this thread ends abnormally, another thread is created to replace it. newSingleThreadExecutor guarantees that tasks are executed serially in the order imposed by the queue (for example FIFO, LIFO, or priority order). // Executes tasks in sequence

newScheduledThreadPool. newScheduledThreadPool creates a fixed-size thread pool that executes tasks after a delay or periodically, similar to Timer. // Executes scheduled tasks
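As a rough illustration of two of these factory methods, the sketch below checks that a single-threaded executor runs tasks in submission order and that a fixed pool never uses more worker threads than its size. FactoryMethodsDemo and its method names are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class FactoryMethodsDemo {
    // Runs `count` tasks on a single-threaded executor and records the order in which they ran.
    static List<Integer> runInOrder(int count) throws InterruptedException {
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());
        ExecutorService single = Executors.newSingleThreadExecutor();
        for (int i = 0; i < count; i++) {
            final int id = i;
            single.execute(() -> order.add(id));
        }
        single.shutdown();
        single.awaitTermination(5, TimeUnit.SECONDS);
        return order;
    }

    // Runs `tasks` tasks on a fixed pool and counts how many distinct worker threads were used.
    static int distinctWorkerThreads(int poolSize, int tasks) throws InterruptedException {
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService fixed = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            fixed.execute(() -> names.add(Thread.currentThread().getName()));
        }
        fixed.shutdown();
        fixed.awaitTermination(5, TimeUnit.SECONDS);
        return names.size();
    }
}
```

The number of distinct threads reported by distinctWorkerThreads can be smaller than the pool size, because threads are only created while tasks keep arriving.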

2.3 - Executor life cycle

To address the life cycle of the execution service, the ExecutorService interface extends Executor, adding methods for life-cycle management as well as some convenience methods for task submission.

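/*
 * An Executor that provides methods to manage termination of the thread pool, and methods
 * that produce a Future for tracking the progress of one or more asynchronous tasks.
 */
public interface ExecutorService extends Executor {
    // 1 - Shut down the executor (thread pool): no new tasks are accepted,
    //     previously submitted tasks still run to completion (the call itself does not block)
    void shutdown();
    // Shut down the executor (thread pool): attempt to stop all running tasks
    // and return the list of tasks that were waiting to be executed
    List<Runnable> shutdownNow();

    // 2 - Whether the executor (thread pool) has been shut down
    boolean isShutdown();

    // 3 - Whether all tasks have completed following shutdown
    // Note: isTerminated is never true unless shutdown/shutdownNow was called first.
    boolean isTerminated();

    // 4 - Block the current thread until all tasks have completed after a shutdown request,
    //     the timeout elapses, or the current thread is interrupted
    boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException;

    // 5 - Submit the given task for execution and return a Future representing it.
    <T> Future<T> submit(Callable<T> task);
    <T> Future<T> submit(Runnable task, T result);
    Future<?> submit(Runnable task);

    // 6 - Execute the given tasks and, when all have completed, return a list of Futures holding their status and results.
    <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks) throws InterruptedException;
    <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit) throws InterruptedException;

    // 7 - Execute the given tasks and return the result of one that completed successfully, if any did.
    // Tasks that have not completed are cancelled.
    <T> T invokeAny(Collection<? extends Callable<T>> tasks) throws InterruptedException, ExecutionException;
    <T> T invokeAny(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit) throws InterruptedException, ExecutionException, TimeoutException;
}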
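The bulk submission methods listed above can be exercised as in the following sketch. SubmissionDemo, its method names, and the pool size of 4 are assumptions for illustration. invokeAll waits for every task and returns the Futures in the same order as the input tasks, while invokeAny returns the result of whichever task completes successfully first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SubmissionDemo {
    // Squares every input in parallel and sums the results using invokeAll.
    static int sumOfSquares(List<Integer> inputs) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (Integer n : inputs) {
                tasks.add(() -> n * n);
            }
            int sum = 0;
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                sum += future.get(); // every task is already done at this point
            }
            return sum;
        } finally {
            pool.shutdown();
        }
    }

    // Upper-cases the candidates in parallel and returns whichever result arrives first.
    static String anyUpperCased(List<String> candidates) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String candidate : candidates) {
                tasks.add(() -> candidate.toUpperCase());
            }
            return pool.invokeAny(tasks);
        } finally {
            pool.shutdown();
        }
    }
}
```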

The life cycle of an ExecutorService has three states: running, shutting down, and terminated. An ExecutorService is in the running state when it is first created.

The shutdown method performs a graceful shutdown: no new tasks are accepted, but tasks that have already been submitted are allowed to complete, including those that have not yet started.
The shutdownNow method performs an abrupt shutdown: it attempts to cancel all running tasks and does not start any queued tasks that have not yet begun executing.

Tasks submitted after the ExecutorService has been shut down are handled by the "Rejected Execution Handler", which either silently discards the task or causes the execute method to throw the unchecked RejectedExecutionException. Once all tasks have completed, the ExecutorService enters the terminated state.

You can call awaitTermination to wait for the ExecutorService to reach the terminated state, or call isTerminated to poll whether it has terminated. awaitTermination is commonly called immediately after shutdown, which has the effect of shutting down the ExecutorService synchronously. // Combine the two methods to close an ExecutorService safely: shutdown + awaitTermination
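The shutdown-then-awaitTermination combination could look like the sketch below. The fallback to shutdownNow after a timeout is an addition chosen for this example, and ShutdownDemo and its method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

final class ShutdownDemo {
    // Graceful shutdown, falling back to shutdownNow if tasks do not finish in time.
    // Returns true if the pool reached the terminated state.
    static boolean shutdownAndWait(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown();                                 // stop accepting new tasks
        try {
            if (!pool.awaitTermination(timeout, unit)) { // wait for submitted tasks to finish
                pool.shutdownNow();                      // interrupt running tasks, drop queued ones
                return pool.awaitTermination(timeout, unit);
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();          // preserve the interrupt status
            return false;
        }
    }

    // Returns true if the pool refuses a new task by throwing RejectedExecutionException.
    static boolean rejectsNewTasks(ExecutorService pool) {
        try {
            pool.execute(() -> { });
            return false;
        } catch (RejectedExecutionException expected) {
            return true;
        }
    }
}
```

For a pool created by the Executors factory methods, rejectsNewTasks returns true after shutdown because the default rejection handler throws.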

2.4 - Deferred tasks and periodic tasks

The Timer class manages delayed and periodic tasks. However, Timer has some drawbacks, so you should consider using ScheduledThreadPoolExecutor instead. An instance can be created through the constructor of ScheduledThreadPoolExecutor or through the newScheduledThreadPool factory method.

Problems with the Timer class:

(1) Timer creates only one thread to execute all timer tasks. If one task takes too long to execute, the timing accuracy of the other TimerTasks suffers. // Problems caused by overlapping task execution times can be avoided by using multiple threads

(2) The Timer thread does not catch exceptions, so when a TimerTask throws an unchecked exception, the timer thread terminates. In this case Timer does not resurrect the thread; instead it mistakenly assumes that the entire Timer has been cancelled. TimerTasks that were scheduled but not yet executed are therefore never run, and new tasks cannot be scheduled. This problem is called "thread leakage". // Cannot handle exceptions and cannot recover from them
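By contrast, a scheduled thread pool keeps running when one task throws: further executions of the failing periodic task are suppressed, but other tasks are unaffected. The following is a small sketch of that behavior. SchedulingDemo, its method names, and the chosen delays are assumptions for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class SchedulingDemo {
    // Runs a one-shot task after the given delay and returns its result.
    static String delayedResult(long delayMillis) throws InterruptedException, ExecutionException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        try {
            ScheduledFuture<String> future =
                    scheduler.schedule(() -> "done", delayMillis, TimeUnit.MILLISECONDS);
            return future.get(); // blocks until the delay has elapsed and the task has run
        } finally {
            scheduler.shutdown();
        }
    }

    // Schedules a periodic task that throws on its first run, plus a healthy periodic task.
    // Returns true if the healthy task still ran `runs` times.
    static boolean otherTasksSurviveFailure(int runs) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        CountDownLatch ticks = new CountDownLatch(runs);
        try {
            scheduler.scheduleAtFixedRate(() -> {
                throw new IllegalStateException("simulated failure");
            }, 0, 10, TimeUnit.MILLISECONDS);
            scheduler.scheduleAtFixedRate(ticks::countDown, 0, 10, TimeUnit.MILLISECONDS);
            return ticks.await(5, TimeUnit.SECONDS);
        } finally {
            scheduler.shutdownNow(); // cancel the periodic task and stop the workers
        }
    }
}
```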

3. Find out the available parallelism - code example

// Summarize the approach from the examples

3.1 - Single-threaded I/O operations

For example, in the page renderer program below, most of the image download process in the program is waiting for I/O operations to complete, during which the CPU hardly does any work. Therefore, this serial execution method does not fully utilize the CPU, causing the user to wait an excessively long time before seeing the final page.

import java.util.*;

/**
 * Single-threaded page renderer
 */
public abstract class SingleThreadRenderer {

    /**
     * Render the page
     */
    void renderPage(CharSequence source) {
        // 1 - Render the text
        renderText(source);
        List<ImageData> imageData = new ArrayList<>();
        // Download the images one after another
        for (ImageInfo imageInfo : scanForImageInfo(source)) {
            // NOTE: image download spends most of its time on I/O
            imageData.add(imageInfo.downloadImage());
        }
        for (ImageData data : imageData) {
            // 2 - Render the image data
            renderImage(data);
        }
    }

    interface ImageData {

    }

    interface ImageInfo {

        ImageData downloadImage();
    }

    abstract void renderText(CharSequence s);

    abstract List<ImageInfo> scanForImageInfo(CharSequence s);

    abstract void renderImage(ImageData i);
}

By decomposing the problem into multiple independent tasks that execute concurrently, you can achieve better CPU utilization and responsiveness.

3.2 - Callable and Future carrying task results (important)

The Executor framework uses Runnable as its basic task representation. However, Runnable is quite limited. Although the run method can record its result by writing to a log file or placing it in a shared data structure, it cannot return a value or throw a checked exception. // Runnable does not have a return value

Many tasks are in effect deferred computations, such as executing a database query, fetching a resource over the network, or computing a complex function. For these tasks, Callable is a better abstraction: it expects that the main entry point (call) will return a value and may throw an exception. // Submitting a Callable returns a Future, which can be used to cancel a task that has not yet run

Both Runnable and Callable describe abstract computational tasks. Tasks are usually finite: they have a clear starting point and eventually end. A task executed by an Executor has four life-cycle phases: created, submitted, started, and completed. Because some tasks may take a long time to run, it is often desirable to be able to cancel them. In the Executor framework, tasks that have been submitted but not yet started can always be cancelled, while tasks that have already started can be cancelled only if they respond to interruption. Cancelling a task that has already completed has no effect. // A task can be submitted or cancelled, and an executing task has a life cycle

Future represents the life cycle of a task and provides methods to test whether the task has completed or been cancelled, to retrieve its result, and to cancel it. Implicit in the specification of Future is that the life cycle of a task can only move forward, not backward, just like the life cycle of an ExecutorService. Once a task has completed, it stays in the "completed" state forever. // The life-cycle phases always occur in the prescribed order and cannot be reversed

The behavior of the get method depends on the state of the task (not yet started, running, or completed).

If the task has already completed, get returns immediately or throws an exception. If the task has not completed, get blocks until it does. // The get method may return the result or throw an exception

If the task throws an exception, get wraps it in an ExecutionException and rethrows it. If the task was cancelled, get throws CancellationException. If get throws ExecutionException, the original exception can be retrieved with getCause.

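// The Future interface
public interface Future<V> {

    // Attempts to cancel execution of this task.
    // Has no effect (returns false) if the task has already completed or been cancelled,
    //     or cannot be cancelled for some other reason.
    //     Otherwise, if the task has not started when cancel is called, it should never run.
    // If the task has already started, mayInterruptIfRunning determines whether the thread
    //     executing it is interrupted: true interrupts it, false lets it run to completion.
    boolean cancel(boolean mayInterruptIfRunning);

    // Returns true if this task was cancelled before it completed normally.
    boolean isCancelled();

    // Returns true if this task has completed.
    // Completion may be due to normal termination, an exception, or cancellation; in all of these cases this method returns true.
    boolean isDone();

    // Waits for the computation to complete, then retrieves its result.
    V get() throws InterruptedException, ExecutionException;
    V get(long timeout, TimeUnit unit) throws InterruptedException, ExecutionException, TimeoutException;
}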
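The behavior of get and cancel described above can be observed with a small experiment. The following is a sketch under the assumption of a single-threaded pool, so that a second task is guaranteed to stay queued. FutureGetDemo and its method names are made up for this example.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FutureGetDemo {
    // A task that throws: get() wraps the failure in an ExecutionException.
    // Returns the simple class name of the original exception.
    static String causeOfFailure() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> future = pool.submit(() -> Integer.parseInt("not a number"));
            try {
                future.get();
                return "no failure";
            } catch (ExecutionException e) {
                return e.getCause().getClass().getSimpleName(); // the original exception
            }
        } finally {
            pool.shutdown();
        }
    }

    // A queued task that is cancelled before it starts: get() throws CancellationException.
    static boolean cancelledGetThrows() throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            pool.submit(() -> { release.await(); return null; }); // occupies the only worker
            Future<String> queued = pool.submit(() -> "never runs");
            boolean cancelled = queued.cancel(false);
            try {
                queued.get();
                return false;
            } catch (CancellationException expected) {
                return cancelled && queued.isCancelled() && queued.isDone();
            }
        } finally {
            release.countDown(); // let the blocking task finish
            pool.shutdown();
        }
    }
}
```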

A Future that describes a task can be created in several ways. All the submit methods of ExecutorService return a Future, so you can submit a Runnable or a Callable to an executor and get back a Future that can be used to retrieve the result of the task or to cancel it.

You can also explicitly instantiate a FutureTask for a given Runnable or Callable. Since FutureTask implements Runnable, it can be submitted to an Executor for execution, or its run method can be called directly.
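A brief sketch of both options follows. FutureTaskDemo and its method names are hypothetical, and the computation 6 * 7 merely stands in for real work.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

final class FutureTaskDemo {
    // Wraps a Callable in a FutureTask and runs it on a separate thread.
    static int computeInNewThread() throws InterruptedException, ExecutionException {
        FutureTask<Integer> task = new FutureTask<Integer>(() -> 6 * 7);
        new Thread(task).start(); // FutureTask is a Runnable, so a Thread can run it
        return task.get();        // blocks until the other thread has finished
    }

    // Runs the FutureTask directly in the calling thread.
    static int computeInCallerThread() throws InterruptedException, ExecutionException {
        FutureTask<Integer> task = new FutureTask<Integer>(() -> 6 * 7);
        task.run();               // executes call() synchronously
        return task.get();        // already completed, returns immediately
    }
}
```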

3.3 - Using Future to implement page renderer

To make the page renderer more concurrent, the rendering process is first divided into two tasks: one renders all the text, and the other downloads all the images. Because one task is CPU-intensive and the other is I/O-intensive, this approach improves performance even on a single-CPU system. // Idea 1: separate I/O-intensive tasks from CPU-intensive tasks

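import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Page renderer that uses a Future
 */
public abstract class FutureRenderer {

    private final ExecutorService executor = Executors.newCachedThreadPool();

    void renderPage(CharSequence source) {
        // 1 - Collect the image location information
        final List<ImageInfo> imageInfos = scanForImageInfo(source);
        // 2 - Image download task -> I/O-intensive
        Callable<List<ImageData>> task = () -> {
            List<ImageData> result = new ArrayList<>();
            for (ImageInfo imageInfo : imageInfos) {
                // Download one image
                result.add(imageInfo.downloadImage());
            }
            return result;
        };
        // 3 - Run the download task on the thread pool
        Future<List<ImageData>> future = executor.submit(task);
        // 4 - Render the text -> CPU-intensive
        renderText(source);
        // 5 - Render the image data
        try {
            // NOTE: get() blocks this thread until all images have been downloaded
            List<ImageData> imageData = future.get();
            for (ImageData data : imageData) {
                renderImage(data);
            }
        } catch (InterruptedException e) {
            // Re-assert the thread's interrupted status
            Thread.currentThread().interrupt();
            // We don't need the result, so cancel the task too
            future.cancel(true);
        } catch (ExecutionException e) {
            throw launderThrowable(e.getCause());
        }
    }

    // Helper used above but not defined in the original listing (added here as an assumption):
    // returns unchecked exceptions, rethrows Errors, and wraps anything else.
    static RuntimeException launderThrowable(Throwable t) {
        if (t instanceof RuntimeException) {
            return (RuntimeException) t;
        } else if (t instanceof Error) {
            throw (Error) t;
        } else {
            throw new IllegalStateException("Not unchecked", t);
        }
    }

    interface ImageData {

    }

    interface ImageInfo {

        ImageData downloadImage();
    }

    abstract void renderText(CharSequence s);

    abstract List<ImageInfo> scanForImageInfo(CharSequence s);

    abstract void renderImage(ImageData i);
}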

FutureRenderer allows the text to be rendered concurrently with the download of the image data. When all the images have been downloaded, they are displayed on the page. This improves the user experience: the user sees a result sooner, and the program exploits some parallelism. But we can do better. Rather than waiting for all the images to download, the user would like to see each image displayed as soon as it has been downloaded. // Requirement: do not wait for all results before rendering; render each result as soon as it is available

3.5 - CompletionService: Executor and BlockingQueue

If you submit a set of computation tasks to an Executor and want to retrieve their results as they become available, you can keep the Future associated with each task and repeatedly call get with a timeout of 0 to poll for completion. This is possible but tedious. Fortunately, there is a better way: a completion service (CompletionService).

CompletionService combines the functionality of an Executor and a BlockingQueue. You can submit Callable tasks to it for execution and then use the queue-like methods take and poll to retrieve completed results, which are packaged as Futures as they complete. ExecutorCompletionService implements CompletionService and delegates the computation to an Executor.

The implementation of ExecutorCompletionService is quite simple. The constructor creates a BlockingQueue to hold the completed results. When a task is submitted, it is first wrapped in a QueueingFuture, a subclass of FutureTask that overrides the done method to place the result on the BlockingQueue. FutureTask calls done when the computation completes.
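The mechanism just described can be sketched in a few lines. This is a simplified illustration of the idea, not the JDK source: SimpleCompletionService and CompletionOrderDemo are hypothetical names, and the real ExecutorCompletionService offers more methods and options.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified sketch: an Executor plus a queue of completed Futures.
final class SimpleCompletionService<V> {
    private final Executor executor;
    private final BlockingQueue<Future<V>> completed = new LinkedBlockingQueue<>();

    SimpleCompletionService(Executor executor) {
        this.executor = executor;
    }

    // FutureTask subclass that enqueues itself once the computation has finished.
    private final class QueueingFuture extends FutureTask<V> {
        QueueingFuture(Callable<V> task) {
            super(task);
        }

        @Override
        protected void done() {
            completed.add(this);
        }
    }

    Future<V> submit(Callable<V> task) {
        QueueingFuture future = new QueueingFuture(task);
        executor.execute(future);
        return future;
    }

    // Blocks until some task has completed, then returns its Future.
    Future<V> take() throws InterruptedException {
        return completed.take();
    }
}

final class CompletionOrderDemo {
    // Submits a slow task first and a fast task second; returns results in completion order.
    static List<String> completionOrder() throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        SimpleCompletionService<String> service = new SimpleCompletionService<>(pool);
        CountDownLatch release = new CountDownLatch(1);
        try {
            service.submit(() -> { release.await(); return "slow"; });
            service.submit(() -> "fast");
            List<String> order = new ArrayList<>();
            order.add(service.take().get()); // only the fast task can have finished
            release.countDown();             // now allow the slow task to finish
            order.add(service.take().get());
            return order;
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

Although the slow task is submitted first, take hands back the fast result first, which is the property the page renderer relies on.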

Implement a page renderer using CompletionService:

import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/**
 * Page renderer that uses a CompletionService
 */
public abstract class Renderer {

    private final ExecutorService executor;

    Renderer(ExecutorService executor) {
        this.executor = executor;
    }

    void renderPage(CharSequence source) {
        final List<ImageInfo> info = scanForImageInfo(source);
        // 1 - Submit one download task per image through the CompletionService
        CompletionService<ImageData> completionService = new ExecutorCompletionService<>(executor);
        for (final ImageInfo imageInfo : info) {
            completionService.submit(() -> imageInfo.downloadImage());
        }

        renderText(source);

        try {
            // 2 - Fetch results from the CompletionService; loop once per submitted task
            for (int t = 0, n = info.size(); t < n; t++) {
                Future<ImageData> f = completionService.take();
                ImageData imageData = f.get();
                renderImage(imageData);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw launderThrowable(e.getCause());
        }
    }

    // Helper used above but not defined in the original listing (added here as an assumption):
    // returns unchecked exceptions, rethrows Errors, and wraps anything else.
    static RuntimeException launderThrowable(Throwable t) {
        if (t instanceof RuntimeException) {
            return (RuntimeException) t;
        } else if (t instanceof Error) {
            throw (Error) t;
        } else {
            throw new IllegalStateException("Not unchecked", t);
        }
    }

    interface ImageData {

    }

    interface ImageInfo {

        ImageData downloadImage();
    }

    abstract void renderText(CharSequence s);

    abstract List<ImageInfo> scanForImageInfo(CharSequence s);

    abstract void renderImage(ImageData i);
}

CompletionService improves the performance of the page renderer in two ways: it shortens the total running time and it improves responsiveness. Creating a separate task for each image download and executing the tasks in a thread pool turns the serial download process into a parallel one, which reduces the total time needed to download all the images. In addition, by fetching results from the CompletionService and displaying each image as soon as it has been downloaded, the user gets a more dynamic and responsive user interface.

That concludes this article.