Production Practice - Thread Pool and Asynchronous Task Orchestration

Most of the servers we use today have multi-processor, multi-core configurations and ample resources. To make full use of server performance, decouple the calling thread from asynchronous work, and improve response time, concurrent programming has become the better choice. Using the thread pool provided by the JDK, this article walks through how to use thread pools with a file-upload example.

1. Introduction to the thread pool

The core thread pool implementation class provided by the JDK is ThreadPoolExecutor. Using IDEA's "Show Diagrams" feature, the class inheritance hierarchy looks like this:

(Class diagram omitted: ThreadPoolExecutor extends AbstractExecutorService, which implements ExecutorService, which extends Executor.)

  • The top-level interface Executor provides only a single void execute(Runnable command) method, decoupling task definition from task execution; users only need to define Runnable tasks.
  • The ExecutorService interface extends Executor. On top of task execution, it adds the <T> Future<T> submit(Callable<T> task) method, which returns a result, along with management functions such as batch execution of asynchronous tasks and starting and stopping the thread pool.
  • AbstractExecutorService implements ExecutorService and acts as a task template, chaining the steps of task execution together so that lower-level implementation classes only need to focus on executing tasks.
  • ThreadPoolExecutor implements task management, thread management, and thread pool lifecycle management.
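
To make the split between these interfaces concrete, here is a small illustrative example (not from the original article) exercising both entry points:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Executor#execute: fire-and-forget, no return value
        pool.execute(() -> System.out.println("running a Runnable"));
        // ExecutorService#submit: returns a Future carrying the Callable's result
        Future<Integer> future = pool.submit(() -> 1 + 1);
        System.out.println("result = " + future.get());
        pool.shutdown();
    }
}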

1.1 Task execution process

Next, let's look at the default execution flow of the thread pool through the source code:

...
    // Read ctl: the high 3 bits hold the run state, the low 29 bits the worker count
    int c = ctl.get();
    // Fewer workers than corePoolSize: try to create a new worker thread
    if (workerCountOf(c) < corePoolSize) {
        // Worker count and run state are as expected: add a worker for this task
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    // Worker count >= corePoolSize: check the run state and try to enqueue the task
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        // Re-check the run state; if it has changed (e.g. shutdownNow was called),
        // remove the task and invoke the rejection policy
        if (!isRunning(recheck) && remove(command))
            // Apply the rejection policy
            reject(command);
        // If there are no worker threads at all, start one.
        // Edge case: all threads were reclaimed just as the task was enqueued.
        else if (workerCountOf(recheck) == 0)
            // Add a worker without an initial task
            addWorker(null, false);
    }
    // Worker count >= corePoolSize and the queue is full: try to add a non-core thread
    else if (!addWorker(command, false))
        // If that also fails, apply the rejection policy
        reject(command);
}

Flow chart: (image of the default execution flow omitted)

Of course, this is only the thread pool's default execution flow, which suits CPU-bound applications. Quite a bit of middleware builds on ThreadPoolExecutor with its own variants: Tomcat, Netty, Dubbo and others all have corresponding implementations. Tomcat, for example, changes the flow so that the thread count is first raised to the maximum before tasks enter the queue, which reduces wasted resources when an IO-bound application blocks; a rough sketch of that idea follows.
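
The core of that trick is a work queue whose offer() reports failure while the pool can still grow, which pushes ThreadPoolExecutor into creating a new thread instead of queueing. The sketch below only illustrates the idea; EagerTaskQueue is a hypothetical name, not Tomcat's or Dubbo's actual class, and a production version also needs a rejection handler that re-offers the task into the queue when thread creation loses a race.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;

public class EagerTaskQueue extends LinkedBlockingQueue<Runnable> {

    private transient ThreadPoolExecutor executor;

    public EagerTaskQueue(int capacity) {
        super(capacity);
    }

    public void setExecutor(ThreadPoolExecutor executor) {
        this.executor = executor;
    }

    @Override
    public boolean offer(Runnable task) {
        // Pretend the queue is full while the pool has not reached maximumPoolSize,
        // so execute() falls through to addWorker(command, false) and adds a thread.
        if (executor != null && executor.getPoolSize() < executor.getMaximumPoolSize()) {
            return false;
        }
        return super.offer(task);
    }
}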

2. Custom thread pool

2.1 Creating the thread pool

The JDK itself provides some out-of-the-box thread pools such as FixedThreadPool and CachedThreadPool, but their parameters are fixed and some of them use unbounded queues. When system concurrency is very high or the program has a design flaw, they can easily lead to an out-of-memory error or other unpredictable problems.
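
For reference, this is how the JDK (here, JDK 8) defines newFixedThreadPool and newCachedThreadPool in Executors; note the unbounded LinkedBlockingQueue in the former and the Integer.MAX_VALUE thread cap in the latter:

// Executors.newFixedThreadPool: the LinkedBlockingQueue has no capacity limit,
// so tasks can accumulate without bound.
public static ExecutorService newFixedThreadPool(int nThreads) {
    return new ThreadPoolExecutor(nThreads, nThreads,
                                  0L, TimeUnit.MILLISECONDS,
                                  new LinkedBlockingQueue<Runnable>());
}

// Executors.newCachedThreadPool: up to Integer.MAX_VALUE threads may be created.
public static ExecutorService newCachedThreadPool() {
    return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                                  60L, TimeUnit.SECONDS,
                                  new SynchronousQueue<Runnable>());
}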

Here we create the thread pool with the following ThreadPoolExecutor constructor:

public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue,
                          ThreadFactory threadFactory,
                          RejectedExecutionHandler handler)

The thread pool creation code is as follows:

/**
 * @author winsonWu
 * @Description: thread pool creation configuration
 * @date Date : 2021.04.13 16:00
 */
@Configuration
public class ThreadPoolCreator {

    /**
     * Core pool size
     */
    private static int corePoolSize = Runtime.getRuntime().availableProcessors() + 1;

    /**
     * Maximum pool size; set equal to the core pool size to avoid memory swapping
     */
    private static int maximumPoolSize = corePoolSize;

    /**
     * Maximum idle time
     */
    private static long keepAliveTime = 3;

    /**
     * Unit of the maximum idle time
     */
    private static TimeUnit unit = TimeUnit.MINUTES;

    /**
     * Use a bounded queue to avoid running out of memory
     */
    private static BlockingQueue<Runnable> workQueue = new LinkedBlockingQueue<>(500);

    /**
     * Thread factory; here we use a nameable thread factory so that threads can be
     * attributed to a business flow and production issues are easier to trace.
     */
    private static ThreadFactory threadFactory = new NamedThreadFactory("taskResolver");

    /**
     * Rejection policy; pick a built-in one or a custom one according to the business
     */
    private static RejectedExecutionHandler handler = new ThreadPoolExecutor.AbortPolicy();

    @Bean
    public ThreadPoolExecutor threadPoolExecutor(){
        return new ThreadPoolExecutor(
                corePoolSize,
                maximumPoolSize,
                keepAliveTime, unit,
                workQueue,
                threadFactory,
                handler);
    }
}

2.2 Configuring the core pool size

Concurrent tasks generally fall into two categories: CPU-bound tasks and IO-bound tasks.

CPU-bound tasks require the CPU to perform complex, intensive computation. Do not create too many threads for this kind of task, otherwise frequent context switching lowers resource utilization and slows down task processing. IO-bound tasks are far less demanding on the CPU; threads may spend most of their time blocked on IO, so increasing the thread count raises concurrency and lets more tasks be processed. Common rules of thumb:

CPU-bound: N + 1, but preferably no more than 2 × the number of OS cores
IO-bound:  2N + 1
where N is the number of server cores

In production, the concrete values should be decided based on load test results.
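
As a starting point before load testing, the rules of thumb above can be written out directly (a small sketch; the numbers are only heuristics, not tuned values):

// Rule-of-thumb pool sizing, to be validated against load test results.
int n = Runtime.getRuntime().availableProcessors();

// CPU-bound tasks: N + 1, capped at twice the number of cores.
int cpuBoundPoolSize = Math.min(n + 1, 2 * n);

// IO-bound tasks: 2N + 1.
int ioBoundPoolSize = 2 * n + 1;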

2.3 Blocking queue

With a blocking queue (BlockingQueue), a thread that tries to take an element blocks while the queue is empty and waits for it to become non-empty; a thread that tries to store an element blocks while the queue is full and waits for elements to be consumed. This naturally supports the producer-consumer model used by thread pools. Common blocking queues include ArrayBlockingQueue, LinkedBlockingQueue, LinkedBlockingDeque, SynchronousQueue, PriorityBlockingQueue, DelayQueue and LinkedTransferQueue (the original table image is omitted here). In most scenarios LinkedBlockingQueue is enough, and that is what we use here as well.
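
To make the blocking semantics concrete, here is a tiny illustrative snippet (not from the original article) using a bounded LinkedBlockingQueue:

import java.util.concurrent.LinkedBlockingQueue;

public class BlockingQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // Bounded queue with capacity 2
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(2);

        queue.put("task-1");
        queue.put("task-2");
        // put() would now block until a consumer frees a slot;
        // offer() returns false immediately instead of blocking.
        System.out.println(queue.offer("task-3")); // false

        // take() removes an element, and blocks while the queue is empty.
        System.out.println(queue.take()); // task-1
    }
}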

2.4 Rejection policy

By default, when the blocking queue is full and the pool has reached its maximum thread count, the rejection policy is executed. The JDK ships with four built-in policies:

  • AbortPolicy (the default): throw a RejectedExecutionException.
  • CallerRunsPolicy: run the task directly in the calling thread.
  • DiscardPolicy: silently discard the task.
  • DiscardOldestPolicy: discard the oldest task in the queue and retry the submission.

Let's illustrate with two scenarios.

  • Scenario 1: system monitoring. A custom annotation plus AOP identifies the interfaces to monitor and captures their input and output parameters, and an asynchronous task then stores the logs to HDFS.

    For log collection, losing a small amount of data has little business impact, so we can set the core pool size to 1 to reduce the load on the server, increase the queue capacity based on load test results, and use DiscardPolicy or AbortPolicy as the rejection policy. Note that AbortPolicy makes the rejected call throw a RejectedExecutionException: wrap execute(...) in try/catch (or rely on an UncaughtExceptionHandler, which is not recommended), while for submit(...) exceptions thrown inside the task are obtained and handled via Future.get(), as sketched below.
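
    A minimal sketch of both handling styles (it reuses the threadPoolExecutor bean defined earlier and a hypothetical collectLog() method):

    // execute(...): with AbortPolicy, a rejected submission throws
    // RejectedExecutionException synchronously, so wrap the call in try/catch.
    try {
        threadPoolExecutor.execute(() -> collectLog());
    } catch (RejectedExecutionException e) {
        // Losing a log entry is acceptable here; just record the rejection.
        System.out.println("log task rejected: " + e.getMessage());
    }

    // submit(...): exceptions thrown inside the task only surface when
    // Future.get() is called on the returned Future.
    Future<?> future = threadPoolExecutor.submit(() -> collectLog());
    try {
        future.get();
    } catch (InterruptedException | ExecutionException e) {
        System.out.println("log task failed: " + e.getMessage());
    }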

  • Scenario 2: message queue consumption. Backlogged messages can be processed in batches by asynchronous threads.

    In this scenario data must not be lost, so we use CallerRunsPolicy to let the calling thread run the message-processing logic itself. This does, however, interfere with whatever the calling thread was doing, so a better option is a custom rejection policy that persists the task or puts it back into a queue. Let's first look at how CallerRunsPolicy is implemented:

    public static class CallerRunsPolicy implements RejectedExecutionHandler {
       /**
        * Creates a {@code CallerRunsPolicy}.
        */
       public CallerRunsPolicy() { }
    
       /**
        * Executes task r in the caller's thread, unless the executor
        * has been shut down, in which case the task is discarded.
        *
        * @param r the runnable task requested to be executed
        * @param e the executor attempting to execute this task
        */
       public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
           if (!e.isShutdown()) {
               r.run();
           }
       }
    }

    As you can see, to write a custom rejection policy we only need to implement the RejectedExecutionHandler interface and override rejectedExecution(Runnable r, ThreadPoolExecutor e). A custom example looks like this:

    /**
     * @Author winsonWu
     * @Description: persistence-based rejection policy
     * @date Date : 2022.04.14 9:38
     **/
    public class DataBaseStoragePolicy implements RejectedExecutionHandler {
        @Override
        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            // TODO: persist the rejected task (e.g. write it to a database) for later replay
        }
    }

2.5 Thread warm-up and reclamation

  • Thread warm-up

    If we can foresee a flood of requests right after the server starts, such as the backlog-processing scenario mentioned in the rejection-policy section, we can warm up the thread pool by creating the core threads in advance to improve response time. ThreadPoolExecutor provides three methods for this:

    // Start a single core thread
    public boolean prestartCoreThread() {
       return workerCountOf(ctl.get()) < corePoolSize &&
           addWorker(null, true);
    }
    // Start all core threads
    public int prestartAllCoreThreads() {
        int n = 0;
        while (addWorker(null, true))
            ++n;
        return n;
    }
    // Ensure that at least one worker thread has been started
    void ensurePrestart() {
        int wc = workerCountOf(ctl.get());
        if (wc < corePoolSize)
            addWorker(null, true);
        else if (wc == 0)
            addWorker(null, false);
    }
  • Thread reclamation

    By default, when workerCount is greater than corePoolSize and an idle thread has been idle for longer than keepAliveTime, the pool reclaims that thread. If allowCoreThreadTimeOut is enabled, core threads can be reclaimed the same way, further improving resource utilization, as in the snippet below.
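
    A one-line illustrative snippet (using the threadPoolExecutor bean defined earlier):

    // Allow core threads to time out and be reclaimed after keepAliveTime of idleness.
    // keepAliveTime must be greater than zero when this flag is enabled.
    threadPoolExecutor.allowCoreThreadTimeOut(true);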

2.6 Thread pool monitoring

ThreadPoolExecutor itself provides a number of status query methods for retrieving thread pool state. Let's modify the earlier bean definition to take a look; the relevant methods are described in the code comments:

...
@Bean
public ThreadPoolExecutor threadPoolExecutor(){
    return new ThreadPoolExecutor(
            corePoolSize,
            maximumPoolSize,
            keepAliveTime, unit,
            workQueue,
            threadFactory,
            handler){

        // Action performed before each task executes
        @Override
        protected void beforeExecute(Thread t, Runnable r) {
            // Current pool size
            System.out.println("pool size: " + this.getPoolSize());
            // Core pool size
            System.out.println("core pool size: " + this.getCorePoolSize());
            // Largest pool size ever reached (note: not the configured maximum)
            System.out.println("largest pool size: " + this.getLargestPoolSize());
            // Number of active threads
            System.out.println("active threads: " + this.getActiveCount());
        }

        // Action performed after each task executes
        @Override
        protected void afterExecute(Runnable r, Throwable t) {
            // Number of active threads
            System.out.println("active threads: " + this.getActiveCount());
            // Total number of tasks ever scheduled
            System.out.println("task count: " + this.getTaskCount());
        }

        // Action performed when the pool terminates
        @Override
        protected void terminated() {
            // Number of completed tasks
            System.out.println("completed tasks: " + this.getCompletedTaskCount());
        }
    };
}

2.7 Thread pool lifecycle

A thread pool has five main states, defined in the code as follows:

private static final int RUNNING    = -1 << COUNT_BITS;
private static final int SHUTDOWN   =  0 << COUNT_BITS;
private static final int STOP       =  1 << COUNT_BITS;
private static final int TIDYING    =  2 << COUNT_BITS;
private static final int TERMINATED =  3 << COUNT_BITS;

The state transitions are as follows:

RUNNING → SHUTDOWN when shutdown() is called; RUNNING or SHUTDOWN → STOP when shutdownNow() is called; SHUTDOWN → TIDYING once both the queue and the pool are empty, and STOP → TIDYING once the pool is empty; TIDYING → TERMINATED after the terminated() hook completes (state transition diagram omitted). Note that after shutdown() the pool no longer accepts new tasks but still processes the tasks remaining in the blocking queue, whereas shutdownNow() stops accepting new tasks, attempts to interrupt the running workers, and removes (and returns) the tasks still waiting in the queue.
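
A common shutdown pattern built on these two methods looks like this (a generic sketch, not from the original article):

// Stop accepting new tasks, give queued tasks some time to finish,
// then force-interrupt whatever is still running.
void shutdownGracefully(ThreadPoolExecutor pool) {
    pool.shutdown();                           // RUNNING -> SHUTDOWN, queued tasks still run
    try {
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            pool.shutdownNow();                // -> STOP: interrupt workers, drain the queue
        }
    } catch (InterruptedException e) {
        pool.shutdownNow();
        Thread.currentThread().interrupt();
    }
}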

3. Thread pool practice

3.1 Basic practice

Now that the thread pool is defined, let's first try out the basic methods:

@Resource
private ThreadPoolExecutor threadPoolExecutor;

@Test
public void testMultiThread() throws InterruptedException {
    // Warm up the pool: start all core threads in advance
    threadPoolExecutor.prestartAllCoreThreads();
    StopWatch stopwatch = new StopWatch("thread pool test");
    stopwatch.start("execute");
    // execute(Runnable command)
    CountDownLatch forExecute = new CountDownLatch(1);
    threadPoolExecutor.execute(() -> {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            System.out.println("interrupted ignore");
        }
        System.out.println("execute(Runnable command) test");
        forExecute.countDown();
    });
    forExecute.await();
    stopwatch.stop();
    stopwatch.start("submit");
    // submit(Callable<T> task)
    CountDownLatch forSubmit = new CountDownLatch(1);
    final Future<String> future = threadPoolExecutor.submit(() -> {
        System.out.println("submit(Runnable command) test");
        forSubmit.countDown();
        return "submit(Runnable command) test";
        });
    try {
        final String result = future.get();
        System.out.println("result: " + result);
    } catch (ExecutionException e) {
        // TODO: custom exception handling
    }
    forSubmit.await();
    stopwatch.stop();
    System.out.println(stopwatch.prettyPrint());
}

The execution result is as follows:

(screenshot of the console output omitted)

The difference between execute(Runnable command) and submit(Callable<T> task) has been covered above and will not be repeated here. Also note that we use a CountDownLatch for thread coordination: the submit task only starts executing after the execute task has completed.

3.2 File upload practice

Next, we use a file-upload feature to demonstrate asynchronous task orchestration with CompletableFuture. The requirements are as follows:

  1. Support batch file upload.
  2. Return a list of file names and file IDs once the upload is complete.
  3. When an exception occurs, print a log to the console (for the demo; a production environment can customize this).

The implementation code is as follows:

Define the return object:

@Data
public class FileEntry implements Serializable {

    /**
     * File ID
     */
    private String fileId;

    /**
     * File name
     */
    private String fileName;

}

File upload logic:

/**
 * Test code: for simplicity the file is stored under a fixed directory
 * @param eachFile the uploaded file
 * @return the built FileEntry
 */
private FileEntry createFileEntry(MultipartFile eachFile){
    // Generate a file ID
    String fileId = UUID.randomUUID().toString().replace("-", "");
    File desFile = new File(FILE_LOCATION + fileId + "_" + eachFile.getOriginalFilename());
    try {
        eachFile.transferTo(desFile);
    } catch (IOException e) {
        throw new BizException("file upload failed");
    }
    // Upload succeeded; build the return entry
    FileEntry fileEntry = new FileEntry();
    fileEntry.setFileName(eachFile.getOriginalFilename());
    fileEntry.setFileId(fileId);
    return fileEntry;
}

Main logic:

/**
 * Batch file upload
 * @param files the uploaded files
 */
public ArrayList<FileEntry> uploadFile(MultipartFile[] files){
    // Initialize the result list
    ArrayList<FileEntry> fileEntryList = new ArrayList<>(files.length);
    List<CompletableFuture<FileEntry>> futureList = new ArrayList<>(files.length);
    for (MultipartFile eachFile : files){
        // Execute the file upload logic on the thread pool defined earlier
        CompletableFuture<FileEntry> future = CompletableFuture
                .supplyAsync(() -> createFileEntry(eachFile), threadPoolExecutor);
        // Collect the future
        futureList.add(future);
    }
    CompletableFuture<Void> fileUploadFuture = CompletableFuture
            .allOf(futureList.toArray(new CompletableFuture[futureList.size()]))
            .whenComplete((v, t) -> futureList.forEach(future -> {
                // Add each completed result to the result list
                fileEntryList.add(future.getNow(null));
            }))
            .exceptionally(exception -> {
                // TODO: custom exception handling
                System.out.println("error occurred:" + exception.getMessage());
                return null;
            });
    // Block the calling thread until all uploads have finished
    fileUploadFuture.join();
    // Return the list of file entries
    return fileEntryList;
}

So far we have completed this simplified implementation; more details on CompletableFuture usage can be found in the References section.
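
One refinement worth noting (a sketch under the same assumptions as the code above, not part of the original implementation): attaching exceptionally(...) to each per-file future lets a single failed upload produce a null entry instead of aborting the collection of the other results:

// Hypothetical variant of the loop body: absorb failures per file.
CompletableFuture<FileEntry> future = CompletableFuture
        .supplyAsync(() -> createFileEntry(eachFile), threadPoolExecutor)
        .exceptionally(ex -> {
            System.out.println("upload failed: " + ex.getMessage());
            return null; // placeholder for the failed file
        });
futureList.add(future);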

4. References


Origin juejin.im/post/7086351322944913438