A Java thread pool parameter that almost caused a production incident

1. Introduction

Recently, while tuning the thread pool of a refactored Dubbo service, we switched the worker threads to the CachedThreadPool strategy. After the release, however, the number of threads in the pool kept climbing, which almost caused a production incident.

So this article takes a closer look at how the thread pool works.

2. Introduction to the Dubbo thread pool

The CachedThreadPool source code in Dubbo:

package org.apache.dubbo.common.threadpool.support.cached;

import org.apache.dubbo.common.URL;
import org.apache.dubbo.common.threadlocal.NamedInternalThreadFactory;
import org.apache.dubbo.common.threadpool.ThreadPool;
import org.apache.dubbo.common.threadpool.support.AbortPolicyWithReport;

import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import static org.apache.dubbo.common.constants.CommonConstants.ALIVE_KEY;
import static org.apache.dubbo.common.constants.CommonConstants.CORE_THREADS_KEY;
import static org.apache.dubbo.common.constants.CommonConstants.DEFAULT_ALIVE;
import static org.apache.dubbo.common.constants.CommonConstants.DEFAULT_CORE_THREADS;
import static org.apache.dubbo.common.constants.CommonConstants.DEFAULT_QUEUES;
import static org.apache.dubbo.common.constants.CommonConstants.DEFAULT_THREAD_NAME;
import static org.apache.dubbo.common.constants.CommonConstants.QUEUES_KEY;
import static org.apache.dubbo.common.constants.CommonConstants.THREADS_KEY;
import static org.apache.dubbo.common.constants.CommonConstants.THREAD_NAME_KEY;

/**
 * This thread pool is self-tuned. Thread will be recycled after idle for one minute, and new thread will be created for
 * the upcoming request.
 *
 * @see java.util.concurrent.Executors#newCachedThreadPool()
 */
public class CachedThreadPool implements ThreadPool {

    @Override
    public Executor getExecutor(URL url) {
        // 1. Get the thread name prefix; defaults to "Dubbo" if not set
        String name = url.getParameter(THREAD_NAME_KEY, DEFAULT_THREAD_NAME);
        // 2. Get the core pool size
        int cores = url.getParameter(CORE_THREADS_KEY, DEFAULT_CORE_THREADS);
        // 3. Get the maximum pool size; defaults to Integer.MAX_VALUE
        int threads = url.getParameter(THREADS_KEY, Integer.MAX_VALUE);
        // 4. Get the queue capacity
        int queues = url.getParameter(QUEUES_KEY, DEFAULT_QUEUES);
        // 5. Get how long an idle thread is kept before being reclaimed, in milliseconds
        int alive = url.getParameter(ALIVE_KEY, DEFAULT_ALIVE);
        // 6. Create the thread pool with ThreadPoolExecutor from the JUC package
        return new ThreadPoolExecutor(cores, threads, alive, TimeUnit.MILLISECONDS,
                queues == 0 ? new SynchronousQueue<Runnable>() :
                        (queues < 0 ? new LinkedBlockingQueue<Runnable>()
                                : new LinkedBlockingQueue<Runnable>(queues)),
                new NamedInternalThreadFactory(name, true), new AbortPolicyWithReport(name, url));
    }
}

As you can see, Dubbo simply uses ThreadPoolExecutor from the JUC (java.util.concurrent) package to create the thread pool. Its constructor is as follows:

public ThreadPoolExecutor(int corePoolSize,
                              int maximumPoolSize,
                              long keepAliveTime,
                              TimeUnit unit,
                              BlockingQueue<Runnable> workQueue,
                              ThreadFactory threadFactory,
                              RejectedExecutionHandler handler) {
        if (corePoolSize < 0 ||
            maximumPoolSize <= 0 ||
            maximumPoolSize < corePoolSize ||
            keepAliveTime < 0)
            throw new IllegalArgumentException();
        if (workQueue == null || threadFactory == null || handler == null)
            throw new NullPointerException();
        this.acc = System.getSecurityManager() == null ?
                null :
                AccessController.getContext();
        this.corePoolSize = corePoolSize;
        this.maximumPoolSize = maximumPoolSize;
        this.workQueue = workQueue;
        this.keepAliveTime = unit.toNanos(keepAliveTime);
        this.threadFactory = threadFactory;
        this.handler = handler;
    }

The general flow is as follows:

1. When the number of threads in the pool is smaller than corePoolSize, a newly submitted task creates a new thread, even if there are idle threads in the pool at that moment.

2. Once the pool has reached corePoolSize, newly submitted tasks are put into workQueue, where they wait to be picked up by a pool thread.

3. When workQueue is full and maximumPoolSize > corePoolSize, a newly submitted task creates a new thread to execute it.

4. When workQueue is full and the pool has already reached maximumPoolSize, newly submitted tasks are handed to the RejectedExecutionHandler.

5. When the pool has more than corePoolSize threads, threads that stay idle for keepAliveTime are shut down.

In addition, when allowCoreThreadTimeOut(true) is set, threads within corePoolSize are also shut down once they have been idle for keepAliveTime.
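
The first four steps can be observed with a deliberately tiny pool. The following is only a minimal sketch: the class name PoolFlowDemo and the sizes (core 1, max 2, queue capacity 1) are assumptions chosen for illustration and have nothing to do with Dubbo's defaults. Every task blocks on a latch, so no thread becomes free while the four tasks are submitted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolFlowDemo {

    /** Submits four blocking tasks to a pool with core=1, max=2, queue capacity=1 and records what happens to each. */
    public static String run() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        // Every task waits on the latch, so workers stay busy during submission.
        Runnable blocking = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        StringBuilder trace = new StringBuilder();
        for (int i = 1; i <= 4; i++) {
            try {
                pool.execute(blocking);
                trace.append("task").append(i)
                        .append(": pool=").append(pool.getPoolSize())
                        .append(" queued=").append(pool.getQueue().size())
                        .append('\n');
            } catch (RejectedExecutionException e) {
                // Default handler is AbortPolicy: queue full and pool already at max.
                trace.append("task").append(i).append(": rejected\n");
            }
        }
        release.countDown(); // let the blocked tasks finish
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```

Task 1 starts the core thread, task 2 goes into the queue, task 3 finds the queue full and starts a second (non-core) thread, and task 4 is rejected because the queue is full and the pool is already at its maximum.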

ThreadPoolExecutor provides four built-in rejection policies, each an implementation of RejectedExecutionHandler:

1. AbortPolicy: throws a RejectedExecutionException directly, so the submission fails unless the caller handles the exception. This is the default policy.

2. CallerRunsPolicy: runs the rejected task directly in the thread that called execute (unless the pool has been shut down), which slows down the submitter.

3. DiscardOldestPolicy: discards the oldest task in the queue, that is, the task that was added first and is about to be executed, and then tries to submit the rejected task again.

4. DiscardPolicy: silently discards the rejected task without any further handling. This policy is only appropriate when the business scenario tolerates losing tasks.

It is worth noting that Dubbo's rejection policy, AbortPolicyWithReport, extends ThreadPoolExecutor.AbortPolicy; it mainly prints some key pool information and thread stack information before rejecting the task.
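
To illustrate the idea of "abort, but report first", here is a minimal sketch of a handler that records a snapshot of the pool and then delegates to AbortPolicy. The class ReportingAbortPolicy and its report format are hypothetical; this is not Dubbo's AbortPolicyWithReport implementation, only the same pattern built from standard ThreadPoolExecutor getters.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ReportingAbortPolicyDemo {

    /** Hypothetical handler: records a pool snapshot, then lets AbortPolicy throw as usual. */
    static class ReportingAbortPolicy extends ThreadPoolExecutor.AbortPolicy {
        volatile String lastReport = "";

        @Override
        public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
            lastReport = String.format(
                    "pool=%d active=%d core=%d max=%d largest=%d queued=%d",
                    e.getPoolSize(), e.getActiveCount(), e.getCorePoolSize(),
                    e.getMaximumPoolSize(), e.getLargestPoolSize(), e.getQueue().size());
            System.err.println("Thread pool is exhausted: " + lastReport);
            super.rejectedExecution(r, e); // throws RejectedExecutionException
        }
    }

    /** Occupies the only worker, submits one more task so the handler fires, and returns the recorded report. */
    public static String run() {
        ReportingAbortPolicy handler = new ReportingAbortPolicy();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS, new SynchronousQueue<>(), handler);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            pool.execute(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();         // the only worker is now busy
            pool.execute(() -> { }); // no idle worker and pool at max, so this is rejected
            return "not rejected";
        } catch (RejectedExecutionException ex) {
            return handler.lastReport;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The caller still receives the RejectedExecutionException, but the log now shows the pool size, active count and queue length at the moment of rejection, which is the kind of information that makes an alarm like the one in section 4 diagnosable.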

3. About thread pool configuration

Thread pool configuration is very important but easily overlooked. If the configuration is unreasonable, or threads are rarely reused, threads will still be created and destroyed frequently.

  1. How to reasonably calculate the number of core threads?

We can calculate it from the average response time (RT) of the interface and the QPS that the service needs to support. For example, if the average RT of our interface is 0.005 s, one worker thread can handle 200 tasks per second. If a single machine needs to support 30,000 QPS, the number of core threads required is 30,000 ÷ 200 = 150.

That is, the formula is: core threads = QPS ÷ (1 ÷ average RT) = QPS × average RT
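
As a quick check of the arithmetic, the formula can be written as a small helper. This is only a sketch: the class and method names are made up for illustration, and RT is passed in microseconds (an assumption of this example) so that the calculation stays in integer arithmetic and rounds up.

```java
public class CoreThreadEstimate {

    /**
     * Estimates the number of core threads as QPS x average RT, rounded up.
     * The average RT is given in microseconds to avoid floating-point rounding.
     */
    public static long coreThreads(long qps, long avgRtMicros) {
        long microsPerSecond = 1_000_000L;
        // Ceiling division: (a + b - 1) / b
        return (qps * avgRtMicros + microsPerSecond - 1) / microsPerSecond;
    }

    public static void main(String[] args) {
        // 30,000 QPS at an average RT of 5 ms (5,000 microseconds)
        System.out.println(coreThreads(30_000, 5_000)); // prints 150
    }
}
```

The result is an estimate for steady load only; it says nothing about bursts, which is exactly where the queue and maximumPoolSize settings come into play.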

  2. The easily overlooked @Async annotation

In Spring, the default executor for the @Async annotation is SimpleAsyncTaskExecutor. Without any configuration it is not really a thread pool, because it creates a new thread for every task and never reuses threads.

So remember: if you use @Async, configure a thread pool for it explicitly.

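import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

import java.util.concurrent.ThreadPoolExecutor;

@EnableAsync
@Configuration
@Slf4j
public class ThreadPoolConfig {
    private static final int corePoolSize = 100;             // core pool size (default number of threads)
    private static final int maxPoolSize = 400;             // maximum pool size
    private static final int keepAliveTime = 60;            // allowed idle time of a thread (in seconds)
    private static final int queueCapacity = 0;         // buffer queue capacity
    private static final String threadNamePrefix = "Async-Service-"; // thread name prefix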

    @Bean("taskExecutor") 
    public ThreadPoolTaskExecutor getAsyncExecutor(){
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(corePoolSize);
        executor.setMaxPoolSize(maxPoolSize);
        executor.setQueueCapacity(queueCapacity);
        executor.setKeepAliveSeconds(keepAliveTime);
        executor.setThreadNamePrefix(threadNamePrefix);

        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.AbortPolicy());
        // Initialize the executor
        executor.initialize();
        return executor;
    }
}

4. How did the thread count soar?

We configured the Dubbo server worker threads as follows:

corethreads: 150
threads: 800
threadpool: cached
queues: 10

Does it look reasonable? We set a small queue to absorb short-term thread shortages caused by jitter. On the face of it there is no problem: for the daytime business volume, the number of core threads is more than sufficient (RT < 5 ms, QPS < 10,000). After the release, however, the pool size climbed all the way to the maximum of 800, and we received the following alarm:

org.apache.dubbo.remoting.RemotingException("Server side(IP,20880) thread pool is exhausted, detail msg:Thread pool is EXHAUSTED! Thread Name: DubboServerHandler-IP:20880, Pool Size: 800 (active: 4, core: 300, max: 800, largest: 800), Task: 4101304 (completed: 4101301), Executor status:(isShutdown:false, isTerminated:false, isTerminating:false), in dubbo://IP:20880!"

As the message shows, when the pool reached the maximum number of threads, the number of active threads was very small, which was completely unexpected.

5. Scene simulation

From the source code:

queues == 0 ? new SynchronousQueue<Runnable>() :
                        (queues < 0 ? new LinkedBlockingQueue<Runnable>()
                                : new LinkedBlockingQueue<Runnable>(queues))

We can see that:

When queues is 0, the pool uses a SynchronousQueue; when queues is less than 0, it uses an unbounded LinkedBlockingQueue; when queues is greater than 0, it uses a bounded LinkedBlockingQueue with that capacity.
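
The difference between the three cases is easy to inspect directly. In the sketch below, chooseQueue copies the ternary expression from the source above; the class name, the describe helper and the printed format are assumptions made for this example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

public class QueueChoiceDemo {

    /** Same selection logic as the ternary expression in CachedThreadPool. */
    static BlockingQueue<Runnable> chooseQueue(int queues) {
        return queues == 0 ? new SynchronousQueue<Runnable>() :
                (queues < 0 ? new LinkedBlockingQueue<Runnable>()
                        : new LinkedBlockingQueue<Runnable>(queues));
    }

    /** Describes the chosen queue and whether offer() succeeds when no consumer is waiting. */
    static String describe(int queues) {
        BlockingQueue<Runnable> queue = chooseQueue(queues);
        int capacity = queue.remainingCapacity();
        boolean accepted = queue.offer(() -> { });
        return queue.getClass().getSimpleName()
                + " capacity=" + capacity
                + " offerAccepted=" + accepted;
    }

    public static void main(String[] args) {
        System.out.println(describe(0));  // SynchronousQueue capacity=0 offerAccepted=false
        System.out.println(describe(-1)); // LinkedBlockingQueue capacity=2147483647 offerAccepted=true
        System.out.println(describe(10)); // LinkedBlockingQueue capacity=10 offerAccepted=true
    }
}
```

A SynchronousQueue holds nothing: offer() only succeeds when a worker is already waiting to take the task, otherwise the executor moves on to creating a thread or rejecting. A bounded LinkedBlockingQueue accepts tasks until it is full, regardless of whether any worker is idle.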

The number of core threads and the maximum number of threads were unlikely to be the problem, so I suspected the queue size.

To reproduce the problem, I wrote a simple simulation:

package com.bytearch.fast.cloud;

import java.util.concurrent.*;

public class TestThreadPool {

    public final static int queueSize = 10;
    public static void main(String[] args) {
        ExecutorService executorService = getThreadPool(queueSize);
        for (int i = 0; i < 100000; i++) {
            int finalI = i;

            try {
                executorService.execute(new Runnable() {
                    @Override
                    public void run() {
                        doSomething(finalI);
                    }
                });
            } catch (Exception e) {
                System.out.println("emsg:" + e.getMessage());
            }
            if (i % 20 == 0) {
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }

        System.out.println("all done!");
        try {
            Thread.sleep(1000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

    }

    public static ExecutorService getThreadPool(int queues) {
        int cores = 150;
        int threads = 800;
        int alive = 60 * 1000;


        return new ThreadPoolExecutor(cores, threads, alive, TimeUnit.MILLISECONDS,
                queues == 0 ? new SynchronousQueue<Runnable>() :
                        (queues < 0 ? new LinkedBlockingQueue<Runnable>()
                                : new LinkedBlockingQueue<Runnable>(queues)));
    }

    public static void doSomething(final int i) {
        try {
            Thread.sleep(5);
            System.out.println("thread:" + Thread.currentThread().getName() +  ", active:" + Thread.activeCount() + ", do:" + i);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

Simulation results:

queueSize    Result
0            No exception
10           A rejection exception occurred
100          No exception

The exception is as follows:

emsg:Task com.bytearch.fast.cloud.TestThreadPool$1@733aa9d8 rejected from java.util.concurrent.ThreadPoolExecutor@6615435c[Running, pool size = 800, active threads = 32, queued tasks = 9, completed tasks = 89755]
all done!

Clearly, under high concurrency, a bounded LinkedBlockingQueue with a relatively small capacity causes problems for the thread pool.

After we changed the queues configuration to 0 and released again, everything returned to normal.

As for the deeper reasons, interested readers can analyze them further, or message me through the official account.

6. Summary

This article covered the basic principles of ThreadPoolExecutor, how to calculate the thread pool configuration, and the easily overlooked @Async configuration problem.

It also described the strange problem we ran into when using the thread pool: a single parameter that can lead to unpredictable consequences.

I hope this is helpful to you.

Origin blog.csdn.net/weixin_38130500/article/details/120359848