Java threads - how to use the Java thread pool correctly

Java frameworks such as Tomcat and Dubbo rely heavily on thread pools: wherever these frameworks use threads, a thread pool manages them. When we use these frameworks, we tune thread pool parameters to improve performance. So how many threads are appropriate? This article looks at thread pools with that question in mind.

Why use a thread pool

The simplest way to use a Java thread is to create a Thread object directly. Creating and destroying a Java thread, however, involves more than creating and destroying a Thread object. Creating the object only allocates a block of memory on the JVM heap; creating the thread requires a call into the operating system kernel, which then has to allocate a series of resources for the thread. This is expensive, and thread switching adds further cost. A thread is therefore a heavyweight object, and frequent creation and destruction should be avoided.

Problems of this kind are generally solved by "pooling", and the thread pool implementation provided by the JDK is based on ThreadPoolExecutor.

Using a thread pool can bring a series of benefits:

Reduce resource consumption : Pooling reuses threads that have already been created, which reduces the cost of thread creation and destruction.
Improve responsiveness : When a task arrives, it can be executed by an existing thread without waiting for thread creation.
Improve the manageability of threads : Threads are a scarce resource. Creating them without limit not only consumes system resources but also leads to unbalanced scheduling and reduces system stability. A thread pool allows unified allocation, tuning, and monitoring.
Provide more powerful functions : The thread pool is extensible, so developers can add functionality to it. For example, ScheduledThreadPoolExecutor allows tasks to be executed after a delay or periodically.
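
The reuse described in the first point can be observed directly. The following is a minimal sketch, not taken from the original article: the class name PoolReuseDemo and its method are hypothetical, and Executors.newFixedThreadPool() is used only for brevity. It runs many small tasks on a small pool and counts how many distinct threads executed them.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: counts the distinct worker threads used by a pool.
final class PoolReuseDemo {

    /** Runs taskCount small tasks on a fixed pool and returns the number of distinct threads that ran them. */
    static int distinctWorkerThreads(int taskCount, int poolSize) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            // Each task only records the name of the thread that executes it.
            pool.execute(() -> threadNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown(); // no new tasks are accepted; queued tasks still run
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        System.out.println("100 tasks ran on " + distinctWorkerThreads(100, 4) + " thread(s)");
    }
}
```

However many tasks are submitted, the count never exceeds the pool size, because the same threads are reused for every task.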

Thread pool core design and implementation

Overall design

The top-level interface is Executor. With java.util.concurrent.Executor#execute, the user only needs to provide a Runnable object that contains the task logic and hand it to the executor; the Executor framework takes care of allocating threads and executing the task.
The ExecutorService interface extends Executor and adds some capabilities:
- Extended task execution: submit() returns a Future for a single asynchronous task, and invokeAll() returns a list of Futures for a batch of tasks;
- Methods to manage and control the thread pool, such as shutdown() to stop the thread pool.
AbstractExecutorService is the abstract class on top of these interfaces. It chains the steps of task execution together so that concrete subclasses only need to implement one method that executes a task.
The concrete implementation class is ThreadPoolExecutor. It maintains its own life cycle on the one hand and manages threads and tasks on the other, combining the two to execute tasks in parallel.
ScheduledThreadPoolExecutor extends ThreadPoolExecutor and implements the ScheduledExecutorService interface, adding the ability to run tasks after a delay or periodically.
There is also a utility class, Executors, which provides factory methods for creating thread pools.
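
As a rough illustration of these interfaces, the sketch below uses submit(), invokeAll(), and shutdown() from ExecutorService. The class ExecutorApiDemo and its methods are hypothetical names chosen for this example; the pool size of 2 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class for the ExecutorService API.
final class ExecutorApiDemo {

    /** Squares each input on a pool thread via invokeAll() and sums the results. */
    static int sumOfSquares(List<Integer> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (int n : inputs) {
                jobs.add(() -> n * n);
            }
            int sum = 0;
            // invokeAll() returns one Future per task, after all tasks have completed.
            for (Future<Integer> result : pool.invokeAll(jobs)) {
                sum += result.get();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // management method added by ExecutorService
        }
    }

    /** Submits a single Callable and blocks on the Future returned by submit(). */
    static String greet(String name) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Future<String> result = pool.submit(() -> "hello " + name);
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```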

This chapter mainly explains the implementation principle of ThreadPoolExecutor; ScheduledThreadPoolExecutor will be discussed in the next chapter.

Implementation principle of ThreadPoolExecutor

Description of ThreadPoolExecutor construction parameters

ThreadPoolExecutor(
  int corePoolSize,
  int maximumPoolSize,
  long keepAliveTime,
  TimeUnit unit,
  BlockingQueue<Runnable> workQueue,
  ThreadFactory threadFactory,
  RejectedExecutionHandler handler)

corePoolSize : The minimum number of threads the thread pool keeps, also called the number of core threads. By default, core threads are not destroyed once created, even when idle (unless allowCoreThreadTimeOut is enabled). A non-core thread, in contrast, is destroyed after it has been idle for longer than keepAliveTime.
maximumPoolSize : The maximum number of threads the thread pool may create.
keepAliveTime & unit : These two parameters define how long a thread may stay idle. If a thread has been idle for keepAliveTime (measured in unit) and the number of threads is greater than corePoolSize, the idle thread is recycled.
workQueue : The blocking queue that stores waiting tasks. When a new task is submitted and corePoolSize threads are already running, the task is added to the workQueue.
threadFactory : Customizes how threads are created, for example to give them meaningful names.
handler : Customizes the rejection strategy. If all threads in the pool are busy and the work queue is full (which requires a bounded queue), the pool refuses newly submitted tasks and handles them according to this parameter.
ThreadPoolExecutor already provides four strategies:
CallerRunsPolicy: The thread that submits the task runs the task itself.
AbortPolicy: The default rejection policy; it throws RejectedExecutionException.
DiscardPolicy: Silently discards the task without throwing any exception.
DiscardOldestPolicy: Discards the oldest task, that is, the task at the head of the work queue, and then retries submitting the new task.
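
Putting the seven parameters together, a pool might be constructed as in the sketch below. The class PoolFactory, the method newBoundedPool, and all concrete values (2 core threads, 4 maximum threads, 30 seconds, a queue of 100, CallerRunsPolicy) are illustrative assumptions, not recommendations from the original text.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory class; the parameter values are examples only.
final class PoolFactory {

    /** Builds a pool with a bounded queue, named threads, and an explicit rejection policy. */
    static ThreadPoolExecutor newBoundedPool(String namePrefix) {
        AtomicInteger sequence = new AtomicInteger(1);
        // threadFactory: give every thread a recognizable name such as "order-1".
        ThreadFactory factory = task -> new Thread(task, namePrefix + "-" + sequence.getAndIncrement());
        return new ThreadPoolExecutor(
                2,                                   // corePoolSize
                4,                                   // maximumPoolSize
                30L, TimeUnit.SECONDS,               // keepAliveTime and unit
                new ArrayBlockingQueue<>(100),       // bounded workQueue
                factory,                             // threadFactory
                new ThreadPoolExecutor.CallerRunsPolicy()); // handler
    }
}
```

Named threads make thread dumps and logs much easier to read, which is the usual reason for supplying a custom threadFactory.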

ThreadPoolExecutor execution process

public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        if (! isRunning(recheck) && remove(command))
            reject(command);
        else if (workerCountOf(recheck) == 0)
            addWorker(null, false);
    }
    else if (!addWorker(command, false))
        reject(command);
}

First, the running state of the thread pool is checked. If it is not RUNNING, the task is rejected: the thread pool accepts new tasks only in the RUNNING state.
If workerCount < corePoolSize, create and start a thread to execute newly submitted tasks.
If workerCount >= corePoolSize, and the blocking queue in the thread pool is not full, add tasks to the blocking queue.
If workerCount >= corePoolSize && workerCount < maximumPoolSize, and the blocking queue in the thread pool is full, create and start a thread to execute the newly submitted task.
If workerCount >= maximumPoolSize, and the blocking queue in the thread pool is full, the task will be processed according to the rejection strategy, and the default processing method is to throw an exception directly.
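
The order "core threads, then queue, then extra threads, then rejection" can be traced with a deliberately tiny pool. This is a sketch under stated assumptions: the class ExecuteOrderDemo is hypothetical, and the sizes (1 core thread, 2 maximum threads, queue capacity 1) are chosen only to make each step visible. Every task blocks on a latch so that no thread becomes free while tasks are being submitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class tracing the decisions made by execute().
final class ExecuteOrderDemo {

    /** Submits four blocking tasks and records what the pool did with each one. */
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await(); // keep the worker busy until the trace is complete
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 1L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        try {
            for (int i = 1; i <= 4; i++) {
                try {
                    pool.execute(blocker);
                    log.add("task" + i + ": threads=" + pool.getPoolSize()
                            + " queued=" + pool.getQueue().size());
                } catch (RejectedExecutionException e) {
                    log.add("task" + i + ": rejected"); // default AbortPolicy
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The first task starts the core thread, the second waits in the queue, the third forces a non-core thread because the queue is full, and the fourth is rejected.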

Thread pool running status

The thread pool maintains its running state internally. A single AtomicInteger variable, ctl, holds both the running state runState and the number of worker threads workerCount: the upper 3 bits store runState and the lower 29 bits store workerCount, so the two values do not interfere with each other. Storing both values in one variable avoids inconsistencies when decisions depend on both, and no lock is needed to keep them consistent.

private final AtomicInteger ctl = new AtomicInteger(ctlOf(RUNNING, 0));

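// COUNT_BITS = 29 (for a 32-bit int): the number of bits that hold the worker count
private static final int COUNT_BITS = Integer.SIZE - 3;
// Count mask: the upper 3 bits are 0 and the lower 29 bits are all 1.
// ctl & COUNT_MASK yields the worker count; ctl & ~COUNT_MASK yields the run state.
private static final int COUNT_MASK = (1 << COUNT_BITS) - 1;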


private static final int RUNNING    = -1 << COUNT_BITS; // 111 00000 00000000 00000000 00000000;
private static final int SHUTDOWN   =  0 << COUNT_BITS; // 000 00000 00000000 00000000 00000000; 
private static final int STOP       =  1 << COUNT_BITS; // 001 00000 00000000 00000000 00000000;
private static final int TIDYING    =  2 << COUNT_BITS; // 010 00000 00000000 00000000 00000000;
private static final int TERMINATED =  3 << COUNT_BITS; // 011 00000 00000000 00000000 00000000;

// Extracts the current run state
private static int runStateOf(int c)     { return c & ~COUNT_MASK; }
// Extracts the current worker count
private static int workerCountOf(int c)  { return c & COUNT_MASK; }
// Builds ctl from a run state and a worker count
private static int ctlOf(int rs, int wc) { return rs | wc; }
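
The bit packing can be tried out in isolation. The sketch below copies the constants and helper methods from the excerpt above into a hypothetical standalone class, CtlBits, because the originals are private to ThreadPoolExecutor.

```java
// Hypothetical standalone copy of the ctl helpers, for experimentation only.
final class CtlBits {
    static final int COUNT_BITS = Integer.SIZE - 3;       // 29
    static final int COUNT_MASK = (1 << COUNT_BITS) - 1;  // lower 29 bits set

    static final int RUNNING  = -1 << COUNT_BITS;
    static final int SHUTDOWN =  0 << COUNT_BITS;
    static final int STOP     =  1 << COUNT_BITS;

    /** Packs a run state and a worker count into one int. */
    static int ctlOf(int runState, int workerCount) { return runState | workerCount; }

    /** Keeps only the upper 3 bits. */
    static int runStateOf(int ctl) { return ctl & ~COUNT_MASK; }

    /** Keeps only the lower 29 bits. */
    static int workerCountOf(int ctl) { return ctl & COUNT_MASK; }

    public static void main(String[] args) {
        int ctl = ctlOf(RUNNING, 5);
        System.out.println(Integer.toBinaryString(ctl));
        System.out.println("running=" + (runStateOf(ctl) == RUNNING)
                + " workers=" + workerCountOf(ctl));
    }
}
```

Because RUNNING is negative and the other states are zero or positive, the states are ordered numerically, which is what makes comparisons such as runStateAtLeast(c, SHUTDOWN) possible.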

State	Description
RUNNING	Accepts new tasks and processes tasks in the blocking queue
SHUTDOWN	Does not accept new tasks, but continues to process tasks already in the blocking queue
STOP	Does not accept new tasks, does not process tasks in the blocking queue, and interrupts threads that are executing tasks
TIDYING	All tasks have terminated and workerCount is 0; the terminated() hook method is about to run
TERMINATED	The terminated() method has completed

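State transitions:

RUNNING -> SHUTDOWN: on a call to shutdown().
RUNNING or SHUTDOWN -> STOP: on a call to shutdownNow().
SHUTDOWN -> TIDYING: when both the queue and the pool are empty.
STOP -> TIDYING: when the pool is empty.
TIDYING -> TERMINATED: when the terminated() hook method has completed.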
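
The internal states are not exposed directly, but part of the life cycle can be observed through the public methods isShutdown(), isTerminating(), and isTerminated(). The following is a minimal sketch; the class LifecycleDemo and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class observing the pool life cycle through its public API.
final class LifecycleDemo {

    static String describe(ThreadPoolExecutor pool) {
        return "shutdown=" + pool.isShutdown()
                + " terminating=" + pool.isTerminating()
                + " terminated=" + pool.isTerminated();
    }

    /** Records the observable state while running, after shutdown(), and after termination. */
    static List<String> observe() {
        List<String> seen = new ArrayList<>();
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(4));
        try {
            pool.execute(() -> {
                try {
                    release.await(); // keep one task running across shutdown()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            seen.add(describe(pool));   // RUNNING
            pool.shutdown();
            seen.add(describe(pool));   // SHUTDOWN: the running task is allowed to finish
            release.countDown();
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate in time");
            }
            seen.add(describe(pool));   // TERMINATED
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }
}
```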

Blocking queue

The thread pool is designed around the producer-consumer pattern, which is implemented mainly through a BlockingQueue. The purpose is to decouple tasks from threads: the blocking queue caches tasks, and worker threads take tasks from it.

Different queues implement different task access strategies. The main blocking queue implementations are:

Blocking queue	Description
ArrayBlockingQueue	Bounded queue backed by an array; supports fair and non-fair access policies
LinkedBlockingQueue	Optionally bounded queue backed by a linked list; the capacity defaults to Integer.MAX_VALUE, so a queue created with the default constructor is effectively unbounded and carries a memory risk
PriorityBlockingQueue	Unbounded queue that orders elements by priority; the order of elements with equal priority is not guaranteed
DelayQueue	Unbounded delay queue backed by a PriorityQueue; an element can be taken only after its delay has expired
SynchronousQueue	Stores no elements; each put() must wait for a matching take(). Supports fair and non-fair modes
LinkedTransferQueue	Unbounded queue backed by a linked list; adds the transfer() and tryTransfer() methods
LinkedBlockingDeque	Optionally bounded deque backed by a doubly linked list; elements can be inserted and removed at both ends
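
The differences that matter most to a thread pool show up in how offer() behaves, since execute() uses offer() to enqueue tasks. The sketch below, with the hypothetical class QueueDemo, compares three of the queues listed above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Hypothetical demo class comparing queue behavior relevant to thread pools.
final class QueueDemo {

    /** A bounded queue refuses offer() once its capacity is reached. */
    static boolean boundedRefusesWhenFull() {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1);
        boolean first = queue.offer(() -> { });
        boolean second = queue.offer(() -> { });
        return first && !second;
    }

    /** A SynchronousQueue has no capacity: offer() fails when no consumer is waiting. */
    static boolean synchronousRefusesWithoutConsumer() {
        BlockingQueue<Runnable> queue = new SynchronousQueue<>();
        return !queue.offer(() -> { }) && queue.remainingCapacity() == 0;
    }

    /** The default LinkedBlockingQueue capacity. */
    static int defaultLinkedCapacity() {
        return new LinkedBlockingQueue<Runnable>().remainingCapacity();
    }
}
```

A refused offer() is what pushes the pool toward creating a non-core thread or rejecting the task; with the default LinkedBlockingQueue that point is practically never reached.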

Worker

Worker overall design

Worker extends AbstractQueuedSynchronizer (AQS) and uses it to implement an exclusive lock. It uses AQS rather than ReentrantLock because the lock must be non-reentrant: the lock state then reflects whether the thread is currently executing a task.
Worker implements the Runnable interface and holds a thread, thread, and an initial task, firstTask. The thread is created through the ThreadFactory when the constructor is called and is used to execute tasks.

private final class Worker extends AbstractQueuedSynchronizer implements Runnable{
    final Thread thread; // the thread held by this Worker
    Runnable firstTask;  // the initial task; may be null
  
    Worker(Runnable firstTask) {
      setState(-1); // inhibit interrupts until runWorker
      this.firstTask = firstTask;
      this.thread = getThreadFactory().newThread(this);
    }
  
    public void run() {
      runWorker(this);
    }
  
  // ... remaining code omitted
}
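
To show what "non-reentrant lock built on AQS" means, here is a simplified sketch modeled on the locking part of Worker. The class NonReentrantLock is a hypothetical standalone class, not the JDK's Worker, and it omits everything related to task execution.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical minimal lock: state 0 means unlocked, state 1 means locked.
final class NonReentrantLock extends AbstractQueuedSynchronizer {
    private static final long serialVersionUID = 1L;

    @Override
    protected boolean tryAcquire(int unused) {
        // Succeeds only when the state is 0, even if the caller already holds the lock.
        if (compareAndSetState(0, 1)) {
            setExclusiveOwnerThread(Thread.currentThread());
            return true;
        }
        return false;
    }

    @Override
    protected boolean tryRelease(int unused) {
        setExclusiveOwnerThread(null);
        setState(0);
        return true;
    }

    void lock()        { acquire(1); }
    boolean tryLock()  { return tryAcquire(1); }
    void unlock()      { release(1); }
    boolean isLocked() { return getState() != 0; }
}
```

With a ReentrantLock, a second tryLock() from the owning thread would succeed; here it fails, so "locked" reliably means "a task is running". This is why a failed tryLock() on a Worker can be read as "this worker is busy".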

How Workers Are Added

private boolean addWorker(Runnable firstTask, boolean core) {
    retry:
    for (int c = ctl.get();;) {
        // Check if queue empty only if necessary.
        if (runStateAtLeast(c, SHUTDOWN)
            && (runStateAtLeast(c, STOP)
                || firstTask != null
                || workQueue.isEmpty()))
            return false;

        for (;;) {
            if (workerCountOf(c)
                >= ((core ? corePoolSize : maximumPoolSize) & COUNT_MASK))
                return false;
            if (compareAndIncrementWorkerCount(c))
                break retry;
            c = ctl.get();  // Re-read ctl
            if (runStateAtLeast(c, SHUTDOWN))
                continue retry;
            // else CAS failed due to workerCount change; retry inner loop
        }
    }

    boolean workerStarted = false;
    boolean workerAdded = false;
    Worker w = null;
    try {
        w = new Worker(firstTask);
        final Thread t = w.thread;
        if (t != null) {
            final ReentrantLock mainLock = this.mainLock;
            mainLock.lock();
            try {
                // Recheck while holding lock.
                // Back out on ThreadFactory failure or if
                // shut down before lock acquired.
                int c = ctl.get();

                if (isRunning(c) ||
                    (runStateLessThan(c, STOP) && firstTask == null)) {
                    if (t.getState() != Thread.State.NEW)
                        throw new IllegalThreadStateException();
                    workers.add(w);
                    workerAdded = true;
                    int s = workers.size();
                    if (s > largestPoolSize)
                        largestPoolSize = s;
                }
            } finally {
                mainLock.unlock();
            }
            if (workerAdded) {
                t.start();
                workerStarted = true;
            }
        }
    } finally {
        if (! workerStarted)
            addWorkerFailed(w);
    }
    return workerStarted;
}

The addWorker() method has two parameters:

firstTask holds the first task for the new thread and may be null. If it is non-null, the thread executes this task as soon as it starts; this is the case when execute() creates a thread for a newly submitted task. If it is null, the thread is created only to execute tasks that are already in the workQueue.
core selects the limit that is checked before a thread is added: true means the current number of worker threads must be less than corePoolSize, and false means it must be less than maximumPoolSize.

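The specific process is as follows:

Check the pool state: if the pool is in STOP or a later state, or it is in SHUTDOWN and either firstTask is non-null or the work queue is empty, return false.
Compare workerCount with the limit (corePoolSize if core is true, otherwise maximumPoolSize); if the limit has been reached, return false.
Increment workerCount with CAS; if the CAS fails, re-read ctl and retry.
Create the Worker and, while holding mainLock, recheck the pool state and add the Worker to the workers set.
Start the Worker's thread; if the thread was not started, call addWorkerFailed() to roll back.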

How Worker Gets Tasks

A task can be executed in two ways: directly by a newly created thread, or by a thread that takes the task from the task queue. After finishing a task, an idle thread requests the next task from the queue and executes it.

The first case corresponds to a non-null firstTask in the addWorker() method above; the task is run directly. In the second case firstTask is null, and the task is obtained from the workQueue by calling the getTask() method:

private Runnable getTask() {
        boolean timedOut = false; // Did the last poll() time out?

        for (;;) {
            int c = ctl.get();
            // Check if queue empty only if necessary.
            if (runStateAtLeast(c, SHUTDOWN)
                && (runStateAtLeast(c, STOP) || workQueue.isEmpty())) {
                decrementWorkerCount();
                return null;
            }
            int wc = workerCountOf(c);
            // Are workers subject to culling?
            boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;

            if ((wc > maximumPoolSize || (timed && timedOut))
                && (wc > 1 || workQueue.isEmpty())) {
                if (compareAndDecrementWorkerCount(c))
                    return null;
                continue;
            }
            try {
                Runnable r = timed ?
                    workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
                    workQueue.take();
                if (r != null)
                    return r;
                timedOut = true;
            } catch (InterruptedException retry) {
                timedOut = false;
            }
        }
    }

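The specific process is as follows:

If the pool is in STOP or a later state, or it is in SHUTDOWN and the work queue is empty, decrement workerCount and return null.
Decide whether this worker is subject to timeout: either allowCoreThreadTimeOut is set or workerCount > corePoolSize.
If workerCount > maximumPoolSize, or the worker is subject to timeout and its last poll timed out, and in addition either more than one worker exists or the queue is empty, decrement workerCount with CAS and return null.
Otherwise fetch a task: poll(keepAliveTime) if the worker is subject to timeout, take() if not. Return the task if one was obtained; otherwise mark the poll as timed out and loop again.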

How Workers Run Tasks

// java.util.concurrent.ThreadPoolExecutor#runWorker
final void runWorker(Worker w) {
  Thread wt = Thread.currentThread();
  Runnable task = w.firstTask;
  w.firstTask = null;
  w.unlock(); // allow interrupts
  boolean completedAbruptly = true;
  try {
    while (task != null || (task = getTask()) != null) {
      w.lock();
      // If pool is stopping, ensure thread is interrupted;
      // if not, ensure thread is not interrupted.  This
      // requires a recheck in second case to deal with
      // shutdownNow race while clearing interrupt
      if ((runStateAtLeast(ctl.get(), STOP) ||
           (Thread.interrupted() &&
            runStateAtLeast(ctl.get(), STOP))) &&
          !wt.isInterrupted())
        wt.interrupt();
      try {
        beforeExecute(wt, task);
        try {
          task.run();
          afterExecute(task, null);
        } catch (Throwable ex) {
          afterExecute(task, ex);
          throw ex;
        }
      } finally {
        task = null;
        w.completedTasks++;
        w.unlock();
      }
    }
    completedAbruptly = false;
  } finally {
    processWorkerExit(w, completedAbruptly);
  }
}

The specific process is as follows:

The while loop keeps obtaining tasks: first the Worker's firstTask, then tasks returned by getTask().
If the thread pool is stopping, the current thread is guaranteed to be interrupted; otherwise it is guaranteed not to be interrupted.
The task is executed, with beforeExecute() called before it and afterExecute() called after it.
If getTask() returns null, the loop exits and processWorkerExit() is called to destroy the thread.
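
The beforeExecute() and afterExecute() calls in runWorker() are protected hook methods that a subclass may override, for example for monitoring. The following is a rough sketch; the class CountingPool, its counters, and the runSample() helper are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical subclass that counts tasks through the execution hooks.
final class CountingPool extends ThreadPoolExecutor {
    final AtomicInteger started = new AtomicInteger();
    final AtomicInteger succeeded = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();

    CountingPool(int threads, int queueCapacity) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    @Override
    protected void beforeExecute(Thread worker, Runnable task) {
        super.beforeExecute(worker, task);
        started.incrementAndGet(); // runs on the worker thread, just before task.run()
    }

    @Override
    protected void afterExecute(Runnable task, Throwable thrown) {
        super.afterExecute(task, thrown);
        // thrown is non-null when a task passed to execute() ended with an exception
        if (thrown == null) {
            succeeded.incrementAndGet();
        } else {
            failed.incrementAndGet();
        }
    }

    /** Runs the given number of empty tasks and returns {started, succeeded, failed}. */
    static int[] runSample(int tasks) {
        CountingPool pool = new CountingPool(2, tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> { });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { pool.started.get(), pool.succeeded.get(), pool.failed.get() };
    }
}
```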

How Worker Threads Are Recycled

Thread destruction relies on automatic reclamation by the JVM. The thread pool keeps references to a certain number of Workers according to its current state, which prevents these threads from being reclaimed; when the pool decides that a thread should be recycled, it only needs to drop the reference. After a Worker is created, it keeps polling for tasks and executing them. A core thread can wait indefinitely for a task, while a non-core thread must obtain a task within a limited time. When a Worker cannot obtain a task, that is, when getTask() returns null, the loop ends and the Worker removes its own reference from the thread pool.

The main logic is in the processWorkerExit() method:

private void processWorkerExit(Worker w, boolean completedAbruptly) {
    if (completedAbruptly) // If abrupt, then workerCount wasn't adjusted
        decrementWorkerCount();

    final ReentrantLock mainLock = this.mainLock;
    mainLock.lock();
    try {
        completedTaskCount += w.completedTasks;
        workers.remove(w);
    } finally {
        mainLock.unlock();
    }

    tryTerminate();

    int c = ctl.get();
    if (runStateLessThan(c, STOP)) {
        if (!completedAbruptly) {
            int min = allowCoreThreadTimeOut ? 0 : corePoolSize;
            if (min == 0 && ! workQueue.isEmpty())
                min = 1;
            if (workerCountOf(c) >= min)
                return; // replacement not needed
        }
        addWorker(null, false);
    }
}

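The specific process is as follows:

If the Worker exited abruptly because a task threw an exception, decrement workerCount (on a normal exit getTask() has already done so).
While holding mainLock, add the Worker's completed task count to completedTaskCount and remove the Worker from the workers set.
Call tryTerminate() to check whether the pool can now terminate.
If the pool is still in the RUNNING or SHUTDOWN state, add a replacement Worker when the exit was abrupt, or when fewer workers remain than the required minimum (corePoolSize, or 0 with allowCoreThreadTimeOut, raised to 1 if the queue is not empty).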
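
The effect of recycling can be observed from outside through getPoolSize(). The sketch below is a hypothetical demo (class IdleReclaimDemo, keepAliveTime of 50 ms, observation window of about one second): with allowCoreThreadTimeOut enabled, the single idle worker is expected to exit; without it, the core thread stays alive.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: watches an idle worker being recycled (or kept).
final class IdleReclaimDemo {

    /** Runs one task, waits up to about one second, and returns the remaining pool size. */
    static int poolSizeAfterIdle(boolean allowCoreTimeout) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 50L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(10));
        pool.allowCoreThreadTimeOut(allowCoreTimeout);
        try {
            pool.submit(() -> { }).get(); // make sure one worker has been created
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(1);
            // Poll until the worker has exited or the observation window is over.
            while (pool.getPoolSize() > 0 && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            return pool.getPoolSize();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```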

Best Practices for Using Thread Pools

Executors

Because the ThreadPoolExecutor constructor is somewhat complicated, Java provides a static factory class, Executors, for creating thread pools quickly. However, many large companies discourage its use. The reason is that several Executors methods use a LinkedBlockingQueue created with the no-argument constructor, whose default capacity is Integer.MAX_VALUE. Under high load, such an effectively unbounded queue can easily lead to an OutOfMemoryError (OOM), after which no request can be processed. Using a bounded queue such as ArrayBlockingQueue is strongly recommended.

With a bounded queue, the thread pool triggers its rejection strategy when too many tasks arrive. The default rejection strategy throws RejectedExecutionException, which is a runtime exception; the compiler does not force callers to catch it, so developers can easily overlook it. The default rejection strategy should therefore be used with caution. If the tasks being processed are important, a custom rejection strategy is recommended; in practice, custom rejection strategies are often combined with a degradation strategy.
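
A custom rejection strategy is simply an implementation of RejectedExecutionHandler. The sketch below hands every rejected task to a fallback callback. The class FallbackRejectionHandler and the idea of "parking" rejected tasks in a list are hypothetical stand-ins for a real degradation step such as logging, persisting the task, or returning a default response.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical handler: delegates rejected tasks to a caller-supplied fallback.
final class FallbackRejectionHandler implements RejectedExecutionHandler {
    private final Consumer<Runnable> fallback;

    FallbackRejectionHandler(Consumer<Runnable> fallback) {
        this.fallback = fallback;
    }

    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor pool) {
        fallback.accept(task); // the degradation step goes here
    }

    /** Submits blocking tasks to a pool with 1 thread and 1 queue slot; returns how many were rejected. */
    static int rejectedCount(int submitted) {
        List<Runnable> parked = new CopyOnWriteArrayList<>();
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new FallbackRejectionHandler(parked::add));
        try {
            for (int i = 0; i < submitted; i++) {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return parked.size();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

Unlike the default AbortPolicy, this handler never throws, so the submitting code must learn about rejections through the fallback itself.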

Here are the commonly used methods:

newFixedThreadPool()

The newFixedThreadPool() method creates a thread pool with a fixed size.
maximumPoolSize is equal to corePoolSize, so all threads in the pool are core threads and, by default, are not destroyed once created.
workQueue is a LinkedBlockingQueue with the default capacity of Integer.MAX_VALUE, which makes it effectively an unbounded blocking queue. Tasks can be submitted to the workQueue almost without limit, so the rejection policy is never triggered by a full queue.

public static ExecutorService newFixedThreadPool(int nThreads) {
  return new ThreadPoolExecutor(nThreads, nThreads,
                                0L, TimeUnit.MILLISECONDS,
                                new LinkedBlockingQueue<Runnable>());
}

newSingleThreadExecutor()

The newSingleThreadExecutor() method creates a single-threaded executor.
Both maximumPoolSize and corePoolSize in ThreadPoolExecutor are equal to 1.
workQueue is also a LinkedBlockingQueue whose size is Integer.MAX_VALUE.

public static ExecutorService newSingleThreadExecutor() {
    return new FinalizableDelegatedExecutorService
        (new ThreadPoolExecutor(1, 1,
                                0L, TimeUnit.MILLISECONDS,
                                new LinkedBlockingQueue<Runnable>()));
}

newCachedThreadPool()

The thread pool created by the newCachedThreadPool() method contains only non-core threads, and a thread is destroyed after it has been idle for more than 60 seconds.
workQueue is a SynchronousQueue, a blocking queue with a capacity of 0, so workQueue does not store any waiting tasks:
- If there is an idle thread in the pool, the newly submitted task is executed by that thread.
- If there is no idle thread in the pool, the pool creates a new thread to execute the newly submitted task.
maximumPoolSize is Integer.MAX_VALUE, so the number of threads created by the pool can become very large.

public static ExecutorService newCachedThreadPool() {
    return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                                  60L, TimeUnit.SECONDS,
                                  new SynchronousQueue<Runnable>());
}
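
The risks of these factory methods can be checked by inspecting the pools they return. The sketch below assumes, as in the factory code shown in this section, that newFixedThreadPool() and newCachedThreadPool() return a ThreadPoolExecutor, so the result can be cast; the class FactoryInspection is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical demo class inspecting pools created by Executors.
final class FactoryInspection {

    /** Remaining queue capacity of a freshly created fixed pool. */
    static int fixedPoolQueueCapacity() {
        // Assumption: the factory returns a ThreadPoolExecutor, as in the code above.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(2);
        try {
            return pool.getQueue().remainingCapacity();
        } finally {
            pool.shutdown();
        }
    }

    /** Number of threads a cached pool creates for the given number of concurrent blocking tasks. */
    static int cachedPoolThreadsFor(int concurrentTasks) {
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < concurrentTasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // no thread becomes idle while tasks are submitted
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

The fixed pool accepts a practically unlimited backlog of tasks, and the cached pool creates one thread per concurrent task; either can exhaust memory under sustained load.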

Exception handling

Exception handling also needs attention when using a thread pool. When a task is submitted through the execute() method of ThreadPoolExecutor and throws a runtime exception, the thread executing the task terminates, but the code that submitted the task receives no notification. This can fool you into thinking that tasks are running normally. Although the thread pool offers several mechanisms for handling exceptions, the safest and simplest solution is to catch exceptions inside the task and handle them as needed.
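
Two common ways of making task failures visible are sketched below: wrapping the task so that exceptions are caught inside it, and using submit() so that the exception is delivered through the returned Future. The class TaskExceptionDemo and the helper guarded() are hypothetical names for this example.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical demo class showing two ways to observe task failures.
final class TaskExceptionDemo {

    /** Wraps a task so that a RuntimeException is handed to onError instead of killing the worker. */
    static Runnable guarded(Runnable task, Consumer<RuntimeException> onError) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                onError.accept(e);
            }
        };
    }

    /** execute() with a guarded task: the failure reaches the error callback. */
    static String failureSeenByGuard() {
        AtomicReference<String> seen = new AtomicReference<>("none");
        Runnable failing = () -> { throw new IllegalStateException("boom"); };
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(guarded(failing, e -> seen.set(e.getMessage())));
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    /** submit(): the failure is stored in the Future and rethrown by get(). */
    static String failureSeenByFuture() {
        Runnable failing = () -> { throw new IllegalStateException("boom"); };
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> result = pool.submit(failing);
            result.get();
            return "none";
        } catch (ExecutionException e) {
            return e.getCause().getMessage(); // the original exception is the cause
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that with submit() the failure stays hidden unless get() is actually called on the Future.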

Configure thread pool parameters

Thread pool parameters can be analyzed from four perspectives: task priority, task execution time, task nature (CPU-intensive or IO-intensive), and task dependencies. Use bounded work queues whenever possible.

Tasks of different nature can be processed separately using thread pools of different sizes:

CPU-intensive: as few threads as possible, for example Ncpu+1
IO-intensive: more threads, for example Ncpu*2, because threads spend much of their time waiting (a database connection pool is a typical case)
Hybrid: if the task can be split into a CPU-intensive part and an IO-intensive part whose execution times do not differ much, process the two parts in separate thread pools; if the execution times differ greatly, there is no need to split.
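
These rules of thumb can be turned into code with Runtime.availableProcessors(). The sketch below is only a starting point under the formulas given above; the class PoolSizing, its methods, and the 60-second keepAliveTime are assumptions, and real settings should be confirmed by load testing.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class applying the Ncpu+1 and Ncpu*2 rules of thumb.
final class PoolSizing {

    /** Ncpu + 1 threads for CPU-intensive work. */
    static int cpuBoundThreads(int cpus) {
        return cpus + 1;
    }

    /** Ncpu * 2 threads for IO-intensive work. */
    static int ioBoundThreads(int cpus) {
        return cpus * 2;
    }

    /** Builds a fixed-size pool for CPU-intensive tasks with a bounded queue. */
    static ThreadPoolExecutor newCpuBoundPool(int queueCapacity) {
        int threads = cpuBoundThreads(Runtime.getRuntime().availableProcessors());
        return new ThreadPoolExecutor(
                threads, threads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound: " + cpuBoundThreads(cpus)
                + ", IO-bound: " + ioBoundThreads(cpus));
    }
}
```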