Interviewer: How are redundant threads in the thread pool recovered?

Hello, I'm Luren (路人). More quality articles are on my personal blog: http://itsoku.com

I recently read the source code of the JDK thread pool, ThreadPoolExecutor, and now have a general understanding of how the pool executes tasks. That process is easy to follow, so I won't repeat it here; others have explained it far better than I could.

However, I am more interested in how the thread pool recycles worker threads, so I briefly analyzed it to deepen my understanding of the thread pool.

So, let's take JDK1.8 as an example.

1. runWorker(Worker w)

After the worker thread starts, it enters the runWorker(Worker w) method.

Inside is a while loop. Each iteration checks whether there is a task to run; if so, the task is executed. If no task can be obtained, or an exception is thrown, the loop exits and processWorkerExit(w, completedAbruptly) is called, which removes the worker thread.

There are two sources of tasks. One is firstTask, the task the worker runs when it first starts; it is executed at most once. After that, every task must come from getTask(). So getTask() is the key: leaving exceptions aside, when it returns null the loop exits and the thread ends. The next question is under what circumstances getTask() returns null.
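The loop shape described above can be sketched as a toy model. This is a simplification under stated assumptions, not the JDK source: the class MiniWorker is hypothetical, locking, interrupt handling and hooks are left out, and its getTask() simply returns null as soon as the queue is empty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of the runWorker() loop shape; NOT the JDK implementation.
class MiniWorker {
    private final Queue<Runnable> queue;   // stands in for workQueue
    private Runnable firstTask;            // runs at most once
    boolean exited = false;                // set by the stand-in for processWorkerExit

    MiniWorker(Runnable firstTask, Queue<Runnable> queue) {
        this.firstTask = firstTask;
        this.queue = queue;
    }

    // Hypothetical simplification: returns null as soon as the queue is empty.
    private Runnable getTask() {
        return queue.poll();
    }

    void runWorker() {
        Runnable task = firstTask;
        firstTask = null;
        boolean completedAbruptly = true;
        try {
            while (task != null || (task = getTask()) != null) {
                try {
                    task.run();
                } finally {
                    task = null;           // forces the next iteration to call getTask()
                }
            }
            completedAbruptly = false;     // loop ended because getTask() returned null
        } finally {
            processWorkerExit(completedAbruptly);
        }
    }

    private void processWorkerExit(boolean completedAbruptly) {
        exited = true;                     // the real method also removes the worker from the set
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Queue<Runnable> q = new ArrayDeque<>();
        q.add(() -> log.add("queued-1"));
        q.add(() -> log.add("queued-2"));
        MiniWorker w = new MiniWorker(() -> log.add("first"), q);
        w.runWorker();
        log.add("exited=" + w.exited);
        return log;
    }
}
```

Calling MiniWorker.demo() runs firstTask once, then drains the queue through getTask(), and only then reaches the exit hook.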

(For space, the source is excerpted in sections, and the task-execution steps in the middle are omitted.)

2. getTask() returns null

getTask() returns null in two situations, each guarded by its own if check.

In the first case, the thread pool state is STOP, TIDYING, or TERMINATED, or it is SHUTDOWN and the work queue is empty.

In the second case, the number of worker threads exceeds the maximum number of threads, or the current worker thread's last wait timed out (and timeouts apply to it); and in addition, either there is more than one worker thread or the task queue is empty. This one is harder to grasp. Just keep it in mind for now; it is used later.

Condition 1 and condition 2 are used below to refer to the judgment conditions of the two cases respectively.
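As a reading aid, the two conditions can be restated as plain boolean functions. This is a sketch, not the JDK code: the class GetTaskConditions and its integer state constants are illustrative stand-ins (the real pool packs the run state and the worker count into a single ctl field), and only the ordering of the states matters here.

```java
// Toy restatement of the two exit checks in getTask(); the constants are illustrative stand-ins.
class GetTaskConditions {
    // Ordered like the pool's run states: RUNNING < SHUTDOWN < STOP < TIDYING < TERMINATED.
    static final int RUNNING = 0, SHUTDOWN = 1, STOP = 2, TIDYING = 3, TERMINATED = 4;

    // Condition 1: the pool is STOP or later, or it is SHUTDOWN with an empty queue.
    static boolean condition1(int runState, boolean queueEmpty) {
        return runState >= SHUTDOWN && (runState >= STOP || queueEmpty);
    }

    // Condition 2: (too many workers, or this worker timed out while timeouts apply to it)
    // and (there is more than one worker, or the queue is empty).
    static boolean condition2(int workerCount, int corePoolSize, int maximumPoolSize,
                              boolean allowCoreThreadTimeOut, boolean timedOut,
                              boolean queueEmpty) {
        boolean timed = allowCoreThreadTimeOut || workerCount > corePoolSize;
        return (workerCount > maximumPoolSize || (timed && timedOut))
                && (workerCount > 1 || queueEmpty);
    }
}
```

For instance, with 5 workers, a core size of 4 and a timed-out wait, condition2 holds; with 4 workers it does not, because timed is false.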

3. Scenario analysis: how the thread pool recycles worker threads

3.1 shutdown() is not called, and all tasks complete while the pool is RUNNING

In this scenario, the number of worker threads is reduced to the core pool size (if it never exceeded the core size, nothing needs to be recycled).

For example, take a thread pool with 4 core threads and a maximum of 8. At the start there are 4 worker threads. Once tasks fill the task queue, the pool adds worker threads, up to 8. Later, when the tasks are nearly finished and threads can no longer get tasks, the pool shrinks back to 4 worker threads. (This depends on allowCoreThreadTimeOut. Only the default value false is discussed here, meaning core threads never time out. If it is true, all worker threads can be destroyed.)
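The grow-then-shrink behavior can be observed with a real ThreadPoolExecutor. The sketch below uses smaller numbers than the example (core 2, maximum 4, queue capacity 2) and an arbitrary keepAliveTime of 100 ms; the class name ShrinkToCoreDemo and the polling deadline are my own choices for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ShrinkToCoreDemo {
    // Returns {peak pool size, pool size after the extra threads have idled out}.
    static int[] run() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 100, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(2));
        CountDownLatch release = new CountDownLatch(1);
        try {
            // 2 tasks start the core threads, 2 fill the queue, 2 force extra threads.
            for (int i = 0; i < 6; i++) {
                pool.execute(() -> {
                    try {
                        release.await();             // keep every worker busy for now
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            int peak = pool.getPoolSize();
            release.countDown();                     // let all tasks finish
            // Poll until the extra threads have timed out, with a generous deadline.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
            while (pool.getPoolSize() > 2 && System.nanoTime() < deadline) {
                Thread.sleep(20);
            }
            return new int[] {peak, pool.getPoolSize()};
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The pool reaches the maximum only because the bounded queue fills up; with an unbounded queue it would never grow past the core size.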

Condition 1 above (the pool state is STOP, TIDYING, or TERMINATED, or it is SHUTDOWN with an empty work queue) can be ruled out first. Because the thread pool stays RUNNING, that check is always false, so in this scenario condition 1 can be ignored.

The following analyzes what a thread does when it cannot get a task.

step1. There are two ways to fetch a task from the task queue: a timed wait (poll) or blocking indefinitely (take). Which one is used depends on the timed variable, assigned just before: if the current number of threads is greater than the number of core threads, timed is true, otherwise false (as noted above, only allowCoreThreadTimeOut = false is discussed here). The case being discussed is clearly timed = true. The wait lasts at most keepAliveTime; if keepAliveTime is 0, the poll is effectively non-blocking and returns immediately.

When the timed wait expires without a task, timedOut becomes true and the thread enters the next iteration.

step2. The thread reaches the check for condition 1. The pool is still RUNNING, so the block is not entered.

step3. The thread reaches the check for condition 2. The thread count exceeds the core size, the last wait timed out, and the task queue is empty, so the condition is true. The thread tries to decrement the worker count with CAS. On success it returns null; otherwise it goes around the loop again.

Note that several threads may pass the check for condition 2 at the same time. Could the thread count then drop below the expected core size?

For example, suppose there are currently 5 threads and two of them wake up at the same moment. Both pass condition 2 and both decrement the count, leaving only 3 threads, which is not what we expect.

In fact this cannot happen. To prevent it, compareAndDecrementWorkerCount(c) uses CAS. If the CAS fails, the thread executes continue, enters the next iteration, and evaluates the conditions again.

In the example above, one of the two threads fails its CAS and re-enters the loop. It now sees only 4 worker threads, so timed is false; this thread is not destroyed and can block indefinitely in workQueue.take().
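The argument can be checked with a small model. The sketch below is not the pool itself: CasShrinkModel is a hypothetical class in which an AtomicInteger stands in for the worker count and each thread repeats the "read, check, CAS, retry" pattern described above. However the threads interleave, the count stops exactly at the core size.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class CasShrinkModel {
    // Returns {final worker count, number of workers that exited}.
    static int[] run(int startCount, int core) throws InterruptedException {
        AtomicInteger workerCount = new AtomicInteger(startCount);
        AtomicInteger exited = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[startCount];
        for (int i = 0; i < startCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await();                   // line everyone up to maximize contention
                } catch (InterruptedException e) {
                    return;
                }
                for (;;) {
                    int c = workerCount.get();
                    boolean timed = c > core;        // re-evaluated on every pass
                    if (!timed) {
                        return;                      // would block in take(): this worker survives
                    }
                    if (workerCount.compareAndSet(c, c - 1)) {
                        exited.incrementAndGet();    // getTask() would return null here
                        return;
                    }
                    // CAS lost: another thread changed the count, so loop and re-check.
                }
            });
            threads[i].start();
        }
        start.countDown();
        for (Thread t : threads) {
            t.join();
        }
        return new int[] {workerCount.get(), exited.get()};
    }
}
```

A CAS can only succeed from a value above core, so the count never goes below core; a thread only stays when it has seen a value at or below core, so the count never ends above it.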

I thought about this for a long time before arriving at the answer: how can the pool shrink to exactly the core size without any lock? The secret turned out to be CAS.

This also shows that although the pool has a core size, threads are not labeled core or non-core. It is not the case that the threads created first are the core ones and those created after the core size is exceeded are non-core. Which threads survive in the end is effectively random.

3.2 shutdown() is called, and all tasks are executed

In this scenario, whether it is a core thread or a non-core thread, all worker threads will be destroyed.

After calling shutdown(), an interrupt signal is sent to all idle worker threads.

In the end it calls interruptIdleWorkers(boolean onlyOne) with false.

In that method, before sending the interrupt, it checks whether the thread has already been interrupted and tries to acquire the worker's exclusive lock.

When the interrupt is sent, a worker thread is either in getTask() preparing to fetch a task, or executing a task. A worker that is executing a task is not interrupted while the task runs, because it holds its own lock during execution and the lock attempt fails. After finishing the task, the worker goes back to getTask().
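That a busy worker is left alone by shutdown() can be observed directly. The sketch below is an illustration, assuming a single-thread pool from Executors.newFixedThreadPool; the class name ShutdownSparesBusyWorker and the two latches are only scaffolding to make the timing deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownSparesBusyWorker {
    // Returns true if the running task observed an interrupt caused by shutdown().
    static boolean run() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        pool.execute(() -> {
            started.countDown();
            try {
                release.await();             // would throw if the worker were interrupted
            } catch (InterruptedException e) {
                interrupted.set(true);
            }
        });
        started.await();                     // the worker is now inside the task, holding its lock
        pool.shutdown();                     // only interrupts workers whose lock can be acquired
        release.countDown();                 // let the task finish normally
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return interrupted.get();
    }
}
```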

So we just need to see how to deal with interrupt exceptions in getTask().

There are two possibilities for the worker thread in getTask().

3.2.1 The tasks have all been completed, and the thread is blocking and waiting.

This case is simple: the interrupt wakes the thread, which enters the next iteration. At the check for condition 1, the condition holds (the pool is SHUTDOWN and the queue is empty), so the worker count is decremented and null is returned; the outer loop then ends the thread.

The decrementWorkerCount() call here spins on CAS until it succeeds, so the count is guaranteed to drop by 1.
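A short experiment for this case, offered as a sketch: a pool whose core threads are all idle is shut down, and every worker is recycled even though they are core threads. The class name ShutdownIdlePoolDemo and the 5-second wait are assumptions made for the example.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ShutdownIdlePoolDemo {
    // Returns {pool size while idle, pool size after termination}.
    static int[] run() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                3, 3, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.prestartAllCoreThreads();       // 3 workers with nothing to do
        int idleSize = pool.getPoolSize();
        pool.shutdown();                     // idle workers are interrupted and see condition 1
        if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("pool did not terminate in time");
        }
        return new int[] {idleSize, pool.getPoolSize()};
    }
}
```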

3.2.2 The tasks have not all been executed

After calling shutdown(), the unfinished tasks must be executed before the pool can end. So it is possible that the thread is still working at this time.

There are two phases to discuss.

Phase 1: there are still many tasks, and every worker thread can get one

No thread exits in this phase, so it could be skipped; here we only look at how a thread behaves after receiving the interrupt.

Suppose thread A is fetching a task through getTask() when it is interrupted. Whether it calls poll() or take(), an InterruptedException is thrown. The exception is caught, and A enters the next iteration. As long as the queue is not empty, it can continue to fetch tasks.

But A has been interrupted. When it fetches again with workQueue.poll() or workQueue.take(), won't another exception be thrown? Can the task be retrieved normally?

That depends on the workQueue implementation. workQueue is a BlockingQueue. Taking the common LinkedBlockingQueue and ArrayBlockingQueue as examples, both acquire their lock with lockInterruptibly(), which responds to interruption. That method calls acquireInterruptibly(int arg) of AQS.

In acquireInterruptibly(int arg), both the interrupt check at entry and the check after being woken by an interrupt while parked in parkAndCheckInterrupt() use Thread.interrupted(). This method returns the thread's interrupt status and also clears it. The thread is therefore no longer in the interrupted state, so the next attempt to fetch a task does not throw.
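The reset of the interrupt status is easy to reproduce outside the pool. In this sketch the current thread interrupts itself to simulate the signal from shutdown(); the class name InterruptFlagResetDemo and the string result format are invented for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class InterruptFlagResetDemo {
    // Returns "<first take threw?>,<flag still set?>,<second take result>".
    static String run() throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("task");
        boolean firstThrew = false;
        Thread.currentThread().interrupt();      // simulate the interrupt sent by shutdown()
        try {
            queue.take();                        // the interrupt is noticed before dequeuing
        } catch (InterruptedException e) {
            firstThrew = true;                   // getTask() catches this and loops again
        }
        boolean flagStillSet = Thread.currentThread().isInterrupted();
        String second = queue.take();            // the flag was cleared, so this succeeds
        return firstThrew + "," + flagStillSet + "," + second;
    }
}
```

The first take() fails even though an element is available, and that element is still in the queue for the second call.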

So for a thread that was about to fetch a task, the interrupt merely costs one loop iteration. This can be seen as a side effect of the interrupt, and it does not affect overall operation.

Having worked through this, I can't help but marvel at how the interrupt status gets reset here on the way through BlockingQueue. How does one come up with such a neat design? Hats off to Doug Lea.

Phase 2: the tasks are about to run out

By now the tasks have nearly all been taken. For example, with 4 worker threads and only 2 tasks left, 2 threads may get a task while the other 2 block.

Because the checks before fetching a task are not done under a lock, could all threads pass the checks, arrive at the point of fetching from workQueue, find the queue empty, and all block? shutdown() has already run, so it will not send another interrupt, and the threads would stay blocked and never be recycled.

This does not happen.

Suppose four worker threads A, B, C, and D pass the checks for condition 1 and condition 2 at the same time and arrive at the point of fetching a task. For them to have passed condition 1 after shutdown(), the work queue must have held at least one task, so at least one thread gets a task.

Suppose A and B each get a task, while C and D block.

The next steps for A and B are:

step1. After the task finishes, the thread calls getTask() again. This time condition 1 is met, null is returned, and the thread is ready to be recycled.

step2. processWorkerExit(Worker w, boolean completedAbruptly) recycles the thread.

Is recycling as simple as killing the thread? Let's look at processWorkerExit(Worker w, boolean completedAbruptly).

Besides removing the worker with workers.remove(w), it also calls tryTerminate().

In tryTerminate(), none of the sub-conditions of the first check holds, so it is skipped. The second check tests whether any worker threads still exist; if so, one idle thread is interrupted.

This raises a question: the idle thread that gets interrupted is not necessarily one that is blocked. If A and B exit at the same time, could A interrupt B and B interrupt A, so that they only interrupt each other and nobody wakes the blocked threads?

Again, this is overthinking it.

If A has reached this point, A has already been removed from the worker set workers (processWorkerExit(Worker w, boolean completedAbruptly) removes it before calling tryTerminate()). So A may interrupt B, but when B arrives here to interrupt someone, it will not find A in workers.

In other words, exiting threads cannot interrupt each other. Once I have left the set, I can interrupt you, but you cannot interrupt me, because I am no longer in the set; you can only interrupt someone else. So even if N threads exit at the same time, the last one to exit will interrupt one of the remaining blocked threads.

The interrupt propagates like dominoes.

Once either of the blocked threads C and D is interrupted and woken, it repeats step1 and step2, and so on, until every blocked thread has been interrupted and woken.

This is why tryTerminate() only needs to interrupt one idle thread: it calls interruptIdleWorkers(ONLY_ONE), that is, with true, in contrast to the false passed by shutdown().
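The end result of this chain can be verified from the outside, even though the individual interrupts are not visible. The following is a sketch with arbitrary numbers (4 threads, 10 short tasks); the class name ShutdownWithPendingTasksDemo and the 20 ms sleep standing in for real work are assumptions.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownWithPendingTasksDemo {
    // Returns {tasks completed, pool size after termination}.
    static int[] run() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(20);            // short stand-in for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.incrementAndGet();
            });
        }
        pool.shutdown();                         // queued tasks still run to completion
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("pool did not terminate in time");
        }
        return new int[] {done.get(), pool.getPoolSize()};
    }
}
```

If any worker were left blocked forever in take(), awaitTermination would time out instead of returning true.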

Thinking this through, I am once again full of admiration for Doug Lea. This, too, is beautifully designed.

4. Summary

ThreadPoolExecutor recycles a worker thread when that thread's getTask() call returns null.

There are two scenarios.

1) shutdown() is not called, and all tasks are executed while the pool is RUNNING

If the number of threads is greater than corePoolSize, threads wait for tasks with a timeout. When the wait times out, the thread uses CAS to decrement the worker count; if the CAS succeeds it returns null and the thread is recycled, otherwise it enters the next iteration. Once the number of worker threads is less than or equal to corePoolSize, threads can block indefinitely.

2) shutdown() is called, and all tasks are executed

shutdown() sends an interrupt to all idle threads, and there are two possibilities.

2.1) All threads are blocked

The interrupt wakes them, they re-enter the loop, all satisfy the first if check, return null, and are recycled.

2.2) The tasks have not all been executed

At least one thread will be recycled. In processWorkerExit(Worker w, boolean completedAbruptly), tryTerminate() is called, which interrupts one idle thread. All blocked threads are eventually woken one by one and recycled.

I started writing this analysis last night, got stuck halfway, and continued this morning. It took about 2+2=4 hours of writing and 1 hour of thinking.

To be honest, I'm still a bit confused, I can't understand it all at once, and I don't know if I understand it correctly.

I don't know if it's useful or not. I can only say that it has deepened my understanding of the thread pool (comfort myself), and I also feel the subtlety of the design.
