A thread pool incident: after troubleshooting, it was all Sanwai's fault

Background

A while ago I ran into a problem at work. I had defined a thread pool to execute tasks, but when the program finished, not all of the tasks had actually been executed, and I nearly had a meltdown.

The business scenario: for statistics purposes, order information has to be read from the primary database, processed by the statistics code, and written to the statistics database (the intermediate step involves business logic, so the binlog cannot be used).

Because of code quality and historical reasons, the existing re-statistics interface is single-threaded. By a rough count there are about 1 million orders in total, and processing every 100 of them takes about 10 seconds, so in theory processing all of them would take about 28 hours. That does not even count the slowdown of later queries caused by LIMIT paging in MySQL, or the chance that an out-of-memory error would halt the statistics run.
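The 28-hour figure follows directly from the numbers above:

```latex
\frac{10^{6}\ \text{orders}}{100\ \text{orders per batch}} \times 10\ \text{s per batch} = 10^{5}\ \text{s} \approx 27.8\ \text{h} \approx 28\ \text{h}
```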

For these reasons, and most importantly because the statistics are computed per center (the center each order belongs to), so computing several centers at the same time will not produce dirty data, I planned to use a thread pool and assign one thread to each center.

Implementation

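// Thread factory that gives each thread in the pool a readable name
// (ThreadFactoryBuilder is Guava's com.google.common.util.concurrent.ThreadFactoryBuilder)
ThreadFactory namedThreadFactory = new ThreadFactoryBuilder().setNameFormat("stats-pool-%d").build();

// Create the thread pool: 5 core threads, at most 10 threads, and a bounded
// blocking queue (capacity 100) to prevent running out of memory
ExecutorService statsThreadPool = new ThreadPoolExecutor(5, 10,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(100), namedThreadFactory);
// Iterate over all centers and submit one task per centerId to the pool
// (this line runs inside that loop)
statsThreadPool.submit(new StatsJob(centerId));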

After creating the thread pool, I submitted one task per centerId. I expected that, since the pool has 5 core threads, up to 5 centers would be processed at the same time, which would greatly reduce the total time needed for the 1 million records. So I excitedly kicked off the re-statistics job.
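A detail of this configuration worth checking: with a bounded LinkedBlockingQueue, ThreadPoolExecutor only creates threads beyond the core size once the queue is full. So as long as no more than 105 tasks are outstanding (5 running plus 100 queued), at most 5 tasks run at once, and the maximum of 10 is never reached. The sketch below is a hypothetical demo (class name PoolSizingDemo, 20 dummy tasks standing in for real StatsJob instances) that measures peak concurrency with the same pool parameters.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSizingDemo {
    // Runs `tasks` short jobs on a pool configured like the stats pool
    // (5 core, 10 max, queue capacity 100) and returns the peak number
    // of jobs that were running at the same time.
    static int peakConcurrency(int tasks) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(5, 10,
                0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(100));
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the highest count seen
                try {
                    Thread.sleep(20); // simulate a slice of statistics work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 20 tasks fit in the queue, so the pool never grows past its 5 core threads
        System.out.println("peak concurrency = " + peakConcurrency(20));
    }
}
```

If you actually want the pool to grow toward its maximum, the queue has to fill up first; a smaller queue (or a SynchronousQueue) changes that trade-off.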

Problem

After it had been running for a long time, I checked the progress of the statistics and found a very strange problem (as shown below).

The thread marked with the blue box is in the WAIT state, meaning it is idle. But the log showed that this thread had not finished its task: its center has 100,000 records, yet the log shows it got only about halfway, and after that there were no more log entries for this center at all.

[Screenshot: thread list in which the stats-pool thread marked with a blue box is in the WAIT state]

What was the cause?

My first thought was Sanwai. He must have walked into the office left foot first this morning and upset the code. That has to be it. I'm going to go find him.

Debugging and the cause

Ahem, the Sanwai bit was a joke; we still need to find the real cause.

Presumably the thread got blocked for some reason and never continued, but the log contained no exception information at all...

Experienced engineers may already know the reason...

My understanding of threads being limited, I couldn't find the cause right away, so I had to give up on the thread pool and obediently run the job single-threaded...

Fortunately, the single-threaded run threw an error (why "fortunately"?), and that immediately made me suspect that the thread I had seen in the WAIT state had hit the same error, which stopped its task from continuing.

Why "fortunately"? Because if the single-threaded run had not thrown an error, it might have taken me a long time to think of this.
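That suspicion matches how ThreadPoolExecutor behaves. When a task submitted with submit() throws, its worker thread does not die; it simply goes back to waiting on the work queue, which the JVM reports as WAITING, exactly like a worker whose task finished normally. The hypothetical demo below (class IdleWorkerDemo and thread name demo-worker are illustrative) runs one failing task on a single-thread pool and then checks the worker's state. Nothing about the failure is printed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class IdleWorkerDemo {
    // Returns the state of the single worker thread after it has run,
    // via submit(), a task that threw an exception halfway through.
    static Thread.State workerStateAfterFailedTask() throws InterruptedException {
        AtomicReference<Thread> worker = new AtomicReference<>();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "demo-worker");
            worker.set(t); // keep a handle so the thread can be inspected later
            return t;
        };
        ExecutorService pool = new ThreadPoolExecutor(1, 1,
                0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(), factory);
        Runnable failing = () -> { throw new IllegalStateException("failed halfway through"); };
        Future<?> f = pool.submit(failing);
        // Wait until the task has completed (exceptionally, but silently)
        while (!f.isDone()) {
            Thread.sleep(10);
        }
        // The worker survives and parks on the queue waiting for the next task
        Thread.State state = worker.get().getState();
        for (int i = 0; i < 200 && state != Thread.State.WAITING; i++) {
            Thread.sleep(10);
            state = worker.get().getState();
        }
        pool.shutdown();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker state: " + workerStateAfterFailedTask());
    }
}
```

In a thread dump, such a worker is indistinguishable from one that is idle because its work is genuinely done.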

A closer look at exception handling in the thread pool

That was the cause of the problem at work. The fix that followed was simple, so I won't go into it here.

But this raises another question: when a task in the thread pool dies because of an exception, why is nothing reported at all? Normally, an exception in the main method is thrown and printed, not swallowed the way the thread pool did here.

If a child thread throws an exception, how will the thread pool handle it?

I submitted tasks to the thread pool with threadPoolExecutor.submit(Runnable task). I later learned that when a task is submitted with execute() instead, its exception does get printed. So let's look at why an exception thrown by a task submitted with submit() gets "swallowed".
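The difference between the two methods is easy to check. The sketch below (class ExecuteVsSubmit and method handlerCalls are hypothetical names) gives the pool a plain JDK ThreadFactory whose uncaught-exception handler only counts how often it is called, then runs one failing task through execute() and one through submit().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecuteVsSubmit {
    // Runs one failing task through execute() or submit() and returns how many
    // times the worker thread's uncaught-exception handler was invoked.
    static int handlerCalls(boolean useExecute) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch handled = new CountDownLatch(1);
        ThreadFactory factory = r -> {
            Thread t = new Thread(r);
            // Stand-in for a logging handler: just count invocations
            t.setUncaughtExceptionHandler((thread, ex) -> {
                calls.incrementAndGet();
                handled.countDown();
            });
            return t;
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1,
                0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(), factory);
        Runnable failing = () -> { throw new RuntimeException("task failed"); };
        if (useExecute) {
            pool.execute(failing); // the exception escapes and ends the worker thread
        } else {
            pool.submit(failing);  // FutureTask catches the exception and stores it
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        // The handler runs as the dying thread exits, so allow it a moment
        handled.await(1, TimeUnit.SECONDS);
        return calls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("execute(): handler called " + handlerCalls(true) + " time(s)");
        System.out.println("submit():  handler called " + handlerCalls(false) + " time(s)");
    }
}
```

Without a custom handler, the execute() case falls back to the default behavior of printing the stack trace to System.err, which is the "exception log" mentioned above.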

For tasks submitted with submit(), let's go straight to the source code (this is AbstractExecutorService.submit, which ThreadPoolExecutor inherits):

public Future<?> submit(Runnable task) {
    if (task == null) throw new NullPointerException();
    // Wrap the task in a RunnableFuture, then hand it to execute() for queueing
    RunnableFuture<Void> ftask = newTaskFor(task, null);
    execute(ftask);
    return ftask;
}

The thread pool wraps the task in a RunnableFuture, which is in fact a FutureTask. execute() then either hands it to a newly started worker thread or puts it in the work queue; either way, a worker thread eventually calls the FutureTask's run() method, and that is when the task starts executing.

public void run() {
    if (state != NEW || !UNSAFE.compareAndSwapObject(this,runnerOffset,null, Thread.currentThread()))
        return;
    try {
        Callable<V> c = callable;
        if (c != null && state == NEW) {
            V result;
            boolean ran;
            try {
                result = c.call();
                ran = true;
            } catch (Throwable ex) {
                // Catch any exception thrown by the task
                result = null;
                ran = false;
                setException(ex);
            }
            if (ran)
                set(result);
        }
    } finally {
        runner = null;
        int s = state;
        if (s >= INTERRUPTING)
            handlePossibleCancellationInterrupt(s);
    }
}

FutureTask.run() catches the exception thrown by the task and passes it to setException(ex), which stores it in the outcome field. That outcome is what you get back when you call get() on the FutureTask returned by submit(): get() throws an ExecutionException whose cause is the original exception.
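A minimal sketch of getting the exception back out, assuming a pool created with Executors.newFixedThreadPool (class GetSurfacesException and the exception message are illustrative): submit a failing task, call get(), and unwrap the ExecutionException.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class GetSurfacesException {
    // Submits a failing task and returns whatever get() reports as the cause,
    // or null if get() returned normally.
    static Throwable causeSeenByGet() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Runnable failing = () -> { throw new IllegalArgumentException("bad order data"); };
            Future<?> future = pool.submit(failing);
            // ThreadPoolExecutor hands back the FutureTask it created in submit()
            System.out.println("is FutureTask: " + (future instanceof FutureTask));
            future.get(); // rethrows the stored outcome wrapped in ExecutionException
            return null;
        } catch (ExecutionException e) {
            return e.getCause(); // the original exception saved by setException(ex)
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("get() reported: " + causeSeenByGet());
    }
}
```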

In my code, nobody ever called get() on the returned Future, so the exception was never rethrown; in other words, it was "swallowed".

This is why tasks submitted to the thread pool with submit() do not surface their exceptions.
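One way to make sure such failures are at least logged, whichever method submitted the task, is to override ThreadPoolExecutor's afterExecute hook; the ThreadPoolExecutor Javadoc shows a similar pattern. This is a sketch: the class LoggingThreadPool is hypothetical, and it collects failures in a list instead of calling a real logger so the behavior is easy to check.

```java
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class LoggingThreadPool extends ThreadPoolExecutor {
    // Stand-in for a real logger: every task failure is collected here
    final List<Throwable> failures = new CopyOnWriteArrayList<>();

    LoggingThreadPool(int core, int max) {
        super(core, max, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // Tasks from submit() arrive as a completed Future with t == null;
        // the real exception is stored inside it, so unwrap it with get()
        if (t == null && r instanceof Future<?> && ((Future<?>) r).isDone()) {
            try {
                ((Future<?>) r).get();
            } catch (CancellationException e) {
                t = e;
            } catch (ExecutionException e) {
                t = e.getCause();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        if (t != null) {
            failures.add(t); // in real code: logger.error("task failed", t)
        }
    }

    static int failuresAfterSubmittingOneBadTask() throws InterruptedException {
        LoggingThreadPool pool = new LoggingThreadPool(1, 1);
        Runnable failing = () -> { throw new IllegalStateException("stats job failed"); };
        pool.submit(failing);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS); // afterExecute has run by now
        return pool.failures.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("failures recorded: " + failuresAfterSubmittingOneBadTask());
    }
}
```

The simpler alternative is to wrap the body of each task (here, StatsJob.run) in a try/catch that logs the exception itself.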

Custom exception handling for a thread pool

When building the ThreadFactory, call setUncaughtExceptionHandler to install a custom exception handler. For example:

ThreadFactory namedThreadFactory = new ThreadFactoryBuilder()
                .setNameFormat("judge-pool-%d")
                .setUncaughtExceptionHandler((thread, throwable)-> logger.error("ThreadPool {} got exception", thread,throwable))
                .build();

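This way, an exception that escapes a worker thread is written to the error log instead of disappearing unnoticed. One caveat: this only covers tasks submitted with execute(). Tasks submitted with submit() are wrapped in a FutureTask, which catches the exception itself (see run() above), so the handler never sees it; for those, call get() on the returned Future or catch and log the exception inside the task.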
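If Guava is not on the classpath, roughly the same factory can be written with the JDK alone. This is a sketch: the class name NamedThreadFactory is hypothetical, it mimics the numbering that ThreadFactoryBuilder.setNameFormat produces, and System.err stands in for the logger.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedThreadFactory implements ThreadFactory {
    private final String nameFormat;              // e.g. "judge-pool-%d"
    private final AtomicInteger counter = new AtomicInteger();
    private final Thread.UncaughtExceptionHandler handler;

    NamedThreadFactory(String nameFormat, Thread.UncaughtExceptionHandler handler) {
        this.nameFormat = nameFormat;
        this.handler = handler;
    }

    @Override
    public Thread newThread(Runnable r) {
        // Names threads judge-pool-0, judge-pool-1, ... in creation order
        Thread t = new Thread(r, String.format(nameFormat, counter.getAndIncrement()));
        t.setUncaughtExceptionHandler(handler); // same role as Guava's setUncaughtExceptionHandler
        return t;
    }

    public static void main(String[] args) {
        ThreadFactory factory = new NamedThreadFactory("judge-pool-%d",
                (thread, throwable) -> System.err.println(
                        "ThreadPool " + thread.getName() + " got exception: " + throwable));
        Thread first = factory.newThread(() -> { });
        Thread second = factory.newThread(() -> { });
        System.out.println(first.getName() + ", " + second.getName());
    }
}
```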

Follow-up

After fixing the exception in the task, I went back to running the re-statistics job on the thread pool, and it finally ran to completion.

Afterwards, I also told Sanwai to walk into the office right foot first from now on; otherwise it would seriously hurt the feng shui of our code.

Summary: this incident is a warning for all of us. Be careful when using thread pools: if an exception thrown in a task is not caught, it can be lost silently, and then you may not be able to find the cause when debugging from the logs.

I'm Ao Bing, a programmer making a living on the Internet. See you next time.


Guess you like

Source: blog.51cto.com/15060461/2678190