Incident Review Highlights - Incident 8: parent and child threads sharing a single thread pool (updated once a week)

【Problem Description】

  • Product images in production were displayed as blank images.

【Scope of impact】

  • App home page, store list page, product detail page

【Incident level】

  • P0

【Handling process】

  • 11:23 Received feedback that many product images in the app were blank.

  • 11:30 Traced the problem to the backend interface: the cached data it returned was empty.

  • 11:35 Turned on the interface's degradation switch so that it queried the database directly, and backfilled the product images into the cache.

  • 11:36 Problem resolved.

  • 12:30 Reviewed the code and found the root cause.

【Root cause】

  • Because of the image resizing type, the product data had to be refreshed again. To shorten the refresh, the work was processed in parallel: one thread was started per store under the merchant, and within each store the SKU images were refreshed by child threads in batches of 50.
  • The parent threads and the child threads all came from the same thread pool; there was no thread pool isolation.
  • Parent and child tasks therefore competed for the same threads, so each could end up waiting on the other.
  • As the number of tasks grew, the pool reached its maximum number of threads, and newly submitted child tasks were placed in the waiting queue.
  • The backlog in the waiting queue gradually reached the queue's maximum capacity.
  • Once the queue was full, the rejection policy applied. The pool was configured with a discard policy, so rejected tasks were lost.
  • Image refresh tasks started to be lost, which left the production cache with missing data.
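
The chain of events above can be sketched in a few lines of Java. This is a minimal illustration, not the production code: the class name `SharedPoolIncidentSketch`, the pool sizes (2 threads, 2 queue slots), the 4 stores and the 10 batches per store are all assumptions chosen to keep the run short. The start gate exists only to make the sketch repeatable; it holds the first parent tasks until every parent has been submitted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedPoolIncidentSketch {
    // Hypothetical sizes, chosen only for illustration.
    static final int STORES = 4;
    static final int BATCHES_PER_STORE = 10;

    /** Refreshes with ONE shared pool; returns how many child batches actually ran. */
    static int childBatchesExecuted() {
        // 2 threads, 2 queue slots, rejected tasks are silently discarded.
        ThreadPoolExecutor shared = new ThreadPoolExecutor(
                2, 2, 10L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.DiscardPolicy());
        AtomicInteger executed = new AtomicInteger();
        CountDownLatch startGate = new CountDownLatch(1);
        CountDownLatch parentsDone = new CountDownLatch(STORES);

        for (int s = 0; s < STORES; s++) {
            // Parent task: one per store. Two start on threads, two wait in the queue.
            shared.execute(() -> {
                awaitQuietly(startGate);
                for (int b = 0; b < BATCHES_PER_STORE; b++) {
                    // Child task: one per batch, submitted to the SAME pool.
                    // While the pool is saturated, DiscardPolicy drops it without a trace.
                    shared.execute(() -> {
                        pause(20);
                        executed.incrementAndGet();
                    });
                }
                parentsDone.countDown();
            });
        }
        startGate.countDown();      // let the parents start submitting child batches
        awaitQuietly(parentsDone);  // every parent has finished submitting
        shared.shutdown();          // already accepted child batches still run
        try {
            shared.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executed.get();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("child batches submitted: " + STORES * BATCHES_PER_STORE);
        System.out.println("child batches executed:  " + childBatchesExecuted());
    }
}
```

The executed count is always lower than the 40 batches submitted, and the exact number varies from run to run. No exception is thrown and nothing is logged, which is what makes this failure mode hard to notice.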

【Summary】

Let's review the basics of thread pools first.

Important parameters of the thread pool:

  1. corePoolSize: the number of core threads. While the number of threads is < corePoolSize, a new thread is created to execute each submitted Runnable.
  2. maximumPoolSize: the maximum number of threads. Once the number of threads is >= corePoolSize, a new Runnable is put into the workQueue; additional threads, up to maximumPoolSize, are created only when the workQueue is full.
  3. keepAliveTime: the maximum time that idle threads in excess of corePoolSize are kept alive.
  4. unit: the time unit of keepAliveTime.
  5. workQueue: the blocking queue that holds tasks waiting to be executed.
  6. threadFactory: the factory used to create new threads.
  7. handler: the rejection policy, applied when a task cannot be accepted.
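
To show where each of the seven parameters goes, here is a small sketch that passes them to the `ThreadPoolExecutor` constructor one by one. The concrete values, the class name `PoolParametersSketch` and the thread name prefix `image-refresh-` are assumptions for illustration, not the settings used in the incident.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolParametersSketch {

    /** Builds a pool with every constructor parameter spelled out (values are hypothetical). */
    static ThreadPoolExecutor newImageRefreshPool() {
        int corePoolSize = 4;                        // threads created before tasks are queued
        int maximumPoolSize = 8;                     // upper bound, reached only when the queue is full
        long keepAliveTime = 30L;                    // idle time allowed for threads above corePoolSize
        TimeUnit unit = TimeUnit.SECONDS;            // unit of keepAliveTime
        BlockingQueue<Runnable> workQueue = new ArrayBlockingQueue<>(100); // bounded task queue

        // Named threads make thread dumps and logs easier to read.
        AtomicInteger sequence = new AtomicInteger();
        ThreadFactory threadFactory =
                r -> new Thread(r, "image-refresh-" + sequence.incrementAndGet());

        // Fail loudly instead of dropping tasks silently.
        RejectedExecutionHandler handler = new ThreadPoolExecutor.AbortPolicy();

        return new ThreadPoolExecutor(corePoolSize, maximumPoolSize, keepAliveTime,
                unit, workQueue, threadFactory, handler);
    }
}
```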

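Task execution order:

  1. When the number of threads is less than corePoolSize, a new thread is created to execute the task.
  2. When the number of threads is greater than or equal to corePoolSize and the workQueue is not full, the task is put into the workQueue.
  3. When the number of threads is greater than or equal to corePoolSize and the workQueue is full, a new thread is created to run the new task, as long as the total number of threads stays below maximumPoolSize.
  4. When the total number of threads equals maximumPoolSize and the workQueue is full, the handler's rejectedExecution is invoked, i.e. the rejection policy.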
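
The four steps can be observed directly on a deliberately tiny pool. The sketch below assumes corePoolSize 1, maximumPoolSize 2 and a queue with a single slot, and submits four tasks that all block until released; the class name `SubmissionOrderSketch` is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SubmissionOrderSketch {

    /** Submits four blocking tasks and records the pool state after each submission. */
    static List<String> trace() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 10L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1);
        // Every task blocks until released, so the pool state cannot change underneath us.
        Runnable blocked = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        List<String> trace = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            try {
                pool.execute(blocked);
                trace.add("task" + i + ": threads=" + pool.getPoolSize()
                        + " queued=" + pool.getQueue().size());
            } catch (RejectedExecutionException e) {
                trace.add("task" + i + ": rejected");
            }
        }
        release.countDown(); // unblock the tasks so the pool can drain
        pool.shutdown();
        return trace;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

Task 1 gets the core thread, task 2 goes to the queue, task 3 finds the queue full and gets a second thread, and task 4 is rejected because both the queue and the pool are at their limits.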

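ThreadPoolExecutor provides four built-in rejection policies:

  1. ThreadPoolExecutor.AbortPolicy(): throws a RejectedExecutionException.
  2. ThreadPoolExecutor.CallerRunsPolicy(): calls the task's run method directly on the submitting thread, which blocks until the task finishes.
  3. ThreadPoolExecutor.DiscardPolicy(): silently discards the newly submitted task.
  4. ThreadPoolExecutor.DiscardOldestPolicy(): discards the task at the head of the queue, then retries submitting the new task.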
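
The difference between the four policies is easiest to see when the same saturated pool is run once per policy. This sketch assumes a pool with one thread and one queue slot: task A occupies the thread, task B fills the queue, and task C is the one the policy has to deal with. The class name `RejectionPolicySketch` and the task labels are made up for the example.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RejectionPolicySketch {

    /** Returns the order in which tasks A, B and C ran (or were rejected) under the policy. */
    static List<String> run(RejectedExecutionHandler policy) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 10L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1), policy);
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch release = new CountDownLatch(1);

        pool.execute(() -> {                  // A: occupies the only thread until released
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            log.add("A");
        });
        pool.execute(() -> log.add("B"));     // B: fills the only queue slot
        try {
            pool.execute(() -> log.add("C")); // C: the pool is saturated, the policy decides
        } catch (RejectedExecutionException e) {
            log.add("C rejected");
        }

        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println("AbortPolicy:         " + run(new ThreadPoolExecutor.AbortPolicy()));
        System.out.println("CallerRunsPolicy:    " + run(new ThreadPoolExecutor.CallerRunsPolicy()));
        System.out.println("DiscardPolicy:       " + run(new ThreadPoolExecutor.DiscardPolicy()));
        System.out.println("DiscardOldestPolicy: " + run(new ThreadPoolExecutor.DiscardOldestPolicy()));
    }
}
```

With AbortPolicy the caller sees the exception for C; with CallerRunsPolicy C runs first, on the submitting thread; with DiscardPolicy C disappears without any signal; with DiscardOldestPolicy it is the queued task B that disappears.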

Next, let's use a demo to reproduce how the incident came about.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FuatureTaskDemo2 {
    // One pool shared by every caller: 4 threads, 2 queue slots,
    // and rejected tasks are silently discarded.
    private static final ThreadPoolExecutor mExecutor = new ThreadPoolExecutor(
            4,
            4,
            10L,
            TimeUnit.SECONDS,
            new LinkedBlockingDeque<>(2),
            Executors.defaultThreadFactory(),
            new ThreadPoolExecutor.DiscardPolicy());

    /**
     * Submits 10 tasks to the shared pool, then prints a snapshot of the pool state.
     *
     * @param name label used in the log output
     */
    public void getWorker(String name) {
        System.out.println("Start submitting tasks for the " + name);
        for (int i = 0; i < 10; i++) {
            int finalI = i;
            mExecutor.submit(() -> {
                try {
                    Thread.sleep(100L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("Running subtask of the " + name + ": " + finalI);
            });
        }
        // Snapshot of the pool right after submission.
        int activeCount = mExecutor.getActiveCount();
        int corePoolSize = mExecutor.getCorePoolSize();
        int maximumPoolSize = mExecutor.getMaximumPoolSize();
        long taskCount = mExecutor.getTaskCount();
        BlockingQueue<Runnable> blockingQueue = mExecutor.getQueue();
        System.out.println("getActiveCount=" + activeCount);
        System.out.println("getCorePoolSize=" + corePoolSize);
        System.out.println("getMaximumPoolSize=" + maximumPoolSize);
        System.out.println("getTaskCount=" + taskCount);
        System.out.println("blockingQueue=" + blockingQueue.size());
    }

    public static void main(String[] args) {
        FuatureTaskDemo2 it = new FuatureTaskDemo2();
        FuatureTaskDemo2 it3 = new FuatureTaskDemo2();
        try {
            it3.getWorker("parent thread");
            it.getWorker("child thread");
        } finally {
            // Shut down once, after both rounds have been submitted, so that tasks
            // are rejected because the pool is saturated, not because it is closed.
            mExecutor.shutdown();
        }
    }

}

Execution result:

(screenshot image.png of the console output, not reproduced here)

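Lessons learned:

  • As the demo above shows, of the 20 tasks submitted (two calls of 10 each), only 6 were actually executed: 4 on the pool's threads and 2 from the queue. All the others were rejected.
  • When running threads that process core data, keep the parent and child thread pools separate wherever possible.
  • For core tasks, the rejection policy must be either ThreadPoolExecutor.CallerRunsPolicy(), which calls run directly on the submitting thread and blocks until it finishes, or ThreadPoolExecutor.AbortPolicy(), which throws an exception so that the task can be retried.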
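
A minimal sketch of these recommendations combined, under assumed sizes (2 threads and 2 queue slots per pool, 4 stores, 10 batches per store): parent and child tasks get their own pools, and both pools use CallerRunsPolicy so that a saturated pool slows the submitter down instead of dropping work. The class name `IsolatedPoolsSketch` and the method names are illustrative, not taken from the production code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class IsolatedPoolsSketch {

    // Small bounded pool; when it is saturated the submitting thread runs the task itself.
    private static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                2, 2, 10L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Refreshes with SEPARATE parent and child pools; returns how many child batches ran. */
    static int childBatchesExecuted(int stores, int batchesPerStore) {
        ThreadPoolExecutor parentPool = newPool(); // one task per store
        ThreadPoolExecutor childPool = newPool();  // one task per batch of SKU images
        AtomicInteger executed = new AtomicInteger();

        for (int s = 0; s < stores; s++) {
            parentPool.execute(() -> {
                for (int b = 0; b < batchesPerStore; b++) {
                    // Child batches never compete with parents for threads or queue slots.
                    childPool.execute(() -> {
                        pause(20);
                        executed.incrementAndGet();
                    });
                }
            });
        }
        try {
            // Parents first: once they have terminated, no more children will be submitted.
            parentPool.shutdown();
            parentPool.awaitTermination(30, TimeUnit.SECONDS);
            childPool.shutdown();
            childPool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executed.get();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("child batches executed: " + childBatchesExecuted(4, 10));
    }
}
```

Every one of the 40 child batches runs here. The trade-off is throughput: when the child pool is full, the parent thread spends its time running a batch itself, which is usually preferable to losing the refresh for core data.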

Origin juejin.im/post/7079774983987626020