Thread pool practical application

thread pool settings

corePoolSize: number of core threads

Core threads stay alive even when there are no tasks to execute.

While the number of threads is below corePoolSize, the pool creates a new thread for each incoming task, even if existing threads are idle.

If allowCoreThreadTimeOut is set to true (the default is false), idle core threads are also terminated once they have been idle longer than keepAliveTime.
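A minimal sketch of the second rule above, assuming a pool with corePoolSize = 2 (the class and method names are illustrative). The first task has finished by the time the second one arrives, yet the pool still starts a second thread because it has not reached corePoolSize.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CoreThreadDemo {
    /** Runs two tasks one after the other and returns the resulting pool size. */
    static int poolSizeAfterTwoSequentialTasks() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        try {
            // Task 1 starts core thread #1; get() waits until the task has finished
            pool.submit(() -> { }).get();
            // Task 2: poolSize (1) < corePoolSize (2), so a new thread is started
            // instead of handing the task to thread #1
            pool.submit(() -> { }).get();
            return pool.getPoolSize();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(poolSizeAfterTwoSequentialTasks()); // prints 2
    }
}
```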

(1) For services with high concurrency and short task execution times, set the number of threads to the number of CPU cores + 1 to reduce thread context switching.
(2) For services with low concurrency and long task execution times, distinguish two cases:
a) If most of the time is spent on IO (IO-intensive tasks), the CPU is not used while threads wait for IO, so increase the number of threads to keep the CPU busy with more work instead of leaving it idle.
b) If most of the time is spent computing (CPU-intensive tasks), there is no such slack to exploit: as in (1), keep the number of threads small to reduce context switching.
(3) For high concurrency combined with long execution times, the key lies not in the thread pool but in the overall architecture. The first step is to check whether some of the data these services use can be cached; the second step is to add servers. For the thread pool settings, refer to (2). Finally, the long execution time itself may need analysis, to see whether middleware can be used to split and decouple the tasks.
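As a rough starting point for cases (1) and (2), the sketch below uses CPU cores + 1 for CPU-bound work and, for IO-bound work, the widely cited heuristic from Java Concurrency in Practice: threads = cores * target utilization * (1 + wait time / compute time). That second formula is not from the text above, and the wait and compute times are assumptions you would have to measure; treat the results as initial values to tune under load.

```java
class PoolSizing {
    // Cases (1) and (2b): CPU-bound work, one thread per core plus one spare
    static int cpuBound(int cpus) {
        return cpus + 1;
    }

    // Case (2a): IO-bound work, heuristic from Java Concurrency in Practice:
    // threads = cpus * targetUtilization * (1 + waitTime / computeTime)
    static int ioBound(int cpus, double targetUtilization, double waitMillis, double computeMillis) {
        return (int) Math.ceil(cpus * targetUtilization * (1 + waitMillis / computeMillis));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound: " + cpuBound(cpus));
        // Assumed profile: each task waits ~90 ms on IO per ~10 ms of computation
        System.out.println("IO-bound:  " + ioBound(cpus, 1.0, 90, 10));
    }
}
```

For example, on an 8-core machine cpuBound(8) gives 9, while ioBound(8, 1.0, 90, 10) suggests 80 threads for tasks that spend 90% of their time waiting.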

tasks: the number of tasks per second, assumed to be 500~1000
taskcost: the time each task takes, assumed to be 0.1 s
responsetime: the maximum response time the system can tolerate, assumed to be 1 s
Now a few calculations.
corePoolSize: how many threads are needed to handle one second's worth of tasks?
threadcount = tasks / (1/taskcost) = tasks * taskcost = (500~1000) * 0.1 = 50~100 threads, so corePoolSize should be at least 50.
By the 80/20 rule, if in 80% of the seconds there are fewer than 800 tasks, corePoolSize should be set to 800 * 0.1 = 80.
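The same arithmetic as a small sketch. Here taskcost is expressed in whole milliseconds (an assumption made so that integer math avoids floating-point rounding), and the result is rounded up because a fractional thread still needs a whole thread.

```java
class CorePoolSizeCalc {
    // threadcount = tasks * taskcost, rounded up (taskcost given in milliseconds)
    static int threadsNeeded(int tasksPerSecond, int taskCostMillis) {
        return (tasksPerSecond * taskCostMillis + 999) / 1000;
    }

    public static void main(String[] args) {
        System.out.println(threadsNeeded(500, 100));  // 50, low end of the range
        System.out.println(threadsNeeded(1000, 100)); // 100, high end of the range
        System.out.println(threadsNeeded(800, 100));  // 80, corePoolSize by the 80/20 rule
    }
}
```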

queueCapacity: task queue capacity (blocking queue)

The two most commonly used queues
ArrayBlockingQueue

A bounded blocking queue backed by an array; its capacity must be specified when it is created and cannot change afterwards.

LinkedBlockingQueue

An optionally bounded blocking queue backed by a singly linked list; if no capacity is specified, the capacity defaults to Integer.MAX_VALUE, which makes it effectively unbounded.
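A small sketch of how the two queues are constructed (class and method names are illustrative); remainingCapacity() shows that a LinkedBlockingQueue created without a capacity is limited only by Integer.MAX_VALUE.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueCapacities {
    /** Remaining capacity of three freshly created, empty queues. */
    static int[] remaining() {
        // ArrayBlockingQueue: the capacity is mandatory and fixed for the queue's lifetime
        BlockingQueue<Runnable> array = new ArrayBlockingQueue<>(100);
        // LinkedBlockingQueue with an explicit bound behaves like a bounded queue
        BlockingQueue<Runnable> boundedLinked = new LinkedBlockingQueue<>(100);
        // LinkedBlockingQueue without a bound: capacity defaults to Integer.MAX_VALUE
        BlockingQueue<Runnable> unbounded = new LinkedBlockingQueue<>();
        return new int[] {
                array.remainingCapacity(),
                boundedLinked.remainingCapacity(),
                unbounded.remainingCapacity()
        };
    }

    public static void main(String[] args) {
        // Prints [100, 100, 2147483647]
        System.out.println(java.util.Arrays.toString(remaining()));
    }
}
```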

Once the number of threads has reached corePoolSize, new tasks wait in the queue. If the queue length is set unreasonably, the benefit of multithreading is lost. The default length of a LinkedBlockingQueue is Integer.MAX_VALUE; with a queue that long, the queue never fills up, so the number of threads never grows beyond corePoolSize. If corePoolSize is also set too small, the pool cannot make use of multithreading at all.
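The effect described above can be reproduced with a small, deterministic experiment (a sketch; the class name, pool sizes and task counts are arbitrary choices). Every task blocks on a latch, so no thread frees up while tasks are being submitted. With an unbounded queue the pool stays at corePoolSize and everything piles up in the queue; with a bounded queue the pool grows to maximumPoolSize and then rejects the overflow.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QueueGrowthDemo {
    /** Submits `tasks` blocking tasks and returns {poolSize, queuedTasks, rejectedTasks}. */
    static int[] run(BlockingQueue<Runnable> queue, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        // corePoolSize 2, maximumPoolSize 10, default AbortPolicy
        ThreadPoolExecutor pool = new ThreadPoolExecutor(2, 10, 60L, TimeUnit.SECONDS, queue);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // keep every thread busy
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;
                }
            }
            return new int[] {pool.getPoolSize(), pool.getQueue().size(), rejected};
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(
                run(new java.util.concurrent.LinkedBlockingQueue<>(), 20)));  // [2, 18, 0]
        System.out.println(java.util.Arrays.toString(
                run(new java.util.concurrent.LinkedBlockingQueue<>(5), 20))); // [10, 5, 5]
    }
}
```

With the bounded queue, the first 2 tasks get core threads, the next 5 fill the queue, the next 8 get extra threads up to the maximum of 10, and the last 5 are rejected.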

Tomcat's thread pool queue is unbounded, yet Tomcat still grows the pool up to maximumPoolSize first and only then places requests in the waiting queue.

Tomcat's task queue, org.apache.tomcat.util.threads.TaskQueue, extends LinkedBlockingQueue and overrides the offer method:

    @Override
    public boolean offer(Runnable o) {
        //we can't do any checks
        if (parent==null) return super.offer(o);
        //we are maxed out on threads, simply queue the object
        if (parent.getPoolSize() == parent.getMaximumPoolSize()) return super.offer(o);
        //we have idle threads, just add it to the queue
        if (parent.getSubmittedCount()<(parent.getPoolSize())) return super.offer(o);
        //if we have less threads than maximum force creation of a new thread
        if (parent.getPoolSize()<parent.getMaximumPoolSize()) return false;
        //if we reached here, we need to add it to the queue
        return super.offer(o);
    }
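The core of this trick can be reproduced in a few lines. The sketch below is a simplified, hypothetical version, not Tomcat's code: EagerQueue.offer() reports the queue as full while the pool is below maximumPoolSize, so ThreadPoolExecutor adds a thread instead of queueing, and a rejection handler pushes the task into the queue if adding the thread fails after all (for example because another submitter reached the maximum first). Unlike Tomcat, it does not track the number of submitted tasks, so it may start a new thread even when an existing one is idle.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified sketch of the TaskQueue idea (not Tomcat's code)
class EagerQueue extends LinkedBlockingQueue<Runnable> {
    private volatile ThreadPoolExecutor parent;

    void setParent(ThreadPoolExecutor parent) {
        this.parent = parent;
    }

    @Override
    public boolean offer(Runnable r) {
        ThreadPoolExecutor p = parent;
        // No pool attached yet, or the pool is already at its maximum: queue normally
        if (p == null || p.getPoolSize() >= p.getMaximumPoolSize()) {
            return super.offer(r);
        }
        // Below the maximum: report "full" so ThreadPoolExecutor starts another thread
        return false;
    }

    // Bypasses the check above; used when starting a thread failed after all
    boolean force(Runnable r) {
        return super.offer(r);
    }
}

class EagerPoolDemo {
    static ThreadPoolExecutor newEagerPool(int core, int max) {
        EagerQueue queue = new EagerQueue();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS, queue,
                (r, executor) -> {
                    // Another submitter may have reached the maximum first: queue the task instead
                    if (executor.isShutdown() || !queue.force(r)) {
                        throw new RejectedExecutionException("Task " + r + " rejected");
                    }
                });
        queue.setParent(pool);
        return pool;
    }

    /** Submits `tasks` blocking tasks to a (core 1, max 3) pool; returns {poolSize, queuedTasks}. */
    static int[] run(int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = newEagerPool(1, 3);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep every thread busy
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] {pool.getPoolSize(), pool.getQueue().size()};
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

With five blocking tasks, run(5) returns [3, 2]: three threads and two queued tasks. A plain ThreadPoolExecutor with the same sizes and an unbounded LinkedBlockingQueue would stay at one thread with four tasks queued.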

queueCapacity = (corePoolSize / taskcost) * responsetime
With the values above, queueCapacity = (80 / 0.1) * 1 = 800. The 80 core threads finish 800 tasks per second, so a task at the back of a full queue waits about 1 s at most; if the load exceeds that, new threads have to be started.
Never leave the capacity at Integer.MAX_VALUE: the queue would grow enormous and the number of threads would stay at corePoolSize. When the load spikes, no new threads are started to absorb it, and response time rises sharply.
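The queue-capacity formula as a sketch, with times in milliseconds (an assumption to keep the arithmetic in integers). The second method checks the result from the other direction: the last task in a full queue waits roughly queuedTasks * taskcost / corePoolSize.

```java
class QueueCapacityCalc {
    // queueCapacity = (corePoolSize / taskcost) * responsetime, times in milliseconds
    static long queueCapacity(int corePoolSize, long taskCostMillis, long responseTimeMillis) {
        return corePoolSize * responseTimeMillis / taskCostMillis;
    }

    // Wait of the last task in a queue of queuedTasks: the core threads
    // finish corePoolSize tasks every taskcost
    static long worstCaseWaitMillis(long queuedTasks, int corePoolSize, long taskCostMillis) {
        return queuedTasks * taskCostMillis / corePoolSize;
    }

    public static void main(String[] args) {
        System.out.println(queueCapacity(80, 100, 1000));      // 800
        System.out.println(worstCaseWaitMillis(800, 80, 100)); // 1000 ms, i.e. responsetime
    }
}
```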

maxPoolSize: maximum number of threads

When the number of threads is >= corePoolSize and the task queue is full, the thread pool creates new threads to process tasks.

When the number of threads = maxPoolSize and the task queue is full, the thread pool rejects the task; with the default AbortPolicy it throws a RejectedExecutionException.
maxPoolSize = corePoolSize + (max(tasks) - corePoolSize/taskcost) / (1/taskcost): the core threads plus enough extra threads to absorb the part of the peak that the core threads cannot handle (each thread processes 1/taskcost tasks per second).
This gives maxPoolSize = 80 + (1000 - 800)/10 = 100, the same as max(tasks) * taskcost = 1000 * 0.1.
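Putting the three estimates together, here is a sketch of a pool built from the example figures: 800 typical and 1000 peak tasks per second, taskcost 0.1 s, responsetime 1 s, giving corePoolSize 80, queueCapacity (80 / 0.1) * 1 = 800 and maxPoolSize 100. The 60-second keepAliveTime, the ArrayBlockingQueue and the explicit AbortPolicy are choices made for this sketch, not values derived in the text.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolFromEstimates {
    static final int TYPICAL_TASKS_PER_SECOND = 800; // 80/20 rule
    static final int PEAK_TASKS_PER_SECOND = 1000;   // max(tasks)
    static final int TASK_COST_MILLIS = 100;         // taskcost = 0.1 s
    static final int RESPONSE_TIME_MILLIS = 1000;    // responsetime = 1 s

    static int corePoolSize() {  // 800 * 0.1 = 80
        return TYPICAL_TASKS_PER_SECOND * TASK_COST_MILLIS / 1000;
    }

    static int queueCapacity() { // (80 / 0.1) * 1 = 800
        return corePoolSize() * RESPONSE_TIME_MILLIS / TASK_COST_MILLIS;
    }

    static int maxPoolSize() {   // 1000 * 0.1 = 100, i.e. 80 + (1000 - 800) / 10
        return PEAK_TASKS_PER_SECOND * TASK_COST_MILLIS / 1000;
    }

    static ThreadPoolExecutor build() {
        return new ThreadPoolExecutor(corePoolSize(), maxPoolSize(),
                60L, TimeUnit.SECONDS,                     // keepAliveTime: arbitrary choice
                new ArrayBlockingQueue<>(queueCapacity()), // bounded, never Integer.MAX_VALUE
                new ThreadPoolExecutor.AbortPolicy());     // the default policy, made explicit
    }
}
```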
rejectedExecutionHandler: depends on how important the tasks are; unimportant tasks can be discarded, while important ones need some buffering mechanism.
keepAliveTime and allowCoreThreadTimeOut: the defaults are usually sufficient.

keepAliveTime: thread idle time

When a thread has been idle for keepAliveTime, it exits, until the number of threads has dropped back to corePoolSize (the default behavior).

allowCoreThreadTimeOut: whether core threads may also time out; false by default

If allowCoreThreadTimeOut is true, idle threads keep exiting until the number of threads reaches 0.
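A sketch that makes keepAliveTime and allowCoreThreadTimeOut visible. The pool (corePoolSize 1, maximumPoolSize 3, queue capacity 1, keepAliveTime 50 ms, all arbitrary) is first pushed to three threads and then left idle for a second. Note that allowCoreThreadTimeOut(true) throws IllegalArgumentException if keepAliveTime is 0. Because the sketch relies on sleeping, its result is timing-dependent rather than strictly guaranteed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class KeepAliveDemo {
    /** Grows the pool to 3 threads, lets it idle, and returns {peakPoolSize, poolSizeAfterIdle}. */
    static int[] run(boolean allowCoreTimeout) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 3, 50L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));
        pool.allowCoreThreadTimeOut(allowCoreTimeout); // needs keepAliveTime > 0
        CountDownLatch release = new CountDownLatch(1);
        try {
            // 1 core thread + 1 queued task + 2 extra threads
            for (int i = 0; i < 4; i++) {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            int peak = pool.getPoolSize();
            release.countDown();
            Thread.sleep(1000); // far longer than keepAliveTime
            return new int[] {peak, pool.getPoolSize()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

run(false) typically returns [3, 1] (the pool shrinks back to corePoolSize), and run(true) returns [3, 0].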

rejectedExecutionHandler: Task rejection handler

Choose it according to the situation: if the task is unimportant, it can be discarded; if it is important, some buffering mechanism should be used to process it.

There are two situations in which the task will be rejected:

1) When the number of threads has reached maxPoolSize and the queue is full, new tasks are rejected.

2) After shutdown() is called, the pool finishes the tasks already in the queue before it terminates, but any task submitted after the shutdown() call is rejected, even while the queued tasks are still running.

In both cases the pool calls the rejectedExecutionHandler. If none is set, the default AbortPolicy throws a RejectedExecutionException.
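Both rejection cases can be triggered deliberately. In this sketch the pools use the default AbortPolicy, so rejection shows up as a RejectedExecutionException; the class name and pool sizes are illustrative.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RejectionCases {
    /** Returns true if execute() throws RejectedExecutionException. */
    static boolean rejects(ThreadPoolExecutor pool, Runnable task) {
        try {
            pool.execute(task);
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    /** Case 1): the pool is at maxPoolSize and the queue is full. */
    static boolean saturatedPoolRejects() {
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1)); // default handler: AbortPolicy
        try {
            pool.execute(blocker); // occupies the only thread
            pool.execute(blocker); // fills the queue
            return rejects(pool, blocker);
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    /** Case 2): a task submitted after shutdown() is rejected. */
    static boolean afterShutdownRejects() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        pool.shutdown();
        return rejects(pool, () -> { });
    }
}
```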

The rejection policies provided by ThreadPoolExecutor:
ThreadPoolExecutor.AbortPolicy: discards the task and throws a RejectedExecutionException (a runtime exception)

public static class AbortPolicy implements RejectedExecutionHandler {
    public AbortPolicy() { }

    public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
        throw new RejectedExecutionException("Task " + r.toString() +
                                             " rejected from " +
                                             e.toString());
    }
}

ThreadPoolExecutor.CallerRunsPolicy: runs the rejected task directly in the thread that called execute(), unless the pool has been shut down, in which case the task is dropped. The task is not really discarded, but the performance of the submitting thread may drop sharply, because it is busy running the task itself.

public static class CallerRunsPolicy implements RejectedExecutionHandler {
    public CallerRunsPolicy() { }

    public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
        if (!e.isShutdown()) {
            r.run();
        }
    }
}

ThreadPoolExecutor.DiscardPolicy: silently discards the rejected task; nothing else happens

public static class DiscardPolicy implements RejectedExecutionHandler {
    public DiscardPolicy() { }

    public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
    }
}

ThreadPoolExecutor.DiscardOldestPolicy: removes the task that entered the queue first (the one that would run next) and then retries submitting the new task

public static class DiscardOldestPolicy implements RejectedExecutionHandler {
    public DiscardOldestPolicy() { }

    public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
        if (!e.isShutdown()) {
            e.getQueue().poll();
            e.execute(r);
        }
    }
}
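A sketch contrasting two of the built-in policies on a deliberately saturated single-thread pool (the sizes and class name are arbitrary). With CallerRunsPolicy the rejected task runs on the submitting thread; with DiscardOldestPolicy the task waiting at the head of the queue is dropped to make room for the new one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PolicyDemo {
    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** CallerRunsPolicy: returns the name of the thread that ran the rejected task. */
    static String callerRunsThreadName() {
        CountDownLatch release = new CountDownLatch(1);
        String[] ranOn = new String[1];
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            pool.execute(() -> await(release)); // the only worker is now busy
            // No idle worker and no queue space: rejected, so it runs right here
            pool.execute(() -> ranOn[0] = Thread.currentThread().getName());
            return ranOn[0];
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    /** DiscardOldestPolicy: returns the names of the tasks that actually ran. */
    static List<String> discardOldestRan() {
        CountDownLatch release = new CountDownLatch(1);
        List<String> ran = Collections.synchronizedList(new ArrayList<>());
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1), new ThreadPoolExecutor.DiscardOldestPolicy());
        pool.execute(() -> await(release)); // the only worker is now busy
        pool.execute(() -> ran.add("old"));  // waits in the queue
        pool.execute(() -> ran.add("new"));  // queue full: "old" is dropped, "new" is queued
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(ran);
    }
}
```

callerRunsThreadName() returns the name of the calling thread (for example "main"), and discardOldestRan() returns [new].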

Apart from CallerRunsPolicy, these strategies all drop a task (AbortPolicy also throws an exception). In some business scenarios, tasks must not be dropped so casually. CallerRunsPolicy avoids losing the task, but it has its own problem: the rejected task runs in the thread that submitted it, so even if pool threads become idle in the meantime, they will not pick it up, and the caller is blocked until the task finishes.
Implement the RejectedExecutionHandler interface to customize the handler

In the definition of the thread pool, all rejection policies implement a common interface:

public interface RejectedExecutionHandler {
    void rejectedExecution(Runnable r, ThreadPoolExecutor executor);
}

We can implement it to define strategies that fit our own business scenarios.
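As an example of a business-specific strategy, here is a hypothetical handler that logs the rejection and stashes the task in memory so it can be resubmitted later instead of being lost. The stash is unbounded and lives only in memory, so this is a sketch; a real system would bound it or persist it.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical handler: log the rejection and keep the task for a later retry
class StashingRejectionHandler implements RejectedExecutionHandler {
    private final Queue<Runnable> stash = new ConcurrentLinkedQueue<>();

    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        System.err.println("Task rejected (pool size " + executor.getPoolSize()
                + ", queued " + executor.getQueue().size() + "), stashing it for later");
        stash.offer(r);
    }

    /** Resubmits stashed tasks; tasks rejected again end up back in the stash. */
    void retryStashed(ThreadPoolExecutor executor) {
        for (int n = stash.size(); n > 0; n--) { // bounded loop: no endless re-rejection cycle
            Runnable r = stash.poll();
            if (r == null) {
                break;
            }
            executor.execute(r);
        }
    }

    int stashedCount() {
        return stash.size();
    }

    /** Demo: a busy single-thread pool without a queue stashes both extra tasks. */
    static int demoStashedAfterSaturation() {
        StashingRejectionHandler handler = new StashingRejectionHandler();
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), handler);
        try {
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool.execute(() -> { });
            pool.execute(() -> { });
            return handler.stashedCount();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```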


1. Thread pool rejection policy in Netty

    private static final class NewThreadRunsPolicy implements RejectedExecutionHandler {
        NewThreadRunsPolicy() {
            super();
        }

        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            try {
                final Thread t = new Thread(r, "Temporary task executor");
                t.start();
            } catch (Throwable e) {
                throw new RejectedExecutionException(
                        "Failed to start a new thread", e);
            }
        }
    }

Netty's approach does not discard tasks; like CallerRunsPolicy, it makes sure the task still runs. The difference is that Netty's custom rejection policy runs the rejected task on a newly created thread. Note, however, that it places no limit on thread creation: as long as resources allow, it keeps creating new threads.
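One way to keep Netty's idea while adding the missing constraint, sketched as a hypothetical variant (not Netty code): a Semaphore caps how many temporary threads may run at once, and the policy falls back to throwing RejectedExecutionException when the cap is reached.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.Semaphore;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical variant of NewThreadRunsPolicy with a cap on temporary threads
class BoundedNewThreadRunsPolicy implements RejectedExecutionHandler {
    private final Semaphore permits;

    BoundedNewThreadRunsPolicy(int maxTemporaryThreads) {
        this.permits = new Semaphore(maxTemporaryThreads);
    }

    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        if (!permits.tryAcquire()) {
            throw new RejectedExecutionException("Temporary thread limit reached");
        }
        try {
            Thread t = new Thread(() -> {
                try {
                    r.run();
                } finally {
                    permits.release(); // free the slot when the task ends
                }
            }, "Temporary task executor");
            t.start();
        } catch (Throwable e) {
            permits.release(); // the thread never ran, so give the slot back
            throw new RejectedExecutionException("Failed to start a new thread", e);
        }
    }

    /** Demo: one overflow task gets a temporary thread, the next one is rejected. */
    static boolean secondOverflowIsRejected() {
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), new BoundedNewThreadRunsPolicy(1));
        try {
            pool.execute(blocker); // pool thread busy
            pool.execute(blocker); // rejected by the pool, runs on the one temporary thread
            try {
                pool.execute(blocker); // no temporary slot left
                return false;
            } catch (RejectedExecutionException expected) {
                return true;
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```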

2. Thread pool rejection policy in Dubbo

public class AbortPolicyWithReport extends ThreadPoolExecutor.AbortPolicy {

    protected static final Logger logger = LoggerFactory.getLogger(AbortPolicyWithReport.class);

    private final String threadName;

    private final URL url;

    private static volatile long lastPrintTime = 0;

    private static Semaphore guard = new Semaphore(1);

    public AbortPolicyWithReport(String threadName, URL url) {
        this.threadName = threadName;
        this.url = url;
    }

    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
        String msg = String.format("Thread pool is EXHAUSTED!" +
                        " Thread Name: %s, Pool Size: %d (active: %d, core: %d, max: %d, largest: %d), Task: %d (completed: %d)," +
                        " Executor status:(isShutdown:%s, isTerminated:%s, isTerminating:%s), in %s://%s:%d!",
                threadName, e.getPoolSize(), e.getActiveCount(), e.getCorePoolSize(), e.getMaximumPoolSize(), e.getLargestPoolSize(),
                e.getTaskCount(), e.getCompletedTaskCount(), e.isShutdown(), e.isTerminated(), e.isTerminating(),
                url.getProtocol(), url.getIp(), url.getPort());
        logger.warn(msg);
        dumpJStack();
        throw new RejectedExecutionException(msg);
    }

    private void dumpJStack() {
        // implementation omitted
    }
}

Dubbo's custom rejection policy logs a warning with the pool's state, dumps the JVM's thread stacks (jstack), and then throws a RejectedExecutionException, just as the JDK's default AbortPolicy does.
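The lastPrintTime and guard fields in the Dubbo class suggest that the stack dump is throttled. The sketch below shows one way to do that with JDK APIs only (ThreadMXBean.dumpAllThreads). It illustrates the idea and is not Dubbo's implementation; the class name and interval are hypothetical, and a real system would write the dump to a file rather than return it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: at most one thread dump per interval, never two at once
class ThrottledThreadDump {
    private final long minIntervalMillis;
    private final Semaphore guard = new Semaphore(1);
    private volatile long lastDumpMillis = 0;

    ThrottledThreadDump(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns the dump text, or null if the call was throttled. */
    String dumpIfAllowed() {
        if (!guard.tryAcquire()) {
            return null; // another thread is dumping right now
        }
        try {
            long now = System.currentTimeMillis();
            if (now - lastDumpMillis < minIntervalMillis) {
                return null; // dumped too recently
            }
            StringBuilder sb = new StringBuilder();
            for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
                if (info != null) {
                    sb.append(info); // ThreadInfo.toString() includes an abbreviated stack trace
                }
            }
            lastDumpMillis = now;
            return sb.toString();
        } finally {
            guard.release();
        }
    }
}
```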

3. Custom rejection policy

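public class CustomRejectionHandler implements RejectedExecutionHandler {
    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        // Rejection logic goes here: log, stash the task, retry it, etc.
    }
}

/**
 * Custom rejection policy: block the submitter until the queue has space
 */
public class CustomRejectedExecutionHandler implements RejectedExecutionHandler {

    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        // After shutdown() nothing reliably drains the queue, so reject as usual
        if (executor.isShutdown()) {
            throw new RejectedExecutionException("Executor has been shut down");
        }
        try {
            // Key change: the blocking put() replaces the queue's non-blocking offer()
            executor.getQueue().put(r);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the task as rejected
            Thread.currentThread().interrupt();
            throw new RejectedExecutionException("Interrupted while waiting for queue space", e);
        }
    }
}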

If there are many tasks and every one of them must be processed successfully, submission should block instead: rewrite the rejection handler so that the submitter waits for queue space rather than having its task dropped. This guarantees that no task is abandoned.
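A self-contained sketch of the blocking approach, with a small driver showing that every task submitted to a deliberately tiny pool eventually runs. The isShutdown() check and the interrupt handling are safety additions. One caveat: put() bypasses the pool's own checks, so this relies on the pool keeping at least one live thread (core threads that do not time out).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handler: block the submitter until the queue has room
class BlockingSubmitPolicy implements RejectedExecutionHandler {
    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        if (executor.isShutdown()) {
            throw new RejectedExecutionException("Executor has been shut down");
        }
        try {
            executor.getQueue().put(r); // waits for space instead of dropping the task
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RejectedExecutionException("Interrupted while waiting for queue space", e);
        }
    }

    /** Submits `tasks` short tasks to a 2-thread pool with a 2-slot queue; returns how many ran. */
    static int runAll(int tasks) {
        AtomicInteger done = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2), new BlockingSubmitPolicy());
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(5); // simulate work so the queue fills up
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

runAll(50) returns 50: the submitting thread is simply held back whenever both workers are busy and the queue is full.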

ThreadPoolExecutor execution order

The thread pool executes tasks as follows:

(1) When the number of threads is less than the number of core threads, create a new thread.

(2) When the number of threads is greater than or equal to the number of core threads and the task queue is not full, put the task into the task queue.

(3) When the number of threads is greater than or equal to the number of core threads and the task queue is full:

  a) If the number of threads is less than the maximum number of threads, create a thread.

  b) If the number of threads equals the maximum number of threads, reject the task (the default policy throws an exception).
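The three steps can be written as a pure decision function. This sketch mirrors the list above rather than the JDK source, which additionally deals with races and the pool's run state.

```java
class ExecutionOrder {
    enum Action { CREATE_CORE_THREAD, ENQUEUE, CREATE_NON_CORE_THREAD, REJECT }

    static Action decide(int poolSize, int corePoolSize, int maxPoolSize, boolean queueFull) {
        if (poolSize < corePoolSize) {
            return Action.CREATE_CORE_THREAD;     // (1)
        }
        if (!queueFull) {
            return Action.ENQUEUE;                // (2)
        }
        if (poolSize < maxPoolSize) {
            return Action.CREATE_NON_CORE_THREAD; // (3a)
        }
        return Action.REJECT;                     // (3b)
    }
}
```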

Usage scenarios

For business scenarios that respond directly to user requests, what matters is the user experience: the faster the response, the better. If a page takes forever to load, the user may give up on the product. User-facing features often aggregate many calls with complex, multi-level dependencies, so developers commonly use a thread pool to run these calls as parallel tasks and reduce the overall response time. The most important goal here is the fastest possible response, so you should not set up a queue to buffer concurrent tasks; instead, raise corePoolSize and maxPoolSize so that as many threads as possible are created to execute the tasks immediately.
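A sketch of the response-time-first setup with hypothetical stand-in calls. A SynchronousQueue hands every task straight to a thread instead of buffering it, the thread limits are generous (2x and 8x the core count, arbitrary multipliers), and CompletableFuture combines three calls that run in parallel. CallerRunsPolicy is an added fallback so that a saturated pool slows the caller down rather than failing the request; that choice is an assumption, not part of the text.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class FanOutDemo {
    static ThreadPoolExecutor responsivePool() {
        int cpus = Runtime.getRuntime().availableProcessors();
        // SynchronousQueue buffers nothing: a task gets a thread immediately or is rejected
        return new ThreadPoolExecutor(cpus * 2, cpus * 8, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<>(), new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Hypothetical stand-in for a downstream call (RPC, database, ...)
    static String fetch(String what, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return what;
    }

    static String aggregate(ThreadPoolExecutor pool) {
        CompletableFuture<String> price = CompletableFuture.supplyAsync(() -> fetch("price", 100), pool);
        CompletableFuture<String> stock = CompletableFuture.supplyAsync(() -> fetch("stock", 100), pool);
        CompletableFuture<String> reviews = CompletableFuture.supplyAsync(() -> fetch("reviews", 100), pool);
        // The three calls overlap, so the total is close to the slowest call, not the sum
        return price.join() + "," + stock.join() + "," + reviews.join();
    }
}
```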

Quickly processing batch tasks is a different scenario. A large number of tasks must be executed and we also want them done as fast as possible, so multiple threads should again compute in parallel. Unlike the response-time-first scenario, however, the volume is huge and nothing needs to finish instantly; the question is how to use limited resources to process as many tasks as possible per unit of time, that is, throughput first. So you should set up a queue to buffer concurrent tasks and choose a suitable corePoolSize for the number of worker threads. Too many threads would cause frequent context switching, slow down task processing, and reduce throughput.
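For contrast, a sketch of the throughput-first setup: a fixed number of threads equal to the CPU count (reasonable for this CPU-bound toy task), a bounded queue that buffers the batch, and CallerRunsPolicy so the submitting thread slows down instead of failing when the queue is full. The queue size of 1,000 and the sum-of-squares workload are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BatchDemo {
    static long sumOfSquares(int n) {
        int threads = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1000), new ThreadPoolExecutor.CallerRunsPolicy());
        List<Future<Long>> results = new ArrayList<>();
        try {
            for (int i = 1; i <= n; i++) {
                final long x = i;
                results.add(pool.submit(() -> x * x)); // one small CPU-bound task per item
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, sumOfSquares(100) returns 338350.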

Example

import com.google.common.util.concurrent.ThreadFactoryBuilder;

import java.util.concurrent.*;
public class ThreadTask {
    public static void testHospTest() {
        // Thread factory whose threads are named shopTest-pool-%d
        ThreadFactory threadFactory = new ThreadFactoryBuilder().setNameFormat("shopTest-pool-%d").build();
        /** Thread pool parameters
         *  1. corePoolSize: number of core threads
         *  2. maximumPoolSize: maximum number of threads; must not be less than corePoolSize
         *  3. keepAliveTime: how long an idle thread is kept before it is released; applies only to non-core threads (unless allowCoreThreadTimeOut is true)
         *  4. TimeUnit: unit of keepAliveTime, e.g. TimeUnit.MILLISECONDS
         *  5. workQueue: the work queue, for example
         *     5.1 ArrayBlockingQueue: array-based bounded blocking queue; the capacity is mandatory; orders elements FIFO.
         *     5.2 LinkedBlockingQueue: linked-list-based blocking queue with an optional capacity; orders elements FIFO; throughput is usually higher than ArrayBlockingQueue.
         *     5.3 SynchronousQueue: a blocking queue that stores no elements; each insert must wait for a matching remove by another thread (a non-blocking offer() fails if no consumer is waiting); throughput is usually higher than LinkedBlockingQueue.
         *     5.4 PriorityBlockingQueue: an unbounded blocking queue ordered by priority.
         *  6. threadFactory: factory used to create threads
         *  7. handler: policy applied when both the threads and the queue are at capacity
         *     7.1 ThreadPoolExecutor.AbortPolicy: discards the task and throws RejectedExecutionException.
         *     7.2 ThreadPoolExecutor.DiscardPolicy: also discards the task, but without throwing.
         *     7.3 ThreadPoolExecutor.DiscardOldestPolicy: discards the oldest task in the queue, then retries executing the new task (repeating if needed).
         *     7.4 ThreadPoolExecutor.CallerRunsPolicy: the calling thread runs the task itself.
         *  8. executorService.execute: runs a task that implements Runnable
         *  9. executorService.shutdown(): shuts down the pool
         */
        // Build the pool
        ExecutorService executorService = new ThreadPoolExecutor(3,
                3, 0L,
                TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(3), threadFactory, new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            int a = 20;
            // Submit the task
            executorService.execute(new Seller(a));
        } catch (Exception e) {
            System.err.println("Task submission failed: " + e.getMessage());
        } finally {
            // Shut down the pool
            executorService.shutdown();
        }
    }
}
public class Seller implements Runnable {

    private int ticket;

    public Seller(int ticket) {
        this.ticket = ticket;
    }

    @Override
    public void run() {
        if (ticket > 0) {
            while (ticket > 0) {
                ticket--;
                System.out.println("You have freeloaded " + ticket + " times");
            }
        } else {
            System.err.println("Invalid input parameter");
        }
    }
}

Practical application

Requirement

Within the first three days of each month we need to push about 30 million medical records, but the interface provided by the third-party regulator accepts only 3,000 records per push. At about 3 seconds per 3,000-record push, 30 million records take 10,000 pushes, roughly 8.3 hours of push calls alone; the whole job was estimated at about 25 hours. In addition, after every 10,000 records pushed, the data must be verified and the failed records handled.

We therefore introduce multiple threads to push concurrently, shortening the push time and improving the timeliness of the pushed data.
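A back-of-the-envelope sketch of what concurrency could buy, using the figures above (30 million records, 3,000 per call, about 3 s per call). It counts only the push calls and assumes the third-party interface serves concurrent callers without slowing down, which has to be confirmed with the provider since rate limits are common.

```java
class PushEstimate {
    /** Hours spent in push calls, assuming calls are spread evenly over `threads` callers. */
    static double hours(long records, int batchSize, double secondsPerCall, int threads) {
        long calls = (records + batchSize - 1) / batchSize; // 30,000,000 / 3,000 = 10,000 calls
        long callsPerThread = (calls + threads - 1) / threads;
        return callsPerThread * secondsPerCall / 3600.0;
    }

    public static void main(String[] args) {
        System.out.printf("1 thread:  %.1f h%n", hours(30_000_000L, 3_000, 3.0, 1)); // 8.3 h
        System.out.printf("5 threads: %.1f h%n", hours(30_000_000L, 3_000, 3.0, 5)); // 1.7 h
    }
}
```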

Preventing duplicate pushes

The data we push to the third party must never be pushed twice, so there must be a mechanism that isolates the data each thread pushes.
We use database paging: each thread pushes the records in its own [start, start + limit) range, so the start offsets assigned to the threads must be consistent and must not overlap.
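Offset paging (LIMIT start, limit) over rows filtered by a status that the pushing threads themselves update can shift while the job runs, so a page may skip or repeat rows. A common alternative, sketched here as an assumption rather than what the code below does, is to split the primary-key range into disjoint intervals and let each task query only its own interval (for example WHERE id BETWEEN ? AND ? AND state = 0, a hypothetical query).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: disjoint primary-key ranges, one per task
class IdRangePartitioner {
    static final class Range {
        final long fromInclusive;
        final long toInclusive;

        Range(long fromInclusive, long toInclusive) {
            this.fromInclusive = fromInclusive;
            this.toInclusive = toInclusive;
        }

        @Override
        public String toString() {
            return "[" + fromInclusive + ", " + toInclusive + "]";
        }
    }

    /** Splits [minId, maxId] into consecutive ranges of at most rangeSize ids. */
    static List<Range> split(long minId, long maxId, long rangeSize) {
        List<Range> ranges = new ArrayList<>();
        for (long from = minId; from <= maxId; from += rangeSize) {
            ranges.add(new Range(from, Math.min(from + rangeSize - 1, maxId)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(split(1, 10, 4)); // [[1, 4], [5, 8], [9, 10]]
    }
}
```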

Failure handling

We also have to account for the failure of a thread to push data.

If only our own system were involved, we could extract the method the threads call and wrap it in a transaction, so that an exception in any thread rolls everything back.

However, we are integrating with a third party, so transactions are not possible. Instead, we record the failure status directly in the database and handle the failed records separately later.

Thread pool selection

In practice, threads should always be managed through a thread pool, and the usual choice is ThreadPoolExecutor. Spring Boot also offers asynchronous methods backed by a thread pool; they may be more convenient, but ThreadPoolExecutor gives more direct control over the pool, so we create the pool with the ThreadPoolExecutor constructor.
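With the constructor, every parameter is explicit. Below is a sketch of a pool for the push job with a small ThreadFactory that names threads (so they are recognizable in logs and thread dumps) without depending on Guava. The names and the 60 s keepAliveTime are illustrative; the sizes follow the core code's THREAD_NUM and THREAD_NUM * 2 with a queue of 100.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Names threads "<prefix>-0", "<prefix>-1", ... so they are easy to spot in logs and thread dumps
class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger seq = new AtomicInteger();

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        return new Thread(r, prefix + "-" + seq.getAndIncrement());
    }

    /** Pool for the push job, with every parameter spelled out. */
    static ThreadPoolExecutor pushPool(int threads) {
        return new ThreadPoolExecutor(
                threads,                               // corePoolSize
                threads * 2,                           // maximumPoolSize
                60L, TimeUnit.SECONDS,                 // keepAliveTime for non-core threads
                new LinkedBlockingQueue<>(100),        // bounded work queue
                new NamedThreadFactory("push"),        // thread names
                new ThreadPoolExecutor.AbortPolicy()); // rejection policy
    }
}
```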

Core code


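import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.util.CollectionUtils;

@Service
public class PushProcessServiceImpl implements PushProcessService {
    @Autowired
    private PushUtil pushUtil;
    @Autowired
    private PushProcessMapper pushProcessMapper;

    private final static Logger logger = LoggerFactory.getLogger(PushProcessServiceImpl.class);

    // Number of records each thread handles per round
    private static final Integer LIMIT = 300000;
    // Number of threads started per round
    private static final Integer THREAD_NUM = 5;
    // Create the thread pool
    ThreadPoolExecutor pool = new ThreadPoolExecutor(THREAD_NUM, THREAD_NUM * 2, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<>(100));

    @Override
    public void pushData() throws ExecutionException, InterruptedException {
        // Total number of records not yet pushed
        Integer total = pushProcessMapper.countPushRecordsByState(0);
        logger.info("Records not yet pushed: {}", total);
        // Number of rounds needed (rounded up)
        int num = (total + LIMIT * THREAD_NUM - 1) / (LIMIT * THREAD_NUM);
        logger.info("Rounds to run: {}", num);
        // Total number of records pushed successfully
        int totalSuccessCount = 0;
        for (int i = 0; i < num; i++) {
            // Every processed record leaves state 0 (1 = success, 2 = failure), so each round pages
            // from offset 0 again. All pages of a round are read before its tasks start updating
            // rows, so the offsets cannot shift mid-round and the pages cannot overlap.
            // (Assumes the mapper query orders by a unique key, e.g. id.)
            List<List<PushProcess>> pages = new ArrayList<>(THREAD_NUM);
            for (int j = 0; j < THREAD_NUM; j++) {
                pages.add(pushProcessMapper.findPushRecordsByStateLimit(0, j * LIMIT, LIMIT));
            }
            // Results returned by this round's tasks
            List<Future<Integer>> futureList = new ArrayList<>(THREAD_NUM);
            for (int j = 0; j < THREAD_NUM; j++) {
                // Submit the task, identified by the start offset of its page;
                // don't call get() yet, so submission is not blocked
                futureList.add(pool.submit(new PushDataTask(pages.get(j), j * LIMIT)));
            }
            // Add up the records pushed successfully in this round
            for (Future<Integer> f : futureList) {
                totalSuccessCount += f.get();
            }
        }
        // Update the push flag
        pushProcessMapper.updateAllState(1);
        logger.info("Push finished, records to push: {}, pushed successfully: {}", total, totalSuccessCount);
    }

    /**
     * Task that pushes one page of records
     */
    class PushDataTask implements Callable<Integer> {
        private final List<PushProcess> pushProcessList;
        private final int threadNo;   // task identifier (start offset of its page)

        PushDataTask(List<PushProcess> pushProcessList, int threadNo) {
            this.pushProcessList = pushProcessList;
            this.threadNo = threadNo;
        }

        @Override
        public Integer call() throws Exception {
            int count = 0;
            if (CollectionUtils.isEmpty(pushProcessList)) {
                return count;
            }
            logger.info("Thread {} starts pushing data", threadNo);
            for (PushProcess process : pushProcessList) {
                boolean isSuccess = pushUtil.sendRecord(process);
                if (isSuccess) {
                    // Push succeeded: mark the record as pushed
                    pushProcessMapper.updateFlagById(process.getId(), 1);
                    count++;
                } else {
                    // Push failed: record the failure for later handling
                    pushProcessMapper.updateFlagById(process.getId(), 2);
                }
            }
            logger.info("Thread {} pushed {} records successfully", threadNo, count);
            return count;
        }
    }
}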


Source: blog.csdn.net/liuerchong/article/details/123866102