Calculating thread pool size under high concurrency

Using a thread pool under high concurrency effectively reduces the time and resource overhead of creating and releasing threads. Without a thread pool, the system may create a large number of threads, consuming memory and causing excessive context switching. (The JVM schedules threads by round-robin time slicing, and a pool limits how often threads must switch between each other.)
So under high concurrency, how do we choose the optimal number of threads, and on what principle? A technical director at Qunar.com once asked me this question, so here is a summary.
The first one:
For a CPU-intensive application, set the thread pool size to N+1. (For compute-intensive tasks on a system with N processors, a pool of N+1 threads usually yields optimal efficiency: even when a compute-intensive thread is occasionally suspended due to a page fault or some other reason, the extra thread ensures that CPU cycles are not wasted. Adapted from "Java Concurrency In Practice".)

For an IO-intensive application, set the thread pool size to 2N+1.

Tasks can generally be divided into CPU-intensive, IO-intensive, and mixed, and each type calls for a different pool size. For CPU-intensive tasks, use a smaller pool, generally the number of CPU cores + 1: such tasks keep CPU usage very high, so opening more threads only increases context switches and adds overhead. IO-intensive tasks can use a somewhat larger pool, generally 2 * the number of CPU cores: their CPU usage is low, so while some threads wait on IO the CPU can process other tasks, making full use of CPU time. Mixed workloads can be split into IO-intensive and CPU-intensive parts, each handled by its own pool. As long as the two parts take roughly the same time after the split, this beats serial execution. If their execution times differ greatly, however, the part that finishes first ends up waiting for the slower one, the total time is still determined by the slower part, and the cost of splitting and merging tasks makes the exercise not worth it.
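A minimal sketch of the sizing rules above in Java. The constants N+1 and 2N are the rules of thumb from the text, not universal values, and the pool names are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();

        // CPU-intensive: roughly one thread per core, plus one spare
        // to absorb an occasional page fault or suspension.
        int cpuBoundSize = n + 1;

        // IO-intensive: threads spend most of their time waiting,
        // so the pool can safely oversubscribe the cores.
        int ioBoundSize = 2 * n;

        ExecutorService cpuPool = Executors.newFixedThreadPool(cpuBoundSize);
        ExecutorService ioPool  = Executors.newFixedThreadPool(ioBoundSize);

        System.out.println("cores=" + n
                + " cpuPool=" + cpuBoundSize
                + " ioPool=" + ioBoundSize);

        cpuPool.shutdown();
        ioPool.shutdown();
    }
}
```

For a mixed workload, the same idea applies: submit the IO-bound tasks to `ioPool` and the compute-bound tasks to `cpuPool` rather than mixing them in one queue.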

The second approach starts from a test question I once encountered. Suppose a system must sustain a TPS (Transactions Per Second) of at least 20, each transaction is handled by a single thread, and a thread takes 4 s on average to process one transaction. The problem then becomes:

How to design the thread pool size so that 20 Transactions can be processed within 1s?
The calculation is simple: each thread's capacity is 1/4 = 0.25 TPS, so reaching 20 TPS obviously requires 20 / 0.25 = 80 threads.
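The worked example can be reproduced in a few lines. The method name and parameters are illustrative; the arithmetic is exactly the 20 / 0.25 = 80 from the text:

```java
public class TpsSizing {
    // Threads needed so that the pool's aggregate capacity meets the target TPS.
    static int threadsFor(double targetTps, double secondsPerTask) {
        double perThreadTps = 1.0 / secondsPerTask;       // 0.25 TPS for a 4 s task
        return (int) Math.ceil(targetTps / perThreadTps); // round up: capacity must cover the target
    }

    public static void main(String[] args) {
        System.out.println(threadsFor(20, 4)); // prints 80
    }
}
```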
This holds in theory, but in practice the CPU is the fastest component of a system, so it is the CPU that sets the upper bound on system throughput; stronger CPU processing capacity raises that bound. CPU throughput therefore has to enter the calculation. An IO-optimization document gives this formula:
optimal number of threads = ((thread waiting time + thread CPU time) / thread CPU time) * number of CPUs
That is, the higher the proportion of time a thread spends waiting, the more threads are needed; the higher the proportion of thread CPU time, the fewer threads are needed.
However, by the weakest-link ("shortest plank") effect, real system throughput cannot be computed from the CPU alone. To improve throughput, we must start from the system's bottleneck (such as network latency or IO):

Raise the parallelism of the bottleneck operation as much as possible, e.g. multi-threaded downloading
Strengthen the bottleneck itself, e.g. replace blocking IO with NIO


The first point relates to Amdahl's law, which defines the speedup obtained by parallelizing a serial system:

Speedup = system time before optimization / system time after optimization

The larger the speedup, the better the parallelization. Amdahl's law also relates the serial fraction, the number of CPUs, and the speedup. With Speedup the speedup, F the serialization ratio (the proportion of code executed serially), and N the number of CPUs:

Speedup <= 1 / (F + (1-F)/N)

When N is large enough, the smaller the serialization ratio F, the larger the speedup Speedup.
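The bound can be evaluated numerically to see both effects at once. The serial fraction 0.1 below is an illustrative assumption:

```java
public class Amdahl {
    // Amdahl's law as stated above: Speedup <= 1 / (F + (1 - F) / N).
    static double speedupBound(double serialFraction, int cpus) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cpus);
    }

    public static void main(String[] args) {
        // 10% serial code on 4 CPUs: 1 / (0.1 + 0.9/4) ≈ 3.08x
        System.out.println(speedupBound(0.1, 4));
        // Even with 1000 CPUs the speedup only approaches 1/F = 10x,
        // which is why shrinking F matters more than adding cores.
        System.out.println(speedupBound(0.1, 1000));
    }
}
```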

This raises a question: is multithreading necessarily more efficient than a single thread?

The answer is no. Redis, for example, is single-threaded yet very efficient: basic operations reach on the order of 100,000 per second. From a threading perspective, this is partly because:

Multithreading incurs thread context-switch overhead, which a single thread avoids
Locks (and the contention they bring) are unnecessary in a single thread

Of course, the more fundamental reason Redis is fast is this:
Redis operates almost entirely in memory, and in that case a single thread can drive the CPU very efficiently. Multithreading generally pays off when a considerable share of the work consists of IO and network operations.

In general, different applications call for different multi-thread or single-thread strategies; and when sizing a thread pool, the various estimation methods share the same goal and the same starting point.

References:
http://ifeve.com/how-to-calculate-threadpool-size/
https://www.zhihu.com/question/38128980
Learn from each other and make progress together!


Origin blog.csdn.net/qq_34417408/article/details/78895573