How to set a reasonable number of worker threads

1. Where the question comes from

A Web server usually has a setting for the maximum number of worker threads, and a back-end service usually has a setting for the size of its worker thread pool. Different architects have different rules of thumb for this number: some services set it to 2x the number of CPU cores, some to 8x, and some to 32x.
What should the "number of worker threads" be based on, and what value maximizes CPU utilization? These are the questions this article discusses.

2. Some common ground

Before going further, let's establish some common ground through a few questions.
Question: The more worker threads, the better?
Answer: Definitely not.
1) The number of CPU cores on a server is limited, so the number of truly concurrent threads is limited; setting 10,000 worker threads on a 1-core CPU makes no sense.
2) Thread switching has a cost; if context switches are too frequent, performance degrades.

Question: When a thread calls the sleep() function, does it keep occupying the CPU?
Answer: No. While sleeping, it yields the CPU to other threads that need it. Besides sleep(), blocking calls such as a blocking accept() in network programming [waiting for a client connection] or a blocking recv() [waiting for a downstream response] also do not occupy the CPU while waiting.

Question: On a single-core CPU, does it make sense to use multiple threads? Can they improve concurrency?
Answer: Even on a single core, multithreading makes sense.
1) Multithreading can make the service/code clearer: some IO threads send and receive packets, some worker threads process tasks, and some timeout threads do timeout detection.
2) If a task keeps occupying the CPU for computation, adding threads will not increase concurrency. For example, code like this:
while (1) { i++; }
keeps the CPU busy with computation and drives CPU utilization to 100%.
3) Generally, worker threads do not occupy the CPU all the time. In that case, even on a single core, adding worker threads improves concurrency, because while one thread is waiting, other threads can keep working.
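Point 3 can be demonstrated with a minimal sketch (function names and the 0.2s sleep are illustrative): two tasks that block on simulated IO finish in roughly the time of one when run in separate threads, even on a single core, because a sleeping thread yields the CPU.

```python
import threading
import time

def io_bound_task(results, idx):
    # Simulate a blocking call (e.g. a blocking recv()) with sleep();
    # the thread yields the CPU while it waits.
    time.sleep(0.2)
    results[idx] = "done"

results = [None, None]
start = time.perf_counter()
threads = [threading.Thread(target=io_bound_task, args=(results, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The two 0.2s waits overlap, so total time is close to 0.2s, not 0.4s.
print(results, round(elapsed, 1))
```

If the tasks were pure computation instead of waiting, the two threads would simply share the single core and the total time would not shrink.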

3. Common service threading models

Understanding common service threading models helps in understanding the principles behind service concurrency. Generally speaking, Internet services commonly use the following two threading models.

IO threads and worker threads decoupled through a queue

As shown in the figure above, most Web servers and service frameworks use this "IO threads and worker threads decoupled through a queue" threading model:
1) A few IO threads listen for requests sent from upstream and send/receive packets (producers);
2) One or more task queues serve as the data channel for asynchronous decoupling between IO threads and worker threads (the critical resource);
3) Multiple worker threads actually execute the tasks (consumers).
This threading model is widely used and fits most scenarios. Its defining feature is that worker threads execute tasks synchronously and block while waiting (recall how Java code runs in Tomcat threads, or how tasks run in Dubbo worker threads), so concurrency can be increased by increasing the number of worker threads. The focus of today's discussion is: in this model, what number of worker threads maximizes concurrency?
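The model above can be sketched in a few lines of Python (thread counts, the sentinel convention, and the `* 10` stand-in for business logic are all illustrative): IO threads produce into a shared queue, and worker threads block on the queue and execute tasks synchronously.

```python
import queue
import threading

task_queue = queue.Queue()      # critical resource: the task queue
results = []
results_lock = threading.Lock()
NUM_WORKERS = 2

def io_thread():
    # Producer: pretend to receive 5 requests from upstream and enqueue them.
    for req_id in range(5):
        task_queue.put(req_id)
    for _ in range(NUM_WORKERS):
        task_queue.put(None)    # one shutdown sentinel per worker

def worker_thread():
    # Consumer: blocks on the queue, then executes the task synchronously.
    while True:
        task = task_queue.get()
        if task is None:
            break
        with results_lock:
            results.append(task * 10)   # stand-in for real business logic

producer = threading.Thread(target=io_thread)
workers = [threading.Thread(target=worker_thread) for _ in range(NUM_WORKERS)]
producer.start()
for w in workers:
    w.start()
producer.join()
for w in workers:
    w.join()
print(sorted(results))  # → [0, 10, 20, 30, 40]
```

Because each worker blocks while a task waits on downstream calls, throughput in this model scales with the number of workers, which is exactly why sizing the pool matters.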

The pure asynchronous threading model
Nothing blocks anywhere. This threading model needs only a small number of threads to achieve high throughput. Lighttpd has a single-process, single-threaded mode with strong concurrent processing capability precisely because it uses this model. The drawbacks of this model are:
1) In single-threaded mode it is hard to take advantage of multiple CPU cores.
2) Programmers are more used to writing synchronous code; callbacks hurt code readability and demand more of the programmer.
3) The framework is more complex, often requiring server-side transceiver components, server-side queues, client-side transceiver components, client-side queues, a context manager, a finite state machine, and a timeout manager.
However, this model is not the focus of today's discussion.
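For contrast, here is a minimal single-threaded sketch of the pure asynchronous model using Python's asyncio (the handler name and the 0.2s sleep standing in for a non-blocking downstream call are assumptions, not Lighttpd's actual mechanism): two requests are handled concurrently on one thread because nothing blocks.

```python
import asyncio
import time

async def handle_request(req_id):
    # Simulated non-blocking downstream call; the event loop is free
    # to run other coroutines while this one waits.
    await asyncio.sleep(0.2)
    return req_id * 10

async def main():
    # Two "requests" handled concurrently on a single thread.
    return await asyncio.gather(handle_request(1), handle_request(2))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, round(elapsed, 1))  # the two 0.2s waits overlap
```

Note how concurrency here comes from the event loop rather than from thread count, which is why this model needs so few threads, and why the thread-sizing formula below does not apply to it.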

4. How worker threads work

Understanding how worker threads work is very helpful for quantitatively analyzing how many threads to configure:

The figure above shows a typical worker-thread processing flow. From start to finish, a task goes through 7 steps:
1) Take a task from the work queue and do some local preprocessing, such as HTTP protocol parsing, parameter parsing, and parameter validation;
2) Access the cache to fetch some data;
3) After getting the cached data, do some local computation related to the business logic;
4) Call a downstream service via RPC to fetch some data, or ask the downstream service to handle some related task;
5) After the RPC call returns, do some more local, business-related computation;
6) Access the DB for some data operations;
7) After the database operation, do some finishing work; this finishing work is also local, business-related computation.

Analyzing the time axis of the whole process, we find that:
1) In steps 1, 3, 5, and 7 [the pink segments of the time axis in the figure above], the thread occupies the CPU for local business-logic computation;
2) In steps 2, 4, and 6 [the orange segments in the figure above], the thread is waiting for results while accessing the cache, a downstream service, or the DB, and does not occupy the CPU. Breaking it down further, this "waiting for results" time has three parts:
2.1) The request travels over the network to the downstream cache, service, or DB;
2.2) The downstream cache, service, or DB processes the task;
2.3) The cache, service, or DB sends the response back over the network to the worker thread.
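The compute/wait split can be estimated in code rather than by eye. A minimal sketch (the 0.05s busy loop and sleep are stand-ins for real business logic and downstream calls): `time.process_time()` counts only CPU time, while `time.perf_counter()` counts wall-clock time, so their difference approximates the waiting time.

```python
import time

def handle_one_task():
    # Steps 1/3/5/7: local computation (occupies the CPU).
    deadline = time.process_time() + 0.05
    while time.process_time() < deadline:
        pass
    # Steps 2/4/6: waiting on cache / RPC / DB (does not occupy the CPU).
    time.sleep(0.05)

wall_start = time.perf_counter()
cpu_start = time.process_time()
handle_one_task()
cpu_time = time.process_time() - cpu_start   # x: compute time
wall_time = time.perf_counter() - wall_start
wait_time = wall_time - cpu_time             # y: wait time
print(round(cpu_time, 2), round(wait_time, 2))
```

In a real service the same measurement would be taken by logging timestamps around each step, as the next section suggests; the x and y obtained this way feed directly into the sizing formula.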

5. Quantitative analysis: setting a reasonable number of worker threads

Finally, let's answer the question of what number of worker threads is reasonable.
From the analysis above, during a worker thread's execution, part of the time is computation that occupies the CPU, and part is waiting that does not. Through quantitative analysis (for example, logging and statistics), we can measure the ratio of these two parts over the whole execution. For example:
1) Time segments 1, 3, 5, and 7 [pink in the figure above]: computation takes 100ms;
2) Time segments 2, 4, and 6 [orange in the figure above]: waiting also takes 100ms.
The result: this thread's compute-to-wait ratio is 1:1, i.e. 50% of the time is computing (occupying the CPU) and 50% is waiting (not occupying the CPU):
1) On a single core, setting 2 worker threads fully utilizes the CPU and drives it to 100%;
2) On N cores, setting 2N worker threads fully utilizes the CPUs and drives them to N*100%.

Conclusion:
For an N-core server, profile the business with a single thread; if local compute time is x and wait time is y, then setting the number of worker threads (thread pool size) to N*(x+y)/x maximizes CPU utilization.
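The formula is easy to turn into a helper (the function name and the rounding-up choice are mine; the article's formula itself does not specify rounding):

```python
import math

def worker_threads(cores, compute_ms, wait_ms):
    """Thread-pool size that saturates the CPU: N * (x + y) / x."""
    return math.ceil(cores * (compute_ms + wait_ms) / compute_ms)

# The article's example: compute 100ms, wait 100ms.
print(worker_threads(1, 100, 100))   # single core → 2 threads
print(worker_threads(8, 100, 100))   # 8 cores → 16 threads

# A hypothetical IO-heavy service: 5ms compute, 95ms wait, 8 cores.
print(worker_threads(8, 5, 95))      # → 160 threads
```

The third call shows why IO-heavy services can reasonably run hundreds of worker threads, as the rule of thumb below notes.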

Rule of thumb:
Generally, for non-CPU-intensive services (encryption/decryption, compression/decompression, search, and sorting are CPU-intensive), the bottleneck is the back-end database and local CPU compute time is very small, so setting dozens or even hundreds of worker threads is also reasonable.

6. Conclusion

For an N-core server, profile the business with a single thread; if local compute time is x and wait time is y, then setting the number of worker threads (thread pool size) to N*(x+y)/x maximizes CPU utilization.


Origin blog.csdn.net/datuanyuan/article/details/109049350