Applicable scenarios for multi-threaded servers (muduo library study notes)

"Server development" covers a lot of ground. For the meaning of "server development" in this article, see the earlier article "Common Models". In one sentence: long-running Linux user-space network applications without user interfaces, running on multi-core machines. "Long-running" does not mean the program never restarts (7x24); it means the program does not exit when there is nothing to do, but waits for the next request. For example, wget is not long-running, while httpd is.

Terminology

As in the previous article, the "process" in this article refers to the product of the fork() system call, and "thread" refers to the product of pthread_create(); the pthreads I mean is NPTL. Each thread is created by clone() and corresponds to a kernel task_struct. The development language in this article is C++, and the operating environment is Linux.

First of all, a distributed system composed of multiple machines is necessarily multi-process (in the literal sense), because processes cannot cross OS boundaries. Under this premise, let us focus on a single machine, an ordinary server with at least 4 cores. To provide a service or perform a task on a multi-core machine, the available modes are:

  1. Run a single threaded process
  2. Run a multi-threaded process
  3. Run multiple single-threaded processes
  4. Run multiple multi-threaded processes

The comparison between these modes is already a cliché; to summarize briefly:

  • Mode 1 is not scalable and cannot use the computing power of a multi-core machine;
  • Mode 3 is currently the recognized mainstream mode. It has two sub-modes:
    • 3a Simply run multiple copies of the mode-1 process, which works if the service can be provided on multiple TCP ports;
    • 3b A master process plus worker processes, which is necessary if the service must bind to a single TCP port, as in httpd+fastcgi.
  • Mode 2 is despised by many, who think multi-threaded programs are hard to write and offer no advantage over Mode 3;
  • Mode 4 is despised even more: not only does it fail to combine the advantages of 2 and 3, it combines the disadvantages of both.

This article mainly wants to discuss the pros and cons of Mode 2 and Mode 3b, namely: when a server program should be multi-threaded.

Functionally speaking, there is nothing multi-threading can do that single-threading cannot, and vice versa: both are state machines (I would be glad to see a counterexample). In terms of performance, whether the service is IO-bound or CPU-bound, multi-threading has no absolute advantage. So why use multi-threading?

Before answering this question, let me first cover the situations where a single thread must be used.

Where a single thread must be used


As far as I know, there are two situations where a single thread must be used:

  1. The program may fork()
  2. Limit the CPU usage of the program

Let me talk about fork() first. As I mentioned in "The Enlightenment of New Linux System Calls":

fork() generally cannot be called in a multi-threaded program, because Linux's fork() clones only the calling thread, not the other threads. That is, fork() cannot instantly produce a child process with the same set of threads as the parent, and Linux has no forkall() system call. forkall() would actually be difficult to implement (semantically), because the other threads may be waiting on a condition variable, blocked in a system call, waiting on a mutex to enter a critical section, or in the middle of intensive computation; none of these states can be sensibly carried over into the child process.

To make matters worse, if some other thread a holds a mutex at the moment of fork(), then since there is no "thread a" in the new process, that mutex will never be released. The new process can never acquire that mutex, or it will deadlock. (This point is partly speculation; I have not run the experiment, and I cannot rule out that fork() releases all mutexes.)

In summary, a program designed to call fork() must be single-threaded, such as the "watchdog process" I mentioned in the "Enlightenment" article. Multi-threaded programs are not forbidden to call fork(), but doing so runs into a lot of trouble, and I cannot think of a reason to do it.

There are generally two behaviors after a program fork():

  1. Execute exec() immediately, transforming into another program. Examples: the shell and inetd; lighttpd fork()ing child processes that then run fastcgi programs; or the daemon on each compute node of a cluster that is responsible for starting jobs (what I call the "watchdog process").
  2. Do not call exec() and continue running the current program. The child either communicates with the parent through shared file descriptors to complete the task cooperatively, or takes the file descriptor passed by the parent and completes the work independently, like the web server NCSA httpd of the 1980s.
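Behavior 1 can be sketched in a few lines. This is only an illustration, assuming Linux: the helper names spawn_job/wait_job and the use of /bin/true as a stand-in for the real job binary are mine, not from any real watchdog.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch of behavior 1: fork() then exec() immediately, as a single-threaded
// "watchdog process" would when launching a job. /bin/true stands in for the
// real job binary.
pid_t spawn_job()
{
    pid_t pid = ::fork();
    if (pid == 0)
    {
        // Child: become another program right away. Since the parent is
        // single-threaded, none of the fork()-with-threads pitfalls apply.
        ::execl("/bin/true", "true", static_cast<char*>(nullptr));
        _exit(127);  // reached only if exec() failed
    }
    return pid;  // parent: remember the child's pid so it can be reaped later
}

// Reap the child and return its exit status (-1 on error or abnormal exit).
int wait_job(pid_t pid)
{
    int status = 0;
    if (::waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent keeps the child's pid so that it can later wait on it, which is exactly what a watchdog needs to detect job termination.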

Among these behaviors, I think only the "watchdog process" must be single-threaded; the others can (functionally) be replaced with multi-threaded programs.

Secondly, a single-threaded program can limit its own CPU usage.

This is easy to understand. For example, on an 8-core host, even if a single-threaded program busy-waits (whether due to a bug or to overload), its CPU usage is only 12.5%, i.e. it occupies one core. In this worst case, the system still has 87.5% of its computing resources available to other service processes.

Therefore, for an auxiliary program that must run on the same machine as the main service process (for example, to monitor the status of other service processes), making it single-threaded prevents it from grabbing too much of the system's computing resources.

Process-based distributed system design

The article "Common Models" mentioned that the software design and functional division of a distributed system should generally be based on "processes". In advocating multi-threading, I do not mean implementing the entire system in one process; rather, after the functions have been divided, each type of service process may use multiple threads to improve performance where necessary. The goal is that the distributed system as a whole can scale out, that is, enjoy the benefits of adding machines.

For upper-level applications, the amount of code in each process should be kept under 100,000 lines of C++, not counting ready-made libraries. This way each process can be fully understood by one brain, with no muddled corners. (Actually, I would prefer to say 50,000 lines.)

Here is a good article from Google, "Introduction to Distributed System Design". Its finishing touch: distributed system design is design for failure.

Continuing the discussion of when a service process should use multiple threads, let's first talk about the advantages of single-threading.

Advantages of single-threaded programs

  • From a programming point of view, the advantage of single-threaded programs needs no elaboration: simplicity. The structure of the program is generally an event loop based on IO multiplexing, as described in "Common Models"; or, as Yun Feng suggested, it uses blocking IO directly.
  • Disadvantages: 1. priority inversion can occur; 2. latency can be relatively high.

The typical code framework of event loop is:

while (!done) {
  int retval = ::poll(fds, nfds, timeout_ms);
  if (retval < 0) {
    // handle errors
  } else {
    // handle expired timers
    if (retval > 0) {
      // handle IO events
    }
  }
}
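Filled out on a single file descriptor, the skeleton above becomes compilable. This is only a sketch: the name loop_once and the single-fd simplification are mine, not muduo's.

```cpp
#include <poll.h>
#include <unistd.h>

// One iteration of the poll()-based event loop, made concrete for a single fd.
// Returns the number of bytes read when the fd becomes readable, 0 on timeout,
// -1 on a poll() error.
ssize_t loop_once(int fd, char* buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    int retval = ::poll(&pfd, 1, timeout_ms);
    if (retval < 0)
        return -1;                    // handle errors
    // (a real loop would process expired timers here)
    if (retval > 0 && (pfd.revents & POLLIN))
        return ::read(fd, buf, len);  // handle the IO event
    return 0;                         // timeout: nothing to do
}
```

A real event loop would of course watch many fds and dispatch to per-fd callbacks, but the control flow is exactly the skeleton above.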

The event loop has an obvious shortcoming: it is non-preemptive. Suppose event a has higher priority than event b, handling a takes 1ms, and handling b takes 10ms. If b occurs just before a, then by the time a arrives the program has already left poll() and started processing b, so a has to wait 10ms for its chance to be handled, for a total response time of 11ms. This is equivalent to a priority inversion.

This shortcoming can be overcome by multi-threading, which is also the main advantage of multi-threading.

Are there performance advantages to multithreaded programs?

I said earlier that whether a service is IO-bound or CPU-bound, multi-threading has no absolute performance advantage. Let me explain in detail what this sentence means.

It means that if a small amount of CPU load can saturate the IO, or a small amount of IO traffic can saturate the CPU, then multiple threads are of no use. For example:

  1. For a static web server or an ftp server, the CPU load is light and the main bottleneck is disk IO and network IO. A single-threaded program (mode 1) can already saturate the IO; using multiple threads does not improve throughput, because the IO hardware capacity is saturated. Likewise, adding CPUs at this point cannot improve throughput either.
  2. It is rare for the CPU to run full, so I have to make up an example here. Suppose a service takes n integers as input and asks whether m of them can be selected so that their sum is 0 (here n < 100, m > 0). This is the famous subset-sum problem, which is NP-Complete. For such a "service", even a small n can max out the CPU: with n = 30, one input is no more than 120 bytes (32-bit integers), yet the computation may take minutes. For this kind of application, mode 3a is the most suitable: it exploits multiple cores and the program stays simple.

In other words, once either resource reaches its bottleneck first, multi-threaded programs have no advantage.

Speaking of this, some readers may already be impatient: you have talked so long, and only about the benefits of single-threading. What are all those threads good for?

Scenarios for multi-threaded programs


I think the applicable scenario for multi-threading is: improving response speed by letting IO and "computation" overlap, thereby reducing latency.

Although multithreading cannot improve absolute performance, it can improve average response performance.

A program that needs to be multi-threaded must generally satisfy:

  • There are multiple CPUs available. On a single-core machine the advantages of multi-threading are not obvious.
  • Threads share data. If there is no shared data, use mode 3b. We should minimize shared data between threads, but that does not mean there is none;
  • The shared data is modifiable, not a static constant table. If the data cannot be modified, shared memory between processes suffices, and mode 3 will do;
  • The service is non-homogeneous. That is, responses to events have different priorities, and a dedicated thread can handle high-priority events, preventing priority inversion;
  • Latency and throughput are equally important; the program is not simply IO-bound or CPU-bound;
  • It can exploit asynchronous operations, such as logging. Whether writing log files to disk or sending messages to a log server, the critical path should not be blocked;
  • It can scale up. A good multi-threaded program should enjoy the benefits of an increased number of CPUs. The current mainstream is 8 cores, and 16-core machines will be in use soon.
  • It has predictable performance. As load increases, performance degrades gracefully, then drops rapidly past a certain threshold. The number of threads generally does not change with load.
  • Multi-threading effectively divides responsibilities and functions, so that each thread's logic is relatively simple, its task single, and its code easy to write. Do not stuff all logic into one event loop, as in a Win32 SDK program.
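To illustrate the "shared, mutable data" condition, here is a minimal sketch (my own, in modern C++ rather than the raw pthreads this article assumes) of several threads updating one piece of in-memory state under a mutex, the situation where mode 2 beats mode 3:

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Shared, mutable, in-memory state, guarded by a mutex.  Purely illustrative;
// the names are not from muduo or any real cluster manager.
class ClusterState
{
public:
    void addJob()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ++jobs_;
    }

    int jobs() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return jobs_;
    }

private:
    mutable std::mutex mutex_;
    int jobs_ = 0;
};

// Run several worker threads that all mutate the same state, then return the
// final job count after joining them.
int runWorkers(ClusterState& state, int threads, int jobsPerThread)
{
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i)
        workers.emplace_back([&state, jobsPerThread] {
            for (int j = 0; j < jobsPerThread; ++j)
                state.addJob();
        });
    for (auto& t : workers)
        t.join();
    return state.jobs();
}
```

Doing the same thing across processes (mode 3) would require shared memory plus process-shared locks, which, as argued below, is just a multi-threaded program wearing a multi-process cloak.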

These conditions are relatively abstract, here is a concrete (albeit fictitious) example.

Suppose you want to manage a cluster of Linux servers with 8 computing nodes and 1 control node. The machines have identical configurations: dual-socket quad-core CPUs, interconnected by gigabit Ethernet. Now we need to write a simple cluster management software (cf. LLNL's SLURM), consisting of three programs:

  • The master, running on the control node, monitors and controls the state of the entire cluster.
  • The slave, running on each computing node, is responsible for starting and terminating jobs and monitoring the machine's resources.
  • The client, a command-line tool for end users to submit jobs.

According to the previous analysis, the slave is a "watchdog process": it starts other job processes, so it must be a single-threaded program. It should also not take up much CPU, which likewise suits the single-threaded model.

The master should be a multi-threaded program in mode 2:

  • It occupies an 8-core machine exclusively. Using mode 1 would waste 87.5% of the CPU resources.
  • The state of the entire cluster should fit entirely in memory, and this state is shared and mutable. With mode 3, state synchronization between processes becomes a big problem; and heavy use of shared memory is just a multi-threaded program wearing a multi-process cloak.
  • The main performance indicator of the master is not throughput, but latency, which means responding to various events as quickly as possible. It almost never runs out of IO or CPU.
  • The events the master watches have different priorities. A job's normal completion and its abnormal crash are handled with different priorities, as are the two alarm conditions of a node's disk filling up and its chassis overheating. A single thread may suffer priority inversion.
  • Assuming there is one TCP connection between the master and each slave, the master can use 2 or 4 IO threads to handle the 8 TCP connections, effectively reducing latency.
  • The master writes logs to the local hard disk asynchronously, which requires the logging library to have its own IO thread.
  • The master may want to read and write the database, then the third-party library that the database connects to may have its own thread and call back the master's code.
  • The master needs to serve multiple clients, and multi-threading can also reduce client response time; for example, it can use 2 IO threads to handle communication with clients.
  • The master can also provide a monitor interface to broadcast (push) cluster status, so that users need not actively poll. Done in a separate thread, this is easier to implement and will not interfere with the main functions.
  • The master opened 10 threads in total:
    • 4 IO threads used to communicate with slaves
    • 1 logging thread
    • 1 database IO thread
    • 2 IO threads communicating with clients
    • 1 main thread, used to do some background work, such as job scheduling
    • 1 pushing thread, used to actively broadcast the status of the cluster
  • Although the number of threads slightly exceeds the number of cores, these threads are mostly idle, so we can rely on the OS's process scheduling to keep latency under control.

In summary, it is natural and efficient for the master to be written in a multi-threaded manner.

Classification of threads

According to my experience, threads in a multi-threaded service program can be roughly divided into three categories:

  1. IO threads: the main loop of such a thread is IO multiplexing, waiting on a select/poll/epoll system call. Such threads also handle timer events. Their work is not limited to IO; some computation can be put into them as well.
  2. Computing threads: the main loop of such a thread is a blocking queue, waiting on a condition variable. Such threads usually live in a thread pool.
  3. Threads used by third-party libraries, such as logging, and database connection.
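The blocking queue that a category-2 computing thread waits on can be sketched as follows (my own illustration in modern C++, not muduo's actual BlockingQueue):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Minimal unbounded blocking queue.  A computing thread's main loop calls
// take() repeatedly; producers call put().
template <typename T>
class BlockingQueue
{
public:
    void put(T x)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(x));
        }
        notEmpty_.notify_one();  // wake one waiting worker
    }

    T take()  // blocks until an element is available
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !queue_.empty(); });
        T front(std::move(queue_.front()));
        queue_.pop_front();
        return front;
    }

private:
    std::mutex mutex_;
    std::condition_variable notEmpty_;
    std::deque<T> queue_;
};
```

The predicate form of wait() handles spurious wakeups, which is the canonical way to wait on a condition variable.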

Server programs generally do not start and terminate threads frequently. In the programs I have written, thread creation happens only at program startup, never while the service is running.

In the multi-core era, multi-threaded programming is inevitable, and the "ostrich algorithm" is not the answer.


Origin blog.csdn.net/qq_22473333/article/details/113515020