Muduo library study notes: the muduo multithreading model (common concurrent network server design schemes)

Dongyang's study notes

12 common solutions

The table below is the author's summary of 12 common schemes. Two of its columns deserve explanation:

  • Interoperability: for a chat-style service, whether data can easily be exchanged between different client connections.
  • Sequentiality: for request/response services such as httpd or Sudoku, if a client sends multiple requests on one connection, whether the computed responses are sent back to the client in the same order as the requests (in the natural state, without deliberate synchronization).
    (Figure: table of the 12 common schemes)

Plan 1 ( fork()-per-client)

The traditional Unix concurrent network programming scheme, called child-per-client or fork()-per-client in [UNP] and commonly known as process-per-connection. This scheme is:

  • Suitable when the number of concurrent connections is small.
  • Suitable when the work of serving a request far exceeds the cost of fork() (e.g., a database server).
  • Suitable for long connections, but not for short connections.
#!/usr/bin/python3

from socketserver import BaseRequestHandler
from socketserver import ForkingTCPServer

class EchoHandler(BaseRequestHandler):
    def handle(self):
        print("got connection from", self.client_address)
        while True:
            data = self.request.recv(4096)
            if data:
                self.request.sendall(data)  # sendall() handles partial writes
            else:
                print("disconnect", self.client_address)
                self.request.close()
                break

if __name__ == "__main__":
    listen_address = ("0.0.0.0", 2007)
    server = ForkingTCPServer(listen_address, EchoHandler)
    server.serve_forever()

Plan 2 ( thread-per-connection)

This is the traditional Java network programming scheme, thread-per-connection. Before Java 1.4 introduced NIO, Java network services mostly used it:

  • Its per-thread cost is much smaller than scheme 1's per-process cost.
  • It is still not suitable for short connections.
  • Its scalability is limited by the number of threads.
#!/usr/bin/python3

from socketserver import BaseRequestHandler
from socketserver import ThreadingTCPServer

class EchoHandler(BaseRequestHandler):
    def handle(self):
        print("got connection from", self.client_address)
        while True:
            data = self.request.recv(4096)
            if data:
                self.request.sendall(data)  # sendall() handles partial writes
            else:
                print("disconnect", self.client_address)
                self.request.close()
                break

if __name__ == "__main__":
    listen_address = ("0.0.0.0", 2007)
    # changed here: ThreadingTCPServer instead of ForkingTCPServer
    server = ThreadingTCPServer(listen_address, EchoHandler)
    server.serve_forever()

Plan 3 and Plan 4

Scheme 3 is an optimization of scheme 1, and scheme 4 is an optimization of scheme 2; both have long been used by Apache httpd.

The schemes above are all blocking-style network programming: the program's thread of control usually blocks in read(). But TCP is a full-duplex protocol that supports simultaneous read() and write(), which raises questions such as:

  • How can we write to a TCP connection while a read() is in progress?
  • How can a client read from the network and from stdin at the same time?

One solution is IO multiplexing. That is, select, poll, epoll, kqueue and similar "multiplexers" let a single thread of control handle multiple connections.

  • What is multiplexed is not the IO connection but the thread.
  • Using select/poll almost certainly requires non-blocking IO to go with it.
  • Using non-blocking IO in turn requires application-level buffers.
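To see why non-blocking IO forces an application-level output buffer, consider a partial write: a non-blocking send() may accept only part of the data, and the rest must be kept somewhere until the socket becomes writable again. A minimal sketch (buffered_send is a name invented here, not an API of any library):

```python
import socket

def buffered_send(sock, out_buf, data):
    """Attempt to send out_buf + data on a non-blocking socket.
    Returns whatever could not be sent; the caller must buffer it."""
    pending = out_buf + data
    try:
        sent = sock.send(pending)       # may accept only part of the data
    except BlockingIOError:             # kernel send buffer completely full
        sent = 0
    return pending[sent:]

# Demonstration: the kernel socket buffer is far smaller than 10 MB,
# so a single non-blocking send() cannot take it all.
a, b = socket.socketpair()
a.setblocking(False)
payload = b"x" * 10_000_000
remainder = buffered_send(a, b"", payload)
print(len(remainder) > 0)               # True: leftover stays in our buffer
```

In a real Reactor, the caller would register the fd for POLLOUT whenever the returned remainder is non-empty, and retry from the buffer when the writable event fires.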

Fortunately, there are ready-made solutions to these troubles: the Reactor pattern is a good choice, and many common libraries implement it, such as libevent, muduo, Netty, Twisted, and POE.

The Reactor pattern is discussed in another post: five common IO models.

A Python example

The following code does not turn on non-blocking IO and does not handle partial sends.
It first sets up a dictionary mapping file descriptors to sockets.
The body of the program is an event loop:
whenever an IO event occurs, a different action is taken depending on the file descriptor.
For the listening fd, it accepts the new connection, registers it on the IO event watch list, and adds it to the connections dictionary.
For a client connection, it reads and echoes the data and handles connection close.

#!/usr/bin/python3

import socket
import select

server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_socket.bind(('', 2007))
server_socket.listen(5)
# server_socket.setblocking(False)
poll = select.poll()  # epoll() should work the same
poll.register(server_socket.fileno(), select.POLLIN)

connections = {}
while True:
    events = poll.poll(10000)  # timeout in milliseconds: 10 seconds
    for fileno, event in events:
        if fileno == server_socket.fileno():
            (client_socket, client_address) = server_socket.accept()
            print("got connection from", client_address)
            # client_socket.setblocking(False)
            poll.register(client_socket.fileno(), select.POLLIN)
            connections[client_socket.fileno()] = client_socket
        elif event & select.POLLIN:
            client_socket = connections[fileno]
            data = client_socket.recv(4096)
            if data:
                client_socket.send(data)  # may be partial; sendall() is safer
            else:
                poll.unregister(fileno)
                client_socket.close()
                del connections[fileno]

But burying the business code in one big loop like this makes future feature work and code maintenance difficult.

The point of the Reactor pattern is to dispatch messages to handler functions provided by the user, keeping the generic networking code unchanged and independent of the user's processing logic.

Single-threaded Reactor

The execution sequence of a single-threaded Reactor is shown in the figure below (left). When there is no event, the thread waits in select/poll/epoll_wait. When an event arrives, the network library handles the IO and then notifies (calls back into) the client code. The thread running the Reactor event loop is usually called the IO thread. Usually the network library is responsible for reading and writing the socket, while the user code is responsible for decoding, computing, and encoding.

Note that since there is only one thread, events are handled sequentially and the thread can do only one thing at a time. In this kind of cooperative multitasking, task priority cannot be guaranteed, because from the moment poll returns until the next call to poll, the thread will not be preempted by data or events arriving on other connections.
If we want to delay a computation, we should register a timeout callback rather than call sleep().
(Figure: program flow of a single-threaded Reactor)
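The advice above, registering a timeout callback instead of sleeping, can be sketched with a plain poll loop driving a timer heap. This is only a toy version of the idea (muduo's actual TimerQueue works differently, using timerfd on Linux):

```python
import heapq
import itertools
import select
import time

poll = select.poll()
timers = []                    # min-heap of (fire_time, seq, callback)
seq = itertools.count()        # tie-breaker so callbacks are never compared

def run_after(delay, callback):
    """Register a timeout callback instead of blocking in time.sleep()."""
    heapq.heappush(timers, (time.monotonic() + delay, next(seq), callback))

fired = []
run_after(0.05, lambda: fired.append("t1"))
run_after(0.01, lambda: fired.append("t2"))

# Event loop: the poll timeout is the delay until the earliest timer,
# so the thread stays responsive to IO events while it waits.
while timers:
    timeout_ms = max(0, int((timers[0][0] - time.monotonic()) * 1000))
    poll.poll(timeout_ms)      # would also return early on IO events
    now = time.monotonic()
    while timers and timers[0][0] <= now:
        _, _, cb = heapq.heappop(timers)
        cb()                   # run expired timeout callbacks

print(fired)                   # ['t2', 't1']: fired in timer order
```

The loop never sleeps unconditionally: the poll timeout merely bounds the wait, so IO events on registered fds would still be handled promptly.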

Option 5 (single-threaded Reactor)

This article uses scheme 5 as the baseline against which the other schemes are compared. This scheme:
Advantages:

  • Data is sent and received by the network library; the program only has to care about business logic.

Disadvantages:

  • Suitable for IO-intensive applications but not CPU-intensive ones, because it is hard to exploit multi-core CPUs.
  • Compared with scheme 2, scheme 5 may process a network message with slightly higher latency, because there is one extra poll system call on the path.

Reactor code schematic

  • To save space, the example uses global variables directly and does not handle exceptions.
  • The core of the program is still the event loop.
  • Event handling is now dispatched to individual handler functions instead of being concentrated in one big lump:
    • The handler for the listening fd is handle_accept, which registers a handler for each new client connection.
    • The handler for an ordinary client connection is handle_request, which separates the two events of connection close and data arrival.

Note: with non-blocking IO + event-driven programming, you must avoid time-consuming operations, including blocking IO, inside event callbacks; otherwise the responsiveness of the program suffers. This is very similar to the Windows GUI message loop.

#!/usr/bin/python3

import socket
import select

server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_socket.bind(('', 2007))
server_socket.listen(5)
# server_socket.setblocking(False)

poll = select.poll()  # epoll() should work the same
connections = {}
handlers = {}

def handle_input(client_socket, data):
    client_socket.send(data)  # may be partial; sendall() is safer

def handle_request(fileno, event):
    if event & select.POLLIN:
        client_socket = connections[fileno]
        data = client_socket.recv(4096)
        if data:
            handle_input(client_socket, data)
        else:
            poll.unregister(fileno)
            client_socket.close()
            del connections[fileno]
            del handlers[fileno]

def handle_accept(fileno, event):
    (client_socket, client_address) = server_socket.accept()
    print("got connection from", client_address)
    # client_socket.setblocking(False)
    poll.register(client_socket.fileno(), select.POLLIN)
    connections[client_socket.fileno()] = client_socket
    handlers[client_socket.fileno()] = handle_request

poll.register(server_socket.fileno(), select.POLLIN)
handlers[server_socket.fileno()] = handle_accept

while True:
    events = poll.poll(10000)  # timeout in milliseconds: 10 seconds
    for fileno, event in events:
        handler = handlers[fileno]
        handler(fileno, event)

Solution 6 (reactor + thread-per-task)

This is a transitional scheme. After a Sudoku request is received, the computation is not done on the Reactor thread; instead a new thread is created for it. This is a very rudimentary multi-threaded application.

  • It creates a new thread per request (rather than per connection). (This overhead can be avoided with a thread pool, i.e. scheme 8.)
  • The program returns results out of order: multiple threads compute the multiple requests received on one connection, so the order in which the results are produced is indeterminate.
  • This is why the protocol carried an id from the start: it lets the client match each response to its request.
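Scheme 6 can be sketched as follows (a toy model: solve is a stand-in for the real Sudoku computation, and a queue stands in for writing responses back on the connection; all names are illustrative):

```python
import queue
import threading

def solve(puzzle):
    return puzzle[::-1]            # stand-in for the Sudoku computation

responses = queue.Queue()          # stands in for writing to the client

def handle_request(req_id, puzzle):
    """Scheme 6: spawn a brand-new thread per request (not per connection)."""
    t = threading.Thread(target=lambda: responses.put((req_id, solve(puzzle))))
    t.start()
    return t

# Three requests arriving on the same connection:
threads = [handle_request(i, p) for i, p in enumerate([b"abc", b"def", b"ghi"])]
for t in threads:
    t.join()

# Results may arrive in any order, which is why the protocol carries an id:
results = dict(responses.get() for _ in range(3))
print(results)                     # responses matched to requests by id
```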

Solution 7 (reactor + worker thread)

To make the order of returned results deterministic, scheme 7 creates one computing thread per connection; every request on a connection is sent to that same thread and computed first come, first served.

  • Still a transitional scheme.
  • The number of concurrent connections is limited by the number of threads.
  • It may be no better than simply using blocking IO with thread-per-connection, i.e. scheme 2.
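The per-connection worker thread of scheme 7 can be sketched like this (again a toy model with invented names, not muduo code; the queue plays the role of the connection's dedicated task queue):

```python
import queue
import threading

def solve(puzzle):
    return puzzle[::-1]                 # stand-in for the Sudoku computation

class Connection:
    """Scheme 7: one dedicated computing thread per connection, so the
    requests of a connection are answered first come, first served."""
    def __init__(self):
        self.requests = queue.Queue()
        self.responses = []             # stands in for writing to the socket
        self.worker = threading.Thread(target=self._work)
        self.worker.start()

    def _work(self):
        while True:
            puzzle = self.requests.get()
            if puzzle is None:          # sentinel: the connection was closed
                return
            self.responses.append(solve(puzzle))

    def on_request(self, puzzle):       # called from the Reactor thread
        self.requests.put(puzzle)

    def close(self):
        self.requests.put(None)
        self.worker.join()

conn = Connection()
for p in [b"abc", b"def", b"ghi"]:
    conn.on_request(p)
conn.close()
print(conn.responses)                   # [b'cba', b'fed', b'ihg'], in order
```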

Comparison of scheme 7 and scheme 6

  • In scheme 6, a long burst of requests arriving on one TCP connection may occupy all 8 cores (of an 8-core machine).
  • In scheme 7, one connection occupies at most 12.5% of the CPU (one core out of 8).
  • Each has its pros and cons; the trade-off must be made according to the actual application scenario (in the end, is fairness more important, or burst performance?).

Solution 8 (reactor + thread pool)

To make up for scheme 6's defect of creating a new thread per request, scheme 8 uses a fixed-size thread pool.

The program structure is shown in the figure:

  • All IO work is done in one Reactor thread, and computation tasks are handed to the thread pool.
  • If the computation tasks are independent of one another and the IO pressure is not high, this scheme fits very well.
  • Results may return out of order, so responses need to be matched to requests by id.
    (Figure: structure of scheme 8)

Code example

  • Another use of the thread pool is to run blocking operations in it (i.e., we can move blocking operations, such as database queries, off the IO thread into the pool), so they do not stall other client connections.
  • If the IO pressure is high and one Reactor cannot keep up, consider scheme 9.
void onMessage(const TcpConnectionPtr& conn, Buffer* buf, Timestamp)
{
  // ...... (this sits inside a loop that extracts complete requests
  // from the input buffer)
  if (!processRequest(conn, request))
  {
    conn->send("Bad Request!\r\n");
    conn->shutdown();
    break;
  }
  // ......
}

bool processRequest(const TcpConnectionPtr& conn, const string& request)
{
  // ......
  if (puzzle.size() == implicit_cast<size_t>(kCells))
  {
    // hand the computation to the thread pool; IO stays on the Reactor thread
    threadPool_.run(std::bind(&solve, conn, puzzle, id));
  }
  // ......
}
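The structure of the C++ fragment above can be sketched in Python with a fixed thread pool (illustrative names only; appending to a list stands in for sending the response on the connection):

```python
from concurrent.futures import ThreadPoolExecutor

def solve(puzzle):
    return puzzle[::-1]                   # stand-in for the Sudoku computation

pool = ThreadPoolExecutor(max_workers=4)  # scheme 8: fixed-size thread pool

responses = []                            # stands in for the connection

def process_request(req_id, puzzle):
    """Submitted from the Reactor thread; completion order is not
    guaranteed, so each response carries the request id."""
    fut = pool.submit(solve, puzzle)
    fut.add_done_callback(lambda f, i=req_id: responses.append((i, f.result())))
    return fut

for i, p in enumerate([b"abc", b"def", b"ghi"]):
    process_request(i, p)
pool.shutdown(wait=True)                  # wait for all computations

print(sorted(responses))                  # [(0, b'cba'), (1, b'fed'), (2, b'ihg')]
```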

Scheme 9 (Reactors in threads)

This is muduo's built-in scheme, and also the built-in multithreading scheme of Netty. Its distinctive features:

  • One loop per thread. A main Reactor is in charge of accept(2)ing new connections; each accepted connection is then hung on a sub-Reactor.
  • All operations on that connection are completed in the sub-Reactor's thread, and multiple connections can be spread over multiple threads to make full use of the CPU (a load-balancing effect of sorts).
    muduo uses a fixed-size Reactor pool; the pool size is usually set by the number of CPU cores.
    This scheme spreads IO over several threads and prevents a single Reactor from saturating.
    Compared with scheme 8, it also saves the two context switches into and out of the thread pool, and the responses to the requests on one connection are returned in order.
    It adapts well to many workloads and is muduo's default thread model.
    (Figure: structure of scheme 9)

Code example

SudokuServer(EventLoop* loop, const InetAddress& listenAddr, int numThreads)
  : server_(loop, listenAddr, "SudokuServer"),
    numThreads_(numThreads),
    startTime_(Timestamp::now())
{
  server_.setConnectionCallback(
      std::bind(&SudokuServer::onConnection, this, _1));
  server_.setMessageCallback(
      std::bind(&SudokuServer::onMessage, this, _1, _2, _3));
  server_.setThreadNum(numThreads);  // the only difference from server_basic.cc
}
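For illustration, here is a toy Python rendition of the scheme 9 structure: a main acceptor hands each new connection, round-robin, to one of a fixed pool of sub-Reactor event-loop threads, and all IO for that connection then stays on its assigned thread. All class and function names here are invented for the sketch, not muduo APIs:

```python
import os
import queue
import selectors
import socket
import threading

class SubReactor(threading.Thread):
    """One event loop per thread; a connection, once assigned, is served
    entirely by this thread's loop."""
    def __init__(self):
        super().__init__(daemon=True)
        self.sel = selectors.DefaultSelector()
        self.pending = queue.Queue()
        self.rfd, self.wfd = os.pipe()          # wakeup pipe
        self.sel.register(self.rfd, selectors.EVENT_READ, None)

    def assign(self, conn):                     # called from the main thread
        self.pending.put(conn)
        os.write(self.wfd, b"x")                # wake the loop up

    def run(self):
        while True:
            for key, _ in self.sel.select():
                if key.fd == self.rfd:          # register newly assigned conns
                    os.read(self.rfd, 1)
                    conn = self.pending.get()
                    self.sel.register(conn, selectors.EVENT_READ, conn)
                else:                           # echo; all IO on this thread
                    conn = key.data
                    data = conn.recv(4096)
                    if data:
                        conn.sendall(data)
                    else:
                        self.sel.unregister(conn)
                        conn.close()

# Main Reactor: accepts, then hands connections out round-robin.
subs = [SubReactor() for _ in range(2)]
for s in subs:
    s.start()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))                 # ephemeral port for the demo
listener.listen(5)
port = listener.getsockname()[1]

def acceptor(n):
    for i in range(n):
        conn, _ = listener.accept()
        subs[i % len(subs)].assign(conn)        # "load balancing" of a sort

threading.Thread(target=acceptor, args=(2,), daemon=True).start()

clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(2)]
for i, c in enumerate(clients):
    c.sendall(b"ping%d" % i)
replies = [c.recv(4096) for c in clients]
print(replies)                                  # [b'ping0', b'ping1']
```

The wakeup pipe mirrors the trick real event loops use to hand work to another loop's thread safely; muduo's C++ implementation (EventLoopThreadPool) follows the same one-loop-per-thread structure.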

Scheme 10 (Reactors in processes)

  • This is Nginx's built-in scheme. If there is no interaction between connections, it is also a good choice; the worker processes are independent of one another.
  • It supports hot upgrading of the server binary.

Scheme 11 (reactors + thread pool)

  • Combines scheme 8 and scheme 9:
  • multiple Reactors handle the IO, and a thread pool handles the computation.
  • It suits applications with both bursty IO (multiple threads handling IO on many connections) and bursty computation (the thread pool spreading one connection's computation over multiple threads).
    With muduo, scheme 11 needs only one line added to the code of scheme 8: server_.setThreadNum(numThreads);
    (Figure: structure of scheme 11)

One event loop or several?

  • The advice given in the ZeroMQ manual is: set the number of event loops at roughly one event loop per gigabit per second of throughput. Following this rule of thumb:
  1. When writing a network program running over Gigabit Ethernet, one event loop is enough.
  2. If the program does little computation itself and the main bottleneck is network bandwidth, you can follow this rule and use a single event loop.
  3. If the program has modest IO bandwidth and heavy computation, and is not latency-sensitive, you can put the computation into a thread pool and again use a single event loop.

Hand over high-priority TCP connections to a separate event loop for processing

In muduo, connections belonging to the same event loop have no priority distinction; giving high-priority connections their own event loop is what prevents priority inversion.

  • For example, suppose a service program has 10 heartbeat connections and 10 data connections, all belonging to one event loop. In principle the heartbeat connections should take priority,
  • but if a data request arrives just before a heartbeat, the event loop invokes the data request's handler first, and the heartbeat connection has to wait for the next round of epoll_wait() to be served.

Origin: blog.csdn.net/qq_22473333/article/details/113547285