muduo library study notes 04: Essentials of C++ multithreaded system programming (multithreading and IO)

Dongyang's study notes

When doing multithreaded network programming, several questions arise naturally:

  1. How should IO be handled? (Having each file read and written by only one thread in a process is obviously correct.)
  2. Can multiple threads read and write the same socket file descriptor at the same time? (Preferably not.)
  3. Using multiple threads to process multiple sockets at the same time usually improves efficiency. Does using multiple threads on the same socket help as well?

System calls are thread-safe, but the way they are used may not be

  • First, the system calls that operate on file descriptors are thread-safe. We do not need to worry that multiple threads operating on file descriptors at the same time will crash the process or the kernel.

  • However,

    it is troublesome for multiple threads to operate on the same socket file descriptor at the same time, and I think the cost outweighs the gain. The situations to consider include:

    • One thread is blocked reading a socket while another thread closes that socket.
    • One thread is blocked in accept on a listening socket while another thread closes that socket.
    • Worse, one thread is about to read a socket and another thread closes it; a third thread then happens to open another file descriptor whose fd number is exactly the same as that of the socket just closed (the kernel hands out the lowest unused number). The thread that was about to read now operates on the wrong file, and the program's logic is confused (see "Wrapping File Descriptors with RAII" below).
  • I think all of these situations reflect flaws in the logical design of the program.

  • Now suppose we ignore closing file descriptors and consider only reading and writing. The situation is not much better.

    Socket reads and writes do not guarantee completeness: a read that asks for 100 bytes may return only 20, and writes behave the same way:

    • **If two threads read the same TCP socket at the same time,** and each receives part of the data, how are the parts assembled into a complete message? How do we know which part arrived first?
    • **If two threads write to the same TCP socket at the same time,** and each sends only half of its message, how is the receiver supposed to handle the data it gets?
    • **If each TCP socket is given a lock,** so that only one thread at a time can read or write it, the problem appears to be "solved", but it is simpler to let a single thread operate the socket directly.
    • **For non-blocking IO the situation is the same,** and it is almost impossible to guarantee the completeness and atomicity of sending and receiving messages with locks, because doing so would block other IO threads.
  • In theory, then, the only workable split is to put reading and writing in two separate threads, because a TCP socket is bidirectional. The question is whether splitting reads and writes across two threads is really worth it.
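
To make the "100 bytes requested, 20 returned" point concrete, here is a minimal sketch of the loops a single owning thread would run to send and receive one complete message. The helper names writeAll, readExactly, and roundTrip are hypothetical and are not part of muduo; the sketch assumes blocking descriptors and uses a socketpair only so that it is self-contained.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Write exactly len bytes, retrying after short writes and EINTR.
// Returns false on any other error.
bool writeAll(int fd, const char* data, size_t len) {
  size_t sent = 0;
  while (sent < len) {
    ssize_t n = ::write(fd, data + sent, len - sent);
    if (n > 0) {
      sent += static_cast<size_t>(n);  // short write: keep going
    } else if (n < 0 && errno == EINTR) {
      continue;                        // interrupted: retry
    } else {
      return false;
    }
  }
  return true;
}

// Read exactly len bytes, retrying after short reads and EINTR.
// Returns false on error or if the peer closes before len bytes arrive.
bool readExactly(int fd, char* buf, size_t len) {
  size_t got = 0;
  while (got < len) {
    ssize_t n = ::read(fd, buf + got, len - got);
    if (n > 0) {
      got += static_cast<size_t>(n);   // short read: keep going
    } else if (n < 0 && errno == EINTR) {
      continue;
    } else {
      return false;                    // error or EOF
    }
  }
  return true;
}

// Hypothetical demo: send one message across a socketpair and read it
// back, all on the calling thread. Returns an empty string on failure.
std::string roundTrip(const std::string& message) {
  int fds[2];
  if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return std::string();
  std::string received(message.size(), '\0');
  bool ok = writeAll(fds[0], message.data(), message.size()) &&
            readExactly(fds[1], &received[0], received.size());
  ::close(fds[0]);
  ::close(fds[1]);
  return ok ? received : std::string();
}
```

Each loop keeps its progress (sent, got) in local state. If a second thread ran the same loop on the same descriptor, the two would steal bytes from each other's message, which is the interleaving problem described above.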

Multithreading does not speed up disk IO

  • The discussion above is all about network IO. Can multiple threads speed up disk IO?
    • First, avoid the lseek/read race condition (see "Thread Safety of the C/C++ System Libraries" above). Once that is done, as far as I can see, using multiple threads to read or write the same file does not make it faster.
    • Moreover, having multiple threads read or write different files on the same disk does not necessarily speed things up either. Each disk has an operation queue, so the read and write requests from multiple threads are queued in the kernel. Only when the kernel has cached most of the data may multithreaded reading of this hot data be faster than single-threaded reading.
  • One approach to multithreaded disk IO is to give each disk its own thread and move all IO for that disk onto that thread, which may avoid or reduce lock contention in the kernel.
  • I think a program should be written in an "obviously correct" way. Having each file read and written by only one thread in a process is obviously correct.
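
The lseek/read race mentioned above comes from the file position being shared by every thread that uses the descriptor. The following is a minimal sketch of the usual way around it, pread, which takes the offset as an argument and leaves the shared position alone. The helpers readAt and readHalvesInParallel are hypothetical names, and the temporary file under /tmp is an assumption of the demo. The sketch is about correctness only; it makes no claim that the two threads finish sooner than one would.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <thread>
#include <unistd.h>

// Read up to len bytes starting at offset. pread never moves the
// descriptor's shared file position, so concurrent callers do not
// interfere with each other the way lseek + read would.
std::string readAt(int fd, off_t offset, size_t len) {
  std::string out(len, '\0');
  size_t got = 0;
  while (got < len) {
    ssize_t n = ::pread(fd, &out[got], len - got,
                        offset + static_cast<off_t>(got));
    if (n <= 0) break;  // error or end of file: return what we have
    got += static_cast<size_t>(n);
  }
  out.resize(got);
  return out;
}

// Hypothetical demo: write content to a temporary file, then let two
// threads each read one half through the same descriptor.
std::string readHalvesInParallel(const std::string& content) {
  char path[] = "/tmp/pread_demo_XXXXXX";
  int fd = ::mkstemp(path);
  if (fd < 0) return std::string();
  ::unlink(path);  // the file disappears once fd is closed
  ssize_t written = ::write(fd, content.data(), content.size());
  if (written != static_cast<ssize_t>(content.size())) {
    ::close(fd);
    return std::string();
  }
  const size_t half = content.size() / 2;
  std::string first, second;
  std::thread t1([&] { first = readAt(fd, 0, half); });
  std::thread t2([&] {
    second = readAt(fd, static_cast<off_t>(half), content.size() - half);
  });
  t1.join();
  t2.join();
  ::close(fd);
  return first + second;
}
```

With lseek followed by read, one thread's seek could land between the other thread's seek and read, so either thread might read from the wrong offset. With pread there is no shared state to race on.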

Principles for multithreading

  • For the sake of simplicity, I think the principles that a multithreaded program should follow are:
    • Each file descriptor is operated by only one thread. This easily solves the problem of ordering when sending and receiving messages, and avoids the various race conditions around closing file descriptors.
    • A thread may operate on multiple file descriptors, but it must not operate on file descriptors owned by other threads.
    • These rules are not hard to follow, and the muduo network library already encapsulates the details.
  • epoll also follows the same principle:
    • The Linux documentation does not explain what happens when one thread is blocked in epoll_wait() and another thread adds a new fd to be monitored to the same epoll fd. Will an event on the new fd be returned by that epoll_wait() call?
    • To be safe, we should perform all operations on the same epoll fd (add, delete, modify, wait) **in the same thread,** which is exactly why we need muduo::EventLoop::wakeup().
  • There are two exceptions to this rule:
    • For disk files, when necessary, multiple threads can call pread/pwrite to read and write the same file at the same time (see pread/pwrite: https://blog.csdn.net/qq_41453285/article/details/88936714)
    • For UDP, because the protocol itself guarantees the atomicity of messages, under appropriate conditions (for example, when messages are independent of each other) multiple threads can read and write the same UDP file descriptor at the same time.
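
To show why a wakeup mechanism follows from the "one thread per epoll fd" rule, here is a simplified sketch of the pattern. It is not muduo's code: MiniLoop, post, runOnce, and demoWakeup are hypothetical names, and using an eventfd as the wakeup channel is an assumption of this sketch (Linux only). Other threads never call epoll_ctl themselves. They queue a task and write to the eventfd, which makes the owning thread return from epoll_wait and run the task itself.

```cpp
#include <cerrno>
#include <cstdint>
#include <functional>
#include <mutex>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <thread>
#include <unistd.h>
#include <utility>
#include <vector>

// Hypothetical minimal event loop; every epoll_* call on epfd_ is made
// by the thread that calls runOnce().
class MiniLoop {
 public:
  MiniLoop()
      : epfd_(::epoll_create1(EPOLL_CLOEXEC)),
        wakeupFd_(::eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC)) {
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = wakeupFd_;
    ::epoll_ctl(epfd_, EPOLL_CTL_ADD, wakeupFd_, &ev);
  }
  ~MiniLoop() {
    ::close(wakeupFd_);
    ::close(epfd_);
  }

  // Callable from any thread: queue the task, then wake the loop thread.
  void post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending_.push_back(std::move(task));
    }
    uint64_t one = 1;
    ssize_t n = ::write(wakeupFd_, &one, sizeof one);
    (void)n;  // a failed wakeup is ignored in this sketch
  }

  // Loop thread only: wait for events, drain the wakeup fd, run tasks.
  void runOnce() {
    epoll_event events[16];
    int n;
    do {
      n = ::epoll_wait(epfd_, events, 16, -1);
    } while (n < 0 && errno == EINTR);
    for (int i = 0; i < n; ++i) {
      if (events[i].data.fd == wakeupFd_) {
        uint64_t counter = 0;
        ssize_t r = ::read(wakeupFd_, &counter, sizeof counter);
        (void)r;  // reading resets the eventfd counter
      }
    }
    std::vector<std::function<void()>> tasks;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks.swap(pending_);
    }
    for (auto& task : tasks) task();
  }

  int epollFd() const { return epfd_; }

 private:
  int epfd_;
  int wakeupFd_;
  std::mutex mutex_;
  std::vector<std::function<void()>> pending_;
};

// Hypothetical demo: another thread asks the loop thread to register a
// pipe with epoll. Returns true if the registration ran and succeeded.
bool demoWakeup() {
  MiniLoop loop;
  int pipeFds[2];
  if (::pipe(pipeFds) != 0) return false;
  bool registered = false;
  std::thread other([&] {
    loop.post([&] {  // this body runs on the loop thread, not on `other`
      epoll_event ev{};
      ev.events = EPOLLIN;
      ev.data.fd = pipeFds[0];
      registered =
          ::epoll_ctl(loop.epollFd(), EPOLL_CTL_ADD, pipeFds[0], &ev) == 0;
    });
  });
  loop.runOnce();  // blocks until woken, then runs the queued task
  other.join();
  ::close(pipeFds[0]);
  ::close(pipeFds[1]);
  return registered;
}
```

Because the eventfd is registered level-triggered, it does not matter whether post() runs before or after the loop thread enters epoll_wait: the pending wakeup is still reported. The undocumented case of adding an fd while another thread is blocked in epoll_wait() therefore never arises.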

Origin blog.csdn.net/qq_22473333/article/details/113528787