A good article on three concurrent programming models: I/O multiplexing, multi-process, and multi-threading

Reprinted from https://blog.csdn.net/wan_hust/article/details/38441455#t1

Why do we use multi-threading, multi-processing, or I/O multiplexing? The answer is to improve server performance and efficiency. A single process has no trouble handling a few connections, but a server generally handles thousands, and a single process that serves them one at a time becomes very inefficient. It is like a restaurant with only one waiter serving every guest. The natural response is to hire help: more processes or threads (the equivalent of more waiters) to handle high concurrency. But which model handles high concurrency best? Let's look at the differences between the three models before choosing.

I/O multiplexing model

I/O multiplexing principle: the application monitors multiple I/O ports at the same time to determine which of them are ready for an operation, so that one flow of control is time-shared among them. A book I read explains the principle with a vivid example. Suppose you must watch water pipes (I/O ports) in 10 different places to see whether water is flowing (that is, whether they are readable). Done directly, this takes 10 people (10 threads, or 10 places in the code). If some technology (such as cameras) relays the status of all 10 pipes to a single point, only one person is needed to watch that point. I/O multiplexing mechanisms such as select and epoll play the role of the cameras: they report the status of multiple I/O ports in one place, such as a specific file descriptor, so the application only has to block in the corresponding select() or epoll_wait() system call to watch that place. For the monitoring job alone, this is much cheaper than multi-threading or multi-process: there is no need to create multiple threads or processes, because a single select call is enough.
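
The water-pipe picture can be sketched in a few lines. This is only an illustration, assuming Python's standard `select` module as a thin wrapper over the select() system call; the function name `ready_pipes` and the use of socket pairs as stand-ins for the "pipes" are choices made for this example, not something from the original article.

```python
import select
import socket

def ready_pipes(n_pipes, pipes_with_water, timeout=0.5):
    """Create n_pipes socket pairs, write to some, and ask select() which are readable."""
    pairs = [socket.socketpair() for _ in range(n_pipes)]
    try:
        for i in pipes_with_water:
            pairs[i][1].sendall(b"water")          # "water flows" in pipe i
        readers = [r for r, _ in pairs]
        # A single call watches every descriptor; the timeout bounds the wait.
        readable, _, _ = select.select(readers, [], [], timeout)
        return sorted(readers.index(s) for s in readable)
    finally:
        for r, w in pairs:
            r.close()
            w.close()

if __name__ == "__main__":
    print(ready_pipes(10, [2, 7]))   # prints [2, 7]
```

One thread watches all 10 descriptors here; no extra threads or processes are created just to wait.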

Advantages and disadvantages of I/O multiplexing: because I/O multiplexing runs in the context of a single process, every logical flow can access the entire address space of the process, and the overhead is much lower than with multiple processes. The disadvantage is high programming complexity.

Imagine a server with millions of clients connected at the same time. With select alone, select does help monitor all those connections, but a single process still cannot keep up with processing them; it needs help from multiple threads or processes. For this reason the best way to handle high concurrency is to combine I/O multiplexing with multi-threading or multi-processing.
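
A rough sketch of the combined model, assuming Python's standard `selectors` module and `ThreadPoolExecutor`: one loop does the monitoring, and a small pool of worker threads does the processing. The names `handle` and `serve_once`, and the upper-casing "request handler", are hypothetical and exist only for this illustration.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor

def handle(conn):
    """Hypothetical request handler: read one message and echo it upper-cased."""
    data = conn.recv(1024)
    conn.sendall(data.upper())
    return data

def serve_once(connections, workers=4, timeout=1.0):
    """Monitor all connections in one loop and hand each ready one to a worker thread."""
    sel = selectors.DefaultSelector()
    for conn in connections:
        sel.register(conn, selectors.EVENT_READ)
    futures = []
    pending = len(connections)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            events = sel.select(timeout=timeout)
            if not events:
                break                              # nothing arrived in time; stop waiting
            for key, _ in events:
                sel.unregister(key.fileobj)        # do not dispatch the same event twice
                futures.append(pool.submit(handle, key.fileobj))
                pending -= 1
    sel.close()
    return [f.result() for f in futures]

if __name__ == "__main__":
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (client, _) in enumerate(pairs):
        client.sendall(f"req{i}".encode())
    print(sorted(serve_once([server for _, server in pairs])))
    for client, server in pairs:
        client.close()
        server.close()
```

The monitoring loop never blocks on a slow client, and the number of worker threads stays fixed no matter how many connections are registered.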



Multi-process model

The simplest way to build a concurrent program is with processes, created with a function such as fork. For example, a concurrent server accepts client connection requests in the parent process and then creates a new child process to serve each new client.
Advantages of multiprocessing:
Processes are independent of each other, so a child process that crashes does not affect the stability of the main program;
Performance scales easily by adding CPUs;
Separate processes do not contend for the same thread locks, so the cost of locking and unlocking is minimized and throughput can improve greatly, even when the algorithm running in each module is inefficient;
Each child process has its own address space (2GB of user address space on, for example, 32-bit Windows) and related resources, so the overall performance ceiling is very high.
Disadvantages of multiprocessing:
The control logic is complex, because child processes must coordinate with the main program;
Data has to cross process boundaries, which is a poor fit for large data transfers; the model suits small data transfers and compute-intensive work;
Process scheduling overhead is relatively high.
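
The parent-accepts, child-serves pattern and the isolation between processes can be sketched as follows. This is a minimal illustration for Unix-like systems, assuming Python's `os.fork` as a wrapper over fork(); the function `serve_with_child` is hypothetical, a socket pair stands in for an accepted client connection, and the parent also plays the client so that the example is self-contained.

```python
import os
import socket

def serve_with_child(request):
    """Fork a child to serve one 'connection'; return (reply, parent's counter, exit code)."""
    client, conn = socket.socketpair()       # stands in for an accepted client connection
    handled = {"count": 0}
    pid = os.fork()
    if pid == 0:
        # Child process: serve this one client, then exit without returning.
        try:
            client.close()
            handled["count"] += 1            # changes only the child's private copy
            conn.sendall(conn.recv(1024)[::-1])
            conn.close()
        finally:
            os._exit(0)
    # Parent process: drop its copy of the connection (a real server would
    # now go back to accept() and wait for the next client).
    conn.close()
    client.sendall(request)
    reply = client.recv(1024)
    client.close()
    _, status = os.waitpid(pid, 0)           # reap the child to avoid a zombie
    return reply, handled["count"], os.WEXITSTATUS(status)

if __name__ == "__main__":
    print(serve_with_child(b"hello"))        # prints (b'olleh', 0, 0)
```

The counter stays at 0 in the parent because the child incremented its own copy of memory. That independence is the source of the stability advantage listed above, and it is also why passing data between processes requires an explicit channel.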

Multithreading model

Each thread has its own thread context, including a thread ID, stack, stack pointer, program counter, general-purpose registers, and condition codes. All threads running in a process share that process's entire virtual address space, including its code, data, heap, shared libraries, and open files.
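
A small sketch of what sharing an address space means in practice, assuming Python's standard `threading` module; the function name `collect_from_threads` is made up for this example. Each worker keeps a local variable of its own, while all of them write into one list that the main thread can read afterwards without any copying.

```python
import threading

def collect_from_threads(n_threads):
    """Start n_threads workers that all append to one shared list."""
    results = []                     # one object, visible to every thread

    def worker(i):
        local = i * i                # local variable, private to this thread
        results.append(local)        # list.append is thread-safe in CPython

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)

if __name__ == "__main__":
    print(collect_from_threads(4))   # prints [0, 1, 4, 9]
```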

Thread execution model: the execution model of threads is somewhat similar to that of processes. Every process begins its life cycle as a single thread, called the main thread. Threads are peers: the only difference between the main thread and the others is that the main thread runs first.
Advantages of multithreading:
No need to cross process boundaries;
Program logic and control are simple;
All threads can directly share memory, variables, and so on;
The threaded model consumes fewer total resources than the process model.
Disadvantages of multithreading:
All threads share one address space with the main program, so together they are limited to that single space (2GB on, for example, 32-bit Windows);
Synchronization and locking between threads are troublesome;
A crash in one thread can bring down the entire program;
Beyond a certain number of threads, adding CPUs no longer improves performance. On Windows Server 2003, for example, the limit is reached at roughly 1500 threads when the thread stack is set to 1MB; with a 2MB stack the total does not even reach 1500;
The total performance gain from threads is limited, and with many threads the scheduling of the threads themselves becomes a burden that consumes more CPU.
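
The synchronization burden mentioned in the list can be illustrated with a shared counter. This is a minimal sketch assuming Python's `threading.Lock`; the function name `count_with_lock` is hypothetical. Without the lock, the read-modify-write in `counter += 1` could interleave between threads and lose updates.

```python
import threading

def count_with_lock(n_threads, increments):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:               # only one thread at a time updates the counter
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(count_with_lock(4, 10000))   # prints 40000
```

Every access to shared state needs this kind of care, and the time spent waiting on the lock is part of the extra cost that grows with the number of threads.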


In the Linux thread implementation described here (LinuxThreads), threads are implemented outside the kernel; the kernel only provides do_fork(), the interface for creating a process. The kernel offers two system calls, __clone() and fork(), and both ultimately call the do_fork() kernel function with different parameters. do_fork() accepts many flags, including CLONE_VM (share the memory space), CLONE_FS (share file system information), CLONE_FILES (share the file descriptor table), CLONE_SIGHAND (share the signal handler table), and CLONE_PID (share the process ID; valid only for the in-kernel process, that is, process 0). When the fork system call creates a process, the kernel calls do_fork() with none of the sharing flags set, so each process has an independent running environment. When pthread_create() creates a thread, it sets these sharing flags and calls __clone(), which passes them on to do_fork() in the kernel. The resulting "process" shares its running environment with its creator; only the stack is independent, and it is passed in through __clone().

That is, under Linux both multi-threaded and multi-process programming ultimately come down to creating processes through do_fork(); only the parameters at creation differ, which yields different degrees of sharing. Linux threads exist in the kernel as lightweight processes, each with its own process table entry, while operations such as creation, synchronization, and deletion are carried out in the pthread library outside the kernel. The pthread library uses a manager thread (__pthread_manager(), one per process) to create and terminate threads, assign thread IDs, and send thread-related signals; the main thread (the caller of pthread_create()) passes request information to the manager thread through a pipe. Note that this describes the older LinuxThreads library: NPTL, which replaced it in glibc, has no manager thread, although threads are still created with clone() and the sharing flags above.
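
The "lightweight process" view can be observed from user space. The sketch below assumes Python 3.8 or later, where `threading.get_native_id()` returns the kernel-assigned thread ID (on Linux, the value gettid() reports); the function name `thread_identities` is made up for this example. All threads report the same process ID, yet each one has its own kernel-level ID.

```python
import os
import threading

def thread_identities(n_threads):
    """Return the (process ID, kernel thread ID) pair seen inside each thread."""
    seen = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()               # all threads are alive at once, so no ID is reused
        with lock:
            seen.append((os.getpid(), threading.get_native_id()))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen

if __name__ == "__main__":
    for pid, tid in thread_identities(3):
        print(f"pid={pid} kernel thread id={tid}")
```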



