Reprinted from https://blog.csdn.net/wan_hust/article/details/38441455#t1
Why do we use multithreading, multiprocessing, or I/O multiplexing? The goal is always to improve server performance and efficiency. A single process can handle a few connections without trouble, but a server typically handles thousands of them, and one process working through that many connections is like a restaurant with a single waiter serving every guest. The natural response is to bring in help: more processes or threads (more waiters) to handle the concurrency. But which model handles high concurrency best? Let's look at the differences between the three models before choosing.
I/O multiplexing model
I/O multiplexing principle: the application monitors multiple I/O ports at the same time to find out which of them are ready for an operation, so that one flow of control is time-shared among them. A book I read explains this with a vivid example. Suppose you need to watch water pipes (I/O ports) in 10 different places to see whether water is flowing (that is, whether they are readable). Done directly, this takes 10 people (10 threads, or 10 places in the code). If some technology (such as cameras) relays the status of all 10 pipes to one spot, a single person watching that spot is enough. Multiplexing mechanisms such as select and epoll play the role of the camera: they report the status of many I/O ports in one place, such as a specific file descriptor, so the application only has to block in the corresponding select() or epoll_wait() system call to watch all of them. For the monitoring job itself, this is far cheaper than multithreading or multiprocessing: there is no need to create many threads or processes, because a single select call does the watching.
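The water-pipe example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `ready_pipes` and the choice of local socket pairs as stand-ins for the pipes are hypothetical, not taken from the original article. One `select.select()` call plays the camera and reports which of the ten pipes are readable.

```python
import select
import socket

def ready_pipes(num_pipes=10, wet=(3, 7), timeout=1.0):
    """Return the indexes of the 'pipes' that one select() call reports as readable."""
    # Each socket pair stands in for one water pipe:
    # we watch the near end, and water enters at the far end.
    pairs = [socket.socketpair() for _ in range(num_pipes)]
    watched = [near for near, _ in pairs]
    try:
        for i in wet:
            pairs[i][1].sendall(b"water")  # make pipe i readable
        # One blocking call watches every pipe at once (the "camera").
        readable, _, _ = select.select(watched, [], [], timeout)
        return sorted(watched.index(sock) for sock in readable)
    finally:
        for near, far in pairs:
            near.close()
            far.close()

if __name__ == "__main__":
    print(ready_pipes())
```

No thread or process is created per pipe here; the single caller learns the state of all ten from one system call.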
Advantages and disadvantages of I/O multiplexing: because I/O multiplexing runs in the context of a single process, every logical flow can access the entire address space of that process, and the overhead is much lower than with multiple processes. The disadvantage is high programming complexity.
Imagine a server with millions of clients connected at the same time. A multiplexing call can watch the connections (select itself is capped at FD_SETSIZE descriptors, typically 1024, so epoll is the usual choice at this scale), but a single process still cannot do all the work those connections generate, so it needs help from multiple threads or processes. The best way to handle high concurrency is therefore to combine I/O multiplexing with multithreading or multiprocessing.
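A minimal sketch of that combination, assuming a design in which one thread does all the multiplexing and a small thread pool does the per-request work. The names `process`, `serve_once`, and `multiplex_with_pool_demo` are hypothetical, and local socket pairs stand in for client connections.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor

def process(data):
    """Hypothetical per-request work; here it only upper-cases the bytes."""
    return data.upper()

def serve_once(conns, pool, timeout=1.0):
    """One round of the event loop: wait, read ready sockets, hand work to the pool."""
    sel = selectors.DefaultSelector()  # picks epoll on Linux when available
    for conn in conns:
        sel.register(conn, selectors.EVENT_READ)
    try:
        futures = []
        for key, _ in sel.select(timeout):
            data = key.fileobj.recv(4096)  # socket is ready, so recv returns at once
            futures.append(pool.submit(process, data))
        return sorted(f.result() for f in futures)
    finally:
        sel.close()

def multiplex_with_pool_demo():
    pairs = [socket.socketpair() for _ in range(4)]
    try:
        pairs[0][1].sendall(b"hello")
        pairs[2][1].sendall(b"world")
        with ThreadPoolExecutor(max_workers=2) as pool:
            return serve_once([near for near, _ in pairs], pool)
    finally:
        for near, far in pairs:
            near.close()
            far.close()

if __name__ == "__main__":
    print(multiplex_with_pool_demo())
```

Reading in the multiplexing thread and passing only the data to the workers keeps the selector from reporting the same socket twice while a worker is still busy with it. That is one possible division of labor, not the only one.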
Multi-process model
Advantages of multi-process:
Each process is independent of the others, so a crash in a child process does not affect the stability of the main program;
Performance scales easily by adding CPUs;
Thread locking/unlocking overhead is largely avoided, which can greatly improve performance even when the algorithm running in a module is inefficient;
Each child process has its own address space (2 GB on a 32-bit system such as Windows Server 2003) and related resources, so the overall performance ceiling is very high.
Disadvantages of multi-process:
The control logic is complex, because child processes need to interact with the main program;
Data must cross process boundaries, which works poorly for large transfers; the model is best suited to small data transfers and compute-intensive work;
Process scheduling overhead is relatively high.
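The isolation described above can be observed directly on a POSIX system. The following is a hedged sketch, assuming `os.fork()` is available (Linux or another Unix); the helper names `run_in_child`, `good_task`, and `crashing_task` are hypothetical. A child that fails does not take the parent down, and a child's writes to memory are invisible to the parent, which is also why data has to be passed across the process boundary explicitly.

```python
import os

counter = [0]  # lives in the parent's address space

def good_task():
    counter[0] += 1  # changes only the child's private copy

def crashing_task():
    raise RuntimeError("simulated crash in the child")

def run_in_child(task):
    """Fork, run task() in the child, and return the child's exit code to the parent."""
    pid = os.fork()
    if pid == 0:
        # Child process: never return into the parent's code path.
        try:
            task()
            os._exit(0)
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal

if __name__ == "__main__":
    print(run_in_child(good_task), run_in_child(crashing_task), counter[0])
```

The parent keeps running after the second child fails, and `counter[0]` is still 0 in the parent because the increment happened in the child's copy of the address space.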
Multithreading model
Thread execution model: threads execute in much the same way as processes. The life cycle of every process is itself a thread, which we call the main thread. Threads are peers; the only difference between the main thread and the others is that it runs first.
Advantages of multithreading:
No need to cross process boundaries;
Program logic and control are simple;
All threads can share memory and variables directly;
Threads consume fewer total resources than processes.
Disadvantages of multithreading:
All threads share one address space with the main program, so together they are limited to a single 2 GB address space;
Synchronization and locking between threads are troublesome;
A crash in one thread can affect the stability of the whole program;
Beyond a certain number of threads, adding CPUs no longer improves performance. For example, on Windows Server 2003 the limit is reached at about 1500 threads with a 1 MB thread stack; with a 2 MB stack, the total falls short of 1500;
The performance gain from threads is limited, and with many threads the scheduling itself becomes a burden that consumes more CPU.
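Both sides of the list above show up in a short sketch: threads update one shared variable directly, with no process boundary to cross, but that shared state has to be guarded by a lock. The function name `count_with_threads` and its parameters are hypothetical, chosen only for this illustration.

```python
import threading

def count_with_threads(num_threads=4, increments=10000):
    """Have several threads increment one shared counter, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # Without the lock, the read-modify-write below could interleave
            # between threads and lose updates.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

if __name__ == "__main__":
    print(count_with_threads())
```

Every increment pays for acquiring and releasing the lock, which is the synchronization cost the multi-process model largely avoids.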
In summary: on Linux, multithreaded and multi-process programs are both ultimately built on the same kernel routine, do_fork. Only the parameters passed at creation differ, and these determine how much of the environment is shared. Linux threads exist in the kernel as lightweight processes, each with its own process table entry. In the older LinuxThreads implementation that this description follows, creation, synchronization, and deletion are all carried out in the pthread library outside the kernel. That library uses a manager thread (__pthread_manager(), one per process) to create and terminate threads, assign thread IDs, and send thread-related signals, while the caller of pthread_create() passes its request to the manager thread through a pipe. NPTL, the implementation in current glibc, no longer uses a manager thread.
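The idea that each thread is a separately scheduled kernel entity inside one process can be checked from user space. This is a small sketch assuming Python 3.8 or later, where `threading.get_native_id()` is available; the function name `thread_ids` is hypothetical. Every thread reports the same process ID, but each has its own kernel-assigned thread ID.

```python
import os
import threading

def thread_ids(num_threads=3):
    """Return (pid, native thread id) pairs collected from concurrently live threads."""
    barrier = threading.Barrier(num_threads)
    lock = threading.Lock()
    results = []

    def worker():
        # Wait until every thread exists, so the kernel cannot reuse
        # the ID of a thread that has already exited.
        barrier.wait()
        with lock:
            results.append((os.getpid(), threading.get_native_id()))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    for pid, tid in thread_ids():
        print("pid", pid, "kernel thread id", tid)
```

On Linux the native ID is the value the kernel uses to schedule the thread, so distinct IDs under one PID match the description of threads as lightweight processes that share an environment.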