From now on, whoever quizzes you about io multiplexing will be kicking a steel plate

io is a disaster area for many Java / Python / Go developers. If you have never done io-related development, you may only know the buzzwords: blocking/non-blocking, synchronous/asynchronous, and, for the more advanced, multiplexing.
Many readers have only a fuzzy idea of these concepts, and without a clear understanding it all comes down to memorizing and reciting~

Today I will walk you through the evolution of io. After that, anyone who quizzes you about io will be kicking a steel plate.

There are plenty of articles about io on the Internet, but this may be the first time you have seen the following paragraph (if it makes sense, read it; if not, skip it, and it will make sense once the rest does):

Whether on Windows or Linux, an application cannot complete an io operation directly, because handing file operation permissions straight to user programs would be very dangerous. To perform io, a program must use functions provided by the operating system kernel. We do not have to call those functions ourselves, though: Java has already wrapped them, and during development we just call the relevant API. Java has two packages for this: io offers simple read/write, and nio introduces three new concepts:

  • Buffer: data container;
  • Channel: this one is rather abstract. It is enough to know that it carries out the io operations between the program and the kernel;
  • Selector: the basis on which nio implements multiplexing;

OK, enough chit-chat. Let's sort out the relevant concepts first.

  • Synchronous or asynchronous : synchronous means tasks run in order, and the next task starts only after the current one finishes. Asynchronous is the opposite: other tasks do not wait for the current task to finish, and the ordering between tasks is usually established through events and callbacks;
  • Blocking and non-blocking : during a blocking operation the current thread is blocked and cannot do anything else. It continues only when the condition is met, for example when data is ready to be read or written. Non-blocking means the call returns immediately, whether or not the io operation has finished.
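
To make the blocking / non-blocking distinction concrete, here is a minimal C sketch for Linux/POSIX. It is only an illustration under assumptions: the helper names set_nonblocking and try_read are made up for this example, and the descriptor is assumed to be a pipe or socket. With O_NONBLOCK set, read returns at once with EAGAIN when no data is available, instead of putting the thread to sleep.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: try a single read without waiting.
 * Returns the number of bytes read (0 means end of file),
 * -1 if no data is available yet, or -2 on a real error. */
long try_read(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n >= 0)
        return (long)n;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return -1;          /* nothing ready: the call returned instead of blocking */
    return -2;
}
```

On an empty pipe, try_read returns -1 right away after set_nonblocking has been applied; without it, the same read would block the thread until data arrives.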

If the definitions alone make your head buzz, no problem. Eyes on the big screen~

Grabbing a meal

There is an Internet-famous restaurant near my company, and a lot of young ladies eat there every day. Don't get me wrong, of course. I just mean that their food is very good-looking. Er... no, I mean their young ladies are very tasty. Ugh... that's not right either.

In short, I go there often, and my way of ordering has gone through the following versions :

  • v1.0 : When I first went there, I saw everyone else queuing to order. Being an honest person, I waited for the person in front to finish before ordering, and then stood at the pick-up counter waiting for my food. Sometimes it took 15 minutes to get the meal, but standing at the pick-up counter let me see more young ladies, so at first I didn't mind;
  • v2.0 : Some time later I noticed a screen above the pick-up counter showing which order number was currently being served. Wow, it suddenly dawned on me. From then on, after ordering I would find a seat where there were plenty of young ladies, and then walk over to the pick-up counter every 5 minutes to check whether it was my turn;
  • v3.0 : Over time I got tired of looking at young ladies every day. One day I found a QR code on the table. Wow, it suddenly dawned on me again: I could order by scanning the code, and the waiter would deliver the meal to my seat;

How picking up a meal maps to io :

  • v1.0 (bio, blocking io) : queue up to order, then wait for the food. This is bio, synchronous and blocking: after the thread submits an io operation, the call cannot return until the operation completes, and the thread cannot do anything else in the meantime. It can only wait awkwardly;
  • v2.0 (nio, non-blocking io) : queue up to order, then check the pick-up counter from time to time to see whether it is your turn. This is nio, synchronous and non-blocking: the thread can return right after submitting the io operation, but it has to keep asking the operating system for the result (nio can be used to build multiplexed io, which is the focus of this article);
  • v3.0 (nio2, or aio, asynchronous io) : scan the code to order, find a seat and watch the young ladies, and the waiter delivers the meal once it is ready. This is aio, asynchronous and non-blocking: on top of nio it adds an event and callback mechanism, so the operating system actively notifies the thread once the data is ready.
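
The v2.0 habit of walking over to the counter every few minutes can be sketched in C as a loop around a non-blocking read. This is only an illustration under assumptions: the function name read_by_checking and its parameters are invented here, and the descriptor is assumed to be a pipe or socket on Linux/POSIX.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper for v2.0: look at a non-blocking fd every interval_us
 * microseconds, at most max_checks times.
 * Returns the bytes read (0 at end of file), or -1 if it gave up or failed.
 * *checks receives how many times we had to look. */
long read_by_checking(int fd, char *buf, size_t len,
                      unsigned interval_us, int max_checks, int *checks)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    *checks = 0;
    while (*checks < max_checks) {
        (*checks)++;
        ssize_t n = read(fd, buf, len);   /* returns at once, ready or not */
        if (n >= 0)
            return (long)n;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                    /* a real error, stop checking */
        usleep(interval_us);              /* the thread is free to do other work here */
    }
    return -1;                            /* still not our turn */
}
```

The thread is never stuck inside read, but it still does the asking itself, which is why this style counts as synchronous and non-blocking.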

How operating system support evolved from io to nio is a more complicated story, and following it requires understanding how the operating system implements zero copy. For that, please see Zero Copy Technology Analysis.

That is really all there is to io / nio, so don't let it confuse you. Today's topic is io multiplexing.

As usual, let's first list the questions readers ask most often. They are all answered below:

  • What is multiplexing?
  • Why do you need multiplexing?
  • What problem does multiplexing solve?
  • What exactly are the "multiple paths", and what is being reused?

Traditional network communication uses sockets, and the process is as follows:

  1. Create a socket
  2. Bind the server's ip and port to the socket
  3. Listen on the port and handle requests through accept
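
The three steps above can be sketched in C roughly as follows. This is a sketch under assumptions, not a production server: it binds to the loopback address 127.0.0.1, uses a backlog of 16, and the helper names make_listener, listener_port and accept_one are invented for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: steps 1 to 3 up to listen().
 * Binds to 127.0.0.1:port (port 0 lets the kernel pick a free port).
 * Returns the listening socket, or -1 on error. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);            /* 1. create a socket */
    if (fd == -1)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||  /* 2. bind ip and port */
        listen(fd, 16) == -1) {                                   /* 3. listen */
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: the port the kernel actually assigned, or -1. */
int listener_port(int listen_fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(listen_fd, (struct sockaddr *)&addr, &len) == -1)
        return -1;
    return ntohs(addr.sin_port);
}

/* Step 3, continued: accept() blocks the calling thread until a client connects. */
int accept_one(int listen_fd)
{
    return accept(listen_fd, NULL, NULL);
}
```

A server would typically call make_listener once and then call accept_one in a loop, which is exactly where the thread ends up waiting.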

There is nothing wrong with this process, but you must know that accept is blocking, which means a thread can handle only one request at a time. To handle multiple requests at the same time you can use multithreading, but that causes another problem: when the operating system has too many threads, a lot of resources go into managing them and into context switching (one aside: in no scenario is it true that the more threads the better; past a certain threshold, more threads means lower efficiency).

So, to handle multiple requests gracefully at the same time, the operating system kernel has to let a single thread listen on multiple sockets. That is io multiplexing.

Linux supports multiplexing in three ways:

  • Multiplexing: select
  • Multiplexing pro: poll
  • Multiplexing pro max: epoll

Let's look at how these three functions work. Once we have, their relative performance should be fairly clear.

select

// The return value is the number of ready file descriptors
int select (int __nfds, fd_set *__readfds, fd_set *__writefds, fd_set *__exceptfds, struct timeval *__timeout);

Parameter explanation:

  • __nfds: the highest-numbered file descriptor being monitored, plus 1 (don't worry about what a file descriptor is; you can think of each io object, such as a connection, as corresponding to one file descriptor);
  • __readfds, __writefds, __exceptfds: the event types the multiplexing mechanism monitors; these three are read events, write events, and exception events respectively;
  • __timeout: how long to block and wait while monitoring (NULL means wait indefinitely);

select manages multiple connections by monitoring multiple file descriptors, up to 1024 at a time (the default value of FD_SETSIZE on Linux). When select returns, we traverse the descriptor set to find the ready descriptors and then process them.

So far, the pain point that a thread can only manage one connection is solved, but there are still two problems:

  • The number of descriptors monitored per call is limited. Although the limit can be changed by modifying the macro definition, that approach is too drastic;
  • We have to call select over and over to obtain the ready file descriptors, and the traversal also costs CPU time;

So Linux introduced poll to remove the limit on the number of file descriptors

poll

// The return value is also the number of ready file descriptors
int poll (struct pollfd *__fds, nfds_t __nfds, int __timeout);

Parameter explanation:

  • __fds: an array of pollfd structures, each holding a file descriptor to monitor and the event types to monitor on it;
  • __nfds: the number of elements in the pollfd array; there is no fixed size limit;
  • __timeout: how long to block and wait while monitoring, in milliseconds (a negative value means wait indefinitely);

Compared with select, poll does remove the limit on the number of file descriptors, but we still have to traverse every file descriptor to check whether it is ready.
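
A minimal C sketch of poll used this way follows. The helper name poll_readable and its parameters are assumptions made for this illustration. The array is allocated on the heap to show that its length is up to the caller, and the loop after poll is the traversal mentioned above.

```c
#include <poll.h>
#include <stdlib.h>

/* Hypothetical helper: wait until any of fds[0..count-1] is readable.
 * Stores the readable descriptors in ready[] and returns how many there are
 * (0 on timeout, -1 on error). */
int poll_readable(const int *fds, int count, int timeout_ms, int *ready)
{
    if (count <= 0)
        return 0;

    /* The array can be as long as we like: no FD_SETSIZE-style cap. */
    struct pollfd *pfds = calloc((size_t)count, sizeof *pfds);
    if (pfds == NULL)
        return -1;

    for (int i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;       /* we only care about read events here */
    }

    int n = poll(pfds, (nfds_t)count, timeout_ms);

    int found = 0;
    if (n > 0) {
        for (int i = 0; i < count; i++)        /* still a full traversal */
            if (pfds[i].revents & POLLIN)
                ready[found++] = pfds[i].fd;
    }

    free(pfds);
    return n < 0 ? -1 : found;
}
```

The kernel reports results in the revents field of each entry, so the events the caller asked for stay intact between calls, but finding the ready entries is still a linear scan.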

So Linux 2.6 introduced epoll

epoll

epoll has three functions: epoll_create, epoll_ctl and epoll_wait

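// Create an epoll instance; size indicates how many file descriptors will be monitored
// (it is only a hint, ignored since Linux 2.6.8, but it must be greater than 0)
int epoll_create(int size);

// Add a connection to the epoll watch list. The parameters are:
//   epfd:  the return value of epoll_create()
//   op:    the modification to perform (EPOLL_CTL_ADD, EPOLL_CTL_MOD or EPOLL_CTL_DEL)
//   fd:    the file descriptor to monitor
//   event: the event types to monitor
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

// Block and wait, then return the ready file descriptors. The parameters are:
//   epfd:      the return value of epoll_create()
//   events:    the collection that receives the events
//   maxevents: the size of events
//   timeout:   the timeout, in milliseconds
int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);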

The epoll model places no fixed limit on the number of descriptors monitored, and it returns the ready descriptors directly, so there is no need to traverse the whole set
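
The three calls fit together roughly as in the following C sketch for Linux. The helper names epoll_watch and epoll_ready, the fixed buffer of 64 events, and the choice to monitor only read events are all assumptions made for this illustration.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance that watches fds[] for read events.
 * Returns the epoll descriptor, or -1 on error. */
int epoll_watch(const int *fds, int count)
{
    int epfd = epoll_create(1);        /* the size argument is only a hint, but must be > 0 */
    if (epfd == -1)
        return -1;

    for (int i = 0; i < count; i++) {
        struct epoll_event ev = {0};
        ev.events = EPOLLIN;
        ev.data.fd = fds[i];           /* handed back to us by epoll_wait */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) == -1) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

/* Hypothetical helper: wait for events and copy the ready descriptors to ready[].
 * Returns how many are ready (at most max_ready, capped at 64),
 * 0 on timeout, -1 on error. */
int epoll_ready(int epfd, int *ready, int max_ready, int timeout_ms)
{
    struct epoll_event events[64];
    if (max_ready > 64)
        max_ready = 64;

    int n = epoll_wait(epfd, events, max_ready, timeout_ms);
    for (int i = 0; i < n; i++)        /* only ready descriptors come back: no full scan */
        ready[i] = events[i].data.fd;
    return n;
}
```

Registration happens once in epoll_watch, and each epoll_ready call afterwards costs work proportional to the number of ready descriptors rather than to the number being watched.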

On Linux, the famous Redis is built on the epoll model, so even with a single thread it copes easily with highly concurrent client access


Having said so much, what exactly is io multiplexing? What is reused?

I find these two questions hard to explain. Maybe I will truly understand them once I dig properly into operating system internals.

Personal opinion on reuse:

First of all, it is definitely not the sockets that are reused. One request corresponds to one socket, and that would not change even if Jesus showed up;

Strictly speaking it is not the thread that is reused either, although that reading is understandable, since one thread does manage multiple sockets;

The so-called reuse may not have to mean reusing one particular thing. I think it refers to the behavior of managing multiple sockets through a single request to the kernel.


Understanding the functions mentioned above is enough; there is no need to dig much deeper. I read a lot of documentation to gather this material, and spending too much time on it just makes your head buzz~ Mine is buzzing right now. Sure enough, the more you know, the less you understand...

OK, I'm done


Origin blog.csdn.net/qq_33709582/article/details/123137787