What does I/O multiplexing mean? (Reprint)

Reposted from: https://www.zhihu.com/question/32163005

What does I/O multiplexing mean? - answer by Luozhi Yu - Zhihu https://www.zhihu.com/question/32163005/answer/55772739

This answer explains it very clearly.

Suppose you are in charge of air traffic control at an airport. You have to manage every flight route through your airport, inbound and outbound. Some flights need to wait on the apron, and some need to go to a boarding gate to pick up passengers.

What would you do?

The simplest approach is to hire a large number of air traffic controllers and have each one watch a single aircraft through the whole process: arrival, boarding, queuing for takeoff, departure, and route monitoring, right up to the handover to the next airport.

Then the problems start:

  • You soon find the control tower packed with controllers. As soon as traffic gets a little busy, new controllers can no longer squeeze in.
  • Controllers need to coordinate with each other. With one or two people in the room that is fine, but with dozens of people it basically turns into a vegetable market.
  • Controllers often need to update shared resources, such as the departure display or the departure schedule for the next hour. In the end you are surprised to find that everyone spends their time fighting over these resources.

 

In reality, it is perfectly routine for one air traffic controller to handle dozens of aircraft at the same time. How do they do it?
They use this thing:

It is called a flight progress strip. Each block represents one flight, and each slot represents a state. A single controller can manage a set of these blocks (a group of flights), and the job is simply this: whenever new information about a flight arrives, move the corresponding block into a different slot.

 

These strips have not been retired, by the way; they have just gone electronic.

Isn't that suddenly far more efficient? One control tower can now schedule several times, even dozens of times, as many routes as with the first approach.

Now think of each flight route as a sock (an I/O stream), and of air traffic control as your server-side sock management code.

The first approach is the most traditional multi-process concurrency model: every new I/O stream that comes in is assigned a new process to manage it.
The second approach is I/O multiplexing: a single thread manages multiple I/O streams by recording and tracking the state of each one (each sock).

In fact, the unfortunate Chinese translation of the term, "I/O 多路复用", may be the reason this concept is so hard to understand in Chinese. In English it is simply called I/O multiplexing. If you search for what multiplexing means, you will almost always get this picture:

So most people immediately think of "one network cable shared by multiple socks", including several of the answers above. But whether you use multiple processes or I/O multiplexing, there is only one network cable anyway. Sharing one cable among multiple socks is implemented in the kernel and driver layers.

It is worth repeating: the "multiplexing" in I/O multiplexing actually means that a single thread manages multiple I/O streams at once by recording and tracking the state of each sock (I/O stream), just like the slots of the flight progress strip board in the control tower. It was invented to push server throughput as high as possible.
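The idea of one thread keeping a state record per stream can be sketched in a few lines of Python. This is only an illustration, not how any particular server does it: the flight names, the `multiplex` function, and the use of local socket pairs in place of real network connections are all assumptions made for the example.

```python
import selectors
import socket


def multiplex(streams):
    """Read from several sockets in ONE thread until every peer has closed.

    Returns a dict mapping each stream's name to the bytes received on it.
    """
    sel = selectors.DefaultSelector()
    state = {}  # the "strip board": one record per I/O stream
    for name, sock in streams.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)
        state[name] = bytearray()

    open_streams = len(streams)
    while open_streams:
        # Block until at least one stream has something to report.
        for key, _events in sel.select():
            chunk = key.fileobj.recv(4096)
            if chunk:
                state[key.data] += chunk      # update this stream's record
            else:
                sel.unregister(key.fileobj)   # peer closed: retire the strip
                open_streams -= 1
    sel.close()
    return {name: bytes(buf) for name, buf in state.items()}


# Three hypothetical "flights", each a connected local socket pair.
pairs = {name: socket.socketpair() for name in ("CA101", "MU202", "CZ303")}
for name, (_ours, theirs) in pairs.items():
    theirs.sendall(f"{name} ready".encode())
    theirs.close()

received = multiplex({name: ours for name, (ours, _t) in pairs.items()})
for ours, _theirs in pairs.values():
    ours.close()
print(received)
```

No extra thread or process is created per stream; the only thing that grows with the number of streams is the `state` dictionary.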

 

Does that sound like a mouthful? A picture makes it clear.

 


Within a single thread, multiple I/O streams are carried at the same time by flipping a selector switch among them. (Those who studied EE may now stand up and solemnly declare that this is called "time-division multiplexing".)

 

What, you still don't understand "what happens when a request arrives and nginx uses epoll to receive it"? Look at this picture a few more times and you will. As a reminder: nginx has many connections coming in, and epoll monitors all of them. Then, like a selector switch, it turns to whichever connection has data and calls the corresponding code to handle it.
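The "selector switch" behaviour described above can be sketched with Python's `selectors` module. This is a toy model and not nginx's actual code: the connection labels, the `make_handler` helper, and the local socket pairs standing in for client connections are assumptions made for illustration.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
log = []


def make_handler(label):
    """Build a hypothetical per-connection handler that records what it reads."""
    def handle(conn):
        log.append((label, conn.recv(1024)))
    return handle


# Three hypothetical connections; the handler is stored next to each one.
clients = []
for label in ("conn-a", "conn-b", "conn-c"):
    server_side, client_side = socket.socketpair()
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, data=make_handler(label))
    clients.append((server_side, client_side))

# Only two of the three clients actually send anything.
clients[0][1].sendall(b"GET /a")
clients[2][1].sendall(b"GET /c")

# Turn the switch: ask which connections have data, then call the
# handler that was registered for each of them.
while len(log) < 2:
    for key, _events in sel.select(timeout=1):
        handler = key.data
        handler(key.fileobj)

for server_side, client_side in clients:
    sel.unregister(server_side)
    server_side.close()
    client_side.close()
sel.close()

print(sorted(log))
```

The idle connection (`conn-b`) never wakes the loop, which is the point: attention goes only to the streams that have data.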

------------------------------------------
Once you understand this basic concept, the rest is easy to explain.

select, poll, and epoll are all concrete implementations of I/O multiplexing. The reason all three of them exist is simply that they appeared one after another.

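After the concept of I/O multiplexing was proposed, select was the first implementation (implemented in BSD around 1983).

Soon after select was implemented, many problems came to light.

  • select modifies the argument arrays passed to it, which is very unfriendly for a function that has to be called many times.
  • When data shows up on any sock (I/O stream), select merely returns; it does not tell you which sock has the data, so you have to check them one by one yourself. With a dozen socks that may be fine, but scanning tens of thousands of socks on every call is a rather extravagant waste.
  • select can monitor only 1024 connections (no, this has nothing to do with Caoliu); on Linux the limit is defined in a header file, see FD_SETSIZE.
  • select is not thread-safe. Suppose you add a sock to select and then another thread suddenly decides that the sock is no longer needed and should be reclaimed. Sorry, select does not support that. And if you go so far as to close the sock, select's standard behavior is... er... unspecified. That is written right in the documentation.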

“If a file descriptor being monitored by select() is closed in another thread, the result is unspecified”
Pretty bold, isn't it?
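A small Python sketch of calling select, under some assumptions: the ten connections are local socket pairs rather than real clients, and Python's `select.select` wrapper already hands back the ready sockets, so the "find it yourself" scan the text describes happens inside the wrapper rather than in your code. What remains visible is that the full watch list has to be passed in again on every call.

```python
import select
import socket

# Ten hypothetical connections, modelled as local socket pairs.
pairs = [socket.socketpair() for _ in range(10)]
watched = [ours for ours, _theirs in pairs]

# Only connection 7 has data waiting.
pairs[7][1].sendall(b"hello")

# The whole list is handed over on every call; nothing is remembered
# between calls.
readable, _writable, _errors = select.select(watched, [], [], 1.0)

ready_indexes = [watched.index(sock) for sock in readable]
print(ready_indexes)

for ours, theirs in pairs:
    ours.close()
    theirs.close()
```

In C the same call rewrites the `fd_set` arguments in place, so the caller has to rebuild them before each call and test every descriptor with `FD_ISSET` afterwards.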

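So 14 years later (1997), a group of people implemented poll, which fixed many of select's problems, for example:

  • poll removes the 1024-connection limit. So how many connections do you want? Whatever makes you happy.
  • By design, poll no longer modifies the array passed in. This does depend on your platform, though, so it pays to stay careful.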
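For comparison, here is a minimal sketch of the same kind of check done with poll, using Python's `select.poll` (available on Unix-like systems, not on Windows). The socket pairs and the `by_fd` lookup table are assumptions made for the example.

```python
import select
import socket

# Ten hypothetical connections, modelled as local socket pairs.
pairs = [socket.socketpair() for _ in range(10)]
by_fd = {ours.fileno(): index for index, (ours, _theirs) in enumerate(pairs)}

poller = select.poll()
for ours, _theirs in pairs:
    # What we are interested in (POLLIN) is kept separate from what
    # poll reports back, so nothing has to be rebuilt between calls.
    poller.register(ours, select.POLLIN)

pairs[2][1].sendall(b"x")
pairs[5][1].sendall(b"y")

# The timeout is in milliseconds; the result is a list of (fd, event mask).
events = poller.poll(1000)
ready = sorted(by_fd[fd] for fd, mask in events if mask & select.POLLIN)
print(ready)

for ours, theirs in pairs:
    ours.close()
    theirs.close()
```

There is no fixed-size descriptor set here, so the number of watched descriptors is bounded by the process's file descriptor limit rather than by FD_SETSIZE.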

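The 14-year wait was not really a matter of efficiency. Hardware in those days was simply too weak: a server handling more than a thousand connections was practically godlike, so select was good enough for a long time.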

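But poll is still not thread-safe, which means that no matter how powerful the server is, you can only handle one group of I/O streams in one thread. You can of course combine it with multiple processes, but then you get all the problems of multiple processes.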

So 5 years later, in 2002, the master Davide Libenzi implemented epoll.

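epoll can be considered the newest implementation of I/O multiplexing. It fixes the vast majority of the problems in poll and select, for example:

  • epoll is now thread-safe.
  • epoll now tells you not only that there is data somewhere in the group of socks, but also exactly which sock has data, so you no longer have to search for it yourself.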
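The second point can be seen in a short sketch with Python's `select.epoll`, which exists only on Linux (hence the `hasattr` guard). The `ready_with_epoll` helper and the 100 local socket pairs are hypothetical stand-ins for real connections.

```python
import select
import socket


def ready_with_epoll(socks, timeout=1.0):
    """Return the file descriptors that epoll reports as readable (Linux only)."""
    ep = select.epoll()
    try:
        for sock in socks:
            ep.register(sock.fileno(), select.EPOLLIN)
        # The timeout is in seconds; only descriptors with events come back.
        return sorted(fd for fd, mask in ep.poll(timeout) if mask & select.EPOLLIN)
    finally:
        ep.close()


if hasattr(select, "epoll"):
    # 100 hypothetical connections, of which exactly one has data.
    pairs = [socket.socketpair() for _ in range(100)]
    watched = [ours for ours, _theirs in pairs]
    pairs[42][1].sendall(b"ping")

    ready = ready_with_epoll(watched)
    expected = [pairs[42][0].fileno()]
    print(ready == expected)

    for ours, theirs in pairs:
        ours.close()
        theirs.close()
```

The returned list contains one entry no matter how many descriptors are registered, so the caller never has to walk the idle ones.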

 

The original epoll patch is still around; you can find it through the link below:
/dev/epoll Home Page

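Here is an impressive chart showing the godlike performance of the time (the links to the test code are all dead; if anyone manages to dig it up, the details of how it was measured would be worth studying).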


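The horizontal axis, "Dead connections", is simply the number of connections; it has that name only because the test tool was called deadcon. The vertical axis is the number of requests handled per second. As you can see, the number of requests epoll handles per second barely drops as the number of connections grows. poll and /dev/poll fare far worse.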

 

But epoll has one fatal flaw: only Linux supports it. On BSD, for example, the corresponding implementation is kqueue.

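And would I volunteer the fact that some well-known Chinese vendors did something as brain-dead as stripping epoll out of Android? What, you say nobody uses Android as a server? Then you are looking down on P2P software.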

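One of nginx's design principles is to use the most efficient I/O multiplexing model available on the target platform, which is why this setting exists. In general, use epoll/kqueue whenever you can.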
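Python's standard library follows a similar principle, which makes for an easy way to see what your own platform offers. The sketch below only inspects the `selectors` module; the `candidates` list is written out by hand for illustration and is not an official preference order.

```python
import selectors

# Backends the standard library may expose; not every name exists on
# every platform (for example, EpollSelector is Linux-only).
candidates = ["EpollSelector", "KqueueSelector", "DevpollSelector",
              "PollSelector", "SelectSelector"]
available = [name for name in candidates if hasattr(selectors, name)]

# DefaultSelector is an alias for the most efficient implementation
# available on the current platform.
chosen = selectors.DefaultSelector

print("available:", available)
print("chosen:   ", chosen.__name__)
```

On a typical Linux machine the chosen class should be the epoll-based one, and on BSD or macOS the kqueue-based one, but the output depends on the platform where it runs.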

Details are here:
Connection processing methods

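PS: All of the comparisons above assume high concurrency. If your concurrency is low, it really makes no difference which one you use. But for something like the transcoding servers in Opera's (Oupeng's) data centers, which routinely handle tens or hundreds of thousands of concurrent connections, I might as well go bang my head against the wall if I didn't use epoll.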

Is I/O multiplexing meant to be used together with the various kinds of pools?

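Thread pools and sock pools both depend on the specific implementation. So you need to tell me whether the thread pool you mean is a particular implementation or a "thread pool" in the general sense.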

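A thread pool in the general sense is just a way of managing thread lifecycles in a multi-threaded program (idle threads are not terminated but put back into the pool for reuse, to avoid the cost of creating and destroying threads). It has nothing to do with the concurrency model. Thread pool = controllers' dormitory. Without a dormitory, controllers have to commute from home; with one, the cost of getting to and from work goes down. But it has nothing to do with how the controllers do their job.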

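That said, the thread pool mentioned in a great many articles is actually used together with the multi-threaded model (the multi-process model with processes replaced by threads). In that case it is usually one thread per sock.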

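If you combine I/O multiplexing with a thread pool, it is usually one thread per group of I/O streams, possibly with a separate bunch of worker threads that process the actual data.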
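That split can be sketched as follows: one thread multiplexes a group of socks and only reads, while a small pool of workers does the processing. Everything here is an assumption made for illustration: the stream names, the `process` function, the pool size, and the local socket pairs used in place of client connections.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def process(name, payload):
    """Hypothetical data processing, done off the I/O thread."""
    return name, payload.decode().upper()


sel = selectors.DefaultSelector()
pairs = {name: socket.socketpair() for name in ("s1", "s2", "s3")}
for name, (ours, theirs) in pairs.items():
    ours.setblocking(False)
    sel.register(ours, selectors.EVENT_READ, data=name)
    theirs.sendall(f"payload from {name}".encode())

futures = []
with ThreadPoolExecutor(max_workers=2) as workers:
    # The I/O thread: watch the whole group, read, and hand the data off.
    while len(futures) < len(pairs):
        for key, _events in sel.select(timeout=1):
            payload = key.fileobj.recv(4096)
            sel.unregister(key.fileobj)
            futures.append(workers.submit(process, key.data, payload))
    results = dict(future.result() for future in futures)

sel.close()
for ours, theirs in pairs.values():
    ours.close()
    theirs.close()
print(results)
```

The number of worker threads is independent of the number of socks, which is what separates this layout from the one-thread-per-sock model.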

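As for how sockets communicate over one port versus multiple ports: have you noticed that accept returns a new socket descriptor? The server listens on only one port, and every time a new request arrives, a new sock is created to talk to that client.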

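Each socket is one I/O stream. Leaving the protocol aside, what distinguishes one from another really is just the IP address and port at each end (the server may have several IPs), regardless of whether the socket is in a pool.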
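The behaviour of accept is easy to observe on the loopback interface. The sketch below assumes that binding to 127.0.0.1 is allowed in your environment; port 0 asks the operating system to pick any free port.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
listener.listen()
port = listener.getsockname()[1]

# Two clients connect to the same listening port.
clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(2)]
# Each accept() call returns a NEW socket for one client.
accepted = [listener.accept()[0] for _ in clients]

local_ports = {conn.getsockname()[1] for conn in accepted}
peer_addrs = {conn.getpeername() for conn in accepted}
descriptors = {listener.fileno()} | {conn.fileno() for conn in accepted}

print(local_ports == {port})  # both accepted socks use the listening port
print(len(peer_addrs))        # the client ends differ
print(len(descriptors))       # listener plus two accepted socks

for sock in clients + accepted + [listener]:
    sock.close()
```

The server-side port is the same for every accepted sock; the streams are told apart by the client's address and port, and each one has its own descriptor.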

Origin www.cnblogs.com/raohuabing/p/12216962.html