Detailed explanation of IO multiplexing mechanism

Server-side programming often needs to construct high-performance IO models. There are four common IO models:

(1) Synchronous blocking IO (Blocking IO): the traditional IO model.

(2) Synchronous non-blocking IO (Non-blocking IO): Sockets are created in blocking mode by default; non-blocking IO requires the socket to be set to NONBLOCK. Note that the NIO mentioned here is not Java's NIO (New IO) library.

(3) IO multiplexing (IO Multiplexing): the classic Reactor design pattern, sometimes called asynchronous blocking IO. Selector in Java and epoll in Linux both implement this model.

(4) Asynchronous IO (Asynchronous IO): the classic Proactor design pattern, also known as asynchronous non-blocking IO.

 

The concepts of synchronous and asynchronous describe how the user thread interacts with the kernel. Synchronous means that after the user thread initiates an IO request, it must wait for, or poll for, the completion of the kernel IO operation before it can continue executing. Asynchronous means that the user thread continues executing right after initiating the IO request; when the kernel IO operation completes, the user thread is notified, or the callback function registered by the user thread is called.

The concepts of blocking and non-blocking describe how the user thread calls kernel IO operations. Blocking means that the IO operation returns to user space only after it has fully completed. Non-blocking means that the call returns a status value to the user immediately, without waiting for the IO operation to complete.

 

In addition, the signal-driven IO (Signal Driven IO) model described by Richard Stevens in Volume 1 of "Unix Network Programming" is not covered in this article because it is not commonly used. Next, we analyze the implementation principles of the four common IO models in detail. For convenience, we use the IO read operation as the example throughout.

 

1. Synchronous blocking IO

 

The synchronous blocking IO model is the simplest IO model. User threads are blocked when the kernel performs IO operations.

Figure 1 Synchronous blocking IO

As shown in Figure 1, the user thread initiates an IO read through the read system call, which switches from user space to kernel space. The kernel waits until the data packet arrives and then copies the received data to user space, completing the read operation.

The pseudo-code description of the user thread using the synchronous blocking IO model is:

{

read(socket, buffer);

process(buffer);

}

That is, the user must wait for read to copy the data from the socket into the buffer before it can process the received data. The user thread is blocked for the entire IO request, so it cannot do anything else while the request is outstanding, and the CPU is underutilized.
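
The blocking behavior described above can be observed with a small Python sketch. It is only an illustration under stated assumptions: the connected pair from socket.socketpair stands in for a real network connection, and the names blocking_read_demo and send_later are made up for this example. A helper thread sends the data after a delay, and the main thread measures how long it was stuck inside the read.

```python
import socket
import threading
import time


def blocking_read_demo(delay=0.2):
    """Return (data, seconds_blocked) for one blocking read (illustrative helper)."""
    # A connected pair of sockets; they are in blocking mode by default.
    reader, writer = socket.socketpair()

    def send_later():
        time.sleep(delay)          # simulate a packet that arrives late
        writer.sendall(b"hello")

    sender = threading.Thread(target=send_later)
    sender.start()

    start = time.monotonic()
    data = reader.recv(1024)       # the calling thread is suspended here
    blocked = time.monotonic() - start

    sender.join()
    reader.close()
    writer.close()
    return data, blocked


if __name__ == "__main__":
    data, blocked = blocking_read_demo()
    print(data, "blocked for about %.2f s" % blocked)
```

While recv is waiting, the calling thread does no other work, which is the underutilization the paragraph above refers to.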

 

2. Synchronous non-blocking IO

 

Synchronous non-blocking IO builds on synchronous blocking IO by setting the socket to NONBLOCK. This way, the user thread returns immediately after initiating an IO request.

 

Figure 2 Synchronous non-blocking IO

As shown in Figure 2, because the socket is non-blocking, the user thread returns immediately when it initiates an IO request. If the data has not arrived yet, however, nothing is read, and the user thread must keep issuing IO requests until the data arrives; only then is the data actually read and execution continues.

The pseudo-code description of the user thread using the synchronous non-blocking IO model is:

{

while(read(socket, buffer) != SUCCESS)

;

process(buffer);

}

That is, the user must call read repeatedly, trying to read the data in the socket, and can process the received data only once a read succeeds. Although the user thread returns immediately from each IO request, it still has to poll continuously while waiting for the data, which consumes a lot of CPU. This model is rarely used directly; instead, the non-blocking IO feature is used as a building block in other IO models.
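
A minimal Python sketch of this polling loop follows. It assumes a local socket.socketpair in place of a real connection, and polling_read_demo is a name invented for this example. In Python, a non-blocking recv with no data available raises BlockingIOError instead of returning a status value, so the loop counts how many reads failed before the data arrived.

```python
import socket
import threading
import time


def polling_read_demo(delay=0.1):
    """Return (data, failed_attempts) for a busy-polling read (illustrative helper)."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)      # put the socket into non-blocking mode

    def send_later():
        time.sleep(delay)          # the data arrives only after a delay
        writer.sendall(b"hello")

    sender = threading.Thread(target=send_later)
    sender.start()

    failed_attempts = 0
    while True:
        try:
            data = reader.recv(1024)   # returns or raises immediately
            break
        except BlockingIOError:        # no data yet, try again
            failed_attempts += 1

    sender.join()
    reader.close()
    writer.close()
    return data, failed_attempts


if __name__ == "__main__":
    data, failed = polling_read_demo()
    print(data, "after", failed, "failed attempts")
```

The number of failed attempts is typically very large for even a short delay, which makes the CPU cost of busy polling visible.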

 

3. IO multiplexing

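The IO multiplexing model is built on the demultiplexing function select provided by the kernel. Using select avoids the busy polling of the synchronous non-blocking IO model.

Figure 3 The demultiplexing function select

As shown in Figure 3, the user first adds the sockets that need IO to select, then blocks waiting for the select system call to return. When data arrives, the socket is activated and select returns. The user thread then issues the actual read request, reads the data, and continues executing.

In terms of flow, an IO request made through select is not very different from the synchronous blocking model. It even adds the extra work of registering the sockets to monitor and calling select, so for a single request it is less efficient. The biggest advantage of select, however, is that the user can handle IO requests on multiple sockets within one thread. The user registers multiple sockets and then repeatedly calls select to read from whichever sockets have been activated, which achieves the goal of handling multiple IO requests in the same thread. In the synchronous blocking model, this requires multiple threads.

The pseudo-code description of the user thread using the select function is: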

{

select(socket);

while(1) {

sockets = select();

for(socket in sockets) {

if(can_read(socket)) {

read(socket, buffer);

process(buffer);

}

}

}

}

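Before the while loop, the socket is added to the set that select monitors. Inside the loop, select is called repeatedly to obtain the activated sockets, and as soon as a socket is readable, read is called to read its data.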
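
The same loop can be sketched in Python with the standard select.select call. This is an illustration, not a server: three local socket pairs stand in for client connections, only two of them send data, and the names select_demo and conn0 to conn2 are invented for the example. A single thread waits on all three sockets and reads only from those that select reports as readable.

```python
import select
import socket


def select_demo():
    """Serve three sockets from one thread; return the messages received."""
    pairs = [socket.socketpair() for _ in range(3)]
    readers = [reader for reader, _ in pairs]
    names = {reader: "conn%d" % i for i, reader in enumerate(readers)}

    # Only the first and the third peer send anything.
    pairs[0][1].sendall(b"a")
    pairs[2][1].sendall(b"c")

    received = {}
    pending = 2                    # number of messages we expect
    while pending:
        # Block until at least one monitored socket is readable (or 1 s passes).
        readable, _, _ = select.select(readers, [], [], 1.0)
        if not readable:
            break                  # timed out, stop waiting
        for sock in readable:
            received[names[sock]] = sock.recv(1024)
            pending -= 1

    for reader, writer in pairs:
        reader.close()
        writer.close()
    return received


if __name__ == "__main__":
    print(select_demo())
```

The socket that never receives data is simply never returned by select, so the thread does not waste a read on it.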

 

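However, the advantages of select are not limited to this. Although the approach above lets a single thread handle multiple IO requests, each IO request still blocks (on the select function), and the average time per request is even longer than in the synchronous blocking IO model. If the user thread only registers the sockets or IO requests it is interested in, goes on with its own work, and handles the data when it arrives, CPU utilization can be improved.

The IO multiplexing model uses the Reactor design pattern to implement this mechanism.

Figure 4 Reactor design pattern

As shown in Figure 4, the abstract class EventHandler represents an IO event handler. It owns the IO file handle Handle (obtained through get_handle) and the operation on that Handle, handle_event (read, write, and so on). Subclasses of EventHandler can customize the handler's behavior. The Reactor class manages EventHandlers (registration, removal, and so on) and uses handle_events to implement the event loop, which repeatedly calls the demultiplexing function select of the synchronous event demultiplexer (usually the kernel). select blocks until some file handle is activated (readable, writable, and so on) and then returns, at which point handle_events calls the handle_event of the event handler associated with that file handle to perform the corresponding operation.

Figure 5 IO multiplexing

As shown in Figure 5, with the Reactor approach, the work of polling the IO operation status, which the user thread previously did itself, is handed over to the handle_events event loop. After registering its event handler, the user thread can continue with other work (asynchronous), while the Reactor thread is responsible for calling the kernel's select function to check the socket status. When a socket is activated, the corresponding user thread is notified (or the user thread's callback is executed), and handle_event reads and processes the data. Because select blocks, the IO multiplexing model is also called the asynchronous blocking IO model. Note that the blocking referred to here means that the thread is blocked while select executes, not that the socket is blocking. With the IO multiplexing model, sockets are generally set to NONBLOCK, but this does not affect the flow described here: by the time the user issues the IO request, the data has already arrived, so the user thread does not block on the read.

The pseudo-code description of the user thread using the IO multiplexing model is: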

void UserEventHandler::handle_event() {

if(can_read(socket)) {

read(socket, buffer);

process(buffer);

}

}

 

{

Reactor.register(new UserEventHandler(socket));

}

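The user overrides the handle_event function of EventHandler to read and process the data, and the user thread only needs to register its EventHandler with the Reactor. The pseudo-code of the handle_events event loop in the Reactor is roughly as follows.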

Reactor::handle_events() {

while(1) {

sockets = select();

for(socket in sockets) {

get_event_handler(socket).handle_event();

}

}

}

The event loop repeatedly calls select to obtain the activated sockets, looks up the EventHandler that corresponds to each socket, and executes its handle_event function.
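
A compact Reactor can be sketched in Python on top of the standard selectors module. The class and method names follow the pseudo-code above (EventHandler, Reactor, handle_events, handle_event), but the details are assumptions made for this illustration: handlers are one-shot and remove themselves after a single read, the loop stops when no handlers remain or when a wait times out, and local socket pairs stand in for client connections.

```python
import selectors
import socket


class EventHandler:
    """Base class: owns a handle and reacts when it becomes ready."""

    def __init__(self, sock):
        self.sock = sock

    def get_handle(self):
        return self.sock

    def handle_event(self):
        raise NotImplementedError


class Reactor:
    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def register(self, handler):
        # The handler is stored as the selector key's data.
        self._selector.register(handler.get_handle(), selectors.EVENT_READ, handler)

    def remove(self, handler):
        self._selector.unregister(handler.get_handle())

    def handle_events(self, timeout=1.0):
        """Event loop: runs until no handlers remain or a wait times out."""
        while self._selector.get_map():
            ready = self._selector.select(timeout)   # blocks, like select
            if not ready:
                break
            for key, _mask in ready:
                key.data.handle_event()


class UserEventHandler(EventHandler):
    def __init__(self, sock, reactor, results):
        super().__init__(sock)
        self.reactor = reactor
        self.results = results

    def handle_event(self):
        self.results.append(self.sock.recv(1024))    # read and "process"
        self.reactor.remove(self)                     # one-shot handler


def reactor_demo():
    reactor, results = Reactor(), []
    pairs = [socket.socketpair() for _ in range(2)]
    for i, (reader, writer) in enumerate(pairs):
        reader.setblocking(False)
        reactor.register(UserEventHandler(reader, reactor, results))
        writer.sendall(b"msg%d" % i)
    reactor.handle_events()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return sorted(results)


if __name__ == "__main__":
    print(reactor_demo())
```

The user code only defines UserEventHandler and registers it; all waiting happens inside Reactor.handle_events.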

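IO multiplexing is the most commonly used IO model, but it is not asynchronous "all the way", because it relies on the select system call, which blocks the thread. For this reason IO multiplexing can only be called asynchronous blocking IO, not true asynchronous IO.

 

4. Asynchronous IO

 

"True" asynchronous IO requires stronger support from the operating system. In the IO multiplexing model, the event loop notifies the user thread of the status of a file handle, and the user thread reads and processes the data itself. In the asynchronous IO model, by the time the user thread is notified, the kernel has already read the data and placed it in the buffer the user thread specified; the kernel notifies the user thread after the IO completes, and the user thread can use the data directly.

The asynchronous IO model uses the Proactor design pattern to implement this mechanism.

Figure 6 Proactor design pattern

As shown in Figure 6, the Proactor pattern and the Reactor pattern are structurally similar, but they differ considerably in how the user (Client) uses them. In the Reactor pattern, the user thread registers a listener for the events it is interested in with the Reactor object, and the event handler is called when the event fires. In the Proactor pattern, the user thread registers the AsynchronousOperation (read, write, and so on), the Proactor, and the CompletionHandler to be invoked when the operation completes with the AsynchronousOperationProcessor. The AsynchronousOperationProcessor uses the Facade pattern to expose a set of asynchronous operation APIs (read, write, and so on) to the user. After calling an asynchronous API, the user thread continues with its own work. The AsynchronousOperationProcessor starts an independent kernel thread to perform the asynchronous operation, which makes the operation truly asynchronous. When the asynchronous IO operation completes, the AsynchronousOperationProcessor retrieves the Proactor and CompletionHandler that the user thread registered together with the AsynchronousOperation, and forwards the CompletionHandler, along with the result data of the IO operation, to the Proactor. The Proactor is responsible for calling back the completion handler function handle_event of each asynchronous operation. Although each asynchronous operation in the Proactor pattern can be bound to its own Proactor object, operating systems generally implement the Proactor as a Singleton so that completion events can be dispatched centrally.

Figure 7 Asynchronous IO

As shown in Figure 7, in the asynchronous IO model the user thread issues the read request directly through the asynchronous IO API provided by the kernel. The call returns immediately, and the user thread continues executing its own code. By this point, however, the user thread has registered the AsynchronousOperation and the CompletionHandler with the kernel, and the operating system starts an independent kernel thread to handle the IO operation. When the data requested by read arrives, the kernel reads it from the socket and writes it into the buffer specified by the user. Finally, the kernel dispatches the read data and the CompletionHandler registered by the user thread to its internal Proactor, and the Proactor notifies the user thread that the IO has completed (usually by calling the completion handler function the user thread registered), which completes the asynchronous IO.

The pseudo-code description of the user thread using the asynchronous IO model is: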

void UserCompletionHandler::handle_event(buffer) {

process(buffer);

}

 

{

aio_read(socket, new UserCompletionHandler);

}

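The user overrides the handle_event function of CompletionHandler to process the data; the parameter buffer holds the data that the Proactor has already prepared. The user thread calls the asynchronous IO API provided by the kernel directly and registers the overridden CompletionHandler.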
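
The completion-style programming model can be tried out with Python's asyncio, as a rough sketch under some assumptions. The caller starts a read with loop.sock_recv, attaches a completion callback, and carries on with other work; the callback receives data that has already been read. Note that this shows the calling convention only: on Linux the default asyncio loop is built on a selector (readiness multiplexing), while on Windows the default loop is a proactor loop built on I/O completion ports. The function process and the use of socket.socketpair are choices made for this example.

```python
import asyncio
import socket


def process(buffer, results):
    """Completion handler: the data has already been read when this runs."""
    results.append(buffer)


async def main():
    loop = asyncio.get_running_loop()
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    writer.setblocking(False)
    results = []

    # Start the read and return at once; the completion is reported later.
    pending_read = asyncio.ensure_future(loop.sock_recv(reader, 1024))
    pending_read.add_done_callback(lambda fut: process(fut.result(), results))

    # The caller is free to do other work while the read is outstanding.
    results.append(b"other work")
    await loop.sock_sendall(writer, b"hello")

    await pending_read             # wait until the read has completed
    await asyncio.sleep(0)         # give the completion callback a chance to run

    reader.close()
    writer.close()
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The handler never calls a read function itself, which is the key difference from the Reactor-style handle_event.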

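Compared with the IO multiplexing model, asynchronous IO is not used very often. Many high-performance concurrent servers use an architecture of IO multiplexing plus multi-threaded task processing, which is generally sufficient. Moreover, operating system support for asynchronous IO is currently not especially complete, so asynchronous IO is more often simulated with the IO multiplexing model (when an IO event fires, the user thread is not notified directly; instead, the data is read or written first and placed in the buffer specified by the user). Java has supported asynchronous IO since Java 7, and interested readers can try it.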
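
The simulation mentioned above can be sketched in Python as follows. This is a hypothetical illustration, not how any particular library does it: SimulatedProactor, aio_read, and run are names invented here, and local socket pairs stand in for connections. Internally the loop waits for readiness with the selectors module, performs the read itself, and only then calls the completion handler with the data.

```python
import selectors
import socket


class SimulatedProactor:
    """Emulates completion-style IO on top of readiness multiplexing."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def aio_read(self, sock, on_complete, size=1024):
        """Request a read; on_complete(data) is called once the data is read."""
        sock.setblocking(False)
        self._selector.register(sock, selectors.EVENT_READ, (on_complete, size))

    def run(self, timeout=1.0):
        """Run until every requested read has completed or a wait times out."""
        while self._selector.get_map():
            ready = self._selector.select(timeout)
            if not ready:
                break
            for key, _mask in ready:
                on_complete, size = key.data
                data = key.fileobj.recv(size)        # the loop does the read
                self._selector.unregister(key.fileobj)
                on_complete(data)                     # the user sees only the result


def simulated_aio_demo():
    proactor = SimulatedProactor()
    results = []
    pairs = [socket.socketpair() for _ in range(2)]
    for i, (reader, writer) in enumerate(pairs):
        proactor.aio_read(reader, results.append)
        writer.sendall(b"data%d" % i)
    proactor.run()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return sorted(results)


if __name__ == "__main__":
    print(simulated_aio_demo())
```

From the user's point of view the interface matches the aio_read pseudo-code: a request plus a completion handler, with no explicit read in user code.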
