IO models: BIO, NIO, and AIO

What is IO

We know that inside Unix systems, everything is a file, and a file is just a stream of binary bytes. Whether it is a socket, a pipe, or a terminal, to us everything is a stream. Exchanging information means sending and receiving data on these streams, which we call I/O operations (Input and Output).

A computer has many streams; how do we know which one to operate on? Through the file descriptor, known as fd. An fd is an integer, so operating on a file (stream) is really operating on this integer.

When we create a socket, the system call returns a file descriptor, and all subsequent operations on that socket become operations on this descriptor.
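
As a small illustration (Python, not from the original article), creating a socket yields exactly such a file descriptor; the socket object is just a wrapper around that integer:

```python
import socket

# Creating a socket returns an object that wraps a file descriptor.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fd = s.fileno()  # the integer file descriptor behind the socket
s.close()
```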

User space and kernel space

Modern operating systems use virtual memory. On a 32-bit system, the address space (virtual storage) is 4 GB (2 to the 32nd power). The kernel is the core of the operating system, independent of ordinary applications; it can access the protected memory space as well as all of the underlying hardware.

To prevent user processes from directly manipulating the kernel and to keep the kernel safe, the operating system divides the virtual address space into two parts: kernel space and user space.

On Linux, the highest 1 GB is used by the kernel and is called kernel space, while the lower 3 GB is used by each process and is called user space.

Synchronous and asynchronous

Synchronous and asynchronous describe how an application interacts with the kernel.

Synchronous: the process triggers an IO operation and then either waits for it (blocking) or polls to check whether it has completed (non-blocking).

Asynchronous: the process triggers an IO operation and returns immediately to do its own work, leaving the IO to the kernel; when the IO completes, the kernel notifies the process.

So synchronous IO is further divided into blocking and non-blocking, while asynchronous IO is necessarily non-blocking.

Blocking IO, non-blocking IO, and IO multiplexing are all synchronous IO. Only when the user thread initiates an IO operation without concerning itself with its execution at all, handing everything to the kernel and merely waiting for a completion signal, is it true asynchronous IO. Spawning a child thread to poll, or spinning in a loop on select, poll, or epoll, is not asynchronous.

Blocking and non-blocking

Blocking: after handing a task over for processing, the caller waits until processing completes before executing subsequent operations.

Non-blocking: after handing a task over, the caller continues with subsequent operations, checking from time to time whether the task has completed; this checking process is also called polling.

A single thread can handle one socket's I/O events. To handle multiple sockets, it can use non-blocking busy polling, as in the following pseudocode:

while true  
{  
    for i in stream[]  
    {  
        if i has data  
        read until unavailable  
    }  
}

With the code above, we simply loop over all the streams and can handle several of them. But there is a problem: if none of the streams has an I/O event, the CPU time slice is simply wasted.

To avoid this idle spinning, instead of having the thread check each stream itself, we introduce a proxy (first select, later poll). The proxy is powerful: it can watch the I/O events of many streams at once, and if no stream has an event, the thread blocks rather than polling them one by one.

while true  
{  
    select(streams[]) // blocks here until some stream has an I/O event, then execution continues  
    for i in streams[]  
    {  
        if i has data  
        read until unavailable  
    }  
}

There is still a problem: select only tells us that some I/O event occurred, not on which stream. We must still poll all streams indiscriminately to find the ones ready for reading or writing before we can operate on them. So select performs undifferentiated polling with O(n) complexity; the more streams there are, the longer each polling round takes.

epoll can be understood as "event poll". Unlike busy polling and undifferentiated polling, epoll tells us which stream had which I/O event. epoll is therefore event-driven (each event is associated with an fd), so every operation we perform on these streams is meaningful, and the complexity drops to O(1).

while true  
{  
    active_stream[] = epoll_wait(epollfd)  
    for i in active_stream[]  
    {  
        read or write until unavailable  
    }  
}

So the biggest difference between select and epoll: select only tells you that some number of streams had events, and to find out which, you still have to poll them one by one; epoll tells you which events occurred, and from those events you can locate the specific streams directly. The performance difference is far from trivial.
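
As an illustration (not from the original article), Python's selectors module exposes exactly this event-driven interface; on Linux, DefaultSelector is backed by epoll, and select() returns only the streams that actually have events:

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # backed by epoll on Linux
r, w = socket.socketpair()          # a connected pair standing in for a stream
sel.register(r, selectors.EVENT_READ)

w.send(b"ping")                     # make the stream ready

# Only ready streams come back -- no undifferentiated polling needed.
events = sel.select(timeout=5)
data = events[0][0].fileobj.recv(1024)

sel.unregister(r)
sel.close()
r.close()
w.close()
```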

While a process is executing, unexpected events may occur, such as a failed request for system resources, waiting for some operation to complete, new data not yet arrived, or no new work to do. The system then executes the blocking primitive (block), switching the process from the running state to the blocked state. Blocking is thus an active behavior of the process itself, so only a running process (one holding the CPU) can transition to the blocked state. A blocked process occupies no CPU resources.

I / O models

An input operation generally consists of two steps:

  • Waiting for the data to be ready. For a socket, this means waiting for the data to arrive from the network and be copied into a kernel buffer.
  • Copying the data from the kernel buffer into the process buffer.

Blocking I / O model

The most widely used model is the blocking I/O model; by default, all sockets are blocking. The process calls the recvfrom system call and blocks for the entire duration, returning only once the data has been copied into the process buffer (or when the system call is interrupted).

Non-blocking I / O model (NIO)

When we set a socket to non-blocking, we are telling the kernel: when a requested I/O operation cannot be completed, do not put the process to sleep; return an error instead. When the data is not ready, the kernel immediately returns an EWOULDBLOCK error. The process keeps calling recvfrom, and once the data is ready it is copied into the process buffer. This repeated checking is a polling operation.
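
A sketch of this behavior in Python (illustrative, not from the original article): on a non-blocking socket, recv raises BlockingIOError (the exception behind EWOULDBLOCK/EAGAIN) while no data is ready, and succeeds once data has arrived:

```python
import select
import socket

r, w = socket.socketpair()
r.setblocking(False)        # tell the kernel: return an error instead of sleeping

would_block = False
try:
    r.recv(1024)            # no data yet -> EWOULDBLOCK/EAGAIN
except BlockingIOError:
    would_block = True

w.send(b"hi")
select.select([r], [], [], 5)   # wait until the kernel buffer has the data
data = r.recv(1024)             # now the copy to the process buffer succeeds

r.close()
w.close()
```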

I / O multiplexing model

This model uses the select and poll functions, which also block the process. Unlike blocking I/O, however, these functions can block on multiple I/O operations at once, and can test multiple read and write operations simultaneously, returning only when data becomes readable or writable.

After select is called, the process blocks while the kernel monitors all the sockets select is responsible for. When any socket's data becomes ready, select returns that the socket is readable, and we can call recvfrom to process the data.

Blocking I/O can block on only one I/O operation at a time, while the I/O multiplexing model can block on many; that is why it is called multiplexing.

Asynchronous I/O model (AIO)

After the process initiates a read, it can immediately go do other things. Upon receiving an asynchronous read, the kernel returns at once, so no user process is blocked. The kernel then waits for the data to be ready and copies it into user memory; when all of this is done, the kernel sends the process a signal telling it that the read has completed.

Server-side programming often requires constructing a high-performance IO model. There are four common ones:

  • Synchronous blocking IO (Blocking IO): the traditional IO model.
  • Synchronous non-blocking IO (Non-blocking IO): sockets are created blocking by default; non-blocking IO requires the socket to be set to NONBLOCK. Note that the NIO here is not Java's NIO (New IO) library.
  • IO multiplexing (IO Multiplexing): the classic Reactor design pattern; Java's Selector and Linux's epoll both use this model.
  • Asynchronous IO (Asynchronous IO): the classic Proactor design pattern, also called asynchronous non-blocking IO.

Synchronous blocking IO

The synchronous blocking IO model is the simplest IO model: the user thread is blocked while the kernel performs the IO operation.

  1. The user thread initiates an IO read via the read system call, switching from user space to kernel space.
  2. The kernel waits for the data packet to arrive, then copies the received data into user space.
{

read(socket, buffer); // block and wait the whole time
process(buffer);

}

The user must wait for read to copy the data in the socket into buffer before it can process the received data. The user thread is blocked for the entire IO request and can do nothing while it is in flight, so CPU utilization is poor.

Synchronous non-blocking IO

Synchronous non-blocking IO builds on synchronous blocking IO by setting the socket to NONBLOCK, so the user thread can return immediately after issuing an IO request.

  1. Because the socket is non-blocking, the user thread's IO request returns immediately.
  2. But no data has been read yet, so the user thread must keep issuing IO requests until the data arrives and is actually read.
{

while(read(socket, buffer) != SUCCESS); // keep requesting

process(buffer);

}
  1. The user must repeatedly call read, attempting to read the data in the socket, and only after a successful read can it go on to process the received data.
  2. Although each call returns immediately, the thread must still poll and re-issue requests while waiting for data, which consumes a great deal of CPU. This model is rarely used directly; instead, the non-blocking feature is exploited within other IO models.

IO multiplexing

The IO multiplexing model is built on the kernel's demultiplexing function select; using select avoids the polling wait of the synchronous non-blocking IO model.

  1. The user first adds the sockets that need IO operations to select, then blocks waiting for the select system call to return.
  2. When data arrives, the socket is activated and select returns.
  3. The user thread then formally issues a read request, reads the data, and continues execution.

Process-wise, IO requests made with select are not very different from the synchronous blocking model; with the extra step of monitoring sockets, they even look less efficient. But the great advantage of select is that a single thread can handle IO requests for multiple sockets at the same time. The user registers multiple sockets and then repeatedly calls select to read from the activated sockets, handling multiple IO requests within one thread. Under the synchronous blocking model, this can only be achieved with multiple threads.
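
A runnable sketch of this single-thread, multi-socket pattern using Python's select (illustrative; the two socketpairs stand in for client connections):

```python
import select
import socket

# Two connected pairs stand in for two client sockets handled by one thread.
r1, w1 = socket.socketpair()
r2, w2 = socket.socketpair()

w2.send(b"from-2")                      # only the second stream has data

readable, _, _ = select.select([r1, r2], [], [], 5)
messages = [sock.recv(1024) for sock in readable]

for s in (r1, w1, r2, w2):
    s.close()
```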

{
    select(socket);
    while(1) {
        sockets = select();
        for(socket in sockets) {
            if(can_read(socket)) {
                read(socket, buffer);
                process(buffer);
            }
        }
    }
}

The advantages of select do not end there. Although the approach above handles multiple IO requests within a single thread, each IO request's path is still blocking (blocking on select), and the average time can even be longer than in the synchronous blocking IO model. If the user thread could register only the sockets or IO requests it cares about and then go do its own work, CPU utilization would improve.

The IO multiplexing model uses the Reactor design pattern to implement this mechanism:

The abstract class EventHandler represents an IO event handler. It holds an IO file handle, handle (obtained via get_handle), and an operation on that handle, handle_event (read, write, etc.). Subclasses of EventHandler customize the handler's behavior. Reactor manages EventHandlers (registration, removal, etc.) and runs the event loop in handle_events, repeatedly calling the demultiplexing function select of the synchronous event demultiplexer. As soon as some file handle is activated (readable, writable, etc.), select returns, and handle_events calls the handle_event of the event handler associated with that file handle.

With the Reactor approach, the work of polling IO status is handed entirely to the handle_events event loop. After registering its event handler, the user thread can continue with other work, while the Reactor thread calls the kernel's select function to check socket status. When a socket is activated, the corresponding user thread is notified (or the user thread's callback is executed) and handle_event performs the reading and processing of data. Because the select function blocks, the multiplexing IO model is also called the asynchronous blocking IO model; the blocking here refers to the thread running select being blocked, not the sockets.

Pseudocode for the IO multiplexing model:

void UserEventHandler::handle_event() {
    if(can_read(socket)) {
        read(socket, buffer);
        process(buffer);
    }
}
{
    Reactor.register(new UserEventHandler(socket));
}

The user overrides EventHandler's handle_event function to read and process data; the user thread only needs to register its EventHandler with the Reactor.

The pseudocode for Reactor's handle_events event loop is:

Reactor::handle_events() {
    while(1) {
        sockets = select();
        for(socket in sockets) {
            get_event_handler(socket).handle_event();
        }
    }
}

The event loop repeatedly calls select to obtain the activated sockets, then executes the handle_event function of the EventHandler corresponding to each socket.
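
A minimal runnable Reactor sketch in Python (the class names follow the article's diagram; CollectingHandler and the single-turn handle_events_once loop are illustrative simplifications):

```python
import select
import socket

class EventHandler:
    """Base IO event handler: owns a handle and an operation on it."""
    def __init__(self, handle):
        self.handle = handle            # the IO file handle (a socket here)
    def get_handle(self):
        return self.handle
    def handle_event(self):
        raise NotImplementedError

class CollectingHandler(EventHandler):
    """Reads available data and collects it into a sink list."""
    def __init__(self, handle, sink):
        super().__init__(handle)
        self.sink = sink
    def handle_event(self):
        self.sink.append(self.handle.recv(1024))

class Reactor:
    def __init__(self):
        self.handlers = {}
    def register(self, handler):
        self.handlers[handler.get_handle()] = handler
    def handle_events_once(self, timeout=5):
        # One turn of the event loop: demultiplex, then dispatch.
        ready, _, _ = select.select(list(self.handlers), [], [], timeout)
        for handle in ready:
            self.handlers[handle].handle_event()

received = []
r, w = socket.socketpair()
reactor = Reactor()
reactor.register(CollectingHandler(r, received))
w.send(b"hello")
reactor.handle_events_once()    # dispatches to CollectingHandler.handle_event
r.close()
w.close()
```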

Asynchronous IO

True asynchronous IO requires stronger operating system support. In the IO multiplexing model, the event loop notifies the user thread of a file handle's status event, and the user reads and processes the data itself. In the asynchronous IO model, by the time the user thread is notified, the kernel has already finished reading the data and placed it in the buffer the user thread specified; once notified of IO completion, the user thread can use the data directly.

The asynchronous IO model implements this mechanism with the Proactor design pattern.

The Proactor pattern is structurally similar to the Reactor pattern, but differs considerably in how it is used. In the Reactor pattern, the user thread registers interest in an event with the Reactor object, and the event handler is called when the event fires. In the Proactor pattern, the user thread registers the AsynchronousOperation, the Proactor, and the CompletionHandler for operation completion with the AsynchronousOperationProcessor. The AsynchronousOperationProcessor uses the Facade pattern to provide a set of asynchronous operation APIs; after calling an asynchronous API, the user thread simply continues with its own task.

The Proactor is responsible for calling back each asynchronous operation's completion handler, handle_event. Although in the Proactor pattern each asynchronous operation can be bound to its own Proactor object, operating systems generally implement the Proactor as a Singleton in order to centralize the dispatch of completion events.

  1. The user thread initiates a read request directly through the asynchronous IO API provided by the kernel, and returns immediately after issuing it.
  2. The user thread has registered the AsynchronousOperation and CompletionHandler with the kernel, and the operating system handles the IO operation on a separate kernel thread.
  3. When the requested read data arrives, the kernel reads it from the socket and writes it into the user-specified buffer.
  4. The kernel hands the read data and the user thread's registered CompletionHandler to the internal Proactor; the Proactor notifies the user thread that the IO is complete, finishing the asynchronous IO.

void UserCompletionHandler::handle_event(buffer) {
    process(buffer);
}

{
    aio_read(socket, new UserCompletionHandler);
}
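
The AIO flow above can be emulated in user space (a sketch only; real AIO needs kernel support such as POSIX aio or io_uring). Here a thread pool plays the role of the kernel thread, and aio_read and user_completion_handler are illustrative names mirroring the pseudocode: the pool reads the data and then invokes the registered completion handler with the already-filled buffer:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

results = []

def user_completion_handler(buffer):
    # Invoked only after the data has already been read into `buffer`.
    results.append(buffer)

def aio_read(sock, completion_handler, pool):
    # The pool thread plays the kernel's role: it performs the read and
    # then dispatches the completion event to the registered handler.
    def operation():
        completion_handler(sock.recv(1024))
    return pool.submit(operation)

r, w = socket.socketpair()
pool = ThreadPoolExecutor(max_workers=1)

future = aio_read(r, user_completion_handler, pool)  # returns immediately
w.send(b"payload")                                   # the data arrives later
future.result(timeout=5)                             # wait only for the demo

pool.shutdown()
r.close()
w.close()
```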

Asynchronous IO is still not widely used; most high-performance concurrent servers use the IO multiplexing model plus multithreaded task processing, which basically meets their needs. Java has supported asynchronous IO since Java 7.

Origin www.cnblogs.com/paulwang92115/p/12186174.html