The five I/O models provided by Unix

First, concepts to be clear on in network programming: synchronous/asynchronous, blocking/non-blocking
  • Synchronous:

  Synchronous means that once a function call is issued, the call does not return until the result is available. Things must be done one at a time: the previous task has to finish before the next one can start.

  Example: the ordinary B/S (browser/server) model is synchronous: submit request -> wait for the server to process it -> response returned. During this time the client browser cannot do anything else.

  • Asynchronous:

  Asynchronous is the counterpart of synchronous. When an asynchronous call is issued, the caller does not get the result immediately. Once the component that actually handles the call has finished, it notifies the caller through a status flag, a notification, or a callback.

  Example: an Ajax request is asynchronous: an event triggers the request -> the server processes it (meanwhile the browser can still be used for other things) -> processing completes.

  • Blocking:

  A blocking call means the current thread is suspended until the call has its result (the thread enters a non-runnable state in which the CPU gives it no time slices, so it pauses). The function returns only after it has the result.

  Some people equate blocking calls with synchronous calls, but they are different. With a synchronous call the current thread is often still active; it is only that, logically, the current function has not returned yet. For example, when we call a socket's receive function and the buffer holds no data, the function waits until data arrives before returning, yet meanwhile the current thread can keep processing all kinds of messages (as in frameworks whose socket wrappers keep pumping the message loop while they wait).

  • Non-blocking:

  Non-blocking is the counterpart of blocking: if the result cannot be obtained immediately, the function does not block the current thread but returns at once.
  Blocking mode of an object vs. blocking function calls: whether an object is in blocking mode is strongly correlated with whether calls on it block, but the two do not correspond one-to-one. A blocking object can be used without blocking: poll its status through some API and call the blocking function only at the right moment. Conversely, calling a special function on a non-blocking object can still block; select() is one such example.

1. Taking a read operation initiated by the user as an example, there are two steps:

  • Wait for the data to become ready;
  • Copy the data from kernel space to user space.

The difference between synchronous I/O and asynchronous I/O: whether the process is blocked while the data is being copied.

The difference between blocking I/O and non-blocking I/O: whether the call returns to the application immediately.

2. In the C/S (client/server) model:

Synchronous: submit a request -> wait for the server to process it -> response returned; during this period the client browser cannot do anything else.
Asynchronous: an event triggers the request -> the server processes it (meanwhile the browser can still be used for other things) -> processing completes.

Synchronous and asynchronous are defined with respect to the local socket.
  Synchronous/asynchronous and blocking/non-blocking are often mixed up, but they are not the same thing, and they describe different objects.
Blocking vs. non-blocking is about what happens when the data a process accesses is not ready yet: does the process wait? Put simply, it is a difference in how the function is implemented internally: when the data is not ready, does it return immediately or wait until it is ready.

  Synchronous vs. asynchronous is about the mechanism for accessing data. Synchronous generally means actively requesting and then waiting for the I/O operation to complete; even after the data is ready, the process must block while it is read or written (distinguish the two phases, readiness and read/write: synchronous reads and writes must block). Asynchronous means that after requesting the data the process can go on with other tasks and later receive a notification that the I/O operation has completed, so the process does not block even while the data is being read or written.

Second, the five I/O models provided by Unix:
  1. Synchronous blocking I/O (blocking I/O): the process blocks until the data copy is complete

  The application calls an I/O function, which blocks the application while it waits for the data to become ready. If the data is not ready, it keeps waiting. When the data is ready, it is copied from the kernel to user space and the I/O function returns a success indication.
  When recv()/recvfrom() is called, the process waits inside the kernel both for the data to arrive and for it to be copied.

  When recv() is called, the system first checks whether data is ready. If not, the call waits. Once the data is ready, it is copied from the system buffer to user space and the function returns. In other words, if no data is available when a socket application calls recv(), recv() stays in the waiting state.
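As a minimal sketch of the blocking model (assuming a POSIX system; `recv_exact` is a hypothetical helper name, and a `socketpair()` stands in for a real network connection when trying it out), the loop below reads a fixed number of bytes. Each `recv()` call puts the thread to sleep until data is available and has been copied into the caller's buffer, so both phases happen inside the call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes from a BLOCKING socket.
 * Each recv() sleeps until the kernel has data (phase 1: wait)
 * and has copied it into buf (phase 2: copy).
 * Returns len on success, fewer bytes if the peer closed early,
 * or -1 on error. */
ssize_t recv_exact(int fd, void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, (char *)buf + got, len - got, 0);
        if (n == 0)
            break;              /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

The loop is needed because even a blocking `recv()` may return fewer bytes than requested; it only guarantees that at least one byte (or EOF) was delivered.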

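  When a socket is created with socket() or WSASocket(), it is blocking by default. This means that when a Windows Sockets API call cannot complete immediately, the thread waits until the operation completes.

  Not every Windows Sockets API call that takes a blocking socket actually blocks. For example, bind() and listen() return immediately even when called on a blocking socket. The Windows Sockets API calls that may block fall into four categories:

  • Input operations: recv(), recvfrom(), WSARecv() and WSARecvFrom(). Called on a blocking socket to receive data; if the socket buffer has no data to read, the calling thread sleeps until data arrives.

  • Output operations: send(), sendto(), WSASend() and WSASendTo(). Called on a blocking socket to send data; if the socket buffer has no free space, the thread sleeps until space becomes available.

  • Accepting connections: accept() and WSAAccept(). Called on a blocking socket to wait for a connection request from the peer; if there is none yet, the thread goes to sleep.

  • Outgoing connections: connect() and WSAConnect(). For TCP, the client calls these on a blocking socket to connect to the server; the function does not return until the server's response arrives, so a TCP connect always waits at least one round-trip time to the server.

  Developing network programs with blocking sockets is simple and easy to implement. Blocking mode is a good fit when you want to send and receive data immediately and only a small number of sockets are involved.

  The weakness of blocking sockets is that communicating between threads over a large number of established sockets is difficult. With a "producer-consumer" design, allocating a read thread, a processing thread and a synchronization event to every socket clearly increases system overhead. The biggest drawback is that blocking mode offers no good way to handle a large number of sockets at once: it scales poorly.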

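  2. Synchronous non-blocking I/O (nonblocking I/O)

  With non-blocking I/O the process calls the I/O function repeatedly (many system calls, each returning immediately); during the data copy itself, the process is still blocked.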
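  Setting a socket to non-blocking tells the kernel: when the requested I/O operation cannot be completed, do not put the process to sleep, return an error instead. The I/O function then keeps testing whether the data is ready, and if not, tests again, until it is. This continuous testing consumes a lot of CPU time.

  Putting a SOCKET into non-blocking mode tells the kernel not to put the thread to sleep when a Windows Sockets API function is called, but to return immediately with an error code. Consider an application calling recv() several times on a non-blocking socket: on the first three calls the kernel data is not ready, so recv() immediately returns the WSAEWOULDBLOCK error code. On the fourth call the data is ready; it is copied into the application's buffer, recv() returns a success indication, and the application starts processing the data.

  Sockets created with socket() or WSASocket() are blocking by default. After creating a socket, call ioctlsocket() to put it into non-blocking mode; on Linux the equivalent function is fcntl().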
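A sketch of the Linux side, assuming a POSIX system: `fcntl()` with `F_GETFL`/`F_SETFL` and `O_NONBLOCK` switches the socket to non-blocking mode, and the POSIX counterpart of `WSAEWOULDBLOCK` is `errno` set to `EAGAIN` or `EWOULDBLOCK`. The helper names `set_nonblocking` and `would_block` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode (the POSIX counterpart of
 * ioctlsocket(s, FIONBIO, &mode) on Windows). Returns 0 or -1. */
int set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);   /* keep the other flags */
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* 1 if an error from recv()/send() on a non-blocking socket only
 * means "not ready yet, try again later". */
int would_block(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK;
}
```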

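  Once a socket is non-blocking, Windows Sockets API calls on it return immediately. In most cases these calls "fail" and return WSAEWOULDBLOCK, meaning the requested operation could not complete during the call. Usually the application has to call the function repeatedly until it gets a success code.

  Note that not every Windows Sockets API call in non-blocking mode returns WSAEWOULDBLOCK. For example, bind() called on a non-blocking socket does not return this error code. Nor, of course, does WSAStartup(), since it is the first function an application calls.

  Besides ioctlsocket(), a socket can also be put into non-blocking mode with WSAAsyncSelect() or WSAEventSelect(); calling either of them automatically switches the socket to non-blocking mode.

  Because calls on a non-blocking socket frequently return WSAEWOULDBLOCK, always check return codes carefully and be prepared for "failure". A naive application keeps calling the function until it returns success, for example calling recv() over and over in a while loop to read 1024 bytes. This wastes a great deal of system resources.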
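The wasteful busy-polling loop might look like this on a POSIX system. It is a sketch for illustration, not a recommended pattern; `busy_recv` and its `max_spins` cap are hypothetical, and the cap exists only so the example cannot spin forever.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Spin on recv() on a NON-BLOCKING socket until data arrives.
 * Every failed attempt is a full system call that returns
 * EAGAIN/EWOULDBLOCK at once, so the loop burns CPU while waiting.
 * *spins_out receives the number of calls that found no data. */
ssize_t busy_recv(int fd, void *buf, size_t len, long max_spins, long *spins_out)
{
    assert(spins_out != NULL);
    assert((fcntl(fd, F_GETFL, 0) & O_NONBLOCK) != 0);
    *spins_out = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0)
            return n;                   /* data (or EOF) arrived */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;                  /* a real error */
        if (++*spins_out >= max_spins) {
            errno = EAGAIN;
            return -1;                  /* gave up: still no data */
        }
    }
}
```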

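  To avoid this, some people call recv() with the MSG_PEEK flag to check whether the buffer holds any data. That is not a good approach either: it is expensive for the system, and the application must call recv() at least twice to actually read the data. A better approach is to use the socket "I/O models" to determine whether a non-blocking socket is readable or writable.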
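The `MSG_PEEK` approach can be sketched as follows (POSIX, assuming the socket is already non-blocking; `has_pending_data` is a hypothetical name). Peeking leaves the byte in the receive buffer, so a second `recv()` is still needed to consume it, which is the double call criticized above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Check whether a NON-BLOCKING socket has data by peeking at one byte.
 * MSG_PEEK does not remove the byte from the receive buffer.
 * Returns 1 if data (or EOF) is pending, 0 if not, -1 on error. */
int has_pending_data(int fd)
{
    assert((fcntl(fd, F_GETFL, 0) & O_NONBLOCK) != 0);
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK);
    if (n >= 0)
        return 1;           /* n == 0: peer closed, also "readable" */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;           /* nothing there yet */
    return -1;
}
```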

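  Non-blocking sockets are harder to use than blocking sockets: you have to write more code to handle the WSAEWOULDBLOCK error returned by every Windows Sockets API call.

  However, non-blocking sockets have a clear advantage when managing many connections whose traffic is uneven and arrives at unpredictable times. They take some effort to use, but once those difficulties are overcome they are very powerful. In general, consider using the socket "I/O models", which help an application manage communication on one or more sockets asynchronously.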

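  3. I/O multiplexing

  Mainly select and epoll. For a single descriptor it takes two calls and two returns, so it has no advantage over blocking I/O; the key benefit is that it can monitor many descriptors at the same time.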
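  The I/O multiplexing model uses the select, poll and epoll functions. These functions also block the process, but unlike blocking I/O they can block on many I/O operations at once, checking multiple read and write descriptors simultaneously; the actual I/O function is called only when there is data to read or room to write.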
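A minimal `select()` sketch, assuming a POSIX system (`wait_readable` is a hypothetical helper): one blocking call waits on several descriptors at once, after which the caller scans the set linearly to learn which ones are readable. Because `select()` overwrites the `fd_set`, it has to be rebuilt and copied into the kernel on every call, and descriptors at or above `FD_SETSIZE` cannot be placed in it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Block in select() until at least one of fds[0..n-1] is readable or
 * timeout_ms elapses; ready[i] is set to 1 for each readable fd.
 * Returns the number of readable fds, 0 on timeout, -1 on error. */
int wait_readable(const int *fds, int n, int timeout_ms, int *ready)
{
    assert(fds != NULL && ready != NULL);
    fd_set rset;
    FD_ZERO(&rset);                 /* rebuilt on every call */
    int maxfd = -1;
    for (int i = 0; i < n; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE) {
            errno = EINVAL;         /* fd_set cannot hold this fd */
            return -1;
        }
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };
    int rc = select(maxfd + 1, &rset, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    /* Linear scan to find out WHICH fds are ready. */
    for (int i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &rset) ? 1 : 0;
    return rc;
}
```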

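  4. Signal-driven I/O (signal driven I/O (SIGIO))

  Two calls, two returns. First we enable signal-driven I/O on the socket and install a signal handler; the process keeps running without blocking. When the data is ready, the process receives a SIGIO signal and can call the I/O function in the signal handler to process the data.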
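A sketch of enabling signal-driven I/O on Linux, assuming glibc (`fcntl()` with `F_SETOWN` and the `O_ASYNC` flag are the Linux mechanism; `enable_sigio`, `on_sigio` and `io_ready` are hypothetical names). One deliberate variation on the text: the handler only records the event in a flag, a common way to keep signal handlers minimal, and the `recv()` is done afterwards in normal code. The copy from kernel to user space still happens inside that `recv()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the SIGIO handler; the data itself is read later. */
static volatile sig_atomic_t io_ready = 0;

static void on_sigio(int signo)
{
    (void)signo;
    io_ready = 1;
}

/* Install the SIGIO handler, make this process the owner of fd, and
 * turn on O_ASYNC so the kernel raises SIGIO when fd becomes readable.
 * Returns 0 or -1. */
int enable_sigio(int fd)
{
    assert(fd >= 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)      /* handler first */
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)    /* who gets SIGIO */
        return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC);
}
```

Installing the handler before setting `O_ASYNC` matters: the default action for SIGIO terminates the process.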

  5. Asynchronous non-blocking I/O (asynchronous I/O (the POSIX aio_ functions))

  The process does not need to block even during the data copy.
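  After an asynchronous call is issued, the caller does not get the result immediately. When the component that actually performs the input/output operation has finished, it notifies the caller through status, notifications or callbacks.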
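A sketch with the POSIX `aio_*` functions, assuming glibc (on glibc versions before 2.34, link with `-lrt`). `aio_read()` returns immediately, and the caller later collects the result with `aio_error()`, `aio_suspend()` and `aio_return()`. The helper names are hypothetical, and `SIGEV_NONE` plus `aio_suspend()` is only one choice; a signal or thread callback could be requested through `aio_sigevent` instead. glibc implements POSIX AIO with user-space helper threads rather than kernel-level asynchronous I/O.

```c
#define _GNU_SOURCE
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Start an asynchronous read of len bytes at offset off and return at
 * once; waiting and copying into buf happen in the background.
 * cb and buf must stay valid until the request completes. */
int start_async_read(struct aiocb *cb, int fd, void *buf, size_t len, off_t off)
{
    assert(cb != NULL);
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE;  /* no signal: we wait below */
    return aio_read(cb);
}

/* Wait for completion (the caller could do other work before this)
 * and return the byte count, as read() would, or -1 on error. */
ssize_t finish_async_read(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };
    while (aio_error(cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);
    if (aio_error(cb) != 0)
        return -1;
    return aio_return(cb);
}
```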

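Summary:

  • Synchronous I/O blocks the process until the I/O operation completes.
  • Asynchronous I/O does not block the process.
  • I/O multiplexing first blocks in the select call.
Third, comparison of the five I/O models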


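For example, suppose a concert is coming up and Xiao Ming wants to buy a ticket.

  • Synchronous blocking:
    Xiao Ming goes from home to the ticket office and learns that tickets go on sale tomorrow. He simply waits at the ticket office until he gets a ticket the next day, then goes home.
  • Non-blocking I/O:
    Xiao Ming goes to the concert venue to buy a ticket, but tickets are not out yet, so he leaves and does other things. A few hours later he comes back to ask again; if there are still no tickets, he goes off to do something else, and so on until he can buy one.
  • I/O multiplexing:
    Java -> Selector / Linux -> select, poll, epoll
    Xiao Ming wants a concert ticket, so he calls a scalper and asks him to watch for when tickets go on sale; once they are out, Xiao Ming still has to go and buy the ticket himself.
  • Signal-driven I/O:
    Xiao Ming wants a concert ticket and calls the organizer, who notifies him to come and buy one once the sale time is fixed.
  • Asynchronous I/O:
    Xiao Ming wants a concert ticket and calls the organizer; once tickets are on sale, a courier delivers the ticket straight to his home, so he does not have to go and buy it himself.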

1. select, poll, epoll

  Both epoll and select provide I/O multiplexing, and current Linux kernels support both. epoll is Linux-specific, whereas select is specified by POSIX and implemented by most operating systems.

select:

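select essentially works by setting or checking a data structure that holds fd flag bits and then acting on the result. This has the following drawbacks:

  • The number of fds a single process can monitor is limited, i.e. the number of ports it can listen on is limited.

  The limit is the compile-time constant FD_SETSIZE, which glibc defines as 1024 on both 32-bit and 64-bit Linux. It is not the same as the system-wide limit on open file handles, which depends heavily on system memory and can be checked with cat /proc/sys/fs/file-max.

  • Sockets are scanned linearly, i.e. by polling, which is inefficient:

  When there are many sockets, every select() call traverses up to FD_SETSIZE sockets to dispatch them, whether or not each socket is active. This wastes a lot of CPU time. If a callback could be registered for each socket so that the relevant work happens automatically when it becomes active, polling would be avoided; that is exactly what epoll and kqueue do.

  • A data structure holding a large number of fds has to be maintained, and copying it between user space and kernel space on every call is expensive.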

poll:

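  poll is essentially the same as select. It copies the array passed in by the user into kernel space and then queries the device state of each fd. If a device is ready, it adds an entry to that device's wait queue and continues the traversal; if no ready device is found after all fds have been traversed, the current process is suspended until a device becomes ready or the timeout expires, and after waking up it traverses the fds again. This involves many pointless traversals.

It has no fixed maximum number of connections, because the fds are stored in a linked list, but it still has drawbacks:

  • The whole array of fds is copied between user space and kernel address space on every call, whether or not the copy is useful;
  • poll is also "level-triggered": if an fd is reported but not handled, the next poll reports it again.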
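A minimal `poll()` sketch (POSIX; `poll_readable` is a hypothetical helper). `poll()` takes an array of `struct pollfd`, shown here with a single entry for brevity; calling it twice without reading the data illustrates the level-triggered behavior, since readiness keeps being reported until the data is consumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * The pollfd array is copied into the kernel on every call.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int poll_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```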

epoll:

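  epoll supports both level-triggered and edge-triggered modes. Its most distinctive feature is edge triggering: it tells the process only which fds have just become ready, and notifies only once. Another feature is that epoll uses "event"-based readiness notification: an fd is registered with epoll_ctl, and once it becomes ready the kernel activates it through a callback mechanism, so epoll_wait receives the notification.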
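A Linux-only sketch of the epoll workflow (`epoll_watch` and `epoll_ready` are hypothetical helpers): the fd is registered once with `epoll_ctl()`, and each `epoll_wait()` returns only ready events. Passing `EPOLLET` selects edge-triggered mode. Registering the same socket in one level-triggered and one edge-triggered instance shows the difference: with unread data, the level-triggered instance keeps reporting the fd, while the edge-triggered one reports it once.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an epoll instance and register fd once with epoll_ctl().
 * edge != 0 selects edge-triggered mode (EPOLLET); otherwise the
 * default level-triggered mode is used. Returns the epoll fd or -1. */
int epoll_watch(int fd, int edge)
{
    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN | (edge ? EPOLLET : 0),
                              .data.fd = fd };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(ep);
        return -1;
    }
    return ep;
}

/* One epoll_wait() call: only ready fds are copied back.
 * Returns the number of ready events, 0 on timeout, -1 on error. */
int epoll_ready(int ep, int timeout_ms)
{
    assert(ep >= 0);
    struct epoll_event events[16];
    return epoll_wait(ep, events, 16, timeout_ms);
}
```

In edge-triggered mode the application is expected to drain the socket (read until `EAGAIN`) after each notification, which is why edge-triggered fds are normally non-blocking.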

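Advantages of epoll:

  • No hard limit on concurrent connections: the maximum number of fds is far above 1024 (roughly 100,000 connections on a machine with 1 GB of memory);
  • Better efficiency: it does not poll, so efficiency does not drop as the number of fds grows. Only active, ready fds invoke the callback;

  In other words, epoll's biggest advantage is that it deals only with your "active" connections, independent of the total number of connections, so in real network environments epoll is far more efficient than select and poll.

  • Less memory copying: fds are registered once with epoll_ctl() and kept in the kernel, so each epoll_wait() copies back only the ready events instead of the whole fd set.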

2. Differences between select, poll and epoll

  • Maximum number of connections a single process can open

select:

  The maximum number of fds a single process can monitor is set by the FD_SETSIZE macro (1024 with glibc on Linux). It can be raised by redefining it and recompiling, but performance may suffer, so this would need further testing.

poll:

  poll is not fundamentally different from select, but it has no fixed maximum number of connections, because the fds are stored in a linked list.

epoll:

  There is an upper limit on the number of connections, but it is very large: about 100,000 connections on a machine with 1 GB of memory and about 200,000 with 2 GB.

  • I/O efficiency as the number of fds grows

select, poll:

  Every call traverses all connections linearly, so as the number of fds grows, traversal slows down: the "linear performance degradation" problem.

epoll:

  The epoll kernel implementation is based on a callback attached to each fd, and only active sockets invoke their callback. So when few sockets are active, epoll does not suffer the linear degradation of the other two; but when all sockets are active, it may run into performance problems.

  • Message passing

select, poll:

  The kernel has to deliver messages to user space, which requires a kernel copy.

epoll:

  epoll avoids most of this copying: the fd set is registered in the kernel once, and each epoll_wait() copies back only the ready events.

  3. Summary: choose among select, poll and epoll according to their characteristics and the specific situation.
  • On the face of it epoll performs best, but when there are few connections and all of them are very active, select and poll may outperform epoll, because epoll's notification mechanism involves many function callbacks;

  • select is inefficient because it has to poll on every call. But inefficiency is relative: depending on the situation, it can be improved with good design.

Reference article: https://www.cnblogs.com/renxs/p/3683189.html

Origin blog.csdn.net/Daria_/article/details/91828740