Zero-copy in Linux

This article is a set of study notes rather than original work: it organizes the key points from the references listed at the end.

Traditional I/O operations

In a traditional I/O transfer, the user application only needs two system calls, read() and write(), to move the data, but many steps happen underneath, hidden from the application. Let's walk through them.

When the application needs to access a piece of data:

  1. The application issues the read() system call to read the file (1 context switch, also called a mode switch (note 1): user mode to kernel mode).
  2. The operating system kernel checks whether the data is already cached in a buffer in kernel address space; if so, it returns the data directly. If not, it proceeds to the next step.
  3. If the data is not found in the kernel buffer (a page miss, which triggers an exception), the kernel first reads it from disk into the kernel buffer (1 DMA copy (note 2): disk to page cache).
  4. The data is then copied from kernel address space into the application's address space (1 CPU copy: kernel space to user space).
  5. read() returns (1 context switch, or mode switch: kernel mode back to user mode).
  6. The application calls write() to write the data to the socket buffer (1 context switch, or mode switch: user mode to kernel mode).
  7. The kernel copies the data again, from the user buffer into the kernel buffer associated with the network stack, i.e. the socket buffer (1 CPU copy: user space to kernel space).
  8. DMA then sends the data from the socket buffer to the physical NIC, and write() returns to the user-space application (1 context switch, or mode switch: kernel mode back to user mode).

Counting the steps above: 4 context switches (mode switches) and 4 copy operations (2 DMA copies and 2 CPU copies).
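As a minimal user-space illustration of this traditional path (the file path and buffer size here are arbitrary; the kernel-side steps above are invisible to the code), each loop iteration costs one read() and one write(), and the data crosses the user/kernel boundary twice:

```python
import os
import socket

def send_file_traditional(path: str, sock: socket.socket, bufsize: int = 65536) -> int:
    """Send a file over a socket with plain read()/write().

    Per chunk: disk -> page cache (DMA), page cache -> user buffer
    (CPU copy on read), user buffer -> socket buffer (CPU copy on
    write), socket buffer -> NIC (DMA).
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, bufsize)   # CPU copy: kernel -> user
            if not chunk:
                break
            sock.sendall(chunk)            # CPU copy: user -> kernel
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```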

Why do we need zero-copy

Four switches and four copies make the path above fairly long, but for a long time this was not a problem and the technology was not needed: with slow networks (56K modems, 10/100 Mb Ethernet) the network itself was the bottleneck, and the slowest link sets the overall rate (the barrel effect). When network speeds jumped to 1 Gb, 10 Gb and even 100 Gb, zero-copy became urgently needed, because network transfer speed now far exceeds the speed at which data moves inside the computer. To keep up, attention turned to optimizing the computer's internal data path.

The problem zero-copy solves

There are many techniques that achieve zero-copy, but they all aim to cut intermediate links out of the data path, in particular the copies between user space and kernel space.

Methods of reducing CPU copies

Direct I/O

Cached I/O, also known as standard I/O, is the default for most file system I/O operations. Under Linux's cached I/O mechanism, the operating system caches I/O data in the file system's page cache: on a read, if the data is found in the cache it is returned directly; on a miss it is read from disk. This is a good mechanism, since it speeds things up by reducing operations against the disk, which is after all a slow device.

On a write, the application's data is first written to the page cache; whether it is then written to disk immediately depends on the mechanism in use, i.e. whether the write is synchronous or asynchronous. With a synchronous write the application gets its response only once the data is on disk; with an asynchronous write the response comes some time later. There is also a delayed-write mechanism, under which the application is never notified when the data is finally written to disk.
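The difference can be sketched from user space with standard POSIX flags (a minimal sketch; the file paths are arbitrary, and O_SYNC / fsync() are the standard knobs for forcing data out of the page cache):

```python
import os

def write_sync(path: str, data: bytes) -> None:
    """Synchronous write: with O_SYNC, write() returns only after the
    data has been pushed through the page cache to the disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

def write_delayed(path: str, data: bytes, flush: bool = False) -> None:
    """Delayed write: write() returns as soon as the data is in the
    page cache; the kernel flushes it to disk later. An explicit
    fsync() forces the flush when the application needs durability."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
        if flush:
            os.fsync(fd)   # force page cache -> disk now
    finally:
        os.close(fd)
```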

Under the direct I/O mechanism, data is transferred directly between the user-space buffer and the disk, bypassing the page cache entirely. This kind of zero-copy technique suits situations where the data does not need any processing by the operating system kernel, and it is used in some scenarios (a database that maintains its own cache, for example).

Kafka, by contrast, relies on the cached I/O mechanism: writes go to the cache, and reads are also served from the cache, which is why its throughput is so high. The risk of data loss is correspondingly higher, since large amounts of data sit in memory, but this can be tuned through configuration parameters.

mmap

Calling mmap() causes 2 context switches of its own (call and return), and the subsequent write() causes 2 more. The two DMA copies are unchanged. The important difference is that the kernel-to-user-space data copy disappears: the data is copied directly from the page cache to the socket buffer. So compared with standard I/O, the transfer becomes 4 context switches, 2 DMA copies and 1 CPU copy. This optimization removes an intermediate link.

A memory-mapped file maps the application's buffer and the kernel buffer onto the same range of physical memory; you could also say the operating system shares this buffer with the application. The mapping itself, however, is an expensive virtual memory operation: it must change the page table and flush the TLB (invalidating its contents) to keep the mappings consistent. Even so, this flushing overhead is smaller than the CPU copy it saves.
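A minimal user-space sketch of the mmap()-plus-write() path (the file path is arbitrary; Python's mmap and socket objects wrap the underlying system calls):

```python
import mmap
import os
import socket

def send_file_mmap(path: str, sock: socket.socket) -> int:
    """Send a file by mapping it instead of read()ing it.

    The mapping shares the page cache with the process, so there is no
    read()-style kernel -> user data copy; sendall() hands the mapped
    pages straight to the socket in a single write() path.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return 0
        with mmap.mmap(fd, size, prot=mmap.PROT_READ) as m:
            sock.sendall(m)   # one write over the mapped region
        return size
    finally:
        os.close(fd)
```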

mmap carries a significant risk, however: if another process truncates the file while write() is in progress, the write() call is interrupted by a SIGBUS bus-error signal, because the memory access being performed at that moment is now invalid. By default this signal kills the process.

sendfile

As the figure shows, when the application calls the sendfile() system call, only 2 context switches occur (call and return). The two DMA copies are unchanged, and most importantly the kernel-to-user-space data copy is gone: the data is copied directly from the page cache to the socket buffer. So compared with standard I/O, the transfer becomes 2 context switches, 2 DMA copies and 1 CPU copy. This optimization removes intermediate links, improves internal transfer efficiency and frees up the CPU. But it is not yet true zero-copy: 1 CPU copy remains.

Whether you can use this feature from a high-level language depends on the language's libraries: check whether the library function you call is implemented on top of the sendfile() system call.
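Python, for example, exposes the system call directly as os.sendfile on Linux (a sketch; the socketpair in the test merely stands in for a real network socket):

```python
import os
import socket

def send_file_sendfile(path: str, sock: socket.socket) -> int:
    """Transmit a file with sendfile(2): the data flows page cache ->
    socket buffer -> NIC entirely inside the kernel and never enters
    user space."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), fd, offset, size - offset)
            if sent == 0:   # nothing left to send
                break
            offset += sent
        return offset
    finally:
        os.close(fd)
```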

sendfile with DMA

This approach eliminates the one remaining CPU copy in sendfile, the copy from the kernel buffer to the socket buffer. How can data be sent without copying it? Descriptors for the data awaiting transmission in the kernel buffer are passed to the network stack, the packet structure is set up in the socket buffer, and the DMA gather capability then combines all the pieces into a single network packet: the NIC's DMA engine reads headers and data from multiple locations in one operation. The socket buffer in Linux 2.4 meets these requirements, and this is the well-known zero-copy technique used in Linux.

  • First, the sendfile() system call uses the DMA engine to copy the file contents into the kernel buffer;
  • Then, a buffer descriptor carrying the file position and length is appended to the socket buffer; this step requires no copying of data from the kernel buffer into the socket buffer;
  • Finally, the DMA engine copies the data directly from the kernel buffer to the protocol engine, avoiding the last data copy.

Limitations of sendfile

First, sendfile only applies to the sending side of a transfer. Second, the data must be sent as-is: it cannot be modified along the way.

References

Zero-copy technology in Linux, part 1

Zero-copy technology in Linux, part 2

An introduction to the direct I/O mechanism in Linux


  1. A mode switch falls under the broader notion of a context switch; it is just not the usual process or thread context switch. In principle, a user-space application cannot interact with hardware directly; only the kernel can, so the application must access hardware through system calls. In essence, the application's code pauses while kernel code runs on the CPU, and execution switches back once the kernel code completes. This is what is commonly called trapping into the kernel.

  2. Taking a read as an example: the disk is told, "write the data of y sectors starting at sector x into memory starting at address p, and tell me when you are done (by raising an interrupt)". This operation is DMA, and the whole transfer requires no CPU involvement. Note, though, that DMA can only copy data between the page cache and peripheral devices.

Origin www.cnblogs.com/rexcheny/p/12178014.html