Zero-copy
This article is a set of study notes: most of the content is compiled from the references at the end, and it is not original work; I have only organized the key points.
Traditional I/O operations
With traditional I/O, a user application only needs to call two system calls, read() and write(), to transfer data. Underneath, however, many steps take place, all hidden from the application. Let's walk through them.
When the application needs to access a piece of data:

- The application issues the read() system call to read the file (1 context switch, also called a mode switch: user mode to kernel mode).
- The kernel checks whether the data is already cached in a buffer in kernel address space; if so, it is returned directly. If not, proceed to the next step.
- If the data is not found in the kernel buffer (a cache miss, which triggers a page fault), the kernel first reads it from disk into the kernel buffer (1 DMA copy: disk to page cache).
- The data is then copied from kernel address space into the application's address space (1 CPU copy: kernel space to user space), and read() returns (1 context switch: kernel mode to user mode).
- The application calls write() to send the data to the socket (1 context switch: user mode to kernel mode).
- The kernel copies the data from the user buffer in the application's address space to the kernel buffer associated with the network stack, i.e. the socket buffer (1 CPU copy: user space to kernel space).
- A DMA copy is performed: the data in the socket buffer is transferred by DMA to the physical NIC. write() returns to the user-space application (1 context switch: kernel mode to user mode).
Looking at the whole flow above: 4 context switches (mode switches) and 4 copy operations (2 DMA copies, 2 CPU copies).
Why do we need zero-copy
Viewed from the above flow, four switches and four copies make the whole path fairly long, but for a long time this was not a problem. This technique was simply not needed when networks were slow (56K modems, 10/100 Mb Ethernet): the network itself was the bottleneck, and by the barrel (weakest-link) effect, speeding up the internal path would not have helped. But once network speeds reached 1 Gb, 10 Gb, or even 100 Gb, zero-copy became urgently needed, because network transmission speed now far exceeds the speed at which data can be moved around inside the computer. Speed matters, so attention turned to optimizing the computer's internal data flow.
What problem zero-copy solves
There are many techniques that achieve zero-copy, but they all ultimately aim to reduce the intermediate steps in the data path, in particular the data copies between user space and kernel space.
Ways of reducing CPU copies
Direct I/O
Buffered I/O, also known as standard I/O, is the default for most file system I/O operations. Under Linux's buffered I/O mechanism, the operating system caches I/O data in the file system's page cache. On a read, if the data is found in the cache it is returned directly; on a miss, it is read from disk. This is a good mechanism for improving speed, because it reduces the number of operations against the disk, which is after all a slow device.
On the write side, the application first writes into the page cache; whether the data is then immediately flushed to disk depends on the mechanism in use, i.e. whether writes are synchronous or asynchronous. With synchronous writes the application gets a response immediately; with asynchronous writes the response arrives some time later. There is also a delayed-write mechanism, under which the application is not notified at all when the data is eventually written to disk.
Under the direct I/O mechanism, data is transferred directly between the user-space buffer and the disk, without going through the page cache at all. This kind of zero-copy technique suits situations where the data does not need any processing by the operating system kernel, and it is used in some scenarios (databases that manage their own caches are a common example).
Kafka, by contrast, relies on the buffered I/O mechanism: writes go to the cache, and reads are also served from the cache, which gives very high throughput. The risk of data loss is correspondingly higher, since large amounts of data sit in memory, but this can be tuned through configuration parameters.
mmap
When the application calls mmap(), 2 context switches occur (call and return). The 2 DMA copies are unchanged; the key improvement is that the copy from kernel space to user space is eliminated: the subsequent write() copies the data directly from the page cache to the socket buffer. Compared with standard I/O, the transfer becomes 4 context switches in total (2 for mmap(), 2 for write()), 2 DMA copies, and 1 CPU copy. This optimization removes an intermediate step.
A memory-mapped file means the application's buffer and the kernel buffer are mapped onto the same range of physical memory; you could also say the operating system shares this buffer with the application. The mapping operation itself, however, is an expensive virtual-memory operation: it must change the page tables and flush the TLB (invalidating its contents) to keep memory consistent. Even so, this overhead is still smaller than that of a CPU copy of the data.
But mmap carries a relatively large risk: if another process truncates the file while write() is operating on the mapping, the memory access being performed becomes invalid and the write() call is interrupted by a SIGBUS (bus error) signal, which by default kills the process.
sendfile
When the application calls the sendfile() system call, only 2 context switches occur (call and return). The 2 DMA copies are unchanged; the key improvement is again that the copy from kernel space to user space is eliminated, with the data copied directly from the page cache to the socket buffer. Compared with standard I/O, this becomes 2 context switches, 2 DMA copies, and 1 CPU copy. This optimization removes intermediate steps, improves internal transfer efficiency, and also frees up the CPU. It is still not true zero-copy, however: 1 CPU copy remains.
Whether you can use this feature from a high-level language depends on its library functions; check whether the underlying library function ultimately invokes the sendfile() system call.
sendfile with DMA gather copy
This approach exists to eliminate the one remaining CPU copy in sendfile, namely the copy from the kernel buffer to the socket buffer. Without that copy, how does the data get sent? Descriptors for the data awaiting transmission in the kernel buffer are passed to the network protocol stack; the packet structure is then built in the socket buffer, and finally the DMA engine's gather capability combines all the pieces into a single network packet. The NIC's DMA engine reads the headers and the data from multiple locations in one operation. The socket buffer in Linux 2.4 meets these requirements, and this is the well-known zero-copy technique used in Linux.
- First, the sendfile() system call uses the DMA engine to copy the file contents into the kernel buffer;
- Then, a buffer descriptor carrying the file position and length is appended to the socket buffer; this step does not copy any data from the kernel buffer into the socket buffer;
- Finally, the DMA engine copies the data directly from the kernel buffer into the protocol engine, avoiding the last remaining data copy.
Limitations of sendfile
First, sendfile only applies to the sending side of a transfer. Second, the data being sent cannot be modified in transit; it is sent exactly as is.