Understanding zero-copy

Terminology

Zero-copy

In "zero-copy", the "copy" refers to the operating system copying data from one memory area to another during an I/O operation. The "zero" does not mean the data is copied zero times. It means the number of copies between user mode and kernel mode drops from its previous value to zero.

CPU COPY

From the principles of computer organization, we know that memory reads and writes require the CPU to coordinate the data bus, the address bus, and the control bus.

So when a copy happens, the CPU usually has to suspend its current work to carry out the memory reads and writes. We call this a CPU COPY.

A CPU copy not only consumes CPU time but also takes up bus bandwidth.

DMA COPY

DMA (Direct Memory Access) is an important feature of modern computers. When data has to be exchanged with a peripheral, the CPU only needs to initiate the transfer and can then go on executing other instructions; the DMA controller carries out the rest of the data transfer.

So a DMA COPY avoids a large number of CPU interrupts: the CPU is typically notified only once, when the whole transfer finishes.

Context switching

Here, a context switch means switching from user mode to kernel mode, or from kernel mode back to user mode.

Why multiple copies exist

  1. To protect the operating system from being damaged, intentionally or unintentionally, by applications, the operating system runs in two modes: user mode and kernel mode. A user-space program that needs a system resource (such as access to the hard disk) must make a system call to enter kernel mode; the kernel obtains the resource and then switches back to user mode to return to the application.

  2. For performance optimizations such as read-ahead caching and asynchronous writes, the kernel also maintains a "kernel buffer". When data is read, it is not placed directly into the application's buffer. It is first read into the kernel buffer and then copied from the kernel buffer into the application buffer. So before the application can use the data, it may have been copied several times.

What are the unnecessary copies?

Before answering this question, let's look at a scenario.

Consider real-world systems of every kind, whether web application servers, FTP servers, database servers, or static file servers. Whenever data transfer is involved, the scenario is essentially this:

 Read file data from the hard disk and send it over the network.

We simplify this scenario into the following model:

 File.read(fileDesc, buf, len);
 Socket.send(socket, buf, len);

For ease of description, we call these two lines of code the read-send model.
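For reference, here is a minimal C sketch of the read-send model on a POSIX system. The function name `read_send`, the 8 KiB buffer size and the retry loops are illustrative assumptions, not code from the original. Every chunk passes through the user-space array `buf` (the app buf): one `read()` and one `write()` system call per chunk, each with its own CPU copy and pair of context switches, so for files larger than the buffer the counts in the text apply per chunk.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: forward everything from in_fd to out_fd (read-send).
 * read():  kernel buf -> app buf     (CPU COPY, 2 context switches)
 * write(): app buf -> socket buf     (CPU COPY, 2 context switches)
 * Returns the number of bytes forwarded, or -1 with errno set. */
ssize_t read_send(int in_fd, int out_fd)
{
    char buf[8192];                /* the user-space "app buf" (size is arbitrary) */
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;          /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        /* write() may accept fewer bytes than requested (common on sockets) */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

In a server, `in_fd` would come from `open()` on the file and `out_fd` would be the connected socket.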

To implement this read-send model, the operating system goes through the following steps:

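1. The application starts reading the file.
2. The application makes a system call and switches from user mode to kernel mode (1st context switch).
3. In kernel mode, the data is read from the file on disk into the kernel buffer (kernel buf).
4. The data is copied from the kernel buffer (kernel buf) to the (user-mode) application buffer (app buf), and execution switches from kernel mode back to user mode (2nd context switch).
5. The application starts sending the data to the network.
6. The application makes a system call and switches from user mode to kernel mode (3rd context switch).
7. In the kernel, the data is copied from the application buffer (app buf) to the socket buffer (socket buf).
8. The kernel then sends the data from the socket buffer (socket buf) to the network card's buffer (NIC buf).
9. Execution switches from kernel mode back to user mode (4th context switch).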

The figure below illustrates this:

[Figure: data flow of the read-send model]

The figure clearly shows that read-send involves four copies:

1. Disk to kernel buffer (DMA COPY)
2. Kernel buffer to application buffer (CPU COPY)
3. Application buffer to socket buffer (CPU COPY)
4. Socket buf to NIC buf (DMA COPY)

This involves two CPU interrupts and four context switches.

Clearly, the 2nd and 3rd copies merely copy the data into the app buffer and then copy it back unchanged. The two CPU copies and two context switches they cause are completely unnecessary.

Linux zero-copy techniques aim to optimize away these two unnecessary copies.

sendfile

Linux added the sendfile system call during the 2.1 development series (it first shipped in the stable 2.2 kernel). sendfile copies data from the kernel buffer directly into the socket buffer without leaving kernel mode, which eliminates unnecessary context switches and data copies.

This system call is essentially a higher-level I/O function. Its signature is:

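#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

  1. out_fd is the file descriptor to write to. Before Linux 2.6.33 it had to be a socket; since 2.6.33 it can be any file.
  2. in_fd is the file descriptor to read from. It must be a real file that supports mmap-like operations, not a pipe or socket.
  3. offset is the position to start reading from. If it is not NULL, sendfile reads from *offset, updates *offset to point just past the last byte read, and leaves in_fd's file offset unchanged. If it is NULL, reading starts at in_fd's current file offset, which is then advanced.
  4. count is the number of bytes to copy.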
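Here is a minimal sketch of sending a whole file with `sendfile` on Linux. The helper name `serve_file` is hypothetical, and it assumes `out_fd` is a connected, blocking socket (or, since Linux 2.6.33, any file). Because `offset` is non-NULL, the kernel advances it after each call; the loop is needed because a single call may transfer fewer than `count` bytes (Linux caps one call at 0x7ffff000 bytes). Note that the application never allocates a data buffer.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the whole file at `path` to out_fd with sendfile(2).
 * The bytes move from the kernel buffer to out_fd inside the kernel; there is
 * no user-space buffer. Returns bytes sent, or -1 with errno set. */
ssize_t serve_file(int out_fd, const char *path)
{
    assert(path != NULL);
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        int saved = errno;
        close(in_fd);
        errno = saved;
        return -1;
    }

    /* Non-NULL offset: the kernel reads from *offset and advances it,
     * leaving in_fd's own file position unchanged. */
    off_t offset = 0;
    int failed = 0;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before any data was sent: retry */
            failed = 1;
            break;
        }
        if (n == 0)
            break;                 /* EOF: the file was truncated while sending */
    }

    int saved = errno;
    close(in_fd);
    errno = saved;
    return failed ? -1 : (ssize_t)offset;
}
```

The calling code is the same whether or not the kernel and NIC support scatter-gather; that optimization happens entirely inside the kernel.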

With the sendfile system call, the read-send model can be simplified to:

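1. The application starts reading the file.
2. The application makes a system call and switches from user mode to kernel mode (1st context switch).
3. In kernel mode, the data is read from the file on disk into the kernel buffer.
4. Through sendfile, the data is copied in kernel mode from the kernel buffer to the socket buffer.
5. The kernel then sends the data from the socket buffer to the NIC buffer.
6. Execution switches from kernel mode back to user mode (2nd context switch).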

As shown below:

[Figure: data flow of read-send with sendfile]

The data copies become:

1. Disk to kernel buffer (DMA COPY)
2. Kernel buffer to socket buffer (CPU COPY)
3. Socket buffer to NIC buffer (DMA COPY)

So, for one read-send, using the sendfile system call reduces the data copies from 4 to 3, the context switches from 4 to 2, and the CPU interrupts from 2 to 1.

Compared with traditional I/O, this zero-copy technique eliminates two context switches and one CPU copy, which may improve I/O performance by more than 50% (figure taken from online sources, not tested by the author).

Going back to the definition at the start: the "zero" in zero-copy means zero copies between user mode and kernel mode. By that definition, this technique already achieves a true "zero".

However, scientists and engineers in pursuit of ultimate performance were not satisfied with this. The remaining CPU copy in the middle (the 2nd copy) still bothered them, and they tried every means to get rid of this unnecessary data copy and its CPU interrupt.

sendfile with scatter-gather support

In the later 2.4 kernels, Linux optimized the socket buffer descriptor. With this optimization, and provided the network card supports gather operations, sendfile only needs to copy a small amount of metadata about the kernel buffer; the data itself is copied directly from the kernel buffer to the NIC buffer. This avoids the copy from the "kernel buffer" to the "socket buffer".

We call sendfile with this optimization "sendfile with scatter-gather support".

With scatter-gather sendfile, the read-send model can be optimized to:

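1. The application starts reading the file.
2. The application makes a system call and enters kernel mode from user mode (1st context switch).
3. In kernel mode, the data is read from the file on disk into the kernel buffer.
4. In kernel mode, the location (offset) and size of the data in the kernel buffer are appended to the socket buffer.
5. Using the offset and size recorded in the socket buffer, the data is copied directly from the kernel buffer to the NIC buffer.
6. Execution returns from kernel mode to user mode (2nd context switch).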

This process is shown below:

[Figure: data flow of read-send with scatter-gather sendfile]

In the end, only two data copies remain, and both are DMA COPY:

1. Disk to kernel buffer (DMA COPY)
2. Kernel buffer to NIC buffer (DMA COPY)

Perfect.

mmap and sendfile

mmap (memory-mapped file) maps a file into a process's address space, establishing a one-to-one correspondence between the process's virtual addresses and the file's location on disk.

mmap is another system call used for zero-copy. Unlike sendfile, it uses shared memory to avoid copying data between the app buf and the kernel buf (the two buffers share the same memory).
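Here is a minimal sketch of the mmap approach in C, assuming Linux/POSIX. The helper name `mmap_send` is hypothetical. The file is mapped read-only with `MAP_SHARED`, so the mapped pages are the kernel's own cached copy of the file. Compared with read-send, this removes the kernel buf to app buf copy, but `write()` still performs one CPU copy from the mapped pages into the socket buffer, and two system calls (`mmap` and `write`) are still needed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the file at `path` to out_fd using mmap + write.
 * Returns bytes sent, or -1 with errno set. */
ssize_t mmap_send(int out_fd, const char *path)
{
    assert(path != NULL);
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }
    if (st.st_size == 0) {         /* mmap() rejects a zero length; nothing to send */
        close(in_fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    /* app buf and kernel buf are now the same pages: no kernel -> user copy */
    char *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, in_fd, 0);
    close(in_fd);                  /* the mapping stays valid after close */
    if (addr == MAP_FAILED)
        return -1;

    /* write() still does one CPU copy: mapped pages -> socket buffer.
     * Caveat from the text: a concurrent truncate of the file can lead to SIGBUS. */
    size_t off = 0;
    int failed = 0;
    while (off < len) {
        ssize_t w = write(out_fd, addr + off, len - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            failed = 1;
            break;
        }
        off += (size_t)w;
    }
    munmap(addr, len);
    return failed ? -1 : (ssize_t)off;
}
```

Unlike sendfile, the application can inspect or modify the bytes at `addr` before calling `write()`.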

Advantages of mmap over sendfile:

  1. When multiple processes access the same file, a lot of memory can be saved.
  2. With sendfile, the data is sent to the network directly inside the kernel, so the application cannot operate on it in user mode; with mmap, the application can still work with the mapped data before sending it.

Disadvantages of mmap compared with sendfile:

  1. If a file is memory-mapped and write is then called while another process truncates the same file, the process may be interrupted by a SIGBUS (bus error) signal, whose default action is to kill the process and dump core. This is generally unacceptable for a server.
  2. For continuous sequential access to small files, it is less efficient than sendfile, which benefits from the readahead cache.



Original article: juejin.im/post/5d060177f265da1b8811daea