Understanding the principle of zero copy: a summary

Why zero copy?

        Traditional I/O copies the same data several times inside the computer, which makes it slow. Zero copy reduces the number of copies and so improves system performance. Note that zero copy does not mean the data is never copied; it only means fewer copies are made, in particular fewer copies performed by the CPU.

1. Basic Concepts

1.1, DMA controller

        First of all, you need to know the relative read and write speeds of computer hardware, which are roughly as follows:

  • The CPU cache is the fastest; think of it as the speed of an airplane.
  • The network card can be thought of as the speed of a car.
  • The hard drive can be thought of as walking speed.

        Suppose a file on the hard disk has to be sent out through the network card. With ordinary I/O, the CPU itself has to fetch the data and write it to the network card, so however fast the CPU is, it is held back by the two slower devices. So the CPU needs a little helper: the DMA controller.

        Direct Memory Access (DMA) is a way of working in which I/O transfers are performed entirely by hardware. The DMA controller takes over control of the bus from the CPU, and data moves directly between memory and the I/O device without passing through the CPU. DMA is generally used for high-speed transfer of blocks of data. The DMA controller sends address and control signals to memory, updates the address, counts the number of words transferred, and reports the end of the transfer to the CPU with an interrupt.

        The main advantage of DMA is speed. Since the CPU does not take part in the transfer at all, the CPU's instruction fetches, data loads, and data stores are avoided. During the transfer there is also no need to save and restore the CPU context. Updating the memory address and counting the transferred words are done directly by hardware circuits rather than by software. DMA can therefore keep up with high-speed I/O devices, and it also frees the CPU to do other work.

1.2, CPU user mode and kernel mode

        People in the world are inherently unequal; it is like how your wife can manage your money, but you can only earn it. The same is true for programs. Some programs have high privileges and can access any resource of the computer, while others have low privileges and can access only some resources. These two kinds of programs correspond to the kernel mode and user mode of the CPU. Kernel mode is the core of the computer and can access any resource, such as the network card and the hard disk. For safety, however, the CPU cannot let user programs access every resource without restriction, because an unstable user program could then crash the whole system. That is why user mode exists.

  • Kernel mode: the CPU can access all of memory and all peripheral devices such as hard disks and network cards, and it can also switch itself from one program to another.
  • User mode: only limited access to memory is allowed and peripheral devices cannot be accessed directly. A program cannot hold on to the CPU, so the CPU can be taken away and given to other programs.

        Therefore, when a program wants to read a file from the hard disk, the CPU has to switch from user mode to kernel mode to gain permission. After the read is completed, it has to switch back from kernel mode to user mode for safety.

2. Zero copy

2.1. Ordinary copy

[Figure: ordinary copy; context switches are drawn in blue]

In ordinary copying, the approximate process is as follows:

  1. The read call switches the CPU to kernel mode, and the kernel first checks the kernel buffer. If the data is already in the kernel buffer, it can be copied directly to user space.
  2. If the data is not in the kernel buffer, the CPU asks the DMA controller to load it from the hard disk into the kernel buffer. This is a DMA copy.
  3. Once the data is in the kernel buffer, the CPU copies it from the kernel buffer to the user buffer. This is a CPU copy. After the copy is completed, the CPU switches back to user mode.
  4. For the write, the CPU switches to kernel mode again and copies the data into the socket buffer, which is another CPU copy. After writing, it switches back to user mode.
  5. DMA asynchronously sends the data in the socket buffer to the peer through the network card.

To summarize ordinary I/O: there are 4 context switches in total (blue in the figure above), 2 for reading and 2 for writing, and 4 data copies, namely:

  1. hard disk to kernel buffer (DMA copy)
  2. kernel buffer to user buffer (CPU copy)
  3. user buffer to socket buffer (CPU copy)
  4. socket buffer to the network card (DMA copy)
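
The ordinary path can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the original text: the helper names (drain, send_with_read_write, demo_ordinary_copy) are made up, and a local socket pair stands in for a real network connection. Each read() and send() call is one round trip into the kernel, and the data passes through a user-space buffer on the way.

```python
import os
import socket
import tempfile
import threading


def drain(sock, chunks):
    """Receive until the peer closes, collecting the data into chunks."""
    while True:
        data = sock.recv(65536)
        if not data:
            break
        chunks.append(data)


def send_with_read_write(path, sock, chunk_size=64 * 1024):
    """Ordinary copy: read() into a user buffer, then send() it."""
    total = 0
    # buffering=0 so that each f.read() is a single read() system call
    with open(path, "rb", buffering=0) as f:
        while True:
            # disk -> kernel buffer (DMA), kernel buffer -> user buffer (CPU)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # user buffer -> socket buffer (CPU); DMA later feeds the device
            sock.sendall(chunk)
            total += len(chunk)
    return total


def demo_ordinary_copy(payload):
    """Hypothetical demo: push a temporary file through a socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    chunks = []
    reader = threading.Thread(target=drain, args=(receiver, chunks))
    reader.start()
    try:
        sent = send_with_read_write(path, sender)
    finally:
        sender.close()      # lets drain() see end-of-stream
        reader.join()
        receiver.close()
        os.unlink(path)
    return sent, b"".join(chunks)


if __name__ == "__main__":
    sent, received = demo_ordinary_copy(b"hello zero copy\n" * 100)
    print(sent, len(received))
```

The chunk variable is the user buffer from steps 2 and 3 of the list above; the techniques in the next sections try to get rid of it.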

2.2, mmap zero copy

        mmap achieves zero copy through virtual memory: a region of the user address space and the kernel buffer are mapped to the same physical memory. The file data therefore no longer has to be copied into a separate user-space buffer; it can be copied directly from the kernel buffer to the socket buffer. This removes one data copy, leaving 3 instead of 4, although the mmap and write calls together still cost 4 context switches.
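
A minimal sketch of the mmap approach, using Python's standard mmap module. The helper names (drain_mmap, send_with_mmap, demo_mmap) are hypothetical, and a local socket pair again stands in for a network connection. The mapped object is handed straight to sendall(), so the program never calls read() to fill its own buffer.

```python
import mmap
import os
import socket
import tempfile
import threading


def drain_mmap(sock, chunks):
    """Receive until the peer closes, collecting the data into chunks."""
    while True:
        data = sock.recv(65536)
        if not data:
            break
        chunks.append(data)


def send_with_mmap(path, sock):
    """Map the file into the address space and send the mapping."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # No read() into a user buffer: the kernel copies from the
            # mapped pages into the socket buffer during send().
            sock.sendall(mm)
        return size


def demo_mmap(payload):
    """Hypothetical demo: push a temporary file through a socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    chunks = []
    reader = threading.Thread(target=drain_mmap, args=(receiver, chunks))
    reader.start()
    try:
        sent = send_with_mmap(path, sender)
    finally:
        sender.close()
        reader.join()
        receiver.close()
        os.unlink(path)
    return sent, b"".join(chunks)


if __name__ == "__main__":
    sent, received = demo_mmap(b"hello zero copy\n" * 100)
    print(sent, len(received))
```

There are still two separate system calls here (mmap and send), which is why the number of context switches does not go down.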

2.3, sendfile zero copy

        The sendfile function transfers data between two file descriptors entirely inside the kernel, thus avoiding any copying between the kernel buffer and a user buffer. This is very efficient and is what is usually called zero copy.

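  1. The sendfile() system call copies the hard disk data into the kernel buffer through DMA, and the kernel then copies it directly into the socket buffer associated with the socket. Apart from entering and leaving the single sendfile() call, there is no switching between user mode and kernel mode, and the copy from one buffer to the other is done entirely in the kernel.
  2. DMA then hands the data to the protocol stack and the network card. No further switching is needed and the data never passes through user mode, because it stays in the kernel. When the network card supports scatter-gather DMA, only a descriptor (address and length) is placed in the socket buffer and DMA reads the data directly from the kernel buffer, so the CPU copy in step 1 is avoided as well.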
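
A minimal sketch of the sendfile approach in Python. The helper names (drain_sendfile, send_with_sendfile, demo_sendfile) are hypothetical. The standard library's socket.sendfile() uses os.sendfile() on platforms that provide it and otherwise falls back to an ordinary read/send loop, so whether the kernel path is really taken depends on the platform; a local socket pair stands in for a network connection.

```python
import os
import socket
import tempfile
import threading


def drain_sendfile(sock, chunks):
    """Receive until the peer closes, collecting the data into chunks."""
    while True:
        data = sock.recv(65536)
        if not data:
            break
        chunks.append(data)


def send_with_sendfile(path, sock):
    """Ask the kernel to move the file to the socket in one call."""
    with open(path, "rb") as f:
        # No user buffer is involved when os.sendfile() is available;
        # the return value is the total number of bytes sent.
        return sock.sendfile(f)


def demo_sendfile(payload):
    """Hypothetical demo: push a temporary file through a socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    chunks = []
    reader = threading.Thread(target=drain_sendfile, args=(receiver, chunks))
    reader.start()
    try:
        sent = send_with_sendfile(path, sender)
    finally:
        sender.close()
        reader.join()
        receiver.close()
        os.unlink(path)
    return sent, b"".join(chunks)


if __name__ == "__main__":
    sent, received = demo_sendfile(b"hello zero copy\n" * 100)
    print(sent, len(received))
```

Compared with the read/write loop, the program issues one call and never touches the file contents itself.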

Summary

Zero copy does not mean no copies; it means reducing the number of copies. There are two ways to do it: mmap and sendfile.

  1. mmap is suitable for reading and writing small amounts of data, and sendfile is suitable for large file transfers. (There is no detailed theoretical basis for this. If you have clues, please leave a message.)
  2. mmap requires 4 context switches and 3 data copies; sendfile requires 2 context switches (one into the kernel and one back) and 3 data copies, or only 2 when scatter-gather DMA is available.
  3. sendfile can use DMA to eliminate the CPU copy, but mmap cannot (the CPU must still copy the data from the kernel buffer to the socket buffer).

If there is any mistake, please correct me.

Origin blog.csdn.net/qq_30285985/article/details/121083283