Zero-copy technology in Linux explained in detail, so that even beginners can understand it quickly!

Preface

Put simply, zero copy means that data is not copied from one storage area to another. But if data is never copied, how can it be transmitted at all?

In fact, the zero copy we encounter in Java NIO, Netty, and Kafka does not mean that data is never copied; it means reducing the number of unnecessary data copies, thereby improving performance.

The content of this article:

1. The benefits of zero copy:

2. "User space" and "kernel space" of Linux system

3. Implementation direction of zero-copy technology in Linux

4. The principle of zero copy mechanism

4.1. Traditional I/O

4.2. DMA

5. Zero-copy I/O realized by sendfile

6. I/O realized by sendfile with DMA gather copy

7. "Traditional I/O" VS "sendfile zero copy I/O"

8. Zero-copy I/O through mmap

9. FileChannel and zero copy

10. Zero copy method provided by java


 

1. The benefits of zero copy:

  • Reduce or even completely avoid unnecessary CPU copies, thereby freeing the CPU to perform other tasks
  • Reduce memory bandwidth usage
  • Usually zero-copy technology can also reduce context switching between user space and operating system kernel space
     

2. "User space" and "kernel space" of Linux system

  • Kernel space: the space used by Linux itself; it mainly provides functions such as process scheduling, memory allocation, and access to hardware resources

  • User space: the space provided to each program process; user space does not have permission to access kernel space resources directly. If an application needs kernel space resources, it must go through a system call: switch from user space to kernel space, perform the operation, and then switch from kernel space back to user space

 

3. Implementation direction of zero-copy technology in Linux

① Direct I/O: with this transfer method, the application accesses the storage hardware directly and the operating system kernel only assists the transfer. There is still a context switch between user space and kernel space, but data from the hardware is not copied into kernel space first; it is copied straight into user space. Direct I/O therefore involves no data copy between a kernel space buffer and a user space buffer.

② During data transmission, avoid CPU copies of data between the user space buffer and the kernel space buffer, as well as CPU copies of data within kernel space. This article mainly discusses zero-copy mechanisms of this kind.

③ Copy-on-write: in some cases, a kernel space buffer may be shared by multiple applications, and the operating system may map the user space buffer address onto the kernel space buffer. When an application needs to modify the shared data, the data is actually copied into that application's own user space buffer, so modifying it there does not affect the other applications sharing the data. Therefore, if the application does not modify the data, no copy from the kernel space buffer to the user space buffer takes place.

Note that whether various zero-copy mechanisms can be implemented depends on whether the underlying operating system provides corresponding support.

 

4. The principle of zero copy mechanism

Below we use a very common scenario in Java, sending a file on the local system to a remote end (the process is: file on disk -> memory (byte array) -> transmission to the user/network), to walk through traditional I/O and zero-copy I/O in detail.

4.1. Traditional I/O

#include <unistd.h>
ssize_t write(int filedes, const void *buf, size_t nbytes);
ssize_t read(int filedes, void *buf, size_t nbytes);
  • For example, Java reads a disk file on a Linux system and sends it to a remote service:

  • 1) Issuing the read system call causes a context switch from user space to kernel space; the data in the file is then read from disk into the kernel space buffer through DMA

  • 2) The data in the kernel space buffer is then copied into the user space process memory, and the read system call returns. The return of the system call causes a context switch from kernel space to user space.

  • 3) The write system call again causes a context switch from user space to kernel space and copies the data from the user space process memory into the socket buffer in kernel space (also a kernel buffer, but dedicated to the socket). The write system call then returns, triggering another context switch

  • 4) As for the data transmission from the socket buffer to the network card, it is an independent and asynchronous process, which means that the return of the write system call does not guarantee that the data is transmitted to the network card

Q: You may ask what does independent and asynchronous mean? Could it be that the call returns before the data is transmitted?
A: In fact, the return of the call does not guarantee that the data is transmitted; it does not even guarantee the start of the transmission. It just means that the data we want to send is put into a queue to be sent, and there may be many packets in the queue before us. Unless the driver or hardware implements a priority ring or queue, data is transmitted in a first-in, first-out manner.

In general, the traditional I/O operation performed 4 context switches between user space and kernel space, and 4 data copies. Among the 4 data copies, 2 DMA copies and 2 CPU copies are included.
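The read/write sequence described above can be sketched in Java as follows. This is a minimal illustration rather than code from the original article: the class name TraditionalCopy and its copy and demo methods are hypothetical, and an in-memory stream stands in for the socket so that the example can run on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class TraditionalCopy {
    // Copies the whole file to `out` through a user-space byte array.
    // in.read() pulls bytes from the kernel buffer into `buf` (CPU copy).
    // When `out` is a socket or file stream, out.write() pushes the bytes
    // from `buf` back into a kernel buffer (second CPU copy).
    static long copy(Path file, OutputStream out) throws IOException {
        byte[] buf = new byte[8192]; // the user-space buffer
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    // Self-check: writes a temp file, copies it, and compares the bytes.
    static boolean demo() {
        try {
            Path tmp = Files.createTempFile("traditional", ".txt");
            try {
                byte[] payload = "hello zero copy".getBytes(StandardCharsets.UTF_8);
                Files.write(tmp, payload);
                // Assumption: an in-memory sink stands in for the socket stream.
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                long copied = copy(tmp, sink);
                return copied == payload.length && Arrays.equals(payload, sink.toByteArray());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("copy ok: " + demo());
    }
}
```

The byte array buf is exactly the user space buffer that the zero-copy techniques in the following sections try to take out of the data path.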

Q: Why does the traditional I/O mode read data from the disk to the kernel space buffer, and then copy the data from the kernel space buffer to the user space buffer? Why not just read the data directly from the disk to the user space buffer?
A: Traditional I/O reads data from disk into the kernel space buffer rather than directly into the user space buffer in order to reduce disk I/O and improve performance. Based on the principle of locality, the OS pre-reads more file data into the kernel space buffer during a read() system call, so when the next read() system call finds that the requested data is already in the kernel space buffer, it simply copies it to the user space buffer. No slow disk I/O is needed (note: disk I/O is several orders of magnitude slower than memory access).
Q: Since the system kernel buffer can reduce disk I/O operations, what is the BufferedInputStream buffer we often use for?
A: The function of BufferedInputStream is to automatically prefetch more data for us according to the situation into an internal byte data buffer it maintains. This can reduce the number of system calls to improve performance.

In general, a major use of the kernel space buffer is to reduce disk I/O operations, because it will pre-read more data from the disk into the buffer. The use of BufferedInputStream is to reduce "system calls."
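The effect of BufferedInputStream can be made visible with a small counting experiment. This is a sketch under stated assumptions: BufferedReadCount and CountingStream are hypothetical names, and an in-memory stream stands in for a file, so each counted call represents what would be a read system call on a real file stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferedReadCount {
    // Hypothetical wrapper that counts how often the underlying source is asked for data.
    static class CountingStream extends FilterInputStream {
        int calls = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads `in` one byte at a time until EOF, then reports how many calls reached `source`.
    static int drain(InputStream in, CountingStream source) throws IOException {
        while (in.read() != -1) {
            // discard the byte; only the call count matters here
        }
        return source.calls;
    }

    // Returns { calls without buffering, calls with BufferedInputStream }.
    static int[] demo() {
        byte[] data = new byte[20000];
        try {
            CountingStream raw = new CountingStream(new ByteArrayInputStream(data));
            int unbuffered = drain(raw, raw);

            CountingStream wrapped = new CountingStream(new ByteArrayInputStream(data));
            int buffered = drain(new BufferedInputStream(wrapped), wrapped);

            return new int[] { unbuffered, buffered };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] calls = demo();
        System.out.println("unbuffered calls: " + calls[0] + ", buffered calls: " + calls[1]);
    }
}
```

Without buffering, every single-byte read reaches the source; with BufferedInputStream, the source is only asked for data when the internal byte array runs empty.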

4.2. DMA

DMA (Direct Memory Access): DMA allows peripheral components to transfer I/O data directly to main memory without the participation of the CPU, thereby freeing the CPU to do other work.
The data transmission between user space and kernel space does not have a transmission tool like DMA that does not require CPU participation. Therefore, the data transmission between user space and kernel space requires the full participation of the CPU. So there is zero copy technology to reduce and avoid unnecessary CPU data copy process.
 

5. Zero-copy I/O realized by sendfile

#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

  • 1) Issue the sendfile system call, causing a context switch from user space to kernel space. The contents of the disk file are copied to the kernel space buffer through the DMA engine, and the data is then copied from the kernel space buffer to the socket buffer

  • 2) The sendfile system call returns, resulting in a context switch from kernel space to user space. DMA asynchronously transfers the data in the kernel space socket buffer to the network card

In general, the zero-copy I/O implemented by sendfile only uses two context switches between user space and kernel space, and three copies of data. Among the three data copies, two DMA copies and one CPU copy are included.

Q: But there is still one CPU copy here, namely kernel buffer -> socket buffer. Is there a way to eliminate this copy?
A: Yes, but it requires support from the underlying operating system. Starting from Linux 2.4, the operating system provides scatter/gather DMA, which reads data directly from the kernel space buffer to the protocol engine without first copying it into the buffer associated with the kernel space socket.
 

6. I/O realized by sendfile with DMA gather copy

  • Starting from Linux 2.4, the operating system provides scatter/gather DMA (SG-DMA), which reads data directly from the kernel space buffer to the network card without copying the data in the kernel space buffer to the socket buffer.

  • 1) Issue the sendfile system call, causing a context switch from user space to kernel space. Copy the contents of the disk file to the kernel space buffer through the DMA engine
  • 2) The data is not copied to the socket buffer; instead, the corresponding descriptor information is copied to the socket buffer. The descriptor contains two kinds of information: A) the memory address of the kernel buffer, B) the offset of the kernel buffer

  • 3) The sendfile system call returns, resulting in a context switch from kernel space to user space. DMA directly copies the data in the kernel buffer to the network card according to the address and offset provided by the descriptor of the socket buffer

In general, I/O implemented by sendfile with DMA gather copy uses two context switches between user space and kernel space and two data copies, neither of which is a CPU copy. In this way we achieve the ideal zero-copy I/O transmission: not a single CPU copy, and a minimal number of context switches.
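From Java, the closest equivalent to the sendfile flow above is FileChannel.transferTo with a socket channel as the target. The sketch below is only an illustration: SendFileSketch, send, and demo are hypothetical names, the receiver is a local loopback thread added so the example can run by itself, and whether transferTo really ends up in sendfile depends on the JDK and the operating system.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

class SendFileSketch {
    // Sends the whole file to the socket without routing the bytes through a Java byte array.
    static long send(Path file, SocketChannel socket) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            long sent = 0;
            while (sent < size) {
                // transferTo may move fewer bytes than requested, so keep going.
                long n = in.transferTo(sent, size - sent, socket);
                if (n <= 0) {
                    break; // nothing more could be transferred (e.g. file was truncated)
                }
                sent += n;
            }
            return sent;
        }
    }

    // Self-check over the loopback interface: true if the receiver got identical bytes.
    static boolean demo() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            // Receiver thread: accept one connection and collect everything it sends.
            CompletableFuture<byte[]> received = CompletableFuture.supplyAsync(() -> {
                try (SocketChannel peer = server.accept()) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    ByteBuffer buf = ByteBuffer.allocate(4096);
                    while (peer.read(buf) != -1) {
                        buf.flip();
                        out.write(buf.array(), 0, buf.limit());
                        buf.clear();
                    }
                    return out.toByteArray();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            Path tmp = Files.createTempFile("sendfile", ".bin");
            try {
                byte[] payload = new byte[100_000];
                Arrays.fill(payload, (byte) 7);
                Files.write(tmp, payload);
                try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                    send(tmp, client);
                } // closing the client signals end-of-stream to the receiver
                return Arrays.equals(payload, received.join());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("received intact: " + demo());
    }
}
```

Note that the sender never sees the file contents: there is no user space buffer in send at all, which is also why the data cannot be modified on the way.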

About sendfile:

#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

Before Linux 2.6.33, sendfile only supported transferring data from a file to a socket: in_fd had to refer to a file that supports mmap-like operations, and out_fd had to be a socket. Starting from Linux 2.6.33, out_fd can be any type of file descriptor, so sendfile supports both "file to file" and "file to socket" transfers.
 

7. "Traditional I/O" VS "sendfile zero copy I/O"

  • Traditional I/O uses two system calls, read and write, to read and send the data, so the overhead of a context switch between user space and kernel space is incurred four times; sendfile completes the transfer with a single system call, so only two context switches occur.
  • Traditional I/O incurs 2 unnecessary CPU copies, namely the copies between the kernel space buffers and the user space buffer; sendfile incurs at most one CPU copy, within kernel space, and with support from the underlying operating system it can even achieve I/O with zero CPU copies.
  • Because traditional I/O places the data in a user space buffer, the application can modify or otherwise process it; sendfile zero copy eliminates every copy between the kernel space buffer and the user space buffer, so the whole transfer is completed in kernel space and the application cannot operate on the data.

Q: Regarding the third point above, what should we do if we need to manipulate the data?
A: Linux provides mmap zero copy to meet our needs.
 

8. Zero-copy I/O through mmap

mmap (memory mapping) is a more expensive method than sendfile but better than traditional I/O.

#include <sys/mman.h>
void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

  • 1) Issue the mmap system call, causing a context switch from user space to kernel space. Then copy the data in the disk file to the kernel space buffer through the DMA engine

  • 2) The mmap system call returns, resulting in a context switch from kernel space to user space

  • 3) There is no need to copy data from kernel space to user space, because user space and kernel space share this buffer

  • 4) Issue the write system call, resulting in a context switch from user space to kernel space. Copy data from the kernel space buffer to the kernel space socket buffer; the write system call returns, resulting in a context switch from kernel space to user space

  • 5) Asynchronously, the DMA engine copies the data in the socket buffer to the network card

In general, the zero-copy I/O implemented by mmap performed 4 context switches between user space and kernel space, and 3 data copies. Among the 3 data copies, 2 DMA copies and 1 CPU copy are included.
 

9. FileChannel and zero copy

The zero-copy technology we mentioned above is extensively used in FileChannel.
The map method of FileChannel returns a MappedByteBuffer. MappedByteBuffer is a direct byte buffer whose memory is a memory-mapped region of a file. Under the hood the map method is implemented with mmap, so once the file data has been read from disk into the kernel buffer, user space and kernel space share that buffer.
A memory-mapped file lets a Java program access the file's contents directly from memory. We can map the entire file or only part of it into memory; the operating system then handles the page requests and writes memory modifications back to the file. Our application only needs to work on data in memory, which makes very fast I/O possible.
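As a small illustration of working on a file purely through memory, the sketch below upper-cases a file in place through a read/write mapping. MappedEdit, upperCaseInPlace, and demo are hypothetical names introduced for this example, and the temp file only exists to make it runnable; it assumes a non-empty file small enough for a single mapping.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedEdit {
    // Upper-cases the ASCII letters of a file in place through a read/write mapping.
    // No read() or write() call is issued for the data itself.
    static void upperCaseInPlace(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            for (int i = 0; i < map.limit(); i++) {
                byte b = map.get(i);
                if (b >= 'a' && b <= 'z') {
                    map.put(i, (byte) (b - 32)); // modify the mapped page directly
                }
            }
            map.force(); // ask the OS to write the modified pages back to the file
        }
    }

    // Self-check: returns the file content after the in-place edit.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("mapped", ".txt");
            try {
                Files.write(tmp, "zero copy".getBytes(StandardCharsets.US_ASCII));
                upperCaseInPlace(tmp);
                return new String(Files.readAllBytes(tmp), StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The call to force() is there because the operating system otherwise decides on its own when modified pages are written back to the storage device.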

Three modes of FileChannel map

  • Read only mode
/**
 * Mode for a read-only mapping.
 */
public static final MapMode READ_ONLY = new MapMode("READ_ONLY");

For read-only mode, if the program tries to write, it will throw a ReadOnlyBufferException

  • Read and write mode
/**
 * Mode for a read/write mapping.
 */
public static final MapMode READ_WRITE = new MapMode("READ_WRITE");

In read-write mode, changes made to the buffer are eventually propagated to the file. These changes may or may not be visible to other programs that have mapped the same file.

  • Private mode
/**
 * Mode for a private (copy-on-write) mapping.
 */
public static final MapMode PRIVATE = new MapMode("PRIVATE");

In private mode, changes to the resulting buffer are not propagated to the file and are not visible to other programs that have mapped the same file. Instead, the modified part of the buffer is copied privately for this process. This is the operating system's "copy-on-write" principle.
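The difference between the modes can be checked with a short experiment. This is a sketch, not code from the article: MapModes and its methods are hypothetical names, and the temp file is only there so the example runs on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MapModes {
    // Returns true if a write through a READ_ONLY mapping is rejected.
    static boolean readOnlyRejectsWrites(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            try {
                map.put(0, (byte) 'X');
                return false;
            } catch (ReadOnlyBufferException e) {
                return true;
            }
        }
    }

    // Writes through a PRIVATE mapping and returns what the file contains afterwards.
    static String fileAfterPrivateWrite(Path file) throws IOException {
        // A PRIVATE mapping requires a channel opened for both reading and writing.
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.PRIVATE, 0, ch.size());
            map.put(0, (byte) 'X'); // changes only this process's private copy of the page
        }
        return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
    }

    // Self-check: READ_ONLY rejects the write and PRIVATE leaves the file untouched.
    static boolean demo() {
        try {
            Path tmp = Files.createTempFile("modes", ".txt");
            try {
                Files.write(tmp, "abc".getBytes(StandardCharsets.US_ASCII));
                boolean rejected = readOnlyRejectsWrites(tmp);
                String afterPrivate = fileAfterPrivateWrite(tmp);
                return rejected && "abc".equals(afterPrivate);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("modes behave as described: " + demo());
    }
}
```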

FileChannel's transferTo and transferFrom

If the bottom layer of the operating system supports it, transferTo and transferFrom will also use related zero-copy technology to realize data transmission. Therefore, whether to use zero copy here must depend on the underlying system implementation.
 

10. Zero copy method provided by java

  • The zero-copy implementation of java NIO is based on mmap+write

  • MappedByteBuffer generated by FileChannel's map method: FileChannel provides the map() method, which establishes a virtual memory mapping between an open file and a MappedByteBuffer. MappedByteBuffer inherits from ByteBuffer; the memory of this buffer is the memory-mapped region of a file. Under the hood the map method is implemented with mmap, so once the file data has been read from disk into the kernel buffer, user space and kernel space share that buffer. Usage is as follows:

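import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapCopy {
    public static void main(String[] args) {
        try (FileChannel readChannel = FileChannel.open(Paths.get("./cscw.txt"), StandardOpenOption.READ);
             FileChannel writeChannel = FileChannel.open(Paths.get("./siting.txt"), StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            // Map exactly the file's length; a read-only mapping cannot extend past the end of the file
            MappedByteBuffer data = readChannel.map(FileChannel.MapMode.READ_ONLY, 0, readChannel.size());
            // Data transfer: write() may write fewer bytes than requested, so loop until the buffer is drained
            while (data.hasRemaining()) {
                writeChannel.write(data);
            }
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}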
  • transferTo and transferFrom of FileChannel: if the underlying operating system supports it, transferTo and transferFrom also use zero-copy techniques to transfer data. Usage is as follows:
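import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class TransferCopy {
    public static void main(String[] args) {
        try (FileChannel readChannel = FileChannel.open(Paths.get("./cscw.txt"), StandardOpenOption.READ);
             FileChannel writeChannel = FileChannel.open(Paths.get("./siting.txt"), StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            long len = readChannel.size();
            long position = readChannel.position();
            // Same effect as transferTo:
            // writeChannel.transferFrom(readChannel, position, len);
            // Data transfer: transferTo() may move fewer bytes than requested, so loop
            while (position < len) {
                long n = readChannel.transferTo(position, len - position, writeChannel);
                if (n <= 0) {
                    break; // nothing more could be transferred
                }
                position += n;
            }
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}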

 

11. Finally

This article reflects a fairly superficial understanding of the zero-copy mechanism, gathered from video courses and a lot of reading, and it is only my personal take. Through this study I learned more about the Linux operating system. Pointers to any deficiencies or errors in the article are very welcome~


Origin blog.csdn.net/qq_43080036/article/details/109290905