Linux zero copy mechanism and FileChannel

Preface

Put simply, zero copy means not copying data from one storage area to another. But how can data be transmitted without copying it at all? In practice, the zero copy found in Java NIO, Netty, and Kafka does not eliminate copying entirely; it reduces the number of unnecessary data copies and thereby improves performance.

  • Benefits of zero copy
  • Kernel space and user space
  • Buffer and virtual memory
  • Traditional I/O
  • Zero copy achieved by mmap+write
  • Zero copy implemented by sendfile
  • Zero copy realized by sendfile with DMA gather copy
  • Zero copy methods provided by Java

Benefits of zero copy

  • Reduces or avoids unnecessary CPU data copies, freeing the CPU to perform other tasks
  • Reduces context switches between user space and operating system kernel space
  • Reduces memory usage

Kernel space and user space

  • Kernel space: the space used by the Linux kernel itself; it mainly provides functions such as process scheduling, memory allocation, and access to hardware resources
  • User space: the space provided to each application process. User space has no permission to access kernel space resources directly. If an application needs them, it must make a system call: execution switches from user space to kernel space, performs the requested operation, and then switches back from kernel space to user space

Buffer and virtual memory

  • Direct Memory Access (DMA)
    • DMA allows I/O data to be transferred directly between peripheral devices and main memory, without CPU participation in the transfer itself

  • The buffer is the basis of all I/O; I/O is nothing more than moving data into or out of a buffer
    • When a process issues a read request, the kernel first checks whether the requested data is already in a kernel space buffer. If it is, the kernel copies the data directly to the process's memory area. If not, the kernel requests the data from the disk, DMA writes it into the kernel's read buffer, and the kernel then copies it to the process's memory area
    • When a process issues a write request, the data is copied from the process's memory area to the kernel's write buffer, and the kernel buffer is later flushed to the disk or network card through DMA
  • Virtual memory: modern operating systems all use virtual memory, which has the following two properties
    • More than one virtual address can point to the same physical memory address
    • The virtual address space can be larger than the physical memory actually available
  • Using the first property, a kernel space address and a user space virtual address can be mapped to the same physical address, so that DMA can fill (read and write) a buffer that is visible to both the kernel and the user space process at the same time
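
The first property can be observed from Java. The following is a minimal sketch, not part of the original article: it maps the same file region twice with FileChannel.map (introduced in the Java section further down), writes through one mapping, and reads through the other. The class name SharedMappingDemo and the temporary file are made up for illustration, and the Java specification leaves many mapping details to the operating system, so the visibility shown here is an assumption about typical behaviour on Linux and similar systems.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not from the original article.
final class SharedMappingDemo {
    /** Maps the same 4 KiB file region twice, writes through one mapping, reads through the other. */
    static byte writeThroughOneReadThroughOther(byte value) throws IOException {
        Path file = Files.createTempFile("shared-mapping", ".bin");
        file.toFile().deleteOnExit();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Two independent mappings: two virtual address ranges over the same file pages.
            MappedByteBuffer first = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            MappedByteBuffer second = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            first.put(0, value);   // store through the first mapping
            return second.get(0);  // load through the second mapping
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeThroughOneReadThroughOther((byte) 42));
    }
}
```

If both mappings are backed by the same physical pages, the value read back equals the value written, even though no explicit copy between the two buffers ever happens.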

Traditional I/O

#include <unistd.h>
ssize_t write(int filedes, const void *buf, size_t nbytes);
ssize_t read(int filedes, void *buf, size_t nbytes);
  • For example, a Java program on Linux reads a file from disk and sends it to a remote service

  • 1) Issuing the read system call causes a context switch from user space to kernel space; the file data is then read from the disk into the kernel space buffer through DMA
  • 2) The data in the kernel space buffer is copied to the user space process memory, and the read system call returns. The return causes a context switch from kernel space back to user space
  • 3) The write system call again causes a context switch from user space to kernel space and copies the data from user space process memory to the socket buffer in kernel space (also a kernel buffer, but used by the socket). The write system call then returns, triggering another context switch
  • 4) The transfer from the socket buffer to the network card is an independent, asynchronous step, so the return of the write system call does not guarantee that the data has reached the network card

In total there are four context switches between user space and kernel space and four data copies: two CPU copies and two DMA copies
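
The steps above correspond roughly to a plain read/write loop in Java. This is only an illustrative sketch: the class name TraditionalCopy and the 8 KiB buffer size are arbitrary choices, and the comments describe the simplified copy model used in this article rather than everything the JVM does internally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class, not from the original article.
final class TraditionalCopy {
    /** Copies src to out through a user-space buffer; returns the number of bytes copied. */
    static long copy(Path src, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];            // buffer in user space
        long total = 0;
        try (InputStream in = Files.newInputStream(src)) {
            int n;
            while ((n = in.read(buffer)) != -1) {  // read: kernel buffer -> user buffer (CPU copy)
                out.write(buffer, 0, n);           // write: user buffer -> kernel buffer (CPU copy)
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TraditionalCopy <file>");
            return;
        }
        long copied = copy(Paths.get(args[0]), System.out);
        System.out.flush();
        System.err.println("copied " + copied + " bytes");
    }
}
```

Every loop iteration pays for one read and one write system call, which is exactly the pair of context switches and CPU copies that the zero-copy techniques below try to avoid.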

Zero copy realized by mmap+write

#include <sys/mman.h>
void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

  • 1) Issue the mmap system call, causing a context switch from user space to kernel space. The data in the disk file is copied to the kernel space buffer through the DMA engine (in practice Linux usually loads the pages lazily, on first access)
  • 2) The mmap system call returns, resulting in a context switch from kernel space to user space
  • 3) There is no need to copy data from kernel space to user space, because user space and kernel space share this buffer
  • 4) Issue the write system call, resulting in a context switch from user space to kernel space. The data is copied from the kernel space buffer to the kernel space socket buffer; the write system call then returns, resulting in a context switch from kernel space to user space
  • 5) Asynchronously, the DMA engine copies the data in the socket buffer to the network card

Zero-copy I/O implemented with mmap+write performs 4 context switches between user space and kernel space and 3 data copies: 2 DMA copies and 1 CPU copy

Zero copy implemented by sendfile

#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

  • 1) Issue the sendfile system call, causing a context switch from user space to kernel space. The contents of the disk file are copied to the kernel space buffer through the DMA engine, and the data is then copied from the kernel space buffer to the socket buffer
  • 2) The sendfile system call returns, resulting in a context switch from kernel space to user space. DMA asynchronously transfers the data in the kernel space socket buffer to the network card

Zero-copy I/O implemented with sendfile performs 2 context switches between user space and kernel space and 3 data copies: 2 DMA copies and 1 CPU copy

Zero copy realized by sendfile with DMA gather copy

  • Starting with Linux 2.4, the operating system supports scatter/gather DMA (SG-DMA). On network cards that support it, data is read directly from the kernel space buffer to the network card, without first copying it from the kernel space buffer to the socket buffer.

  • 1) Issue the sendfile system call, causing a context switch from user space to kernel space. Copy the contents of the disk file to the kernel space buffer through the DMA engine
  • 2) The data is not copied to the socket buffer; instead, only a descriptor is appended to the socket buffer. The descriptor contains two kinds of information: A) the memory address of the data in the kernel buffer, B) the length of that data
  • 3) The sendfile system call returns, resulting in a context switch from kernel space to user space. DMA copies the data directly from the kernel buffer to the network card, using the address and length recorded in the socket buffer's descriptor

I/O implemented by sendfile with DMA gather copy uses two context switches between user space and kernel space and two data copies, neither of which is a CPU copy. This achieves the ideal zero-copy I/O transmission: no CPU copy at all, and minimal context switching

Zero copy methods provided by Java

  • Zero copy through mapped buffers in Java NIO is based on mmap+write
  • The MappedByteBuffer returned by FileChannel's map method
    FileChannel provides the map() method, which establishes a virtual memory mapping between an open file and a MappedByteBuffer. MappedByteBuffer inherits from ByteBuffer; its backing memory is the memory-mapped region of the file. Under the hood, map() is implemented with mmap, so once the file contents have been read from disk into the kernel buffer, user space and kernel space share that buffer. Usage is as follows
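import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapCopy {
    public static void main(String[] args) {
        // try-with-resources closes both channels even if an exception is thrown
        try (FileChannel readChannel = FileChannel.open(Paths.get("./cscw.txt"), StandardOpenOption.READ);
             FileChannel writeChannel = FileChannel.open(Paths.get("./siting.txt"),
                     StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            // Map the whole file; a read-only mapping cannot extend past the end of the file
            MappedByteBuffer data = readChannel.map(FileChannel.MapMode.READ_ONLY, 0, readChannel.size());
            // Data transfer: write() may write fewer bytes than remain, so loop
            while (data.hasRemaining()) {
                writeChannel.write(data);
            }
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}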
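
One practical limit of map() is that a single mapping cannot exceed Integer.MAX_VALUE bytes, so larger files have to be mapped in pieces. The following is a rough sketch of that idea, not code from the original article: the class name ChunkedMmapCopy and the windowSize parameter are hypothetical, and the window size is left to the caller.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not from the original article.
final class ChunkedMmapCopy {
    /** Copies src to dst by mapping src in windows of at most windowSize bytes; returns bytes copied. */
    static long copy(Path src, Path dst, int windowSize) throws IOException {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // Map only the next window of the source file
                long length = Math.min(windowSize, size - position);
                MappedByteBuffer window = in.map(FileChannel.MapMode.READ_ONLY, position, length);
                while (window.hasRemaining()) {
                    out.write(window);  // may write fewer bytes than remain, so loop
                }
                position += length;
            }
            return position;
        }
    }
}
```

A call such as ChunkedMmapCopy.copy(src, dst, 64 * 1024 * 1024) would copy the file in 64 MiB windows; the best window size depends on the workload and is not something this sketch tries to tune.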

  • If the underlying operating system supports it, FileChannel's transferTo and transferFrom also use zero-copy techniques (typically sendfile on Linux) to transfer data. Usage is as follows
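import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class TransferToCopy {
    public static void main(String[] args) {
        // try-with-resources closes both channels even if an exception is thrown
        try (FileChannel readChannel = FileChannel.open(Paths.get("./cscw.txt"), StandardOpenOption.READ);
             FileChannel writeChannel = FileChannel.open(Paths.get("./siting.txt"),
                     StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            long len = readChannel.size();
            long position = readChannel.position();
            // Data transfer: transferTo may move fewer bytes than requested, so loop
            while (position < len) {
                position += readChannel.transferTo(position, len - position, writeChannel);
            }
            // Same effect as transferTo:
            // writeChannel.transferFrom(readChannel, 0, len);
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}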
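
The example above copies file to file, while the sendfile walkthrough earlier describes sending a file to a socket. A minimal sketch of that case follows; the class name FileSender is hypothetical, and whether sendfile is actually used underneath depends on the JDK and the operating system. The method accepts any WritableByteChannel, so a connected, blocking SocketChannel can be passed as the target.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not from the original article.
final class FileSender {
    /** Sends the whole file to target (for example a blocking SocketChannel); returns bytes sent. */
    static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // transferTo may move fewer bytes than requested, so advance and retry
                position += in.transferTo(position, size - position, target);
            }
            return position;
        }
    }
}
```

The loop assumes a blocking target. With a non-blocking socket, transferTo can return 0 when the socket buffer is full, and the caller would need to wait for writability instead of spinning.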

Origin blog.csdn.net/GYHYCX/article/details/109323153