The principle of zero-copy and its Java implementation

Any discussion of Kafka has to mention its high-performance zero-copy. By using zero-copy, Kafka greatly improves application performance and reduces the number of context switches between kernel mode and user mode. So what exactly is zero-copy, and how is it achieved?

What is zero-copy?

Wikipedia defines it as follows:

"Zero-copy" describes computer operations in which the CPU does not perform the task of copying data from one memory area to another.

From this definition we can see that "zero-copy" refers to operations in which the CPU does not spend cycles copying data from one memory area to another. It usually refers to the way a computer sends a file over the network: the file content is transmitted directly from kernel space (Kernel Space) without first being copied into user space (User Space).

Benefits of zero-copy

  • Reduces or even completely avoids unnecessary CPU copies, freeing the CPU to perform other tasks
  • Reduces memory bandwidth usage
  • Zero-copy techniques usually also reduce context switching between kernel space and user space

Zero-copy implementation

There is no real standard for implementing zero-copy; it depends entirely on how the operating system implements it. If the operating system supports it, it is available; if it does not, it is not. Java itself does not provide it independently.

Traditional I/O

In Java, we can read data from a source through an InputStream into a buffer and then write the buffer to an OutputStream. We know this kind of I/O transfer is relatively inefficient. So what does the operating system actually do when we use it?
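For reference, a minimal sketch of this stream-based copy in Java (the class and method names and the 8 KB buffer size are illustrative, not from the original article):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class TraditionalCopy {
    // Traditional copy loop: every chunk is pulled into a user-space byte[]
    // (kernel buffer -> user buffer) and then written back out
    // (user buffer -> kernel/socket buffer), so the CPU copies each byte twice.
    static void streamCopy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }
}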


 
[Figure: Traditional I/O]

This is the process of reading a file from disk and writing it out through a socket; the corresponding system calls are:

read(file, tmp_buf, len);
write(socket, tmp_buf, len);
  1. The program invokes the read() system call. The system switches from user mode to kernel mode (the first context switch), and the data on disk is read into the kernel buffer via DMA (Direct Memory Access). The CPU is not involved in the DMA transfer; the disk's DMA controller moves the data into memory directly over the data bus.
  2. The system switches back from kernel mode to user mode (the second context switch). Once the data has been read into the kernel buffer, the kernel copies it from the kernel buffer into the user buffer; this copy requires the CPU to read and write the data.
  3. The program invokes the write() system call. The system switches from user mode to kernel mode (the third context switch), and the data is copied from the user buffer into the network buffer (socket buffer); this copy also requires the CPU.
  4. The system switches from kernel mode back to user mode (the fourth context switch), and the data is transferred from the socket buffer to the network card's buffer (protocol engine) via DMA.

As we can see, traditional I/O goes through four switches between kernel mode and user mode (context switches), and the CPU copies the data in memory twice. This copying is relatively expensive.

Memory-mapped I/O
 
[Figure: Memory-mapped I/O (mmap)]
tmp_buf = mmap(file, len);
write(socket, tmp_buf, len);

These are the system calls used by this approach. The principle of memory-mapped I/O is that the operating system maps the memory addresses of the user buffer and the kernel buffer onto the same memory, so that a program running in user mode can directly read and operate on data in kernel space.

  1. The mmap() system call first reads the data from disk into the kernel buffer via DMA, and then, through memory mapping, makes the user buffer and the kernel buffer share the same memory address, so the CPU no longer needs to copy the data from the kernel buffer into a separate user buffer.
  2. When the write() system call is issued, the CPU writes the data in the kernel buffer (which is identical to the user buffer) directly into the network send buffer (socket buffer); the data is then transferred from there to the network card via DMA, ready to be sent.

We can see that memory mapping reduces the number of CPU copies, but there are still four switches between user mode and kernel mode (context switches). Note also that with this kind of memory mapping, concurrent threads operating on the same memory region can cause serious data-inconsistency problems, so proper concurrent programming is needed to guard against them.
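From Java, the closest analogue to this mmap + write sequence is FileChannel.map(). A rough sketch, assuming a file path, host, and port that are only placeholders:

import java.io.RandomAccessFile;
import java.net.InetSocketAddress;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

class MmapSend {
    static void send(String path, String host, int port) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel fc = raf.getChannel();
             SocketChannel sc = SocketChannel.open(new InetSocketAddress(host, port))) {
            // Map the whole file; no copy into a Java byte[] is needed
            MappedByteBuffer mapped = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            while (mapped.hasRemaining()) {
                sc.write(mapped);   // the kernel still copies from the mapping into the socket buffer
            }
        }
    }
}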

Zero-copy I/O via sendfile
 
[Figure: Zero-copy I/O with sendfile]
sendfile(socket, file, len); 

With the sendfile() system call, the I/O transfer can happen entirely inside kernel space.

  1. The sendfile() system call still causes a switch from user mode to kernel mode. Unlike memory mapping, user space cannot see or modify the data at this point; in other words, this is purely a data transfer.
  2. Reading from disk into memory is done via DMA. Copying from the kernel read buffer to the network send buffer still requires the CPU, while moving the data from the network send buffer to the network card's buffer is again done via DMA.

There is still one CPU copy and two switches between user mode and kernel mode. This is a big improvement over memory mapping, but the drawback is that the program cannot modify the data; it is purely a data transfer.

Zero-copy I/O in the ideal case
 
[Figure: Ideal zero-copy I/O with hardware-assisted sendfile]

It is still the sendfile() system call:

sendfile(socket, file, len); 

As you can see, this is zero-copy in the true sense: the CPU no longer participates in copying the data at all; the reads and writes are performed entirely by other hardware and interrupts. However, this requires hardware support.

With help from the hardware, this is achievable. Previously we copied the data from the page cache into the socket buffer; in fact, all we need to do is pass the buffer descriptor and the data length to the socket buffer, and the DMA controller can then package the data in the page cache and send it straight to the network.

  1. After the sendfile() system call is issued, the disk data is read into the kernel buffer via DMA; the data in the kernel buffer is then gathered with the network buffer via DMA and sent to the network card together.

In this mode there is not a single CPU data copy, so it is zero-copy in the true sense. Although it uses the same system call as the previous mode, implementing it requires hardware support. For users of the operating system, however, the OS hides this difference and implements the system call according to the underlying hardware platform.

The Java implementation

Zero-copy with NIO
  File file = new File("test.zip");
  RandomAccessFile raf = new RandomAccessFile(file, "rw");
  FileChannel fileChannel = raf.getChannel();
  SocketChannel socketChannel = SocketChannel.open(new InetSocketAddress("", 1234));
  // transferTo() is used directly to transfer data between the two channels
  fileChannel.transferTo(0, fileChannel.size(), socketChannel);

NIO's zero-copy is implemented by the transferTo() method, which transfers data from a FileChannel to a writable byte channel (such as a SocketChannel). Internally it is implemented by the native method transferTo0(), which relies on support from the underlying operating system. On UNIX and Linux systems, calling this method results in a sendfile() system call.
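Note that a single transferTo() call is not guaranteed to move the whole file, so practical code usually loops until everything has been transferred. A small sketch along those lines (the file path, host, and port are placeholders):

import java.io.RandomAccessFile;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

class ZeroCopySend {
    static void send(String path, String host, int port) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel fc = raf.getChannel();
             SocketChannel sc = SocketChannel.open(new InetSocketAddress(host, port))) {
            long position = 0;
            long size = fc.size();
            // Each call hands the remaining bytes to the kernel; on Linux this becomes sendfile()
            while (position < size) {
                position += fc.transferTo(position, size - position, sc);
            }
        }
    }
}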

Typical use cases are:

  • The file is large, reads and writes are slow, and speed matters
  • Memory is limited and very large data cannot be loaded into it
  • Bandwidth is scarce, i.e. other programs or threads already perform heavy I/O, leaving little bandwidth available

All of the above assumes you do not need to operate on the file data. What if you need this kind of speed and also need to manipulate the data?
Then use NIO's direct memory!

Direct memory with NIO
  File file = new File("test.zip");
  RandomAccessFile raf = new RandomAccessFile(file, "rw");
  FileChannel fileChannel = raf.getChannel();
  MappedByteBuffer buffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileChannel.size());

First of all, it sits between traditional I/O (BIO) and zero-copy. Why say that?

  • Traditional I/O can read a file from disk through kernel space into the JVM, perform arbitrary operations on it, and finally write it back to disk or send it to the network. It is slower, but it lets you operate on the file data.
  • Zero-copy completes the file read directly in kernel space and forwards it to disk (or sends it to the network). Because the file data is never read into the JVM, the program cannot operate on it, even though it is very efficient!

Direct memory lies between the two: moderate efficiency, but the file data can be operated on. Direct memory (the mmap technique) maps the file directly into kernel-space memory and returns an operating address (address). It resolves the awkward situation where file data has to be copied into the JVM before it can be manipulated: instead, the data is operated on directly in kernel space, eliminating the copy from kernel space to user space.

NIO's direct memory is provided by MappedByteBuffer. The core is the map() method, which maps the file into memory and obtains its memory address addr; a MappedByteBuffer is then constructed around addr to expose the various file-operation APIs.
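A small sketch of what operating on file data through the mapping can look like (it reuses the test.zip name from the snippet above; the offset and value written are arbitrary):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MmapEdit {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("test.zip", "rw");
             FileChannel fc = raf.getChannel()) {
            // READ_WRITE mapping: changes made through the buffer go back to the file
            MappedByteBuffer buffer = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size());
            byte first = buffer.get(0);          // read directly from the mapped region
            buffer.put(0, (byte) (first + 1));   // write directly to the mapped region
            buffer.force();                      // flush modified pages back to disk
        }
    }
}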

Because MappedByteBuffer allocates off-heap memory, it is not subject to Minor GC and can only be reclaimed when a Full GC occurs. DirectByteBuffer improves on this: it is a subclass of MappedByteBuffer that implements the DirectBuffer interface and maintains a Cleaner object to handle memory reclamation. Its memory can therefore be reclaimed by a Full GC, or you can call the clean() method to reclaim it explicitly.

In addition, the direct memory size can be set with the JVM parameter -XX:MaxDirectMemorySize.

NIO's MappedByteBuffer has a sibling called HeapByteBuffer. As the name suggests, it allocates memory on the heap and is essentially a byte array. Because it lives on the heap, it is managed by the GC and easy to reclaim.
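A short sketch contrasting the two allocation paths; the concrete subclasses are hidden behind ByteBuffer, and the factory method decides which one you get:

import java.nio.ByteBuffer;

class BufferKinds {
    public static void main(String[] args) {
        // Backed by a byte[] on the Java heap -> a HeapByteBuffer, fully managed by the GC
        ByteBuffer heap = ByteBuffer.allocate(1024);
        // Backed by off-heap native memory -> a DirectByteBuffer,
        // reclaimed via its Cleaner when the buffer object becomes unreachable
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        System.out.println(heap.isDirect());    // false
        System.out.println(direct.isDirect());  // true
    }
}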




Author: mountaineering passenger
Link: https://www.jianshu.com/p/497e7640b57c
Source: Jianshu
