Introduction to zero-copy

Foreword

Taken literally, zero-copy means that data does not have to be copied back and forth, which can greatly improve system performance. The term comes up often in Java NIO, Netty, Kafka, RocketMQ and other frameworks, usually as one of their performance highlights. Starting from a few I/O concepts, this article takes a closer look at zero-copy.

I/O concepts

1. Buffer

The buffer is the foundation of all I/O: I/O is nothing more than moving data into or out of a buffer. When a process performs an I/O operation, it sends a request to the operating system either to drain a buffer (write) or to fill one (read). The flow when a Java process issues a read request is roughly as follows.

After the process issues the read request, the kernel receives it and first checks whether the required data is already in kernel space. If it is, the kernel copies the data straight into the process's buffer. If not, the kernel instructs the disk controller to read the data from disk; the disk controller writes the data directly into the kernel buffer, a step carried out by DMA. The kernel then copies the data from its buffer into the process's buffer.
If the process issues a write request, the data likewise has to be copied from the user buffer into the kernel's socket buffer, and DMA then copies it to the network card, which sends it out.
This may strike you as wasteful, since every transfer has to copy data between kernel space and user space. Zero-copy exists to solve this problem.
There are two ways to achieve zero-copy: mmap + write, and sendfile.
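
For contrast, the traditional read + write path described above can be sketched in Java as a copy loop through a heap array. This is only a minimal illustration: the class name BufferedCopy and the 8192-byte buffer size are assumptions made for the example, not something the frameworks mentioned here prescribe.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
final class BufferedCopy {

    // Copies src to dst through a heap buffer: each read() moves data from the
    // kernel into `buffer`, and each write() hands it back to the kernel.
    static long copy(Path src, Path dst) throws IOException {
        byte[] buffer = new byte[8192]; // user-space buffer; the size is an arbitrary choice
        long total = 0;
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Every byte passes through the `buffer` array on its way from one kernel buffer to another, and this round trip is exactly what the zero-copy techniques below try to avoid.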

2. Virtual Memory

All modern operating systems use virtual memory, in which virtual addresses stand in for physical addresses. This has two benefits:
1. More than one virtual address can point to the same physical memory address.
2. The virtual address space can be larger than the physical memory actually available.
Using the first property, a kernel-space virtual address and a user-space virtual address can be mapped to the same physical address, so that DMA can fill a buffer that is visible to both the kernel and the user-space process at once.

This removes the copying between kernel and user space. Java also takes advantage of this operating-system feature to improve performance; the sections below look at the zero-copy support Java provides.

3. mmap + write

The mmap + write approach replaces the original read + write approach. mmap is a way of memory-mapping files: a file or other object is mapped into the process's address space, establishing a one-to-one correspondence between file offsets on disk and a range of virtual addresses in the process's virtual address space. This saves the copy from the kernel read buffer to the user buffer, but the kernel still has to copy the data from the kernel read buffer to the kernel socket buffer.
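
A Java counterpart of this pattern could look like the sketch below: map the file, then write the mapped region to a channel. The class and method names are hypothetical, and the sketch assumes a blocking target channel and a file smaller than 2 GB (the largest region a single mapping can cover in Java).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class MmapWrite {

    // Maps the whole file read-only and writes the mapped region to `target`.
    // The file content is never read into a heap array by this code.
    static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long written = 0;
            // write() may accept only part of the buffer, so loop until it is drained
            while (mapped.hasRemaining()) {
                written += target.write(mapped);
            }
            return written;
        }
    }
}
```

Whether a copy is really saved depends on the target: a SocketChannel can consume the mapped buffer directly, while a channel that wraps an ordinary OutputStream copies the data into a temporary array internally.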

4. sendfile

The sendfile system call was introduced in kernel version 2.1 to simplify transferring data between two channels over a network. It reduces not only data copying but also context switches.

The data transfer now happens entirely in kernel space, which reduces context switches. One copy still remains, though: from the kernel buffer to the socket buffer. Can this copy be eliminated as well? The Linux 2.4 kernel made an improvement: only the descriptor information for the data in the kernel buffer (memory address and offset) is recorded in the socket buffer, so even that CPU copy inside kernel space is saved.

Java zero-copy

1. MappedByteBuffer

The FileChannel class in Java NIO provides a map() method that creates a virtual-memory mapping between an open file and a MappedByteBuffer. MappedByteBuffer extends ByteBuffer and behaves like a memory-based buffer, except that its data elements are stored in a file on disk. Calling get() fetches data from the file and reflects the file's current contents; calling put() updates the file on disk, and the modifications are visible to other readers of the file. Here is a simple read example, followed by an analysis of MappedByteBuffer:

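import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Scanner;

public class MappedByteBufferTest {

    public static void main(String[] args) throws Exception {
        File file = new File("D://db.txt");
        long len = file.length();
        byte[] ds = new byte[(int) len];
        // try-with-resources closes the stream and its channel once the bytes are read
        try (FileInputStream in = new FileInputStream(file);
             FileChannel channel = in.getChannel()) {
            MappedByteBuffer mappedByteBuffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, len);
            // Each relative get() returns the next byte of the mapped file
            for (int offset = 0; offset < len; offset++) {
                byte b = mappedByteBuffer.get();
                ds[offset] = b;
            }
        }
        // Print the content token by token, split on single spaces
        Scanner scan = new Scanner(new ByteArrayInputStream(ds)).useDelimiter(" ");
        while (scan.hasNext()) {
            System.out.print(scan.next() + " ");
        }
    }
}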

The mapping is created by the map() method that FileChannel provides, declared as follows:

    public abstract MappedByteBuffer map(MapMode mode,
                                         long position, long size)
        throws IOException;

It takes three parameters, mode, position and size:
mode: the mapping mode; the options are READ_ONLY, READ_WRITE and PRIVATE;
position: the byte offset in the file at which the mapped region starts;
size: the number of bytes to map, counting from position.

Focus on MapMode. READ_ONLY and READ_WRITE mean read-only and read-write respectively. The requested mode is constrained by the access rights of the FileChannel object: requesting READ_ONLY on a channel that was not opened for reading throws NonReadableChannelException. PRIVATE means a copy-on-write mapping: any change made through put() produces a private copy of the data that only that MappedByteBuffer instance can see. No change is made to the underlying file, and once the buffer is garbage collected the changes are lost. Here is a quick look at the source of map():

    public MappedByteBuffer map(MapMode mode, long position, long size)
        throws IOException
    {
            // ... omitted ...
            int pagePosition = (int)(position % allocationGranularity);
            long mapPosition = position - pagePosition;
            long mapSize = size + pagePosition;
            try {
                // If no exception was thrown from map0, the address is valid
                addr = map0(imode, mapPosition, mapSize);
            } catch (OutOfMemoryError x) {
                // An OutOfMemoryError may indicate that we've exhausted memory
                // so force gc and re-attempt map
                System.gc();
                try {
                    Thread.sleep(100);
                } catch (InterruptedException y) {
                    Thread.currentThread().interrupt();
                }
                try {
                    addr = map0(imode, mapPosition, mapSize);
                } catch (OutOfMemoryError y) {
                    // After a second OOME, fail
                    throw new IOException("Map failed", y);
                }
            }
 
            // On Windows, and potentially other platforms, we need an open
            // file descriptor for some mapping operations.
            FileDescriptor mfd;
            try {
                mfd = nd.duplicateForMapping(fd);
            } catch (IOException ioe) {
                unmap0(addr, mapSize);
                throw ioe;
            }
 
            assert (IOStatus.checkAll(addr));
            assert (addr % allocationGranularity == 0);
            int isize = (int)size;
            Unmapper um = new Unmapper(addr, mapSize, isize, mfd);
            if ((!writable) || (imode == MAP_RO)) {
                return Util.newMappedByteBufferR(isize,
                                                 addr + pagePosition,
                                                 mfd,
                                                 um);
            } else {
                return Util.newMappedByteBuffer(isize,
                                                addr + pagePosition,
                                                mfd,
                                                um);
            }
     }
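
The first three computed values in the excerpt above align the requested region with the platform's allocation granularity. The arithmetic can be tried in isolation as below; the class MapAlignment is hypothetical, and 4096 is only an assumed granularity, since the real value is platform dependent.

```java
import java.util.Arrays;

// Hypothetical helper that repeats the alignment arithmetic from the excerpt.
final class MapAlignment {

    // Returns {pagePosition, mapPosition, mapSize} for the requested region.
    static long[] align(long position, long size, long allocationGranularity) {
        int pagePosition = (int) (position % allocationGranularity); // offset inside the first page
        long mapPosition = position - pagePosition;                   // rounded down to a page boundary
        long mapSize = size + pagePosition;                           // enlarged to cover the extra prefix
        return new long[] {pagePosition, mapPosition, mapSize};
    }

    public static void main(String[] args) {
        // Request 100 bytes starting at offset 5000, assuming a granularity of 4096
        System.out.println(Arrays.toString(align(5000, 100, 4096))); // prints [904, 4096, 1004]
    }
}
```

The native mapping therefore starts at offset 4096 and covers 1004 bytes, and the returned buffer begins at addr + pagePosition, so the caller still sees exactly the 100 bytes it asked for.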

In short, the native method map0 obtains the address of the memory mapping; if that fails with an OutOfMemoryError, the code triggers a GC and tries the mapping once more. Finally, a MappedByteBuffer is instantiated from the mapped address. MappedByteBuffer itself is an abstract class, so what is actually instantiated is a DirectByteBuffer.
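
To see the difference between the mapping modes described earlier in practice, the following sketch overwrites the first byte of a file through a mapping and then reads the file back from disk. MapModeDemo and putFirstByte are hypothetical names, and the sketch assumes a small, non-empty file that may be opened for both reading and writing.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, for illustration only.
final class MapModeDemo {

    // Overwrites the first byte through a mapping of the given mode and
    // returns the file's content as read back from disk afterwards.
    static String putFirstByte(Path file, MapMode mode, byte value) throws IOException {
        // PRIVATE and READ_WRITE both require a channel opened for reading and writing
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buffer = channel.map(mode, 0, channel.size());
            buffer.put(0, value);
            if (mode == MapMode.READ_WRITE) {
                buffer.force(); // flush the change to the underlying file
            }
        }
        return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
    }
}
```

For a file containing hello, calling putFirstByte with MapMode.PRIVATE and the byte 'J' should leave hello on disk, because the change lives only in the buffer's private copy, while MapMode.READ_WRITE should turn the file into Jello.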

2. DirectByteBuffer

DirectByteBuffer extends MappedByteBuffer. As the name suggests, it allocates a region of direct memory that does not occupy JVM heap space. The MappedByteBuffer returned by FileChannel.map() in the previous section is in fact also a DirectByteBuffer. Besides that route, you can also allocate such a region manually:

ByteBuffer directByteBuffer = ByteBuffer.allocateDirect(100);

This allocates 100 bytes of direct memory.
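
A quick way to tell the two kinds of buffer apart is ByteBuffer.isDirect(). The small class below is a hypothetical illustration that uses only standard ByteBuffer calls.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class, for illustration only.
final class DirectVsHeap {

    // Reports whether a buffer lives in direct memory or on the JVM heap.
    static String kind(ByteBuffer buffer) {
        return buffer.isDirect() ? "direct" : "heap";
    }

    public static void main(String[] args) {
        System.out.println(kind(ByteBuffer.allocateDirect(100))); // prints direct
        System.out.println(kind(ByteBuffer.allocate(100)));       // prints heap
    }
}
```

Direct buffers tend to be more expensive to allocate than heap buffers, so they suit long-lived buffers that take part in native I/O.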

3. Channel-to-channel transfer

Files often need to be transferred from one location to another. FileChannel provides the transferTo() method to make such transfers more efficient. First, a simple example:

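import java.io.FileInputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

public class ChannelTransfer {
    public static void main(String[] argv) throws Exception {
        String[] files = {"D://db.txt"};
        catFiles(Channels.newChannel(System.out), files);
    }

    private static void catFiles(WritableByteChannel target, String[] files)
            throws Exception {
        for (String file : files) {
            // try-with-resources closes the stream and channel even if the transfer fails
            try (FileInputStream fis = new FileInputStream(file);
                 FileChannel channel = fis.getChannel()) {
                long position = 0;
                long size = channel.size();
                // transferTo() may move fewer bytes than requested, so repeat until done
                while (position < size) {
                    long n = channel.transferTo(position, size - position, target);
                    if (n <= 0) {
                        break; // nothing more could be transferred
                    }
                    position += n;
                }
            }
        }
    }
}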

FileChannel's transferTo() method transfers the file data to the System.out channel. The interface is defined as follows:

    public abstract long transferTo(long position, long count,
                                    WritableByteChannel target)
        throws IOException;

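The parameters are easy to understand: the position at which the transfer starts, the number of bytes to transfer, and the target channel. transferTo() lets one channel be cross-connected to another without an intermediate buffer to pass the data through.
Note: "without an intermediate buffer" has two meanings here. First, no user-space buffer is needed to copy the kernel buffer's data. Second, both channels have their own kernel buffers, and data can move between these two kernel buffers without being copied as well.

Netty zero-copy

Netty provides zero-copy buffers. When data is transferred, the data that is finally processed often requires individual transmitted packets to be combined or split. NIO's native ByteBuffer offers slice() but has no way to present several buffers as one, so Netty provides two kinds of buffer, Composite (combining) and Slice (splitting), to achieve zero-copy here.

At the TCP layer, an HTTP message is split into two ChannelBuffers, which mean nothing to our upper-layer logic (HTTP handling) on their own. Combined, however, the two ChannelBuffers form a meaningful HTTP message, and the ChannelBuffer corresponding to that message is what can properly be called a "Message". The term used for it here is "Virtual Buffer".
Take a look at the source of the CompositeChannelBuffer that Netty provides: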

public class CompositeChannelBuffer extends AbstractChannelBuffer {
 
    private final ByteOrder order;
    private ChannelBuffer[] components;
    private int[] indices;
    private int lastAccessedComponentId;
    private final boolean gathering;
 
    public byte getByte(int index) {
        int componentId = componentId(index);
        return components[componentId].getByte(index - indices[componentId]);
    }
    // ... omitted ...

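components holds all the received buffers, indices records the starting position of each buffer, and lastAccessedComponentId records the ComponentId accessed last time. CompositeChannelBuffer does not allocate new memory and copy the contents of all the ChannelBuffers into it; it simply keeps references to them and reads and writes in the child ChannelBuffers, which is how it achieves zero-copy.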
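
The same idea can be sketched with plain java.nio.ByteBuffer, without any Netty dependency. The class CompositeView below is hypothetical and is not Netty's implementation: it keeps references to its components and resolves each index with a simple linear scan, where a real implementation would use something faster.

```java
import java.nio.ByteBuffer;

// Hypothetical read-only composite view over several buffers; not Netty code.
final class CompositeView {
    private final ByteBuffer[] components; // references only, no content is copied
    private final int[] indices;           // start offset of each component, total length last

    CompositeView(ByteBuffer... parts) {
        components = new ByteBuffer[parts.length];
        indices = new int[parts.length + 1];
        for (int i = 0; i < parts.length; i++) {
            components[i] = parts[i].slice(); // shares content, has its own position
            indices[i + 1] = indices[i] + components[i].remaining();
        }
    }

    int capacity() {
        return indices[components.length];
    }

    byte getByte(int index) {
        if (index < 0 || index >= capacity()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int id = 0;
        while (index >= indices[id + 1]) { // find the component that holds this index
            id++;
        }
        return components[id].get(index - indices[id]);
    }
}
```

Because the view only stores slices, a later change to one of the underlying buffers is visible through getByte(), which is the "one copy of the data, many references" behaviour described above.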

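Other zero-copy uses

RocketMQ writes messages sequentially to the commitlog file and uses the consume queue files as an index; it responds to Consumer requests using the mmap + write style of zero-copy.
Kafka likewise persists large amounts of network data to disk and sends disk files over the network, and it uses the sendfile style of zero-copy.

Summary

Put loosely in terms of Java objects, zero-copy amounts to always working with references to an object: a change made through any reference changes that one object, and only one copy of the object ever exists.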

Origin blog.csdn.net/fengzongfu/article/details/105341878