Why Kafka reads and writes so fast

(Figure: traditional IO and the IO buffer)

Traditional IO

Traditional IO is buffered (cached) IO. Data is first copied from disk into the kernel buffer, and then copied from the kernel buffer into the application's address space. The kernel buffer here is the page cache (PageCache), which lives in kernel virtual memory.

Read: the kernel first checks whether the requested data is already in its buffer. If it is cached, the data is returned straight from the cache; otherwise it is read from disk and then kept in the operating system's page cache.

Write: the data is copied from user space into the kernel cache, and at that point the write is considered complete as far as the application is concerned. When the data is actually written back to disk is decided by the operating system, unless the application explicitly forces it with a sync call: sync, fsync or fdatasync.
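
A minimal Java sketch of this write path, assuming a hypothetical file name out.log: write() returns once the bytes are in the page cache, and FileChannel.force(true) plays the role of fsync.

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class BufferedWriteSketch {
    public static void main(String[] args) throws Exception {
        // "out.log" is a placeholder file name
        try (FileChannel ch = FileChannel.open(Paths.get("out.log"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("hello".getBytes())); // lands in the page cache only
            ch.force(true);                                // like fsync: block until the data reaches the disk
        }
    }
}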

Advantages: separating kernel space from user space protects the operating system itself and keeps operation safe; and because repeated reads hit the cache, the number of disk reads drops and performance improves.

Cons: with buffered IO, DMA can move data directly from disk into the page cache, or write data from the page cache back to disk, but it cannot transfer data directly between the application's address space and the disk. Data therefore has to be copied back and forth between the application's address space (user space) and the kernel buffer (kernel space), and these extra copies cost a great deal of CPU time and memory bandwidth.
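
A minimal sketch of this traditional path, assuming a hypothetical file data.bin and a peer at localhost:9000: every chunk travels disk -> page cache -> user-space byte[] -> socket buffer, so the same bytes get copied several times.

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class BufferedCopySketch {
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("data.bin");   // placeholder file
             Socket socket = new Socket("localhost", 9000);      // placeholder peer
             OutputStream out = socket.getOutputStream()) {
            byte[] userBuffer = new byte[8192];                  // the extra user-space copy
            int n;
            while ((n = in.read(userBuffer)) != -1) {            // kernel (page cache) -> user space
                out.write(userBuffer, 0, n);                     // user space -> kernel socket buffer
            }
        }
    }
}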

Zero-copy

Linux zero-copy techniques fall into: direct IO, mmap(), sendfile() and splice().

Direct IO


Direct IO: the application accesses the disk data directly, without going through the kernel buffer. The goal is to avoid the copy from the kernel buffer to the application's own cache. Database management systems are a typical example: they prefer their own caching mechanism, because a DBMS usually understands the data it stores better than the operating system does and can therefore provide a more efficient cache and better access performance.

Advantages: fewer copies between the operating system's kernel buffer and the application's address space, and therefore less CPU time and memory bandwidth spent on reading and writing files.

Cons: with direct IO, every read and write goes straight to the disk, which can make the operation slow. Applications that transfer data with direct IO therefore usually combine it with asynchronous IO (with asynchronous IO, the thread that issues the request goes on to do other work instead of blocking while it waits for the data).
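
A hedged sketch of direct IO from Java. This relies on JDK 10+ and the non-standard com.sun.nio.file.ExtendedOpenOption.DIRECT, and data.bin is a placeholder file; because direct IO bypasses the page cache, the buffer has to be aligned to the filesystem block size.

import com.sun.nio.file.ExtendedOpenOption;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DirectReadSketch {
    public static void main(String[] args) throws Exception {
        Path path = Paths.get("data.bin");                              // placeholder file
        int blockSize = (int) Files.getFileStore(path).getBlockSize();  // alignment unit required by direct IO
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, ExtendedOpenOption.DIRECT)) {
            // the buffer's address and capacity must be block-aligned
            ByteBuffer buf = ByteBuffer.allocateDirect(blockSize * 2).alignedSlice(blockSize);
            int n = ch.read(buf);                                       // reads straight from disk, no page cache
            System.out.println("read " + n + " bytes bypassing the page cache");
        }
    }
}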

MMAP


After the application calls mmap(), the data is first copied into the operating system's kernel buffer by DMA. The application then shares that buffer with the operating system, so no further copying between the kernel and the application is needed.

With mmap, a memory-mapped file has only one copy, in the page cache: reads copy the file from disk into the page cache, and writes are flushed from the page cache back to the file on disk (by default roughly every 30 s). In other words, mmap works directly against the operating system's page cache.

Regular file IO needs two copies, while a memory-mapped file needs only one. Regular file IO is an ordinary operation through a heap buffer, whereas a memory-mapped file is an off-heap operation.
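
A minimal read-side sketch of mmap in Java, assuming a non-empty file data.txt exists: FileChannel.map() exposes the page cache directly to the application as a MappedByteBuffer, so reading it does not copy the data into a separate user-space buffer.

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapReadSketch {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = FileChannel.open(Paths.get("data.txt"),  // placeholder file
                StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte first = map.get(0);   // touches the page cache directly, no extra copy
            System.out.println("first byte: " + first);
        }
    }
}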

SendFile


The sendfile() system call first has the DMA engine copy the file data into the operating system's kernel buffer, then the data is copied into the kernel buffer associated with the socket, and finally the DMA engine copies the data from that buffer to the protocol engine.

Because sendfile() neither copies nor maps the data into the application's address space, it is only suitable when the application does not need to look at or process the data. Web servers such as apache and nginx use sendfile to transfer static files, for example:

// Streaming a static image into a servlet response; response, imgPath, host, port and fname are assumed to be in scope.
ServletOutputStream servletOutputStream = response.getOutputStream();
FileChannel channel = new FileInputStream(imgPath).getChannel();
// transferTo() lets the kernel move the bytes itself (sendfile), with no user-space buffer involved
channel.transferTo(0, channel.size(), Channels.newChannel(servletOutputStream));
channel.close();

// Sending a whole file straight to a socket
SocketAddress sad = new InetSocketAddress(host, port);
SocketChannel sc = SocketChannel.open();
sc.connect(sad); // the channel has to be connected before transferTo can write to it
FileChannel fc = new FileInputStream(fname).getChannel();
fc.transferTo(0, fc.size(), sc);
fc.close();
sc.close();

To sum up

The usual read path for a regular file is:
Disk -> file buffer (page cache) -> user space

With mmap it is:
Disk -> user space (the page cache is mapped straight into the user address space)

As you can see, mmap saves one memory copy. In addition, multiple user processes can mmap the same file (which can even be a virtual file) to get shared memory, an efficient form of inter-process communication; this works by mapping the same physical pages into each process.
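
A hedged sketch of mmap-based shared memory, under the assumption that two JVM processes run this class against the same placeholder file shm.bin (one with the argument writer, one without): both map the same file, so they end up sharing the same physical pages.

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SharedMapSketch {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = FileChannel.open(Paths.get("shm.bin"),  // placeholder shared file
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            if (args.length > 0 && args[0].equals("writer")) {
                map.put(0, (byte) 42);   // becomes visible to the other process via the shared pages
            } else {
                System.out.println("value from the other process: " + map.get(0));
            }
        }
    }
}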

Now for sendfile:
nginx and apache both have configuration options for enabling sendfile; compared with the traditional read + write + buffer style of socket communication, sendfile is somewhat more efficient.

Traditional path:
Disk -> file buffer (page cache) -> user space -> socket buffer -> protocol engine

sendfile:
Disk -> file buffer (page cache) -> socket buffer -> protocol engine

Compared with mmap, sendfile skips the memory-mapping step. When transferring very large files the memory-mapping overhead is negligible, but for small files it accounts for a much larger share of the cost.

splice: very similar to sendfile(). The application must hold two already-open file descriptors, one for the input device and one for the output device. Unlike sendfile(), splice() allows data to be moved between any two file descriptors, not just from a file to a socket.

For the special case of sending data from a file descriptor to a socket, sendfile() has always been the system call used, while splice has always been a more general mechanism, not limited to what sendfile() does. In other words, sendfile() is a subset of splice(). Since Linux 2.6.23, sendfile() no longer has its own implementation in the kernel; the API and its functionality remain, but they are built on top of the splice() mechanism.

Data moves between hardware devices and memory via DMA, while memory-to-memory copies are done by the CPU. Reducing memory copies frees up the CPU and raises the load the system can handle.

Kafka's mechanisms

Sequential writes per partition

Each partition is in fact a file; when Kafka receives a message, it appends the data to the end of that file. For each topic the consumer keeps an offset (stored in ZooKeeper) that records how far it has read. The snippet below appends to the end of a file through a MappedByteBuffer:

FileChannel fc = new RandomAccessFile("data.txt", "rw").getChannel();
long length = fc.size(); // start the mapped region at the current end of the file; this is the key to appending
MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_WRITE, length, 20);
// the string "Write in the end" takes 16 bytes, so a mapping length of 20 is enough
mbb.put("Write in the end".getBytes()); // write the new data

Writing to the page cache via mmap

Kafka uses the operating system's pages to map the file directly to physical memory. Once the mapping is established, operations on that physical memory are synchronized to the hard drive by the operating system when it sees fit. In Java this corresponds to the MappedByteBuffer class.

An mmap file mapping is only released during a full GC. To close it earlier, you have to release the memory-mapped file manually, by invoking sun.misc.Cleaner on the MappedByteBuffer through reflection.

Failing to release a MappedByteBuffer can lead to slower disk access.
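
A hedged sketch of the manual release described above, relying on pre-Java 9 JDK internals (sun.misc.Cleaner reached via reflection); on Java 9+ the internal route is different and needs extra JVM flags, so treat this as illustrative only.

import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;

public final class MappedBufferCleaner {
    // Best-effort unmap: look up the buffer's cleaner() via reflection and call clean() on it.
    public static void unmap(MappedByteBuffer buffer) {
        try {
            Method cleanerMethod = buffer.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buffer);   // a sun.misc.Cleaner on older JDKs
            Method cleanMethod = cleaner.getClass().getMethod("clean");
            cleanMethod.setAccessible(true);
            cleanMethod.invoke(cleaner);                     // releases the mapping right away
        } catch (Exception ignored) {
            // fall back to waiting for GC to release the mapping
        }
    }
}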

Alibaba's RocketMQ, a Java implementation along the lines of Kafka, wraps this in its MappedFile class.

Reading via sendfile

Kafka stores all the messages of a partition in a single file; when a consumer asks for data, Kafka sends that "file" directly to the consumer.

Because Kafka reads and writes its files via mmap, it already holds a file handle, which it can hand straight to sendfile. The offset question is also solved: each consumer keeps its own offset and sends it along with every fetch request.
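
A minimal sketch (not Kafka's actual code) of what serving a fetch with sendfile looks like from Java: given a partition log file, a byte offset and a connected consumer SocketChannel (all hypothetical parameters here), FileChannel.transferTo pushes the requested range straight from the page cache to the socket.

import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class FetchSketch {
    // partitionLog, offset and maxBytes are illustrative; Kafka's real fetch protocol is richer than this.
    static void serveFetch(String partitionLog, long offset, long maxBytes,
                           SocketChannel consumer) throws Exception {
        try (FileChannel log = new RandomAccessFile(partitionLog, "r").getChannel()) {
            long remaining = Math.min(maxBytes, log.size() - offset);
            long sent = 0;
            while (sent < remaining) {
                // transferTo may send fewer bytes than asked for, so loop until the range is done
                sent += log.transferTo(offset + sent, remaining - sent, consumer);
            }
        }
    }
}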

To sum up

The secret of Kafka's speed is that it puts all messages into files: mmap improves IO speed, writes simply append to the end of the file (which is the fastest way to write), and reads are streamed straight out with sendfile.


Origin blog.csdn.net/Mirror_w/article/details/93399719