Why is Kafka so fast?


Kafka stores and caches its messages on disk. The common belief is that reading and writing on disk hurts performance because seeking is time-consuming, yet one of Kafka's defining characteristics is high throughput.

Even on an ordinary server, Kafka can easily sustain writes at the level of millions of requests per second, more than most messaging middleware, which is why Kafka is widely used in scenarios such as processing huge volumes of log data.

For a Kafka benchmark, see Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines).

The analysis below looks at both the write path and the read path to explain why Kafka is so fast.

Writing Data

Kafka writes every message it receives to the hard disk, so data is not lost. To optimize write speed, Kafka uses two techniques: sequential writes and MMFile (memory-mapped files).

Sequential Write

How fast a disk is depends on how you use it, that is, whether access is sequential or random. For sequential access, the read/write speed of a disk is comparable to that of memory.

Because a hard disk is a mechanical device, every read or write goes through seek -> write, and the seek is a "mechanical action" that is by far the most time-consuming step. Hard disks therefore hate random I/O and love sequential I/O. To get the best read/write speed out of the disk, Kafka uses sequential I/O.
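To make this concrete, here is a minimal Java sketch of an append-only write in the spirit of a Kafka partition file (the file name and messages are made up for illustration): every record lands at the current end of the file, so the disk only ever performs sequential I/O.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendOnlyLog {
    public static void main(String[] args) throws IOException {
        // Open the "partition" file in append mode: every write goes to the
        // current end of the file, so there is no seeking between writes.
        try (FileChannel log = FileChannel.open(Path.of("partition-0.log"),
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            for (int i = 0; i < 3; i++) {
                byte[] msg = ("message-" + i + "\n").getBytes(StandardCharsets.UTF_8);
                log.write(ByteBuffer.wrap(msg)); // sequential append
            }
        }
    }
}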

Linux also applies many optimizations to disk reads and writes, including read-ahead, write-behind, and disk caching. If these operations were done in JVM memory instead, there would be two problems: Java objects carry a large memory overhead, and as heap data grows, Java GC pauses become very long. Using disk operations has the following benefits:

  • Sequential disk reads and writes are faster than random memory access
  • JVM GC is inefficient and the memory footprint is large; using the disk avoids this problem
  • After a cold system start, the disk cache is still usable

Each Partition is actually a file: after receiving a message, Kafka appends the data to the end of that file.


This approach has one drawback: there is no way to delete individual records. Kafka does not delete data; it keeps everything, and each consumer (Consumer) has, for each Topic partition, an offset indicating how many messages it has read.


With two consumers, for example:

  • Consumer1 has two offsets, corresponding to Partition0 and Partition1 (assuming one offset per Partition);
  • Consumer2 has one offset, corresponding to Partition2.

The offsets are kept by the client SDK; Kafka's Broker ignores them entirely. Under normal circumstances the SDK saves them in Zookeeper, which is why the Consumer must be given a Zookeeper address.
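As a sketch of what this looked like with the legacy (pre-0.9) high-level consumer, the client was configured with a Zookeeper address rather than a broker list; the host and group name below are placeholders:

import java.util.Properties;

// Legacy (pre-0.9) high-level consumer configuration: offsets were kept
// per consumer group in Zookeeper, so the consumer needed a Zookeeper address.
Properties props = new Properties();
props.put("zookeeper.connect", "zk1:2181");   // placeholder Zookeeper address
props.put("group.id", "demo-group");          // offsets are tracked per group
props.put("auto.commit.interval.ms", "1000"); // how often offsets are written back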

If nothing is ever deleted, the hard disk will obviously fill up, so Kafka provides two strategies for deleting data:

  • One is based on time;
  • The other is based on the partition file size.

The specific settings can be found in Kafka's configuration file.
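For example, in the broker's server.properties the two strategies map to settings like these (the values are illustrative, not recommendations):

# Time-based: delete log segments older than 7 days
log.retention.hours=168
# Size-based: delete the oldest segments once a partition exceeds ~1 GB
log.retention.bytes=1073741824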

Memory Mapped Files

Even with sequential writes, a hard disk's access speed still cannot catch up with memory. So Kafka does not write data to disk in real time; instead it makes full use of the modern operating system's paged memory to improve I/O efficiency.

Memory Mapped Files (mmap below), also known as memory-mapped files, can generally represent a data file of around 20 GB on a 64-bit operating system. The working principle is to map the file directly onto physical memory using the operating system's pages.

Once the mapping is done, your operations on that physical memory are synchronized to the hard disk by the operating system (when it sees fit).

Through mmap, a process reads and writes the hard disk as if it were memory (virtual memory, of course), without having to care about its size; virtual memory hides all the details for us.

This brings a large I/O improvement because it removes the overhead of copying between user space and kernel space (a plain read of a file first pulls the data into kernel-space memory and then copies it into user-space memory).

But it has an obvious defect: it is unreliable. Data written to the mmap has not really been written to the hard disk; the operating system only writes it to disk for real when the program calls flush.

Kafka provides a parameter, producer.type, to control whether it flushes actively: if Kafka flushes immediately after writing to the mmap and only then returns to the Producer, the call is synchronous (sync); if it returns right after writing to the mmap without calling flush, the call is asynchronous (async).
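A minimal Java sketch of the same mechanism (file name and mapping size are made up): FileChannel.map returns a MappedByteBuffer backed by the operating system's page cache; writing to it is a memory write, and force() plays the role of the flush described above.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapWrite {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of("segment.log"),
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Map 4 KB of the file straight into memory via the OS page cache.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.put("hello kafka".getBytes(StandardCharsets.UTF_8)); // memory write
            // Without this call the OS flushes whenever it sees fit (the
            // "async" behaviour); force() is the explicit flush ("sync").
            buf.force();
        }
    }
}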

Reading Data

What optimizations does Kafka apply when reading from disk?

Zero Copy Based on sendfile

In the traditional mode, when a file needs to be sent over the network, the detailed process is as follows:

  • Call read: the file data is copied into a kernel buffer
  • read returns: the data is copied from the kernel buffer into the user buffer
  • Call write: the data is copied from the user buffer into the kernel buffer associated with the socket
  • The data is copied from the socket buffer to the protocol engine

In more detail, this is the traditional read/write pattern for network file transfer; you can see that the file data goes through four copy operations:

hard drive -> kernel buffer -> user buffer -> socket buffer -> protocol engine
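In Java terms, this four-copy path is what a naive copy loop produces (host, port, and file name are placeholders): every buffer crosses the kernel/user boundary twice, once in read and once in write.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;

public class TraditionalSend {
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Path.of("partition-0.log"));
             Socket socket = new Socket("consumer-host", 9092);
             OutputStream out = socket.getOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            // read(): disk -> kernel buffer -> user buffer (copies 1 and 2)
            // write(): user buffer -> socket buffer -> protocol engine (copies 3 and 4)
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}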

The sendfile system call provides a way to reduce the copies above and improve file-transfer performance.

In kernel version 2.1, the sendfile system call was introduced to simplify data transfer over the network and between two local files. sendfile not only reduces the data copying but also reduces context switches.

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

The process runs as follows:

  • The sendfile system call is made: the file data is copied into a kernel buffer
  • The data is then copied from the kernel buffer into the kernel buffer associated with the socket
  • Finally, it is copied from the socket buffer to the protocol engine

Compared with the traditional read/write pattern, the sendfile introduced in kernel 2.1 already removes the kernel buffer -> user buffer -> socket buffer copies. After kernel version 2.4, things changed again: only the file's descriptor information (location and length) is appended to the socket buffer, so sendfile is implemented even more simply and one more copy operation is eliminated.

Web servers such as Apache, Nginx, and lighttpd all have sendfile-related configuration; enabling sendfile can significantly improve file-transfer performance.

Kafka stores all its messages in files. When a consumer needs the data, Kafka sends the file to the consumer directly: the file is read and written through mmap, and handed straight to sendfile for transfer.
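In Java, which Kafka is written in, sendfile is exposed through FileChannel.transferTo. A minimal sketch of serving a file to a socket without the data ever entering user space (host, port, and file name are placeholders):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = FileChannel.open(Path.of("partition-0.log"),
                     StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(
                     new InetSocketAddress("consumer-host", 9092))) {
            long position = 0;
            long remaining = file.size();
            // transferTo maps to sendfile(2) on Linux: bytes flow from the
            // page cache to the socket with no user-space copy in between.
            while (remaining > 0) {
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}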

Batch compression

In many cases the bottleneck is not the CPU or the disk but network I/O, especially for data pipelines that need to send messages between data centers over a WAN. Data compression consumes a small amount of CPU, but for Kafka network I/O is the bigger concern.

  • If each message is compressed individually, the compression ratio is relatively low, so Kafka compresses in batches: multiple messages are compressed together rather than each message on its own
  • Kafka allows recursive message sets: a batch of messages can be transmitted in compressed form and kept compressed in the log, and is only decompressed at the consumer
  • Kafka supports multiple compression protocols, including Gzip and Snappy (a producer-side sketch follows this list)
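With the modern Java producer, batching and compression are plain configuration. A sketch under assumed placeholder names (broker address and topic are made up):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CompressedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("compression.type", "snappy"); // whole batches are compressed
        props.put("batch.size", "65536");        // gather up to 64 KB per batch
        props.put("linger.ms", "20");            // wait up to 20 ms to fill a batch
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}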

Summary

The secret of Kafka's speed is that it turns all message handling into batched file operations: sensible batch compression reduces network I/O; mmap improves write I/O speed; writes are fastest because data is only ever appended to the end of a single Partition file; and reads are served by handing files straight to sendfile.



Source: blog.csdn.net/kingmax54212008/article/details/104075423