Reasons for Kafka's high throughput

1. Sequential reading and writing

Kafka appends messages to the end of its log files. This append-only design lets Kafka take full advantage of the disk's sequential read and write performance.

Sequential access avoids the seek time of the disk head and incurs only a small rotational delay, so it is much faster than random reads and writes.
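A minimal sketch of the append-only log pattern (file name and helper are illustrative, not Kafka's actual code): every write goes to the end of the current log file, so a burst of writes never forces the disk head to seek.

```python
import os
import tempfile

def append_messages(path, messages):
    """Append each message to the end of the log file -- purely sequential I/O."""
    with open(path, "ab") as log:
        for msg in messages:
            log.write(msg + b"\n")
        log.flush()
        os.fsync(log.fileno())  # force the buffered bytes onto disk

# Hypothetical segment file name in Kafka's zero-padded-offset style.
log_path = os.path.join(tempfile.mkdtemp(), "00000000000000000000.log")
append_messages(log_path, [b"msg-0", b"msg-1", b"msg-2"])
print(open(log_path, "rb").read())  # b'msg-0\nmsg-1\nmsg-2\n'
```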

2. Zero copy

Since Linux kernel 2.2, a system call mechanism known as "zero copy" has been available. It skips the copy into the user-space buffer: the kernel transfers data directly from the disk's page cache to the network interface, so the data is never copied into a user-mode buffer at all.
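The mechanism can be sketched in Python with `os.sendfile`, which wraps the same `sendfile(2)` call (Kafka itself reaches it through Java's `FileChannel.transferTo`; the socket pair here is a stand-in for a real network connection):

```python
import os
import socket
import tempfile

payload = b"kafka log segment bytes"
path = os.path.join(tempfile.mkdtemp(), "segment.log")
with open(path, "wb") as f:
    f.write(payload)

# A connected socket pair stands in for a consumer's network socket.
sender, receiver = socket.socketpair()
with open(path, "rb") as f:
    # No read()/write() pair: the kernel moves the bytes directly,
    # so the data never enters this process's user-space buffers.
    sent = os.sendfile(sender.fileno(), f.fileno(), 0, len(payload))
sender.close()
print(receiver.recv(1024))  # b'kafka log segment bytes'
```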

3. Partition

A topic in Kafka can be divided into multiple partitions, and each partition into multiple segments, so every operation touches only a small portion of the data. This keeps individual operations lightweight and increases the capacity for parallel processing.
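Key-based partitioning can be sketched as below. This mirrors the spirit of Kafka's default partitioner (Kafka actually hashes keys with murmur2; CRC32 is used here only to keep the example dependency-free, and the partition count is hypothetical):

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count for the topic

def pick_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # The same key always lands on the same partition,
    # which preserves per-key message ordering.
    return zlib.crc32(key) % num_partitions

for key in [b"user-1", b"user-2", b"user-1"]:
    print(key, "->", pick_partition(key))
```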

4. Batch sending

Kafka allows messages to be sent in batches. The producer buffers messages locally and sends them to Kafka once one of the following conditions is met:

  1. The number of buffered messages reaches a fixed threshold
  2. A fixed time interval has elapsed
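The two flush conditions above can be sketched with a toy batcher (the names are illustrative; in the real producer they correspond roughly to the `batch.size` and `linger.ms` settings, and `flush_fn` stands in for the actual network send):

```python
import time

class Batcher:
    def __init__(self, flush_fn, max_count=3, max_delay_s=0.05):
        self.flush_fn = flush_fn      # called with a full batch of messages
        self.max_count = max_count    # condition 1: message-count threshold
        self.max_delay_s = max_delay_s  # condition 2: time limit
        self.buffer = []
        self.first_ts = None

    def send(self, msg):
        if self.first_ts is None:
            self.first_ts = time.monotonic()
        self.buffer.append(msg)
        if (len(self.buffer) >= self.max_count
                or time.monotonic() - self.first_ts >= self.max_delay_s):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
            self.first_ts = None

batches = []
b = Batcher(batches.append)
for i in range(7):
    b.send(f"msg-{i}")
b.flush()  # drain whatever is left in the buffer
print([len(batch) for batch in batches])  # [3, 3, 1]
```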

5. Data compression

Kafka also supports compressing message sets. The producer can compress a batch of messages in GZIP or Snappy format. Compression reduces the amount of data transmitted and eases the pressure on the network.

Batch sending and data compression are used together: compressing a single message yields little benefit, while compressing a whole batch is far more effective.
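A quick experiment with gzip shows why the two are paired (the message payloads are made up for illustration): compressing 100 similar messages one at a time barely helps, because each tiny block pays the compression overhead and cannot exploit the redundancy shared across messages, while compressing them as one batch shrinks them dramatically.

```python
import gzip

# 100 similar JSON-ish messages, as a producer might buffer before sending.
messages = [f'{{"user": "u{i}", "event": "click", "page": "/home"}}'.encode()
            for i in range(100)]

one_by_one = sum(len(gzip.compress(m)) for m in messages)
as_batch = len(gzip.compress(b"\n".join(messages)))

print("compressed per message:", one_by_one, "bytes")
print("compressed as a batch: ", as_batch, "bytes")
```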

Origin blog.csdn.net/MortShi/article/details/123132845