KAFKA: How does it publish one million messages per second

http://rdcqii.hundsun.com/portal/article/709.html

KAFKA is a distributed publish-subscribe messaging system: a distributed, partitioned, replicated persistent log service. It is mainly used for processing active streaming data.

It is now widely used for building real-time data pipelines and streaming applications; it is scalable, fault-tolerant, and fast, and it already runs in the production environments of many medium and large companies, where it has been applied successfully in the big-data field. In this article I share what I know about KAFKA.

1 The secret of KAFKA's high-throughput performance

The first highlight of KAFKA is that it is "fast", and astonishingly fast even on ordinary cheap virtual machines, for example a virtual machine backed by general SAS disks. According to LinkedIn's statistics, the latest numbers show KAFKA processing more than 1 trillion messages per day, publishing over one million messages per second at peak; even without generous memory and CPU, Kafka can reach a rate of 100,000 messages per second, with persistent storage as well.

As a message queue, KAFKA has to take on both reading and writing. Writing comes first: messages are written to KAFKA's log. So how does KAFKA make writes so absurdly fast?

1.1 How KAFKA makes writes fly

First, a producer can use the API provided by KAFKA to publish messages to a Topic (subject), writing either to a single partition (which guarantees ordering) or to multiple partitions (which allows parallel processing but does not guarantee ordering across partitions). A Topic can simply be understood as a category of data; it is used to tell different kinds of data apart.
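To make this concrete, here is a minimal sketch (not taken from the article) of publishing messages with the Java producer client; the broker address localhost:9092 and the topic name my-topic are assumptions made for illustration.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SimpleProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Records with the same key land in the same partition, so their order is preserved.
                producer.send(new ProducerRecord<>("my-topic", "order-42", "created"));
                producer.send(new ProducerRecord<>("my-topic", "order-42", "paid"));
                // Records without a key are spread across partitions for parallelism,
                // with no ordering guarantee between partitions.
                producer.send(new ProducerRecord<>("my-topic", "some other event"));
            }
        }
    }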

KAFKA maintains a partitioned log for each Topic and appends messages to each partition sequentially; each partition is an immutable message queue. Messages within a partition exist in k-v form.
▪ k is the offset, a 64-bit integer that uniquely identifies the message; the offset marks the message's starting byte position within the partition's overall message stream.
▪ v is the actual message content. Every offset within a partition is unique, and all messages in a partition are written exactly once; by adjusting the offset, a message can be read multiple times before it expires.

This is the first factor behind KAFKA's speed: messages are written to disk sequentially.

We know that most disks are still mechanical (SSDs are outside the scope of this discussion). If messages were written to disk in random order, addressing would have to proceed by cylinder, head, and sector, and that mechanical movement is slow (relative to memory) and consumes a lot of time, leaving random disk writes at only a tiny fraction of memory write speed. To avoid the cost of random writes, KAFKA stores data using sequential writes, as shown below:

(Figure: new messages are appended sequentially to the end of each partition log)

New messages can only be appended to the end of the existing messages; produced messages support neither random deletion nor random access, and consumers can access data they have already consumed again by resetting the offset.
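As a hedged illustration of re-reading already-consumed data by resetting the offset, the sketch below assigns a single partition and seeks back to offset 0; the topic name, group id, and offset value are assumptions.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class ReplayConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "replay-demo");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("my-topic", 0);
                consumer.assign(Collections.singletonList(tp));
                consumer.seek(tp, 0L); // rewind to the earliest offset we want to re-read
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
                }
            }
        }
    }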

Even with sequential reads and writes, an excessive number of small I/O operations would still create a disk bottleneck, so instead of sending messages one at a time KAFKA groups messages together and sends them in batches, reducing excessive disk I/O.
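Building on the producer sketch above, batching behaviour is controlled by two producer settings; the values here are illustrative, not recommendations.

    // Records headed for the same partition are accumulated and sent as a single batch.
    props.put("batch.size", 65536); // up to 64 KB of records per partition batch
    props.put("linger.ms", 10);     // wait up to 10 ms for a batch to fill before sending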

Another source of inefficiency is byte copying, whose impact is especially noticeable under heavy load. To avoid it, KAFKA uses a standardized binary message format shared by the Producer, the broker, and the consumer, so blocks of data can be passed between them freely and without conversion, reducing the overhead of copying bytes.

Meanwhile, KAFKA uses MMAP (Memory Mapped Files) technology. Many modern operating systems make heavy use of main memory as a disk cache; a modern operating system can use all of its spare memory as disk cache, with almost no performance penalty when that memory has to be reclaimed.

Since KAFKA runs on the JVM, anyone who has dealt with Java memory usage knows two things:
▪ the memory overhead of objects is very high, often double the size of the data actually being stored;
▪ as the amount of data grows, Java garbage collection becomes more frequent and slower.

Because of this, using the file system and relying on the page cache is more attractive than maintaining an in-memory cache or some other structure:
▪ by not keeping an in-process cache, memory is freed, nearly doubling the space available for the page cache;
▪ if KAFKA restarts, an in-process buffer would be lost, whereas the operating system's page cache remains usable.

Some may ask: since KAFKA uses the page cache so heavily, what happens when memory runs short?
KAFKA writes data to the persistent log without immediately flushing it to disk; in fact the data is simply moved into the kernel's page cache.

Using the file system and relying on the page cache is better than maintaining an in-memory cache or other structures: the file can be mapped directly into physical memory through the operating system's page cache. Once the mapping is established, operations on that physical memory are synchronized to the hard disk at an appropriate time.
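As a minimal sketch of the memory-mapped-file idea (this is not KAFKA source code, and the file name is an assumption), Java NIO exposes mmap through FileChannel.map; writes land in the page cache and the operating system flushes them to disk later.

    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class MmapDemo {
        public static void main(String[] args) throws Exception {
            try (FileChannel ch = FileChannel.open(Paths.get("00000000000000000000.log"),
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                buf.put("hello kafka".getBytes(StandardCharsets.UTF_8)); // goes into the page cache
                buf.force(); // ask the OS to sync the mapped region to disk explicitly
            }
        }
    }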

1.2 How KAFKA makes reads fly

Besides writing incoming data quickly, the other thing KAFKA does well is handing data out quickly.

As a message queue, KAFKA uses push on the producer side and pull on the consumer side. In other words, you can think of the producer side of KAFKA as a bottomless pit: however much data there is, it can be pushed in as hard as you like. The consumer side then pulls data from KAFKA according to its own processing capacity, and KAFKA guarantees that as long as data exists, consumers can come and fetch as much of it as they can handle.

▲ Zero copy

As for how messages are persisted, the broker maintains a directory of log files for the messages it holds; each file is stored in binary, and producers and consumers handle exactly the same format. Maintaining this common format makes it possible to optimize the most important operation: network transfer of chunks of the persistent log. Modern unix operating systems provide an optimized code path for moving data from the page cache to a socket; on Linux this is done with the sendfile system call, and Java exposes it through the FileChannel.transferTo API.
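A hedged sketch of that call: the file contents are handed to the socket channel without passing through user space. The log file name and the destination address are assumptions for illustration only.

    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class SendfileDemo {
        public static void main(String[] args) throws Exception {
            try (FileChannel file = FileChannel.open(Paths.get("00000000000000000000.log"),
                                                     StandardOpenOption.READ);
                 SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
                long position = 0;
                long size = file.size();
                while (position < size) {
                    // transferTo may send fewer bytes than requested, so loop until the file is done.
                    position += file.transferTo(position, size - position, socket);
                }
            }
        }
    }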

To understand the impact of sendfile, it is important to understand the ordinary data path for transferring data from a file to a socket. As shown below, data going from disk to socket passes through the following steps:

(Figure: the traditional four-copy data path from disk to socket)

▪ the operating system reads the data from disk into the page cache in kernel space
▪ the application reads the data from kernel space into a user-space buffer
▪ the application writes the data back into kernel space, into the socket buffer
▪ the operating system copies the data from the socket buffer to the NIC buffer so it can be sent over the network

That is four copies and two system calls, which is very inefficient. With sendfile, only one copy is needed: the operating system is allowed to send data straight from the page cache to the network. In this optimized path, only the final copy into the NIC buffer is still required.

(Figure: the optimized zero-copy path using sendfile)

Performance comparison between regular file transfer and zero copy:

(Figure: performance comparison of regular file transfer and zero copy)

Now suppose a Topic has multiple consumers and the zero-copy optimization above is used: the data is copied into the page cache exactly once and reused for every consumer, instead of being held in memory and copied into user space on every read. This lets messages be consumed at a rate close to the limit of the network connection.

This combination of page cache and sendfile means that most consumers of a KAFKA cluster consume messages entirely from the cache, with no read activity on the disk at all.

▲ Batch compression

In many cases the bottleneck of the system is not the CPU or the disk but network bandwidth, especially for data pipelines that have to send messages between data centers over a wide-area network, so data compression matters. Each message could be compressed individually, but the compression ratio would be relatively poor, so KAFKA uses batch compression: multiple messages are compressed together rather than one at a time.

KAFKA allows recursive message sets: a batch of messages can be transmitted in compressed form and kept in that compressed format in the log until the consumer decompresses it.

KAFKA supports the Gzip and Snappy compression protocols.
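Building on the producer sketch above, batch compression is switched on with a single producer setting; snappy is used here only as an example of the codecs just named.

    // The producer compresses whole record batches rather than individual messages.
    props.put("compression.type", "snappy"); // "gzip" is the other codec mentioned above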

2 A deep dive into KAFKA data reliability


KAFKA messages are stored in Topics. A Topic can be divided into multiple partitions, and to keep the data safe each partition in turn has multiple Replicas.

▪ Characteristics of the multi-partition design:
1. concurrent reads and writes, which speeds up both;
2. spreading storage across multiple partitions, which helps balance the data;
3. faster recovery: if one machine goes down, the cluster only has to recover a portion of the data, shortening failure recovery time.


Each Partition is divided into multiple Segments, and each Segment has two files, .log and .index. Each .log file holds the actual data, and every message carries a monotonically increasing offset; the .index file is an index into the .log file. When a Consumer looks up an offset, it binary-searches on the file names to locate the right Segment, then parses the messages to find the one whose offset matches.
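The sketch below is a hypothetical illustration of that lookup, not KAFKA source code: segment files are named after their base offset, so the largest base offset that is less than or equal to the target offset identifies the segment. The offsets and file names are made up for the example.

    import java.util.TreeMap;

    public class SegmentLookup {
        public static void main(String[] args) {
            // base offset of each segment -> its .log file name (illustrative values)
            TreeMap<Long, String> segments = new TreeMap<>();
            segments.put(0L,      "00000000000000000000.log");
            segments.put(368769L, "00000000000000368769.log");
            segments.put(737337L, "00000000000000737337.log");

            long target = 368776L;
            String segment = segments.floorEntry(target).getValue(); // binary-search-style lookup
            System.out.println("offset " + target + " lives in " + segment);
            // Inside that segment, the .index file is then searched to find the
            // physical position of the message with this offset.
        }
    }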

2.1 The Partition recovery process

Each Partition records a RecoveryPoint on disk, which marks the largest offset that has already been flushed to disk. When a broker fails and restarts, it runs loadLogs: it first reads the Partition's RecoveryPoint and finds the segment containing the RecoveryPoint together with every later segment; these are the segments that may not have been completely flushed to disk. It then calls each segment's recover, re-reads the segment's messages, and rebuilds the index. Every time a KAFKA broker restarts, this index-rebuilding process can be seen in its output logs.

2.2 Data synchronization

Producers and Consumers interact only with the Leader; each Follower pulls data from the Leader to stay in sync.

(Figures: Followers fetching data from the Leader, and the ISR replica set)

As shown above, the ISR is the set of all replicas that are not lagging behind, where "not lagging" has two meanings: the time since the replica's last FetchRequest does not exceed a threshold, or the number of messages it is behind does not exceed a threshold. When the Leader fails, a Follower is chosen at random from the ISR to become the new Leader, and this process is transparent to users.

When a Producer sends data to a Broker, the level of data reliability can be set through the request.required.acks parameter.

This configuration expresses when the Producer considers a request complete: specifically, how many brokers must have committed the data to their logs and confirmed this to their Leader.

▪ Typical values:
0: the Producer never waits for an acknowledgment from the broker. This option gives the lowest latency but also the greatest risk (data is lost whenever the server goes down).
1: the Producer gets an acknowledgment once the Leader replica has received the data. This option keeps latency low while ensuring the server has confirmed successful receipt.
-1: the Producer gets an acknowledgment only after all in-sync replicas have received the data. It has the highest latency, yet even this does not completely eliminate the risk of losing messages, because the number of in-sync replicas may drop to 1. If you want to guarantee that a certain number of replicas have received the data, you should look at the Topic-level setting min.insync.replicas.

Setting acks = -1 alone does not guarantee that data is never lost: if the ISR list contains only the Leader, data can still be lost. To make sure data is not lost, in addition to setting acks = -1 you must also ensure that the ISR size is greater than or equal to 2.

▪ The specific parameter settings (a configuration sketch follows this list):
request.required.acks: set to -1, so a write only counts as successful after every Replica in the ISR list has received the message.
min.insync.replicas: set to >= 2, to guarantee that the ISR contains at least two Replicas.
Producer: make your own trade-off between throughput and data reliability.
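A hedged configuration sketch of those settings, again extending the producer Properties from the earlier sketches; the newer Java producer names this parameter acks (request.required.acks is the older producer's name), and min.insync.replicas is a topic- or broker-level setting rather than a producer one. Values are illustrative.

    props.put("acks", "all"); // equivalent to -1: wait for every replica in the ISR to acknowledge
    props.put("retries", 3);  // retry transient send failures instead of silently dropping data
    // Set on the topic (or broker), not on the producer:
    //   min.insync.replicas=2  -> writes fail fast when the ISR shrinks to the Leader alone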

As a leading modern piece of messaging middleware, KAFKA has won over users and the market with its speed and reliability, and many of its design ideas are well worth studying. What this article covers is only the tip of the iceberg; I hope it helps in getting to know KAFKA.


Origin www.cnblogs.com/lenmom/p/12013475.html