See original: https://www.jianshu.com/p/3e54a5a39683
Message Middleware: Kafka Data Storage (Part 1)
Abstract: Message storage is central to any message queue, so how does Kafka design its storage to be efficient?
Kafka, a distributed message queue, stores and caches messages using the file system and the operating system's page cache, abandoning a Java-heap-based cache. It turns random writes into sequential writes and, combined with the zero-copy feature, greatly improves I/O performance. Regarding disk performance, many readers will know the key fact about hard-disk storage: the linear (sequential) write speed of a SATA RAID-5 disk array can reach hundreds of MB/s, while its random write speed is only about 100 KB/s; sequential writes are thousands of times faster than random writes. Clearly, how we write to disk is decisive for message write speed. With this in mind, Kafka implements its data storage by appending to files. Because writes are sequential appends, Kafka can persist messages through an O(1) disk data structure and maintain stable performance over time even with tens of terabytes of stored messages. Ideally, as long as disk space is sufficient, messages can always be appended. In addition, Kafka lets users configure how long persisted messages are retained, providing a more flexible way to handle messages. This article focuses on the structure of Kafka's message storage, where the data is kept, and how a message is located by its offset.
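The append-only idea can be illustrated with a toy sketch (illustrative Python, not Kafka code): every write goes to the tail of the log, so assigning the next offset and persisting the record are constant-time operations.

```python
# Toy append-only log: every write is a sequential append,
# and assigning the next offset is O(1).
class AppendOnlyLog:
    def __init__(self):
        self.records = []      # stands in for the on-disk ".log" file
        self.next_offset = 0   # next logical offset to assign

    def append(self, payload: bytes) -> int:
        offset = self.next_offset
        self.records.append((offset, payload))  # always write at the tail
        self.next_offset += 1
        return offset

log = AppendOnlyLog()
first = log.append(b"m0")
second = log.append(b"m1")
print(first, second)  # offsets are assigned sequentially: 0 1
```

Note that no existing record is ever modified; this is what lets a real implementation rely purely on sequential disk writes.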
1. Several important Kafka concepts
(1) Broker: a message-middleware processing node. A single Kafka node is a Broker, and one or more Brokers form a Kafka cluster;
(2) Topic: a logical category that a group of messages is published to. For example, page-view logs and click logs can each be published as their own topic. Physically, messages of different topics are stored separately; logically, a topic's messages may be stored on one or more Brokers, but producers and consumers only need to specify the topic without caring which Broker(s) the data actually lives on;
(3) Partition: each topic is further divided into one or more partitions. Each partition corresponds to a folder on the local disk; the naming rule is the topic name, a "-" connector, and then the partition number, with partition numbers running from 0 to the total number of partitions minus 1;
(4) LogSegment: each partition consists of multiple log segments (LogSegment), the smallest slicing unit of a Kafka log object. A LogSegment is a logical concept corresponding to one concrete log data file (".log") and two index files (".index" and ".timeindex", the offset index file and the message timestamp index file respectively);
(5) Offset: each partition is an ordered, immutable sequence of messages that are appended to it sequentially. Each message has a continuous sequence number called the offset, which uniquely identifies the message within the partition (it does not indicate the message's physical location on disk);
(6) Message: the smallest basic unit of storage in Kafka, i.e. one commit-log record, composed of a fixed-length header and a variable-length message body;
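The partition-directory naming rule from (3) can be sketched as follows (illustrative Python; the topic name is taken from the example later in this article):

```python
def partition_dirs(topic: str, num_partitions: int) -> list:
    # topic name + "-" + partition number, numbered 0 .. n-1
    return [f"{topic}-{p}" for p in range(num_partitions)]

print(partition_dirs("kafka-topic-01", 3))
# ['kafka-topic-01-0', 'kafka-topic-01-1', 'kafka-topic-01-2']
```

These are exactly the directory names that appear in the disk listing shown later.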
2. Kafka log storage structure
Kafka organizes messages with the topic as the basic unit, and topics are independent of each other. A topic is only a logical abstraction; physically, the message data of a topic is stored in one or more partitions, each partition corresponding to one folder on the local disk. Each folder contains two kinds of files: log index files (".index" and ".timeindex") and log data files (".log"). The number of partitions can be specified when the topic is created and can also be modified afterwards. (Note: the number of partitions of a topic can only be increased, never decreased; the reason is beyond the scope of this article, but it is worth thinking about.)
It is precisely this partition design that lets Kafka achieve high message throughput: the messages of a topic are split across multiple partitions, which are distributed and stored on different Kafka Broker nodes. Producers and consumers can then operate in parallel with multiple threads, each thread processing the data of one partition.
Meanwhile, to make the cluster highly available, Kafka can configure one or more replicas for each partition, and a partition's replicas are distributed across different Broker nodes. One replica is elected as the Leader, which handles all reads and writes from clients; the other replicas act as Followers and synchronize data from the Leader.
2.1 Analysis of the log files stored for partitions/replicas
After setting up a Kafka cluster on three virtual machines (three Kafka Broker nodes), a topic with a specified number of partitions and replicas can be created with the following command in a Broker's bin/ directory:
./kafka-topics.sh --create --zookeeper 10.154.0.73:2181 --replication-factor 3 --partitions 3 --topic kafka-topic-01
After the topic is created, the state of its partitions and replicas can be inspected (the command lists the replica and ISR information for every partition of the topic):
./kafka-topics.sh --describe --zookeeper 10.154.0.73:2181 --topic kafka-topic-01
Topic:kafka-topic-01 PartitionCount:3 ReplicationFactor:3 Configs:
Topic: kafka-topic-01 Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
Topic: kafka-topic-01 Partition: 1 Leader: 2 Replicas: 2,0,1 Isr: 2,1,0
Topic: kafka-topic-01 Partition: 2 Leader: 0 Replicas: 0,1,2 Isr: 1,2,0
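The replica lists above follow a round-robin pattern across the three brokers. A simplified sketch of such an assignment is shown below (illustrative Python; real Kafka also randomizes the start index and the shift of follower replicas, so this is only an approximation that happens to reproduce the output above with start=1):

```python
def assign_replicas(num_partitions, num_brokers, replication_factor, start=0):
    # Simplified round-robin: partition p's first replica (the preferred
    # leader) is (start + p) % num_brokers, and the remaining replicas
    # follow on consecutive brokers.
    return [
        [(start + p + j) % num_brokers for j in range(replication_factor)]
        for p in range(num_partitions)
    ]

# start=1 reproduces the 'describe' output shown above
print(assign_replicas(3, 3, 3, start=1))
# [[1, 2, 0], [2, 0, 1], [0, 1, 2]]
```

Spreading the first replica of each partition over different brokers is what balances leadership (and therefore client traffic) across the cluster.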
By implementing a simple Kafka producer demo, messages can be sent from a producer to the Kafka Brokers. After producing a large number of messages, three partition directories can be seen on the three virtual machines of the cluster under the log data directory specified by "log.dirs" in Kafka's config/server.properties. Each partition directory contains the corresponding log data files and log index files, as follows:
# 1. Partition directories
drwxr-x--- 2 root root 4096 Jul 26 19:35 kafka-topic-01-0
drwxr-x--- 2 root root 4096 Jul 24 20:15 kafka-topic-01-1
drwxr-x--- 2 root root 4096 Jul 24 20:15 kafka-topic-01-2
# 2. Log data files and log index files in a partition directory
-rw-r----- 1 root root 512K Jul 24 19:51 00000000000000000000.index
-rw-r----- 1 root root 1.0G Jul 24 19:51 00000000000000000000.log
-rw-r----- 1 root root 768K Jul 24 19:51 00000000000000000000.timeindex
-rw-r----- 1 root root 512K Jul 24 20:03 00000000000022372103.index
-rw-r----- 1 root root 1.0G Jul 24 20:03 00000000000022372103.log
-rw-r----- 1 root root 768K Jul 24 20:03 00000000000022372103.timeindex
-rw-r----- 1 root root 512K Jul 24 20:15 00000000000044744987.index
-rw-r----- 1 root root 1.0G Jul 24 20:15 00000000000044744987.log
-rw-r----- 1 root root 767K Jul 24 20:15 00000000000044744987.timeindex
-rw-r----- 1 root root 10M Jul 24 20:21 00000000000067117761.index
-rw-r----- 1 root root 511M Jul 24 20:21 00000000000067117761.log
-rw-r----- 1 root root 10M Jul 24 20:21 00000000000067117761.timeindex
As can be seen above, each partition physically corresponds to one folder, named after the topic, a "-" connector, and the partition number; partition numbers start at 0 and go up to the total number of partitions minus 1. Each partition has one or more replicas, and a partition's replicas are distributed on different brokers of the cluster to improve availability. From the storage point of view, each replica of a partition can logically be abstracted as a log (Log) object, i.e. a partition replica corresponds to a Log object. The figure below is a physical view of the distribution of leader/follower replicas across a cluster of three Kafka Broker nodes:
2.2 Storage structure of Kafka log data files and index files
In Kafka, each Log object is further divided into multiple LogSegments, and each LogSegment comprises one log data file and two index files (the offset index file and the message timestamp index file). The maximum size of each LogSegment's log data file is the same (configurable via "log.segment.bytes" in the Broker's config/server.properties, defaulting to 1 GB, i.e. 1073741824 bytes); when sequential writes would exceed this threshold, a new set of log data and index files is created.
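Segment rollover, together with the 20-digit zero-padded file naming visible in the directory listing above, can be sketched like this (illustrative Python, with a tiny size threshold for demonstration; real Kafka works in bytes against log.segment.bytes):

```python
def segment_basename(base_offset: int) -> str:
    # Segment files are named by their base offset, zero-padded to 20 digits
    return f"{base_offset:020d}"

def roll_segments(message_sizes, segment_bytes):
    # Each segment is named after the offset of its first message; a new
    # segment is rolled once the current one would exceed segment_bytes.
    segments, cur_size = [], None
    for offset, size in enumerate(message_sizes):
        if cur_size is None or cur_size + size > segment_bytes:
            segments.append(segment_basename(offset))
            cur_size = 0
        cur_size += size
    return segments

print(segment_basename(22372103))  # 00000000000022372103
print(roll_segments([40] * 5, segment_bytes=100))
```

With five 40-byte messages and a 100-byte threshold, segments roll at offsets 0, 2, and 4, mirroring how the real segment files above start at offsets 0, 22372103, 44744987, and so on.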
Kafka wraps a log data file in a FileMessageSet object, and wraps the offset index file and the timestamp index file in OffsetIndex and TimeIndex objects respectively. Log and LogSegment are both logical concepts: Log is the abstraction of a replica's files stored on a Broker, while LogSegment is the abstraction of each log segment under that replica; only the log and index files correspond to physical storage on disk. The figure below shows the relationships among the objects in Kafka's log storage structure:
To inspect the ".index" offset index file, the ".timeindex" timestamp index file, and the ".log" log data file further, the following command converts the binary segment index and log data files into readable text:
# 1. Run the following command to dump the contents of a log data file
./kafka-run-class.sh kafka.tools.DumpLogSegments --files /apps/svr/Kafka/kafkalogs/kafka-topic-01-0/00000000000022372103.log --print-data-log > 00000000000022372103_txt.log

# 2. The dumped log data
Dumping /apps/svr/Kafka/kafkalogs/kafka-topic-01-0/00000000000022372103.log
Starting offset: 22372103
offset: 22372103 position: 0 CreateTime: 1532433067157 isvalid: true keysize: 4 valuesize: 36 magic: 2 compresscodec: NONE producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: [] key: 1 payload: 5d2697c5-d04a-4018-941d-881ac72ed9fd
offset: 22372104 position: 0 CreateTime: 1532433067159 isvalid: true keysize: 4 valuesize: 36 magic: 2 compresscodec: NONE producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: [] key: 1 payload: 0ecaae7d-aba5-4dd5-90df-597c8b426b47
offset: 22372105 position: 0 CreateTime: 1532433067159 isvalid: true keysize: 4 valuesize: 36 magic: 2 compresscodec: NONE producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: [] key: 1 payload: 87709dd9-596b-4cf4-80fa-d1609d1f2087
......
......
offset: 22372444 position: 16365 CreateTime: 1532433067166 isvalid: true keysize: 4 valuesize: 36 magic: 2 compresscodec: NONE producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: [] key: 1 payload: 8d52ec65-88cf-4afd-adf1-e940ed9a8ff9
offset: 22372445 position: 16365 CreateTime: 1532433067168 isvalid: true keysize: 4 valuesize: 36 magic: 2 compresscodec: NONE producerId: -1 producerEpoch: -