[Message queue] Kafka file storage mechanism

1. What is Kafka

Kafka was originally developed by LinkedIn. It is a distributed, partitioned, replicated, multi-subscriber log system coordinated through ZooKeeper, and it can also be used as an MQ system. It is commonly used for web/nginx logs, access logs, messaging services, and similar workloads. LinkedIn contributed it to the Apache Foundation in 2010, and it became a top-level open source project.

The design of the file storage mechanism is one of the most critical factors in how well a message queue performs, and one of the clearest measures of the technical quality of a message queue service. The following analyzes how Kafka achieves efficient file storage, looking first at its file storage mechanism and physical structure and then at the results in practice.

Some Kafka terms used in this article:

  • Broker: A message middleware processing node. One Kafka node is one broker, and multiple brokers form a Kafka cluster.
  • Topic: A category of messages. Page view logs and click logs, for example, can each exist as a topic, and a Kafka cluster can handle the distribution of multiple topics at the same time.
  • Partition: The physical grouping of a topic. A topic can be divided into multiple partitions, and each partition is an ordered queue.
  • Segment: A partition is physically composed of multiple segments, which are described in detail in 2.2 and 2.3 below.
  • Offset: Each partition consists of a series of ordered, immutable messages that are appended to the partition one after another. Each message in the partition has a sequential number called the offset, which uniquely identifies the message within the partition.

The analysis process is divided into the following 4 steps:

  • Partition storage distribution in topic
  • File storage method in partition
  • Segment file storage structure in partition
  • How to find message by offset in partition

With these 4 steps analyzed in detail, the workings of the Kafka file storage mechanism become clear.

2. Kafka file storage mechanism

2.1 Partition storage distribution in topic

Assume the Kafka cluster in the experimental environment has only one broker, and that xxx/message-folder is the root directory for data files, as configured in the broker's server.properties file (parameter log.dirs=xxx/message-folder). Create 2 topics named report_push and launch_info, each with partitions=4. The storage paths and directory layout are then:

xxx/message-folder
              |--report_push-0
              |--report_push-1
              |--report_push-2
              |--report_push-3
              |--launch_info-0
              |--launch_info-1
              |--launch_info-2
              |--launch_info-3

In Kafka file storage, a topic has multiple partitions, and each partition is a directory. A partition directory is named with the topic name plus a sequence number. Numbering starts at 0, and the largest number is the number of partitions minus 1. For distribution across multiple brokers, see the analysis of the principles of Kafka cluster partition distribution.
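The naming rule can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Kafka code; the function name partition_dirs is made up for this example, and the path and topic names are the ones used in this section.

```python
from pathlib import PurePosixPath


def partition_dirs(log_dir, topic, num_partitions):
    """Return the directory of every partition of `topic` under `log_dir`.

    Hypothetical helper: it only applies the "<topic>-<number>" rule,
    with numbers running from 0 to num_partitions - 1.
    """
    root = PurePosixPath(log_dir)
    return [str(root / f"{topic}-{number}") for number in range(num_partitions)]


# Reproduce the layout of the two example topics with 4 partitions each.
for topic in ("report_push", "launch_info"):
    for path in partition_dirs("xxx/message-folder", topic, 4):
        print(path)
```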

2.2 File storage method in partition

The following schematic diagram illustrates the storage method of files in the partition:

[Figure 1: how files are stored in a partition]

  • Each partition (directory) is equivalent to one huge file that is split evenly into multiple segment data files of equal size. The number of messages in each segment file is not necessarily equal, however. This design makes it easy to delete old segment files quickly.
  • Each partition only needs to support sequential reads and writes, and the life cycle of a segment file is determined by server configuration parameters.

The advantage of this design is that files that are no longer needed can be deleted quickly, which improves disk utilization.
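As a rough illustration of the two points above, the following Python sketch groups messages into size-bounded segments and then picks out segments that have aged past a retention window. It is a simplified model, not Kafka's implementation: the function names and the parameters segment_bytes and retention_seconds are hypothetical, and the message sizes are invented.

```python
def roll_segments(message_sizes, segment_bytes):
    """Group messages into segments holding at most `segment_bytes` bytes.

    `message_sizes[i]` is the size in bytes of the message with offset i.
    Returns one list of offsets per segment.
    """
    segments, used = [[]], 0
    for offset, size in enumerate(message_sizes):
        # Start a new segment when the current one would overflow.
        if segments[-1] and used + size > segment_bytes:
            segments.append([])
            used = 0
        segments[-1].append(offset)
        used += size
    return segments


def expired_segments(last_modified, now, retention_seconds):
    """Return the names of segments last written before the retention window."""
    return [name for name, mtime in last_modified.items()
            if now - mtime > retention_seconds]


# Segments are bounded by bytes, so their message counts differ.
print(roll_segments([400, 400, 100, 100, 100, 900, 300], segment_bytes=1000))
```

Because whole segment files are the unit of deletion, expiring old data is a matter of removing files rather than rewriting one large file.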

2.3 Segment file storage structure in partition

Section 2.2 described how files are stored within a partition. This section analyzes the composition and physical structure of the segment files in a partition in depth.

  • A segment consists of two files, an index file and a data file. The two files correspond one-to-one and always appear in pairs. The suffix ".index" marks the segment index file and ".log" marks the segment data file.
  • Segment file naming rule: the first segment of a partition starts from 0, and each subsequent segment file is named after the offset of the last message in the previous segment file. The value is a 64-bit long (at most 19 decimal digits), and the file name is padded on the left with zeros to a fixed width, 20 characters in the file names shown in this article.
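A minimal sketch of this naming rule in Python, assuming the 20-character zero-padded width seen in the file names in this article. The helper segment_file_names is a made-up name for illustration.

```python
MAX_LONG = 2**63 - 1  # largest value of a signed 64-bit long


def segment_file_names(name_offset):
    """Return the (.index, .log) file names for a segment.

    `name_offset` is the number the segment is named after. The
    20-character width is an assumption taken from the example
    file names in this article.
    """
    if not 0 <= name_offset <= MAX_LONG:
        raise ValueError("offset must fit in a non-negative 64-bit long")
    stem = f"{name_offset:020d}"
    return stem + ".index", stem + ".log"


print(segment_file_names(0))
print(segment_file_names(368769))
```

Zero padding keeps lexicographic order and numeric order identical, so a plain sorted directory listing is already ordered by offset.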

The following file list comes from an experiment the author ran on a Kafka broker: create a topic topicXXX with 1 partition, set the size of each segment to 500MB, and start a producer that writes a large amount of data to the broker. The segment file list in Figure 2 below illustrates the 2 rules above:

[Figure 2: segment file list]

Taking one pair of segment files from Figure 2 as an example, the physical structure of the index <-> data file mapping in the segment is as follows:

[Figure 3: mapping between the index file and the data file in a segment]

The index file in Figure 3 above stores metadata entries, the data file stores the messages, and each metadata entry in the index file points to the physical offset address of a message in the corresponding data file. Take the entry 3,497 in the index file as an example: it refers to the 3rd message in the data file (message 368772 in the partition as a whole), and the physical offset address of that message is 497.

From Figure 3 above we know that the segment data file is made up of many messages. The physical structure of a message is described in detail below:

[Figure 4: physical structure of a message]

Parameter description:

Keyword                Explanation
8 byte offset          Each message in the partition has an ordered id number called the offset, which uniquely determines the position of the message in the partition. That is, the offset is the sequence number of the message within the partition.
4 byte message size    Size of the message.
4 byte CRC32           CRC32 checksum used to verify the message.
1 byte "magic"         Version number of the Kafka protocol in this release.
1 byte "attributes"    Indicates an independent version, or identifies the compression type or encoding type.
4 byte key length      Length of the key. When the key length is -1, the K byte key field is not filled in.
K byte key             Optional.
value bytes payload    The actual message data.
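The layout in the table can be exercised with Python's struct module. The sketch below follows the fields exactly as listed in the table and makes several assumptions that the table does not state: fields are big-endian, the message size counts every byte after the size field, the CRC32 covers every byte after the CRC field, and the payload runs to the end of the message. It is an illustration of the layout, not a parser for real Kafka log files.

```python
import struct
import zlib

HEADER = struct.Struct(">qi")    # 8 byte offset, 4 byte message size
FIXED = struct.Struct(">IBBi")   # CRC32, magic, attributes, key length


def encode_message(offset, key, payload, magic=0, attributes=0):
    """Serialize one message; key=None is stored as key length -1."""
    key_length = -1 if key is None else len(key)
    tail = struct.pack(">BBi", magic, attributes, key_length) + (key or b"") + payload
    crc = zlib.crc32(tail) & 0xFFFFFFFF          # assumed CRC coverage
    body = struct.pack(">I", crc) + tail
    return HEADER.pack(offset, len(body)) + body


def decode_message(buf, position=0):
    """Parse the message at `position`; return (fields, next position)."""
    offset, size = HEADER.unpack_from(buf, position)
    start = position + HEADER.size
    body = buf[start:start + size]
    crc, magic, attributes, key_length = FIXED.unpack_from(body, 0)
    if zlib.crc32(body[4:]) & 0xFFFFFFFF != crc:
        raise ValueError("CRC32 mismatch")
    cursor = FIXED.size
    key = None
    if key_length >= 0:                          # -1 means no key bytes
        key = body[cursor:cursor + key_length]
        cursor += key_length
    fields = {"offset": offset, "magic": magic, "attributes": attributes,
              "key": key, "payload": body[cursor:]}
    return fields, start + size


raw = encode_message(368772, b"user-1", b"page view")
print(decode_message(raw))
```

Returning the next position lets a caller walk a data file message by message, which is what the sequential scan in section 2.4 relies on.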

2.4 How to find message by offset in partition

For example, to read the message with offset=368776, search in the following 2 steps.

  • The first step is to find the segment file. In Figure 2 above, 00000000000000000000.index is the first file, and its starting offset is 0. The second file, 00000000000000368769.index, has a starting message offset of 368770 = 368769 + 1. Similarly, the third file, 00000000000000737337.index, has a starting offset of 737338 = 737337 + 1, and the subsequent files follow the same pattern. Because the files are named and sorted by starting offset, a binary search of the file list by offset quickly locates the right file. For offset=368776, this locates 00000000000000368769.index|log.

  • The second step is to find the message within the segment file. With the segment file located in the first step, for offset=368776 look up the metadata entry in 00000000000000368769.index to get a physical offset address in 00000000000000368769.log, then scan 00000000000000368769.log sequentially from that address until the message with offset=368776 is found.
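The two steps can be sketched in Python with in-memory stand-ins for the file list, the index file, and the data file. The sketch follows the naming rule as described in this article (a file named N holds the messages after offset N); if segment names were instead the first offset each file contains, the first comparison would use bisect_right. The function names are hypothetical, and apart from the entry 3,497 and the file names, all sample index entries and positions are invented for illustration.

```python
import bisect


def find_segment(segment_names, target):
    """Step 1: binary search the sorted segment names for `target`."""
    i = bisect.bisect_left(segment_names, target) - 1
    return segment_names[max(i, 0)]


def find_scan_start(index_entries, segment_name, target):
    """Step 2a: last sparse index entry at or before the target.

    `index_entries` is a sorted list of (message number in segment,
    physical position) pairs.
    """
    relative = target - segment_name
    numbers = [number for number, _ in index_entries]
    i = bisect.bisect_right(numbers, relative) - 1
    return index_entries[i][1] if i >= 0 else 0


def scan_log(log_records, start_position, target):
    """Step 2b: scan forward from `start_position` for the target offset."""
    for offset, position in log_records:
        if position >= start_position and offset == target:
            return position
    return None


names = [0, 368769, 737337]
index = [(1, 0), (3, 497), (6, 1407), (8, 1686)]          # sample values
log = [(368770, 0), (368771, 212), (368772, 497), (368773, 830),
       (368774, 1105), (368775, 1407), (368776, 1540), (368777, 1686)]

segment = find_segment(names, 368776)
start = find_scan_start(index, segment, 368776)
print(segment, start, scan_log(log, start, 368776))
```

With these sample values the index narrows the search to the entry for the 6th message, so only the last stretch of the data file has to be scanned.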

The advantages of this approach can be seen in Figure 3 above. The segment index file uses a sparse index, which reduces the size of the index file, so it can be mapped with mmap and operated on directly in memory. A sparse index stores a metadata pointer for only some of the messages in the data file, not for every message. It saves more storage space than a dense index, but a lookup takes more time, because a message without its own index entry has to be found by a sequential scan.

3. Kafka file storage mechanism: results in practice
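To make the mmap point concrete, here is a small Python sketch that writes a sparse index to a file, maps it into memory, and binary searches it in place. The 8-byte entry layout (4-byte message number, 4-byte position, big-endian) is an assumption made for this example, as are the function names and every sample entry other than 3,497.

```python
import mmap
import os
import struct
import tempfile

ENTRY = struct.Struct(">ii")  # assumed layout: message number, position


def write_index(path, entries):
    """Write sorted (message number, position) pairs as fixed-size entries."""
    with open(path, "wb") as f:
        for number, position in entries:
            f.write(ENTRY.pack(number, position))


def lookup(path, relative):
    """Position of the last entry whose number is <= `relative`, else 0."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        low, high = 0, len(mapped) // ENTRY.size
        while low < high:
            mid = (low + high) // 2
            number, _ = ENTRY.unpack_from(mapped, mid * ENTRY.size)
            if number <= relative:
                low = mid + 1
            else:
                high = mid
        if low == 0:
            return 0
        return ENTRY.unpack_from(mapped, (low - 1) * ENTRY.size)[1]


with tempfile.TemporaryDirectory() as folder:
    index_path = os.path.join(folder, "00000000000000368769.index")
    write_index(index_path, [(1, 0), (3, 497), (6, 1407), (8, 1686)])
    print(lookup(index_path, 7))
```

Fixed-size entries are what make the in-place binary search possible: entry i always starts at byte i times the entry size, so no parsing pass over the file is needed.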

Lab environment:

  • Kafka cluster: consists of 2 virtual machines
  • CPU: 4 cores
  • Physical memory: 8GB
  • Network card: Gigabit network card
  • JVM heap: 4GB
  • For detailed Kafka server configuration and optimization, see: Kafka server.properties configuration details

[Figure 5: disk activity while Kafka is running]

As Figure 5 above shows, Kafka rarely performs heavy disk reads while running. It mainly writes to disk in periodic batches, so its disk operations are very efficient. This is closely related to how reading and writing messages is designed in Kafka file storage. Reading and writing messages in Kafka has the following characteristics:

Write message

  • The message is transferred from the Java heap to the page cache (that is, physical memory).
  • An asynchronous thread flushes messages from the page cache to disk.

Read message

  • The message is transferred directly from the page cache to the socket and sent out.
  • When the requested data is not found in the page cache, disk IO occurs: the message is loaded from disk into the page cache and then sent directly through the socket.
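The read and write paths above can be approximated in Python. In this sketch a write lands in the operating system's page cache and os.fsync forces it to disk, and socket.sendfile hands a byte range of the file to a socket without copying it through application buffers where the platform supports that. This is an analogy, not Kafka's code: the function names and the flush_every parameter are hypothetical, and for simplicity the flush runs inline rather than on a separate asynchronous thread.

```python
import os
import socket
import tempfile


def append_messages(path, messages, flush_every):
    """Append messages; force them to disk every `flush_every` messages."""
    with open(path, "ab", buffering=0) as f:
        for count, message in enumerate(messages, start=1):
            f.write(message)              # reaches the page cache
            if count % flush_every == 0:
                os.fsync(f.fileno())      # page cache -> disk
        os.fsync(f.fileno())


def send_range(sock, path, position, length):
    """Send `length` bytes of the file starting at `position` to `sock`."""
    with open(path, "rb") as f:
        return sock.sendfile(f, offset=position, count=length)


with tempfile.TemporaryDirectory() as folder:
    log_path = os.path.join(folder, "demo.log")
    append_messages(log_path, [b"alpha", b"beta", b"gamma"], flush_every=2)
    sender, receiver = socket.socketpair()
    with sender, receiver:
        sent = send_range(sender, log_path, position=5, length=4)
        print(sent, receiver.recv(16))
```

Batching the fsync calls is what turns many small writes into a few large sequential disk writes, matching the periodic batch pattern described for Figure 5.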

Kafka efficient file storage design features

  • Kafka divides the large partition file of a topic into multiple small segment files. With multiple small segment files, it is easy to periodically clear or delete files that have already been consumed, which reduces disk usage.
  • The index information makes it possible to locate a message quickly and to determine the maximum size of the response.
  • By mapping all index metadata into memory, disk IO on the segment index files can be avoided.
  • Through sparse storage of index files, the space occupied by index file metadata can be greatly reduced.

 

Origin blog.csdn.net/qq_41893274/article/details/112746898