Principles of Kafka message retrieval

1. Message structure

[Figure: structure of a Kafka message]

2. How message retrieval works

When data is read, each message being read is a record stored in some segment file of some partition of some topic. How efficiently Kafka can find that message largely determines its read performance.
On disk, each partition is stored as a directory of Kafka data files, and that directory contains many groups of files. Each group consists of a segment data file (.log), an index file (.index), and a time index file (.timeindex) that share the same file name.
The segment (.log) file holds the messages themselves. The index file refers to each message by its relative offset, that is, its offset within that one segment file. By contrast, the offset of a message as normally used is its absolute offset, which uniquely identifies the message across all the segments of the partition.
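As a small illustration of the naming scheme, the sketch below builds the three file names of one group from the offset encoded in the name, and parses that offset back out. It assumes Apache Kafka's convention of zero-padding the offset to 20 digits, as in 00000000001560140916; the function names are hypothetical.

```python
def segment_file_names(file_offset: int) -> list[str]:
    """Return the .log/.index/.timeindex names for one segment group."""
    # The offset is zero-padded to 20 digits to form the shared file name.
    prefix = f"{file_offset:020d}"
    return [f"{prefix}.log", f"{prefix}.index", f"{prefix}.timeindex"]


def offset_from_file_name(name: str) -> int:
    """Recover the offset encoded in a segment file name."""
    return int(name.split(".", 1)[0])


print(segment_file_names(1560140916))
# ['00000000001560140916.log', '00000000001560140916.index', '00000000001560140916.timeindex']
print(offset_from_file_name("00000000001560140916.log"))  # 1560140916
```

Because the names are fixed-width, sorting them as strings also sorts them by offset, which is what makes the segment lookup in step ① cheap.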
Each entry in the index file pairs a message's relative offset with the byte position of that message in the .log file.
OffsetIndex is a sparse index: it does not store the relative offset and position of every message, only of some of them.
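To make the entry layout and the sparseness concrete, here is a simplified sketch. It assumes each entry is 8 bytes (a 4-byte relative offset followed by a 4-byte byte position, big-endian), and that a new entry is added only after more than index_interval_bytes of log data has been written since the previous one, mirroring Kafka's index.interval.bytes topic setting (default 4096). Messages are modeled as a plain list of sizes; build_sparse_index and encode_index are hypothetical names.

```python
import struct

# One index entry: 4-byte relative offset + 4-byte byte position (big-endian).
ENTRY = struct.Struct(">ii")


def build_sparse_index(message_sizes, index_interval_bytes=4096):
    """Simplified sketch: index a message only once more than
    index_interval_bytes of data has been written since the last entry."""
    entries = []
    position = 0                # byte position where the next message starts
    bytes_since_last_entry = 0
    for rel, size in enumerate(message_sizes):
        if bytes_since_last_entry > index_interval_bytes:
            entries.append((rel, position))
            bytes_since_last_entry = 0
        position += size
        bytes_since_last_entry += size
    return entries


def encode_index(entries):
    """Pack (relative_offset, position) pairs into fixed-size binary entries."""
    return b"".join(ENTRY.pack(rel, pos) for rel, pos in entries)


sizes = [1000] * 20  # 20 hypothetical messages of 1000 bytes each
entries = build_sparse_index(sizes)
print(entries)                     # [(5, 5000), (10, 10000), (15, 15000)]
print(len(encode_index(entries)))  # 24
```

Only 3 of the 20 messages get an index entry; finding any other message means starting from a nearby entry and scanning forward, which is exactly the case handled at the end of this walkthrough.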
The retrieval process is illustrated below with the segment 00000000001560140916 in this partition directory, locating the message with offset 1560140921:

① Locate the segment log file.
Because each log file is named after the offset of its first message minus 1, the message belongs to the log file with the largest name below the target offset: 00000000001560140916.log. (In Apache Kafka's own implementation, a segment file is named after the segment's base offset, which is the offset of its first message, zero-padded to 20 digits.)

② Calculate the relative offset of the target within the log file.
The offset of the first message in this segment file is 1560140916 + 1 = 1560140917, so the relative offset of the target is:
target offset - offset of the first message in the segment + 1 = 1560140921 - 1560140917 + 1 = 5
Under either naming convention, this equals the target offset minus the number in the file name: 1560140921 - 1560140916 = 5.
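Step ① is a floor search over the sorted file names, and step ② is a subtraction. The sketch below assumes Apache Kafka's own naming, in which each file name is the segment's base offset, so the segment holding a target is the one with the largest base offset less than or equal to it, and relative offsets start at 0. The neighboring segment offsets in the usage example are made up for illustration, and the function names are hypothetical.

```python
import bisect


def locate_segment(base_offsets: list[int], target: int) -> int:
    """Return the base offset of the segment containing `target`.

    `base_offsets` must be sorted ascending (e.g. parsed from file names).
    """
    # bisect_right counts the entries <= target; the last of them is the floor.
    i = bisect.bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} precedes the first segment")
    return base_offsets[i]


def relative_offset(target: int, base_offset: int) -> int:
    """Offset of `target` counted from the start of its segment."""
    return target - base_offset


segments = [1560139000, 1560140916, 1560142000]  # hypothetical file offsets
base = locate_segment(segments, 1560140921)
print(f"{base:020d}.log", relative_offset(1560140921, base))
# 00000000001560140916.log 5
```

Kafka's log similarly keeps its segments in a sorted map keyed by base offset, so this step costs a logarithmic search rather than a scan over every file.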
③ Look up relative offset 5 in the index file, which maps it to byte position 456 in the log file.
In summary, the message can be read directly starting at byte 456 of the file 00000000001560140916.log.
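Because index entries have a fixed size and are sorted by relative offset, step ③ can binary-search the index bytes directly without decoding the whole file. The sketch below assumes the 8-byte big-endian entry layout (4-byte relative offset, 4-byte position) and returns the entry with the largest relative offset not greater than the target, falling back to (0, 0), the start of the segment, when every entry is larger. Only the (5, 456) entry comes from the example above; the other entries and the name lookup are hypothetical.

```python
import struct

ENTRY = struct.Struct(">ii")  # 4-byte relative offset, 4-byte byte position


def lookup(index_bytes: bytes, target_rel: int) -> tuple[int, int]:
    """Binary-search a packed index for the largest entry whose relative
    offset is <= target_rel; return (relative_offset, position)."""
    lo, hi = 0, len(index_bytes) // ENTRY.size - 1
    best = (0, 0)  # no suitable entry: start reading at the segment start
    while lo <= hi:
        mid = (lo + hi) // 2
        rel, pos = ENTRY.unpack_from(index_bytes, mid * ENTRY.size)
        if rel <= target_rel:
            best = (rel, pos)  # candidate; look for a closer one to the right
            lo = mid + 1
        else:
            hi = mid - 1
    return best


index = b"".join(ENTRY.pack(r, p) for r, p in [(1, 120), (5, 456), (9, 830)])
print(lookup(index, 5))  # (5, 456)
print(lookup(index, 6))  # (5, 456)
```

The same search covers both an exact hit and a miss: for a relative offset that is not indexed, it returns the nearest preceding entry, which is where a sequential scan must begin.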
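Now suppose the target is offset 1560140922, whose relative offset is 1560140922 - 1560140917 + 1 = 6. If relative offset 6 has no entry in the index file, Kafka starts from the closest entry below it (relative offset 5, at byte 456) and reads the log file sequentially forward from that position until it reaches the target message.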
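The forward scan in this last case can be sketched as follows. The on-disk layout here is a deliberately simplified, hypothetical one (each record is a 4-byte relative offset, a 4-byte payload length, then the payload), not Kafka's actual record-batch format; write_log and scan_forward are illustrative names. The scan starts at the byte position returned by the index lookup and walks record by record.

```python
import struct

# Hypothetical simplified record header: relative offset, payload length.
# This is NOT Kafka's real record-batch format.
HEADER = struct.Struct(">ii")


def write_log(payloads):
    """Serialize payloads so that message i has relative offset i.
    Returns the log bytes and the byte position where each message starts."""
    log = bytearray()
    positions = []
    for rel, payload in enumerate(payloads):
        positions.append(len(log))
        log += HEADER.pack(rel, len(payload)) + payload
    return bytes(log), positions


def scan_forward(log: bytes, start: int, target_rel: int) -> bytes:
    """Read records sequentially from byte `start` until target_rel is found."""
    position = start
    while position + HEADER.size <= len(log):
        rel, size = HEADER.unpack_from(log, position)
        if rel == target_rel:
            body = position + HEADER.size
            return log[body:body + size]
        if rel > target_rel:
            break  # passed the target without finding it
        position += HEADER.size + size  # skip to the next record
    raise KeyError(target_rel)


payloads = [f"msg-{i}".encode() for i in range(10)]
log, positions = write_log(payloads)
# Suppose the sparse index only has an entry for relative offset 5.
print(scan_forward(log, positions[5], 6))  # b'msg-6'
```

Here positions[5] plays the role of byte 456: it is where the nearest indexed message (relative offset 5) starts, and the scan steps over that one record to reach relative offset 6.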

Source: blog.csdn.net/Erica_1230/article/details/113796456