Kafka's Topic, Partition and Message

Kafka's Topic and Partition

Topic

  • A Topic is Kafka's basic unit for writing data; a replication factor can be specified for it
  1. A Topic comprises one or more Partitions. The number of Partitions can be specified manually when the Topic is created, and is typically about the number of servers
  2. Each message belongs to one and only one Topic
  3. When a Producer publishes data, it must specify which Topic the message is published to
  4. When a Consumer subscribes to messages, it must likewise specify which Topic it subscribes to
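
The relationships in this list can be illustrated with a toy in-memory model. This is only a sketch: `MiniBroker` and its methods are hypothetical names invented for illustration and are not part of any Kafka client API.

```python
class MiniBroker:
    """Toy in-memory stand-in for a Kafka cluster (hypothetical, not a Kafka API)."""

    def __init__(self):
        # topic name -> list of partitions, each partition a list of messages
        self.topics = {}

    def create_topic(self, name, num_partitions):
        # The partition count is chosen when the topic is created.
        self.topics[name] = [[] for _ in range(num_partitions)]

    def publish(self, topic, partition, message):
        # A producer must name the topic; unknown topics are rejected here.
        if topic not in self.topics:
            raise KeyError(f"unknown topic: {topic}")
        self.topics[topic][partition].append(message)

    def subscribe(self, topic):
        # A consumer must also name the topic it reads from.
        if topic not in self.topics:
            raise KeyError(f"unknown topic: {topic}")
        for partition_id, messages in enumerate(self.topics[topic]):
            for message in messages:
                yield partition_id, message


broker = MiniBroker()
broker.create_topic("orders", num_partitions=2)
broker.publish("orders", 0, "order-1")
broker.publish("orders", 1, "order-2")
received = list(broker.subscribe("orders"))
```

Every message lands in exactly one topic, and neither `publish` nor `subscribe` can be called without naming that topic.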

Partition

  • Each Partition resides on a single Broker; physically, each Partition corresponds to a folder
  • By default Kafka assigns messages to partitions by hashing, so different partitions can end up holding different amounts of data; the partitioner can be overridden
  • A Partition consists of multiple Segments, and each Segment corresponds to a file. The Segment size can be specified manually; once a Segment reaches that threshold it is no longer written to, so the Segments are all about the same size
  • A Segment is made up of multiple immutable records. Records are only ever appended to a Segment and are never individually deleted or modified. The number of Messages in each Segment is not necessarily equal
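
Key hashing and segment rolling, as described above, can be sketched roughly as follows. The names `choose_partition` and `PartitionLog` are made up for this example, and `zlib.crc32` is only a stand-in for the hash function that Kafka's own default partitioner uses.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in hash (assumption): crc32 is used only because it ships with
    # Python; Kafka's default partitioner uses a different hash function.
    return zlib.crc32(key) % num_partitions


class PartitionLog:
    """Append-only log split into segments of at most `segment_bytes` bytes."""

    def __init__(self, segment_bytes: int):
        self.segment_bytes = segment_bytes
        self.segments = [[]]   # each segment is a list of records
        self.active_size = 0   # bytes written to the active (last) segment

    def append(self, record: bytes) -> None:
        # Roll to a new segment once the active one would exceed the threshold.
        if self.active_size and self.active_size + len(record) > self.segment_bytes:
            self.segments.append([])
            self.active_size = 0
        self.segments[-1].append(record)  # records are only ever appended
        self.active_size += len(record)


partitions = [PartitionLog(segment_bytes=16) for _ in range(3)]
for i in range(20):
    key = f"user-{i % 4}".encode()
    partitions[choose_partition(key, 3)].append(f"msg-{i:02d}".encode())

# Number of messages that ended up in each partition.
counts = [sum(len(segment) for segment in p.segments) for p in partitions]
```

With four distinct keys spread over three partitions, at least one partition receives more messages than another, which is the unevenness mentioned above. Closed segments are never touched again; only the last one grows.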

  • When clearing expired logs, Kafka deletes one or more whole Segments; by default data is retained for seven days
  • Kafka relies on memory-mapped files and sequential disk reads and writes to improve performance. Messages produced by a Producer are sent to a partition on a broker according to some partitioning strategy, and are persisted to files in that partition's directory. The directory name is the topic name followed by a serial number (the partition number). There are two types of file in this directory: log files with the .log extension and index files with the .index extension. Each log file has a corresponding index file, and together they form a Segment File. The log file is the data file, which stores the Messages; the index file records metadata that maps a Message to its physical offset in the data file.
  • LogSegment file naming rule: the first Segment of a partition starts from 0 (written as twenty 0s), and the name of each subsequent file is derived from the offset of the last message in the previous file (in current Kafka versions the name is the Segment's base offset, that is, the offset of its first message). What is the benefit of this naming? Suppose a Consumer has consumed up to offset = x and wants to continue. A binary search over the LogSegment file names locates the right file, and x is then looked up in that file's index to find where message x sits in the data file. When reading data, the Consumer actually reads by offset through the Index, and records where it last read.
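
The file-name lookup can be sketched with a binary search over the sorted segment names. This sketch assumes each file is named after its base offset (the first offset stored in that segment); the helper names and the offsets in the example are illustrative only.

```python
import bisect


def segment_file_name(base_offset: int) -> str:
    # 20-digit, zero-padded offset followed by the data-file extension.
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, target_offset: int) -> str:
    """Return the name of the segment file that should contain target_offset.

    base_offsets must be sorted ascending; each value is the first offset
    stored in that segment (an assumption of this sketch).
    """
    # bisect_right returns the insertion point after any equal entry, so the
    # element just before it is the largest base offset <= target_offset.
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first segment")
    return segment_file_name(base_offsets[i])


base_offsets = [0, 368769, 737337, 1105814]  # illustrative values only
print(find_segment(base_offsets, 500000))
```

Here offset 500000 falls between the second and third base offsets, so the search returns the second file, 00000000000000368769.log. Because the names sort in offset order, the lookup needs no extra metadata.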

  • Looking at the Index file in more detail (the left half of the figure): it stores n key-value pairs. The key is the sequence number of a Message in the log file, such as 1, 3, 6, 8, ..., meaning the 1st, 3rd, 6th, 8th message, and so on. The value is the physical offset of that message in the data file. The index file does not index every message in the data file; it uses sparse storage and creates an index entry once every certain number of bytes of data. This keeps the index file from taking up too much space, so it can be held in memory. The drawback is that a Message without an index entry cannot be located directly in the log file and needs a sequential scan, but the range of that scan is small.
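
A sparse index lookup of this kind can be sketched as a binary search followed by a short forward scan. The data layout below is a simplification invented for the example (fixed 100-byte messages, one index entry every 400 bytes); it is not Kafka's on-disk format.

```python
import bisect

# Data file modelled as (offset, byte_position) pairs in write order.
# Every message is 100 bytes long, so list position == offset in this toy log.
log = [(offset, offset * 100) for offset in range(50)]

# Sparse index: one entry roughly every 400 bytes rather than one per message.
INDEX_INTERVAL_BYTES = 400
index = []  # sorted list of (offset, byte_position)
last_indexed_pos = None
for offset, pos in log:
    if last_indexed_pos is None or pos - last_indexed_pos >= INDEX_INTERVAL_BYTES:
        index.append((offset, pos))
        last_indexed_pos = pos


def locate(target_offset):
    """Return (byte_position, messages_scanned) for target_offset."""
    # Binary search for the last index entry whose offset is <= target_offset.
    keys = [entry_offset for entry_offset, _ in index]
    i = bisect.bisect_right(keys, target_offset) - 1
    if i < 0:
        raise KeyError(target_offset)
    start_offset, _ = index[i]
    # Sequential scan from the indexed message up to the target.
    scanned = 0
    for offset, pos in log[start_offset:]:
        scanned += 1
        if offset == target_offset:
            return pos, scanned
    raise KeyError(target_offset)
```

Only every fourth message has an index entry here, so a lookup scans at most four messages after the binary search. That is the trade-off described above: a smaller index in exchange for a short scan.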

  • Although Kafka writes data sequentially, it is difficult to guarantee that consumption is globally ordered. When there are multiple partitions, the messages are already out of order once they have been distributed across them: for example, messages 0-10 are saved to partition1, messages 11-20 to partition2, and so on. The only guarantee is that, for the data in a topic, messages are consumed in order within each partition; global order cannot be achieved.
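
A small simulation makes the ordering guarantee concrete. It assumes a round-robin distribution over three partitions and a consumer that drains one partition after another; both choices are illustrative, and real assignment and scheduling vary.

```python
NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

# Producer side: messages 0..11 are spread over the partitions round-robin.
for seq in range(12):
    partitions[seq % NUM_PARTITIONS].append(seq)

# Consumer side: one possible schedule reads partition 0 fully, then 1, then 2.
consumed = [msg for partition in partitions for msg in partition]

print(partitions)  # each inner list is in ascending order
print(consumed)    # the overall sequence is not
```

Each partition holds its own messages in order ([0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]), but the consumed sequence starts 0, 3, 6, 9, 1, ... and so is not globally ordered.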

A Message consists of a fixed-length header and a variable-length message body.

  • 8 byte offset: each message within a partition has an ordered id number called the offset, which uniquely identifies the position of the message within the partition. In other words, the offset says which message of the partition this is
  • 4 byte message size: the size of the message
  • 4 byte CRC32: a CRC32 checksum of the message
  • 1 byte "magic": the protocol version of this release of the Kafka service program
  • 1 byte "attributes": a standalone version, or a flag identifying the compression type or the encoding type
  • 4 byte key length: the length of the key; when the key length is -1, the K byte key field is left unfilled
  • K byte key: optional
  • value bytes payload: the actual message data
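
The field list above can be turned into a small encode/decode sketch with Python's `struct` module. Several details are assumptions made for the example and are not stated in the list: the fields are big-endian, the message size counts every byte after the size field, the CRC covers every byte after the CRC field, and the payload length is inferred from the message size because the list gives no separate payload-length field.

```python
import struct
import zlib

# Big-endian fixed-size fields, in the order listed above.
HEAD = struct.Struct(">qi")   # 8-byte offset, 4-byte message size
CRC = struct.Struct(">I")     # 4-byte CRC32
META = struct.Struct(">bbi")  # 1-byte magic, 1-byte attributes, 4-byte key length


def encode(offset, key, payload, magic=0, attributes=0):
    key_len = -1 if key is None else len(key)  # -1 means "no key"
    body = META.pack(magic, attributes, key_len) + (key or b"") + payload
    crc = zlib.crc32(body)  # assumption: checksum of everything after the CRC
    message = CRC.pack(crc) + body
    return HEAD.pack(offset, len(message)) + message


def decode(buf):
    offset, size = HEAD.unpack_from(buf, 0)
    message = buf[HEAD.size:HEAD.size + size]
    (crc,) = CRC.unpack_from(message, 0)
    body = message[CRC.size:]
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    magic, attributes, key_len = META.unpack_from(body, 0)
    pos = META.size
    key = None
    if key_len >= 0:
        key = body[pos:pos + key_len]
        pos += key_len
    # Whatever remains inside the message is the payload.
    return offset, magic, attributes, key, body[pos:]


raw = encode(42, b"user-1", b"hello kafka")
decoded = decode(raw)
```

The fixed part is 8 + 4 + 4 + 1 + 1 + 4 = 22 bytes, followed by the optional key and the payload. A message encoded with `key=None` stores -1 as its key length and no key bytes.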

