Data file of a Kafka partition (offset, MessageSize, data)

Each message in a partition has three attributes: offset, MessageSize, and data. The offset is not the physical storage location of the message in the partition data file but a logical value that uniquely identifies the message within the partition; it can be regarded as the id of the message in the partition. MessageSize is the size in bytes of the message content (data). data is the actual content of the message.
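To make the three attributes concrete, here is a minimal sketch that lays messages out as [8-byte offset][4-byte MessageSize][data] entries in a byte buffer and reads them back. The class and method names (PartitionFileSketch, Entry, positionOf) are made up for this illustration, and the layout is a simplified model rather than Kafka's actual on-disk format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of a partition data file: a sequence of
// [8-byte offset][4-byte MessageSize][data] entries.
// Illustration only; this is not Kafka's real on-disk format.
class PartitionFileSketch {

    // One decoded entry (hypothetical helper type for this sketch).
    static final class Entry {
        final long offset;
        final int messageSize;
        final byte[] data;

        Entry(long offset, int messageSize, byte[] data) {
            this.offset = offset;
            this.messageSize = messageSize;
            this.data = data;
        }
    }

    // Encode the messages back to back, assigning offsets 0, 1, 2, ...
    static ByteBuffer write(List<String> messages) {
        List<byte[]> payloads = new ArrayList<>();
        int total = 0;
        for (String m : messages) {
            byte[] data = m.getBytes(StandardCharsets.UTF_8);
            payloads.add(data);
            total += Long.BYTES + Integer.BYTES + data.length;
        }
        ByteBuffer file = ByteBuffer.allocate(total);
        long offset = 0;
        for (byte[] data : payloads) {
            file.putLong(offset++);    // logical id of the message
            file.putInt(data.length);  // MessageSize
            file.put(data);            // data
        }
        file.flip();
        return file;
    }

    // Decode every entry in file order.
    static List<Entry> read(ByteBuffer file) {
        List<Entry> entries = new ArrayList<>();
        ByteBuffer view = file.duplicate();
        while (view.hasRemaining()) {
            long offset = view.getLong();
            int size = view.getInt();
            byte[] data = new byte[size];
            view.get(data);
            entries.add(new Entry(offset, size, data));
        }
        return entries;
    }

    // Byte position at which the entry with the given offset starts, or -1.
    static int positionOf(ByteBuffer file, long targetOffset) {
        ByteBuffer view = file.duplicate();
        while (view.hasRemaining()) {
            int start = view.position();
            long offset = view.getLong();
            int size = view.getInt();
            if (offset == targetOffset) {
                return start;
            }
            view.position(view.position() + size); // skip over the data
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer file = write(Arrays.asList("a", "hello", "kafka"));
        for (Entry e : read(file)) {
            System.out.println("offset=" + e.offset
                    + " size=" + e.messageSize
                    + " data=" + new String(e.data, StandardCharsets.UTF_8)
                    + " bytePosition=" + positionOf(file, e.offset));
        }
    }
}
```

In this model the message with offset 1 starts at byte 13, which shows why the offset is a logical id and not a storage location.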

Table of contents

1. Offset

2. MessageSize

3. data


 

1. Offset

In Kafka, each message in a partition has a unique offset, which identifies the position of the message in that partition. The offset can be understood as the sequence number or index of the message within the partition.

Offsets are assigned per partition. When a message is written to a partition, Kafka assigns it the next offset in increasing order, so that the message can be located precisely later. The offset is a 64-bit integer (a long in Java).

Offsets provide ordering and positioning. A consumer can use an offset to specify where in a partition to start consuming. After processing messages, the consumer commits its offset so that it can resume from the correct position the next time it consumes. This lets messages be consumed in order while also supporting flexible position tracking and fault tolerance.

Through offsets, Kafka guarantees that messages within a partition are written and read in order, and a consumer can resume from its last committed position after a failure or a rebalance. Kafka assigns offsets to messages and stores the committed offsets; consumers only need to commit the correct offsets.
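The assign, consume, and commit cycle can be sketched with an in-memory stand-in for a single partition. Everything here (OffsetSketch, append, poll, commit) is a hypothetical name for illustration and not the Kafka client API; the sketch follows the convention that the committed offset is the offset of the next message to read.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for one partition plus one consumer's committed offset.
// All names are hypothetical; this is not the Kafka client API.
class OffsetSketch {
    private final List<String> log = new ArrayList<>();
    private long committed = 0; // offset of the next message the consumer should read

    // Append a message and return the offset assigned to it (its index in the log).
    long append(String message) {
        log.add(message);
        return log.size() - 1;
    }

    // Return up to max messages, starting at the committed offset.
    List<String> poll(int max) {
        int from = (int) Math.min(committed, log.size());
        int to = Math.min(log.size(), from + max);
        return new ArrayList<>(log.subList(from, to));
    }

    // Record that every message before nextOffset has been processed.
    void commit(long nextOffset) {
        committed = nextOffset;
    }

    long committed() {
        return committed;
    }

    public static void main(String[] args) {
        OffsetSketch partition = new OffsetSketch();
        for (String m : new String[] {"m0", "m1", "m2"}) {
            System.out.println(m + " -> offset " + partition.append(m));
        }
        List<String> batch = partition.poll(2);
        partition.commit(partition.committed() + batch.size());
        System.out.println("resume at " + partition.committed() + ": " + partition.poll(2));
    }
}
```

Because the position is kept separately from the log, a consumer that restarts simply continues from the committed offset instead of rereading the partition from the beginning.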

 

2. MessageSize

In Kafka, MessageSize is not an attribute that the application sets on each message; it is the size of the message, that is, the number of bytes in the message body.

Each message in Kafka consists of two parts: a message header and a message body. The message header contains metadata such as the topic, partition, and offset of the message, while the message body is the actual data content.

The total size of a message is the sum of the bytes of the message header and the message body. Kafka provides ConsumerRecord objects to represent consumed messages; they carry the attributes and data of each message. Calling value() on a ConsumerRecord returns the message body, and the length of its byte array is the size of the message body.

The sample code is as follows:

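import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;

ConsumerRecord<String, String> record = ... // message obtained from the consumer
String message = record.value();
// Encode explicitly as UTF-8 so the byte count does not depend on the platform default charset
int messageSize = message.getBytes(StandardCharsets.UTF_8).length;
System.out.println("Message size: " + messageSize + " bytes");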
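The size has to be measured in bytes, not characters, because the two differ for non-ASCII text. The following standalone sketch (MessageSizeSketch and utf8Size are illustrative names, and UTF-8 is assumed as the encoding) shows the difference without needing a Kafka consumer.

```java
import java.nio.charset.StandardCharsets;

// Compares the character count of a string with its encoded size in bytes.
class MessageSizeSketch {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Size(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "kafka";
        String chinese = "\u6d88\u606f"; // two CJK characters, written as escapes

        System.out.println(ascii.length() + " chars, " + utf8Size(ascii) + " bytes");
        System.out.println(chinese.length() + " chars, " + utf8Size(chinese) + " bytes");
    }
}
```

For the ASCII string both numbers are 5, while the two CJK characters take 6 bytes in UTF-8. When using the real client, ConsumerRecord also exposes serializedValueSize(), which reports the size of the serialized value directly.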

Note that Kafka limits the size of a message (for example through the broker setting message.max.bytes and the producer setting max.request.size). Kafka does not split a large message automatically; a message that exceeds the limit is rejected. If the application itself splits a large payload into fragments and sends them as separate messages, the size of the whole payload is the sum of the fragment sizes.
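Accumulating fragment sizes can be sketched as follows. This is application-level chunking under the assumption that the sender splits the payload itself; it is not a built-in Kafka feature, and ChunkSketch, split, and totalSize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Application-level chunking of a large payload into size-limited fragments.
class ChunkSketch {

    // Split the payload into fragments of at most maxBytes each.
    static List<byte[]> split(byte[] payload, int maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        List<byte[]> fragments = new ArrayList<>();
        for (int start = 0; start < payload.length; start += maxBytes) {
            int end = Math.min(payload.length, start + maxBytes);
            fragments.add(Arrays.copyOfRange(payload, start, end));
        }
        return fragments;
    }

    // Size of the whole payload, accumulated over all fragments.
    static int totalSize(List<byte[]> fragments) {
        int total = 0;
        for (byte[] fragment : fragments) {
            total += fragment.length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<byte[]> fragments = split(new byte[10], 4);
        System.out.println(fragments.size() + " fragments, "
                + totalSize(fragments) + " bytes in total");
    }
}
```

A real implementation would also need to tag each fragment with a payload id and a sequence number so the receiver can reassemble it; that part is left out here.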

 

3. data

In Kafka, the data of a message is its actual content, that is, the payload. In the client API the payload is exposed as the value of the record, optionally accompanied by a key. Kafka transmits and stores data as byte arrays.

In Java, you can read the data of a message through Kafka's consumer API. Here is a sample:

ConsumerRecord<String, String> record = ... // message obtained from the consumer
String data = record.value(); // payload, already deserialized to a String
System.out.println("Message data: " + data);

The value() method returns the data content of the message. In this example the data is stored in a String variable; the type you get depends on the deserializer configured for the consumer, so you can choose a different type for storage and processing as needed.
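Since the data travels as a byte array, some code has to convert between bytes and typed values on each side. The sketch below imitates that conversion with the standard library only; ValueCodecSketch and its methods are illustrative names, and in a real application this role is played by the serializer and deserializer classes configured on the producer and consumer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Converts typed values to byte arrays and back, as a serializer/deserializer pair would.
class ValueCodecSketch {

    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Encode a long as 8 bytes in big-endian order (ByteBuffer's default).
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long decodeLong(byte[] data) {
        return ByteBuffer.wrap(data).getLong();
    }

    public static void main(String[] args) {
        byte[] text = encodeString("hello");
        byte[] number = encodeLong(42L);
        System.out.println(decodeString(text) + " (" + text.length + " bytes)");
        System.out.println(decodeLong(number) + " (" + number.length + " bytes)");
    }
}
```

The same bytes can only be read back correctly if both sides agree on the type and encoding, which is why the producer's serializer and the consumer's deserializer have to match.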

Note that Kafka also lets you attach custom key-value pairs to a message as record headers, so in addition to its data a message can carry other custom attributes. These can be defined according to business requirements to carry extra metadata through message processing.
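As a rough picture of a message that carries custom attributes next to its data, here is a self-contained sketch. HeaderSketch and its methods are hypothetical and only model the idea; in the Kafka Java client the equivalent is adding entries through the record's headers(), for example headers().add(name, bytes). Unlike this Map-based model, real Kafka headers may repeat the same key.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Models a message with a payload plus custom key-value attributes (headers).
class HeaderSketch {
    final String data;
    // Header values are byte arrays, like the payload itself.
    final Map<String, byte[]> headers = new LinkedHashMap<>();

    HeaderSketch(String data) {
        this.data = data;
    }

    // Attach a custom attribute; returns this so calls can be chained.
    HeaderSketch header(String name, String value) {
        headers.put(name, value.getBytes(StandardCharsets.UTF_8));
        return this;
    }

    // Read an attribute back as text, or null if it is absent.
    String headerAsString(String name) {
        byte[] raw = headers.get(name);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        HeaderSketch message = new HeaderSketch("order created")
                .header("trace-id", "abc123")
                .header("source", "checkout");
        System.out.println(message.data + " trace-id=" + message.headerAsString("trace-id"));
    }
}
```

Keeping such metadata in headers means consumers can route or trace a message without having to parse its payload.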


Origin blog.csdn.net/2301_77899321/article/details/132218497