Implementation principles of messages in the Kafka message queue

1. The concept of Kafka message data storage

The producer sends message data to a Topic in the Kafka message queue. A Topic is a logical concept; the message data is ultimately stored in a configured data directory on disk. There are therefore two ways to consume data: consuming from offset + 1, and consuming from the first message. Both methods actually read the data files on the hard disk.

In Kafka's data storage directory you can see many __consumer_offsets directories, which relate to consumer offsets. In the same data directory, each Topic also has an independent directory in which its message data is stored.

image-20220317231315826

In a topic's data directory there are a message index file, a message time index file, and a message data file. Among them, 00000000000000000000.log is the file that stores the message data.

image-20220317231537287

2. Conceptual principle of message offset

Consuming data at offset + 1 means not consuming old data; only newly written message data is consumed.

Producers write data into a Kafka Topic. The Topic is a logical-level concept; the message data is ultimately written into the data file 00000000000000000000.log in the Topic's storage directory. When consumers consume data, they read this data file, and the default consumption mode is to obtain messages from offset + 1 and consume from there. The offset can be understood as pointing at the last message in the data file: the consumer locates that message by the offset, then advances one more message position. As shown in the figure below, the table already contains 3 messages, and the consumer locates messages3 (the last one) via the offset; that is only the first step. The second step is to add 1 to the offset position, so the consumer ultimately starts consuming from the fourth message.

Offset + 1 is to start consuming data "after" the last message.

A common way to understand this: consumers will not consume old data. After the consumer starts, only message data newly written to the topic will be read and consumed by it.

When consuming in this mode, the consumer uses the offset to locate the position from which to start consuming.

image-20220318101510824

The principle of consuming data from offset + 1 is easier to understand with the following figure:

The three messages message1-3 in the data file are all old data. The offset locates the last message in the data file, message3, and +1 then points at the position of the fourth message. Consumption starts from this position, so the fourth message and everything after it will be consumed by the consumer.
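The offset + 1 rule can be sketched with a few lines of Python. This is a simplified in-memory model for illustration only, not Kafka's actual implementation:

```python
# Simplified model of a partition log and "offset + 1" consumption.
log = ["message1", "message2", "message3"]  # old data already in the log

last_offset = len(log) - 1   # offset of the last old message (message3)
start = last_offset + 1      # offset + 1: consume only data written after this point

log.append("message4")       # producer writes new data
consumed = log[start:]       # consumer reads from offset + 1 onward

print(consumed)              # only the newly written message is consumed
```

Running this shows that the three old messages are skipped and only `message4` is delivered, matching the behavior described above.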

image-20220318101557224

3. Conceptual principle of sequential consumption of message data

Sequential consumption refers to consumption from the first message data in the Topic.

When explaining the concept of offset consumption above, we saw that in Kafka each message is stored in a data file, and the messages in this file have an order: they are appended to the file one after another in the order they were written, and persisted there. We can therefore consume the messages sequentially, in the order they were written.

Consuming message data sequentially means consuming from the first message in the topic, regardless of whether the data is old or has already been consumed before; those characteristics are ignored, and consumption still starts from the first message.

The message data written to the data file is ordered by default, and this order is described by the offset.

As shown in the figure below: when consuming sequentially, the consumer starts from the first message in the data file 00000000000000000000.log and consumes the messages one by one in order (with the console consumer, this corresponds to the --from-beginning flag).

image-20220318101621805

The key point of sequential consumption is that consumption starts from the first message, regardless of whether the data is old or new.

image-20220318101639527

4. The concept and implementation of message unicast consumption

4.1. The concept of unicast consumption

Unicast consumption of message data means that within a consumer group, only one consumer consumes the message data in a topic.

As shown in the figure below, the producer writes message data into the topic, and the consumers all belong to one consumer group, testgroup1. Within this group, only one consumer can consume the message data in the topic. Usually the most recently started consumer consumes the data, though sometimes a previously started consumer does. Either way, only one consumer in the group consumes the data.

image-20220318101803499

4.2. Implementation of unicast consumption

To verify the theory above, we can start two consumers in the same group and observe which one ends up consuming the data. In theory, only one of them will consume it.

1) Start the first consumer client in the testgroup1 group

[root@kafka bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.81.210:9092 --consumer-property group.id=testgroup1 --topic test_topic1

2) Start the producer and send the first message

[root@kafka bin]# ./kafka-console-producer.sh --broker-list 192.168.81.210:9092 --topic test_topic1
>java

3) The first consumer successfully consumes the data

[root@kafka bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.81.210:9092 --consumer-property group.id=testgroup1 --topic test_topic1
java

4) Start the second consumer client in the testgroup1 group

[root@kafka bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.81.210:9092 --consumer-property group.id=testgroup1 --topic test_topic1

5) The producer sends new message data

[root@kafka bin]# ./kafka-console-producer.sh --broker-list 192.168.81.210:9092 --topic test_topic1
>java
>
>
>devops jiangxl
>group new consumer  

6) Observe the reception of consumers

At this point you will find that the first consumer no longer consumes data; instead the second, newly started consumer consumes the data in the topic.

[root@kafka bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.81.210:9092 --consumer-property group.id=testgroup1 --topic test_topic1


devops jiangxl
group new consumer

image-20220318111820983

5. The concept and implementation of message multicast consumption

5.1. Concept of multicast consumption

Multicast consumption of message data means that multiple consumer groups are allowed to consume the data in a topic at the same time, while within each consumer group only one consumer consumes the message data.

The difference between multicast and unicast: unicast involves a single consumer group, while in multicast multiple consumer groups consume the data at the same time, so one message is consumed once by each consumer group.

As shown in the figure below: two consumer groups consume the message data in the Topic at the same time, and within each group one particular consumer consumes the data.

image-20220318101834102

Multicast consumption is often used in scenarios that need to be consumed by multiple consumers at the same time.

5.2. Implementation of multicast consumption

1.First start consumers in two different groups
[root@kafka bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.81.210:9092 --consumer-property group.id=testgroup1 --topic test_topic1
[root@kafka bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.81.210:9092 --consumer-property group.id=testgroup2 --topic test_topic1


2.The producer produces message data
[root@kafka bin]# ./kafka-console-producer.sh --broker-list 192.168.81.210:9092 --topic test_topic1
>duo bo xiao fei
>
>hahahahahha
>
>usaisaiadosjlasjdlksajdl

3.Observe the data consumption
As shown in the figure below, the message data is consumed by both consumer groups at the same time.

image-20220318120257091

6. View consumer groups and detailed information

Both unicast and multicast use the concept of consumer groups, and we can query consumer group information with commands.

1.List the consumer groups in the system
[root@kafka bin]# ./kafka-consumer-groups.sh --bootstrap-server 192.168.81.210:9092 --list
testgroup1
testgroup2

2.View detailed information about a consumer group
[root@kafka bin]# ./kafka-consumer-groups.sh --bootstrap-server 192.168.81.210:9092 --describe --group testgroup1

GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                                HOST            CLIENT-ID
testgroup1      test_topic1     0          63              79              16              consumer-testgroup1-1-b329b404-a901-4ab9-8485-6f2915719187 /192.168.81.210 consumer-testgroup1-1

#Field descriptions
GROUP: name of the consumer group
TOPIC: the Topic the group is currently consuming
PARTITION: the partition number of the Topic
CURRENT-OFFSET: the offset the group has consumed up to, i.e. the position of the last message consumed; the value 63 here means the group last consumed up to message 63
LOG-END-OFFSET: the end offset of the current partition of the Topic, i.e. the offset of the last message, which can also be read as the total number of messages in the partition
LAG: the number of messages the group has not yet consumed, i.e. LOG-END-OFFSET minus CURRENT-OFFSET
CONSUMER-ID: the ID of the consumer instance in the group
HOST: the address of the Kafka node
CLIENT-ID: the client ID of the consumer in the group
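The LAG arithmetic from the describe output above can be checked directly, using the values shown in the table (CURRENT-OFFSET 63, LOG-END-OFFSET 79):

```python
# Verify the LAG column: LAG = LOG-END-OFFSET - CURRENT-OFFSET
current_offset = 63   # last offset the group has consumed
log_end_offset = 79   # offset at the end of the partition's log

lag = log_end_offset - current_offset
print(lag)  # 16, matching the LAG column in the describe output
```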

image-20220318131958802

7. The concept of Topic topic and partition in Kafka

7.1. Topic topic concept

Topic is a logical concept in Kafka. Kafka classifies messages from different applications into different Topics, and each Topic is consumed by the consumers that subscribe to it.

A single topic faces a problem: if a large amount of message data is written to it, possibly reaching hundreds of gigabytes or even several terabytes of storage, all of it is persisted into the topic's local data file. When that file grows large, both producers and consumers hit performance bottlenecks.

In order to solve the problem of performance impact caused by excessive message volume, Kafka designed the concept of partition for Topic.

7.2. Partition partition concept in Topic

Kafka's Topics can store message data in partitions. As shown in the figure below, a Topic can be split into three partitions, 0/1/2. Multiple producers can write message data to the three partitions at the same time, and consumers can likewise read and consume from the three partitions at the same time.

Before partitioning, 100 messages would all be stored in one Topic and persisted into a single local data file. With partitions, those 100 messages are distributed across the partitions (roughly a third to each of the three partitions) and persisted into different data files.

Partitions are numbered starting from 0: the first partition directory is xxx-0, the second is xxx-1, and the third is xxx-2.

image-20220318144317696

The benefits of storing message data in topic partitions:

  • Under high business concurrency, it avoids the problem of a Topic's single persisted data file growing too large. Partitions split the Topic logically and persist the message data into separate data files, which performs better for both reads and writes than keeping everything together.
  • Producers and consumers can read and write three or more partitions of a Topic at the same time. Originally, producers and consumers wrote and read messages into the Topic one at a time; with partitions, the Topic is split into several parts that producers and consumers can read and write simultaneously.
  • Originally there was only one entry point for reading and writing messages; with partitions, that one entry point becomes several, so reads and writes no longer have to proceed one at a time, improving producer and consumer throughput.
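The distribution described above (100 messages spread across three partitions) can be sketched as follows. Round-robin assignment is used here purely as an illustration; real Kafka producers pick partitions by key hash or a configured partitioner:

```python
# Sketch: distributing 100 messages across 3 partitions round-robin.
num_partitions = 3
partitions = {p: [] for p in range(num_partitions)}

for i in range(100):
    p = i % num_partitions              # round-robin partition assignment
    partitions[p].append(f"message{i}")

# Each partition holds roughly a third of the messages,
# and each would persist its share to its own data file.
print([len(msgs) for msgs in partitions.values()])  # [34, 33, 33]
```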

7.3. Create multi-partition Topic

1.Create a multi-partition Topic
[root@kafka bin]#./kafka-topics.sh --create --zookeeper 192.168.81.210:2181 --replication-factor 1 --partitions 3 --topic test_topic2
Created topic test_topic2.
The `--partitions` parameter specifies the number of partitions

2.View the files the Topic generated in the data path
[root@kafka kafka]# ll data/kafka-logs/test_topic2-*
data/kafka-logs/test_topic2-0:
总用量 4
-rw-r--r-- 1 root root 10485760 3月  18 16:06 00000000000000000000.index
-rw-r--r-- 1 root root        0 3月  18 16:06 00000000000000000000.log
-rw-r--r-- 1 root root 10485756 3月  18 16:06 00000000000000000000.timeindex
-rw-r--r-- 1 root root        8 3月  18 16:06 leader-epoch-checkpoint

data/kafka-logs/test_topic2-1:
总用量 4
-rw-r--r-- 1 root root 10485760 3月  18 16:06 00000000000000000000.index
-rw-r--r-- 1 root root        0 3月  18 16:06 00000000000000000000.log
-rw-r--r-- 1 root root 10485756 3月  18 16:06 00000000000000000000.timeindex
-rw-r--r-- 1 root root        8 3月  18 16:06 leader-epoch-checkpoint

data/kafka-logs/test_topic2-2:
总用量 4
-rw-r--r-- 1 root root 10485760 3月  18 16:06 00000000000000000000.index
-rw-r--r-- 1 root root        0 3月  18 16:06 00000000000000000000.log
-rw-r--r-- 1 root root 10485756 3月  18 16:06 00000000000000000000.timeindex
-rw-r--r-- 1 root root        8 3月  18 16:06 leader-epoch-checkpoint
#Each partition has its own independent data files

8. Contents stored in message data files in Kafka

8.1. Topic message data persistence file

There are many files in Kafka's message data persistence directory. As shown in the figure below, each Topic partition has a separate directory, named in the format topicName-partitionNumber. You can see that the internal topic __consumer_offsets has 50 partitions, each stored in an independent directory, alongside the topics we created ourselves.

image-20220318161004772

The directory of each topic partition contains the files shown in the figure below; three of them are particularly important.

00000000000000000000.index: records an index for each message; through this index and the timeindex, the creation time of a message can be found.

00000000000000000000.log: All message data will be saved in this file.

00000000000000000000.timeindex: the time index for recording message data.

image-20220318161407224

8.2. Concept of Kafka internal topic consumer_offsets

The role of the __consumer_offsets topic is to record the offsets at which consumers have consumed data.

The figure below illustrates the role of the __consumer_offsets topic.

A business topic holds 100 messages. Consumer 1 consumes message data from the topic and, at the same time, periodically writes the offset of the consumed data into Kafka's internal __consumer_offsets topic. If consumer 1 crashes and becomes unavailable while consuming message 50, then after consumer 2 starts, it can read the offset that consumer 1 wrote into __consumer_offsets, work out from that offset where consumption stopped, and immediately continue consuming from there, avoiding the problem of messages going unconsumed for a long time.

Consumer 1 went down at message 50. After consumer 2 starts, it obtains consumer 1's offset from the __consumer_offsets topic and learns that consumer 1 last processed message 50. Applying the offset + 1 consumption rule, it determines that consumption should resume from message 51.

image-20220318164028200

In short, the __consumer_offsets topic records the offsets of consumed data, so that other consumers can continue consuming based on those values.

Kafka's internal __consumer_offsets topic has 50 partitions. Because every consumer reports its consumed offsets to this topic, Kafka gives it 50 partitions to improve its write concurrency.

Message data in the __consumer_offsets topic is retained for 7 days by default.

The offset records that consumers periodically write to the __consumer_offsets topic are key+value pairs with the following content:

key: consumerGroupId + topic + partition number

value: the value of the current offset offset

The __consumer_offsets topic has 50 partitions, so how does a consumer know which partition to write its offset record to? It is calculated with the formula below.

hash(consumerGroupId) % number of partitions of the __consumer_offsets topic
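This formula can be sketched in Python. Kafka's actual implementation uses the Java String.hashCode of the group id; the `java_string_hash` helper below reimplements that hash for illustration, since Python's built-in `hash()` is randomized per run:

```python
def java_string_hash(s: str) -> int:
    """Reimplementation of Java's String.hashCode (32-bit signed)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep to 32 bits
    # convert unsigned 32-bit to signed, as Java would
    return h - 0x100000000 if h >= 0x80000000 else h

NUM_OFFSETS_PARTITIONS = 50  # __consumer_offsets has 50 partitions by default

def offsets_partition(group_id: str) -> int:
    # hash(consumerGroupId) % number of __consumer_offsets partitions
    return abs(java_string_hash(group_id)) % NUM_OFFSETS_PARTITIONS

# All offset commits for one group land in the same partition:
print(offsets_partition("testgroup1"))  # a fixed value in the range 0..49
```

Because the group id is the hash input, every consumer in the same group always commits to the same __consumer_offsets partition, which is what allows a restarted consumer to find the group's last committed offset.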

Origin blog.csdn.net/weixin_44953658/article/details/131410382