Detailed explanation of Kafka consumer principles

1. Consumption mode

Kafka uses a publish-subscribe (one-to-many) model. There are two modes for delivering messages to subscribers, push and pull:

  • The push model has difficulty adapting to consumers with different consumption rates, because the broker determines how fast messages are sent. Its goal is to deliver messages as quickly as possible, which can easily leave consumers unable to keep up; typical symptoms are denial of service and network congestion. The pull model, by contrast, lets each consumer fetch messages at a rate suited to its own processing capacity.

  • The Kafka consumer uses the pull model to read data from the broker. The drawback of the pull model is that if Kafka has no data, the consumer may spin in a loop, receiving empty responses over and over. To address this, the Kafka consumer passes a duration parameter, the timeout, when fetching data: if no data is currently available for consumption, the consumer waits up to this long before returning. This waiting period is the timeout (see the poll-loop sketch after this list).
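As an illustration of the pull model and the poll timeout, here is a minimal sketch using the Kafka Java client's poll(Duration) API. The broker address localhost:9092, the topic testgroup, and the group id jh are assumptions borrowed from the examples later in this article.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PullLoopSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "jh");                        // group used later in this article
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("testgroup"));
            while (true) {
                // Pull model: the consumer asks the broker for data. If nothing is
                // available, poll() waits up to the given timeout and then returns
                // an empty batch instead of spinning in a tight loop.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}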

2. Partition allocation strategy

  • A consumer group contains multiple consumers and a topic contains multiple partitions, so partition assignment is unavoidable: Kafka must decide which partition is consumed by which consumer.
  • Factors to consider when setting the partition count: a partition can be consumed by only one consumer within the same consumer group (while a single consumer can consume multiple partitions at the same time). Therefore, if the number of partitions is smaller than the number of consumers, some consumers will have no data to consume, so the number of partitions should be at least the number of consumers running at the same time. It is also recommended that the number of partitions be greater than the number of brokers in the cluster, so that leader partitions can be spread evenly across the brokers and the cluster load is balanced. In Cloudera deployments, a topic commonly has hundreds of partitions. Note that Kafka allocates some memory for each partition to cache message data, so the more partitions there are, the larger the heap Kafka needs.
  • Kafka has two built-in partition assignment strategies: RoundRobin and Range (a configuration sketch follows this list).
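As an illustration, the strategy is selected per consumer via the partition.assignment.strategy configuration. The following minimal sketch (same assumed broker and group as above) switches the Java client from the default Range assignor to RoundRobin:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.RoundRobinAssignor;

public class AssignmentStrategySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "jh");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Range (org.apache.kafka.clients.consumer.RangeAssignor) is the default;
        // switch to RoundRobin to spread partitions one by one across the group.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                RoundRobinAssignor.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe and poll as usual; the group coordinator applies the strategy
            // when partitions are assigned to the consumers in the group
        }
    }
}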

3. Offset maintenance

Since a consumer may fail during consumption (for example, a power outage or a crash), it must continue consuming from the position it had reached before the failure once it recovers. The consumer therefore needs to record, in real time, the offset up to which it has consumed, so that consumption can resume from there after recovery.
A consumer group + topic + partition uniquely determines an offset.

In recent versions the offset is no longer stored in ZooKeeper. Starting from version 0.9, the consumer by default saves offsets in a built-in Kafka topic named __consumer_offsets.
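For illustration, here is a minimal sketch of committing offsets manually with the Java client (assumptions as before: local broker, topic testgroup, group jh). With enable.auto.commit=false, the application calls commitSync() itself; the committed position is keyed by exactly the combination described above, consumer group + topic + partition, and stored in __consumer_offsets.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ManualCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "jh");                       // group used later in this article
        props.put("enable.auto.commit", "false");          // commit offsets ourselves
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("testgroup"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                // After processing a record, commit the next offset to read for this
                // (group, topic, partition); Kafka stores it in __consumer_offsets.
                consumer.commitSync(Collections.singletonMap(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1)));
            }
        }
    }
}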

If you are curious and want to look at the stored offsets yourself, you can do the following:

Modify the configuration file consumer.properties

[root@centos7-4 kafka]# vim config/consumer.properties

Add the following configuration:
exclude.internal.topics=false

Start another consumer

[root@centos7-4 kafka]# bin/kafka-console-consumer.sh --topic __consumer_offsets \
--bootstrap-server localhost:9092 \
--formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" \
--consumer.config config/consumer.properties --from-beginning


4. Consumer group test

We can verify that, within the same consumer group, each partition is consumed by only one consumer at a time, so a given message is received by only one consumer in the group.

  1. On centos7-3 and centos7-4, modify the group.id property in the kafka/config/consumer.properties configuration file to the same group name.
[root@centos7-4 kafka]# vim config/consumer.properties

Set group.id=jh
[root@centos7-3 kafka]# vim config/consumer.properties

Set group.id=jh
  2. Start consumers on centos7-3 and centos7-4 respectively
[root@centos7-3 kafka]# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic testgroup --from-beginning --consumer.config config/consumer.properties
[root@centos7-4 kafka]# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic testgroup --from-beginning --consumer.config config/consumer.properties
  3. Start the producer on centos7-1
[root@centos7-1 kafka]# bin/kafka-console-producer.sh --broker-list centos7-1:9092 --topic testgroup

  4. Effect: each message sent by the producer is received by only one of the two consumers, confirming that the consumers in group jh split the topic's partitions between them.
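If you prefer to check the result programmatically rather than by watching the console output, the following minimal sketch (same assumptions as the earlier sketches) prints the partitions assigned to the current consumer instance; running two copies in parallel with the same group.id should show the topic's partitions split between them.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AssignmentCheckSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "jh");                       // same group as the console test
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("testgroup"));
            // poll() makes the consumer join the group and receive its partition assignment
            consumer.poll(Duration.ofSeconds(5));
            // With two instances of this program running, each should print a
            // disjoint subset of the topic's partitions.
            System.out.println("Assigned partitions: " + consumer.assignment());
        }
    }
}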

Original article: blog.csdn.net/weixin_46122692/article/details/109270433