Partition allocation strategies for consumers in Kafka

Before talking about partition allocation, it is worth emphasizing that consumers use the pull model to read data from the broker. In a push model the sending rate is decided by the broker, whose goal is to deliver messages as quickly as possible, so it is hard to accommodate consumers that process messages at different rates. A slow consumer can easily be overwhelmed; the typical symptoms are denial of service and network congestion.
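A minimal sketch of the pull idea, using a toy in-memory broker rather than a real Kafka client. The class ToyBroker, the function consume_all, and the max_records parameter are hypothetical names made up for this illustration. The point is only that the consumer chooses the offset and the batch size, so a slow consumer can simply ask for less at a time.

```python
class ToyBroker:
    """Hypothetical in-memory stand-in for a broker; not a Kafka API."""

    def __init__(self, messages):
        self._log = list(messages)

    def fetch(self, offset, max_records):
        # The consumer decides where to read from and how much to take.
        return self._log[offset:offset + max_records]


def consume_all(broker, max_records):
    """Pull batches until the log is exhausted and return the batches."""
    offset = 0
    batches = []
    while True:
        batch = broker.fetch(offset, max_records)
        if not batch:
            break
        batches.append(batch)
        offset += len(batch)
    return batches


log = [f"m{i}" for i in range(7)]
fast = consume_all(ToyBroker(log), max_records=4)  # 2 batches
slow = consume_all(ToyBroker(log), max_records=2)  # 4 batches
print(fast)
print(slow)
```

Both consumers read the same seven messages; they differ only in how much they pull per request.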

Let's take a look at the relationship between partitions and a consumer group, as shown below:

[Figure: partitions of a topic and the consumers of a consumer group]

From this picture you can see that each consumer consumes the data of specific partitions of the topic. This leads to the partition allocation strategy: a consumer group contains multiple consumers and a topic contains multiple partitions, so the partitions have to be allocated, that is, it must be decided which consumer consumes which partition.

This article looks at two of Kafka's partition allocation strategies for consumers: RoundRobinAssignor (round-robin allocation) and Range.

RoundRobinAssignor (round-robin allocation):
The principle of the RoundRobinAssignor strategy is to sort all consumers in the consumer group, and all partitions of the topics they subscribe to, in lexicographic order, and then assign the partitions to the consumers one by one in turn.
Round-robin allocation falls into the following two cases:
Case 1: All consumers in the same consumer group subscribe to the same topics.
Case 2: The consumers in the same consumer group subscribe to different topics.

For the first case: if all consumers in the same consumer group have the same subscriptions, the partition allocation of the RoundRobinAssignor strategy is even.

Case 1: Suppose a consumer group has three consumers c1, c2, and c3, and all of them subscribe to both topics, T1 and T2. T1 contains partitions p0, p1, p2, and T2 contains partitions p0, p1, p2, p3, p4. The resulting allocation is shown in the following figure. Because every consumer in the group has the same subscriptions, the sorted partitions can simply be assigned to the consumers in turn.

[Figure: round-robin assignment of the T1 and T2 partitions to c1, c2, and c3]
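The allocation in Case 1 can be worked out with a few lines of Python. This is a sketch of the rule as described above for identical subscriptions, not Kafka's own code; the function name round_robin_assign and the "T1p0" label format are assumptions made for this example.

```python
from itertools import cycle


def round_robin_assign(consumers, topics):
    """Deal sorted partitions to sorted consumers in turn.

    topics maps a topic name to its number of partitions.
    Assumes every consumer subscribes to every topic.
    """
    partitions = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    assignment = {c: [] for c in sorted(consumers)}
    for consumer, (topic, partition) in zip(cycle(sorted(consumers)), partitions):
        assignment[consumer].append(f"{topic}p{partition}")
    return assignment


result = round_robin_assign(["c1", "c2", "c3"], {"T1": 3, "T2": 5})
for consumer, parts in result.items():
    print(consumer, parts)
# c1 ['T1p0', 'T2p0', 'T2p3']
# c2 ['T1p1', 'T2p1', 'T2p4']
# c3 ['T1p2', 'T2p2']
```

With eight partitions and three consumers, the counts come out as 3, 3, and 2, so no two consumers differ by more than one partition.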

Case 2:
Suppose there are two topics, T1 (with partitions p0, p1, p2) and T2 (with partitions p0, p1, p2). The consumer group contains c1 and c2, where c1 subscribes to T1 and c2 subscribes to T2.

[Figure: c1 subscribed to T1 and c2 subscribed to T2]

If RoundRobinAssignor simply pooled the partitions of all topics and dealt them out in turn, say in the order T2p1, T1p0, T1p2, T2p0, T1p1, T2p2, the final assignment would be
c1: T2p1, T1p2, T1p1
c2: T1p0, T2p0, T2p2
This would contradict the subscriptions: c1 only subscribes to T1 but receives T2 partitions, and c2 only subscribes to T2 but receives T1 partitions. The RoundRobinAssignor in the Java client avoids this by skipping a consumer that is not subscribed to the topic of the partition being assigned, so here c1 ends up with all of T1 and c2 with all of T2. The cost is that the allocation is no longer guaranteed to be even when subscriptions differ: a consumer that subscribes to more topics can end up with many more partitions than the others. This is one reason RoundRobinAssignor is not the default partition assignment method.
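A minimal Python sketch of round-robin assignment that skips consumers not subscribed to a partition's topic, which is how the Java client's RoundRobinAssignor is documented to behave when subscriptions differ. It is an illustration under that assumption, not Kafka's code. The function name and the second, three-consumer group are hypothetical.

```python
def round_robin_with_subscriptions(subscriptions, topics):
    """subscriptions: consumer -> set of topic names.
    topics: topic name -> number of partitions.
    """
    consumers = sorted(subscriptions)
    assignment = {c: [] for c in consumers}
    partitions = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    i = 0  # position in the circular list of consumers
    for topic, partition in partitions:
        if not any(topic in subscriptions[c] for c in consumers):
            continue  # nobody wants this topic
        # Skip consumers that are not subscribed to this topic.
        while topic not in subscriptions[consumers[i % len(consumers)]]:
            i += 1
        assignment[consumers[i % len(consumers)]].append(f"{topic}p{partition}")
        i += 1
    return assignment


# Case 2 from the text: each consumer keeps only its own topic.
case2 = round_robin_with_subscriptions(
    {"c1": {"T1"}, "c2": {"T2"}}, {"T1": 3, "T2": 3}
)
print(case2)

# A hypothetical group with overlapping subscriptions: the result is uneven.
uneven = round_robin_with_subscriptions(
    {"c1": {"T1"}, "c2": {"T1", "T2"}, "c3": {"T1", "T2", "T3"}},
    {"T1": 1, "T2": 2, "T3": 3},
)
print(uneven)
```

In the second group c3 receives four of the six partitions while c1 and c2 receive one each, which shows the imbalance that differing subscriptions can cause.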

Range partition allocation method
Because RoundRobinAssignor has the above shortcomings, Kafka uses Range as the default partition allocation strategy. With Range, each consumer in a consumer group can subscribe to its own set of topics. Two points are worth emphasizing:
1. The Range strategy works per topic, and the assignment of one topic is independent of the assignment of the others.
2. The Range strategy divides the number of partitions by the number of consumers to decide how many partitions each consumer should consume. If the division is not exact, the first few consumers each consume one more partition.

Case 1: Suppose the Kafka cluster has one topic, T0, with 7 partitions, and the consumer group has three consumers. 7 / 3 = 2 with a remainder of 1, so the first consumer gets 3 partitions and the other two get 2 each. The distribution plan is

[Figure: Range assignment of the 7 partitions of T0 to the three consumers]
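The arithmetic for a single topic can be sketched as follows. This is plain Python illustrating the rule described above, not Kafka's implementation; the function name range_assign_topic and the consumer names c1 to c3 are assumptions for the example.

```python
def range_assign_topic(consumers, num_partitions):
    """Split partitions 0..num_partitions-1 into contiguous ranges.

    The first (num_partitions % len(consumers)) consumers get one extra.
    """
    consumers = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment = {}
    start = 0
    for index, consumer in enumerate(consumers):
        count = per_consumer + (1 if index < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


plan = range_assign_topic(["c1", "c2", "c3"], 7)
print(plan)
# {'c1': [0, 1, 2], 'c2': [3, 4], 'c3': [5, 6]}
```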

Case 2: Consider the following cluster (note: c1 and c2 both consume T1 and T2, and each topic has partitions p0, p1, p2):

[Figure: c1 and c2 both subscribed to T1 and T2]

Then after Range allocation:
c1: T1p0, T1p1, T2p0, T2p1
c2: T1p2, T2p2
With only two topics, c1 already consumes two more partitions than c2 (four versus two). Each additional topic of this kind gives c1 one more extra partition, so the gap keeps growing. This is the disadvantage of Range allocation. The advantage is that different consumers in the same consumer group can each specify the topics they need to consume, and the partitions of a topic are split only among the consumers that subscribe to that topic.
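To see how the skew grows with the number of topics, here is a small sketch that applies the per-topic rule to every topic in turn. It is an illustration of the strategy as described in this article, not Kafka's code; the function name range_assign and the extra topics T3 and T4 are hypothetical.

```python
def range_assign(subscriptions, topics):
    """subscriptions: consumer -> set of topic names.
    topics: topic name -> number of partitions.
    Each topic is split independently among its own subscribers.
    """
    assignment = {c: [] for c in sorted(subscriptions)}
    for topic in sorted(topics):
        members = sorted(c for c in subscriptions if topic in subscriptions[c])
        if not members:
            continue
        per_consumer, extra = divmod(topics[topic], len(members))
        start = 0
        for index, consumer in enumerate(members):
            count = per_consumer + (1 if index < extra else 0)
            assignment[consumer] += [f"{topic}p{p}" for p in range(start, start + count)]
            start += count
    return assignment


two = range_assign(
    {"c1": {"T1", "T2"}, "c2": {"T1", "T2"}}, {"T1": 3, "T2": 3}
)
print(two)
# {'c1': ['T1p0', 'T1p1', 'T2p0', 'T2p1'], 'c2': ['T1p2', 'T2p2']}

all_topics = {"T1", "T2", "T3", "T4"}
four = range_assign(
    {"c1": all_topics, "c2": all_topics}, {t: 3 for t in all_topics}
)
print(len(four["c1"]), len(four["c2"]))
# 8 4
```

Doubling the number of three-partition topics doubles the gap between c1 and c2, from two partitions to four.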

Summary:
1. RoundRobinAssignor gives an even allocation only when all consumers in the consumer group subscribe to the same topics.
2. In that case, the number of partitions consumed by any two consumers of the same consumer group differs by at most 1, because partitions are always handed out in turn.
3. RoundRobinAssignor works at the level of the whole group, while Range works topic by topic.

Origin blog.csdn.net/weixin_44080445/article/details/107309771