Kafka's partition strategy

1. Range strategy

The Range strategy is applied to each topic separately. First, the partitions of a topic are sorted by partition number, and the consumer threads are sorted alphabetically. In our example, the sorted partitions are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and the sorted consumer threads are C1-0, C2-0, C2-1. The number of partitions is then divided by the total number of consumer threads to determine how many partitions each thread consumes. If the division leaves a remainder, the first few consumer threads each consume one extra partition. In our example there are 10 partitions and 3 consumer threads; 10 / 3 = 3 with a remainder of 1, so consumer thread C1-0 consumes one extra partition, and the final partition allocation looks like this:

C1-0 will consume partitions 0, 1, 2, 3
C2-0 will consume partitions 4, 5, 6
C2-1 will consume partitions 7, 8, 9

If we have 11 partitions, then the result of the final partition allocation looks like this:

C1-0 will consume partitions 0, 1, 2, 3
C2-0 will consume partitions 4, 5, 6, 7
C2-1 will consume partitions 8, 9, 10
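
The per-topic arithmetic described above can be sketched in a few lines of Scala. This is only an illustration of the rule as stated in this article, not Kafka's own code; the object name RangeSketch and the function assign are made up for the example.

```scala
// Hypothetical sketch of the Range rule for a single topic (not Kafka's implementation).
object RangeSketch {
  /** Assigns partitions 0 until numPartitions of one topic to the given consumer threads. */
  def assign(numPartitions: Int, threads: Seq[String]): Map[String, List[Int]] = {
    val sorted = threads.sorted                    // consumer threads in alphabetical order
    val perThread = numPartitions / sorted.size    // base share of every thread
    val extra = numPartitions % sorted.size        // the first `extra` threads get one more
    sorted.zipWithIndex.map { case (thread, i) =>
      val start = i * perThread + math.min(i, extra)
      val count = perThread + (if (i < extra) 1 else 0)
      thread -> (start until start + count).toList
    }.toMap
  }
}
```

Calling RangeSketch.assign(10, Seq("C1-0", "C2-0", "C2-1")) reproduces the first allocation above, and passing 11 instead of 10 reproduces the second.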

If we have 2 topics (T1 and T2), each with 10 partitions, then the result of the final partition allocation looks like this:

C1-0 will consume partitions 0, 1, 2, 3 of topic T1 and partitions 0, 1, 2, 3 of topic T2
C2-0 will consume partitions 4, 5, 6 of topic T1 and partitions 4, 5, 6 of topic T2
C2-1 will consume partitions 7, 8, 9 of topic T1 and partitions 7, 8, 9 of topic T2

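It can be seen that consumer thread C1-0 consumes 8 partitions in total, while C2-0 and C2-1 consume 6 each, so C1-0 consumes 2 more partitions than the other consumer threads. Because the extra partition of every topic goes to the same leading threads, the imbalance grows with the number of topics. This is an obvious drawback of the Range strategy.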
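
To see how the imbalance adds up, here is a small sketch that only counts partitions per thread across topics. It assumes every topic has the same number of partitions, as in the example; the names RangeSkewSketch and totals are hypothetical.

```scala
// Hypothetical sketch: total partitions per thread (in sorted thread order) under the
// Range rule, assuming all topics have the same partition count.
object RangeSkewSketch {
  def totals(numTopics: Int, partitionsPerTopic: Int, numThreads: Int): List[Int] = {
    val base = partitionsPerTopic / numThreads     // share every thread gets per topic
    val extra = partitionsPerTopic % numThreads    // leading threads that get one more per topic
    List.tabulate(numThreads) { i =>
      numTopics * (base + (if (i < extra) 1 else 0))
    }
  }
}
```

For the example above, RangeSkewSketch.totals(2, 10, 3) gives 8, 6, 6. With 5 such topics the gap between the first thread and the others would widen to 5 partitions.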

2. RoundRobin strategy

There are two prerequisites that must be met to use the RoundRobin strategy:

The num.streams of all consumers in the same Consumer Group must be equal;
The topics subscribed to by each consumer must be the same.

So here we assume that both of the consumers mentioned earlier have num.streams = 2. The RoundRobin strategy works as follows: the partitions of all topics are collected into a list of TopicAndPartition, and the list is then sorted by the hashCode of each entry's string form. If the description is unclear, the following code should make it concrete:

val allTopicPartitions = ctx.partitionsForTopic.flatMap {
  case (topic, partitions) =>
    info("Consumer %s rebalancing the following partitions for topic %s: %s"
      .format(ctx.consumerId, topic, partitions))
    partitions.map(partition => {
      TopicAndPartition(topic, partition)
    })
}.toSeq.sortWith((topicPartition1, topicPartition2) => {
  /*
   * Randomize the order by taking the hashcode to reduce the likelihood of all partitions of a given topic ending
   * up on one consumer (if it has a high enough stream count).
   */
  topicPartition1.toString.hashCode < topicPartition2.toString.hashCode
})

Finally, the sorted partitions are assigned to the consumer threads in round-robin order.

In our example, suppose the topic-partitions sorted by hashCode are T1-5, T1-3, T1-0, T1-8, T1-2, T1-1, T1-4, T1-7, T1-6, T1-9, and the sorted consumer threads are C1-0, C1-1, C2-0, C2-1. The final partition allocation is then:

C1-0 will consume partitions T1-5, T1-2, T1-6
C1-1 will consume partitions T1-3, T1-1, T1-9
C2-0 will consume partitions T1-0, T1-4
C2-1 will consume partitions T1-8, T1-7
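
The dealing-out step can be sketched as follows. The sketch takes the partition list in the order given in the example, since the real order depends on the hashCode values, and hands the entries to the sorted threads one at a time. The names RoundRobinSketch and assign are made up for the illustration and are not part of Kafka.

```scala
// Hypothetical sketch of the round-robin step (not Kafka's implementation).
object RoundRobinSketch {
  /** Deals the already-ordered partitions out to the sorted threads, wrapping around. */
  def assign(orderedPartitions: Seq[String], threads: Seq[String]): Map[String, List[String]] = {
    val sortedThreads = threads.sorted.toVector
    orderedPartitions.zipWithIndex
      // entry i goes to thread i mod (number of threads)
      .groupBy { case (_, i) => sortedThreads(i % sortedThreads.size) }
      .map { case (thread, entries) => thread -> entries.map(_._1).toList }
  }
}
```

Feeding it the ten T1 partitions in the order listed above together with the threads C1-0, C1-1, C2-0, C2-1 yields the same four groups as the allocation shown. A thread that receives no partition does not appear in the resulting map.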

That is, all topic partitions are sorted by hashCode and then handed out to the consumer threads in turn, which avoids the skew in the number of partitions per thread seen with the Range strategy.
The partition assignment for multiple topics works the same way as for a single topic, so I won't go through it here.

After the detailed introduction above, the principles behind Kafka's partition allocation strategies should be clear. Unfortunately, the consumer described here (the one configured with num.streams) does not let us plug in a custom partition allocation strategy. We can only select range or roundrobin through the partition.assignment.strategy parameter, whose default value is range.
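
As a minimal sketch of selecting the strategy, the parameter can be set in the consumer's properties. Only the partition.assignment.strategy key and its two values come from the text above. The helper name consumerProps is hypothetical, and the other settings a real consumer needs are left out.

```scala
import java.util.Properties

// Hypothetical helper: builds consumer properties that select an assignment strategy.
object StrategyConfigSketch {
  def consumerProps(strategy: String): Properties = {
    // Only the two values named in the article are accepted in this sketch.
    require(strategy == "range" || strategy == "roundrobin", s"unsupported strategy: $strategy")
    val props = new Properties()
    props.setProperty("partition.assignment.strategy", strategy)
    props
  }
}
```

For example, StrategyConfigSketch.consumerProps("roundrobin") returns properties that switch the group away from the default range strategy, provided the two RoundRobin prerequisites are met.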

Origin blog.csdn.net/qq_42706464/article/details/108830943