Kafka rebalance

Background

When I first came across Kafka's rebalance mechanism, I found it interesting, so I tried it out myself.

Experimental design

Create a topic with 3 partitions, then run 1, 2, 3, and 4 consumers in the same consumer group in turn and observe how the partitions are assigned to the consumers in each case.

Experimental procedure

  1. Create a topic with 3 partitions
// Create a topic named topic-partition3 with 3 partitions and a replication factor of 1
@Bean
public NewTopic initialTopic() {
    return new NewTopic("topic-partition3", 3, (short) 1);
}
  2. A producer sends one message to each of the 3 partitions every 1000 ms
@Test
public void testProducer() throws InterruptedException {
    for (int i = 0; i < 5000; i++) {
        kafkaTemplate.send("topic-partition3", 0, null, "data package from partition0 [" + i + "]");
        kafkaTemplate.send("topic-partition3", 1, null, "data package from partition1 [" + i + "]");
        kafkaTemplate.send("topic-partition3", 2, null, "data package from partition2 [" + i + "]");
        // Sleep for 1 second
        Thread.sleep(1000);
    }
}
  3. Start consumers one at a time in the same consumer group; each message takes 800 ms to process
// Declare a listener with id ypq-consumer in consumer group ypq-group, listening to the topic topic-partition3
@KafkaListener(id = "ypq-consumer", groupId = "ypq-group", topics = "topic-partition3")
public void listen(String msgData) throws InterruptedException {
    LOGGER.info("consume data : " + msgData);
    // Simulate time-consuming processing
    Thread.sleep(800);
}
  1. With one consumer, C1 prints ypq-group: partitions assigned: [topic-partition3-1, topic-partition3-2, topic-partition3-0]
  2. With two consumers, C1 prints ypq-group: partitions assigned: [topic-partition3-1, topic-partition3-0], and C2 prints ypq-group: partitions assigned: [topic-partition3-2]
  3. With three consumers, C1 prints ypq-group: partitions assigned: [topic-partition3-0], C2 prints ypq-group: partitions assigned: [topic-partition3-1], and C3 prints ypq-group: partitions assigned: [topic-partition3-2]
  4. With four consumers, the output of C1 to C3 is unchanged, and C4 prints ypq-group: partitions assigned: []

Experimental results

Number of consumers   Partition 0   Partition 1   Partition 2
1                     C1            C1            C1
2                     C1            C1            C2
3                     C1            C2            C3
4                     C1            C2            C3            (C4 is idle)
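
The assignments in the table are consistent with how a range-style strategy divides one topic's partitions: every consumer gets (partitions / consumers) partitions, and the first (partitions mod consumers) consumers get one extra. The following is a minimal Java sketch of that arithmetic, not Kafka's actual RangeAssignor code. The class name RangeAssignmentSketch and the assign method are made up for illustration, and the sketch assumes the consumers C1 to C4 are already in the order in which Kafka would sort them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of range-style assignment for a single topic.
// Class and method names are hypothetical; this is not Kafka's own implementation.
class RangeAssignmentSketch {

    // Assigns partitions 0..numPartitions-1 to the consumers in list order.
    // Assumption: the caller passes the consumers already sorted.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int perConsumer = numPartitions / consumers.size(); // base share for everyone
        int extra = numPartitions % consumers.size();       // remainder goes to the first consumers
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++);
            }
            result.put(consumers.get(i), owned);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("C1", "C2", "C3", "C4");
        for (int n = 1; n <= all.size(); n++) {
            System.out.println(n + " consumer(s): " + assign(all.subList(0, n), 3));
        }
    }
}
```

With 3 partitions, main prints {C1=[0, 1, 2]}, {C1=[0, 1], C2=[2]}, {C1=[0], C2=[1], C3=[2]}, and {C1=[0], C2=[1], C3=[2], C4=[]} for 1 to 4 consumers, which matches the rows of the table.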

Summary

  1. When a consumer group has more consumers than the topic has partitions, the extra consumers sit idle, because within a group each partition is consumed by at most one consumer
  2. Kafka's default assignment strategy is range: when the number of partitions is not evenly divisible by the number of consumers, the remainder partitions go to the first consumers in the ordering, so those consumers carry a heavier load
  3. Ordering is not guaranteed across different partitions of the same topic, but it is guaranteed within a single partition
  4. If a consumer processes messages with multiple threads, it has to preserve the message order across those threads itself
  5. Consumers also need to handle the case where the same data is consumed more than once
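
Points 4 and 5 can be illustrated with a small sketch. One possible approach, shown here only as an assumption and not as something the experiment above used, is to route every record of a partition to the same single-threaded worker so that per-partition order survives, and to skip any record whose offset is not higher than the last one already handled for that partition. The class OrderedDedupDispatcher and its methods are hypothetical and use only the JDK. The duplicate check lives in memory, so it only catches redelivery inside one process; a real consumer would more likely rely on idempotent handlers or a durable record of processed offsets, and offset committing is left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Kafka or Spring Kafka.
class OrderedDedupDispatcher {
    // One single-threaded worker per partition keeps that partition's records in order.
    private final ExecutorService[] workers;
    // Highest offset already handled for each partition (in memory only).
    private final Map<Integer, Long> lastHandled = new ConcurrentHashMap<>();

    OrderedDedupDispatcher(int numPartitions) {
        workers = new ExecutorService[numPartitions];
        for (int i = 0; i < numPartitions; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Queues a record on its partition's worker; already handled offsets are skipped.
    void dispatch(int partition, long offset, Runnable handler) {
        workers[partition].execute(() -> {
            long last = lastHandled.getOrDefault(partition, -1L);
            if (offset <= last) {
                return; // duplicate delivery of a record that was already handled
            }
            handler.run();
            lastHandled.put(partition, offset);
        });
    }

    // Stops accepting new work and waits for the queued records to finish.
    void shutdownAndWait() {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
        try {
            for (ExecutorService w : workers) {
                w.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        OrderedDedupDispatcher d = new OrderedDedupDispatcher(3);
        List<String> handled = Collections.synchronizedList(new ArrayList<>());
        for (long offset = 0; offset < 3; offset++) {
            for (int p = 0; p < 3; p++) {
                String label = "partition" + p + "@" + offset;
                d.dispatch(p, offset, () -> handled.add(label));
            }
        }
        // Simulate redelivery of an already handled record; it should be skipped.
        d.dispatch(0, 2, () -> handled.add("duplicate"));
        d.shutdownAndWait();
        System.out.println(handled);
    }
}
```

The relative order of records from different partitions in the printed list can vary between runs, but records from the same partition always appear in offset order, and the redelivered record is not handled a second time.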

Origin juejin.im/post/7087495701642346526