Kafka consumer groups: the three partition allocation strategies Range, RoundRobin, and StickyAssignor

 

A consumer group contains multiple consumers and a topic contains multiple partitions, so the partitions have to be allocated among the consumers; that is, Kafka must determine which consumer consumes which partition.

Kafka originally had two allocation strategies, roundrobin and range. Newer versions add a third one, the StickyAssignor strategy.

Moving the ownership of a partition from one consumer to another is called a rebalance. Kafka reallocates partitions when any of the following events occurs:

  • A consumer joins the Consumer Group

  • A consumer leaves the Consumer Group it currently belongs to, whether it shuts down or crashes

  • New partitions are added to a subscribed topic

In the old consumer we cannot customize the partition allocation strategy; we can only select range or roundrobin through the partition.assignment.strategy parameter. The default value of this parameter is range.

Kafka provides the consumer client parameter partition.assignment.strategy to set the partition allocation strategy between consumers and subscribed topics. By default, the value of this parameter is org.apache.kafka.clients.consumer.RangeAssignor, that is, the RangeAssignor allocation strategy is used. Kafka also provides two other allocation strategies: RoundRobinAssignor and StickyAssignor. The partition.assignment.strategy parameter can list multiple allocation strategies, separated by commas.
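As a rough illustration of the comma-separated form, the sketch below builds consumer properties with plain string keys. The broker address localhost:9092, the group id, and the class name AssignmentStrategyConfig are placeholders chosen for this example, not values from a real cluster.

```java
import java.util.Properties;

class AssignmentStrategyConfig {
    // Builds consumer properties that list two assignors, separated by a comma.
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "test");                    // placeholder group id
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.RoundRobinAssignor,"
              + "org.apache.kafka.clients.consumer.RangeAssignor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("partition.assignment.strategy"));
    }
}
```

The resulting Properties object can be passed to the KafkaConsumer constructor in the same way as in the demo at the end of this article.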

This article assumes a topic named T1 with 10 partitions and two consumers (C1, C2) that consume the data in these 10 partitions, where num.streams is 1 for C1 and 2 for C2.

1. Range (default strategy)

2.3.x version API introduction: http://kafka.apache.org/23/javadoc/org/apache/kafka/clients/consumer/RangeAssignor.html

API introduction of version 0.10: http://kafka.apache.org/0102/javadoc/org/apache/kafka/clients/consumer/RangeAssignor.html

The Range strategy works topic by topic. First, the partitions of a topic are sorted by partition number and the consumer threads are sorted alphabetically. The number of partitions is then divided by the total number of consumer threads to determine how many partitions each thread consumes. If the division leaves a remainder, the first few consumer threads each consume one extra partition.

Let n = number of partitions / number of consumers (integer division) and m = number of partitions % number of consumers. Then the first m consumers are each assigned n+1 partitions, and the remaining (number of consumers - m) consumers are each assigned n partitions.

For example, with 10 partitions and 3 consumer threads, the sorted partitions are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and the sorted consumer threads are C1-0, C2-0, C2-1. Since 10 / 3 = 3 with a remainder of 1, consumer thread C1-0 consumes one extra partition, so the final allocation looks like this:

C1-0:0,1,2,3
C2-0:4,5,6
C2-1:7,8,9

If there are 11 partitions it will be:

C1-0:0,1,2,3
C2-0:4,5,6,7
C2-1:8,9,10
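
The n and m rule can be sketched in a few lines of Java. This is a simplified illustration for a single topic, not Kafka's RangeAssignor source; the class and method names are made up for the example, and the consumer threads are assumed to be passed in already sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RangeSketch {
    // Assigns partitions 0..numPartitions-1 of one topic to already-sorted consumers.
    static Map<String, List<Integer>> assign(int numPartitions, List<String> sortedConsumers) {
        int n = numPartitions / sortedConsumers.size(); // base share per consumer
        int m = numPartitions % sortedConsumers.size(); // first m consumers get one extra
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int next = 0;
        for (int i = 0; i < sortedConsumers.size(); i++) {
            int count = n + (i < m ? 1 : 0);
            List<Integer> range = new ArrayList<>();
            for (int p = next; p < next + count; p++) {
                range.add(p);
            }
            next += count;
            result.put(sortedConsumers.get(i), range);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> threads = Arrays.asList("C1-0", "C2-0", "C2-1");
        System.out.println(assign(10, threads));
        System.out.println(assign(11, threads));
    }
}
```

Run as-is, it prints the two allocations listed above for 10 and 11 partitions.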

If we have two topics T1 and T2, each with 10 partitions, the final distribution result will be like this:

C1-0:T1(0,1,2,3) T2(0,1,2,3)
C2-0:T1(4,5,6) T2(4,5,6)
C2-1:T1(7,8,9) T2(7,8,9)

It can be seen that consumer thread C1-0 consumes 2 more partitions than each of the other consumer threads.

With only one topic, the impact of C1-0 consuming one extra partition is small. With N such topics, however, C1-0 gets one extra partition for every topic, so it ends up consuming N more partitions than the other consumers. This is an obvious disadvantage of the Range strategy.

2. RoundRobin

API introduction of version 0.10: http://kafka.apache.org/0102/javadoc/allclasses-noframe.html

2.3.x version API introduction: http://kafka.apache.org/23/javadoc/org/apache/kafka/clients/consumer/RoundRobinAssignor.html

Introduction to RoundRobin

The RoundRobinAssignor strategy sorts all consumers in the consumer group, and all partitions of the topics they subscribe to, in lexicographic order, and then assigns the partitions to the consumers one by one in round-robin fashion. The partition.assignment.strategy value for this strategy is org.apache.kafka.clients.consumer.RoundRobinAssignor.

There are two prerequisites that must be met to use the RoundRobin strategy:

  1. The num.streams (the number of consumer threads) of all consumers in the same consumer group must be equal;
  2. Every consumer must subscribe to the same set of topics.

So assume here that num.streams = 2 for both consumers mentioned earlier. The RoundRobin strategy works as follows: the partitions of all topics are collected into a list of TopicAndPartition, and this list is then sorted by hashCode. The following code shows this more precisely:

val allTopicPartitions = ctx.partitionsForTopic.flatMap { case(topic, partitions) =>
  info("Consumer %s rebalancing the following partitions for topic %s: %s"
       .format(ctx.consumerId, topic, partitions))
  partitions.map(partition => {
    TopicAndPartition(topic, partition)
  })
}.toSeq.sortWith((topicPartition1, topicPartition2) => {
  /*
   * Randomize the order by taking the hashcode to reduce the likelihood of all partitions of a given topic ending
   * up on one consumer (if it has a high enough stream count).
   */
  topicPartition1.toString.hashCode < topicPartition2.toString.hashCode
})

Finally, the partitions are handed out to the consumer threads in round-robin order.

In our example, suppose the topic-partitions sorted by hashCode are T1-5, T1-3, T1-0, T1-8, T1-2, T1-1, T1-4, T1-7, T1-6, T1-9, and the sorted consumer threads are C1-0, C1-1, C2-0, C2-1. The final partition allocation is then:

C1-0 will consume partitions T1-5, T1-2, T1-6;
C1-1 will consume partitions T1-3, T1-1, T1-9;
C2-0 will consume partitions T1-0, T1-4;
C2-1 will consume partitions T1-8, T1-7;
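
The dealing step can be sketched as follows. The sketch takes the hashCode-sorted order from the example as given input rather than recomputing it, and the class and method names are hypothetical; it only illustrates the round-robin hand-out, not the old consumer's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ThreadRoundRobinSketch {
    // Deals an already-ordered partition list to already-sorted consumer threads.
    static Map<String, List<String>> deal(List<String> orderedPartitions, List<String> sortedThreads) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String thread : sortedThreads) {
            result.put(thread, new ArrayList<>());
        }
        for (int i = 0; i < orderedPartitions.size(); i++) {
            String thread = sortedThreads.get(i % sortedThreads.size());
            result.get(thread).add(orderedPartitions.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("T1-5", "T1-3", "T1-0", "T1-8", "T1-2",
                                                "T1-1", "T1-4", "T1-7", "T1-6", "T1-9");
        List<String> threads = Arrays.asList("C1-0", "C1-1", "C2-0", "C2-1");
        System.out.println(deal(partitions, threads));
    }
}
```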

Two cases of RoundRobin

  1. If all consumers in the same consumer group have the same subscriptions, the RoundRobinAssignor strategy distributes the partitions evenly.

    For example, suppose the consumer group has two consumers C0 and C1, both subscribed to topics t0 and t1, and each topic has 3 partitions. The subscribed partitions are then t0p0, t0p1, t0p2, t1p0, t1p1, t1p2, and the final allocation is:

    Consumer C0: t0p0, t0p2, t1p1
    Consumer C1: t0p1, t1p0, t1p2
    
  2. If the consumers in the same consumer group have different subscriptions, the allocation is no longer a pure round-robin, which may leave the partitions unevenly distributed. A consumer that does not subscribe to a topic is never assigned any partition of that topic.

    For example, suppose the consumer group has 3 consumers C0, C1, and C2 that together subscribe to 3 topics t0, t1, and t2, which have 1, 2, and 3 partitions respectively. The group as a whole is therefore subscribed to 6 partitions: t0p0, t1p0, t1p1, t2p0, t2p1, and t2p2. Specifically, consumer C0 subscribes to topic t0, consumer C1 subscribes to topics t0 and t1, and consumer C2 subscribes to topics t0, t1, and t2. The final allocation is then:

    Consumer C0: t0p0
    Consumer C1: t1p0
    Consumer C2: t1p1, t2p0, t2p1, t2p2
    

It can be seen that the RoundRobinAssignor strategy is not perfect either. This allocation is not optimal, because partition t1p1 could have been assigned to consumer C1, which would leave the group better balanced.
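
Both cases can be reproduced with a small subscription-aware sketch. It follows the description above (sorted consumers, sorted partitions, skip consumers that are not subscribed to the topic) but is a simplified illustration with made-up names, not the RoundRobinAssignor source.

```java
import java.util.*;

class RoundRobinSketch {
    // partitionsPerTopic: topic name -> partition count
    // subscriptions:      consumer id -> subscribed topics
    static Map<String, List<String>> assign(Map<String, Integer> partitionsPerTopic,
                                            Map<String, Set<String>> subscriptions) {
        List<String> consumers = new ArrayList<>(new TreeSet<String>(subscriptions.keySet()));
        Map<String, List<String>> result = new TreeMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        int cursor = 0; // position in the circular list of consumers
        for (String topic : new TreeSet<String>(partitionsPerTopic.keySet())) {
            boolean hasSubscriber = false;
            for (Set<String> topics : subscriptions.values()) {
                if (topics.contains(topic)) {
                    hasSubscriber = true;
                    break;
                }
            }
            if (!hasSubscriber) {
                continue; // nobody can own this topic, so skip it
            }
            for (int p = 0; p < partitionsPerTopic.get(topic); p++) {
                // Advance until we reach a consumer that subscribes to this topic.
                while (!subscriptions.get(consumers.get(cursor)).contains(topic)) {
                    cursor = (cursor + 1) % consumers.size();
                }
                result.get(consumers.get(cursor)).add(topic + "p" + p);
                cursor = (cursor + 1) % consumers.size();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> topics = new TreeMap<>();
        topics.put("t0", 1);
        topics.put("t1", 2);
        topics.put("t2", 3);
        Map<String, Set<String>> subs = new TreeMap<>();
        subs.put("C0", new HashSet<>(Arrays.asList("t0")));
        subs.put("C1", new HashSet<>(Arrays.asList("t0", "t1")));
        subs.put("C2", new HashSet<>(Arrays.asList("t0", "t1", "t2")));
        System.out.println(assign(topics, subs));
    }
}
```

With the subscriptions of case 2, C2 ends up with four partitions while C1 holds only one, because the cursor has already moved past C1 when t1p1 is handed out.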

3. StickyAssignor

Now let's look at the StickyAssignor strategy. As the name suggests, its allocations are "sticky". Kafka introduced this allocation strategy in version 0.11.x. It has two main goals:

  1. The partition allocation should be as even as possible, so that the numbers of topic partitions allocated to the consumers differ by at most one;
  2. The partition allocation should stay as close as possible to the previous allocation.

When the two conflict, the first goal takes precedence over the second goal. In view of these two goals, the specific implementation of the StickyAssignor strategy is much more complicated than the two allocation strategies of RangeAssignor and RoundRobinAssignor. Let's take a look at the actual effect of the StickyAssignor strategy.

Suppose the consumer group has 3 consumers C0, C1, and C2, all subscribed to 4 topics t0, t1, t2, and t3, and each topic has 2 partitions. The group as a whole is then subscribed to 8 partitions: t0p0, t0p1, t1p0, t1p1, t2p0, t2p1, t3p0, t3p1. The final allocation is as follows:

Consumer C0: t0p0, t1p1, t3p0
Consumer C1: t0p1, t2p0, t3p1
Consumer C2: t1p0, t2p1

At first glance, this seems to be the same as the result assigned by the RoundRobinAssignor strategy, but is this really the case?

Now suppose consumer C1 leaves the consumer group. The group then rebalances and the partitions are reallocated. With the RoundRobinAssignor strategy, the result is:

Consumer C0: t0p0, t1p0, t2p0, t3p0
Consumer C2: t0p1, t1p1, t2p1, t3p1

As the result shows, the RoundRobinAssignor strategy simply runs the round-robin allocation again over consumers C0 and C2. With the StickyAssignor strategy, the result is instead:

Consumer C0: t0p0, t1p1, t3p0, t2p0
Consumer C2: t1p0, t2p1, t0p1, t3p1

This result keeps every partition that C0 and C2 held in the previous allocation, hands the former "burden" of consumer C1 to the two remaining consumers, and still leaves C0 and C2 balanced.

When a partition is reallocated, its new consumer may differ from its previous one, and work that the previous consumer had only half finished has to be repeated by the new consumer, which wastes system resources. The StickyAssignor strategy, true to the "sticky" in its name, keeps two consecutive allocations as similar as possible, which reduces this waste and avoids other abnormal situations.

The analysis so far assumed that all consumers have the same subscriptions. Let's look at what happens when the subscriptions differ.

For example, the same consumer group has 3 consumers C0, C1, and C2, and the cluster has 3 topics t0, t1, and t2 with 1, 2, and 3 partitions respectively, so the cluster has 6 partitions: t0p0, t1p0, t1p1, t2p0, t2p1, and t2p2. Consumer C0 subscribes to topic t0, consumer C1 subscribes to topics t0 and t1, and consumer C2 subscribes to topics t0, t1, and t2.

If the RoundRobinAssignor strategy is used, the final allocation is the same as in the RoundRobinAssignor section above:

Consumer C0: t0p0
Consumer C1: t1p0
Consumer C2: t1p1, t2p0, t2p1, t2p2

If the StickyAssignor strategy is adopted at this time, the final assignment result is:

Consumer C0: t0p0
Consumer C1: t1p0, t1p1
Consumer C2: t2p0, t2p1, t2p2

It can be seen that this is an optimal solution: consumer C0 does not subscribe to topics t1 and t2, so no partition of t1 or t2 can be assigned to it, and the same reasoning applies to consumer C1 and topic t2.

If consumer C0 leaves the consumer group at this time, the assignment result of the RoundRobinAssignor strategy is:

Consumer C1: t0p0, t1p1
Consumer C2: t1p0, t2p0, t2p1, t2p2

Compared with the earlier RoundRobinAssignor result, this keeps only three of the partition assignments that consumers C1 and C2 originally held: t2p0, t2p1, and t2p2. With the StickyAssignor strategy, the result is:

Consumer C1: t1p0, t1p1, t0p0
Consumer C2: t2p0, t2p1, t2p2

It can be seen that the StickyAssignor strategy retains the original 5 partition assignments in consumers C1 and C2: t1p0, t1p1, t2p0, t2p1, t2p2.
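
The "keep what you have, then hand out the rest" behaviour in these two examples can be approximated with a greedy sketch. This is deliberately not the real StickyAssignor algorithm: it only handles the case where a consumer leaves, it never moves partitions between surviving consumers, and the names StickySketch, reassign, and topicOf are invented for the example. It assumes the article's naming convention, where t0p1 means partition 1 of topic t0.

```java
import java.util.*;

class StickySketch {
    // Article convention: "t0p1" is partition 1 of topic "t0".
    static String topicOf(String partition) {
        return partition.substring(0, partition.indexOf('p'));
    }

    // previous:      last allocation of the consumers that are still in the group
    // orphaned:      partitions whose owner left, in the order they are handed out
    // subscriptions: consumer id -> subscribed topics
    static Map<String, List<String>> reassign(Map<String, List<String>> previous,
                                              List<String> orphaned,
                                              Map<String, Set<String>> subscriptions) {
        // Goal 2: every surviving consumer keeps all partitions it already owns.
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : previous.entrySet()) {
            result.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        // Goal 1: give each orphaned partition to the least-loaded eligible consumer.
        for (String partition : orphaned) {
            String target = null;
            for (String consumer : result.keySet()) { // sorted order breaks ties
                if (!subscriptions.get(consumer).contains(topicOf(partition))) {
                    continue;
                }
                if (target == null || result.get(consumer).size() < result.get(target).size()) {
                    target = consumer;
                }
            }
            if (target != null) {
                result.get(target).add(partition);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> all = new HashSet<>(Arrays.asList("t0", "t1", "t2", "t3"));
        Map<String, Set<String>> subs = new TreeMap<>();
        subs.put("C0", all);
        subs.put("C2", all);
        Map<String, List<String>> previous = new TreeMap<>();
        previous.put("C0", Arrays.asList("t0p0", "t1p1", "t3p0"));
        previous.put("C2", Arrays.asList("t1p0", "t2p1"));
        // C1 left the group and used to own these three partitions.
        System.out.println(reassign(previous, Arrays.asList("t0p1", "t2p0", "t3p1"), subs));
    }
}
```

For the two examples in this section the greedy rule happens to produce the same allocations as listed above; in general the real strategy does more work, for instance moving partitions between surviving consumers when that is needed to restore balance.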

Judging from these results, the StickyAssignor strategy performs better than the other two allocation strategies, although its code implementation is also far more complicated.

4. Range Strategy Demo

package com.cw.kafka.consumer;

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

/**
 *
 * @author 陈小哥cw
 * @date 2020/6/19 17:07
 */
public class CustomOffsetConsumer {
    public static void main(String[] args) {
        Properties properties = new Properties();
        // Kafka cluster, broker list
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "cm1:9092,cm2:9092,cm3:9092");

        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Consumer group: consumers with the same group.id belong to the same consumer group
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "test");
        // Disable automatic offset commits
        properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        // 1. Create a consumer
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        // Subscribe the consumer to the topic
        consumer.subscribe(Arrays.asList("first"), new ConsumerRebalanceListener() {
            // Called before the partitions are reallocated
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println("==============Revoked partitions=============");
                for (TopicPartition partition : partitions) {
                    System.out.println("partition = " + partition);
                }
            }

            // Called after the partitions have been reallocated
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println("==============Newly assigned partitions==========");
                for (TopicPartition partition : partitions) {
                    System.out.println("partition = " + partition);
                }
            }
        });

        while (true) {
            // poll(Duration) needs kafka-clients 2.0 or later; poll(long) is deprecated there
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {

                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
                TopicPartition topicPartition = new TopicPartition(record.topic(), record.partition());
                // Pass the offset of the next record to read for this partition
                commitOffset(topicPartition, record.offset() + 1);
            }
        }

    }

    // Stub: intentionally empty, this demo only shows the rebalance callbacks
    private static void commitOffset(TopicPartition topicPartition, long l) {

    }

    // Stub: not used by this demo
    private static Long getPartitionOffset(TopicPartition partition) {
        return null;
    }
}

Start the program once, and the result is

==============Revoked partitions=============
==============Newly assigned partitions==========
partition = first-2
partition = first-1
partition = first-0

Now, without closing the running program, start the program a second time

Results of the first run

==============Revoked partitions=============
partition = first-2
partition = first-1
partition = first-0
==============Newly assigned partitions==========
partition = first-2

Results of the second run

==============Revoked partitions=============
==============Newly assigned partitions==========
partition = first-1
partition = first-0

This is because both runs use the consumer group id test, so they belong to the same consumer group. When the program is started the second time, the partitions held by the first instance are revoked, and the partitions are then rebalanced and redistributed (using the default range strategy).

Origin blog.csdn.net/qq_32907195/article/details/112793630