3 partition strategies when the Kafka producer sends messages

Abstract: When KafkaProducer sends a message, it must decide which partition the message goes to. What partitioning strategies are available?

This article is shared from the Huawei Cloud Community post "Kafka Producer's 3 Partition Allocation Strategies", author: Shi Zhenzhen's grocery store.

When KafkaProducer sends a message, it must decide which partition to send it to. So what are the partition strategies? Let's take a look today.

The partition strategy is configured on the producer through the partitioner.class property (ProducerConfig.PARTITIONER_CLASS_CONFIG).
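A minimal sketch of that configuration (the broker address and topic name are placeholders, and RoundRobinPartitioner is just an example value; leave the property unset to get the default partitioner):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class PartitionerConfigExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // select the partition strategy; omit this line to use DefaultPartitioner
            props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG,
                    "org.apache.kafka.clients.producer.internals.RoundRobinPartitioner");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("Topic1", "hello")); // keyless message
            }
        }
    }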

1. DefaultPartitioner: the default partition strategy

Full path class name: org.apache.kafka.clients.producer.internals.DefaultPartitioner

  • If a partition is specified in the message, that partition is used.
  • If no partition is specified but a key exists, the partition is chosen by taking the murmur2 hash of the serialized key modulo the number of partitions (a sketch follows this list).
  • If neither a partition nor a key is present, the sticky partitioning strategy is used; see KIP-480 for sticky partitioning.
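A condensed sketch of these rules, based on the description above (this is not the verbatim Kafka source; StickyPartitionCache is the client-internal helper introduced with KIP-480 and is treated here as given):

    import java.util.List;
    import org.apache.kafka.clients.producer.internals.StickyPartitionCache;
    import org.apache.kafka.common.Cluster;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.utils.Utils;

    // Condensed sketch of DefaultPartitioner's decision rules (not the verbatim source).
    public class DefaultPartitionerSketch {
        private final StickyPartitionCache stickyPartitionCache = new StickyPartitionCache();

        public int partition(String topic, byte[] keyBytes, Cluster cluster) {
            if (keyBytes == null) {
                // no key: fall back to the sticky partition (KIP-480)
                return stickyPartitionCache.partition(topic, cluster);
            }
            List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
            int numPartitions = partitions.size();
            // key present: murmur2 hash of the serialized key, modulo the partition count
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }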

Sticky Partitioner

Why is there the concept of sticky partitions?

First, recall that when the producer sends messages, it places them into a ProducerBatch, which may contain multiple messages, and then sends the batch as a whole. For details, see my previous article illustrating the Kafka producer's message caching model.

The advantage of this is that it improves throughput and reduces the number of requests.

But there is a problem: a batch is only sent when it is full or when the linger.ms time is up. If messages are produced slowly, it is hard to fill a batch, which means higher latency.
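Both thresholds are ordinary producer configs. An illustrative fragment, extending the Properties object from the configuration example above (the values shown are the client default and an example, not recommendations):

    // a batch is sent once it reaches batch.size bytes ...
    props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);   // 16 KB, the client default
    // ... or once linger.ms elapses, whichever comes first
    props.put(ProducerConfig.LINGER_MS_CONFIG, 10);       // e.g. wait up to 10 ms for more records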

Previously, keyless messages were distributed across partitions in turn. When few messages are produced, spreading them over all partitions makes it hard for any single ProducerBatch to meet the sending conditions.

So, if we fill up one ProducerBatch first before moving on to other partitions, can we reduce this latency?

For details, see the picture below.

The premise of the picture:

Topic1 has 3 partitions. Nine messages without keys are sent to Topic1, and the total size of these 9 messages does not exceed batch.size.

The old allocation method and the sticky-partition allocation method then look as follows.

As you can see, with sticky partitioning, one batch is filled and sent before the next batch starts filling. With the old approach, messages were spread evenly, so no batch filled up and none could be sent immediately; they only went out when the linger.ms time was up, which increases latency.

Key points:

  1. After a batch is sent, a new sticky partition must be chosen:
    ①. If the number of available partitions is less than 1, a partition is chosen at random from all partitions.
    ②. If exactly 1 partition is available, that partition is chosen directly.
    ③. If more than 1 partition is available, a partition is chosen at random from the available partitions.
  2. The next sticky partition is not chosen to balance load across partitions; it is chosen at random (though never the same as the previous one).

For example, suppose the batch just sent went to partition 1. Once that batch is full and sent, new messages may go to partition 2 or 3. If partition 2 is chosen, then after partition 2's batch fills up, the next choice may well be partition 1 again rather than partition 3; there is no attempt to even things out.
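A minimal sketch of this selection logic, modeled on the rules above (the class and method names here are illustrative, not the actual Kafka internals):

    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;

    // Illustrative sketch of picking the next sticky partition after a batch is sent.
    public class StickySelectionSketch {

        // prevPartition is the partition whose batch was just sent
        static int nextStickyPartition(List<Integer> allPartitions,
                                       List<Integer> availablePartitions,
                                       int prevPartition) {
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            if (availablePartitions.size() < 1) {
                // no available partitions: pick randomly among all partitions
                return allPartitions.get(rnd.nextInt(allPartitions.size()));
            } else if (availablePartitions.size() == 1) {
                // exactly one available partition: use it
                return availablePartitions.get(0);
            } else {
                // more than one available: pick randomly, but avoid repeating the previous partition
                int candidate;
                do {
                    candidate = availablePartitions.get(rnd.nextInt(availablePartitions.size()));
                } while (candidate == prevPartition);
                return candidate;
            }
        }
    }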

2. UniformStickyPartitioner: the purely sticky partition strategy

Full path class name: org.apache.kafka.clients.producer.internals.UniformStickyPartitioner

The only difference between it and DefaultPartitioner is how keys are handled:

If DefaultPartitioner sees a key, it determines the partition from that key, and the sticky partitioner is not used in that case.
UniformStickyPartitioner uses sticky partitioning to allocate the partition regardless of whether a key is present.
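In essence, the class delegates every call to the sticky partition cache. A condensed sketch of that behavior (not the verbatim source; stickyPartitionCache is the same KIP-480 helper mentioned earlier):

    // Sketch: UniformStickyPartitioner ignores the key entirely and always
    // delegates to the sticky partition cache (KIP-480).
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        return stickyPartitionCache.partition(topic, cluster);
    }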

3. RoundRobinPartitioner partition strategy

Full path class name: org.apache.kafka.clients.producer.internals.RoundRobinPartitioner

  • If a partition is specified in the message, that partition is used.
  • Otherwise, messages are distributed to each partition in turn (round-robin).
  • The key is not used at all.
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        // per-topic counter, incremented on every call
        int nextValue = nextValue(topic);
        List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
        if (!availablePartitions.isEmpty()) {
            // round-robin over the partitions that currently have an available leader
            int part = Utils.toPositive(nextValue) % availablePartitions.size();
            return availablePartitions.get(part).partition();
        } else {
            // no partitions are available, give a non-available partition
            return Utils.toPositive(nextValue) % numPartitions;
        }
    }

The above is the actual partition() method. Two things to note:

  1. When no partitions are available, the counter is applied modulo all partitions, so an unavailable partition may be returned.
  2. When there are available partitions, the counter is applied modulo only the available partitions, cycling through them in turn.
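For reference, the nextValue(topic) helper seen in the code keeps one counter per topic. A sketch of how such a counter can be maintained (the field name is illustrative; this is not claimed to be the verbatim Kafka source):

    // one monotonically increasing counter per topic (java.util.concurrent classes);
    // taken modulo the partition count, it yields the next round-robin position
    private final ConcurrentMap<String, AtomicInteger> topicCounterMap = new ConcurrentHashMap<>();

    private int nextValue(String topic) {
        AtomicInteger counter = topicCounterMap.computeIfAbsent(topic, k -> new AtomicInteger(0));
        return counter.getAndIncrement();
    }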

 

