[Translation] Kafka Producer Sticky Partitioner

I have been busy (and a little lazy) lately, so here is another translation: Apache Kafka Producer Improvements with the Sticky Partitioner.


The time messages spend flowing through the system is critical to Kafka's performance. For the Producer in particular, producer latency is usually defined as the interval between the Producer sending a message and Kafka acknowledging that message. As the saying goes, time is money: we always want to reduce latency as much as possible so the system runs faster. When the Producer sends messages faster, the whole system benefits.

Every Kafka topic contains multiple partitions. When the Producer sends a message to a topic, it first has to decide which partition the message should go to. If we send multiple messages to the same partition at around the same time, the Producer can package them into a single batch. Processing each batch carries some overhead, and every message in the batch "chips in" to pay for it. If a batch is small, each message shoulders a larger share of that overhead. Generally speaking, small batches on the Producer side generate more requests, which worsens queuing backlogs and drives up overall latency.

A batch is considered "complete" or "ready" when it reaches a size threshold (batch.size) or when it has been accumulating messages for longer than linger.ms. Both batch.size and linger.ms are Producer parameters: batch.size defaults to 16KB and linger.ms defaults to 0 ms. Once either threshold is crossed, the system sends the batch immediately.
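To make the two thresholds concrete, here is a minimal sketch of a producer configuration that sets them explicitly. The bootstrap address is a placeholder for illustration; the property names and defaults are the ones described above.

```java
import java.util.Properties;

public class BatchConfigExample {
    public static Properties producerProps() {
        Properties props = new Properties();
        // Placeholder broker address for illustration only.
        props.put("bootstrap.servers", "localhost:9092");
        // A batch is "ready" once it holds 16KB of records (the default)...
        props.put("batch.size", "16384");
        // ...or once it has waited this long since its first record (default 0 ms).
        props.put("linger.ms", "0");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("batch.size"));
    }
}
```

Raising linger.ms trades a small artificial wait for larger batches; the tests later in the post measure exactly this trade-off.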

At first glance, setting linger.ms = 0 looks like it would produce batches containing only a single message. But that is not actually the case. Even with linger.ms = 0, the Producer still packages multiple messages into one batch, as long as they are produced within a very short window and are destined for the same partition. This is because the system takes some time to process each PRODUCE request, and whenever it cannot process all pending messages at once, it groups the waiting ones into a batch.

One factor that determines how batches form is the partitioning strategy. If messages are not sent to the same partition, they cannot share a batch. Fortunately, Kafka lets users choose an appropriate partitioning scheme by supplying an implementation of the Partitioner interface, which is responsible for assigning a partition to each message. The default policy hashes the message key to pick the destination partition, but messages often carry no key (i.e. the key is null). In that case, before Apache Kafka 2.4, the default policy cycled through all of the topic's partitions, sending messages to each partition in a round-robin fashion. Unfortunately, this strategy batches very poorly and in practice increases latency.
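The pre-2.4 rule can be sketched as follows. This is a simplified illustration, not Kafka's actual DefaultPartitioner code: real Kafka hashes keys with murmur2, whereas the sketch stands in a plain array hash, and the round-robin counter here is a bare AtomicInteger. The point is the behavior for keyless messages: consecutive sends land in different partitions, so they rarely share a batch.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of the pre-2.4 default partitioning rule:
// hash the key when one is present, otherwise round-robin across
// all partitions.
public class OldDefaultPartitioning {
    private final AtomicInteger counter = new AtomicInteger(0);

    int partition(byte[] key, int numPartitions) {
        if (key != null) {
            // Kafka actually uses murmur2 here; Arrays.hashCode is a stand-in.
            return (Arrays.hashCode(key) & 0x7fffffff) % numPartitions;
        }
        // Keyless messages cycle through every partition in turn,
        // scattering them into many small batches.
        return counter.getAndIncrement() % numPartitions;
    }
}
```

With 16 partitions, 16 consecutive keyless messages end up in 16 different partitions, which is exactly the small-batch pattern the sticky partitioner was designed to avoid.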

Given that small batches can increase latency, the pre-2.4 partitioning strategy was very inefficient for keyless messages. The community therefore introduced the sticky partitioning strategy (Sticky Partitioning Strategy) in version 2.4. This new strategy can significantly reduce the latency of delivering messages to their partitions.

Sticky partitioning strategy

The sticky partitioner's main idea for solving the problem of keyless messages being scattered across small batches is to pick a single partition and send all keyless messages to it. Once the batch for that partition is full or otherwise "completed", the sticky partitioner randomly selects another partition and tries to stay with it for a while, hence the name "sticky". This way, over the whole running time, messages are still distributed evenly across partitions, avoiding partition skew, while the Producer's latency drops because the allocation consistently forms large batches rather than small ones. Briefly, the process is shown below:

 

To implement sticky partitioning, Kafka 2.4 adds a method named onNewBatch to the Partitioner interface. This method is invoked just before a new batch is created, which is exactly the moment when the sticky partitioner wants to change its sticky partition. The Producer's default partitioner, DefaultPartitioner, implements this behavior.
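The selection logic can be sketched like this. This is an illustration of the idea (as described in KIP-480), not the actual DefaultPartitioner source: all keyless records go to one remembered partition, and when the producer starts a new batch it calls onNewBatch, at which point we switch to a different randomly chosen partition.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of sticky partition selection for keyless records:
// stick to one partition until a new batch begins, then hop
// to a different random partition.
public class StickyChoice {
    private int stickyPartition = -1;

    int partitionForKeylessRecord(int numPartitions) {
        if (stickyPartition < 0) {
            // First record: pick an initial sticky partition at random.
            stickyPartition = ThreadLocalRandom.current().nextInt(numPartitions);
        }
        return stickyPartition;
    }

    // Called when the previous batch is complete and a new one is being created.
    void onNewBatch(int numPartitions) {
        int next;
        do {
            next = ThreadLocalRandom.current().nextInt(numPartitions);
        } while (numPartitions > 1 && next == stickyPartition);
        stickyPartition = next;
    }
}
```

Because the partition only changes at batch boundaries, every keyless record produced between two onNewBatch calls lands in the same batch, which is what keeps batches large.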

Basic tests: producer latency

Any performance improvement must be quantified. Kafka provides a test framework called Trogdor (translator's note: the official site has almost no documentation on Trogdor; I will write a blog post introducing it when I have time) for running various basic tests, including producer latency tests. I used a test harness called Castle to run the ProduceBench tests. These tests ran on a cluster of 3 Brokers together with 1 to 3 Producer processes.

Most tests used the following parameters. You can modify the Castle configuration to replace the default task parameters. A few tests were configured slightly differently, and those differences are called out explicitly below.

Duration of test: 12 minutes
Number of brokers: 3
Number of producers: 1–3
Replication factor: 3
Topics: 4
linger.ms: 0
acks: all
keyGenerator: {"type":"null"}
useConfiguredPartitioner: true
No flushing on throttle (skipFlush): true

To get meaningful results, we set useConfiguredPartitioner and skipFlush to true. This ensures that the Producer partitions with DefaultPartitioner and that batch.size and linger.ms decide when batches are sent. Naturally, keyGenerator is set to null so that messages carry no key.

Compared with the previous, unmodified DefaultPartitioner, most tests showed lower latency than before. The difference is especially clear at the 99th percentile: with 3 Producers sending messages to 16 partitions at 1,000 messages per second, the sticky partitioner cut latency to half that of the old Producer. The test results are as follows:

As the number of partitions grows, the latency reduction becomes more pronounced. This matches our earlier reasoning: a small number of large batches performs better than a large number of small batches. In our tests the difference was already obvious at 16 partitions.

The next test kept the producer-side load constant while increasing the number of partitions. The figure below shows results for 16, 64, and 128 partitions: latency under the old partitioning strategy deteriorates quickly as the partition count grows. Even at 16 partitions, the old strategy's average 99th-percentile latency is still 1.5 times that of the sticky partitioner.

Performance tests with linger.ms and various message keys

As described above, waiting out linger.ms artificially adds latency. The sticky partitioner's goal is to prevent this kind of delay by directing all messages into one batch so that batches fill faster. Using the sticky partitioner together with a linger.ms value greater than 0 can significantly reduce latency in relatively low-throughput environments. We launched one Producer test process sending 1,000 messages per second with linger.ms = 1000; the old partitioning strategy's 99th-percentile latency came out five times higher. The figure below shows the ProduceBench results:

The sticky partitioner effectively improves client performance when the Producer sends keyless messages. But what happens when the Producer sends a mix of keyed and keyless messages? We ran a test mixing messages with randomly generated keys and some keyless messages, and the results showed no significant difference in latency.

In this test scenario, I examined the mix of keyed and keyless messages. Batching looks somewhat better, but because the sticky partitioner is bypassed whenever a key is present, the gain is modest. The figure shows the median of the 99th-percentile latency across three runs. Since latency did not change significantly between runs, the median effectively characterizes a typical run.

The second test scenario used randomly generated keys at high throughput. Since implementing the sticky partitioner required a small code change, we had to make sure the new logic does not add latency. Given that there is no batching improvement or sticky behavior to be gained here, latency that looks essentially the same as before is the expected, normal result. The medians for the random-key test are shown below.

Finally, I tested the scenario least suited to the sticky partitioner: sequential keys with a very large number of partitions. Because the new logic runs mainly when a new batch is created, and this scenario creates a batch for almost every message, we had to check that it does not push latency up. The results showed no significant difference in latency, as shown below:

CPU usage during the test

During testing, we noticed that the sticky partitioner measurably reduces CPU usage.

For example, when 3 Producers sent messages to 16 partitions at a rate of 10,000 messages per second, we observed a significant drop in CPU usage. Each line in the two figures represents one node's CPU usage; each node ran both a Producer process and a Broker process, and all the nodes' lines overlap. We also observed the CPU decline in test scenarios with higher throughput and fewer partitions.

Summary

The main goal of the sticky partitioner is to increase the number of messages per batch so as to reduce the total number of batches, eliminating unnecessary queuing time. Once batches hold more messages and there are fewer of them, the cost per message drops, and messages can be sent faster with the sticky partitioning strategy. The test data show that this strategy indeed reduces latency for keyless messages, with the effect growing more pronounced as the partition count increases. Using the sticky partitioning strategy also usually reduces CPU usage. By "sticking" to one partition and sending fewer, larger batches, the Producer effectively improves its transmission performance.

Even better, the sticky partitioner is enabled by default in version 2.4 with no extra configuration. It works out of the box!


Origin www.cnblogs.com/huxi2b/p/12540092.html