More partitions, higher Kafka topic throughput? Not so fast!

A Kafka partition is the smallest unit of parallelism. On the producer side, writes to different partitions are fully parallel; on the consumer side, a single partition can be consumed by only one consumer thread within a group, so a consumer group's degree of parallelism is bounded entirely by the number of partitions. It would seem, then, that the more partitions a topic has, the higher the throughput it can achieve in theory. But does practice actually bear this out?

We can use the kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh performance-testing scripts to find out. First, create topics with 1, 20, 50, 100, 200, 500, and 1000 partitions, named topic-1, topic-20, topic-50, topic-100, topic-200, topic-500, and topic-1000 respectively, all with a replication factor of 1.
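The article does not show the topic-creation step, but it can be sketched as a short loop. The sketch below prints the kafka-topics.sh invocations rather than executing them, so it runs anywhere; the `--zookeeper localhost:2181` flag is an assumption (older Kafka releases; newer ones use `--bootstrap-server` instead).

```shell
#!/bin/sh
# Sketch: print the kafka-topics.sh commands that would create the seven
# test topics, each with replication factor 1. Pipe the output to sh
# against a live cluster to actually create them.
# Assumption: ZooKeeper at localhost:2181 (newer Kafka: --bootstrap-server).
for p in 1 20 50 100 200 500 1000; do
  echo "bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic topic-$p --partitions $p --replication-factor 1"
done
```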

Message-middleware performance generally means throughput. Hardware resources aside, producer throughput is affected by message size, message compression, transmission mode (synchronous/asynchronous), acknowledgment type (acks), and replication factor; consumer throughput is additionally affected by the speed of the application's processing logic. This test ignores all of those factors: everything except the topic's partition count is held constant.
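For reference, several of these knobs surface directly as producer properties on the perf-test command line. The following is illustrative only (the property values are examples, not the settings used in the original test, which kept everything but the partition count fixed):

```shell
# Illustrative: vary acks, compression, and batching while holding the
# partition count fixed. Requires a live broker at localhost:9092.
# --throughput -1 disables client-side throttling.
bin/kafka-producer-perf-test.sh --topic topic-100 \
  --num-records 1000000 --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092 \
    acks=all compression.type=lz4 batch.size=65536 linger.ms=10
```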

The test environment is a Kafka cluster of three ordinary cloud hosts, each with 8GB of memory, a 40GB disk, and a 4-core CPU clocked at 2600MHz, running JVM 1.8.0_112 on Linux kernel 2.6.32-504.23.4.el6.x86_64.

Using the kafka-producer-perf-test.sh script, one million messages with a 1KB body were sent to each of these topics. The corresponding test command is as follows:

bin/kafka-producer-perf-test.sh --topic topic-xxx \
  --num-records 1000000 --record-size 1024 \
  --throughput 100000000 --producer-props \
  bootstrap.servers=localhost:9092 acks=1

The corresponding results are shown in the figure below. Different hardware environments, or even different test runs, will not produce exactly the same numbers, but the overall trend matches the figure.

[Figure: producer throughput vs. number of partitions]

As the figure shows, throughput is lowest with a single partition and rises as the partition count grows. But once the partition count exceeds a certain threshold, the overall throughput falls instead of rising: more partitions does not always mean more throughput. The critical threshold differs across test environments; in practice, a reasonable range can be found with a test case like this one.
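To locate that threshold in a given environment, the same producer test can simply be repeated across all seven topics. A minimal sketch, assuming a broker at localhost:9092 and the topics created earlier:

```shell
#!/bin/sh
# Sweep all seven topics with the same producer load to find the
# throughput "knee" in your own environment.
# Assumes a live broker at localhost:9092 and existing topic-$p topics.
for p in 1 20 50 100 200 500 1000; do
  echo "== topic-$p =="
  bin/kafka-producer-perf-test.sh --topic topic-$p \
    --num-records 1000000 --record-size 1024 \
    --throughput 100000000 --producer-props \
    bootstrap.servers=localhost:9092 acks=1
done
```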

The test above covers producers; throughput must be considered for consumers as well. Using the kafka-consumer-perf-test.sh script, one million messages were consumed from each of these topics. The corresponding test command is as follows:

bin/kafka-consumer-perf-test.sh --topic topic-xxx \
  --messages 1000000 --broker-list localhost:9092

The consumer performance results are shown below. As with the producer tests, the exact numbers vary across environments and test runs, but the overall trend matches the figure.

[Figure: consumer throughput vs. number of partitions]

As the figure shows, throughput initially increases as the partition count grows. But once the count exceeds a certain threshold, the overall throughput again falls instead of rising, showing once more that a larger partition count does not keep driving throughput upward.

So, do more partitions mean higher throughput? Plenty of material endorses this view, but many things in practice have a critical value: beyond it, trends that held up to that point break down. Readers should keep this clearly in mind, learn to distinguish truth from plausible falsehood, and treat hands-on testing as the bridge to real knowledge.


The author welcomes support for his new books, "In-depth Understanding of Kafka: Core Design Principles and Practice" and "RabbitMQ Practical Guide", and invites readers to follow his WeChat public account, Zhu Xiaosi's Blog.

Origin blog.csdn.net/u013256816/article/details/93772537