Problems caused by Kafka partitions

How to choose the right number of partitions for a Kafka cluster

Under normal circumstances, the more partitions a Kafka cluster has, the higher its throughput. However, be aware that if the total number of partitions in the cluster grows too large, or a single broker node hosts too many partitions, system availability and message latency can suffer.
 
We can roughly estimate the number of partitions a Kafka cluster needs from throughput. Suppose that, for a single partition, the achievable producer-side throughput is p, the achievable consumer-side throughput is c, and the desired target throughput is t. Then the cluster needs at least max(t/p, t/c) partitions. On the producer side, single-partition throughput is affected by configuration parameters such as batch size, compression codec, acknowledgment type (acks), and replication factor; in testing it is usually around 10 MB/s. On the consumer side, single-partition throughput depends on how fast the application logic processes each message, so it must be measured for the specific workload.
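The max(t/p, t/c) estimate above can be sketched as a small helper. The function name and the example numbers are illustrative assumptions, not from the original; the 10 MB/s producer figure follows the text, while the consumer figure is a placeholder you would replace with a measured value.

```python
import math

def required_partitions(target_mb_s: float,
                        producer_mb_s: float,
                        consumer_mb_s: float) -> int:
    """Minimum partition count so that neither the producer side
    nor the consumer side becomes the throughput bottleneck:
    max(t/p, t/c), rounded up to whole partitions."""
    return max(
        math.ceil(target_mb_s / producer_mb_s),
        math.ceil(target_mb_s / consumer_mb_s),
    )

# Example: target 100 MB/s, producer ~10 MB/s per partition,
# consumer measured at ~20 MB/s per partition.
print(required_partitions(100, 10, 20))  # → 10 (producer side dominates)
```

If the consumer side is slower than the producer side (say 5 MB/s per partition), the consumer term t/c dominates instead and the same call returns 20.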

Origin blog.csdn.net/yangshengwei230612/article/details/114626337