[Kafka] "Kafka: The Definitive Guide" - Partitions

In the examples in the previous article ([Kafka] "Kafka: The Definitive Guide" - Writing Data), each ProducerRecord object contained a target topic, a key, and a value. Kafka messages are key-value pairs. A ProducerRecord can be created with only a target topic and a value, in which case the key defaults to null, but most applications use keys. Keys serve two purposes: they can carry additional information stored alongside the message, and they determine which partition of the topic the message is written to. All messages with the same key are written to the same partition. This means that if a process reads from only some of a topic's partitions (Chapter 4 covers this in more detail), all records with a given key will be read by the same process. To create a record with a key, pass the key to the ProducerRecord constructor together with the topic and the value.
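A minimal sketch of creating keyed and unkeyed records, assuming the kafka-clients library is on the classpath. The topic name, key, and value are illustrative only. Constructing a ProducerRecord does not contact a broker, so this runs without a cluster.

```java
import org.apache.kafka.clients.producer.ProducerRecord;

class KeyedRecordExample {
    // Illustrative topic name; any existing topic would do.
    static final String TOPIC = "CustomerCountry";

    static ProducerRecord<String, String> keyedRecord() {
        // (topic, key, value): every record with this key goes to the same partition
        // under the default partitioner.
        return new ProducerRecord<>(TOPIC, "Laboratory Equipment", "USA");
    }

    static ProducerRecord<String, String> unkeyedRecord() {
        // (topic, value): the key is left as null.
        return new ProducerRecord<>(TOPIC, "USA");
    }

    public static void main(String[] args) {
        ProducerRecord<String, String> r = keyedRecord();
        System.out.println(r.topic() + " / " + r.key() + " / " + r.value());
    }
}
```

Neither record sets an explicit partition, so partition() returns null and the producer's partitioner decides where each record goes.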

If the key is null and the default partitioner is used, the record is sent to one of the topic's available partitions at random. The partitioner uses a round-robin algorithm to distribute messages evenly across the partitions.
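Round-robin assignment over the available partitions can be sketched as follows. This is an illustration of the idea, not Kafka's partitioner code: the list of available partition ids is passed in directly (a real partitioner would read it from cluster metadata), and the class name is hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinSketch {
    // Shared counter so that successive records cycle through the partitions.
    private final AtomicInteger counter = new AtomicInteger(0);

    int nextPartition(List<Integer> availablePartitions) {
        if (availablePartitions.isEmpty()) {
            throw new IllegalStateException("no partitions available");
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), availablePartitions.size());
        return availablePartitions.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch rr = new RoundRobinSketch();
        List<Integer> available = List.of(0, 2, 3); // partition 1 assumed unavailable
        for (int i = 0; i < 6; i++) {
            System.out.print(rr.nextPartition(available) + " ");
        }
        // prints: 0 2 3 0 2 3
    }
}
```

Because only available partitions are in the list, unkeyed records keep flowing even when one partition is down, which is exactly what keyed records cannot do.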

If the key is not null and the default partitioner is used, Kafka hashes the key (using its own hash algorithm, so hash values do not change when Java is upgraded) and uses the hash to map the message to a specific partition. What matters is that a given key always maps to the same partition, so the mapping uses all of the topic's partitions, not just the available ones. This also means that if the partition a record maps to is unavailable, the write fails with an error. This rarely happens; Chapter 6 discusses Kafka's replication and availability.
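The key-to-partition mapping can be sketched as below. Kafka's Java client hashes the serialized key bytes with 32-bit murmur2, clears the sign bit, and takes the result modulo the total number of partitions; this standalone copy of that scheme assumes the key is a String serialized as UTF-8, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class KeyHashPartitioning {

    // 32-bit murmur2 over a byte array (seed 0x9747b28c).
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the 1-3 trailing bytes (intentional fall-through).
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // numPartitions is the topic's TOTAL partition count, not just the available ones.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Clear the sign bit rather than using Math.abs, which stays negative for Integer.MIN_VALUE.
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"alice", "bob", "alice"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 6));
        }
    }
}
```

Running it prints the same partition for both "alice" lines: the result depends only on the key bytes and the partition count.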

The mapping between a key and a partition stays consistent only as long as the number of partitions in the topic does not change. For example, as long as the partition count stays the same, you can be sure that records for user 045189 are always written to partition 34. This allows all kinds of optimizations when reading data from partitions. However, once new partitions are added to the topic, this is no longer guaranteed: old records stay in partition 34, but new records may be written to a different partition. If you rely on keys to map records to partitions, it is best to plan the number of partitions when you create the topic and never add partitions afterwards.
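To see why adding partitions breaks this guarantee, the sketch below counts how many keys change partition when a topic grows from 4 to 5 partitions. For brevity it uses String.hashCode as a stand-in for Kafka's murmur2 hash; any hash-modulo scheme shows the same qualitative effect. The "user-N" key format and class name are hypothetical.

```java
class RepartitionEffect {
    // Stand-in hash-modulo mapping (NOT Kafka's hash function).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Counts how many of the generated keys land on a different partition.
    static int movedKeys(int keyCount, int before, int after) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "user-" + i; // hypothetical key format
            if (partitionFor(key, before) != partitionFor(key, after)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int keys = 10_000;
        System.out.println(movedKeys(keys, 4, 5) + " of " + keys
                + " keys change partition when going from 4 to 5 partitions");
    }
}
```

With the partition count unchanged no key moves; after the increase most keys move, while the records already written stay where they were.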

Implementing a custom partitioning strategy

We have discussed the characteristics of the default partitioner, which is the one most commonly used. However, besides hash partitioning, data sometimes needs to be partitioned differently. Suppose you are a B2B vendor and your biggest customer is Banana, a company that makes handheld devices. Banana accounts for 10% of your total business. If you use the default hash partitioning, Banana's records will be assigned to the same partition as records from other accounts, making that partition much larger than the others. The server may then run out of storage space, process requests more slowly, and so on. What we need is to give Banana its own partition and use hash partitioning to map all other accounts to the remaining partitions.

A custom partitioner for this case sends Banana's records to a dedicated partition and hashes every other key across the remaining partitions.
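A minimal, self-contained sketch of that logic follows. In a real producer it would live in the partition(...) method of a class implementing org.apache.kafka.clients.producer.Partitioner, registered through the producer's partitioner.class setting; here it is a plain static method so it runs without Kafka. The class name, the choice of the last partition for Banana, and the use of String.hashCode as a stand-in for Kafka's murmur2 hash are all assumptions for illustration.

```java
import java.util.Objects;

class BananaPartitioning {
    static final String BIG_CUSTOMER = "Banana";

    static int partition(String customerKey, int numPartitions) {
        Objects.requireNonNull(customerKey, "every record is expected to use the customer name as its key");
        if (numPartitions < 2) {
            throw new IllegalArgumentException("need one partition for Banana and at least one for everyone else");
        }
        // Reserve the last partition for the large customer.
        int bananaPartition = numPartitions - 1;
        if (customerKey.equals(BIG_CUSTOMER)) {
            return bananaPartition;
        }
        // Hash all other customers across partitions 0 .. numPartitions - 2.
        // String.hashCode is a stand-in; Kafka's default partitioner hashes key bytes with murmur2.
        return Math.floorMod(customerKey.hashCode(), numPartitions - 1);
    }

    public static void main(String[] args) {
        for (String customer : new String[] {"Banana", "Acme", "Globex"}) {
            System.out.println(customer + " -> partition " + partition(customer, 5));
        }
    }
}
```

Returning numPartitions - 1 (not numPartitions) matters: partition ids run from 0 to numPartitions - 1, so an index equal to the partition count would point at a partition that does not exist.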

Source: www.cnblogs.com/weknow619/p/10944438.html