[Kafka from Getting Started to Giving Up, Part 5] Kafka Architecture in Depth: Consumer Strategy

The previous post, [Kafka from Getting Started to Giving Up, Part 4] Kafka Architecture in Depth: Producer Strategy, analyzed the relatively complex producer strategies in detail. This post covers the simpler consumer strategies.

Consumption patterns

Messages can be delivered in two ways: the broker pushes them to the consumer, or the consumer pulls them from the broker. Each approach has advantages and disadvantages:

  • Push mode adapts poorly to consumers with different consumption rates, because the sending rate is set by the broker. Its goal is to deliver messages as quickly as possible, which can easily leave a consumer unable to keep up. The typical symptoms are denial of service and network congestion.
  • Pull mode lets each consumer fetch messages at a rate that matches its own capacity. Its drawback is that when Kafka has no data, the consumer may spin in a loop, repeatedly getting empty results.

For Kafka, pull mode is the better fit. It simplifies the broker's design, and it lets each consumer control both the rate and the manner of consumption:

  • Control over how messages are consumed: messages can be consumed in batches or one at a time, and different offset commit methods can be chosen to achieve different delivery semantics.
  • Timeout mechanism: a Kafka consumer passes a duration parameter, timeout, when it polls for data. If no data is currently available, the call waits up to that long before returning, which avoids the empty loop described above.
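
The timeout behavior can be illustrated with a small sketch. This is not the Kafka client: a standard-library `queue.Queue` stands in for a partition on the broker, and the `poll` function, its parameters, and the message names are hypothetical names chosen for the illustration.

```python
import queue

def poll(broker: queue.Queue, timeout_s: float, max_records: int = 10) -> list:
    """Pull up to max_records; wait at most timeout_s if nothing is available."""
    records = []
    try:
        # Block for the first record, but give up after timeout_s.
        records.append(broker.get(timeout=timeout_s))
        # Drain whatever else is already queued, without waiting.
        while len(records) < max_records:
            records.append(broker.get_nowait())
    except queue.Empty:
        pass  # no (more) data: return what we have, possibly nothing
    return records

broker = queue.Queue()  # hypothetical stand-in for one partition
for i in range(3):
    broker.put(f"msg-{i}")

first = poll(broker, timeout_s=0.1)   # the three queued messages, in one batch
second = poll(broker, timeout_s=0.1)  # nothing left: waits about 0.1 s, then []
```

Because the second call blocks for the timeout instead of returning immediately, a consumer that calls `poll` in a loop does not spin at full speed while the broker is empty.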

With pull mode and a few such strategies, Kafka can meet consumers' needs. A few points need attention:

  • If the number of consuming threads is greater than the number of partitions, some threads will not receive any messages;
  • If the number of partitions is greater than the number of consuming threads, some threads will receive messages from multiple partitions;
  • If a thread consumes multiple partitions, the order of messages across those partitions is not guaranteed; only messages within a single partition are ordered.

All three points concern the relationship between message consumption and the number of partitions.

Partition allocation strategy

A consumer group contains multiple consumers and a topic contains multiple partitions, so the partitions must be allocated, that is, it must be decided which consumer consumes which partition. Kafka has three allocation strategies: RoundRobin, Range, and Sticky. Whichever strategy is used, a reallocation is triggered when the number of consumers in the group changes (increases or decreases) or when the number of partitions in a subscribed topic increases.

Range strategy

The Range strategy works topic by topic. First, the partitions of a topic are sorted by number and the consumer threads are sorted alphabetically. Then the number of partitions is divided by the number of consumer threads to determine how many partitions each thread consumes. If the division leaves a remainder, the first few consumer threads each take one extra partition. The drawback is that the distribution across the consumers in a group is uneven. For example, a consumer that sorts first, such as ConsumerA, takes the extra partition of every topic whose partitions do not divide evenly, so it bears more and more load as topics are added.
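
The Range procedure can be sketched in a few lines of Python. This is a simplified simulation rather than Kafka's own assignor: it assumes every consumer subscribes to every topic, and the function name `range_assign`, the consumer names, and the `T0p0`-style partition labels are chosen for the illustration.

```python
def range_assign(partitions_per_topic: dict, consumers: list) -> dict:
    """Assign partitions topic by topic; earlier consumers absorb the remainder."""
    members = sorted(consumers)                  # consumers in alphabetical order
    assignment = {c: [] for c in members}
    for topic in sorted(partitions_per_topic):
        count = partitions_per_topic[topic]
        base, extra = divmod(count, len(members))
        start = 0
        for i, consumer in enumerate(members):
            # The first `extra` consumers each take one more partition.
            size = base + (1 if i < extra else 0)
            assignment[consumer] += [f"{topic}p{p}" for p in range(start, start + size)]
            start += size
    return assignment

uneven = range_assign({"T0": 3, "T1": 3}, ["ConsumerA", "ConsumerB"])
# ConsumerA: T0p0, T0p1, T1p0, T1p1   ConsumerB: T0p2, T1p2

idle = range_assign({"T0": 2}, ["A", "B", "C"])
# A: T0p0   B: T0p1   C: nothing
```

The first call shows the extra partition of each topic piling up on the consumer that sorts first. The second shows that with more consumers than partitions, one consumer receives nothing.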

RoundRobin strategy

The RoundRobin strategy is a polling strategy. It sorts all consumers in the consumer group, and all partitions of the topics they subscribe to, in lexicographic order, and then assigns the partitions to the consumers one at a time in turn. Note:

  • If all consumers in the same consumer group subscribe to the same topics, the partition distribution of the RoundRobin strategy is even.
  • If the consumers in the same consumer group subscribe to different topics, the allocation is not a complete round robin, which may make the partition distribution uneven. A consumer that does not subscribe to a given topic is never assigned any partition of that topic.

The advantage of this strategy is that the distribution is more balanced, provided that every consumer in the consumer group subscribes to the same topics.
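
Both cases can be reproduced with a small simulation. This is a sketch of the round-robin idea rather than Kafka's own code. The function name `round_robin_assign`, the topic names, and the subscriptions are illustrative assumptions.

```python
from itertools import cycle

def round_robin_assign(partitions_per_topic: dict, subscriptions: dict) -> dict:
    """Deal sorted partitions to sorted consumers in turn, skipping non-subscribers."""
    members = sorted(subscriptions)
    assignment = {c: [] for c in members}
    ring = cycle(members)
    for topic in sorted(partitions_per_topic):
        # Ignore topics nobody subscribes to (this also avoids an endless skip loop).
        if not any(topic in topics for topics in subscriptions.values()):
            continue
        for p in range(partitions_per_topic[topic]):
            consumer = next(ring)
            while topic not in subscriptions[consumer]:
                consumer = next(ring)            # skip consumers without this topic
            assignment[consumer].append(f"{topic}p{p}")
    return assignment

# Same subscriptions everywhere: the result is even.
even = round_robin_assign(
    {"T0": 3, "T1": 3},
    {"C0": {"T0", "T1"}, "C1": {"T0", "T1"}, "C2": {"T0", "T1"}},
)

# Different subscriptions: C2 ends up with four of the six partitions.
skewed = round_robin_assign(
    {"T0": 1, "T1": 2, "T2": 3},
    {"C0": {"T0"}, "C1": {"T0", "T1"}, "C2": {"T0", "T1", "T2"}},
)
```

In the second call only C2 subscribes to T2, so all three T2 partitions land on it regardless of how many partitions it already holds.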

Sticky strategy

This allocation strategy was introduced in version 0.11. It has two main goals:

  • The partition distribution should be as even as possible
  • The partition allocation should stay as close as possible to the previous allocation

Example: suppose three consumers (C0, C1, C2) all subscribe to two topics (T0 and T1), and each topic has three partitions (p0, p1, p2), so the subscribed partitions are T0p0, T0p1, T0p2, T1p0, T1p1, T1p2. With the Sticky strategy, the initial allocation is the same as with RoundRobin: C0 gets T0p0 and T1p0, C1 gets T0p1 and T1p1, and C2 gets T0p2 and T1p2.

Now suppose C2 fails and leaves the consumer group, so the partitions must be rebalanced. With the RoundRobin strategy, all six partitions are dealt out again in turn to C0 and C1: C0 gets T0p0, T0p2, T1p1 and C1 gets T0p1, T1p0, T1p2. Two partitions that C0 and C1 already held (T1p0 and T1p1) change owner.

With the Sticky strategy, C0 and C1 keep the partitions they already had, and only the two partitions that belonged to C2 are handed out, one to each of them.

Although a reallocation was triggered, the previous allocations of C0 and C1 were remembered. This matters because when a partition moves to a different consumer after a reallocation, any work the previous consumer had half finished must be processed again by the new consumer, which wastes system resources. The Sticky strategy gives the allocation a certain "stickiness", keeping two consecutive allocations as similar as possible, which reduces wasted resources and other abnormal situations.
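
The idea behind the sticky rebalance in this example can be sketched as follows. This is a deliberately simplified illustration and not Kafka's sticky assignment algorithm: it only redistributes orphaned partitions and does not move partitions between survivors to restore balance (for instance when a new consumer joins). The function name `sticky_rebalance` is hypothetical.

```python
def sticky_rebalance(previous: dict, members: list, all_partitions: list) -> dict:
    """Keep survivors' partitions; give orphaned ones to the least-loaded consumer."""
    # Survivors keep what they had, limited to partitions that still exist.
    assignment = {
        c: [p for p in previous.get(c, []) if p in all_partitions]
        for c in sorted(members)
    }
    owned = {p for parts in assignment.values() for p in parts}
    # Partitions without an owner are handed out one at a time.
    for partition in sorted(set(all_partitions) - owned):
        target = min(assignment, key=lambda c: (len(assignment[c]), c))
        assignment[target].append(partition)
    return assignment

partitions = ["T0p0", "T0p1", "T0p2", "T1p0", "T1p1", "T1p2"]
before = {"C0": ["T0p0", "T1p0"], "C1": ["T0p1", "T1p1"], "C2": ["T0p2", "T1p2"]}

after = sticky_rebalance(before, ["C0", "C1"], partitions)  # C2 has left the group
# C0: T0p0, T1p0, T0p2   C1: T0p1, T1p1, T1p2

# Partitions that a surviving consumer lost in the rebalance (none here).
moved = [p for c in after for p in before[c] if p not in after[c]]
```

No partition held by C0 or C1 changes owner, whereas dealing all six partitions out again in turn would move two of them.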

Maintenance of offset

In practice, a consumer may fail or go down for various reasons while consuming data. When it recovers, it needs to continue from where it was before the failure rather than start again from the beginning. Consumers therefore need to record, as they go, the offset they have consumed up to, so that consumption can resume after recovery. Before Kafka 0.9, consumers saved offsets in ZooKeeper by default. Starting from version 0.9, consumers save offsets by default in a built-in Kafka topic named __consumer_offsets.

Within the same group, when partitions are reassigned (for example, after a consumer joins), the consumer that takes over a partition continues from the recorded offset instead of consuming the partition again. Offsets are keyed by group + topic + partition, which ensures that the other machines in the group can carry on consuming when one of them has a problem.
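
The effect of keying offsets by group + topic + partition can be shown with a toy model. Everything here is an assumption made for illustration: a dictionary stands in for the __consumer_offsets topic, a list stands in for a partition's log, and a group with no committed offset starts from offset 0 (in a real deployment the starting point depends on the consumer configuration).

```python
committed = {}  # (group, topic, partition) -> next offset to read

def commit(group: str, topic: str, partition: int, next_offset: int) -> None:
    committed[(group, topic, partition)] = next_offset

def consume(group: str, topic: str, partition: int, log: list, max_records: int) -> list:
    """Read from the group's committed position, then commit the new position."""
    start = committed.get((group, topic, partition), 0)  # assumed default: 0
    records = log[start:start + max_records]
    commit(group, topic, partition, start + len(records))
    return records

log = [f"m{i}" for i in range(5)]  # toy log for topic T0, partition 0

a = consume("g1", "T0", 0, log, 3)  # a consumer in g1 reads m0..m2, then fails
b = consume("g1", "T0", 0, log, 3)  # another consumer in g1 takes over: m3, m4
c = consume("g2", "T0", 0, log, 3)  # a different group has its own position: m0..m2
```

The consumer that takes over in group g1 never needs to know which machine read the partition before it, because the position belongs to the group, not to an individual consumer.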

Consumer group test

Modify the configuration files so that machine 102 and machine 103 are in the same consumer group. Then start a producer on machine 101 to send messages, and start consumers on machines 102 and 103 at the same time. The result: only machine 102 receives the messages, and machine 103 receives none.

This verifies Kafka's consumption rule: within a group, a partition can be consumed by only one consumer. This makes sense: if several consumers in a group consumed the same partition, the group could not guarantee the order of messages within that partition.

Origin blog.csdn.net/sinat_33087001/article/details/108398002