Kafka in Detail (VIII): The Complete Consumer Group Rebalance Process

The rebalance process lets all consumer instances in a consumer group agree on which partitions of the subscribed topics each of them consumes. Rebalancing relies on the Coordinator component on the Kafka broker side: it is with the coordinator's help that partitions are redistributed across the whole consumer group. This post walks through that process in detail.

1. Trigger Conditions

Three conditions trigger a consumer group rebalance:

  1. The number of group members changes
  2. The number of subscribed topics changes
  3. The number of partitions of a subscribed topic changes

In real production environments, rebalances caused by condition 1 are the most common. Consumer instances of a group starting one after another also fall under condition 1, so every time a consumer group starts up, it triggers rebalances.
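The three trigger conditions can be sketched as a comparison between two snapshots of a group's metadata. This is only an illustration: `GroupSnapshot` and `rebalance_reasons` are hypothetical names, not Kafka types, and the real coordinator tracks this state differently.

```python
from dataclasses import dataclass, field


@dataclass
class GroupSnapshot:
    """Hypothetical snapshot of a consumer group's metadata (not a Kafka type)."""
    members: set = field(default_factory=set)             # member ids in the group
    topics: set = field(default_factory=set)              # topics the group subscribes to
    partition_counts: dict = field(default_factory=dict)  # topic -> number of partitions


def rebalance_reasons(old: GroupSnapshot, new: GroupSnapshot) -> list:
    """Return which of the three trigger conditions hold between two snapshots."""
    reasons = []
    if old.members != new.members:
        reasons.append("group membership changed")
    if old.topics != new.topics:
        reasons.append("subscribed topics changed")
    # Only compare partition counts for topics subscribed in both snapshots.
    for topic in sorted(old.topics & new.topics):
        if old.partition_counts.get(topic) != new.partition_counts.get(topic):
            reasons.append(f"partition count of {topic} changed")
    return reasons
```

An empty result means none of the three conditions applies, so no rebalance would be needed in this model.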

2. Notification mechanism

Kafka consumers periodically send heartbeat requests (Heartbeat Request) to the coordinator on the broker side. Starting with version 0.10.1.0, Kafka uses a dedicated heartbeat thread to send them. The rebalance notification mechanism is implemented through this heartbeat thread: when the coordinator decides to start a new round of rebalancing, it puts "REBALANCE_IN_PROGRESS" into the response to a heartbeat request and returns it to the consumer instance.

The heartbeat.interval.ms parameter therefore not only sets the interval between heartbeats but also determines how often a consumer can be notified of a rebalance. If you want consumers to be notified quickly, set it to a small value so that they receive the rebalance response sooner. Note that it must stay lower than session.timeout.ms.
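To make the effect of heartbeat.interval.ms concrete, here is a minimal simulation under stated assumptions: `FakeCoordinator` and `time_until_notified` are hypothetical names, time is a simulated counter rather than a real clock, and the first heartbeat is assumed to be sent at time 0.

```python
REBALANCE_IN_PROGRESS = "REBALANCE_IN_PROGRESS"
NO_ERROR = "NONE"


class FakeCoordinator:
    """Stand-in for the broker-side coordinator (hypothetical, not a Kafka class)."""

    def __init__(self, rebalance_starts_at_ms):
        self.rebalance_starts_at_ms = rebalance_starts_at_ms

    def heartbeat(self, now_ms):
        # The rebalance notice rides on the heartbeat response.
        if now_ms >= self.rebalance_starts_at_ms:
            return REBALANCE_IN_PROGRESS
        return NO_ERROR


def time_until_notified(coordinator, heartbeat_interval_ms, give_up_after_ms=60_000):
    """Simulate the heartbeat thread; return when (ms) the consumer sees the notice."""
    now_ms = 0
    while now_ms <= give_up_after_ms:
        if coordinator.heartbeat(now_ms) == REBALANCE_IN_PROGRESS:
            return now_ms  # a real consumer would now rejoin the group
        now_ms += heartbeat_interval_ms  # simulated sleep between heartbeats
    return None
```

In this model a consumer learns about a rebalance at its next heartbeat, so the worst-case delay is roughly one heartbeat interval.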

3. Consumer group state machine

Kafka defines a state machine (State Machine) for consumer groups with five states: Empty, Dead, PreparingRebalance, CompletingRebalance, and Stable.

With the meaning of these states in mind, consider the state transition diagram (a figure in the original post), which shows how the state machine moves between states.

That diagram shows how a consumer group's state changes as it starts up. A consumer group begins in the Empty state. When a rebalance starts, the group is put into PreparingRebalance to wait for members to join; it then moves to CompletingRebalance to wait for the assignment plan; finally it transitions to Stable, which completes the rebalance.
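The startup path described above can be sketched as a small state machine. The five state names come from the text; the per-state comments and the full transition table are my summary of how the group coordinator is commonly documented to behave, so treat them as assumptions rather than a copy of Kafka's implementation. `ConsumerGroupSketch` is a hypothetical name.

```python
from enum import Enum


class GroupState(Enum):
    EMPTY = "Empty"                               # no members; committed offsets may remain
    DEAD = "Dead"                                 # no members and group metadata removed
    PREPARING_REBALANCE = "PreparingRebalance"    # waiting for members to (re)join
    COMPLETING_REBALANCE = "CompletingRebalance"  # all joined; waiting for the leader's plan
    STABLE = "Stable"                             # plan distributed; members are consuming


S = GroupState
# Assumed transition table: state -> states it may move to.
TRANSITIONS = {
    S.EMPTY: {S.PREPARING_REBALANCE, S.DEAD},
    S.PREPARING_REBALANCE: {S.COMPLETING_REBALANCE, S.EMPTY, S.DEAD},
    S.COMPLETING_REBALANCE: {S.STABLE, S.PREPARING_REBALANCE, S.DEAD},
    S.STABLE: {S.PREPARING_REBALANCE, S.DEAD},
    S.DEAD: set(),  # treated as terminal in this sketch
}


class ConsumerGroupSketch:
    """Hypothetical holder of a group's state; rejects transitions not in the table."""

    def __init__(self):
        self.state = S.EMPTY

    def transition_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state


# The startup path from the text: Empty -> PreparingRebalance -> CompletingRebalance -> Stable.
group = ConsumerGroupSketch()
for step in (S.PREPARING_REBALANCE, S.COMPLETING_REBALANCE, S.STABLE):
    group.transition_to(step)
```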

4. Rebalance process on the consumer side

On the consumer side, a rebalance consists of two steps: joining the group, and waiting for the leader consumer (Leader Consumer) to produce the assignment plan. These two steps correspond to two specific request types: the JoinGroup request and the SyncGroup request.

4.1 JoinGroup request

When a member joins the group, it sends a JoinGroup request to the coordinator. In this request, each member reports the topics it subscribes to, so the coordinator can gather the subscription information of all members. Once it has collected the JoinGroup requests of all members, the coordinator picks one of them to act as the leader of the consumer group.

Typically, the first member to send a JoinGroup request automatically becomes the leader. Be careful to distinguish this leader from the leader replica discussed in earlier posts; they are not the same concept. The leader here is a specific consumer instance: it is neither a replica nor the coordinator. The leader consumer's task is to collect the subscription information of all members and, based on it, work out the concrete partition assignment plan.

After the leader is elected, the coordinator packs the group's subscription information into the body of the JoinGroup response and sends it to the leader. The leader then produces the assignment plan for the whole group and moves on to the next step: sending the SyncGroup request. (The original post illustrates this exchange with a figure.)
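The JoinGroup exchange described above can be modeled in a few lines. This is a simplified sketch: `JoinGroupRound` is a hypothetical name, the coordinator is assumed to know how many members to wait for, and the response is reduced to a leader id plus the subscription map that only the leader receives.

```python
class JoinGroupRound:
    """Hypothetical model of the coordinator collecting JoinGroup requests."""

    def __init__(self, expected_member_count):
        self.expected_member_count = expected_member_count
        self.subscriptions = {}  # member_id -> sorted topic list, in arrival order

    def handle_join_group(self, member_id, topics):
        # Each member reports the topics it subscribes to.
        self.subscriptions[member_id] = sorted(topics)

    def all_joined(self):
        return len(self.subscriptions) >= self.expected_member_count

    def build_responses(self):
        """Pick the first joiner as leader and build one response per member."""
        if not self.all_joined():
            raise RuntimeError("still waiting for members to join")
        leader_id = next(iter(self.subscriptions))  # dicts keep insertion order
        responses = {}
        for member_id in self.subscriptions:
            is_leader = member_id == leader_id
            responses[member_id] = {
                "leader_id": leader_id,
                # Only the leader receives everyone's subscription information.
                "members": dict(self.subscriptions) if is_leader else {},
            }
        return responses
```

Followers still learn who the leader is, but they get an empty member map, which is why only the leader is able to compute the assignment plan.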

4.2 SyncGroup request

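In this step, the leader sends a SyncGroup request to the coordinator, carrying the assignment plan it has just made. Note that the other members also send SyncGroup requests to the coordinator, but their request bodies carry no actual content. The purpose of this step is for the coordinator to receive the assignment plan and then distribute it to all members in SyncGroup responses, so that every member of the group knows which partitions it should consume.

In short, the SyncGroup request exists so that the coordinator can hand the leader's assignment plan down to every member of the group. Once all members have received the plan, the consumer group enters the Stable state and normal consumption begins.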
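The SyncGroup step can be sketched as follows. Both function names are hypothetical, and the round-robin strategy is just a stand-in chosen for brevity; it is not meant to reproduce any of Kafka's built-in assignors.

```python
def round_robin_assign(member_ids, partitions):
    """Leader side: spread partitions over members (illustrative strategy only)."""
    ordered_members = sorted(member_ids)
    plan = {member_id: [] for member_id in ordered_members}
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered_members[index % len(ordered_members)]
        plan[owner].append(partition)
    return plan


def handle_sync_group(requests):
    """Coordinator side: requests maps member_id -> plan (leader) or None (followers).

    Returns member_id -> list of partitions that member should consume.
    """
    plans = [plan for plan in requests.values() if plan is not None]
    if len(plans) != 1:
        raise ValueError("exactly one member (the leader) must submit a plan")
    plan = plans[0]
    # Every member gets only its own slice of the leader's plan.
    return {member_id: plan.get(member_id, []) for member_id in requests}


partitions = [("orders", p) for p in range(4)]
plan = round_robin_assign(["c1", "c2"], partitions)
# The leader (c1 here) sends the plan; the follower sends an empty request.
assignments = handle_sync_group({"c1": plan, "c2": None})
```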

(The original post illustrates this step with a figure.)

5. Rebalance process on the broker side (coordinator)

To analyze how the coordinator handles a rebalance end to end, we need to look at several scenarios.

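5.1 A new member joins

This scenario means that a new member joins after the group has already reached the Stable state.

When the coordinator receives the new JoinGroup request, it notifies all existing members of the group through heartbeat responses, forcing them to start a new round of rebalancing. The rest of the process is the same as the consumer-side rebalance described earlier. (The original post illustrates this with a figure.)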

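5.2 A member leaves the group voluntarily

Leaving voluntarily means that the thread or process hosting the consumer instance calls close() to tell the coordinator that it is exiting. This scenario involves a third request type: the LeaveGroup request. After receiving a LeaveGroup request, the coordinator again notifies the other members through heartbeat responses. (The original post illustrates this with a figure.)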
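The new-member and voluntary-leave scenarios share the same shape on the coordinator: update membership, move the group into PreparingRebalance, and let heartbeat responses carry the notice. The sketch below assumes a hypothetical `GroupCoordinatorSketch` class and greatly simplifies the real coordinator, for example by ignoring generations and timeouts.

```python
class GroupCoordinatorSketch:
    """Hypothetical coordinator for a group that starts out Stable."""

    def __init__(self, members):
        self.members = set(members)
        self.state = "Stable"

    def handle_join_group(self, member_id):
        # A new member arriving in a Stable group forces a new rebalance.
        self.members.add(member_id)
        self.state = "PreparingRebalance"

    def handle_leave_group(self, member_id):
        # close() on the consumer results in a LeaveGroup request.
        self.members.discard(member_id)
        self.state = "PreparingRebalance"

    def handle_heartbeat(self, member_id):
        if member_id not in self.members:
            return "UNKNOWN_MEMBER_ID"
        # Existing members learn about the rebalance from their next heartbeat.
        if self.state == "PreparingRebalance":
            return "REBALANCE_IN_PROGRESS"
        return "NONE"
```

After either handler runs, every remaining member that heartbeats gets REBALANCE_IN_PROGRESS and is expected to rejoin with a JoinGroup request.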

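5.3 A member crashes and leaves the group

Leaving by crash means that a consumer instance suffers a serious failure and goes down suddenly. This differs from leaving voluntarily: a voluntary leave is initiated by the member itself, so the coordinator notices and handles it immediately. A crash is passive, and the coordinator usually has to wait for some time before it notices. That time is generally controlled by the consumer-side parameter session.timeout.ms; in other words, Kafka usually detects the crash within session.timeout.ms. The handling that follows is the same as in the previous scenarios. (The original post illustrates this with a figure.)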
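Crash detection can be sketched as a timestamp check against session.timeout.ms. `SessionTracker` is a hypothetical name and time is passed in explicitly so the example stays deterministic; the real coordinator uses its own timers.

```python
class SessionTracker:
    """Hypothetical tracker that expires members whose heartbeats stopped."""

    def __init__(self, session_timeout_ms):
        self.session_timeout_ms = session_timeout_ms
        self.last_heartbeat_ms = {}  # member_id -> time of last heartbeat

    def record_heartbeat(self, member_id, now_ms):
        self.last_heartbeat_ms[member_id] = now_ms

    def expire(self, now_ms):
        """Remove and return members silent for longer than the session timeout."""
        expired = sorted(
            member_id
            for member_id, seen_ms in self.last_heartbeat_ms.items()
            if now_ms - seen_ms > self.session_timeout_ms
        )
        for member_id in expired:
            del self.last_heartbeat_ms[member_id]
        return expired  # a non-empty result would trigger a rebalance
```

Unlike a LeaveGroup request, nothing tells the coordinator about the crash; it only finds out when the timeout check fires, which is why a crashed member can hold its partitions idle for up to the session timeout.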

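5.4 How the coordinator handles members' offset commits during a rebalance

Under normal conditions, every group member periodically reports its offsets to the coordinator. When a rebalance starts, the coordinator gives members a grace period in which each of them must quickly commit its offsets; only after that do the normal JoinGroup/SyncGroup requests begin.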

 

 


Origin blog.csdn.net/fedorafrog/article/details/104099674