Kafka consumer group static members (Static Member)

After the release of Kafka 2.3, the official website documents a new Consumer parameter: group.instance.id. The official explanation of this parameter is as follows:

A unique identifier of the consumer instance provided by end user. Only non-empty strings are permitted. If set, the consumer is treated as a static member, which means that only one instance with this ID is allowed in the consumer group at any time. This can be used in combination with a larger session timeout to avoid group rebalances caused by transient unavailability (e.g. process restarts). If not set, the consumer will join the group as a dynamic member, which is the traditional behavior.

Roughly speaking, it is a user-specified ID for a consumer member, and it must be unique within each consumer group. Once the ID is set, the consumer is treated as a static member (Static Member). Combined with a larger session timeout, static members avoid Rebalances triggered by a member being temporarily unavailable (for example, while it restarts). The consumer group static member is thus a concept newly introduced in version 2.3, mainly to avoid unnecessary Rebalances.
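As a minimal sketch of the configuration described above, the following builds the relevant settings as a plain Python dict. The helper name static_member_config, the group and instance names, and the 5-minute session timeout are illustrative assumptions, not values from the Kafka documentation; how the dict is handed to a consumer depends on the client library in use.

```python
def static_member_config(group_id, instance_id, session_timeout_ms=300_000):
    """Build consumer settings for a static member (plain dict, no client library)."""
    if not isinstance(instance_id, str) or instance_id == "":
        # The documentation quoted above permits only non-empty strings.
        raise ValueError("group.instance.id must be a non-empty string")
    return {
        "group.id": group_id,
        # Must be unique within the group: one stable ID per consumer instance.
        "group.instance.id": instance_id,
        # A larger timeout (assumed value) so a short restart does not expire the member.
        "session.timeout.ms": session_timeout_ms,
    }


# Hypothetical names, for illustration only.
config = static_member_config("orders-group", "orders-consumer-1")
```

Each consumer instance in the group would need its own instance_id, for example one derived from the host name or a deployment slot.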

 Rebalance Recap

An earlier article discussed the Rebalance mechanism of Kafka consumer groups. Its main role is to assign partitions to all members of a consumer group. Both the Client side and the Broker side take part in the Rebalance process. On the Broker side, the Coordinator component is responsible for member management: it handles the JoinGroup, SyncGroup, Heartbeat and LeaveGroup requests sent by group members. On the Client side, the Leader Consumer member receives the members' subscription information from the Coordinator and draws up the assignment plan according to some strategy (Range / Round-Robin / Sticky / custom).
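To make the Leader's job concrete, here is a simplified sketch of a Range-style assignment for a single topic. It is a simplified illustration written for this article, not Kafka's assignor code; the function name range_assign and the member names are hypothetical.

```python
def range_assign(member_ids, num_partitions):
    """Assign partitions 0..num_partitions-1 of one topic in contiguous ranges."""
    members = sorted(member_ids)
    if not members:
        return {}
    base, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for index, member in enumerate(members):
        # The first `extra` members (in sorted order) get one additional partition.
        count = base + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


plan = range_assign(["consumer-b", "consumer-a", "consumer-c"], 7)
# plan == {"consumer-a": [0, 1, 2], "consumer-b": [3, 4], "consumer-c": [5, 6]}
```

Because the plan depends on the sorted member list, any change in membership changes the input, which is why a member leaving or joining forces the plan to be recomputed.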

A Rebalance occurs under three conditions:

  • The number of members changes, i.e., a new member joins or an existing member leaves the group (leaving includes both actively leaving and passively dropping out after a crash)
  • The number of subscribed topics changes
  • The number of partitions of a subscribed topic changes

In fact, the latter two conditions can be merged into one, so a Rebalance has only two trigger conditions: (1) the number of members changes; (2) the subscription information changes.

That article also covered the Rebalance process. First, every member of the group sends a JoinGroup request, and the Coordinator waits for a period of time for them to join. This period is determined by the largest max.poll.interval.ms value among all members (in Kafka Connect it is specified by the dedicated rebalance.timeout.ms parameter). After that, each member sends a SyncGroup request and waits for the Coordinator to send back the assignment plan, then begins normal consumption. While consuming, each consumer periodically (every heartbeat.interval.ms) sends a heartbeat to tell the Coordinator that it is still alive.

 Problems with Rebalance

In practice, Rebalances caused by members leaving the group are probably the most common, yet in some scenarios such a Rebalance is quite unreasonable. Our company, for example, had this pain point: when the Consumer's processing logic changes, the code has to be updated and redeployed, which causes a Rebalance. Restarting the Consumer may take only a few minutes, so consumption is interrupted for just those few minutes. There is no need for Kafka to trigger a Rebalance and reassign partitions; keeping the previous assignment plan is sufficient. Although the community's Sticky assignment strategy alleviates this problem to some extent, the Stop The World (STW) nature of Rebalance means that in production the fewer Rebalances, the better.

 Static Member

In the current Rebalance design, the Coordinator assigns every consumer instance in a group a member ID, the member.id. Many Kafka users have asked: can I set this member.id manually? Unfortunately, Kafka generates the member.id automatically. Before static members were introduced, the rule was client.id-UUID, where client.id is the value of the Consumer-side client.id parameter, and this ID changes with each round of Rebalance. In other words, the Coordinator cannot persist the member.id of a consumer instance. I suspect this is part of the reason why every member is forced to rejoin during a Rebalance: the Coordinator simply cannot remember who each member is. If you read the source code, you will find that the JoinGroup request a Client sends after restarting carries UNKNOWN_MEMBER_ID, an empty string that conveys no meaningful information to the Broker side. On receiving it, the Coordinator can only treat the sender as a new member. Conversely, if the member's identity could be remembered, the Coordinator could tolerate its brief absence without a Rebalance, shortening the window during which the consumer group as a whole is unavailable.

To this end, the community introduced, across versions 2.3 and 2.4, the concept of static members (Static Member) and a new Consumer-side parameter: group.instance.id. Once this parameter is configured, the member automatically becomes a static member; otherwise it is treated as a dynamic member, as before. You might think this new parameter is a way to persist the member.id yourself, but the member.id still cannot be specified by the user: for a static member it is built as `group.instance.id`-UUID. The difference is that the group.instance.id value stays the same every time the member restarts and comes back, so all partitions previously assigned to the member remain unchanged, and a static member that restarts and comes back before the session timeout does not trigger a Rebalance.
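The difference between the two kinds of member can be sketched with a toy model of the Coordinator's bookkeeping. This is not the broker's real logic: the class ToyCoordinator, its fields, and the choice to issue a fresh member.id to a returning static member while carrying over its assignment are assumptions made for illustration.

```python
import uuid


class ToyCoordinator:
    """Toy model of member bookkeeping; every join() carries UNKNOWN_MEMBER_ID."""

    def __init__(self):
        self.assignments = {}     # member.id -> list of partitions
        self.static_members = {}  # group.instance.id -> current member.id
        self.rebalances = 0

    def join(self, client_id, group_instance_id=None):
        if group_instance_id is not None and group_instance_id in self.static_members:
            # Known static member coming back: keep its previous assignment
            # and do not trigger a rebalance.
            old_id = self.static_members[group_instance_id]
            new_id = f"{group_instance_id}-{uuid.uuid4()}"
            self.assignments[new_id] = self.assignments.pop(old_id)
            self.static_members[group_instance_id] = new_id
            return new_id
        # Otherwise the sender can only be treated as a brand-new member.
        prefix = group_instance_id if group_instance_id is not None else client_id
        new_id = f"{prefix}-{uuid.uuid4()}"
        self.assignments[new_id] = []
        if group_instance_id is not None:
            self.static_members[group_instance_id] = new_id
        self.rebalances += 1
        return new_id


coordinator = ToyCoordinator()
first = coordinator.join("app", group_instance_id="worker-1")
coordinator.assignments[first] = [0, 1]  # pretend the rebalance assigned these
second = coordinator.join("app", group_instance_id="worker-1")  # after a restart
```

In this model the restart leaves coordinator.rebalances at 1 and the partitions [0, 1] stay with the member, whereas two joins without a group_instance_id would each count as a new member and a new rebalance.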

 Static member Rebalance conditions

Clearly, static members trigger a Rebalance less easily than dynamic members. With static members, the Rebalance trigger conditions become:

  • A new member joins the group: this condition is unchanged. A new member joining always triggers a Rebalance to reassign partitions
  • The Leader member rejoins the group: for example, when the topic assignment changes
  • An existing member is away from the group for longer than the session timeout: even for a static member, the Coordinator will not wait indefinitely. Once the session timeout is exceeded, a Rebalance is still triggered
  • The Coordinator receives a LeaveGroup request: the member actively notifies the Coordinator that it is leaving the group permanently. Kafka still has to provide a way for members to leave the group for good, in which case a Rebalance is still necessary
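The four conditions above can be written down as a small decision function. This is only a restatement of the list under assumed names (Event, triggers_rebalance), not code taken from Kafka.

```python
from enum import Enum


class Event(Enum):
    NEW_MEMBER_JOINED = "new member joined"
    LEADER_REJOINED = "leader rejoined"
    MEMBER_ABSENT = "member stopped sending heartbeats"
    LEAVE_GROUP_RECEIVED = "LeaveGroup received"


def triggers_rebalance(event, absent_ms=0, session_timeout_ms=0):
    """Return True if the event causes a Rebalance in a group of static members."""
    if event is Event.MEMBER_ABSENT:
        # An absent static member only matters once the session timeout is exceeded.
        return absent_ms > session_timeout_ms
    # New member, Leader rejoin, and LeaveGroup always trigger a Rebalance.
    return True
```

For example, a two-minute restart under a five-minute session timeout, triggers_rebalance(Event.MEMBER_ABSENT, 120_000, 300_000), evaluates to False.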

 Request Protocol Changes

To support group.instance.id, the protocol formats related to consumer groups were changed accordingly. According to the official website, the JoinGroup, SyncGroup, LeaveGroup and OffsetCommit request formats have all been changed. For example, both the Request and the Response of JoinGroup gained a group_instance_id field, as follows:

JoinGroup Request (Version: 5) => group_id session_timeout_ms rebalance_timeout_ms member_id group_instance_id protocol_type [protocols]
  group_id => STRING
  session_timeout_ms => INT32
  rebalance_timeout_ms => INT32
  member_id => STRING
  group_instance_id => NULLABLE_STRING
  protocol_type => STRING
  protocols => name metadata
    name => STRING
    metadata => BYTES

JoinGroup Response (Version: 5) => throttle_time_ms error_code generation_id protocol_name leader member_id [members]
  throttle_time_ms => INT32
  error_code => INT16
  generation_id => INT32
  protocol_name => STRING
  leader => STRING
  member_id => STRING
  members => member_id group_instance_id metadata
    member_id => STRING
    group_instance_id => NULLABLE_STRING
    metadata => BYTES

The changes to the other request formats are similar and are not listed here.
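To show where the new field sits on the wire, here is a rough sketch that serializes the body of a JoinGroup v5 request following the field order listed above. It assumes the standard Kafka primitive encodings (big-endian integers, STRING as INT16 length plus UTF-8 bytes, NULLABLE_STRING using length -1 for null, BYTES as INT32 length plus data, arrays prefixed by an INT32 count) and leaves out the size prefix and request header. The function names and sample values are hypothetical.

```python
import struct


def encode_string(value):
    """STRING: INT16 length followed by UTF-8 bytes."""
    data = value.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_nullable_string(value):
    """NULLABLE_STRING: like STRING, with length -1 meaning null."""
    if value is None:
        return struct.pack(">h", -1)
    return encode_string(value)


def encode_bytes(value):
    """BYTES: INT32 length followed by the raw bytes."""
    return struct.pack(">i", len(value)) + value


def encode_join_group_v5(group_id, session_timeout_ms, rebalance_timeout_ms,
                         member_id, group_instance_id, protocol_type, protocols):
    """Encode only the JoinGroup v5 request body (no size prefix, no header)."""
    out = encode_string(group_id)
    out += struct.pack(">i", session_timeout_ms)
    out += struct.pack(">i", rebalance_timeout_ms)
    out += encode_string(member_id)
    out += encode_nullable_string(group_instance_id)  # null for dynamic members
    out += encode_string(protocol_type)
    out += struct.pack(">i", len(protocols))  # array: INT32 element count
    for name, metadata in protocols:
        out += encode_string(name) + encode_bytes(metadata)
    return out


# Empty member_id plays the role of UNKNOWN_MEMBER_ID; metadata is left empty.
body = encode_join_group_v5("g", 300000, 300000, "", "worker-1",
                            "consumer", [("range", b"")])
```

A dynamic member would pass None for group_instance_id, which is encoded as the two bytes 0xFF 0xFF.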

 Other Changes

Since a static member that restarts or is temporarily unavailable no longer triggers a Rebalance, the community also changed the maximum session timeout allowed for consumer groups. The Broker-side parameter group.max.session.timeout.ms previously defaulted to 5 minutes, and completing an application restart within a short session timeout is usually difficult, so the community raised the default to 30 minutes. That is, a Consumer configured as a static member can set a session timeout of up to 30 minutes, and as long as its code update and restart complete within that timeout, the Consumer Group does not Rebalance. Of course, the Consumer's consumption progress is interrupted during this time, but the partition assignment does not change.
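The relationship between the Broker-side bounds and the Consumer's own session timeout can be sketched as follows. The constants mirror what are assumed here to be the broker defaults for group.min.session.timeout.ms and group.max.session.timeout.ms (6 seconds and 30 minutes); the function names are hypothetical.

```python
GROUP_MIN_SESSION_TIMEOUT_MS = 6_000      # assumed broker default (6 seconds)
GROUP_MAX_SESSION_TIMEOUT_MS = 1_800_000  # assumed broker default (30 minutes)


def session_timeout_allowed(session_timeout_ms,
                            min_ms=GROUP_MIN_SESSION_TIMEOUT_MS,
                            max_ms=GROUP_MAX_SESSION_TIMEOUT_MS):
    """A consumer's session.timeout.ms must lie within the broker's bounds."""
    return min_ms <= session_timeout_ms <= max_ms


def restart_avoids_rebalance(restart_ms, session_timeout_ms):
    """A static member must be back before its session timeout expires."""
    return restart_ms < session_timeout_ms


ten_minutes = 10 * 60 * 1000
three_minutes = 3 * 60 * 1000
ok = session_timeout_allowed(ten_minutes) and restart_avoids_rebalance(
    three_minutes, ten_minutes)
```

Note that the larger maximum only permits a longer timeout; the Consumer still has to set its own session.timeout.ms high enough to cover the expected restart time.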

 Summary

At present, part of the static member feature has been integrated into Kafka 2.3, while some features are still under development and will land in the next version, 2.4. Judging from the current design, the static member mechanism can help us avoid many unnecessary Rebalances in production, and it is a very exciting new feature. Meanwhile, the community is preparing a major fix for Stop The World Rebalance, the so-called Incremental Cooperative Rebalance. The general idea is to let individual consumer instances rebalance on their own in an incremental, progressive manner, avoiding a global STW. The related code is under development, and I will introduce this feature in a follow-up article.

Origin www.cnblogs.com/huxi2b/p/11386847.html