What is the Controller Broker in Kafka

The Controller is a core component of Apache Kafka. Its main job is to manage and coordinate the entire Kafka cluster with the help of Apache ZooKeeper. Any broker in the cluster can act as the controller, but at any given time only one broker can be the controller and carry out its management and coordination duties. Below we discuss the controller's principles and internal mechanics. From this article you will learn:

  • What the Controller Broker is
  • How the Controller Broker is elected
  • What the main responsibilities of the Controller Broker are
  • How Kafka handles split brain

What is a Controller Broker

In a distributed system there is usually a coordinator that plays a special role when something goes wrong. In Kafka, that coordinator is called the controller (Controller). The controller is actually nothing special: it is an ordinary broker that simply takes on some extra work, such as tracking the other brokers in the cluster, handling newly added and failed broker nodes when appropriate, rebalancing partitions, and assigning new leader partitions. Note that there is only ever one Controller Broker in a Kafka cluster.

How the Controller Broker is elected

The previous section explained what a Controller Broker is, and every broker has the potential to act as the controller. So how is the controller elected? When the cluster starts, how does Kafka determine which broker the controller lives on?

In fact, when a broker starts, it tries to create a /controller node in ZooKeeper. Kafka's current rule for electing the controller is simple: the first broker that successfully creates the /controller node becomes the controller. The /controller node is an ephemeral znode, so it disappears automatically if the controller's ZooKeeper session ends.
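As a minimal sketch of this rule, the following uses the plain ZooKeeper Java client rather than Kafka's actual election code. The ZooKeeper address, session timeout, and broker id are illustrative assumptions:

```java
import org.apache.zookeeper.*;

// Minimal sketch of "first to create /controller wins". Assumes a
// ZooKeeper ensemble at localhost:2181; not Kafka's real election code.
public class ControllerElectionSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 18000, event -> { });
        try {
            // EPHEMERAL: the znode disappears automatically when the
            // creating broker's ZooKeeper session ends.
            zk.create("/controller", "brokerId=1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("Won the election: this broker is now the controller");
        } catch (KeeperException.NodeExistsException e) {
            // Some other broker created /controller first and won.
            System.out.println("Another broker is already the controller");
        }
    }
}
```

Because the znode is ephemeral, a failed controller's session expiry deletes /controller automatically, which is exactly what triggers a re-election, as we will see in the split-brain discussion below.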

What are the main responsibilities of the Controller Broker

The Controller Broker has many responsibilities, mostly management actions, including the following (a client-side sketch follows the list):

  • Creating and deleting topics, adding partitions, and assigning leader partitions
  • Managing the brokers in the cluster (brokers joining, shutting down gracefully, or failing)
  • Preferred leader election
  • Partition reassignment
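Many of these actions are initiated by clients and then carried out by the controller. As a hedged illustration of the first item, here is topic creation through Kafka's AdminClient; the broker address, topic name, partition count, and replication factor are all illustrative:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

// Minimal sketch: create a topic via the AdminClient. The controller
// handles the actual partition and leader assignment server-side.
public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 2);
            admin.createTopics(List.of(topic)).all().get(); // block until done
        }
    }
}
```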

Handling brokers going offline in the cluster

When a broker leaves the Kafka cluster because of a failure, the leader partitions that live on that broker become unavailable (clients read from and write to leader partitions only). To minimize downtime, a replacement leader partition must be found quickly.

The Controller Broker learns about failed brokers through ZooKeeper watches. ZooKeeper gives clients the ability to monitor changes to znodes; this is the so-called watch notification mechanism. When a znode is created or deleted, when its set of child nodes changes, or when the data stored in it changes, ZooKeeper explicitly notifies the client through the registered change handler (ChangeHandler).

After each broker starts, it creates an ephemeral znode under /brokers/ids in ZooKeeper. When the broker crashes or shuts down gracefully, its session with ZooKeeper ends and the znode is deleted automatically. ZooKeeper's watch mechanism pushes this change to the controller, so the controller knows the broker is gone and can perform the follow-up coordination.
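A minimal sketch of this liveness tracking, again using the plain ZooKeeper Java client rather than Kafka's actual controller code (the ZooKeeper address and timeout are illustrative):

```java
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

// Minimal sketch: watch the children of /brokers/ids the way a
// controller-like process could. ZooKeeper watches are one-shot,
// so the watcher re-registers itself after every notification.
public class BrokerLivenessWatchSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 18000, event -> { });
        watchBrokers(zk);
        Thread.sleep(Long.MAX_VALUE); // stay alive to keep receiving watches
    }

    static void watchBrokers(ZooKeeper zk) throws Exception {
        // getChildren registers a watch that fires once on the next
        // membership change (a broker joining or its znode vanishing).
        List<String> ids = zk.getChildren("/brokers/ids", event -> {
            try {
                watchBrokers(zk); // re-register and read the new membership
            } catch (Exception ignored) { }
        });
        System.out.println("Live broker ids: " + ids);
    }
}
```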

After receiving such a notification, the controller takes action: it determines which partitions on the failed broker need a new leader, and then notifies each affected broker, via a LeaderAndIsr request, either to become the leader of a topic partition or to become a follower and start replicating data from the newly elected leader.

Handling brokers newly added to the cluster

Spreading leader partition replicas evenly across the brokers in the cluster keeps the load balanced. When a broker fails, some partition replicas on other brokers are elected as leaders, which leaves multiple leader replicas concentrated on a single broker. Because clients interact only with leader replicas, this places extra load on that broker and hurts the performance and health of the cluster. Restoring the balance as soon as possible therefore benefits the cluster's healthy operation.

Kafka considers the initial assignment of leader replicas (made while every node is alive) to be balanced. The replica chosen first for each partition is the so-called preferred leader. Since Kafka also supports rack-aware leader election, it tries to place leader and follower partitions on different racks to improve tolerance to rack failures. The placement of leader replicas therefore affects the reliability of the cluster.

By default, auto.leader.rebalance.enabled is true, which allows Kafka to periodically re-elect the preferred leader for some topic partitions. In most cases a broker failure is short-lived, meaning the broker usually recovers quickly, so when a node leaves the cluster, the metadata associated with it is not deleted immediately.

When the controller notices that a broker has rejoined the cluster, it uses the broker ID to check whether that broker holds any partitions. If it does, the controller notifies both the newly joined broker and the existing brokers, and the follower partitions on the rejoined broker start replicating messages from the existing leaders again. Then, to restore the balance, the controller elects follower partitions on the rejoined broker back to leader partitions.

Note: the leader election described above is strictly a change of leader. To restore the balance it may force a perfectly healthy leader partition to step down to a follower. Changing leaders is expensive: every client that was sending requests to leader partition A (the original leader) must switch to sending requests to B (the new leader). It is therefore recommended to set this parameter to false in production.
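If automatic rebalancing is disabled, preferred leader election can instead be triggered manually during a quiet period. A minimal sketch using the AdminClient, assuming Kafka 2.4+ (the broker address, topic, and partition are illustrative):

```java
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.ElectionType;
import org.apache.kafka.common.TopicPartition;

// Minimal sketch: ask the controller to move leadership of a partition
// back to its preferred replica, at a time the operator chooses.
public class PreferredLeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        try (Admin admin = Admin.create(props)) {
            Set<TopicPartition> partitions = Set.of(new TopicPartition("demo-topic", 0));
            admin.electLeaders(ElectionType.PREFERRED, partitions).partitions().get();
        }
    }
}
```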

The In-Sync Replica (ISR) list

The replicas in the ISR are all the replicas that are in sync with the leader; followers not on the list are considered out of sync with the leader. So which replicas belong to the ISR? First, be clear that the leader replica is always in the ISR. Whether a follower replica is in the ISR depends on whether it is "in sync" with the leader replica.

It is very important to always maintain a sufficient number of in-sync replicas: to be promoted to leader, a follower must be on the in-sync replica list. Each partition has its own ISR list, which is updated by the leader partition and the controller.

Electing the new leader from the in-sync replica list is called a clean leader election. It should be distinguished from electing the leader from among the out-of-sync replicas, which is called an unclean leader election. Because the ISR is adjusted dynamically, it can end up empty. Out-of-sync replicas generally lag far behind the leader, so electing one of them as the new leader can lose data: the messages they hold fall well behind those in the old leader.

In Kafka, whether this kind of election is allowed is controlled by the broker-side parameter unclean.leader.election.enable. Enabling unclean leader election risks data loss, but it ensures the partition always has a leader and never stops serving requests, which improves availability. Conversely, forbidding unclean leader election preserves data consistency and avoids message loss, at the cost of availability. This is the trade-off described by the CAP theorem for distributed systems.
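The parameter can also be overridden per topic. A minimal sketch using the AdminClient's incremental config API, assuming Kafka 2.3+ (the broker address and topic name are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

// Minimal sketch: explicitly forbid unclean leader election for one
// topic, trading availability for consistency as discussed above.
public class UncleanElectionConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "demo-topic");
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("unclean.leader.election.enable", "false"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(op))).all().get();
        }
    }
}
```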

Unfortunately, even electing a leader from the in-sync replica list may not fully prevent data inconsistency, because the in-sync replicas are not perfectly synchronized. Since replication is done asynchronously, there is no guarantee that a follower holds the very latest messages. For example, the offset of the leader partition's last message may be 100 while a follower's latest offset is not yet 100. Whether a follower still counts as in sync is governed by two parameters (a sketch for inspecting them follows the list):

  • replica.lag.time.max.ms : how long a follower may go without fetching from (or catching up to) the leader before it is removed from the ISR
  • zookeeper.session.timeout.ms : the session timeout between a broker and ZooKeeper; a broker that misses heartbeats for this long is considered dead
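A minimal sketch for checking the current values of these two broker-side parameters through the AdminClient (the broker address and broker id are illustrative):

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

// Minimal sketch: read two broker-side settings that influence ISR
// membership and broker liveness, for the broker with id "1".
public class DescribeBrokerConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        try (Admin admin = Admin.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
            Config config = admin.describeConfigs(List.of(broker)).all().get().get(broker);
            System.out.println(config.get("replica.lag.time.max.ms").value());
            System.out.println(config.get("zookeeper.session.timeout.ms").value());
        }
    }
}
```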

Split brain

If the controller broker fails, the Kafka cluster must find a replacement, or the cluster cannot function normally. The problem is that it is hard to tell whether a broker has actually died or is only experiencing a temporary failure. Yet the cluster needs a controller to operate, so a new one must be elected. If the old controller later comes back and does not know it has been replaced, the cluster ends up with two controllers.

In fact, this situation can easily arise. For example, a controller might be declared dead because of a long GC pause, and a new controller is elected. From the old controller's point of view nothing has changed; it does not even know it was ever suspended, so it keeps acting as the current controller. This situation, common in distributed systems, is called split brain.

Suppose the active controller enters a long GC pause. Its ZooKeeper session expires and the /controller node it registered is deleted. The other brokers in the cluster receive this notification from ZooKeeper.


Since the cluster must always have a controller broker, every broker now tries to become the new controller. Suppose Broker 2 is faster and becomes the new controller. Each broker then receives a notification that Broker 2 is the new controller. However, Broker 3 is in the middle of a "stop the world" GC pause and may never receive that notification.

After Broker 3's GC finishes, it will still believe it is the controller of the cluster; from Broker 3's point of view, nothing has happened.


Now there are two controllers in the cluster, and they may issue conflicting commands at the same time. This is the split brain phenomenon. If it is not handled, serious inconsistencies can result, so we need a way to tell which controller is the latest one in the cluster.

Kafka solves this with an epoch number (also known as a fencing token). The epoch number is simply a monotonically increasing integer: the first elected controller has epoch number 1; if a new controller is then elected, its epoch number is 2, and so on.

Each newly elected controller obtains a new, larger epoch number through ZooKeeper's conditional increment operation. Once the other brokers know the current epoch number, they ignore any message from a controller that carries an older (smaller) epoch number. In other words, brokers identify the latest controller by the largest epoch number they have seen.

For example, suppose Broker 3 sends Broker 1 a command (make some partition replica on Broker 1 the leader) stamped with epoch number 1, while Broker 2 sends Broker 1 the same kind of command stamped with epoch number 2. Broker 1 obeys only Broker 2's command (because of its larger epoch number) and ignores Broker 3's, which avoids the split brain.
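A minimal sketch of this fencing rule, not Kafka's actual code (the epoch values follow the scenario above):

```java
// Minimal sketch of epoch-based fencing: a broker remembers the largest
// controller epoch it has seen and ignores requests with a smaller one.
public class EpochFencingSketch {
    private int highestSeenEpoch = 0;

    // Returns true if the request should be processed, false if fenced off.
    synchronized boolean accept(int requestEpoch) {
        if (requestEpoch < highestSeenEpoch) {
            return false; // stale controller, e.g. a zombie back from a GC pause
        }
        highestSeenEpoch = requestEpoch;
        return true;
    }

    public static void main(String[] args) {
        EpochFencingSketch broker1 = new EpochFencingSketch();
        System.out.println(broker1.accept(2)); // Broker 2's command, epoch 2 -> true
        System.out.println(broker1.accept(1)); // Broker 3's stale command, epoch 1 -> false
    }
}
```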

Summary

This article explained what the Kafka Controller is: essentially an ordinary broker that takes on some extra work, behaving much like any other broker otherwise. It also introduced the controller's main responsibilities and explained several of them in detail. Finally, it showed how Kafka avoids split brain.
