Kafka Replica Mechanism

 

What a replica is

Kafka organizes data by topic, and each topic is further divided into several partitions. Replicas are actually defined at the partition level: each partition is configured with some number of replicas.

A replica (Replica) is essentially a commit log to which messages can only be appended. Under Kafka's replica mechanism, all replicas of a partition store the same sequence of messages, and these replicas are spread across different brokers, so the partition's data stays available even if some brokers go down.

In a real production environment, each broker may hold replicas of different partitions belonging to different topics, so it is perfectly normal for a single broker to store hundreds of replicas. Next, look at the figure below; it shows how replicas are distributed across a Kafka cluster of three brokers.

From this figure we can see that the three replicas of partition 0 of topic 1 are spread over the three brokers, and the partitions of the other topics likewise scatter their replicas across different brokers, thereby achieving data redundancy.
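
As a concrete illustration of how such a layout is configured (not part of the original figure), here is a minimal sketch using Kafka's Java AdminClient to create a topic whose single partition has three replicas; the bootstrap addresses and topic name are placeholders you would replace with your own.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical broker addresses; point these at your own cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                  "broker1:9092,broker2:9092,broker3:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // One partition with a replication factor of 3: Kafka places the three
            // replicas of that partition on three different brokers, as in the figure.
            NewTopic topic = new NewTopic("demo-topic", 1, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}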

 

 

Replica roles

Since a partition can be configured with multiple replicas, and the contents of these replicas must stay consistent, a natural question arises: how do we keep the data in all of these replicas consistent?

In particular for Kafka: when a producer sends a message to a topic, how is that message propagated to all of the partition's replicas? The most common answer to this problem is leader-based replication, and Apache Kafka adopts exactly this design.

The principle of leader-based replication is shown in the figure below; let me briefly explain what it depicts.

First, in Kafka, replicas fall into two categories: the leader replica (Leader Replica) and follower replicas (Follower Replica). When a partition is created, one replica must be elected as the leader, and the remaining replicas automatically become followers.

Second, Kafka's replica mechanism is stricter than that of some other distributed systems. In Kafka, follower replicas do not serve clients: no follower replica may respond to read or write requests from consumers or producers. All read and write requests must be handled by the leader replica, that is, they must be sent to the broker hosting the leader replica, and that broker is responsible for processing them. A follower replica's only task is to asynchronously pull messages from the leader and append them to its own commit log, thereby staying in sync with the leader.

Third, when the leader replica fails, or the broker hosting it goes down, Kafka relies on ZooKeeper's watch mechanism to detect this in real time and immediately starts a new round of leader election, picking one of the followers as the new leader. When the old leader comes back after a restart, it can only rejoin the cluster as a follower.
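
To see this leader/follower assignment for yourself, you can query the cluster's partition metadata. The following sketch uses the Java AdminClient's describeTopics call to print the leader, the full replica list, and the ISR for each partition; the bootstrap address and topic name are assumed placeholders.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

import java.util.Collections;
import java.util.Properties;

public class ShowPartitionLeaders {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singletonList("demo-topic"))
                                         .all().get().get("demo-topic");
            // For each partition: which broker is the leader, where the replicas live,
            // and which replicas are currently in the ISR.
            desc.partitions().forEach(p ->
                System.out.printf("partition %d: leader=%s replicas=%s isr=%s%n",
                        p.partition(), p.leader(), p.replicas(), p.isr()));
        }
    }
}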

Pay special attention to the second point above: follower replicas do not serve clients. Remember that when we discussed the benefits of replication, I said Kafka's replicas do not provide horizontal read scaling or improved data locality? This is exactly why.

From a client's point of view, Kafka's follower replicas contribute nothing in this respect: they can neither help the leader shoulder read traffic the way MySQL read replicas do, nor be placed close to clients to improve data locality.

That being the case, why did Kafka choose this design? In fact, this replication scheme has two benefits.

 

1. It makes it easy to provide "Read-your-writes" consistency.

Read-your-writes, as the name suggests, means that after you successfully write a message to Kafka with the producer API, you can immediately read that message back with the consumer API. For example, when you post on Weibo, you certainly expect to see the post right after publishing it; that is a typical read-your-writes scenario. If follower replicas were allowed to serve reads, then because replication is asynchronous, a follower might not yet have pulled the latest messages from the leader, and the client would not see the message it had just written.
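
Below is a minimal sketch of this read-your-writes flow using the standard Java producer and consumer clients: write one message, wait for the leader's acknowledgement, then read it back starting from the offset the leader returned. The broker address, topic name, and group id are placeholders, not values from the original article.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ReadYourWrites {
    public static void main(String[] args) throws Exception {
        String topic = "demo-topic"; // placeholder topic name

        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        p.put(ProducerConfig.ACKS_CONFIG, "all"); // block until the write is fully acknowledged

        RecordMetadata meta;
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            meta = producer.send(new ProducerRecord<>(topic, "hello")).get();
        }

        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "read-your-writes-demo");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            TopicPartition tp = new TopicPartition(topic, meta.partition());
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, meta.offset()); // start exactly at the offset we just wrote
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.println("read back: " + r.value()));
        }
    }
}

Because all reads go to the leader, the message acknowledged by the producer is immediately visible to the consumer; if the read were served by a lagging follower, the poll might return nothing.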

 

2. It makes it easy to provide monotonic reads (Monotonic Reads).

What are monotonic reads? From a consumer's point of view, when it reads messages multiple times, it should never see a message exist at one moment and disappear the next. If follower replicas were allowed to serve reads, suppose there are two followers, F1 and F2, both asynchronously pulling data from the leader. If F1 has pulled the leader's latest messages but F2 has not yet caught up, then a consumer that first reads from F1 and later reads from F2 may observe exactly this phenomenon: a message it saw on the first read has vanished on the second, which violates monotonic-read consistency. But because all read requests are handled by the leader, Kafka can easily provide monotonic reads.

 

 

In-sync Replicas (ISR)

We have said repeatedly that follower replicas do not serve clients; they only pull data from the leader asynchronously and periodically. Since replication is asynchronous, there is a risk that a follower cannot stay in real-time sync with the leader. Before discussing how to handle this risk properly, we first need to pin down exactly what "in sync" means. In other words, Kafka should tell us clearly under what conditions a follower replica counts as being in sync with the leader.

Based on this idea, Kafka introduced In-sync Replicas, also known as the ISR replica set. Replicas in the ISR are considered in sync with the leader; conversely, any follower not in the ISR is considered out of sync with the leader. So, which replicas actually make it into the ISR?

First be clear that the leader replica is naturally in the ISR. In other words, the ISR is not just a set of follower replicas; it necessarily includes the leader. In some cases, the ISR may even contain only the leader replica.

In addition, a follower replica must meet a certain condition to enter the ISR. As for what that condition is, let me keep it a secret for a moment; first, let's look at the figure below together.

In the figure there are three replicas: one leader and two followers. The leader has currently written 10 messages, Follower1 has synchronized 6 of them, and Follower2 has synchronized only 3. Now, think about it: which of the two followers would you say is in sync with the leader?

The answer is: it depends on the circumstances. Or, in English, the famous phrase "It depends." It looks as though Follower2 has far fewer messages than the leader and is therefore most likely out of sync with the leader. That is a reasonable guess, but it is only a possibility.

In fact, both followers in this figure may be out of sync with the leader, and both may also be in sync with the leader. In other words, Kafka's standard for judging whether a follower is in sync with the leader is not the gap in message counts but another "secret."

That standard is the broker-side parameter replica.lag.time.max.ms. It specifies the maximum amount of time a follower replica is allowed to lag behind the leader, and its current default value is 10 seconds. This means that as long as a follower has not continuously lagged behind the leader for more than 10 seconds, Kafka considers the follower to be in sync with the leader, even if the follower currently holds noticeably fewer messages than the leader replica.

As mentioned earlier, a follower replica's only job is to keep pulling messages from the leader and appending them to its own log. If this synchronization keeps running slower than the rate at which the leader writes messages, then after replica.lag.time.max.ms has elapsed the follower is judged to be out of sync with the leader and is no longer qualified to stay in the ISR. At that point Kafka automatically shrinks the ISR and "kicks" the replica out.

It is worth noting that if such a replica later catches up with the leader again, it can be added back into the ISR. This also shows that the ISR is a dynamically adjusted set, not a static one.
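
To make the time-based criterion concrete, here is a simplified sketch of the check in Java. It only illustrates the rule described above; the broker's real implementation is more involved and tracks, among other things, the last time each follower fully caught up with the leader's log end offset.

public class IsrCheckSketch {

    // Broker config replica.lag.time.max.ms, default 10 seconds.
    static final long REPLICA_LAG_TIME_MAX_MS = 10_000L;

    // lastCaughtUpTimeMs: the last time this follower had fully caught up with the leader.
    static boolean isInSync(long nowMs, long lastCaughtUpTimeMs) {
        // A follower stays in the ISR as long as it has caught up with the leader
        // at least once within the last replica.lag.time.max.ms milliseconds,
        // regardless of how many messages it is currently behind.
        return nowMs - lastCaughtUpTimeMs <= REPLICA_LAG_TIME_MAX_MS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(isInSync(now, now - 3_000));  // caught up 3s ago  -> still in the ISR
        System.out.println(isInSync(now, now - 15_000)); // caught up 15s ago -> shrunk out of the ISR
    }
}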

 

 

Unclean Leader Election

Since the ISR can be adjusted dynamically, a situation naturally arises in which the ISR becomes empty. Because the leader replica is naturally part of the ISR, an empty ISR means the leader replica has also "died," and Kafka needs to elect a new leader. But with the ISR empty, how should the new leader be chosen?

Kafka calls all surviving replicas that are not in the ISR out-of-sync replicas. In general, out-of-sync replicas lag too far behind the leader, so electing one of them as the new leader may cause data loss; after all, the messages these replicas hold fall far behind those of the old leader. In Kafka, electing such a replica as the leader is called an unclean leader election, and the broker-side parameter unclean.leader.election.enable controls whether unclean leader election is allowed.

Enabling unclean leader election may result in data loss, but its advantage is that the partition always has a leader replica and never stops serving, which improves availability. Conversely, the benefit of disabling unclean leader election is that it preserves data consistency and avoids message loss, at the cost of availability.

If you have heard of the CAP theorem, you know that a distributed system can usually satisfy only two of consistency (Consistency), availability (Availability), and partition tolerance (Partition tolerance) at the same time. Clearly, on this question Kafka gives you the right to choose C or A.

You can decide whether to enable unclean leader election based on your actual business scenario. However, I strongly recommend that you do not enable it; after all, there are other ways to improve availability. Sacrificing data consistency for this small gain in availability is simply not worth it.
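
If you want to pin this choice down explicitly, the setting can also be overridden per topic. The sketch below uses the Java AdminClient's incrementalAlterConfigs call to set unclean.leader.election.enable to false for one topic, matching the recommendation above; the broker address and topic name are placeholders.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class DisableUncleanElection {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "demo-topic");
            // Explicitly forbid out-of-sync replicas from being elected leader for this topic.
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("unclean.leader.election.enable", "false"),
                    AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates =
                    Collections.singletonMap(topic, Collections.singletonList(op));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}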

 

 

Replica maintenance

When Kafka starts up, it launches two background tasks.

One task periodically checks whether the ISR set needs to shrink or expand; the check interval is half of replica.lag.time.max.ms, which is 5000 ms by default. When it detects a failed replica in the ISR, it shrinks the ISR; when it finds that a follower has caught up to the leader's high watermark, it expands the ISR. In addition, whenever the ISR set changes, the change record is cached in isrChangeSet.

The other task periodically checks this set. If it finds ISR change records in it, it persists a node in ZooKeeper. Because the Controller has registered a Watcher on that node's path, it can detect the ISR change and send metadata-update requests to the brokers it manages. Finally, the nodes under that path that have already been processed are deleted.

In addition, before version 0.9.x Kafka had another parameter, replica.lag.max.messages, which was also used to identify failed replicas: when a replica lagged behind the leader replica by more than this number of messages, it was judged to be out of sync. The failed replicas identified by this parameter and those identified by replica.lag.time.max.ms were combined (as a union) into a single set of failed replicas.

However, it is hard to give this parameter a suitable value. Take the default of 4000 as an example: for a topic with a very low message inflow rate (say, a TPS of 10), the parameter is essentially useless; for a topic with a very high inflow rate (say, a TPS of 2000), this value causes frequent ISR churn. So starting from version 0.9.x, Kafka removed this parameter entirely.
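
A quick back-of-the-envelope calculation, written here as a small Java snippet purely for illustration, shows why a single message-count threshold is hard to tune:

public class LagThresholdExample {
    public static void main(String[] args) {
        long replicaLagMaxMessages = 4000;   // the old default of replica.lag.max.messages
        long lowTps = 10, highTps = 2000;    // messages per second flowing into the topic

        // At 10 msg/s a follower would have to stall for 400 s before it lags by 4000
        // messages, so the threshold almost never fires.
        System.out.println(replicaLagMaxMessages / lowTps + " s to exceed the threshold at TPS 10");

        // At 2000 msg/s a 2 s hiccup already exceeds the threshold, so followers get
        // kicked out of the ISR and re-added as soon as they catch up: frequent ISR churn.
        System.out.println(replicaLagMaxMessages / highTps + " s to exceed the threshold at TPS 2000");
    }
}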
