Understanding the Kafka replication mechanism


What makes a distributed system easy to operate is, in a way, an art, and it is usually achieved by distilling lessons from a great deal of practice. The popularity of Apache Kafka is attributable in large part to the simplicity of its design and operation. As the community adds more features, its developers come back and rethink ways to simplify complex behaviors.

One of Apache Kafka's more subtle features is its replication protocol. Tuning Kafka replication so that it works for workloads of different sizes on a single cluster is somewhat tricky today. One of the things that makes this a particularly difficult challenge is knowing how to prevent replicas from moving in and out of the in-sync replica list (also known as the ISR). From the user's perspective, this means that if a producer sends a batch of messages that is "large enough", it can trigger multiple alerts on the Kafka brokers. These alerts indicate that some topics are "under replicated", meaning the data has not been copied to a sufficient number of brokers, which in turn increases the probability of data loss. It is therefore very important that a Kafka cluster closely monitor the total number of "under replicated" partitions. In this article, I will discuss the root cause of this behavior and how we solved the problem.
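Because the under-replicated partition count matters so much operationally, it is worth showing how it can be watched. Below is a minimal monitoring sketch, assuming the broker exposes JMX on localhost:9999 (a hypothetical endpoint); it reads the broker's standard UnderReplicatedPartitions gauge.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class UnderReplicatedCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical JMX endpoint; the broker must be started with JMX enabled.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Kafka's standard gauge for partitions whose ISR is smaller
            // than the full replica set.
            ObjectName mbean = new ObjectName(
                "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions");
            Object value = conn.getAttribute(mbean, "Value");
            System.out.println("Under-replicated partitions: " + value);
        }
    }
}
```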


Kafka replication in one minute

Every partition of a Kafka topic has a write-ahead log, and the messages we write to Kafka are stored there. Each message in the log has a unique offset that identifies its position within the partition's log, as shown below:

[Figure: a topic partition's write-ahead log, where each message is identified by a monotonically increasing offset]
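To make the offsets concrete, here is a minimal producer sketch (assuming a broker at the hypothetical address localhost:9092 and a topic named foo) that prints the offset the broker assigns to each message:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class OffsetDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 3; i++) {
                // Block until the broker appends the record and assigns its offset.
                RecordMetadata md = producer
                    .send(new ProducerRecord<>("foo", "msg-" + i))
                    .get();
                System.out.printf("partition=%d offset=%d%n", md.partition(), md.offset());
            }
        }
    }
}
```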

Each topic partition in Kafka is replicated n times, where n is the topic's replication factor. This allows Kafka to automatically fail over to one of these replicas when a server in the cluster fails, so that messages remain available in the presence of failures. Replication in Kafka happens at the partition granularity: the partition's write-ahead log is copied, in order, to n servers. Of the n replicas, one acts as the leader and the others are followers. As the name suggests, producers can only write to the leader partition (reads are likewise served only by the leader), while followers merely copy the leader's log in order.

[Figure: a partition replicated across brokers, with one leader and the remaining replicas acting as followers]
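As a concrete sketch (assuming a modern Java client and a broker at the hypothetical address localhost:9092), this is how a topic with one partition and replication factor 3, like the foo topic used later in this article, could be created:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        try (Admin admin = Admin.create(props)) {
            // 1 partition, replication factor 3: one leader plus two followers.
            NewTopic topic = new NewTopic("foo", 1, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```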

The basic guarantee a log replication algorithm must provide is that if it tells the client a message has been committed and the current leader then fails, the newly elected leader must also have that message. On failure, Kafka elects the new leader for the partition from among the followers in that partition's ISR; it can do so precisely because every follower in the ISR has kept up with the leader's log.

The leader of each partition maintains an in-sync replica list (also known as the ISR). When a producer sends a message to the broker, the message is first written by the partition's leader and then copied to all of the partition's replicas. A message is considered committed only after it has been successfully copied to all of the in-sync replicas. Because replication latency is bounded by the slowest in-sync replica, it is important to detect slow replicas quickly and remove them from the ISR. The exact details of the Kafka replication protocol have some nuances, and this post is not intended to be an exhaustive discussion of the topic; interested readers can go here to learn more about how Kafka replication works.
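The "committed" guarantee above is what a producer opts into with acks=all: the send does not complete until the leader has replicated the record to every replica currently in the ISR. A minimal sketch, again assuming a hypothetical localhost:9092 broker and the foo topic:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CommittedWriteDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the leader acknowledges the send only after every replica
        // currently in the ISR has copied the record, i.e. once it is committed.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("foo", "a committed message")).get();
        }
    }
}
```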

When is a replica considered caught up with the leader?

A replica may be marked as out of sync if it has not kept up with the leader's log. Let me explain what "caught up" means through an example. Suppose we have a topic named foo with a single partition and a replication factor of 3. Assume the replicas of this partition live on brokers 1, 2 and 3, and that 3 messages have been committed on topic foo. The replica on broker 1 is the leader, replicas 2 and 3 are followers, and all replicas are part of the ISR. Suppose replica.lag.max.messages is set to 4, which means that as long as a follower is behind the leader by no more than 3 messages, it will not be removed from the ISR. And suppose replica.lag.time.max.ms is set to 500 milliseconds, which means that as long as the followers send a fetch request to the leader at least every 500 ms, they will not be marked as dead and will not be removed from the ISR.

[Figure: topic foo with replication factor 3; messages at offsets 0-2 are committed on all three replicas and every replica is in the ISR]
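For concreteness, the two thresholds in this example would be set broker-side. The following is a sketch only: in a real deployment these values live in each broker's server.properties file, and replica.lag.max.messages exists only in 0.8.x-era Kafka (as the rest of this article explains, it was removed later).

```java
import java.util.Properties;

// A sketch of the broker-side settings used in this example.
public class ExampleBrokerSettings {
    public static void main(String[] args) {
        Properties broker = new Properties();
        // A follower whose log trails the leader by 4 or more messages
        // (i.e. more than 3) becomes eligible for removal from the ISR.
        broker.put("replica.lag.max.messages", "4");
        // A follower that has not sent a fetch request for 500 ms is
        // treated as dead and removed from the ISR.
        broker.put("replica.lag.time.max.ms", "500");
        broker.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```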

Now suppose the producer sends the next message to the leader, and at that same moment a GC pause hits broker 3. The partitions on each broker now look like this:

[Figure: broker 3 is paused in GC; the new message at offset 3 is on the leader (broker 1) and on broker 2, but not yet on broker 3]

Since broker 3 is in the ISR, the newest message is not considered committed until either broker 3 is removed from the ISR or the partition on broker 3 catches up to the leader's log end offset. Note that because broker 3 is behind the leader by fewer than replica.lag.max.messages = 4 messages, it does not qualify for removal from the ISR. This means the partition on broker 3 needs to fetch the message at offset 3 from the leader; once it does, this replica has caught up with the leader. Suppose broker 3 finishes its GC pause within 100 ms and catches up to the leader's log end offset. The latest state then looks like this:

[Figure: broker 3 has caught up to the leader's log end offset, so the message at offset 3 is now committed]
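To observe this kind of ISR movement from the outside, one option is the describeTopics call of the Java Admin client. A minimal sketch, again with a hypothetical broker address:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class IsrInspector {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        try (Admin admin = Admin.create(props)) {
            TopicDescription desc = admin.describeTopics(List.of("foo"))
                .all().get().get("foo");
            // Print the leader and current ISR membership of each partition.
            for (TopicPartitionInfo p : desc.partitions()) {
                System.out.printf("partition=%d leader=%s isr=%s%n",
                    p.partition(), p.leader(), p.isr());
            }
        }
    }
}
```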

What causes a replica to fall out of sync with the leader?

There are many reasons a replica may fall out of sync with the leader. The main ones are:

  • Slow replica: a follower replica that is consistently unable to keep up with writes on the leader over some period of time. One of the most common causes is an I/O bottleneck on the follower replica, which makes it persist the log more slowly than it consumes messages from the leader;
  • Stuck replica: a follower replica that has stopped fetching messages from the leader for a long period of time. This can happen because of a GC pause, or because the replica has failed;
  • Bootstrapping replica: when a user increases the replication factor of a topic, the new follower replicas are out of sync until they fully catch up with the leader's log.

A replica is considered out of sync, or lagging, when it falls behind the leader of the partition. In Kafka 0.8.2, a replica's lag relative to the leader is measured in terms of either replica.lag.max.messages or replica.lag.time.max.ms; the former is used to detect slow replicas, while the latter is used to detect stuck replicas.
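Put together, the 0.8.2-era decision amounts to two independent threshold checks. The following is a simplified sketch of that logic, not Kafka's actual source:

```java
// Simplified sketch of the two 0.8.2-era checks a leader could apply to
// decide whether a follower should be removed from the ISR.
final class LagCheck082 {
    static boolean outOfSync(long leaderLogEndOffset,
                             long followerLogEndOffset,
                             long msSinceLastFetch,
                             long replicaLagMaxMessages,
                             long replicaLagTimeMaxMs) {
        // Stuck replica: no fetch request within replica.lag.time.max.ms.
        boolean stuck = msSinceLastFetch > replicaLagTimeMaxMs;
        // Slow replica: the follower's log trails the leader's log end
        // offset by replica.lag.max.messages or more.
        boolean slow = leaderLogEndOffset - followerLogEndOffset >= replicaLagMaxMessages;
        return stuck || slow;
    }

    public static void main(String[] args) {
        // The article's example: after a 4-message batch the leader's log end
        // offset is 7 while a follower is still at 3, with the thresholds
        // replica.lag.max.messages = 4 and replica.lag.time.max.ms = 500.
        System.out.println(outOfSync(7, 3, 100, 4, 500)); // prints true
    }
}
```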

How do we determine that a replica is lagging?

Detecting stuck replicas via replica.lag.time.max.ms works well in all cases: it tracks how long a follower replica has gone without sending a fetch request to the leader, and from that the broker can infer whether the follower is healthy. On the other hand, the model of detecting out-of-sync slow replicas by message count only works well when these parameters are tuned for a single topic, or for multiple topics with homogeneous traffic patterns; we found that it does not scale to all the topics in a production cluster.

Building on my earlier example: if topic foo receives data at a rate of 2 msg/sec, and a single batch received by the leader usually never exceeds 3 messages, then we know that replica.lag.max.messages can be set to 4 for this topic. Why? Because even when data is written to the leader at the maximum rate, and before the follower replicas have copied those messages, the followers' logs trail the leader by no more than 3 messages. At the same time, if a follower replica of topic foo consistently falls more than 3 messages behind the leader, we want the leader to remove that slow follower so that the message write latency does not grow.

That is essentially the goal of replica.lag.max.messages: to detect replicas that are consistently out of sync with the leader. But suppose traffic on this topic now increases because of a spike, and the producer ends up sending a batch of 4 messages to foo, equal to the configured value of replica.lag.max.messages = 4. At that moment, both follower replicas will be considered out of sync with the leader and will be removed from the ISR.

[Figure: a 4-message batch arrives at the leader; both followers now lag by replica.lag.max.messages = 4 and are dropped from the ISR]

However, since both follower replicas are alive, they will catch up to the leader's log end offset on their next fetch request and be added back to the ISR. The same process repeats over and over if producers keep sending large batches of messages to the leader. This demonstrates how follower replicas can shuttle in and out of the ISR, triggering unnecessary false alarms.

[Figure: the followers catch up on their next fetch and are added back into the ISR]

The core problem with the replica.lag.max.messages parameter is that users have to guess a suitable value for it, because we never know exactly how much traffic will arrive at Kafka, especially during traffic spikes.

One parameter to rule them all

We realized that the only thing that really matters when detecting either a stuck or a slow replica is how long the replica has been out of sync with the leader. We removed the replica.lag.max.messages parameter, whose value had to be set by guessing. Now only the replica.lag.time.max.ms parameter needs to be configured on the server, and the meaning of this parameter is the maximum time a replica may be out of sync with the leader.

  • Detecting stuck replicas works the same way as before: if a replica fails to send a fetch request for longer than replica.lag.time.max.ms, it is considered dead and is removed from the ISR;
  • The mechanism for detecting slow replicas has changed: if a replica stays behind the leader for longer than replica.lag.time.max.ms, it is considered too slow and is removed from the ISR (a sketch of the combined rule follows after the next paragraph).

Therefore, even during a traffic spike, when producers send large batches of messages to the leader, a replica will not shuttle in and out of the ISR unless it consistently stays behind the leader for longer than replica.lag.time.max.ms.
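The following is a simplified sketch of that single time-based rule, not Kafka's actual source: a follower is dropped from the ISR if it either stopped fetching, or kept fetching without catching up to the leader's log end offset, for longer than replica.lag.time.max.ms.

```java
// Simplified sketch of the time-based ISR membership rule.
final class TimeBasedLagCheck {
    static boolean outOfSync(long nowMs,
                             long lastFetchTimeMs,
                             long lastCaughtUpTimeMs,
                             long replicaLagTimeMaxMs) {
        // Stuck replica: no fetch request within the window.
        boolean stuck = nowMs - lastFetchTimeMs > replicaLagTimeMaxMs;
        // Slow replica: fetching, but not fully caught up within the window.
        boolean slow = nowMs - lastCaughtUpTimeMs > replicaLagTimeMaxMs;
        return stuck || slow;
    }

    public static void main(String[] args) {
        // A follower that fetched 100 ms ago and was fully caught up 200 ms
        // ago stays in the ISR when replica.lag.time.max.ms = 500.
        System.out.println(outOfSync(1_000, 900, 800, 500)); // prints false
    }
}
```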

 

Original article:

https://www.iteblog.com/archives/2556.html


Source: blog.csdn.net/BD_fuhong/article/details/92359541