Kafka in Detail (e): Kafka's Replication Mechanism

Replication, which can also be called a backup mechanism, generally means that a distributed system keeps copies of the same data on multiple machines connected over a network. It serves three main purposes:

  1. Provide data redundancy. Even if part of the system fails, the system can keep running, which improves overall availability and data durability.
  2. Provide high scalability. Horizontal scaling is supported: adding machines improves read performance and therefore the throughput of read operations.
  3. Improve data locality. Placing data geographically close to users reduces system latency.

These are the textbook benefits of replication in distributed systems. Kafka, however, currently delivers only the first one: data redundancy, and with it high availability and high durability.

1. Definition of a replica

A replica in Kafka is essentially a commit log that can only be appended to. All replicas of the same partition store the same sequence of messages. The replicas are spread across different Brokers, so the data stays available even when some of the Brokers go down.
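As a rough illustration of the definition above, the following Python sketch models a replica as an append-only log and keeps three copies of one partition on three brokers. The class name ReplicaLog and the broker ids are hypothetical, and the sketch is a toy model rather than Kafka's actual storage code.

```python
class ReplicaLog:
    """Toy append-only commit log: records get consecutive offsets and are never modified."""

    def __init__(self):
        self._records = []

    def append(self, message):
        """Append a message and return the offset it was assigned."""
        self._records.append(message)
        return len(self._records) - 1

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._records[offset]

    @property
    def log_end_offset(self):
        """Offset that the next appended message will receive."""
        return len(self._records)


# Three replicas of one partition, hosted on three hypothetical broker ids.
replicas = {broker_id: ReplicaLog() for broker_id in (0, 1, 2)}
for message in ["m0", "m1", "m2"]:
    for log in replicas.values():
        log.append(message)

# Simulate broker 1 going down: the remaining replicas still hold
# the same message sequence at the same offsets.
del replicas[1]
surviving = [
    [log.read(offset) for offset in range(log.log_end_offset)]
    for log in replicas.values()
]
```

After the simulated failure, both surviving replicas still contain the full sequence m0, m1, m2, which is the redundancy the section describes.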

In a real production environment, each Broker may hold replicas of different partitions from different topics, so it is quite normal for a single Broker to hold hundreds of replicas.

[Figure: replicas of partitions from different topics spread across Brokers]

2. Replica roles

Kafka uses a leader-based replication mechanism to keep the multiple replicas of a partition consistent. It works as follows:

  • Kafka replicas fall into two categories: the leader replica (Leader Replica) and follower replicas (Follower Replica). When a partition is created, one replica is elected as the leader replica, and the remaining replicas automatically become follower replicas.
  • By default, follower replicas in Kafka do not serve clients. All read and write requests must be sent to the Broker that hosts the leader replica, and that Broker handles them. A follower replica does not process client requests. Its only task is to pull messages from the leader replica asynchronously and write them to its own commit log, so that it stays in sync with the leader replica.
  • When the leader replica fails, or the Broker hosting it goes down, Kafka detects this in real time through the monitoring provided by ZooKeeper and immediately starts a new round of leader election, choosing one of the follower replicas as the new leader. When the old leader replica restarts, it can only rejoin the cluster as a follower replica.
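The three points above can be sketched as a small in-memory model. This is a simplified illustration, not Kafka's implementation: the Partition and Replica classes are hypothetical, the election simply picks the first surviving replica and assumes it is fully caught up, and a rejoining replica is assumed to start from an empty log.

```python
class Replica:
    """One copy of a partition's log, hosted on a single broker."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []


class Partition:
    def __init__(self, broker_ids):
        self.replicas = {b: Replica(b) for b in broker_ids}
        self.leader_id = broker_ids[0]  # one replica is chosen as leader

    def produce(self, message):
        """Writes always go to the leader replica."""
        leader = self.replicas[self.leader_id]
        leader.log.append(message)
        return len(leader.log) - 1

    def consume(self, offset):
        """Reads are also served by the leader replica only."""
        return self.replicas[self.leader_id].log[offset]

    def follower_fetch(self, broker_id):
        """A follower pulls whatever it is missing from the leader."""
        leader = self.replicas[self.leader_id]
        follower = self.replicas[broker_id]
        missing = leader.log[len(follower.log):]
        follower.log.extend(missing)
        return len(missing)

    def fail_leader(self):
        """Simulate the leader's broker going down and elect a follower."""
        failed = self.leader_id
        del self.replicas[failed]
        # Assumption: the first surviving replica is in sync and becomes leader.
        self.leader_id = next(iter(self.replicas))
        return failed

    def rejoin(self, broker_id):
        """A restarted broker comes back as a follower and catches up."""
        self.replicas[broker_id] = Replica(broker_id)
        self.follower_fetch(broker_id)


partition = Partition([0, 1, 2])
partition.produce("a")
partition.produce("b")
partition.follower_fetch(1)
partition.follower_fetch(2)

failed = partition.fail_leader()   # broker 0 goes down
new_leader = partition.leader_id   # a former follower takes over
partition.produce("c")
partition.rejoin(failed)           # broker 0 is back, now as a follower
```

Note that produce and consume never touch a follower directly; followers only change through follower_fetch, which mirrors the pull-based synchronization described above.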

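3. Advantages of Kafka's replication mechanism

This design, in which follower replicas do not serve clients, has two benefits:

1 Easy to implement "Read-your-writes"

Read-your-writes means what the name suggests: after you successfully write a message to Kafka with the producer API, you can immediately read that message back with the consumer API. If follower replicas were allowed to serve clients, then because replica synchronization is asynchronous, a follower replica might not yet have pulled the latest message from the leader replica, and the client would not see the message it had just written.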
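The stale read described above can be reproduced with two plain lists standing in for the leader and follower logs. This is only a minimal sketch; the produce and read helpers are hypothetical and do not correspond to any Kafka client API.

```python
leader_log = ["m0", "m1"]
follower_log = ["m0", "m1"]  # the follower is in sync so far


def produce(message):
    """Write to the leader and return the offset of the new message."""
    leader_log.append(message)
    return len(leader_log) - 1


def read(log, offset):
    """Return the message at the offset, or None if the log does not have it yet."""
    return log[offset] if offset < len(log) else None


offset = produce("m2")

# The follower has not fetched yet, so only the leader can serve the new message.
from_leader = read(leader_log, offset)
from_follower = read(follower_log, offset)

# Once the follower's asynchronous fetch completes, it catches up.
follower_log.extend(leader_log[len(follower_log):])
from_follower_later = read(follower_log, offset)
```

Between the write and the follower's next fetch, a read served by the follower returns nothing for the new offset, while a read served by the leader returns the message right away.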

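2 Easy to implement monotonic reads (Monotonic Reads)

Definition of monotonic reads: when a consumer reads messages multiple times, it never sees a message that exists at one moment and is gone the next.

Suppose follower replicas were allowed to serve reads, and there are currently two follower replicas, F1 and F2, that pull data from the leader replica asynchronously. If F1 has pulled the leader's latest message but F2 has not yet done so, a consumer that first reads from F1 and then from F2 may find that the latest message it saw on the first read has disappeared on the second read. That violates monotonic read consistency. If all read requests are handled by the leader, however, Kafka can provide monotonic read consistency easily.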
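The F1/F2 scenario can be checked with a few lines of Python. The sketch below is hypothetical: each read is reduced to "how many messages are visible", and is_monotonic is an illustrative helper, not part of Kafka.

```python
leader = ["m0", "m1", "m2"]
f1 = list(leader)   # F1 has already pulled the latest message m2
f2 = leader[:2]     # F2 is lagging and only has m0 and m1


def visible_count(log):
    """Number of messages a consumer can see when reading from this replica."""
    return len(log)


def is_monotonic(counts):
    """True if no read sees fewer messages than the read before it."""
    return all(later >= earlier for earlier, later in zip(counts, counts[1:]))


# Reading from F1 and then from F2: m2 is visible, then it is gone.
reads_from_followers = [visible_count(f1), visible_count(f2)]

# Reading from the leader both times: the view never goes backwards.
reads_from_leader = [visible_count(leader), visible_count(leader)]

followers_ok = is_monotonic(reads_from_followers)
leader_ok = is_monotonic(reads_from_leader)
```

Here followers_ok is False because the second read sees fewer messages than the first, while leader_ok is True.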

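4. In-sync Replicas (ISR)

The ISR is the set of replicas that are in sync with the leader. The set contains not only follower replicas but also (necessarily) the leader replica. In some cases the ISR contains only the leader replica.

The Broker-side parameter replica.lag.time.max.ms determines whether a follower replica stays in the ISR. It is the maximum time a follower replica may lag behind the leader replica. If a follower has not sent a fetch request, or has not caught up to the end of the leader's log, for longer than this interval, it is considered out of sync with the leader replica and is removed from the ISR.

If that replica's LEO (log end offset) later becomes no smaller than the leader replica's HW (high watermark), it is added back to the ISR.

This shows that the ISR is a dynamically adjusted set, not a static one.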
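The shrink and expand rules above can be sketched as a single function. This is a simplified model under stated assumptions: FollowerState and update_isr are hypothetical names, the 30000 ms value is only an assumed setting for replica.lag.time.max.ms, and the real Broker logic tracks more state than this.

```python
from dataclasses import dataclass

REPLICA_LAG_TIME_MAX_MS = 30_000  # assumed value of replica.lag.time.max.ms


@dataclass
class FollowerState:
    leo: int                 # follower's log end offset
    last_caught_up_ms: int   # last time the follower was caught up with the leader


def update_isr(leader_id, isr, followers, leader_hw, now_ms,
               max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """Return the new ISR after applying the shrink and expand rules."""
    new_isr = {leader_id}  # the leader is always part of the ISR
    for broker_id, state in followers.items():
        if broker_id in isr:
            # Shrink: drop a follower that has lagged for too long.
            if now_ms - state.last_caught_up_ms <= max_lag_ms:
                new_isr.add(broker_id)
        elif state.leo >= leader_hw:
            # Expand: re-admit a follower whose LEO has reached the leader's HW.
            new_isr.add(broker_id)
    return new_isr


# Follower 2 has been stuck for 50 seconds and is removed.
isr_after_shrink = update_isr(
    0, {0, 1, 2},
    {1: FollowerState(100, 95_000), 2: FollowerState(40, 50_000)},
    leader_hw=100, now_ms=100_000,
)

# Follower 2 catches up and is added back.
isr_after_expand = update_isr(
    0, isr_after_shrink,
    {1: FollowerState(120, 118_000), 2: FollowerState(120, 119_000)},
    leader_hw=120, now_ms=120_000,
)

# Both followers stall: the ISR shrinks to the leader alone.
isr_leader_only = update_isr(
    0, isr_after_expand,
    {1: FollowerState(120, 118_000), 2: FollowerState(120, 119_000)},
    leader_hw=150, now_ms=200_000,
)
```

The three calls walk through the cases mentioned in this section: a follower being removed, the same follower rejoining, and an ISR that contains only the leader.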

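A message written by a producer is considered "committed" only after every replica in the ISR has received it. It follows that if a partition's ISR contains n replicas, the partition can tolerate the failure of up to n-1 of them without losing committed messages.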
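One way to picture the commit rule is to treat the high watermark as the smallest log end offset among the ISR replicas. The sketch below assumes that simplified model; the function names and the sample offsets are hypothetical.

```python
def high_watermark(leos, isr):
    """Smallest log end offset among ISR replicas: everything below it is on all of them."""
    return min(leos[broker_id] for broker_id in isr)


def is_committed(offset, leos, isr):
    """A message is committed once every ISR replica has received it."""
    return offset < high_watermark(leos, isr)


# Log end offsets per broker; broker 2 is slightly behind but still in the ISR.
leos = {0: 10, 1: 10, 2: 8}
isr = {0, 1, 2}

hw = high_watermark(leos, isr)
committed_7 = is_committed(7, leos, isr)   # present on all three replicas
committed_8 = is_committed(8, leos, isr)   # broker 2 has not received it yet

# With n = 3 ISR replicas, any n - 1 = 2 may crash: whichever single
# replica survives still holds every committed offset (all offsets below hw).
survivor_has_all_committed = all(leos[broker_id] >= hw for broker_id in isr)
```

Because every ISR replica holds all offsets below the high watermark, any single survivor is enough to preserve the committed messages.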

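5. Unclean Leader Election

The Broker-side parameter unclean.leader.election.enable controls whether unclean leader election is allowed. With this parameter enabled, an unclean leader election takes place when the Broker hosting the leader replica goes down and the ISR is empty: a replica that is not in the ISR is elected as the new leader.

  • Enabling unclean leader election may cause data loss, but the benefit is that a new leader replica can always be elected for the partition, so the partition does not stop serving clients, which improves availability.
  • Disabling unclean leader election preserves data consistency and avoids message loss, at the cost of availability.

It is strongly recommended not to enable unclean leader election, to avoid data loss.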
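The trade-off can be made concrete with a small decision function. This is a hedged sketch rather than the controller's real algorithm: elect_leader is a hypothetical helper, and it assumes that candidates are tried in the order of the partition's assigned replica list.

```python
def elect_leader(assigned_replicas, isr, live_brokers, unclean_enabled):
    """Return (new_leader, was_unclean); new_leader is None if no leader can be chosen."""
    # Clean election: prefer a live replica that is still in the ISR.
    for broker_id in assigned_replicas:
        if broker_id in isr and broker_id in live_brokers:
            return broker_id, False
    # Unclean election: fall back to any live replica, accepting possible data loss.
    if unclean_enabled:
        for broker_id in assigned_replicas:
            if broker_id in live_brokers:
                return broker_id, True
    # Election disabled and no in-sync replica alive: the partition stays offline.
    return None, False


assigned = [0, 1, 2]
live = {1, 2}  # broker 0, the old leader, is down

# Broker 1 is in the ISR, so a normal election succeeds.
clean = elect_leader(assigned, {0, 1}, live, unclean_enabled=False)

# Only the dead leader was in the ISR and unclean election is disabled.
offline = elect_leader(assigned, {0}, live, unclean_enabled=False)

# Same situation with unclean election enabled: an out-of-sync replica takes over.
unclean = elect_leader(assigned, {0}, live, unclean_enabled=True)
```

In the offline case the partition keeps its data but cannot serve requests, while in the unclean case it stays available but may have lost the messages that only the old leader held.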

Origin blog.csdn.net/fedorafrog/article/details/103988582