Kafka Topic Architecture - Replication, Failover, and Parallel Processing

This article describes the architecture of Kafka topics and discusses how partitions are used for failover and parallel processing.

1. Kafka Topics, Logs, and Partitions

A Kafka topic is a named stream of records. Kafka stores a topic's records in a log.

A topic is broken down into multiple partitions.

Kafka spreads the partitions of a log across multiple servers.

Topics are split into partitions for speed, scalability, and storage capacity.

Topics are inherently publish/subscribe style messaging: a topic can have zero or more subscribers (consumer groups).

Kafka Topic Partitions

A topic consists of multiple partitions, and each partition holds multiple records.

Which partition does a given record go to?

If the record has a key, the partition is chosen based on the key.

If the record has no key, the partition is chosen round-robin by default (clients from Kafka 2.4 onward instead default to a sticky strategy that fills a batch for one partition before moving to the next).
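The selection rule above can be sketched in a few lines of Python. This is an illustration only: `choose_partition` and `NUM_PARTITIONS` are hypothetical names, and `zlib.crc32` stands in for the murmur2 hash that Kafka's Java client applies to the key bytes, so the partition numbers will not match those of a real producer.

```python
import itertools
import zlib

NUM_PARTITIONS = 3  # assumed partition count for the example topic

# Counter used only for records that carry no key.
_round_robin = itertools.count()


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition for a record (hypothetical helper, not Kafka's API)."""
    if key is not None:
        # Keyed record: hash the key so equal keys always map to the same partition.
        # crc32 is a stand-in; the real Java client uses murmur2.
        return zlib.crc32(key) % num_partitions
    # Unkeyed record: hand out partitions in turn.
    return next(_round_robin) % num_partitions


if __name__ == "__main__":
    print(choose_partition(b"user-42"), choose_partition(b"user-42"))
    print([choose_partition(None) for _ in range(6)])
```

Because the keyed branch depends only on the key and the partition count, changing the number of partitions changes where existing keys land.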

By using multiple partitions, Kafka lets a topic scale across multiple servers, which improves producer write throughput. Multiple partitions also improve consumer throughput, because partitions can be consumed in parallel; the upper bound on parallelism within a consumer group is the number of partitions.

Records within a partition are kept in order. If records are partitioned by key, records with the same key land in the same partition, which is very useful for scenarios that need to replay a log.

Kafka replicates each partition to ensure high availability and make failover easy.

Kafka Record Ordering

Kafka guarantees record order only within a partition, not across a whole topic.

A partition is an ordered, immutable sequence of records.

Kafka treats each partition as a structured commit log and continually appends records to it.

Each record in a partition is assigned a sequential number called the "offset", which identifies the record's position within the partition.
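A toy model may help make the offset idea concrete. The sketch below is not Kafka code: `PartitionLog` is a hypothetical in-memory class that only mimics the append-only behaviour and the way an offset doubles as a read position.

```python
class PartitionLog:
    """Toy in-memory model of one partition: an append-only list of records."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record to the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return (offset, record) pairs starting at the given offset."""
        chunk = self._records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))


log = PartitionLog()
offsets = [log.append(r) for r in ("a", "b", "c")]

# A consumer that remembers offset 1 can replay everything from that point on.
replay = log.read(1)
```

Existing records are never rewritten, so a reader only needs to remember a single integer to resume or replay.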

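Topic partitions allow a Kafka log to grow beyond a size that fits on a single server.

A partition must fit on the server that hosts it, but a topic can have many partitions, so a topic can span multiple servers.

A partition is a unit of parallelism: at any one time, a partition can be consumed by only one consumer in a consumer group.

If a consumer in a consumer group stops, Kafka reassigns its partitions to the other consumers in the group.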
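The ownership rule and the reassignment step can be illustrated with a simplified assignment function. This is a sketch under stated assumptions: `assign` is a hypothetical helper that deals partitions out in turn, whereas a real cluster coordinates rebalances through a group coordinator and a configurable assignment strategy.

```python
def assign(partitions, consumers):
    """Give every partition exactly one owner within the group (simplified)."""
    if not consumers:
        return {}
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(p)
    return assignment


partitions = [0, 1, 2, 3]
group = ["c1", "c2", "c3"]
before = assign(partitions, group)

# Consumer c3 stops: its partitions are redistributed among the survivors.
survivors = [c for c in group if c != "c3"]
after = assign(partitions, survivors)

# More consumers than partitions: the extra consumer receives nothing.
crowded = assign([0, 1], ["c1", "c2", "c3"])
```

The `crowded` case shows why the partition count caps parallelism within a group: a consumer without a partition has nothing to read.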

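Kafka Topic Partition Replication

Kafka can replicate partitions across servers. The number of replicas is configurable, and replicated partitions exist for fault tolerance.

Each partition has several replicas: one is the leader and the others are followers. The leader handles all read and write requests for the partition.

Followers replicate records from the leader, and one of them takes over if the leader fails.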

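2. Replication: Partition Leaders, Followers, and ISRs

Kafka uses ZooKeeper to elect one of a partition's replicas as the leader.

The broker hosting the partition leader handles all read and write requests for that partition, and writes are replicated from the leader to the followers.

Followers that are keeping up with the leader are called ISRs (in-sync replicas). If the leader fails, a new leader is elected from the ISRs.

A record is considered "committed" only after all ISRs have replicated it, and consumers can read only committed records.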
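The commit rule can be expressed as a small calculation. The sketch below is illustrative only: `high_watermark` is a hypothetical helper, the broker names and offsets are made up, and the `isr` set is taken to include the leader's own replica.

```python
def high_watermark(log_end_offsets, isr):
    """Return the first offset NOT yet replicated by every in-sync replica.

    log_end_offsets maps a replica id to the next offset that replica would write,
    so records at offsets below the returned value are committed.
    """
    return min(log_end_offsets[r] for r in isr)


# Assumed example state: broker-1 leads, broker-3 has fallen behind
# and has been dropped from the in-sync set.
log_end_offsets = {"broker-1": 10, "broker-2": 8, "broker-3": 4}
isr = {"broker-1", "broker-2"}

hw = high_watermark(log_end_offsets, isr)

# Consumers may read only the committed offsets below the high watermark.
visible_offsets = list(range(hw))
```

Dropping a lagging follower from the ISR lets the commit point advance, at the cost of having one fewer replica eligible to take over.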

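3. Frequently Asked Questions

What is an ISR?

An ISR is a follower that is in sync with the leader.

If the leader fails, only an ISR is eligible to be elected leader.

How do consumers map to partitions?

A consumer can be assigned several partitions, but a partition can be used by only one consumer in a consumer group at a time.

If a topic has only one partition, only one consumer in each consumer group can consume it.

What are leaders and followers?

The leader handles all read and write requests for its partition; followers replicate the leader.

How does Kafka handle consumer failover?

If a consumer in a consumer group fails, the partitions assigned to it are reassigned to the other consumers in the group.

How does Kafka handle broker failover?

If a broker fails, Kafka reassigns leadership of the partitions it was leading to other brokers.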
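Broker failover can be sketched as a bookkeeping update over partition state. This is a simplified model, not Kafka's controller logic: `fail_broker`, the `orders-*` partition names, and the broker ids are all hypothetical, and the first remaining in-sync replica is promoted purely for illustration.

```python
def fail_broker(partitions, failed):
    """Drop a failed broker from every ISR and promote a replica where it led."""
    for state in partitions.values():
        state["isr"] = [b for b in state["isr"] if b != failed]
        if state["leader"] == failed:
            # Only in-sync replicas are eligible; None marks the partition offline.
            state["leader"] = state["isr"][0] if state["isr"] else None
    return partitions


cluster = {
    "orders-0": {"leader": "b1", "isr": ["b1", "b2"]},
    "orders-1": {"leader": "b2", "isr": ["b2", "b3"]},
    "orders-2": {"leader": "b3", "isr": ["b3"]},
}

# Broker b1 fails: orders-0 moves to b2, the other partitions keep their leaders.
fail_broker(cluster, "b1")
```

A partition whose only in-sync replica lived on the failed broker ends up with no leader in this model, which is why the replica count matters for availability.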

Translated and adapted from:

https://dzone.com/articles/kafka-topic-architecture-replication-failover-and

Origin www.cnblogs.com/yogoup/p/12052413.html