In-depth understanding of Kafka (1)

Kafka's three major roles:

  • Message System:

Like traditional messaging systems (also called message middleware), Kafka provides decoupling, redundant storage, traffic peak shaving, buffering, asynchronous communication, scalability, and recoverability. In addition, Kafka offers two features that most messaging systems find hard to implement: message ordering guarantees and retroactive (replay) consumption.

  • Storage System:

Kafka persists messages to disk, which effectively reduces the risk of data loss compared with memory-based storage systems. Thanks to message persistence and the multi-replica mechanism, Kafka can also be used as a long-term data storage system: simply set the corresponding data retention policy to "permanent" or enable log compaction on the topic.

  • Streaming Platform:

Kafka not only provides a reliable data source for every popular stream-processing framework, it also ships a complete stream-processing library with operations such as windowing, joins, transformations, and aggregations.
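The log compaction mentioned under Storage System can be pictured with a small sketch. It is a toy model, not Kafka's implementation: it assumes each record is a hypothetical (offset, key, value) tuple and keeps only the newest record for every key, which is the core idea behind compaction.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str, Optional[str]]  # (offset, key, value); hypothetical layout


def compact(log: List[Record]) -> List[Record]:
    """Keep only the record with the highest offset for each key."""
    latest: Dict[str, Record] = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # later records overwrite earlier ones
    # Surviving records keep their original offsets; sort to restore offset order.
    return sorted(latest.values(), key=lambda record: record[0])


log = [
    (0, "user-1", "alice@old.example"),
    (1, "user-2", "bob@example"),
    (2, "user-1", "alice@new.example"),
    (3, "user-3", "carol@example"),
]
compacted = compact(log)
print(compacted)
```

The older value for "user-1" at offset 0 disappears, while the remaining records keep their offsets, so the log can be kept indefinitely without growing with every update.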

Basic concepts

A typical Kafka architecture includes:

  • Producer

The producer is the party that sends messages. It creates messages and delivers them to Kafka.

  • Broker

The service proxy node. For Kafka, a broker can simply be regarded as an independent Kafka service node or Kafka service instance. In most cases a broker can also be regarded as a Kafka server, provided that only one Kafka instance is deployed on that server. One or more brokers form a Kafka cluster.

  • Consumer

The consumer is the party that receives messages. It connects to Kafka, receives messages, and then runs the corresponding business logic.

  • ZooKeeper cluster

Kafka uses ZooKeeper to manage cluster metadata and to handle operations such as controller election.

Kafka architecture

Two key Kafka concepts: Topic and Partition

  • Topic

Messages in Kafka are categorized by topic. Producers send messages to a specific topic (every message sent to a Kafka cluster must specify a topic), and consumers subscribe to topics and consume from them.

  • Partition

A topic is a logical concept that can be subdivided into multiple partitions. A partition belongs to exactly one topic and is often called a topic partition (Topic-Partition). Different partitions of the same topic contain different messages. At the storage level, a partition can be viewed as an appendable log file.

Partitions can be distributed across different servers (brokers). In other words, a topic can span multiple brokers, which gives it more capacity than a single broker could provide.

  • Offset

When a message is appended to a partition's log file, it is assigned a specific offset. The offset uniquely identifies the message within the partition, and Kafka uses it to guarantee message order within the partition. Offsets do not span partitions, however: Kafka guarantees ordering per partition, not per topic.

Appending messages to a partition
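To make the relationship between topics, partitions, and offsets concrete, here is a minimal in-memory sketch. The `Topic` class and the key-based routing are illustrative assumptions. In particular, `zlib.crc32` is only a stand-in hash, and Kafka's own default partitioner uses a different function.

```python
import zlib
from typing import List, Tuple


class Topic:
    """A toy topic: a fixed number of partitions, each an append-only list."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stand-in hash for illustration; not the hash Kafka itself uses.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a message and return its (partition, offset)."""
        p = self.partition_for(key)
        offset = len(self.partitions[p])  # next offset equals current log length
        self.partitions[p].append((key, value))
        return p, offset


topic = Topic("orders", num_partitions=3)
keys = ["a", "b", "a", "c", "a"]
placements = [topic.append(key, f"event-{i}") for i, key in enumerate(keys)]
for key, (partition, offset) in zip(keys, placements):
    print(f"key={key} -> partition {partition}, offset {offset}")
```

All messages with the same key land in the same partition with increasing offsets, so their relative order is preserved. Messages with different keys may sit in different partitions, where offsets are counted independently and no cross-partition order exists.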

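Multi-replica architecture

Kafka introduces a multi-replica (Replica) mechanism for partitions; increasing the number of replicas improves disaster tolerance. Different replicas of the same partition store the same messages (although at any given moment the replicas are not necessarily identical). The replicas have a "one leader, multiple followers" relationship: the leader replica handles read and write requests, while follower replicas only synchronize messages from the leader. Replicas reside on different brokers, and when the leader replica fails, a new leader is elected from the follower replicas to keep serving requests. Through the multi-replica mechanism Kafka achieves automatic failover, so the service remains available when a broker in the cluster fails.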

Multi-replica architecture
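The failover described above can be sketched as a small selection function. The rule used here (pick the first assigned replica that is alive and in sync) is an assumption made for illustration. The text only states that a new leader is elected from the followers. The broker ids are hypothetical.

```python
from typing import List, Set


def elect_leader(assigned: List[int], in_sync: Set[int], alive: Set[int]) -> int:
    """Return the first assigned replica that is both in sync and alive."""
    for broker_id in assigned:
        if broker_id in in_sync and broker_id in alive:
            return broker_id
    raise RuntimeError("no eligible replica: the partition is unavailable")


assigned = [1, 2, 3]   # hypothetical broker ids hosting the partition's replicas
in_sync = {1, 2, 3}
alive = {1, 2, 3}

leader = elect_leader(assigned, in_sync, alive)      # initial leader
alive.discard(leader)                                # the leader's broker fails
in_sync.discard(leader)
new_leader = elect_leader(assigned, in_sync, alive)  # a follower takes over
print(leader, "->", new_leader)
```

Because the replicas live on different brokers, losing one broker still leaves candidates for the election, which is why the partition stays available.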

Terminology

  • AR (Assigned Replicas)

All replicas of a partition are collectively called the AR.

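  • ISR (In-Sync Replicas)

All replicas that stay in sync with the leader replica to a certain degree (including the leader itself) form the ISR. The ISR set is a subset of the AR set.

Messages are first sent to the leader replica, and only then can follower replicas pull them from the leader to synchronize. During synchronization, followers lag behind the leader to some degree. "In sync to a certain degree" means within a tolerable lag, and this range can be configured through a parameter.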

  • OSR (Out-of-Sync Replicas)

Replicas that lag too far behind the leader replica (excluding the leader itself) form the OSR.

  • HW (High Watermark)

It identifies a specific message offset. Consumers can only fetch messages before this offset.

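  • LEO (Log End Offset)

It identifies the offset of the next message to be written to the current log file. The LEO equals the offset of the last message in the log partition plus 1. Every replica in the partition's ISR maintains its own LEO.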
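As a rough illustration of how AR, ISR, and OSR relate, the sketch below splits a partition's replicas by how recently each follower was fully caught up with the leader. The time-based criterion, the `max_lag_ms` parameter, and the timestamps are assumptions for this example, loosely modeled on the configurable lag tolerance mentioned above.

```python
from typing import Dict, Set, Tuple


def split_replicas(
    last_caught_up_ms: Dict[int, int], leader: int, now_ms: int, max_lag_ms: int
) -> Tuple[Set[int], Set[int]]:
    """Split the AR into (ISR, OSR) by how recently each follower caught up."""
    isr: Set[int] = {leader}  # the leader is always part of the ISR
    osr: Set[int] = set()
    for replica, caught_up_at in last_caught_up_ms.items():
        if replica == leader:
            continue
        if now_ms - caught_up_at <= max_lag_ms:
            isr.add(replica)
        else:
            osr.add(replica)
    return isr, osr


# Hypothetical timestamps (ms) at which each replica was last fully caught up.
last_caught_up_ms = {1: 10_000, 2: 9_500, 3: 2_000}
isr, osr = split_replicas(last_caught_up_ms, leader=1, now_ms=10_000, max_lag_ms=5_000)
print("ISR:", sorted(isr), "OSR:", sorted(osr))
```

Every replica ends up in exactly one of the two sets, so together the ISR and OSR make up the AR.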

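Offset positions within a partition

The figure above represents a log file containing 9 messages. The first message has offset 0 (the LogStartOffset) and the last message has offset 8. The message at offset 9, drawn with a dashed box, represents the next message to be written. The log file's HW is 6, which means consumers can only fetch messages with offsets 0 through 5, while the message at offset 6 is invisible to them.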
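The numbers in this example can be reproduced with two small helper functions. This is a simplified sketch that assumes the log starts at offset 0. The function names are made up for illustration.

```python
from typing import List


def log_end_offset(offsets: List[int]) -> int:
    """LEO is the last written offset plus 1 (0 for an empty log starting at 0)."""
    return offsets[-1] + 1 if offsets else 0


def visible_offsets(offsets: List[int], high_watermark: int) -> List[int]:
    """Consumers may only read messages strictly below the HW."""
    return [o for o in offsets if o < high_watermark]


offsets = list(range(9))  # 9 messages with offsets 0..8
leo = log_end_offset(offsets)
visible = visible_offsets(offsets, high_watermark=6)
print("LEO:", leo, "visible to consumers:", visible)
```

The comparison is strict: the message whose offset equals the HW is already written but not yet readable by consumers.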

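Relationship between HW and LEO

To better illustrate the ISR set and the relationship between HW and LEO, here is a simple example.

Suppose the ISR of a partition contains 3 replicas, one leader and 2 followers, and the partition's LEO and HW are both 3.

After the producer sends messages 3 and 4, they are first stored in the leader replica:

After the messages are written to the leader replica, the follower replicas send fetch requests to pull messages 3 and 4 and synchronize them. During synchronization, different followers do not necessarily synchronize at the same rate.

All replicas have successfully written messages 3 and 4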
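The walkthrough above can be replayed with a toy model. It relies on one assumption that the text does not state explicitly: the partition's HW is taken to be the smallest LEO among the ISR replicas. The `Partition` class, the replica ids, and the fetch sizes are invented for the example.

```python
from typing import Dict, List, Tuple


class Partition:
    """Toy partition with one leader and several followers, all in the ISR."""

    def __init__(self, leader: int, followers: List[int], initial: List[str]) -> None:
        self.leader = leader
        self.logs: Dict[int, List[str]] = {r: list(initial) for r in [leader, *followers]}

    def produce(self, message: str) -> None:
        self.logs[self.leader].append(message)  # writes go to the leader first

    def fetch(self, follower: int, max_messages: int) -> None:
        """The follower pulls up to max_messages that it is missing from the leader."""
        start = len(self.logs[follower])
        self.logs[follower].extend(self.logs[self.leader][start:start + max_messages])

    def leo(self, replica: int) -> int:
        return len(self.logs[replica])  # offsets start at 0, so LEO == log length

    def high_watermark(self) -> int:
        # Assumed rule: HW is the smallest LEO among the ISR replicas.
        return min(self.leo(r) for r in self.logs)


p = Partition(leader=1, followers=[2, 3], initial=["m0", "m1", "m2"])
history: List[Tuple[int, int]] = [(p.leo(1), p.high_watermark())]  # initial state
p.produce("m3")
p.produce("m4")
history.append((p.leo(1), p.high_watermark()))  # leader has written messages 3 and 4
p.fetch(2, max_messages=2)                      # follower 2 catches up fully
p.fetch(3, max_messages=1)                      # follower 3 is slower
history.append((p.leo(1), p.high_watermark()))
p.fetch(3, max_messages=1)                      # follower 3 catches up
history.append((p.leo(1), p.high_watermark()))
print(history)  # (leader LEO, partition HW) after each step
```

Under this assumption the leader's LEO jumps to 5 as soon as it stores the new messages, while the HW only advances as the slowest ISR follower catches up, reaching 5 once every replica has written messages 3 and 4.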



Origin blog.csdn.net/demon7552003/article/details/92366110