6. Kafka Series: Design Philosophy (Part 4) - Message Delivery Semantics

4.6 Message Delivery Semantics

Now that we understand a little about how producers and consumers work, let’s discuss the semantic guarantees Kafka provides between producer and consumer. Clearly there are multiple possible message delivery guarantees that could be provided:

  • At most once—Messages may be lost but are never redelivered.
  • At least once—Messages are never lost but may be redelivered.
  • Exactly once—This is what people actually want: each message is delivered once and only once.

It’s worth noting that this breaks down into two problems: the durability guarantees for publishing a message and the guarantees when consuming a message.

Many systems claim to provide “exactly once” delivery semantics, but it is important to read the fine print; most of these claims are misleading (i.e. they don’t translate to the case where consumers or producers can fail, cases where there are multiple consumer processes, or cases where data written to disk can be lost).

Kafka’s semantics are straightforward. When publishing a message we have a notion of the message being “committed” to the log. Once a published message is committed it will not be lost as long as one broker that replicates the partition to which this message was written remains “alive”. The definitions of a committed message and an alive partition, as well as a description of which types of failures we attempt to handle, will be given in the next section. For now let’s assume a perfect, lossless broker and try to understand the guarantees to the producer and consumer. If a producer attempts to publish a message and experiences a network error, it cannot be sure if this error happened before or after the message was committed. This is similar to the semantics of inserting into a database table with an autogenerated key.

Prior to 0.11.0.0, if a producer failed to receive a response indicating that a message was committed, it had little choice but to resend the message. This provides at-least-once delivery semantics since the message may be written to the log again during resending if the original request had in fact succeeded. Since 0.11.0.0, the Kafka producer also supports an idempotent delivery option which guarantees that resending will not result in duplicate entries in the log. To achieve this, the broker assigns each producer an ID and deduplicates messages using a sequence number that is sent by the producer along with every message. Also beginning with 0.11.0.0, the producer supports the ability to send messages to multiple topic partitions using transaction-like semantics: i.e. either all messages are successfully written or none of them are. The main use case for this is exactly-once processing between Kafka topics (described below).

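To make this concrete, here is a minimal sketch (mine, not from the original article) of a Java producer with idempotence enabled and a transaction spanning two topics; the broker address, topic names, and transactional.id value are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The broker assigns the producer an ID and deduplicates resent
        // messages using the per-message sequence number.
        props.put("enable.idempotence", "true");
        // A transactional.id enables atomic writes across topic partitions.
        props.put("transactional.id", "demo-txn"); // placeholder

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                // Either both records are successfully written, or neither is.
                producer.send(new ProducerRecord<>("topic-a", "key", "value-1"));
                producer.send(new ProducerRecord<>("topic-b", "key", "value-2"));
                producer.commitTransaction();
            } catch (Exception e) {
                // The transaction is aborted; none of the sends become visible.
                producer.abortTransaction();
            }
        }
    }
}
```

With enable.idempotence set, a resend after a lost acknowledgement is deduplicated by the broker instead of being appended to the log a second time.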

Not all use cases require such strong guarantees. For uses which are latency sensitive we allow the producer to specify the durability level it desires. If the producer specifies that it wants to wait on the message being committed this can take on the order of 10 ms. However the producer can also specify that it wants to perform the send completely asynchronously or that it wants to wait only until the leader (but not necessarily the followers) have the message.

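As a sketch, these are the standard producer acks settings corresponding to the three durability levels just described (a configuration fragment extending a Properties object like the one above; a real producer would set exactly one):

```java
Properties props = new Properties();
// "acks" controls how long send() waits for acknowledgement; set exactly one:
props.put("acks", "0");   // fully asynchronous: do not wait for any acknowledgement
props.put("acks", "1");   // wait only until the leader (not the followers) has the message
props.put("acks", "all"); // wait until the message is committed to all in-sync replicas
```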

Now let’s describe the semantics from the point-of-view of the consumer. All replicas have the exact same log with the same offsets. The consumer controls its position in this log. If the consumer never crashed it could just store this position in memory, but if the consumer fails and we want this topic partition to be taken over by another process the new process will need to choose an appropriate position from which to start processing. Let’s say the consumer reads some messages – it has several options for processing the messages and updating its position.

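Before walking through those options, here is a small sketch of the takeover step just mentioned; loadSavedPosition is a hypothetical stand-in for wherever the previous process stored its offset:

```java
import java.util.Collections;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

class TakeoverSketch {
    // Hypothetical: read the offset the previous process saved (stubbed here).
    static long loadSavedPosition(TopicPartition tp) {
        return 0L;
    }

    // A new process taking over a partition chooses where to resume reading.
    static void resume(KafkaConsumer<String, String> consumer, TopicPartition tp) {
        consumer.assign(Collections.singletonList(tp));
        consumer.seek(tp, loadSavedPosition(tp)); // continue from the saved position
    }
}
```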

First, it can read the messages, then save its position in the log, and finally process the messages. In this case there is a possibility that the consumer process crashes after saving its position but before saving the output of its message processing. The process that took over processing would then start at the saved position, even though a few messages prior to that position had not been processed. This corresponds to “at-most-once” semantics: in the case of a consumer failure, messages may not be processed.

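In code, that at-most-once ordering looks roughly like the following sketch (auto-commit is assumed to be disabled, and process is a hypothetical handler):

```java
import java.time.Duration;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class AtMostOnceSketch {
    // Hypothetical message handler.
    static void process(ConsumerRecord<String, String> record) { /* ... */ }

    static void pollOnce(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        consumer.commitSync(); // 1. save the position first
        for (ConsumerRecord<String, String> record : records) {
            process(record);   // 2. then process; a crash here loses these messages
        }
    }
}
```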

Alternatively, it can read the messages, process the messages, and finally save its position. In this case there is a possibility that the consumer process crashes after processing messages but before saving its position. In this case, when the new process takes over, the first few messages it receives will already have been processed. This corresponds to “at-least-once” semantics in the case of consumer failure. In many cases messages have a primary key and so the updates are idempotent (receiving the same message twice just overwrites a record with another copy of itself).

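The at-least-once ordering simply swaps the two steps (same assumptions as the previous sketch):

```java
import java.time.Duration;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class AtLeastOnceSketch {
    // Hypothetical handler; ideally idempotent, e.g. a keyed upsert.
    static void process(ConsumerRecord<String, String> record) { /* ... */ }

    static void pollOnce(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            process(record);   // 1. process first
        }
        consumer.commitSync(); // 2. then save the position; a crash before this replays the batch
    }
}
```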

So what about exactly once semantics (i.e. the thing you actually want)? When consuming from a Kafka topic and producing to another topic (as in a Kafka Streams application), we can leverage the new transactional producer capabilities in 0.11.0.0 that were mentioned above. The consumer’s position is stored as a message in a topic, so we can write the offset to Kafka in the same transaction as the output topics receiving the processed data. If the transaction is aborted, the consumer’s position will revert to its old value and the produced data on the output topics will not be visible to other consumers, depending on their “isolation level.” In the default “read_uncommitted” isolation level, all messages are visible to consumers even if they were part of an aborted transaction, but in “read_committed,” the consumer will only return messages from transactions which were committed (and any messages which were not part of a transaction).

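Here is a sketch of that consume-transform-produce loop, assuming a producer configured with a transactional.id and a consumer with enable.auto.commit=false; transform is a hypothetical function, and currentOffsets computes the position to commit:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

class ExactlyOnceLoopSketch {
    // Hypothetical transformation of an input record into an output value.
    static String transform(ConsumerRecord<String, String> record) {
        return record.value();
    }

    // The position to commit is the last processed offset plus one.
    static Map<TopicPartition, OffsetAndMetadata> currentOffsets(ConsumerRecords<String, String> records) {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (TopicPartition tp : records.partitions()) {
            List<ConsumerRecord<String, String>> part = records.records(tp);
            offsets.put(tp, new OffsetAndMetadata(part.get(part.size() - 1).offset() + 1));
        }
        return offsets;
    }

    static void run(KafkaConsumer<String, String> consumer, KafkaProducer<String, String> producer) {
        producer.initTransactions();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            producer.beginTransaction();
            try {
                for (ConsumerRecord<String, String> record : records) {
                    producer.send(new ProducerRecord<>("output-topic", transform(record)));
                }
                // The consumer's position is itself a message in a topic, so it is
                // written in the same transaction as the output records.
                producer.sendOffsetsToTransaction(currentOffsets(records), consumer.groupMetadata());
                producer.commitTransaction();
            } catch (Exception e) {
                // On abort the position reverts, and the aborted output is never
                // visible to consumers reading with read_committed.
                producer.abortTransaction();
            }
        }
    }
}
```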

When writing to an external system, the limitation is in the need to coordinate the consumer’s position with what is actually stored as output. The classic way of achieving this would be to introduce a two-phase commit between the storage of the consumer’s position and the storage of the consumer’s output. But this can be handled more simply and generally by letting the consumer store its offset in the same place as its output. This is better because many of the output systems a consumer might want to write to will not support a two-phase commit. As an example of this, consider a Kafka Connect connector which populates data in HDFS along with the offsets of the data it reads, so that it is guaranteed that either data and offsets are both updated or neither is. We follow similar patterns for many other data systems which require these stronger semantics and for which the messages do not have a primary key to allow for deduplication.

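As a sketch of this pattern with a JDBC-backed sink (the upsertOutput and saveOffset helpers are hypothetical), a single database transaction covers both the output and the consumer's position:

```java
import java.sql.Connection;
import java.sql.SQLException;

import org.apache.kafka.clients.consumer.ConsumerRecord;

class ExternalSinkSketch {
    // Hypothetical sink write for the processed result.
    static void upsertOutput(Connection conn, ConsumerRecord<String, String> record) throws SQLException { /* ... */ }

    // Hypothetical: persist the next offset to read for this partition.
    static void saveOffset(Connection conn, String topic, int partition, long offset) throws SQLException { /* ... */ }

    static void store(Connection conn, ConsumerRecord<String, String> record) throws SQLException {
        conn.setAutoCommit(false);
        try {
            upsertOutput(conn, record);
            saveOffset(conn, record.topic(), record.partition(), record.offset() + 1);
            conn.commit();   // output and offset are updated together...
        } catch (SQLException e) {
            conn.rollback(); // ...or neither is
            throw e;
        }
    }
}
```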

So effectively Kafka supports exactly-once delivery in Kafka Streams, and the transactional producer/consumer can be used generally to provide exactly-once delivery when transferring and processing data between Kafka topics. Exactly-once delivery for other destination systems generally requires cooperation with such systems, but Kafka provides the offset which makes implementing this feasible (see also Kafka Connect). Otherwise, Kafka guarantees at-least-once delivery by default, and allows the user to implement at-most-once delivery by disabling retries on the producer and committing offsets in the consumer prior to processing a batch of messages.

Reposted from blog.csdn.net/SJshenjian/article/details/130048719