The Evolution of Apache Kafka Message Offsets (0.7.x ~ 0.10.x)

 

I "Apache Kafka evolution message format (0.7.x ~ 0.10.x)" describes the article  Kafka  several versions of the message format. Careful students certainly see the Message in MessageSet has a Offset in correspondence with them, this article will discuss  Kafka  versions of message processing offset. Also from  Kafka  began to introduce 0.7.x, and in turn introduced to Kafka 0.10.x, due Kafka 0.11.x under development, and the message format has a large and previous versions are not the same, so do not intend to introduce here.


Kafka 0.7.x

I "Apache Kafka evolution message format (0.7.x ~ 0.10.x)" describes the format of the article when MessageSet said Offset field stores the physical offset after the message is stored to disk; note that this is a physical bias shift amount, what does that mean? Look at the following chart we should have it:

Figure 1: Kafka 0.7.x message offsets

As the figure shows, the offset of each message on disk is its absolute byte offset from the beginning of the log file. In the example above, the offset of the first message is 0; the offset of the second message is the total length of the first message; the offset of the third message is the combined length of the first two messages, and so on. Offsets stored this way are easy to understand and cheap to compute.
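As a minimal sketch (not Kafka's actual code), the physical offset of each message can be computed as a running total of the byte lengths of the messages written before it:

```java
import java.util.List;

public class PhysicalOffsets {
    // Hypothetical sketch: assign Kafka 0.7.x style physical offsets,
    // i.e. the byte position of each message from the start of the log file.
    static long[] assignPhysicalOffsets(List<byte[]> messages) {
        long[] offsets = new long[messages.size()];
        long position = 0;                       // byte position in the log file
        for (int i = 0; i < messages.size(); i++) {
            offsets[i] = position;               // offset = bytes written so far
            position += messages.get(i).length;  // advance by this message's total length
        }
        return offsets;
    }

    public static void main(String[] args) {
        long[] offsets = assignPhysicalOffsets(
                List.of(new byte[30], new byte[45], new byte[20]));
        // Prints 0, 30, 75: each offset is the sum of the preceding message lengths.
        for (long o : offsets) System.out.println(o);
    }
}
```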

Note that these on-disk offsets are assigned by the Broker. The reason is simple: only the Broker knows the latest offset at the end of the Log; the Producer cannot know it. The same logic applies to Kafka 0.8.x, Kafka 0.9.x and Kafka 0.10.x.

The above only covers offset handling for uncompressed messages; let's now look at how this version handles offsets for compressed messages, as shown below:

Figure 2: Kafka 0.7.x compressed message offsets

As shown above, the inner messages of a compressed message are not assigned offsets; the offset of the outer (wrapper) message is set according to the same rules as for uncompressed messages.

Advantages and disadvantages

There are several problems with this design:

  • It is hard to checkpoint individual messages inside a compressed message;
  • It is hard to locate and operate on individual messages inside a compressed message;
  • Log compaction is hard to implement well.

But this design also has advantages:

  • The Broker processes messages from the Producer very quickly;
  • CPU utilization is generally below 10%;
  • The network is usually the main bottleneck.

Kafka 0.8.x

Kafka 0.8.x addresses the various problems with message offsets in Kafka 0.7.x. In this version, message offsets are handled as follows:

Figure 3: Kafka 0.8.x message offsets

Leaving aside the changes to the message format itself (if you are curious, see "The Evolution of the Apache Kafka Message Format (0.7.x ~ 0.10.x)"), the obvious change in the figure is that a message's offset is no longer a physical offset but an absolute logical offset that starts from 0. The absolute offset of the first message is 0, the absolute offset of the second message is 1, and so on. As before, this offset is assigned by the Broker.
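A minimal sketch of the idea, assuming a hypothetical per-log nextOffset counter maintained by the Broker:

```java
public class LogicalOffsets {
    // Hypothetical sketch of Kafka 0.8.x style logical offsets: the broker keeps
    // a per-log counter and assigns consecutive offsets regardless of message size.
    private long nextOffset = 0;   // next offset to assign for this log (assumption)

    long append(byte[] message) {
        long offset = nextOffset;  // offset is independent of the message's byte length
        nextOffset += 1;           // one logical slot per message
        // ... write (offset, message) to the log file here ...
        return offset;
    }

    public static void main(String[] args) {
        LogicalOffsets log = new LogicalOffsets();
        System.out.println(log.append(new byte[30]));  // 0
        System.out.println(log.append(new byte[45]));  // 1
        System.out.println(log.append(new byte[20]));  // 2
    }
}
```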

Offset handling for compressed messages is more complicated than this; let's look at the result in the next figure first:

Figure 4: Kafka 0.8.x compressed message offsets

Compared with compressed messages in Kafka 0.7.x, the most obvious change in this figure is that the inner messages of a compressed message now carry offsets as well! Handling offsets for compressed messages is considerably more complex than in Kafka 0.7.x; below we describe in detail how Kafka does it.

Producer-side handling of compressed message offsets

The Producer assigns a relative offset to each inner message of the compressed message, starting at 0 and running up to n-1, where n is the number of inner messages. The result looks like this:

Figure 5: Kafka 0.8.x producer-side compressed message offsets

Once these offsets are set, the Producer compresses the whole MessageSet and sends it to the Broker.
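A minimal sketch of the producer side, with GZIP standing in for whichever codec is configured and a simplified entry layout that is not Kafka's actual wire format:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class ProducerCompression {
    // Hypothetical sketch: assign relative offsets 0..n-1 to the inner messages,
    // then compress the whole set into one wrapper payload (the Kafka 0.8.x idea).
    static byte[] buildCompressedSet(List<byte[]> messages) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            for (int i = 0; i < messages.size(); i++) {
                byte[] value = messages.get(i);
                // Simplified entry: 8-byte relative offset, 4-byte length, payload.
                ByteBuffer entry = ByteBuffer.allocate(12 + value.length);
                entry.putLong(i);               // relative offset 0..n-1
                entry.putInt(value.length);
                entry.put(value);
                gzip.write(entry.array());
            }
        }
        return buffer.toByteArray();            // payload of the wrapper message
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = buildCompressedSet(
                List.of("a".getBytes(), "b".getBytes(), "c".getBytes()));
        System.out.println("compressed wrapper payload: " + payload.length + " bytes");
    }
}
```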

Broker-side handling of compressed message offsets

When the Broker receives a compressed message from the Producer, it ignores the offsets the Producer assigned. It first decompresses the received message, assigns offsets to the inner messages one by one starting from nextOffset, and finally sets the offset of the whole compressed message to the absolute offset of the last inner message. For example, if the offset of the last message in Figure 4 is 7, then nextOffset is 8; when the Broker receives the message from Figure 5, the result is as follows:

Figure 6: Kafka 0.8.x broker-side compressed message offsets

After the offsets are set, the Broker re-compresses the messages it just decompressed and appends the resulting message to the Log file.
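A minimal sketch of the broker-side reassignment described above; nextOffset and the InnerMessage type are illustrative assumptions:

```java
import java.util.List;

public class BrokerReassign {
    // Hypothetical sketch of the Kafka 0.8.x broker behaviour: after decompressing,
    // give every inner message an absolute offset starting at nextOffset, and use
    // the last inner offset as the offset of the whole compressed (wrapper) message.
    static long assignAbsoluteOffsets(long nextOffset, List<InnerMessage> inner) {
        long offset = nextOffset;
        for (InnerMessage m : inner) {
            m.offset = offset++;      // absolute offsets: nextOffset, nextOffset+1, ...
        }
        return offset - 1;            // wrapper offset = offset of the last inner message
    }

    static class InnerMessage { long offset; byte[] value; }

    public static void main(String[] args) {
        List<InnerMessage> inner =
                List.of(new InnerMessage(), new InnerMessage(), new InnerMessage());
        long wrapperOffset = assignAbsoluteOffsets(8, inner);  // nextOffset = 8, as in the example
        System.out.println(wrapperOffset);                     // 10 = 8 + 3 - 1
    }
}
```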

Client-side handling of compressed message offsets

If a Client requests compressed messages, the Broker sends the entire compressed message to the Client as-is; the Client decompresses it automatically, and the decompression is completely transparent to application code.
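As an illustration, a minimal consumer sketch for Kafka 0.10.x: the code never refers to compression at all. The broker address, topic name and group id below are placeholders.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TransparentDecompression {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "test-group");                 // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));  // placeholder topic

        while (true) {
            // Whether the fetched data was compressed or not is invisible here:
            // the client library decompresses internally and hands back plain records.
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```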

Question: why is the offset of the whole compressed message set to the absolute offset of the last inner message?

This design is intentional. The offset in a FetchRequest asks the Broker for messages whose offsets are greater than or equal to that offset, so the Broker scans the log, finds the matching messages and sends them out. Because the message at the requested offset may sit inside a compressed message, the Broker needs a way to decide whether a given compressed message should be included in the response.

  • If the offset of the whole compressed message were the absolute offset of its first inner message, we could not answer yes or no to whether that message should be included in the response. For example, if the FetchRequest asks to start reading at offset 14 and a compressed message has offset 13, that message may or may not contain the message with offset 14.
  • If the offset of the whole compressed message is the absolute offset of its last inner message, we can decide from that offset alone whether the message belongs in the response (sketched in code below). For example, if the FetchRequest asks to start at offset 14 and a compressed message has offset 13, it should not be included. By skipping such messages in order, we find the first message (compressed or not) that should be included in the response and start reading from there.

In the first case (smallest offset), we could still determine the offset range of the first message by looking at two consecutive messages, but reading would then require jumping back to the first message after reading the second message's offset, which usually invalidates the file system cache for the most recent read (the read of the second offset), and the logic is more complex than in the second case. In the second case, the Broker only needs to find the first message whose offset is greater than or equal to the target offset and start reading from it; this also tends to benefit from the file system cache, because the offset and the message contents are likely to be in the same cached block.
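A minimal sketch of the second rule, with illustrative names; each entry's offset is its wrapper offset, i.e. the last inner offset for a compressed entry:

```java
import java.util.List;

public class FetchScan {
    // Hypothetical sketch: with wrapper offset = last inner offset, an entry belongs in
    // the response exactly when its offset >= the fetch offset, so a single forward scan
    // finds the first entry (compressed or not) to start reading from.
    static int firstEntryToReturn(List<Long> entryOffsets, long fetchOffset) {
        for (int i = 0; i < entryOffsets.size(); i++) {
            if (entryOffsets.get(i) >= fetchOffset) {
                return i;             // start reading from this entry
            }
        }
        return -1;                    // nothing at or beyond fetchOffset yet
    }

    public static void main(String[] args) {
        // Wrapper offsets 10 and 13 (compressed sets), then 14 and 15 (single messages).
        List<Long> offsets = List.of(10L, 13L, 14L, 15L);
        // Prints 2: the entries with offsets 10 and 13 are skipped for fetch offset 14.
        System.out.println(firstEntryToReturn(offsets, 14));
    }
}
```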

Advantages and disadvantages

In this version the inner messages of a compressed message also carry offsets, so individual inner messages can be located, and log compaction becomes easy to implement. But this version's offset handling has an obvious problem: for every compressed message, the Broker has to decompress it, set the relevant offsets, and then compress it again, all of which consumes a lot of CPU.

Kafka 0.10.x

Kafka 0.10.x handles offsets for uncompressed messages the same way as Kafka 0.8.x, so that is not repeated here; this section focuses on how Kafka 0.10.x handles offsets for compressed messages. Unlike Kafka 0.8.x, this version uses relative offsets for the inner messages, starting at 0 and running up to n-1, where n is the number of inner messages. After the offsets are processed, a Kafka 0.10.x compressed message looks like this:

Figure 7: Kafka 0.10.x broker-side compressed message offsets

As the figure shows, the only difference from Kafka 0.8.x is that the inner message offsets have become relative; the offset of the whole compressed message is handled exactly as in Kafka 0.8.x. Below we describe in detail how Kafka handles this.
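A minimal sketch of how an absolute inner offset can be recovered from the wrapper offset and a relative offset under this scheme (names are illustrative, not Kafka's API):

```java
public class RelativeOffsets {
    // Hypothetical sketch for Kafka 0.10.x compressed sets: inner messages carry
    // relative offsets 0..n-1 and the wrapper carries the absolute offset of the
    // last inner message, so absolute = wrapperOffset - (n - 1) + relativeOffset.
    static long absoluteOffset(long wrapperOffset, int n, int relativeOffset) {
        long baseOffset = wrapperOffset - (n - 1);  // absolute offset of the first inner message
        return baseOffset + relativeOffset;
    }

    public static void main(String[] args) {
        // Wrapper offset 10 with 3 inner messages (relative offsets 0, 1, 2)
        // maps back to absolute offsets 8, 9, 10.
        for (int r = 0; r < 3; r++) {
            System.out.println(absoluteOffset(10, 3, r));
        }
    }
}
```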

Producer-side handling of compressed message offsets

This logic is the same as in Kafka 0.8.x and is not repeated here. One thing to note: Kafka 0.10.x sets the message's magic value to 1 to distinguish it from messages of other versions; the purpose of this is explained below.

Broker-side handling of compressed message offsets

When the Broker receives a compressed message from the Producer, it still decompresses it first and then performs a series of checks, for example whether the message's magic value is greater than 0 and whether the inner message offsets are consecutive (0, 1, 2, 3, ...). If these conditions hold (inPlaceAssignment = true), the Broker only sets the offset of the whole outer compressed message; the inner message offsets do not need to be changed because the Producer has already set them, and the message does not need to be compressed again. Finally, the message is appended to the Log file.
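A minimal sketch of the checks described above, as a simplified stand-in for the broker's actual validation (names are illustrative):

```java
import java.util.List;

public class InPlaceAssignment {
    // Hypothetical sketch: in-place assignment is possible when the magic value is > 0
    // (the 0.10.x format) and the inner relative offsets are exactly 0, 1, ..., n-1.
    // In that case only the wrapper offset has to be rewritten.
    static boolean canAssignInPlace(byte magic, List<Long> innerRelativeOffsets) {
        if (magic <= 0) return false;
        for (int i = 0; i < innerRelativeOffsets.size(); i++) {
            if (innerRelativeOffsets.get(i) != i) return false;  // not consecutive from 0
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canAssignInPlace((byte) 1, List.of(0L, 1L, 2L)));  // true
        System.out.println(canAssignInPlace((byte) 0, List.of(0L, 1L, 2L)));  // false: old format
        System.out.println(canAssignInPlace((byte) 1, List.of(5L, 6L, 7L)));  // false: absolute offsets
    }
}
```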

If inPlaceAssignment = false, the Broker works directly on the decompressed messages, assigns offsets to the inner messages, and then sets the offset of the whole compressed message; in this case it ignores the offsets the Producer assigned, both for the inner messages and for the whole compressed message. The processing splits into two cases:

  • If the received message was not sent by a Kafka 0.10.x Producer client, its magic value is 0. The Broker then assigns offsets the same way as Kafka 0.8.x, i.e. both the inner messages and the whole compressed message get absolute offsets;
  • If the received message was sent by a Kafka 0.10.x Producer client, its magic value is 1. The Broker then sets the inner message offsets to relative offsets, from 0 up to n-1, and the offset of the whole compressed message becomes nextOffset + n - 1, where n is the number of inner messages. The result is shown in Figure 8 and sketched in code below:

    Figure 8: Kafka 0.10.x broker-side compressed message offsets (magic = 1)

After the offsets are set, in the inPlaceAssignment = false case the Broker has to re-compress the messages it just decompressed (regardless of which client version sent them) and then appends the resulting message to the Log file.
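A minimal sketch of the magic = 1 branch described above; nextOffset and InnerMessage are illustrative assumptions:

```java
import java.util.List;

public class Magic1Reassign {
    // Hypothetical sketch of the magic = 1 branch: rewrite the inner offsets as
    // relative offsets 0..n-1 and set the wrapper offset to nextOffset + n - 1.
    static long reassign(long nextOffset, List<InnerMessage> inner) {
        for (int i = 0; i < inner.size(); i++) {
            inner.get(i).offset = i;              // relative offsets 0, 1, ..., n-1
        }
        return nextOffset + inner.size() - 1;     // wrapper offset
    }

    static class InnerMessage { long offset; }

    public static void main(String[] args) {
        List<InnerMessage> inner =
                List.of(new InnerMessage(), new InnerMessage(), new InnerMessage());
        // With nextOffset = 8 and 3 inner messages, the wrapper offset is 10,
        // matching the 0.8.x result while the inner offsets stay relative.
        System.out.println(reassign(8, inner));
    }
}
```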

Client-side handling of compressed message offsets

The Broker behaves differently depending on the version of the Client making the request: for Consumers older than Kafka 0.10.x, the Broker does not use the zero-copy technique when sending messages; for Kafka 0.10.x Consumers, the Broker sends messages using only zero-copy.

Original: https://www.iteblog.com/archives/2235.html

Reprinted from past memories (https://www.iteblog.com/)
