4. Kafka Series: Design Philosophy (II)

4.3 Efficiency

We have put significant effort into efficiency. One of our primary use cases is handling web activity data, which is very high volume: each page view may generate dozens of writes. Furthermore, we assume each message published is read by at least one consumer (often many), hence we strive to make consumption as cheap as possible.

We have also found, from experience building and running a number of similar systems, that efficiency is a key to effective multi-tenant operations. If the downstream infrastructure service can easily become a bottleneck due to a small bump in usage by the application, such small changes will often create problems. By being very fast we help ensure that the application will tip over under load before the infrastructure does. This is particularly important when trying to run a centralized service that supports dozens or hundreds of applications on a centralized cluster, as changes in usage patterns are a near-daily occurrence.

We discussed disk efficiency in the previous section. Once poor disk access patterns have been eliminated, there are two common causes of inefficiency in this type of system: too many small I/O operations, and excessive byte copying.

The small I/O problem happens both between the client and the server and in the server’s own persistent operations.

To avoid this, our protocol is built around a “message set” abstraction that naturally groups messages together. This allows network requests to group messages together and amortize the overhead of the network roundtrip rather than sending a single message at a time. The server in turn appends chunks of messages to its log in one go, and the consumer fetches large linear chunks at a time.
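On the producer side, this message-set batching is controlled by a couple of client settings, chiefly `batch.size` and `linger.ms`. A minimal configuration sketch (the broker address is a placeholder, and the `confluent_kafka` client mentioned in the comment is one of several clients that accept these standard config keys):

```python
# Producer settings that control how messages are grouped into message sets
# before being sent over the network. The broker address is illustrative.
conf = {
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "batch.size": 64 * 1024,  # accumulate up to 64 KiB per partition batch
    "linger.ms": 10,          # wait up to 10 ms for more messages to fill a batch
}

# With the confluent-kafka client installed, this dict would be passed as:
#   from confluent_kafka import Producer
#   producer = Producer(conf)
```

A larger `linger.ms` trades a little latency for bigger batches, which is exactly the amortization of round-trip overhead described above.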

This simple optimization produces orders of magnitude speed up. Batching leads to larger network packets, larger sequential disk operations, contiguous memory blocks, and so on, all of which allows Kafka to turn a bursty stream of random message writes into linear writes that flow to the consumers.

The other inefficiency is in byte copying. At low message rates this is not an issue, but under load the impact is significant. To avoid this we employ a standardized binary message format that is shared by the producer, the broker, and the consumer (so data chunks can be transferred without modification between them).

The message log maintained by the broker is itself just a directory of files, each populated by a sequence of message sets that have been written to disk in the same format used by the producer and consumer. Maintaining this common format allows optimization of the most important operation: network transfer of persistent log chunks. Modern unix operating systems offer a highly optimized code path for transferring data out of pagecache to a socket; in Linux this is done with the sendfile system call.

To understand the impact of sendfile, it is important to understand the common data path for transfer of data from file to socket:

1. The operating system reads data from the disk into pagecache in kernel space
2. The application reads the data from kernel space into a user-space buffer
3. The application writes the data back into kernel space into a socket buffer
4. The operating system copies the data from the socket buffer to the NIC buffer, where it is sent over the network
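The four steps above can be made concrete with an ordinary buffered copy, where steps 2 and 3 appear as explicit user-space reads and writes (a simplified sketch; the pagecache and NIC copies in steps 1 and 4 happen inside the kernel and are only noted in comments):

```python
import tempfile

def naive_copy(src_path, dst_path, buf_size=8192):
    """Copy data the traditional way: disk -> pagecache -> user buffer -> kernel."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            # Steps 1-2: read() pulls data from disk into the kernel pagecache,
            # then copies it into this user-space buffer.
            chunk = src.read(buf_size)
            if not chunk:
                break
            # Step 3: write() copies the user-space buffer back into kernel space
            # (into a socket buffer when dst is a socket).
            # Step 4, the copy to the NIC buffer, happens later inside the kernel.
            dst.write(chunk)

# Example: copy a small temporary file through a user-space buffer.
src = tempfile.NamedTemporaryFile(delete=False)
src.write(b"hello kafka" * 1000)
src.close()
dst_path = src.name + ".copy"
naive_copy(src.name, dst_path)
```

Every byte passes through user space twice here, which is precisely the overhead sendfile removes.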

This is clearly inefficient: there are four copies and two system calls. Using sendfile, this re-copying is avoided by allowing the OS to send the data from pagecache to the network directly. So in this optimized path, only the final copy to the NIC buffer is needed.
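In Python, this zero-copy path is exposed via `os.sendfile`, which `socket.sendfile()` uses under the hood where the platform supports it (falling back to regular `send()` otherwise). A sketch using a local socket pair to stand in for the broker-to-consumer connection:

```python
import socket
import tempfile

# Prepare a small file to serve, standing in for a log segment.
payload = b"log-segment-bytes " * 512
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

# A connected pair of local sockets stands in for broker -> consumer.
server, client = socket.socketpair()

with open(path, "rb") as f:
    # socket.sendfile() uses the sendfile(2) system call where available,
    # streaming pagecache data to the socket with no user-space copy.
    server.sendfile(f)
server.shutdown(socket.SHUT_WR)

# The consumer side reads until the sender signals end of stream.
received = b""
while True:
    chunk = client.recv(65536)
    if not chunk:
        break
    received += chunk
server.close()
client.close()
```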

We expect a common use case to be multiple consumers on a topic. Using the zero-copy optimization above, data is copied into pagecache exactly once and reused on each consumption instead of being stored in memory and copied out to user-space every time it is read. This allows messages to be consumed at a rate that approaches the limit of the network connection.

This combination of pagecache and sendfile means that on a Kafka cluster where the consumers are mostly caught up, you will see no read activity on the disks whatsoever, as they will be serving data entirely from cache.

TLS/SSL libraries operate in user space (in-kernel SSL_sendfile is currently not supported by Kafka). Due to this restriction, sendfile is not used when SSL is enabled. To enable SSL, see the security.protocol and security.inter.broker.protocol configuration settings.
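For example, a client opts into TLS through the `security.protocol` setting (a hypothetical configuration sketch; the broker address is a placeholder):

```python
# Client-side: selecting TLS means the broker cannot use the sendfile
# fast path for data served to this client, since encryption happens
# in user space.
client_conf = {
    "bootstrap.servers": "broker.example.com:9093",  # placeholder address
    "security.protocol": "SSL",
}

# Broker-side (server.properties), the inter-broker equivalent is:
#   security.inter.broker.protocol=SSL
```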

For more background on the sendfile and zero-copy support in Java, see this article.

End-to-end Batch Compression

In some cases the bottleneck is actually not CPU or disk but network bandwidth. This is particularly true for a data pipeline that needs to send messages between data centers over a wide-area network. Of course, the user can always compress their messages one at a time without any support from Kafka, but this can lead to very poor compression ratios, as much of the redundancy comes from repetition between messages of the same type (e.g. field names in JSON, user agents in web logs, or common string values). Efficient compression requires compressing multiple messages together rather than compressing each message individually.
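The effect is easy to demonstrate with Python's gzip module: compressing many similar records one at a time barely helps, while compressing them as one batch exploits the redundancy across messages (an illustrative sketch, not Kafka's actual codec framing):

```python
import gzip
import json

# Many similar records, as in web activity logs: field names and common
# values repeat in every message.
records = [
    json.dumps({"event": "page_view", "url": f"/item/{i}",
                "agent": "Mozilla/5.0"}).encode()
    for i in range(1000)
]

# Compress each message individually: every message pays the gzip header
# cost and cannot reference redundancy in the other messages.
individual = sum(len(gzip.compress(r)) for r in records)

# Compress the whole batch at once: the repeated field names and values
# are effectively encoded only once.
batched = len(gzip.compress(b"".join(records)))
```

On data like this, the batched form comes out far smaller than the sum of the individually compressed messages.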

Kafka supports this with an efficient batching format. A batch of messages can be clumped together, compressed, and sent to the server in this form. The batch will be written in compressed form, will remain compressed in the log, and will only be decompressed by the consumer.

Kafka supports GZIP, Snappy, LZ4 and ZStandard compression protocols. More details on compression can be found here.
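On the producer, choosing a codec is a single setting, `compression.type` (a configuration sketch; the broker address is a placeholder):

```python
# End-to-end batch compression is enabled per producer; the broker stores
# and forwards the batches without recompressing them.
conf = {
    "bootstrap.servers": "localhost:9092",  # placeholder address
    "compression.type": "zstd",  # or "gzip", "snappy", "lz4"
}
```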

Reposted from blog.csdn.net/SJshenjian/article/details/129961628