Meituan Interview, Round Two: How Is Kafka's High-Throughput Architecture Design Reflected on the Producer Side?

One of Kafka's defining characteristics is high throughput, which makes it the preferred message queue for big data scenarios. According to data from real production environments, a single Kafka machine can produce and consume millions of messages per second.

What does that actually look like? Let's illustrate with a producer-side message-sending test run in a production environment.

  • Production environment configuration: 3 brokers deployed on 3 machines, each with an 8-core CPU and 32 GB of memory; intranet bandwidth is high enough that the network is not a bottleneck.
  • Test method: each message is 100 bytes in size. We test 1, 2, and 3 producers producing messages while 1, 2, and 3 consumers consume them, and record the number of messages and message bytes successfully produced and consumed (a minimal sketch of how such a test might be driven follows this list).
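
As a rough illustration, here is a minimal Java sketch of how such a producer-side test might be driven with the standard kafka-clients API. The broker addresses, the topic name perf-test, and the record count are assumptions for the sketch, not values from the original test.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ProducerThroughputTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker addresses; replace with your own cluster
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        byte[] payload = new byte[100];   // 100-byte message, as in the test
        long numRecords = 10_000_000L;    // assumed volume for the sketch
        AtomicLong acked = new AtomicLong();

        long start = System.currentTimeMillis();
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (long i = 0; i < numRecords; i++) {
                producer.send(new ProducerRecord<>("perf-test", payload),
                        (metadata, exception) -> {
                            if (exception == null) acked.incrementAndGet();
                        });
            }
            producer.flush(); // wait until all buffered records are actually sent
        }
        long elapsedMs = System.currentTimeMillis() - start;
        System.out.printf("acked %d records in %d ms (%.0f msg/s)%n",
                acked.get(), elapsedMs, acked.get() * 1000.0 / elapsedMs);
    }
}
```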

Throughput of messages sent by producers

When 3 producers send messages to 3 brokers, the producers produce an average of 1 million messages per second to each broker.

Here are the test results:

[Figure: producer throughput test results]

Throughput of consumers consuming messages

Here are the test results:

[Figure: consumer throughput test results]

When 3 consumers pull messages from 3 brokers, the consumers pull an average of more than 2 million messages per second from each broker. Isn't that remarkable?

So, how does Kafka achieve such high throughput?

Kafka high-throughput architecture features

Kafka employs a series of technical optimizations to achieve high throughput, including batch processing, compression, zero copy, sequential disk I/O, page caching, and the Reactor network design pattern.

  • Many techniques are involved. As with the discussion of Kafka message reliability in the previous lesson, I will analyze them from three angles: the producer side, the server side, and the consumer side.
  • Beyond Kafka's design principles, I will also cover the relevant parameter configurations and corresponding source code, and share design ideas worth borrowing where appropriate.

In other words, I will walk you through all of Kafka's high-throughput design points. Once you understand them, you will have a deeper grasp of the parameter configurations related to Kafka's throughput. Along the way, you will also learn some high-performance design techniques and the underlying workings of the operating system, which will help you design your own high-throughput systems in the future.

Today, we focus mainly on how high throughput is embodied on the producer side.

The producer side

How does Kafka's high-throughput design manifest on the producer side? To answer that, we first need to understand how the producer sends messages; this is necessary background. To help you follow the process, I have drawn the flow chart below, which describes the entire message-sending path on the producer side:

[Figure: producer message-sending flow]

As the figure shows, sending a message takes 7 steps. That sounds like a lot, but the steps fall into three major blocks: the KafkaProducer main thread, the RecordAccumulator cache, and the Sender sub-thread.

  • The KafkaProducer main thread is responsible for creating messages, calling the interceptor, serializer, and partitioner in turn to intercept, serialize, and route each message to a partition, then compressing the message and placing it into the RecordAccumulator cache (a config sketch of these pluggable stages follows this list).
  • The RecordAccumulator cache maintains a queue per partition; each queue is the collection of messages waiting to be sent to that partition.
  • The Sender sub-thread is the thread that actually sends messages. When certain conditions are met, the KafkaProducer main thread wakes the Sender sub-thread, which takes messages from the RecordAccumulator cache and hands them to the underlying network component for transmission. The network component tracks sent and received data through two collections: completedReceives holds completed network receives, and completedSends holds completed network sends. Once the server returns a successful response, the Sender sub-thread deletes the successfully sent messages from the RecordAccumulator cache.
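
The interceptor, serializer, and partitioner stages invoked by the main thread are all pluggable through standard producer configs. A hedged sketch follows; the com.example.* class names are hypothetical placeholders for your own implementations, not real classes.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class PluggableStagesConfig {
    static Properties producerProps() {
        Properties props = new Properties();
        // Serializer stage: turns keys/values into bytes before caching
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Interceptor stage: hypothetical class implementing
        // org.apache.kafka.clients.producer.ProducerInterceptor
        props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, "com.example.MyInterceptor");
        // Partitioner stage: hypothetical class implementing
        // org.apache.kafka.clients.producer.Partitioner
        props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.example.MyPartitioner");
        // Compression is applied before records land in the RecordAccumulator
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        return props;
    }
}
```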

Having introduced the producer-side architecture, I will now explain, from the following three angles, how this architecture improves message throughput.

1. Multi-threaded asynchronous design

The asynchronous design on the producer side is reflected in two aspects.

The first aspect: the KafkaProducer main thread and the Sender sub-thread each perform their own duties, exchanging data through the RecordAccumulator cache.

The KafkaProducer main thread offers two sending modes, synchronous and asynchronous, but the underlying implementation is the same: both deliver messages asynchronously through the Sender sub-thread. The difference is that in the synchronous case the main thread waits for the Sender sub-thread to finish sending before returning, whereas in the asynchronous case it returns without waiting.
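
Both modes rest on the Future returned by send(). Here is a minimal sketch using the standard kafka-clients API; the broker address, topic name, and record values are placeholders for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class SyncVsAsyncSend {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Asynchronous: send() returns once the record is buffered in the
            // RecordAccumulator; the callback runs when the Sender gets a response.
            producer.send(new ProducerRecord<>("demo-topic", "k", "async"),
                    (metadata, exception) -> {
                        if (exception == null) {
                            System.out.println("async ack, offset=" + metadata.offset());
                        }
                    });

            // Synchronous: the same asynchronous path underneath, but the caller
            // blocks on the returned Future until the send completes.
            RecordMetadata metadata =
                    producer.send(new ProducerRecord<>("demo-topic", "k", "sync")).get();
            System.out.println("sync ack, offset=" + metadata.offset());
        }
    }
}
```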

When the KafkaProducer main thread calls send(), no real network send takes place: the message is simply cached in the RecordAccumulator, and the main thread returns from the send() method. The main thread can then keep calling send() to cache more messages into the RecordAccumulator, regardless of whether any of them have actually been sent. The real sending is done by the Sender sub-thread, which fetches messages from the RecordAccumulator cache and calls the underlying network component to complete the transmission.

Some students may ask: why can't the main thread and the Sender sub-thread be merged into one thread? What would be wrong with a single thread?

Sending a message on the producer side involves two processes: creating the message and sending it over the network, and both can block. For example, message creation may depend on a remote database or cache; if the network is bad, the thread blocks on message creation. Likewise, poor communication between the producer and the server also causes blocking. If the two processes ran in one thread, a block in either would stall the other.

But if we give message creation to the main thread and message sending to the sub-thread, the two processes no longer affect each other. Meanwhile, the cache in between acts as a buffer that "shaves peaks and fills valleys".
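
To make the idea concrete, here is a minimal, generic Java sketch of the same pattern (not Kafka's actual code): one thread creates messages and enqueues them, another drains the queue and "sends" them, with a bounded queue playing the role of the peak-shaving buffer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class TwoThreadPipeline {
    public static void main(String[] args) throws InterruptedException {
        // Bounded queue plays the role of the RecordAccumulator buffer
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(1024);

        // "Main thread": creates messages; may block on slow upstream work
        Thread creator = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    buffer.put("message-" + i); // blocks only when the buffer is full
                }
                buffer.put("EOF");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // "Sender thread": drains the buffer; may block on slow network I/O
        Thread sender = new Thread(() -> {
            try {
                String msg;
                while (!(msg = buffer.take()).equals("EOF")) {
                    System.out.println("sending " + msg); // stand-in for a network send
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        creator.start();
        sender.start();
        creator.join();
        sender.join();
    }
}
```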

2. The Sender sub-thread is decoupled from Kafka's underlying communication module

The Sender sub-thread ultimately calls Kafka's underlying communication module to send and receive messages.

We know that Java NIO is essentially a wrapper over the operating system's communication facilities. Kafka in turn wraps the Java NIO components at its lowest layer; in particular, org.apache.kafka.common.network.Selector (KSelector for short) wraps Java NIO's Selector class. On top of Selector, KSelector adds two collections for buffering: completedReceives and completedSends. Messages KSelector has sent or received are placed into these two collections, and the Sender sub-thread continuously fetches from them in its while(true) loop. This decouples the two components, which benefits throughput.
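
Here is a highly simplified, hypothetical sketch of this decoupling, not Kafka's real KSelector: a network layer appends completed operations to two queues, and a Sender-style loop drains them on each iteration at its own pace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Schematic only: Kafka's real KSelector buffers completed sends/receives
// in completedSends and completedReceives; the Sender loop drains them.
public class MiniSelector {
    private final Queue<String> completedSends = new ConcurrentLinkedQueue<>();
    private final Queue<String> completedReceives = new ConcurrentLinkedQueue<>();

    // Called by the network layer when an operation finishes
    void onSendFinished(String payload)    { completedSends.add(payload); }
    void onReceiveFinished(String payload) { completedReceives.add(payload); }

    // Called by the Sender loop each iteration; draining the queues decouples
    // the Sender's pace from the network layer's pace
    List<String> drainCompletedSends()    { return drain(completedSends); }
    List<String> drainCompletedReceives() { return drain(completedReceives); }

    private static List<String> drain(Queue<String> queue) {
        List<String> out = new ArrayList<>();
        String s;
        while ((s = queue.poll()) != null) out.add(s);
        return out;
    }

    public static void main(String[] args) {
        MiniSelector selector = new MiniSelector();
        selector.onSendFinished("req-1");
        selector.onReceiveFinished("resp-1");
        // One iteration of a Sender-style while(true) loop body:
        System.out.println("completed sends:    " + selector.drainCompletedSends());
        System.out.println("completed receives: " + selector.drainCompletedReceives());
    }
}
```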

3. Fetching data from the cache in batches, with efficient space utilization

This has a lot to do with the design of the RecordAccumulator class, which is diagrammed below:

[Figure: RecordAccumulator design]

As the figure shows, RecordAccumulator holds a CopyOnWriteMap collection named batches, whose key is the topic partition and whose value is a queue of ProducerBatch objects; each partition has its own queue. The elements of the queue are ProducerBatch batches, and messages are cached inside these batches. The smallest unit of sending is the batch, so a single send may carry many messages. This design greatly reduces the number of network requests, improving network read/write efficiency and therefore throughput.
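
Batching behavior is controlled by standard producer settings. As a hedged illustration, the values below are the documented defaults or common tuning choices, not recommendations from the original text:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class BatchingConfig {
    static Properties producerProps() {
        Properties props = new Properties();
        // batch.size: up to 16 KB of records per batch before it counts as full (default 16384)
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
        // linger.ms: wait up to 10 ms for more records to fill a batch
        // (default 0; a small positive value trades a little latency for bigger batches)
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);
        // compression.type: compress each batch as a whole; lz4 is a common high-throughput choice
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        // buffer.memory: total RecordAccumulator buffer size (default 32 MB)
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024L);
        return props;
    }
}
```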

Next, let's analyze when and how messages are judged ready to send. The logic lives in the RecordAccumulator.ready() method; the relevant source code, with comments, is as follows:

```java
// Five conditions determine whether the batch can be sent
boolean sendable = full || expired || exhausted || closed || flushInProgress();

// Sendable and not in a backoff period
if (sendable && !backingOff) {
    // If it can be sent, add the partition leader to the readyNodes collection
    readyNodes.add(leader);
} else {
    // Time left to wait = total time to wait - time already waited
    long timeLeftMs = Math.max(timeToWaitMs - waitedTimeMs, 0);
    nextReadyCheckDelayMs = Math.min(timeLeftMs, nextReadyCheckDelayMs);
}
```

Here I want to focus on the logic of the boolean variable sendable, which decides whether to send: if any one of the five conditions holds, the messages can be sent. The five conditions can be summarized as follows.

  • full: the deque contains more than one batch, or the first ProducerBatch in the deque is full.
  • expired: a ProducerBatch in the deque has waited longer than its timeout.
  • exhausted: memory is exhausted, i.e. other threads are waiting for the BufferPool to release space.
  • closed: the producer is shutting down gracefully.
  • flushInProgress: a flush operation is in progress, a flag requesting that buffered messages be sent out immediately.

The first condition deserves attention: the deque contains more than one batch, or its first ProducerBatch is full. When the broker load is not saturated, "the first ProducerBatch of the deque is full" is what triggers sending in most cases. This is why the producer sends messages not one by one, but batch by batch.

Next, let's analyze the producer side's efficient use of cache space.

Cache space allocation is handled by the BufferPool component. The following diagram shows how it works:

[Figure: BufferPool working principle]

The BufferPool defaults to 32 MB overall (buffer.memory), and its internal memory is divided into two parts: free, a set of fixed-size memory blocks, and nonPooledAvailableMemory, the non-pooled available memory. Each fixed-size block defaults to 16 KB (batch.size). When a ProducerBatch asks the BufferPool for a block of a given size, the BufferPool decides, based on that size, which memory region the block is allocated from.

When a ProducerBatch's data has been sent successfully, the ByteBuffer underneath it is not destroyed but is returned to the free set, so that when a new ProducerBatch is needed, a block can be taken straight out of the set instead of being destroyed and recreated over and over. ProducerBatch is in fact backed by a Java NIO ByteBuffer, and creating and destroying ByteBuffers consumes a lot of CPU resources. This design reuses ByteBuffers and thus greatly reduces that consumption.
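
The reuse idea can be illustrated with a minimal, generic buffer pool; this is a sketch of the concept, not Kafka's actual BufferPool:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Conceptual sketch only: reuse fixed-size ByteBuffers instead of
// allocating and garbage-collecting a new one for every batch.
public class SimpleBufferPool {
    private static final int BLOCK_SIZE = 16 * 1024; // fixed block size, like batch.size
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    // Hand out a pooled block if one is free; otherwise allocate a new one
    public synchronized ByteBuffer allocate() {
        ByteBuffer buf = free.pollFirst();
        return (buf != null) ? buf : ByteBuffer.allocate(BLOCK_SIZE);
    }

    // Return the block to the pool instead of letting it be garbage collected
    public synchronized void deallocate(ByteBuffer buf) {
        buf.clear();
        free.addFirst(buf);
    }
}
```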


Source: https://blog.csdn.net/wdj_yyds/article/details/131920454