Kafka's high-performance design: the memory pool

Foreword

Kafka's memory pool is a buffer area used to manage memory allocation. It reserves a fixed amount of memory up front and hands it out for objects such as message buffers and batch buffers, which reduces the overhead of calling memory allocation functions frequently.

The Kafka memory pool is built on ByteBuffer from Java NIO. When a new buffer is needed, the pool hands out a memory block of fixed size. When the block is no longer in use, the pool reclaims it and keeps a reference to it so that it can be handed out again next time.

Using a memory pool improves the performance of the Kafka producer, because message middleware such as Kafka needs to create objects very frequently. Frequent object creation consumes memory, and reusing buffers from a pool reduces that consumption. In addition, a memory pool can reduce memory fragmentation and improve memory usage efficiency.

Implementation

Below we walk through the implementation of the memory pool in detail: how it is created, how memory is allocated, and how memory is released.

Create a memory pool

The memory pool is initialized when the Kafka producer is initialized. On the producer side, the pool is a BufferPool. Two configuration parameters relate to it: buffer.memory and batch.size. buffer.memory is the total size of the buffer memory and defaults to 32 MB. batch.size is the size of a message batch and defaults to 16 KB. Inside BufferPool, batch.size is effectively the size of one ByteBuffer, because BufferPool only pools ByteBuffers of exactly batch.size. The BufferPool is created (new BufferPool) at the point where the message collector RecordAccumulator is created, as follows:

this.accumulator = new RecordAccumulator(logContext,
                    batchSize,
                    this.compressionType,
                    lingerMs(config),
                    retryBackoffMs,
                    deliveryTimeoutMs,
                    partitionerConfig,
                    metrics,
                    PRODUCER_METRIC_GROUP_NAME,
                    time,
                    apiVersions,
                    transactionManager,
                    new BufferPool(this.totalMemorySize, batchSize, metrics, time, PRODUCER_METRIC_GROUP_NAME));
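To make the two parameters concrete, here is a minimal sketch of setting them with their default values. The keys buffer.memory and batch.size are the real producer configuration names; the class name, the helper poolableBufferCount, and the localhost:9092 address are assumptions made only for illustration. The sketch uses plain java.util.Properties so that it runs without the Kafka client on the classpath.

```java
import java.util.Properties;

final class ProducerMemoryConfig {

    /** Builds producer properties with the default memory settings written out explicitly. */
    static Properties producerProps() {
        Properties props = new Properties();
        // Placeholder broker address, an assumption for this sketch.
        props.put("bootstrap.servers", "localhost:9092");
        // Total memory available to the producer's buffer pool: 32 MB (the default).
        props.put("buffer.memory", String.valueOf(32L * 1024 * 1024));
        // Size of one batch, and therefore of one pooled ByteBuffer: 16 KB (the default).
        props.put("batch.size", String.valueOf(16 * 1024));
        return props;
    }

    /** Hypothetical helper: how many batch.size buffers fit into buffer.memory. */
    static long poolableBufferCount(Properties props) {
        long bufferMemory = Long.parseLong(props.getProperty("buffer.memory"));
        long batchSize = Long.parseLong(props.getProperty("batch.size"));
        return bufferMemory / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(poolableBufferCount(producerProps())); // prints 2048
    }
}
```

With the defaults, the 32 MB of buffer memory has room for at most 2048 buffers of 16 KB each.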

Allocate memory

Kafka messages are not sent directly to the broker. They are first appended to the message collector RecordAccumulator, and memory must be requested before a message can be appended. If the message is larger than the whole memory pool (BufferPool), the request is not allowed and an exception is thrown. For example, if a message is 40 MB but the memory pool is 32 MB, the BufferPool obviously cannot hold the message, and an error is reported.

In the previous article, we said that messages are stored in queues in the form of ProducerBatch. When a message is sent, the queue for its partition is looked up, and if that queue does not exist, it is created. The queue is a Deque that holds ProducerBatch objects. A ProducerBatch is then taken from the queue. If there is one, we check whether it has enough room for the message. If it does, the message is written into it. If it does not, a new ProducerBatch is created, the message is added to the new batch, and the new batch is added to the queue. Finally, the ProducerBatch is released, which in practice means releasing the ByteBuffer inside it, because the ProducerBatch itself is backed by a ByteBuffer.
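The append flow described above can be sketched roughly as follows. This is a simplified, single-threaded illustration, not Kafka's RecordAccumulator: MiniAccumulator, Batch, tryAppend, and batchCount are hypothetical names, partitions are identified by a plain String, and the sketch assumes the batch examined is the last one in the queue. Locking and the buffer pool are left out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class MiniAccumulator {

    /** Hypothetical stand-in for ProducerBatch: a batch backed by one ByteBuffer. */
    static final class Batch {
        final ByteBuffer buffer;

        Batch(int capacity) {
            this.buffer = ByteBuffer.allocate(capacity);
        }

        /** Writes the payload if it fits; returns false when there is not enough room. */
        boolean tryAppend(byte[] payload) {
            if (buffer.remaining() < payload.length) {
                return false;
            }
            buffer.put(payload);
            return true;
        }
    }

    private final Map<String, Deque<Batch>> batches = new HashMap<>();
    private final int batchSize;

    MiniAccumulator(int batchSize) {
        this.batchSize = batchSize;
    }

    void append(String partition, byte[] payload) {
        // Get the queue for the partition, creating it if it does not exist yet.
        Deque<Batch> queue = batches.computeIfAbsent(partition, k -> new ArrayDeque<>());
        // If a batch exists and has room, the message goes into it.
        Batch last = queue.peekLast();
        if (last != null && last.tryAppend(payload)) {
            return;
        }
        // Otherwise create a new batch large enough for the message and enqueue it.
        Batch fresh = new Batch(Math.max(batchSize, payload.length));
        fresh.tryAppend(payload);
        queue.addLast(fresh);
    }

    int batchCount(String partition) {
        Deque<Batch> queue = batches.get(partition);
        return queue == null ? 0 : queue.size();
    }
}
```

With a batch size of 16 bytes, two 10-byte messages for the same partition end up in two batches, because the second message does not fit into the 6 bytes left in the first batch.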

If the message is larger than 16 KB, the buffer is created according to the actual size of the message. If it is less than or equal to 16 KB, the buffer is created with 16 KB. (Note: 16 KB is the default value of the batch.size parameter. If batch.size has been configured, the configured value is used instead.) In other words, batchSize is compared with the size of the message, the larger of the two is selected, and a buffer of that size is allocated.
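The size rule itself is a single comparison. The following is a minimal sketch of it; BufferSizing and bufferSizeFor are hypothetical names, and recordSizeEstimate stands for whatever size the producer computes for the message.

```java
final class BufferSizing {

    /** Returns the buffer size to request: the larger of batch.size and the message size. */
    static int bufferSizeFor(int batchSize, int recordSizeEstimate) {
        return Math.max(batchSize, recordSizeEstimate);
    }

    public static void main(String[] args) {
        int batchSize = 16 * 1024; // default batch.size
        System.out.println(bufferSizeFor(batchSize, 1_000));  // prints 16384
        System.out.println(bufferSizeFor(batchSize, 40_000)); // prints 40000
    }
}
```

A small message still gets a full batch.size buffer, so later messages for the same partition can share it. A large message gets a buffer of its own size.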

A ProducerBatch lives inside a ByteBuffer, so creating a ProducerBatch means requesting a ByteBuffer. If the message is less than or equal to batch.size (16 KB by default), a ByteBuffer is taken from the buffer pool BufferPool for the ProducerBatch. These ByteBuffers are managed by the BufferPool. If the message is larger than batch.size, the ByteBuffers in the buffer pool cannot be used. In the allocate method, if the size of the ByteBuffer required by the message equals poolableSize and the BufferPool has a free ByteBuffer, one is taken directly from the BufferPool's queue. poolableSize is in fact batch.size.
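The fast path of allocate can be sketched as below. This is an illustration under simplifying assumptions, not the BufferPool source: AllocateSketch and giveBack are hypothetical names, giveBack exists only so the reuse path can be exercised, and the sketch does not track available memory or block when memory runs out, which the real pool has to do.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

final class AllocateSketch {
    private final long totalMemory;   // plays the role of buffer.memory
    private final int poolableSize;   // plays the role of batch.size
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    AllocateSketch(long totalMemory, int poolableSize) {
        this.totalMemory = totalMemory;
        this.poolableSize = poolableSize;
    }

    ByteBuffer allocate(int size) {
        // A request larger than the whole pool can never be satisfied.
        if (size > totalMemory) {
            throw new IllegalArgumentException(
                "Cannot allocate " + size + " bytes: the pool is limited to " + totalMemory);
        }
        // Fast path: a pooled buffer of exactly poolableSize is available, so reuse it.
        if (size == poolableSize && !free.isEmpty()) {
            return free.pollFirst();
        }
        // Otherwise allocate a fresh buffer (memory accounting is omitted in this sketch).
        return ByteBuffer.allocate(size);
    }

    /** Hypothetical helper: puts a poolableSize buffer back so a later allocate can reuse it. */
    void giveBack(ByteBuffer buffer) {
        buffer.clear();
        free.addLast(buffer);
    }
}
```

Only requests of exactly poolableSize can hit the fast path. A 40-byte request against a 16-byte poolableSize always gets a freshly allocated buffer.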

Free memory

After a message has been sent, its ByteBuffer must be released and returned to the BufferPool for later use. Note that only ByteBuffers of exactly batch.size can be added back to the BufferPool and reused later. A ByteBuffer larger than batch.size cannot be added to the BufferPool. For such a buffer, the release only affects the memory outside the pool, which is tracked by nonPooledAvailableMemory. For a batch.size buffer, the ByteBuffer is cleared with buffer.clear(), and the cleared buffer is then added to the queue.
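The two release branches can be sketched like this. It is a simplified illustration: DeallocateSketch and its accessor methods are hypothetical names, the starting value of nonPooledAvailableMemory is passed in by the caller, and the locking and waiter notification that a real pool needs are left out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

final class DeallocateSketch {
    private final int poolableSize;            // plays the role of batch.size
    private final Deque<ByteBuffer> free = new ArrayDeque<>();
    private long nonPooledAvailableMemory;     // memory available outside the pooled buffers

    DeallocateSketch(int poolableSize, long nonPooledAvailableMemory) {
        this.poolableSize = poolableSize;
        this.nonPooledAvailableMemory = nonPooledAvailableMemory;
    }

    void deallocate(ByteBuffer buffer) {
        if (buffer.capacity() == poolableSize) {
            // Standard-size buffer: reset position and limit, then keep it for reuse.
            buffer.clear();
            free.addLast(buffer);
        } else {
            // Any other size: the buffer is dropped and only its size is credited back.
            nonPooledAvailableMemory += buffer.capacity();
        }
    }

    int pooledBuffers() {
        return free.size();
    }

    long nonPooledAvailableMemory() {
        return nonPooledAvailableMemory;
    }
}
```

Note that buffer.clear() does not erase the bytes. It only resets the position and limit so that the next user can overwrite the buffer from the start.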

Summary

We have looked at why Kafka uses a memory pool and what the benefits are, and then at how it is implemented, covering creation, allocation, and release in detail. The point to remember is the condition under which the memory pool is actually used: the size of a message must be less than or equal to batch.size. Only then can the pool do its job. If messages are large but batch.size is left at its default value, the memory pool cannot be used and its performance benefit is lost.


Origin blog.csdn.net/ww2651071028/article/details/130552008