Kafka high-performance design: memory pool

Preface

Kafka's memory pool is a region of memory that the producer uses to manage buffer allocation. It reserves a fixed amount of memory and hands out pieces of it for message buffers and batch buffers, which avoids the cost of calling the memory allocator over and over.

The Kafka memory pool is built on ByteBuffer from Java NIO. When a new buffer is needed, the pool hands out a fixed-size memory block. When the block is no longer in use, the pool takes it back and keeps a reference to it, so that it can be handed out again the next time.
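
The take-and-return cycle described above can be sketched in a few lines. This is a minimal illustration and not Kafka's BufferPool: the class name TinyBufferPool and the methods take, giveBack and idleBlocks are hypothetical, and the sketch ignores thread safety and any memory limit.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, single-threaded sketch of buffer reuse; not Kafka's BufferPool.
final class TinyBufferPool {
    private final int blockSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    TinyBufferPool(int blockSize) {
        this.blockSize = blockSize;
    }

    // Hand out an idle block if there is one, otherwise allocate a fresh one.
    ByteBuffer take() {
        ByteBuffer pooled = free.pollFirst();
        return pooled != null ? pooled : ByteBuffer.allocate(blockSize);
    }

    // Reset position and limit, then keep the block for the next caller.
    void giveBack(ByteBuffer buffer) {
        buffer.clear();
        free.addLast(buffer);
    }

    // Number of blocks currently waiting to be reused.
    int idleBlocks() {
        return free.size();
    }
}
```

Taking a block, returning it, and taking again yields the same ByteBuffer instance, which is the whole point of the pool: the second request costs no allocation.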

Using a memory pool improves the performance of the Kafka producer. Message middleware such as Kafka has to create buffers very frequently, and creating and discarding them over and over consumes memory and puts pressure on the garbage collector. Reusing pooled buffers reduces that cost. In addition, a memory pool reduces memory fragmentation and makes memory use more efficient.

Implementation

Below we walk through the implementation of the memory pool in detail, from several angles.

Create memory pool

The memory pool is initialized when the Kafka producer is initialized. On the producer side the pool is the BufferPool class, and its related configuration parameters are buffer.memory and batch.size. buffer.memory is the total size of the buffer memory and defaults to 32 MB. batch.size is the size of a message batch and defaults to 16 KB. Inside BufferPool, batch.size is effectively the size of one pooled ByteBuffer, because BufferPool only pools ByteBuffers of exactly batch.size.

The BufferPool is created (new BufferPool) while the producer is being initialized, at the point where the message collector RecordAccumulator is constructed, as follows:

this.accumulator = new RecordAccumulator(logContext,
                    batchSize,
                    this.compressionType,
                    lingerMs(config),
                    retryBackoffMs,
                    deliveryTimeoutMs,
                    partitionerConfig,
                    metrics,
                    PRODUCER_METRIC_GROUP_NAME,
                    time,
                    apiVersions,
                    transactionManager,
                    new BufferPool(this.totalMemorySize, batchSize, metrics, time, PRODUCER_METRIC_GROUP_NAME));
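
To make the two settings concrete, the sketch below puts them in a plain java.util.Properties object and works out how many batch-sized buffers could fit in the pool. The keys buffer.memory and batch.size are the producer configuration names used in this article; the class ProducerMemoryDefaults and its helpers are hypothetical and only do arithmetic, they do not create a producer.

```java
import java.util.Properties;

// Hypothetical helper: only illustrates the arithmetic behind the two settings.
final class ProducerMemoryDefaults {
    static final long DEFAULT_BUFFER_MEMORY = 32L * 1024 * 1024; // 32 MB in bytes
    static final int DEFAULT_BATCH_SIZE = 16 * 1024;             // 16 KB in bytes

    // Build a Properties object carrying the two memory-related settings.
    static Properties memoryProps(long bufferMemory, int batchSize) {
        Properties props = new Properties();
        props.setProperty("buffer.memory", Long.toString(bufferMemory));
        props.setProperty("batch.size", Integer.toString(batchSize));
        return props;
    }

    // Upper bound on how many batch.size buffers the pool could hold at once.
    static long maxPooledBuffers(Properties props) {
        long bufferMemory = Long.parseLong(props.getProperty("buffer.memory"));
        int batchSize = Integer.parseInt(props.getProperty("batch.size"));
        return bufferMemory / batchSize;
    }
}
```

With the default values, maxPooledBuffers gives 33554432 / 16384 = 2048, so at most 2048 default-sized batches can be held in memory at the same time.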

Allocate memory

Kafka messages are not sent directly to the broker. They first go to the message collector, RecordAccumulator. Before a message can be added to the RecordAccumulator, memory has to be requested for it. If the message is larger than the entire BufferPool, the request cannot be satisfied and an exception is thrown. For example, if a message is 40 MB but the memory pool is 32 MB, the BufferPool obviously cannot hold the message and an error is reported.
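
The rule above can be written as a simple guard. The following is a sketch under assumptions, not Kafka's code: the class PoolSizeGuard, the method checkFits, the exception type and the error message are all chosen here for illustration.

```java
// Hypothetical guard mirroring the rule in the text: a request larger than
// the whole pool can never be satisfied, so it is rejected immediately.
final class PoolSizeGuard {
    private final long totalMemory;

    PoolSizeGuard(long totalMemory) {
        this.totalMemory = totalMemory;
    }

    // Throws if the request exceeds the pool's total size; otherwise does nothing.
    void checkFits(long requestedBytes) {
        if (requestedBytes > totalMemory) {
            throw new IllegalArgumentException("Cannot allocate " + requestedBytes
                    + " bytes: the pool is limited to " + totalMemory + " bytes");
        }
    }
}
```

With a 32 MB pool, checkFits(40L * 1024 * 1024) throws, while any request up to 32 MB passes the check.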

In the previous article, we said that messages are stored in queues in the form of ProducerBatch. When a message is sent, the queue for its partition is looked up, and created if it does not exist. This queue is a Deque that holds ProducerBatch objects. A ProducerBatch is then taken from the queue. If one exists, the producer checks whether it has enough room for the message: if it does, the message is appended to it; if it does not, a new ProducerBatch is created, the message is added to the new batch, and the batch is added to the queue. When a ProducerBatch is later released, what is actually released is its ByteBuffer, because the ProducerBatch itself is backed by a ByteBuffer.
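
The append flow described above can be sketched as follows. This is a simplified, single-threaded illustration and not Kafka's RecordAccumulator: SketchAccumulator, SketchBatch, tryAppend and batchCount are hypothetical names, partitions are plain integers, and buffers are allocated directly instead of coming from a pool. Checking the batch at the tail of the queue is a choice made for this sketch.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the append flow; not Kafka's RecordAccumulator.
final class SketchAccumulator {
    // Stand-in for ProducerBatch: a batch is just a ByteBuffer being filled.
    static final class SketchBatch {
        final ByteBuffer buffer;

        SketchBatch(int capacity) {
            this.buffer = ByteBuffer.allocate(capacity);
        }

        // Append only if the payload fits in the space that is left.
        boolean tryAppend(byte[] payload) {
            if (buffer.remaining() < payload.length) return false;
            buffer.put(payload);
            return true;
        }
    }

    private final int batchSize;
    private final Map<Integer, Deque<SketchBatch>> batches = new HashMap<>();

    SketchAccumulator(int batchSize) {
        this.batchSize = batchSize;
    }

    void append(int partition, byte[] payload) {
        // Get the partition's queue, creating it on first use.
        Deque<SketchBatch> queue = batches.computeIfAbsent(partition, p -> new ArrayDeque<>());
        SketchBatch last = queue.peekLast();
        if (last != null && last.tryAppend(payload)) return; // fits in the newest batch
        // Otherwise open a new batch, sized for the payload if it exceeds batchSize.
        SketchBatch fresh = new SketchBatch(Math.max(batchSize, payload.length));
        fresh.tryAppend(payload);
        queue.addLast(fresh);
    }

    // Number of batches queued for a partition (0 if the partition is unknown).
    int batchCount(int partition) {
        Deque<SketchBatch> queue = batches.get(partition);
        return queue == null ? 0 : queue.size();
    }
}
```

A new batch is only opened when the newest one cannot take the message, so small messages for the same partition share one buffer until it is full.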

If the message is longer than 16 KB (note that 16 KB is the default value of the batch.size parameter; if batch.size has been set explicitly, the configured value is used instead), the buffer is created at the actual size of the message. If the message is 16 KB or smaller, the buffer is created at 16 KB. In other words, batchSize is compared with the size of the message, and the larger of the two is used to allocate the buffer.
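
The sizing rule is a single max operation. The helper below is a hypothetical illustration (BatchSizing, bufferSizeFor and isPoolable are names invented for this sketch), with sizes given in bytes.

```java
// Hypothetical helper showing the sizing rule from the text.
final class BatchSizing {
    // A new buffer is at least batchSize and grows to fit an oversized message.
    static int bufferSizeFor(int batchSize, int estimatedMessageSize) {
        return Math.max(batchSize, estimatedMessageSize);
    }

    // A buffer can be pooled only when its size is exactly batchSize.
    static boolean isPoolable(int batchSize, int bufferSize) {
        return bufferSize == batchSize;
    }
}
```

With the default batch size of 16384 bytes, bufferSizeFor(16384, 1000) gives 16384 and the buffer is poolable, while bufferSizeFor(16384, 20000) gives 20000 and the buffer is not.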

A ProducerBatch lives in a ByteBuffer, so creating a ProducerBatch means requesting a ByteBuffer. If the message is no larger than batch.size (16 KB by default), the ByteBuffer for the ProducerBatch is taken from the buffer pool, BufferPool. As shown in the figure above, these ByteBuffers are managed by the BufferPool. If the message is larger than batch.size, the ByteBuffers in the pool cannot be used. In the allocate method, if the size requested for the message is equal to poolableSize and the BufferPool holds an idle ByteBuffer, one is taken directly from the BufferPool's queue. poolableSize is in fact batch.size.
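
The decision made in allocate can be sketched like this. It is a simplified, non-blocking illustration under assumptions, not Kafka's actual method: the class AllocateSketch and the hook addFree are hypothetical, locking and waiting for memory are left out, and the sketch simply throws when the budget is exhausted. The field names poolableSize and nonPooledAvailableMemory follow the names used in this article.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, non-blocking sketch of the allocate path.
final class AllocateSketch {
    private final int poolableSize;          // plays the role of batch.size
    private long nonPooledAvailableMemory;   // budget not yet turned into buffers
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    AllocateSketch(long totalMemory, int poolableSize) {
        this.poolableSize = poolableSize;
        this.nonPooledAvailableMemory = totalMemory;
    }

    ByteBuffer allocate(int size) {
        // Fast path: an idle buffer of exactly the poolable size is waiting.
        if (size == poolableSize && !free.isEmpty()) {
            return free.pollFirst();
        }
        // Otherwise carve a new buffer out of the remaining budget.
        if (size > nonPooledAvailableMemory) {
            throw new IllegalStateException("Not enough memory left in this sketch");
        }
        nonPooledAvailableMemory -= size;
        return ByteBuffer.allocate(size);
    }

    // Sketch-only hook: place an idle buffer on the free list.
    void addFree(ByteBuffer buffer) {
        free.addLast(buffer);
    }

    long nonPooledAvailableMemory() {
        return nonPooledAvailableMemory;
    }
}
```

The fast path touches neither the allocator nor the memory budget, which is where the pool saves time for batch.size requests.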

Free memory

After a message has been sent, its ByteBuffer must be released and returned to the BufferPool for later use. Note that only a ByteBuffer of exactly batch.size can be added to the BufferPool and reused later; a ByteBuffer larger than batch.size cannot be added to the BufferPool. A larger buffer belongs to the non-pooled memory and is accounted for through the value of nonPooledAvailableMemory, which we will not describe in detail here. For a batch.size buffer, the ByteBuffer is reset with buffer.clear(), and the cleared buffer is then added to the queue.
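
The release path can be sketched in the same spirit. This is a hypothetical, single-threaded illustration rather than Kafka's code: DeallocateSketch and its accessors are invented names, and locking and waking up waiting threads are omitted.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, single-threaded sketch of the release path.
final class DeallocateSketch {
    private final int poolableSize;          // plays the role of batch.size
    private long nonPooledAvailableMemory;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    DeallocateSketch(int poolableSize, long nonPooledAvailableMemory) {
        this.poolableSize = poolableSize;
        this.nonPooledAvailableMemory = nonPooledAvailableMemory;
    }

    void deallocate(ByteBuffer buffer) {
        if (buffer.capacity() == poolableSize) {
            // Poolable: reset position and limit, then keep the buffer for reuse.
            buffer.clear();
            free.addLast(buffer);
        } else {
            // Not poolable: the buffer is dropped and only its byte count
            // is returned to the non-pooled budget.
            nonPooledAvailableMemory += buffer.capacity();
        }
    }

    int freeBuffers() {
        return free.size();
    }

    long nonPooledAvailableMemory() {
        return nonPooledAvailableMemory;
    }
}
```

Note that clear() does not erase the bytes; it only resets the position and limit so that the next user can overwrite the buffer from the start.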

Summary

Above we looked at why Kafka uses a memory pool and what it gains from doing so, and then at how the pool is implemented, covering creation, allocation and release in detail. The point to remember is that the memory pool is only used when a message is no larger than batch.size. If messages are large but batch.size is left at its default value, the memory pool is bypassed and its performance benefit is lost.

Origin blog.csdn.net/2301_76607156/article/details/130558112