Kafka source code analysis: Producer

Preface

In Kafka, the party that produces messages is called the Producer. It is one of Kafka's core components and the source of all messages. Its main job is to package the client's records and send them to a specific partition of a specific topic in the Kafka cluster. So how do the messages generated by producers actually reach the Kafka brokers?

Producer overall process

(Figure: the overall process of sending and consuming a message in Kafka)

From the point of view of the source code, the Producer can be divided into the following core parts:

  1. Producer initialization
  2. Producer sending process
  3. Producer buffer
  4. Producer parameters and tuning

Producer initialization


The source code contains a lot of auxiliary processing, so there is no need to read it line by line; it is enough to follow the main flow and interpret only the core code.

Set the partitioner (partitioner); custom partitioners are supported.

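As an illustration only (the class name and routing rule below are invented, not taken from the Kafka source), a custom partitioner implements org.apache.kafka.clients.producer.Partitioner:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

// Hypothetical partitioner: keys starting with "vip" always go to partition 0,
// other keyed records are hashed with murmur2 (like the classic default partitioner),
// and records without a key are spread round-robin.
public class VipFirstPartitioner implements Partitioner {

    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            // No key: simple round-robin across all partitions.
            return Utils.toPositive(counter.getAndIncrement()) % numPartitions;
        }
        if (key.toString().startsWith("vip")) {
            return 0;
        }
        // Hash the serialized key, as the classic default partitioner does.
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
```

The producer picks it up through the partitioner.class property, e.g. props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, VipFirstPartitioner.class.getName()).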

Set retry time

The retry interval (retryBackoffMs) defaults to 100 ms. If sending a message to the broker throws a retriable exception, the send is retried up to the number of times specified by the retries parameter, with retryBackoffMs as the interval between retries.

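As a hedged illustration (the values are examples, not recommendations from the source), these two settings map to the retries and retry.backoff.ms client properties:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;

public class RetryConfigExample {

    public static Properties retryProps() {
        Properties props = new Properties();
        // Retry up to 3 times on retriable errors such as transient network failures
        // or an in-progress leader election (example value).
        props.put(ProducerConfig.RETRIES_CONFIG, 3);
        // Wait 100 ms between two attempts; this is the default retryBackoffMs described above.
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 100L);
        return props;
    }
}
```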

Set up serializer

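For example (a minimal sketch), string serializers for both key and value can be configured like this:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class SerializerConfigExample {

    public static Properties serializerProps() {
        Properties props = new Properties();
        // Key and value are both turned into byte arrays before being handed to the accumulator.
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        return props;
    }
}
```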

Set interceptors


Interceptors are not widely used. They can uniformly add fields to messages or count successful and failed sends, but this logic slows down the producer's send path, so they are generally not recommended in production.

To implement an interceptor, first implement the ProducerInterceptor interface and then register it on the producer.
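A minimal sketch of such an interceptor (the class name and the counting logic are invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Hypothetical interceptor that counts successful and failed sends.
public class CountingInterceptor implements ProducerInterceptor<String, String> {

    private final AtomicLong successCount = new AtomicLong();
    private final AtomicLong failureCount = new AtomicLong();

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // Called before serialization and partitioning; the record could be modified here,
        // e.g. to add a uniform prefix.
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // Called when the broker acknowledges the record or the send fails.
        if (exception == null) {
            successCount.incrementAndGet();
        } else {
            failureCount.incrementAndGet();
        }
    }

    @Override
    public void close() {
        System.out.printf("success=%d, failure=%d%n", successCount.get(), failureCount.get());
    }

    @Override
    public void configure(Map<String, ?> configs) {}
}
```

It is then registered on the producer through the interceptor.classes property (ProducerConfig.INTERCEPTOR_CLASSES_CONFIG).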


The constructor also applies several other settings:

  1. Set the maximum message size (maxRequestSize); the default is 1M, and it can be raised to around 10M in a production environment.

  2. Set the cache size (totalMemorySize); the default is 32M.

  3. Set the compression format (compressionType).

  4. Initialize the RecordAccumulator, i.e. the buffer, which is sized at 32M.
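Paraphrased rather than copied from the constructor, these values are read from the ProducerConfig roughly as follows (a sketch, not verbatim source):

```java
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.record.CompressionType;

// Rough paraphrase of how the KafkaProducer constructor reads these settings; not verbatim source.
public class InitSettingsSketch {

    static void readSizingSettings(ProducerConfig config) {
        int maxRequestSize = config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG);   // default 1M
        long totalMemorySize = config.getLong(ProducerConfig.BUFFER_MEMORY_CONFIG);   // default 32M
        CompressionType compressionType =
                CompressionType.forName(config.getString(ProducerConfig.COMPRESSION_TYPE_CONFIG));
        System.out.printf("maxRequestSize=%d, bufferMemory=%d, compression=%s%n",
                maxRequestSize, totalMemorySize, compressionType);
        // The RecordAccumulator (the 32M buffer) is then constructed using these values
        // together with batch.size, linger.ms and others.
    }
}
```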

Set buffer


Set message accumulator

Because the producer sends messages through a buffer, a message accumulator (RecordAccumulator) is needed to collect them before they are sent.


Initialize cluster metadata (metadata)


Create Sender thread


An important component for managing the network, NetworkClient, is also initialized here.


The Sender is wrapped in a KafkaThread, marked as a daemon thread, and started.

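KafkaThread is a thin subclass of Thread. A minimal standalone sketch of how the producer wires the Sender into a daemon thread (the Runnable below is just a stand-in for the real Sender):

```java
import org.apache.kafka.common.utils.KafkaThread;

public class DaemonSenderSketch {

    public static void main(String[] args) {
        // Stand-in for the real Sender, which implements Runnable and loops over network I/O.
        Runnable sender = () -> System.out.println("sender loop would run here");
        // KafkaThread(name, runnable, daemon = true): the producer uses a name of the form
        // "kafka-producer-network-thread | <clientId>" and marks the thread as a daemon.
        KafkaThread ioThread = new KafkaThread("kafka-producer-network-thread | demo-client", sender, true);
        ioThread.start();
    }
}
```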

Producer sending process

Execute interceptor logic

Execute the interceptor logic, preprocess the message, and wrap it in a ProducerRecord.


Get cluster metadata

Fetch the cluster metadata (metadata) from the Kafka broker cluster.


Serialization

Call the Serializer.serialize() method to perform key/value serialization of the message


Select partition

Call partition() to apply the appropriate partitioning strategy and assign the target topic partition number to the ProducerRecord.

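For a keyed record, the classic default strategy boils down to hashing the serialized key with murmur2 and taking it modulo the partition count. A simplified, self-contained sketch of that rule (not the full DefaultPartitioner, which also handles keyless records):

```java
import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.utils.Utils;

public class DefaultPartitionSketch {

    // Simplified view of the classic keyed-record rule: murmur2(key) mod numPartitions.
    public static int partitionForKey(byte[] keyBytes, int numPartitions) {
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "order-42".getBytes(StandardCharsets.UTF_8);
        // The same key always maps to the same partition as long as the partition count is unchanged.
        System.out.println(partitionForKey(key, 6));
    }
}
```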

Messages are accumulated into cache

Cache the message into the RecordAccumulator collector, and finally determine whether to send it.


Message sending

The actual network send is done by the Sender thread, working together with the buffer. For now it is enough to know the conditions that trigger a send: the buffered data for a batch reaches batch.size, or linger.ms expires.
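Putting the whole send path together, a minimal usage sketch (the broker address, topic, key and value are placeholders) that also sets the two trigger conditions explicitly:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SendExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // example address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16 * 1024); // send once a batch reaches 16 KB...
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);         // ...or after waiting 10 ms, whichever comes first

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("demo-topic", "key-1", "hello kafka"); // example topic/key/value
            // send() only appends to the RecordAccumulator; the Sender thread does the actual network I/O.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("sent to %s-%d@%d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
```

send() returns as soon as the record lands in the accumulator; the callback fires later, once the Sender thread has received the broker's response.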

Producer buffer

The Kafka producer's buffer, i.e. its memory pool, can be compared to a connection pool (for a DB or Redis), whose purpose is to avoid the overhead of repeatedly creating and destroying resources. By reusing the memory behind RecordBatch objects, the memory pool avoids frequent allocations and helps prevent Full GC problems.

The core of this design is the BufferPool.


Kafka's memory pool has two parts: available memory (unallocated memory, initially 32M) and allocated memory, kept as a free list of 16K batch-sized blocks. Each block can be reused repeatedly, so fresh memory does not have to be requested from the JVM for every batch, and the two parts together always add up to 32M.

The process of applying for memory

During the sending process, a message is put into the accumulator by calling accumulator.append(); messages are packed into batches for sending, and the memory for a batch is requested through free.allocate().


  1. If the requested memory size exceeds the size of the entire cache pool, an exception is thrown.
  2. If the requested size equals the size of a recordBatch (16K by default) and the free list of recycled buffers is not empty, one buffer is taken from the free list and returned directly.
  3. If the memory pool as a whole is still large enough (this.availableMemory + freeListSize >= size), the requested memory is carved directly out of the available memory (see the sketch below).
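The sketch below condenses those three cases into a deliberately simplified, single-threaded pool (the real BufferPool additionally handles blocking callers, a waiter queue and thread safety):

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified, non-thread-safe sketch of the allocation cases described above; not the real BufferPool.
public class SimpleBufferPoolSketch {

    private final long totalMemory;  // e.g. 32M (buffer.memory)
    private final int poolableSize;  // e.g. 16K (batch.size)
    private long availableMemory;    // unallocated memory
    private final Deque<ByteBuffer> free = new ArrayDeque<>(); // recycled, batch-sized buffers

    public SimpleBufferPoolSketch(long totalMemory, int poolableSize) {
        this.totalMemory = totalMemory;
        this.poolableSize = poolableSize;
        this.availableMemory = totalMemory;
    }

    public ByteBuffer allocate(int size) {
        // Case 1: the request is larger than the whole pool.
        if (size > totalMemory) {
            throw new IllegalArgumentException("Attempt to allocate " + size
                    + " bytes, but the pool is only " + totalMemory + " bytes");
        }
        // Case 2: a standard batch-sized request can reuse a recycled buffer directly.
        if (size == poolableSize && !free.isEmpty()) {
            return free.pollFirst();
        }
        // Case 3: enough memory overall, so carve the request out of the unallocated part,
        // releasing recycled buffers back into it first if necessary.
        long freeListSize = (long) free.size() * poolableSize;
        if (availableMemory + freeListSize >= size) {
            while (availableMemory < size && !free.isEmpty()) {
                availableMemory += free.pollLast().capacity();
            }
            availableMemory -= size;
            return ByteBuffer.allocate(size);
        }
        // Otherwise the real pool blocks until memory is released; omitted in this sketch.
        throw new IllegalStateException("Not enough memory; the real BufferPool would block here");
    }
}
```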

Producer parameter tuning

In actual use of Kafka, the producer side must both sustain throughput and avoid losing messages, so the configuration of a few core parameters is crucial.

acks

Parameter description: This is a very important parameter for the Kafka Producer. It specifies how many replicas in the target partition must acknowledge a write before it counts as successful (0: no acknowledgement, 1: the leader only, all/-1: all in-sync replicas), and it is the producer-side guarantee of message durability.

max.request.size

Parameter description: This parameter is also important for the Kafka Producer: it limits the maximum size of a message the producer can send. The default value is 1048576 (1M).

Tuning suggestions: This default is a bit small for a production environment. To avoid send failures caused by oversized messages, it is recommended to raise it appropriately in production, for example to 10485760 (10M).

retries

Parameter description: Indicates the number of retries when a producer-side send fails. The default value is 0, meaning no retries. This parameter mainly handles send failures caused by transient system faults, such as network jitter or leader re-election, of which a brief leader re-election is relatively common. Its setting is therefore very important for the Kafka Producer.

Tuning suggestions: It is recommended to set it to a value greater than 0, such as 3 times.

retry.backoff.ms

Parameter description: Sets the time interval between two retries to avoid ineffective, overly frequent retrying; the default value is 100. It is mainly used together with retries: before configuring retries and retry.backoff.ms, it is best to estimate how long the underlying failure usually takes to recover, and make the total retry time longer than that recovery time so the producer does not give up retrying too early.

connections.max.idle.ms

Parameter description: Determines how long a connection may stay idle before it is closed. The default value is 540000 ms, i.e. 9 minutes.

compression.type

Parameter description: This parameter indicates whether the producer compresses messages; the default is no compression (none). Compression can significantly reduce network I/O, disk I/O and disk space, and thus improve overall throughput, at the cost of extra CPU overhead.

Tuning suggestions: To improve throughput, it is recommended to compress messages on the production side. For Kafka, considering throughput and compression ratio, it is recommended to choose lz4 compression. If you are pursuing the highest compression ratio, zstd compression is recommended.

buffer.memory

Parameter description: This parameter sets the size of the producer's message buffer pool. The default value is 33554432 (32M). It can roughly be regarded as the amount of memory the Producer program uses.

Tuning suggestions: To keep overall producer throughput high, it is usually advisable to increase this parameter appropriately, which also means the producer client will occupy more memory.

batch.size

Parameter description: Messages written to the buffer are packed into Batches and sent to the Broker batch by batch; this parameter sets the batch size, with a default of 16KB. Reducing the batch size lowers message latency, while increasing it improves throughput.

Tuning suggestions: Reasonably increasing this value, for example to 32KB, can significantly improve producer throughput, but a larger value also means messages experience somewhat higher latency.

linger.ms

Parameter description: This parameter controls the maximum idle time of a Batch: once a batch has waited this long, it is sent to the Broker even if it is not full. In practice it is a trade-off between throughput and latency. The default value is 0, which means messages are sent immediately, regardless of whether the batch is full.

Tuning suggestions: Usually, to reduce the number of requests and improve overall throughput, it is recommended to set a value greater than 0, for example 100, which introduces up to 100 ms of extra latency under low load.
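Pulling the suggestions above together, a hedged example configuration (the broker address and the exact values are illustrative, not universal recommendations):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class TunedProducerProps {

    public static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // example address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");                          // wait for all in-sync replicas
        props.put(ProducerConfig.RETRIES_CONFIG, 3);                           // retry transient failures
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 300L);               // total retry time > typical recovery time
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 10 * 1024 * 1024);   // allow messages up to 10M
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");              // throughput/compression trade-off
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 64 * 1024 * 1024L);     // enlarge the 32M default buffer
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);                // 32KB batches for higher throughput
        props.put(ProducerConfig.LINGER_MS_CONFIG, 100);                       // wait up to 100 ms to fill a batch
        return props;
    }
}
```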


Origin blog.csdn.net/qq_28314431/article/details/133067830