An illustrated guide to common performance-tuning parameters of the Kafka producer

1 Basic parameters

  • bootstrap.servers: A comma-separated list of Kafka broker addresses. There is no need to list every broker, because the client discovers the rest of the cluster from the brokers it bootstraps from.

  • client.dns.lookup: How the client resolves the bootstrap addresses. Two methods are supported:

    • resolve_canonical_bootstrap_servers_only: For each hostname in bootstrap.servers, look up all of its IP addresses through the host's name service (InetAddress.getAllByName), call InetAddress.getCanonicalHostName() on each address in turn, and use the resulting canonical names to establish TCP connections. A host can have multiple network interfaces; with this option the client can make use of all of them, which helps spread the network load on the broker side.
    • use_all_dns_ips: Resolve the hostname given in bootstrap.servers and try each returned IP address in turn until a TCP connection succeeds. This is the default option since Kafka 2.6.
  • compression.type: The message compression algorithm. Valid values are none, gzip, snappy, lz4 and zstd; the default is none. It is recommended to use the same setting as the broker. The broker's own compression.type can also be set to producer, which means the broker keeps whatever compression the producer used. When the producer and the broker agree on the compression type, the broker does not have to decompress and recompress messages, which greatly reduces its CPU load.

  • client.id: The client ID. If not set, it defaults to producer- followed by an auto-incrementing number. It is strongly recommended to set this value and to include the IP, port and PID in it.

  • send.buffer.bytes: Size of the TCP send buffer of the network channel, default 128 KB.

  • receive.buffer.bytes: Size of the TCP receive buffer of the network channel, default 32 KB.

  • reconnect.backoff.ms: How long to wait before trying to re-establish a connection, default 50 ms. This is a low-level network parameter that rarely needs attention.

  • reconnect.backoff.max.ms: The maximum wait before reconnecting, default 1 s. On consecutive failed reconnects to the same broker, the wait grows exponentially from the initial value of reconnect.backoff.ms, and it stops growing once it reaches this maximum.

  • key.serializer: Serialization strategy for the message key: a class that implements the org.apache.kafka.common.serialization.Serializer interface. Be careful not to import a class from the wrong package.

  • value.serializer: Serialization strategy for the message body (value).

  • partitioner.class: The algorithm that distributes messages across partitions. The default is DefaultPartitioner, which routes as follows:

    • If a key is specified, the hash of the key modulo the number of partitions selects the partition.
    • If no key is specified, messages are spread across all partitions in round-robin fashion (since Kafka 2.4 the default partitioner instead sticks to one partition until the current batch is complete).
  • interceptor.classes: A list of interceptors. Kafka allows messages to be intercepted and processed before they are actually sent to the broker.

  • enable.idempotence: Whether to enable idempotence on the producer. The default is false (true since Kafka 3.0).

  • transaction.timeout.ms: The maximum time the transaction coordinator waits for a transaction status update from the client, default 60 s.

  • transactional.id: The transaction ID, which uniquely identifies a client that takes part in transactions.
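
As a minimal sketch, the basic parameters above could be collected in a plain java.util.Properties object like this. The broker host names, the client.id value and the class name ProducerBasicConfig are made up for illustration; the serializer names are the String serializers that ship with kafka-clients. With kafka-clients on the classpath, the resulting Properties would be passed to the KafkaProducer constructor.

```java
import java.util.Properties;

class ProducerBasicConfig {

    // Builds the basic settings discussed above. Host names and the
    // client.id are hypothetical placeholders, not recommended values.
    static Properties basicConfig() {
        Properties props = new Properties();
        // Two brokers are enough to bootstrap; the rest are discovered.
        props.put("bootstrap.servers",
                "broker1.example.com:9092,broker2.example.com:9092");
        props.put("client.dns.lookup", "use_all_dns_ips");
        // Hypothetical naming scheme: service name, IP and PID.
        props.put("client.id", "order-service-10.0.0.12-4711");
        props.put("compression.type", "lz4");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");
        return props;
    }

    public static void main(String[] args) {
        // Print every configured key/value pair.
        basicConfig().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Plain string keys are used here so that the sketch compiles without any Kafka dependency; in a real project the constants in ProducerConfig (for example ProducerConfig.BOOTSTRAP_SERVERS_CONFIG) help avoid typos.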

2 Common parameters for performance optimization

Because these parameters are tied to how message sending works, this section first lists them with a brief explanation; the diagrams that follow then elaborate on the mechanism.

  • buffer.memory: The size of the buffer pool inside a producer (KafkaProducer). The default is 32 MB.
  • max.block.ms: How long the producer blocks when it requests free memory from the buffer pool and not enough is available. The default is 60 s. If the memory cannot be allocated within this time, the send fails with a TimeoutException. The time the producer spends fetching metadata is also counted against this limit.
  • retries: The number of times the Kafka Sender thread retries sending from the buffer to the broker. The default is Integer.MAX_VALUE. To avoid pointless retries, only recoverable exceptions are retried. A leader election is one example: the error is transient, and retrying will eventually succeed.
  • acks: Defines when a message counts as "committed", that is, the condition under which the broker acknowledges the write to the client. The possible values are:
    • 0: The producer does not care about the result on the broker side. As soon as the send method of KafkaProducer returns, the send is considered successful. This is clearly the least safe option, because the broker may never have received the message or may have failed to store it.
    • all or -1: The message must be stored not only by the leader but also by all of its replicas (more precisely, the replicas in the ISR) before it counts as committed and success is returned to the client. This is the strictest durability guarantee and also the slowest option.
    • 1: The message only needs to be written to the leader before success is returned to the client.
  • batch.size: Kafka batches messages on the producer side: messages are normally sent to the broker in batches rather than one at a time. This value sets the memory size of each batch. In the source code a batch corresponds to a ProducerBatch object. The default is 16 KB.
  • linger.ms: Used together with batch.size. Kafka prefers to send messages to the broker batch by batch. When the application hands a message to KafkaProducer, the message first goes into the internal buffer, more precisely into a batch (ProducerBatch), and the batch is sent to the broker once it is full. This improves throughput but also increases latency. What if the application produces so few messages that a batch never fills up? Would those messages never reach the broker? The linger.ms parameter exists to solve this problem: it controls what the sending thread does with a batch that is not yet full. If linger.ms is 0, the batch is sent immediately. If it is greater than 0, the sending thread waits up to that many milliseconds before sending the batch to the broker. This is somewhat similar to Nagle's algorithm in TCP.
  • delivery.timeout.ms: The expiration time of a message in the client-side buffer. In Kafka's sending model, a message first enters a double-ended queue (deque) on the producer, and a single thread then sends the buffered messages to the broker. This parameter controls how long a message may stay in that queue. The default is 120 s, counted from the moment the message enters the queue. The limit also covers the time spent waiting for the broker's acknowledgement and retrying. Once it is exceeded, the send fails with a TimeoutException.
  • request.timeout.ms: The request timeout, that is, how long the Kafka sending thread (Sender) waits for the broker's response to a network request.
  • max.request.size: The maximum number of bytes the Sender thread sends to the broker in a single request. The default is 1 MB.
  • max.in.flight.requests.per.connection: The maximum number of requests that may be outstanding (sent but not yet acknowledged) on each client-to-broker connection. The default is 5. It is somewhat similar to the way Netty uses high and low watermarks to limit the backlog in its send buffer and avoid running out of memory.
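
The parameters in this list could be set together as in the following sketch. The concrete values (64 MB buffer, 64 KB batches, 20 ms linger, 30 s request timeout) are assumptions chosen only to show the syntax, not tuning recommendations, and the class and method names are hypothetical. The helper checks the relationship that Kafka documents between the three time limits: delivery.timeout.ms should be at least linger.ms plus request.timeout.ms.

```java
import java.util.Properties;

class ProducerTuningConfig {

    // Throughput-oriented values picked for illustration only.
    static Properties tuningConfig() {
        Properties props = new Properties();
        props.put("buffer.memory", String.valueOf(64L * 1024 * 1024)); // 64 MB pool
        props.put("batch.size", String.valueOf(64 * 1024));            // 64 KB per batch
        props.put("linger.ms", "20");              // wait up to 20 ms for a batch to fill
        props.put("max.block.ms", "60000");        // block up to 60 s for buffer memory
        props.put("acks", "all");
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        props.put("request.timeout.ms", "30000");
        props.put("delivery.timeout.ms", "120000");
        props.put("max.request.size", String.valueOf(1024 * 1024));    // 1 MB
        props.put("max.in.flight.requests.per.connection", "5");
        return props;
    }

    // Returns true when delivery.timeout.ms >= linger.ms + request.timeout.ms.
    static boolean timeoutsConsistent(Properties props) {
        long linger = Long.parseLong(props.getProperty("linger.ms"));
        long request = Long.parseLong(props.getProperty("request.timeout.ms"));
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms"));
        return delivery >= linger + request;
    }

    public static void main(String[] args) {
        Properties props = tuningConfig();
        System.out.println("timeouts consistent: " + timeoutsConsistent(props));
    }
}
```

Running such a check before constructing the producer turns a misconfiguration into an early, readable failure instead of a configuration exception at startup.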

3 Diagram of the core data structure
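
The structure described in section 2, one double-ended queue of batches per partition, can be sketched in a few lines of Java. This is a simplified model under stated assumptions, not Kafka's actual implementation: AccumulatorSketch and Batch are hypothetical names, a batch tracks only its byte count, and there is no buffer pool, locking or compression.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class AccumulatorSketch {

    // Simplified stand-in for ProducerBatch: tracks only the bytes written.
    static final class Batch {
        final int capacity;
        int usedBytes;

        Batch(int capacity) {
            this.capacity = capacity;
        }

        // Appends the record if it fits; returns false when the batch is full.
        boolean tryAppend(int recordBytes) {
            if (usedBytes + recordBytes > capacity) {
                return false;
            }
            usedBytes += recordBytes;
            return true;
        }
    }

    private final int batchSize; // plays the role of batch.size
    private final Map<String, Deque<Batch>> batches = new HashMap<>();

    AccumulatorSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Application side: append to the newest batch, or open a new one at the tail.
    void append(String topicPartition, int recordBytes) {
        Deque<Batch> queue =
                batches.computeIfAbsent(topicPartition, k -> new ArrayDeque<>());
        Batch last = queue.peekLast();
        if (last == null || !last.tryAppend(recordBytes)) {
            // A record larger than batchSize gets a batch of its own.
            Batch fresh = new Batch(Math.max(batchSize, recordBytes));
            fresh.tryAppend(recordBytes);
            queue.addLast(fresh);
        }
    }

    // Sender side: take the oldest batch from the head of the queue.
    Batch drainOldest(String topicPartition) {
        Deque<Batch> queue = batches.get(topicPartition);
        return queue == null ? null : queue.pollFirst();
    }

    int queuedBatches(String topicPartition) {
        Deque<Batch> queue = batches.get(topicPartition);
        return queue == null ? 0 : queue.size();
    }

    public static void main(String[] args) {
        AccumulatorSketch acc = new AccumulatorSketch(16 * 1024);
        for (int i = 0; i < 5; i++) {
            acc.append("orders-0", 6 * 1024); // five 6 KB records
        }
        System.out.println("batches queued: " + acc.queuedBatches("orders-0"));
    }
}
```

The application appends at the tail while the sending thread drains from the head, which is why a double-ended queue is a natural fit.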

4 Diagram of when each parameter takes effect
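
Two of the timing rules from section 2 can be written down as simple predicates: a batch becomes sendable when it is full or has waited for linger.ms, and it expires once delivery.timeout.ms has passed since it was queued. The sketch below is a deliberately reduced model with hypothetical names; the real Sender thread takes further conditions into account, such as retry backoff and explicit flushes.

```java
class SendTimingSketch {

    // A batch is sendable when it is full or has lingered long enough.
    static boolean readyToSend(boolean batchFull, long waitedMs, long lingerMs) {
        return batchFull || waitedMs >= lingerMs;
    }

    // A batch expires once it has been queued for delivery.timeout.ms.
    static boolean deliveryExpired(long createdAtMs, long nowMs, long deliveryTimeoutMs) {
        return nowMs - createdAtMs >= deliveryTimeoutMs;
    }

    public static void main(String[] args) {
        long lingerMs = 20;               // example linger.ms
        long deliveryTimeoutMs = 120_000; // default delivery.timeout.ms

        // Half-empty batch after 5 ms: keep waiting.
        System.out.println(readyToSend(false, 5, lingerMs));
        // Half-empty batch after 20 ms: linger.ms has elapsed, send it.
        System.out.println(readyToSend(false, 20, lingerMs));
        // Full batch: send at once, regardless of linger.ms.
        System.out.println(readyToSend(true, 0, lingerMs));
        // Batch queued at t=0, checked at t=120 s: expired.
        System.out.println(deliveryExpired(0, 120_000, deliveryTimeoutMs));
    }
}
```

With linger.ms set to 0 the first predicate is always true, which matches the "send immediately" behavior described above.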

Which parameters should be adjusted to tune the performance of a real workload, and how? That is the topic of the next article.

The article is reprinted from the public account: JavaEdge



Origin my.oschina.net/u/3494859/blog/10560500