Kafka - other important producer configuration parameters

The producer has many other configurable parameters, all described in the Kafka documentation. Most of them have reasonable defaults and rarely need to be changed. However, several parameters have a significant impact on the producer's memory usage, performance, and reliability.


acks

The acks parameter specifies how many partition replicas must receive a message before the producer considers the write successful. It has a significant impact on the likelihood of message loss, and it accepts the following values:

  1. If acks=0, the producer does not wait for any response from the server before considering the message written. If something goes wrong and the server never receives the message, the producer has no way of knowing, and the message is lost. However, because the producer does not wait for a response, it can send messages as fast as the network allows, which gives very high throughput.
  2. If acks=1, the producer receives a success response from the server as soon as the partition's leader replica receives the message. If the message cannot reach the leader (for example, the leader has crashed and a new one has not yet been elected), the producer receives an error response and, to avoid losing data, resends the message. However, if the leader crashes before its followers have copied the message and a replica that never received it becomes the new leader, the message is still lost. Throughput in this mode depends on whether messages are sent synchronously or asynchronously. If the client waits for the server's response (by calling get() on the returned Future), every send incurs an extra network round trip of latency. Using callbacks hides this latency, but throughput is still limited by the number of in-flight messages (that is, how many messages the producer can send before it receives a response from the server).
  3. If acks=all, the producer receives a success response from the server only after all in-sync replicas have received the message. This is the safest mode: because the message is stored on more than one broker, it survives even if one broker crashes. However, latency is higher than with acks=1, because the producer waits for more than one broker to receive the message.
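A minimal sketch of selecting an acks level in Java. It only builds the `java.util.Properties` object a producer would be configured from, using plain string keys so it compiles without the Kafka client library. The broker address `localhost:9092` and the helper name `producerProps` are assumptions for illustration; in a real application the result would be passed to `new KafkaProducer<>(props)`.

```java
import java.util.Properties;
import java.util.Set;

// Sketch: producer settings for each acks level discussed above.
// Plain string keys keep this compilable without kafka-clients.
final class AcksConfig {
    // Values the producer accepts for acks ("-1" is a synonym for "all").
    private static final Set<String> VALID_ACKS = Set.of("0", "1", "all", "-1");

    static Properties producerProps(String acks) {
        if (!VALID_ACKS.contains(acks)) {
            throw new IllegalArgumentException("unsupported acks value: " + acks);
        }
        Properties props = new Properties();
        // Assumed broker address, for illustration only.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", acks);
        return props;
    }

    public static void main(String[] args) {
        // "0": no delivery feedback; "1": leader only; "all": every in-sync replica.
        for (String level : new String[] {"0", "1", "all"}) {
            System.out.println("acks=" + producerProps(level).getProperty("acks"));
        }
    }
}
```

Rejecting unknown values up front makes a typo such as `acks=2` fail at startup instead of surfacing later as a configuration error from the client.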

buffer.memory

This parameter sets the size of the producer's memory buffer, which holds messages waiting to be sent to the server. If the application produces messages faster than they can be delivered to the server, the buffer fills up. At that point, the send() call either blocks or throws an exception, depending on the block.on.buffer.full parameter (replaced in version 0.9.0.0 by max.block.ms, which lets send() block for a set time before throwing an exception).
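The block-then-fail behavior can be pictured with a small model. This is not Kafka's internal buffer implementation: it treats `buffer.memory` as a pool of byte permits in a `java.util.concurrent.Semaphore`, and an append waits up to a `maxBlockMs` timeout for enough free bytes. The class and method names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Simplified model (not Kafka's real buffer pool): buffer.memory is a pool of
// bytes, and an append that cannot get enough bytes waits up to maxBlockMs.
final class BufferMemoryModel {
    private final Semaphore freeBytes;

    BufferMemoryModel(int bufferMemoryBytes) {
        this.freeBytes = new Semaphore(bufferMemoryBytes);
    }

    // Returns true if the record was buffered, false if the wait timed out
    // (the point where the real producer would throw a timeout exception).
    boolean tryAppend(int recordBytes, long maxBlockMs) {
        try {
            return freeBytes.tryAcquire(recordBytes, maxBlockMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Called once a batch has been sent and its memory can be reused.
    void release(int recordBytes) {
        freeBytes.release(recordBytes);
    }

    public static void main(String[] args) {
        BufferMemoryModel buffer = new BufferMemoryModel(1024);
        System.out.println(buffer.tryAppend(1000, 50)); // true: fits
        System.out.println(buffer.tryAppend(100, 50));  // false: buffer full, wait times out
        buffer.release(1000);
        System.out.println(buffer.tryAppend(100, 50));  // true: space was freed
    }
}
```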

compression.type

By default, messages are sent uncompressed. This parameter can be set to snappy, gzip, or lz4 to choose the algorithm used to compress messages before they are sent to the broker.

  1. The snappy compression algorithm was invented by Google. It uses relatively little CPU while still providing good performance and a reasonable compression ratio, so it is a good choice if you care about both performance and network bandwidth.
  2. The gzip compression algorithm generally uses more CPU but achieves a higher compression ratio, so it is a good choice when network bandwidth is limited.

Using compression can reduce network transmission and storage overhead, which are often bottlenecks in sending messages to Kafka.
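To get a feel for the savings, here is a small sketch that gzips a batch of repetitive, made-up JSON-like records with the JDK's `java.util.zip.GZIPOutputStream`. It only shows the compression ratio on illustrative data; it does not use the Kafka client, where the equivalent setting would be `compression.type=gzip`. Snappy and lz4 require third-party libraries, so they are not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustrates why gzip saves bandwidth on repetitive payloads.
// The sample records are invented for illustration.
final class CompressionDemo {
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and fully flushed) at this point.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A batch of similar records, as producers often send.
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            batch.append("{\"user\":\"alice\",\"action\":\"click\",\"page\":\"/home\"}\n");
        }
        byte[] raw = batch.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes%n", raw.length, compressed.length);
    }
}
```

Real-world ratios depend heavily on how similar the records in a batch are, which is one reason larger batches tend to compress better.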

retries

An error the producer receives from the server may be transient (for example, the partition has no leader). In this case, the retries parameter determines how many times the producer resends the message; once that number is reached, the producer gives up and returns an error.

By default, the producer waits 100 ms between retries, but this interval can be changed with the retry.backoff.ms parameter. Before setting the number of retries and the retry interval, it is recommended to measure how long it takes to recover from a crashed broker (for example, how long it takes for all partitions to elect new leaders), so that the total retry time is longer than the time the Kafka cluster needs to recover. Otherwise the producer will give up retrying too early. However, some errors are not transient and cannot be resolved by retrying (such as a "message too large" error).

In general, because the producer retries automatically, there is no need to handle retriable errors in application code. You only need to handle non-retriable errors and cases where the retry limit has been exhausted.
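The sizing advice above is simple arithmetic. A sketch, assuming the recovery time has already been measured (5000 ms here is an invented figure) and counting only the backoff between attempts; each attempt's own request time is ignored, so the real retry window is somewhat longer than computed. The class and method names are hypothetical.

```java
// Simplified arithmetic for sizing retries: counts only the backoff between
// attempts, not the time each attempt itself takes.
final class RetryPlanner {
    // Smallest retry count whose total backoff covers the measured recovery time.
    static int minRetries(long recoveryMs, long retryBackoffMs) {
        if (retryBackoffMs <= 0) {
            throw new IllegalArgumentException("retry.backoff.ms must be positive");
        }
        return (int) ((recoveryMs + retryBackoffMs - 1) / retryBackoffMs); // ceiling division
    }

    // Total time spent waiting between retries.
    static long totalBackoffMs(int retries, long retryBackoffMs) {
        return retries * retryBackoffMs;
    }

    public static void main(String[] args) {
        long measuredRecoveryMs = 5_000; // assumed: measured leader-election time
        long backoffMs = 100;            // the 100 ms default mentioned above
        int retries = minRetries(measuredRecoveryMs, backoffMs);
        // Prints: retries >= 50 (5000 ms of backoff)
        System.out.println("retries >= " + retries + " (" + totalBackoffMs(retries, backoffMs) + " ms of backoff)");
    }
}
```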

batch.size

When multiple messages are to be sent to the same partition, the producer groups them into one batch. This parameter specifies how much memory a batch can use, in bytes (not number of messages). When a batch is full, all of its messages are sent. However, the producer does not necessarily wait for a batch to fill up: half-full batches, or even batches containing a single message, may be sent. So setting a large batch size does not by itself add delay; it only uses more memory. Setting it too small, however, adds overhead, because the producer has to send messages more often.
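A toy model of how batch.size caps a batch, under simplifying assumptions: one partition, record sizes given directly in bytes, and no per-record overhead or compression. Records join the open batch until the next one would not fit. A record larger than the limit still gets a batch of its own. Timing, which decides when a partially filled batch actually leaves, is left out. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of batch.size for one partition (not the client's real
// accumulator): records fill the open batch until the next one would not fit.
final class BatchSizeModel {
    // Returns the number of records in each batch, in send order.
    static List<Integer> recordsPerBatch(int[] recordSizes, int batchSizeBytes) {
        List<Integer> batches = new ArrayList<>();
        int bytesInBatch = 0;
        int recordsInBatch = 0;
        for (int size : recordSizes) {
            if (recordsInBatch > 0 && bytesInBatch + size > batchSizeBytes) {
                batches.add(recordsInBatch); // batch full: close it
                bytesInBatch = 0;
                recordsInBatch = 0;
            }
            bytesInBatch += size;
            recordsInBatch++;
        }
        if (recordsInBatch > 0) {
            batches.add(recordsInBatch); // a partially filled batch is still sent
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(recordsPerBatch(new int[] {400, 400, 400, 400, 400}, 1000)); // [2, 2, 1]
    }
}
```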

linger.ms

This parameter specifies how long the producer waits for more messages to join a batch before sending it. KafkaProducer sends a batch either when it is full or when the linger.ms limit is reached. By default, the producer sends a batch as soon as a sender thread is available, even if the batch contains only one message.

Setting linger.ms to a number greater than 0 makes the producer wait a while before sending a batch, so that more messages are added to the batch. While this increases latency, it also increases throughput (because more messages are sent at once, the overhead per message is less).
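The latency/throughput trade-off can be seen in a small timing model. It is a simplification, not the client's actual logic. The first record opens a batch, the batch is sent `lingerMs` later, and records arriving before then join it. Batch size limits and sender-thread availability are ignored, and the arrival times are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of linger.ms for one partition: the first record opens a
// batch that is sent lingerMs later; records arriving by then join it.
final class LingerModel {
    // arrivalsMs must be sorted ascending; returns records per batch.
    static List<Integer> recordsPerBatch(long[] arrivalsMs, long lingerMs) {
        List<Integer> batches = new ArrayList<>();
        long sendAt = Long.MIN_VALUE;
        int count = 0;
        for (long t : arrivalsMs) {
            if (count > 0 && t > sendAt) {
                batches.add(count); // linger expired before this record arrived
                count = 0;
            }
            if (count == 0) {
                sendAt = t + lingerMs; // a new batch opens now
            }
            count++;
        }
        if (count > 0) {
            batches.add(count);
        }
        return batches;
    }

    public static void main(String[] args) {
        long[] arrivals = {0, 2, 4, 20, 21};
        System.out.println("linger.ms=0 -> " + recordsPerBatch(arrivals, 0)); // [1, 1, 1, 1, 1]
        System.out.println("linger.ms=5 -> " + recordsPerBatch(arrivals, 5)); // [3, 2]
    }
}
```

With a 5 ms linger, five sends collapse into two requests, at the cost of up to 5 ms extra latency for the first record in each batch.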

client.id

This parameter can be any string. The server uses it to identify the source of messages, and it also appears in logs and quota metrics.

max.in.flight.requests.per.connection

This parameter specifies how many requests (message batches) the producer can send on a single connection before receiving responses from the server. Higher values use more memory but improve throughput. Setting it to 1 ensures that messages are written to the server in the order they were sent, even if retries occur.
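Why more in-flight requests raise throughput can be shown with a back-of-the-envelope bound. If N requests may be outstanding, at most N batches go out per network round trip. The 16 KB batch and 10 ms round trip below are assumed figures, and broker processing time and link bandwidth are ignored, so these are upper bounds rather than predictions.

```java
// Rough upper bound for one connection: with N requests in flight, at most
// N batches can be sent per round trip. Broker processing and bandwidth limits
// are ignored.
final class InFlightEstimate {
    static long maxBytesPerSecond(int maxInFlight, long batchBytes, long roundTripMs) {
        return maxInFlight * batchBytes * 1000 / roundTripMs;
    }

    public static void main(String[] args) {
        long batchBytes = 16_384; // assumed: full 16 KB batches
        long rttMs = 10;          // assumed round-trip time to the broker
        for (int inFlight : new int[] {1, 5}) {
            // 1 -> 1638400 bytes/s, 5 -> 8192000 bytes/s
            System.out.println(inFlight + " in flight -> " + maxBytesPerSecond(inFlight, batchBytes, rttMs) + " bytes/s");
        }
    }
}
```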

timeout.ms, request.timeout.ms and metadata.fetch.timeout.ms

request.timeout.ms specifies how long the producer waits for a response from the server when sending data.

metadata.fetch.timeout.ms specifies how long the producer waits for a response from the server when fetching metadata (such as which broker is the leader of the target partition). If waiting for a response times out, the producer either retries sending the data or returns an error (by throwing an exception or invoking the callback).

timeout.ms specifies how long the broker waits for in-sync replicas to acknowledge a message, and it works together with the acks setting: if the required acknowledgments from the in-sync replicas do not arrive within this time, the broker returns an error.
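A sketch of setting the three timeouts with the key names this article uses. The millisecond values are illustrative, not recommendations. These key names belong to older client generations. Newer clients have dropped timeout.ms and metadata.fetch.timeout.ms, so check the documentation for your client version before copying them. The helper name `timeouts` is hypothetical.

```java
import java.util.Properties;

// Sketch of the three timeout settings described above (older client key names).
// Values are illustrative only.
final class TimeoutConfig {
    static Properties timeouts(long requestTimeoutMs, long metadataFetchTimeoutMs, long brokerAckTimeoutMs) {
        Properties props = new Properties();
        // How long the producer waits for a response to a produce request.
        props.put("request.timeout.ms", Long.toString(requestTimeoutMs));
        // How long the producer waits for metadata such as partition leaders.
        props.put("metadata.fetch.timeout.ms", Long.toString(metadataFetchTimeoutMs));
        // How long the broker waits for in-sync replica acknowledgments.
        props.put("timeout.ms", Long.toString(brokerAckTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        timeouts(30_000, 60_000, 30_000).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```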

max.block.ms

This parameter specifies how long the producer may block when calling send() or when fetching metadata with partitionsFor(). These methods block when the producer's send buffer is full or when metadata is not available. Once the blocking time reaches max.block.ms, the producer throws a timeout exception.
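The metadata side of max.block.ms follows a "wait with a deadline" pattern. This is a simplified model using `CompletableFuture.get(timeout, unit)`, not Kafka's API. The "metadata" is just a hypothetical leader id, and where the real producer throws a timeout exception, this sketch returns an empty `OptionalInt`.

```java
import java.util.OptionalInt;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Simplified model of max.block.ms while waiting for metadata: wait at most
// maxBlockMs for a (hypothetical) leader id, then give up.
final class MaxBlockModel {
    static OptionalInt awaitLeader(CompletableFuture<Integer> metadata, long maxBlockMs) {
        try {
            return OptionalInt.of(metadata.get(maxBlockMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return OptionalInt.empty(); // the real producer throws a timeout exception here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return OptionalInt.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("metadata fetch failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> ready = CompletableFuture.completedFuture(3);
        CompletableFuture<Integer> neverArrives = new CompletableFuture<>();
        System.out.println(awaitLeader(ready, 100));        // OptionalInt[3]
        System.out.println(awaitLeader(neverArrives, 100)); // OptionalInt.empty, after about 100 ms
    }
}
```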

max.request.size

This parameter controls the size of the requests the producer sends. It caps both the largest single message that can be sent and the total size of all messages in a single request. For example, if this value is 1 MB, the largest single message that can be sent is 1 MB, or the producer can send a batch of about 1,000 messages of 1 KB each in a single request. The broker also has its own limit on the largest message it will accept (message.max.bytes), so it is best to keep the two settings consistent; otherwise the broker may reject messages the producer sends.
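The 1 MB example can be checked with a little arithmetic. With 1 MB = 1,048,576 bytes and 1 KB = 1,024 bytes, up to 1,024 records fit before overhead. Real requests also carry record and protocol headers, so slightly fewer fit in practice. The comparison with the broker's message.max.bytes is a simplified sanity check based on the advice above, and the helper names are hypothetical.

```java
// Simplified check of the max.request.size example; ignores per-record and
// protocol overhead.
final class RequestSizeCheck {
    static final long ONE_MB = 1024 * 1024;

    static long maxRecordsPerRequest(long maxRequestSize, long recordBytes) {
        return maxRequestSize / recordBytes;
    }

    // A producer limit above the broker's message.max.bytes lets through
    // messages that the broker may then reject.
    static boolean limitsMatch(long producerMaxRequestSize, long brokerMessageMaxBytes) {
        return producerMaxRequestSize <= brokerMessageMaxBytes;
    }

    public static void main(String[] args) {
        System.out.println(maxRecordsPerRequest(ONE_MB, 1024)); // 1024
        System.out.println(limitsMatch(ONE_MB, ONE_MB));        // true
        System.out.println(limitsMatch(2 * ONE_MB, ONE_MB));    // false
    }
}
```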

receive.buffer.bytes and send.buffer.bytes

These two parameters set the sizes of the TCP socket buffers used when receiving and sending data, respectively. If they are set to -1, the operating system defaults are used. If the producer or consumer is in a different data center from the broker, it can be worth increasing these values, because cross-data-center links generally have higher latency and lower bandwidth.
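One common rule of thumb for "how much larger" is the TCP bandwidth-delay product. To keep a link busy, a socket buffer should hold roughly bandwidth × round-trip time worth of data. This is general TCP tuning guidance, not a Kafka-specific formula. The 100 Mbit/s link with a 50 ms round trip is an assumed example, and the helper names are hypothetical.

```java
import java.util.Properties;

// Rule of thumb (bandwidth-delay product): a TCP buffer should hold about
// bandwidth x round-trip time worth of data to keep the link busy.
final class SocketBufferSizing {
    static long bandwidthDelayBytes(long bitsPerSecond, long roundTripMs) {
        return bitsPerSecond * roundTripMs / 1000 / 8;
    }

    static Properties socketBuffers(int sendBytes, int receiveBytes) {
        Properties props = new Properties();
        props.put("send.buffer.bytes", Integer.toString(sendBytes));       // -1 = OS default
        props.put("receive.buffer.bytes", Integer.toString(receiveBytes)); // -1 = OS default
        return props;
    }

    public static void main(String[] args) {
        // Assumed cross-data-center link: 100 Mbit/s with a 50 ms round trip.
        long bdp = bandwidthDelayBytes(100_000_000L, 50);
        System.out.println("bandwidth-delay product = " + bdp + " bytes"); // 625000
        System.out.println(socketBuffers((int) bdp, (int) bdp));
    }
}
```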

ordering guarantee

Kafka can guarantee that the messages in the same partition are ordered. That is, if producers send messages in a certain order, brokers will write them to partitions in that order, and consumers will read them in the same order.

In some cases order matters a great deal. For example, depositing 100 yuan into an account and then withdrawing it is completely different from withdrawing first and depositing afterwards. Other scenarios are not very sensitive to order.

If retries is set to a non-zero integer and max.in.flight.requests.per.connection is set to a number greater than 1, ordering can break. Suppose the first batch of messages fails to be written while the second batch succeeds. The producer then retries the first batch, and if that retry succeeds, the two batches end up on the partition in reverse order.
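The reordering scenario can be reproduced with a toy simulation. This is not Kafka's network code. In each round, up to `maxInFlight` batches are sent before any response arrives. Batches listed in `failFirstAttempt` fail once with a retriable error and are queued again ahead of unsent batches. The batch names and helper are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy simulation of reordering on retry with several requests in flight.
final class ReorderDemo {
    static List<String> partitionLog(List<String> batches, int maxInFlight, Set<String> failFirstAttempt) {
        Deque<String> pending = new ArrayDeque<>(batches);
        Set<String> alreadyFailed = new HashSet<>();
        List<String> log = new ArrayList<>();
        while (!pending.isEmpty()) {
            // Send up to maxInFlight batches before any response comes back.
            List<String> inFlight = new ArrayList<>();
            while (inFlight.size() < maxInFlight && !pending.isEmpty()) {
                inFlight.add(pending.pollFirst());
            }
            List<String> retry = new ArrayList<>();
            for (String batch : inFlight) {
                if (failFirstAttempt.contains(batch) && alreadyFailed.add(batch)) {
                    retry.add(batch); // transient error: the producer will resend it
                } else {
                    log.add(batch);   // the broker appends the batch to the partition
                }
            }
            // Retried batches go back to the front of the queue, in original order.
            for (int i = retry.size() - 1; i >= 0; i--) {
                pending.addFirst(retry.get(i));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> sent = List.of("A", "B");
        System.out.println(partitionLog(sent, 2, Set.of("A"))); // [B, A]: order reversed
        System.out.println(partitionLog(sent, 1, Set.of("A"))); // [A, B]: order kept
    }
}
```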

Generally, scenarios that require messages to be ordered also require every message to be written successfully, so setting retries to 0 is not recommended. Instead, you can set max.in.flight.requests.per.connection to 1 so that while the producer is trying to send the first batch, no other messages are sent to the broker. However, this can seriously reduce the producer's throughput, so it should only be done when the ordering requirements are strict.
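A minimal sketch of the configuration recommended above, again using plain string keys in a `java.util.Properties` object. Retries stay enabled so transient errors do not lose messages, and only one request is allowed in flight so a retried batch cannot be overtaken. The helper name `orderedProps` and the retry count of 3 are assumptions for illustration.

```java
import java.util.Properties;

// Sketch of an ordering-sensitive producer configuration.
final class OrderedProducerConfig {
    static Properties orderedProps(int retries) {
        if (retries <= 0) {
            throw new IllegalArgumentException("keep retries above 0 so failed batches are resent");
        }
        Properties props = new Properties();
        props.put("retries", Integer.toString(retries));
        // One request at a time: preserves order at the cost of throughput.
        props.put("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = orderedProps(3); // 3 is an arbitrary illustrative value
        System.out.println(props.getProperty("retries") + " retries, "
                + props.getProperty("max.in.flight.requests.per.connection") + " request in flight");
    }
}
```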

Origin http://43.154.161.224:23101/article/api/json?id=324650689&siteId=291194637