Heartbeat and session timeout mechanisms in older and newer Kafka versions

 

Problem:

We are using Kafka 0.10.2.1 with the following consumer configuration:

#kafka.bootstrap.servers=
kafka.bootstrap.servers=
kafka.group.id=
kafka.topic=
kafka.concurrency=5
kafka.key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
kafka.value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
kafka.auto.offset.reset=latest
kafka.enable.auto.commit=true
kafka.auto.commit.interval=100
kafka.max.poll.records=20
kafka.session.timeout.ms=30000
kafka.security.protocol=SASL_PLAINTEXT
kafka.sasl.mechanism=PLAIN

However, the consumer group rebalances frequently.


 

 


Kafka 0.10.0.0 & Kafka 0.10.1.0

0.10.0.0 heartbeat and timeout mechanism:
Heartbeats and polling are coupled: heartbeats are sent only from within poll(), only the session.timeout.ms parameter is provided, and there is no separate parameter for controlling the poll interval.
If the consumer needs 1 minute to process the messages returned by a poll, session.timeout.ms must be set to more than 1 minute; otherwise the consumer's session times out and the group rebalances.
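The coupling can be illustrated with a small simulation. This is a simplified sketch in Python, not Kafka client code: the function name first_expired_poll and the model (exactly one heartbeat per poll() call, so the gap between heartbeats equals the processing time) are assumptions made for illustration.

```python
def first_expired_poll(processing_times_ms, session_timeout_ms):
    """Simplified model of the 0.10.0.0 consumer.

    Heartbeats are sent only from inside poll(), so the gap between two
    heartbeats equals the time spent processing the previous batch.
    Returns the index of the first batch whose processing time exceeds
    the session timeout, or None if the session never expires.
    """
    for index, processing_ms in enumerate(processing_times_ms):
        if processing_ms > session_timeout_ms:
            return index
    return None


# The second batch takes 1 minute; with a 30 s session timeout it expires.
print(first_expired_poll([5_000, 60_000, 2_000], 30_000))  # 1
# With session.timeout.ms above 1 minute the session survives.
print(first_expired_poll([5_000, 60_000, 2_000], 70_000))  # None
```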

session.timeout.ms
The timeout used to detect failures when using Kafka's group management facilities. 
When a consumer's heartbeat is not received within the session timeout, the broker will mark the consumer as failed and rebalance the group. 
Since heartbeats are sent only when poll() is invoked, a higher session timeout allows more time for message processing in the consumer's poll loop at the cost of a longer time to detect hard failures. 
See also max.poll.records for another option to control the processing time in the poll loop. 
Note that the value must be in the allowable range as configured in the broker configuration by group.min.session.timeout.ms and group.max.session.timeout.ms.
The official documentation notes that max.poll.records offers another way to control how long each poll loop iteration takes.
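As a rough illustration of how max.poll.records bounds the poll loop, the arithmetic below estimates the average time available per record when a full batch must be finished before the timeout. It is only a sketch: the helper name per_record_budget_ms is hypothetical, and the example values 30000 and 20 are taken from the configuration shown in the problem description.

```python
def per_record_budget_ms(timeout_ms, max_poll_records):
    """Average processing time available per record if a full batch of
    max_poll_records must be handled before timeout_ms elapses.

    In the 0.10.0.0 model, timeout_ms corresponds to session.timeout.ms,
    because heartbeats are sent only when poll() is invoked.
    """
    if max_poll_records <= 0:
        raise ValueError("max_poll_records must be positive")
    return timeout_ms / max_poll_records


# session.timeout.ms=30000 and max.poll.records=20
print(per_record_budget_ms(30_000, 20))  # 1500.0
```

Lowering max.poll.records raises the per-record budget without changing the timeout itself.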

heartbeat.interval.ms
The expected time between heartbeats to the consumer coordinator when using Kafka's group management facilities. 
Heartbeats are used to ensure that the consumer's session stays active and to facilitate rebalancing when new consumers join or leave the group. 
The value must be set lower than session.timeout.ms, but typically should be set no higher than 1/3 of that value. 
It can be adjusted even lower to control the expected time for normal rebalances.
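The two constraints on heartbeat.interval.ms described above can be written as a small check. This is an illustrative sketch only: check_heartbeat_interval is a hypothetical helper, not part of any Kafka client, and the values passed to it are example numbers.

```python
def check_heartbeat_interval(heartbeat_interval_ms, session_timeout_ms):
    """Return a list of problems with the two settings (empty if none)."""
    problems = []
    if heartbeat_interval_ms >= session_timeout_ms:
        # Hard requirement stated in the documentation.
        problems.append(
            "heartbeat.interval.ms must be lower than session.timeout.ms")
    elif heartbeat_interval_ms * 3 > session_timeout_ms:
        # Recommendation: no higher than 1/3 of the session timeout.
        problems.append(
            "heartbeat.interval.ms should be no higher than 1/3 of "
            "session.timeout.ms")
    return problems


print(check_heartbeat_interval(10_000, 30_000))       # []
print(len(check_heartbeat_interval(15_000, 30_000)))  # 1
print(len(check_heartbeat_interval(30_000, 30_000)))  # 1
```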


0.10.1.0 heartbeat mechanism:
Starting with this version, heartbeats and poll() are decoupled: each consumer sends heartbeats from a dedicated background thread.
This version also adds a separate max.poll.interval.ms parameter, so the maximum interval between two polls can be configured independently of the session timeout. The time a consumer spends processing messages can therefore exceed session.timeout.ms.
session.timeout.ms applies to the heartbeat thread and max.poll.interval.ms applies to the processing thread; the two run as separate threads in this version.

Assuming session.timeout.ms = 30000 (30 seconds), the consumer's heartbeat thread must send a heartbeat to the broker before this timeout expires.
On the other hand, if processing a single message takes 1 minute, max.poll.interval.ms can be set to more than 1 minute to give the processing thread enough time.
Otherwise, if max.poll.interval.ms is less than 1 minute, the interval between two polls exceeds max.poll.interval.ms while that message is still being processed, and the consumer is treated as failed and removed from the group even though the session has not timed out.

If the processing (poll) thread hangs, this is detected through max.poll.interval.ms.
If the entire consumer process hangs or dies, it can only be detected through session.timeout.ms.
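The two detection paths can be sketched as a simple classification. This is a simplified model written for illustration, not the actual coordinator logic; the function name which_timeout_fires and its arguments are hypothetical.

```python
def which_timeout_fires(ms_since_last_heartbeat, ms_since_last_poll,
                        session_timeout_ms, max_poll_interval_ms):
    """Return the name of the setting that detects the failure, or None.

    Simplified model of 0.10.1.0 and later, where heartbeats come from a
    background thread and poll() is called by the processing thread.
    """
    if ms_since_last_heartbeat > session_timeout_ms:
        # No heartbeat at all: the whole consumer is gone.
        return "session.timeout.ms"
    if ms_since_last_poll > max_poll_interval_ms:
        # Heartbeats still arrive, but poll() has not been called in time.
        return "max.poll.interval.ms"
    return None


# Processing thread stuck for 60 s while heartbeats continue every 3 s.
print(which_timeout_fires(3_000, 60_000, 30_000, 50_000))
# prints: max.poll.interval.ms

# Whole consumer dead for 35 s with a 30 s session timeout.
print(which_timeout_fires(35_000, 35_000, 30_000, 300_000))
# prints: session.timeout.ms
```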


Notable changes in 0.10.1.0:
The new Java Consumer now supports heartbeating from a background thread. 
There is a new configuration max.poll.interval.ms which controls the maximum time between poll invocations before the consumer will proactively leave the group (5 minutes by default). 
The value of the configuration request.timeout.ms must always be larger than max.poll.interval.ms because this is the maximum time that a JoinGroup request can block on the server while the consumer is rebalancing, so we have changed its default value to just above 5 minutes. 
Finally, the default value of session.timeout.ms has been adjusted down to 10 seconds, and the default value of max.poll.records has been changed to 500.
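The relationship between request.timeout.ms and max.poll.interval.ms can be checked mechanically. The sketch below is illustrative only: request_timeout_ok is a hypothetical helper, and the value 305000 for request.timeout.ms is an assumption, since the text above only says "just above 5 minutes".

```python
# Defaults described in the 0.10.1.0 notes quoted above.
DEFAULTS_0_10_1_0 = {
    "max.poll.interval.ms": 300_000,   # 5 minutes
    "session.timeout.ms": 10_000,      # 10 seconds
    "max.poll.records": 500,
    "request.timeout.ms": 305_000,     # assumption: "just above 5 minutes"
}


def request_timeout_ok(config):
    """True if request.timeout.ms is larger than max.poll.interval.ms."""
    return config["request.timeout.ms"] > config["max.poll.interval.ms"]


print(request_timeout_ok(DEFAULTS_0_10_1_0))  # True

# Raising max.poll.interval.ms alone breaks the constraint.
custom = dict(DEFAULTS_0_10_1_0, **{"max.poll.interval.ms": 600_000})
print(request_timeout_ok(custom))  # False
```

When max.poll.interval.ms is raised in such a setup, request.timeout.ms would need to be raised along with it.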


Official documentation for version 0.10.1.0 (http://kafka.apache.org/0101/documentation.html):
max.poll.interval.ms
The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member.

session.timeout.ms
The timeout used to detect consumer failures when using Kafka's group management facility. The consumer sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats are received by the broker before the expiration of this session timeout, then the broker will remove this consumer from the group and initiate a rebalance. Note that the value must be in the allowable range as configured in the broker configuration by group.min.session.timeout.ms and group.max.session.timeout.ms.

request.timeout.ms
The configuration controls the maximum amount of time the client will wait for the response of a request. 
If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted.
 

Also note:

In version 0.10.2.1, the default value of max.poll.interval.ms for the internal Kafka Streams consumer was changed to Integer.MAX_VALUE:

http://kafka.apache.org/0102/documentation.html

Notable changes in 0.10.2.1
The default values for two configurations of the StreamsConfig class were changed to improve the resiliency of Kafka Streams applications. 
The internal Kafka Streams producer retries default value was changed from 0 to 10. 
The internal Kafka Streams consumer max.poll.interval.ms default value was changed from 300000 to Integer.MAX_VALUE.

 


Origin blog.csdn.net/qq_32907195/article/details/112801258