Diagnosing and fixing a Kafka consumer group that keeps rebalancing

A while back we built a small in-house program that moves data from Kafka into Elasticsearch, and we run several replicas of it. Because the tool has to be rate-limited, there are stretches where it stops pulling messages; during those stretches its heartbeats to Kafka stop as well, and the consumer group ends up rebalancing over and over. A sketch of the pattern is shown below, followed by the relevant parameters we found while going through the documentation.
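The loop below is a minimal sketch of that pattern, assuming the Java client; the broker address, group id, topic name, and the indexIntoElasticsearch helper are illustrative placeholders rather than our actual code. The only point it makes is that naive throttling keeps the thread away from poll() for long stretches.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ThrottledIndexer {

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "kafka-to-es");             // placeholder group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("source-topic")); // placeholder topic

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                indexIntoElasticsearch(records); // placeholder for the Elasticsearch bulk write

                // Naive rate limiting: sleep between polls. If this pause (plus the
                // processing time) exceeds the timeouts discussed below, the group
                // coordinator drops the consumer and triggers a rebalance.
                Thread.sleep(120_000);
            }
        }
    }

    private static void indexIntoElasticsearch(ConsumerRecords<String, String> records) {
        // omitted: bulk-index the records into Elasticsearch
    }
}
```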

session.timeout.ms is for the heartbeat thread. If the coordinator fails to get
any heartbeat from a consumer before this interval elapses, it marks the
consumer as failed and triggers a new round of rebalance.

max.poll.interval.ms is for the user thread. If the message-processing logic is
so heavy that it takes longer than this interval, the coordinator explicitly
has the consumer leave the group and also triggers a new round of rebalance.

heartbeat.interval.ms is used to make the other, healthy consumers aware of a
rebalance faster. If the coordinator triggers a rebalance, other consumers only
learn of it by receiving a heartbeat response with a REBALANCE_IN_PROGRESS
exception encapsulated in it. The quicker the heartbeat request is sent, the
sooner a consumer knows it needs to rejoin the group.

Suggested values:
session.timeout.ms: a relatively low value, 10 seconds for instance.
max.poll.interval.ms: based on your processing requirements.
heartbeat.interval.ms: a relatively low value, ideally no more than 1/3 of session.timeout.ms.

A quick explanation of the parameters above. session.timeout.ms is the maximum time the consumer's session is kept alive; if the consumer does not report its status within this window, it is considered disconnected. max.poll.interval.ms is the maximum interval allowed between polls; if you pull a lot of records per poll, consider raising this value. heartbeat.interval.ms is the interval at which heartbeats are sent, which is also how a consumer learns it needs to rejoin during a rebalance; it must be smaller than session.timeout.ms, ideally no more than one third of it. In our environment, raising the timeout to 60s made the rebalance problem go away. These values depend on your specific workload, so don't copy them blindly!
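As a rough illustration of where those settings go, here is a consumer configuration sketch for the Java client; the 60s session timeout is the value we ended up with, while the broker address, group id, and the other numbers are assumptions to be tuned per workload.

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TunedConsumerConfig {

    static KafkaConsumer<String, String> buildConsumer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "kafka-to-es");             // placeholder group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Session timeout raised to 60s, the value that stopped the rebalance loop for us.
        props.put("session.timeout.ms", "60000");
        // Heartbeats at roughly 1/3 of the session timeout.
        props.put("heartbeat.interval.ms", "20000");
        // Upper bound on the gap between poll() calls; size it to the processing/throttling.
        props.put("max.poll.interval.ms", "300000");

        return new KafkaConsumer<>(props);
    }
}
```

One thing to keep in mind: the broker bounds session.timeout.ms through group.min.session.timeout.ms and group.max.session.timeout.ms, so a value outside that range is rejected when the consumer tries to join the group.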

Reposted from blog.csdn.net/u010278923/article/details/79895335