Today we ran into some problems consuming Kafka from Python, so I am recording them here.
Scenario 1
Setup: a standalone program that only produces and consumes data
Start time: 10:42
Topic: t_facedec
Partition: 1
Processes: a consumer started on 168, a consumer started on 158, and a producer on a Windows machine pushing data
Duration: 15 minutes
Results:
1. The consumer on 168 paused, while the consumer on 158 kept consuming.
2. At 10:46 the producer was restarted, pushed a few more messages, and stopped; the consumer on 158 stopped consuming and then resumed.
3. At 10:49 the consumers on 168 and 158 were stopped and restarted in order; 168 consumed some data, after which 158 started consuming and kept going.
4. After startup, the consumer that was started later was the one consuming data.
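In both scenarios only one consumer in the group receives data at a time. This follows from the setup: t_facedec has a single partition, and Kafka assigns each partition to at most one consumer in a group, so any extra member sits idle until a rebalance hands the partition over. The toy assignor below is a minimal sketch of that counting argument, not Kafka's real assignment code; the member names are hypothetical, and it does not model which real consumer ends up winning the partition.

```python
# Toy assignor (assumption: NOT Kafka's real algorithm) showing why a
# single-partition topic feeds only one consumer per group at a time.

def assign(members, partitions):
    """Deal partitions out to members in list order; extras get nothing."""
    assignment = {m: [] for m in members}
    for i, p in enumerate(partitions):
        assignment[members[i % len(members)]].append(p)
    return assignment


partitions = [0]                          # t_facedec has one partition
group = ["consumer-168", "consumer-158"]  # hypothetical member names

before = assign(group, partitions)
print(before)  # {'consumer-168': [0], 'consumer-158': []}

# consumer-168 shuts down; the group rebalances over the remaining members
group.remove("consumer-168")
after = assign(group, partitions)
print(after)   # {'consumer-158': [0]}
```

Adding partitions to the topic is what would let both consumers in the group receive data at the same time.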
Scenario 2
Setup: the production program, which includes a face recognition process
Start time: 11:00
Topic: t_facedec
Partition: 1
1. At 11:46 the consumer on 168 was started; its log six minutes later showed no errors.
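2. At 11:53 the consumer on 158 was started; its log also showed no errors, and it joined the group kongzhagen.
3. The consumer on 168 logged a warning: heartbeat failed because the group is rebalancing.
4. The producer was started on the Windows machine; the consumer on 168 began consuming data, while the consumer on 158 consumed nothing.
Conclusion: the consumer started first consumes the data, and the consumer on 158 only starts consuming after the consumer on 168 is shut down.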
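5. Half an hour later:
Error breakdown (figure):
After 9 minutes the idle-connection limit was reached ('connections_max_idle_ms': 540000).
Other settings in effect: 'max_poll_records': 500, 'heartbeat_interval_ms': 3000, 'session_timeout_ms': 30000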
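Follow-up:
At 14:32 data production resumed and the data began to be consumed again.
At 14:48 the rebalance timeout occurred again.
At 15:02 the rebalance timeout occurred again.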
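Cause analysis:
1. The consumer sends a heartbeat every 3 seconds (heartbeat_interval_ms). If, for whatever reason, the group coordinator receives no heartbeat from it within 30 seconds (session_timeout_ms), it considers the consumer dead and reassigns the topic's partitions among the group members (a rebalance).
2. By default a consumer fetches at most 500 records per poll (max_poll_records). If processing those 500 records takes longer than max_poll_interval_ms, the consumer also leaves the group, which triggers a rebalance.
3. If the consumer performs no consumption for longer than connections_max_idle_ms (540000 ms, i.e. 9 min), this also causes it to drop out of the group.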
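The three causes can be condensed into three timing checks. The sketch below is a simplified model under stated assumptions: the real checks run inside the broker's group coordinator and the client library, max_poll_interval_ms is assumed to be kafka-python's default of 300000 ms (the value is not given above), and the helper `leave_reasons` plus the example timings are hypothetical.

```python
# Toy model of the three ways the consumer drops out of the group (causes 1-3).
# Assumption: simplified checks; the real logic lives in the broker's group
# coordinator and in the client library, not in application code.

ORIGINAL = {
    "heartbeat_interval_ms": 3000,
    "session_timeout_ms": 30000,
    "max_poll_records": 500,
    "max_poll_interval_ms": 300000,  # assumed kafka-python default
    "connections_max_idle_ms": 540000,
}


def leave_reasons(cfg, ms_since_heartbeat, ms_per_record, ms_idle):
    """Hypothetical helper: list which timeouts a given situation would trip."""
    reasons = []
    # Cause 1: coordinator hears no heartbeat within the session timeout.
    if ms_since_heartbeat > cfg["session_timeout_ms"]:
        reasons.append("session_timeout_ms")
    # Cause 2: processing one full poll batch outlasts max_poll_interval_ms.
    if cfg["max_poll_records"] * ms_per_record > cfg["max_poll_interval_ms"]:
        reasons.append("max_poll_interval_ms")
    # Cause 3: no consumption activity for longer than the idle limit.
    if ms_idle > cfg["connections_max_idle_ms"]:
        reasons.append("connections_max_idle_ms")
    return reasons


# About 10 heartbeat intervals fit into one 30 s session timeout.
print(ORIGINAL["session_timeout_ms"] // ORIGINAL["heartbeat_interval_ms"])  # 10

# Hypothetical numbers: 35 s without a heartbeat, 1 s per face-recognition
# record (500 s per batch), and 10 minutes idle trip all three limits.
print(leave_reasons(ORIGINAL, 35_000, 1_000, 600_000))
# ['session_timeout_ms', 'max_poll_interval_ms', 'connections_max_idle_ms']
```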
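Solutions:
1. Increase the heartbeat session timeout
session_timeout_ms = 300000 (raised from 30 s to 300 s)
2. Reduce the number of records fetched per poll
max_poll_records = 5 (reduced from 500 to 5)
3. Increase the idle connection time
connections_max_idle_ms = 5400000 (raised from 9 min to 90 min)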
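A quick way to sanity-check the three changes is to compare the original and tuned values side by side. This is a sketch, not a measurement: it assumes max_poll_interval_ms stays at kafka-python's default of 300000 ms and uses a hypothetical cost of 1000 ms per face-recognition record.

```python
# Compare original vs. tuned consumer settings from the solutions list.
# Assumptions: max_poll_interval_ms left at an assumed 300000 ms default;
# MS_PER_RECORD is a hypothetical processing cost, not a measured value.

BASE = {
    "session_timeout_ms": 30000,
    "max_poll_records": 500,
    "connections_max_idle_ms": 540000,
    "max_poll_interval_ms": 300000,  # assumed default, unchanged by the fix
}
TUNED = dict(BASE, session_timeout_ms=300000, max_poll_records=5,
             connections_max_idle_ms=5400000)

MS_PER_RECORD = 1000  # hypothetical face-recognition cost per message


def describe(name, cfg):
    """Summarize a config in human units and check the per-batch budget."""
    batch_ms = cfg["max_poll_records"] * MS_PER_RECORD
    return {
        "config": name,
        "session_timeout_s": cfg["session_timeout_ms"] / 1000,
        "idle_timeout_min": cfg["connections_max_idle_ms"] / 60000,
        "batch_ms": batch_ms,
        "batch_fits_poll_interval": batch_ms <= cfg["max_poll_interval_ms"],
    }


for name, cfg in (("original", BASE), ("tuned", TUNED)):
    print(describe(name, cfg))
```

In kafka-python these names are keyword arguments of `KafkaConsumer`, for example `KafkaConsumer('t_facedec', group_id='kongzhagen', session_timeout_ms=300000, max_poll_records=5, connections_max_idle_ms=5400000)` plus your own `bootstrap_servers`. Two trade-offs are worth keeping in mind: the broker caps the session timeout with `group.max.session.timeout.ms`, so a 300 s value only works if the broker allows it, and a longer session timeout also means a consumer that has really died is detected later.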