Introduction to Kafka: missed and repeated consumption, consumer transactions, and data backlog (24)

Missed consumption and repeated consumption

Repeated consumption: the data has already been processed, but the offset was not yet committed. After a restart or rebalance, the consumer resumes from the last committed offset and processes the same records again; for example, with auto-commit a consumer may process records and then crash before the next automatic commit fires.
Missed consumption: the offset is committed first and the data is processed afterwards. If the consumer crashes after the commit but before processing finishes, those records are skipped on restart and never processed.
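As a minimal sketch of the safer ordering (broker address, group id, and topic name are illustrative), disabling auto-commit and committing only after the records are processed prevents missed consumption; a crash between processing and commit can still cause repeated consumption:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Disable auto-commit so the commit happens only after processing.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // consume first ...
                }
                consumer.commitSync(); // ... then commit: at least once (may repeat, never miss)
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}
```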

Consumer transactions


To achieve exactly-once consumption on the consumer side, the Kafka consumer must bind the consumption step and the offset commit into a single atomic operation. This requires saving Kafka's offsets in a custom medium that supports transactions (such as MySQL); this part is only involved in a follow-up project.
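A minimal sketch of that binding, assuming a hypothetical MySQL table kafka_offsets(topic, kafka_partition, next_offset) and illustrative JDBC connection details: the business write and the offset write share one MySQL transaction, and on rebalance the consumer seeks to the offsets stored in MySQL rather than Kafka's own committed offsets.

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

import java.sql.*;
import java.time.Duration;
import java.util.*;

public class ExactlyOnceConsumer {
    public static void main(String[] args) throws Exception {
        // Hypothetical database; table: kafka_offsets(topic VARCHAR, kafka_partition INT, next_offset BIGINT)
        Connection db = DriverManager.getConnection("jdbc:mysql://localhost:3306/demo", "user", "pass");
        db.setAutoCommit(false);

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // offsets live in MySQL instead

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("demo-topic"), new ConsumerRebalanceListener() {
            @Override public void onPartitionsRevoked(Collection<TopicPartition> parts) { }
            @Override public void onPartitionsAssigned(Collection<TopicPartition> parts) {
                // Resume from the transactional store, not from Kafka's committed offsets.
                for (TopicPartition tp : parts) consumer.seek(tp, readOffset(db, tp));
            }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> r : records) {
                // Business write (illustrative table "results") ...
                try (PreparedStatement ps = db.prepareStatement("INSERT INTO results(val) VALUES (?)")) {
                    ps.setString(1, r.value());
                    ps.executeUpdate();
                }
                saveOffset(db, r.topic(), r.partition(), r.offset() + 1); // ... plus the next offset
            }
            db.commit(); // one transaction: processing and offset succeed or fail together
        }
    }

    static long readOffset(Connection db, TopicPartition tp) {
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT next_offset FROM kafka_offsets WHERE topic = ? AND kafka_partition = ?")) {
            ps.setString(1, tp.topic());
            ps.setInt(2, tp.partition());
            ResultSet rs = ps.executeQuery();
            return rs.next() ? rs.getLong(1) : 0L; // no row yet: start from the beginning
        } catch (SQLException e) { throw new RuntimeException(e); }
    }

    static void saveOffset(Connection db, String topic, int partition, long nextOffset) throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO kafka_offsets(topic, kafka_partition, next_offset) VALUES (?, ?, ?) " +
                "ON DUPLICATE KEY UPDATE next_offset = VALUES(next_offset)")) {
            ps.setString(1, topic);
            ps.setInt(2, partition);
            ps.setLong(3, nextOffset);
            ps.executeUpdate();
        }
    }
}
```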

Data backlog (how consumers can improve throughput)

1) If Kafka's consumption capacity is insufficient, consider increasing the number of partitions of the topic and, at the same time, increasing the number of consumers in the consumer group so that the number of consumers equals the number of partitions (a sketch of increasing the partition count follows this list).
2) If downstream processing is not fast enough, increase the number of records pulled per batch. If each pulled batch is too small (data pulled per unit of processing time < production speed), the consumer will process less data than is produced, which also causes a backlog.
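A minimal sketch of point 1 using Kafka's Java AdminClient (broker address, topic name, and target count are illustrative); note that a topic's partition count can only be increased, never decreased:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

import java.util.Map;
import java.util.Properties;

public class ExpandTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Grow "demo-topic" to 6 partitions, then scale the consumer group to 6 members.
            admin.createPartitions(Map.of("demo-topic", NewPartitions.increaseTo(6))).all().get();
        }
    }
}
```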

Relevant consumer parameters:

fetch.max.bytes: Default: 52428800 (50 MB). The maximum number of bytes the consumer fetches from the server in one batch. If a single batch on the server side is larger than this value, the batch is still returned, so this is not an absolute maximum. The batch size is also bounded by message.max.bytes (broker config) or max.message.bytes (topic config).
max.poll.records: The maximum number of records returned by a single call to poll(). Default: 500.
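A short sketch of raising both limits on the consumer (the concrete values are illustrative, not recommendations):

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;

import java.util.Properties;

public class ThroughputConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Allow up to 100 MB per fetch instead of the 50 MB default.
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 104857600);
        // Return up to 1000 records per poll() instead of the 500 default.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1000);
        // ... pass props to a KafkaConsumer as usual
    }
}
```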


Origin blog.csdn.net/weixin_43205308/article/details/131639450