Kafka: repeated consumption after a consumer restart

Original link: https://blog.csdn.net/z1941563559/java/article/details/88753938

Problem description: Some time after a Kafka topic has been fully consumed, restarting its only consumer resets the offset to the smallest available offset, and the consumer reads the topic again from there. Messages that were already processed are consumed a second time.
Cause: the committed offset information has expired. I had always assumed that as long as the consumer stays online, its latest offset would not expire. But the offset can expire even while the consumer is online, if nothing has been committed for longer than the offset retention period. With the default configuration, log data is retained for log.retention.hours = 168 hours (7 days), which is longer than the offset retention of offsets.retention.minutes = 1440 minutes (24 hours). Once the offset has expired and the consumer is restarted, it cannot find its committed offset, so it starts from the position given by auto.offset.reset. With auto.offset.reset = earliest that is the smallest offset, so data that was already consumed is consumed again.
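
The restart behaviour described above can be sketched as a small decision function. This is only an illustrative model, not Kafka client code: the function name starting_offset and its parameters are hypothetical, and it covers only the earliest and latest reset policies.

```python
from typing import Optional


def starting_offset(committed: Optional[int],
                    log_start: int,
                    log_end: int,
                    auto_offset_reset: str = "earliest") -> int:
    """Model where a restarted consumer begins reading one partition.

    committed is the group's stored offset, or None if it has expired.
    log_start and log_end are the smallest and next-to-write offsets
    still held by the broker.
    """
    if committed is not None:
        return committed      # normal case: resume where we left off
    if auto_offset_reset == "earliest":
        return log_start      # re-reads everything still retained
    if auto_offset_reset == "latest":
        return log_end        # skips everything still retained
    raise ValueError(f"unsupported auto.offset.reset: {auto_offset_reset}")


# Offset still stored: the consumer resumes at 5000.
print(starting_offset(5000, log_start=1200, log_end=5000))
# Offset expired: the consumer falls back to the smallest retained offset.
print(starting_offset(None, log_start=1200, log_end=5000))
```

In the second call the consumer starts at 1200 and reprocesses offsets 1200 to 4999, which is the duplication described in this post.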

 

 

Solution:

Original: https://issues.apache.org/jira/browse/KAFKA-3806

Adjust default values of log.retention.hours and offsets.retention.minutes

In some cases, the combination of the default values of log.retention.hours (168 hours = 7 days) and offsets.retention.minutes (1440 minutes = 1 day) can be dangerous. Offset retention should always be longer than log retention.

We observed the following scenario and issue:

  • Two days ago a producer update stopped the production of data to a topic; the topic itself was not deleted.
  • The consumer consumed all the data and correctly committed its offsets to Kafka.
  • The consumer made no further offset commits for that topic, because there was no more incoming data and nothing left to confirm. (We have auto-commit disabled; I am not sure how this behaves with auto-commit enabled.)
  • After one day: Kafka cleared the offsets as too old, according to offsets.retention.minutes.
  • After two days: the long-running consumer was restarted after an update. It found no committed offsets for the topic, because they had been deleted under offsets.retention.minutes, so it started consuming from the beginning.
  • Because log.retention.hours is longer, the messages were still in Kafka, and about 5 days of messages were read again.
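
The arithmetic behind the "about 5 days" figure can be checked with a rough sketch. The function replay_after_restart is hypothetical, and it rests on simplifying assumptions: the last message was produced at the time of the last commit, auto.offset.reset is earliest, and both retention limits are applied exactly on time (a real broker deletes whole log segments, so the actual figure is approximate).

```python
def replay_after_restart(days_since_last_commit: float,
                         offsets_retention_minutes: int = 1440,
                         log_retention_hours: int = 168) -> float:
    """Return how many days of messages are read again on restart.

    Returns 0.0 if the committed offsets have not expired yet.
    """
    offsets_retention_days = offsets_retention_minutes / (60 * 24)
    log_retention_days = log_retention_hours / 24
    if days_since_last_commit <= offsets_retention_days:
        return 0.0  # offsets still stored, the consumer resumes normally
    # Offsets are gone: everything the log still retains is replayed.
    return max(0.0, log_retention_days - days_since_last_commit)


# Defaults from the report: restart two days after the last commit.
print(replay_after_restart(2))  # 5.0
# Same restart with offsets kept for 14 days: nothing is replayed.
print(replay_after_restart(2, offsets_retention_minutes=20160))  # 0.0
```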

Known workaround for this issue:

  • Configure log.retention.hours and offsets.retention.minutes explicitly; do not rely on the default values.

Proposals:

  • Extend the default value of offsets.retention.minutes to at least twice log.retention.hours.
  • Have Kafka check these values during startup and log a warning if offsets.retention.minutes is smaller than log.retention.hours.
  • Add a note to the migration guide about the differences between storing offsets in Kafka and in ZooKeeper ( http://kafka.apache.org/documentation.html#upgrade ).
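
The startup check proposed in the second item could look roughly like the following. This is a hypothetical sketch written outside the broker, not Kafka's own code: parse_properties and offsets_outlive_log are made-up helper names, and the check only looks at log.retention.hours, ignoring log.retention.minutes and log.retention.ms.

```python
import logging

logger = logging.getLogger("retention-check")


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def offsets_outlive_log(props: dict) -> bool:
    """Return True if offsets are retained at least as long as log data."""
    # Fall back to the defaults discussed in this post when a key is absent.
    offsets_minutes = int(props.get("offsets.retention.minutes", 1440))
    log_minutes = int(props.get("log.retention.hours", 168)) * 60
    if offsets_minutes < log_minutes:
        logger.warning(
            "offsets.retention.minutes (%d) is smaller than log retention "
            "(%d minutes); restarted consumers may re-read old messages",
            offsets_minutes, log_minutes)
        return False
    return True


sample = """
# broker settings left at their defaults
log.retention.hours=168
offsets.retention.minutes=1440
"""
print(offsets_outlive_log(parse_properties(sample)))
```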

 

 

Changes:

The problem is the default values of the offsets.retention.minutes and log.retention.minutes parameters.

By default, log data is retained for 7 days, while offsets are retained for only 24 hours. The data is therefore still stored after the committed offset has expired, which causes the client to consume the same data again.

Official parameter documentation for 0.10.0.0: http://kafka.apache.org/0100/documentation.html#log

offsets.retention.minutes

Log retention window in minutes for offsets topic

The expiration time for offsets stored on the Kafka server side. The default value is 1440 (1440 minutes, i.e. 24 hours). It should be raised to match log.retention.hours, i.e. 10080 minutes (7 days).
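
As a sketch, the corresponding broker settings might look like this, assuming the broker is configured through the usual server.properties file and log retention is left at its 7-day default:

```properties
# Keep log data for 7 days (the default, stated explicitly).
log.retention.hours=168
# Keep committed offsets for 7 days as well: 7 * 24 * 60 = 10080 minutes.
offsets.retention.minutes=10080
```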

 

log.retention.hours & log.retention.minutes

Both parameters set how long a log file is kept before it is deleted. If more than one is set, the finer-grained one takes precedence: log.retention.ms over log.retention.minutes, and log.retention.minutes over log.retention.hours.

log.retention.hours:
The number of hours to keep a log file before deleting it (in hours), tertiary to log.retention.ms property
Type: int, default: 168 (168 hours, i.e. 7 days).

log.retention.minutes:
The number of minutes to keep a log file before deleting it (in minutes), secondary to log.retention.ms property. If not set, the value in log.retention.hours is used
Type: int, default: null.
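
The "secondary" and "tertiary" wording in the documentation quoted above describes a precedence order, which can be modelled as follows. The function effective_log_retention_ms is a hypothetical helper for illustration, not part of Kafka.

```python
from typing import Optional


def effective_log_retention_ms(ms: Optional[int] = None,
                               minutes: Optional[int] = None,
                               hours: int = 168) -> int:
    """Resolve the log retention time in milliseconds.

    log.retention.ms wins if set, then log.retention.minutes,
    and log.retention.hours is used only as the last fallback.
    """
    if ms is not None:
        return ms
    if minutes is not None:
        return minutes * 60 * 1000
    return hours * 60 * 60 * 1000


# Only the default hours value applies: 7 days in milliseconds.
print(effective_log_retention_ms())  # 604800000
# minutes is set, so hours is ignored: 1 hour in milliseconds.
print(effective_log_retention_ms(minutes=60, hours=168))  # 3600000
```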

Origin www.cnblogs.com/lshan/p/12573631.html