How does Consumer.endOffsets work in Kafka?

user2683814 :

Assume I have a timer task running indefinitely that iterates over all the consumer groups in the Kafka cluster and outputs the lag, committed offset, and end offset for all partitions of each group. It is similar to how the Kafka console consumer-group script works, except that it covers all groups.

Something like

Single Consumer - Not Working - Doesn't return offsets for some of the provided topic partitions (e.g. 10 provided, 5 offsets returned)

static Consumer consumer;

static {
  // One consumer shared across all runs and all groups
  consumer = createConsumer();
}

public void run() {
  List<String> groupIds = getConsumerGroups();
  for (String groupId : groupIds) {
    List<TopicPartition> topicPartitions = getTopicPartitions(groupId);
    // Not working: offsets are missing for some partitions of some groups (10 in, 5 out)
    Map<TopicPartition, Long> endOffsets = consumer.endOffsets(topicPartitions);
  }
}

Multiple Consumers - Working

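public void run() {
  List<String> groupIds = getConsumerGroups();
  for (String groupId : groupIds) {
    List<TopicPartition> topicPartitions = getTopicPartitions(groupId);
    // A fresh consumer is created for every group
    Consumer consumer = createConsumer();
    // This works: offsets are returned for all partitions
    Map<TopicPartition, Long> endOffsets = consumer.endOffsets(topicPartitions);
  }
}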

Versions: Kafka-Client 2.0.0

Am I using the consumer API incorrectly? Ideally I would like to use a single consumer.

Let me know if you need more details.

user2683814 :

It's a bug in Fetcher.fetchOffsetsByTimes(), specifically inside the groupListOffsetRequests method: the logic did not add a partition to the set to be retried when the leader for that partition was unknown or unavailable.

This was more noticeable when a single consumer is used across the partitions of all consumer groups. Some groups' topic partitions already had leader information when the end offsets were requested, while the topic partitions whose leader was unknown or unavailable were left out of the result because of the bug.
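As a rough illustration of the behavior described above, the sketch below groups partitions by their known leader and collects the leaderless ones for a retry. It is not the actual Kafka Fetcher code: the class LeaderGrouping, the method groupByLeader, and the use of plain strings for partitions and brokers are all hypothetical. The bug amounted to skipping the "remember for retry" step, so leaderless partitions silently vanished from the result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified illustration only; this is NOT the real Kafka Fetcher implementation.
// Partitions and brokers are modeled as plain strings (an assumption for brevity).
class LeaderGrouping {

    /** Partitions grouped per leader, plus partitions that still need a retry. */
    static final class Grouped {
        final Map<String, List<String>> byLeader = new HashMap<>();
        final Set<String> toRetry = new HashSet<>();
    }

    static Grouped groupByLeader(List<String> partitions, Map<String, String> knownLeaders) {
        Grouped grouped = new Grouped();
        for (String partition : partitions) {
            String leader = knownLeaders.get(partition);
            if (leader == null) {
                // Leader unknown or unavailable: remember the partition so it is
                // retried after a metadata refresh instead of being dropped.
                grouped.toRetry.add(partition);
            } else {
                grouped.byLeader.computeIfAbsent(leader, k -> new ArrayList<>()).add(partition);
            }
        }
        return grouped;
    }
}
```

With leaders known only for "a-0" and "a-1", a call with the partitions "a-0", "a-1", and "b-0" would place "b-0" in toRetry rather than losing it.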

Later, I realized it was not a good idea to pull the topic partitions from each consumer group. Instead, I changed the code to read the topic partitions from AdminClient.listTopics and AdminClient.describeTopics and pass them all at once to Consumer.endOffsets.

This doesn't completely resolve the issue, though, as topics/partitions may still be unavailable or unknown between runs.

More information can be found in KAFKA-7044 and the associated pull request. This has been fixed and is scheduled for the 2.1.0 release.
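One possible client-side workaround on affected versions is to re-request whatever partitions are missing from the returned map. The sketch below is only that, a sketch under assumptions: EndOffsetsRetry and fetchAll are hypothetical names, and the lookup function stands in for a call such as consumer::endOffsets so that the example does not depend on the Kafka client library. It adds no backoff between attempts, which real code would probably want.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of the Kafka API.
class EndOffsetsRetry {

    /**
     * Calls lookup repeatedly, each time asking only for the partitions that
     * have not been answered yet, until all are resolved or maxAttempts is hit.
     * Partitions still unresolved at the end are simply absent from the result.
     */
    static <P> Map<P, Long> fetchAll(Collection<P> partitions,
                                     Function<Collection<P>, Map<P, Long>> lookup,
                                     int maxAttempts) {
        Map<P, Long> result = new HashMap<>();
        Set<P> missing = new LinkedHashSet<>(partitions);
        for (int attempt = 0; attempt < maxAttempts && !missing.isEmpty(); attempt++) {
            Map<P, Long> partial = lookup.apply(new ArrayList<>(missing));
            for (Map.Entry<P, Long> entry : partial.entrySet()) {
                // Accept only non-null offsets for partitions we are still waiting on.
                if (entry.getValue() != null && missing.remove(entry.getKey())) {
                    result.put(entry.getKey(), entry.getValue());
                }
            }
        }
        return result;
    }
}
```

With a real consumer, the call might look like EndOffsetsRetry.fetchAll(topicPartitions, consumer::endOffsets, 3); comparing the result's key set with the requested partitions shows which ones are still unresolved.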
