How to get the latest offset/size of a Kafka topic using KafkaAdminClient (Java) in Kafka 2.x

Marina:

Is there a more efficient or simpler way to get the size/latest offsets of a topic's partitions using the newest Kafka 2.4 client APIs in Java, and then calculate the lag for a consumer group by comparing that group's offsets with the size of the topic?

I know this question has been asked for older Kafka versions, and that this info is also available from the JMX metrics Kafka exposes, but I am stuck with a legacy app that needs to do it in Java with the latest 2.4 Kafka libs.

The usual way of getting this info, as far as I understand, is:

  • The easiest part: get the committed offsets of a topic's partitions for a consumer group ID using a KafkaAdminClient call such as public ListConsumerGroupOffsetsResult listConsumerGroupOffsets(String groupId, ListConsumerGroupOffsetsOptions options)
  • The hardest part: determine the size of the topic for each partition:
    • create a new consumer and subscribe to the topic
    • advance the consumer to the latest offset using consumer.seekToEnd(...)
    • get the position of the consumer for all partitions using consumer.position(...)
  • Finally, compute [size - current offset] to determine the lag of the consumer group for each partition
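The three steps above can be sketched roughly as follows. This is a minimal sketch assuming kafka-clients 2.x on the classpath; the class name GroupLagCheck, the broker address and the group id are hypothetical placeholders. One deliberate deviation from the description: the throwaway consumer uses assign() instead of subscribe(), because seekToEnd() and position() need the partitions to be assigned, and with subscribe() the assignment only happens inside poll().

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class GroupLagCheck {

    // Step 1: committed offsets of the group, read through the admin client.
    static Map<TopicPartition, Long> committedOffsets(AdminClient admin, String groupId)
            throws InterruptedException, ExecutionException {
        Map<TopicPartition, OffsetAndMetadata> raw =
            admin.listConsumerGroupOffsets(groupId).partitionsToOffsetAndMetadata().get();
        Map<TopicPartition, Long> result = new HashMap<>();
        // Defensive: skip partitions reported without a committed offset.
        raw.forEach((tp, om) -> { if (om != null) result.put(tp, om.offset()); });
        return result;
    }

    // Step 2: end offset of each partition, using a short-lived consumer.
    static Map<TopicPartition, Long> endOffsets(Properties props, Collection<TopicPartition> partitions) {
        try (KafkaConsumer<byte[], byte[]> consumer =
                 new KafkaConsumer<>(props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
            consumer.assign(partitions);    // assign, not subscribe: seekToEnd needs an assignment
            consumer.seekToEnd(partitions); // lazy; resolved when position() is called
            Map<TopicPartition, Long> result = new HashMap<>();
            for (TopicPartition tp : partitions) {
                result.put(tp, consumer.position(tp));
            }
            return result;
        }
    }

    // Step 3: lag = end offset - committed offset, clamped at zero defensively.
    static <K> Map<K, Long> lag(Map<K, Long> endOffsets, Map<K, Long> committed) {
        Map<K, Long> result = new HashMap<>();
        committed.forEach((tp, offset) -> {
            Long end = endOffsets.get(tp);
            if (end != null) result.put(tp, Math.max(0L, end - offset));
        });
        return result;
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        try (AdminClient admin = AdminClient.create(props)) {
            Map<TopicPartition, Long> committed = committedOffsets(admin, "my-group"); // hypothetical group id
            Map<TopicPartition, Long> end = endOffsets(props, committed.keySet());
            System.out.println(lag(end, committed));
        }
    }
}
```

KafkaConsumer also has an endOffsets(Collection<TopicPartition>) method that returns the same end offsets in a single call without the seek, which may be a simpler replacement for step 2.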

Determining the last offset this way is a fairly heavy operation. So my question is: is there a more efficient way of getting the last offsets for a topic without the dummy consumer, perhaps in the 2.4 APIs? The topic/partition size is independent of any consumer, so it seems logical to be able to get it without using one.

Thank you!

Marina

radai:

Externally to the Kafka consuming application, you are correct: your options are to compare partition end offsets with the latest checkpointed positions of the consumer group (assuming the consumers in question even use Kafka to store offsets).

There are tools that will monitor this for you, such as Burrow.

However, if you have access to the consuming application itself, there is a more accurate way. Here's a list of all consumer sensors (exposed via the API or JMX by default): https://kafka.apache.org/documentation/#consumer_fetch_monitoring.

There is a per-partition records-lag metric. It is updated every time poll() is called, so it is more accurate and has lower latency than committed offsets. The only complication is that you need to sum the values of these sensors across all partitions the consumer is assigned.
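Since the same sensors are also registered as JMX MBeans, the sum can be taken from inside the consumer's JVM with plain javax.management calls. This is a sketch assuming the MBean naming Kafka's JMX reporter uses in 2.x (domain kafka.consumer, type consumer-fetch-manager-metrics, with client-id, topic and partition keys on the per-partition beans); JmxLag is a hypothetical class name.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxLag {

    // Sums the "records-lag" attribute of every per-partition fetch MBean in this JVM.
    static double totalRecordsLag() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumed naming pattern; matches client-, topic- and partition-level beans.
        ObjectName pattern = new ObjectName("kafka.consumer:type=consumer-fetch-manager-metrics,*");
        double total = 0.0;
        for (ObjectName name : server.queryNames(pattern, null)) {
            // Only the per-partition beans carry a "partition" key.
            if (name.getKeyProperty("partition") == null) {
                continue;
            }
            Object value = server.getAttribute(name, "records-lag");
            if (value instanceof Number) {
                double lag = ((Number) value).doubleValue();
                if (!Double.isNaN(lag)) {
                    total += lag; // skip NaN defensively
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("total records-lag: " + totalRecordsLag());
    }
}
```

Reading JMX from outside the process works the same way through a remote MBeanServerConnection, at the cost of enabling remote JMX on the consumer.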

Here's how to get at it via KafkaConsumer.metrics():

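import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

// Sums the per-partition "records-lag" sensors; pass in consumer.metrics().
private long calcTotalLag(Map<MetricName, ? extends Metric> metrics) {
    long totalLag = 0;
    for (Map.Entry<MetricName, ? extends Metric> entry : metrics.entrySet()) {
        MetricName metricName = entry.getKey();
        Metric metric = entry.getValue();
        Map<String, String> tags = metricName.tags();
        // Only the per-partition variant of records-lag carries a "partition" tag.
        if (metricName.name().equals("records-lag") && tags.containsKey("partition")) {
            totalLag += ((Number) metric.metricValue()).longValue();
        }
    }

    return totalLag;
}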
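A usage sketch, assuming a consumer that is polled in a loop: the hypothetical lagByPartition helper applies the same filter as calcTotalLag but keeps the individual values, keyed by the topic and partition tags. The broker address, group id and topic name are placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LagReportingConsumer {

    // Per-partition view of the "records-lag" sensors, keyed by the topic/partition tags.
    static Map<TopicPartition, Long> lagByPartition(Map<MetricName, ? extends Metric> metrics) {
        Map<TopicPartition, Long> result = new HashMap<>();
        for (Map.Entry<MetricName, ? extends Metric> entry : metrics.entrySet()) {
            MetricName name = entry.getKey();
            Map<String, String> tags = name.tags();
            if (!name.name().equals("records-lag") || !tags.containsKey("partition")) {
                continue;
            }
            Object value = entry.getValue().metricValue();
            // Defensive: ignore non-numeric or NaN values.
            if (!(value instanceof Number) || Double.isNaN(((Number) value).doubleValue())) {
                continue;
            }
            TopicPartition tp = new TopicPartition(tags.get("topic"), Integer.parseInt(tags.get("partition")));
            result.put(tp, ((Number) value).longValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");               // hypothetical group
        try (KafkaConsumer<String, String> consumer =
                 new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer())) {
            consumer.subscribe(Collections.singletonList("my-topic"));        // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                // ... process records here ...
                Map<TopicPartition, Long> lag = lagByPartition(consumer.metrics());
                long total = lag.values().stream().mapToLong(Long::longValue).sum();
                System.out.printf("polled %d records, total lag %d, per partition %s%n",
                                  records.count(), total, lag);
            }
        }
    }
}
```

Reading the metrics right after poll() means the values reflect the most recent fetch, with no extra round trip to the broker.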
