Kafka Offset Storage

1. Overview

  As of the latest release on Kafka's official site (0.10.1.1), consumed offsets are stored by default in an internal Kafka topic named __consumer_offsets. Support for storing consumed offsets in this topic actually dates back to version 0.8.2.2, but at that time the default was still to keep them in the Zookeeper cluster. The current default stores consumed offsets in the Kafka topic, while the Zookeeper-backed storage is retained and can still be selected through the offsets.storage property.
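For the old high-level consumer, the choice of store is a configuration setting. A minimal configuration fragment might look like the following (dual.commit.enabled is the companion flag used while migrating offsets from Zookeeper to Kafka, so both stores are written during the transition):

```properties
# Store consumed offsets in the Kafka-internal __consumer_offsets topic
offsets.storage=kafka
# During migration, also commit offsets to Zookeeper
dual.commit.enabled=true
```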

2. Content

  The official recommendation is well founded. Earlier versions of Kafka had a significant weakness: Zookeeper was used to store and track the consumption progress of every consumer/group. Although client-side optimizations smooth over part of this cost, consumers still need to interact with Zookeeper frequently, and issuing frequent writes through the ZkClient API is inherently inefficient for Zookeeper. Horizontal scaling is also a headache: if the Zookeeper cluster changes during operation, the throughput of the Kafka cluster is affected as well.

  The Kafka team proposed migrating offset storage into Kafka quite early, but offsets were still kept in the Zookeeper cluster by default and switching required a manual setting, so users who were not deeply familiar with Kafka generally stayed with the default (i.e., storage in ZK). In newer versions of Kafka, consumed offsets are stored by default in a topic called __consumer_offsets inside the Kafka cluster.

  The implementation principle is also familiar. Kafka uses one of its own topics: the consumed Group, Topic, and Partition form a combination key, and every consumed offset commit is written to that topic. Because these messages are so important that data loss cannot be tolerated, the ack level is set to -1, so the producer does not receive an ack until all ISRs have received the message (excellent data safety, at some cost in speed). Kafka also maintains a (Group, Topic, Partition) triple in memory holding the latest offset information, so when consumers fetch the latest offset, they read it directly from memory.
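As a minimal sketch of that combination key, the triple can be modeled as a value class. The class name GroupTopicPartition mirrors Kafka's internal one, but this standalone version is illustrative only, not the actual Kafka source:

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for Kafka's internal combination key:
// a (group, topic, partition) triple with value semantics, so it
// can index the latest committed offset in an in-memory map.
public class GroupTopicPartition {
    final String group;
    final String topic;
    final int partition;

    public GroupTopicPartition(String group, String topic, int partition) {
        this.group = group;
        this.topic = topic;
        this.partition = partition;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof GroupTopicPartition)) return false;
        GroupTopicPartition other = (GroupTopicPartition) o;
        return partition == other.partition
                && group.equals(other.group)
                && topic.equals(other.topic);
    }

    @Override
    public int hashCode() {
        return Objects.hash(group, topic, partition);
    }
}
```

Because equals/hashCode are defined over the whole triple, a later commit for the same (group, topic, partition) overwrites the previous entry in the map, which is exactly the "latest offset" semantics described above.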

3. Implementation

  So how do we obtain these consumed offsets? We can maintain a Map in memory that holds the offsets captured from consumption, as shown below:

// In-memory cache of the latest committed offset per (group, topic, partition)
protected static Map<GroupTopicPartition, OffsetAndMetadata> offsetMap = new ConcurrentHashMap<>();

  Then we update this in-memory Map from a listener thread; the code is as follows:

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

import kafka.consumer.ConsumerConnector;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.message.MessageAndMetadata;

private static synchronized void startOffsetListener(ConsumerConnector consumerConnector) {
    // Consume the internal offsets topic (consumerOffsetTopic == "__consumer_offsets")
    // with a single stream.
    Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
    topicCountMap.put(consumerOffsetTopic, 1);
    KafkaStream<byte[], byte[]> offsetMsgStream =
            consumerConnector.createMessageStreams(topicCountMap).get(consumerOffsetTopic).get(0);

    ConsumerIterator<byte[], byte[]> it = offsetMsgStream.iterator();
    while (true) {
        MessageAndMetadata<byte[], byte[]> offsetMsg = it.next();
        // Key versions 0 and 1 are offset commits; version 2 carries group metadata.
        if (ByteBuffer.wrap(offsetMsg.key()).getShort() < 2) {
            try {
                GroupTopicPartition commitKey = readMessageKey(ByteBuffer.wrap(offsetMsg.key()));
                // A null message is a tombstone for an expired offset; skip it.
                if (offsetMsg.message() == null) {
                    continue;
                }
                OffsetAndMetadata commitValue = readMessageValue(ByteBuffer.wrap(offsetMsg.message()));
                offsetMap.put(commitKey, commitValue);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
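The readMessageKey helper used above is not shown in the listing. As a hedged sketch of what such a decoder does (based on the key layout of __consumer_offsets offset-commit messages in versions 0 and 1: version as int16, then group and topic as length-prefixed strings, then partition as int32; the exact schema may differ across Kafka versions, so treat this as illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative decoder for the key of a __consumer_offsets message.
// Strings in the key are encoded as an int16 length followed by UTF-8 bytes.
public class OffsetKeyDecoder {

    static String readString(ByteBuffer buf) {
        short len = buf.getShort();
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Returns "group/topic/partition" for offset-commit keys (version < 2),
    // or null for group-metadata keys (version >= 2).
    static String decodeKey(ByteBuffer buf) {
        short version = buf.getShort();
        if (version >= 2) {
            return null;
        }
        String group = readString(buf);
        String topic = readString(buf);
        int partition = buf.getInt();
        return group + "/" + topic + "/" + partition;
    }
}
```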

  After capturing these updated offsets, we can expose the data over RPC so that a client can obtain and visualize it. The RPC interface looks like this:

namespace java org.smartloli.kafka.eagle.ipc

service KafkaOffsetServer{
    string query(1:string group,2:string topic,3:i32 partition),
    string getOffset(),
    string sql(1:string sql),
    string getConsumer(),
    string getActiverConsumer()
}

  Here, if we do not want to write an interface to operate on the offsets, we can operate on the consumed offset data through SQL instead, as follows:

  • Add the dependency JAR
<dependency>
    <groupId>org.smartloli</groupId>
    <artifactId>jsql-client</artifactId>
    <version>1.0.0</version>
</dependency>
  • Use the interface
JSqlUtils.query(tabSchema, tableName, dataSets, sql);

  tabSchema: the table structure; tableName: the table name; dataSets: the data set; sql: the SQL statement to run.

4. Preview

  The consumer preview is shown below:

  The consumption relationship graph is shown below:

  The consumption offset details are shown below:

  The consumption and production rate graphs are shown below:

5. Summary

  Note that when offsets are stored in the Kafka topic, the consumer thread ID is not recorded. However, after studying the composition rules of Kafka consumer thread IDs, we can generate the thread ID manually. Its format is: Group + ConsumerLocalAddress + Timespan + UUID(8bit) + PartitionId. Because the consumer runs on another node, we cannot determine the ConsumerLocalAddress for now. Finally, everyone is welcome to use the Kafka cluster monitor [ Kafka Eagle ] and its [ Operation Manual ].

 

Reprinted: http://www.cnblogs.com/smartloli/p/6266453.html
