Kafka Consumer API Operations Explained in Detail

1. Preparation

  • Create a Maven project in your IDE and add the following dependency to the pom file
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>1.1.1</version>
</dependency>
  • Start the ZooKeeper cluster
bin/zkServer.sh start
  • Start the Kafka cluster
bin/kafka-server-start.sh -daemon config/server.properties
  • Open a console producer on the Kafka cluster (the code below subscribes to the bigdata topic, so produce to that topic here)
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic bigdata
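  • If the bigdata topic does not exist yet, it can be created first. A sketch for Kafka 1.x, assuming ZooKeeper at localhost:2181; adjust the partition and replication counts to your cluster
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic bigdata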

2. Creating a basic consumer (auto-committing offsets)

Reliability on the consumer side is relatively easy to ensure: the data is persisted in Kafka, so there is no need to worry about losing it.

However, a consumer may fail mid-consumption (power outage, crash, etc.), and after recovering it needs to resume from where it left off. The consumer therefore has to record, in real time, the offset up to which it has consumed, so that it can continue from that position after recovery. Maintaining the offset is thus a problem every consumer must consider.

The code below only tracks the latest offset and consumes from there onward, which is equivalent to omitting --from-beginning on the command line: data produced earlier is not consumed.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MyConsumer {

    public static void main(String[] args) {

        Properties properties = new Properties();

        /* Kafka cluster to connect to */
        properties.put("bootstrap.servers", "centos7-1:9092");

        /* Enable auto-commit */
        properties.put("enable.auto.commit", "true");

        /* Interval between automatic offset commits */
        properties.put("auto.commit.interval.ms", "1000");

        /* Key deserializer */
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        /* Value deserializer */
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        /* Consumer group */
        properties.put("group.id", "mygroup");

        /* Create the consumer */
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);

        /* Topics to subscribe to; several can be subscribed at once */
        consumer.subscribe(Collections.singletonList("bigdata"));

        while (true) {

            /* Fetch data; the argument is the maximum time to block waiting for records */
            ConsumerRecords<String, String> consumerRecords = consumer.poll(100);

            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {

                /* Print the key and value of each message */
                System.out.println(consumerRecord.key() + "==>" + consumerRecord.value());
            }
        }
    }
}
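Incidentally, the string keys above can also be written with the constants defined on ConsumerConfig, which guards against typos in property names. A small equivalent sketch (it additionally requires importing org.apache.kafka.clients.consumer.ConsumerConfig and org.apache.kafka.common.serialization.StringDeserializer):

/* Same configuration, expressed with ConsumerConfig constants instead of raw strings */
properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "centos7-1:9092");
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
properties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.put(ConsumerConfig.GROUP_ID_CONFIG, "mygroup");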


3. How to re-consume a topic's data (auto-committing offsets)

You can switch to a new consumer group and reset the offset:

/* A new consumer group */
properties.put("group.id", "mygroup1");

/* Reset the consumer's offset */
properties.put("auto.offset.reset", "earliest");

As the source below shows, auto.offset.reset takes effect and resets the offset when either of two conditions holds: the consumer group has no initial offset in Kafka (for example, because you switched to a new group), or the saved offset is no longer valid on the server (the data it points to has been deleted). There are two commonly used values: latest, the default, and earliest, which resets to the earliest available offset.

    public static final String AUTO_OFFSET_RESET_CONFIG = "auto.offset.reset";
    public static final String AUTO_OFFSET_RESET_DOC = "What to do when there is no"
            + " initial offset in Kafka or if the current offset does not exist any more on the"
            + " server (e.g. because that data has been deleted): <ul><li>earliest: automatically"
            + " reset the offset to the earliest offset<li>latest: automatically reset the offset to"
            + " the latest offset</li><li>none: throw exception to the consumer if no previous offset"
            + " is found for the consumer's group</li><li>anything else: throw exception to the"
            + " consumer.</li></ul>";

The detailed code is as follows:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MyConsumer {

    public static void main(String[] args) {

        Properties properties = new Properties();

        /* Kafka cluster to connect to */
        properties.put("bootstrap.servers", "centos7-1:9092");

        /* Enable auto-commit */
        properties.put("enable.auto.commit", "true");

        /* Interval between automatic offset commits */
        properties.put("auto.commit.interval.ms", "1000");

        /* Key deserializer */
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        /* Value deserializer */
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        /* A new consumer group, so no committed offset exists for it yet */
        properties.put("group.id", "mygroup1");

        /* Reset the consumer's offset */
        properties.put("auto.offset.reset", "earliest");

        /* Create the consumer */
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);

        /* Topics to subscribe to; several can be subscribed at once */
        consumer.subscribe(Collections.singletonList("bigdata"));

        while (true) {

            /* Fetch data; the argument is the maximum time to block waiting for records */
            ConsumerRecords<String, String> consumerRecords = consumer.poll(100);

            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {

                /* Print the key and value of each message */
                System.out.println(consumerRecord.key() + "==>" + consumerRecord.value());
            }
        }
    }
}
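Alternatively, since Kafka 0.11 the kafka-consumer-groups.sh tool can rewind an existing group's committed offsets, so you do not have to switch groups. A sketch, assuming the group is inactive (all of its consumers stopped) while the reset runs:

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group mygroup --topic bigdata --reset-offsets --to-earliest --execute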

4. Manually committing the offset

  • Although auto-committing offsets is very convenient, commits are driven by a timer, so it is hard for developers to control exactly when an offset gets committed. Kafka therefore also provides an API for committing offsets manually.

  • There are two ways to commit an offset manually: commitSync (synchronous commit) and commitAsync (asynchronous commit).

    What they have in common is that both commit the highest offset of the batch returned by the most recent poll.

    The difference is that commitSync blocks the current thread until the commit succeeds and retries automatically on failure (commits can still fail for reasons outside the client's control), whereas commitAsync has no retry mechanism, so a commit may be lost.

  • Whether commits are synchronous or asynchronous, missed or repeated consumption is still possible: committing the offset before processing the data can cause messages to be missed, while committing after processing can cause messages to be consumed twice.

Synchronous commit:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;

public class MyConsumer {

    public static void main(String[] args) {

        Properties properties = new Properties();

        /* Kafka cluster to connect to */
        properties.put("bootstrap.servers", "centos7-1:9092");

        /* Disable auto-commit so offsets are committed manually */
        properties.put("enable.auto.commit", "false");

        /* Key deserializer */
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        /* Value deserializer */
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        /* Consumer group */
        properties.put("group.id", "mygroup");

        /* Create the consumer */
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);

        /* Topics to subscribe to; several can be subscribed at once */
        consumer.subscribe(Collections.singletonList("bigdata"));

        while (true) {

            /* Fetch data; the argument is the maximum time to block waiting for records */
            ConsumerRecords<String, String> consumerRecords = consumer.poll(100);

            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {

                /* Print the key and value of each message */
                System.out.println(consumerRecord.key() + "==>" + consumerRecord.value());
            }

            /* Synchronous commit: blocks the current thread until the offsets are committed */
            consumer.commitSync();
        }
    }
}
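For finer-grained control, commitSync also accepts an explicit map of partition to offset, so you can commit one partition at a time right after processing it, shrinking the window for repeated consumption. A sketch of the polling loop under this approach, assuming the same consumer setup as above (it additionally imports java.util.List, org.apache.kafka.clients.consumer.OffsetAndMetadata and org.apache.kafka.common.TopicPartition):

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);

    /* Process and commit one partition at a time */
    for (TopicPartition partition : records.partitions()) {
        List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
        for (ConsumerRecord<String, String> record : partitionRecords) {
            System.out.println(record.key() + "==>" + record.value());
        }

        /* The committed offset is the offset of the NEXT message to read, hence the + 1 */
        long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
        consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
    }
}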

Asynchronous commit:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

public class MyConsumer {

    public static void main(String[] args) {

        Properties properties = new Properties();

        /* Kafka cluster to connect to */
        properties.put("bootstrap.servers", "centos7-1:9092");

        /* Disable auto-commit so offsets are committed manually */
        properties.put("enable.auto.commit", "false");

        /* Key deserializer */
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        /* Value deserializer */
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        /* Consumer group */
        properties.put("group.id", "mygroup");

        /* Create the consumer */
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);

        /* Topics to subscribe to; several can be subscribed at once */
        consumer.subscribe(Collections.singletonList("bigdata"));

        while (true) {

            /* Fetch data; the argument is the maximum time to block waiting for records */
            ConsumerRecords<String, String> consumerRecords = consumer.poll(100);

            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {

                /* Print the key and value of each message */
                System.out.println(consumerRecord.key() + "==>" + consumerRecord.value());
            }

            /* Asynchronous commit: returns immediately, result is reported via the callback */
            consumer.commitAsync(new OffsetCommitCallback() {

                @Override
                public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
                    if (exception != null) {
                        System.err.println("Commit failed: " + offsets);
                    } else {
                        System.err.println("Commit succeeded: " + offsets);
                    }
                }
            });
        }
    }
}
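A common pattern is to combine the two: use non-blocking commitAsync calls in the normal path and fall back to a blocking, retrying commitSync on shutdown, so the final offsets are not lost. A sketch of the loop, where running is an assumed volatile flag flipped by a shutdown hook:

try {
    while (running) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record.key() + "==>" + record.value());
        }
        /* Fast, non-blocking commit in the normal path */
        consumer.commitAsync();
    }
} finally {
    try {
        /* Blocking commit with retries, so the last offsets are committed reliably */
        consumer.commitSync();
    } finally {
        consumer.close();
    }
}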

5. Custom offset storage

  • Before Kafka 0.9, offsets were stored in ZooKeeper; since 0.9 they are stored by default in a built-in Kafka topic (__consumer_offsets). Kafka also lets you store offsets in a system of your own choosing. Maintaining offsets yourself is fairly cumbersome, because consumer rebalances must be taken into account.

  • When a new consumer joins the group, an existing consumer leaves the group, or the partitions of a subscribed topic change, the partitions are redistributed among the consumers. This redistribution is called a rebalance.

  • After a rebalance, each consumer's set of partitions changes. Every consumer must therefore first find out which partitions it has been reassigned, then locate the most recently committed offset of each partition and resume consumption from there.

  • Implementing custom offset storage requires a ConsumerRebalanceListener; the methods that commit and fetch offsets must be implemented against whichever storage system you choose.

import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

public class MyConsumer {

    private static Map<TopicPartition, Long> currentOffset = new HashMap<>();

    public static void main(String[] args) {

        Properties properties = new Properties();

        /* Kafka cluster to connect to */
        properties.put("bootstrap.servers", "centos7-1:9092");

        /* Disable auto-commit so offsets are managed manually */
        properties.put("enable.auto.commit", "false");

        /* Key deserializer */
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        /* Value deserializer */
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        /* Consumer group */
        properties.put("group.id", "mygroup");

        /* Create the consumer */
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);

        /* Subscribe with a rebalance listener */
        consumer.subscribe(Arrays.asList("bigdata"), new ConsumerRebalanceListener() {

            /* Called before a rebalance takes effect */
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                /* Commit the offsets of all currently owned partitions */
                commitOffset(currentOffset);
            }

            /* Called after a rebalance completes */
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                currentOffset.clear();
                for (TopicPartition partition : partitions) {
                    /* Seek to the most recently committed offset and resume from there */
                    consumer.seek(partition, getOffset(partition));
                }
            }
        });

        while (true) {

            /* Fetch data; the argument is the maximum time to block waiting for records */
            ConsumerRecords<String, String> consumerRecords = consumer.poll(100);

            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {

                /* Print the key and value of each message */
                System.out.println(consumerRecord.key() + "==>" + consumerRecord.value());

                /* Record the next offset to read (current offset + 1), so that seek() does
                   not re-read the last processed message after a restart */
                currentOffset.put(new TopicPartition(consumerRecord.topic(), consumerRecord.partition()),
                        consumerRecord.offset() + 1);
            }

            /* Commit the offsets of all partitions to the custom store */
            commitOffset(currentOffset);
        }
    }

    /* Fetch the latest committed offset of a partition from the custom store
       (placeholder: always starts from 0) */
    private static long getOffset(TopicPartition partition) {
        return 0;
    }

    /* Commit the offsets of all of this consumer's partitions to the custom store.
       Implement according to your business logic; for example, commit to MySQL via JDBC,
       with consumer group, topic and partition fields. */
    private static void commitOffset(Map<TopicPartition, Long> currentOffset) {
    }
}
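As the comment in commitOffset suggests, MySQL accessed over JDBC is one natural choice for the custom store. Below is a rough sketch of what the two methods could look like; the connection URL, credentials, and the consumer_offsets table (columns consumer_group, topic, partition, offset, with a unique key over the first three) are all assumptions for illustration, not a fixed schema. It requires importing java.sql.Connection, DriverManager, PreparedStatement, ResultSet and SQLException, plus the MySQL JDBC driver on the classpath.

/* Hypothetical JDBC-backed offset store; assumes a MySQL table
   consumer_offsets(consumer_group, topic, partition, offset)
   with a unique key on (consumer_group, topic, partition). */
private static final String JDBC_URL = "jdbc:mysql://localhost:3306/kafka_meta"; // assumed

private static long getOffset(TopicPartition partition) {
    String sql = "SELECT `offset` FROM consumer_offsets"
            + " WHERE consumer_group = ? AND topic = ? AND `partition` = ?";
    try (Connection conn = DriverManager.getConnection(JDBC_URL, "user", "password");
         PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setString(1, "mygroup");
        ps.setString(2, partition.topic());
        ps.setInt(3, partition.partition());
        try (ResultSet rs = ps.executeQuery()) {
            /* Start from the beginning if no offset has been stored yet */
            return rs.next() ? rs.getLong(1) : 0L;
        }
    } catch (SQLException e) {
        throw new RuntimeException("Failed to fetch offset", e);
    }
}

private static void commitOffset(Map<TopicPartition, Long> currentOffset) {
    String sql = "INSERT INTO consumer_offsets (consumer_group, topic, `partition`, `offset`)"
            + " VALUES (?, ?, ?, ?) ON DUPLICATE KEY UPDATE `offset` = VALUES(`offset`)";
    try (Connection conn = DriverManager.getConnection(JDBC_URL, "user", "password");
         PreparedStatement ps = conn.prepareStatement(sql)) {
        for (Map.Entry<TopicPartition, Long> entry : currentOffset.entrySet()) {
            ps.setString(1, "mygroup");
            ps.setString(2, entry.getKey().topic());
            ps.setInt(3, entry.getKey().partition());
            ps.setLong(4, entry.getValue());
            ps.addBatch();
        }
        ps.executeBatch();
    } catch (SQLException e) {
        throw new RuntimeException("Failed to commit offsets", e);
    }
}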


Origin blog.csdn.net/weixin_46122692/article/details/109280040