Analysis of Kafka's auto.offset.reset values

While using Kafka today, I found that setting auto.offset.reset to earliest, latest, or none did not behave the way I expected. The documented meanings are:

  • earliest: when a partition has a committed offset, consumption resumes from that offset; when there is no committed offset, consumption starts from the beginning of the partition.
  • latest: when a partition has a committed offset, consumption resumes from that offset; when there is no committed offset, only data newly produced to the partition is consumed.
  • none: when every partition of the topic has a committed offset, consumption resumes from those offsets; if any partition lacks a committed offset, an exception is thrown.

Simulation 1:

Method: offsets for the configured topic have already been committed by this group in a previous run; auto-commit is then disabled (enable.auto.commit = false) and no offsets are committed manually; meanwhile, the producer keeps producing new data.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.springframework.context.annotation.Bean;

    @Bean
    public KafkaConsumer<String, String> kafkaConsumer() {
        Map<String, Object> config = new HashMap<>();
        config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        config.put(ConsumerConfig.GROUP_ID_CONFIG, "testGroup");
        // Offsets for this group are already committed; switch between the three policies here.
        config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        // config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");
        // Auto-commit is disabled and no manual commit is made, so committed offsets never move.
        config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return new KafkaConsumer<>(config);
    }
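For reference, a minimal sketch of the poll loop used to drive this simulation, assuming the consumer bean above and a hypothetical topic named testTopic:

    import java.time.Duration;
    import java.util.Collections;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class OffsetResetSimulation {
        // Consume without ever committing: enable.auto.commit is false in the bean above and
        // we never call commitSync()/commitAsync(), so the group's committed offsets never move.
        static void consume(KafkaConsumer<String, String> consumer) {
            consumer.subscribe(Collections.singletonList("testTopic")); // topic name is assumed
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }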

earliest: reads all newly produced data, and the same data is consumed again after the service restarts (no new offsets are ever committed).

latest: reads all newly produced data, and the same data is consumed again after the service restarts.

none: reads all newly produced data, and the same data is consumed again after the service restarts.

Conclusion: when every partition already has a committed offset, all three settings resume consumption from that committed offset; the reset policy is never consulted.
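The repetition after a restart happens only because no offsets are ever committed in this simulation. As a side note (a sketch, not part of the original test), committing manually after processing would make each run continue where the previous one stopped:

    import java.time.Duration;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ManualCommitSketch {
        // With enable.auto.commit=false, an explicit commit persists the group's position,
        // so a restarted service does not re-read the same records.
        static void pollAndCommit(KafkaConsumer<String, String> consumer) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println("processing value=" + record.value()); // stand-in for real processing
            }
            if (!records.isEmpty()) {
                consumer.commitSync(); // commits the offsets returned by the last poll()
            }
        }
    }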

Simulation 2:

We need to consume all the data accumulated on Kafka (data is retained only for the broker's configured retention period, 48 hours in this setup). How do we consume it?

When configuring the consumer, we switch to a new groupId. The new group has no committed offsets, and we restart the program.

    @Bean
    public KafkaConsumer<String, String> kafkaConsumer() {
        Map<String, Object> config = new HashMap<>();
        config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        // New consumer group with no committed offsets -- the only change from Simulation 1.
        config.put(ConsumerConfig.GROUP_ID_CONFIG, "testGroup-new");
        config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        // config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");
        config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return new KafkaConsumer<>(config);
    }

earliest: reads all of the historical data, and consumption is repeated after the service restarts (no offsets are committed).

latest: reads only data newly produced after the consumer starts and does not consume the earlier data.

none: after the restart, the program throws an error (NoOffsetForPartitionException), because the new group has no committed offsets.

Conclusion: for a brand-new group, earliest consumes every partition from the beginning, latest consumes only newly produced data, and none throws an exception because no previous offset exists for the consumer group.
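For completeness, a sketch of what the none failure looks like in code; the exception is org.apache.kafka.clients.consumer.NoOffsetForPartitionException, raised on the first poll() because the new group has no committed offsets:

    import java.time.Duration;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.NoOffsetForPartitionException;

    public class NonePolicySketch {
        // With auto.offset.reset=none and no committed offsets for the group,
        // poll() fails instead of silently choosing a starting position.
        static void pollOnce(KafkaConsumer<String, String> consumer) {
            try {
                consumer.poll(Duration.ofMillis(1000));
            } catch (NoOffsetForPartitionException e) {
                // Decide explicitly where to start (seek to a known offset) or fail fast.
                System.err.println("No committed offset for partitions: " + e.partitions());
            }
        }
    }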

Summary:

1. If a committed offset exists, both earliest and latest resume consumption from that committed offset. If no committed offset exists, earliest consumes from the beginning of the partition, while latest consumes only from the latest position, i.e. only newly produced data. With none, if every partition of the topic has a committed offset, consumption resumes from those offsets; if any partition lacks a committed offset, an exception is thrown.

2. If one group has already consumed data from the topic, a new consumer created under a different group ID is unaffected by the offsets committed by the earlier group.

3. If the group.id is left unchanged and you merely add config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");, the consumer will not start from the very beginning: as long as the group already has committed offsets, it simply continues from where the last run stopped. Likewise, with an unchanged group.id, latest does not jump to only the newest data; consumption again continues from the last committed offset.
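As a side note not covered by the tests above: if you really do want to re-read a topic from the start without switching group.id, one option (sketched here under the assumed topic name testTopic) is to seek explicitly once partitions are assigned:

    import java.time.Duration;
    import java.util.Collection;
    import java.util.Collections;

    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class SeekToBeginningSketch {
        // Rewind every assigned partition to its earliest available offset, regardless of
        // what the group has committed; committed offsets stay untouched until the next commit.
        static void subscribeFromStart(KafkaConsumer<String, String> consumer) {
            consumer.subscribe(Collections.singletonList("testTopic"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    consumer.seekToBeginning(partitions);
                }
            });
            consumer.poll(Duration.ofMillis(1000)); // first poll triggers assignment and the seek
        }
    }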

Origin: blog.csdn.net/Romantic_lei/article/details/126597740