Kafka producer and consumer API examples

Producer API example

 

A typical producer workflow involves the following steps:

  1. Configure producer parameters and create a producer instance

  2. Construct the message to be sent

  3. Send the message

  4. Close the producer instance

The example below uses the default partitioner: the key of each record is hashed to choose a partition, so messages are spread across the topic's partitions.

 

package com.doitedu;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class KafkaProducerDemo {
    public static void main(String[] args) throws InterruptedException {
        /**
         * 1. Build a Kafka producer client
         * 2. Construct the messages to send, in the format Kafka expects
         * 3. Call the Kafka API to send the messages
         * 4. Close the producer instance
         */
        // 1. Create the producer configuration.
        // KafkaProducer has two generic type parameters, K and V:
        //   K is the type of the record key, V is the type of the record value.
        // This implies that every Kafka message is a key/value pair, but the key is optional --
        // sending only a value is perfectly fine.
        Properties pros = new Properties();
        // Address of the Kafka cluster (bootstrap servers)
        pros.setProperty("bootstrap.servers", "linux01:9092,linux02:9092,linux03:9092");
        // Serializer for the key
        pros.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Serializer for the value
        pros.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Ack mode: 0, 1 or -1 (all); "all" is the slowest but safest acknowledgement policy
        pros.put("acks", "all");
        // Number of retries when a send fails; retries can cause reordering of messages
        pros.setProperty("retries", "3");
        // Batch size for sends, in bytes
        pros.setProperty("batch.size", "10000");
        // Maximum size of a single send request
        pros.setProperty("max.request.size", "102400");
        // Maximum time a record may wait in the buffer for its batch to fill before it is sent to the server
        pros.put("linger.ms", 10000);
        // Total memory available to the producer for buffering; when the buffer is full, data is flushed to the server.
        // buffer.memory must be larger than batch.size, otherwise the producer fails to allocate memory.
        pros.put("buffer.memory", 10240);

        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(pros);
        for (int i = 0; i < 1000; i++) {
            // key=0, value=doit32-->0
            // key=1, value=doit32-->1
            // key=2, value=doit32-->2
            // 2. Construct the message to send, in the format Kafka expects
            ProducerRecord<String, String> record = new ProducerRecord<>("test01", i + "", "doit32-->" + i);
            // 3. Call the Kafka API to send the message
            kafkaProducer.send(record);
            Thread.sleep(100);
        }
        kafkaProducer.flush();
        kafkaProducer.close();
    }
}
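
The send() call above is fire-and-forget. The producer API also accepts a Callback that is invoked once the broker acknowledges the record or the send fails. A minimal sketch, assuming the same kafkaProducer and "test01" topic as in the demo above:

// Hypothetical variation of the send loop: attach a Callback to observe the result of each send
kafkaProducer.send(new ProducerRecord<>("test01", "k1", "doit32-->callback"),
        (metadata, exception) -> {
            if (exception != null) {
                // The send failed after all retries were exhausted
                exception.printStackTrace();
            } else {
                // The broker acknowledged the record; metadata tells us where it landed
                System.out.println("partition=" + metadata.partition() + ", offset=" + metadata.offset());
            }
        });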

 A second way to write the properties configuration uses the constants defined in ProducerConfig, which is less error-prone than hard-coding property names. A simple example:

public static void main(String[] args) {
    Properties pros = new Properties();
    pros.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "linux01:9092,linux02:9092,linux03:9092");
    pros.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    pros.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
}

 

1. Can a Kafka producer keep sending data to a topic continuously?

Yes.

2. Which parameters must a Kafka producer configure?

// Specify the address of the kafka cluster
pros.setProperty("bootstrap.servers", "linux01:9092,linux02:9092,linux03:9092");
// Specify the serialization method of the key
pros.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Specify the serialization method of the value
pros.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

3. When the Kafka producer sends data, can it use the JDK serializer to serialize it?

No. Kafka requires serializers that implement its own interface, org.apache.kafka.common.serialization.Serializer; a minimal sketch follows.
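
As an illustration only (the class name below is hypothetical), a custom serializer just implements that interface:

import org.apache.kafka.common.serialization.Serializer;
import java.nio.charset.StandardCharsets;

// Hypothetical example: writes String values as UTF-8 bytes, equivalent in spirit to the built-in StringSerializer.
// (Older client versions also require configure() and close() overrides; newer versions provide defaults.)
public class MyStringSerializer implements Serializer<String> {
    @Override
    public byte[] serialize(String topic, String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }
}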

4. After constructing a Kafka producer, is it already determined which topic the data will be sent to?

No. The topic is not specified when constructing the producer; it is specified on each ProducerRecord when the message is built, as sketched below.
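
A sketch of the common ProducerRecord constructors (the topic name "test01" is just an example): a record can carry only a value, a key plus a value, or an explicit partition number. With the default partitioner, a keyed record is routed by a hash of its key and an unkeyed record is spread across partitions.

// Value only: the default partitioner spreads unkeyed records across partitions
ProducerRecord<String, String> r1 = new ProducerRecord<>("test01", "hello");
// Key + value: records with the same key always land in the same partition (hash of the key)
ProducerRecord<String, String> r2 = new ProducerRecord<>("test01", "user-1", "hello");
// Explicit partition: bypasses the partitioner entirely and writes to partition 0
ProducerRecord<String, String> r3 = new ProducerRecord<>("test01", 0, "user-1", "hello");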

 Consumer API example

A typical consumption workflow involves the following steps:

  1. Configure consumer client parameters and create a consumer instance;

  2. Subscribe to topics;

  3. Pull messages and consume them;

  4. Periodically commit consumer offsets to the __consumer_offsets topic;

  5. Close the consumer instance

 

package com.doitedu;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.record.TimestampType;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Optional;
import java.util.Properties;

public class ConsumerDemo {
    public static void main(String[] args) {
        // 1. Create the Kafka consumer, along with its configuration
        Properties props = new Properties();
        //props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"linux01:9092,linux02:9092,linux03:9092");
        //props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        //props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Address of the Kafka cluster; not every broker needs to be listed
        props.put("bootstrap.servers", "linux01:9092,linux02:9092,linux03:9092");
        // Consumer group id
        props.put("group.id", "g3");
        // Whether to auto-commit offsets (they go to __consumer_offsets, which has 50 partitions by default)
        props.put("enable.auto.commit", "true");
        // Interval between automatic offset commits
        props.put("auto.commit.interval.ms", "5000");
        // Deserializer class for keys
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Deserializer class for values
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Where to start when there is no committed offset: latest, earliest or none
        props.put("auto.offset.reset","earliest");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // 2. Subscribe to one or more topics
        consumer.subscribe(Arrays.asList("test02"));
        // 3. Start pulling data from the topic
        while (true){
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(Integer.MAX_VALUE));
            for (ConsumerRecord<String, String> record : records) {
                // Topic this record belongs to
                String topic = record.topic();
                // Offset of this record within its partition
                long offset = record.offset();
                // Partition this record came from
                int partition = record.partition();
                // Timestamp of this record (its meaning depends on the timestamp type)
                long timestamp = record.timestamp();
                // Type of the timestamp above: CreateTime (when the record was created)
                // or LogAppendTime (when the record was appended to the log)
                TimestampType timestampType = record.timestampType();
                // Key of this record
                String key = record.key();
                // Value of this record
                String value = record.value();
                // Leader epoch of the partition (may be absent)
                Optional<Integer> leaderEpoch = record.leaderEpoch();
                // Serialized size of the key
                int keySize = record.serializedKeySize();
                // Serialized size of the value
                int valueSize = record.serializedValueSize();
                // Headers attached to this record
                Headers headers = record.headers();
//            for (Header header : headers) {
//                String hKey = header.key();
//                byte[] hValue = header.value();
//                String valueString = new String(hValue);
//                System.out.println("header key = " + hKey + ", header value = " + valueString);
//            }
                System.out.printf("topic = %s, offset = %d, partition = %d, timestampType = %s, timestamp = %d, key = %s, value = %s, leaderEpoch = %d, serializedKeySize = %d, serializedValueSize = %d\r\n",
                        topic, offset, partition, timestampType + "", timestamp, key, value, leaderEpoch.orElse(-1), keySize, valueSize);
            }
        }

        // 4. Close the consumer (unreachable here because of the infinite loop above)
//        consumer.close();
    }
}

 Subscribing to topics: subscribe()

 subscribe has the following overloaded methods:

public void subscribe(Collection<String> topics,ConsumerRebalanceListener listener) 
public void subscribe(Collection<String> topics) 
public void subscribe(Pattern pattern, ConsumerRebalanceListener listener) 
public void subscribe(Pattern pattern)
  1. Subscribe with a collection of topic names

 consumer.subscribe(Arrays.asList(topic1));

2. Subscribe with a regular expression

If the consumer subscribes with a regular expression (subscribe(Pattern)), then any topic created later whose name matches the pattern will also be consumed by this consumer. This subscription style is useful when the application needs to consume many topics and can handle messages of different types.

Example of regular-expression subscription:

 consumer.subscribe(Pattern.compile("topic.*"));

 Subscribing with a regular expression therefore gives dynamic subscription to topics.

 Subscribing to specific partitions: assign()

Besides subscribing to whole topics through KafkaConsumer.subscribe(), a consumer can also subscribe directly to specific partitions of specific topics.

KafkaConsumer provides the assign() method for this purpose. A full example:

 

package com.doitedu;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.record.TimestampType;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Optional;
import java.util.Properties;

public class ConsumerDemo1 {
    public static void main(String[] args) {
        // 1. Create the Kafka consumer, along with its configuration
        Properties props = new Properties();
        props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"linux01:9092,linux02:9092,linux03:9092");
        props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.GROUP_ID_CONFIG,"doit01");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // 2. Subscribe (here we assign a specific partition instead of subscribing to a whole topic)
//        consumer.subscribe(Arrays.asList("test03"));

//        consumer.poll(Duration.ofMillis(Integer.MAX_VALUE));
//        // Manually choose the position to start consuming from.
//        // When subscribe() is used, the group first goes through automatic rebalancing.
//        consumer.seek(new TopicPartition("test03",0),2);
        TopicPartition tp01 = new TopicPartition("test03", 0);

        // assign() subscribes to topics/partitions manually; subscribe() is no longer needed,
        // and no automatic rebalancing happens for manually assigned partitions.
        consumer.assign(Arrays.asList(tp01)); // manually subscribe to a specific partition of a specific topic
        consumer.seek(new TopicPartition("test03", 0), 2); // start consuming from offset 2

        // 3. Start pulling data from the topic
        while (true){
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(Integer.MAX_VALUE));
            for (ConsumerRecord<String, String> record : records) {
                // Topic this record belongs to
                String topic = record.topic();
                // Offset of this record within its partition
                long offset = record.offset();
                // Partition this record came from
                int partition = record.partition();
                // Timestamp of this record (its meaning depends on the timestamp type)
                long timestamp = record.timestamp();
                // Type of the timestamp above: CreateTime (when the record was created)
                // or LogAppendTime (when the record was appended to the log)
                TimestampType timestampType = record.timestampType();
                // Key of this record
                String key = record.key();
                // Value of this record
                String value = record.value();

                // Leader epoch of the partition (may be absent)
                Optional<Integer> leaderEpoch = record.leaderEpoch();
                // Serialized size of the key
                int keySize = record.serializedKeySize();
                // Serialized size of the value
                int valueSize = record.serializedValueSize();
                // Headers attached to this record
                Headers headers = record.headers();
//            for (Header header : headers) {
//                String hKey = header.key();
//                byte[] hValue = header.value();
//                String valueString = new String(hValue);
//                System.out.println("header key = " + hKey + ", header value = " + valueString);
//            }
                System.out.printf("topic = %s, offset = %d, partition = %d, timestampType = %s, timestamp = %d, key = %s, value = %s, leaderEpoch = %d, serializedKeySize = %d, serializedValueSize = %d\r\n",
                        topic, offset, partition, timestampType + "", timestamp, key, value, leaderEpoch.orElse(-1), keySize, valueSize);
            }
        }

        // 4. Close the consumer (unreachable here because of the infinite loop above)
//        consumer.close();
    }
}

 The assign() method takes a single parameter, partitions, which specifies the set of partitions to subscribe to. For example:

consumer.assign(Arrays.asList(new TopicPartition("tpc_1", 0), new TopicPartition("tpc_2", 1)));

 The difference between subscribe and assign

 

  • Subscribing to topics through subscribe() enables automatic consumer rebalancing;

With multiple consumers, partitions are distributed among them according to the configured partition assignment strategy. When consumers join or leave the group, the assignment is adjusted automatically, providing load balancing and automatic failover.

  • Subscribing to partitions through assign() does not provide automatic rebalancing;

This is also visible in the method signatures: both forms of subscribe() have overloads that take a ConsumerRebalanceListener, while assign() does not. A sketch of such a listener follows.
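
A minimal sketch of subscribe() with a ConsumerRebalanceListener that just logs the partitions being revoked and assigned; the topic name "test02", the consumer variable and the imports are assumed to match the earlier example:

consumer.subscribe(Arrays.asList("test02"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Called before a rebalance takes partitions away from this consumer;
        // a good place to commit offsets for the partitions being revoked.
        System.out.println("revoked: " + partitions);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Called after a rebalance hands partitions to this consumer;
        // a good place to seek to externally stored offsets.
        System.out.println("assigned: " + partitions);
    }
});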

 Message consumption patterns

Consumption in Kafka is based on the pull model.

 There are generally two message-consumption models: push and pull. In the push model the server actively pushes messages to the consumer; in the pull model the consumer actively requests (pulls) messages from the server.

 

public class ConsumerRecord<K, V> {
    public static final long NO_TIMESTAMP = RecordBatch.NO_TIMESTAMP;
    public static final int NULL_SIZE = -1;
    public static final int NULL_CHECKSUM = -1;

    private final String topic;
    private final int partition;
    private final long offset;
    private final long timestamp;
    private final TimestampType timestampType;
    private final int serializedKeySize;
    private final int serializedValueSize;
    private final Headers headers;
    private final K key;
    private final V value;

    private volatile Long checksum;
    // ... constructors and accessors omitted
}
  • topic and partition give the name of the topic the message belongs to and the number of the partition it is in.

  • offset is the offset of the message within its partition.

  • timestamp is the record's timestamp, and timestampType indicates what kind of timestamp it is.

  • timestampType has two values, CreateTime and LogAppendTime, meaning the time the message was created and the time it was appended to the log, respectively.

  • headers holds the header content of the message.

  • key and value are the message key and the message value; business applications usually only need the value.

  • serializedKeySize and serializedValueSize are the sizes of the key and the value after serialization; if the key is null, serializedKeySize is -1, and likewise for the value and serializedValueSize.

  • checksum is the CRC32 check value.

Sample code snippet 

/**
 * Subscription and consumption, approach 2: assign specific partitions
 */
TopicPartition tp1 = new TopicPartition("x", 0);
TopicPartition tp2 = new TopicPartition("y", 0);
TopicPartition tp3 = new TopicPartition("z", 0);
List<TopicPartition> tps = Arrays.asList(tp1, tp2, tp3);
consumer.assign(tps);

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (TopicPartition tp : tps) {
        List<ConsumerRecord<String, String>> rList = records.records(tp);
        for (ConsumerRecord<String, String> r : rList) {
            r.topic();
            r.partition();
            r.offset();
            r.value();
            //do something to process record.
        }
    }
}

 Consuming from a specified offset (seek)

Sometimes we need finer-grained control and want to start pulling messages from a specific offset. The seek() method of KafkaConsumer provides exactly this, letting us move the consuming position forwards or backwards.

The specific definition of the seek() method is as follows:

// seek() is used together with assign() to specify the consuming position
public void seek(TopicPartition partition, long offset)

 Code example:

public class ConsumerDemo3指定偏移量消费 {
    public static void main(String[] args) {

        Properties props = new Properties();
        props.setProperty(ConsumerConfig.GROUP_ID_CONFIG,"g002");
        props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"doit01:9092");
        props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"latest");
        // Whether to auto-commit consumer offsets
        props.setProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,"true");

        // Upper bound on the amount of data fetched in a single poll
        props.setProperty(ConsumerConfig.FETCH_MAX_BYTES_CONFIG,"10240000");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // Subscribe to two partitions of doit27-1 using assign()
        TopicPartition tp0 = new TopicPartition("doit27-1", 0);
        TopicPartition tp1 = new TopicPartition("doit27-1", 1);

        consumer.assign(Arrays.asList(tp0,tp1));
        // Start partition 0 from offset 200 and partition 1 from offset 250
        consumer.seek(tp0,200);
        consumer.seek(tp1,250);

        // Start pulling messages
        while(true){
            ConsumerRecords<String, String> poll = consumer.poll(Duration.ofMillis(3000));
            for (ConsumerRecord<String, String> rec : poll) {
                System.out.println(rec.partition()+","+rec.key()+","+rec.value()+","+rec.offset());
            }
        }
    }
}

 Automatically commit consumer offsets

 

The default offset-commit mode in Kafka is automatic commit, controlled by the consumer client parameter enable.auto.commit, whose default value is true. Automatic commit does not happen after every single message; it happens periodically, at an interval configured by the client parameter auto.commit.interval.ms (default 5 seconds). That interval only takes effect when enable.auto.commit is true.

In the default mode, the consumer commits, every 5 seconds, the largest offset it has pulled in each partition. The automatic commit is performed inside the poll() logic: before each actual fetch request is sent to the server, the consumer checks whether a commit is due and, if so, commits the offsets of the previous poll.

Offset committing is one of the trickier parts of programming Kafka consumers. Automatic committing is very simple and removes the complex commit logic from the application code, but it brings the risk of duplicate consumption and message loss.

  • Duplicate consumption

Suppose the consumer has just committed its offsets and then pulls a new batch of messages. If it crashes before the next automatic commit, consumption restarts from the last committed offset after recovery, so the messages pulled since that commit are consumed again: duplicate consumption occurs (the same happens during a rebalance). Shortening the commit interval shrinks the window of duplicated messages, but it cannot eliminate duplicates and it makes commits more frequent.

 

 

  • Message loss

Intuitively, automatic commit is delayed commit, so duplicate consumption is easy to understand; under what circumstances can messages be lost? Consider the following scenario:

A pulling thread continuously fetches messages and stores them in a local cache, such as a BlockingQueue, while a separate processing thread reads messages from the cache and does the actual processing. Suppose pull y+1 is in progress and offset commit m has just completed, i.e. everything before offset x+6 has been committed, but the processing thread is still working on message x+3. If the processing thread now fails, it resumes, after recovery, from the committed position x+6, so the messages from x+3 up to x+6 are never processed: message loss occurs. A sketch of this pattern follows.
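
A minimal sketch of that pattern, assuming auto-commit is enabled and using a hypothetical BlockingQueue handoff; it illustrates why offsets can be committed before processing finishes:

BlockingQueue<ConsumerRecord<String, String>> cache = new LinkedBlockingQueue<>();

// Pulling thread: polling also triggers the periodic automatic offset commit,
// so offsets may be committed for records that are still waiting in the cache.
new Thread(() -> {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> r : records) {
            cache.offer(r);
        }
    }
}).start();

// Processing thread: if it crashes here, records that were already committed
// but not yet processed are lost from the application's point of view.
new Thread(() -> {
    while (true) {
        try {
            ConsumerRecord<String, String> r = cache.take();
            // do something to process record.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
    }
}).start();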

 

  

 Manually commit consumer offsets (using the Kafka API)

 

Automatic offset commit does not cause message loss or duplicate consumption under normal circumstances, but failures are unavoidable, and automatic commit cannot provide precise offset management. Kafka therefore also offers manual offset commit, which gives developers more flexible control over when offsets are committed.

In many cases consumption is not finished once the message has been pulled; the message may still need to be written to a database, stored in a local cache, or go through more complex business processing. In such scenarios a message should only be considered successfully consumed after all of that processing has completed.

Manual commit lets the application commit offsets at whatever point its logic deems appropriate. The prerequisite is setting the consumer client parameter enable.auto.commit to false, for example:

props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

 

Manual commit comes in two flavours, synchronous and asynchronous, corresponding to the commitSync() and commitAsync() methods of KafkaConsumer.

  • Synchronous commit

The simplest use of commitSync() is the parameterless form:

 

/**
 * Manually commit offsets (synchronous)
 */
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> r : records) {
        //do something to process record.
    }
    consumer.commitSync();
}

 With the parameterless commitSync(), offsets are committed at the same granularity as batches are pulled and processed. For finer-grained, more precise commits, use the overload of commitSync() that takes a parameter, defined as follows:

public void commitSync(final Map<TopicPartition,OffsetAndMetadata> offsets)

 The sample code is as follows:

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> r : records) {
        long offset = r.offset();
        //do something to process record.

        TopicPartition topicPartition = new TopicPartition(r.topic(), r.partition());
        consumer.commitSync(Collections.singletonMap(topicPartition,new OffsetAndMetadata(offset+1)));
    }
}

 Committed offset = offset of the consumed record + 1,

because the offset recorded in __consumer_offsets is the position from which the consumer will read next. A per-partition variant of this commit pattern is sketched below.
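
Committing after every record is expensive. A common middle ground, sketched here under the same assumptions and imports as the examples above, is to commit once per partition after processing that partition's batch:

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (TopicPartition tp : records.partitions()) {
        List<ConsumerRecord<String, String>> partitionRecords = records.records(tp);
        for (ConsumerRecord<String, String> r : partitionRecords) {
            // do something to process record.
        }
        // Commit the offset of the last processed record + 1, for this partition only
        long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
        consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(lastOffset + 1)));
    }
}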

  • Asynchronous commit

The asynchronous commit method commitAsync() does not block the consumer thread; the next poll may start before the result of the previous commit has come back, which can improve consumer throughput to some extent. commitAsync() has several overloads; the example below uses the one that takes an offset map and a callback.

 

 

/**
 * Asynchronously commit offsets
 */
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> r : records) {
        long offset = r.offset();

        //do something to process record.
        TopicPartition topicPartition = new TopicPartition(r.topic(), r.partition());
        consumer.commitAsync(Collections.singletonMap(topicPartition, new OffsetAndMetadata(offset + 1)), new OffsetCommitCallback() {
            @Override
            public void onComplete(Map<TopicPartition, OffsetAndMetadata> map, Exception e) {
                if (e == null) {
                    System.out.println(map);
                } else {
                    System.out.println("error commit offset");
                }
            }
        });
    }
}
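
Because an asynchronous commit can fail without the application noticing, a common pattern (sketched here as an assumption, not taken from the original text) is to use commitAsync() inside the loop and a final blocking commitSync() when the consumer shuts down:

try {
    while (running) {   // 'running' is a hypothetical flag toggled on shutdown
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> r : records) {
            // do something to process record.
        }
        // Fast, non-blocking commit on the hot path
        consumer.commitAsync();
    }
} finally {
    try {
        // One last blocking commit so no processed offsets are left uncommitted
        consumer.commitSync();
    } finally {
        consumer.close();
    }
}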

 Manually committing offsets: choice of timing

 

  • Commit offsets before data processing is complete

Some records may never be processed (data loss). On the other hand, this approach gives at-most-once processing (delivery) semantics.

  • Commit offsets after data processing is complete

Some records may be processed more than once (duplicated data). On the other hand, this approach gives at-least-once processing (delivery) semantics. The ideal semantics is of course exactly-once, which Kafka can also achieve through its transaction mechanism; see the sketch after this list.
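
A minimal consume-transform-produce sketch of that transaction mechanism. The topic names "in" and "out", the transactional id, and the producerProps/consumerProps objects are assumptions for illustration; the group id passed to sendOffsetsToTransaction must match the consumer's group.

producerProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-tx-id");   // hypothetical id
consumerProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
consumer.subscribe(Arrays.asList("in"));
producer.initTransactions();

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    if (records.isEmpty()) continue;
    producer.beginTransaction();
    try {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, String> r : records) {
            producer.send(new ProducerRecord<>("out", r.key(), r.value()));
            offsets.put(new TopicPartition(r.topic(), r.partition()), new OffsetAndMetadata(r.offset() + 1));
        }
        // The consumer offsets are committed inside the same transaction as the produced records
        producer.sendOffsetsToTransaction(offsets, "group001");
        producer.commitTransaction();
    } catch (Exception e) {
        // On failure, neither the output records nor the offsets become visible
        producer.abortTransaction();
    }
}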

Code example (manually committing offsets to an external store, MySQL):

package com.doitedu;

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.sql.*;
import java.time.Duration;
import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

public class CommitOffsetByMyself {
    public static void main(String[] args) throws SQLException {

        // Get a MySQL connection
        Connection connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/football", "root", "123456");
        connection.setAutoCommit(false);
        PreparedStatement pps = connection.prepareStatement("insert into user values (?,?,?)");
        PreparedStatement pps_offset = connection.prepareStatement("insert into offset values (?,?) on duplicate key update offset = ?");

        Properties props = new Properties();
        props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "linux01:9092,linux02:9092,linux03:9092");
        props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Enable manual offset commit by turning off auto-commit
        props.setProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        // Where to start when there is no stored offset
//        props.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        // Consumer group id
        props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "group001");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        // Subscribe to the topic, with a rebalance listener that restores offsets from MySQL
        consumer.subscribe(Arrays.asList("kafka2mysql"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> collection) {

            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> collection) {
                for (TopicPartition topicPartition : collection) {
                    try {
                        PreparedStatement get_offset = connection.prepareStatement("select offset from offset where topic_partition = ?");
                        String topic = topicPartition.topic();
                        int partition = topicPartition.partition();
                        get_offset.setString(1, topic + "_" + partition);
                        ResultSet resultSet = get_offset.executeQuery();
                        if (resultSet.next()){
                            int offset = resultSet.getInt(1);
                            System.out.println("发生了再均衡,被分配了分区消费权,并且查到了目标分区的偏移量"+partition+" , "+offset);
                            //拿到了offset后就可以定位消费了
                            consumer.seek(new TopicPartition(topic, partition), offset);
                        }
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }
            }
        });

        // Pull data and write it to MySQL
        while (true) {
            try {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(Integer.MAX_VALUE));
                for (ConsumerRecord<String, String> record : records) {
                    String data = record.value();
                    String[] arr = data.split(",");
                    String id = arr[0];
                    String name = arr[1];
                    String age = arr[2];

                    pps.setInt(1, Integer.parseInt(id));
                    pps.setString(2, name);
                    pps.setInt(3, Integer.parseInt(age));
                    pps.execute();

                    // Plant an exception here to verify that the rollback really works
//                    if (Integer.parseInt(id) == 5) {
//                        throw new SQLException();
//                    }

                    long offset = record.offset();
                    int partition = record.partition();
                    String topic = record.topic();
                    pps_offset.setString(1, topic + "_" + partition);
                    pps_offset.setInt(2, (int) offset + 1);
                    pps_offset.setInt(3, (int) offset + 1);
                    pps_offset.execute();
                    // Commit the JDBC transaction (data and offset are stored atomically)
                    connection.commit();
                }
            } catch (Exception e) {
                connection.rollback();
            }
        }
    }
}

 Summary of how consumers commit offsets

A consumer can commit its consumption offsets in three ways:

  • Fully automatic

    • enable.auto.commit = true

    • Offsets are committed periodically to __consumer_offsets

  • Semi-automatic

    • enable.auto.commit = false;

    • The application triggers the commit manually, e.g. consumer.commitSync();

    • Offsets are still committed to __consumer_offsets

  • Fully manual

    • enable.auto.commit = false;

    • The application saves the consumption offsets itself, to a store of its own choosing (MySQL / ZooKeeper / Redis / ...)

    • Offsets are committed to that store; on initialization, the application must read the offsets back from the custom store and seek to them.

 

 
