Kafka 0.10 client usage examples


I. Preface

  1. Before reading this article, please read the Kafka feature introduction first.
  2. My understanding is limited, so please consult the official Kafka documentation for special cases. Corrections to this article are welcome.
  3. This article will be updated continuously; stay tuned.

II. Topic and group naming conventions

  1. To make topic and group names more meaningful, we agree on the following rules.
  2. A topic name only reflects the producer. Format: environment_producer_businessMeaning. Example: prod_sync_order_snapshot.
  3. Each group represents an independent consumer. Although the client lets one group consume multiple topics, to keep business meanings apart we create an independent group for every topic. Format: environment_group_consumer_businessMeaning. Example: prod_group_mind_order_snapshot.
  4. To keep the system's consumers isolated from one another and to simplify future monitoring, please follow these naming rules.

III. Questions to think about

  1. How does the producer choose which partition to send to, and what push model does it use?
  2. Is the offset global across all partitions, or per partition?
  3. How does a consumer choose which partitions to fetch from?
  4. How do we build an ordered stream of business messages?
  5. These are the questions we should understand before implementing business requirements.

IV. The new client since 0.10

1. Producer

    public static Properties getProducerProperties() {
        // create instance for properties to access producer configs
        Properties props = new Properties();
        //Assign localhost id
        // props.put("bootstrap.servers", "172.16.1.248:9092,172.16.1.248:9093");
        /**
         * 1. List all broker nodes of the cluster here.
         * 2. The producer client supports dynamically added broker nodes; metadata.max.age.ms controls how often the metadata is refreshed.
         */
        //  props.put("bootstrap.servers", "172.16.30.13:9093");
        //dev/test environment
        props.put("bootstrap.servers", "172.16.30.13:9095,172.16.30.13:9096");
        //  props.put("bootstrap.servers", "dev.kafka1.cnhz.shishike.com:9092");

        /**
         * Set acknowledgements for producer requests.
         * acks=0: the server returns no acknowledgement at all; there is no guarantee it received the record, and since nothing comes back the retries mechanism has no effect.
         * acks=1: the partition leader has written the record to its log, but there is no guarantee it was replicated correctly (1 is the recommended setting here).
         * acks=all: the leader waits for acknowledgements from the full set of in-sync replicas before responding.
         */
        props.put("acks", "1");


        /**
         * 1. If the request fails, the producer can automatically retry.
         * 2. Set this to a value greater than 0; the retry mechanism is no different from manually calling send again.
         */
        props.put("retries", 3);

        /**
         * 1. Specify the buffer size in the config.
         * 2. Since 0.10 the producer fully batches sends to the broker: no matter which partitions the records target, the producer batches them automatically per partition.
         * 3. When batch.size is reached, a send is triggered.
         */
        props.put("batch.size", 16384);

        /**
         * linger.ms delays sends so that more records can accumulate; even if batch.size has not been reached, the batch is still sent once this time elapses (60 s with the value below).
         */
        props.put("linger.ms", 60000);

        /**
         * 1. buffer.memory controls the total amount of memory available to the producer for buffering.
         * 2. This memory holds records waiting to be sent (including batches being compressed and processed).
         */
        props.put("buffer.memory", 33554432);


        /**
         * Supported compression types: gzip, snappy.
         */
        //  props.put("compression.type", "gzip");


        /**
         * 1. Keep the producer and consumer (de)serializers consistent; if they differ, errors will be thrown.
         */
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
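
The tests below call ProductUtils.getProducer(), which the original post does not show. A minimal sketch, assuming it simply wraps getProducerProperties() above in a lazily created org.apache.kafka.clients.producer.KafkaProducer (the singleton handling is my assumption):

    // Hedged sketch of the helper assumed by the tests below; not shown in the original post.
    private static volatile Producer<String, String> producer;

    public static Producer<String, String> getProducer() {
        if (producer == null) {
            synchronized (ProductUtils.class) {
                if (producer == null) {
                    // build a KafkaProducer from the properties defined above
                    producer = new KafkaProducer<String, String>(getProducerProperties());
                }
            }
        }
        return producer;
    }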
 @Test
    public void check_CallBack() throws Exception {
        try {
            CountDownLatch latch = new CountDownLatch(1);
            //Assign topicName to string variable
            String topicName = "page_visits8";
            Producer<String, String> producer = ProductUtils.getProducer();
            Future<RecordMetadata> result = producer.send(new ProducerRecord<String, String>(topicName,
                    "1", "ddddddddd洪10002" + 5), new Callback() {
                @Override
                public void onCompletion(RecordMetadata metadata, Exception exception) {

                    if (exception != null) {
                        exception.printStackTrace();
                        logger.error("find send exception:", exception);
                    }

                    logger.info("callback completion:" + metadata);
                    latch.countDown();
                }

            });
            logger.info("have send info");
            Thread.sleep(10000);
            logger.info("wait 10s");
            producer.flush();
            logger.info(" flush");
            latch.await();
            logger.info(" callback");
            //   RecordMetadata data=result.get();

            //data.

            System.out.println("Message sent successfully");
            producer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

  1. To achieve both load balancing and message ordering, the Kafka producer can route a record to a specific partition through a distribution strategy. Kafka only guarantees ordering within a partition. The distribution strategy is determined by an implementation of the Partitioner interface.
  2. Looking at Kafka's default strategy class DefaultPartitioner, we can see that it either uses the explicitly specified partition, chooses one from the hash of the key, or falls back to round-robin. See the code below; a hedged custom Partitioner sketch follows it.
/**
 * The default partitioning strategy:
 * <ul>
 * <li>If a partition is specified in the record, use it
 * <li>If no partition is specified but a key is present choose a partition based on a hash of the key
 * <li>If no partition or key is present choose a partition in a round-robin fashion
 */

public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            int nextValue = counter.getAndIncrement();
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = DefaultPartitioner.toPositive(nextValue) % availablePartitions.size();
                return availablePartitions.get(part).partition();
            } else {
                // no partitions are available, give a non-available partition
                return DefaultPartitioner.toPositive(nextValue) % numPartitions;
            }
        } else {
            // hash the keyBytes to choose a partition
            return DefaultPartitioner.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }
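
A custom strategy can be plugged in through the producer's partitioner.class property. The class below is only a hedged illustration; the even/odd routing rule and the class name are invented for this example and are not from the original post:

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Hypothetical example: keys ending in an even digit go to partition 0, everything else to partition 1.
public class EvenOddPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (key == null) {
            return 0;
        }
        String k = key.toString();
        int lastDigit = Character.getNumericValue(k.charAt(k.length() - 1));
        return (lastDigit >= 0 && lastDigit % 2 == 0 ? 0 : 1) % numPartitions;
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}

It would be enabled with props.put("partitioner.class", "com.example.EvenOddPartitioner"); (the package name is hypothetical).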

2. Producer ordering

  1. I ran the following experiment: topic page_visits5 has two partitions, the producer picks the partition from the hash of the key, and two consumers are each assigned to read a different partition. The experiment shows that the offset grows monotonically per partition.

  2. Log output of testCustomerByPartitionOne

ConsumerRecord(topic = page_visits5, partition = 0, offset = 605, CreateTime = 1499752027859, checksum = 2595474072, serialized key size = 1, serialized value size = 19, key = 0, value = ddddddddd洪1000245)
ConsumerRecord(topic = page_visits5, partition = 0, offset = 606, CreateTime = 1499752027859, checksum = 62561058, serialized key size = 1, serialized value size = 19, key = 0, value = ddddddddd洪1000246)
ConsumerRecord(topic = page_visits5, partition = 0, offset = 607, CreateTime = 1499752027859, checksum = 1958587316, serialized key size = 1, serialized value size = 19, key = 0, value = ddddddddd洪1000247)
ConsumerRecord(topic = page_visits5, partition = 0, offset = 608, CreateTime = 1499752027859, checksum = 3825382949, serialized key size = 1, serialized value size = 19, key = 0, value = ddddddddd洪1000248)
ConsumerRecord(topic = page_visits5, partition = 0, offset = 609, CreateTime = 1499752027860, checksum = 1633914638, serialized key size = 1, serialized value size = 19, key = 0, value = ddddddddd洪1000249)

  3. Log output of testCustomerByPartitionTwo
ConsumerRecord(topic = page_visits5, partition = 1, offset = 604, CreateTime = 1499752027859, checksum = 1821482793, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000244)
ConsumerRecord(topic = page_visits5, partition = 1, offset = 605, CreateTime = 1499752027859, checksum = 462860223, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000245)
ConsumerRecord(topic = page_visits5, partition = 1, offset = 606, CreateTime = 1499752027859, checksum = 2191523333, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000246)
ConsumerRecord(topic = page_visits5, partition = 1, offset = 607, CreateTime = 1499752027859, checksum = 4120432275, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000247)
ConsumerRecord(topic = page_visits5, partition = 1, offset = 608, CreateTime = 1499752027860, checksum = 2537675455, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000248)
ConsumerRecord(topic = page_visits5, partition = 1, offset = 609, CreateTime = 1499752027860, checksum = 3762743849, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000249)
  4. Producer code
@Test
    public  void assignPartitionByKey() throws Exception {
        try {

            //Assign topicName to string variable
            String topicName = "page_visits5";
            Producer<String, String> producer = getProducer();


            for (int i = 0; i < 50; i++) {
                for(int j=0;j<2;j++) {
                    producer.send(new ProducerRecord<String, String>(topicName,
                            Integer.toString(j), "ddddddddd洪10002" + i));

                    System.out.println("Message sent successfully");
                }
            }
            producer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
  5. Two consumers, each assigned to a specific partition
 @Test
    public void testCustomerByPartitionOne() throws Exception {
        //Kafka consumer configuration settings
        String topicName = "page_visits5";
        Properties props = new Properties();
        KafkaConsumer<String, String> consumer = getKafkaConsumer(props);
        // Instead of subscribe(), assign() is used here so the consumer reads one specific partition.

        TopicPartition partition0 = new TopicPartition(topicName, 0);
        consumer.assign(Arrays.asList(partition0));

        //print the topic name
        System.out.println("Subscribed to topic " + topicName);
        int i = 0;
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            if (!records.isEmpty()) {
                System.out.println("======one===================");
            }
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record);
            }
            //    readPartition(consumer, records);


        }
    }

    @Test
    public void testCustomerByPartitionTwo() throws Exception {
        //Kafka consumer configuration settings
        //  String topicName = "page_visits4";
        Properties props = new Properties();
        KafkaConsumer<String, String> consumer = getKafkaConsumer(props);
        // Instead of subscribe(), assign() is used here so the consumer reads one specific partition.
        String topic = "page_visits5";
        TopicPartition partition1 = new TopicPartition(topic, 1);
        consumer.assign(Arrays.asList(partition1));

        //print the topic name
        System.out.println("Subscribed to topic " + topic);
        int i = 0;
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            if (!records.isEmpty()) {
                System.out.println("======two===================");
            }
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record);
            }
            //    readPartition(consumer, records);


        }
    }


  6. For more usage, see the official docs: http://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html

3. Consumer

  1. Compared with the old consumer, the 0.10 consumer provides manual offset control.
  2. A word on Consumer Rebalance. With multiple app instances, Kafka guarantees that within one consumer group only one consumer instance consumes a given message. More precisely, in steady state each consumer instance consumes one or more specific partitions, and each partition is consumed by exactly one consumer instance in the group. The downside of this design is that the load cannot be spread perfectly evenly across the consumers in a group; the upside is that each consumer does not need to talk to a large number of brokers, which reduces communication overhead, simplifies assignment, and keeps the implementation simple. And because data within a partition is ordered, this design also guarantees that each partition is consumed in order.
  3. If a consumer group has fewer consumers than partitions, at least one consumer consumes multiple partitions; if the counts are equal, each consumer consumes exactly one partition; if there are more consumers than partitions, some consumers will not receive any messages from that topic. partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]

  4. Reference: http://www.cnblogs.com/coprince/p/5893066.html

  5. The official docs also show how to consume from specific partitions and how to reset a consumer's offset position; see the KafkaConsumer API and examples below (a hedged seek() sketch follows the consumer helper).
 /**
     * http://www.tutorialspoint.com/apache_kafka/apache_kafka_simple_producer_example.htm
     *
     * @throws Exception
     */
    // @Test
    public void testCustomer() throws Exception {


        //Kafka consumer configuration settings
        String topicName = "page_visits5";
        Properties props = new Properties();
        KafkaConsumer<String, String> consumer = getKafkaConsumer(props);
        //Kafka Consumer subscribes to a list of topics here (multiple topics can be passed in the list).
        consumer.subscribe(Arrays.asList(topicName));
        //print the topic name
        System.out.println("Subscribed to topic " + topicName);
        int i = 0;
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            if (!records.isEmpty()) {
                System.out.println("=========================");
            }
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record);
            }
            //     readPartition(consumer, records);


        }
    }

    private KafkaConsumer<String, String> getKafkaConsumer(Properties props) {
        props.put("bootstrap.servers", "172.16.1.248:9092,172.16.1.248:9093");
        //  props.put("bootstrap.servers", "172.16.30.13:9095,172.16.30.13:9096");

        props.put("group.id", "group-2");

        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        //each poll() call also acts as a heartbeat between client and server
        props.put("session.timeout.ms", "30000");
        //so it's natural to want to set a limit on the number of records handled at once. This setting provides that
        // . By default, there is essentially no limit.
       // props.put("max.poll.records", "2");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return new KafkaConsumer<String, String>(props);
    }
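
As noted in item 5 above, a consumer can also be repositioned explicitly with seek(). A minimal sketch reusing the getKafkaConsumer helper above; the topic name and the offset value 300 are arbitrary example values:

    // Hedged sketch: assign one partition and jump to a chosen offset before polling.
    public void seekExample() {
        KafkaConsumer<String, String> consumer = getKafkaConsumer(new Properties());
        TopicPartition partition0 = new TopicPartition("page_visits5", 0);
        consumer.assign(Arrays.asList(partition0));
        // jump to an absolute offset (300 is just an example value)
        consumer.seek(partition0, 300);
        // or jump to the beginning / end of the partition instead:
        // consumer.seekToBeginning(Arrays.asList(partition0));
        // consumer.seekToEnd(Arrays.asList(partition0));
        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record);
        }
        consumer.close();
    }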

4. Manual offset commits

  1. Since Kafka 0.9 the consumer has supported manual offset control; the experiment below shows how to use it.
  2. Set enable.auto.commit=false to turn off auto commit. Offsets can then be committed per partition in batches, or one record at a time.
  3. We have not yet verified its stability in a production scenario; please test accordingly before relying on it.
  4. Consumer code (a hedged asynchronous-commit variant follows the test)
 @Test
    public void testCustomerByOneByOne() throws Exception {
        //Kafka consumer configuration settings
        //  String topicName = "page_visits4";
        Properties props = new Properties();
        KafkaConsumer<String, String> consumer = getKafkaConsumer(props, false);
        //Kafka Consumer subscribes to a list of topics here (multiple topics can be passed in the list).
        String topic = "page_visits5";
//        TopicPartition partition1 = new TopicPartition(topic, 1);
//        consumer.assign(Arrays.asList(partition1));
        consumer.subscribe(Arrays.asList(topic));
        //print the topic name
        System.out.println("Subscribed to topic " + topic);
        int i = 0;

        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (TopicPartition partition : records.partitions()) {
                    List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
                    for (ConsumerRecord<String, String> record : partitionRecords) {
                        // System.out.println(record.offset() + ": " + record.value());
                        System.out.println(record);
                        // commit one record at a time
                        long lastOffset = record.offset();
                        consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
                    }

                    // batch commit per partition
//                    long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
//                    // the number of records committed here is bounded by max.poll.records
//                    consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
                }
            }
        }catch(Exception e){
            e.printStackTrace();
        } finally {
            consumer.close();
        }

    }
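
Besides commitSync, the new consumer also offers commitAsync. The sketch below is a hedged variant, not from the original post, that commits asynchronously and logs failures from the callback (it assumes org.apache.kafka.clients.consumer.OffsetCommitCallback and java.util.Map are imported):

    // Hedged sketch: asynchronous commit with a callback, as an alternative to the commitSync calls above.
    private void pollAndCommitAsync(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record);
        }
        // commits the latest polled offsets for all owned partitions without blocking the poll loop
        consumer.commitAsync(new OffsetCommitCallback() {
            @Override
            public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
                if (exception != null) {
                    // async commits fail silently unless the callback checks, so log it here
                    exception.printStackTrace();
                }
            }
        });
    }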

  5. Monitoring offsets during the batch-commit run

[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ ./bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 172.16.30.13:2181 --group group-new --topic page_visits5
[2017-07-12 10:32:25,130] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group           Topic                          Pid Offset          logSize         Lag             Owner
group-new       page_visits5                   0   300             400             100             none
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ 
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ ./bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 172.16.30.13:2181 --group group-new --topic page_visits5
[2017-07-12 10:33:02,699] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group           Topic                          Pid Offset          logSize         Lag             Owner
group-new       page_visits5                   0   300             400             100             none
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ 
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ 
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ ./bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 172.16.30.13:2181 --group group-new --topic page_visits5
[2017-07-12 10:35:10,992] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group           Topic                          Pid Offset          logSize         Lag             Owner
group-new       page_visits5                   0   310             400             90              none


  6. Monitoring offsets during the one-by-one run

[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ ./bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 172.16.30.13:2181 --group group-new --topic page_visits5
[2017-07-12 11:01:30,047] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group           Topic                          Pid Offset          logSize         Lag             Owner
group-new       page_visits5                   0   311             500             189             none
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ ./bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 172.16.30.13:2181 --group group-new --topic page_visits5
[2017-07-12 11:01:46,404] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group           Topic                          Pid Offset          logSize         Lag             Owner
group-new       page_visits5                   0   315             500             185             none
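
The warning in the output above says ConsumerOffsetChecker is deprecated in favour of ConsumerGroupCommand. The rough equivalent with the newer tool shipped in the 0.10 distribution would be the following (flags can differ slightly between versions, so treat this as a sketch):

./bin/kafka-consumer-groups.sh --bootstrap-server 172.16.30.13:9095 --new-consumer --describe --group group-new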

V. The old 0.8 client

  1. The Kafka 0.8 client API is still kept in the 0.10 release, but the client must be upgraded to the 0.10 version to talk to a 0.10 server.
  2. In 0.10, producers and consumers built with the old API and the new API interoperate.
  3. For producer ordering with the old producer, refer to the new producer section above.

1. Producer

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import java.util.Properties;


public abstract class ProducerUtil {
    private static Properties props = new Properties();// config
    private static Producer<String, String> producer = null;// producer

    private static Producer<String, String> oldProducer = null;// producer


    static {
        if (producer == null) {
            // props.put("metadata.broker.list", "localhost:9092,");
            props.put("metadata.broker.list", "172.16.1.248:9092,172.16.1.248:9093");
            //    props.put("metadata.broker.list", "172.16.1.248:9095");

            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("key.serializer.class", "kafka.serializer.StringEncoder");

            // key.serializer.class defaults to serializer.class
            // props.put("partitioner.class", "com.magic.cd.test.PartitionerDemo");
            // optional: if not configured, the default partitioner is used
            props.put("request.required.acks", "1");
            producer = new Producer<String, String>(new ProducerConfig(props));
        }
    }

    /**
     * Hedged sketch of the helper assumed by the test below (the original post does not show it):
     * send one keyed message to the given topic with the old producer API.
     */
    public static void sendMsg(String topic, String key, String msg) {
        producer.send(new KeyedMessage<String, String>(topic, key, msg));
    }
}

    // simple send test (it lives in a separate test class that calls ProducerUtil)
    @Test
    public void testProducer2() throws Exception {
        try {
            //  ProducerUtil.sendMsg("page_visits4", "bbk", "你好一般测试!");
            long b1 = System.currentTimeMillis();
            for (int i = 0; i < 10; i++) {
                ProducerUtil.sendMsg("page_visits4", "bbk", "你好一般测试!" + i);

            }
            long b2 = System.currentTimeMillis();
            System.out.println("时间:" + (b2 - b1));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

2. Consumer

  1. The example below is a multi-threaded consumer built on the old API. When setting topicCountMap.put(KafkaConfigUtils.ORDER_TOPIC_NAME, topicCount), combine the consumer-thread/partition relationship discussed above with the number of app instances you run. Note: with a single app instance, topicCount should be greater than or equal to the number of partitions.

    private static Properties props = new Properties();// config
    private static Producer<String, String> producer = null;// producer

    private static int topicCount = 3;
    /**
     * Consumer thread pool
     */
    private ExecutorService executor = Executors.newFixedThreadPool(topicCount);

    /**
     * Initialize the consumer (consumption by this class was paused on 8-5).
     */
    @PostConstruct
    public void initCustomer() {
        logger.debug("=======System  start init consumer client.===================");
        if (!KafkaConfigUtils.IS_START_CONSUMER) {
            logger.debug("System do not start consumer client.");
            return;
        }
        new Thread(new CustomerKafka()).start();
    }


    /**
     * Consumer startup thread
     *
     * @author my
     * @Date 2016-03-30 11:09:11
     */
    public class CustomerKafka implements Runnable {
        public void run() {
            logger.debug("system starting listern kafka message.");
            ConsumerConnector consumer = ConsumerConnector();
            while (consumer == null) {
                try {
                    Thread.sleep(600000); // 60 second
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                consumer = ConsumerConnector(); // reconnect
            }

            logger.debug(" kafka connector success.");

            snapshotService.initSnapshotRule();
            logger.debug(" snapshot rule init success ");

            // 3. obtain the message streams from the consumer
            Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
            topicCountMap.put(KafkaConfigUtils.ORDER_TOPIC_NAME, topicCount);// the number is how many streams (threads) to use
            Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
            final List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(KafkaConfigUtils.ORDER_TOPIC_NAME);


            logger.debug("read streams ");
            // 4.读取消息
            for (int i = 0; i < streams.size(); i++) {
                final KafkaStream stream = streams.get(i);
                executor.execute(new Runnable() {
                    public void run() {
                        ConsumerIterator<byte[], byte[]> it = stream.iterator();
                        while (it.hasNext()) {
                            byte[] each = it.next().message();
                            String message = new String(each);
                            logger.info("receive order message:" + StringUtils.substring(message, 0, 500) + "........");
                            // logger.debug("receive order message:" + message);
                            //  saveOrderMessageTwo(message);

                            saveOrderMessageThree(message);
                        }
                    }
                });
            }
        }
    }



public static Map<String, Object> getKafkaConsumer(boolean commintFlag) {
        Map<String, Object> props = new HashMap<>();
        //  props.put("bootstrap.servers", "172.16.1.248:9092,172.16.1.248:9093");

        // props.put("bootstrap.servers", "172.16.30.13:9093");
        //dev/test environment
        props.put("bootstrap.servers", "172.16.30.13:9095,172.16.30.13:9096");

        props.put("group.id", "group-new");
        /**
         * enable.auto.commit is passed in as a parameter so the same helper serves both auto-commit and manual-commit tests.
         */
        props.put("enable.auto.commit", String.valueOf(commintFlag));

        /**
         * 1. Interval for auto-committing offsets. One way to think about it: the next poll() call commits the offsets of the previous poll and sends the heartbeat.
         * 2. Internally this is handed off to a DelayedTaskQueue.
         */
        props.put("auto.commit.interval.ms", "1000");


        /**
         * Heartbeat interval; it should be no more than 1/3 of session.timeout.ms.
         */
        //  props.put("heartbeat.interval.ms", "1000");

        props.put("session.timeout.ms", "30000");

        /**
         * auto.offset.reset (default largest in the old consumer) defines the consumer's behaviour when no initial
         * offset is found or the stored offset is invalid. Common values:
         * 1. smallest: automatically reset the offset to the smallest offset;
         * 2. largest: automatically reset the offset to the largest offset;
         * 3. anything else: throw an exception.
         * (With the new consumer the equivalent values are "earliest" and "latest", and offsets are stored in Kafka
         * rather than ZooKeeper.)
         *
         * A situation we ran into: produce some data, stop the producing thread, then consume with a brand-new
         * consumer group using the code above - and nothing is consumed!
         *
         * The reason: the initial offset is invalid, and the default of largest means the offset is reset to the
         * latest position; since nothing is being pushed to Kafka at that moment, there is nothing to consume.
         * Once a producer pushes new data, the consumer starts reading from that latest position.
         */
        props.put("auto.offset.reset", "earliest");
        /**
         * It is natural to want a limit on the number of records handled per poll; this setting provides that.
         * By default there is essentially no limit.
         * 1. Offset commits and heartbeats are triggered when poll() is called, so I suggest setting max.poll.records
         *    to roughly 100-400, or lower if your processing is slow.
         */
        props.put("max.poll.records", "10");


        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // return new KafkaConsumer<String, String>(props);
        return props;
    }

VI. Upgrading and compatibility

  1. If your app still depends on the old Kafka 0.8 client code, add the new kafka-clients dependency below; it is completely isolated from the old API and does not conflict with it.
  2. For a producer, add the new dependency and build a new producer for new business logic.
  3. For a consumer, if the old topic still has a large backlog of unconsumed messages, run the old and new consumers side by side and have both call your business logic, so that messages the old client has not yet fetched are still processed after going live.
<dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>0.10.0.0</version>
 </dependency>
  4. Do NOT use the following dependency to build the new Kafka client.
<dependency>
            <groupId>org.apache.kafka</groupId>           
            <artifactId>kafka_2.10</artifactId>
            <version>0.10.0.0</version>
</dependency>

VII. Questions and answers

1. What send model does the producer use?

  1. The producer sends asynchronously in batches; a dedicated sender thread performs the actual sends, triggered by batch.size (how much has accumulated) and linger.ms (how long to wait). Even when we specify a key or a partition, sends remain asynchronous and batched. See the producer parameters described above.
  2. The KafkaProducer constructor initializes two components: RecordAccumulator and Sender.
  3. RecordAccumulator is a record buffer; it stores RecordBatches ordered per TopicPartition.
  4. Sender is an independent send thread that keeps polling the RecordAccumulator, reads RecordBatches up to max.request.size, assembles requests and sends them to the brokers.
  5. For more detail see the source code and http://www.cnblogs.com/byrhuangqiang/p/6392532.html

2. The most important issue for any queue is message loss; how does Kafka handle it?

  1. The Kafka producer always sends asynchronously. After each send() call the producer treats the message as sent, but in most cases it is still sitting in the in-memory RecordAccumulator. If the producer process dies at that point, the data is lost; or the send may fail and the client cannot tell for certain whether the message succeeded.

  2. Mitigation: the ack mechanism. acks=1 is the usual setting, meaning the message only has to be received and acknowledged by the leader, which balances reliability and throughput. If you need to know explicitly whether a message succeeded, pass a Callback to producer.send; the callback fires once the server's response comes back. Log failures to a dedicated log file to track lost messages (a hedged sketch follows).
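
A minimal sketch of both options, assuming the getProducer() helper used earlier; the field name lostMessageLogger is made up for illustration:

    public void sendWithConfirmation() throws Exception {
        Producer<String, String> producer = ProductUtils.getProducer();

        // Option 1: block on the returned Future for an explicit result (get() throws if the send failed).
        RecordMetadata meta = producer.send(
                new ProducerRecord<String, String>("page_visits5", "1", "payload")).get();
        System.out.println("acked: partition=" + meta.partition() + ", offset=" + meta.offset());

        // Option 2: stay asynchronous, but log failures to a dedicated log file in the callback.
        producer.send(new ProducerRecord<String, String>("page_visits5", "1", "payload"), new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception exception) {
                if (exception != null) {
                    lostMessageLogger.error("send failed, message may be lost", exception);
                }
            }
        });
    }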

3. How does the producer keep its Metadata up to date?

  1. Metadata here means the leader/follower information of each topic partition plus the broker node list. The producer checks whether its Metadata is still valid on every send; it is refreshed periodically and whenever it becomes stale. More detail: http://www.cnblogs.com/byrhuangqiang/p/6377961.html
  2. Because Metadata can be refreshed, brokers can be scaled out dynamically without problems. The refresh period is controlled by the producer property metadata.max.age.ms (see the snippet below).
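
For example, to force a metadata refresh at least every five minutes (the value is just an example; 300000 ms happens to be the default):

        // producer config: refresh cluster metadata at least this often, even without send failures
        props.put("metadata.max.age.ms", 300000);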

4. The Kafka consumer rebalance mechanism

  1. Rebalancing is how Kafka dynamically redistributes a topic's partitions over the consumer instances of a group to improve parallelism. Every poll() call keeps the heartbeat with the broker and refreshes server and consumer state.
  2. An example that illustrates consumer rebalancing (a hedged two-consumer sketch follows this list):

    1. Observed in a test with a 2-partition topic: with only one consumer instance a, both partitions are consumed by instance a in turn.
    2. After adding a second consumer instance b, the partitions are immediately reassigned automatically, and a and b each own one partition.
    3. After shutting down instance b, the partitions are immediately reassigned again and instance a goes back to consuming both.
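
A minimal sketch to reproduce this: start the method below in two separate processes (or threads) with different instance names. It reuses the getKafkaConsumer helper (group.id group-2) and the page_visits5 topic from earlier; everything else is an assumption:

    // Hedged sketch: run two instances with the same group.id and watch partition ownership change.
    public void runGroupMember(final String instanceName) {
        KafkaConsumer<String, String> consumer = getKafkaConsumer(new Properties()); // same group.id for every instance
        consumer.subscribe(Arrays.asList("page_visits5"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(instanceName + " <- partition " + record.partition() + ", offset " + record.offset());
            }
        }
    }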

5. How the consumer works internally

  1. When a consumer starts, or when the coordinator node fails over, the consumer sends a ConsumerMetadataRequest to any broker. In the ConsumerMetadataResponse it receives the location of the Coordinator responsible for its Consumer Group.
  2. The consumer connects to the Coordinator and sends HeartbeatRequests. If a HeartbeatResponse comes back with the IllegalGeneration error code, the coordinator has started a rebalance. The consumer then stops fetching, commits its offsets, and sends a JoinGroupRequest to the coordinator. In the JoinGroupResponse it receives the list of topic-partitions it now owns and the Consumer Group's new generation id. Group management is then complete and the consumer can resume fetching and committing offsets for the partitions it owns.
  3. If the HeartbeatResponse carries no error, the consumer keeps fetching from the partition list it already owns; the process is not interrupted.
  4. All of these protocol steps happen asynchronously inside poll(). See http://www.cnblogs.com/byrhuangqiang/p/6372600.html. A hedged ConsumerRebalanceListener sketch follows, for observing these reassignments from application code.
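
Application code does not see those internal requests directly, but it can observe the resulting partition assignments by passing a ConsumerRebalanceListener to subscribe(). A minimal sketch, assuming org.apache.kafka.clients.consumer.ConsumerRebalanceListener and java.util.Collection are imported:

    // Hedged sketch: log the partitions that are revoked and assigned during each rebalance.
    public void subscribeWithListener(final KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Arrays.asList("page_visits5"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // called before the rebalance; a good place to commit offsets for partitions being taken away
                System.out.println("revoked: " + partitions);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // called after the rebalance with this consumer's new partition set
                System.out.println("assigned: " + partitions);
            }
        });
        while (true) {
            consumer.poll(1000);
        }
    }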
