Kafka 0.10 client usage examples


I. Preface

  1. Before reading this article, please read the Kafka feature introduction first.
  2. My understanding is limited, so please consult the official Kafka documentation for special cases. Corrections to this article are welcome.
  3. This article will be updated continuously; stay tuned.

II. Topic and group naming conventions

  1. To make topic and group names more meaningful, we agree on the following rules.
  2. A topic name only reflects the producer. Format: environment_producer_businessMeaning. Example: prod_sync_order_snapshot.
  3. Each group represents an independent consumer. Although the client lets one group consume multiple topics, to keep business meanings apart we create an independent group for every topic. Format: environment_group_consumer_businessMeaning. Example: prod_group_mind_order_snapshot.
  4. To keep the system's consumers isolated from one another and to simplify future monitoring, please follow these naming rules.

III. Questions to think about

  1. How does the producer choose which partition to send to, and what push model does it use?
  2. Is the offset global across all partitions, or per partition?
  3. How does a consumer choose which partitions to fetch from?
  4. How do we build an ordered stream of business messages?
  5. These are the questions we should understand before implementing business requirements.

IV. The new client since 0.10

1. Producer

    public static Properties getProducerProperties() {
        // create instance for properties to access producer configs
        Properties props = new Properties();
        //Assign localhost id
        // props.put("bootstrap.servers", "172.16.1.248:9092,172.16.1.248:9093");
        /**
         * 1. List all broker nodes of the cluster here.
         * 2. The producer client supports dynamically added broker nodes; metadata.max.age.ms controls how often the metadata is refreshed.
         */
        //  props.put("bootstrap.servers", "172.16.30.13:9093");
        //dev/test environment
        props.put("bootstrap.servers", "172.16.30.13:9095,172.16.30.13:9096");
        //  props.put("bootstrap.servers", "dev.kafka1.cnhz.shishike.com:9092");

        /**
         * Set acknowledgements for producer requests.
         * acks=0: the server returns no acknowledgement at all; there is no guarantee it received the record, and since nothing comes back the retries mechanism has no effect.
         * acks=1: the partition leader has written the record to its log, but there is no guarantee it was replicated correctly (1 is the recommended setting here).
         * acks=all: the leader waits for acknowledgements from the full set of in-sync replicas before responding.
         */
        props.put("acks", "1");


        /**
         * 1. If the request fails, the producer can automatically retry.
         * 2. Set this to a value greater than 0; the retry mechanism is no different from manually calling send again.
         */
        props.put("retries", 3);

        /**
         * 1. Specify the buffer size in the config.
         * 2. Since 0.10 the producer fully batches sends to the broker: no matter which partitions the records target, the producer batches them automatically per partition.
         * 3. When batch.size is reached, a send is triggered.
         */
        props.put("batch.size", 16384);

        /**
         * linger.ms delays sends so that more records can accumulate; even if batch.size has not been reached, the batch is still sent once this time elapses (60 s with the value below).
         */
        props.put("linger.ms", 60000);

        /**
         * 1. buffer.memory controls the total amount of memory available to the producer for buffering.
         * 2. This memory holds records waiting to be sent (including batches being compressed and processed).
         */
        props.put("buffer.memory", 33554432);


        /**
         * Supported compression types: gzip, snappy.
         */
        //  props.put("compression.type", "gzip");


        /**
         * 1. Keep the producer and consumer (de)serializers consistent; if they differ, errors will be thrown.
         */
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
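
The tests below call ProductUtils.getProducer(), which the original post does not show. A minimal sketch, assuming it simply wraps getProducerProperties() above in a lazily created org.apache.kafka.clients.producer.KafkaProducer (the singleton handling is my assumption):

    // Hedged sketch of the helper assumed by the tests below; not shown in the original post.
    private static volatile Producer<String, String> producer;

    public static Producer<String, String> getProducer() {
        if (producer == null) {
            synchronized (ProductUtils.class) {
                if (producer == null) {
                    // build a KafkaProducer from the properties defined above
                    producer = new KafkaProducer<String, String>(getProducerProperties());
                }
            }
        }
        return producer;
    }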
 @Test
    public void check_CallBack() throws Exception {
        try {
            CountDownLatch latch = new CountDownLatch(1);
            //Assign topicName to string variable
            String topicName = "page_visits8";
            Producer<String, String> producer = ProductUtils.getProducer();
            Future<RecordMetadata> result = producer.send(new ProducerRecord<String, String>(topicName,
                    "1", "ddddddddd洪10002" + 5), new Callback() {
                @Override
                public void onCompletion(RecordMetadata metadata, Exception exception) {

                    if (exception != null) {
                        exception.printStackTrace();
                        logger.error("find send exception:", exception);
                    }

                    logger.info("callback completion:" + metadata);
                    latch.countDown();
                }

            });
            logger.info("have send info");
            Thread.sleep(10000);
            logger.info("wait 10s");
            producer.flush();
            logger.info(" flush");
            latch.await();
            logger.info(" callback");
            //   RecordMetadata data=result.get();

            //data.

            System.out.println("Message sent successfully");
            producer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

  1. To achieve both load balancing and message ordering, the Kafka producer can route a record to a specific partition through a distribution strategy. Kafka only guarantees ordering within a partition. The distribution strategy is determined by an implementation of the Partitioner interface.
  2. Looking at Kafka's default strategy class DefaultPartitioner, we can see that it either uses the explicitly specified partition, chooses one from the hash of the key, or falls back to round-robin. See the code below; a hedged custom Partitioner sketch follows it.
/**
 * The default partitioning strategy:
 * <ul>
 * <li>If a partition is specified in the record, use it
 * <li>If no partition is specified but a key is present choose a partition based on a hash of the key
 * <li>If no partition or key is present choose a partition in a round-robin fashion
 */

public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            int nextValue = counter.getAndIncrement();
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = DefaultPartitioner.toPositive(nextValue) % availablePartitions.size();
                return availablePartitions.get(part).partition();
            } else {
                // no partitions are available, give a non-available partition
                return DefaultPartitioner.toPositive(nextValue) % numPartitions;
            }
        } else {
            // hash the keyBytes to choose a partition
            return DefaultPartitioner.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }
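
A custom strategy can be plugged in through the producer's partitioner.class property. The class below is only a hedged illustration; the even/odd routing rule and the class name are invented for this example and are not from the original post:

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Hypothetical example: keys ending in an even digit go to partition 0, everything else to partition 1.
public class EvenOddPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (key == null) {
            return 0;
        }
        String k = key.toString();
        int lastDigit = Character.getNumericValue(k.charAt(k.length() - 1));
        return (lastDigit >= 0 && lastDigit % 2 == 0 ? 0 : 1) % numPartitions;
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}

It would be enabled with props.put("partitioner.class", "com.example.EvenOddPartitioner"); (the package name is hypothetical).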

2. Producer ordering

  1. I ran the following experiment: topic page_visits5 has two partitions, the producer picks the partition from the hash of the key, and two consumers are each assigned to read a different partition. The experiment shows that the offset grows monotonically per partition.

  2. Log output of testCustomerByPartitionOne

ConsumerRecord(topic = page_visits5, partition = 0, offset = 605, CreateTime = 1499752027859, checksum = 2595474072, serialized key size = 1, serialized value size = 19, key = 0, value = ddddddddd洪1000245)
ConsumerRecord(topic = page_visits5, partition = 0, offset = 606, CreateTime = 1499752027859, checksum = 62561058, serialized key size = 1, serialized value size = 19, key = 0, value = ddddddddd洪1000246)
ConsumerRecord(topic = page_visits5, partition = 0, offset = 607, CreateTime = 1499752027859, checksum = 1958587316, serialized key size = 1, serialized value size = 19, key = 0, value = ddddddddd洪1000247)
ConsumerRecord(topic = page_visits5, partition = 0, offset = 608, CreateTime = 1499752027859, checksum = 3825382949, serialized key size = 1, serialized value size = 19, key = 0, value = ddddddddd洪1000248)
ConsumerRecord(topic = page_visits5, partition = 0, offset = 609, CreateTime = 1499752027860, checksum = 1633914638, serialized key size = 1, serialized value size = 19, key = 0, value = ddddddddd洪1000249)

  3. Log output of testCustomerByPartitionTwo
ConsumerRecord(topic = page_visits5, partition = 1, offset = 604, CreateTime = 1499752027859, checksum = 1821482793, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000244)
ConsumerRecord(topic = page_visits5, partition = 1, offset = 605, CreateTime = 1499752027859, checksum = 462860223, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000245)
ConsumerRecord(topic = page_visits5, partition = 1, offset = 606, CreateTime = 1499752027859, checksum = 2191523333, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000246)
ConsumerRecord(topic = page_visits5, partition = 1, offset = 607, CreateTime = 1499752027859, checksum = 4120432275, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000247)
ConsumerRecord(topic = page_visits5, partition = 1, offset = 608, CreateTime = 1499752027860, checksum = 2537675455, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000248)
ConsumerRecord(topic = page_visits5, partition = 1, offset = 609, CreateTime = 1499752027860, checksum = 3762743849, serialized key size = 1, serialized value size = 19, key = 1, value = ddddddddd洪1000249)
  4. Producer code
@Test
    public  void assignPartitionByKey() throws Exception {
        try {

            //Assign topicName to string variable
            String topicName = "page_visits5";
            Producer<String, String> producer = getProducer();


            for (int i = 0; i < 50; i++) {
                for(int j=0;j<2;j++) {
                    producer.send(new ProducerRecord<String, String>(topicName,
                            Integer.toString(j), "ddddddddd洪10002" + i));

                    System.out.println("Message sent successfully");
                }
            }
            producer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
  5. Two consumers, each assigned to a specific partition
 @Test
    public void testCustomerByPartitionOne() throws Exception {
        //Kafka consumer configuration settings
        String topicName = "page_visits5";
        Properties props = new Properties();
        KafkaConsumer<String, String> consumer = getKafkaConsumer(props);
        // Instead of subscribe(), assign() is used here so the consumer reads one specific partition.

        TopicPartition partition0 = new TopicPartition(topicName, 0);
        consumer.assign(Arrays.asList(partition0));

        //print the topic name
        System.out.println("Subscribed to topic " + topicName);
        int i = 0;
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            if (!records.isEmpty()) {
                System.out.println("======one===================");
            }
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record);
            }
            //    readPartition(consumer, records);


        }
    }

    @Test
    public void testCustomerByPartitionTwo() throws Exception {
        //Kafka consumer configuration settings
        //  String topicName = "page_visits4";
        Properties props = new Properties();
        KafkaConsumer<String, String> consumer = getKafkaConsumer(props);
        // Instead of subscribe(), assign() is used here so the consumer reads one specific partition.
        String topic = "page_visits5";
        TopicPartition partition1 = new TopicPartition(topic, 1);
        consumer.assign(Arrays.asList(partition1));

        //print the topic name
        System.out.println("Subscribed to topic " + topic);
        int i = 0;
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            if (!records.isEmpty()) {
                System.out.println("======two===================");
            }
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record);
            }
            //    readPartition(consumer, records);


        }
    }


  6. For more usage, see the official docs: http://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html

3. Consumer

  1. Compared with the old consumer, the 0.10 consumer provides manual offset control.
  2. A word on Consumer Rebalance. With multiple app instances, Kafka guarantees that within one consumer group only one consumer instance consumes a given message. More precisely, in steady state each consumer instance consumes one or more specific partitions, and each partition is consumed by exactly one consumer instance in the group. The downside of this design is that the load cannot be spread perfectly evenly across the consumers in a group; the upside is that each consumer does not need to talk to a large number of brokers, which reduces communication overhead, simplifies assignment, and keeps the implementation simple. And because data within a partition is ordered, this design also guarantees that each partition is consumed in order.
  3. If a consumer group has fewer consumers than partitions, at least one consumer consumes multiple partitions; if the counts are equal, each consumer consumes exactly one partition; if there are more consumers than partitions, some consumers will not receive any messages from that topic. partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]

  4. Reference: http://www.cnblogs.com/coprince/p/5893066.html

  5. The official docs also show how to consume from specific partitions and how to reset a consumer's offset position; see the KafkaConsumer API and examples below (a hedged seek() sketch follows the consumer helper).
 /**
     * http://www.tutorialspoint.com/apache_kafka/apache_kafka_simple_producer_example.htm
     *
     * @throws Exception
     */
    // @Test
    public void testCustomer() throws Exception {


        //Kafka consumer configuration settings
        String topicName = "page_visits5";
        Properties props = new Properties();
        KafkaConsumer<String, String> consumer = getKafkaConsumer(props);
        //Kafka Consumer subscribes to a list of topics here (multiple topics can be passed in the list).
        consumer.subscribe(Arrays.asList(topicName));
        //print the topic name
        System.out.println("Subscribed to topic " + topicName);
        int i = 0;
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            if (!records.isEmpty()) {
                System.out.println("=========================");
            }
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record);
            }
            //     readPartition(consumer, records);


        }
    }

    private KafkaConsumer<String, String> getKafkaConsumer(Properties props) {
        props.put("bootstrap.servers", "172.16.1.248:9092,172.16.1.248:9093");
        //  props.put("bootstrap.servers", "172.16.30.13:9095,172.16.30.13:9096");

        props.put("group.id", "group-2");

        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        //each poll() call also acts as a heartbeat between client and server
        props.put("session.timeout.ms", "30000");
        //so it's natural to want to set a limit on the number of records handled at once. This setting provides that
        // . By default, there is essentially no limit.
       // props.put("max.poll.records", "2");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return new KafkaConsumer<String, String>(props);
    }
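
As noted in item 5 above, a consumer can also be repositioned explicitly with seek(). A minimal sketch reusing the getKafkaConsumer helper above; the topic name and the offset value 300 are arbitrary example values:

    // Hedged sketch: assign one partition and jump to a chosen offset before polling.
    public void seekExample() {
        KafkaConsumer<String, String> consumer = getKafkaConsumer(new Properties());
        TopicPartition partition0 = new TopicPartition("page_visits5", 0);
        consumer.assign(Arrays.asList(partition0));
        // jump to an absolute offset (300 is just an example value)
        consumer.seek(partition0, 300);
        // or jump to the beginning / end of the partition instead:
        // consumer.seekToBeginning(Arrays.asList(partition0));
        // consumer.seekToEnd(Arrays.asList(partition0));
        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record);
        }
        consumer.close();
    }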

4. Manual offset commits

  1. Since Kafka 0.9 the consumer has supported manual offset control; the experiment below shows how to use it.
  2. Set enable.auto.commit=false to turn off auto commit. Offsets can then be committed per partition in batches, or one record at a time.
  3. We have not yet verified its stability in a production scenario; please test accordingly before relying on it.
  4. Consumer code (a hedged asynchronous-commit variant follows the test)
 @Test
    public void testCustomerByOneByOne() throws Exception {
        //Kafka consumer configuration settings
        //  String topicName = "page_visits4";
        Properties props = new Properties();
        KafkaConsumer<String, String> consumer = getKafkaConsumer(props, false);
        //Kafka Consumer subscribes to a list of topics here (multiple topics can be passed in the list).
        String topic = "page_visits5";
//        TopicPartition partition1 = new TopicPartition(topic, 1);
//        consumer.assign(Arrays.asList(partition1));
        consumer.subscribe(Arrays.asList(topic));
        //print the topic name
        System.out.println("Subscribed to topic " + topic);
        int i = 0;

        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (TopicPartition partition : records.partitions()) {
                    List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
                    for (ConsumerRecord<String, String> record : partitionRecords) {
                        // System.out.println(record.offset() + ": " + record.value());
                        System.out.println(record);
                        // commit one record at a time
                        long lastOffset = record.offset();
                        consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
                    }

                    // batch commit per partition
//                    long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
//                    // the number of records committed here is bounded by max.poll.records
//                    consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
                }
            }
        }catch(Exception e){
            e.printStackTrace();
        } finally {
            consumer.close();
        }

    }
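
Besides commitSync, the new consumer also offers commitAsync. The sketch below is a hedged variant, not from the original post, that commits asynchronously and logs failures from the callback (it assumes org.apache.kafka.clients.consumer.OffsetCommitCallback and java.util.Map are imported):

    // Hedged sketch: asynchronous commit with a callback, as an alternative to the commitSync calls above.
    private void pollAndCommitAsync(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record);
        }
        // commits the latest polled offsets for all owned partitions without blocking the poll loop
        consumer.commitAsync(new OffsetCommitCallback() {
            @Override
            public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
                if (exception != null) {
                    // async commits fail silently unless the callback checks, so log it here
                    exception.printStackTrace();
                }
            }
        });
    }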

  5. Monitoring offsets during the batch-commit run

[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ ./bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 172.16.30.13:2181 --group group-new --topic page_visits5
[2017-07-12 10:32:25,130] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group           Topic                          Pid Offset          logSize         Lag             Owner
group-new       page_visits5                   0   300             400             100             none
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ 
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ ./bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 172.16.30.13:2181 --group group-new --topic page_visits5
[2017-07-12 10:33:02,699] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group           Topic                          Pid Offset          logSize         Lag             Owner
group-new       page_visits5                   0   300             400             100             none
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ 
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ 
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ ./bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 172.16.30.13:2181 --group group-new --topic page_visits5
[2017-07-12 10:35:10,992] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group           Topic                          Pid Offset          logSize         Lag             Owner
group-new       page_visits5                   0   310             400             90              none


  6. Monitoring offsets during the one-by-one run

[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ ./bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 172.16.30.13:2181 --group group-new --topic page_visits5
[2017-07-12 11:01:30,047] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group           Topic                          Pid Offset          logSize         Lag             Owner
group-new       page_visits5                   0   311             500             189             none
[work@iZbp14iiauukqckkhyphv9Z kafka_2.10-0.10.0.1]$ ./bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 172.16.30.13:2181 --group group-new --topic page_visits5
[2017-07-12 11:01:46,404] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use ConsumerGroupCommand instead. (kafka.tools.ConsumerOffsetChecker$)
Group           Topic                          Pid Offset          logSize         Lag             Owner
group-new       page_visits5                   0   315             500             185             none
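
The warning in the output above says ConsumerOffsetChecker is deprecated in favour of ConsumerGroupCommand. The rough equivalent with the newer tool shipped in the 0.10 distribution would be the following (flags can differ slightly between versions, so treat this as a sketch):

./bin/kafka-consumer-groups.sh --bootstrap-server 172.16.30.13:9095 --new-consumer --describe --group group-new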

V. The old 0.8 client

  1. The Kafka 0.8 client API is still kept in the 0.10 release, but the client must be upgraded to the 0.10 version to talk to a 0.10 server.
  2. In 0.10, producers and consumers built with the old API and the new API interoperate.
  3. For producer ordering with the old producer, refer to the new producer section above.

1. Producer

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import java.util.Properties;


public abstract class ProducerUtil {
    private static Properties props = new Properties();// config
    private static Producer<String, String> producer = null;// producer

    private static Producer<String, String> oldProducer = null;// producer


    static {
        if (producer == null) {
            // props.put("metadata.broker.list", "localhost:9092,");
            props.put("metadata.broker.list", "172.16.1.248:9092,172.16.1.248:9093");
            //    props.put("metadata.broker.list", "172.16.1.248:9095");

            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("key.serializer.class", "kafka.serializer.StringEncoder");

            // key.serializer.class defaults to serializer.class
            // props.put("partitioner.class", "com.magic.cd.test.PartitionerDemo");
            // optional: if not configured, the default partitioner is used
            props.put("request.required.acks", "1");
            producer = new Producer<String, String>(new ProducerConfig(props));
        }
    }

    /**
     * Hedged sketch of the helper assumed by the test below (the original post does not show it):
     * send one keyed message to the given topic with the old producer API.
     */
    public static void sendMsg(String topic, String key, String msg) {
        producer.send(new KeyedMessage<String, String>(topic, key, msg));
    }
}

    // simple send test (it lives in a separate test class that calls ProducerUtil)
    @Test
    public void testProducer2() throws Exception {
        try {
            //  ProducerUtil.sendMsg("page_visits4", "bbk", "你好一般测试!");
            long b1 = System.currentTimeMillis();
            for (int i = 0; i < 10; i++) {
                ProducerUtil.sendMsg("page_visits4", "bbk", "你好一般测试!" + i);

            }
            long b2 = System.currentTimeMillis();
            System.out.println("时间:" + (b2 - b1));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

2. Consumer

  1. The example below is a multi-threaded consumer built on the old API. When setting topicCountMap.put(KafkaConfigUtils.ORDER_TOPIC_NAME, topicCount), combine the consumer-thread/partition relationship discussed above with the number of app instances you run. Note: with a single app instance, topicCount should be greater than or equal to the number of partitions.

    private static Properties props = new Properties();// config
    private static Producer<String, String> producer = null;// producer

    private static int topicCount = 3;
    /**
     * Consumer thread pool
     */
    private ExecutorService executor = Executors.newFixedThreadPool(topicCount);

    /**
     * Initialize the consumer (consumption by this class was paused on 8-5).
     */
    @PostConstruct
    public void initCustomer() {
        logger.debug("=======System  start init consumer client.===================");
        if (!KafkaConfigUtils.IS_START_CONSUMER) {
            logger.debug("System do not start consumer client.");
            return;
        }
        new Thread(new CustomerKafka()).start();
    }


    /**
     * Consumer startup thread
     *
     * @author my
     * @Date 2016-03-30 11:09:11
     */
    public class CustomerKafka implements Runnable {
        public void run() {
            logger.debug("system starting listern kafka message.");
            ConsumerConnector consumer = ConsumerConnector();
            while (consumer == null) {
                try {
                    Thread.sleep(600000); // 60 second
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                consumer = ConsumerConnector(); // reconnect
            }

            logger.debug(" kafka connector success.");

            snapshotService.initSnapshotRule();
            logger.debug(" snapshot rule init success ");

            // 3. obtain the message streams from the consumer
            Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
            topicCountMap.put(KafkaConfigUtils.ORDER_TOPIC_NAME, topicCount);// the number is how many streams (threads) to use
            Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
            final List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(KafkaConfigUtils.ORDER_TOPIC_NAME);


            logger.debug("read streams ");
            // 4.读取消息
            for (int i = 0; i < streams.size(); i++) {
                final KafkaStream stream = streams.get(i);
                executor.execute(new Runnable() {
                    public void run() {
                        ConsumerIterator<byte[], byte[]> it = stream.iterator();
                        while (it.hasNext()) {
                            byte[] each = it.next().message();
                            String message = new String(each);
                            logger.info("receive order message:" + StringUtils.substring(message, 0, 500) + "........");
                            // logger.debug("receive order message:" + message);
                            //  saveOrderMessageTwo(message);

                            saveOrderMessageThree(message);
                        }
                    }
                });
            }
        }
    }



public static Map<String, Object> getKafkaConsumer(boolean commintFlag) {
        Map<String, Object> props = new HashMap<>();
        //  props.put("bootstrap.servers", "172.16.1.248:9092,172.16.1.248:9093");

        // props.put("bootstrap.servers", "172.16.30.13:9093");
        //dev/test environment
        props.put("bootstrap.servers", "172.16.30.13:9095,172.16.30.13:9096");

        props.put("group.id", "group-new");
        /**
         * enable.auto.commit is passed in as a parameter so the same helper serves both auto-commit and manual-commit tests.
         */
        props.put("enable.auto.commit", String.valueOf(commintFlag));

        /**
         * 1. Interval for auto-committing offsets. One way to think about it: the next poll() call commits the offsets of the previous poll and sends the heartbeat.
         * 2. Internally this is handed off to a DelayedTaskQueue.
         */
        props.put("auto.commit.interval.ms", "1000");


        /**
         * Heartbeat interval; it should be no more than 1/3 of session.timeout.ms.
         */
        //  props.put("heartbeat.interval.ms", "1000");

        props.put("session.timeout.ms", "30000");

        /**
         * auto.offset.reset (default largest in the old consumer) defines the consumer's behaviour when no initial
         * offset is found or the stored offset is invalid. Common values:
         * 1. smallest: automatically reset the offset to the smallest offset;
         * 2. largest: automatically reset the offset to the largest offset;
         * 3. anything else: throw an exception.
         * (With the new consumer the equivalent values are "earliest" and "latest", and offsets are stored in Kafka
         * rather than ZooKeeper.)
         *
         * A situation we ran into: produce some data, stop the producing thread, then consume with a brand-new
         * consumer group using the code above - and nothing is consumed!
         *
         * The reason: the initial offset is invalid, and the default of largest means the offset is reset to the
         * latest position; since nothing is being pushed to Kafka at that moment, there is nothing to consume.
         * Once a producer pushes new data, the consumer starts reading from that latest position.
         */
        props.put("auto.offset.reset", "earliest");
        /**
         * It is natural to want a limit on the number of records handled per poll; this setting provides that.
         * By default there is essentially no limit.
         * 1. Offset commits and heartbeats are triggered when poll() is called, so I suggest setting max.poll.records
         *    to roughly 100-400, or lower if your processing is slow.
         */
        props.put("max.poll.records", "10");


        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // return new KafkaConsumer<String, String>(props);
        return props;
    }

VI. Upgrading and compatibility

  1. If your app still depends on the old Kafka 0.8 client code, add the new kafka-clients dependency below; it is completely isolated from the old API and does not conflict with it.
  2. For a producer, add the new dependency and build a new producer for new business logic.
  3. For a consumer, if the old topic still has a large backlog of unconsumed messages, run the old and new consumers side by side and have both call your business logic, so that messages the old client has not yet fetched are still processed after going live.
<dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>0.10.0.0</version>
 </dependency>
  4. Do NOT use the following dependency to build the new Kafka client.
<dependency>
            <groupId>org.apache.kafka</groupId>           
            <artifactId>kafka_2.10</artifactId>
            <version>0.10.0.0</version>
</dependency>

VII. Questions and answers

1. What send model does the producer use?

  1. The producer sends asynchronously in batches; a dedicated sender thread performs the actual sends, triggered by batch.size (how much has accumulated) and linger.ms (how long to wait). Even when we specify a key or a partition, sends remain asynchronous and batched. See the producer parameters described above.
  2. The KafkaProducer constructor initializes two components: RecordAccumulator and Sender.
  3. RecordAccumulator is a record buffer; it stores RecordBatches ordered per TopicPartition.
  4. Sender is an independent send thread that keeps polling the RecordAccumulator, reads RecordBatches up to max.request.size, assembles requests and sends them to the brokers.
  5. For more detail see the source code and http://www.cnblogs.com/byrhuangqiang/p/6392532.html

2. The most important issue for any queue is message loss; how does Kafka handle it?

  1. The Kafka producer always sends asynchronously. After each send() call the producer treats the message as sent, but in most cases it is still sitting in the in-memory RecordAccumulator. If the producer process dies at that point, the data is lost; or the send may fail and the client cannot tell for certain whether the message succeeded.

  2. Mitigation: the ack mechanism. acks=1 is the usual setting, meaning the message only has to be received and acknowledged by the leader, which balances reliability and throughput. If you need to know explicitly whether a message succeeded, pass a Callback to producer.send; the callback fires once the server's response comes back. Log failures to a dedicated log file to track lost messages (a hedged sketch follows).
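
A minimal sketch of both options, assuming the getProducer() helper used earlier; the field name lostMessageLogger is made up for illustration:

    public void sendWithConfirmation() throws Exception {
        Producer<String, String> producer = ProductUtils.getProducer();

        // Option 1: block on the returned Future for an explicit result (get() throws if the send failed).
        RecordMetadata meta = producer.send(
                new ProducerRecord<String, String>("page_visits5", "1", "payload")).get();
        System.out.println("acked: partition=" + meta.partition() + ", offset=" + meta.offset());

        // Option 2: stay asynchronous, but log failures to a dedicated log file in the callback.
        producer.send(new ProducerRecord<String, String>("page_visits5", "1", "payload"), new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception exception) {
                if (exception != null) {
                    lostMessageLogger.error("send failed, message may be lost", exception);
                }
            }
        });
    }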

3. How does the producer keep its Metadata up to date?

  1. Metadata here means the leader/follower information of each topic partition plus the broker node list. The producer checks whether its Metadata is still valid on every send; it is refreshed periodically and whenever it becomes stale. More detail: http://www.cnblogs.com/byrhuangqiang/p/6377961.html
  2. Because Metadata can be refreshed, brokers can be scaled out dynamically without problems. The refresh period is controlled by the producer property metadata.max.age.ms (see the snippet below).
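
For example, to force a metadata refresh at least every five minutes (the value is just an example; 300000 ms happens to be the default):

        // producer config: refresh cluster metadata at least this often, even without send failures
        props.put("metadata.max.age.ms", 300000);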

4. The Kafka consumer rebalance mechanism

  1. Rebalancing is how Kafka dynamically redistributes a topic's partitions over the consumer instances of a group to improve parallelism. Every poll() call keeps the heartbeat with the broker and refreshes server and consumer state.
  2. An example that illustrates consumer rebalancing (a hedged two-consumer sketch follows this list):

    1. Observed in a test with a 2-partition topic: with only one consumer instance a, both partitions are consumed by instance a in turn.
    2. After adding a second consumer instance b, the partitions are immediately reassigned automatically, and a and b each own one partition.
    3. After shutting down instance b, the partitions are immediately reassigned again and instance a goes back to consuming both.
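
A minimal sketch to reproduce this: start the method below in two separate processes (or threads) with different instance names. It reuses the getKafkaConsumer helper (group.id group-2) and the page_visits5 topic from earlier; everything else is an assumption:

    // Hedged sketch: run two instances with the same group.id and watch partition ownership change.
    public void runGroupMember(final String instanceName) {
        KafkaConsumer<String, String> consumer = getKafkaConsumer(new Properties()); // same group.id for every instance
        consumer.subscribe(Arrays.asList("page_visits5"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(instanceName + " <- partition " + record.partition() + ", offset " + record.offset());
            }
        }
    }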

5. How the consumer works internally

  1. When a consumer starts, or when the coordinator node fails over, the consumer sends a ConsumerMetadataRequest to any broker. In the ConsumerMetadataResponse it receives the location of the Coordinator responsible for its Consumer Group.
  2. The consumer connects to the Coordinator and sends HeartbeatRequests. If a HeartbeatResponse comes back with the IllegalGeneration error code, the coordinator has started a rebalance. The consumer then stops fetching, commits its offsets, and sends a JoinGroupRequest to the coordinator. In the JoinGroupResponse it receives the list of topic-partitions it now owns and the Consumer Group's new generation id. Group management is then complete and the consumer can resume fetching and committing offsets for the partitions it owns.
  3. If the HeartbeatResponse carries no error, the consumer keeps fetching from the partition list it already owns; the process is not interrupted.
  4. All of these protocol steps happen asynchronously inside poll(). See http://www.cnblogs.com/byrhuangqiang/p/6372600.html. A hedged ConsumerRebalanceListener sketch follows, for observing these reassignments from application code.
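
Application code does not see those internal requests directly, but it can observe the resulting partition assignments by passing a ConsumerRebalanceListener to subscribe(). A minimal sketch, assuming org.apache.kafka.clients.consumer.ConsumerRebalanceListener and java.util.Collection are imported:

    // Hedged sketch: log the partitions that are revoked and assigned during each rebalance.
    public void subscribeWithListener(final KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Arrays.asList("page_visits5"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // called before the rebalance; a good place to commit offsets for partitions being taken away
                System.out.println("revoked: " + partitions);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // called after the rebalance with this consumer's new partition set
                System.out.println("assigned: " + partitions);
            }
        });
        while (true) {
            consumer.poll(1000);
        }
    }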
