A Simple Kafka Usage Example

Kafka Overview

Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
In short, it is a distributed platform for publishing and distributing streams of data.

Components

  • consumer: reads messages from topics
  • producer: publishes messages to topics
  • topic: a named category that messages are published to
  • broker: a server node that stores and serves the data

Example

Starting the Broker

Download Kafka from the official website and go into its directory. Kafka depends on ZooKeeper for distributed coordination, so ZooKeeper must be started first; the Kafka distribution already bundles it. Using a Mac as an example, from the Kafka directory run:

  1. sh bin/zookeeper-server-start.sh config/zookeeper.properties (start ZooKeeper)
  2. sh bin/kafka-server-start.sh config/server.properties (start the Kafka broker)

Producer

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties properties = new Properties();
    properties.put("bootstrap.servers", "127.0.0.1:9092");
    properties.put("client.id", "DemoProducer");
    properties.put("acks", "0");
    properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    KafkaProducer<String, String> producer = null;
    try {
        producer = new KafkaProducer<String, String>(properties);
        for (int i = 0; i < 100; i++) {
            // Null key; the value is the loop index, sent to the "Message" topic.
            producer.send(new ProducerRecord<String, String>("Message", null, i + ""));
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (producer != null) {
            producer.close(); // flush buffered records and release resources
        }
    }
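
Note that send() is asynchronous and the loop above is fire-and-forget. To observe where each record actually lands, you can pass a Callback to send(); a minimal sketch using the same producer as above:

    producer.send(new ProducerRecord<String, String>("Message", null, "hello"),
            (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace(); // the send failed
                } else {
                    System.out.printf("partition = %d, offset = %d%n",
                            metadata.partition(), metadata.offset());
                }
            });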

Consumer

    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    Properties properties = new Properties();
    properties.put("bootstrap.servers", "127.0.0.1:9092");
    properties.put("enable.auto.commit", "true");
    properties.put("auto.commit.interval.ms", "1000");
    properties.put("session.timeout.ms", "30000");
    properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    properties.put("group.id", "DemoProducer");

    KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
    kafkaConsumer.subscribe(Arrays.asList("Message"));

    while (true) {
        // poll() blocks for up to 100 ms waiting for new records.
        ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("offset = %d, partition = %s, value = %s%n",
                    record.offset(), record.partition(), record.value());
            // kafkaConsumer.commitSync(); // manual commit (with enable.auto.commit=false)
        }
    }

Configuration Notes and Common Issues

Producer

client.id: an ID string passed to the server with each request, used to identify this client in server-side request logging.
acks: how message durability is acknowledged. "0" means the producer does not wait for any acknowledgment, "1" waits for the partition leader to write the record, and "all" waits for the full set of in-sync replicas.
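
For example, a minimal sketch of a more durable producer configuration; the property names are standard Kafka client settings, and the retry count here is an arbitrary illustrative value:

    Properties durable = new Properties();
    durable.put("bootstrap.servers", "127.0.0.1:9092");
    durable.put("acks", "all");  // wait for all in-sync replicas to acknowledge
    durable.put("retries", "3"); // retry transient send failures
    durable.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    durable.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");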

Consumer

enable.auto.commit: whether offsets are committed automatically; once an offset is committed, the messages before it are not consumed again.
auto.commit.interval.ms: how often offsets are auto-committed, in milliseconds.
session.timeout.ms: heartbeat session timeout; if the broker hears no heartbeat within this window, the consumer is considered dead and its partitions are rebalanced.
group.id: the consumer group this consumer belongs to.
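
If you want offsets committed only after records are fully processed (at-least-once semantics), disable auto-commit and call commitSync() yourself. A minimal sketch, assuming the same topic and properties as above; process() is a hypothetical handler:

    properties.put("enable.auto.commit", "false"); // take over offset management

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
    consumer.subscribe(Arrays.asList("Message"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            process(record); // hypothetical handler; replace with real logic
        }
        if (!records.isEmpty()) {
            consumer.commitSync(); // commit only after every polled record is handled
        }
    }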

1. With one producer and multiple consumers, how is consumption balanced?
By default a topic has only one partition, and within a consumer group each partition is assigned to at most one consumer. So with the defaults, even if both consumers above are started, only one of them will actually receive data.
Solution: increase the partition count; kafka/bin provides a tool for this:

sh kafka-topics.sh --alter --zookeeper 127.0.0.1:2181 --topic Message --partitions 4
After the change, inspect the topic:
sh kafka-topics.sh --describe --zookeeper 127.0.0.1:2181 --topic Message

    Topic: Message	PartitionCount:4	ReplicationFactor:1	Configs:
	Topic: Message	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
	Topic: Message	Partition: 1	Leader: 0	Replicas: 0	Isr: 0
	Topic: Message	Partition: 2	Leader: 0	Replicas: 0	Isr: 0
	Topic: Message	Partition: 3	Leader: 0	Replicas: 0	Isr: 0

2. After increasing the partitions, why does data still land only on Partition 0?
Kafka's partitioning rule: if the producer specifies a key, the partition is chosen as hash(key) % PartitionCount. Note that the producer code above uses new ProducerRecord<String, String>("Message", null, i + ""), i.e. a null key.
When the key is null, the partitioner looks at the partition used by the previous write: if there is none, it writes to Partition 0; otherwise it moves on to the next partition, spreading writes evenly in round-robin fashion.
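
To route records deterministically, give each record a key; records with the same key always land on the same partition. A minimal sketch, assuming the four-partition topic and the producer configured earlier (the "user-" keys are made up for illustration):

    for (int i = 0; i < 100; i++) {
        String key = "user-" + (i % 4); // hash(key) % 4 selects the partition
        producer.send(new ProducerRecord<String, String>("Message", key, i + ""));
    }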

More Configuration

https://docs.confluent.io/current/installation/configuration/index.html

Reposted from blog.csdn.net/sinat_25926481/article/details/105269795