A Simple Kafka Usage Example

Kafka Overview

Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
In short, it is a distributed platform for publishing and distributing streams of data.

Components

  • consumer: reads messages from topics
  • producer: publishes messages to topics
  • topic: a named category that messages are published to
  • broker: a server node that stores and serves the data

Example

Starting the Broker

Download Kafka from the official website and go into its directory. Kafka depends on ZooKeeper for distributed coordination, so ZooKeeper must be started first; the Kafka distribution already bundles it. Using a Mac as an example, from the Kafka directory run:

  1. sh bin/zookeeper-server-start.sh config/zookeeper.properties (start ZooKeeper)
  2. sh bin/kafka-server-start.sh config/server.properties (start the Kafka broker)

Producer

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties properties = new Properties();
    properties.put("bootstrap.servers", "127.0.0.1:9092");
    properties.put("client.id", "DemoProducer");
    properties.put("acks", "0");
    properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    KafkaProducer<String, String> producer = null;
    try {
        producer = new KafkaProducer<String, String>(properties);
        for (int i = 0; i < 100; i++) {
            // Null key; the value is the loop index, sent to the "Message" topic.
            producer.send(new ProducerRecord<String, String>("Message", null, i + ""));
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (producer != null) {
            producer.close(); // flush buffered records and release resources
        }
    }
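
Note that send() is asynchronous and the loop above is fire-and-forget. To observe where each record actually lands, you can pass a Callback to send(); a minimal sketch using the same producer as above:

    producer.send(new ProducerRecord<String, String>("Message", null, "hello"),
            (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace(); // the send failed
                } else {
                    System.out.printf("partition = %d, offset = %d%n",
                            metadata.partition(), metadata.offset());
                }
            });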

Consumer

    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    Properties properties = new Properties();
    properties.put("bootstrap.servers", "127.0.0.1:9092");
    properties.put("enable.auto.commit", "true");
    properties.put("auto.commit.interval.ms", "1000");
    properties.put("session.timeout.ms", "30000");
    properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    properties.put("group.id", "DemoProducer");

    KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
    kafkaConsumer.subscribe(Arrays.asList("Message"));

    while (true) {
        // poll() blocks for up to 100 ms waiting for new records.
        ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("offset = %d, partition = %s, value = %s%n",
                    record.offset(), record.partition(), record.value());
            // kafkaConsumer.commitSync(); // manual commit (with enable.auto.commit=false)
        }
    }

Configuration Notes and Common Issues

Producer

client.id: an ID string passed to the server with each request, used to identify this client in server-side request logging.
acks: how message durability is acknowledged. "0" means the producer does not wait for any acknowledgment, "1" waits for the partition leader to write the record, and "all" waits for the full set of in-sync replicas.
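
For example, a minimal sketch of a more durable producer configuration; the property names are standard Kafka client settings, and the retry count here is an arbitrary illustrative value:

    Properties durable = new Properties();
    durable.put("bootstrap.servers", "127.0.0.1:9092");
    durable.put("acks", "all");  // wait for all in-sync replicas to acknowledge
    durable.put("retries", "3"); // retry transient send failures
    durable.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    durable.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");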

Consumer

enable.auto.commit: whether offsets are committed automatically; once an offset is committed, the messages before it are not consumed again.
auto.commit.interval.ms: how often offsets are auto-committed, in milliseconds.
session.timeout.ms: heartbeat session timeout; if the broker hears no heartbeat within this window, the consumer is considered dead and its partitions are rebalanced.
group.id: the consumer group this consumer belongs to.
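
If you want offsets committed only after records are fully processed (at-least-once semantics), disable auto-commit and call commitSync() yourself. A minimal sketch, assuming the same topic and properties as above; process() is a hypothetical handler:

    properties.put("enable.auto.commit", "false"); // take over offset management

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
    consumer.subscribe(Arrays.asList("Message"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            process(record); // hypothetical handler; replace with real logic
        }
        if (!records.isEmpty()) {
            consumer.commitSync(); // commit only after every polled record is handled
        }
    }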

1. With one producer and multiple consumers, how is consumption balanced?
By default a topic has only one partition, and within a consumer group each partition is assigned to at most one consumer. So with the defaults, even if both consumers above are started, only one of them will actually receive data.
Solution: increase the partition count; kafka/bin provides a tool for this:

sh kafka-topics.sh --alter --zookeeper 127.0.0.1:2181 --topic Message --partitions 4
After the change, inspect the topic:
sh kafka-topics.sh --describe --zookeeper 127.0.0.1:2181 --topic Message

    Topic: Message	PartitionCount:4	ReplicationFactor:1	Configs:
	Topic: Message	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
	Topic: Message	Partition: 1	Leader: 0	Replicas: 0	Isr: 0
	Topic: Message	Partition: 2	Leader: 0	Replicas: 0	Isr: 0
	Topic: Message	Partition: 3	Leader: 0	Replicas: 0	Isr: 0

2. After increasing the partitions, why does data still land only on Partition 0?
Kafka's partitioning rule: if the producer specifies a key, the partition is chosen as hash(key) % PartitionCount. Note that the producer code above uses new ProducerRecord<String, String>("Message", null, i + ""), i.e. a null key.
When the key is null, the partitioner looks at the partition used by the previous write: if there is none, it writes to Partition 0; otherwise it moves on to the next partition, spreading writes evenly in round-robin fashion.
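
To route records deterministically, give each record a key; records with the same key always land on the same partition. A minimal sketch, assuming the four-partition topic and the producer configured earlier (the "user-" keys are made up for illustration):

    for (int i = 0; i < 100; i++) {
        String key = "user-" + (i % 4); // hash(key) % 4 selects the partition
        producer.send(new ProducerRecord<String, String>("Message", key, i + ""));
    }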

More Configuration

https://docs.confluent.io/current/installation/configuration/index.html

Reposted from blog.csdn.net/sinat_25926481/article/details/105269795