Kafka Learning (2): Basic Examples of the Admin and Producer APIs

Preface

The previous post briefly recorded the Kafka installation steps without introducing the concepts behind Kafka's components. Starting with this post, we get familiar with the relevant Kafka APIs, and a later post will build a small example on top of Kafka. This post first introduces Kafka's components, then walks through basic examples of the Admin and Producer APIs.

Kafka's components

After digging through a lot of material online without finding a particularly good summary, I went back to the official documentation. The official site introduces each concept in its own words, and to avoid misleading anyone, the original English text is quoted here directly for comparison.

event

An event records the fact that "something happened" in the world or in your business. It is also called record or message in the documentation. When you read or write data to Kafka, you do this in the form of events. Conceptually, an event has a key, value, timestamp, and optional metadata headers. Here's an example event:

Event key: "Alice"
Event value: "Made a payment of $200 to Bob"
Event timestamp: "Jun. 25, 2020 at 2:06 p.m."

Put simply, an event is the envelope of a message: it contains a key, a value, a timestamp, and optional metadata headers.
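To make that concrete, here is a minimal sketch of how the pieces of an event map onto a ProducerRecord in the Java client ("payments", the timestamp, and the header are illustrative values only, not from the original post):

// key + value + timestamp + optional header, wrapped in one record
ProducerRecord<String, String> event = new ProducerRecord<>(
        "payments",                        // topic (illustrative name)
        null,                              // partition: null lets the partitioner decide
        System.currentTimeMillis(),        // event timestamp
        "Alice",                           // event key
        "Made a payment of $200 to Bob");  // event value
// optional metadata header
event.headers().add("source", "web".getBytes(StandardCharsets.UTF_8));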

producer and consumer

Producers are those client applications that publish (write) events to Kafka, and consumers are those that subscribe to (read and process) these events. In Kafka, producers and consumers are fully decoupled and agnostic of each other, which is a key design element to achieve the high scalability that Kafka is known for. For example, producers never need to wait for consumers. Kafka provides various guarantees such as the ability to process events exactly-once.

These are the producer and the consumer; anyone who has worked with message middleware will find the concepts familiar. Kafka's high performance and availability come in part from this decoupled design of producers and consumers (admittedly not unique to Kafka), and Kafka also provides guarantees such as the ability to process each event (message) exactly once.
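As a hedged aside, the exactly-once ability mentioned above is opt-in on the producer side; with the 2.4.0 client used later in this post, the relevant knobs look roughly like this:

// idempotent producer: the broker de-duplicates retried sends of the same record
properties.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
// for end-to-end exactly-once across reads and writes, a transactional.id plus the
// transactions API (initTransactions / beginTransaction / commitTransaction) is also needed
// properties.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-tx-producer");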

topics

Events are organized and durably stored in topics. Very simplified, a topic is similar to a folder in a filesystem, and the events are the files in that folder. An example topic name could be "payments". Topics in Kafka are always multi-producer and multi-subscriber: a topic can have zero, one, or many producers that write events to it, as well as zero, one, or many consumers that subscribe to these events. Events in a topic can be read as often as needed—unlike traditional messaging systems, events are not deleted after consumption. Instead, you define for how long Kafka should retain your events through a per-topic configuration setting, after which old events will be discarded. Kafka's performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.

This concept is a bit more involved, but the official docs explain it plainly: topics are where events are durably stored. A topic is similar to a folder in a filesystem, and events are like the files in that folder. In Kafka, a topic is typically written to by multiple producers and subscribed to by multiple consumers. Unlike other message middleware, Kafka lets you read messages as often as needed, and messages are not deleted after being consumed. Instead, you configure a per-topic retention threshold, and Kafka automatically discards messages older than it. Keep in mind that a topic is a logical concept: on the actual Kafka servers there is no physically stored "topic" object as such.
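As an illustrative, hedged sketch of that retention setting: retention.ms is the per-topic property involved, and it could be changed with the same alterConfigs call shown later in this post (the adminClient() helper, the TOPIC_NAME constant, and the 7-day value are assumptions for the example):

// A minimal sketch, assuming the adminClient() helper and TOPIC_NAME constant defined later in this post
AdminClient adminClient = adminClient();
ConfigResource topicResource = new ConfigResource(ConfigResource.Type.TOPIC, TOPIC_NAME);
// keep events for 7 days (604800000 ms); older events become eligible for deletion
Config retention = new Config(Arrays.asList(new ConfigEntry("retention.ms", "604800000")));
adminClient.alterConfigs(Collections.singletonMap(topicResource, retention)).all().get();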

partitions

Topics are partitioned, meaning a topic is spread over a number of "buckets" located on different Kafka brokers. This distributed placement of your data is very important for scalability because it allows client applications to both read and write the data from/to many brokers at the same time. When a new event is published to a topic, it is actually appended to one of the topic's partitions. Events with the same event key (e.g., a customer or vehicle ID) are written to the same partition, and Kafka guarantees that any consumer of a given topic-partition will always read that partition's events in exactly the same order as they were written.

A topic is split into multiple partitions, and these partitions are distributed as "buckets" across different Kafka brokers. This partitioned placement of data matters for scalability, because it lets multiple client applications read and write data on many brokers at the same time. Messages that share the same key are written to the same partition, and Kafka guarantees that any consumer of a given topic partition reads its messages in exactly the order in which they were written.
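The "same key goes to the same partition" behavior comes from the default partitioner, which hashes the key bytes and takes the result modulo the partition count. A simplified sketch of that idea (not Kafka's exact implementation, which uses a murmur2 hash of the serialized key):

// Simplified illustration of keyed partition selection; the point is only that
// the same key always maps to the same partition index.
static int choosePartition(byte[] keyBytes, int numPartitions) {
    return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
}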

To make the relationship between topics and partitions clear, the official site provides the following figure.

[Figure: a topic's partitions spread across multiple brokers, from the Kafka documentation]

More detailed documentation on Kafka's design can be found on the official site.

Kafka provides five APIs in total, and these cover essentially every aspect of operating Kafka:

Name                Purpose
Admin API           Manage and inspect topics, brokers, and other Kafka objects
Producer API        Publish (write) events or streams of events to one or more topics
Consumer API        Subscribe to (read and process) messages and data streams from Kafka
Kafka Streams API   Stream-processing operations on data in Kafka
Kafka Connect API   Import and export data between Kafka and external systems such as databases

Preparation

To experiment with the Kafka APIs, it is enough to add the following dependencies:

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>cn.hutool</groupId>
        <artifactId>hutool-all</artifactId>
        <version>4.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams</artifactId>
        <version>2.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.12</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>28.2-jre</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.68</version>
    </dependency>
</dependencies>

The Admin API

Creating the admin client

public final static String LOCAL_KAFKA_ADDRESS = "127.0.0.1:9092";

/**
 * Create an AdminClient
 * @return the admin client
 */
public static AdminClient adminClient(){
    Properties properties = new Properties();
    // only the connection property bootstrap.servers needs to be configured
    properties.setProperty(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, LOCAL_KAFKA_ADDRESS);

    AdminClient adminClient = AdminClient.create(properties);
    return adminClient;
}
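One hedged usage note: AdminClient holds network connections, so real code should close it when finished. The examples below omit this for brevity, but a try-with-resources form would look roughly like this:

try (AdminClient adminClient = adminClient()) {
    // use the client here, e.g. adminClient.listTopics().names().get()
}   // close() is invoked automatically when the block exits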

Operating on topics with the admin client

Creating a topic

When creating a topic, the replication factor is specified via the third argument of the NewTopic constructor (the second argument is the number of partitions).

public static void createTopic(){
    AdminClient adminClient = AdminSimple.adminClient();
    // replication factor
    short rs = 1;
    NewTopic newTopic = new NewTopic(TOPIC_NAME, 1, rs);
    // the result holds a future per created topic; call all().get() to wait for completion
    CreateTopicsResult topicsResult = adminClient.createTopics(Arrays.asList(newTopic));
    log.info("result of creating the topic: {}", topicsResult);
}

Listing topics

Note that most of Kafka's admin operations follow a Future-style interface; if you want the result immediately, you call get(), which blocks until the result is available.

/**
 * List the topics
 */
public static void getTopicList() throws ExecutionException, InterruptedException {
    AdminClient adminClient = adminClient();
    ListTopicsResult listTopicsResult = adminClient.listTopics();
    Set<String> topicNames = listTopicsResult.names().get();
    log.info("topic names:{}", topicNames);

    // also list internal topics (topics used internally by Kafka)
    ListTopicsOptions options = new ListTopicsOptions();
    options.listInternal(true);
    ListTopicsResult optionsTopicResult = adminClient.listTopics(options);
    Collection<TopicListing> internalTopicList = optionsTopicResult.listings().get();
    log.info("internal topic info:{}", internalTopicList);
}

Deleting a topic

/**
 * Delete a topic
 */
public static void deleteTopics() throws ExecutionException, InterruptedException {
    AdminClient adminClient = adminClient();
    DeleteTopicsResult deleteTopicsResult = adminClient.deleteTopics(Arrays.asList(TOPIC_NAME));
    // block until the deletion has been acknowledged
    deleteTopicsResult.all().get();
}

Describing a topic

/**
 * View a topic's description
 */
public static void getTopicDescribeInfo() throws ExecutionException, InterruptedException {
    AdminClient adminClient = adminClient();
    DescribeTopicsResult describeTopicsResult = adminClient.describeTopics(Arrays.asList(TOPIC_NAME));
    Map<String, TopicDescription> stringTopicDescriptionMap = describeTopicsResult.all().get();
    log.info("topic description: {}", stringTopicDescriptionMap);
}

Viewing topic configuration

public static void getTopicConfigInfo() throws ExecutionException, InterruptedException {
    AdminClient adminClient = adminClient();
    ConfigResource configResource = new ConfigResource(ConfigResource.Type.TOPIC, TOPIC_NAME);
    DescribeConfigsResult describeConfigsResult = adminClient.describeConfigs(Arrays.asList(configResource));
    Map<ConfigResource, Config> configResourceConfigMap = describeConfigsResult.all().get();
    log.info("topic configuration: {}", configResourceConfigMap);
}

Modifying topic configuration

public static void modifyTopicConfigInfo() throws ExecutionException, InterruptedException {
    AdminClient adminClient = adminClient();
    Map<ConfigResource, Config> configMaps = new HashMap<>();
    ConfigResource configResource = new ConfigResource(ConfigResource.Type.TOPIC, TOPIC_NAME);
    Config config = new Config(Arrays.asList(new ConfigEntry("preallocate", "true")));
    configMaps.put(configResource, config);
    AlterConfigsResult alterConfigsResult = adminClient.alterConfigs(configMaps);
    alterConfigsResult.all().get();
}
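A hedged side note: alterConfigs overwrites the topic's whole configuration and is marked deprecated in newer clients; since Kafka 2.3 there is an incremental variant that changes only the named entries. A minimal sketch with the 2.4.0 client used here:

ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, TOPIC_NAME);
// set a single entry without touching the rest of the topic's configuration
AlterConfigOp setPreallocate =
        new AlterConfigOp(new ConfigEntry("preallocate", "true"), AlterConfigOp.OpType.SET);
Map<ConfigResource, Collection<AlterConfigOp>> ops =
        Collections.singletonMap(resource, Arrays.asList(setPreallocate));
adminClient.incrementalAlterConfigs(ops).all().get();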

Increasing the number of partitions in a topic

Note that partitions of a topic cannot be removed, short of deleting Kafka's storage files directly; the partition count can only be increased.

/**
 * Increase the number of partitions (the partition count can only grow)
 */
public static void incrPartitions() throws ExecutionException, InterruptedException {
    AdminClient adminClient = adminClient();
    Map<String, NewPartitions> partitionsMap = new HashMap<>();
    NewPartitions newPartitions = NewPartitions.increaseTo(2);
    partitionsMap.put(TOPIC_NAME, newPartitions);
    CreatePartitionsResult createPartitionsResult = adminClient.createPartitions(partitionsMap);
    createPartitionsResult.all().get();
}

The Producer API

Sending messages

As with the Admin API, configuration is supplied through a Properties object.

Asynchronous send

/**
 * Kafka asynchronous (fire-and-forget) send example
 */
public static void producerAsyncSend(){
    Properties properties = new Properties();

    properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, LOCAL_KAFKA_ADDRESS);
    properties.put(ProducerConfig.ACKS_CONFIG, "all");   // acknowledgment mode for sent messages
    properties.put(ProducerConfig.RETRIES_CONFIG, "0");
    properties.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");
    properties.put(ProducerConfig.LINGER_MS_CONFIG, "1");
    properties.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432");

    // serializer for the key
    properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    // serializer for the value
    properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");

    // the producer itself
    Producer<String, String> producer = new KafkaProducer<>(properties);

    for (int i = 0; i < 10; i++) {
        ProducerRecord<String, String> msgRecord = new ProducerRecord<>(TOPIC_NAME, "msgKey-" + i, "msgValue-" + i);
        producer.send(msgRecord);
    }
    // every opened producer needs to be closed
    producer.close();
}

Here ten messages are sent to Kafka; once the producer has handed them off, nothing further is done with the result (fire-and-forget).

Synchronous (blocking) send

Compared with the asynchronous send, the only extra step is calling get() on the returned Future to retrieve the acknowledgment for each sent message.

/**
 * Producer synchronous (blocking) send
 */
public static void producerSyncSend() throws ExecutionException, InterruptedException {
    Properties properties = new Properties();

    properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, LOCAL_KAFKA_ADDRESS);
    properties.put(ProducerConfig.ACKS_CONFIG, "all");   // acknowledgment mode for sent messages
    properties.put(ProducerConfig.RETRIES_CONFIG, "0");
    properties.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");
    properties.put(ProducerConfig.LINGER_MS_CONFIG, "1");
    properties.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432");

    properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");

    // the producer itself
    Producer<String, String> producer = new KafkaProducer<>(properties);

    for (int i = 0; i < 10; i++) {
        String keyStr = "msgKey-" + i;
        ProducerRecord<String, String> msgRecord = new ProducerRecord<>(TOPIC_NAME, keyStr, "msgValue-" + i);

        Future<RecordMetadata> send = producer.send(msgRecord);
        RecordMetadata recordMetadata = send.get();   // blocks until the broker acknowledges
        System.out.println(keyStr + " - partition : " + recordMetadata.partition() + " , offset : " + recordMetadata.offset());
    }
    // every opened producer needs to be closed
    producer.close();
}

Asynchronous send with a callback

/**
 * Send with a callback
 */
public static void producerSendWithCallBack(){
    Properties properties = new Properties();
    properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, LOCAL_KAFKA_ADDRESS);
    properties.put(ProducerConfig.ACKS_CONFIG, "all");
    properties.put(ProducerConfig.RETRIES_CONFIG, "0");
    properties.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");
    properties.put(ProducerConfig.LINGER_MS_CONFIG, "1");
    properties.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432");

    properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");

    // the producer itself
    Producer<String, String> producer = new KafkaProducer<>(properties);

    // message object - ProducerRecord
    for (int i = 0; i < 10; i++) {
        ProducerRecord<String, String> record =
                new ProducerRecord<>(TOPIC_NAME, "key-" + i, "value-" + i);
        // the callback runs once the send completes (e is non-null on failure)
        producer.send(record, new Callback() {
            @Override
            public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                System.out.println(
                        "partition : " + recordMetadata.partition() + " , offset : " + recordMetadata.offset());
            }
        });
    }

    // every opened producer needs to be closed
    producer.close();
}

Custom partitioner

With a custom partitioner you can route messages to different partitions according to your own business rules.

The custom partitioner

/**
 * author: liman
 * createtime: 2021/3/7
 * comment: routes messages by whether the numeric suffix of the key is even or odd;
 *          the int returned by partition() is the index of the partition the message goes to
 */
@Slf4j
public class SelfPartition implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        // assumes keys of the form "key-N", as produced in the example below
        String keyStr = key + "";
        String keyInt = keyStr.substring(4);
        log.info("keyStr:{},keyInt:{}", keyStr, keyInt);

        int i = Integer.parseInt(keyInt);
        // returns 0 or 1, so the topic is assumed to have at least 2 partitions
        return i % 2;
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}

Sending messages through the custom partitioner

/**
 * Producer send using the custom partitioner
 */
public static void producerSendWithCallBackAndPartition(){
    Properties properties = new Properties();
    properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, LOCAL_KAFKA_ADDRESS);
    properties.put(ProducerConfig.ACKS_CONFIG, "all");
    properties.put(ProducerConfig.RETRIES_CONFIG, "0");
    properties.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");
    properties.put(ProducerConfig.LINGER_MS_CONFIG, "1");
    properties.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432");

    properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    // register our custom partitioner
    properties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.learn.kafka.client.producer.SelfPartition");

    // the producer itself
    Producer<String, String> producer = new KafkaProducer<>(properties);

    // message object - ProducerRecord
    for (int i = 0; i < 10; i++) {
        ProducerRecord<String, String> record =
                new ProducerRecord<>(TOPIC_NAME, "key-" + i, "value-" + i);

        producer.send(record, new Callback() {
            @Override
            public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                System.out.println(
                        "partition : " + recordMetadata.partition() + " , offset : " + recordMetadata.offset());
            }
        });
    }

    // every opened producer needs to be closed
    producer.close();
}

A few details about the producer

1. The message partitioner is loaded when the KafkaProducer is initialized.

2. The KafkaProducer object is thread-safe (see the sketch after this list).

3. The KafkaProducer does not send messages to the broker one at a time; it accumulates them and sends them in batches.
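A minimal sketch of the thread-safety point, assuming the same properties object configured in the earlier examples: a single KafkaProducer instance can be shared by several threads, and the client batches the resulting records internally.

// one shared, thread-safe producer for the whole application (properties as configured above)
Producer<String, String> sharedProducer = new KafkaProducer<>(properties);
ExecutorService pool = Executors.newFixedThreadPool(4);
for (int i = 0; i < 4; i++) {
    final int id = i;
    pool.submit(() -> sharedProducer.send(new ProducerRecord<>(TOPIC_NAME, "key-" + id, "value-" + id)));
}
pool.shutdown();
// close the shared producer only when the application shuts down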

Beyond these three details, the most important point is the producer's acknowledgment mechanism, which is controlled by the acks parameter (ProducerConfig.ACKS_CONFIG in the code above):

acks value              Effect
0                       The producer receives no acknowledgment from Kafka at all. There is no guarantee the server received the message; the producer treats the send as finished as soon as the record is placed in its buffer.
1                       The leader writes the record to its local log and returns an acknowledgment to the producer immediately, without waiting for the follower brokers to confirm receipt.
all (equivalent to -1)  The strongest level: the producer only marks a message as successfully sent after all in-sync replicas have acknowledged it.
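One hedged supplement to the table: acks=all only delivers its full guarantee together with the topic/broker setting min.insync.replicas, which defines how many replicas must be in sync before an acks=all write succeeds. On the producer side that amounts to something like:

properties.put(ProducerConfig.ACKS_CONFIG, "all");   // wait for all in-sync replicas
properties.put(ProducerConfig.RETRIES_CONFIG, "3");  // retry transient failures instead of dropping records
// on the topic/broker side, min.insync.replicas=2 (for example) makes acks=all require
// at least two in-sync copies before the write is acknowledged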

Summary

These were simple API operations, mostly usage-oriented, so there is not much more to summarize.


Reposted from blog.csdn.net/liman65727/article/details/114792309