Kafka basics and simple Spring Boot integration

Definition:

  • Kafka is a distributed, publish/subscribe-based message queue
  • It is an open-source distributed event streaming platform commonly used for data pipelines, stream analytics, data integration, and mission-critical applications

Consumption patterns:

  • Point-to-point mode (less used):
    consumers actively pull data, and a message is removed from the queue once it has been received
  • Publish/subscribe model:
    producers push messages to topics, and consumers subscribe to the messages they need

Basic concepts:

  • Producer: message producer
  • Consumer: message consumer
  • Consumer Group (CG): consumers with the same consumer group ID form a consumer group; a consumer always consumes on behalf of a consumer group
  • Broker: a Kafka server
  • Topic: message topic, used to classify data
  • Partition: a topic consists of multiple partitions
  • Replica: each partition has multiple replicas
  • Leader: replicas are divided into a leader and followers; production and consumption go through the leader only

Producer sending process:

  • producer -> send(ProducerRecord) -> interceptors -> Serializer -> Partitioner
  • When the accumulated data reaches batch.size, the sender sends the batch; the default is 16 KB
  • If the data does not reach batch.size, the sender waits linger.ms (in milliseconds) and then sends; the default is 0 ms, meaning no delay
  • compression.type: how the data is compressed
  • RecordAccumulator: buffer size, 32 MB by default
  • Acknowledgement mode (acks), see the config sketch after this list:
    • 0: the producer does not wait for any response after sending data
    • 1: the leader responds after it has received the data
    • -1 (all): the leader responds only after all in-sync replicas have received the data
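
A minimal producer configuration sketch tying these parameters together (the broker address localhost:9092 and topic first are placeholders; the values mirror the defaults above except linger.ms and compression.type, which are set explicitly for illustration):

Properties properties = new Properties();
properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
//batch.size: 16 KB (default)
properties.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
//linger.ms: wait up to 10 ms for a batch to fill (default 0 = no delay)
properties.put(ProducerConfig.LINGER_MS_CONFIG, 10);
//compression.type: compress batches before sending
properties.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
//RecordAccumulator buffer: 32 MB (default)
properties.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);
//acks: "all" (-1) waits for all in-sync replicas
properties.put(ProducerConfig.ACKS_CONFIG, "all");

KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);
kafkaProducer.send(new ProducerRecord<>("first", "hello kafka"));
kafkaProducer.close();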

General consumption logic: (original diagram omitted; see the consumption process below)

Consumer Group (CG):

  • Consumers with the same group.id form a consumer group
  • Each consumer in the group consumes data from different partitions; a partition can only be consumed by one consumer within a group
  • Consumer groups do not affect each other
  • When the number of consumers in a group is greater than the number of partitions, some consumers will be idle
  • Coordinator: assists with consumer group initialization and partition assignment
    • Every broker has a coordinator. The coordinator node is chosen by groupid % 50 (50 being the number of partitions of __consumer_offsets)
    • For example, 1 % 50 = 1: the broker holding partition 1 of __consumer_offsets acts as the coordinator for that group
    • The coordinator randomly selects one consumer in the group to be the leader. The leader draws up a consumption (partition assignment) plan and returns it to the coordinator, which then distributes the plan to the other consumers
    • Consumers send heartbeats to the coordinator every 3 seconds; after a 45-second timeout the consumer is removed and a rebalance is triggered
    • If a consumer takes too long to process messages (more than 5 minutes by default), it is removed and a rebalance is triggered (see the config sketch after this list)
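
These timeouts are ordinary consumer settings; a small sketch of setting them explicitly on the same Properties object used in the consumer code further below (the values are simply the defaults described above):

//heartbeat to the coordinator every 3 s
properties.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 3000);
//remove the consumer and rebalance after 45 s without a heartbeat
properties.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 45000);
//remove the consumer and rebalance if processing a poll takes longer than 5 minutes
properties.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);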

Consumption process:

  • A consumer network client (ConsumerNetworkClient) is created to interact with Kafka
  • Fetch request initialization: a minimum fetch size per batch, a 500 ms timeout if that minimum is not reached, and an upper limit on the amount of data fetched (see the config sketch below)
  • The fetch request is sent; the onSuccess() callback pulls the data and puts it into an internal message queue in batches
  • The consumer takes data from that queue in batches (500 records by default) -> deserialization -> interceptors -> data processing
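
The fetch parameters above are also ordinary consumer settings; a small sketch with the default values (properties is the same consumer Properties object used in the code further below):

//minimum amount of data per fetch (1 byte by default)
properties.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1);
//wait at most 500 ms for that minimum to accumulate
properties.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
//upper limit on the data returned per fetch (50 MB by default)
properties.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 52428800);
//maximum number of records handed to the application per poll (500 by default)
properties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);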

Consumption plan (partition assignment strategy), default Range + CooperativeSticky (a configuration sketch follows this list):

  • Range: for each topic, the topic's partitions and the consumers are sorted; each consumer gets (number of partitions / number of consumers) partitions, and the remainder goes to the first consumers, so earlier consumers may consume more. This easily leads to data skew
  • RoundRobin: polling strategy across all topics; it lists all topic partitions and all consumers, sorts them by hashcode, and assigns partitions to consumers in round-robin order
  • Sticky: when a new assignment is performed, it tries to stay as close as possible to the previous assignment while keeping the distribution as even as possible; within those constraints partitions are assigned to consumers randomly
  • CooperativeSticky: the same strategy as Sticky, but it supports cooperative (incremental) rebalancing, so consumers can keep consuming from partitions that have not been reassigned
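
The assignment strategy is a consumer-side setting; a sketch of overriding it (the assignor class names come from the org.apache.kafka.clients.consumer package):

//e.g. use only the cooperative sticky assignor instead of the default Range + CooperativeSticky pair
properties.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");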

Offset: marks the consumption position

  • Before 0.9: offsets were maintained in ZooKeeper
  • From 0.9 onward: offsets are maintained in a built-in topic: __consumer_offsets
  • Data is stored as key-value pairs; key: groupid + topic + partition number
  • Automatic offset commit: enabled by default (enable.auto.commit=true); offsets are committed every 5 seconds by default
  • Manual offset commit: the offset is committed by hand while consuming
    • Synchronous: wait until the offset commit succeeds before consuming the next batch (see the commitSync sketch after the code below)
    • Asynchronous: do not wait, keep consuming; there is no retry mechanism on failure
  • Where to start consuming (auto.offset.reset):
    • earliest: automatically reset the offset to the earliest offset (--from-beginning)
    • latest (default): automatically reset the offset to the latest offset
    • none: throw an exception to the consumer if no previous offset is found for the consumer group
//Enable automatic offset commit
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
//Auto-commit interval: 5 s
properties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000");
//Or commit offsets manually instead
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
//Topics to subscribe to
ArrayList<String> topics = new ArrayList<>();
topics.add("first");
//Subscribe
kafkaConsumer.subscribe(topics);
while (true) {
    ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(Duration.ofSeconds(1));
    if (!consumerRecords.isEmpty()) {
        for (ConsumerRecord<String, String> record : consumerRecords) {
            System.out.println(record);
        }
    }
    //Commit offsets manually (asynchronously)
    kafkaConsumer.commitAsync();
}
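
The loop above commits asynchronously; the synchronous variant mentioned earlier simply blocks until the commit succeeds. A sketch reusing the same kafkaConsumer:

//synchronous commit: blocks and retries until the offsets are committed or an unrecoverable error occurs
kafkaConsumer.commitSync();
//asynchronous commit: returns immediately, no retry on failure
kafkaConsumer.commitAsync();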

Consuming from a specified point in time:

//Get the partitions currently assigned to this consumer
Set<TopicPartition> partitions = kafkaConsumer.assignment();

//Wait until the partition assignment has been completed
while (partitions.size() == 0) {
    kafkaConsumer.poll(Duration.ofSeconds(1));
    partitions = kafkaConsumer.assignment();
}
//Map each partition to the target timestamp (one day ago)
Map<TopicPartition, Long> timestampMap = new HashMap<>(6);
for (TopicPartition topicPartition : partitions) {
    timestampMap.put(topicPartition, System.currentTimeMillis() - 24 * 3600 * 1000L);
}
//Convert the timestamps into the corresponding offsets
Map<TopicPartition, OffsetAndTimestamp> offsetsForTimeMap = kafkaConsumer.offsetsForTimes(timestampMap);
for (TopicPartition partition : partitions) {
    OffsetAndTimestamp offsetAndTimestamp = offsetsForTimeMap.get(partition);
    if (offsetAndTimestamp != null) {
        kafkaConsumer.seek(partition, offsetAndTimestamp.offset());
    }
}

Kafka file storage mechanism:

  • A topic is a logical concept while a partition is a physical concept; each partition corresponds to one log file, which stores the data produced by producers
  • Data produced by producers is continuously appended to the end of the log file. To prevent the log file from growing too large and making data lookup inefficient, Kafka uses a sharding and indexing mechanism
  • Each partition is divided into multiple segments; each segment contains .index, .log, .timeindex and .snapshot files
  • These files live in one directory, named topic name + partition number, e.g. first-0
  • Sparse index: roughly every 4 KB of data written to the log file, one entry is written to the index file
    • The offset stored in the index file is a relative offset, which keeps the stored offset values small and of a fixed size

File cleanup and compaction strategies:

  • Kafka's default log retention time is 7 days
  • Compaction strategy: compact; for values with the same key, only the latest version is retained (see the AdminClient sketch below)
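
Both retention and compaction can also be set per topic. A sketch using the Java AdminClient (the broker address and topic name first are placeholders; exception handling is omitted):

Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
try (AdminClient adminClient = AdminClient.create(adminProps)) {
    ConfigResource topicResource = new ConfigResource(ConfigResource.Type.TOPIC, "first");
    Map<ConfigResource, Collection<AlterConfigOp>> updates = new HashMap<>();
    updates.put(topicResource, Arrays.asList(
            //keep only the latest value for each key
            new AlterConfigOp(new ConfigEntry("cleanup.policy", "compact"), AlterConfigOp.OpType.SET),
            //retain data for 7 days (in milliseconds)
            new AlterConfigOp(new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET)));
    adminClient.incrementalAlterConfigs(updates).all().get();
}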

Kafka efficient reading and writing:

  • Kafka itself is a distributed cluster, so it can use partitioning and achieve a high degree of parallelism
  • Sparse indexes are used when reading data, so the data to be consumed can be located quickly
  • Sequential disk writes: the data produced by the producer is written to the log file by appending to the end of the file (sequential writes)
  • Zero copy: data processing is done by Kafka producers and consumers; the Kafka broker's application layer does not touch the stored data, so data does not need to pass through the application layer and transfer efficiency is high
  • Page Cache: Kafka relies heavily on the PageCache provided by the underlying operating system. On a write, the operating system simply writes the data to PageCache; on a read, PageCache is checked first and the disk is read only on a miss. In effect, PageCache uses as much free memory as possible as a disk cache

Commonly used scripts:

  • Topic-related commands:
  • List topics: sh kafka-topics.sh --bootstrap-server localhost:9092 --list
  • Create a topic (name: first, 1 partition, 3 replicas). The number of replicas cannot exceed the number of brokers in the cluster.
    • sh kafka-topics.sh --bootstrap-server localhost:9092 --topic first --create --partitions 1 --replication-factor 3
  • Describe a topic
    • sh kafka-topics.sh --bootstrap-server localhost:9092 --topic first --describe
  • Change the number of partitions of a topic (can only be increased)
    • sh kafka-topics.sh --bootstrap-server localhost:9092 --topic first --alter --partitions 3
  • Produce messages:
    • sh kafka-console-producer.sh --bootstrap-server localhost:9092 --topic first
  • Consume messages (add --from-beginning to read from the start):
    • sh kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic first
    • sh kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic first --from-beginning

Spring Boot simple integration:

<dependency>
      <groupId>org.springframework.kafka</groupId>
      <artifactId>spring-kafka</artifactId>
</dependency>
server:
  port: 8200

spring:
  mvc:
    pathmatch:
      matching-strategy: ant_path_matcher
  application:
    name: @artifactId@
  kafka:
    bootstrap-servers:
      - 192.168.1.250:32010
    # Producer configuration
    producer:
      # Serialization
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
      properties:
        linger.ms: 10 # how long the sender waits before sending a batch
        # SASL authentication settings (optional)
#        sasl.mechanism: PLAIN
#        security.protocol: SASL_PLAINTEXT
#        sasl.jaas.config: org.apache.kafka.common.security.plain.PlainLoginModule required username="admin" password="admin";
      # Buffer size: 32 MB
      buffer-memory: 33554432
      # Batch size: 16 KB
      batch-size: 16384
      # acks=-1: wait for all in-sync replicas (ISR)
      #acks: -1
      # Transaction ID prefix; used together with @Transactional to make multiple sends atomic
      #transaction-id-prefix: "transaction-id-xx"
    # Consumer configuration
    consumer:
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      #group-id: xiaoshu-1
      enable-auto-commit: false
      # Start from the earliest message; once consumed, the offset is recorded, so the same group-id will not consume it again
      # offsets are tracked per consumer group
      auto-offset-reset: earliest
      # Batch consumption: maximum number of records per poll
      #max-poll-records: 50
      # SASL authentication settings (optional)
#      properties:
#        sasl.mechanism: PLAIN
#        security.protocol: SASL_PLAINTEXT
#        sasl.jaas.config: org.apache.kafka.common.security.plain.PlainLoginModule required username="admin" password="admin";

    listener:
      # Commit immediately after Acknowledgment.acknowledge() is called manually
      ack-mode: manual
      # Batch consumption, used together with @KafkaListener batch="true"
      #type: batch

Production:

    @Resource
    private KafkaTemplate<String, String> kafkaTemplate;

    // With @Transactional(rollbackFor = RuntimeException.class) and the acks/transaction-id-prefix
    // configuration, multiple sends become atomic (see the sketch after this method)
    @ApiOperation(value = "Push a message to Kafka")
    @GetMapping("/sendMsg")
    public String sendMsg(String topic, String msg) {
        kafkaTemplate.send(topic, msg).addCallback(success -> {
            if (success == null) {
                System.out.println("Message sending failed");
                return;
            }
            // Topic the message was sent to
            String topicName = success.getRecordMetadata().topic();
            // Partition the message was sent to
            int partition = success.getRecordMetadata().partition();
            // Offset of the message within the partition
            long offset = success.getRecordMetadata().offset();
            System.out.println("Message sent successfully: " + topicName + "-" + partition + "-" + offset);
        }, failure -> {
            System.out.println("Message sending failed: " + failure.getMessage());
        });
        return "ok";
    }
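
The commented transaction-id-prefix in the YAML above, combined with @Transactional or KafkaTemplate.executeInTransaction, makes a group of sends atomic. A minimal sketch (the topic name first is a placeholder, and it assumes the transaction-id-prefix is actually enabled):

//both records are committed together; an exception inside the callback aborts the transaction
kafkaTemplate.executeInTransaction(operations -> {
    operations.send("first", "message-1");
    operations.send("first", "message-2");
    return true;
});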

Consumption:

@Slf4j
@Configuration
public class KafkaConsumer {

    private static final String TOPIC_DLT = ".DLT";

    @Autowired
    private KafkaTemplate<String, Object> kafkaTemplate;

    /**
     * Each partition is consumed by one consumer of the group; each consumer is independent.
     * Partition -> consumer: 2 partitions, 2 listeners.
     * @param record
     * @param consumer
     */
    @KafkaListener(groupId = "group-1", topicPartitions = {
            @TopicPartition(topic = "four", partitions = {"0"})}, batch = "false")
    public void consumerTopic1(ConsumerRecord<String, String> record, Consumer consumer) {
        String value = record.value();
        String topic1 = record.topic();
        long offset = record.offset();
        int partition = record.partition();
        try {
            log.info("Received message: " + value + " topic: " + topic1 + " offset: " + offset + " partition: " + partition);
            //TODO on exception, push to the corresponding dead-letter topic ↓
            //int i = 1 / 0;
        } catch (Exception e) {
            System.out.println("commit failed");
            kafkaTemplate.send(topic1 + TOPIC_DLT, value);
        } finally {
            consumer.commitAsync();
        }
    }

    @KafkaListener(groupId = "group-1", topicPartitions = {
            @TopicPartition(topic = "four", partitions = {"1"})}, batch = "false")
    public void consumerTopic2(ConsumerRecord<String, String> record, Consumer consumer) {
        String value = record.value();
        String topic1 = record.topic();
        long offset = record.offset();
        int partition = record.partition();
        try {
            log.info("Received message: " + value + " topic: " + topic1 + " offset: " + offset + " partition: " + partition);
            //TODO on exception, push to the corresponding dead-letter topic ↓
            //int i = 1 / 0;
        } catch (Exception e) {
            System.out.println("commit failed");
            kafkaTemplate.send(topic1 + TOPIC_DLT, value);
        } finally {
            consumer.commitAsync();
        }
    }

}

    /**
     * Listen to topic1 and forward the result to topic2
     */
    @KafkaListener(topics = {"topic1"}, groupId = "group-4")
    @SendTo("topic2")
    public String onMessage7(ConsumerRecord<?, ?> record) {
        return record.value() + "-forwarded message";
    }

    @KafkaListener(topics = {"topic2"}, groupId = "group-5")
    public void onMessage8(ConsumerRecord<?, ?> record) {
        System.out.println("Received forwarded message: " + record.value());
    }


Origin blog.csdn.net/hesqlplus730/article/details/129614091