Distributed topic | If you want to join a big tech company, you have to know Kafka

Introduction

What is Kafka?

Kafka was originally developed by LinkedIn. It is a distributed, partitioned, multi-replica messaging system coordinated by ZooKeeper. Its biggest strength is that it can process large amounts of data in real time, which suits many scenarios: Hadoop-based batch processing systems, low-latency real-time systems, Storm/Spark streaming engines, web/nginx logs, access logs, messaging services, and so on. Kafka is written in Scala; LinkedIn contributed it to the Apache Foundation in 2010, and it went on to become a top-level open source project.

Where can Kafka be used?

  • Log collection: a company can use Kafka to collect logs from all of its services and expose them through a unified interface to various consumers, such as Hadoop, HBase, Solr, etc.

  • Messaging system: decoupling producers from consumers, buffering messages, and so on.

  • User activity tracking: Kafka is often used to record the activities of web or app users, such as browsing, searching, and clicking. These activity events are published by various servers to Kafka topics, and subscribers consume those topics for real-time monitoring and analysis, or load them into Hadoop or a data warehouse for offline analysis and mining.

  • Operational metrics: Kafka is also often used to record operational monitoring data, collecting metrics from various distributed applications and producing centralized feeds for operational needs such as alerting and reporting.

Kafka basic components

  • Broker

    A message-middleware processing node. A Kafka node is a broker, and one or more brokers form a Kafka cluster

  • Topic

    Kafka classifies messages by topic; each message published to a Kafka cluster must specify a topic

  • Producer

    A message producer: the client that sends messages to a broker

  • Consumer

    A message consumer: the client that reads messages from a broker

  • ConsumerGroup

    Each consumer belongs to a specific consumer group. A message can be consumed by multiple different consumer groups, but within a single consumer group only one consumer can consume it

  • Partition

    Physically, a topic can be divided into multiple partitions. Messages within each partition are ordered, and each partition can be assigned multiple replicas. Among the brokers holding those replicas, a leader is elected; the leader partition handles read and write requests and synchronizes the data to the follower replicas stored on other brokers. A minimal sketch of creating such a partitioned topic follows this list.
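
To make these components concrete, here is a minimal sketch, not from the original post, that creates such a topic with Kafka's Java AdminClient. The topic name lezai and the local broker address are assumptions borrowed from the examples later in this article.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions spread the topic across brokers; replication factor 1
            // means a single copy (no followers) -- fine for a local test only
            NewTopic topic = new NewTopic("lezai", 3, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}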

Kafka concepts that are difficult to understand

  • The message will not be deleted after consumption

    After a message is consumed, it still remains in the partition. So when is it deleted? That is determined by the configuration parameter log.retention.hours: if it is set to 10, the message is deleted after 10 hours. A sketch of adjusting retention per topic follows below.
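
As an illustration, here is a hedged sketch, not from the original post, that sets the per-topic counterpart of this setting, retention.ms, through the Java AdminClient. The topic name lezai and the local broker address are assumptions carried over from the examples in this article.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class RetentionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "lezai");
            // 10 hours, expressed in milliseconds (10 * 60 * 60 * 1000)
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", "36000000"), AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> configs =
                    Collections.singletonMap(topic, Collections.singletonList(setRetention));
            admin.incrementalAlterConfigs(configs).all().get();
        }
    }
}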

  • The relationship between topic, partition, and Broker

    A topic represents a set of business data. For example, if you need to process order data, you can create a topic for order messages, and all order messages will be sent to it;

    If order messages keep growing, the topic will get larger and larger, perhaps reaching terabytes, which cannot be stored on a single machine. We can therefore partition the topic, and those partitions will be spread across different machines. Splitting a topic into multiple partitions also improves consumption concurrency: as mentioned earlier, a partition can be consumed by only one consumer within a consumer group, so splitting into multiple partitions allows multiple consumers to consume in parallel;

    The broker is the easiest to understand: a machine running the Kafka process is a broker;

  • How Kafka supports the two traditional messaging modes: queue and publish/subscribe

    • Queue mode: put all consumers in the same consumer group, ensuring each message is consumed by only one consumer

    • Publish/subscribe mode: put consumers in different consumer groups, so that every consumer receives the same message

    Both modes rest on Kafka's consumption mechanism: a message sent by the producer is delivered to every consumer group subscribed to the topic, but within each consumer group only one consumer can consume it. A raw-client sketch of this follows.
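
As a minimal sketch, not from the original post (which demonstrates this with Spring later on), the plain Java consumer below selects between the two modes purely through group.id. The topic name lezai and the local broker address are assumptions.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GroupModeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        // Same group.id on every instance -> queue mode (each message consumed once);
        // a distinct group.id per instance -> publish/subscribe (everyone gets a copy)
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "groupA");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("lezai"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                records.forEach(r -> System.out.printf("partition=%d offset=%d value=%s%n",
                        r.partition(), r.offset(), r.value()));
            }
        }
    }
}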

  • How does Kafka ensure the sequential consumption of messages

    Kafka guarantees that messages in a partition are consumed by only one consumer in the consumer group, so to consume messages in order the producer must send them to the same partition; the sketch below does this with a message key.
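
A hedged sketch, not from the original post: with Kafka's default partitioner, records that share a key are hashed to the same partition, which preserves their relative order. The topic name orders and the broker address are illustrative assumptions.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class OrderedSendSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String orderId = "order-1001"; // same key -> same partition -> ordered
            producer.send(new ProducerRecord<>("orders", orderId, "CREATED"));
            producer.send(new ProducerRecord<>("orders", orderId, "PAID"));
            producer.send(new ProducerRecord<>("orders", orderId, "SHIPPED"));
        } // close() flushes any pending records
    }
}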

How to install Kafka with Docker

Installing Kafka requires that you first install ZooKeeper.

  • Install ZooKeeper
# Create the directories
mkdir -p ~/docker/zookeeper/conf
mkdir -p ~/docker/zookeeper/data
mkdir -p ~/docker/zookeeper/datalog
# --restart always: start the zookeeper container automatically when the Docker service starts
docker run -d --name zookeeper \
--restart always \
-p 2181:2181 -p 2888:2888 -p 3888:3888 \
-v ~/docker/zookeeper/conf:/conf \
-v ~/docker/zookeeper/data:/data \
-v ~/docker/zookeeper/datalog:/datalog \
zookeeper:3.4.14
  • Install Kafka
docker run -d --name kafka \
-p 9092:9092 \
-e KAFKA_BROKER_ID=0 \
-e KAFKA_ZOOKEEPER_CONNECT=zookeeper \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://${IP}:9092 \
-e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092 \
-v ~/docker/kafka/logs:/kafka \
--link zookeeper:zookeeper \
-t wurstmeister/kafka
  • Use Kafka's built-in console producer and consumer for testing

# Start the producer
docker exec -it kafka bash
# Create a topic
kafka-topics.sh --create --zookeeper zookeeper --topic lezai --partitions 3 --replication-factor 1
# Connect the producer to kafka
kafka-console-producer.sh --topic lezai --bootstrap-server 127.0.0.1:9092

# Start the consumer (in a second terminal)
docker exec -it kafka bash
# Connect the consumer to kafka
kafka-console-consumer.sh --topic lezai --bootstrap-server 127.0.0.1:9092 --from-beginning

# Now type something in the producer window and check whether the consumer window receives it
  • Common Kafka command-line operations
1. View detailed information about a topic
./kafka-topics.sh --zookeeper 127.0.0.1:2181 --describe --topic testKJ1

2. Increase the replicas of a topic
./kafka-reassign-partitions.sh --zookeeper 127.0.0.1:2181 --reassignment-json-file json/partitions-to-move.json --execute

3. Create a topic
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic testKJ1

4. Add partitions to a topic
./bin/kafka-topics.sh --zookeeper 127.0.0.1:2181 --alter --partitions 20 --topic testKJ1

5. Kafka console producer
./kafka-console-producer.sh --broker-list localhost:9092 --topic testKJ1

6. Kafka console consumer
./kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic testKJ1

7. Start the Kafka server
./kafka-server-start.sh -daemon ../config/server.properties

8. Take a broker offline
./kafka-run-class.sh kafka.admin.ShutdownBroker --zookeeper 127.0.0.1:2181 --broker #brokerId# --num.retries 3 --retry.interval.ms 60

9. Delete a topic
./kafka-run-class.sh kafka.admin.DeleteTopicCommand --topic testKJ1 --zookeeper 127.0.0.1:2181
./kafka-topics.sh --zookeeper localhost:2181 --delete --topic testKJ1

10. View the consumed offsets of a consumer group (on newer Kafka versions this tool has been removed; use kafka-consumer-groups.sh --describe instead)
./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper localhost:2181 --group test --topic testKJ1

Spring Boot integration with Kafka

  • Import the dependency
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
  • Configuration file
server:
  port: 8080
spring:
  kafka:
    bootstrap-servers: 127.0.0.1:9092
    producer: # producer settings
      retries: 3 # if set greater than 0, the client resends any record that failed to send
      batch-size: 16384
      buffer-memory: 33554432
      # serializers for the message key and the message body
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    consumer:
      group-id: mygroup
      enable-auto-commit: true
  • Add a message producer
    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @Test
    public void send() {
        // send to topic "lezai", partition 0, with message key "key"
        kafkaTemplate.send("lezai", 0, "key", "kafka test message");
        try {
            // send() is asynchronous; sleeping keeps the test JVM alive long enough
            Thread.sleep(1000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
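
Since kafkaTemplate.send() is asynchronous, a hedged alternative to the sleep above is to register a callback on the returned future. This is a sketch assuming spring-kafka 2.x, where send() returns a ListenableFuture (in spring-kafka 3.x it returns a CompletableFuture instead):

    kafkaTemplate.send("lezai", 0, "key", "kafka test message")
            .addCallback(
                    result -> System.out.println("sent: " + result.getRecordMetadata()),
                    ex -> System.err.println("send failed: " + ex.getMessage()));
    kafkaTemplate.flush(); // push any buffered records out immediately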
  • Add two consumers belonging to different consumer groups
    @KafkaListener(topics = "lezai",groupId = "testGroup")
    public void listen(ConsumerRecord<String, String> record) {
        String value = record.value();
        System.out.println("testGroup"+value);
        System.out.println(record);
    }


    @KafkaListener(topics = "lezai",groupId = "testGroup2")
    public void listen2(ConsumerRecord<String, String> record) {
        String value = record.value();
        System.out.println("testGroup2"+value);
        System.out.println(record);
    }
    
    // 可以切换为相同的groupId,来验证消息是否会被同一个消费组中的消费者消费

Search WeChat for [Le Zai open talk] and follow handsome me. Reply [Receive dry goods] and plenty of interview materials and must-read architect books will be waiting for you to choose from, covering Java basics, Java concurrency, microservices, middleware, and more.
