[Kafka] Chapter 1

1. Course content

======> Course Learning

2. Introduction to Kafka

Traditional definition of Kafka: Kafka is a distributed message queue (Message Queue) based on the publish/subscribe model, which is mainly used in the field of real-time processing of big data.

======> What is distributed

Put simply, "distributed" means splitting a big problem into multiple smaller problems, solving each one separately, and then combining the results. A software system in which multiple systems cooperate in this way to complete a specific task is what we call a distributed system.

Publish/Subscribe: The publisher of a message does not send it directly to specific subscribers. Instead, published messages are divided into categories, and each subscriber receives only the categories it is interested in.


The latest definition of Kafka: Kafka is an open source distributed event streaming platform (Event Streaming Platform) used by thousands of companies for high-performance data pipelines, streaming analytics, data integration and mission-critical applications.

2.1 Common message queues

At present, the more common message queue products in enterprises are Kafka, ActiveMQ, RabbitMQ and RocketMQ.
In big data scenarios, Kafka is the main choice of message queue. In JavaEE development, ActiveMQ, RabbitMQ and RocketMQ are mainly used.

The main application scenarios of traditional message queues include buffering/peak shaving, decoupling and asynchronous communication.

2.2 Application scenarios of message queues

2.2.1 Decoupling

Decoupling: Allows you to extend or modify the processes on both sides independently, as long as they comply with the same interface constraints.

2.2.2 Asynchronous communication

Supplement: How to use asynchronous calls in Java

2.3 Two modes of message queues

  • Point-to-point mode
  • Publish/subscribe model

2.3.1 Point-to-point mode

2.3.2 Publish/subscribe model

2.4 Kafka infrastructure

(1) Producer: the message producer, i.e. the client that sends messages to the Kafka broker.
(2) Consumer: the message consumer, i.e. the client that fetches messages from the Kafka broker.
(3) Consumer Group (CG): a consumer group consists of multiple consumers. Each consumer in a group is responsible for consuming data from different partitions; within one group, a partition can be consumed by only one consumer. Consumer groups do not affect each other. Every consumer belongs to some consumer group, so the consumer group is the logical subscriber.
(4) Broker: a Kafka server is a broker. A cluster consists of multiple brokers, and a broker can hold multiple topics.
(5) Topic: can be understood as a queue; producers and consumers both work against a topic.
(6) Partition: for scalability, a very large topic can be spread across multiple brokers (i.e. servers). A topic can be divided into multiple partitions, and each partition is an ordered queue.
(7) Replica: a copy of a partition. Each partition of a topic has several replicas: one Leader and several Followers.
(8) Leader: the "master" among the replicas of a partition. Producers send data to the Leader, and consumers consume data from the Leader.
(9) Follower: a "slave" among the replicas of a partition. It continuously synchronizes data from the Leader to stay in sync with it. When the Leader fails, one of the Followers becomes the new Leader.
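The consumer group rule in (3) can be pictured with a toy model. The sketch below hands each partition to exactly one consumer of a group in round-robin order. It is only an illustration: assign_partitions and the consumer names are made up, and Kafka's real assignment strategies are more involved.

```bash
#!/bin/bash
# Toy model: give every partition of a topic to exactly one consumer
# in the group, round-robin. Not Kafka's actual assignor.
assign_partitions() {
    local partitions=$1; shift
    [ $# -gt 0 ] || return 1          # a group needs at least one consumer
    local consumers=("$@")
    local p
    for ((p = 0; p < partitions; p++)); do
        echo "partition-$p -> ${consumers[p % ${#consumers[@]}]}"
    done
}

assign_partitions 3 consumerA consumerB
# partition-0 -> consumerA
# partition-1 -> consumerB
# partition-2 -> consumerA
```

No partition appears twice in the output, which is the point: inside one group, a partition has a single owner. A second group would get its own, independent assignment.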

3. Kafka Quick Start

======> Kafka download

Unzip the Kafka tgz package to /opt/module, and then rename the extracted directory.
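A minimal sketch of these two steps follows. The helper name install_kafka is made up, the archive file name in the usage comment is only a placeholder for whichever release was downloaded, and the target name kafka is inferred from the shell prompts used later in this chapter. It also assumes the archive unpacks into a directory named after the archive itself.

```bash
#!/bin/bash
# Extract a Kafka .tgz into a target directory and rename the extracted
# directory to "kafka". Assumes the top-level directory inside the archive
# has the same name as the archive without its .tgz suffix.
install_kafka() {
    local tgz=$1
    local dest=${2:-/opt/module}
    local name
    name=$(basename "$tgz" .tgz)
    tar -zxf "$tgz" -C "$dest" && mv "$dest/$name" "$dest/kafka"
}

# Usage (placeholder file name):
#   install_kafka /opt/software/<kafka-release>.tgz /opt/module
```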
View server.properties in the config directory.
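A quick way to review the file is to print only its active settings. The sketch below does that. show_settings is a made-up helper name, and the keys mentioned in its comment are common broker settings, not necessarily the ones edited in this course.

```bash
#!/bin/bash
# Print only the active (non-comment, non-blank) lines of a properties file.
# Entries worth checking on each broker typically include broker.id
# (must be unique per broker), log.dirs and zookeeper.connect.
show_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage, from the Kafka installation directory:
#   show_settings config/server.properties
```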

Distribute Kafka to the other nodes
Configuration
Distribution

#!/bin/bash

# 1. Check the number of arguments
if [ $# -lt 1 ]
then
    echo "Not enough arguments!"
    exit 1
fi

# 2. Loop over every machine in the cluster
for host in hadoop102 hadoop103 hadoop104
do
    echo "====================  $host  ===================="
    # 3. Loop over all given files and directories, sending them one by one

    for file in "$@"
    do
        # 4. Check whether the file exists
        if [ -e "$file" ]
            then
                # 5. Get the parent directory (resolving symlinks)
                pdir=$(cd -P "$(dirname "$file")"; pwd)

                # 6. Get the name of the current file
                fname=$(basename "$file")
                ssh "$host" "mkdir -p $pdir"
                rsync -av "$pdir/$fname" "$host:$pdir"
            else
                echo "$file does not exist!"
        fi
    done
done

Start Kafka (ZooKeeper must already be running)

Kafka start and stop scripts
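A start/stop helper along these lines could look like the sketch below. It is not the script from the course: the function name kf, the KAFKA_HOME default of /opt/module/kafka and the SSH override are assumptions made for this example, and passwordless ssh between the three hosts is taken for granted. Only kafka-server-start.sh (with -daemon) and kafka-server-stop.sh are the scripts shipped with Kafka.

```bash
#!/bin/bash
# Start or stop Kafka on every host of the cluster.
# Set SSH=echo to print the remote commands instead of running them.
KAFKA_HOME=${KAFKA_HOME:-/opt/module/kafka}
SSH=${SSH:-ssh}

kf() {
    local action=$1 host
    for host in hadoop102 hadoop103 hadoop104; do
        case $action in
            start)
                echo "--- starting Kafka on $host ---"
                $SSH "$host" "$KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties"
                ;;
            stop)
                echo "--- stopping Kafka on $host ---"
                $SSH "$host" "$KAFKA_HOME/bin/kafka-server-stop.sh"
                ;;
            *)
                echo "Usage: kf {start|stop}" >&2
                return 1
                ;;
        esac
    done
}

# Run only when called with an argument, e.g. ./kf.sh start
if [ $# -gt 0 ]; then kf "$@"; fi
```

Running it once with SSH=echo is a cheap way to check the generated commands before touching the cluster.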

3.1 Topic command line operations

1. View the parameters of the topic command
[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh
2. View all topics on the current server
[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --list
3. Create the first topic
[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --create --partitions 1 --replication-factor 3 --topic first
Option description:
--partitions defines the number of partitions
--replication-factor defines the number of replicas
--topic defines the topic name
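Beyond --list and --create, kafka-topics.sh also accepts --describe, --alter and --delete. The sketch below wraps them in small helpers. The function names and the BOOTSTRAP and KAFKA_TOPICS variables are made up for this example; the topic name first and the address hadoop102:9092 come from the commands above.

```bash
#!/bin/bash
# Thin wrappers around kafka-topics.sh; run from the Kafka installation directory.
BOOTSTRAP=${BOOTSTRAP:-hadoop102:9092}
KAFKA_TOPICS=${KAFKA_TOPICS:-bin/kafka-topics.sh}

topic_cmd() {
    "$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" "$@"
}

# Show partitions, leader and replicas of a topic
describe_topic() { topic_cmd --describe --topic "$1"; }

# Change the partition count (it can only be increased, never reduced)
grow_partitions() { topic_cmd --alter --topic "$1" --partitions "$2"; }

# Delete a topic
delete_topic() { topic_cmd --delete --topic "$1"; }

# Usage:
#   describe_topic first
#   grow_partitions first 3
```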

3.2 Create a producer to send data

Send data to the topic
Create a consumer
The consumer receives the data

Read historical data on the consumer side
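These steps map onto the console clients that ship with Kafka. The sketch below is one way to run them, assuming the topic first and the broker address hadoop102:9092 from section 3.1. The helper names produce and consume and the KAFKA_BIN and BOOTSTRAP variables are made up for this example.

```bash
#!/bin/bash
# Console producer and consumer helpers; run from the Kafka installation directory.
KAFKA_BIN=${KAFKA_BIN:-bin}
BOOTSTRAP=${BOOTSTRAP:-hadoop102:9092}

# Start an interactive producer: every line typed becomes one message
produce() {
    "$KAFKA_BIN/kafka-console-producer.sh" --bootstrap-server "$BOOTSTRAP" --topic "$1"
}

# Start a consumer; extra options after the topic name are passed through
consume() {
    "$KAFKA_BIN/kafka-console-consumer.sh" --bootstrap-server "$BOOTSTRAP" --topic "$1" "${@:2}"
}

# Usage (one terminal each):
#   produce first
#   consume first                    # only messages sent from now on
#   consume first --from-beginning   # also replay the topic's existing data
```

Without --from-beginning, a newly started console consumer only shows messages produced after it connected, which is why the historical data needs the extra flag.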

4. Kafka producer

======> Kafka producer

List of important parameters for producers
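As a rough orientation, the sketch below writes a sample producer.properties containing settings that are commonly tuned. The keys are real producer configuration names, but the selection and every value are illustrative assumptions, not the list or the values from the course material, and write_producer_props is a made-up helper name.

```bash
#!/bin/bash
# Write a sample producer.properties to the given path.
# All values are placeholders for illustration, not recommendations.
write_producer_props() {
    cat > "$1" <<'EOF'
# Brokers used for the initial connection
bootstrap.servers=hadoop102:9092
# Serializers for record keys and values
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
# Acknowledgements required before a send counts as successful: 0, 1 or all
acks=all
# Maximum size in bytes of one batch per partition
batch.size=16384
# How long (ms) to wait for more records before sending a batch
linger.ms=5
# Total memory in bytes for buffering unsent records
buffer.memory=33554432
# Number of retries on transient send failures
retries=3
# Compression codec: none, gzip, snappy, lz4 or zstd
compression.type=snappy
EOF
}

# Usage:
#   write_producer_props /tmp/producer.properties
```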


5. Asynchronous sending

5.1 Asynchronous sending API

// ALL

Appendix

1. Kafka setup
2. Kafka Chinese official documentation

Origin blog.csdn.net/Blue_Pepsi_Cola/article/details/131499486