Kafka overview, installation, and deployment

Overview

Prerequisite knowledge: understanding message queues

Concept
A message queue (often abbreviated MQ) is, as the name suggests, a queue used to store messages.

Message queue middleware

Message queue middleware is a software component used to store messages.

For example: to analyze user behavior on a website, we need to record the users' access logs.
Each log entry can be regarded as a message, and we can save these messages in a message queue.
Later, any application that needs to process these logs can take the messages out of the queue at any time.

Application scenarios of message queues

  1. Asynchronous processing

On an e-commerce website, when a new user registers, the user's information must be saved in the database; at the same time, a registration confirmation email and an SMS verification code must be sent to the user.
Because sending the email and the SMS requires connecting to external servers, the registration request would otherwise have to wait for them. A message queue lets this work be handled asynchronously, so the site can respond quickly.

  2. System decoupling
  3. Traffic peak shaving
  4. Log processing (common in big data)
    Large e-commerce websites (Taobao, JD, Gome, Suning, ...) and apps (Tik Tok, Meituan, Didi, etc.) need to analyze user behavior and infer users' preferences and activity from their visits, which requires collecting large amounts of user access information from their pages.
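The asynchronous-processing scenario above can be sketched with Python's standard-library queue and a background thread. This is an in-process stand-in for a real message queue, and the email/SMS handling is a hypothetical placeholder, not real notification code:

```python
import queue
import threading

# Minimal sketch of asynchronous processing with an in-process queue
# (illustrative only; the "email"/"sms" work is a hypothetical stand-in).
tasks = queue.Queue()
sent = []

def notification_worker():
    # Runs in the background; the registration request does not wait for it.
    while True:
        task = tasks.get()
        if task is None:          # sentinel: stop the worker
            break
        kind, user = task
        sent.append(f"{kind} for {user}")   # stand-in for the slow external call
        tasks.task_done()

def register_user(user):
    # Save to the database (omitted), enqueue the slow work, return immediately.
    tasks.put(("email", user))
    tasks.put(("sms", user))
    return f"{user} registered"

worker = threading.Thread(target=notification_worker)
worker.start()
print(register_user("alice"))     # returns without waiting for email/SMS
tasks.join()                      # for the demo: wait for background work
tasks.put(None)
worker.join()
print(sent)                       # ['email for alice', 'sms for alice']
```

The caller only pays the cost of two `put` calls; the slow work happens on the worker thread, which is exactly the fast-response property the scenario describes.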

Interaction model

  1. Request-response model
  • HTTP request-response model
  • Database request-response model
  2. Producer-consumer model

Note:

  1. Kafka is based on the producer-consumer interaction model
  2. So when using Kafka you need to know which side is the producer and which side is the consumer

Two models of message queues

  1. Point-to-point model

The sender produces a message and sends it to the message queue; the receiver then takes the message off the queue and consumes it.
Once a message has been consumed, it is no longer stored in the queue, so a receiver cannot consume a message that has already been consumed.

Features:

  • One-to-one: each message has only one receiver (consumer); once a message is consumed, it is no longer in the queue;
  • No dependency: sender and receiver do not depend on each other. After the sender sends a message, whether or not a receiver is running has no effect on the sender sending its next message;
  • Receipt acknowledgment: the receiver must send an acknowledgment back to the queue after successfully receiving a message, so that the queue can delete the message it just delivered;
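The point-to-point behavior can be illustrated with Python's standard-library queue.Queue. This is an in-process sketch of the model, not Kafka's actual storage: two receivers share one queue, and each message is delivered to exactly one of them:

```python
import queue

# Point-to-point sketch: receivers A and B share one queue, and each
# message is consumed exactly once (illustrative, not Kafka internals).
mq = queue.Queue()
for i in range(4):
    mq.put(f"msg-{i}")

received_a, received_b = [], []
while not mq.empty():
    received_a.append(mq.get())      # receiver A takes one message...
    if not mq.empty():
        received_b.append(mq.get())  # ...receiver B takes the next

print(received_a)  # ['msg-0', 'msg-2']
print(received_b)  # ['msg-1', 'msg-3']
# The queue is now empty: no message can be consumed a second time.
print(mq.empty())  # True
```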
  2. Publish-subscribe model

Features:

  • One-to-many: each message can have multiple subscribers;
  • Time dependency: there is a time dependency between publisher and subscriber. A subscriber to a topic must create its subscription before it can consume the publisher's messages;
  • Subscribe in advance: in order to consume messages, a subscriber must subscribe to the topic in advance and stay running online;
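The publish-subscribe model, including its time dependency, can be sketched in plain Python. The Topic class here is a toy illustration of the model, not Kafka's real implementation:

```python
from collections import defaultdict

# Publish-subscribe sketch: the "topic" keeps one message list per
# subscriber, so every subscriber sees every message published after
# it subscribed (illustrative only; not how Kafka stores data).
class Topic:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, name):
        self.subscribers[name]       # create this subscriber's message list

    def publish(self, message):
        for messages in self.subscribers.values():
            messages.append(message)  # one-to-many: each subscriber gets a copy

topic = Topic()
topic.subscribe("app1")
topic.publish("event-1")             # only app1 is subscribed so far
topic.subscribe("app2")              # time dependency: app2 subscribed too late
topic.publish("event-2")

print(topic.subscribers["app1"])     # ['event-1', 'event-2']
print(topic.subscribers["app2"])     # ['event-2']
```

Note how app2 misses event-1: it had not subscribed yet when that message was published, which is the "subscribe in advance" requirement above.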

Basic overview of Kafka

Concept
Kafka is a distributed message queue system based on the publish-subscribe model, characterized by high throughput, high performance, high concurrency, and high reliability.

  1. High throughput: uses distributed disk storage (rather than HDFS)
  2. High performance: reads and writes data in real time
  • Uses distributed memory and PageCache, the operating system's page-cache mechanism
  • PageCache belongs to the operating system's memory, so even if the Kafka process fails and Kafka is restarted, the cached data is still in memory
  3. High concurrency: distributed parallel reads and writes
  4. High reliability: distributed master-slave architecture
  5. High security:
  • Memory: operations are recorded in a log
  • Disk: replica mechanism

Application scenarios

  • In the real-time architecture of big data, it is used for temporary storage of real-time data

(figure: the Kafka ecosystem: producers, consumers, connectors, and stream processors around a Kafka cluster)

From the figure above, we can see:

  • Producers: many applications can put message data into the Kafka cluster.
  • Consumers: many applications can pull message data from the Kafka cluster.
  • Connectors: Kafka connectors can import data from a database into Kafka, and can also export data from Kafka to external systems such as databases.
  • Stream processors: stream processors can pull data from Kafka, transform it, and write it back to Kafka.

Installation and deployment

  1. Upload the Kafka installation package to the virtual machine and decompress it
cd /export/soft/
tar -zxvf kafka_2.12-2.4.1.tgz -C ../server/
  2. Modify server.properties
cd /export/server/kafka_2.12-2.4.1/config
vim server.properties

Add the following:
# Specify the broker id
broker.id=0

# Specify the location of Kafka's data
log.dirs=/export/server/kafka_2.12-2.4.1/data

# Configure the three ZooKeeper nodes
zookeeper.connect=node1:2181,node2:2181,node3:2181
  3. Copy the installed Kafka to the other two servers
cd /export/server
scp -r kafka_2.12-2.4.1/ node2:$PWD
scp -r kafka_2.12-2.4.1/ node3:$PWD

Change the broker.id on the other two nodes to 1 and 2 respectively:
---------node2--------------
cd /export/server/kafka_2.12-2.4.1/config
vim server.properties
broker.id=1

--------node3--------------
cd /export/server/kafka_2.12-2.4.1/config
vim server.properties
broker.id=2

  4. Configure the KAFKA_HOME environment variable
vi /etc/profile
export KAFKA_HOME=/export/server/kafka_2.12-2.4.1
export PATH=$PATH:${KAFKA_HOME}/bin

Distribute the profile to the other nodes:
scp /etc/profile node2:/etc/
scp /etc/profile node3:/etc/
Load the environment variables on every node:
source /etc/profile
  5. Start the servers
# Start ZooKeeper
/export/server/zookeeper-3.4.6/bin/start-zk-all.sh

# Start Kafka
cd /export/server/kafka_2.12-2.4.1
nohup bin/kafka-server-start.sh config/server.properties &

# Test whether the Kafka cluster started successfully
bin/kafka-topics.sh --bootstrap-server node1:9092 --list

If no error is reported, the Kafka cluster started successfully.

Configure Kafka one-click startup/shutdown script

  1. Enter the script directory
cd /export/server/kafka_2.12-2.4.1/bin/
  2. Write a slave configuration file listing the nodes on which Kafka should be started
vim slave

node1
node2
node3
  3. Write the start-kafka.sh script
vim start-kafka.sh

Add the following:
#!/bin/bash
# Start Kafka on each node listed in the slave file
while read line
do
{
  echo $line
  ssh $line "source /etc/profile; export JMX_PORT=9988; nohup ${KAFKA_HOME}/bin/kafka-server-start.sh ${KAFKA_HOME}/config/server.properties >/dev/null 2>&1 &"
} &
done < /export/server/kafka_2.12-2.4.1/bin/slave
wait
  4. Write the stop-kafka.sh script
vim stop-kafka.sh

Add the following:
#!/bin/bash
# Stop Kafka on each node listed in the slave file
while read line
do
{
  echo $line
  ssh $line "source /etc/profile; jps | grep Kafka | cut -d' ' -f1 | xargs kill -s 9"
} &
done < /export/server/kafka_2.12-2.4.1/bin/slave
wait
  5. Make start-kafka.sh and stop-kafka.sh executable
chmod u+x start-kafka.sh
chmod u+x stop-kafka.sh
  6. Run the one-click startup and shutdown scripts
./start-kafka.sh
./stop-kafka.sh

Kafka software directory structure

(figure: layout of the Kafka installation directory)

Basic operation

  1. Create a topic
    All messages in Kafka are stored in topics. To produce messages to Kafka, the target topic must exist first.
# Create a topic named test
bin/kafka-topics.sh \
--create \
--bootstrap-server node1:9092 \
--topic test

# List all topics currently in Kafka
bin/kafka-topics.sh \
--list \
--bootstrap-server node1:9092

View test topic details

bin/kafka-topics.sh \
--bootstrap-server node1:9092 \
--describe \
--topic test

(figure: output of kafka-topics.sh --describe for the test topic)
The first line is a summary of all partitions of the test topic:

PartitionCount: the total number of partitions in the topic. Here, test has one partition.
ReplicationFactor: the number of replicas per partition in the topic. Here, each partition of test has one replica.
segment.bytes: the size of a segment file

From the second line on, there is one line per partition:

Partition: the partition number; 0 is the first partition
Leader: the broker currently responsible for reads and writes of this partition; 2 means the third broker (numbering starts at 0)
Replicas: the brokers on which this partition's replicas are stored; 2 means the third broker
Isr: the set of replicas that are in sync with the leader

Delete the test topic

bin/kafka-topics.sh \
--delete \
--bootstrap-server node1:9092 \
--topic test
  2. Produce messages to Kafka
    Use Kafka's built-in console producer to send some messages to the test topic:
bin/kafka-console-producer.sh \
--broker-list node1:9092 \
--topic test
  3. Consume messages from Kafka
    Open a second shell window on node1 and use the following command to consume messages from the test topic:
bin/kafka-console-consumer.sh \
--bootstrap-server node1:9092 \
--topic test \
--from-beginning

Messages typed into the producer window should appear in the consumer window.


Origin blog.csdn.net/zh2475855601/article/details/115215310