Kafka message queue installation and use (1)

1. Kafka overview

1.1 Definition

Kafka is a distributed message queue based on the publish/subscribe model, mainly used for real-time processing of big data.

1.2 Message queue

1.2.1 Application scenarios of traditional message queues

The advantages of using message queues:
1) Decoupling: you can extend or modify the processing on either side independently, as long as both sides honor the same interface contract.
2) Recoverability: a failure in one part of the system does not bring down the whole system. The message queue reduces coupling between processes, so even if a consuming process crashes, the messages already in the queue can still be processed once the system recovers.
3) Buffering: helps control and optimize the speed of data flowing through the system, smoothing out any mismatch between the rate at which messages are produced and the rate at which they are consumed.
4) Flexibility & peak-handling capacity: an application must keep functioning when traffic surges, yet such bursts are rare, so permanently provisioning resources for peak load would be a huge waste. A message queue lets key components absorb sudden access pressure instead of collapsing under overloaded requests.
5) Asynchronous communication: in many cases users neither want nor need to process a message immediately. A message queue provides an asynchronous mechanism: put a message on the queue without processing it right away, enqueue as many messages as needed, and process them later when required.

1.2.2 Two modes of message queue

(1) Point-to-point mode (one-to-one; the consumer actively pulls data, and the message is removed after it is received)
The producer sends a message to the queue, and the consumer takes it out of the queue and consumes it. Once a message has been consumed, it is no longer stored in the queue, so a consumer cannot consume a message that has already been consumed. A queue supports multiple consumers, but any given message can be consumed by only one of them.
(2) Publish/subscribe mode (one-to-many; messages are not removed after consumers read them)
The producer publishes messages to a topic, and multiple consumers subscribe to and consume those messages at the same time. Unlike the point-to-point mode, a message published to a topic is consumed by all of its subscribers.

1.3 Kafka infrastructure

1) Producer: the message producer, i.e. the client that sends messages to the Kafka broker.
2) Consumer: the message consumer, i.e. the client that fetches messages from the Kafka broker.
3) Consumer Group (CG): a group made up of multiple consumers. Each consumer in a group is responsible for consuming data from different partitions; a partition can be consumed by only one consumer within a group, and consumer groups do not affect each other. Every consumer belongs to some consumer group, so the consumer group is the logical subscriber (see the sketch after this list).
4) Broker: one Kafka server is one broker, and a cluster is composed of multiple brokers. A broker can hold multiple topics.
5) Topic: can be understood as a queue; both producers and consumers operate against a topic.
6) Partition: for scalability, a very large topic can be spread across multiple brokers (i.e. servers); a topic can be divided into multiple partitions, each of which is an ordered queue.
7) Replica: to ensure that when a node in the cluster fails, the partition data on that node is not lost and Kafka can keep working, Kafka provides a replication mechanism: each partition of a topic has several replicas, one leader and several followers.
8) Leader: the "master" among the replicas of a partition. Producers send data to the leader, and consumers consume data from the leader.
9) Follower: a "slave" among the replicas of a partition; it synchronizes data from the leader in real time and keeps its data in step with the leader. When the leader fails, one follower becomes the new leader.
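
To see the consumer-group rule in action, here is a minimal sketch using the console consumer from section 2.2, assuming the cluster from section 2 is already running and a topic named first exists (the group name demo-group is arbitrary). Start the same command in two terminals; because the two consumers share one group.id, the topic's partitions are split between them, and with a single-partition topic only one of the two will actually receive messages:

[atguigu@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first --consumer-property group.id=demo-group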

2. Kafka quick start

2.1 Installation and deployment

2.1.1 Cluster Planning

(figure: cluster plan — hadoop102, hadoop103, and hadoop104 each run Zookeeper and Kafka)

2.1.2 Installation package download

Download the Kafka binary release used in this tutorial (kafka_2.11-0.11.0.0) from the official Apache Kafka downloads page: https://kafka.apache.org/downloads

2.1.3 Cluster deployment

1) Unzip the installation package

[atguigu@hadoop102 software]$ tar -zxvf kafka_2.11-0.11.0.0.tgz -C /opt/module/

2) Rename the extracted directory

[atguigu@hadoop102 module]$ mv kafka_2.11-0.11.0.0/ kafka

3) Create a logs folder under the /opt/module/kafka directory

[atguigu@hadoop102 kafka]$ mkdir logs

4) Modify the configuration file

[atguigu@hadoop102 kafka]$ cd config/
[atguigu@hadoop102 config]$ vi server.properties

Enter the following:

#globally unique broker id; must not be duplicated
broker.id=0
#enable the delete-topic feature
delete.topic.enable=true
#number of threads for handling network requests
num.network.threads=3
#number of threads for handling disk I/O
num.io.threads=8
#send buffer size of the socket
socket.send.buffer.bytes=102400
#receive buffer size of the socket
socket.receive.buffer.bytes=102400
#maximum size of a socket request
socket.request.max.bytes=104857600
#path where Kafka stores its log data (the topic partition segments, not the application logs)
log.dirs=/opt/module/kafka/logs
#default number of partitions per topic on this broker
num.partitions=1
#number of threads used to recover and clean up data under the data directories
num.recovery.threads.per.data.dir=1
#maximum time a segment file is retained; it is deleted after this timeout
log.retention.hours=168
#Zookeeper cluster connection addresses
zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181

5) Configure environment variables

[atguigu@hadoop102 module]$ sudo vi /etc/profile
#KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin
[atguigu@hadoop102 module]$ source /etc/profile

6) Distribute the installation package

[atguigu@hadoop102 module]$ xsync kafka/

Note: remember to configure the environment variables on the other machines after distribution (and run source /etc/profile on each).

7) On hadoop103 and hadoop104, change broker.id to 1 and 2 respectively in /opt/module/kafka/config/server.properties (broker.id must be unique across the cluster)
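
As a shortcut, these two edits can also be made remotely; a minimal sketch, assuming passwordless ssh between the nodes is already configured (the xsync step above implies it):

[atguigu@hadoop102 module]$ ssh hadoop103 "sed -i 's/^broker.id=0/broker.id=1/' /opt/module/kafka/config/server.properties"
[atguigu@hadoop102 module]$ ssh hadoop104 "sed -i 's/^broker.id=0/broker.id=2/' /opt/module/kafka/config/server.properties"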
8) Start the cluster

[atguigu@hadoop102 kafka]$ bin/kafka-server-start.sh -daemon config/server.properties
[atguigu@hadoop103 kafka]$ bin/kafka-server-start.sh -daemon config/server.properties
[atguigu@hadoop104 kafka]$ bin/kafka-server-start.sh -daemon config/server.properties

9) Shut down the cluster

[atguigu@hadoop102 kafka]$ bin/kafka-server-stop.sh stop
[atguigu@hadoop103 kafka]$ bin/kafka-server-stop.sh stop
[atguigu@hadoop104 kafka]$ bin/kafka-server-stop.sh stop

10) Kafka group script

#!/bin/bash
for i in hadoop102 hadoop103 hadoop104
do
  echo "========== $i =========="
  ssh $i '/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties'
done
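
A matching stop script, sketched along the same lines, shuts down the whole cluster in one go:

#!/bin/bash
for i in hadoop102 hadoop103 hadoop104
do
  echo "========== $i =========="
  ssh $i '/opt/module/kafka/bin/kafka-server-stop.sh'
done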

2.2 Kafka command line operation

1) View all topics in the current server

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --list

2) Create topic

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --create --replication-factor 3 --partitions 1 --topic first

Option description:
--topic defines the topic name
--replication-factor defines the number of replicas
--partitions defines the number of partitions
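
Since first has only one partition, a topic with several partitions is handier for experimenting with consumer groups later; a sketch (the topic name second is arbitrary):

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --create --replication-factor 3 --partitions 3 --topic second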

3) Delete topic

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --delete --topic first

delete.topic.enable=true must be set in server.properties (as in section 2.1.3); otherwise the topic is only marked for deletion.
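
To confirm the topic is really gone, re-run the list command from step 1 and check that first no longer appears:

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --list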

4) Send a message

[atguigu@hadoop102 kafka]$ bin/kafka-console-producer.sh --broker-list hadoop102:9092 --topic first
>hello world
>atguigu atguigu

5) Consume messages

[atguigu@hadoop102 kafka]$ bin/kafka-console-consumer.sh \
--zookeeper hadoop102:2181 --topic first

(The --zookeeper form uses the old consumer and is deprecated in Kafka 0.11; the --bootstrap-server form below is preferred.)

[atguigu@hadoop102 kafka]$ bin/kafka-console-consumer.sh \
--bootstrap-server hadoop102:9092 --topic first

[atguigu@hadoop102 kafka]$ bin/kafka-console-consumer.sh \
--bootstrap-server hadoop102:9092 --from-beginning --topic first

--from-beginning: reads out all the existing data in the topic, starting from the earliest offset.
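
To see the publish/subscribe behavior from section 1.2.2 concretely, a small sketch: run two consumers with different group.id values (the names group-a and group-b are arbitrary), then send messages with the console producer from step 4. Each group receives every message, because every group is an independent logical subscriber:

[atguigu@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first --consumer-property group.id=group-a
[atguigu@hadoop103 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first --consumer-property group.id=group-b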

6) View the details of a topic

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --describe --topic first
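
The output looks roughly like the following sketch (the exact leader and replica assignments depend on the cluster state). Leader, Replicas, and Isr are broker ids, and Isr is the set of in-sync replicas:

Topic:first   PartitionCount:1   ReplicationFactor:3   Configs:
    Topic: first   Partition: 0   Leader: 0   Replicas: 0,1,2   Isr: 0,1,2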

7) Modify the number of partitions

[atguigu@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --alter --topic first --partitions 6

Note: the number of partitions can only ever be increased; Kafka does not support reducing it.
