[Big Data: Kafka] 2. Introduction to Kafka

1 Installation and deployment

1.1 Cluster Planning

(Figure: cluster plan — hadoop102, hadoop103, and hadoop104 each run ZooKeeper and Kafka.)

1.2 Cluster deployment

Official download address: http://kafka.apache.org/downloads.html
(1) Unzip the installation package:

tar -zxvf kafka_2.12-3.0.0.tgz -C /opt/module/

(2) Modify the decompressed file name:

mv kafka_2.12-3.0.0/ kafka

(3) Go to the /opt/module/kafka/config directory and edit the configuration file server.properties, changing the following parameters:
  broker.id=0 (the broker's globally unique id; it must not be repeated and may only be a number);
  log.dirs=/opt/module/kafka/datas (the path where Kafka's run logs, i.e. its data, are stored);
  zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181/kafka (the ZooKeeper cluster connection address).

# The broker's globally unique id; must not repeat, numbers only
broker.id=0
# Number of threads handling network requests
num.network.threads=3
# Number of threads handling disk I/O
num.io.threads=8
# Send buffer size of the socket
socket.send.buffer.bytes=102400
# Receive buffer size of the socket
socket.receive.buffer.bytes=102400
# Maximum size of a socket request
socket.request.max.bytes=104857600
# Path where Kafka's run logs (data) are stored. The path need not exist in advance; Kafka creates it automatically. Multiple disk paths may be configured, separated by ","
log.dirs=/opt/module/kafka/datas
# Number of partitions per topic on this broker
num.partitions=1
# Number of threads used to recover and clean up data under each data dir
num.recovery.threads.per.data.dir=1
# Replication factor of the internal offsets topic (default 1)
offsets.topic.replication.factor=1
# Maximum time a segment file is retained; it is deleted after this expires
log.retention.hours=168
# Maximum size of each segment file, default 1G
log.segment.bytes=1073741824
# Interval for checking whether data has expired; by default every 5 minutes
log.retention.check.interval.ms=300000
# ZooKeeper cluster connection address (create /kafka under the zk root for easier management)
zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181/kafka

(4) Distribution of installation packages

xsync kafka/

(5) On hadoop103 and hadoop104, modify /opt/module/kafka/config/server.properties to set broker.id=1 and broker.id=2 respectively. (broker.id must be unique across the entire cluster.)
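Step (5) can be scripted. The sketch below is an illustration, not part of the original post: `set_broker_id` is a hypothetical helper that rewrites the `broker.id` line in whatever server.properties path it is given; the host/id mapping follows the cluster plan above.

```shell
# Hypothetical helper: set broker.id in a server.properties file in place.
set_broker_id() {
    # $1 = new broker id, $2 = path to server.properties
    sed -i "s/^broker\.id=.*/broker.id=$1/" "$2"
}

# Illustrative use from the distributing node (hosts per the plan above):
# ssh hadoop103 "$(declare -f set_broker_id); set_broker_id 1 /opt/module/kafka/config/server.properties"
# ssh hadoop104 "$(declare -f set_broker_id); set_broker_id 2 /opt/module/kafka/config/server.properties"
```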

(6) Add the kafka environment variable configuration in the /etc/profile.d/my_env.sh file:

sudo vim /etc/profile.d/my_env.sh

# Append:
#KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin

(7) Refresh the environment variables, distribute my_env.sh to the other nodes, and source it on each of them.

source /etc/profile
sudo xsync /etc/profile.d/my_env.sh

(8) When starting the cluster, start the ZooKeeper cluster first and then Kafka. When shutting down, make sure Kafka has fully stopped before stopping ZooKeeper.

1.3 Cluster start and stop script

(1) Create the file kf.sh script file in the /home/username/bin directory:

#!/bin/bash
case $1 in
"start"){
    for i in hadoop102 hadoop103 hadoop104
    do
        echo " -------- starting Kafka on $i --------"
        ssh $i "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties"
    done
};;
"stop"){
    for i in hadoop102 hadoop103 hadoop104
    do
        echo " -------- stopping Kafka on $i --------"
        ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh"
    done
};;
esac

(2) Add execution permission:

chmod 777 kf.sh

(3) Start the cluster:

zk.sh start
kf.sh start

Note: when stopping the Kafka cluster, be sure to wait until every Kafka node process has exited before stopping the ZooKeeper cluster. ZooKeeper stores the Kafka cluster's metadata; if ZooKeeper is stopped first, the Kafka brokers can no longer complete a clean shutdown and the Kafka processes can only be killed manually.
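That wait can be automated. The sketch below is an assumption-laden illustration, not from the original post: `wait_gone` is a hypothetical helper that polls until no process matches a pattern, and `kafka.Kafka` is the broker's JVM main class.

```shell
# Hypothetical helper: block until no process matching the pattern survives,
# so zk.sh stop can safely run after kf.sh stop.
wait_gone() {
    # $1 = pattern passed to pgrep -f
    while pgrep -f "$1" > /dev/null; do
        sleep 1
    done
}

# Illustrative use after `kf.sh stop`, before `zk.sh stop`:
# for i in hadoop102 hadoop103 hadoop104; do
#     ssh "$i" "$(declare -f wait_gone); wait_gone kafka.Kafka"
# done
```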

2 Kafka command line operation

2.1 Topic Command Line Operations

# Show the topic command-line tool's parameters
bin/kafka-topics.sh


# List all topics on the current server
bin/kafka-topics.sh --bootstrap-server hadoop102:9092,hadoop103:9092 --list

# Create a topic named first
bin/kafka-topics.sh --bootstrap-server hadoop102:9092,hadoop103:9092 --create --partitions 1 --replication-factor 3 --topic first
# Option description:
# --topic defines the topic name
# --replication-factor defines the number of replicas
# --partitions defines the number of partitions

# Describe the first topic
bin/kafka-topics.sh --bootstrap-server hadoop102:9092,hadoop103:9092 --describe --topic first

# Change the number of partitions (it can only be increased, never decreased)
bin/kafka-topics.sh --bootstrap-server hadoop102:9092,hadoop103:9092 --alter --topic first --partitions 3

# Delete a topic
bin/kafka-topics.sh --bootstrap-server hadoop102:9092,hadoop103:9092 --delete --topic first
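The create command above fails if the topic already exists; kafka-topics.sh's `--if-not-exists` flag makes it idempotent. The wrapper below is a hedged sketch, not from the original post; `KAFKA_BIN` and `BOOTSTRAP` are illustrative defaults taken from the cluster above.

```shell
# Hypothetical wrapper: create a topic only if it does not already exist.
KAFKA_BIN=${KAFKA_BIN:-/opt/module/kafka/bin}
BOOTSTRAP=${BOOTSTRAP:-hadoop102:9092,hadoop103:9092}

create_topic() {
    # $1 = topic name, $2 = partitions, $3 = replication factor
    "$KAFKA_BIN/kafka-topics.sh" --bootstrap-server "$BOOTSTRAP" \
        --create --if-not-exists \
        --topic "$1" --partitions "$2" --replication-factor "$3"
}
```

For example, `create_topic first 1 3` recreates the topic from section 2.1 and simply succeeds if it is already there.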

2.2 Producer command line operation

# Show the producer command-line tool's parameters
bin/kafka-console-producer.sh


# Send messages
bin/kafka-console-producer.sh --bootstrap-server hadoop102:9092,hadoop103:9092 --topic first
>hello

2.3 Consumer command line operation

# Show the consumer command-line tool's parameters
bin/kafka-console-consumer.sh


# Consume data from the first topic
bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092,hadoop103:9092 --topic first

# Read all data in the topic (including historical data)
bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092,hadoop103:9092 --from-beginning --topic first
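Sections 2.2 and 2.3 combine into a quick round-trip smoke test. The sketch below is an illustration, not from the original post: `KAFKA_BIN`, `BOOTSTRAP`, and the 5-second `--timeout-ms` are assumed values; the console consumer's `--timeout-ms` option makes it exit instead of waiting forever.

```shell
# Hypothetical helpers for a produce/consume round trip on a topic.
KAFKA_BIN=${KAFKA_BIN:-/opt/module/kafka/bin}
BOOTSTRAP=${BOOTSTRAP:-hadoop102:9092,hadoop103:9092}

produce_one() {
    # $1 = topic, $2 = message: pipe one line into the console producer
    echo "$2" | "$KAFKA_BIN/kafka-console-producer.sh" \
        --bootstrap-server "$BOOTSTRAP" --topic "$1"
}

consume_all() {
    # $1 = topic: read from the beginning, give up after 5 s of silence
    "$KAFKA_BIN/kafka-console-consumer.sh" \
        --bootstrap-server "$BOOTSTRAP" --topic "$1" \
        --from-beginning --timeout-ms 5000
}
```

For example, `produce_one first hello && consume_all first` should echo back `hello` along with any earlier messages on the topic.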


Origin blog.csdn.net/qq_18625571/article/details/132048566