1. ZooKeeper Cluster Setup
To ensure high availability, a ZooKeeper cluster should have an odd number of nodes, with a minimum of three. Here we build a three-node cluster.
1.1 Download & unzip
Download the desired version of ZooKeeper; version 3.4.14 is used here. Official download page: https://archive.apache.org/dist/zookeeper/
# download
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
# extract
tar -zxvf zookeeper-3.4.14.tar.gz
1.2 Modify the configuration
Make three copies of the ZooKeeper package. In the conf directory of each copy, copy the configuration template zoo_sample.cfg to zoo.cfg and modify it. The contents of the three modified configuration files are as follows:
zookeeper01 configuration:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-cluster/data/01
dataLogDir=/usr/local/zookeeper-cluster/log/01
clientPort=2181
# In server.1, the "1" is the server identifier; it can be any valid number that marks which node this is, and it must be written to the myid file under the dataDir directory
# specify the ports used for inter-node communication and for leader election
server.1=127.0.0.1:2287:3387
server.2=127.0.0.1:2288:3388
server.3=127.0.0.1:2289:3389
If the nodes are deployed on separate servers, the communication and election ports of each node can be identical; just replace 127.0.0.1 with the IP address of each node's host.
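For example, with three separate hosts (hadoop001 through hadoop003 are illustrative hostnames here, matching the naming used later in this article), the server lines might instead read:

```properties
# the communication and election ports can now be identical on every node
server.1=hadoop001:2287:3387
server.2=hadoop002:2287:3387
server.3=hadoop003:2287:3387
```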
zookeeper02 configuration (compared with zookeeper01, only dataDir, dataLogDir, and clientPort differ):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-cluster/data/02
dataLogDir=/usr/local/zookeeper-cluster/log/02
clientPort=2182
server.1=127.0.0.1:2287:3387
server.2=127.0.0.1:2288:3388
server.3=127.0.0.1:2289:3389
zookeeper03 configuration (compared with zookeeper01 and zookeeper02, only dataDir, dataLogDir, and clientPort differ):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-cluster/data/03
dataLogDir=/usr/local/zookeeper-cluster/log/03
clientPort=2183
server.1=127.0.0.1:2287:3387
server.2=127.0.0.1:2288:3388
server.3=127.0.0.1:2289:3389
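Since the three files differ only in dataDir, dataLogDir, and clientPort, they can also be generated in one loop. A minimal sketch (it writes to a scratch directory under /tmp for illustration; point BASE at your real installation layout, e.g. the three copies under /usr/local/zookeeper-cluster, in an actual deployment):

```shell
#!/bin/sh
# Generate the three zoo.cfg files shown above in one loop.
# BASE is a scratch path here; substitute your real install path.
BASE=${BASE:-/tmp/zookeeper-cluster-demo}
for i in 1 2 3; do
  mkdir -p "$BASE/zookeeper0$i/conf"
  cat > "$BASE/zookeeper0$i/conf/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-cluster/data/0$i
dataLogDir=/usr/local/zookeeper-cluster/log/0$i
clientPort=$((2180 + i))
server.1=127.0.0.1:2287:3387
server.2=127.0.0.1:2288:3388
server.3=127.0.0.1:2289:3389
EOF
done
```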
Configuration parameters:
- tickTime : the basic time unit, in milliseconds; other timeouts are expressed as multiples of it, e.g. a session timeout of N * tickTime;
- initLimit : the maximum time, in multiples of tickTime, allowed for a follower to connect to and sync with the leader during initialization;
- syncLimit : the maximum time, in multiples of tickTime, between a request from the leader to a follower and its response (heartbeat mechanism);
- dataDir : data storage location;
- dataLogDir : transaction log directory;
- clientPort : the port clients connect to, 2181 by default.
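With the values used above, the effective timeouts follow directly, since initLimit and syncLimit are multiples of tickTime. A quick check of the arithmetic:

```shell
# Effective timeouts derived from the zoo.cfg values above.
TICK_TIME=2000   # ms
INIT_LIMIT=10    # in ticks
SYNC_LIMIT=5     # in ticks

# maximum time for a follower to connect to and sync with the leader
echo "init timeout: $((TICK_TIME * INIT_LIMIT)) ms"   # 20000 ms = 20 s
# maximum leader-follower request/response delay (heartbeat window)
echo "sync timeout: $((TICK_TIME * SYNC_LIMIT)) ms"   # 10000 ms = 10 s
```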
1.3 Identify the nodes
Create a myid file in each node's data storage directory and write the node's identifier into it. ZooKeeper identifies cluster members by their myid files, and the nodes use the communication and election ports configured above to elect a leader.
Create the storage directories:
# dataDir for server 1
mkdir -vp /usr/local/zookeeper-cluster/data/01
# dataDir for server 2
mkdir -vp /usr/local/zookeeper-cluster/data/02
# dataDir for server 3
mkdir -vp /usr/local/zookeeper-cluster/data/03
Create the myid files and write the node identifiers into them:
#server1
echo "1" > /usr/local/zookeeper-cluster/data/01/myid
#server2
echo "2" > /usr/local/zookeeper-cluster/data/02/myid
#server3
echo "3" > /usr/local/zookeeper-cluster/data/03/myid
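The three echo commands above can also be written as a loop that writes each identifier and reads it back to verify. A sketch using a scratch directory (substitute /usr/local/zookeeper-cluster/data in a real deployment):

```shell
#!/bin/sh
# Write each node's identifier to its myid file, then read them back.
# DATA is a scratch path for illustration only.
DATA=${DATA:-/tmp/zk-myid-demo}
for i in 1 2 3; do
  mkdir -p "$DATA/0$i"
  echo "$i" > "$DATA/0$i/myid"
done
# Each file holds exactly the node's id; prints 1, 2, 3 on separate lines.
cat "$DATA"/01/myid "$DATA"/02/myid "$DATA"/03/myid
```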
1.4 Start the cluster
Start the three nodes one by one:
# start node 1
/usr/app/zookeeper-cluster/zookeeper01/bin/zkServer.sh start
# start node 2
/usr/app/zookeeper-cluster/zookeeper02/bin/zkServer.sh start
# start node 3
/usr/app/zookeeper-cluster/zookeeper03/bin/zkServer.sh start
1.5 Cluster Verification
Use jps to view the processes and zkServer.sh status to check the state of each node. All three processes should have started successfully, with two nodes acting as followers and one as the leader.
2. Kafka Cluster Setup
2.1 Download & unzip
The official Kafka download page is http://kafka.apache.org/downloads ; version 2.2.0 is used here. Download command:
# download
wget https://www-eu.apache.org/dist/kafka/2.2.0/kafka_2.12-2.2.0.tgz
# extract
tar -xzf kafka_2.12-2.2.0.tgz
A note on the naming of the Kafka package: taking kafka_2.12-2.2.0.tgz as an example, the leading 2.12 is the version of Scala (Kafka is written in Scala), and the trailing 2.2.0 is the version of Kafka itself.
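The split can be illustrated with plain shell parameter expansion (the variable names are just for illustration):

```shell
# Extract the Scala and Kafka version numbers from the package name.
PKG=kafka_2.12-2.2.0.tgz
BASE=${PKG%.tgz}       # strip the extension  -> kafka_2.12-2.2.0
SCALA=${BASE#kafka_}   # drop the prefix      -> 2.12-2.2.0
SCALA=${SCALA%%-*}     # keep up to the dash  -> 2.12 (Scala version)
KAFKA=${BASE##*-}      # keep after the dash  -> 2.2.0 (Kafka version)
echo "Scala $SCALA, Kafka $KAFKA"   # prints: Scala 2.12, Kafka 2.2.0
```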
2.2 Copy the configuration files
Enter the config directory under the unzipped directory and make three copies of the configuration file:
cp server.properties server-1.properties
cp server.properties server-2.properties
cp server.properties server-3.properties
2.3 Modify the configuration
Modify the relevant settings in the three configuration files as follows:
server-1.properties:
# The id of the broker. Unique identifier of each node in the cluster
broker.id=0
# listener address
listeners=PLAINTEXT://hadoop001:9092
# data storage location
log.dirs=/usr/local/kafka-logs/00
# ZooKeeper connection address
zookeeper.connect=hadoop001:2181,hadoop001:2182,hadoop001:2183
server-2.properties:
broker.id=1
listeners=PLAINTEXT://hadoop001:9093
log.dirs=/usr/local/kafka-logs/01
zookeeper.connect=hadoop001:2181,hadoop001:2182,hadoop001:2183
server-3.properties:
broker.id=2
listeners=PLAINTEXT://hadoop001:9094
log.dirs=/usr/local/kafka-logs/02
zookeeper.connect=hadoop001:2181,hadoop001:2182,hadoop001:2183
Note that log.dirs refers to the storage location of Kafka's data, that is, the partition data, not the runtime logs. The location of the runtime logs is configured through log4j.properties in the same directory.
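One related setting worth mentioning (an addition not in the configs above): when clients connect from other machines, each broker usually also needs advertised.listeners set to an address reachable by those clients, for example in server-1.properties:

```properties
# address the broker advertises to clients; must be reachable from them
advertised.listeners=PLAINTEXT://hadoop001:9092
```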
2.4 Start the cluster
Start the three Kafka nodes, each with its own configuration file. The commands below run in the foreground; kafka-server-start.sh also accepts a -daemon flag to run in the background. After starting, check the processes with jps; there should be three ZooKeeper processes and three Kafka processes.
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
bin/kafka-server-start.sh config/server-3.properties
2.5 Create a test topic
Create a test topic:
bin/kafka-topics.sh --create --bootstrap-server hadoop001:9092 \
--replication-factor 3 \
--partitions 1 --topic my-replicated-topic
After creation, use the following command to view the topic's details:
bin/kafka-topics.sh --describe --bootstrap-server hadoop001:9092 --topic my-replicated-topic
You can see that partition 0 has three replicas, 0, 1, and 2, all of which are available and in the ISR (in-sync replica) list, with replica 1 as the leader. At this point the cluster has been set up successfully.
More articles in the big data series can be found in the GitHub open source project: Big Data Getting Started