Kafka Cluster Setup and Must-Know Fundamentals

Kafka cluster deployment and startup

In this article, I will demonstrate how to build a Kafka cluster and then briefly introduce some basic knowledge points about Kafka clusters. The article only covers the cluster itself and does not go into Kafka's basic concepts in depth; it assumes the reader already has some fundamental knowledge of Kafka.

First, we need to understand a few mechanisms of Kafka clusters:

  • Kafka natively supports clustering; even a single node actually runs in cluster mode
  • A Kafka cluster relies on Zookeeper for coordination, and in early Kafka versions a lot of metadata was stored in Zookeeper
  • As long as Kafka nodes register with the same Zookeeper, they belong to the same cluster
  • Kafka uses the brokerId to distinguish the different nodes in a cluster

The cluster topology of Kafka is as follows:

[Figure: Kafka cluster topology]

There are several roles in a Kafka cluster:

  • Broker: generally refers to a Kafka deployment node
  • Leader: handles requests for producing and consuming messages; that is, producers push messages to the leader, and consumers poll messages from the leader
  • Follower: mainly used to back up message data; a leader can have multiple followers

In this example, to stay close to a real deployment, four virtual machines are used for the demonstration:

Machine IP      Purpose                    Role                   brokerId
192.168.99.1    Deploy a Kafka node        broker server          0
192.168.99.2    Deploy a Kafka node        broker server          1
192.168.99.3    Deploy a Kafka node        broker server          2
192.168.99.4    Deploy a Zookeeper node    cluster coordinator    -

Zookeeper installation

Kafka relies on Zookeeper for distributed coordination, so a Zookeeper node needs to be set up before building the Kafka nodes. Both Zookeeper and Kafka depend on the JDK; I have already installed the JDK here:

[root@txy-server2 ~]# java --version
java 11.0.5 2019-10-15 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.5+10-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.5+10-LTS, mixed mode)
[root@txy-server2 ~]#

After preparing the JDK environment, go to the download page on Zookeeper's official website and copy the download link:

Then download it on Linux with the wget command, as follows:

[root@txy-server2 ~]# cd /usr/local/src
[root@txy-server2 /usr/local/src]# wget https://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.6.1/apache-zookeeper-3.6.1-bin.tar.gz

Extract the downloaded archive, then move and rename the extracted directory:

[root@txy-server2 /usr/local/src]# tar -zxvf apache-zookeeper-3.6.1-bin.tar.gz
[root@txy-server2 /usr/local/src]# mv apache-zookeeper-3.6.1-bin ../zookeeper

Go into Zookeeper's configuration directory and copy the sample configuration file zoo_sample.cfg to zoo.cfg, which is Zookeeper's default configuration file name:

[root@txy-server2 /usr/local/src]# cd ../zookeeper/conf/
[root@txy-server2 /usr/local/zookeeper/conf]# ls
configuration.xsl  log4j.properties  zoo_sample.cfg
[root@txy-server2 /usr/local/zookeeper/conf]# cp zoo_sample.cfg zoo.cfg

Modify the dataDir configuration item in the configuration file to point it at a directory with more disk space:

[root@txy-server2 /usr/local/zookeeper/conf]# vim zoo.cfg
# Specify Zookeeper's data storage directory, analogous to MySQL's dataDir
dataDir=/data/zookeeper
[root@txy-server2 /usr/local/zookeeper/conf]# mkdir -p /data/zookeeper

If you are only using it for learning, this step can actually be skipped. With the default configuration, you can go into the bin directory and start Zookeeper with the startup script, as shown in the following example:

[root@txy-server2 /usr/local/zookeeper/conf]# cd ../bin/
[root@txy-server2 /usr/local/zookeeper/bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@txy-server2 /usr/local/zookeeper/bin]#

After startup completes, you can determine whether it succeeded by checking whether the port is being listened on. A successful startup looks like this:

[root@txy-server2 ~]# netstat -lntp |grep 2181
tcp6       0      0 :::2181       :::*         LISTEN      7825/java
[root@txy-server2 ~]#
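
Alternatively, Zookeeper's own control script has a status subcommand that reports whether the server is running and in which mode; for a single node the output should look roughly like this:

[root@txy-server2 ~]# /usr/local/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: standalone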

If your machine has the firewall enabled, you need to open the Zookeeper port; otherwise other nodes will not be able to register:

[root@txy-server2 ~]# firewall-cmd --zone=public --add-port=2181/tcp --permanent
[root@txy-server2 ~]# firewall-cmd --reload

Kafka installation

With Zookeeper installed, Kafka can be installed next. Similarly, first go to the download page on Kafka's official website and copy the download link:

Then download it on Linux with the wget command, as follows:

[root@txy-server2 ~]# cd /usr/local/src
[root@txy-server2 /usr/local/src]# wget https://mirror.bit.edu.cn/apache/kafka/2.5.0/kafka_2.13-2.5.0.tgz

Extract the downloaded archive, then move and rename the extracted directory:

[root@txy-server2 /usr/local/src]# tar -xvf kafka_2.13-2.5.0.tgz
[root@txy-server2 /usr/local/src]# mv kafka_2.13-2.5.0 ../kafka

Enter Kafka's configuration file directory and modify the configuration file:

[root@txy-server2 /usr/local/src]# cd ../kafka/config/
[root@txy-server2 /usr/local/kafka/config]# vim server.properties
# Specify this node's brokerId; it must be unique within the cluster
broker.id=0
# Specify the address and port to listen on; use the internal IP here
listeners=PLAINTEXT://192.168.99.1:9092
# If external access is required, specify the external IP here
advertised.listeners=PLAINTEXT://192.168.99.1:9092
# Specify the storage directory for Kafka's log files
log.dirs=/usr/local/kafka/kafka-logs
# Specify the Zookeeper connection address; separate multiple addresses with commas
zookeeper.connect=192.168.99.4:2181
[root@txy-server2 /usr/local/kafka/config]# mkdir /usr/local/kafka/kafka-logs

After modifying the configuration file, to make Kafka's command scripts more convenient to use, we can add Kafka's bin directory to the environment variables:

[root@txy-server2 ~]# vim /etc/profile
export KAFKA_HOME=/usr/local/kafka
export PATH=$PATH:$KAFKA_HOME/bin
[root@txy-server2 ~]# source /etc/profile  # apply the changes
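
To confirm that the environment variable took effect, you can check that the shell now resolves Kafka's scripts (a simple sanity check; the path shown is from this demo setup):

[root@txy-server2 ~]# which kafka-server-start.sh
/usr/local/kafka/bin/kafka-server-start.sh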

Kafka can then be started with the following command:

[root@txy-server2 ~]# kafka-server-start.sh /usr/local/kafka/config/server.properties &

After executing the above command, the startup log is printed to the console. You can tell from the log whether the startup succeeded, or check whether port 9092 is being listened on:

[root@txy-server2 ~]# netstat -lntp |grep 9092
tcp6    0     0 192.168.99.1:9092     :::*      LISTEN     31943/java
[root@txy-server2 ~]#
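
Incidentally, instead of pushing the process into the background with &, kafka-server-start.sh also accepts a -daemon option that starts the broker in the background and writes the startup log to Kafka's logs directory instead of the console; either way works:

[root@txy-server2 ~]# kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties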

Likewise, if the firewall is enabled, the corresponding port needs to be opened as well:

[root@txy-server2 ~]# firewall-cmd --zone=public --add-port=9092/tcp --permanent
[root@txy-server2 ~]# firewall-cmd --reload

At this point, the installation of the first Kafka node is complete. The installation steps for the other two nodes are the same; only the brokerId and the listening IP in the configuration file need to change. So I simply copy the Kafka directory on this node to the other two machines:

[root@txy-server2 ~]# rsync -av /usr/local/kafka 192.168.99.2:/usr/local/
[root@txy-server2 ~]# rsync -av /usr/local/kafka 192.168.99.3:/usr/local/
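
One caveat: since Kafka was already started on the first node, the copied kafka-logs directory contains a meta.properties file that records broker.id=0, and a broker refuses to start if its configured broker.id conflicts with the one stored there. So delete that file on the two target machines before starting them:

[root@txy-server2 ~]# rm -f /usr/local/kafka/kafka-logs/meta.properties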

Then modify the brokerId and listening IP on these two nodes:

[root@txy-server2 /usr/local/kafka/config]# vim server.properties
# Change the brokerId
broker.id=1
# Specify the address and port to listen on; use the internal IP here
listeners=PLAINTEXT://192.168.99.2:9092
# If external access is required, specify the external IP here
advertised.listeners=PLAINTEXT://192.168.99.2:9092
[root@txy-server2 /usr/local/kafka/config]#
[root@txy-server2 /usr/local/kafka/config]# vim server.properties
# Change the brokerId
broker.id=2
# Specify the address and port to listen on; use the internal IP here
listeners=PLAINTEXT://192.168.99.3:9092
# If external access is required, specify the external IP here
advertised.listeners=PLAINTEXT://192.168.99.3:9092
[root@txy-server2 /usr/local/kafka/config]#

After the configuration changes are complete, start the two nodes following the steps described earlier. Once they start successfully, open Zookeeper; if the corresponding brokerId data exists under /brokers/ids, the cluster was built successfully:

[root@txy-server2 ~]# /usr/local/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 4] ls /brokers/ids
[0, 1, 2]
[zk: localhost:2181(CONNECTED) 5]
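
As a further smoke test (the topic name test-cluster here is just an example), you can create a topic with a replication factor of 3 and inspect its partition assignment; thanks to the PATH configuration earlier, the scripts can be invoked directly:

[root@txy-server2 ~]# kafka-topics.sh --create --bootstrap-server 192.168.99.1:9092 --replication-factor 3 --partitions 3 --topic test-cluster
[root@txy-server2 ~]# kafka-topics.sh --describe --bootstrap-server 192.168.99.1:9092 --topic test-cluster

The describe output lists, for each partition, its Leader, Replicas, and Isr in terms of brokerIds, which also previews the replica concepts discussed next.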

Kafka replica sets

About Kafka's replica sets:

  • A Kafka replica set means keeping multiple copies of the log. Since Kafka stores its data in log files, this amounts to data backup and redundancy
  • Kafka can set the default number of replicas through configuration (see the snippet after this list)
  • Kafka can set the number of replicas for each Topic, so replica sets are defined relative to a Topic
  • A Topic's replicas can be distributed across multiple Brokers; when one Broker fails, the data still exists on the other Brokers, which improves data reliability. This is also the main purpose of replica sets
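
For example, the cluster-wide default mentioned above is controlled by default.replication.factor in server.properties; it applies to automatically created topics (the value 3 here simply matches the three brokers in this demo):

# server.properties
default.replication.factor=3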

As we know, a Topic in Kafka is only a logical concept; what actually stores the data is the Partition, so it is the Partition that is really replicated. As shown below:

[Figure: Topic partitions and their replicas distributed across Brokers]

About the replication factor:

  • The replication factor determines the number of copies of each Partition. For example, a replication factor of 3 means every Partition in the Topic is kept as three copies, distributed across the Brokers

The replica assignment algorithm is as follows (see the worked example after this list):

  • Sort the n Brokers and the Partitions to be assigned
  • Assign the i-th Partition to the (i mod n)-th Broker
  • Assign the j-th replica of the i-th Partition to the ((i + j) mod n)-th Broker
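
As a worked example of these formulas (an illustration, not actual Kafka output): with n = 3 Brokers numbered 0 to 2, 3 Partitions, and 2 copies per Partition,

Partition 0: leader on Broker (0 mod 3) = 0, follower on Broker ((0+1) mod 3) = 1
Partition 1: leader on Broker (1 mod 3) = 1, follower on Broker ((1+1) mod 3) = 2
Partition 2: leader on Broker (2 mod 3) = 2, follower on Broker ((2+1) mod 3) = 0

so each Broker holds one leader and one follower, and no Partition keeps both of its copies on the same Broker.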

Causes of Kafka node failure and how it is handled

Two situations are treated as Kafka node (Broker) failure (both thresholds are configurable; see the snippet after this list):

  • The Kafka node fails to maintain its heartbeat with Zookeeper
  • A follower's messages fall too far behind the leader's
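
The two items below are the server.properties keys governing these checks, shown with illustrative values (check the defaults of your Kafka version before relying on them):

# Zookeeper session timeout; a broker whose session expires is considered failed
zookeeper.session.timeout.ms=18000
# A follower that has not caught up with the leader within this time is dropped from the ISR
replica.lag.time.max.ms=30000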

Kafka's handling of node failure:

  • Kafka removes the failed node, so there is essentially no data loss caused by node failure
  • Kafka's semantic guarantees also go a long way toward avoiding data loss
  • Kafka balances messages within the cluster to reduce hot spots on individual nodes

Introduction to Kafka's Leader election mechanism

Leader election in a Kafka cluster:

  • If you have worked with other distributed components, you will know that most of them elect a leader from among the nodes by voting; Kafka, however, does not use voting to elect its leader
  • Kafka dynamically maintains the set of replicas that are in sync with the Leader's data (the ISR, in-sync replicas)
  • Kafka picks one of the faster replicas in the ISR as the new leader


"It's hard for a clever woman to cook without rice": Kafka has a helpless situation, that is, all the copies in the ISR are down. In this case, Kafka will conduct unclean leader election by default. Kafka provides two different ways of processing:

  1. Wait for a Replica in the ISR to recover and select it as the leader

    • Waiting can take a long time, which reduces availability; and if none of the Replicas in the ISR can be recovered, or their data is lost, the Partition will never become available
  2. Select the first Replica that recovers as the new Leader, regardless of whether it is in the ISR
    • This Replica may not contain all the messages previously committed by the old Leader, so it can cause data loss, but availability is higher

Leader election configuration recommendations (see the snippet below):

  • Disable unclean leader election
  • Manually set the minimum ISR size
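
In server.properties these two recommendations map to settings like the following (a sketch; min.insync.replicas=2 assumes a replication factor of 3 and only takes effect when producers use acks=all):

# Never elect an out-of-sync replica as leader (consistency over availability)
unclean.leader.election.enable=false
# Writes require acknowledgement from at least 2 in-sync replicas
min.insync.replicas=2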

For more details about the ISR, please refer to the original article: https://www.jianshu.com/p/cc0b90636715 (author: Duanwan Chit-Chat)
