Kafka cluster management: ZooKeeper cluster construction, Kafka cluster construction, and multi-cluster synchronization


A cluster is a group of loosely coupled computers, connected through software and/or hardware, that work together so closely that in many respects they can be regarded as a single computer. Each computer in a cluster is usually called a node. Nodes are usually connected through a local area network, although other connection methods are possible. Clusters are generally used to improve on the computing speed and/or reliability of a single computer, and they are typically much more cost-effective than a single computer of comparable speed or reliability, such as a workstation or supercomputer.

Characteristics of the cluster

The cluster has the following two characteristics:

  1. Scalability: The performance of the cluster is not limited to a single service entity. New service entities can be added to the cluster dynamically to increase its performance.
  2. High availability: When one of the nodes in the cluster fails, the applications running on that node are automatically taken over by another node. Eliminating single points of failure is very important for improving data availability, reachability, and reliability.

Cluster capabilities

  1. Load balancing: Load balancing distributes tasks evenly across the computing and network resources of the cluster to improve data throughput.
  2. Error recovery: If a server in the cluster becomes unavailable because of a failure or maintenance, its resources and applications are transferred to the available cluster nodes. Error recovery is this process in which, when the resources of one node stop working, the resources of another available node transparently take over and complete the task.

Load balancing and error recovery require that every service entity holds resources able to perform the same task, and that each of those resources sees the same view of the information needed to perform it.
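To make the two capabilities concrete, here is a minimal sketch of round-robin load balancing that skips failed nodes. It is only an illustration of the idea, not how Kafka or ZooKeeper assign work; the node names and the is_healthy callback are hypothetical.

```python
from itertools import cycle


def dispatch(tasks, nodes, is_healthy):
    """Assign tasks round-robin, skipping nodes that are reported down."""
    healthy = [n for n in nodes if is_healthy(n)]
    if not healthy:
        raise RuntimeError("no healthy node available")
    ring = cycle(healthy)
    return {task: next(ring) for task in tasks}


nodes = ["node-a", "node-b", "node-c"]   # hypothetical node names
down = {"node-b"}                        # pretend node-b has failed
plan = dispatch(["t1", "t2", "t3", "t4"], nodes, lambda n: n not in down)
print(plan)
```

The tasks that would have gone to node-b are spread over the remaining nodes, which only works because every node is able to perform the same task.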

One, cluster usage scenarios

Kafka is a distributed messaging system with high scalability and high throughput. In a Kafka cluster there is no concept of a "central master node"; all nodes in the cluster are equal.

Broker

Each Broker is a Kafka service instance, and multiple Brokers form a Kafka cluster. The messages published by the producers will be stored in the Broker, and consumers will pull the messages from the Broker for consumption.

Kafka cluster architecture diagram

As the figure shows, Kafka relies heavily on ZooKeeper and uses it to manage its own cluster: the Broker list, the Partition-to-Broker and Partition-to-Consumer relationships, Producer and Consumer load balancing, recording of consumption progress (Offset), Consumer registration, and so on. To achieve high availability, ZooKeeper itself must therefore also run as a cluster.

Two, cluster construction

1. ZooKeeper cluster construction

Scenario

A real cluster is deployed on separate servers, but for testing, starting a dozen virtual machines at the same time would exhaust memory. So here we build a pseudo cluster: all instances run on one virtual machine and are distinguished by port.

We will build a three-node ZooKeeper cluster (pseudo cluster).
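As background that the text above does not spell out: a ZooKeeper ensemble stays available only while a strict majority of its servers is running, which is a common reason for choosing an odd size such as three. The sketch below just does that arithmetic; the function names are made up for this illustration.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers may be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Three nodes tolerate one failure, and a fourth node adds no extra tolerance, so the next useful step up is five.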

Install JDK

Cluster directory

Create a zookeeper-cluster directory and copy the unpacked ZooKeeper distribution into each of the following three directories:

itcast@Server-node:/mnt/d/zookeeper-cluster$ ll 
total 0 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 ./ 
drwxrwxrwx 1 dayuan dayuan 512 Aug 19 18:42 ../ 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 zookeeper-1/ 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 zookeeper-2/ 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 zookeeper-3/ 
itcast@Server-node:/mnt/d/zookeeper-cluster$

ClientPort settings

In the zoo.cfg of each ZooKeeper instance, set clientPort to 2181, 2182, and 2183 respectively. For the first instance:

# the port at which the clients will connect 
clientPort=2181

myid configuration

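Create a myid file in the data directory of each ZooKeeper instance, with the contents 0, 1, and 2 respectively. This file records the ID of each server. (The ZooKeeper documentation recommends IDs between 1 and 255; this walkthrough uses 0, 1, and 2.)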

dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$ cat temp/zookeeper/data/myid
0
dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$
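Writing the three myid files by hand is easy to get wrong, so the step can be scripted. The sketch below assumes the directory layout used in this walkthrough (zookeeper-N/temp/zookeeper/data) and IDs starting at 0; write_myids is a hypothetical helper, and the demo runs in a throwaway directory rather than /mnt/d/zookeeper-cluster.

```python
import tempfile
from pathlib import Path


def write_myids(base, first_id=0, count=3):
    """Write temp/zookeeper/data/myid under zookeeper-1..count; return {dir name: id}."""
    written = {}
    for n in range(1, count + 1):
        data_dir = Path(base) / f"zookeeper-{n}" / "temp" / "zookeeper" / "data"
        data_dir.mkdir(parents=True, exist_ok=True)
        server_id = first_id + n - 1
        # The file holds nothing but the numeric ID of this instance.
        (data_dir / "myid").write_text(f"{server_id}\n")
        written[f"zookeeper-{n}"] = server_id
    return written


with tempfile.TemporaryDirectory() as tmp:
    ids = write_myids(tmp)
    content = (Path(tmp) / "zookeeper-1" / "temp" / "zookeeper" / "data" / "myid").read_text()
print(ids)
```

To use it on the real layout, pass the zookeeper-cluster directory as base instead of the temporary directory.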

zoo.cfg

Configure the client access port (clientPort) and the list of cluster servers in the zoo.cfg of each ZooKeeper instance.

dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$ cat conf/zoo.cfg 
# The number of milliseconds of each tick 
# heartbeat interval of the ZooKeeper server
tickTime=2000 
# The number of ticks that the initial 
# synchronization phase can take 
initLimit=10 
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement 
syncLimit=5 
# the directory where the snapshot is stored. 
# do not use /tmp for storage, /tmp here is just 
# example sakes. 
#dataDir=/tmp/zookeeper 
dataDir=temp/zookeeper/data 
dataLogDir=temp/zookeeper/log 
# the port at which the clients will connect 
clientPort=2181 
# the maximum number of client connections. 
# increase this if you need to handle more clients 
#maxClientCnxns=60 
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge. 
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance 
#
# The number of snapshots to retain in dataDir 
#autopurge.snapRetainCount=3 
# Purge task interval in hours 
# Set to "0" to disable auto purge feature 
#autopurge.purgeInterval=1 

server.0=127.0.0.1:2888:3888 
server.1=127.0.0.1:2889:3889 
server.2=127.0.0.1:2890:3890 
dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$

Explanation: server.<server ID>=<server IP address>:<port followers use to connect to the leader>:<port used for leader election>. The server ID must match the content of that instance's myid file.
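Because all three servers share one IP address in a pseudo cluster, every port in the server list has to be different. As a rough check, the sketch below parses the server.N lines and reports any host and port pair that is used twice; parse_servers and port_conflicts are hypothetical helpers, not part of ZooKeeper.

```python
import re

SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)\s*$")


def parse_servers(cfg_text):
    """Return {server id: (host, peer_port, election_port)} from zoo.cfg text."""
    servers = {}
    for line in cfg_text.splitlines():
        m = SERVER_LINE.match(line.strip())
        if m:
            sid, host, peer, election = m.groups()
            servers[int(sid)] = (host, int(peer), int(election))
    return servers


def port_conflicts(servers):
    """Return the (host, port) pairs that appear more than once."""
    seen, dupes = set(), set()
    for host, peer, election in servers.values():
        for endpoint in ((host, peer), (host, election)):
            (dupes if endpoint in seen else seen).add(endpoint)
    return dupes


cfg = """
server.0=127.0.0.1:2888:3888
server.1=127.0.0.1:2889:3889
server.2=127.0.0.1:2890:3890
"""
servers = parse_servers(cfg)
print(servers)
print(port_conflicts(servers))
```

An empty result from port_conflicts means the list is safe to use on a single machine.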

Start the cluster

Starting the cluster means starting each instance separately (bin/zkServer.sh start in each instance directory). After startup, we check the running status of each instance:

itcast@Server-node:/mnt/d/zookeeper-cluster/zookeeper-1$ bin/zkServer.sh status 
ZooKeeper JMX enabled by default 
Using config: /mnt/d/zookeeper-cluster/zookeeper-1/bin/../conf/zoo.cfg 
Mode: leader

itcast@Server-node:/mnt/d/zookeeper-cluster/zookeeper-2$ bin/zkServer.sh status 
ZooKeeper JMX enabled by default 
Using config: /mnt/d/zookeeper-cluster/zookeeper-2/bin/../conf/zoo.cfg 
Mode: follower 

itcast@Server-node:/mnt/d/zookeeper-cluster/zookeeper-3$ bin/zkServer.sh status 
ZooKeeper JMX enabled by default 
Using config: /mnt/d/zookeeper-cluster/zookeeper-3/bin/../conf/zoo.cfg 
Mode: follower
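The status output above can also be checked programmatically: a working ensemble reports exactly one leader and the rest as followers. The sketch below only parses text of the form shown above; parse_mode and ensemble_ok are hypothetical helpers, and collecting the output from each instance is left out.

```python
import re


def parse_mode(status_output):
    """Extract the value after 'Mode:' from zkServer.sh status output, or None."""
    m = re.search(r"^Mode:\s*(\w+)", status_output, re.MULTILINE)
    return m.group(1) if m else None


def ensemble_ok(modes):
    """True if there is exactly one leader and every other instance is a follower."""
    return modes.count("leader") == 1 and all(
        mode in ("leader", "follower") for mode in modes
    )


sample = """ZooKeeper JMX enabled by default
Using config: /mnt/d/zookeeper-cluster/zookeeper-1/bin/../conf/zoo.cfg
Mode: leader
"""
print(parse_mode(sample))
print(ensemble_ok(["leader", "follower", "follower"]))
```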

2. Kafka cluster construction

Cluster directory

itcast@Server-node:/mnt/d/kafka-cluster$ ll 
total 0 
drwxrwxrwx 1 dayuan dayuan 512 Aug 28 18:15 ./ 
drwxrwxrwx 1 dayuan dayuan 512 Aug 19 18:42 ../ 
drwxrwxrwx 1 dayuan dayuan 512 Aug 28 18:39 kafka-1/ 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 14:02 kafka-2/ 
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 14:02 kafka-3/ 
drwxrwxrwx 1 dayuan dayuan 512 Aug 28 18:15 kafka-4/ 
itcast@Server-node:/mnt/d/kafka-cluster$

server.properties

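# broker ID, must be unique within the cluster
broker.id=1
# host address
host.name=127.0.0.1
# port (each instance on the same machine needs its own)
port=9092
# directory where message logs are stored (one per instance)
log.dirs=/tmp/kafka/log/cluster/log3
# ZooKeeper addresses, comma-separated
zookeeper.connect=localhost:2181,localhost:2182,localhost:2183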
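Only one server.properties is shown above, but each broker in the pseudo cluster needs its own broker.id, port, and log.dirs. The sketch below renders one file per broker. The numbering scheme (ports counting up from 9092, log directories log1, log2, ...) is an assumption made for illustration and is not taken from the article; broker_properties is a hypothetical helper.

```python
def broker_properties(n, zk_connect, base_port=9092, log_root="/tmp/kafka/log/cluster"):
    """Render server.properties text for the n-th broker (n starts at 1)."""
    settings = {
        "broker.id": n,                      # unique within the cluster
        "host.name": "127.0.0.1",
        "port": base_port + n - 1,           # assumed scheme: 9092, 9093, ...
        "log.dirs": f"{log_root}/log{n}",    # assumed scheme: log1, log2, ...
        "zookeeper.connect": zk_connect,
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


zk = "localhost:2181,localhost:2182,localhost:2183"
for n in (1, 2, 3):
    print(broker_properties(n, zk))
```

Each rendered text would then be saved as config/server.properties in the matching kafka-N directory.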

Start the cluster

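Open a terminal in each Kafka instance directory and run the start command (bin/kafka-server-start.sh config/server.properties in a standard distribution). Output like the following indicates a successful start: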

............................... 
[2019-07-24 06:18:19,793] INFO [Transaction Marker Channel Manager 2]: Starting (kafka.coordinator.transaction.TransactionMarkerChannelManager) 
[2019-07-24 06:18:19,793] INFO [TransactionCoordinator id=2] Startup complete. (kafka.coordinator.transaction.TransactionCoordinator) 
[2019-07-24 06:18:19,846] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread) 
[2019-07-24 06:18:19,869] INFO [SocketServer brokerId=2] Started data-plane processors for 1 acceptors (kafka.network.SocketServer) 
[2019-07-24 06:18:19,879] INFO Kafka version: 2.2.1 (org.apache.kafka.common.utils.AppInfoParser) 
[2019-07-24 06:18:19,879] INFO Kafka commitId: 55783d3133a5a49a (org.apache.kafka.common.utils.AppInfoParser) 
[2019-07-24 06:18:19,883] INFO [KafkaServer id=2] started (kafka.server.KafkaServer)
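When several brokers are started at once, it can help to scan their logs for the final "started" line shown above. The sketch below does this for log text in the format of that excerpt; started_broker_ids is a hypothetical helper, and reading the actual log files is left out.

```python
import re

STARTED = re.compile(r"\[KafkaServer id=(\d+)\] started \(kafka\.server\.KafkaServer\)")


def started_broker_ids(log_text):
    """Return the set of broker ids whose 'started' line appears in the log text."""
    return {int(m.group(1)) for m in STARTED.finditer(log_text)}


log = """[2019-07-24 06:18:19,879] INFO Kafka version: 2.2.1 (org.apache.kafka.common.utils.AppInfoParser)
[2019-07-24 06:18:19,883] INFO [KafkaServer id=2] started (kafka.server.KafkaServer)
"""
print(started_broker_ids(log))
```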

Three, multi-cluster synchronization

MirrorMaker exists to solve cross-cluster synchronization for Kafka by creating mirrored clusters; the following figure shows its working principle. The tool consumes messages from the source cluster and then produces them to the target cluster.

1. Configuration

Create mirror

Creating a mirror with MirrorMaker is relatively simple. After setting up the target Kafka cluster, you only need to start the mirror-maker program. It needs one or more consumer configuration files, one producer configuration file, and either a whitelist or a blacklist that selects the topics to mirror. The consumer configuration points at the source Kafka cluster and the producer configuration points at the target cluster. Older versions did this through the ZooKeeper address (or broker.list for the producer), while the configuration files shown below use bootstrap.servers.

kafka-run-class.sh kafka.tools.MirrorMaker \
  --consumer.config sourceCluster1Consumer.config \
  --consumer.config sourceCluster2Consumer.config --num.streams 2 \
  --producer.config targetClusterProducer.config --whitelist=".*"
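If the command is launched from a script, building it as an argument list avoids quoting problems with the .* pattern, which a shell might otherwise expand. The sketch below only assembles and prints the arguments, using the option names from the command above; mirror_maker_command is a hypothetical helper and nothing is executed.

```python
import shlex


def mirror_maker_command(consumer_configs, producer_config, whitelist=".*", num_streams=2):
    """Build the argument list for kafka-run-class.sh kafka.tools.MirrorMaker."""
    args = ["kafka-run-class.sh", "kafka.tools.MirrorMaker"]
    for cfg in consumer_configs:
        args += ["--consumer.config", cfg]
    args += ["--num.streams", str(num_streams)]
    args += ["--producer.config", producer_config]
    args += ["--whitelist", whitelist]
    return args


cmd = mirror_maker_command(
    ["sourceCluster1Consumer.config", "sourceCluster2Consumer.config"],
    "targetClusterProducer.config",
)
# shlex.quote makes the printed line safe to paste into a shell.
print(" ".join(shlex.quote(a) for a in cmd))
```

The list can be handed to subprocess.run as is, in which case no shell is involved and the pattern reaches MirrorMaker unchanged.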

Consumer configuration file:

# format: host1:port1,host2:port2 ... 
bootstrap.servers=localhost:9092 

# consumer group id 
group.id=test-consumer-group 

# What to do when there is no initial offset in Kafka or if the current 
# offset does not exist any more on the server: latest, earliest, none 
#auto.offset.reset=

Producer configuration file:

# format: host1:port1,host2:port2 ... 
bootstrap.servers=localhost:9092 

# specify the compression codec for all data generated: none, gzip, snappy, lz4, zstd 
compression.type=none
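Both sample files above use localhost:9092, which is only a placeholder: for mirroring, the consumer should point at the source cluster and the producer at the target cluster. As a small sanity check, the sketch below parses two properties texts and reports whether they name the same bootstrap servers; parse_properties and same_cluster are hypothetical helpers that handle only simple key=value lines.

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def same_cluster(consumer_text, producer_text):
    """True if both files list the same set of bootstrap servers."""
    def servers(text):
        raw = parse_properties(text).get("bootstrap.servers", "")
        return {s.strip() for s in raw.split(",") if s.strip()}
    return servers(consumer_text) == servers(producer_text)


consumer_cfg = "bootstrap.servers=localhost:9092\ngroup.id=test-consumer-group\n"
producer_cfg = "bootstrap.servers=localhost:9092\ncompression.type=none\n"
print(same_cluster(consumer_cfg, producer_cfg))
```

A True result, as for the unedited samples, is a hint that one of the two files still needs the address of the other cluster.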

2. Tuning

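How can we ensure that synchronized data is not lost?

First, require an acknowledgement when sending to the target cluster: request.required.acks=1

Second, send in blocking mode, otherwise data is discarded once the buffer is full: queue.enqueue.timeout.ms=-1

Both names belong to the older Scala producer. If your MirrorMaker version uses the newer Java producer, the closest equivalents are acks and max.block.ms.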

Summary

This chapter covered Kafka clusters: the cluster usage scenarios, building multi-node ZooKeeper and Kafka clusters, and synchronizing data between multiple clusters.



Origin blog.csdn.net/Java_Caiyo/article/details/112610523