[Kafka from entry to abandonment series 2] Kafka cluster construction and basic commands

The previous blog covered Kafka's basic concepts, its producer and consumer model, and its basic architecture, giving us an overall picture of Kafka. The overall framework can be understood as the following architecture [note that a partition's leader and its followers are never stored on the same broker]:

[Screenshot: Kafka overall architecture]

After a message is produced, it is sent to the Kafka cluster according to its topic. The cluster manages the messages, and consumers actively pull messages and consume them [so each consumer can decide its own consumption rate, at the possible cost of wasteful long-polling over persistent connections]; the overall message and cluster management is coordinated by ZooKeeper. With that understood, let's build a Kafka cluster. Since I only have one computer, I am going to build the environment with virtual machines: first deploy a single-node virtual machine and run Kafka on it, then clone two more virtual machines from it to form a distributed virtual machine cluster.

Single-node virtual machine installation

For the virtual machine I use the combination VMware + CentOS 7 + SecureCRT + AppNode to build and manage it. The detailed setup process is not covered here; you can refer to my other article, [Distributed Cluster Construction One] Virtual machine configuration (VMware+Centos7+SecureCRT+AppNode). After building a single-node virtual machine according to that article, we can install Kafka on it:

Download and install Kafka

Go to the official Kafka site and pick a mirror to download from. Here we choose the Tsinghua University mirror to download Kafka:

[Screenshot: Kafka download page on the Tsinghua mirror]
Note that there is a pitfall here: if you pull the archive directly to CentOS with wget and decompress it, an error is reported, so we first download it to Windows and then upload it to CentOS.

[Screenshot: uploading the archive via AppNode]

Decompress it directly through AppNode, or use the command:

tar -xvf kafka_2.13-2.6.0.tgz

After decompression is complete you can see the extracted directory.

Modify the Zookeeper configuration file

Enter the path /kafka/kafka_2.13-2.6.0/config/ and modify the zookeeper.properties configuration file:

dataDir=/tmp/zookeeper
dataLogDir=/tmp/zookeeper/log
clientPort=2181
maxClientCnxns=0
admin.enableServer=false
tickTime=2000
initLimit=10
syncLimit=5
# Server addresses for each id; the 0/1/2 here must stay consistent with each node's broker.id
server.0=192.168.5.101:2888:3888
server.1=192.168.5.102:2888:3888
server.2=192.168.5.103:2888:3888

Port 2888 is ZooKeeper's peer-communication port and port 3888 is its leader-election port. Next, enter the dataDir directory /tmp/zookeeper and create a file named myid, writing into it the id from this machine's server.X line (it is recommended to keep it consistent with Kafka's broker.id). In other words, fill myid with the serial number that matches this machine's IP in the server list above:

[Screenshot: the myid file under /tmp/zookeeper]
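
For reference, a minimal sketch of creating the myid file on this first node (assuming it is 192.168.5.101, i.e. server.0):

mkdir -p /tmp/zookeeper          # the dataDir from zookeeper.properties
echo 0 > /tmp/zookeeper/myid     # matches server.0 and broker.id=0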

Modify Kafka configuration file

Enter the path /kafka/kafka_2.13-2.6.0/config/ and modify server.properties:

broker.id=0
zookeeper.connect=192.168.5.101:2181,192.168.5.102:2181,192.168.5.103:2181
log.dirs=/kafka/kafka_2.13-2.6.0/data
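
A quick note on these three settings: broker.id must be unique for every broker in the cluster, zookeeper.connect lists the ZooKeeper ensemble the broker registers with, and log.dirs is where the broker stores its partition data (the message logs), not its application logs.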

Clone a virtual machine cluster

To clone the cluster and manage it, please refer to my other blog, [Distributed Cluster Construction Two] Clone the virtual machine and configure the cluster. Three machines are used for the distributed cluster configuration, and after it is completed you can see that the cluster is operating normally:

[Screenshot: the three-node cluster running normally]

Of course, after cloning completes we still need to modify the configuration files on each machine:

Create a myid file for each machine

Enter the dataDir directory /tmp/zookeeper and create the myid file, writing the value from the matching server.X line: write 1 and 2 in the myid files of the 102 and 103 machines respectively, keeping each consistent with that machine's server.id and broker.id.
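
A sketch of what to run on the two clones, assuming the IP-to-id mapping from zookeeper.properties above:

# on 192.168.5.102
echo 1 > /tmp/zookeeper/myid
# on 192.168.5.103
echo 2 > /tmp/zookeeper/myid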

Modify the Kafka configuration of each machine

Enter the path /kafka/kafka_2.13-2.6.0/config/ on each machine and modify server.properties:

# 192.168.5.101
broker.id=0
zookeeper.connect=192.168.5.101:2181,192.168.5.102:2181,192.168.5.103:2181
log.dirs=/kafka/kafka_2.13-2.6.0/data

# 192.168.5.102
broker.id=1
zookeeper.connect=192.168.5.101:2181,192.168.5.102:2181,192.168.5.103:2181
log.dirs=/kafka/kafka_2.13-2.6.0/data

# 192.168.5.103
broker.id=2
zookeeper.connect=192.168.5.101:2181,192.168.5.102:2181,192.168.5.103:2181
log.dirs=/kafka/kafka_2.13-2.6.0/data

Run Kafka commands

After the modifications are complete, let's actually run Kafka. Use SecureCRT to open three sessions at the same time and execute the following on each of the three machines, starting both services as daemons:

Start zookeeper and Kafka

Start ZooKeeper:          bin/zookeeper-server-start.sh  -daemon  config/zookeeper.properties
Start the Kafka broker:   bin/kafka-server-start.sh      -daemon  config/server.properties

[Screenshot: ZooKeeper and Kafka started on all three machines]
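
To verify that both processes are running on each node, one option (assuming a JDK is installed, so jps is on the PATH) is:

jps | grep -E 'QuorumPeerMain|Kafka'    # QuorumPeerMain is ZooKeeper, Kafka is the broker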

Create a topic, list topics, and view topic details

Create a topic:    bin/kafka-topics.sh  --zookeeper 192.168.5.101:2181 --create --replication-factor 3 --partitions 1 --topic tml-second
List topics:       bin/kafka-topics.sh  --zookeeper 192.168.5.101:2181 --list
Describe a topic:  bin/kafka-topics.sh  --zookeeper 192.168.5.101:2181 --describe --topic tml-second

[Screenshot: output of --describe for topic tml-second]
Note that the replication factor cannot exceed the number of machines in the cluster: if two replicas of the same partition landed on the same machine, replication would lose its point, so Kafka rejects such a topic, as follows:
[Screenshot: error when the replication factor exceeds the broker count]
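
For example, on our three-broker cluster a command like the following should be rejected (tml-too-many is just a hypothetical topic name):

bin/kafka-topics.sh  --zookeeper 192.168.5.101:2181 --create --replication-factor 4 --partitions 1 --topic tml-too-many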

When viewing details, the first line shows a summary of all partitions, and each following line gives the information for one partition. Since we only have one partition, only one line is displayed.

  • Leader is the node responsible for all reads and writes for the given partition. Any node may become the leader. Here the leader is 2, which is the 103 machine.
  • Replicas lists the nodes that store replicas of the given partition, regardless of whether a node is the leader or even alive. Here it is our three machines 0, 1, 2, corresponding to 101, 102, and 103 respectively.
  • Isr is the set of in-sync replicas: the nodes that are alive and caught up with the leader. Here it is again 0, 1, 2, corresponding to 101, 102, and 103, which means none of our three machines has dropped out of the cluster.

Let's use a more complex case to work out the distribution mechanism:
[Screenshot: --describe output for a topic with multiple partitions and replicas]
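
To reproduce something similar yourself, a sketch along these lines works (tml-multi is a hypothetical topic name):

bin/kafka-topics.sh  --zookeeper 192.168.5.101:2181 --create --replication-factor 3 --partitions 3 --topic tml-multi
bin/kafka-topics.sh  --zookeeper 192.168.5.101:2181 --describe --topic tml-multi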

Send and consume messages

Produce messages:   bin/kafka-console-producer.sh --broker-list 192.168.5.101:9092 --topic tml-second
Consume messages:   bin/kafka-console-consumer.sh --bootstrap-server 192.168.5.102:9092 --from-beginning --topic tml-second
Consume within the same group:  bin/kafka-console-consumer.sh --bootstrap-server 192.168.5.102:9092 --topic tml-second --consumer.config config/consumer.properties

[Screenshots: producer and consumer sessions]
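
The group membership comes from the group.id set in config/consumer.properties; in the stock Kafka distribution that file ships with a line along the lines of (the exact value may differ):

group.id=test-consumer-group

Consumers sharing the same group.id divide the topic's partitions among themselves, so with our single-partition topic only one member of the group receives messages at a time.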

Delete topic

 bin/kafka-topics.sh  --zookeeper 192.168.5.101:2181 --delete --topic tml-kafka
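
Deletion only actually removes the topic when the broker setting delete.topic.enable is true; it has defaulted to true since Kafka 1.0, so on 2.6.0 the topic is deleted rather than merely marked for deletion.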

[Screenshot: topic deletion output]

Shut down the Kafka service

bin/kafka-server-stop.sh
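
ZooKeeper has an analogous stop script; stop Kafka on all brokers first, then stop ZooKeeper:

bin/zookeeper-server-stop.sh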

[Screenshots: stopping the Kafka service]

Writing this blog was a difficult process. Since I am not very familiar with Linux, the cluster always failed to start after it was configured. It took two days to discover that ZooKeeper and Kafka had not been started as daemon processes, so pressing CTRL+C killed them, after which Kafka could never connect to ZooKeeper again. But the distributed cluster is finally built, and there is light at the end of the tunnel!

Origin blog.csdn.net/sinat_33087001/article/details/108230440