Kafka overview
Kafka is a distributed system consisting of servers and clients that communicate over a high-performance TCP network protocol. It can be deployed on bare-metal hardware, virtual machines, and containers, both on-premises and in the cloud.
1) Apache Kafka is an open-source messaging system written in Scala, developed as a project of the Apache Software Foundation.
2) Kafka was originally developed by LinkedIn and open-sourced in early 2011; it graduated from the Apache Incubator in October 2012. The goal of the project is to provide a unified, high-throughput, low-latency platform for processing real-time data.
3) Kafka is a distributed message queue. Kafka categorizes messages by topic when storing them. A message sender is called a producer, and a message recipient is called a consumer. A Kafka cluster is composed of multiple Kafka instances, and each instance (server) is called a broker.
4) Both the Kafka cluster and its consumers rely on a ZooKeeper cluster to store metadata and ensure system availability.
Kafka architecture diagram
Concepts in Kafka
1) Producer: Producers are client applications that publish (write) events to Kafka.
2) Consumer: Consumers are client applications that subscribe to (read and process) these events.
3) Topic: A topic is similar to a folder in a file system, and events are the files in that folder. Messages are generally organized by topic.
4) Consumer Group (CG): Kafka groups consumers into consumer groups. Each message in a topic is delivered to every subscribing consumer group, but within a group it is delivered to only one consumer.
5) Broker: A Kafka server is a broker. A cluster is composed of multiple brokers, and a broker can hold multiple topics.
6) Partition: For scalability, a very large topic can be distributed across multiple brokers (i.e., servers): a topic can be divided into multiple partitions, and each partition is an ordered queue. Each message in a partition is assigned a sequential id called the offset. Kafka only guarantees that messages are delivered to a consumer in order within a single partition; it does not guarantee the order of a topic as a whole (across partitions).
7) Offset: Kafka's log segment files are named after the offset of the first message they contain. The advantage of naming by offset is that it makes lookups easy: to find the message at offset 2049, just locate the segment whose name is the largest offset not greater than 2049, i.e. the file 2048.kafka. The first segment file is named 00000000000.kafka.
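The offset-based lookup described above can be sketched in Python. The segment names and the .kafka suffix follow this article's simplified naming (real Kafka segments use 20-digit, zero-padded .log file names); the helper is illustrative rather than Kafka's actual code:

```python
import bisect

def segment_for_offset(base_offsets, target):
    """Return the base offset of the segment file that would contain
    `target`: the largest base offset that is <= target.
    `base_offsets` must be sorted ascending."""
    # bisect_right finds the insertion point for `target`;
    # the segment we want is the one just before that point.
    i = bisect.bisect_right(base_offsets, target) - 1
    if i < 0:
        raise ValueError("offset precedes the first segment")
    return base_offsets[i]

# Segment files named after the offset of their first message,
# as in the article: 00000000000.kafka, 1024.kafka, 2048.kafka
segments = [0, 1024, 2048]
print(segment_for_offset(segments, 2049))  # prints 2048 -> file 2048.kafka
```

Because the names are sorted base offsets, the lookup is a binary search rather than a scan of every file.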
Zookeeper installation
Since Kafka requires a ZooKeeper service, we need to install ZooKeeper first.
Introduction to Zookeeper
ZooKeeper is an open-source distributed framework that provides basic services for coordinating distributed applications. It exposes a set of common services, such as distributed synchronization, naming, and group membership, to external applications, simplifying the coordination and management of distributed systems.
It is an open-source implementation of Google's Chubby and can itself be deployed as a cluster. Such a ZooKeeper cluster is used to manage an application cluster: it monitors the status of each node in the application cluster and decides the next reasonable operation based on the feedback each node submits.
Zookeeper installation
Download the ZooKeeper tarball from https://archive.apache.org/dist/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz, then upload it to the server and decompress it.
Deploy a 3-node Zookeeper pseudo-distributed cluster
- First, create a directory for the cluster installation, called zookeeper. Then decompress three copies of ZooKeeper under this directory to form 3 nodes; the ZooKeeper copy in each directory represents one node.
This results in an installation directory containing zookeeper1, zookeeper2, and zookeeper3.
- Create a data directory, a logs directory, and a myid file for each node. The myid file contains the node's number in the cluster: write 1 for zookeeper1, 2 for zookeeper2, and 3 for zookeeper3.
- Create a configuration file for each node.
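The per-node directory and myid setup above can be scripted; this is a minimal sketch, and the /tmp/zookeeper-demo base path is just an illustrative assumption (the article uses /usr/zookeeper):

```python
import os

def init_zk_nodes(base, ids=(1, 2, 3)):
    """Create data/, logs/ and a myid file for each pseudo-cluster
    node under `base`. The myid file holds the node's number, which
    must match the server.N entries in zoo.cfg."""
    for n in ids:
        node = os.path.join(base, f"zookeeper{n}")
        data = os.path.join(node, "data")
        logs = os.path.join(node, "logs")
        os.makedirs(data, exist_ok=True)
        os.makedirs(logs, exist_ok=True)
        # ZooKeeper reads myid from the configured dataDir
        with open(os.path.join(data, "myid"), "w") as f:
            f.write(str(n))

init_zk_nodes("/tmp/zookeeper-demo")
```

Note that ZooKeeper looks for myid inside the directory configured as dataDir, which is why the file is written under data/ here.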
Under zookeeper1, rename the sample configuration file zoo_sample.cfg to zoo.cfg. The content of zoo.cfg is as follows:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/zookeeper/zookeeper1/data
dataLogDir=/usr/zookeeper/zookeeper1/logs
clientPort=2181
server.1=127.0.0.1:8880:7770
server.2=127.0.0.1:8881:7771
server.3=127.0.0.1:8882:7772
In the same way, create zoo.cfg in the corresponding locations of zookeeper2 and zookeeper3, copying the contents of zookeeper1's zoo.cfg. Only three settings need to change: clientPort, dataDir, and dataLogDir. Set zookeeper2's clientPort to 2182 and zookeeper3's to 2183, and point each node's dataDir and dataLogDir at its own directories.
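Generating the three zoo.cfg files can also be scripted. This sketch mirrors the configuration above and additionally fills in tickTime, initLimit, and syncLimit, which ZooKeeper requires to start in cluster mode; the /tmp base path and the conf/ location are assumptions matching a standard ZooKeeper layout:

```python
import os

# Template mirrors the zoo.cfg shown above; only clientPort,
# dataDir and dataLogDir differ between the three nodes.
CFG = """tickTime=2000
initLimit=10
syncLimit=5
dataDir={base}/zookeeper{n}/data
dataLogDir={base}/zookeeper{n}/logs
clientPort={port}
server.1=127.0.0.1:8880:7770
server.2=127.0.0.1:8881:7771
server.3=127.0.0.1:8882:7772
"""

def write_zoo_cfgs(base, ports=(2181, 2182, 2183)):
    """Write conf/zoo.cfg for each of the three pseudo-cluster nodes."""
    for n, port in enumerate(ports, start=1):
        conf = os.path.join(base, f"zookeeper{n}", "conf")
        os.makedirs(conf, exist_ok=True)
        with open(os.path.join(conf, "zoo.cfg"), "w") as f:
            f.write(CFG.format(base=base, n=n, port=port))

write_zoo_cfgs("/tmp/zookeeper-demo")
```

The server.N lines are identical in all three files: every node must know the quorum and election ports of every member of the ensemble.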
- Start the zk cluster
Enter the bin directory of zookeeper1, the first node of the ZooKeeper cluster, and start the service:
bin/zkServer.sh start
Then start the zookeeper2 and zookeeper3 services in the same way.
Zookeeper service commands
# Start the ZK service:
bin/zkServer.sh start
# Check the ZK service status:
bin/zkServer.sh status
# Stop the ZK service:
bin/zkServer.sh stop
# Restart the ZK service:
bin/zkServer.sh restart
# Connect to the server:
zkCli.sh -server 127.0.0.1:2181
Kafka installation
First, download Kafka from the official website: http://kafka.apache.org/downloads
After downloading, upload the archive to the CentOS 7 server and decompress it:
tar -zxvf kafka_2.12-2.1.0.tgz
Modify the configuration file
cd into the config directory and edit server.properties:
broker.id=0
# Fill in your server's IP address here
listeners=PLAINTEXT://192.168.130.128:9092
# Choose the directory for your logs
log.dirs=/usr/kafka2.12/kafka-logs
delete.topic.enable=true
# ZooKeeper cluster connection string
zookeeper.connect=192.168.130.128:2181,192.168.130.128:2182,192.168.130.128:2183
Start the Kafka service:
bin/kafka-server-start.sh config/server.properties
Now create a new topic with a replication factor of 3 (note that this requires at least 3 running brokers; on the single-broker setup above, either start two more brokers or use --replication-factor 1):
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic topic_test
View topic information in the cluster
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic topic_test
View the list of kafka topics
bin/kafka-topics.sh --zookeeper localhost:2181 --list
Consume the topic's data from the beginning (including messages that have already been consumed):
bin/kafka-console-consumer.sh --bootstrap-server 192.168.130.128:9092 --topic topic_test --from-beginning
View kafka consumer-group list
bin/kafka-consumer-groups.sh --bootstrap-server 192.168.130.128:9092 --list
Delete topic_test
bin/kafka-topics.sh --delete --zookeeper 127.0.0.1:2181 --topic topic_test
Modify the number of partitions (Kafka only allows increasing the partition count, not decreasing it):
bin/kafka-topics.sh --zookeeper 192.168.130.128:2181 --alter --topic topic_test --partitions 4