Article directory
- 1. Introduction to Kafka
- 2. Distributed installation of Kafka
- 3. Kafka cluster
- 4. Viewing Kafka logs in Zookeeper
1. Introduction to Kafka
1.1 Message queue
1.1.1 Why is there a message queue?
1.1.2 What is a message queue?
- Message
The data transmitted between two computers or communication devices over a network, for example text, music, or video.
- Queue
A special linear list (data elements linked end to end); its specialty is that elements may only be removed at the head and added at the tail (FIFO): enqueue at the tail, dequeue at the head.
- Message queue (MQ)
Message + queue: a queue that holds messages, i.e. the container messages live in while in transit. It mainly provides produce and consume interfaces for external callers to store and retrieve data.
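The FIFO behavior described above can be sketched with a minimal in-memory queue (an illustration only; this is not Kafka code):

```python
from collections import deque

# A minimal FIFO message queue: enqueue at the tail, dequeue at the head.
queue = deque()

queue.append("text")     # enqueue
queue.append("music")
queue.append("video")

first = queue.popleft()  # dequeue: the oldest message comes out first
```

Because only the tail accepts new elements and only the head releases them, messages are always consumed in the order they were produced.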
1.1.3. Classification of message queues
MQ falls into two main categories: point-to-point (p2p) and publish/subscribe (Pub/Sub).
- Peer-to-Peer
Receivers generally obtain data by Pull or Polling. A message sent to the queue is received by one and only one receiver, even if multiple receivers listen on the same queue. Supports the asynchronous "fire-and-forget" delivery style as well as the synchronous request/response style.
- Publish/Subscribe
A message published to a topic can be received by multiple subscribers. Publish/subscribe can deliver data by Push, or let consumers fetch it by Pull or Polling. Its decoupling capability is stronger than the p2p model's.
1.1.4 Comparison between p2p and publish-subscribe MQ
- Common points
Message producers send messages to a queue, and consumers then read and consume messages from the queue.
- Differences
The p2p model consists of a message queue (Queue), a sender (Sender), and a receiver (Receiver). Each message produced has only one consumer; once consumed, the message is no longer in the queue. Analogy: making a phone call.
The pub/sub model consists of a message queue (Queue), a topic (Topic), publishers (Publisher), and subscribers (Subscriber). Each message can have multiple consumers, which do not affect each other. Analogy: if I post on Weibo, anyone who follows me can see it.
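The two delivery models can be sketched in a few lines of Python (an illustrative model of the semantics, not Kafka's API): in p2p each message goes to exactly one receiver, while in pub/sub every subscriber of a topic gets its own copy.

```python
from collections import deque

# --- Point-to-Point: one queue, each message consumed by exactly one receiver ---
p2p_queue = deque(["call-1", "call-2"])
receiver_a = p2p_queue.popleft()   # receiver A takes "call-1"
receiver_b = p2p_queue.popleft()   # receiver B takes "call-2"; no one sees a message twice

# --- Publish/Subscribe: every subscriber of the topic receives every message ---
subscribers = {"alice": [], "bob": []}

def publish(msg):
    for inbox in subscribers.values():
        inbox.append(msg)          # each subscriber gets its own copy

publish("weibo-post")
```

After `publish`, both alice and bob hold "weibo-post", whereas the p2p queue is empty and each call was seen by only one receiver.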
1.1.5 Usage scenarios of messaging systems
- Decoupling: systems exchange data through the messaging system's unified interface without needing to know of each other's existence
- Redundancy: some messaging systems can persist messages, avoiding the risk of losing a message before it is processed
- Scalability: the messaging system is a unified data interface, so each system can be scaled independently
- Peak handling: the messaging system absorbs peak traffic, and each business system pulls and processes only as many requests as its capacity allows
- Recoverability: the failure of some components does not bring down the whole system; after recovering, they can resume fetching and processing data from the messaging system
- Asynchronous communication: when a request need not be processed immediately, it can be placed in the messaging system and handled at a suitable time
1.1.6, common message system
- RabbitMQ: written in Erlang; supports multiple protocols (AMQP, XMPP, SMTP, STOMP), load balancing, and data persistence. Supports both Peer-to-Peer and publish/subscribe modes.
- Redis: a Key-Value NoSQL database that also offers MQ functionality and can serve as a lightweight queue service. For enqueue operations, Redis performs better than RabbitMQ on short messages (under 10 KB) and worse on long messages.
- ZeroMQ: lightweight; requires no separate message server or middleware, as the application itself plays that role (Peer-to-Peer). It is essentially a library that requires developers to combine techniques themselves, so it is fairly complex to use.
- ActiveMQ: a JMS implementation; Peer-to-Peer; supports persistence and XA (distributed) transactions.
- Kafka/Jafka: a high-performance, cross-language, distributed publish/subscribe messaging system; persistent, fully distributed, and suited to both online and offline processing.
- MetaQ/RocketMQ: pure Java implementation; a publish/subscribe messaging system supporting local transactions and XA distributed transactions.
1.2 Introduction to Kafka
1.2.1. Introduction
Kafka is a distributed publish-subscribe messaging system. It was originally developed at LinkedIn, written in Scala, open-sourced in December 2010, and later became an Apache top-level project. Kafka is high-throughput, persistent, and distributed, and is mainly used to process active streaming data (data generated by user behaviors such as logins, page views, clicks, shares, and likes).
Three major characteristics of Kafka:
- High throughput
Can sustain production and consumption of millions of messages per second.
- Persistence
A complete message storage mechanism ensures efficient and safe persistence of data (intermediate storage).
- Distributed
Built on distributed scaling and fault-tolerance mechanisms; Kafka replicates data across several servers, and when one fails, producers and consumers switch to other machines, keeping the system robust as a whole.
1.2.2. Design goals
- High throughput: a single broker on cheap commodity hardware can support reading and writing 1 million messages per second
- Message persistence: all messages are persisted to disk, no message is lost, and message replay is supported
- Fully distributed: Producer, Broker, and Consumer all support horizontal scale-out
- Suited to both online stream processing and offline batch processing
1.2.3. Core concepts of Kafka
What does an MQ need? Production, consumption, message categories, storage, and so on. The Kafka service is like a big pool, constantly producing, storing, and consuming messages of various types. So what is Kafka made of?
Kafka service:
- Topic: a topic; a category of messages handled by Kafka.
- Broker: a message server node; each Kafka service node in a cluster is called a broker, and its main job is to store message data on disk. Each topic is partitioned.
- Partition: the physical grouping of a topic; a topic is split into one or more partitions on the brokers, and the number of partitions is specified when the topic is created.
- Message: the basic unit of communication; each message belongs to one partition.
Related to the Kafka service:
- Producer: the producer of messages and data; publishes messages to a Kafka topic.
- Consumer: the consumer of messages and data; subscribes to a topic and processes the messages published to it.
- Zookeeper: coordinates the normal operation of Kafka.
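As a toy model of these concepts (an illustration only — real Kafka chooses partitions with a murmur2 hash of the key, and stores messages on disk rather than in lists):

```python
# Toy model: a topic is a set of numbered partitions; each message
# belongs to exactly one partition, chosen here by a toy key hash.
topic = {0: [], 1: [], 2: []}          # a topic with 3 partitions

def send(key: str, value: str) -> int:
    # Toy partitioner: byte sum of the key modulo the partition count
    # (real Kafka uses murmur2 on the serialized key).
    partition = sum(key.encode()) % len(topic)
    topic[partition].append(value)
    return partition

p = send("Beijing", "msg-1")           # the message lands in exactly one partition
```

The point of the model: a producer addresses a topic, but every message is stored in, and later consumed from, exactly one of that topic's partitions.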
2. Distributed installation of Kafka
Download address: https://kafka.apache.org/downloads
Chinese download official website: https://kafka.apachecn.org/downloads.html
Installation package link: https://pan.baidu.com/s/1G9F8TEfI88wPi_j2-hkK1A?pwd=e9tu
Source package link: https://pan.baidu.com/s/1LR7X3Is-JRsOOu3DdAp2aw?pwd=7249
2.1 jdk & zookeeper installation
We know that Kafka is managed by Zookeeper, so before installing Kafka, let's install Zookeeper.
1. jdk installation configuration
First of all, CentOS 7 comes with a JDK by default; my virtual machine's CentOS 7 ships OpenJDK 1.8.0_262-b10. If you want to install a specific JDK version, download that JDK package first.
Detailed steps for installing jdk on Linux
2. zookeeper installation
My Kafka package is version 3.4.0, whose corresponding Zookeeper version is 3.6.3, so download the compressed package from the official archive (note that it is the bin.tar.gz package): http://archive.apache.org/dist/zookeeper/
First, put the installation package in the Linux directory and execute the following command:
$ mkdir zk
# Create the Zookeeper data directory
$ mkdir zk/data
# Create the Zookeeper log directory
$ mkdir zk/logs
# Extract the archive
$ tar -zxvf apache-zookeeper-3.8.1-bin.tar.gz
# Configure environment variables: add the lines below
$ vi /etc/profile
export ZK_HOME=/home/install_package/apache-zookeeper-3.8.1-bin
export PATH=$ZK_HOME/bin:$PATH
$ source /etc/profile
# Generate the Zookeeper config file
$ cd apache-zookeeper-3.8.1-bin/conf
$ cp zoo_sample.cfg zoo.cfg  # Zookeeper loads a config file named zoo.cfg by default
Then modify the configuration (data directory and log directory):
vim zoo.cfg
# Heartbeat interval, in milliseconds
tickTime=2000
# Time (in heartbeat intervals) allowed for followers to initially connect and sync with the leader
initLimit=10
# Timeout (in heartbeat intervals) for followers to stay in sync with the leader
syncLimit=5
# Data directory
dataDir=/home/admin/Study/zk/data
# Log directory
dataLogDir=/home/admin/Study/zk/logs
# Client port
clientPort=2181
# Purge interval in hours; the default 0 disables purging
#autopurge.purgeInterval=1
# Used together with the parameter above: number of snapshots to retain (default 3)
#autopurge.snapRetainCount=5
# The settings below are not needed for a standalone deployment
# server.NUM=IP:port1:port2  NUM is this server's number; IP is this machine's address;
# port1 is the leader-follower communication port; port2 is the leader-election port
# Port settings must not collide across instances, e.g.:
#server.0=192.168.101.136:12888:13888
#server.1=192.168.101.146:12888:13888
1. Start the zookeeper background service:
zkServer.sh start
2. Close the zookeeper background service:
zkServer.sh stop
3. View the running status of the zookeeper background service:
zkServer.sh status
2.2 Kafka installation steps
1. First, go to the directory where the Kafka archive is located and extract it:
$ mkdir kafka
# Create the kafka log directory
$ mkdir kafka/logs
# Extract the archive
$ tar -zxvf kafka_2.12-3.4.0.tgz
# Move it into the kafka directory
$ mv kafka_2.12-3.4.0 kafka
# Configure environment variables: add the lines below
$ vi /etc/profile
export KAFKA_HOME=/home/admin/Study/kafka/kafka_2.12-3.4.0
export PATH=$KAFKA_HOME/bin:$PATH
$ source /etc/profile
# Modify the kafka configuration
$ cd kafka_2.12-3.4.0/config
$ vi server.properties
Modify the Kafka configuration:
# broker.id must be unique for each instance
broker.id=0
# Configure the host's ip and port
#listeners=PLAINTEXT://:9092
listeners=PLAINTEXT://192.168.57.30:9092
#advertised.listeners=PLAINTEXT://10.11.0.203:9092
# Configure the log storage path
log.dirs=/home/admin/Study/kafka/logs
# Configure the zookeeper cluster
zookeeper.connect=localhost:2181
Start Kafka (Zookeeper must already be running):
bin/kafka-server-start.sh -daemon config/server.properties
Verify that Kafka started successfully:
Kafka shutdown:
bin/kafka-server-stop.sh
3. Kafka cluster
Prepare three virtual machines. Here we simply clone the machine installed above twice.
The cloning process is straightforward and is not repeated here. Next, let's look at the configuration changes needed after cloning.
3.1 Configuration changes for the cloned machines
① Modify the host name
Shut down all virtual machines, start the first clone, and change its host name to kafka02:
vim /etc/hostname
② Modify the network address
vim /etc/sysconfig/network-scripts/ifcfg-ens33
③ Restart:
reboot
Modify the other clone the same way:
Host name: kafka03
IP address: 192.168.255.214
(Aside: to change the command line's background and font colors, open the terminal --> Edit --> Preferences --> Colors, and uncheck "Use colors from system theme".)
④ Go to the kafka installation directory and modify the kafka server.properties configuration file:
vim config/server.properties
broker.id
This property uniquely identifies a Kafka Broker; its value is an integer that must be unique within the cluster.
This is especially important when Kafka runs as a distributed cluster.
It is best to relate the value to the physical host the Broker runs on: e.g. for host name host1.lagou.com use broker.id=1; for host name 192.168.100.101 use broker.id=101; and so on.
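As a toy sketch of that naming convention (my own illustration — nothing in Kafka derives this for you; broker.id is set manually in server.properties):

```python
import re

def broker_id_from_host(host: str) -> int:
    """Derive a broker.id per the convention above: the last octet for
    an IPv4 address, or the trailing digits of the first host-name label."""
    if host.replace(".", "").isdigit():            # looks like an IPv4 address
        return int(host.split(".")[-1])
    m = re.search(r"(\d+)$", host.split(".")[0])   # e.g. "host1" -> 1
    return int(m.group(1)) if m else 0
```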
listeners
Specifies the addresses and ports on which this Broker publishes its service.
Works together with advertised.listeners to separate internal and external networks.
Internal and external network isolation configuration:
- listener.security.protocol.map
Maps listener names to security protocols.
For example, internal and external networks can be separated even if both use SSL.
- listener.security.protocol.map=INTERNAL:SSL,EXTERNAL:SSL
Each listener name can only appear once in the map.
- inter.broker.listener.name
Configures the listener name used for broker-to-broker communication; the name must be one of the configured listener names.
inter.broker.listener.name=EXTERNAL
listeners
A comma-separated list of URIs the broker listens on, together with their listener names.
If a listener name does not imply a security protocol, listener.security.protocol.map must be configured.
Each listener must use a different network port.
- advertised.listeners
The addresses published to zookeeper for clients to use, when the addresses clients should use differ from the listeners configuration.
They can be seen in zookeeper via get /myKafka/brokers/ids/<broker.id>.
In an IaaS environment, this entry may need to differ from the network interface the broker actually binds to.
If this entry is not set, the listeners configuration is used. Unlike listeners, this entry may not use the 0.0.0.0 address.
Every listener name used in advertised.listeners must also be defined in the listener configuration.
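Putting these settings together, an internal/external split in server.properties might look like the following sketch (the host name broker1.internal.example and the external address here are made-up placeholders, not from this article):

```properties
listeners=INTERNAL://0.0.0.0:9093,EXTERNAL://0.0.0.0:9092
advertised.listeners=INTERNAL://broker1.internal.example:9093,EXTERNAL://203.0.113.10:9092
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
```

Brokers talk to each other over the INTERNAL listener, while clients outside the network connect to the advertised EXTERNAL address.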
zookeeper.connect
This parameter configures the address of the Zookeeper instance or cluster that Kafka connects to.
Its value is a string: multiple Zookeeper addresses separated by commas, each in host:port form; a chroot path for Kafka's root node in Zookeeper may be appended at the end.
3.2. Kafka cluster startup
1. Zookeeper start
zkServer.sh start
2. Kafka start
Command:
bin/kafka-server-start.sh -daemon config/server.properties
If you see the following error message:
it is because the broker.id in meta.properties (in the logs folder under the kafka directory) is inconsistent with the one in server.properties; change it to match.
Started successfully:
3.3 Kafka operation commands
Refer to the official website for a quick start: Kafka Chinese Documentation
1. List topics
(A newly installed Kafka has no topics yet.)
bin/kafka-topics.sh --list --bootstrap-server 192.168.255.212:9092
2. Create a topic:
2.1. Create a topic named "test" with one partition and one replica
bin/kafka-topics.sh --create --bootstrap-server 192.168.255.212:9092 --replication-factor 1 --partitions 1 --topic test
Check the topic list at this point:
You can also see it on the other two Kafka hosts:
Now inspect the logs directories of the three Kafka nodes:
you can see the created test topic.
2.2. Create another topic with the partition count set to 3 (preferably matching the number of hosts):
bin/kafka-topics.sh --create --bootstrap-server 192.168.255.212:9092 --replication-factor 1 --partitions 3 --topic city
You can see that each of the three machines holds one partition of the topic under its logs directory.
2.3. Create the topic cities with replication factor 2 and 3 partitions
Command:
bin/kafka-topics.sh --create --bootstrap-server 192.168.255.212:9092 --replication-factor 2 --partitions 3 --topic cities
Checking the log directories, you can see three partitions, each with two replicas:
3. Delete a topic
bin/kafka-topics.sh --delete --bootstrap-server 192.168.255.212:9092 --topic <topic name>
4. Start a producer/consumer
The following command creates a producer client that produces messages to the topic test. Note that the producer client can run on any host where Kafka is installed and the command is available. Here, the Kafka node at .213 acts as both server and client.
bin/kafka-console-producer.sh --broker-list 192.168.255.213:9092 --topic test
Type messages at the prompt:
>Beijing
>Shanghai
Now start a consumer:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Appending --from-beginning makes the consumer receive all earlier messages as well; without it, the consumer only receives new messages, and anything produced before the consumer started is not received.
Now produce a Hello message:
Both consumers receive it:
Note: consumers receive messages regardless of which broker address they connect to.
4. Viewing Kafka logs in Zookeeper
We have run the commands above and viewed the generated topics in Kafka's logs directory. In fact, we can also inspect them in Zookeeper.
Command:
Enter the zookeeper bin directory and run:
zkCli.sh
ls /
ls /brokers
ls /brokers/ids
ls /brokers/topics
View the data for broker id 0:
ls /brokers/ids/0
get /brokers/ids/0
You can see the current host's information, stored in JSON format.
Next, look at the topics:
ls /brokers/topics/cities
ls /brokers/topics/cities/partitions
ls /brokers/topics/cities/partitions/0
ls /brokers/topics/cities/partitions/0/state
get /brokers/topics/cities/partitions/0/state
get /brokers/topics/cities
Host 0 holds the cities-1 and cities-2 partitions.
Similarly, the other brokers can be inspected:
Segments
A segment is a logical concept consisting of two kinds of physical files: ".index" files and ".log" files. The ".log" file stores the messages themselves, and the ".index" file stores the index of the messages within the ".log" file.
Enter the logs directory of the host that holds the test topic:
00000000000000000000.log indicates that there are 0 messages before it.
00000000000000001456.log indicates that there are 1456 messages before it.
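The naming rule is simply the segment's base offset (the offset of its first message) zero-padded to 20 digits, which can be sketched as:

```python
def segment_file_name(base_offset: int, suffix: str = ".log") -> str:
    """Kafka names each segment file after the offset of its first
    message, zero-padded to 20 digits."""
    return f"{base_offset:020d}{suffix}"

name = segment_file_name(1456)  # segment whose first message has offset 1456
```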
Viewing segments
To inspect a segment's log file, use a tool that ships with Kafka:
bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files \
  /home/admin/Study/kafka/logs/test-0/00000000000000000000.log --print-data-log
A consumer group's offset commits are written to one partition of the internal __consumer_offsets topic. The partition index is the hash of the group id string modulo the partition count, which defaults to 50 (partitions 0-49).
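To see which partition a given group lands in, the mapping can be reproduced by mimicking Java's String.hashCode (a sketch; Kafka computes Utils.abs(groupId.hashCode) % offsets.topic.num.partitions):

```python
def java_string_hashcode(s: str) -> int:
    """Replicate Java's String.hashCode, including 32-bit signed overflow."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x1_0000_0000 if h >= 0x8000_0000 else h

def offsets_partition(group_id: str, num_partitions: int = 50) -> int:
    # Kafka's Utils.abs masks off the sign bit rather than calling abs()
    return (java_string_hashcode(group_id) & 0x7FFFFFFF) % num_partitions
```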