Message Middleware - A First Look at Kafka

1. Introduction to Kafka

1.1. Message queue

1.1.1. Why is there a message queue?


1.1.2. What is a message queue?

  • Message
    Data transmitted in a network between two computers or two communication devices, for example text, music, or video.
  • Queue
    A special linear list (data elements linked end-to-end); it is special in that elements can only be removed at the head and added at the tail (FIFO). Enqueue at the tail, dequeue at the head.
  • Message queue (MQ)
    Message + queue: a queue that stores messages; the container during message transmission. It mainly provides produce and consume interfaces for external callers to store and fetch data.

1.1.3. Classification of message queues

MQ falls into two main categories: point-to-point (P2P) and publish/subscribe (Pub/Sub).

  • Peer-to-Peer generally receives data via Pull or Polling. A message sent to the queue is received by one and only one receiver, even if multiple receivers listen on the same queue. It supports asynchronous "fire-and-forget" delivery as well as synchronous request/response delivery.

  • Publish/Subscribe: messages published to a topic can be received by multiple subscribers. Pub/Sub can deliver data via Push, or consumers can fetch data via Pull or Polling. Its decoupling is stronger than the P2P model's.

1.1.4. Comparison between P2P and Pub/Sub MQ

  • Common points
    Message producers send messages to the queue, and consumers read and consume messages from the queue.
  • Differences
    The P2P model includes: message queue (Queue), sender (Sender), receiver (Receiver).
    A message produced by a producer has only one consumer (Consumer); once consumed, the message is no longer in the queue. Analogy: making a phone call.
    Pub/Sub includes: message queue (Queue), topic (Topic), publisher (Publisher), subscriber (Subscriber). Each message can have multiple consumers, which do not affect each other. Analogy: if I post on Weibo, anyone who follows me can see it.

1.1.5. Usage scenarios of messaging systems

  • Decoupling: systems exchange data through the messaging system's unified interface without knowing of each other's existence
  • Redundancy: some messaging systems offer message persistence, which avoids the risk of losing messages before they are processed
  • Scalability: the messaging system is a unified data interface, so each system can scale independently
  • Peak handling: the messaging system absorbs peak traffic, and the business system pulls and processes requests from it according to its own capacity
  • Recoverability: the failure of some key components will not bring down the entire system; after recovery, they can still fetch and process data from the messaging system
  • Asynchronous communication: when a request does not need to be processed immediately, it can be put into the messaging system and processed when appropriate

1.1.6. Common messaging systems

  • RabbitMQ: written in Erlang; supports multiple protocols: AMQP, XMPP, SMTP, STOMP. Supports load balancing and data persistence. Supports both Peer-to-Peer and publish/subscribe modes.
  • Redis: a Key-Value NoSQL database that also offers MQ functionality, usable as a lightweight queue service. For enqueue operations, Redis performs better than RabbitMQ for short messages (under 10 KB) and worse for long messages.
  • ZeroMQ: lightweight; requires no separate message server or middleware, since the application itself plays that role. Peer-to-Peer. It is essentially a library that requires developers to combine multiple pieces themselves, so it is complex to use.
  • ActiveMQ: a JMS implementation; Peer-to-Peer; supports persistence and XA (distributed) transactions.
  • Kafka/Jafka: a high-performance cross-language distributed publish/subscribe messaging system; persistent, fully distributed, and supports both online and offline processing.
  • MetaQ/RocketMQ: pure Java implementation; a publish/subscribe messaging system supporting local transactions and XA distributed transactions.

1.2 Introduction to Kafka

1.2.1. Introduction

Kafka is a distributed publish/subscribe messaging system. It was originally developed by LinkedIn, written in Scala, open-sourced in December 2010, and later became an Apache top-level project. Kafka is a high-throughput, persistent, distributed publish/subscribe messaging system, mainly used to process active streaming data (data generated by user behaviors such as logins, page views, clicks, shares, and likes).
Three major characteristics of Kafka:

  • High throughput
    Can handle production and consumption of millions of messages per second.
  • Persistence
    A complete message storage mechanism ensures efficient and safe data persistence (intermediate storage).
  • Distributed
    Built on distributed scaling and fault-tolerance mechanisms; Kafka replicates data to several servers, and when one fails, producers and consumers switch to other machines (overall robustness).

1.2.2. Design goals

  • High throughput: a single cheap commodity machine can support reading and writing 1 million messages per second
  • Message persistence: all messages are persisted to disk, no message is lost, and message replay is supported
  • Fully distributed: Producer, Broker, and Consumer all support horizontal scaling
  • Suited to both online stream processing and offline batch processing

1.2.3. Core concepts of Kafka

What parts does an MQ need? Production, consumption, message categories, storage, and more. The Kafka service is like a big pool, constantly producing, storing, and consuming messages of all kinds. So what does Kafka consist of?

Kafka service:

  • Topic: the category of messages Kafka handles; different categories go to different topics.
  • Broker: message server node. A Kafka service node in the cluster is called a broker; it mainly stores message data on disk. Each topic is partitioned across brokers.
  • Partition: the physical grouping of a topic. A topic is divided into one or more partitions on the brokers; the number of partitions is specified when the topic is created.
  • Message: the basic unit of communication; each message belongs to one partition.

Kafka service related

  • Producer: the producer of messages and data; publishes messages to a Kafka topic.
  • Consumer: the consumer of messages and data; subscribes to a topic and processes the messages published to it.
  • Zookeeper: coordinates the normal operation of Kafka.

2. Distributed installation of Kafka

Download address: https://kafka.apache.org/downloads
Chinese download official website: https://kafka.apachecn.org/downloads.html
Installation package link: https://pan.baidu.com/s/1G9F8TEfI88wPi_j2-hkK1A?pwd=e9tu
Source package link: https://pan.baidu.com/s/1LR7X3Is-JRsOOu3DdAp2aw?pwd=7249

2.1. JDK & Zookeeper installation

We know that Kafka is managed by Zookeeper, so before installing Kafka, let's install Zookeeper first.

1. JDK installation and configuration

First of all, CentOS 7 usually comes with a JDK by default; my virtual machine's CentOS 7 ships with OpenJDK 1.8.0_262-b10.

If you want to install a specified version of jdk, first download the jdk installation package.
Detailed steps for installing jdk on Linux

2. zookeeper installation

My Kafka installation package is version 3.4.0, and the corresponding Zookeeper version is 3.6.3, so download the compressed package from the official site (note: it is the bin.tar.gz package): http://archive.apache.org/dist/zookeeper/

First, put the installation package in a Linux directory and execute the following commands:

$ mkdir zk
# Create the Zookeeper data storage path
$ mkdir zk/data
# Create the Zookeeper log storage path
$ mkdir zk/logs
# Extract the installation package
$ tar -zxvf apache-zookeeper-3.8.1-bin.tar.gz
# Configure environment variables: add the lines below
$ vi /etc/profile
export ZK_HOME=/home/install_package/apache-zookeeper-3.8.1-bin
export PATH=$ZK_HOME/bin:$PATH
$ source /etc/profile
# Generate the Zookeeper configuration file
$ cd apache-zookeeper-3.8.1-bin/conf
$ cp zoo_sample.cfg zoo.cfg   # Zookeeper loads a config file named zoo.cfg by default

Then modify the configuration (data directory and log directory):

vim zoo.cfg
# Heartbeat interval, in milliseconds
tickTime=2000
# Time allowed for followers to connect and sync with the leader, in ticks
initLimit=10
# Timeout between leader and followers, in ticks
syncLimit=5
# Data storage directory
dataDir=/home/admin/Study/zk/data
# Log storage directory
dataLogDir=/home/admin/Study/zk/logs
# Client connection port
clientPort=2181
# Purge interval in hours; the default 0 disables auto-purge
#autopurge.purgeInterval=1
# Used together with the setting above; number of snapshots to retain, default is 3
#autopurge.snapRetainCount=5
# Standalone mode does not need the settings below
# server.NUM=IP:port1:port2  NUM is this node's server number; IP is this node's address;
# port1 is the leader/follower communication port; port2 is the leader-election port
# Port settings must not collide across instances, e.g.:
#server.0=192.168.101.136:12888:13888
#server.1=192.168.101.146:12888:13888
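
If you later enable the server.NUM entries for a multi-node ensemble, note that each node also needs a myid file in its dataDir containing that node's NUM. A minimal sketch, assuming the dataDir configured above:

# On the node configured as server.0 (dataDir path taken from the config above)
$ echo 0 > /home/admin/Study/zk/data/myid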

1. Start the zookeeper background service:

zkServer.sh start


2. Close the zookeeper background service:

zkServer.sh stop

3. View the running status of the zookeeper background service:

zkServer.sh status


2.2. Kafka installation steps

1. First, in the directory under Linux where the Kafka compressed package is located, extract it:

$ mkdir kafka
# Create the Kafka log storage path
$ mkdir kafka/logs
# Extract the installation package
$ tar -zxvf kafka_2.12-3.4.0.tgz
# Move it into the kafka directory
$ mv kafka_2.12-3.4.0 kafka
# Configure environment variables: add the lines below
$ vi /etc/profile
export KAFKA_HOME=/home/admin/Study/kafka/kafka_2.12-3.4.0
export PATH=$KAFKA_HOME/bin:$PATH
$ source /etc/profile
# Modify the Kafka configuration
$ cd kafka_2.12-3.4.0/config
$ vi server.properties

Modify the Kafka configuration:

# broker.id must be unique for each instance
broker.id=0
# Configure the host ip and port
#listeners=PLAINTEXT://:9092
listeners=PLAINTEXT://192.168.57.30:9092
#advertised.listeners=PLAINTEXT://10.11.0.203:9092
# Configure the log storage path
log.dirs=/home/admin/Study/kafka/logs
# Configure the zookeeper cluster
zookeeper.connect=localhost:2181

Start Kafka (Zookeeper must already be running):

bin/kafka-server-start.sh -daemon config/server.properties

Verify that Kafka started successfully:
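
One common way to check (an assumption here; the original verified this via a screenshot) is jps, which lists running JVM processes:

# A process named "Kafka" indicates the broker is running
jps | grep -i kafka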

Shut down Kafka:

bin/kafka-server-stop.sh

3. Kafka cluster

Prepare three virtual machines. Here, two copies of the virtual machine installed above are cloned directly.
The cloning process is simple and will not be repeated here. Next, let's look at what configuration changes are needed after cloning.

3.1. Clone machine configuration changes

① Modify the hostname
Shut down all virtual machines, start the first clone, and change its hostname to kafka02:

vim /etc/hostname


② Modify the network address

vim /etc/sysconfig/network-scripts/ifcfg-ens33


③ Restart:

reboot

Modify the other clone in the same way:

Host name: kafka03
ip address: 192.168.255.214

(Digression: changing the terminal's background and font colors)

Open the terminal --> Edit --> Preferences --> Colors: uncheck "Use colors from system theme"

④ Go to the Kafka installation directory and modify the server.properties configuration file:

vim config/server.properties


broker.id

This property uniquely identifies a Kafka Broker; its value is an integer.
This is especially important when Kafka runs as a distributed cluster.
It is best to relate the value to the Broker's physical host. For example, if the hostname is host1.lagou.com, use broker.id=1; if the hostname is 192.168.100.101, use broker.id=101, and so on.
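
As a hedged illustration using the three hosts in this tutorial (the exact id assignment is an assumption; any unique integers work):

# Hypothetical host-to-broker.id mapping across the three server.properties files
# 192.168.255.212 -> broker.id=0
# 192.168.255.213 -> broker.id=1
# 192.168.255.214 -> broker.id=2
broker.id=1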

listeners

Specifies the address and port on which the current Broker publishes its service to the outside. Works together with advertised.listeners to separate internal and external networks.

Internal and external network isolation configuration:

  • listener.security.protocol.map

Maps listener names to security protocols.

For example, internal and external traffic can be separated even if both use SSL:

  • listener.security.protocol.map=INTERNAL:SSL,EXTERNAL:SSL

Each listener name may appear only once in the map.

  • inter.broker.listener.name

Configures the listener name used for communication between brokers; the name must be among those configured in advertised.listeners, e.g.:

  • inter.broker.listener.name=EXTERNAL

  • listeners

The list of URIs the broker listens on, together with their listener names; separate multiple entries with commas.

If a listener name is not a standard security protocol, listener.security.protocol.map must also be configured.

Each listener must use a different network port.

  • advertised.listeners

The addresses published to Zookeeper for clients to use, if they differ from the listeners configuration.

They can be seen in Zookeeper via get /myKafka/brokers/ids/<broker.id>.

In an IaaS environment, this may need to differ from the interface the broker binds to.

If this entry is not set, the value of listeners is used. Unlike listeners, this entry cannot advertise the 0.0.0.0 wildcard address.

The addresses in advertised.listeners must be configured in listeners or be part of that configuration.
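
To make these settings concrete, here is a hedged sketch of an internal/external split (the listener names, addresses, and PLAINTEXT protocol are assumptions for illustration, not this tutorial's actual setup):

# server.properties (illustrative only)
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
listeners=INTERNAL://192.168.255.212:9092,EXTERNAL://0.0.0.0:9093
advertised.listeners=INTERNAL://192.168.255.212:9092,EXTERNAL://public.example.com:9093
inter.broker.listener.name=INTERNAL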

zookeeper.connect

This parameter configures the address of the Zookeeper instance/cluster that Kafka connects to.

Its value is a string; use commas to separate multiple Zookeeper addresses. A single Zookeeper address has the form host:port, and a Kafka root node path (chroot) in Zookeeper can be appended at the end.
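
For illustration, a hedged example of a multi-node value with a chroot (the hostnames are placeholders; /myKafka matches the chroot shown in the get command above):

# Connect to a three-node Zookeeper ensemble, rooted at /myKafka
zookeeper.connect=node1:2181,node2:2181,node3:2181/myKafka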

3.2. Kafka cluster startup

1. Zookeeper start

zkServer.sh start

2. Kafka start

Command:

bin/kafka-server-start.sh -daemon config/server.properties

If startup reports an error about broker.id, it is because broker.id in the meta.properties file under the Kafka logs directory is inconsistent with the value in server.properties; edit one so the two match. After that, Kafka starts successfully.

3.3. Kafka operation commands

Refer to the official website for a quick start: Kafka Chinese Documentation

1. List topics

(A freshly installed Kafka has no topics.)

bin/kafka-topics.sh --list --bootstrap-server 192.168.255.212:9092


2. Create a topic:

2.1. Create a topic named "test" with one partition and one replica:

bin/kafka-topics.sh --create --bootstrap-server 192.168.255.212:9092 --replication-factor 1 --partitions 1 --topic test

Listing the topics now shows the new test topic, and it is visible from the other two Kafka hosts as well. Checking the logs directory on the three Kafka nodes, you can see the directory created for the test topic.
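
Besides --list, the same kafka-topics.sh tool has a --describe option (not used in the original) that shows where partitions and replicas were placed:

# Show partition count, replication factor, leader, replicas, and ISR for "test"
bin/kafka-topics.sh --describe --bootstrap-server 192.168.255.212:9092 --topic test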

2.2. Create another topic, this time with 3 partitions (preferably matching the number of hosts):

bin/kafka-topics.sh --create --bootstrap-server 192.168.255.212:9092 --replication-factor 1 --partitions 3 --topic city

Under the logs directory, each of the three machines now holds one partition directory of this topic.

2.3. Create the topic cities with a replication factor of 2 and 3 partitions.

Command:

bin/kafka-topics.sh --create --bootstrap-server 192.168.255.212:9092 --replication-factor 2 --partitions 3 --topic cities

Checking the log directories, you can see three partitions, each with two copies.

3. Delete a topic

bin/kafka-topics.sh --delete --bootstrap-server 192.168.255.212:9092 --topic <topic-name>

4. Start the producer/consumer

The following command creates a producer client that produces messages to the topic test. Note that the producer client can run on any host where Kafka is installed and the command exists. Here, the Kafka node at 213 acts as both server and client.

bin/kafka-console-producer.sh --bootstrap-server 192.168.255.213:9092 --topic test

Type the messages at the prompt:

>Beijing
>Shanghai

Now start a consumer:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

Adding --from-beginning at the end makes the consumer receive all existing messages; without it, the consumer only receives messages produced after it starts.
When a Hello message is produced, both consumers receive it.

Note: consumers receive messages no matter which broker address they connect to.
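
A related option the original does not demonstrate: the console consumer also accepts a --group flag. A hedged sketch (the group name g1 is made up):

# Consumers in the same group split the topic's partitions between them;
# consumers in different groups each receive every message
bin/kafka-console-consumer.sh --bootstrap-server 192.168.255.212:9092 --topic test --group g1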

4. Viewing Kafka information in Zookeeper

We executed the commands above in sequence and viewed the generated topics in the Kafka logs directory. In fact, we can also inspect them in Zookeeper.
Enter the Zookeeper bin directory and start the client:

zkCli.sh


ls /


ls /brokers
ls /brokers/ids
ls /brokers/topics

Open broker id 0 and view its data:

ls /brokers/ids/0
get /brokers/ids/0

This shows the current host's information, stored in JSON format.
Check out the topic next:

ls /brokers/topics/cities
ls /brokers/topics/cities/partitions
ls /brokers/topics/cities/partitions/0
ls /brokers/topics/cities/partitions/0/state
get /brokers/topics/cities/partitions/0/state


get /brokers/topics/cities

Host 0 has cities-1 and cities-2 partitions.

Similarly, the other brokers and topics can be viewed in the same way.


Segment

A segment is a logical concept consisting of two types of physical files: a ".index" file and a ".log" file. The ".log" file stores the messages, and the ".index" file stores the index of each message within the ".log" file.

Enter the logs directory on the host where the test topic resides. A segment file named 00000000000000000000.log indicates that there are 0 messages before it; one named 00000000000000001456.log indicates that there are 1456 messages before it.
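
For orientation, a hedged sketch of what such a partition directory typically contains in Kafka 3.x (the listing is illustrative, not the original screenshot):

ls /home/admin/Study/kafka/logs/test-0
# 00000000000000000000.index  00000000000000000000.log  00000000000000000000.timeindex
# leader-epoch-checkpoint  partition.metadata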

View segments

To view the log files in a segment, use a tool that ships with Kafka:

bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files /home/admin/Study/kafka/logs/test-0/00000000000000000000.log --print-data-log

A consumer's offsets are committed to one partition of the internal __consumer_offsets topic: the partition index is the hash of the consumer group id modulo the number of partitions. By default there are 50 partitions (0 ~ 49).
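
Rather than computing the hash by hand, the bundled kafka-consumer-groups.sh tool (not used in the original) can show committed offsets directly; the group name g1 is assumed:

# List consumer groups, then show per-partition committed offsets for one group
bin/kafka-consumer-groups.sh --bootstrap-server 192.168.255.212:9092 --list
bin/kafka-consumer-groups.sh --bootstrap-server 192.168.255.212:9092 --describe --group g1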


Origin blog.csdn.net/qq_36256590/article/details/132170538