[Kafka] Building a Kafka Service on Linux: A Complete Learning Case

(1) Kafka architecture foundation

【1】What is Kafka? (illustrated)

(1) Why do you need a message queue

When one program needs to send a message to another, it can send it directly, one-to-one.

As programs and communication links multiply, the drawbacks of this model become apparent:
1- teams may do repetitive work, which wastes resources
2- when there is too much information to synchronize in time, messages may be lost
3- direct communication couples programs too tightly, so a failure in one part can affect the whole system

Kafka can receive messages from different producers; different consumers then subscribe to these messages for their own use.

(2) Topic theme

So how can consumers make sure they get the messages they want? This is where topics come in.

(3) partition

A topic may contain multiple partitions, and partitions can be distributed across different servers, so a single topic can span multiple servers.
The producer puts each message into the corresponding partition of the corresponding topic. How does the producer know which partition a message should go to? There are two cases:
(1) The producer explicitly specifies a partition
(2) The producer does not specify a partition.
The partitioner determines the destination of the message according to its key.
The key can be regarded as a label, and each value carries one.
The partitioner can be thought of as a function: the input is the key, and the output is the partition the message should go to.
A message therefore includes these parts: topic, partition, key, and value. Only with all of these can a message find where it needs to go.
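The partitioner logic above can be sketched in Python. This is only an illustration: Kafka's Java client actually hashes keys with murmur2, so the crc32 hash and the helper names here (choose_partition, build_message) are assumptions for demonstration, not Kafka's API.

```python
import zlib

def choose_partition(key: bytes, num_partitions: int, explicit_partition=None) -> int:
    """Decide which partition a message goes to.

    Mirrors the two cases above: an explicitly specified partition wins;
    otherwise the partitioner hashes the key and takes it modulo the
    partition count. (crc32 stands in for Kafka's murmur2 hash.)
    """
    if explicit_partition is not None:
        return explicit_partition
    return zlib.crc32(key) % num_partitions

def build_message(topic: str, key: bytes, value: bytes, num_partitions: int, partition=None):
    """A message is (topic, partition, key, value); the partitioner fills in the partition."""
    return {
        "topic": topic,
        "partition": choose_partition(key, num_partitions, partition),
        "key": key,
        "value": value,
    }

msg = build_message("javaTopic", b"order-1", b"hello", num_partitions=2)
print(msg["partition"])  # always 0 or 1 for a 2-partition topic
```

Note that the same key always maps to the same partition, which is what lets Kafka keep all messages with one key in order within one partition.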

(4) Consumers read data (offset)

So how do consumers read the data?
Here we introduce a concept: the offset (essentially a sequence number).
The offset is assigned when a message is written, and consumers read data according to the offset:

1- offset: the message's number within its partition
2- within a partition, the offset of each message is unique
3- consumers can only read sequentially
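These three rules can be illustrated with a toy model (the Partition and Consumer classes here are hypothetical, not Kafka's API): offsets are assigned at write time, are unique within a partition, and consumers advance through them sequentially.

```python
class Partition:
    """A partition as an append-only log: offsets are assigned at write time."""
    def __init__(self):
        self.log = []

    def append(self, message) -> int:
        self.log.append(message)
        return len(self.log) - 1  # the new message's offset

class Consumer:
    """A consumer tracks its own position and can only read forward."""
    def __init__(self, partition: Partition, offset: int = 0):
        self.partition = partition
        self.offset = offset

    def poll(self):
        if self.offset >= len(self.partition.log):
            return None  # nothing new to read
        msg = self.partition.log[self.offset]
        self.offset += 1  # sequential: always advance, never skip around
        return msg

p = Partition()
for text in ["1111", "2222", "3333"]:
    p.append(text)

c = Consumer(p, offset=0)              # read from the beginning
print([c.poll(), c.poll(), c.poll()])  # ['1111', '2222', '3333']
```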


(5) kafka cluster

An independent Kafka server is called a broker. A broker can host multiple topics, and a topic can have multiple partitions.
The broker receives messages from producers, assigns each message its offset, and saves the messages to disk.
The broker also responds to consumer requests.

Multiple brokers form a Kafka cluster, which keeps the system safe: if one broker goes down, another can take over.

【2】Kafka usage scenarios

(1) Log tracking


(2) Buffering and peak clipping


(3) Decoupling


(4) Asynchronous communication


【3】Two modes of message queues


【4】The infrastructure of Kafka

With massive data, a single server is under too much pressure to store everything. A topic can be split into several pieces for processing, with a cluster built across several servers; each server is a broker. If a producer needs to write 100T of data to topic A, then with 3 servers each broker stores roughly 33T, and topic A is split into three different partitions across the 3 servers.

On the consumer side, Kafka introduces consumer groups, each containing multiple consumers. Each consumer consumes messages from one partition, which improves the group's throughput when consuming a topic. One rule applies: the data of a partition can only be consumed by one consumer within a consumer group. If a partition were consumed by two consumers in the same group, it would be uncertain which consumer got which message, messages might be consumed repeatedly, and offsets would be hard to manage.
If the broker holding a partition goes down, the 33T of data in that partition is gone, so Kafka introduces replicas to back up the data:
1- replicas play different roles, divided into leader and follower. Consumers only consume from the leader; followers are only responsible for backup. Only when the leader goes down can a follower become the new leader.

Some metadata is stored in ZooKeeper: which Kafka brokers are online and working, and which replica of each partition is the leader. (Kafka can run either based on ZooKeeper or, in newer versions, without it.)

To summarize:
1- the data of a partition can only be handed to one consumer in a consumer group, to prevent conflicts between consumers
2- to ensure the reliability of partitions, replicas are introduced. Replicas are divided into leader and follower; production and consumption target only the leader, and only when the leader goes down can a follower become the new leader
3- ZooKeeper stores broker online/offline information

(2) Install Kafka on Linux

【1】Installation process

(1) Install JDK

Check whether the JDK is installed locally, for example with java -version.

(2) Install kafka

Download Kafka from the official website, then transfer the installation package from the Mac to the Linux system:

scp -P 22 /Library/Java/kafka_2.12-3.1.0.tgz root@192.168.19.11:/root

Unpack the archive:

tar -zxvf kafka_2.12-3.1.0.tgz


(3) Modify the configuration file of kafka: config/server.properties

# No need to change this now, but if you configure a Kafka cluster later, make sure every server's broker.id is different
broker.id=0
# Address of the current host; the default port is 9092
listeners=PLAINTEXT://192.168.19.11:9092
advertised.listeners=PLAINTEXT://192.168.19.11:9092
# Path for the log files (where message data is stored)
log.dirs=/root/kafka_2.12-3.1.0/data/kafka-logs
# ZooKeeper connection
zookeeper.connect=127.0.0.1:2181

Either an external or the built-in ZooKeeper can be used; here we use the built-in one.

(4) Modify the zookeeper configuration file: config/zookeeper.properties

dataDir=../zkData
dataLogDir=../zkLogs
audit.enable=true

(5) Start kafka

Enter the bin directory and start ZooKeeper first:

./zookeeper-server-start.sh -daemon ../config/zookeeper.properties

Then start Kafka:

./kafka-server-start.sh -daemon ../config/server.properties &

Check whether the startup succeeded:

ps -aux | grep server.properties
ps -aux | grep zookeeper.properties


【2】Test example

(1) Concept introduction

(1) Concept

(2) Why use a message queue?
If you use synchronous communication between multiple services, every step of the communication must succeed, or the whole chain fails. If you use asynchronous communication instead, the services can be decoupled.

(2) Create Topic

Create a topic named javaTopic with 2 partitions and 1 replica:

./kafka-topics.sh --bootstrap-server 192.168.19.11:9092 --create --topic javaTopic --partitions 2 --replication-factor 1

replication-factor: specifies the number of replicas
partitions: specifies the number of partitions


(3) View Topic

./kafka-topics.sh --bootstrap-server 192.168.19.11:9092 --list


(4) Delete Topic

./kafka-topics.sh --bootstrap-server 192.168.19.11:9092 --delete --topic testTopic


(5) Production/consumption data

Enter the bin directory, open two terminals, and execute the following commands respectively:
(1) First run the producer command (to send messages)
Kafka ships with a console producer client, which can read content from a local file or from command-line input and send it to the Kafka cluster as messages. By default, each line is treated as a separate message. Use the client to send messages, specifying the Kafka server address and the target topic:

./kafka-console-producer.sh --broker-list 192.168.19.11:9092 --topic javaTopic

(2) Input message

(3) In another window, enter the bin directory and run the consumer command

./kafka-console-consumer.sh --bootstrap-server 192.168.19.11:9092 --topic javaTopic

(4) Continue typing content in the producer window
(5) Watch the messages arrive successfully in the consumer window

(3) Further case analysis

【1】How do consumers know where to start consuming?

In the test above, the producer first entered 3333, and only then was the consumer started; the 4444 entered afterwards was consumed. Why was the content before 4444 not consumed?

This is where the "offset" comes in.

Kafka also ships with a console consumer client, which prints the fetched messages on the command line and by default consumes only the latest ones. It consumes messages from a specified topic on a specified Kafka server.

(1) Method 1: start consuming from the last message's offset + 1 (the default)

./kafka-console-consumer.sh --bootstrap-server 192.168.19.11:9092 --topic javaTopic

(2) Method 2: consume from the beginning

./kafka-console-consumer.sh --bootstrap-server 192.168.19.11:9092 --from-beginning --topic javaTopic

A few points to note:
(1) Messages are stored: the producer sends messages to the broker, and the broker saves them to local log files
(2) Messages are stored sequentially; their order is described by the offset
(3) Every message has an offset
(4) When consuming, an offset can be specified to start from

Enter the log path we set in the server.properties configuration file: kafka_2.12-3.1.0/data/kafka-logs. You can see the 50 files that maintain offsets:
(1) __consumer_offsets-49
1- Kafka internally creates the __consumer_offsets topic, which contains 50 partitions. This topic stores the offset at which each consumer is consuming each topic, i.e. it records how far each consumer has consumed.
2- There can be many topics and many consumers, and every consumer writes its offset data into this topic, so it stores a lot of information; creating 50 partitions ensures it can hold a large amount of data and improves the topic's concurrency.
3- Each consumer maintains the offsets of the topics it consumes by itself; that is, each consumer independently reports its consumed offsets to Kafka's default topic __consumer_offsets.
(2) Additional details
1- Consumers periodically commit the offsets of the partitions they consume to Kafka's __consumer_offsets topic. The key is consumerGroupId+topic+partition number, and the value is the current offset. Kafka periodically compacts the messages in this topic, keeping only the latest data per key.
2- Because __consumer_offsets may receive highly concurrent requests, Kafka gives it 50 partitions by default (configurable via offsets.topic.num.partitions), so the load can be spread across machines.
3- The partition of __consumer_offsets a consumer's offset is committed to is chosen with the formula: hash(consumerGroupId) % number of partitions of __consumer_offsets (an ordinary hash plus modulo).
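A quick sketch of this hash-then-modulo selection. The broker computes it from the Java hashCode of the group id (masked to non-negative); the Python reimplementation below assumes that behavior and is for illustration only.

```python
def java_string_hashcode(s: str) -> int:
    """Java's String.hashCode, reproduced in Python (32-bit signed overflow)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Which __consumer_offsets partition a group's offsets go to:
    non-negative hash of the consumer group id, modulo the partition count."""
    return (java_string_hashcode(group_id) & 0x7FFFFFFF) % num_partitions

print(offsets_partition_for("javaGroup"))  # some fixed value in 0..49
```

Because the group id is the hash input, all offset commits of one consumer group always land in the same __consumer_offsets partition.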
You can also see javaTopic-0 and javaTopic-1, the two partitions we created. Enter a partition directory to view its files:
(1) 0000.index
A sparse index; using binary search, data can be located quickly
(2) 0000.log
Stores the messages the producer sent to this topic partition
(3) 0000.timeindex
An index for looking up data by time

The overall process: messages from the producer are stored in the 0000.log file under the javaTopic-0 directory, each with its own offset. When a consumer fetches a message, it reads from this file according to the offset. Reading does not delete the message, so other consumers can still read all the messages later.


【2】Unicast message

In one Kafka topic, start two consumers and one producer. Question: when the producer sends a message, will it be consumed by both consumers at the same time?

(1) Start the producer and two consumers in one consumer group
Producer:

./kafka-console-producer.sh --broker-list 192.168.19.11:9092 --topic javaTopic

Consumers (run the same command in two terminals, same group):

./kafka-console-consumer.sh --bootstrap-server 192.168.19.11:9092 --consumer-property group.id=javaGroup --topic javaTopic

The producer sent a message: consumer 1 received it, but consumer 2 did not.

Continue entering messages, and you can see the two consumers take turns receiving them. This is because the topic was created with 2 partitions, so produced messages are placed into the two partitions in turn, and the two consumers in the group each take the messages of one partition, which looks like consuming in turns. What if there is only one partition?

If there is only one partition, only one consumer can consume the messages in that partition, which guarantees ordered consumption; and only the most recently joined consumer in the group does the consuming.

Summary: if multiple consumers are in the same consumer group, only one of them receives a given message from the subscribed topic. In other words, within one consumer group, each message from a topic is received by only one consumer.

【3】Multicast messages

Start the producer and two consumers in two different consumer groups:

./kafka-console-consumer.sh --bootstrap-server 192.168.19.11:9092 --consumer-property group.id=javaGroup01 --topic javaTopic
./kafka-console-consumer.sh --bootstrap-server 192.168.19.11:9092 --consumer-property group.id=javaGroup02 --topic javaTopic

You can see that consumers in two different consumer groups both received the messages sent by the producer.

Summary: when different consumer groups subscribe to the same topic, only one consumer within each group receives a given message, but across groups, one consumer in each of the multiple groups receives the same message.
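Both summaries can be simulated with a toy dispatcher (hypothetical names, not a Kafka API): within a group each partition is owned by one consumer, so a message is delivered once per group, while every group receives the full stream.

```python
from collections import defaultdict

def deliver(messages, groups):
    """Simulate delivery: each message lives in one partition; within each
    group a partition is owned by exactly one consumer, so a message is
    seen once per group (unicast) but by every group (multicast).

    messages: list of (partition, value); groups: {group: [consumer, ...]}.
    """
    received = defaultdict(list)  # (group, consumer) -> values
    for group, consumers in groups.items():
        for partition, value in messages:
            # round-robin partition ownership within the group
            owner = consumers[partition % len(consumers)]
            received[(group, owner)].append(value)
    return received

msgs = [(0, "m1"), (1, "m2"), (0, "m3"), (1, "m4")]
out = deliver(msgs, {"javaGroup01": ["c1", "c2"], "javaGroup02": ["c3"]})
print(out[("javaGroup01", "c1")])  # ['m1', 'm3']  (partition 0 only)
print(out[("javaGroup02", "c3")])  # ['m1', 'm2', 'm3', 'm4']  (whole topic)
```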


【4】View consumer groups and information

# List the consumer groups on the current server
./kafka-consumer-groups.sh --bootstrap-server 192.168.19.11:9092 --list


./kafka-consumer-groups.sh --bootstrap-server 192.168.19.11:9092 --describe --group javaGroup


Now close all consumers, let the producer keep sending messages, and then look at the consumer group information again: you can see the LAG value increasing.
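The LAG column reported by kafka-consumer-groups.sh is simply the partition's log-end offset minus the group's committed offset; a minimal sketch:

```python
def lag(log_end_offset: int, committed_offset: int) -> int:
    """LAG = LOG-END-OFFSET - CURRENT-OFFSET: how many messages the
    group has not yet consumed in this partition."""
    return log_end_offset - committed_offset

# All consumers are closed at committed offset 10; the producer keeps writing.
committed = 10
for log_end in (10, 13, 17):
    print(lag(log_end, committed))  # 0, then 3, then 7
```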

【5】The concepts of topic and partition

(1) Topic

A topic is a logical concept in Kafka; Kafka classifies messages by topic, and each topic is consumed by the consumers that subscribe to it.

But there is a problem: if a topic has a huge number of messages, storing them can take several terabytes, because messages are saved in log files. To solve the problem of oversized files, Kafka introduced the concept of partitions.

(2) Partition Partition

Storing a topic's messages by partition has several advantages:
1- partitioned storage solves the problem of a single oversized file
2- it improves read/write throughput: reads and writes can proceed in multiple partitions simultaneously


# This creates the javaTopic topic with two partitions
./kafka-topics.sh --bootstrap-server 192.168.19.11:9092 --create --topic javaTopic --partitions 2 --replication-factor 1
# View the details of the current topic
./kafka-topics.sh --bootstrap-server 192.168.19.11:9092 --describe --topic javaTopic

(4) Build a Kafka cluster (3 brokers)

【1】Building process

Enter the config directory and make two copies of server.properties:

cp server.properties server01.properties
cp server.properties server02.properties

The main modifications are as follows:
(1) server.properties

broker.id=0
listeners=PLAINTEXT://192.168.19.11:9092
advertised.listeners=PLAINTEXT://192.168.19.11:9092
log.dirs=/root/kafka_2.12-3.1.0/data/kafka-logs

(2)server01.properties

broker.id=1
listeners=PLAINTEXT://192.168.19.11:9093
advertised.listeners=PLAINTEXT://192.168.19.11:9093
log.dirs=/root/kafka_2.12-3.1.0/data/kafka-logs-1

(3)server02.properties

broker.id=2
listeners=PLAINTEXT://192.168.19.11:9094
advertised.listeners=PLAINTEXT://192.168.19.11:9094
log.dirs=/root/kafka_2.12-3.1.0/data/kafka-logs-2

Start ZooKeeper and the 3 Kafka servers with the following commands:

./zookeeper-server-start.sh -daemon ../config/zookeeper.properties
./kafka-server-start.sh -daemon ../config/server.properties &
./kafka-server-start.sh -daemon ../config/server01.properties &
./kafka-server-start.sh -daemon ../config/server02.properties &

【2】Test process

(1) The concept of replicas

A replica is a backup of a partition. In a cluster, different replicas are deployed on different brokers. For example, create 1 topic with 2 partitions and 3 replicas:

./kafka-topics.sh --bootstrap-server 192.168.19.11:9092 --create --topic replicatedTopic --partitions 2 --replication-factor 3

View the topic details:

./kafka-topics.sh --bootstrap-server 192.168.19.11:9092 --describe --topic replicatedTopic

Information that can be read from the output:
(1) The replicatedTopic topic has two partitions, and each partition has 3 replicas, placed on the 3 servers respectively.
(2) The first partition's number is 0; the leader replica is replica 1, and the other replicas are followers.
(3) The second partition's number is 1; the leader replica is replica 0, and the other replicas are followers.
(4) The producer sends messages only to the leader replica of each partition, i.e. to only one server; the replicas on the other two servers synchronize from the leader as backups. Likewise, consumers consume messages only from the leader replica.
(5) Isr: the set of broker nodes that are in sync (or able to sync), i.e. all brokers that can normally participate in synchronization. If a replica on some broker synchronizes particularly poorly, that broker is removed from the ISR set and excluded from subsequent synchronization. If the leader replica goes down, a new leader is elected from the followers in the ISR set.

In summary: a topic's data can be split into multiple partitions, and multiple replicas of each partition can be created and placed on different brokers.
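Point (5) above can be sketched as a toy model (a hypothetical class, not Kafka's controller logic): a replica that syncs too slowly leaves the ISR, and when the leader dies a new leader is chosen from the remaining in-sync replicas.

```python
class PartitionReplicas:
    """Toy model of one partition's replica set: a leader plus an ISR."""
    def __init__(self, replicas):
        self.isr = list(replicas)   # in-sync replicas, leader first
        self.leader = replicas[0]

    def drop_slow_replica(self, broker):
        # A follower that falls too far behind is removed from the ISR
        if broker in self.isr and broker != self.leader:
            self.isr.remove(broker)

    def leader_failed(self):
        # Elect a new leader from the remaining in-sync followers
        self.isr.remove(self.leader)
        if not self.isr:
            raise RuntimeError("no in-sync replica left to elect")
        self.leader = self.isr[0]
        return self.leader

p = PartitionReplicas([1, 2, 0])  # brokers 1, 2, 0; broker 1 leads
p.drop_slow_replica(2)            # broker 2 syncs too slowly -> out of ISR
print(p.leader_failed())          # broker 0 becomes the new leader -> 0
```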


(2) Production of cluster messages

./kafka-console-producer.sh --broker-list 192.168.19.11:9092,192.168.19.11:9093,192.168.19.11:9094 --topic replicatedTopic

(3) Consumption of cluster messages

./kafka-console-consumer.sh --bootstrap-server 192.168.19.11:9092,192.168.19.11:9093,192.168.19.11:9094 --from-beginning --topic replicatedTopic

(4) Consumption of cluster consumption groups

./kafka-console-consumer.sh --bootstrap-server 192.168.19.11:9092,192.168.19.11:9093,192.168.19.11:9094 --from-beginning --consumer-property group.id=replicatedGroup --topic replicatedTopic


(1) The messages in a partition can only be consumed by one consumer in a consumer group, which guarantees the order of consumption within that partition and prevents it from being disrupted by other consumers. How to achieve a total order of consumption is a separate question.
(2) One consumer in a consumer group can consume messages from multiple partitions. If a consumer dies, the rebalance mechanism is triggered, and other consumers take over its partitions.
(3) The number of consumers in a consumer group cannot exceed the number of partitions in a topic, otherwise the extra consumers cannot consume any messages.
(4) Kafka only guarantees a partial order of message consumption within a partition; it cannot guarantee a total order across multiple partitions of the same topic.

Origin: blog.csdn.net/weixin_44823875/article/details/128411190