Table of Contents of Series Articles
1. Basic principles of Kafka
2. Simple operation of Kafka using Java
3. Simple understanding of Kafka design principles
Preface
Don't worry if you don't understand the Linux commands in this article. They are simple to use in normal practice, and you don't need to memorize them; they are only here to demonstrate some of Kafka's features.
Understanding Kafka
Kafka was originally developed by LinkedIn. It is a distributed messaging system that supports partitions and multiple replicas and is coordinated by ZooKeeper. Its biggest strength is that it can process large volumes of data in real time, which lets it serve a variety of demand scenarios: Hadoop-based batch processing systems, low-latency real-time systems, Storm/Spark streaming engines, web/Nginx logs, access logs, message services, and so on. It is written in Scala; LinkedIn contributed it to the Apache Foundation in 2010, and it became a top-level open source project.
1. A brief look at Kafka's basic concepts
| No. | Name | Basic concept | Note |
|---|---|---|---|
| 1 | Broker | A Kafka server is a broker; one or more brokers form a Kafka cluster. | |
| 2 | Topic | Kafka categorizes messages by topic; every message published to a Kafka cluster specifies a topic. | topic |
| 3 | Producer | Message producer: a client that sends messages to a broker. | producer |
| 4 | Consumer | Message consumer: a client that reads messages from a broker. | consumer |
| 5 | ConsumerGroup | Every consumer belongs to a specific consumer group. A message can be consumed by multiple different consumer groups, but within one consumer group only one consumer will consume it. | consumer group |
| 6 | Partition | A physical concept: a topic can be split into multiple partitions, and messages within each partition are ordered. | partition |
2. Using Kafka
1. Install Kafka
Refer to the following tutorial.
2. Basic usage
- Create a topic named test
cd /opt/software/kafka_2.13-2.7.1
bin/kafka-topics.sh --create --zookeeper 192.168.220.66:2181 --replication-factor 1 --partitions 1 --topic test
- List the topics that exist in Kafka
bin/kafka-topics.sh --list --zookeeper 192.168.220.66:2181
- Send a message to the topic test
bin/kafka-console-producer.sh --broker-list 192.168.220.66:9092 --topic test
After the producer starts, type a message and press Enter to send it.
- Consume messages from the topic test (by default the consumer only receives messages produced after the consume command is executed; you can also configure it to consume from a specified offset or time, for example by adding the --from-beginning option to read from the start of the log)
bin/kafka-console-consumer.sh --bootstrap-server 192.168.220.66:9092 --topic test
3. Unicast messages
A mode in which a message can only be consumed by one consumer, similar to a queue. (Just make sure all consumers are in the same consumer group.)
4. Multicast messages (publish-subscribe)
A mode in which a message can be consumed by multiple consumers, similar to the publish-subscribe model. (Within a given consumer group, Kafka delivers each message to only one consumer, so to implement multicast, just make sure the consumers belong to different consumer groups.)
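The delivery rule described above can be sketched with a small simulation (a toy model, not Kafka's actual implementation; the group and consumer names are made up): each message goes to exactly one consumer within each subscribed group, so a single group behaves like a queue (unicast) while multiple groups behave like publish-subscribe (multicast).

```python
from itertools import cycle

# Toy simulation of Kafka's consumer-group delivery rule (not real Kafka code).
# Within a group, each message is handled by exactly one consumer;
# every subscribed group receives its own copy of each message.
def deliver(messages, groups):
    """groups: dict mapping group name -> list of consumer names.
    Returns dict mapping (group, consumer) -> list of messages received."""
    received = {}
    for group, consumers in groups.items():
        picker = cycle(consumers)          # round-robin inside the group
        for msg in messages:
            consumer = next(picker)        # only ONE consumer per group gets it
            received.setdefault((group, consumer), []).append(msg)
    return received

msgs = ["m1", "m2", "m3", "m4"]
# Unicast: one group with two consumers -> each message is consumed once overall.
unicast = deliver(msgs, {"testGroup": ["c1", "c2"]})
# Multicast: two groups -> every group sees all four messages.
multicast = deliver(msgs, {"groupA": ["c1"], "groupB": ["c2"]})
```

Note how the same `deliver` function produces both behaviors; the only difference is how the consumers are grouped.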
- Specify a consumer group
Have the consumer group testGroup consume the topic test
bin/kafka-console-consumer.sh --bootstrap-server 192.168.220.66:9092 --consumer-property group.id=testGroup --topic test
- View the consumption offset of the consumer group
bin/kafka-consumer-groups.sh --bootstrap-server 192.168.220.66:9092 --describe --group testGroup
5. Topic, partition, and the message log (Log)
Let's walk through the content of this diagram step by step.
A topic can have multiple partitions. The test topic we created above has only one partition.
Let's create another topic, test2, with two partitions.
bin/kafka-topics.sh --create --zookeeper 192.168.220.66:2181 --replication-factor 1 --partitions 2 --topic test2
View test2
bin/kafka-topics.sh --describe --zookeeper 192.168.220.66:2181 --topic test2
A partition is an ordered sequence of messages. Messages produced by producers are appended in order to a file called the commit log. Each message in a partition has a unique number, called the offset, which identifies the message within that partition.
- The reason Kafka can consume already-consumed messages again is that it stores messages in log files; the log storage directory was configured during installation.
vim config/server.properties
- These log files are retained for one week by default; this can be changed in the same configuration file as the setting above (the retention time is controlled by log.retention.hours). Kafka's performance is largely independent of how much message data is retained, so keeping a large message log has essentially no impact (unless disk capacity is tight, which is a separate matter).
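Time-based retention can be illustrated with a simplified sketch (this is not Kafka's real log cleaner, which works on log segments; it just shows the idea that messages older than the retention window are dropped):

```python
# Simplified sketch of time-based log retention. Kafka's default is 7 days
# (log.retention.hours); the real cleaner deletes whole log segments,
# whereas this toy version filters individual messages.
RETENTION_SECONDS = 7 * 24 * 3600

def apply_retention(log, now):
    """log: list of (timestamp, message) tuples, oldest first.
    Keep only messages whose age is within the retention window."""
    return [(ts, msg) for ts, msg in log if now - ts <= RETENTION_SECONDS]

now = 1_000_000_000
log = [(now - 8 * 24 * 3600, "too old"),   # 8 days old -> pruned
       (now - 3600, "recent")]             # 1 hour old -> kept
kept = apply_retention(log, now)
```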
Each consumer works through the commit log based on its own consumption progress, the offset. In Kafka, the consumption offset is maintained by the consumer itself. Normally we consume the messages in the commit log one by one in order, but we can also set the offset ourselves to re-consume messages, skip some messages, and so on. This means consumers have very little impact on the cluster: adding or removing a consumer affects neither the cluster nor other consumers, because each consumer maintains its own offset.
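The idea that each consumer maintains its own offset can be sketched with a toy model (the class and names below are illustrative, not the real Kafka client API):

```python
# Toy model of per-consumer offsets over one partition's commit log.
# Each consumer tracks its own position; resetting it allows re-consumption.
log = ["msg-0", "msg-1", "msg-2", "msg-3"]   # ordered messages, index == offset

class Consumer:
    def __init__(self):
        self.offset = 0                       # maintained by the consumer itself

    def poll(self):
        """Return the next message in order, or None at the end of the log."""
        if self.offset >= len(log):
            return None
        msg = log[self.offset]
        self.offset += 1
        return msg

    def seek(self, offset):
        """Jump to an arbitrary offset to re-consume or skip messages."""
        self.offset = offset

a, b = Consumer(), Consumer()
first = a.poll()        # consume in order from the start
a.seek(0)               # rewind: a's re-consumption does not affect b
again = a.poll()
b.seek(3)               # b skips ahead independently
skipped_to = b.poll()
```

Because the offsets live with the consumers rather than the log, rewinding `a` or fast-forwarding `b` changes nothing for the other consumer or for the log itself.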
Why partition the data under a topic?
- The commit log file is limited by the file system of the machine it lives on. After partitioning, different partitions can be stored on different machines, which amounts to distributed storage of the data; in theory, a topic can then handle an arbitrary amount of data.
- To improve parallelism.

The partitions of a topic can also be expanded later.
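When a message has a key, the producer picks the partition by hashing the key (the real Kafka client's default partitioner uses murmur2; the sketch below uses crc32 purely to keep the demo deterministic and self-contained). The point is that the same key always lands in the same partition, preserving per-key ordering, while different keys spread across partitions for parallelism:

```python
import zlib

# Illustrative partitioner: same key -> same partition. The real Kafka
# client hashes keys with murmur2; crc32 here is just a stand-in for a
# deterministic, dependency-free demonstration of the idea.
def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % num_partitions

NUM_PARTITIONS = 2   # e.g. the two-partition test2 topic created above
p1 = partition_for("order-42", NUM_PARTITIONS)
p2 = partition_for("order-42", NUM_PARTITIONS)
# p1 == p2: all messages for key "order-42" stay in one ordered partition,
# while other keys can be processed in parallel on other partitions.
```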
6. Leader, Replicas, Isr
- Leader: the node responsible for all read and write requests for a given partition.
- Replicas: indicates which brokers hold a copy of a given partition (i.e., how many replicas there are). These nodes are listed regardless of whether they are the leader, and even if they are down.
- Isr: a subset of Replicas, listing only the nodes that are still alive and have synchronized the partition's data. (I think of them as the healthy nodes, any of which can be promoted from follower to leader at any time.)
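The relationship between Replicas and Isr can be sketched as simple set logic (a simplification of the real replication protocol; the broker IDs below are illustrative):

```python
# Simplified view of Replicas vs. Isr for one partition (illustrative only).
replicas = [1, 2, 0]    # brokers holding a copy; listed even if a broker is down
alive = {2, 0}          # broker 1 has gone down
caught_up = {2, 0}      # brokers whose copy is in sync with the leader

# Isr = the replicas that are both alive and fully synchronized,
# kept in replica order. Dead broker 1 stays in `replicas` but leaves `isr`.
isr = [b for b in replicas if b in alive and b in caught_up]

# If the current leader fails, the new leader is chosen from the ISR,
# because only in-sync replicas are guaranteed not to lose committed messages.
leader = isr[0]
```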
A single broker doesn't show much of this, so let's look at the Kafka cluster below.
Build a Kafka cluster,
then create a topic called test4 with a replication factor of three and two partitions.
bin/kafka-topics.sh --create --zookeeper 192.168.220.66:2181 --replication-factor 3 --partitions 2 --topic test4
Query this topic
bin/kafka-topics.sh --describe --zookeeper 192.168.220.66:2181 --topic test4
If the leader node fails, a new leader will be elected via ZooKeeper.
Simulate a failure of the Kafka broker with ID 1. Here is the topic information before that broker goes down.
To tell which process belongs to broker 1, I restarted a virtual machine, started the broker with ID 1 there, and then queried its process.
bin/kafka-server-start.sh -daemon config/server-1.properties
jps
kill 2286
3. A brief look at the official documentation
The installed Kafka version is 2.7.
If you have forgotten which Kafka version you installed: as mentioned before, the last part of the name of the Kafka archive you downloaded indicates the version.
My English is not very good, but if you want to master Kafka in depth, I suggest spending more time reading the official documentation.
Summary
All of these commands can be found in the official documentation, along with the details needed for developing with Java, such as which dependencies to add. No matter how good a tutorial is, we ultimately need to be able to read the official documentation ourselves. A hundred readers have a hundred Hamlets; perhaps you will come to understand Kafka even more thoroughly than I do.