1. Basic principles of Kafka

Table of Contents of Series Articles

1. Basic principles of Kafka
2. Simple Kafka operations using Java
3. A simple look at Kafka's design principles


Preface

Don't worry if you don't understand the Linux commands in this article; for everyday use there are simpler ways of working. You don't need to memorize these commands: they are only used here to demonstrate some of Kafka's features.

Understanding Kafka

Kafka was originally developed at LinkedIn. It is a distributed messaging system that supports partitioning and multiple replicas and is coordinated by ZooKeeper. Its biggest strength is processing large volumes of data in real time, which suits a variety of scenarios: Hadoop-based batch processing, low-latency real-time systems, Storm/Spark streaming engines, web/Nginx logs, access logs, message services, and so on. Written in Scala, it was contributed by LinkedIn to the Apache Foundation in 2010 and became a top-level open source project.


1. A brief look at Kafka's basic concepts

1. Broker: a Kafka server instance is a broker; one or more brokers form a Kafka cluster.
2. Topic: Kafka categorizes messages by topic; every message published to a Kafka cluster must specify a topic.
3. Producer: the message producer, i.e. the client that sends messages to brokers.
4. Consumer: the message consumer, i.e. the client that reads messages from brokers.
5. ConsumerGroup: each consumer belongs to a specific consumer group. A message can be consumed by multiple different consumer groups, but within a single consumer group only one consumer will consume it.
6. Partition: a physical concept; a topic can be split into multiple partitions, and the messages within each partition are ordered.

2. Using Kafka

1. Install Kafka

Refer to the following tutorial

VMware install kafka

2. Basic usage

  • Create a topic named test
 cd /opt/software/kafka_2.13-2.7.1
bin/kafka-topics.sh --create --zookeeper 192.168.220.66:2181  --replication-factor 1 --partitions 1 --topic test
  • List the topics that exist in Kafka
bin/kafka-topics.sh --list --zookeeper 192.168.220.66:2181
  • Send a message to the topic test
bin/kafka-console-producer.sh --broker-list 192.168.220.66:9092 --topic test

Enter a message, e.g. message1.

  • Consume messages from the topic test (by default the console consumer only receives messages produced after it starts; you can later configure it to consume from a specified offset, timestamp, or other conditions)

bin/kafka-console-consumer.sh --bootstrap-server 192.168.220.66:9092 --topic test


3. Unicast messages

A mode in which a message is consumed by only one consumer, similar to a queue. (Just make sure all consumers are in the same consumer group.)

4. Multicast messages (publish and subscribe)

A mode in which a message can be consumed by multiple consumers, similar to the publish-subscribe pattern. (In Kafka, a given message is consumed by only one consumer within a given consumer group; to implement multicast, just ensure that the consumers belong to different consumer groups.)
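The two modes above can be illustrated with a small simulation. This is a sketch of the delivery rule only, not Kafka's actual implementation: every subscribed consumer group receives each message, but only one consumer inside each group consumes it.

```python
# Sketch of Kafka's delivery rule (a simplification, not the real client):
# every subscribed consumer group receives each message, but only ONE
# consumer inside each group consumes it.
def deliver(message, consumer_groups):
    """consumer_groups: dict mapping group name -> list of consumer names.
    Returns a dict mapping group name -> the consumer chosen in that group."""
    chosen = {}
    for group, consumers in consumer_groups.items():
        if consumers:
            # Kafka picks a consumer via partition assignment; for this
            # sketch we simply take the first consumer in the group.
            chosen[group] = consumers[0]
    return chosen

# Unicast (queue mode): all consumers share ONE group, so only one consumes.
assert deliver("message1", {"testGroup": ["c1", "c2"]}) == {"testGroup": "c1"}

# Multicast (publish-subscribe): consumers sit in DIFFERENT groups,
# so every group gets its own copy of the message.
assert deliver("message1", {"testGroup": ["c1"], "testGroup2": ["c2"]}) == {
    "testGroup": "c1",
    "testGroup2": "c2",
}
```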

  • Set up consumer groups

Have the consumer group testGroup consume the topic test

bin/kafka-console-consumer.sh --bootstrap-server 192.168.220.66:9092 --consumer-property group.id=testGroup --topic test
  • View the consumption offset of the consumer group
bin/kafka-consumer-groups.sh --bootstrap-server 192.168.220.66:9092 --describe --group testGroup 


5. Topic, partition, and the message log

Let's go through this step by step.
A topic can have multiple partitions. Above we created the topic test with only one partition.
Let's create another topic, test2, with two partitions.

bin/kafka-topics.sh --create --zookeeper 192.168.220.66:2181 --replication-factor 1 --partitions 2 --topic test2

View test2

bin/kafka-topics.sh --describe --zookeeper 192.168.220.66:2181 --topic test2


Partition

  • A partition is an ordered sequence of messages. Messages from producers are appended in order to a file called the commit log. Each message in a partition has a unique number, called its offset, which identifies the message within that partition.
  • The reason Kafka can consume already-consumed messages again is that it stores messages in log files, and the storage directory for these logs was configured during installation.
vim config/server.properties


  • These log files are retained for one week by default; this can be changed in the same configuration file as the setting above. Kafka's performance is essentially independent of how much message data is retained, so keeping a large message log has no real impact (unless disk capacity is tight, which is a separate matter).
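The log directory and retention settings mentioned above live in config/server.properties. They typically look like the following (the values shown are common defaults and may differ from your installation):

```properties
# Where partition log files are stored (set during installation)
log.dirs=/tmp/kafka-logs

# How long log files are retained before deletion (168 hours = one week)
log.retention.hours=168
```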

Each consumer works through the commit log based on its own consumption progress, i.e. its offset. In Kafka, the consumption offset is maintained by the consumer itself. Normally we consume the messages in the commit log one by one in order, but we can also set the offset ourselves to re-consume messages, skip some messages, and so on. This means consumers have very little impact on the cluster: adding or removing a consumer affects neither the cluster nor the other consumers, because each consumer maintains its own offset.
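The idea that each consumer keeps its own offset over a shared, append-only log can be sketched like this (a toy model for illustration; real consumers store their committed offsets inside Kafka itself):

```python
# Toy model: a partition's commit log is append-only, and each consumer
# keeps its OWN read position (offset). The log itself is never modified.
log = ["m0", "m1", "m2", "m3"]  # messages at offsets 0..3

class Consumer:
    def __init__(self):
        self.offset = 0  # each consumer maintains its own offset

    def poll(self):
        """Return the next message and advance this consumer's offset."""
        if self.offset < len(log):
            msg = log[self.offset]
            self.offset += 1
            return msg
        return None

    def seek(self, offset):
        """Move the offset: rewind to re-consume, or jump ahead to skip."""
        self.offset = offset

a, b = Consumer(), Consumer()
assert a.poll() == "m0" and a.poll() == "m1"
assert b.poll() == "m0"      # b's progress is independent of a's
a.seek(0)
assert a.poll() == "m0"      # repeated consumption by resetting the offset
a.seek(3)
assert a.poll() == "m3"      # skipping messages by moving the offset forward
```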

Why partition the data under a topic?

  • The commit log file is limited by the file system of the machine it lives on. With partitioning, different partitions can be stored on different machines, which amounts to distributed storage of the data. In theory, a topic can then handle an arbitrary amount of data.

  • To improve parallelism. The partitions of a topic can also be scaled out.
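How messages spread across partitions can be sketched with the rule Kafka's default partitioner applies to keyed messages: hash the key, then take it modulo the partition count (the real Java client uses a murmur2 hash; crc32 here is just a stand-in for the sketch):

```python
import zlib

def choose_partition(key, num_partitions):
    """Pick a partition for a keyed message.
    Kafka's Java client computes murmur2(key) % numPartitions; we use
    crc32 here only to get a stable hash for this sketch."""
    return zlib.crc32(key.encode()) % num_partitions

# With 2 partitions (like topic test2), keys spread across both partitions,
# and the same key always lands in the same partition, which is what
# preserves per-key ordering.
assignments = {k: choose_partition(k, 2) for k in ["user-a", "user-b", "user-c"]}
assert all(p in (0, 1) for p in assignments.values())
assert choose_partition("user-a", 2) == choose_partition("user-a", 2)
```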

6.Leader、Replicas、Isr


  • Leader: the node responsible for all read and write requests for a given partition.
  • Replicas: the brokers on which a partition has copies (i.e. how many replicas it has). These nodes are listed regardless of whether they are the leader, and even when they are down.
  • Isr: a subset of Replicas. It lists only the nodes that are still alive and have kept the partition's replica in sync. (I think of them as the healthy nodes, the ones that can be promoted from follower to leader at any time.)
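The relationship between the three can be sketched as follows. This is a deliberate simplification: in reality the election is coordinated through ZooKeeper by the cluster controller, but the core eligibility rule is that the new leader must be an alive member of the ISR.

```python
def elect_leader(replicas, isr, alive):
    """Pick a new leader for a partition (simplified sketch).
    replicas: all brokers holding a copy (listed even if down).
    isr:      the in-sync subset of replicas.
    alive:    the set of brokers currently up.
    Only an alive, in-sync replica may become leader."""
    for broker in replicas:          # preference follows the replica list order
        if broker in isr and broker in alive:
            return broker
    return None                      # no eligible leader: partition is offline

replicas, isr = [1, 0, 2], [1, 0, 2]
assert elect_leader(replicas, isr, alive={0, 1, 2}) == 1  # broker 1 leads
# Broker 1 goes down: a surviving ISR member takes over as leader.
assert elect_leader(replicas, isr, alive={0, 2}) == 0
```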

With a single broker this doesn't show much, so let's look at a Kafka cluster.
Build a Kafka cluster, then create a topic called test4 with three replicas and two partitions.

bin/kafka-topics.sh --create --zookeeper 192.168.220.66:2181 --replication-factor 3 --partitions 2 --topic test4

Query this topic

bin/kafka-topics.sh --describe --zookeeper 192.168.220.66:2181 --topic test4

If the leader node fails, an election is conducted through ZooKeeper to choose a new leader.
Let's simulate this by taking down the Kafka broker with ID 1.
To tell which process belongs to broker ID 1, I restarted a virtual machine, started the broker with ID 1, and then queried the processes.

bin/kafka-server-start.sh -daemon config/server-1.properties 
jps


kill 2286


3. Briefly understand the official documents

The Kafka version installed here is 2.7.
If you forget which Kafka version you installed: as mentioned before, the last part of the downloaded archive's name is the Kafka version (in kafka_2.13-2.7.1, 2.7.1 is the Kafka version and 2.13 is the Scala version).

Corresponding link

My English is not very good, but if you want to master Kafka in depth, I suggest spending time reading the official documentation.


Summary

All of these commands can be found in the official documentation, which also has detailed information for developing with Java, such as which dependencies to add. No matter how good a tutorial is, in the end we have to be able to read the official documentation ourselves. A hundred readers have a hundred Hamlets; perhaps you will come to understand Kafka more thoroughly than I do.


Origin blog.csdn.net/xiaobai_july/article/details/127716390