What is Kafka?

Kafka is a high-throughput, distributed publish-subscribe messaging system. It can also be described as a distributed, partitioned, and reliable log storage service.

Key concepts in Kafka

Topic

In Kafka, messages are classified into categories, and each category is called a topic. Consumers can process different topics in different ways.

Broker

Each Broker is a Kafka service instance, and multiple Brokers form a Kafka cluster. Messages published by producers are stored on the Brokers, and consumers pull messages from the Brokers for consumption.
Within the cluster, one Broker acts as the controller leader. It manages the state of the partitions and replicas across the entire cluster and elects the partition leaders.
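The publish-and-pull relationship between producers, a Broker, and consumers can be illustrated with a small in-memory model. This is only a sketch: `ToyBroker`, `publish`, and `pull` are hypothetical names invented for the illustration and are not part of Kafka's client API, and a real Broker persists messages to disk rather than to a Python list.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a single broker (hypothetical, not Kafka's API)."""

    def __init__(self):
        # One append-only list of messages per topic name.
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Store a producer's message and return its position in the topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def pull(self, topic, start, max_messages=10):
        """Return up to max_messages for a consumer, beginning at position start."""
        return self._topics[topic][start:start + max_messages]


broker = ToyBroker()
broker.publish("orders", "order-1")
broker.publish("orders", "order-2")
broker.publish("payments", "payment-1")

# Consumers decide which topic to read and from which position.
orders = broker.pull("orders", 0)
payments = broker.pull("payments", 0)
```

The consumer, not the broker, drives the read: it asks for a topic and a starting position, which is the pull model described above.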

Producer

Responsible for producing messages and sending them to a Broker.

Consumer

Responsible for consuming Topic messages from the Brokers. Each Consumer instance belongs to a Consumer Group.

Partition

Partitions are one of Kafka's more distinctive features. A topic can be divided into multiple Partitions, and each Partition is an ordered queue. Each message in a Partition has a sequential offset. Within a single Consumer Group, only one Consumer instance can consume messages from a given Partition.
The data in a Partition is stored on disk and written by appending. The purpose of partitioning is to let a topic scale out: a topic can have many partitions, and multiple partitions can be processed in parallel, so a large volume of data can be handled. By default, only the leader replica of a partition serves reads and writes; followers only replicate from the leader, and this is transparent to the client.
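A short sketch can make offsets and the one-consumer-per-partition rule concrete. The names `PartitionedTopic` and `assign_partitions` are hypothetical, `zlib.crc32` is used only as a convenient deterministic hash (it is an assumption of this example, not the hash Kafka's own partitioner uses), and the round-robin assignment is just one possible strategy.

```python
import zlib


class PartitionedTopic:
    """Toy topic made of several append-only partitions (illustrative only)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in hash: the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append to the key's partition and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


def assign_partitions(num_partitions, consumers):
    """Give every partition to exactly one consumer of the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


topic = PartitionedTopic("A", 3)
first = topic.append("user-1", "login")
second = topic.append("user-1", "logout")

# Two consumers in one group share three partitions without overlap.
assignment = assign_partitions(3, ["c1", "c2"])
```

Because both messages share a key, they land in the same partition with consecutive offsets, so their relative order is preserved. Ordering is only guaranteed within a partition, not across the whole topic.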

ISR

The ISR (in-sync replicas) is the set of replicas that are keeping up with the partition leader. The leader tracks the lag of every follower in the ISR. If a follower falls too far behind (the time threshold is configurable through replica.lag.time.max.ms), the leader removes that replica from the ISR. A removed replica keeps trying to catch up with the leader. A message is not committed as soon as the leader writes it; it is committed only after all followers in the ISR have replicated it. Removing lagging followers from the ISR therefore keeps slow replicas from delaying writes. The main purpose of the ISR is to provide the list from which a new partition leader is elected after a broker goes down.
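The effect of dropping a lagging follower can be shown with a simplified model. This is a sketch under stated assumptions, not the broker's actual code: `Replica`, `shrink_isr`, and `high_watermark` are hypothetical names, the 30 000 ms threshold is an illustrative value for replica.lag.time.max.ms, and the commit point is modeled as the smallest log end offset among ISR members.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    broker_id: int
    log_end_offset: int     # next offset this replica would write
    last_caught_up_ms: int  # last time it was fully caught up with the leader


def shrink_isr(isr, now_ms, max_lag_ms):
    """Keep only replicas that were caught up within the last max_lag_ms."""
    return [r for r in isr if now_ms - r.last_caught_up_ms <= max_lag_ms]


def high_watermark(isr):
    """Offsets below this value exist on every ISR member, so they are committed."""
    return min(r.log_end_offset for r in isr)


now_ms = 100_000
isr = [
    Replica(broker_id=1, log_end_offset=100, last_caught_up_ms=100_000),  # leader
    Replica(broker_id=2, log_end_offset=98, last_caught_up_ms=98_000),
    Replica(broker_id=3, log_end_offset=40, last_caught_up_ms=55_000),   # lagging
]

hw_before = high_watermark(isr)  # held back by broker 3
isr_after = shrink_isr(isr, now_ms, max_lag_ms=30_000)
hw_after = high_watermark(isr_after)
```

With broker 3 in the ISR, nothing past offset 40 can be committed. Once it is removed, the commit point advances to 98, which is the write-latency benefit described above.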

2 types of leaders

From the basic concepts above, we can see that there are two kinds of leaders in a Kafka cluster: the controller leader, which is one of the brokers, and the partition leader. The following describes the general election process for each.

Controller leader

When a broker starts, it creates a KafkaController object, but only one controller in the cluster can act as the leader and provide service. The KafkaController on each node tries to create an ephemeral node under a specified ZooKeeper path. Only the KafkaController that creates the node first becomes the leader, and the rest are followers. When the leader fails, all followers are notified and compete again to create the node under that path, which elects a new leader.
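The "first to create the node wins" rule can be simulated without a real ZooKeeper. The class below is a toy stand-in written for this illustration: `ToyZooKeeper`, `try_create_controller`, and `session_expired` are hypothetical names and do not correspond to any ZooKeeper client API.

```python
class ToyZooKeeper:
    """Holds at most one ephemeral controller node (illustrative stand-in)."""

    def __init__(self):
        self.controller = None  # broker id that owns the node, if any
        self.watchers = []      # callbacks fired when the node disappears

    def try_create_controller(self, broker_id):
        """Create the node only if it does not exist yet; report success."""
        if self.controller is None:
            self.controller = broker_id
            return True
        return False

    def session_expired(self, broker_id):
        """Drop the ephemeral node when its owner dies, then notify watchers."""
        if self.controller == broker_id:
            self.controller = None
            for callback in list(self.watchers):
                callback()


zk = ToyZooKeeper()

# All brokers race at startup; only the first creation succeeds.
results = [zk.try_create_controller(b) for b in (1, 2, 3)]
first_controller = zk.controller


def reelect():
    # Surviving brokers compete again once they are notified.
    for b in (2, 3):
        zk.try_create_controller(b)


zk.watchers.append(reelect)
zk.session_expired(1)  # the current controller fails
second_controller = zk.controller
```

The ephemeral node doubles as a lock and a liveness signal: it exists only while its creator's session is alive, so a failed leader releases it automatically.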

Partition leader

Partition leader election is carried out by the controller leader:

Read the ISR (in-sync replicas) set of the current partition from ZooKeeper.
Call the configured partition leader selection algorithm to choose the partition's leader.
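As a rough sketch of what such a selection algorithm might look like, the function below picks the first assigned replica that is both in the ISR and on a live broker. The function name and its parameters are assumptions made for this example, and the rule shown is one plausible strategy rather than a description of every algorithm Kafka can be configured with.

```python
def elect_partition_leader(assigned_replicas, isr, live_brokers):
    """Return the first assigned replica that is in the ISR and alive, else None."""
    for broker_id in assigned_replicas:
        if broker_id in isr and broker_id in live_brokers:
            return broker_id
    return None


# Broker 1 led the partition and has just gone down.
new_leader = elect_partition_leader(
    assigned_replicas=[1, 2, 3],
    isr={1, 2},
    live_brokers={2, 3},
)

# Broker 3 is alive but not in sync, so it is not eligible here.
no_leader = elect_partition_leader(
    assigned_replicas=[1, 3],
    isr={1},
    live_brokers={3},
)
```

Restricting candidates to the ISR means the new leader already holds every committed message, which is why the second call returns `None` instead of promoting an out-of-sync replica.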

What are partitions and replicas

Take an example with three brokers.
A topic is a logical concept. Suppose you have 100 messages in a topic named "A", and A has two partitions, PA and PB. PA stores 50 messages and PB stores 50, on broker1 and broker2 respectively. The messages of A have been split up and no longer sit on a single machine, so the load on each machine is smaller.

Next, if broker1 goes down, every partition on it goes down as well, and the 50 messages in PA are lost. To avoid losing messages, you add a replica for each partition, so that PA and PB synchronize their 50 messages to broker2 and broker3 respectively. Now when broker1 goes down, the partition leader of PA switches to the replica on broker2, and the messages are still there.

In short, partitions split up your messages so that a large volume of messages can be load-balanced across brokers, and replicas synchronize the messages of each partition to keep them reliable.
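The example can be replayed in a few lines to compare the unreplicated and replicated layouts. This is a toy calculation, assuming PA lives on broker1 with a replica on broker2 and PB lives on broker2 with a replica on broker3; the helper names are hypothetical, and "first live replica leads" is a simplification that ignores ISR membership.

```python
def leaders(placement, live_brokers):
    """Pick the first live replica of each partition as its leader (or None)."""
    result = {}
    for partition, replicas in placement.items():
        live = [b for b in replicas if b in live_brokers]
        result[partition] = live[0] if live else None
    return result


def available_messages(placement, counts, live_brokers):
    """Count messages in partitions that still have a live leader."""
    current = leaders(placement, live_brokers)
    return sum(n for p, n in counts.items() if current[p] is not None)


counts = {"PA": 50, "PB": 50}
unreplicated = {"PA": [1], "PB": [2]}
replicated = {"PA": [1, 2], "PB": [2, 3]}

all_up = {1, 2, 3}
broker1_down = {2, 3}

leaders_before = leaders(replicated, all_up)
leaders_after = leaders(replicated, broker1_down)
lost_without_replicas = 100 - available_messages(unreplicated, counts, broker1_down)
kept_with_replicas = available_messages(replicated, counts, broker1_down)
```

Without replicas, losing broker1 makes PA's 50 messages unavailable. With replicas, PA's leadership moves to broker2 and all 100 messages stay readable.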
