Kafka's basic architecture, idempotence, and transactions



Broker

  • A Kafka cluster usually consists of multiple brokers to achieve load balancing and fault tolerance
  • Brokers are stateless: they rely on ZooKeeper to maintain the cluster state
  • A single Kafka broker can handle hundreds of thousands of reads and writes per second, and each broker can store terabytes of messages without a performance penalty

ZooKeeper

  • ZooKeeper manages and coordinates the brokers and stores Kafka's metadata (for example: which topics, partitions, and consumers exist)
  • ZooKeeper also notifies producers and consumers when a new broker joins the cluster or when an existing broker fails

Kafka is now removing its ZooKeeper dependency: maintaining two separate clusters is costly, so the community proposed KIP-500 to replace ZooKeeper with a built-in consensus layer (KRaft).

Producer

  • The producer is responsible for pushing messages to topics on the broker

Consumer

  • Consumers pull messages from the broker's topics and process them themselves

Consumer group

  • The consumer group is a scalable, fault-tolerant consumption mechanism provided by Kafka
  • A consumer group can contain multiple consumers
  • Each consumer group has a unique ID (group.id)
  • The consumers in a group together consume all partitions of a topic, with each partition assigned to exactly one consumer in the group
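The idea that a group's consumers split a topic's partitions among themselves can be illustrated with a minimal sketch of a range-style assignor. This is a toy model (the real assignment is computed by Kafka's group coordinator during a rebalance), and the consumer names are made up for illustration:

```python
def assign_partitions(consumers, num_partitions):
    """Toy model of a range assignor: split the partitions as evenly as
    possible, each partition going to exactly one consumer in the group."""
    consumers = sorted(consumers)
    per, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        count = per + (1 if i < extra else 0)
        assignment[c] = list(range(start, start + count))
        start += count
    return assignment

print(assign_partitions(["c1", "c2", "c3"], 7))
# {'c1': [0, 1, 2], 'c2': [3, 4], 'c3': [5, 6]}
```

Note that every partition appears exactly once across the group, which is why adding consumers beyond the partition count leaves some of them idle.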

Partitions

  • In a Kafka cluster, a topic is divided into multiple partitions

Replicas

  • Replicas ensure that data remains available when a server fails
  • In Kafka, the replication factor is generally set to be greater than 1

Topic

  • A topic is a logical concept: producers publish messages to it and consumers pull messages from it
  • Every topic in Kafka must have a unique name; there is no limit on the number of topics
  • The messages in a topic are structured; generally, one topic carries one type of message
  • Once a producer has sent messages to a topic, those messages are immutable and cannot be updated

Offset

  • The offset records the position of the next message a consumer will read
  • Older versions of Kafka stored offsets in ZooKeeper; modern versions store them in the internal __consumer_offsets topic
  • Within a partition, messages are stored sequentially, and each message has a monotonically increasing id: this is the offset
  • Offsets are only meaningful within a partition; comparing offsets across partitions is meaningless
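The per-partition nature of offsets can be shown with a toy partition log (an illustration only, not Kafka's storage format): each append returns the next sequential offset, and a second partition starts over at 0.

```python
class Partition:
    """Toy model of a partition log: each appended message gets the next
    sequential offset, and offsets are local to this partition."""

    def __init__(self):
        self.log = []

    def append(self, msg):
        self.log.append(msg)
        return len(self.log) - 1  # the message's offset within this partition

    def read(self, offset):
        return self.log[offset]

p0, p1 = Partition(), Partition()
print(p0.append("a"), p0.append("b"))  # offsets 0 and 1 in partition 0
print(p1.append("x"))                  # offset 0 again: partition 1 is independent
```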

Kafka supports multiple consumers consuming the data in a topic at the same time.

Leader and Follower

In Kafka, each topic can be configured with multiple partitions and multiple replicas. Each partition has one leader and zero or more followers. When a topic is created, Kafka distributes the partition leaders evenly across the brokers. In normal use we do not notice leaders and followers, but in fact all read and write operations are handled by the leader, while the followers replicate the leader's log files. If the leader fails, one of the followers is elected as the new leader. In summary:

  • In Kafka, the leader handles all read and write operations, while followers are only responsible for replicating the leader's data.
  • If the leader fails, a new leader is elected from among the followers.
  • Much like a consumer, a follower pulls data from its partition's leader and saves it to its own log files.
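The failover described above can be sketched as a toy election rule: pick the first surviving replica as leader. Real Kafka elects from the in-sync replica set via the controller, so this is only a simplified model; the broker ids are illustrative.

```python
def elect_leader(replicas, failed_brokers):
    """Toy leader election: the leader is the first live replica in the
    partition's replica list (Kafka actually elects from the ISR)."""
    live = [b for b in replicas if b not in failed_brokers]
    return live[0] if live else None

replicas = [101, 102, 103]          # replica list for one partition
print(elect_leader(replicas, set()))    # 101 is leader while healthy
print(elect_leader(replicas, {101}))    # 102 takes over after 101 fails
```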

Idempotency of Kafka

Take HTTP as an example: whether a request is sent once or several times, the response is the same (barring issues such as network timeouts). In other words, performing an operation multiple times has the same effect as performing it once.
If a system is not idempotent, a user repeatedly submitting a form can cause problems: for example, clicking the submit-order button several times in the browser would create multiple identical orders in the backend.

Without idempotence, the same message can be saved more than once, for example when a producer retries a send whose acknowledgment was lost.
Therefore, to make the producer idempotent, Kafka introduces two concepts: the Producer ID (PID) and the Sequence Number.

  • PID: each producer is assigned a unique PID when it is initialized; this PID is transparent to the user.
  • Sequence Number: for each PID, every message sent to a given topic partition carries a sequence number that increments from 0.

The producer only advances the sequence number after a successful ACK is received. On the broker side, if an incoming message's sequence number is less than or equal to the last one recorded for that PID and partition, the broker treats it as a duplicate and does not save it.
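The broker-side deduplication can be sketched as a toy model (not Kafka's actual log implementation): the broker remembers the last accepted sequence number per PID and silently drops retries.

```python
class BrokerPartition:
    """Toy model of producer idempotence: the broker tracks the last
    sequence number persisted per producer (PID) and drops any message
    whose sequence number is not strictly greater."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # PID -> last accepted sequence number

    def append(self, pid, seq, msg):
        if seq <= self.last_seq.get(pid, -1):
            return False  # duplicate (e.g. a retry after a lost ACK): discard
        self.log.append(msg)
        self.last_seq[pid] = seq
        return True

b = BrokerPartition()
b.append(pid=1, seq=0, msg="order-1")  # accepted
b.append(pid=1, seq=0, msg="order-1")  # retry of the same send: rejected
b.append(pid=1, seq=1, msg="order-2")  # accepted
print(b.log)  # ['order-1', 'order-2']
```

This is why a retried send does not produce a duplicate record: the retry reuses the same sequence number, and the broker refuses to append it a second time.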

Kafka transactions

Kafka transactions were introduced in Kafka 0.11.0.0 (2017) and are similar in spirit to database transactions.
A Kafka transaction wraps the producing of messages and the committing of consumer offsets into one atomic operation: either everything succeeds or everything fails. This protection matters most when a producer and a consumer work together, as in consume-transform-produce pipelines.

The following five transaction-related methods are defined in the Producer interface:

  1. initTransactions (initialize transactions): must be called before any other transactional method;
  2. beginTransaction (begin transaction): starts a new Kafka transaction;
  3. sendOffsetsToTransaction (send offsets): adds the consumed offsets for each partition to the transaction, so they are committed atomically with the produced messages;
  4. commitTransaction (commit transaction): commits the transaction;
  5. abortTransaction (abort transaction): aborts and rolls back the transaction.
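The required call order and the all-or-nothing behavior of these five methods can be modeled with a toy stand-in (this mock is not the real Kafka client; method and record names are illustrative):

```python
class MockTransactionalProducer:
    """Toy stand-in for the five transactional methods of Kafka's Producer
    API, modeling only their call order and atomicity."""

    def __init__(self):
        self.committed = []      # records visible after a commit
        self._pending = []       # records written inside the open transaction
        self._initialized = False
        self._in_txn = False

    def init_transactions(self):
        self._initialized = True

    def begin_transaction(self):
        if not self._initialized:
            raise RuntimeError("call init_transactions() first")
        self._in_txn, self._pending = True, []

    def send(self, record):
        if not self._in_txn:
            raise RuntimeError("no open transaction")
        self._pending.append(record)

    def send_offsets_to_transaction(self, offsets):
        # consumer offsets are committed atomically with the produced records
        self._pending.append(("offsets", dict(offsets)))

    def commit_transaction(self):
        self.committed.extend(self._pending)
        self._pending, self._in_txn = [], False

    def abort_transaction(self):
        self._pending, self._in_txn = [], False

p = MockTransactionalProducer()
p.init_transactions()
p.begin_transaction()
p.send("m1")
p.send_offsets_to_transaction({"topic-0": 5})
p.commit_transaction()           # "m1" and the offset commit land together
p.begin_transaction()
p.send("m2")
p.abort_transaction()            # "m2" is discarded, never visible
print(p.committed)
```

The key property the mock demonstrates is atomicity: a commit publishes the produced records and the consumed offsets together, while an abort leaves no trace of either.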

Origin blog.csdn.net/weixin_45970271/article/details/126549560