[Kafka from Entry to Abandonment, Series Four] Kafka Architecture In Depth: Producer Strategies

The previous blog gave a general picture of Kafka's basic workflow and storage mechanism, viewed from the perspective of a distributed system. This blog introduces Kafka's producer-side strategies, which in the end serve the same distributed-system characteristics [ high scalability, high availability, high concurrency, mass storage ].

Partition strategy

Kafka keeps N replicas of each topic partition, where N (greater than or equal to 1) is the topic's replication factor. Kafka implements automatic failover through this multi-replica mechanism: when a broker in the cluster fails, the replicas keep the service available. For any partition, one of its N replicas is the leader and the others are followers. The leader handles all read and write requests for the partition, while the followers passively replicate data from the leader.
[Figure: a topic's partitions, each with one leader and several follower replicas, distributed across the brokers of the cluster]

Reasons for partitioning

Why partition at all? The previous blog [Kafka from Getting Started to Abandoning Series Three] already covered the architecture, workflow, and storage mechanism in detail, so let's just emphasize the reasons again:

  • High scalability : Partitioning makes it convenient to scale out in a cluster. Each partition can be sized to fit the machine it lives on, and a topic can be composed of many partitions, so the cluster can accommodate data of any size.
  • High concurrency : Reads and writes happen at the granularity of a partition, so messages can be sent concurrently to multiple partitions of one topic, raising throughput.
  • High availability : With scalability and concurrency in place, we also want the cluster to stay stable and lose no messages under concurrent load. To keep data reliable, each partition has multiple replicas. If the broker hosting a partition's leader fails or goes down, that partition cannot serve client requests until a new leader is elected from the followers; this is exactly where replicas prove their worth.

As the figure above shows, these are the hallmarks of a distributed cluster. In fact this is not unique to Kafka; all distributed middleware shares these concepts. ElasticSearch, for example, has nodes, indices, shards, and replicas, which correspond one to one with Kafka's brokers, topics, partitions, and replicas.

Partitioning principle

The producer publishes messages to the broker in push mode, and each message is appended to a partition as a sequential disk write (sequential disk writes are more efficient than random writes, and this is what guarantees Kafka's throughput). Since a topic has several partitions, how do we know which partition a producer's message should go to? The producer chooses the partition according to its partitioning algorithm.
[Figure: code structure of the producer's send API and partitioner]
From the code structure we can see that the logic boils down to three routing rules that determine which partition a message is sent to:

  1. If a partition is specified explicitly, that value is used directly as the partition;
  2. If no partition is specified but a key is present, the hash of the key modulo the topic's partition count gives the partition;
  3. If there is neither a partition nor a key, a random integer is generated on the first call (and incremented on each subsequent call); this value modulo the number of available partitions gives the partition. This is the often-mentioned round-robin [polling] algorithm. A sketch of the three rules follows.
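The sketch below mirrors this routing logic in Java. It is a simplified illustration, not the client's actual DefaultPartitioner (which, for instance, hashes keys with murmur2); the class and method names here are made up for the example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

public class PartitionRouter {
    // Rule 3 starts from a random integer and increments on each call (round-robin)
    private final AtomicInteger counter =
            new AtomicInteger(ThreadLocalRandom.current().nextInt());

    public int route(Integer partition, byte[] key, int numPartitions) {
        if (partition != null) {
            return partition;                          // rule 1: explicit partition wins
        }
        if (key != null) {
            // rule 2: hash of the key modulo the partition count
            // (masking the sign bit keeps the result non-negative)
            return (java.util.Arrays.hashCode(key) & 0x7fffffff) % numPartitions;
        }
        // rule 3: round-robin over the available partitions
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }
}
```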

Having understood how messages are routed to partitions, we have addressed high scalability and high concurrency. One problem remains: how do we guarantee high availability, that is, how do we transmit data reliably?

Data reliability guarantee

To guarantee that the data a producer sends reliably reaches the specified topic, each partition of the topic must reply with an ack (acknowledgement) after receiving the data. If the producer receives the ack, it sends the next round; otherwise it resends the data.

ACK response mechanism [ACK sending timing]

Kafka offers users three levels of reliability, letting them trade reliability off against latency. When the producer sends data to the leader, the reliability level is set through the request.required.acks parameter (in newer clients simply acks):

  1. request.required.acks = 0: the producer keeps sending data to the leader without waiting for any acknowledgement of success. Transmission efficiency is highest, but reliability is lowest: data may be lost in flight, or lost when the leader goes down. [ Highest transmission efficiency, lowest reliability ]

  2. request.required.acks = 1 (the default): the producer sends data to the leader, and the leader replies with success as soon as its local log write succeeds. If the leader goes down before the other replicas in the ISR have pulled the message, that message is lost. [ Medium transmission efficiency, medium reliability ]

  3. request.required.acks = -1 (all): the producer sends data to the leader, and the leader waits until every replica in the ISR has synchronized the data (strong consistency) before replying with success. If the producer receives no success message, it considers the send failed and automatically resends. This is the most reliable option, at a cost in performance. [ Lowest transmission efficiency, highest reliability ] Note that if the leader fails after the followers have finished synchronizing but before the broker sends the ack, the producer's retry causes data duplication.
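As a concrete illustration, here is a minimal producer configured with the strictest level, using the modern acks name from the Java client; the broker address and topic name are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReliableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // "0", "1", or "all" (-1): the three reliability levels described above
        props.put("acks", "all");
        props.put("retries", 3); // resend on failure, as the ack mechanism expects

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value")); // placeholder topic
        }
    }
}
```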

One caveat when request.required.acks = -1: to maximize reliability you must also set the parameter min.insync.replicas. This parameter sets the minimum number of replicas in the ISR; its default is 1, and it takes effect if and only if request.required.acks is set to -1. When the number of replicas in the ISR falls below min.insync.replicas, the client receives an exception: org.apache.kafka.common.errors.NotEnoughReplicasException: Messages are rejected since there are fewer in-sync replicas than required. Setting min.insync.replicas to 2 means that when the ISR actually contains only one replica (the leader alone), reliability can no longer be guaranteed: if the leader went down after sending the ack, the message would be lost, so the client's write request is rejected up front to prevent message loss.
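Since min.insync.replicas is a topic-level (or broker-level) setting, it is configured when creating or altering the topic rather than on the producer. A minimal sketch using the Java AdminClient, with placeholder broker address, topic name, and sizing:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReliableTopic {
    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.put("bootstrap.servers", "localhost:9092"); // placeholder address
        try (AdminClient admin = AdminClient.create(conf)) {
            // 3 partitions, replication factor 3, and at least 2 in-sync replicas
            NewTopic topic = new NewTopic("my-topic", 3, (short) 3)
                    .configs(Collections.singletonMap("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```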

Replica synchronization strategy [ACK sending condition]

So how many follower replicas should have synchronized before the ack is sent? There are two candidate schemes (the original figure compared them; reconstructed here):

  • Half plus one: send the ack once more than half of the replicas have synchronized. Advantage: low latency. Disadvantage: to tolerate the failure of n replicas, 2n+1 replicas are required, which means heavy data redundancy.
  • All replicas: send the ack only after all replicas have synchronized. Advantage: to tolerate the failure of n replicas, only n+1 replicas are required. Disadvantage: high latency.

Kafka chooses the second scheme: the first occupies too many machine resources and causes a lot of data redundancy, while network latency has little effect on Kafka.

ISR selection strategy

After adopting the all-replica synchronization scheme, the ack timing is fixed: the leader receives the data and all followers start synchronizing it. But imagine this situation: one follower, because of some fault, lags and cannot synchronize with the leader, yet the leader must wait for it before sending the ack. How do we solve this? We introduce the concept of the ISR:

  • All replicas together are called the Assigned Replicas, or AR
  • The ISR is a subset of the AR. The leader maintains the ISR list. Followers always lag a little when synchronizing from the leader; a follower whose lag exceeds the threshold (set by the parameter replica.lag.time.max.ms ) is removed from the ISR and placed in the OSR (Out-of-Sync Replicas) list. A newly added follower also starts out in the OSR.
  • AR = ISR + OSR , that is, all replicas = available replicas + backup replicas .
  • The ISR list contains the leader plus the followers that are in sync with it. When the leader fails, the new leader is elected from the ISR.

Under this mechanism the ISR is always a dynamically stable set. When a message arrives, the leader writes it first and then the followers fetch it, so every replica in the ISR stays in sync; when the leader dies, a new leader can immediately be elected from the ISR to keep processing messages.

Failure handling mechanism [guaranteeing replica synchronization]

In the data reliability section we saw how partitions and replicas, together with the dynamic ISR and the ack mechanism, keep messages reliable. Now let's dig into the next question: when a failure does occur, how do we restore the cluster to normal?

Basic concepts of HW&LEO

Two offsets drive failure recovery:

  • LEO (Log End Offset): the largest offset in each replica's log, i.e. how far that replica has written.
  • HW (High Watermark): the smallest LEO among the replicas in the ISR, which is also the largest offset visible to consumers. Everything at or below the HW is known to be replicated by the whole ISR.

Messages flow from the leader to the followers: each replica advances its own LEO as it appends, and the partition's HW advances to the minimum LEO in the ISR.
Kafka's replication is neither fully synchronous nor purely asynchronous. Fully synchronous replication requires every working follower to have copied a message before it is committed, which ties throughput to the slowest follower. In asynchronous replication, a follower copies from the leader in the background and a message counts as committed as soon as the leader has written it to its log; if the leader suddenly dies while a follower still lags, data is lost. Kafka's ISR strategy strikes a good balance between reliability and throughput [ replicate synchronously, but evict slow replicas ].

Failure synchronization mechanism

Let's look at how the ISR handles the cluster and its messages when different machines fail, split into follower failure and leader failure:

  • Follower failure : the failed follower is temporarily kicked out of the ISR. When it recovers, it reads the last HW recorded on its local disk, truncates the part of its log file above that HW, and synchronizes from the leader starting at the HW. (For example, a recovering follower whose log reaches offset 6 but whose recorded HW is 4 discards offsets 5 and 6 and refetches them from the leader.) Once the follower's LEO is greater than or equal to the partition's HW, that is, once it has caught up with the leader, it can rejoin the ISR.
  • Leader failure : a new leader is elected from the ISR. To keep the replicas consistent with each other, the remaining followers first truncate the parts of their log files above the HW and then synchronize from the new leader.

In short, the HW, the offset up to which all replicas are known to be in sync, is what everyone falls back to. But this only keeps the replicas consistent; it does not by itself prevent duplication or loss. Consider a data-duplication case when the leader goes down: with acks = -1, part of the ISR has completed synchronization when the leader hangs. As shown in the figure below, follower1 has synchronized messages 4 and 5, follower2 has only synchronized message 4, and follower2 is then elected leader; the producer, never having received the ack, resends, so message 4 ends up written twice.
[Figure: follower1 holds messages 4 and 5, follower2 holds only message 4 when the leader fails; follower2 becomes the new leader]
This is how data duplication occurs. The HW & LEO mechanism can only guarantee that replicas stay in sync; it cannot guarantee that data is neither duplicated nor lost. For that you must combine it with the ack level.

Leader election

When the leader hangs, we need to elect a new one, following this strategy: Kafka dynamically maintains an ISR for each partition in ZooKeeper; every replica in the ISR is in sync with the leader, and only members of the ISR can be elected leader.

Of course there is an extreme case: as long as the ISR contains at least one replica (the ISR includes the leader), Kafka can guarantee that committed messages are not lost; but if all the replicas of a partition are down, there is naturally no way to guarantee that. How should the leader be elected in that case? There are usually two options:

  • Wait for any replica in the ISR to recover and choose it as the leader [ high reliability ]
  • Select the first recovered replica (not necessarily in the ISR) as the leader [ high availability ]

If you insist on waiting for a replica from the ISR to recover, the partition may stay unavailable for a long time, and if every replica in the ISR is unrecoverable or has lost its data, the partition is never available again. If you pick the first replica to recover and it is not from the ISR, it may lack some committed messages, causing message loss. Older versions of Kafka defaulted to the second strategy (unclean.leader.election.enable=true); since version 0.11 the default is false, i.e. the first strategy, and this parameter lets you switch between the two.

Exactly Once semantics

Having covered the failure recovery mechanism that keeps replicas in sync and the ACK mechanism that keeps data reliable, let's discuss how to make data delivery idempotent.

  • Setting the server-side ACK level to -1 guarantees that no data is lost between the Producer and the Server: At Least Once semantics. At Least Once ensures data is not lost, but cannot ensure it is not duplicated.
  • Setting the server-side ACK level to 0 guarantees that each message is sent only once: At Most Once semantics. Data is never duplicated, but may be lost.

For truly important information, consumers demand that data is neither duplicated nor lost: Exactly Once semantics. Before version 0.11, Kafka could do nothing about this; it could only ensure data was not lost and leave downstream consumers to deduplicate globally. With multiple downstream applications, each has to deduplicate separately, which hurts performance badly.

Idempotence

Kafka 0.11 introduced a major feature: idempotence. Idempotence means that no matter how many times the Producer sends the same data to the server, the server persists only one copy. Idempotence combined with At Least Once semantics yields Kafka's Exactly Once semantics. That is: At Least Once + idempotence = Exactly Once.

To enable idempotence, simply set enable.idempotence to true in the Producer's configuration. Kafka's idempotence implementation essentially moves the deduplication that used to be done downstream up into the broker. A Producer with idempotence enabled is assigned a PID at initialization, and every message sent to a given Partition carries a Sequence Number. The Broker caches <PID, Partition, SeqNumber> and, when a message with the same key is submitted again, persists only one copy. But the PID changes on restart, and different Partitions have different keys, so idempotence alone cannot guarantee Exactly Once across partitions and sessions.
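A minimal sketch of enabling it in the Java client; the broker address and topic are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IdempotentProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Turning this on makes the client enforce acks=all and retries > 0,
        // while the broker deduplicates by <PID, Partition, SeqNumber>
        props.put("enable.idempotence", true);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value")); // placeholder topic
        }
    }
}
```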

Producer transactions

To achieve transactional delivery across partitions and sessions, and to survive the PID change caused by a restart, a globally unique Transaction ID is introduced and bound to the PID the Producer obtains. When the Producer restarts, it can recover its original PID through the ongoing Transaction ID. To manage transactions, Kafka introduced a new component, the Transaction Coordinator. The Producer obtains the status of the task corresponding to its Transaction ID by interacting with the Transaction Coordinator, which is also responsible for writing all transactions to an internal Kafka topic; even if the whole service restarts, the saved transaction state lets in-flight transactions be recovered and continued.
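A minimal sketch of the transactional API in the Java client; the transactional.id, broker address, and topic are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // The globally unique Transaction ID; lets the broker map a
        // restarted producer back to its original PID
        props.put("transactional.id", "my-transactional-id");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions(); // register with the Transaction Coordinator
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("my-topic", "k1", "v1")); // placeholder topic
            producer.send(new ProducerRecord<>("my-topic", "k2", "v2"));
            producer.commitTransaction(); // both messages become visible atomically
        } catch (Exception e) {
            // a real application should close() rather than abort on ProducerFencedException
            producer.abortTransaction(); // neither message becomes visible
        } finally {
            producer.close();
        }
    }
}
```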

To sum up

This blog walked through Kafka's producer strategies in detail: the partitioning mechanism, the data reliability mechanism, the failure recovery mechanism, and finally how Exactly Once message semantics are achieved. Most of Kafka's cleverness does seem concentrated on the producer side, and it takes some effort to understand, but the complexity pays for itself: Kafka trades sophisticated scheduling for savings in resources.

Part of the content is quoted from https://gitbook.cn/books/5ae1e77197c22f130e67ec4e/index.html
