In-depth understanding of data reliability guarantees in the Kafka architecture

To guarantee that data sent by a producer reliably reaches the specified topic, each partition of the topic must send an ack (acknowledgement) back to the producer after receiving the data. If the producer receives the ack, it sends the next round of data; otherwise it resends the current data.

1. Ack response mechanism

Kafka offers three reliability levels, letting users trade off reliability against latency according to their requirements.

  1. acks = 0
    The producer does not wait for the broker's ack. This gives the lowest latency: the broker returns as soon as it receives the message, before writing it to disk. If the broker fails, data may be lost.

  2. acks = 1
    The producer waits for the broker's ack; the partition leader returns the ack after it has successfully written to disk. If the leader fails before the followers finish synchronizing, data will be lost.

  3. acks = -1 (all)
    The producer waits for the broker's ack; the ack is returned only after the leader and all the followers of the partition have written to disk. If the leader fails after the followers finish synchronizing but before the broker sends the ack, the data will be duplicated. In extreme cases data can still be lost: if the ISR shrinks until it contains only the leader (because every other follower lagged longer than replica.lag.time.max.ms and was kicked out of the ISR), the leader returns the ack as soon as it has written to disk, and if it then crashes, that data is gone.

The "followers" referred to above are the replicas in the ISR (in-sync replica set). A minimal configuration sketch covering the three levels follows.
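As a concrete illustration, here is a minimal producer configuration sketch showing where the acks level is set. The broker address and topic name are placeholders, not values from the original article:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // The three reliability levels described above:
        //   "0"   -> do not wait for any ack (lowest latency, loss possible)
        //   "1"   -> wait for the leader's ack only
        //   "all" -> equivalent to -1; wait for the leader and the ISR followers
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, 3); // resend when no ack arrives, as described above

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```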

2. Exactly Once

  1. At Least Once
    Setting the ACK level to -1 guarantees that no data is lost between the Producer and the Server; this is At Least Once semantics.

  2. At Most Once
    Setting the ACK level to 0 guarantees that the producer sends each message only once; this is At Most Once semantics.

At Least Once guarantees that data is not lost but not that it is free of duplicates; conversely, At Most Once guarantees no duplicates but cannot guarantee that data is not lost. For particularly important information, however, the data must be neither duplicated nor lost, which is Exactly Once semantics. Before version 0.11, Kafka could do nothing about this: it could only ensure that data was not lost and leave global deduplication to downstream consumers. With multiple downstream applications, each had to deduplicate globally on its own, which severely hurt performance. Kafka 0.11 introduced a major feature to address this: idempotence.

  1. Idempotence
    Idempotence means that no matter how many times the Producer sends the same data to the server, the server persists only one copy. Idempotence combined with At Least Once semantics constitutes Kafka's Exactly Once semantics: At Least Once + idempotence = Exactly Once. To enable idempotence, set the Producer parameter enable.idempotence to true, as in the snippet after this list.
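Extending the props object from the configuration sketch above, enabling idempotence is a single setting (a hedged sketch; the constant below is the Java client's name for the enable.idempotence key):

```java
// Enable the idempotent producer (Kafka >= 0.11).
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
// The idempotent producer requires acks=all, matching the At Least Once setting above.
```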

Kafka implements idempotence by moving the deduplication that used to be done downstream up into the broker. A Producer with idempotence enabled is assigned a PID when it initializes, and every message it sends to a given Partition carries a Sequence Number. The Broker caches <PID, Partition, SeqNumber>; when a message arrives whose primary key matches a cached entry, the Broker persists only one copy.
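To make the <PID, Partition, SeqNumber> mechanism concrete, here is a simplified sketch of the deduplication idea. It illustrates the concept only; it is not Kafka's actual broker code:

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch: a broker-side cache keyed by <PID, Partition> that
// remembers the highest sequence number already persisted, so a resend
// of the same message can be recognized and dropped.
public class DedupSketch {
    private final Map<String, Long> lastSeq = new HashMap<>();

    /** Returns true if the record is new and should be persisted, false if it is a duplicate. */
    public boolean shouldPersist(long pid, int partition, long seqNumber) {
        String key = pid + ":" + partition;
        Long last = lastSeq.get(key);
        if (last != null && seqNumber <= last) {
            return false; // already persisted once; drop the duplicate
        }
        lastSeq.put(key, seqNumber);
        return true;
    }
}
```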

If the Producer restarts, its PID changes, so idempotence cannot guarantee Exactly Once across sessions. Similarly, suppose the Producer is sending messages to several partitions and crashes just as the data reaches the last partition: after the restart the PID has changed, so the data still has to be resent to the broker, causing inconsistency across partitions. Idempotence therefore cannot guarantee Exactly Once across partitions either; it only guarantees Exactly Once within a single partition and a single session.

3. Producer transactions

As shown above, idempotence cannot guarantee Exactly Once across partitions and sessions. Since version 0.11 Kafka has therefore supported transactions. Transactions let production and consumption with Exactly Once semantics span partitions and sessions: the operations either all succeed or all fail.

To achieve cross-partition, cross-session transactions, a globally unique Transaction ID is introduced and bound to the PID the Producer obtains. When the Producer restarts, it can recover its original PID through the Transaction ID. To manage transactions, Kafka introduced a new component, the Transaction Coordinator. The Producer obtains the state of the transaction corresponding to its Transaction ID by interacting with the Transaction Coordinator. The Transaction Coordinator is also responsible for writing all transaction state to an internal Kafka topic, so that even if the whole service restarts, the saved state allows in-progress transactions to be recovered and continued.
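A minimal sketch of a transactional producer using the standard Java client, following the usage pattern the client documents; the transactional.id, topic names, and broker address here are placeholder assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // The globally unique Transaction ID described above; it survives producer restarts.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-transactional-id");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Registers the Transaction ID with the Transaction Coordinator and,
        // on a restart, recovers the PID bound to it.
        producer.initTransactions();
        try {
            producer.beginTransaction();
            // Sends to different topics/partitions commit or abort together.
            producer.send(new ProducerRecord<>("topic-a", "key", "value-1"));
            producer.send(new ProducerRecord<>("topic-b", "key", "value-2"));
            producer.commitTransaction();
        } catch (ProducerFencedException e) {
            // Another producer with the same transactional.id has started; this one must stop.
            producer.close();
            return;
        } catch (KafkaException e) {
            producer.abortTransaction(); // everything sent in this transaction is discarded
        }
        producer.close();
    }
}
```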

Origin: blog.csdn.net/qq_42599616/article/details/107216573