Analysis of Kafka producer ack mechanism

Kafka has two very important configuration parameters: acks and min.insync.replicas. acks is a producer configuration parameter, and min.insync.replicas is a broker-side configuration parameter. Together they play a major role in preventing the producer from losing data. This article explains the meaning and usage of these two parameters. After reading it, you will understand:

  • Kafka partition replicas
  • What in-sync replicas (ISR) are
  • What the acks confirmation mechanism is
  • What the minimum number of in-sync replicas is
  • How acks=all and min.insync.replicas work together

Partition replicas

A Kafka topic can be divided into partitions, and each partition can be configured with multiple replicas through its replication factor (for example, the broker-side default default.replication.factor). Kafka has two types of partition replicas: the leader replica and follower replicas. When a partition is created, one replica is elected leader and the remaining replicas automatically become followers. Follower replicas do not serve producers: all write requests, and by default all read requests, must be sent to the broker that hosts the leader replica, and that broker is responsible for processing them. (Since Kafka 2.4, consumers can optionally be configured to fetch from a follower, but the default is still the leader.) A follower's main task is to asynchronously pull messages from the leader replica and write them to its own commit log, thereby staying synchronized with the leader.
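
The leader/follower split described above can be sketched with a small in-memory model. This is only an illustration, not Kafka's implementation: the Replica and Partition classes and their method names are hypothetical and exist only for this example.

```python
class Replica:
    """One copy of a partition's log, hosted on a single broker (hypothetical model)."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []  # the replica's local commit log


class Partition:
    """One leader replica plus follower replicas (hypothetical model)."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers

    def produce(self, message):
        # Producer writes are only ever appended to the leader's log.
        self.leader.log.append(message)
        return len(self.leader.log) - 1  # offset of the new message

    def follower_fetch(self, follower):
        # A follower pulls whatever it is missing from the leader.
        missing = self.leader.log[len(follower.log):]
        follower.log.extend(missing)
        return len(missing)

    def lags(self):
        # Number of messages each follower is behind the leader.
        return [len(self.leader.log) - len(f.log) for f in self.followers]


partition = Partition(Replica(1), [Replica(2), Replica(3)])
partition.produce("m0")
partition.produce("m1")

lag_before = partition.lags()   # both followers are behind: [2, 2]
partition.follower_fetch(partition.followers[0])
lag_after = partition.lags()    # only the follower that fetched caught up: [0, 2]
print(lag_before, lag_after)
```

Because followers only catch up when they fetch, a follower can be behind the leader at any given moment.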

A replication factor of 3 is a common choice (note that the broker default, default.replication.factor, is 1). With a replication factor of 3, each partition has exactly 1 leader replica and 2 follower replicas, as shown in the following figure:

[Figure: a partition with one leader replica and two follower replicas]

As mentioned above, the producer client writes only to the broker hosting the leader, and followers replicate the data asynchronously. Because Kafka is a distributed system, followers cannot always keep up with the leader in real time, so Kafka needs a way to determine whether a follower has caught up, that is, whether it has replicated the latest data. In other words, Kafka must define clearly under what conditions a follower replica counts as synchronized with the leader. This is the ISR (in-sync replica) mechanism described below.

In-sync replicas

In-sync replicas (ISR) are the replicas that are considered synchronized with the leader; any follower not in the ISR list is considered out of sync. Which replicas belong to the ISR? First, the leader replica is always in the ISR. Whether a follower replica is in the ISR depends on whether it is "synchronized" with the leader replica.

Important note: "whether a follower replica is synchronized with the leader replica" should be understood as follows:

(1) Synchronization here does not mean being fully caught up at every moment. A follower is not removed from the ISR list the instant it lags behind the leader.

(2) The broker-side parameter replica.lag.time.max.ms sets the maximum time a follower may lag behind the leader. The default is 10 seconds in older releases and 30 seconds since Kafka 2.5; this article uses 10 seconds in its examples. As long as the follower has caught up with the leader within the last 10 seconds, it is considered synchronized. So even if the follower is currently a few messages behind, it is not removed as long as it catches up within 10 seconds.

(3) If a follower replica is removed from the ISR list, it is added back once it catches up with the leader replica again, so the ISR is a dynamic list, not a static one.
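
The three points above can be sketched as a small function. This is a simplified model, not the broker's actual code: the function name, its arguments, and the "last caught-up timestamp" bookkeeping are assumptions made for illustration, and the 10-second limit is the value used in this article's examples.

```python
def in_sync_replicas(leader_id, last_caught_up_ms, now_ms, max_lag_ms=10_000):
    """Return the sorted broker ids that count as in sync.

    last_caught_up_ms maps a follower's broker id to the last time (in ms)
    at which that follower was fully caught up with the leader.
    max_lag_ms plays the role of replica.lag.time.max.ms.
    """
    isr = {leader_id}  # the leader is always in the ISR
    for broker_id, caught_up_at in last_caught_up_ms.items():
        # A follower stays in the ISR while its lag is within the limit.
        if now_ms - caught_up_at <= max_lag_ms:
            isr.add(broker_id)
    return sorted(isr)


followers = {2: 97_000, 3: 85_000}   # follower 3 last caught up 15 s before "now"
isr_now = in_sync_replicas(1, followers, now_ms=100_000)
print(isr_now)    # [1, 2]

followers[3] = 104_000               # follower 3 catches up again
isr_later = in_sync_replicas(1, followers, now_ms=105_000)
print(isr_later)  # [1, 2, 3]
```

The second call shows the dynamic nature of the list: once follower 3 catches up, it is back in the ISR.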

[Figure: the partition1 replica on Broker3 lags behind the leader and is removed from the ISR]

As shown in the figure above, the partition1 replica on Broker3 has failed to synchronize with the leader replica within the allowed time, so it is removed from the ISR list. The ISR at this point is [1,2].

acks confirmation mechanism

The acks parameter specifies how many partition replicas must receive a message before the producer considers the write successful. This parameter has a major influence on whether messages can be lost. It can be set as follows:

  • acks=0: the producer does not wait for any response from the server and treats the message as written as soon as it is sent. If something goes wrong and the server never receives the message, the producer has no way of knowing, and the message is lost. Because this setting does not wait for a server response, messages can be sent at the maximum speed the network supports, which gives high throughput.

  • acks=1: as soon as the partition's leader replica receives the message, it sends a success response (ack) to the producer, and the producer then considers the message written. If the message cannot be written to the leader replica (for example, because of network problems or a leader crash), the producer receives an error response and, to avoid data loss, resends the message. The throughput of this mode depends on whether messages are sent synchronously or asynchronously.

    Important note: data can still be lost with acks=1, even when the producer retries on errors. For example, if the leader crashes after acknowledging a message but before any follower has replicated it, a replica that never received the message becomes the new leader and the message is lost.

  • acks=all (equivalent to acks=-1): the producer receives a success response only after all replicas participating in replication (the replicas in the ISR list) have received the message. This is the safest mode, because the message has been received by more than one broker as long as the ISR contains more than one replica. Latency in this mode is the highest of the three.
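
The three settings above can be compared with a minimal sketch. It is a toy model rather than real producer or broker code: the function names, the string return values, and the convention that the first entry of the ISR list is the leader are all assumptions made for this example.

```python
def send(acks, isr, replicated_to):
    """Decide what the producer learns about one message.

    isr: broker ids currently in sync, leader first (assumed convention).
    replicated_to: broker ids that already hold the message.
    Returns "no-wait", "ack" or "pending".
    """
    leader = isr[0]
    if acks == "0":
        return "no-wait"  # fire and forget: no response is awaited
    if acks == "1":
        return "ack" if leader in replicated_to else "pending"
    if acks == "all":
        # Every replica in the ISR must hold the message.
        return "ack" if set(isr) <= set(replicated_to) else "pending"
    raise ValueError(f"unsupported acks value: {acks!r}")


def safe_if_leader_crashes(isr, replicated_to):
    """True if every replica that could become the new leader holds the message."""
    candidates = isr[1:]
    return bool(candidates) and all(b in replicated_to for b in candidates)


isr = [1, 2, 3]
only_leader = {1}          # followers have not fetched the message yet
everyone = {1, 2, 3}

ack1 = send("1", isr, only_leader)                 # "ack": leader alone is enough
ack_all_early = send("all", isr, only_leader)      # "pending": followers are missing
ack_all_done = send("all", isr, everyone)          # "ack"

# With acks=1 the message was acknowledged, yet it is lost if the leader crashes now.
lost_risk = not safe_if_leader_crashes(isr, only_leader)   # True
print(ack1, ack_all_early, ack_all_done, lost_risk)
```

The last line captures the trade-off: acks=1 acknowledges sooner, but at that moment the message exists only on the leader.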

Minimum in-sync replicas

As mentioned above, with acks=all every replica in the ISR must have the message before a success response is sent to the producer. This raises a question: what happens if the leader replica is the only in-sync replica? In that case acks=all is equivalent to acks=1, so it is not safe.

To address this, the broker side provides the parameter min.insync.replicas, which sets the minimum number of in-sync replicas that must have a message for an acks=all write to count as a "real" write. Its default value is 1; in production it should be set to a value greater than 1 to improve message durability. If the number of in-sync replicas is lower than the configured value, the producer receives an error response (NotEnoughReplicasException in the Java client) instead of an acknowledgement, so a message is never acknowledged without being sufficiently replicated.
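
As a rough sketch of this rule, the check can be written as a guard that runs before a write is accepted. The function and the exception class below are stand-ins invented for this example (the exception is only loosely modeled on the error a real producer would see), not Kafka APIs.

```python
class NotEnoughReplicas(Exception):
    """Stand-in for the error returned to the producer (name is illustrative)."""


def check_write(acks, isr, min_insync_replicas=1):
    """Reject an acks=all write when the ISR is smaller than the minimum."""
    if acks == "all" and len(isr) < min_insync_replicas:
        raise NotEnoughReplicas(
            f"ISR size {len(isr)} is below min.insync.replicas={min_insync_replicas}"
        )
    return True


# Default of 1: a leader-only ISR is accepted, so acks=all behaves like acks=1.
accepted_with_default = check_write("all", isr=[1], min_insync_replicas=1)

# With a minimum of 2, the same write is rejected instead of silently accepted.
try:
    check_write("all", isr=[1], min_insync_replicas=2)
    rejected = False
except NotEnoughReplicas as err:
    rejected = True
    print("write rejected:", err)
```

In this model the minimum only applies to acks=all; writes with acks=0 or acks=1 pass the guard regardless of the ISR size.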

Case 1

As shown in the figure below, with min.insync.replicas=2 and acks=all, suppose the ISR list is [1,2] because replica 3 has been removed. The producer receives a success response as soon as these two replicas have the message.

[Figure: Case 1, acks=all with min.insync.replicas=2 and two replicas in the ISR]

Case 2

As shown in the figure below, with min.insync.replicas=2, suppose the ISR list is [1] because replicas 2 and 3 have been removed. With acks=all, the write fails; with acks=0 or acks=1, data can still be written successfully.

[Figure: Case 2, min.insync.replicas=2 with only the leader in the ISR]

Case 3

This case is easy to misunderstand. With acks=all and min.insync.replicas=2, suppose the ISR list is [1,2,3]. The broker still waits until all three in-sync replicas have the message before sending a success response to the producer. min.insync.replicas=2 is only a lower bound: if the number of in-sync replicas falls below it, an exception is thrown. acks=all, on the other hand, requires every replica in the ISR list to have the message before a success response is sent. As shown below:

[Figure: Case 3, acks=all with min.insync.replicas=2 and all three replicas in the ISR]
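
The three cases can be walked through with a small table-driven sketch. As before, this is an illustrative model under stated assumptions: the outcome function and its return shape are invented for this example, and the first entry of each ISR list is assumed to be the leader.

```python
def outcome(acks, isr, min_insync_replicas):
    """Return (accepted, replicas_that_must_confirm) for one write."""
    if acks == "all":
        if len(isr) < min_insync_replicas:
            return (False, [])        # rejected: too few in-sync replicas
        return (True, list(isr))      # every ISR member, not just the minimum
    if acks == "1":
        return (True, isr[:1])        # the leader only
    return (True, [])                 # acks=0: nobody has to confirm


cases = {
    "case 1": outcome("all", [1, 2], 2),
    "case 2 (acks=all)": outcome("all", [1], 2),
    "case 2 (acks=1)": outcome("1", [1], 2),
    "case 3": outcome("all", [1, 2, 3], 2),
}
for name, result in cases.items():
    print(name, result)
```

Case 3 is the one to note: although min.insync.replicas is 2, all three ISR members must confirm the message, because the minimum is only a lower bound on the ISR size.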

Summary

acks=0: the producer does not wait for any response from the server and treats the message as written as soon as it is sent.

acks=1: as soon as the partition's leader replica receives the message, it sends a success response (ack) to the producer.

acks=all: the producer receives a success response only after all replicas participating in replication (the replicas in the ISR list) have received the message. If the number of in-sync replicas is less than min.insync.replicas, the message is not written and the producer receives an error.

Origin blog.csdn.net/jmx_bigdata/article/details/107142748