Apache Kafka (5) - Safe Kafka Producer

Kafka Safe Producer

When building a Kafka application, we need to consider what happens when a failure occurs (e.g., a network fault): messages in transit may be lost, reordered, or duplicated.

To handle these cases, we can create a "safe producer" that guards against such problems. Below we first introduce and explain the relevant configuration options, and then finish with an example configuration.

 

1. acks in Detail

We previously introduced the three acks modes of the Kafka producer; below we examine each of them in more detail:

1.1. acks=0 (no acks)

Using acks=0 means:

  • After sending a message, the producer does not wait for any response
  • If the broker goes offline or fails, we will not know about it and will lose data, because the broker never returns a response to the producer

acks=0 works as shown in the figure below: the producer sends messages without waiting for any ack:

 

acks=0 is generally used in scenarios where some data loss is acceptable, e.g.:

  • Metrics collection
  • Log collection (occasionally losing a few log entries does not matter)
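As an illustration, here is a minimal sketch (not from the original article) of such a fire-and-forget producer for log collection; the broker address and the topic name "logs" are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FireAndForgetProducer {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.setProperty(ProducerConfig.ACKS_CONFIG, "0"); // no broker acknowledgement at all

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(properties)) {
            // If the broker is down, this record is silently lost -- acceptable for logs
            producer.send(new ProducerRecord<>("logs", "an occasional log line"));
        }
    }
}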

 

1.2. acks=1 (leader acks)

Using acks=1 means:

  • The producer waits for the leader's response to confirm that the message was received; however, there is no guarantee that replication has completed (replication happens in the background)
  • If the producer does not receive an ack, it may retry

acks=1 works as shown in the figure below: the producer waits for an ack for each message:

 

  • If the leader broker goes offline or fails before the data has been replicated, we will lose data
  • This is the default mode

 

1.3. acks=all (replica acks)

Using acks=all means:

  • The producer needs acks from the leader and the replicas
  • This adds latency, but gives a stronger guarantee that data will not be lost
  • If enough replicas are in sync, no data will be lost

acks=all works as shown below: every replica must reply with an ack for a write to be acknowledged:

 

 

If you cannot afford to lose any data, this is the setting to consider.

When acks=all (i.e., replica acks) is set, it must be used together with another parameter, min.insync.replicas:

  • min.insync.replicas can be set at the broker level or at the topic level (the topic-level setting overrides the broker-level one)
  • min.insync.replicas=2 means that at least two brokers in the ISR (in-sync replicas), including the leader, must acknowledge the data; otherwise an error is returned. Setting this parameter to 2 is the most common configuration.

Assuming replication.factor=3, min.insync.replicas=2, and acks=all, at most one broker can fail; if more fail, the producer will receive an error when sending data.

For example, with three brokers and min.insync.replicas=2, if two of the brokers fail, the producer receives a NOT_ENOUGH_REPLICAS error.
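As a sketch (not from the original article; the topic name and broker address are placeholders), min.insync.replicas can be set as a topic-level override when creating the topic with the AdminClient:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateSafeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication.factor = 3
            NewTopic topic = new NewTopic("my-topic", 3, (short) 3);
            // topic-level override of the broker default
            topic.configs(Collections.singletonMap("min.insync.replicas", "2"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}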

 

2. Producer Retry

To prevent transient errors (for example NotEnoughReplicasException) from affecting the whole application, we generally need to handle such exceptions to avoid data loss. The producer has a retries configuration for this; the default is 0, and it can be raised manually, up to Integer.MAX_VALUE.

By default, retries can cause messages to be sent out of order: a message that fails to transmit is re-queued and sent again, which can reorder it relative to other messages.

This is especially serious for key-based ordering: all messages with the same key go to the same partition, so if a message is re-queued and retransmitted, ordering within that partition can be scrambled for that key.

To control this, we can set the parameter max.in.flight.requests.per.connection, which limits how many produce requests can be in flight in parallel at the same time:

  • The default is 5
  • To ensure messages stay strictly ordered even across retries, you can set this parameter to 1 (though this may reduce throughput); see the sketch below

However, Kafka >= 1.0.0 offers a better solution for this scenario, which is covered below.
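For clients older than 1.0.0, the strict-ordering settings discussed above might look like the following sketch (the values follow the text; properties is assumed to be the producer configuration object):

properties.setProperty(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE)); // retry transient errors essentially indefinitely
properties.setProperty(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "100"); // wait between retries (100 ms is the default)
properties.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1"); // one request at a time: no reordering, lower throughput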

 

3. Idempotent Producer

Retransmission introduces a common problem: due to network issues, the producer can introduce duplicate messages into Kafka.

As shown below:

 

A normal request proceeds as follows:

  1. The producer sends a message to Kafka
  2. Kafka commits the message
  3. Kafka sends an ack back to the producer

However, a request that produces a duplicate message proceeds as follows:

  1. The producer sends a message to Kafka
  2. Kafka commits the message
  3. Kafka sends an ack back to the producer, but due to network problems the ack never reaches the producer
  4. After a timeout, the producer retransmits the message
  5. Kafka commits the duplicate message and returns another ack to the producer

From the producer's perspective, it simply sent one message normally, because it received an ack. From Kafka's perspective, it received two messages, so it committed twice.

From Kafka >= 0.11 onward, you can define an "idempotent producer" that solves the duplicate messages caused by network problems, as shown below:

For an idempotent producer, the handling of a duplicated request is:

  1. The producer sends a message
  2. Kafka receives the message and commits it
  3. Kafka sends an ack back, which does not reach the producer due to network problems
  4. The producer retries sending the message; in versions >= 0.11, the message carries a produce request id
  5. When Kafka receives the message, it compares the produce request id, recognizes the message as a duplicate, does not commit it again, and sends back another ack

An idempotent producer thus guarantees a stable, duplicate-free pipeline.

The parameters set implicitly alongside an idempotent producer are:

  1. retries = Integer.MAX_VALUE (2^31-1 = 2147483647), i.e., the producer will essentially retry indefinitely on errors
  2. max.in.flight.requests=1 (Kafka >= 0.11 & < 1.1); in these versions, setting max.in.flight.requests > 1 can still produce out-of-order data
  3. or max.in.flight.requests=5 (Kafka >= 1.1, for high performance); in Kafka 1.1 and later, max.in.flight.requests=5 preserves ordering while keeping the high performance of parallel requests

 

To configure an idempotent producer, you only need to set a parameter like the following:

properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
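Note that idempotence constrains the related settings: if you also set an explicitly conflicting value, the client refuses to start. A small illustrative sketch (not from the original article) of such a misconfiguration:

properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
properties.setProperty(ProducerConfig.ACKS_CONFIG, "1"); // conflicts: idempotence requires acks=all
// new KafkaProducer<>(properties) would now fail with a ConfigException at construction time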

 

4. Safe Producer Configuration Summary

We have now covered the configuration needed to create a safe producer; below is a summary of the required settings for different Kafka versions:

Kafka < 0.11

  • acks=all (producer level): ensures the data has been replicated before an ack is sent
  • min.insync.replicas=2 (broker/topic level): ensures at least two brokers in the ISR have the data before an ack is returned
  • retries=MAX_INT (producer level): ensures transient problems are retried essentially indefinitely
  • max.in.flight.requests.per.connection=1 (producer level): ensures only one request is in flight at a time, preventing out-of-order messages on retry

 

Kafka >= 0.11

  • enable.idempotence=true (producer level) + min.insync.replicas=2 (broker/topic level)
    • This implies acks=all, retries=MAX_INT, and max.in.flight.requests.per.connection=5 (default)
    • This preserves message ordering while also improving performance

It must be mentioned that running a "safe producer" may affect the system's throughput and latency, so before applying it to a production system, test it first to assess the impact.

 

5. Safe Producer Example

Following the earlier steps, we start a Kafka producer written in Java and inspect the configuration it prints; some of the default parameters are:

acks = 1
enable.idempotence = false
max.in.flight.requests.per.connection = 5
retries = 2147483647

 

Now we explicitly add the following parameters:

// create a safe Producer
properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
properties.setProperty(ProducerConfig.ACKS_CONFIG, "all");
properties.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");
properties.setProperty(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
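Building on these settings, a minimal send with a callback might look like the sketch below (the topic name "first_topic" and the message are placeholders, not from the original article):

KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
producer.send(new ProducerRecord<>("first_topic", "hello kafka"), (metadata, exception) -> {
    if (exception != null) {
        // we land here only when retries are exhausted or the error is non-retriable
        exception.printStackTrace();
    } else {
        System.out.println("partition=" + metadata.partition() + ", offset=" + metadata.offset());
    }
});
producer.flush();
producer.close();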

 

Inspecting the printed producer configuration again, we can see that these explicit values have taken effect.

The above covers the configuration and an example of creating a safe producer. In a real production environment, be sure to first test the safe producer's impact on application throughput and latency, and then decide whether any of the parameters need adjustment.

 
