Apache Kafka message delivery reliability analysis

If a message queue does not have a transaction-like structure and guarantee similar to a database, it cannot achieve 100% reliable message delivery: in extreme cases, messages are either lost or duplicated.

Let's analyze what happens in Kafka from the perspective of producer, broker, and consumer:

1. The producer sends a message to the Broker

There are currently three acknowledgment modes for the producer, controlled by request.required.acks (a producer configuration sketch follows the list below).

acks = 0: The producer does not wait for an ack from the broker. If the send times out on the network or the broker crashes (for example, the partition leader has not yet committed the message, or the leader and the followers are out of sync), the message may be lost or resent.

acks = 1: The leader sends an ack as soon as it has received the message; a failed send is retried, and the probability of loss is small (the message can still be lost if the leader fails before the followers have replicated it).

acks = -1: The ack is sent only after all followers have successfully replicated the message. The possibility of losing messages is the lowest.
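For reference, here is a minimal sketch of setting the acknowledgment mode with the Java producer client (the broker address localhost:9092 and the topic name demo-topic are assumptions for illustration); newer clients spell the setting acks instead of request.required.acks:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AcksExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // "acks" replaces the old request.required.acks setting:
        //   "0"   -> fire and forget, messages may be lost
        //   "1"   -> leader ack only, small window for loss
        //   "all" -> same as -1, wait for all in-sync replicas, lowest risk of loss
        props.put("acks", "all");
        props.put("retries", "3"); // resend on transient failures (may produce duplicates)

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("demo-topic", "key", "value");
            RecordMetadata meta = producer.send(record).get(); // block until the ack arrives
            System.out.printf("written to partition %d at offset %d%n",
                    meta.partition(), meta.offset());
        }
    }
}
```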

 

2. The consumer pulls messages from the Broker

There are two consumer interfaces in Kafka, namely the Low-level API and the High-level API.

(1). Low-level API: SimpleConsumer

 

This set of interfaces is relatively complex, and the user has to handle many details themselves; the advantage is complete control over how Kafka is consumed, including when offsets are committed.

(2). High-level API: ZookeeperConsumerConnector

 

The high-level API is relatively simple to use: it encapsulates partition and offset management and, by default, commits offsets automatically at a fixed interval. This auto-commit may cause data loss, because the consumer may crash after fetching data but before processing it, while its offset has already been committed. In short, the high-level API offers automatic management and ease of use, but less flexible control over Kafka; a sketch of working around the auto-commit problem follows.
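The old high-level consumer has since been replaced by the unified KafkaConsumer client; the following is a minimal sketch (the broker address localhost:9092, group demo-group, and topic demo-topic are illustrative assumptions) of turning off auto-commit and committing offsets only after processing, which avoids the loss scenario described above:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "demo-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Disable periodic auto-commit so an offset is only committed after
        // the corresponding record has actually been processed.
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // If the consumer crashes here, the offset has not been committed,
                    // so the record will be redelivered (at-least-once semantics).
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                consumer.commitSync(); // commit only after processing succeeded
            }
        }
    }
}
```

The trade-off is the one described above: manual commits avoid losing unprocessed records, but a crash between processing and committing means the same records are consumed again.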

 

3. Broker Analysis

 

    (1). For the broker, data that has already been written to disk will not be lost unless the disk itself fails.

    (2). Data that is still in memory (not yet flushed to disk) will be lost if the broker goes down before it is flushed.
        You can control the flush frequency through log.flush.interval.messages and log.flush.interval.ms; flushing more often hurts performance.
        Since version 0.8.x, the replica mechanism can also be used to ensure that data is not lost, at the cost of more resources, especially disk; Kafka currently supports GZip and Snappy compression to alleviate this.
        Whether to use replicas is a balance between reliability and resource cost (see the sketch after this list).
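As an illustration of the replica trade-off, below is a minimal sketch using the Java AdminClient to create a replicated topic with per-topic flush overrides (the broker address, topic name, and chosen values are assumptions; flush.messages and flush.ms are the topic-level counterparts of log.flush.interval.messages and log.flush.interval.ms):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class ReliableTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, String> configs = new HashMap<>();
            configs.put("flush.messages", "10000");  // fsync after every 10,000 messages
            configs.put("flush.ms", "1000");         // ...or at least once per second
            configs.put("min.insync.replicas", "2"); // pair with acks=-1/all on the producer

            // 3 partitions, replication factor 3: data survives the loss of a single broker
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 3).configs(configs);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```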

 

 

Summary

 

Kafka can only guarantee at-least-once message semantics, which means data may be duplicated; applications need to tolerate this, for example by making processing idempotent (see the sketch below).
For Kafka consumers, it is generally recommended to use the high-level API; it is best not to use the low-level API directly, since it is more troublesome and error-prone to implement yourself.
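As a purely illustrative sketch (not part of the original analysis), one common way to tolerate duplicate deliveries is to key processing on a business identifier and skip identifiers that have already been handled; a real system would persist this state, or use naturally idempotent writes such as database upserts, instead of an in-memory set:

```java
import java.util.HashSet;
import java.util.Set;

public class IdempotentHandler {
    // In-memory only for illustration; production code would use durable storage.
    private final Set<String> processedKeys = new HashSet<>();

    public void handle(String key, String value) {
        if (!processedKeys.add(key)) {
            return; // duplicate delivery of an already-processed message: ignore it
        }
        System.out.println("processing " + key + " -> " + value);
    }
}
```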
