Kafka Idempotence: Principles and Analysis

1. Overview

Recently, some readers told me that in Kafka interviews they could answer in detail when asked about Kafka's components, API usage, and the principles and roles of the Consumer and Producer. However, they got stuck on a topic that usually gets little attention: Kafka idempotence. So today I will analyze the principle and implementation of Kafka idempotence.

2. Content

2.1 Why does Kafka need idempotence?

When a Producer sends messages, it may send duplicates: when the Producer's retry mechanism is triggered, the same message can be transmitted more than once. With idempotence, a message that is sent repeatedly results in only one valid message. As a distributed messaging system, Kafka is commonly used in distributed systems such as message push systems and business platform systems (logistics platforms, bank clearing platforms, and so on). Take a bank clearing platform: the upstream business side reports data to the clearing platform, and if the same data is calculated or processed multiple times, the consequences can be very serious.

2.2 What factors affect Kafka idempotence?

When using Kafka, it is often necessary to guarantee Exactly-Once semantics. Distributed systems are subject to many uncontrollable factors, such as network problems, OOM, and Full GC. While the Kafka Broker is acknowledging a message (Ack), a network anomaly, Full GC, OOM, or a similar problem can cause the Ack to time out, and the Producer will then resend the message. Possible scenarios are as follows:

 

 

2.3 How does Kafka implement idempotence?

To implement idempotence, Kafka introduces two concepts in its underlying architecture: ProducerID and SequenceNumber. What are these two concepts used for?

  • ProducerID: each new Producer is assigned a unique ProducerID at initialization. This ProducerID is not visible to the client user.
  • SequenceNumber: for each ProducerID, every Topic Partition to which the Producer sends data has its own SequenceNumber, which starts at zero and increases monotonically.
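
To make the two concepts concrete, here is a minimal sketch of the client-side bookkeeping they imply. The class SequenceAssigner and its methods are hypothetical names invented for illustration, not part of the Kafka client, and the model is simplified to one SequenceNumber per message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; these are not real Kafka client classes.
class SequenceAssigner {
    private final long producerId;                                 // assigned once at initialization
    private final Map<String, Integer> nextSeq = new HashMap<>();  // key: "topic-partition"

    SequenceAssigner(long producerId) {
        this.producerId = producerId;
    }

    long producerId() {
        return producerId;
    }

    // Returns the SequenceNumber for the next message sent to this Topic Partition.
    // Each partition has its own counter, starting at 0 and increasing by 1.
    int next(String topic, int partition) {
        String key = topic + "-" + partition;
        int seq = nextSeq.getOrDefault(key, 0);
        nextSeq.put(key, seq + 1);
        return seq;
    }

    public static void main(String[] args) {
        SequenceAssigner assigner = new SequenceAssigner(1000L);
        System.out.println(assigner.next("orders", 0)); // 0
        System.out.println(assigner.next("orders", 0)); // 1
        System.out.println(assigner.next("orders", 1)); // 0 (separate counter per partition)
    }
}
```

The pair (ProducerID, SequenceNumber) identifies a message within a partition, which is what allows the Broker to recognize a resend.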

2.3.1 What was the problem before idempotence was introduced?

Before Kafka introduced idempotence, the Producer sent a message to the Broker, the Broker appended the message to the message stream, and then the Broker returned an Ack to the Producer. The process is as follows:

 

The figure above shows message delivery in the ideal case. In practice, various uncertain factors arise, such as a network anomaly while the Producer is communicating with the Broker. The abnormal case looks like this:

 

In the figure above, the Producer sends message (x2, y2) to the Broker for the first time. The Broker appends (x2, y2) to the message stream, but the Ack fails to reach the Producer (for example, because of a network error). The Producer's retry mechanism is then triggered and it resends (x2, y2) to the Broker. The Broker receives the message, appends it to the message stream again, and this time successfully returns an Ack to the Producer. As a result, the message stream contains two copies of the same message (x2, y2).
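
The duplicate described above can be reproduced with a small in-memory simulation. This is only a sketch: NaiveBroker and sendWithRetry are hypothetical names, and a lost Ack is modeled by a boolean flag rather than a real network failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a Broker without idempotence support.
class NaiveBroker {
    final List<String> log = new ArrayList<>();

    // Appends the message, then reports whether the Ack reached the Producer.
    // ackDelivered = false models a network error on the return path.
    boolean append(String message, boolean ackDelivered) {
        log.add(message);
        return ackDelivered;
    }

    // Producer-side retry: resend until an Ack is received.
    // ackOutcomes[i] says whether the Ack of attempt i is delivered.
    static void sendWithRetry(NaiveBroker broker, String message, boolean... ackOutcomes) {
        for (boolean ackDelivered : ackOutcomes) {
            if (broker.append(message, ackDelivered)) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        NaiveBroker broker = new NaiveBroker();
        sendWithRetry(broker, "(x1, y1)", true);
        sendWithRetry(broker, "(x2, y2)", false, true); // first Ack lost, retry succeeds
        System.out.println(broker.log); // [(x1, y1), (x2, y2), (x2, y2)]
    }
}
```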

2.3.2 How does idempotence solve the problem?

To address this problem, Kafka introduced idempotence. So how does idempotence solve the problem of messages being sent repeatedly? Consider the following flow:

 

Again, this is message delivery in the ideal case. In practice there are many uncertain factors, such as a network anomaly that causes the Broker's Ack to the Producer to fail. The abnormal case is shown below:

 

When the Producer sends message (x2, y2) to the Broker, the Broker receives it and appends it to the message stream. The Broker then returns an Ack to the Producer, but an anomaly prevents the Producer from receiving it. The Producer's retry mechanism is triggered and (x2, y2) is sent again. However, with idempotence each message carries a PID (ProducerID) and a SequenceNumber. The resent message arrives with the same PID and SequenceNumber as the message the Broker has already recorded, so the Broker does not append it again. Only one (x2, y2) ends up in the message stream, and no duplicate occurs.
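
The check described above can be sketched as follows. This is a simplified model under stated assumptions, not the Broker's actual code: DedupBroker and its Result enum are hypothetical, only a single partition is modeled, and the Broker is assumed to remember just the last SequenceNumber it appended per PID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of Broker-side deduplication for one partition.
class DedupBroker {
    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    final List<String> log = new ArrayList<>();
    // Last SequenceNumber appended for each PID.
    private final Map<Long, Integer> lastSeq = new HashMap<>();

    Result append(long pid, int seq, String message) {
        int last = lastSeq.getOrDefault(pid, -1);
        if (seq <= last) {
            return Result.DUPLICATE;     // already in the log: acknowledge, do not append
        }
        if (seq != last + 1) {
            return Result.OUT_OF_ORDER;  // gap: an earlier message is missing
        }
        log.add(message);
        lastSeq.put(pid, seq);
        return Result.APPENDED;
    }

    public static void main(String[] args) {
        DedupBroker broker = new DedupBroker();
        long pid = 1000L;
        broker.append(pid, 0, "(x1, y1)");
        broker.append(pid, 1, "(x2, y2)");                      // appended, but assume the Ack is lost
        System.out.println(broker.append(pid, 1, "(x2, y2)"));  // DUPLICATE
        System.out.println(broker.log);                         // [(x1, y1), (x2, y2)]
    }
}
```

Because the retry carries the same PID and SequenceNumber as the first attempt, the second append is recognized and skipped, so the log holds a single (x2, y2).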

2.3.3 How is the ProducerID generated?

When the client creates a Producer, it is instantiated with code like the following:

// Instantiate a Producer object
Producer<String, String> producer = new KafkaProducer<>(props);
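
The post does not show the props object passed to the constructor. A minimal sketch of what it might contain when idempotence is wanted is given below, using only java.util.Properties and the producer's string configuration keys. The helper class IdempotentProducerProps and the address localhost:9092 are placeholders assumed for illustration.

```java
import java.util.Properties;

// Hypothetical helper that builds producer settings with idempotence enabled.
class IdempotentProducerProps {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Turns on the ProducerID / SequenceNumber mechanism.
        props.put("enable.idempotence", "true");
        // Idempotence requires acknowledgment from all in-sync replicas.
        props.put("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092"); // placeholder address
        System.out.println(props.getProperty("enable.idempotence")); // true
    }
}
```

The resulting Properties object can be handed to the KafkaProducer constructor in place of props.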

In the org.apache.kafka.clients.producer.internals.Sender class, the run() method calls maybeWaitForPid() to obtain a ProducerID. The code is as follows:

    private void maybeWaitForPid() {
        if (transactionState == null)
            return;

        while (!transactionState.hasPid()) {
            try {
                Node node = awaitLeastLoadedNodeReady(requestTimeout);
                if (node != null) {
                    ClientResponse response = sendAndAwaitInitPidRequest(node);
                    if (response.hasResponse() && (response.responseBody() instanceof InitPidResponse)) {
                        InitPidResponse initPidResponse = (InitPidResponse) response.responseBody();
                        transactionState.setPidAndEpoch(initPidResponse.producerId(), initPidResponse.epoch());
                    } else {
                        log.error("Received an unexpected response type for an InitPidRequest from {}. " +
                                "We will back off and try again.", node);
                    }
                } else {
                    log.debug("Could not find an available broker to send InitPidRequest to. " +
                            "We will back off and try again.");
                }
            } catch (Exception e) {
                log.warn("Received an exception while trying to get a pid. Will back off and retry.", e);
            }
            log.trace("Retry InitPidRequest in {}ms.", retryBackoffMs);
            time.sleep(retryBackoffMs);
            metadata.requestUpdate();
        }
    }

3. Transactions

Another feature related to idempotence is transactions. Kafka transactions are similar to database transactions. In Kafka, the transactional property means that a series of Producer message sends and Consumer Offset commits are performed as one atomic operation within a transaction: they either all succeed or all fail.

Note the difference from database transactions: a database transaction covers a series of create, read, update, and delete operations, whereas a Kafka transaction covers a series of produce and consume operations performed atomically.

3.1 What are Kafka transactions used for?

Producer idempotence was introduced first, before the transactional property. The role of transactions is as follows:

  • Multiple messages sent by a Producer can be wrapped into one atomic operation, so they either all succeed or all fail;
  • In the Consumer & Producer pattern, a problem when the Consumer commits Offsets leads to messages being consumed again, and therefore to the Producer producing duplicate messages. In this pattern, the Consumer's Commit Offsets operation and the Producer's series of message sends need to be wrapped into one atomic operation.

A scenario in which this occurs:

Suppose a Consumer has finished consuming up to Offset 100 and is about to commit, while the most recently committed Offset is 50. If a rebalance is triggered at that moment, another Consumer will consume the messages between Offsets 50 and 100 again.

3.2 Which transaction APIs are available?

The Producer provides five transaction methods: initTransactions(), beginTransaction(), sendOffsetsToTransaction(), commitTransaction(), and abortTransaction(). They are defined in the org.apache.kafka.clients.producer.Producer<K, V> interface as follows:

// Initialize transactions; the transactional.id property must be set
void initTransactions();

// Begin a transaction
void beginTransaction() throws ProducerFencedException;

// Commit the Consumer's Offsets as part of the transaction
void sendOffsetsToTransaction(Map<TopicPartition, OffsetAndMetadata> offsets,
                              String consumerGroupId) throws ProducerFencedException;

// Commit the transaction
void commitTransaction() throws ProducerFencedException;

// Abort the transaction, similar to a rollback
void abortTransaction() throws ProducerFencedException;
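
To illustrate the all-or-nothing behavior these methods provide, here is a toy in-memory model. It is a sketch only: ToyTransaction is a hypothetical class that borrows the method names above, takes simplified arguments, and says nothing about how Kafka implements transactions internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of transactional atomicity; not Kafka's implementation.
class ToyTransaction {
    final List<String> committedMessages = new ArrayList<>();
    final Map<String, Long> committedOffsets = new HashMap<>();

    private final List<String> pendingMessages = new ArrayList<>();
    private final Map<String, Long> pendingOffsets = new HashMap<>();
    private boolean open = false;

    void beginTransaction() {
        if (open) throw new IllegalStateException("transaction already open");
        open = true;
    }

    // Buffers a produced message until the transaction ends.
    void send(String message) {
        requireOpen();
        pendingMessages.add(message);
    }

    // Buffers a consumed Offset so it is committed together with the messages.
    void sendOffsetsToTransaction(String partition, long offset) {
        requireOpen();
        pendingOffsets.put(partition, offset);
    }

    // Makes messages and Offsets visible together.
    void commitTransaction() {
        requireOpen();
        committedMessages.addAll(pendingMessages);
        committedOffsets.putAll(pendingOffsets);
        clear();
    }

    // Discards messages and Offsets together.
    void abortTransaction() {
        requireOpen();
        clear();
    }

    private void clear() {
        pendingMessages.clear();
        pendingOffsets.clear();
        open = false;
    }

    private void requireOpen() {
        if (!open) throw new IllegalStateException("no open transaction");
    }

    public static void main(String[] args) {
        ToyTransaction txn = new ToyTransaction();
        txn.beginTransaction();
        txn.send("out-1");
        txn.sendOffsetsToTransaction("in-0", 100L);
        txn.abortTransaction();
        System.out.println(txn.committedMessages + " " + txn.committedOffsets); // [] {}

        txn.beginTransaction();
        txn.send("out-1");
        txn.sendOffsetsToTransaction("in-0", 100L);
        txn.commitTransaction();
        System.out.println(txn.committedMessages + " " + txn.committedOffsets); // [out-1] {in-0=100}
    }
}
```

After an abort neither the message nor the Offset is visible, so a later retry of the same work does not leave a half-applied result behind.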

3.3 What are the practical application scenarios for transactions?

Within a Kafka transaction, atomic operations fall into three cases according to the type of operation. The details are as follows:

  • Only the Producer produces messages. This scenario requires transactions;
  • Message consumption and message production coexist, as in the Consumer & Producer pattern. This is the more common pattern in typical Kafka projects, and it requires transactions;
  • Only the Consumer consumes messages. This has little practical significance in real projects, because the result is the same as manually committing Offsets, and this scenario is not what transactions were introduced for.

4. Summary

Idempotence and transactions are important Kafka features, especially where data loss and data duplication are concerned. The principle behind Kafka idempotence is relatively easy to understand. Kafka transactions resemble database transactions, so readers with database experience should find them easy to grasp.

5. Conclusion

That is all for this post. If you run into any questions while studying this topic, you can join the discussion group or send me an email, and I will do my best to answer. Let us encourage each other!

In addition, the author has published two books, "Kafka Is Not Hard to Learn" and "Hadoop Big Data Mining: From Beginner to Advanced Practice". If you are interested, you can buy them through the purchase link on the blog's bulletin board. Thank you for your support. You can also follow the official account below and follow the prompts to get the books' teaching videos for free.

Origin www.cnblogs.com/smartloli/p/11922639.html