Kafka transaction mechanism

Kafka is a highly scalable distributed messaging system that plays an important role in the massive data processing ecosystem.

A key requirement in data processing is data consistency. In Kafka's case, this means one-to-one consistency between the data produced by producers and the data consumed by consumers. In a distributed environment where failures of all kinds are common, ensuring that an entire business-level set of messages is atomically published and processed exactly once is a real data-consistency requirement in the Kafka ecosystem.

This article introduces the concept and process of the transaction mechanism in the Kafka ecosystem.

The concept of the Kafka transaction mechanism

Kafka has supported a transaction mechanism since version 0.11. The transaction mechanism supports atomic writes of messages across partitions. Specifically, messages submitted by a Kafka producer to multiple partitions within the same transaction either all succeed or all fail. This guarantee still holds even if the producer fails or restarts after a crash.

In addition, messages within the same transaction are committed to the Kafka cluster exactly once, in the order the producer sent them. In other words, the transaction mechanism guarantees exactly-once submission to the Kafka cluster. It is well known that exactly-once delivery is impossible in a distributed system. That assertion involves some subtle overloading of terms, but it is largely true that every system claiming exactly-once processing relies on idempotence somewhere.

Kafka's transaction mechanism is widely used in real-world scenarios where a business needs operations that are conceptually atomic in its domain to be committed atomically.

For example, an order flow includes an order-creation message and an inventory-deduction message. If these two messages have historically been managed in two separate topics, their business-level atomicity requires the producer to use the transaction mechanism to commit them to the Kafka cluster atomically.
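As a rough sketch of how this looks with the producer API (the topic names, broker address, and transactional.id below are illustrative, not taken from any real deployment), the order and inventory messages are sent inside one transaction:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderTxnExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // assumed broker address
        props.put("transactional.id", "order-service-1");            // user-provided TransactionalID
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();                                  // find the coordinator, obtain ProducerID/epoch
        try {
            producer.beginTransaction();
            // two business messages in different topics, committed atomically
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
            producer.send(new ProducerRecord<>("inventory", "item-7", "deduct:1"));
            producer.commitTransaction();
        } catch (ProducerFencedException | OutOfOrderSequenceException e) {
            // fatal errors: another producer with the same transactional.id took over,
            // or the broker saw an impossible sequence; close rather than abort
            producer.close();
        } catch (KafkaException e) {
            // abort on other errors so that neither message becomes visible as committed
            producer.abortTransaction();
        } finally {
            producer.close();
        }
    }
}
```

If commitTransaction succeeds, a consumer with read-committed isolation sees either both messages or neither.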

Also, in complex stream processing systems, the upstream of a Kafka producer may be another stream processing system with its own consistency scheme. To cooperate with the upstream system's consistency scheme, Kafka needs to provide a consistency mechanism that is as general and composable as possible, that is, a flexible transaction mechanism, to help achieve end-to-end consistency.

The process of the Kafka transaction mechanism

Data consistency in distributed systems is hard. To understand what consistency guarantee a system provides, what requirements that guarantee places on the application, and under what circumstances the guarantee degrades, we need to study the implementation of its consistency mechanism carefully.

As mentioned above, the core feature of the transaction mechanism is the ability to atomically commit message sets across multiple partitions, even if those partitions belong to different topics. At the same time, each message in the committed set is committed only once, and the order in which the producer application produced them is preserved when they are written to the Kafka cluster's message log. In addition, transactions tolerate producer failures and even restarts.

The most critical concept in implementing the transaction mechanism is the unique identifier of a transaction, the TransactionalID. Kafka uses the TransactionalID to associate in-progress transactions. The TransactionalID is provided by the user, because Kafka, as a system, cannot on its own determine that two different processes, one before and one after a crash, are in fact continuing the same logical transaction.

For successive transactions issued by the same producer application, the TransactionalID does not need to change each time, because Kafka also implements a ProducerID and epoch mechanism. Within the transaction mechanism, this mainly serves to distinguish sessions: the ProducerID stays the same within a session but may span multiple epochs. The ProducerID changes only when the session changes, while the epoch is bumped every time a new transaction is initialized. In this way, the same TransactionalID can identify multiple independent transactions across sessions.

Next, we walk through the complete lifecycle of a transaction and discuss the roles the client, that is, the producer and consumer, and the server, that is, the Kafka cluster, play in this process and the actions each performs.

Initialize the transaction context

Logically, transactions are always initiated by the producer. The producer initializes the transaction context by calling the initTransactions method. The first step is to find the transaction coordinator (TransactionCoordinator) in the Kafka cluster that is responsible for managing the current transaction, and to request ProducerID resources from it. Before this, the producer's ProducerID and epoch are uninitialized.

After this method is called, the transaction manager (TransactionManager) on the producer side sends a request to find the transaction coordinator and a request to initialize the ProducerID. All transaction-related metadata exchanges happen between the client, that is, the producer-side transaction manager, and the server, that is, the transaction coordinator on a broker of the Kafka cluster.

At the beginning, the producer does not know which broker hosts the transaction coordinator associated with its TransactionalID. Logically, all transaction-related data that needs to be persisted is eventually written to a special topic, __transaction_state. This topic, together with the special topic __consumer_offsets used to manage consumer offsets (discussed in the earlier article on offset management), constitutes the only two internal topics in the current Kafka system.

For a producer, or a transaction uniquely identified by a TransactionalID, the transaction coordinator is the partition leader of the __transaction_state partition where the transaction's metadata is ultimately stored. For a specific transaction, the partition that records its metadata is determined by taking the absolute value of the hash of its TransactionalID modulo the number of partitions, which is a common scheme for choosing a partition.
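A minimal sketch of this partition-selection rule (the class and method names are illustrative, not Kafka's internal ones):

```java
final class TransactionStatePartitioner {
    // Choose which __transaction_state partition, and therefore which coordinator,
    // owns a given TransactionalID: non-negative hash modulo the partition count.
    static int partitionFor(String transactionalId, int transactionStatePartitionCount) {
        return (transactionalId.hashCode() & 0x7fffffff) % transactionStatePartitionCount;
    }
}
```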

The producer sends the find-coordinator request to any broker in the cluster, which computes the actual transaction coordinator, obtains the corresponding node information, and returns it to the producer. In this way, the producer locates its transaction coordinator.

Next, the producer requests ProducerID resources from the transaction coordinator, which include the ProducerID and the corresponding epoch. When the transaction coordinator receives the request, it first examines the state of any transaction under the same TransactionalID in order to manage transactions across sessions.

First, the transaction coordinator fetches the transaction metadata corresponding to the TransactionalID. As mentioned earlier, this metadata is written to the special topic __transaction_state, which is what makes the transaction metadata tolerant to failures of both the producer and the Kafka cluster.

If no metadata is found, the coordinator initializes new transaction metadata, which includes obtaining a new ProducerID, and persists it together with the TransactionalID, the partition number, and some other configuration information.

Obtaining a new ProducerID requires the ProducerID manager to claim a block of ProducerIDs from ZooKeeper and hand them out one by one. It claims a block by modifying the /latest_producer_id_block node on ZooKeeper: it reads the last allocated ProducerID from the node, adds the length of the block to claim, and then writes the new last allocated ProducerID back to the node. Since ZooKeeper versions node updates, concurrent requests cause some of them to fail with a version mismatch and be retried. The ProducerID is a Long, so it is practically impossible to exhaust; if the block resources are ever exhausted, Kafka throws a fatal error and does not try to recover.
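The following is a simplified sketch of that number-segment allocation using the ZooKeeper client's versioned setData as a compare-and-set; the real node stores richer JSON metadata, and the class here is only illustrative:

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

final class ProducerIdBlockSketch {
    // Claim a block of ProducerIDs by compare-and-set on /latest_producer_id_block.
    // A plain number is stored here only to keep the sketch short.
    static long claimBlock(ZooKeeper zk, long blockSize) throws Exception {
        final String path = "/latest_producer_id_block";
        while (true) {
            Stat stat = new Stat();
            long lastAllocated = Long.parseLong(new String(zk.getData(path, false, stat)));
            byte[] updated = Long.toString(lastAllocated + blockSize).getBytes();
            try {
                // the version check makes this a compare-and-set: concurrent claimers conflict
                zk.setData(path, updated, stat.getVersion());
                return lastAllocated + 1;   // first ProducerID of the newly claimed block
            } catch (KeeperException.BadVersionException e) {
                // another coordinator claimed a block first; re-read and retry
            }
        }
    }
}
```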

If previous metadata for the same TransactionalID is found, the coordinator takes different actions depending on the transaction's previous state (a sketch of this branching follows the list):

  1. If a state transition is already in progress, return a CONCURRENT_TRANSACTIONS error directly. This indicates concurrent state transitions on the transaction coordinator; generally, state transitions should be performed sequentially. Returning the error immediately avoids letting the client's (the producer's) request time out and lets the producer retry later. This is also an optimistic-locking strategy.

  2. If the state is PrepareAbort or PrepareCommit, also return a CONCURRENT_TRANSACTIONS error. The state is about to transition to a final state, so there is no need to forcibly terminate the previous transaction, which would only waste work.

  3. If the state is Dead or PrepareEpochFence, or the current ProducerID and epoch do not match, a non-retryable exception is thrown directly. Either the previous producer has been replaced by a new one, or the transaction has timed out; in either case there is no point in retrying.

  4. If the state is Ongoing, the transaction coordinator transitions the transaction to the PrepareEpochFence state, aborts the current transaction, and returns a CONCURRENT_TRANSACTIONS error.

  5. If the state is CompleteAbort, CompleteCommit, or Empty, first transition the state to Empty and then bump the epoch.
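The branching above can be summarized in the following sketch; the enum values mirror the states named in the list, while the class, method, and result names are simplified placeholders rather than Kafka's actual coordinator code:

```java
final class InitProducerIdDecision {
    enum TxnState { EMPTY, ONGOING, PREPARE_COMMIT, PREPARE_ABORT,
                    COMPLETE_COMMIT, COMPLETE_ABORT, PREPARE_EPOCH_FENCE, DEAD }

    enum InitResult { CONCURRENT_TRANSACTIONS, FATAL_ERROR, OK }

    static InitResult decide(TxnState state, boolean transitionInProgress, boolean idAndEpochMatch) {
        if (transitionInProgress) return InitResult.CONCURRENT_TRANSACTIONS;   // case 1: let the producer retry
        if (!idAndEpochMatch) return InitResult.FATAL_ERROR;                   // case 3: ProducerID/epoch mismatch
        switch (state) {
            case PREPARE_ABORT:
            case PREPARE_COMMIT:
                return InitResult.CONCURRENT_TRANSACTIONS;                     // case 2: a final state is imminent
            case DEAD:
            case PREPARE_EPOCH_FENCE:
                return InitResult.FATAL_ERROR;                                 // case 3: do not retry
            case ONGOING:
                // case 4: fence the old epoch, abort the ongoing transaction, then ask the producer to retry
                return InitResult.CONCURRENT_TRANSACTIONS;
            default:
                // case 5: CompleteAbort / CompleteCommit / Empty -> reset to Empty and bump the epoch
                return InitResult.OK;
        }
    }
}
```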

After this series of operations, Kafka has initialized the context for executing the transaction.

Start a transaction

Initializing a transaction really means that the producer and the corresponding transaction coordinator agree on the transaction state and reach a point where a new transaction can be started. At that point, the producer can start a transaction through the beginTransaction method. This method only moves the local transaction state to IN_TRANSACTION; there is no interaction with the Kafka cluster until messages in the transaction are actually submitted.

After the producer has marked itself as in a transaction, that is, after the local transaction state has transitioned to the in-transaction state, it can start sending the messages in the transaction.

Send a message in a transaction

When the producer sends a message within a transaction, it adds the message's target partition to the transaction manager. If the partition has not been added before, the transaction manager sends an AddPartitionsToTxnRequest before the next message is sent, to tell the transaction coordinator in the Kafka cluster which partitions participate in the transaction. After the transaction coordinator receives this information, it updates the transaction's metadata and persists it to __transaction_state.

The messages themselves are still sent with ProduceRequest, just like ordinary message production. Apart from carrying the TransactionalID and a flag marking the messages as part of a transaction, the request is no different from ordinary produce traffic. If a consumer is not configured with the read-committed isolation level, these messages become visible and consumable as soon as the Kafka cluster accepts them and persists them to the topic partition.

The ordering guarantee for messages in a transaction is also enforced as the messages are sent.

By this point the producer has obtained its ProducerID. When it sends messages to a partition, an internal message manager maintains a sequence number (SequenceNumber) for each partition. Correspondingly, the Kafka cluster also maintains, for each ProducerID, a per-partition sequence number for the messages it has produced.

The sequence number is carried in the ProduceRequest. If the Kafka cluster sees that the request's sequence number continues its own, that is, it is exactly one greater than the sequence number it has recorded, it accepts the message. If the request's sequence number is greater by more than one, the message is out of order; it is rejected directly and an exception is thrown. If the request's sequence number is the same or smaller, the message is a duplicate; it is ignored and the client is told that it is a duplicate.
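A compact sketch of that broker-side check for a (ProducerID, partition) pair; the names are illustrative rather than Kafka's internal ones:

```java
final class SequenceCheckSketch {
    enum Outcome { ACCEPT, DUPLICATE, OUT_OF_ORDER }

    // Compare the incoming sequence number against the last one persisted
    // for this (ProducerID, partition) pair.
    static Outcome check(int lastPersistedSeq, int incomingSeq) {
        if (incomingSeq == lastPersistedSeq + 1) return Outcome.ACCEPT;       // the expected next message
        if (incomingSeq <= lastPersistedSeq)     return Outcome.DUPLICATE;    // already written; acknowledge and ignore
        return Outcome.OUT_OF_ORDER;                                          // a gap means loss or reordering; reject
    }
}
```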

Commit a transaction

After all messages belonging to a transaction have been sent, the producer can call the commitTransaction method to commit the entire transaction. If an exception occurs partway through, the entire transaction can instead be aborted by calling abortTransaction. Both operations move the transaction to a final state and are implemented in very similar ways.

Whether committing or aborting, the producer sends an EndTxnRequest to the transaction coordinator; the request contains a field indicating whether to commit or abort. After receiving this request, the transaction coordinator first moves the transaction state to PrepareCommit or PrepareAbort and persists the new state to __transaction_state.

If the transaction coordinator goes down before the state update succeeds, the recovered transaction coordinator will see the transaction as still Ongoing. The producer, having received no acknowledgment, will retry the EndTxnRequest and eventually move the transaction to PrepareCommit or PrepareAbort.

Then, depending on whether the transaction is being committed or aborted, a transaction marker (TransactionMarker) is sent to the partition leaders of all partitions involved in the transaction.

The transaction marker is a control message, introduced by the Kafka transaction mechanism, that is distinct from business messages. Its purpose is to mark that a transaction has completed. It can be read by consumers just like business messages and is associated with the transaction's business messages through the TransactionalID, so that consumers configured with read-committed isolation can skip messages from transactions that are not yet committed or have been aborted.

If the transaction coordinator goes down, a partition leader goes down, or the network partitions before the transaction marker has been written to the partition leaders of all involved partitions, the new transaction coordinator, or the original one retrying after a timeout, will resubmit the marker writes to the partition leaders. Writing transaction markers is idempotent, so retries do not affect the outcome of the commit. This confirms what we said earlier: all systems that claim exactly-once processing rely on idempotence somewhere.

After all partitions involved in the current transaction have persisted the transaction marker, the transaction coordinator sets the transaction's state to committed or aborted and persists that to the transaction log. At that point a Kafka transaction is truly complete, and the transaction metadata cached in the transaction coordinator can be cleaned up.
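Putting the commit path together, the coordinator's behavior can be outlined roughly as follows; the types are placeholders standing in for the __transaction_state log and the marker-writing machinery described above, not Kafka's actual classes:

```java
import java.util.List;

final class EndTxnSketch {
    enum TxnState { ONGOING, PREPARE_COMMIT, PREPARE_ABORT, COMPLETE_COMMIT, COMPLETE_ABORT }

    interface TxnLog { void append(String transactionalId, TxnState state); }           // persists to __transaction_state
    interface MarkerWriter { void writeMarker(String topicPartition, boolean commit); }  // idempotent marker write

    static void endTransaction(String transactionalId, List<String> partitions,
                               boolean commit, TxnLog log, MarkerWriter markers) {
        // Phase 1: persist the intent so a restarted coordinator can resume the decision
        log.append(transactionalId, commit ? TxnState.PREPARE_COMMIT : TxnState.PREPARE_ABORT);

        // Phase 2: write COMMIT/ABORT markers to every partition touched by the transaction;
        // because marker writes are idempotent, retries after failover are harmless
        for (String tp : partitions) {
            markers.writeMarker(tp, commit);
        }

        // Finalize: record the terminal state; cached metadata can now be cleaned up
        log.append(transactionalId, commit ? TxnState.COMPLETE_COMMIT : TxnState.COMPLETE_ABORT);
    }
}
```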

If the transaction coordinator goes down before successfully responding to the producer, then when the producer retries the commit after recovery, the coordinator simply returns success for the already-completed transaction.

In general, the authoritative state of a transaction is the metadata persisted on the __transaction_state topic.

Expire timed-out transactions

Because of failures natural to distributed systems, such as network congestion or network partitions, an operation has a third possible outcome besides success and failure: timeout. Real-world distributed systems must handle timeouts sensibly; blocking or waiting forever is unacceptable in any practical business domain.

The Kafka transaction mechanism has a configurable transaction timeout. The timeout is checked during interactions between the transaction manager and the transaction coordinator; if the transaction has timed out, an exception is thrown.

However, under a network partition, the Kafka cluster may never receive further messages from the producer at all. In that case the cluster needs a mechanism to expire transactions proactively; otherwise, when a producer goes down and cannot or will not recover, transactions stuck in intermediate states would accumulate as storage garbage forever.

The Kafka cluster periodically scans the in-memory transaction information. If it finds an ongoing transaction whose last state update is older than the cluster's configured transaction cleanup threshold, the transaction is aborted. To avoid concurrently receiving transaction updates from the original producer during this operation, the coordinator first bumps the epoch of the transaction's ProducerID, fencing off the original producer's epoch. In other words, it performs the abort with a new, valid identity, so there is no confusion about who is aborting the transaction.

The scan also checks the most recent transaction of each TransactionalID. If a TransactionalID's last transaction is older than the cluster's configured TransactionalID cleanup threshold, all metadata for that TransactionalID is cleaned up.

Two important topics were left out of the discussion above. One is that the Kafka transaction mechanism supports producing messages and committing consumer offsets in the same transaction; the other is how read-committed consumers correctly read transactional messages while a transaction is not yet committed or aborted.

The former is not particularly complicated: the offset commit is simply treated as another message in the transaction, handled the same way as produced messages and control messages, and likewise delimited by the transaction marker when the transaction completes.
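For completeness, here is a hedged sketch of a consume-transform-produce loop using this feature. The topic name is illustrative; the producer is assumed to be configured with a transactional.id and already initialized with initTransactions, the consumer to have auto-commit disabled, and the sendOffsetsToTransaction signature shown is the one taking ConsumerGroupMetadata from newer client versions:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

final class ConsumeTransformProduceSketch {
    // The processed offsets are committed inside the same transaction as the produced messages.
    // Error handling (abortTransaction) is omitted for brevity.
    static void processBatch(KafkaConsumer<String, String> consumer,
                             KafkaProducer<String, String> producer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        if (records.isEmpty()) return;

        producer.beginTransaction();
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, String> record : records) {
            producer.send(new ProducerRecord<>("output-topic", record.key(), record.value()));
            // remember the next offset to consume for each partition we touched
            offsets.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
        }
        // the offsets ride in the same transaction as the produced messages
        producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
        producer.commitTransaction();
    }
}
```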

I will not dwell on this further, because the feature is mainly useful when the streaming pipeline is built entirely on Kafka, especially with the Kafka Streams solution.

In a stream processing pipeline that combines multiple systems, consuming messages from Kafka happens upstream, producing to Kafka happens downstream, and in the middle sits another stream computing system such as Flink. In this scenario, offset management and transactional message production are two separate concerns that can be combined with the other system's consistency scheme, such as Flink's checkpoint mechanism, without having to commit the offsets and the new messages in the same Kafka transaction.

The latter relies mainly on the LastStableOffset concept, introduced along with the transaction mechanism, which the Kafka cluster uses when serving fetch requests from consumers configured with read-committed isolation. A read-committed consumer is not allowed to pull the messages of a transaction until the transaction completes. Obviously this can block consumers from fetching new messages for a long time, so in practice long-running transactions should be avoided as much as possible.
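Configuring a read-committed consumer is a single setting; the broker address, group id, and topic below are illustrative:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadCommittedConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");      // assumed broker address
        props.put("group.id", "order-consumers");               // illustrative group id
        props.put("isolation.level", "read_committed");         // default is read_uncommitted
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        // poll() will only return non-transactional messages and messages
        // from transactions that have already been committed
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("orders"));
    }
}
```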

For aborted transactions, the Kafka cluster maintains metadata about their messages, so that consumers can fetch the messages together with the aborted-transaction metadata and filter out the aborted messages themselves. In normal business scenarios there are not many aborted transactions, so maintaining this metadata and letting consumers filter on their own is an acceptable choice.
