I wrote this article over the course of last year and this year. I hope you like it.

1. Background

Distributed transactions are a perennial topic. I have written many articles about them on my official account, but none of them analyzed the subject in full. In those articles I mentioned several times that message queues can be used to implement distributed transactions, but each time I only touched on it in passing, and many readers still have questions about this part. I hope that after reading this article you will understand how to implement distributed transactions with message queues.

First, let's review some basic concepts:

CAP

CAP theorem, also known as Brewer's theorem. For architects designing distributed systems (not just distributed transactions), CAP is your starting point.

C (Consistency): For a given client, a read operation returns the result of the latest write. For data distributed across nodes, if an update made on one node can be read from every other node, the system is strongly consistent. If some node cannot read the updated data, the system is inconsistent.
A (Availability): A non-faulty node returns a reasonable response (not an error or a timeout) in a reasonable amount of time. The two keys to availability are reasonable time and reasonable response. Reasonable time means that the request cannot block indefinitely and must return within a reasonable time. A reasonable response means that the system explicitly returns a result and the result is correct; for example, it returns 50 rather than 40.
P (Partition Tolerance): The system can continue to work when a network partition occurs. For example, there are multiple machines in a cluster, and there is a problem with the network of one machine, but the cluster can still work normally.

Anyone familiar with CAP knows that the three properties cannot all be satisfied at once. If you are interested, you can look up the proof. In a distributed system the network can never be 100% reliable, so partitions are inevitable. If we choose CA and give up P, then when a partition occurs we must reject requests to preserve consistency, which A does not allow. A distributed system therefore cannot, in theory, choose a CA architecture; it can only choose CP or AP.

CP gives up availability in favor of consistency and partition tolerance. ZooKeeper, for example, pursues strong consistency.

AP gives up consistency (meaning strong consistency here) in favor of partition tolerance and availability. This is the choice many distributed systems make, and BASE, described below, is an extension of AP.
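
The difference between the two choices can be sketched with a toy two-replica cluster. This is only an illustration under simplifying assumptions: the `Cluster` and `Replica` classes and the "CP"/"AP" mode strings are invented for this sketch and do not correspond to any real system.

```python
class Replica:
    def __init__(self):
        self.value = 0


class Cluster:
    """Two replicas that normally copy every write to each other."""

    def __init__(self, mode):
        self.mode = mode              # "CP" or "AP" (labels used by this sketch only)
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write, giving up availability.
                raise RuntimeError("unavailable during partition")
            # Availability first: accept locally, so the replicas now disagree.
            replica.value = value
            return
        replica.value = value
        other.value = value


cp, ap = Cluster("CP"), Cluster("AP")
for cluster in (cp, ap):
    cluster.write(cluster.a, 40)      # replicated to both nodes
    cluster.partitioned = True        # the network between a and b breaks

try:
    cp.write(cp.a, 50)
    cp_rejected = False
except RuntimeError:
    cp_rejected = True

ap.write(ap.a, 50)
```

After the partition, the CP cluster still holds 40 on both replicas but rejected the write, while the AP cluster accepted the write and now holds 50 on one replica and 40 on the other.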

By the way, CAP theory ignores network latency: it assumes that when a transaction commits, the data is copied from node A to node B instantly. In reality this is obviously impossible, so there is always a window of inconsistency. Also, choosing two of the three, say CP, does not mean abandoning A. Partitions are rare, so most of the time you still need to provide both C and A. And even when a partition does occur, you have to prepare for restoring A afterwards, for example by using logs to bring the other machines back to a usable state.

BASE

BASE is an abbreviation of Basically Available, Soft state, and Eventually consistent. It is an extension of AP in CAP.

Basically available: when a distributed system fails, some functionality may be lost, but the core functions remain available.
Soft state: the system is allowed to have intermediate states that do not affect availability. This corresponds to the inconsistency in CAP.
Eventual consistency: after a period of time, the data on all nodes becomes consistent.

BASE addresses CAP's assumption that there is no network delay: soft state and eventual consistency guarantee consistency after the delay. BASE is the opposite of ACID. Unlike ACID's strong consistency model, it gains availability by sacrificing strong consistency, allowing data to be inconsistent for a period of time as long as it eventually reaches a consistent state.
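
As a rough illustration of soft state and eventual consistency, the sketch below models a primary copy and a replica that is updated asynchronously. The class name `EventuallyConsistentStore` and its methods are hypothetical, and the explicit `replicate()` call simply stands in for the network delay.

```python
from collections import deque


class EventuallyConsistentStore:
    """A primary copy plus a replica that is updated asynchronously."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()    # replication log not yet applied to the replica

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Stands in for the network delay: runs some time after the write.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.write("balance", 50)
stale_read = store.replica.get("balance")   # soft state: the replica lags behind
store.replicate()
fresh_read = store.replica.get("balance")   # the replica has caught up
```

Between the write and the replication the two copies disagree, which is the soft state; once replication has run they agree again.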

Transaction messages

All of the transaction messages discussed here can be seen as implementations of the BASE model. The representative implementations in the industry are RocketMQ, open sourced by Alibaba, and QMQ, open sourced by Qunar. Both message queues implement transaction messages, but in different ways. Let's analyze how each of them does it.

2. RocketMQ transaction messages

What exactly is the RocketMQ transaction message?

The basic process is as follows. In the first phase, a Prepared message is sent and the address of the message is obtained. In the second phase, the local transaction is executed. In the third phase, the message is accessed through the address obtained in the first phase and its state is modified, after which the message receiver can consume it. If the confirmation fails, the RocketMQ Broker periodically scans for messages whose state has not been updated. For each unconfirmed message, it asks the sender whether the message should be committed. In RocketMQ this query is delivered to the sender through a listener, which handles it.


If consumption times out, delivery has to be retried until it succeeds, so the message receiver must be idempotent. If consumption fails, it has to be handled manually: such failures are rare, and designing a complex process for such a low-probability event costs more than it gains.
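
Idempotency on the receiving side can be sketched as follows. This is a minimal illustration, assuming every message carries a unique ID; the `PointsConsumer` class and its fields are hypothetical, and in a real system the processed-ID record and the business update would be written in the same database transaction.

```python
class PointsConsumer:
    """Applies each message at most once, keyed by a unique message ID."""

    def __init__(self):
        self.points = {}          # user -> balance
        self.processed = set()    # message IDs that were already applied

    def on_message(self, msg_id, user, amount):
        if msg_id in self.processed:
            return False          # redelivery: acknowledge it but change nothing
        self.points[user] = self.points.get(user, 0) + amount
        self.processed.add(msg_id)
        return True


consumer = PointsConsumer()
first = consumer.on_message("msg-1", "alice", 10)
retry = consumer.on_message("msg-1", "alice", 10)   # the same message, redelivered
```

However many times the same message is retried, the balance is only changed once.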

You have probably seen this picture many times elsewhere, and often you can only understand a little from it. So let's see how it is implemented in code.

2.1 Using transactional messages

RocketMQ's transaction messages rely on a very important listener called TransactionListener, which we need to implement.

There are two methods:

executeLocalTransaction: As the name suggests, this method executes our local transaction. Normally the local transaction is called in sequence by the upper-level business code, but with RocketMQ transaction messages it has to be driven by the listener, so using them requires some changes to the business code. Note also that, inside the transaction, we need to save the mapping between the message's transaction ID and the current transaction.
checkLocalTransaction: Checks the local transaction status using the transaction ID saved earlier. A transaction message has three statuses: commit, rollback, and intermediate:
- TransactionStatus.CommitTransaction: Commits the transaction, which allows the consumer to consume this message.
- TransactionStatus.RollbackTransaction: Rolls back the transaction, which means the message will be deleted and cannot be consumed.
- TransactionStatus.Unknown: The intermediate status, which means the message queue needs to check back to determine the status. When this status is returned, RocketMQ retries the check. To prevent excessive checking, the number of checks for a single message is limited to 15 by default.
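
The contract of the two methods can be sketched with a toy listener. This is not RocketMQ's Java API: the `TransactionStatus` enum below is a stand-in that only borrows the three status names, and `OrderTransactionListener`, `check_back`, and the in-memory `orders` and `tx_status` collections are hypothetical substitutes for real business tables.

```python
from enum import Enum


class TransactionStatus(Enum):
    COMMIT = "CommitTransaction"
    ROLLBACK = "RollbackTransaction"
    UNKNOWN = "Unknown"


class OrderTransactionListener:
    """Toy listener; `orders` and `tx_status` stand in for database tables."""

    def __init__(self):
        self.orders = []
        self.tx_status = {}    # transaction ID -> TransactionStatus

    def execute_local_transaction(self, tx_id, order):
        try:
            if order.get("amount", 0) <= 0:
                raise ValueError("invalid order")
            # In a real system these two writes share one database transaction.
            self.orders.append(order)
            self.tx_status[tx_id] = TransactionStatus.COMMIT
        except ValueError:
            self.tx_status[tx_id] = TransactionStatus.ROLLBACK
        return self.tx_status[tx_id]

    def check_local_transaction(self, tx_id):
        # No record means the outcome is not known yet.
        return self.tx_status.get(tx_id, TransactionStatus.UNKNOWN)


def check_back(listener, tx_id, max_checks=15):
    """Retry loop on the checking side: stop at a definite answer or at the limit."""
    for attempt in range(1, max_checks + 1):
        status = listener.check_local_transaction(tx_id)
        if status is not TransactionStatus.UNKNOWN:
            return status, attempt
    return TransactionStatus.UNKNOWN, max_checks


listener = OrderTransactionListener()
listener.execute_local_transaction("tx-1", {"id": 1, "amount": 30})
listener.execute_local_transaction("tx-2", {"id": 2, "amount": 0})
```

Here `check_back(listener, "tx-1")` gets a definite answer on the first attempt, while a transaction ID that was never recorded keeps returning the intermediate status until the limit of 15 checks is reached.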

For sending messages we have the following code:

In the code, we bind the listener and a thread pool to the producer. The thread pool is the one used to run checkLocalTransaction.

2.2 Implementation principle

2.2.1 Client

The code here is relatively simple and can be divided into the following steps:

Step 1: Send the message to the Broker.
Step 2: Decide from the send result whether to execute the local transaction. If the send succeeded, execute the local transaction.
Step 3: Record the local transaction status. This is one of the three statuses mentioned above: commit, rollback, or intermediate.
Step 4: End the transaction, committing or rolling back according to the local transaction status.
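
The four steps can be sketched as a small function. This is a simplified model rather than the actual client code: `FakeBroker`, `send_message_in_transaction`, and the string states are names invented for this sketch, and treating an exception in the local transaction as the intermediate status is an assumption of the sketch.

```python
class FakeBroker:
    """Records half messages and their final outcome; not a real client."""

    def __init__(self, accept=True):
        self.accept = accept
        self.half = {}       # message ID -> body
        self.outcome = {}    # message ID -> "COMMIT" / "ROLLBACK" / "UNKNOWN"

    def send_half(self, msg_id, body):
        if self.accept:
            self.half[msg_id] = body
        return self.accept

    def end_transaction(self, msg_id, state):
        self.outcome[msg_id] = state


def send_message_in_transaction(broker, msg_id, body, local_tx):
    # Step 1: send the half message to the broker.
    if not broker.send_half(msg_id, body):
        # Step 2: the send failed, so the local transaction is never executed.
        return "ROLLBACK"
    try:
        # Steps 2 and 3: run the local transaction and record its status.
        state = "COMMIT" if local_tx() else "ROLLBACK"
    except Exception:
        state = "UNKNOWN"    # outcome unclear; the broker will check back later
    # Step 4: end the transaction according to that status.
    broker.end_transaction(msg_id, state)
    return state


def crash():
    raise RuntimeError("database timeout")


broker = FakeBroker()
ok = send_message_in_transaction(broker, "m1", "order created", lambda: True)
failed = send_message_in_transaction(broker, "m2", "order created", lambda: False)
unclear = send_message_in_transaction(broker, "m3", "order created", crash)
```

The third call is the case that later needs the check-back: the broker holds a half message whose outcome the client could not determine.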

For checkLocalTransaction: the client receives a CHECK_TRANSACTION_STATE request sent by the RocketMQ Broker and checks the local transaction status in response.

2.2.2 Server

On the Broker, special judgments are made on transaction messages:

If it is a transaction message, it goes through the prepareMessage logic, which is as follows:

The main job is to replace the topic of the current message with RMQ_SYS_TRANS_HALF_TOPIC. This completes the first phase, the half message. The next step is for the Broker to handle the commit or rollback of the transaction. The red box in the figure marks the core steps. There are three steps for commit:

Get the half message that needs to be committed
Send the message to its original topic
Delete the half message

There are two steps for rollback:

Get the half message that needs to be rolled back
Delete the half message

Getting the message is relatively simple: it can be looked up directly through the recorded offset. The logic for sending the message to its original topic can largely reuse the normal send path. The interesting part is how to delete the half message. RocketMQ writes sequentially, so a message cannot really be deleted, and some other approach is needed. Once a message has been consumed it will not be consumed again unless the offset is reset, which effectively achieves deletion. RocketMQ applies this idea by implementing a consumer for the RMQ_SYS_TRANS_HALF_TOPIC topic. If a message should be deleted, nothing more is done after it is consumed. If it should not be deleted, it is re-delivered after consumption.

The core question, then, is how to record whether a half message should be deleted. For this RocketMQ uses another topic, RMQ_SYS_TRANS_OP_HALF_TOPIC. Deleting a half message, as described above, actually means delivering an op message to RMQ_SYS_TRANS_OP_HALF_TOPIC, which a background task then processes.
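
The commit and rollback steps, and the way an op message marks a half message as deleted without removing it, can be sketched with append-only lists. Only the two topic names come from the text above; `ToyBroker`, its methods, and the record layout are assumptions of this sketch, not the Broker's real data structures.

```python
HALF_TOPIC = "RMQ_SYS_TRANS_HALF_TOPIC"
OP_TOPIC = "RMQ_SYS_TRANS_OP_HALF_TOPIC"


class ToyBroker:
    """Append-only topics; nothing is ever physically removed."""

    def __init__(self):
        self.topics = {HALF_TOPIC: [], OP_TOPIC: []}

    def _append(self, topic, record):
        log = self.topics.setdefault(topic, [])
        log.append(record)
        return len(log) - 1     # offset of the new record

    def prepare(self, real_topic, body):
        # Store the message under the half topic, remembering its real topic.
        return self._append(HALF_TOPIC, {"real_topic": real_topic, "body": body})

    def commit(self, half_offset):
        half = self.topics[HALF_TOPIC][half_offset]      # 1. look up by offset
        self._append(half["real_topic"], half["body"])   # 2. deliver to the real topic
        self._append(OP_TOPIC, half_offset)              # 3. mark the half message deleted

    def rollback(self, half_offset):
        self._append(OP_TOPIC, half_offset)              # only mark it deleted

    def pending_half_offsets(self):
        done = set(self.topics[OP_TOPIC])
        return [o for o in range(len(self.topics[HALF_TOPIC])) if o not in done]


broker = ToyBroker()
committed = broker.prepare("orders", "order 1 created")
rolled_back = broker.prepare("orders", "order 2 created")
undecided = broker.prepare("orders", "order 3 created")
broker.commit(committed)
broker.rollback(rolled_back)
```

All three half messages are still in the half topic afterwards; only the op records distinguish the two resolved ones from the one that still needs a decision.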

The schematic diagram of the whole process is as follows:

Step 1: Send a transaction message, called a halfMessage here. Its topic is replaced with the half-message topic.
Step 2: Send commit or rollback. On commit, the earlier message is looked up and restored to its original topic, and an OpMessage is sent to record that the half message can be deleted. On rollback, an OpMessage is sent directly to mark it as deleted.
Step 3: The Broker runs a timed task for transaction messages that periodically compares halfMessages with OpMessages. If an OpMessage exists with a deleted status, the message must already have been committed or rolled back, so it can be deleted.
Step 4: If the transaction times out (the default is 6s) and there is still no OpMessage, the commit information was most likely lost, so the Broker checks the local transaction status on the Producer.
Step 5: Perform Step 2 according to the result of the check.
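
One pass of the timed task in Steps 3 to 5 can be sketched as a pure function. The function name `scan_half_messages`, its parameters, and the action labels are hypothetical; only the 6-second timeout is taken from the text.

```python
TRANSACTION_TIMEOUT = 6.0   # seconds, the default mentioned above


def scan_half_messages(half_messages, op_offsets, now, check_producer):
    """One pass of the timed task.

    half_messages: {offset: time in seconds when the half message was stored}
    op_offsets:    offsets that already have an OpMessage
    check_producer(offset) -> "COMMIT", "ROLLBACK" or "UNKNOWN"
    Returns {offset: action} for every half message.
    """
    actions = {}
    for offset, stored_at in half_messages.items():
        if offset in op_offsets:
            actions[offset] = "SKIP"        # Step 3: already committed or rolled back
        elif now - stored_at < TRANSACTION_TIMEOUT:
            actions[offset] = "WAIT"        # still within the timeout
        else:
            # Step 4: no OpMessage after the timeout, so ask the producer.
            status = check_producer(offset)
            # Step 5: a definite answer is then handled like Step 2.
            actions[offset] = status if status != "UNKNOWN" else "RECHECK_LATER"
    return actions


producer_view = {2: "COMMIT", 3: "UNKNOWN"}
actions = scan_half_messages(
    half_messages={0: 0.0, 1: 8.0, 2: 1.0, 3: 2.0},
    op_offsets={0},
    now=10.0,
    check_producer=lambda offset: producer_view[offset],
)
```

In this run, offset 0 is skipped because it has an OpMessage, offset 1 is too recent to check, offset 2 is committed after the check-back, and offset 3 stays pending for a later pass.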

2.3 Summary

The sections above described how to use RocketMQ transaction messages and how they are implemented, and by now you probably have your own understanding of them. However, I have never used RocketMQ transaction messages in my own business systems. The main reasons are as follows:

The cost of adapting existing code is high. Take placing an order: the local transaction that creates the order normally runs synchronously and returns the order ID. With RocketMQ, the local transaction becomes an operation inside the listener, so the ID cannot be passed back as a return value, and some other mechanism, such as ThreadLocal, is needed to complete the business logic.
The relationship between the transaction ID and the local transaction status has to be recorded.
Only a single transaction message is supported. If creating an order requires sending 10 kinds of messages within one consistent transaction, RocketMQ does not support it.

To sum up, RocketMQ's transaction messages are of limited use in my opinion and are hard to retrofit onto existing business code. So let's look at QMQ's approach to transaction messages and see whether it solves these problems.

3. QMQ transaction message

QMQ's transaction messages are not as complicated as RocketMQ's, and they require very little change to the message middleware itself, because they rely on the database's own local transactions. For example, suppose creating an order requires sending two kinds of messages, A and B. We then have the following pseudocode:

begin transaction;
createOrder();
commit transaction;

sendMessageA();
sendMessageB();

Here both message A and message B are sent outside the transaction, so their consistency cannot be guaranteed. But sending a message does not have to mean talking to the message middleware right away. We can first store the messages locally:

begin transaction;
createOrder();
saveMessageA();
saveMessageB();
commit transaction;
// send the messages
sendMessageA();
sendMessageB();

We only added two operations that save the messages. So how is consistency guaranteed? If the process crashes while sending message A, a scheduled task can pull the messages that were saved in the database but not yet sent and send them again.
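
The save-then-resend idea can be sketched with an in-memory SQLite database. The `orders` and `outbox` tables, their columns, and the function names are all assumptions made for this sketch; QMQ's actual schema and client code are not shown in this article.

```python
import sqlite3


def create_order_with_messages(conn, order_id, messages):
    # One local transaction covers the order row and the message rows.
    with conn:
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        conn.executemany(
            "INSERT INTO outbox (order_id, body, sent) VALUES (?, ?, 0)",
            [(order_id, body) for body in messages],
        )


def resend_unsent(conn, send):
    """Scheduled task: push every saved-but-unsent message to the queue."""
    rows = conn.execute(
        "SELECT id, body FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for msg_id, body in rows:
        if send(body):
            with conn:
                conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (msg_id,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, order_id INTEGER, body TEXT, sent INTEGER)"
)
create_order_with_messages(conn, 1, ["A", "B"])

delivered = []


def failing_for_a(body):
    if body == "A":
        return False          # simulated failure while sending message A
    delivered.append(body)
    return True


def always_ok(body):
    delivered.append(body)
    return True


resend_unsent(conn, failing_for_a)   # first attempt: only B gets through
resend_unsent(conn, always_ok)       # a later run of the scheduled task
unsent = conn.execute("SELECT COUNT(*) FROM outbox WHERE sent = 0").fetchone()[0]
```

Because a message can be sent and then fail to be marked as sent, this scheme delivers at least once, so the consumer still has to be idempotent.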

In fact, this approach can be applied to other message queues as well, because it does not intrude on the message middleware itself. RocketMQ or Kafka could use it to provide transaction messages too.

Let's see whether this approach solves the problems of RocketMQ transaction messages.

As for the cost of adaptation, only the client needs to be changed, once. QMQ provides its own implementation of Spring's TransactionSynchronization, which simplifies the code to the following:

begin transaction;
createOrder();
sendMessageA();
sendMessageB();
commit transaction;

Internally, send here is actually saveMessage. The messages are sent automatically after the commit, and a timed task in the background compensates for any sends that fail.
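
The save-now, deliver-after-commit behavior can be sketched with a toy transaction object that runs callbacks once it commits. This only mirrors the idea of an after-commit hook; the `Transaction` and `Producer` classes are hypothetical and are neither Spring's TransactionSynchronization API nor QMQ's client.

```python
class Transaction:
    """Toy transaction that runs registered callbacks after a successful commit."""

    def __init__(self):
        self.saved_messages = []    # stands in for rows written in this transaction
        self.after_commit = []
        self.committed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.committed = True
            for callback in self.after_commit:
                callback()
        else:
            self.saved_messages.clear()   # rollback discards the saved messages
        return False                      # never swallow the exception


class Producer:
    def __init__(self):
        self.delivered = []

    def send(self, tx, body):
        # Inside a transaction, "send" only saves; delivery waits for the commit.
        tx.saved_messages.append(body)
        tx.after_commit.append(lambda: self.delivered.append(body))


producer = Producer()

with Transaction() as tx:
    producer.send(tx, "A")
    producer.send(tx, "B")
    delivered_before_commit = list(producer.delivered)

try:
    with Transaction() as failed_tx:
        producer.send(failed_tx, "C")
        raise RuntimeError("createOrder failed")
except RuntimeError:
    pass
```

Nothing is delivered while the first transaction is still open, both messages go out once it commits, and the message saved in the failed transaction is never delivered.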

No additional binding of transactionId and message is required
Support for sending multiple transaction messages

This basically solves the problems of RocketMQ transaction messages, but it has a shortcoming of its own: it introduces extra database writes. If there are many transaction messages, there will be many more writes to the database, so businesses that are sensitive to response time need to consider this carefully.

4. Summary

Two kinds of transaction messages have been introduced. In my view, QMQ's solution suits most businesses better. Note, however, that transaction messages cannot be used for every distributed consistency problem. They only fit scenarios where sending the message can stand for the success of the operation. What does that mean? For example, a payment may also deduct points, coupons, and so on. If I send a message to deduct points, does sending it mean the deduction will succeed? Definitely not, because the user may not have enough points and the deduction would fail. A message that grants points, on the other hand, can stand for success, because granting points is an addition with few restrictions.

If transaction messages do not fit your business scenario well, consider other transaction strategies such as TCC or Saga, which I described in previous articles.

If you found this article helpful, following and sharing it is the greatest support you can give me. O(∩_∩)O

In-depth analysis of how to implement transaction messages