A roundup of common distributed transaction solutions: two-phase commit, three-phase commit, TCC, and MQ + local transaction + message reconciliation

Distributed transactions keep multiple database operations consistent when they are spread across multiple services.

This article uses a bank transfer as the running example for the common distributed transaction solutions: service A needs to deduct 100 yuan from user A, and service B needs to add 100 yuan to user B.

1. Two-phase commit

A typical application of two-phase commit is Seata, from Spring Cloud Alibaba. It handles a distributed transaction as follows:
1) Phase 1: The transaction manager TM (service A) asks to start a global transaction, and the transaction coordinator TC generates a globally unique transaction ID, the XID. The XID is propagated along the microservice call chain, so each microservice (service A and service B) registers with the TC as a branch of that XID.
2) Phase 2: Each microservice performs its transaction operation and reports the result to the TC, which decides between a global commit and a global rollback based on the results from all services.
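The two phases above can be sketched with a tiny in-memory coordinator. This is only an illustration of the voting idea, not Seata's API: the names `Participant`, `prepare`, and `run_global_transaction` are made up for this example, and a real participant would stage its change inside a database transaction rather than in a field.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical branch transaction: a balance plus a staged, uncommitted change."""
    name: str
    balance: int
    pending: int = 0

    def prepare(self, delta: int) -> bool:
        # Phase 1: do the work without making it visible, then vote.
        if self.balance + delta < 0:
            return False  # vote "no": insufficient funds
        self.pending = delta
        return True  # vote "yes"

    def commit(self) -> None:
        # Phase 2 (global commit): make the staged change visible.
        self.balance += self.pending
        self.pending = 0

    def rollback(self) -> None:
        # Phase 2 (global rollback): discard the staged change.
        self.pending = 0


def run_global_transaction(branches) -> bool:
    """branches is a list of (participant, delta). Returns True on global commit."""
    votes = [p.prepare(delta) for p, delta in branches]
    if all(votes):
        for p, _ in branches:
            p.commit()
        return True
    for p, _ in branches:
        p.rollback()
    return False
```

With user A at 150 and user B at 0, `run_global_transaction([(a, -100), (b, 100)])` commits both branches; running it a second time makes A vote "no", so both branches roll back.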



2. Three-phase commit

2.1 Disadvantages of two-phase commit

If two-phase commit already exists, why do we need three-phase commit? To answer that, we first need to understand the shortcomings of two-phase commit.

Disadvantages of two-phase commit:
1) Synchronous blocking: The database transactions of all participants stay blocked until the global decision arrives. If a participant holds a shared resource, other services block when they request it, so two-phase commit is a poor fit for high-concurrency workloads.
2) Coordinator single point of failure: The transaction coordinator is critical in the second phase. If it goes down during that phase, the participants' database transactions can neither commit nor roll back and remain blocked.
3) Data inconsistency: In the second phase, a partial network failure can mean that the commit or rollback request sent by the TC never reaches a participant, or that a participant crashes after the request is sent. The affected branch then never completes, and the data becomes inconsistent. When a participant times out waiting for the second-phase request, rolling back is the usual recommendation, but that still leaves the data inconsistent if the coordinator's decision was actually to commit.

2.2 Three-phase commit

Three-phase commit has three phases: CanCommit (preparation), PreCommit (pre-commit), and DoCommit (commit). The idea is basically the same as two-phase commit.

1. It introduces the CanCommit stage: the service first runs a pre-check (for example, when placing an order, it checks whether inventory is sufficient), and this step does not lock any resources.

As I understand it, this stage has two advantages:
1) Validation happens up front, which shortens the time resources stay locked and so raises concurrency. This partly addresses the synchronous blocking problem of two-phase commit.
2) If the PreCommit notification of the second stage does not arrive before the timeout, the participant cancels automatically. This partly addresses the coordinator single point of failure in two-phase commit.

2. After the PreCommit stage, participants have a timeout mechanism: if the DoCommit message from the coordinator does not arrive, the participant commits on its own once the timeout expires.

The advantage of this stage is that it partly addresses the coordinator single point of failure.

As you can see, three-phase commit, like two-phase commit, still has the data inconsistency problem. When such a transaction anomaly is detected, the differences can be compensated by scripts or asynchronous tasks, and an alert can be raised.

3. For the data inconsistency problem in the second phase, a timeout retry mechanism can be used.
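The participant-side timeout rules described in this section can be sketched as a small state machine. The class and method names (`Participant3PC`, `on_timeout`, and so on) are hypothetical, and the sketch covers only the default actions on timeout, not networking or the coordinator.

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    CAN_COMMIT_OK = "can_commit_ok"   # pre-check passed, nothing locked yet
    PRE_COMMITTED = "pre_committed"   # resources locked, waiting for DoCommit
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant3PC:
    """Hypothetical participant that tracks its phase and reacts to timeouts."""

    def __init__(self):
        self.state = State.INIT

    def on_can_commit(self, check_passed: bool) -> bool:
        # CanCommit: only a pre-check, no resources are locked.
        self.state = State.CAN_COMMIT_OK if check_passed else State.ABORTED
        return check_passed

    def on_pre_commit(self) -> None:
        if self.state is State.CAN_COMMIT_OK:
            self.state = State.PRE_COMMITTED

    def on_do_commit(self) -> None:
        if self.state is State.PRE_COMMITTED:
            self.state = State.COMMITTED

    def on_timeout(self) -> None:
        """Called when the next coordinator message does not arrive in time."""
        if self.state is State.CAN_COMMIT_OK:
            self.state = State.ABORTED    # no PreCommit received: cancel
        elif self.state is State.PRE_COMMITTED:
            self.state = State.COMMITTED  # no DoCommit received: commit by default
```

The second branch of `on_timeout` is also where the remaining inconsistency comes from: if the coordinator had actually decided to abort, a participant that committed by default now disagrees with the others.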


3. TCC

TCC can also be seen as a two-phase commit, but one implemented at the application level: Try, Confirm, Cancel.
1) Preparation stage (Try): The business system checks and reserves resources by locking them. In the common order-placing case, the Try stage does not actually reduce inventory, that is, it performs no database transaction operation. Instead it locks the ordered inventory, for example with a lock in Redis.
2) Second stage: Confirm or Cancel is executed depending on the result of the first stage. Confirm executes the real business (performs the database transaction operation, then releases the lock). Cancel releases the resources reserved in the Try stage (on failure, no database transaction operation is performed and the lock is released).
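Applied to the transfer example, the Try / Confirm / Cancel flow might look like the sketch below. Everything here is an assumption made for illustration: `DebitService`, `CreditService`, and `transfer` are invented names, and a plain dictionary stands in for the Redis lock and the database.

```python
class DebitService:
    """Hypothetical service A: reserves money in Try, deducts it in Confirm."""

    def __init__(self, balance: int):
        self.balance = balance
        self.reserved = {}  # xid -> reserved amount (stands in for a Redis lock)

    def try_(self, xid: str, amount: int) -> bool:
        available = self.balance - sum(self.reserved.values())
        if available < amount:
            return False
        self.reserved[xid] = amount  # reserve only, balance is untouched
        return True

    def confirm(self, xid: str) -> None:
        # Real business operation, then release the reservation.
        self.balance -= self.reserved.pop(xid, 0)

    def cancel(self, xid: str) -> None:
        self.reserved.pop(xid, None)  # release the reservation only


class CreditService:
    """Hypothetical service B: notes the incoming amount in Try, adds it in Confirm."""

    def __init__(self, balance: int):
        self.balance = balance
        self.expected = {}  # xid -> amount to be credited

    def try_(self, xid: str, amount: int) -> bool:
        self.expected[xid] = amount
        return True

    def confirm(self, xid: str) -> None:
        self.balance += self.expected.pop(xid, 0)

    def cancel(self, xid: str) -> None:
        self.expected.pop(xid, None)


def transfer(xid: str, a: DebitService, b: CreditService, amount: int) -> bool:
    """Runs Try on both services, then Confirm on success or Cancel on failure."""
    if a.try_(xid, amount) and b.try_(xid, amount):
        a.confirm(xid)
        b.confirm(xid)
        return True
    a.cancel(xid)
    b.cancel(xid)
    return False
```

Because `confirm` and `cancel` use `pop` with a default, calling either of them twice for the same `xid` has no further effect, which is the idempotency that TCC relies on.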

3.1 Advantages of TCC

1. Better concurrency: In essence, TCC lifts the database's two-phase commit up to the microservice level, which avoids the blocking and poor performance caused by lock conflicts and long transactions in the database during two-phase commit. This partly addresses the synchronous blocking problem of two-phase commit.

That is, transactions blocking and waiting inside the database are turned into blocking calls between microservices. Take the transfer as an example. With two-phase commit, service A runs its database transaction to deduct 100 yuan and then waits for service B to run its database transaction to add 100 yuan; during that wait, service A's database transaction is blocked. With TCC, once service A has executed its database transaction in Confirm, that transaction can commit immediately without blocking. If service B has still not responded after the timeout, service A can then call Cancel to roll back.

Of course, TCC assumes by default that the Confirm and Cancel stages will always succeed.

2. Eventual consistency: Because Confirm and Cancel are idempotent, the transaction is guaranteed to end up either confirmed or cancelled, which guarantees data consistency. This solves the data inconsistency problem of two-phase commit.
3. Reliability: TCC removes the coordinator single point of failure of the XA protocol. The microservice on the main business side initiates and controls the whole business activity, and since such microservices are generally deployed as clusters, this solves the coordinator single point of failure in two-phase commit.

3.2 Disadvantages of TCC

1. Highly invasive to microservices: Every transaction in a microservice must implement three methods: try, confirm, and cancel. This tight coupling with business code raises development cost, and later maintenance and refactoring are expensive as well.

3.3 Considerations for TCC

1) Allow empty rollback: Since the Try step may fail, Cancel must be allowed to run even when there is nothing to roll back.
2) Anti-suspension control: The coordinator issues Cancel because Try has still not completed, so the participant may receive Cancel first and the Try request afterwards. The participant must record in its local transaction that this ID has been cancelled, so that the late Try does not succeed.
3) Idempotency: Requests may be delivered more than once, so make sure Try, Confirm, and Cancel are all idempotent.
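The three considerations above can all be handled by recording a status per transaction ID. The following is a minimal in-memory sketch; `TccParticipant` and its status strings are assumptions made for this example, and in a real service the status would live in a table written in the same local transaction as the business data.

```python
class TccParticipant:
    """Hypothetical participant that tracks one status per transaction ID."""

    def __init__(self, balance: int):
        self.balance = balance
        self.status = {}    # xid -> "tried" | "confirmed" | "cancelled"
        self.reserved = {}  # xid -> reserved amount

    def try_(self, xid: str, amount: int) -> bool:
        if xid in self.status:
            # A repeated Try is accepted only while still in "tried" (idempotency).
            # A Try arriving after Cancel is rejected (anti-suspension).
            return self.status[xid] == "tried"
        if self.balance - sum(self.reserved.values()) < amount:
            return False
        self.status[xid] = "tried"
        self.reserved[xid] = amount
        return True

    def confirm(self, xid: str) -> None:
        if self.status.get(xid) != "tried":
            return  # duplicate Confirm, or nothing was reserved
        self.balance -= self.reserved.pop(xid)
        self.status[xid] = "confirmed"

    def cancel(self, xid: str) -> None:
        if self.status.get(xid) in ("confirmed", "cancelled"):
            return  # duplicate Cancel, or already finished
        # Empty rollback: Cancel is allowed even if Try never arrived.
        self.reserved.pop(xid, None)
        self.status[xid] = "cancelled"
```

For instance, calling `cancel("x1")` before any Try succeeds as an empty rollback, and a later `try_("x1", 100)` is then refused instead of leaving a reservation that nobody will ever release.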

4. MQ message + local transaction + message reconciliation

The core of MQ message + local transaction is this: the caller must keep its local transaction consistent and then make sure the message is sent successfully; the called party must make sure the message is received and must also keep its own local transaction consistent.
The following takes transfer as an example.

4.1 First of all, why add a message queue?

A message queue is added mainly to address two problems:
1. A call from service A to service B may take a long time, and service A stays blocked for the whole call.
2. Traffic is hard to control: if service A receives heavy traffic, it may overwhelm service B.

4.2 Issues requiring attention

The general steps are: service A first deducts 100, then sends a message to the MQ; service B receives the message and adds 100; finally, service A completes the call.

4.2.1 Service A has deducted 100 successfully. How do we ensure the message gets sent to the MQ?

Consider adding a table to service A: a transfer record table. Make the deduction and the write to the transfer record table a single local transaction. If the deduction succeeds, the transfer record is written with the status pending.

Then add a scheduled background task that periodically checks the transfer record table for records whose status is pending and for which the current time minus the update time exceeds a threshold. Such a record has not yet received a result and needs to be put into the message queue again. This guarantees that once service A has deducted the money, the message will eventually reach service B.

Of course, if service B returns an ACK indicating success, the status can be changed to succeeded; if service B reports a failure, the status can be changed to failed.
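A minimal sketch of this local-transaction-plus-resend idea, using an in-memory SQLite database. The table and column names (`account`, `transfer_record`, `updated_at`), the 30-second threshold, and the `send` callback that stands in for the MQ producer are all assumptions made for this example.

```python
import sqlite3

PENDING, SUCCEEDED = "pending", "succeeded"


def init_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER)")
    db.execute("CREATE TABLE transfer_record ("
               "id TEXT PRIMARY KEY, amount INTEGER, status TEXT, updated_at REAL)")
    db.execute("INSERT INTO account VALUES ('A', 100)")
    db.commit()
    return db


def deduct(db, transfer_id: str, amount: int, now: float) -> None:
    """Deduction and transfer record are written in one local transaction."""
    with db:  # commits on success, rolls back if an exception is raised
        cur = db.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE owner = 'A' AND balance >= ?", (amount, amount))
        if cur.rowcount != 1:
            raise ValueError("insufficient balance")
        db.execute("INSERT INTO transfer_record VALUES (?, ?, ?, ?)",
                   (transfer_id, amount, PENDING, now))


def resend_stale(db, send, now: float, threshold: float = 30.0) -> int:
    """Scheduled task: re-send pending records that have waited longer than threshold."""
    rows = db.execute(
        "SELECT id, amount FROM transfer_record "
        "WHERE status = ? AND ? - updated_at > ?",
        (PENDING, now, threshold)).fetchall()
    for transfer_id, amount in rows:
        send({"id": transfer_id, "amount": amount})  # hypothetical MQ producer
        with db:
            db.execute("UPDATE transfer_record SET updated_at = ? WHERE id = ?",
                       (now, transfer_id))
    return len(rows)


def on_ack(db, transfer_id: str) -> None:
    """Service B acknowledged the transfer: stop re-sending it."""
    with db:
        db.execute("UPDATE transfer_record SET status = ? WHERE id = ?",
                   (SUCCEEDED, transfer_id))
```

Because the task may send the same record more than once, the consumer has to tolerate duplicates. Passing `now` in as a parameter rather than reading the clock inside the functions is just a choice that keeps the sketch easy to test.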

4.2.2 Now that the message is guaranteed to be sent, how does service B add 100 idempotently?

Service B can also keep a transfer log table, and add the money and write the transfer record as a single local transaction, so that a successful credit always has a matching log record. Each time a message arrives, first check whether its transfer ID already exists in the table. If it does, the message has already been consumed and must not be processed again.

There is still one problem. If two duplicate messages arrive at the same time, a lock is needed as well. For example, when two messages with the serial number 202209200000001 arrive, each first checks Redis for a lock on 202209200000001. If the lock exists, another thread already holds it and is adding the money and writing the transfer log table.

For example, thread 1 takes the lock first, adds the money, writes the transfer log table, and then releases the lock. Thread 2 then acquires the lock and first checks whether the record already exists in the transfer log table. If it does, thread 2 does nothing more; if not, thread 2 adds the money and writes the record.
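The lock-then-check sequence can be sketched as follows. This is an in-process stand-in, not a distributed implementation: a `threading.Lock` per transfer ID plays the role of the Redis lock, a set plays the role of the transfer log table, and `ServiceB` and `handle` are names invented for this example.

```python
import threading
from collections import defaultdict


class ServiceB:
    """Hypothetical consumer that credits each transfer ID at most once."""

    def __init__(self):
        self.balance = 0
        self.transfer_log = set()                  # stands in for the transfer log table
        self._locks = defaultdict(threading.Lock)  # stands in for Redis locks, one per ID
        self._locks_guard = threading.Lock()
        self._db_tx = threading.Lock()             # stands in for a local DB transaction

    def _lock_for(self, transfer_id: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[transfer_id]

    def handle(self, message: dict) -> bool:
        """Returns True if money was added, False if the message was a duplicate."""
        transfer_id = message["id"]
        with self._lock_for(transfer_id):
            # Check the log only after the lock is held.
            if transfer_id in self.transfer_log:
                return False
            # Add the money and write the log record as one unit.
            with self._db_tx:
                self.balance += message["amount"]
                self.transfer_log.add(transfer_id)
            return True
```

The check of the log happens after the lock is acquired, which is what lets thread 2 see the record that thread 1 has just written.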

4.3 Introduce message reconciliation to ensure eventual consistency

Scheduled tasks can be used for message verification (reconciliation) to ensure eventual consistency.
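One possible shape for such a reconciliation task is to compare what service A recorded as deducted with what service B recorded as credited. The function below is only a sketch under that assumption: the input dictionaries, keyed by transfer ID, and the result keys are invented for this example.

```python
def reconcile(deducted_by_a: dict, credited_by_b: dict) -> dict:
    """Compares {transfer_id: amount} from service A with the same from service B."""
    ids_a, ids_b = set(deducted_by_a), set(credited_by_b)
    return {
        # Deducted by A but never credited by B: re-send or compensate.
        "missing_in_b": sorted(ids_a - ids_b),
        # Credited by B with no matching deduction in A: raise an alert.
        "unexpected_in_b": sorted(ids_b - ids_a),
        # Present on both sides but with different amounts.
        "amount_mismatch": sorted(
            t for t in ids_a & ids_b if deducted_by_a[t] != credited_by_b[t]),
    }
```

A scheduled job could run this over the records of a given time window and feed any non-empty list into the re-send or alerting path.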



Origin blog.csdn.net/xueping_wu/article/details/127143322