Common solutions for distributed transactions

Common approaches

  1. Distributed transactions can be handled with global transactions based on 2PC (the two-phase commit protocol) or 3PC (the three-phase commit protocol), with the TCC compensation mechanism, by providing rollback interfaces, or with a distributed database.
  2. The core of LCN uses 3PC plus a TCC compensation mechanism.

 

What is the XA interface

XA (eXtended Architecture) is a specification for distributed transactions.
An XA transaction is carried out by a coordinator (generally the transaction manager) and participants (generally the resources, each with its own resource manager). In MySQL, there are two types of XA transactions.

What is JTA

As the transaction specification on the Java platform, JTA (Java Transaction API) also defines support for XA transactions; in fact, JTA is modeled on the XA architecture. In JTA, the transaction manager is abstracted as the javax.transaction.TransactionManager interface and is implemented by an underlying transaction service (that is, JTS). Like many other Java specifications, JTA only defines interfaces, and the concrete implementation is provided by vendors (such as J2EE vendors). Current JTA implementations fall mainly into the following two kinds:


1. JTA implementations provided by J2EE containers (such as JBoss).
2. Standalone JTA implementations, such as JOTM and Atomikos. These can provide distributed transaction guarantees in environments that do not use a J2EE application server, such as Tomcat, Jetty, and ordinary Java applications.

 

2PC: two-phase commit

The two phases are: phase one, the preparation phase (voting phase), and phase two, the commit phase (execution phase).

XA transactions are generally completed in two phases, hence the name two-phase commit (2PC).
Phase one is the preparation phase: every participant prepares to execute the transaction and locks the resources it needs. Once a participant is ready, it reports this to the transaction manager.
Phase two is the commit phase: once the transaction manager has confirmed that all participants are ready, it sends a commit command to all of them. If any participant reports that it cannot prepare, the transaction manager instead tells all participants to roll back.
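The two phases can be sketched in a few lines of Java. This is only an in-memory illustration, not a real XA or JTA API: the names `TwoPhaseCommitSketch`, `Participant`, and `InMemoryParticipant` are hypothetical, and a real transaction manager would also log its decision and handle crashes and timeouts.

```java
import java.util.List;

// Hypothetical in-memory sketch of two-phase commit; not a real XA or JTA API.
final class TwoPhaseCommitSketch {

    /** A resource manager taking part in the transaction (illustrative interface). */
    interface Participant {
        boolean prepare();  // phase one: lock resources and vote yes/no
        void commit();      // phase two: apply the change and release locks
        void rollback();    // phase two: undo the change and release locks
    }

    /** Participant that only records its state so the outcome can be inspected. */
    static final class InMemoryParticipant implements Participant {
        private final boolean canPrepare;
        String state = "INIT";

        InMemoryParticipant(boolean canPrepare) {
            this.canPrepare = canPrepare;
        }

        @Override
        public boolean prepare() {
            state = canPrepare ? "PREPARED" : "FAILED";
            return canPrepare;
        }

        @Override
        public void commit() {
            state = "COMMITTED";
        }

        @Override
        public void rollback() {
            state = "ROLLED_BACK";
        }
    }

    /** Runs both phases; returns true on global commit, false on global rollback. */
    static boolean run(List<? extends Participant> participants) {
        boolean allReady = true;
        // Phase one: ask each participant to prepare; stop at the first "no" vote.
        for (Participant p : participants) {
            if (!p.prepare()) {
                allReady = false;
                break;
            }
        }
        // Phase two: commit only if every vote was yes, otherwise roll everyone back.
        for (Participant p : participants) {
            if (allReady) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allReady;
    }
}
```

Calling `TwoPhaseCommitSketch.run(...)` with one participant that cannot prepare leaves every participant in the rolled-back state, which is the all-or-nothing behavior the protocol is meant to provide.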

 

XA's performance problem
XA performs poorly. Comparing a transaction on a single database with an XA transaction spanning multiple databases shows the XA transaction to be roughly 10 times slower. XA transactions should therefore be avoided where possible: for example, write the data locally and distribute it through a high-performance messaging system, or use techniques such as database replication.
XA should be used only when none of these alternatives is feasible and performance is not a bottleneck.

 

3PC: three-phase commit

Three-phase commit (3PC), also called the three-phase commit protocol, is an improved version of two-phase commit (2PC).

 

Compared with two-phase commit, three-phase commit makes two changes.

1. It introduces a timeout mechanism in both the coordinator and the participants.
2. It inserts a preparation phase between the first and second phases, which ensures that the state of all participating nodes is consistent before the final commit phase.

In other words, besides introducing timeouts, 3PC splits the preparation phase of 2PC in two, giving three phases: CanCommit, PreCommit, and DoCommit.

CanCommit phase

The CanCommit phase of 3PC is very similar to the preparation phase of 2PC. The coordinator sends a CanCommit request to each participant; a participant returns Yes if it can commit and No otherwise.

1. Transaction inquiry: the coordinator sends a CanCommit request to the participants, asking whether the transaction commit can be performed, and then waits for their responses.

2. Response feedback: after receiving the CanCommit request, a participant that believes it can execute the transaction successfully returns Yes and enters the ready state. Otherwise it returns No.

PreCommit phase

The coordinator decides, based on the participants' responses, whether the transaction can proceed to PreCommit. There are two possibilities.

If the coordinator receives a Yes response from every participant, the transaction is pre-executed.

1. Send the pre-commit request: the coordinator sends a PreCommit request to the participants and enters the Prepared phase.

2. Transaction pre-commit: after receiving the PreCommit request, a participant executes the transaction operations and records undo and redo information in the transaction log.

3. Response feedback: if the participant executes the transaction operations successfully, it returns an ACK response and waits for the final instruction.

If any participant sends a No response, or the coordinator times out while waiting for a participant's response, the transaction is aborted.

1. Send the abort request: the coordinator sends an abort request to all participants.

2. Abort the transaction: a participant aborts the transaction after receiving the abort request from the coordinator, or after timing out without having received any request from the coordinator.

DoCommit phase

The actual transaction commit happens in this phase, which also splits into two cases.

Execute commit

1. Send the commit request: once the coordinator has received the ACK responses from the participants, it moves from the pre-commit state to the commit state and sends a doCommit request to all participants.

2. Transaction commit: after receiving the doCommit request, a participant performs the formal commit and then releases all transaction resources.

3. Response feedback: after committing, the participant sends an ACK response to the coordinator.

4. Complete the transaction: the coordinator completes the transaction after receiving ACK responses from all participants.

Abort the transaction

If the coordinator does not receive the ACK responses from the participants (a participant did not send one, or the response timed out), it aborts the transaction.

1. Send the abort request: the coordinator sends an abort request to all participants.

2. Transaction rollback: after receiving the abort request, a participant uses the undo information recorded in phase two to roll back the transaction, and releases all transaction resources once the rollback is complete.

3. Feedback: after completing the rollback, the participant sends an ACK message to the coordinator.

4. Abort the transaction: after receiving the ACK messages from all participants, the coordinator completes the abort.
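The coordinator's side of the three phases can be sketched as follows. This is a simplified illustration under stated assumptions: the class and interface names are hypothetical, a timed-out reply is modeled as an empty `Optional`, and the final ACK round after doCommit or abort is omitted.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a 3PC coordinator. An empty Optional models a reply
// that did not arrive before the coordinator's timeout.
final class ThreePhaseCoordinatorSketch {

    interface Participant {
        Optional<Boolean> canCommit();  // phase 1: Yes (true) or No (false)
        Optional<Boolean> preCommit();  // phase 2: true means ACK
        void doCommit();                // phase 3: formal commit
        void abort();                   // roll back and release resources
    }

    enum Outcome { COMMITTED, ABORTED }

    /** Stand-in participant whose replies are fixed up front. */
    static final class ScriptedParticipant implements Participant {
        private final Optional<Boolean> vote;
        private final Optional<Boolean> ack;
        String state = "INIT";

        ScriptedParticipant(Optional<Boolean> vote, Optional<Boolean> ack) {
            this.vote = vote;
            this.ack = ack;
        }

        @Override public Optional<Boolean> canCommit() { return vote; }
        @Override public Optional<Boolean> preCommit() { return ack; }
        @Override public void doCommit() { state = "COMMITTED"; }
        @Override public void abort() { state = "ABORTED"; }
    }

    static Outcome run(List<? extends Participant> participants) {
        // Phase 1 (CanCommit): any No or missing reply aborts the transaction.
        for (Participant p : participants) {
            if (!p.canCommit().orElse(false)) {
                return abortAll(participants);
            }
        }
        // Phase 2 (PreCommit): any missing ACK aborts the transaction.
        for (Participant p : participants) {
            if (!p.preCommit().orElse(false)) {
                return abortAll(participants);
            }
        }
        // Phase 3 (DoCommit): every participant acknowledged, so commit.
        for (Participant p : participants) {
            p.doCommit();
        }
        return Outcome.COMMITTED;
    }

    private static Outcome abortAll(List<? extends Participant> participants) {
        for (Participant p : participants) {
            p.abort();
        }
        return Outcome.ABORTED;
    }
}
```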

In the DoCommit phase, if a participant does not receive a doCommit or abort request from the coordinator in time, it goes ahead and commits the transaction once its wait times out. (This decision is based on probability. Entering the third phase means the participant received a PreCommit request in the second phase, and the coordinator only sends PreCommit if every participant answered Yes to CanCommit before the second phase began. So a participant that has received PreCommit knows that everyone agreed to the change. In short, on entering the third phase, even though the participant has received neither a commit nor an abort request because of a network timeout or a similar problem, it has good reason to believe that a successful commit is very likely.)
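The participant-side timeout rules described above can be written as a small state machine. This is a hedged sketch: the class name, the state names, and the `onTimeout` hook are assumptions made for illustration, and real timers, logging, and network handling are left out.

```java
// Hypothetical participant-side state machine for the 3PC timeout rules.
final class ThreePhaseParticipantSketch {

    enum State { INIT, READY, PRE_COMMITTED, COMMITTED, ABORTED }

    private State state = State.INIT;

    State state() {
        return state;
    }

    /** CanCommit: answer Yes and become ready, or answer No and give up. */
    void onCanCommit(boolean canExecute) {
        state = canExecute ? State.READY : State.ABORTED;
    }

    /** PreCommit: execute the operations, log undo/redo, then wait. */
    void onPreCommit() {
        if (state == State.READY) {
            state = State.PRE_COMMITTED;
        }
    }

    /** doCommit: perform the formal commit. */
    void onDoCommit() {
        if (state == State.PRE_COMMITTED) {
            state = State.COMMITTED;
        }
    }

    /** abort: roll back unless the commit has already happened. */
    void onAbort() {
        if (state != State.COMMITTED) {
            state = State.ABORTED;
        }
    }

    /** Called when waiting for the coordinator's next message times out. */
    void onTimeout() {
        switch (state) {
            case READY:
                // PreCommit never arrived: abort.
                state = State.ABORTED;
                break;
            case PRE_COMMITTED:
                // doCommit/abort never arrived: commit by default.
                state = State.COMMITTED;
                break;
            default:
                // Nothing to do in the other states.
                break;
        }
    }
}
```

If one pre-committed participant receives `onAbort()` while another only sees `onTimeout()`, the first ends in `ABORTED` and the second in `COMMITTED`, which is the kind of divergence the default-commit rule can produce.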

The difference between 2PC and 3PC

Compared with 2PC, 3PC mainly addresses the single point of failure and reduces blocking: once a participant fails to receive a message from the coordinator in time, it commits by default instead of holding transaction resources and staying blocked indefinitely. This mechanism can still cause data inconsistency, however. If, because of network problems, the abort request sent by the coordinator does not reach a participant in time, that participant commits after its wait times out, and its data is then inconsistent with the participants that received the abort request and rolled back.

TCC

The TRYING phase mainly checks the business systems and reserves resources.

The CONFIRMING phase mainly confirms and commits the business operation. Once the TRYING phase has succeeded and the CONFIRMING phase begins, CONFIRMING is assumed not to fail. That is: as long as TRYING succeeds, CONFIRMING must succeed.

The CANCELING phase mainly cancels business operations that must be rolled back because of an execution error, and releases the reserved resources.

Idempotence means that calling a business method once and calling it multiple times produce the same result. The CONFIRMING and CANCELING operations may be retried, so they need to be idempotent.
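One common way to make an operation idempotent is to remember which transaction IDs have already been applied. The sketch below assumes each call carries a unique transaction ID; the class name `IdempotentConfirmSketch` and the in-memory set are illustrative only, and a real system would persist the processed IDs together with the business data.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a confirm operation that is safe to call repeatedly.
final class IdempotentConfirmSketch {

    // Transaction IDs that have already been applied (in memory for illustration).
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private final AtomicLong balance = new AtomicLong();

    /**
     * Credits the amount once per transaction ID.
     * Returns true if the credit was applied, false if this ID was seen before.
     */
    boolean confirm(String txId, long amount) {
        if (!processed.add(txId)) {
            return false; // duplicate call: the result stays the same
        }
        balance.addAndGet(amount);
        return true;
    }

    long balance() {
        return balance.get();
    }
}
```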

Take a payment as an example:

After the payment system receives a member's payment request, it needs to deduct the member's account balance, increase the member's points (assume for now that this must happen synchronously), and increase the merchant's account balance.

Assume further that the member system, the merchant system, and the points system are three independent subsystems, so the operation cannot be handled by a traditional local transaction.

TRYING phase: reserve the funds in the member's account, that is, freeze the order amount in the member's account.

CONFIRMING phase: increase the balance of the member's points account and increase the balance of the merchant's account.

CANCELING phase: unfreeze and release the amount that was frozen in the member's account.
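The payment example can be sketched roughly as below. For brevity the three subsystems are collapsed into one hypothetical class, `TccPaymentSketch`, with in-memory fields; the rule of one point per unit of currency is an assumption made only for this illustration. In a real deployment each step would be a remote call to the member, merchant, or points system.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-process sketch of the TCC payment flow.
final class TccPaymentSketch {

    long memberBalance;    // funds the member can still spend
    long merchantBalance;
    long memberPoints;

    // orderId -> amount frozen by the TRYING phase
    private final Map<String, Long> frozen = new HashMap<>();

    TccPaymentSketch(long memberBalance) {
        this.memberBalance = memberBalance;
    }

    /** TRYING: check the balance and freeze the order amount. */
    boolean tryPay(String orderId, long amount) {
        if (amount <= 0 || frozen.containsKey(orderId) || memberBalance < amount) {
            return false;
        }
        memberBalance -= amount;
        frozen.put(orderId, amount);
        return true;
    }

    /** CONFIRMING: credit merchant and points; a repeated call does nothing. */
    void confirm(String orderId) {
        Long amount = frozen.remove(orderId);
        if (amount == null) {
            return; // already confirmed or cancelled
        }
        merchantBalance += amount;
        memberPoints += amount; // assumption: 1 point per unit paid
    }

    /** CANCELING: release the frozen amount; a repeated call does nothing. */
    void cancel(String orderId) {
        Long amount = frozen.remove(orderId);
        if (amount == null) {
            return; // nothing left to release
        }
        memberBalance += amount;
    }
}
```

Removing the frozen entry first is what makes `confirm` and `cancel` idempotent in this sketch: a second call finds nothing to process.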

MQ-based distributed transactions

Use an MQ with good timeliness: the other party subscribes to and listens for messages, and its handler is triggered automatically when a message arrives.
Scheduled polling is used to scan and check the data in the message table.
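A minimal sketch of the message-table idea follows. It assumes a table of pending messages that a scheduled job scans and publishes; the class `MessageTableSketch`, the `PENDING`/`SENT` statuses, and the `Predicate` standing in for the MQ client are all hypothetical. In a real system the message row would be written in the same local database transaction as the business data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for a local message table plus a polling job.
final class MessageTableSketch {

    enum Status { PENDING, SENT }

    static final class Message {
        final String id;
        final String payload;
        Status status = Status.PENDING;

        Message(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    private final List<Message> table = new ArrayList<>();

    /** Records a message; stands in for an insert made with the business update. */
    void save(String id, String payload) {
        table.add(new Message(id, payload));
    }

    /**
     * One polling pass: tries to publish every pending message.
     * The publisher returns true when the MQ accepted the message.
     * Returns the number of messages sent in this pass.
     */
    int pollOnce(Predicate<Message> publisher) {
        int sent = 0;
        for (Message m : table) {
            if (m.status == Status.PENDING && publisher.test(m)) {
                m.status = Status.SENT;
                sent++;
            }
        }
        return sent;
    }

    /** Number of messages still waiting to be published. */
    int pendingCount() {
        int pending = 0;
        for (Message m : table) {
            if (m.status == Status.PENDING) {
                pending++;
            }
        }
        return pending;
    }
}
```

A message that fails to publish stays `PENDING` and is picked up again by the next pass, so the consumer may see duplicates and should handle them idempotently.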

Other compensation

Developers who have integrated the Alipay transaction interface know that we usually decrypt the parameters in Alipay's callback page and interface, and then call the service that updates the transaction status in our system to mark the order as paid. Alipay stops sending callback requests only when our callback page outputs the word "success" or returns the status code that indicates the business was processed successfully. Otherwise, Alipay sends the callback request to the client again after a period of time, until the success marker is output.
This is a very typical example of compensation, similar to the retry-based compensation mechanisms of some MQs.
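The notifier's side of such a retry loop might look like the sketch below. It is not Alipay's actual implementation: the class name, the `Supplier` standing in for the HTTP callback, and the attempt limit are assumptions, and the waiting period between attempts is left out.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a notifier that retries until the receiver says "success".
final class CallbackRetrySketch {

    /**
     * Invokes the callback until it returns "success" or maxAttempts is reached.
     * Returns the number of attempts used, or -1 if every attempt failed.
     */
    static int notifyUntilSuccess(Supplier<String> callback, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String reply;
            try {
                reply = callback.get();
            } catch (RuntimeException e) {
                reply = null; // treat a failed call like a missing reply
            }
            if ("success".equals(reply)) {
                return attempt;
            }
            // A real notifier would wait here, usually longer after each failure.
        }
        return -1;
    }
}
```

Because the same notification can arrive more than once, the receiving service should update the order status idempotently.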

In a reasonably mature system, the overall availability of higher-level services and interfaces is usually very high. When a call fails because of a transient network fault or a timeout, this retry mechanism is very effective.

Consider a more extreme scenario, though. If the system itself has a bug or the program logic is wrong, retrying 10,000 times will not help. Wouldn't that lead to the tragedy of "I clearly paid, but the system says the payment did not go through"?

In fact, to make the trading system more reliable, we generally add detailed logging to high-level service code such as transactions. Once a fatal exception occurs, an email notification is sent. Scheduled background tasks also scan and analyze these logs, detect such special cases, try to compensate programmatically, and notify the relevant people by email.

In some special cases, "manual compensation" is used, which is the last line of defense.


Origin blog.csdn.net/qq_27828675/article/details/105563395