How Does Seata-AT Ensure the Consistency of Distributed Transactions?

Author | Chen Jianbin (funkye) github id: a364176773
Source | Alibaba Cloud Native Official Account

Seata is an open-source distributed transaction solution with more than 18,100 GitHub stars and a very active community. It is committed to providing high-performance, easy-to-use distributed transaction services for microservice architectures. This article analyzes how Seata-AT is implemented so that users can gain a deeper understanding of AT mode.

What is the Seata transaction model?

1. Seata's definition of transactions

Seata defines the framework of global transactions.

The global transaction is defined as the overall coordination of several branch transactions:

  1. TM (Transaction Manager) asks TC (Transaction Coordinator) to begin (Begin), commit (Commit), or roll back (Rollback) a global transaction.
  2. TM binds the XID that represents the global transaction to each branch transaction.
  3. RM (Resource Manager) registers with TC and associates its branch transaction with the global transaction represented by the XID.
  4. RM reports the execution result of the branch transaction to TC. (Optional)
  5. TC sends a branch commit (Branch Commit) or branch rollback (Branch Rollback) command to RM.

1.png

Seata's global transaction processing process is divided into two stages:

Execution stage: Execute the branch transactions and ensure that the execution results are rollbackable (Rollbackable) and durable (Durable).
Completion stage: Based on the decision reached from the results of the execution stage, TM sends a global commit or rollback request to TC, and TC instructs each RM to drive its branch transaction to Commit or Rollback.
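The two stages can be sketched with a small in-memory model. This is only an illustration of the roles described above, not Seata's implementation: the names Coordinator, Branch, and run_global_transaction are hypothetical, and rolling back branches in reverse registration order is an assumption of the sketch.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Branch:
    """One branch transaction registered by an RM (hypothetical model)."""
    name: str
    commit: Callable[[], None]
    rollback: Callable[[], None]


@dataclass
class Coordinator:
    """Toy stand-in for the TC: tracks branches per XID and drives the completion stage."""
    branches: Dict[str, List[Branch]] = field(default_factory=dict)

    def begin(self) -> str:
        xid = uuid.uuid4().hex  # XID that identifies the global transaction
        self.branches[xid] = []
        return xid

    def register_branch(self, xid: str, branch: Branch) -> None:
        self.branches[xid].append(branch)

    def global_commit(self, xid: str) -> None:
        for branch in self.branches.pop(xid):
            branch.commit()

    def global_rollback(self, xid: str) -> None:
        # Assumption of this sketch: undo branches in reverse registration order.
        for branch in reversed(self.branches.pop(xid)):
            branch.rollback()


def run_global_transaction(tc: Coordinator, work: Callable[[str], None]) -> str:
    """Toy TM: begin, run the execution stage, then request commit or rollback."""
    xid = tc.begin()
    try:
        work(xid)  # execution stage: branches run and register under the XID
    except Exception:
        tc.global_rollback(xid)  # completion stage: rollback
        return "rolled back"
    tc.global_commit(xid)  # completion stage: commit
    return "committed"
```

Here `work` stands for the business call chain; each service it touches would register a Branch under the same XID.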

Seata's so-called transaction mode refers to the behavior mode of branch transactions running under the Seata global transaction framework. To be precise, it should be called the branch transaction mode.

Transaction modes differ in how their branch transactions achieve the goals of the two stages of a global transaction. That is, each mode answers the following two questions:

Execution stage: How is the branch executed so that the result is guaranteed to be rollbackable and durable?
Completion stage: After receiving the TC command, how is the branch rolled back or committed?

2. How other two-phase transactions operate under the Seata transaction framework

1) TCC transaction mode

Let's first look at how TCC transactions are integrated into the Seata transaction framework:

2.png

As you can see, it looks very similar to Seata's transaction framework. The difference is that the RM manages the phase-one try and the phase-two confirm/cancel. As before, TM begins the transaction, and after being called by TM, the RM executes the phase-one try method. When the call chain finishes, TM informs TC of the phase-two decision, and TC then drives phase two by notifying the RM, which executes confirm or cancel.
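To make the try/confirm/cancel split concrete, here is a minimal sketch of a TCC participant. The AccountTcc class and its method names are hypothetical and do not use Seata's TCC API; the sketch only shows the resource-reservation idea.

```python
class AccountTcc:
    """Hypothetical TCC participant that reserves funds in the try phase."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.frozen = {}  # xid -> amount reserved by that global transaction

    def try_debit(self, xid: str, amount: int) -> None:
        # Phase one: check and reserve the resource, do not finalize anything.
        if self.balance < amount:
            raise ValueError("insufficient balance")
        self.balance -= amount
        self.frozen[xid] = amount

    def confirm(self, xid: str) -> None:
        # Phase two (commit): the reservation becomes final. Safe to call twice.
        self.frozen.pop(xid, None)

    def cancel(self, xid: str) -> None:
        # Phase two (rollback): give the reserved amount back. Safe to call twice.
        self.balance += self.frozen.pop(xid, 0)
```

Making confirm and cancel safe to repeat is a design choice of the sketch, motivated by the fact that a coordinator may retry a phase-two notification.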

2) XA transaction mode

3.png

As shown in the figure, XA mode uses the XA interface at the bottom layer of Seata to handle phase one and phase two automatically. In phase one, the XA RM creates an XAConnection by proxying the user's data source, starts the XA transaction (XA start), and runs XA prepare (from this point, every operation of the XA branch is persisted and can be recovered even after a crash). In phase two, TC notifies the RM to commit or roll back the XA branch.

What is AT mode?

Let's first look at an example.

1. Phase one

Business SQL: update product set name = 'GTS' where name = 'TXC'.

Phase one is transparent to users, and the business SQL on the user side stays unchanged. So what does AT mode do in phase one? Briefly:

  • Parse the SQL and query the before image: select id, name, since from product where name = 'TXC'.
  • Execute the business SQL.
  • Query the data after execution as the after image: select id, name, since from product where id = 1.
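The three steps above can be sketched with Python's built-in sqlite3 module. This is a simplified illustration, not Seata's code: the function update_name_with_images, the in-memory table, and the sample value '2014' for the since column are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("create table product (id integer primary key, name text, since text)")
conn.execute("insert into product values (1, 'TXC', '2014')")  # sample row


def update_name_with_images(conn, old_name, new_name):
    """Run the business update and return the before/after images of the affected rows."""
    # 1. Before image: rows matched by the business WHERE clause.
    before = [dict(row) for row in conn.execute(
        "select id, name, since from product where name = ?", (old_name,))]
    # 2. Business SQL.
    conn.execute("update product set name = ? where name = ?", (new_name, old_name))
    # 3. After image: the same rows, located by primary key.
    ids = [row["id"] for row in before]
    after = []
    if ids:
        marks = ",".join("?" for _ in ids)
        after = [dict(row) for row in conn.execute(
            f"select id, name, since from product where id in ({marks})", ids)]
    return {"before": before, "after": after}


undo_record = update_name_with_images(conn, "TXC", "GTS")
print(undo_record)
```

The after image is looked up by primary key because the business update changed the very column the original WHERE clause filtered on.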

2. Phase two

Commit: just delete the transaction-related information (in theory, nothing breaks even if it is not deleted).

Rollback: take out the before image and use it to restore the data.

From this simple example we can see that AT mode is an automatically compensating transaction. What exactly does AT do? The following sections explain.

How does AT ensure the consistency of distributed transactions?

First look at this picture:

4.png

Many readers may have questions after seeing this picture. It is a schematic diagram of the non-intrusive AT mode. The user still enters through an interface and reaches the transaction initiator. For the business developer, the initiator's entry point is just a business interface: the business SQL executes as usual and the response is returned to the client without any change. Behind the scenes, however, the user's SQL is proxied by Seata, so Seata-AT can see all of the user's SQL and act on it to ensure consistency.

How is Seata-AT non-intrusive?

5.jpeg

As shown in the figure, Seata automatically proxies the user's DataSource when the application starts. Anyone familiar with JDBC knows DataSource well: once you hold the DataSource, you control the connections to the data source and can perform some "small actions" behind the scenes, which users do not notice.

When a business request arrives and the business SQL is about to run, Seata parses the user's SQL, extracts the table metadata, and generates the before image. It then executes the business SQL and saves the after image (the purpose of the after image is explained later). Finally, it generates the row locks, which are sent to Seata-Server, that is, the TC side, when the branch is registered.
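The proxy idea can be illustrated with a Python analogy. Seata itself wraps the JDBC DataSource in Java; the ProxyConnection class below, its naive WHERE-clause handling, and the "table:pk" lock-key format are all simplifying assumptions for this sketch.

```python
import sqlite3


class ProxyConnection:
    """Hypothetical wrapper that inspects SQL before delegating to the real connection."""

    def __init__(self, target, table, pk):
        self._target = target  # the real connection being proxied
        self._table = table
        self._pk = pk
        self.lock_keys = []  # row locks that would be sent to TC at branch registration

    def execute(self, sql, params=()):
        lowered = sql.lower()
        if lowered.lstrip().startswith("update") and " where " in lowered:
            # Reuse the business WHERE clause to find the primary keys of affected rows.
            where = sql[lowered.index(" where "):]
            n = where.count("?")
            where_params = tuple(params[len(params) - n:]) if n else ()
            rows = self._target.execute(
                f"select {self._pk} from {self._table}{where} order by {self._pk}",
                where_params).fetchall()
            self.lock_keys.extend(f"{self._table}:{row[0]}" for row in rows)
        # The caller's SQL is executed unchanged.
        return self._target.execute(sql, params)


raw = sqlite3.connect(":memory:")
raw.execute("create table product (id integer primary key, name text)")
raw.executemany("insert into product values (?, ?)", [(1, "TXC"), (2, "TXC"), (3, "OTHER")])

proxy = ProxyConnection(raw, "product", "id")
proxy.execute("update product set name = ? where name = ?", ("GTS", "TXC"))
print(proxy.lock_keys)
```

The business code calls `execute` exactly as it would on the raw connection, which is the sense in which the extra work stays invisible to the caller.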

At this point, the phase-one work on the client side is complete, with no awareness and no intrusion on the user's side. Thinking about it, you will notice that a row lock is involved. What is this row lock for? That leads to the next topic: how Seata-AT guarantees transaction isolation in a distributed setting. Let's use the example from the official website directly.

1. Write isolation

  • Before committing the phase-one local transaction, make sure the global lock has been obtained.
  • If the global lock cannot be obtained, the local transaction cannot be committed.
  • Attempts to obtain the global lock are limited to a certain range; once the range is exceeded, the attempt is abandoned, the local transaction is rolled back, and the local lock is released.
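The three rules above amount to a bounded retry loop around the local commit. The sketch below is an assumption-laden illustration: the function name, the callback parameters, and the retry count and interval are hypothetical and are not Seata configuration values.

```python
import time


def commit_with_global_lock(try_lock, local_commit, local_rollback,
                            retries=3, interval=0.01):
    """Commit the local transaction only after the global lock is obtained.

    try_lock, local_commit and local_rollback are caller-supplied callables
    (hypothetical hooks for this sketch). Returns True if the local transaction
    was committed, False if the attempts were exhausted and it was rolled back.
    """
    for _ in range(retries):
        if try_lock():
            local_commit()  # global lock held: safe to commit locally
            return True
        time.sleep(interval)  # wait before the next attempt
    local_rollback()  # give up: roll back and release the local lock
    return False
```

A caller would pass a `try_lock` that asks the TC for the row's global lock, so contention shows up as retries rather than as an indefinite wait.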

Take an example to illustrate:

Two global transactions, tx1 and tx2, each update field m of table a, whose initial value is 1000.

tx1 starts first: it opens a local transaction, gets the local lock, and updates m = 1000 - 100 = 900. Before committing the local transaction, it first obtains the global lock of the record, then commits locally and releases the local lock. tx2 starts afterwards: it opens a local transaction, gets the local lock, and updates m = 900 - 100 = 800. Before committing the local transaction, it tries to obtain the global lock of the record. Until tx1 commits globally, that global lock is held by tx1, so tx2 has to retry and wait for it.

6.png

tx1 commits globally in phase two and releases the global lock. tx2 then gets the global lock and commits its local transaction.

7.png

If tx1 instead performs a global rollback in phase two, tx1 needs to reacquire the local lock of the data and run the reverse compensating update to roll back the branch.

If tx2 is still waiting for the global lock of the data while holding the local lock, the branch rollback of tx1 fails. The branch rollback is retried until tx2's wait for the global lock times out; tx2 then gives up the global lock, rolls back its local transaction, and releases the local lock, and the branch rollback of tx1 finally succeeds.

Because tx1 holds the global lock for the whole process until tx1 ends, dirty writes cannot occur.
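The tx1/tx2 commit scenario can be replayed with a toy lock table. GlobalLockTable and run_scenario are hypothetical names for this sketch, and local database locks are not modelled; only the global lock held on the TC side is.

```python
class GlobalLockTable:
    """Toy model of the global locks kept on the TC side (row key -> owning XID)."""

    def __init__(self):
        self._owners = {}

    def acquire(self, xid, row_key):
        # Succeeds if the row is free or already locked by the same transaction.
        return self._owners.setdefault(row_key, xid) == xid

    def release_all(self, xid):
        # Called when the global transaction ends (phase-two commit or rollback).
        self._owners = {k: v for k, v in self._owners.items() if v != xid}


def run_scenario():
    locks = GlobalLockTable()
    m = 1000
    trace = {}

    # tx1, phase one: gets the global lock, then commits m = 900 locally.
    trace["tx1_locked"] = locks.acquire("tx1", "a:1")
    m -= 100

    # tx2, phase one: has computed m = 800 but cannot commit without the global lock.
    trace["tx2_locked_before_tx1_ends"] = locks.acquire("tx2", "a:1")

    # tx1, phase two: global commit releases the global lock.
    locks.release_all("tx1")

    # tx2 retries, gets the lock, and commits its local transaction.
    trace["tx2_locked_after_tx1_ends"] = locks.acquire("tx2", "a:1")
    if trace["tx2_locked_after_tx1_ends"]:
        m -= 100
    trace["m"] = m
    return trace


print(run_scenario())
```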

By now the isolation mechanism and most of the phase-one operations should be clear. Next, let's analyze phase two.

2. Phase-two processing in AT mode

8.jpeg

As the figure above shows, for a phase-two commit TC only sends a notification: delete the undo log recorded in phase one and delete the related transaction information such as row locks, so that transactions blocked on lock contention can proceed.

When the second stage is a rollback, more processing is required.

9.jpeg

First, when the client receives a phase-two rollback notification from TC, it looks up the undo log of the corresponding transaction, takes out the after image, and compares it with the current data. (Seata-AT protects distributed transactions at the business application level, so when data is modified directly at the database level, Seata-AT's row locks provide no isolation.) If the data was modified outside the global transaction, this is judged to be a dirty write. Seata cannot know how the dirty write happened, so it can only log the problem and trigger an exception notification telling the user that manual intervention is required. (Standardizing the entry points through which data is modified avoids dirty writes.)

If there is no dirty write, things are relatively simple. A transaction must be atomic: either everything happens or nothing does. The before image records the data as it was before the change, so restoring it achieves an atomic effect similar to a local transaction. After the rollback, transaction-related information such as the undo log and row locks is deleted, and the phase-two rollback is complete.
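The rollback path just described can be sketched as follows, again with sqlite3. The undo-record layout, the rollback_branch function, and the DirtyWriteError exception are assumptions for illustration and do not reflect Seata's actual undo log format.

```python
import sqlite3


class DirtyWriteError(Exception):
    """Raised when the current row no longer matches the after image."""


def rollback_branch(conn, undo):
    """Restore every row to its before image after checking for dirty writes."""
    for before, after in zip(undo["before"], undo["after"]):
        current = conn.execute(
            "select id, name from product where id = ?", (after["id"],)).fetchone()
        # The row must still look exactly as this transaction left it.
        if current != (after["id"], after["name"]):
            raise DirtyWriteError(f"row {after['id']} changed outside the global transaction")
        # Reverse compensation: write the before image back.
        conn.execute("update product set name = ? where id = ?",
                     (before["name"], before["id"]))


conn = sqlite3.connect(":memory:")
conn.execute("create table product (id integer primary key, name text)")
conn.execute("insert into product values (1, 'GTS')")  # state after phase one

undo = {"before": [{"id": 1, "name": "TXC"}], "after": [{"id": 1, "name": "GTS"}]}
rollback_branch(conn, undo)
print(conn.execute("select name from product where id = 1").fetchone()[0])
```

In this sketch a mismatch simply raises, which corresponds to the point where the text says manual intervention is required.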

Now that we have covered the principles and ideas behind phases one and two of AT mode, what does AT look like inside Seata's distributed transaction framework?

10.png

As you can see, compared with the other transaction modes under the Seata transaction framework, AT adds an extra undo log table (its one intrusion point relative to the other modes). Beyond that, it is almost zero-intrusion for the business, which is why AT mode is so widely used in Seata.

3. The difference between AT mode and other two-stage modes supported by Seata

First of all, it should be understood that, so far, no distributed transaction mechanism satisfies every scenario.

Whether AT, TCC, or Saga, these modes essentially arose because the XA specification cannot meet the requirements of certain scenarios.

There are currently 3 points for comparison:

  • Data lock

The AT mode uses a global lock to ensure basic write isolation. In fact, it also locks data, but the lock is centrally managed on the TC side. The unlocking efficiency is high and there is no blocking problem.

TCC mode is lock-free, and uses the exclusive lock feature of local transactions to reserve resources and perform corresponding operations after global transaction resolution.

In XA mode, the data involved stays locked until the entire transaction ends, and reads and writes are restricted according to the configured isolation level.

  • Deadlock (protocol blocking)

After prepare in XA mode (in older database versions, XA END is required before prepare, which is where the "three phases" come from), the branch transaction enters a blocking phase and must block and wait until it receives XA commit or XA rollback.

AT can support downgrade, because the lock is stored on the TC side. If Seata has a bug or other problems, it can be downgraded directly without any impact on the subsequent business call chain.

TCC does not have this problem.

  • Performance

Performance loss mainly comes from two aspects: on the one hand, transaction-related processing and coordination processes increase the RT of a single transaction; on the other hand, the lock conflicts of concurrent transaction data reduce throughput. In fact, the main reason is the above protocol blocking and data locking.

XA mode does not commit in phase one. Under high concurrency, locks are held in multiple resource parties (databases and so on), which aggravates the performance loss.

In AT mode the lock granularity is as fine as row level (a primary key is required), and all transaction locks are stored on the TC side, so unlocking is efficient and fast.

TCC mode has the best performance, requiring only a small amount of RPC overhead plus the cost of two local transactions. However, it only fits resource-reservation scenarios and is more intrusive to the business: developers must split each interface into three methods, one try for phase one, plus confirm and cancel for phase two.

Many readers may not have a clear picture of the locking and protocol blocking in XA and AT, so take a look at the following picture:

11.png

Can you guess which one is XA? It is the lower picture: XA brings a larger lock granularity and a longer lock time, which makes its concurrency much worse than that of the AT transaction model, so XA mode has not been very popular so far.

Seata's recent plans

  • Console

First, the console is a gap that Seata users have pointed out for a long time. The lack of a visual interface makes users doubt Seata's reliability, and the lack of a console limits the possibility of manually intervening in distributed transactions on Seata, among other things. Version 1.5.0 will therefore add a console, and more contributors are welcome to join and build it together!

  • Raft integration

The reason for Raft integration may not be obvious to most users. Today the transaction information on the TC side is stored in external storage such as a database, Redis, or MongoDB (the latter still at the PR stage). If that external storage goes down, the Seata-Server cluster becomes completely unavailable. Even if the server is deployed as a cluster of 10 or more nodes, it will be unavailable for this reason, which is unacceptable.

Therefore, Raft is introduced to make the transaction information of each Seata-Server consistent. Even if a node is down, the accuracy of the transaction information will not be destroyed, so that the consistency of distributed transactions is better guaranteed. (The implementation of Seata-Server raft will be shared in a new chapter later.)

  • undoLog compression

This is a fairly large performance optimization for AT mode in 1.5.0. When a phase-one operation touches a lot of data, the undo log that Seata inserts behind the scenes grows as well, which can make storage slow. The undo log is therefore compressed so that inserting it no longer becomes a major overhead for AT transactions when a branch handles a large volume of data.
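The effect of compressing a large undo log can be shown with a short sketch. The JSON serialization, the use of zlib, and the function names are assumptions chosen for illustration; they are not a statement of which serializer or compression algorithm Seata uses.

```python
import json
import zlib


def compress_undo_log(undo):
    """Serialize an undo record to JSON and compress it before it is stored."""
    raw = json.dumps(undo, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)


def decompress_undo_log(blob):
    """Inverse of compress_undo_log, used when a rollback needs the images."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# A branch that touched many rows produces a large, repetitive undo record.
undo = {
    "before": [{"id": i, "name": "TXC"} for i in range(1000)],
    "after": [{"id": i, "name": "GTS"} for i in range(1000)],
}
raw_size = len(json.dumps(undo, separators=(",", ":")).encode("utf-8"))
blob = compress_undo_log(undo)
print(raw_size, len(blob))
```

Row images of the same table share column names and often values, which is why undo records of this shape tend to compress well.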

Summary

In the final analysis, AT proxies operations on the data source, records the original and changed state of the data, and uses locks to ensure isolation. When an exception occurs in the call chain, the data of all branches is restored, which achieves "atomicity" across the distributed transaction.

What about the future? Redis, MongoDB, MQ? Stay tuned.

The core value of the Seata project is to build a standardized platform for comprehensively solving distributed transaction problems.

Based on Seata, the upper-level application architecture can flexibly choose a suitable distributed transaction solution according to the needs of the actual scenario. Everyone is very welcome to participate in the construction of the project and jointly create a standardized distributed transaction platform.

12.png


Origin blog.51cto.com/13778063/2575768