[Spring Cloud microservices] A detailed explanation of the distributed transaction framework Seata

Table of contents

1. Introduction

2. Transaction Introduction

2.1 Atomicity

2.2 Consistency

2.3 Isolation

2.4 Persistence

3. Distributed transaction scenario

3.1 Origin of distributed transactions

3.2 Typical Scenarios of Distributed Transactions

3.2.1 Cross-database transactions

3.2.2 Sub-database and sub-table

3.2.3 Service-oriented architecture

4. Common solutions for distributed transactions

4.1 Theoretical Basis of Distributed Transactions

4.1.1 2PC two-phase commit protocol

4.1.2 prepare stage

4.1.3 commit stage

4.1.4 Problems with 2PC

5. Distributed transaction Seata

5.1 Introduction to Seata

5.1.1 What is Seata

5.1.2 Why choose Seata

5.2 Several distributed transaction modes commonly used by Seata

5.2.1 AT mode

5.2.2 TCC mode

5.2.3 SAGA mode

5.2.4 XA mode

5.2.5 Analysis of four modes

5.3 Three roles of Seata

5.4 Implementation process in Seata AT mode

5.4.1 Phase 1

5.4.2 Phase 2

5.4.3 Overall process

5.4.4 Advantages of Seata

5.4.5 Problems with Seata


1. Introduction

One problem that comes with the large-scale adoption of microservice architecture is that traditional local transactions evolve into distributed transactions, because each microservice has its own independent database. In the Spring Cloud technology stack, when microservices call each other, the calling service cannot guarantee that the transaction on the called service will succeed. This is the origin of the distributed transaction problem.

2. Transaction Introduction

To better understand the principles of distributed transactions, let us first review the characteristics of transactions.

A transaction has four characteristics: atomicity, consistency, isolation, and durability. Together they are commonly called the ACID properties.

2.1 Atomicity

A transaction is an indivisible unit of work: the operations it contains are either all performed or not performed at all.

2.2 Consistency

A transaction must move the database from one consistent state to another, and the intermediate state of the transaction must not be observable.

2.3 Isolation

The execution of a transaction must not be interfered with by other transactions. That is, the operations and data used within a transaction are isolated from other concurrent transactions, and concurrently executing transactions cannot interfere with each other.

There are four isolation levels:

  • Read uncommitted: the lowest level, at which dirty reads can occur;
  • Read committed: solves the dirty read problem;
  • Repeatable read: solves the non-repeatable read problem;
  • Serializable: solves the phantom read problem;

2.4 Persistence

Persistence, more commonly called durability, means that once a transaction is committed, its changes to the data in the database are permanent. Subsequent operations or failures should not affect them.

Summary

Any transaction mechanism, whether for local or distributed transactions, should be designed with the ACID properties in mind. Even when they cannot all be fully satisfied, you should consider to what extent each one is supported.

3. Distributed transaction scenario

3.1 Origin of distributed transactions

In the following scenario, the order service has to create an order, and its logic calls three methods that each operate on the database. In a monolithic application where they all share one database, all of these operations can be completed in a single local transaction simply by adding a transaction annotation.

As the architecture evolves, a monolith can no longer meet the requirements of high concurrency and high throughput, and splitting the database becomes unavoidable. Under a microservice system, the architecture evolves into the following:

From the above call chain it is clear that in a distributed environment each service has its own independent database. To create an order, the order interface now has to call several microservices (only three service interfaces are considered here). If the call to the SMS service fails while the order is being created, a rollback is required. However, the order-creation logic can only guarantee the integrity of the data in the order database itself; it cannot guarantee whether the SMS service's data can be rolled back normally. This is the distributed transaction problem.

In some large-scale distributed applications, it is very common for one business operation to call 4 or 5 services. In a microservice scenario, distributed transactions are therefore hard to avoid.

3.2 Typical Scenarios of Distributed Transactions

The example above shows that in a microservice scenario, completing one business function often requires spanning multiple services and operating on multiple databases. This is where distributed transactions come in: the resources to be operated on are located on multiple resource servers, and the application needs to ensure that the operations on the data of these resource servers either all succeed or all fail. In essence, distributed transactions exist to guarantee data consistency across different resource servers.

Generally speaking, distributed transactions arise in the following scenarios:

3.2.1 Cross-database transactions

A cross-database transaction arises when a single function of an application has to operate on multiple databases, with different business data stored in different databases. The case above involves cross-database transactions. The following figure shows a service operating on two databases at the same time:

3.2.2 Sub-database and sub-table

Usually, if a database holds a large amount of data, or is expected to in the future, it is split horizontally, that is, into multiple databases and tables. As shown in the figure below, database B is split into two databases:

Consider a single SQL statement such as: insert into user(id,name) values (1, "Zhang San"), (2, "Li Si"). With a single database, the consistency of this statement is guaranteed by the database itself. After the database and tables are split, however, the two rows are inserted into different databases according to a routing rule. Because they are different databases, there is no guarantee that both inserts succeed.
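As a rough illustration of this point, the following sketch routes the two rows to two in-memory lists that stand in for the split databases. The id % shardCount routing rule, the ShardingDemo class, and the failingShard parameter are all assumptions made up for this example; they do not come from any real sharding middleware.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: two in-memory lists stand in for the two split databases.
class ShardingDemo {
    // Assumed routing rule: a row goes to shard (id % shardCount).
    static int shardFor(long id, int shardCount) {
        return (int) (id % shardCount);
    }

    /**
     * Inserts each row into its shard as an independent local operation.
     * failingShard simulates a shard that rejects the write (-1 means none fails).
     * Returns the shards so the caller can inspect the (possibly partial) result.
     */
    static List<List<String>> insertAll(long[] ids, String[] names, int failingShard) {
        List<List<String>> shards = new ArrayList<>();
        shards.add(new ArrayList<>());
        shards.add(new ArrayList<>());
        for (int i = 0; i < ids.length; i++) {
            int shard = shardFor(ids[i], shards.size());
            if (shard == failingShard) {
                continue; // this insert fails; the insert on the other shard is unaffected
            }
            shards.get(shard).add(ids[i] + ":" + names[i]);
        }
        return shards;
    }

    public static void main(String[] args) {
        long[] ids = {1, 2};
        String[] names = {"Zhang San", "Li Si"};
        System.out.println("both shards healthy: " + insertAll(ids, names, -1));
        System.out.println("shard 0 failing:     " + insertAll(ids, names, 0));
    }
}
```

In the second call only one of the two rows is stored, which is exactly the partial result that a single-database insert would never produce.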

3.2.3 Service-oriented architecture

Microservice architecture is currently the mainstream service architecture. After the split into services, each microservice has its own independent database, and the microservices communicate with each other through remote calls using an RPC framework. The following figure again shows the inter-service call scenario from the case above:

Summary

In all of the distributed transaction scenarios discussed above, multiple databases are operated on, directly or indirectly. Guaranteeing the ACID properties of transactions is therefore a major challenge for any distributed transaction solution. At the same time, a distributed transaction solution must also consider performance. If performance degrades severely in order to strictly guarantee ACID, that is unacceptable for businesses that require fast responses.

4. Common solutions for distributed transactions

As microservice architectures have matured, many practical solutions to the distributed transaction problem have emerged, such as:

  • Seata, Alibaba's open source distributed transaction framework;
  • Message queue solutions, such as the RocketMQ transactional message mechanism;
  • Saga;
  • XA;
  • ...

They have one thing in common: each is an implementation of the "two-phase (2PC)" idea, in which the entire distributed transaction is completed in two steps.

In fact, these four common distributed transaction solutions correspond to the four distributed transaction modes: AT, TCC, Saga, and XA. Each of the four modes has its own theoretical foundation and was proposed at a different time. Each mode also has its applicable scenarios and its own representative products, and these representative products may be the ones we commonly hear about (global transactions, reliable-message-based transactions, best-effort notification, TCC). Therefore, before studying a specific distributed transaction solution, it is worth systematically understanding the theoretical basis of distributed transactions.

4.1 Theoretical Basis of Distributed Transactions

The protocols related to distributed transactions include 2PC and 3PC. Because the three-phase commit protocol (3PC) is very difficult to implement, the mainstream distributed transaction solutions on the market today are all based on 2PC.

4.1.1 2PC two-phase commit protocol

As the name suggests, it is divided into two phases: Prepare and Commit.

The 2PC execution flow is as follows:

4.1.2 Prepare stage

Prepare is the stage in which the transaction request is submitted. According to the figure above, it consists of the following steps:

1. Inquiry: the coordinator sends a transaction request to all participants, asks whether the transaction operation can be performed, and then waits for each participant's response.

2. Execution: after receiving the transaction request from the coordinator, each participant performs the transaction operations (such as updating records in a relational database table) and records Undo and Redo information in the transaction log.

3. Response: if the participant executes the transaction successfully and writes the Undo and Redo information, it returns a YES response to the coordinator; otherwise it returns a NO response. A participant may also be down and return no response at all.

4.1.3 Commit stage

The commit stage is where the transaction is finalized, by either a normal commit or a rollback.

Normal commit

According to the figure above, it consists of the following steps:

1. Commit request: the coordinator sends a Commit request to all participants;

2. Transaction commit: after receiving the Commit request, each participant commits the transaction and, once the commit is complete, releases all the resources held during transaction execution;

3. Feedback: each participant sends an Ack response to the coordinator after committing the transaction;

4. Completion: after receiving Ack responses from all participants, the coordinator completes the transaction;

Transaction interruption

If, during the Prepare stage, a participant fails to execute the transaction, goes down, or loses its network connection to the coordinator, the coordinator cannot collect YES responses from all participants. The same applies if any participant returns a NO response. In either case the coordinator enters the rollback process and rolls back the transaction.

The process is shown in the red part of the figure below (the Commit request is replaced with a red Rollback request):

According to the figure above, it consists of the following steps:

1. Rollback request: the coordinator sends a Rollback request to all participants;

2. Transaction rollback: after receiving the Rollback request, each participant uses the Undo log written in the Prepare stage to roll back the transaction, and then releases all the resources held during transaction execution;

3. Feedback: each participant sends an Ack response to the coordinator after rolling back the transaction;

4. Interruption: after receiving Ack responses from all participants, the coordinator completes the transaction interruption;
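The Prepare and Commit stages described above can be sketched as a small in-memory simulation. This is only a minimal sketch: the Participant interface, the InMemoryParticipant class, and the state names are invented for illustration, and real concerns such as timeouts, Ack handling, and persistence of the transaction log are left out.

```java
import java.util.List;

class TwoPhaseCommitDemo {
    /** A transaction participant; the interface and its method names are illustrative. */
    interface Participant {
        boolean prepare();  // execute the work, record undo/redo, vote YES (true) or NO (false)
        void commit();      // make the prepared work permanent and release resources
        void rollback();    // undo the prepared work and release resources
    }

    /** In-memory participant that only records which state it ended up in. */
    static class InMemoryParticipant implements Participant {
        final boolean voteYes;
        String state = "INIT";

        InMemoryParticipant(boolean voteYes) {
            this.voteYes = voteYes;
        }

        @Override
        public boolean prepare() {
            state = voteYes ? "PREPARED" : "FAILED";
            return voteYes;
        }

        @Override
        public void commit() {
            state = "COMMITTED";
        }

        @Override
        public void rollback() {
            state = "ROLLED_BACK";
        }
    }

    /** Coordinator logic: returns true if the global transaction was committed. */
    static boolean run(List<? extends Participant> participants) {
        // Stage 1 (Prepare): ask every participant and collect the votes.
        boolean allYes = true;
        for (Participant p : participants) {
            boolean vote;
            try {
                vote = p.prepare();
            } catch (RuntimeException e) {
                vote = false; // a crashed participant is treated as a NO vote
            }
            allYes = allYes && vote;
        }
        // Stage 2 (Commit): commit only if every vote was YES, otherwise roll back everyone.
        for (Participant p : participants) {
            if (allYes) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allYes;
    }

    public static void main(String[] args) {
        List<InMemoryParticipant> participants =
                List.of(new InMemoryParticipant(true), new InMemoryParticipant(false));
        System.out.println("committed = " + run(participants));
        for (InMemoryParticipant p : participants) {
            System.out.println(p.state);
        }
    }
}
```

A single NO vote is enough to send every participant, including those that voted YES, down the rollback path.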

4.1.4 Problems with 2PC

In practice, the 2PC protocol also introduces problems of its own. The main ones are as follows.

Synchronous blocking

While a participant waits for the coordinator's instruction, it is in effect waiting for the responses of the other participants. During this time the participant cannot do other work; it is blocked. If a network fault between the participant and the coordinator prevents the participant from receiving the coordinator's message, the participant stays blocked indefinitely.

Single point problem

In 2PC all requests come from the coordinator, so the coordinator's role is critical. If the coordinator goes down, the participants remain blocked and keep holding transaction resources. If the coordinator is itself distributed and uses leader election, then after one coordinator fails another can be elected to take over, which addresses the single point of failure. However, the new coordinator does not know the full state of the in-flight transaction (for example, how long it has already waited for Prepare responses), so the in-flight transaction cannot be handled smoothly.

Data inconsistency

During the commit stage, a Commit/Rollback request may be lost because the coordinator goes down or the network fails. Some participants then never receive the request, while others receive it and execute the Commit/Rollback normally. The participants that did not receive the request remain blocked, and at this point the data across participants is no longer consistent. After executing Commit/Rollback, a participant sends an Ack to the coordinator. However, whether or not the coordinator receives Acks from all participants, it has no further remedy for the transaction. All it can do is wait for the timeout and then tell the transaction initiator: "I am not sure whether this transaction succeeded."

Dependence on environmental reliability

After sending the Prepare request, the coordinator waits for responses. If any participant goes down or loses its network connection to the coordinator, the coordinator cannot collect responses from all participants. In 2PC the coordinator waits for a certain period and then, after the timeout, triggers a transaction interruption; during this period the coordinator and all other participants are blocked. This mechanism is too harsh for real-world environments, where network problems are common.

5. Distributed transaction Seata

5.1 Introduction to Seata

5.1.1 What is Seata

Seata is an open source distributed transaction solution from the Alibaba team, dedicated to providing high-performance and easy-to-use distributed transaction services. Seata provides the AT, TCC, Saga, and XA transaction modes, giving users a one-stop distributed transaction solution.

5.1.2 Why choose Seata

  • Distributed transactions face the CAP problem
  • It is open sourced by the Alibaba team, maintained by developers around the world, and has technical support
  • It is mature in distributed applications, with many use cases at large companies
  • It provides several distributed transaction modes (AT, TCC, Saga, XA)

5.2 Several distributed transaction modes commonly used by Seata

Seata provides several common distributed transaction modes, introduced below.

5.2.1 AT mode

AT mode is a non-intrusive distributed transaction solution implemented by Seata. In AT mode, users only need to focus on their own "business SQL"; Seata automatically generates the two-phase commit and rollback operations for the transaction.

Based on the figure above, the way AT mode achieves zero intrusion into the business can be summarized as follows:

Phase one

In phase one, Seata intercepts the "business SQL" and first parses its semantics to find the business data that the SQL will update. Before the data is updated, Seata saves it as a "before image"; it then executes the "business SQL" to update the data. After the update, Seata saves the data as an "after image" and finally generates a row lock. All of these operations are completed within one database transaction, which guarantees the atomicity of phase one.

Phase two: commit

If phase two is a commit, the "business SQL" has already been committed to the database in phase one, so the Seata framework only needs to delete the snapshot data and row locks saved in phase one to complete the cleanup.

Phase two: rollback

If phase two is a rollback, Seata must undo the "business SQL" executed in phase one and restore the business data. It does so by using the "before image". Before restoring, however, it must check for dirty writes by comparing the current business data in the database with the "after image". If the two are identical, there has been no dirty write and the business data can be restored. If they differ, a dirty write has occurred and must be handled manually.

The phase-one and phase-two commit and rollback operations of AT mode are generated automatically by the Seata framework. Users only need to write "business SQL" to take part in distributed transactions, which makes AT mode a distributed transaction solution with no intrusion into the business.
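The before image, after image, and dirty-write check can be illustrated with a toy model in which a map plays the role of a database table. This is not Seata's actual implementation: the AtModeUndoDemo class, the UndoLog record, and the method names are hypothetical, and row locks and the surrounding local transaction are omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class AtModeUndoDemo {
    /** Hypothetical undo record: the row's value before and after the business update. */
    static class UndoLog {
        final String key;
        final Integer beforeImage;
        final Integer afterImage;

        UndoLog(String key, Integer beforeImage, Integer afterImage) {
            this.key = key;
            this.beforeImage = beforeImage;
            this.afterImage = afterImage;
        }
    }

    /** Phase one: capture the before image, apply the update, capture the after image. */
    static UndoLog phaseOne(Map<String, Integer> table, String key, int newValue) {
        Integer before = table.get(key);
        table.put(key, newValue); // stands in for the "business SQL"
        Integer after = table.get(key);
        return new UndoLog(key, before, after);
    }

    /**
     * Phase two rollback: restore the before image, but only if the current value
     * still equals the after image. Returns false when a dirty write is detected.
     */
    static boolean rollback(Map<String, Integer> table, UndoLog log) {
        if (!Objects.equals(table.get(log.key), log.afterImage)) {
            return false; // someone else changed the row: needs manual handling
        }
        if (log.beforeImage == null) {
            table.remove(log.key); // the row did not exist before the update
        } else {
            table.put(log.key, log.beforeImage);
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = new HashMap<>();
        stock.put("item-1", 10);

        UndoLog log = phaseOne(stock, "item-1", 8);
        System.out.println("after phase one: " + stock);
        System.out.println("rollback ok:     " + rollback(stock, log));
        System.out.println("after rollback:  " + stock);
    }
}
```

A phase-two commit needs no counterpart here beyond discarding the UndoLog, since the update is already in the table.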

5.2.2 TCC mode

TCC mode is also an implementation of the two-phase protocol. It requires users to implement three operations, Try, Confirm, and Cancel, according to their own business scenarios. The transaction initiator executes the Try method in phase one, executes the Confirm method if phase two is a commit, and executes the Cancel method if phase two is a rollback. When using this mode, note the following two points:

1. It is relatively intrusive: you have to implement the transaction control logic yourself;
2. There are basically no locks in the whole process, so performance is better;

The entire execution process of TCC is shown in the figure below:

In TCC mode, there are three important methods:

  • Try: checks and reserves resources;
  • Confirm: commits the business operation; the requirement is that if Try succeeds, Confirm must succeed;
  • Cancel: releases the reserved resources;

Since the two-phase commit model has been explained in detail above, it is not repeated here.
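To make the three methods more concrete, here is a minimal sketch that uses an account with an available balance and a frozen amount. The Account class, its field names, and the deductFromAll helper are assumptions for illustration only; this is plain Java, not Seata's TCC API.

```java
class TccAccountDemo {
    /** Hypothetical account with an available balance and a frozen (reserved) amount. */
    static class Account {
        int available;
        int frozen;

        Account(int available) {
            this.available = available;
        }

        /** Try: check the resource and reserve it by moving it to the frozen amount. */
        boolean tryReserve(int amount) {
            if (available < amount) {
                return false;
            }
            available -= amount;
            frozen += amount;
            return true;
        }

        /** Confirm: consume the reservation made by Try. */
        void confirm(int amount) {
            frozen -= amount;
        }

        /** Cancel: release the reservation and give the amount back. */
        void cancel(int amount) {
            frozen -= amount;
            available += amount;
        }
    }

    /** Initiator: Try on every account, then Confirm all or Cancel the ones that were reserved. */
    static boolean deductFromAll(Account[] accounts, int amount) {
        int reserved = 0;
        for (Account account : accounts) {
            if (!account.tryReserve(amount)) {
                break; // phase one failed for this participant
            }
            reserved++;
        }
        boolean commit = reserved == accounts.length;
        for (int i = 0; i < reserved; i++) {
            if (commit) {
                accounts[i].confirm(amount);
            } else {
                accounts[i].cancel(amount);
            }
        }
        return commit;
    }

    public static void main(String[] args) {
        Account[] accounts = {new Account(100), new Account(30)};
        System.out.println("deduct 50 from each: " + deductFromAll(accounts, 50));
        System.out.println("balances: " + accounts[0].available + ", " + accounts[1].available);
    }
}
```

Because Try only reserves the amount, Cancel can undo it without any database lock being held between the two phases.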

5.2.3 SAGA mode

Saga mode is a long-running transaction solution.

Saga is a compensation protocol. In Saga mode a distributed transaction has multiple participants, and each participant is a service whose effect can be offset by compensation. Users implement a forward operation and a reverse (rollback) operation for each participant according to the business scenario. As shown in the figure below, T1~T3 are forward business operations, and each has a corresponding reverse operation C1~C3.

Main features of SAGA mode

  • During the execution of the distributed transaction, the forward operations of the participants are executed in sequence. If all forward operations succeed, the distributed transaction is committed;
  • If any forward operation fails, the distributed transaction goes back and executes the reverse (rollback) operations of the preceding participants, rolling back the participants that have already committed and returning the distributed transaction to its initial state;
  • The Saga forward service and compensation service must both be implemented by business developers, so the mode is intrusive to the business;
  • Distributed transactions in Saga mode are usually event-driven, and participants execute asynchronously. Saga mode is a long-running transaction solution;
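The forward and compensation flow in the list above can be sketched as follows. The Step class and the log labels (T, C, FAIL, COMMIT) are invented for this example, and the steps run synchronously here for simplicity, whereas the text describes event-driven, asynchronous execution.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SagaDemo {
    /** One saga step: a forward operation T and its compensation C (illustrative only). */
    static class Step {
        final String name;
        final boolean succeeds; // whether the forward operation succeeds in this simulation

        Step(String name, boolean succeeds) {
            this.name = name;
            this.succeeds = succeeds;
        }
    }

    /**
     * Runs the forward operations in order. If one fails, compensates the already
     * completed steps in reverse order. Returns the sequence of events as a log.
     */
    static List<String> run(List<Step> steps) {
        List<String> log = new ArrayList<>();
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            if (step.succeeds) {
                log.add("T:" + step.name);
                completed.push(step); // most recently completed step ends up on top
            } else {
                log.add("FAIL:" + step.name);
                while (!completed.isEmpty()) {
                    log.add("C:" + completed.pop().name);
                }
                return log;
            }
        }
        log.add("COMMIT");
        return log;
    }

    public static void main(String[] args) {
        List<Step> steps = List.of(
                new Step("order", true),
                new Step("inventory", true),
                new Step("points", false));
        System.out.println(run(steps));
    }
}
```

Each forward operation has already committed locally by the time a later one fails, which is why compensation, rather than a database rollback, is the only way back.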

Saga mode usage scenarios

1) Saga mode suits business systems with long business processes that need to guarantee eventual consistency. In Saga mode, local transactions are committed in phase one; there are no locks, so performance can be maintained even for long processes.

2) Transaction participants may be services of other companies or legacy systems that cannot be modified to provide the interfaces TCC requires. Saga mode can be used in such cases.

Advantages of Saga mode

1) Local database transactions are committed in phase one, lock-free, with high performance;
2) Participants can execute asynchronously in an event-driven way, with high throughput;
3) The compensation service is simply the "reverse" of the forward service, which is easy to understand and implement;

Disadvantages of Saga mode

Saga mode cannot guarantee isolation, because local database transactions are already committed in phase one and no "reserve" action is performed. Responses to the lack of isolation will be discussed later.

As with TCC in practice, the forward and reverse operations of each transaction participant in Saga mode need to support the following:

  • Empty compensation: the reverse operation is executed before the forward operation;
  • Anti-suspension control: the forward operation is rejected after an empty compensation;
  • Idempotency;
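One common way to meet these three requirements is to keep a small status record per transaction and consult it in both operations. The sketch below assumes such a record exists; the CompensationGuardDemo class, the status values, and the in-memory map are hypothetical, and a real service would persist this state alongside its business data.

```java
import java.util.HashMap;
import java.util.Map;

class CompensationGuardDemo {
    // Per-transaction status record (hypothetical; kept in memory only for this sketch).
    private final Map<String, String> status = new HashMap<>();
    int balance = 100;

    /** Forward operation. Returns false if it is rejected. */
    boolean forward(String xid, int amount) {
        String current = status.get(xid);
        if ("COMPENSATED".equals(current)) {
            return false; // anti-suspension: compensation already ran, reject the late forward call
        }
        if ("DONE".equals(current)) {
            return true; // idempotency: a repeated forward call has no further effect
        }
        balance -= amount;
        status.put(xid, "DONE");
        return true;
    }

    /** Reverse (compensation) operation. */
    void compensate(String xid, int amount) {
        String current = status.get(xid);
        if ("COMPENSATED".equals(current)) {
            return; // idempotency: a repeated compensation has no further effect
        }
        if ("DONE".equals(current)) {
            balance += amount; // normal compensation: undo the forward operation
        }
        // Otherwise this is an empty compensation: nothing to undo, only record it.
        status.put(xid, "COMPENSATED");
    }

    public static void main(String[] args) {
        CompensationGuardDemo service = new CompensationGuardDemo();
        service.compensate("tx-1", 10);                      // arrives before the forward call
        System.out.println(service.forward("tx-1", 10));     // rejected
        System.out.println(service.balance);                 // balance unchanged
    }
}
```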

5.2.4 XA mode

XA is a two-phase commit protocol defined by the X/Open DTP group. XA is natively supported by many databases (such as Oracle, DB2, SQL Server, and MySQL) and middleware tools (such as CICS and Tuxedo).

A few points about XA:

1. The XA interface functions are provided by database vendors. The XA specification is based on the two-phase commit protocol (2PC);

2. JTA (Java Transaction API) is the Java interface that implements and extends the XA specification;

XA mode requires a global coordinator. In the first step, each database completes its branch of the transaction, performs the phase-one pre-commit, and reports the result to the coordinator. Once the coordinator has seen all branch transactions complete and pre-commit, it proceeds to the second step, in which it notifies each database, one by one, to commit or roll back. The global coordinator is the TM role in the XA model, and the database of each branch transaction is an RM.

MySQL provides an XA implementation (https://dev.mysql.com/doc/refman/5.7/en/xa.html). Open source frameworks for XA mode include Atomikos, whose vendor also offers a commercial version.

Disadvantages of XA mode

The transaction granularity is coarse, and system availability is low under high concurrency, so it is rarely used.

5.2.5 Analysis of four modes

The four distributed transaction modes, AT, TCC, Saga, and XA, were proposed at different times, and each has its applicable scenarios.

  • AT mode is a non-intrusive distributed transaction solution, suitable for scenarios where you do not want to modify the business code, with almost zero learning cost;
  • TCC mode is a high-performance distributed transaction solution, suitable for scenarios with high performance requirements such as core systems;
  • Saga mode is a long-running transaction solution, suitable for business systems with long business processes that need to guarantee eventual consistency. Saga mode commits local transactions in phase one without locks and can maintain performance in long processes. It is mostly used in channel-layer and integration-layer business systems. When transaction participants are services of other companies or legacy systems that cannot be modified to provide the interfaces TCC requires, Saga mode can also be used;
  • XA mode is a strongly consistent distributed solution, but its performance is low and it is rarely used;

Summary

Distributed transactions are themselves a hard technical problem. Which solution to use still has to be chosen according to the characteristics of the business. We will also find that distributed transactions greatly increase process complexity and bring a lot of extra overhead: more code, more complicated business logic, and lower performance. So in actual development, if distributed transactions can be avoided, avoid them.

5.3 Three roles of Seata

Before coding, it helps to understand the relevant Seata terminology so as not to be confused later.

In Seata's architecture, there are three roles:

TC

(Transaction Coordinator) - The transaction coordinator maintains the state of global and branch transactions and drives the commit or rollback of global transactions.

TM

(Transaction Manager) - The transaction manager defines the scope of a global transaction: it begins a global transaction and commits or rolls back a global transaction.

RM

(Resource Manager) - The resource manager manages the resources of branch transactions, talks to the TC to register branch transactions and report their status, and drives the commit or rollback of branch transactions.

Among them, TC is a separately deployed server, while TM and RM are clients embedded in the application.

In Seata, the life cycle of a distributed transaction is as follows:

With reference to the figure above, the execution process is as follows:

1. The TM asks the TC to begin a global transaction. The TC generates an XID as the identifier of the global transaction. The XID is generated when the transactional method is entered and is propagated along the microservice call chain so that the sub-transactions of multiple microservices are associated with each other. global_table stores the global transaction information;

2. The RM asks the TC to register its local transaction as a branch transaction of the global transaction, associated through the XID of the global transaction. This happens when database operations are executed, and branch_table stores the transaction participants;

3. The TM asks the TC to commit or roll back the global transaction identified by the XID;

4. The TC drives the RMs to commit or roll back their local transactions associated with the XID;
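The four steps can be traced with a toy coordinator that keeps two in-memory maps in place of global_table and branch_table. Everything in this sketch (the SeataLifecycleDemo class, the Coordinator and Branch types, the xid-N format, and the status strings) is made up for illustration and does not reflect Seata's real classes or wire protocol.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SeataLifecycleDemo {
    /** Stand-in for an RM's local branch transaction. */
    static class Branch {
        final String name;
        String state = "REGISTERED";

        Branch(String name) {
            this.name = name;
        }
    }

    /** Toy TC that keeps the global and branch records in memory. */
    static class Coordinator {
        private int sequence = 0;
        final Map<String, String> globalTable = new HashMap<>();       // XID -> global status
        final Map<String, List<Branch>> branchTable = new HashMap<>(); // XID -> registered branches

        // Step 1: the TM asks the TC to begin a global transaction and receives an XID.
        String begin() {
            String xid = "xid-" + (++sequence);
            globalTable.put(xid, "BEGIN");
            branchTable.put(xid, new ArrayList<>());
            return xid;
        }

        // Step 2: an RM registers its local transaction as a branch under the XID.
        Branch registerBranch(String xid, String name) {
            Branch branch = new Branch(name);
            branchTable.get(xid).add(branch);
            return branch;
        }

        // Steps 3 and 4: the TM reports the outcome, and the TC drives every branch accordingly.
        void end(String xid, boolean commit) {
            String outcome = commit ? "COMMITTED" : "ROLLED_BACK";
            for (Branch branch : branchTable.get(xid)) {
                branch.state = outcome;
            }
            globalTable.put(xid, outcome);
        }
    }

    /** Plays the TM role for one order with two branches (order and inventory). */
    static Coordinator runOrder(boolean businessSucceeds) {
        Coordinator tc = new Coordinator();
        String xid = tc.begin();
        tc.registerBranch(xid, "order");
        tc.registerBranch(xid, "inventory");
        tc.end(xid, businessSucceeds);
        return tc;
    }

    public static void main(String[] args) {
        Coordinator tc = runOrder(false);
        System.out.println(tc.globalTable);
        for (Branch branch : tc.branchTable.get("xid-1")) {
            System.out.println(branch.name + " -> " + branch.state);
        }
    }
}
```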

5.4 Implementation process in Seata AT mode

Building on the process from the official website described above, this section walks through the complete process of implementing a distributed transaction in Seata's AT mode. See the official documentation for details.

The core of AT mode is that it does not intrude on the business. It is an improved two-phase commit. Its design is shown in the figure:

5.4.1 Phase 1

Business data and the rollback log record are committed in the same local transaction, after which local locks and connection resources are released. The core of this phase is parsing the business SQL, converting it into an undo log, and storing the undo log in the database within that same transaction. How is this done? The key concept is DataSourceProxy, a proxy data source. The name already suggests what it does; a detailed analysis is left for later.

5.4.2 Phase 2

If the distributed transaction succeeds, the TC notifies the RMs to delete the undo log asynchronously.

If the distributed transaction fails, the TM sends a rollback request to the TC. Each RM then receives a rollback request from the TC, finds the corresponding rollback log record by XID and Branch ID, generates reverse update SQL from that record, and executes it to complete the rollback of the branch.

5.4.3 Overall process

Expressed as pseudocode, the flow looks like this:

business service createOrder {
    inventory service: deduct inventory
    points service: increase points
}

From the code point of view, the implementation is as follows:

  • On the TM side, the @GlobalTransactional annotation is used to begin, commit, and roll back the global transaction;
  • On the RM side, Seata implements DataSourceProxy by extending DataSource; the proxy automatically writes the undo_log and reports to the TC;
  • The TC side is implemented by seata-server (a Java service that can be downloaded from the official website);

5.4.4 Advantages of Seata

Compared with other distributed transaction frameworks, the Seata architecture has several highlights:

1. The application layer implements automatic compensation based on SQL parsing, which minimizes business intrusion;
2. The TC (transaction coordinator) is deployed independently and is responsible for transaction registration and rollback;
3. Write isolation and read isolation are implemented through global locks.

5.4.5 Problems with Seata

Performance loss

A single UPDATE statement requires obtaining the global transaction XID (communication with the TC), building the before image (parsing the SQL and querying the database once), building the after image (querying the database once), inserting the undo log (writing to the database once), and a pre-commit check (communication with the TC to detect lock conflicts). These operations involve remote RPC communication and are synchronous. In addition, inserting the blob field when writing the undo log is not fast. Every write SQL carries all of this overhead, and a rough estimate is that response time increases by a factor of 5.

Cost-benefit evaluation

To support automatic compensation, before and after images must be captured and persisted for every transaction. But in real business scenarios, how high is the success rate? In other words, what percentage of distributed transactions fail and need to be rolled back? Under the 80/20 rule, in order to roll back 20% of transactions, the response time of the 80% of transactions that succeed is increased by a factor of 5. Is that cost worthwhile compared with having the application implement a compensating transaction itself?

Global lock

Compared with XA, Seata does release the database lock after phase one succeeds, but the global lock check before the phase-one commit also lengthens the time the data lock is held. How much lower this overhead is than XA's prepare depends on the actual business scenario and needs to be tested. The global lock provides isolation, but the cost is blocking and reduced concurrency, and the problem is more serious for hot data.

Slow lock release on rollback

When Seata rolls back, it must delete the undo log on each node before the lock held in TC memory can be released, so if phase two is a rollback, releasing the lock takes longer.

Deadlock problem

Seata's global lock adds extra deadlock risk. When a deadlock occurs, Seata keeps retrying and ultimately relies on the global lock wait timing out. This approach is not elegant, and it also lengthens the time the database lock is held.

6. Closing remarks

This article has covered the theory behind the distributed transaction framework Seata at some length. This background is necessary both for integrating Seata in real projects and for understanding other distributed transaction frameworks. I hope it is useful to readers.

Origin blog.csdn.net/congge_study/article/details/130024795