[Microservices] Introduction to Distributed Transactions

1. Introduction to transactions

A transaction is a program execution unit that accesses and possibly updates various data items in a database. In relational databases, a transaction consists of a set of SQL statements. Transactions should have four properties: atomicity, consistency, isolation, and durability, commonly referred to as the ACID properties.

Atomicity: A transaction is an indivisible unit of work; the operations it contains are either all performed or none are.
Consistency: A transaction must move the database from one consistent state to another, and its intermediate states must not be observable.
Isolation: The execution of a transaction must not be interfered with by other transactions. That is, the operations inside a transaction and the data it uses are isolated from other concurrent transactions, so concurrently executing transactions cannot interfere with each other. There are four isolation levels: read uncommitted; read committed (prevents dirty reads); repeatable read (prevents non-repeatable reads); and serializable (prevents phantom reads).
Durability: Durability, also called permanence, means that once a transaction is committed, its changes to the database are permanent. Subsequent operations or failures should not affect them.
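To make the isolation levels concrete, here is a minimal Java sketch that records, for each standard level, the java.sql.Connection constant one would pass to setTransactionIsolation and the anomalies the SQL standard requires it to prevent. The enum names Anomaly and IsolationLevel and the helper weakestPreventing are illustrative, not part of JDBC, and actual databases may prevent more than this minimum.

```java
import java.sql.Connection;
import java.util.EnumSet;
import java.util.Set;

// Read anomalies that the isolation levels are defined against.
enum Anomaly { DIRTY_READ, NON_REPEATABLE_READ, PHANTOM_READ }

// Standard isolation levels, ordered weakest to strongest, with their JDBC constants
// and the anomalies the SQL standard requires each level to prevent.
enum IsolationLevel {
    READ_UNCOMMITTED(Connection.TRANSACTION_READ_UNCOMMITTED, EnumSet.noneOf(Anomaly.class)),
    READ_COMMITTED(Connection.TRANSACTION_READ_COMMITTED, EnumSet.of(Anomaly.DIRTY_READ)),
    REPEATABLE_READ(Connection.TRANSACTION_REPEATABLE_READ,
            EnumSet.of(Anomaly.DIRTY_READ, Anomaly.NON_REPEATABLE_READ)),
    SERIALIZABLE(Connection.TRANSACTION_SERIALIZABLE, EnumSet.allOf(Anomaly.class));

    final int jdbcLevel;          // value for Connection.setTransactionIsolation(int)
    final Set<Anomaly> prevents;  // anomalies ruled out at this level

    IsolationLevel(int jdbcLevel, Set<Anomaly> prevents) {
        this.jdbcLevel = jdbcLevel;
        this.prevents = prevents;
    }

    // Returns the weakest level that prevents the given anomaly.
    static IsolationLevel weakestPreventing(Anomaly anomaly) {
        for (IsolationLevel level : values()) {  // values() follows declaration order
            if (level.prevents.contains(anomaly)) {
                return level;
            }
        }
        throw new IllegalArgumentException("No level prevents " + anomaly);
    }
}
```

With a real connection, one would call conn.setTransactionIsolation(IsolationLevel.SERIALIZABLE.jdbcLevel) before starting the transaction.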
Any transaction mechanism, whether for local or distributed transactions, should be evaluated against the ACID properties; where they cannot all be fully satisfied, the degree to which each one is supported must also be considered.

2. Local transactions

In most scenarios, an application only needs to operate on a single database; a transaction in this case is called a local transaction. The ACID properties of a local transaction are supported directly by the database. The local transaction application architecture is as follows:
[Figure: local transaction application architecture]

In JDBC programming, we use java.sql.Connection objects to start, commit, or roll back transactions and to close the connection. The code looks like this:

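Connection conn = ...; // obtain a database connection
conn.setAutoCommit(false); // start the transaction
try {
    // ... execute the INSERT/UPDATE/DELETE/SELECT statements
    conn.commit(); // commit the transaction
} catch (Exception e) {
    conn.rollback(); // roll back the transaction on any failure
    throw e; // rethrow so the caller sees the failure
} finally {
    conn.close(); // close the connection
}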

3. Typical scenarios of distributed transactions

With the rapid growth of the Internet, most companies have split their databases and adopted service-oriented architecture (SOA). Completing a business function may then require calls across multiple services and operations on multiple databases, which is where distributed transactions come in. The resources involved live on multiple resource servers, and the application must ensure that its operations on those servers either all succeed or all fail. In essence, a distributed transaction guarantees data consistency across different resource servers.
Typical distributed transaction scenarios:

3.1) Cross-database transactions

Cross-database transactions occur when a single function of an application needs to operate on multiple databases, each storing different business data. The author has seen a fairly complex business that operated on 9 databases at the same time. The following figure shows a service operating on 2 databases at the same time:
[Figure: one service operating on two databases at the same time]

3.2) Database and table sharding

A database that holds, or is expected to hold, a large volume of data is usually split horizontally, i.e. sharded into multiple databases and tables. As shown in the figure below, database B is split into 2 databases:
[Figure: database B split into two databases]

For sharded databases and tables, developers typically use database middleware to reduce the complexity of SQL operations. Take the SQL insert into user(id, name) values (1, "Zhang San"), (2, "Li Si"). This is the syntax for operating on a single database, and on a single database its transactional consistency is guaranteed.

With sharding in place, however, the developer wants record 1 inserted into shard 1 and record 2 into shard 2. The middleware therefore has to rewrite the statement into two SQL statements and execute them against two different databases, and it must ensure that both succeed or both fail. This is why virtually all database middleware faces the problem of distributed transactions.
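The rewrite the middleware performs can be sketched as follows. This is a minimal illustration, not any particular middleware's code: the routing rule (odd ids to shard 1, even ids to shard 2) and the names UserRow, ShardRouter, and rewriteInsert are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// One row of the user table from the example SQL.
final class UserRow {
    final long id;
    final String name;

    UserRow(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical router: the rule "(id - 1) % shardCount + 1" sends id 1 to shard 1,
// id 2 to shard 2, id 3 back to shard 1, and so on.
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    int shardOf(long id) {
        return (int) ((id - 1) % shardCount) + 1;
    }

    // Splits one multi-row INSERT into one INSERT per shard, keyed by shard number.
    // Values are inlined only for readability; real middleware would bind parameters.
    Map<Integer, String> rewriteInsert(List<UserRow> rows) {
        Map<Integer, List<UserRow>> rowsByShard = rows.stream()
                .collect(Collectors.groupingBy(r -> shardOf(r.id), TreeMap::new, Collectors.toList()));
        Map<Integer, String> sqlByShard = new TreeMap<>();
        rowsByShard.forEach((shard, shardRows) -> sqlByShard.put(shard,
                "insert into user(id, name) values " + shardRows.stream()
                        .map(r -> "(" + r.id + ", '" + r.name + "')")
                        .collect(Collectors.joining(", "))));
        return sqlByShard;
    }
}
```

Once the two statements go to different databases, a local transaction on either one can no longer make them succeed or fail together; that is the gap a distributed transaction has to fill.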

3.3) Service decomposition

Microservice architecture is a popular concept at present. Take the case mentioned above in which one application operates on 9 databases at the same time: the business logic of such an application is bound to be very complex and a great challenge for developers, so it should be split into separate, independent services to simplify the business logic. After the split, the independent services communicate with each other through remote calls made with an RPC framework. The following figure shows an architecture in which 3 services call each other:
[Figure: three services calling each other]

Service A operates on a database directly to complete a function, and must also call Service B and Service C; Service B operates on two databases, and Service C operates on one. All of these cross-service operations on multiple databases must either all succeed or all fail. This is probably the most typical distributed transaction scenario.

Summary: Every distributed transaction scenario above involves operating on multiple databases, directly or indirectly. Guaranteeing the ACID properties is a major challenge for any distributed transaction solution. A solution must also consider performance: if strictly guaranteeing ACID degrades performance severely, it is unacceptable for services that require fast responses.

4. X/Open DTP model and XA specification

X/Open, now The Open Group, is an independent organization responsible for formulating industry technical standards. For Distributed Transaction Processing (DTP), X/Open mainly provides the following reference documents:
DTP Reference Model: Distributed Transaction Processing: Reference Model
DTP XA Specification: Distributed Transaction Processing: The XA Specification

4.1 DTP model

The five basic elements that make up the DTP model:
Application Program (AP): defines transaction boundaries (where a transaction starts and ends) and operates on resources within those boundaries.
Resource Manager (RM): for example a database or file system; provides access to resources.
Transaction Manager (TM): assigns each transaction a unique identifier, monitors its progress, and is responsible for committing or rolling it back.
Communication Resource Manager (CRM): controls communication between distributed applications within a TM domain or across TM domains.
Communication Protocol (CP): provides the underlying communication services between distributed application nodes, which the CRM uses.

4.2 XA specification

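A local instance of the DTP model consists of an AP, RMs, and a TM; no other elements are needed. The AP, RMs, and TM all need to interact with one another, as shown in the figure below: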

[Figure: interactions among the AP, RMs, and TM]

In this figure, (1) represents the interaction interface of AP-RM, (2) represents the interaction interface of AP-TM, and (3) represents the interaction interface of RM-TM.

The main purpose of the XA specification is to define the RM-TM interface (the XA interface). Beyond defining this interface, the XA specification also optimizes the two-phase commit protocol.

The two-phase commit protocol was proposed in the OSI TP standard. The DTP reference model (<<Distributed Transaction Processing: Reference Model>>) specifies that global transactions are committed using two-phase commit, while the XA specification (<<Distributed Transaction Processing: The XA Specification>>) only defines the interfaces used during two-phase commit, namely the RM-TM interfaces mentioned above, because the only participants in the two-phase commit process are the TM and the RMs.

5. Two-Phase Commit Protocol (2PC)

Two-phase commit was not proposed by the XA specification, but XA optimizes it. As the name suggests, two-phase commit splits the commit process into two phases:
Phase 1:
The TM notifies each RM to prepare to commit its transaction branch. If an RM determines that its work can be committed, it persists that work and replies positively to the TM; otherwise it replies negatively. After sending a negative reply and rolling back the work it has done, the RM can discard the transaction branch information.

Taking MySQL as an example: in the first phase, the transaction manager sends a "prepare to commit" request to every database server involved. On receiving it, each database performs the data modifications and writes its logs, but does not commit; it only changes the transaction's state to "committable" and returns the result to the transaction manager.

Phase 2:
The TM decides whether to commit or roll back based on the outcome of each RM's prepare in Phase 1. If all RMs prepared successfully, the TM tells all RMs to commit; if any RM failed to prepare, the TM tells all RMs to roll back their transaction branches.
Taking MySQL as an example again: if all databases prepared successfully in the first phase, the transaction manager sends a "confirm commit" request to each database server, which changes the transaction's state from "committable" to "committed" and replies. If any database fails in the first phase, or the transaction manager does not receive a response from some database, the transaction is considered failed and all database transactions are rolled back. A database server that has already prepared but does not receive the second-phase request cannot decide on its own: its "committable" transaction stays in doubt until the transaction manager, or a recovery process acting on its behalf, tells it to commit or roll back.
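The two phases can be sketched as an in-memory Java simulation. The Participant interface below is a hypothetical simplification that only loosely mirrors the prepare/commit/rollback calls of the XA interface (it is not javax.transaction.xa.XAResource), and the sketch omits networking, timeouts, and the coordinator's own durable decision log, all of which a real transaction manager needs.

```java
import java.util.List;

// Hypothetical RM interface for this sketch (loosely modeled on XA's prepare/commit/rollback).
interface Participant {
    boolean prepare(String xid);  // phase 1: persist the work, then vote yes (true) or no (false)
    void commit(String xid);      // phase 2: make the prepared work permanent
    void rollback(String xid);    // phase 2: undo the work
}

// In-memory participant whose phase-1 vote is fixed when it is created.
class InMemoryParticipant implements Participant {
    enum State { ACTIVE, PREPARED, COMMITTED, ROLLED_BACK }

    private final boolean canCommit;
    State state = State.ACTIVE;

    InMemoryParticipant(boolean canCommit) {
        this.canCommit = canCommit;
    }

    @Override
    public boolean prepare(String xid) {
        if (!canCommit) {
            state = State.ROLLED_BACK;  // negative vote: undo local work, drop the branch
            return false;
        }
        state = State.PREPARED;         // "committable": locks stay held until phase 2
        return true;
    }

    @Override
    public void commit(String xid) {
        state = State.COMMITTED;
    }

    @Override
    public void rollback(String xid) {
        state = State.ROLLED_BACK;
    }
}

class TwoPhaseCommitCoordinator {
    // Runs both phases; returns true if the global transaction committed.
    static boolean execute(String xid, List<? extends Participant> participants) {
        // Phase 1: collect votes, stopping at the first "no" or failure.
        boolean allYes = true;
        for (Participant p : participants) {
            boolean vote;
            try {
                vote = p.prepare(xid);
            } catch (RuntimeException e) {  // a failed prepare counts as "no"
                vote = false;
            }
            if (!vote) {
                allYes = false;
                break;
            }
        }
        // Phase 2: commit everywhere, or roll back every branch (rollback is idempotent here).
        for (Participant p : participants) {
            if (allYes) {
                p.commit(xid);
            } else {
                p.rollback(xid);
            }
        }
        return allYes;
    }
}
```

If one participant votes no, every branch ends in ROLLED_BACK, matching the rule that a single failed prepare rolls back the whole global transaction.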

XA provides distributed transactions at the resource level with strong consistency: resource locks are held throughout the entire two-phase commit.
TCC (Try-Confirm-Cancel) provides distributed transactions at the business level with eventual consistency, and does not hold resource locks throughout.
For details of TCC, please refer to this article: Flexible transaction: TCC two-stage compensation type

Problems with the two-phase commit protocol (2PC)

Two-phase commit appears to provide atomic operations, but unfortunately it still has several drawbacks:
1. Synchronous blocking problem.
Under two-phase commit, the ACID properties of a global transaction depend on the RMs. A global transaction consists of several independent transaction branches that must either all succeed or all fail, and the ACID properties of the individual branches together make up the ACID properties of the global transaction; in other words, the ACID guarantees of each branch are lifted to the level of the distributed transaction. Every participant also holds its locks from the prepare step until it receives the Phase 2 decision, so other transactions that touch those resources are blocked for the whole protocol. Even for a local transaction, if the operations are sensitive to reads, we need to set the isolation level to SERIALIZABLE. This is even more true for distributed transactions, where the REPEATABLE READ isolation level is not sufficient to guarantee consistency. If we use MySQL to support XA distributed transactions, it is best to set the isolation level to SERIALIZABLE, but SERIALIZABLE is the strictest of the four isolation levels and has the lowest execution efficiency.
2. Single point of failure.
Because the coordinator is so central, the participants (RMs) block indefinitely once the coordinator (TM) fails. In particular, if the coordinator fails during the second phase, all participants remain in a state where their transaction resources are locked and cannot complete the transaction. (A new coordinator can be elected after the old one goes down, but that does not solve the problem of participants already blocked by the failure.)
3. Data inconsistency.
In the second phase, if a local network failure occurs or the coordinator fails while sending commit requests, only some participants receive the commit request. Those participants commit, while the others, which never received it, cannot commit. As a result, the data in the distributed system becomes inconsistent.
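Problems 2 and 3 can be reproduced in a few lines. The following hypothetical Java simulation assumes every participant has already voted yes in phase 1 and that the coordinator stops after delivering only some of the commit messages; the class and method names are illustrative.

```java
import java.util.List;

// Hypothetical simulation: all participants have voted "yes" in phase 1, and the
// coordinator fails partway through sending the phase-2 commit messages.
class CrashDuringPhaseTwo {
    enum State { PREPARED, COMMITTED }

    static final class Participant {
        State state = State.PREPARED;  // prepared: work persisted, locks still held
    }

    // Delivers "commit" to participants in order, then stops after `delivered` messages
    // to model a coordinator crash or network failure.
    static void commitUntilCrash(List<Participant> participants, int delivered) {
        for (int i = 0; i < Math.min(delivered, participants.size()); i++) {
            participants.get(i).state = State.COMMITTED;
        }
    }

    // A prepared participant has promised to commit if told to, so it may not roll back
    // on its own, yet it does not know the outcome: it waits, still holding its locks.
    static boolean isBlocked(Participant p) {
        return p.state == State.PREPARED;
    }
}
```

The participant left in PREPARED is the blocking case from problem 2, and the mix of COMMITTED and PREPARED is the inconsistency from problem 3; it persists until the coordinator recovers or the in-doubt branch is resolved manually.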

Because two-phase commit suffers from synchronous blocking and a single point of failure, researchers improved on it and proposed three-phase commit.

6. Three-phase commit

Three-phase commit (3PC) is an improved version of two-phase commit (2PC).
Compared with two-phase commit, three-phase commit makes two changes:

1. It introduces timeouts in both the coordinator and the participants.
2. It inserts a preparation phase between the first and second phases, which ensures that all participating nodes are in a consistent state before the final commit phase.
In other words, besides adding timeouts, 3PC splits the prepare phase of 2PC in two, so that three-phase commit has three phases: CanCommit, PreCommit, and DoCommit.

CanCommit stage

The CanCommit phase of 3PC is very similar to the prepare phase of 2PC: the coordinator sends a CanCommit request to each participant, which returns Yes if it can commit and No otherwise.
1. Transaction inquiry: the coordinator sends a CanCommit request to the participants, asking whether they can commit the transaction, and then waits for their responses.
2. Response feedback: after receiving the CanCommit request, a participant normally returns Yes and enters the ready state if it believes it can execute the transaction successfully; otherwise it returns No.

PreCommit stage

Based on the participants' responses, the coordinator decides whether to proceed with the PreCommit operation. There are two possibilities.
If the coordinator receives Yes from all participants, the transaction is pre-executed:
1. Send pre-commit request: the coordinator sends a PreCommit request to the participants and enters the Prepared state.
2. Transaction pre-commit: after receiving the PreCommit request, a participant executes the transaction operations and records undo and redo information in its transaction log.
3. Response feedback: if a participant executes the operations successfully, it returns an ACK and waits for the final instruction.
If any participant responds No, or the coordinator times out waiting for a response, the transaction is aborted:
1. Send abort request: the coordinator sends an abort request to all participants.
2. Abort transaction: a participant aborts the transaction when it receives the abort request from the coordinator, or when it times out without receiving any request from the coordinator.

DoCommit stage

This phase performs the actual commit and also has two cases.
Case 1: Commit
1. Send commit request: if the coordinator receives ACK responses from all participants, it moves from the pre-commit state to the commit state and sends a doCommit request to all participants.
2. Transaction commit: after receiving the doCommit request, a participant formally commits the transaction and releases all transaction resources once the commit is done.
3. Response feedback: after committing, the participant sends an ACK to the coordinator.
4. Transaction complete: once the coordinator receives ACKs from all participants, the transaction is complete.
Case 2: Abort. If the coordinator does not receive ACK responses from all participants (a participant may have sent something other than an ACK, or the response timed out), it aborts the transaction:
1. Send abort request: the coordinator sends an abort request to all participants.
2. Transaction rollback: after receiving the abort request, a participant rolls back the transaction using the undo information recorded in the PreCommit phase, and releases all transaction resources once the rollback is done.
3. Feedback: after completing the rollback, the participant sends an ACK to the coordinator.
4. Abort complete: once the coordinator receives the ACKs from the participants, the transaction is aborted.
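The three phases, seen from the coordinator, can be sketched in Java as below. This is a hypothetical, single-threaded simulation: the ThreePcParticipant interface and ScriptedParticipant class are illustrative, a null answer stands in for a timeout rather than modeling real clocks or networking, and the final ACKs of the DoCommit phase are omitted.

```java
import java.util.List;

// Hypothetical 3PC participant interface; a null answer models a timeout.
interface ThreePcParticipant {
    Boolean canCommit(String xid);  // CanCommit: Yes (true), No (false) or timeout (null)
    Boolean preCommit(String xid);  // PreCommit: ACK (true), failure (false) or timeout (null)
    void doCommit(String xid);      // DoCommit: commit for real and release resources
    void abort(String xid);         // roll back using the undo log, release resources
}

// Illustrative participant with scripted answers that records its own state.
class ScriptedParticipant implements ThreePcParticipant {
    enum State { INIT, READY, PRECOMMITTED, COMMITTED, ABORTED }

    private final Boolean canCommitAnswer;
    private final Boolean preCommitAck;
    State state = State.INIT;

    ScriptedParticipant(Boolean canCommitAnswer, Boolean preCommitAck) {
        this.canCommitAnswer = canCommitAnswer;
        this.preCommitAck = preCommitAck;
    }

    @Override public Boolean canCommit(String xid) {
        if (Boolean.TRUE.equals(canCommitAnswer)) state = State.READY;
        return canCommitAnswer;
    }

    @Override public Boolean preCommit(String xid) {
        state = State.PRECOMMITTED;  // stands for executing the work and logging undo/redo
        return preCommitAck;
    }

    @Override public void doCommit(String xid) { state = State.COMMITTED; }

    @Override public void abort(String xid) { state = State.ABORTED; }
}

class ThreePhaseCommitCoordinator {
    enum Outcome { COMMITTED, ABORTED }

    static Outcome execute(String xid, List<? extends ThreePcParticipant> participants) {
        // CanCommit: every participant must answer Yes; No or a timeout aborts.
        for (ThreePcParticipant p : participants) {
            if (!Boolean.TRUE.equals(p.canCommit(xid))) return abortAll(xid, participants);
        }
        // PreCommit: every participant must ACK; a failure or missing ACK aborts.
        for (ThreePcParticipant p : participants) {
            if (!Boolean.TRUE.equals(p.preCommit(xid))) return abortAll(xid, participants);
        }
        // DoCommit: all ACKs received, so instruct everyone to commit.
        for (ThreePcParticipant p : participants) p.doCommit(xid);
        return Outcome.COMMITTED;
    }

    private static Outcome abortAll(String xid, List<? extends ThreePcParticipant> participants) {
        for (ThreePcParticipant p : participants) p.abort(xid);
        return Outcome.ABORTED;
    }
}
```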

In the DoCommit phase, if a participant does not receive a doCommit or abort request from the coordinator in time, it commits the transaction once its timeout expires. (This is a probabilistic judgment. A participant that has reached the third phase must have received a PreCommit request in the second phase, and the coordinator only sends PreCommit after every participant answered Yes in the CanCommit phase; so once a participant receives PreCommit, it knows that everyone agreed to the change. In short, even if a network timeout or similar problem prevents a participant in the third phase from receiving the commit or abort request, it has good reason to believe the commit will very likely succeed.)
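The participant-side timeout rule just described reduces to a small state transition. This Java sketch is hypothetical (the state names are illustrative) and encodes only the default decision a participant makes when no coordinator message arrives in time.

```java
// Hypothetical sketch of the 3PC participant's default decision on timeout.
class ThreePcTimeoutRule {
    enum State { INIT, READY, PRECOMMITTED, COMMITTED, ABORTED }

    // What a participant does when it hears nothing from the coordinator in time.
    static State onTimeout(State current) {
        switch (current) {
            case INIT:
            case READY:
                // No PreCommit yet: the coordinator may not have collected all Yes votes,
                // so aborting is the safe default.
                return State.ABORTED;
            case PRECOMMITTED:
                // PreCommit is only sent after every participant voted Yes in CanCommit,
                // so committing is very likely, but not certainly, what the coordinator decided.
                return State.COMMITTED;
            default:
                return current;  // already finished: nothing to do
        }
    }
}
```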

The difference between 2PC and 3PC

Compared with 2PC, 3PC mainly mitigates the single point of failure problem and reduces blocking: once a participant fails to receive a message from the coordinator in time, it commits by default instead of holding transaction resources and blocking indefinitely. However, this mechanism also causes data consistency problems. If, for network reasons, a participant does not receive the coordinator's abort request in time, it commits after its timeout, while the other participants that did receive the abort command roll back, leaving the data inconsistent.
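The inconsistency scenario can be replayed with a small hypothetical simulation: every participant has ACKed PreCommit, the coordinator then decides to abort (for example because one ACK was lost), and the abort message reaches only some participants. ThreePcPartition and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of the 3PC inconsistency: all participants are in PRECOMMITTED,
// the coordinator decides to abort, and the abort message may be lost on the way.
class ThreePcPartition {
    enum State { COMMITTED, ABORTED }

    // abortDelivered[i] tells whether participant i receives the coordinator's abort.
    static List<State> run(boolean[] abortDelivered) {
        List<State> outcome = new ArrayList<>();
        for (boolean delivered : abortDelivered) {
            // Received abort: roll back with the undo log.
            // Abort lost: the participant times out in PRECOMMITTED and commits by default.
            outcome.add(delivered ? State.ABORTED : State.COMMITTED);
        }
        return outcome;
    }

    // The global transaction is consistent only if every participant ended the same way.
    static boolean consistent(List<State> outcome) {
        return outcome.stream().distinct().count() <= 1;
    }
}
```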

Having looked at 2PC and 3PC, we can see that neither two-phase nor three-phase commit completely solves the distributed consistency problem. In distributed scenarios, therefore, we can often only guarantee data consistency to a certain degree, such as 99.9% or 99.99%.


Origin blog.csdn.net/haohaoxuexiyai/article/details/123744746