Distributed transaction theory

What if there were no distributed transactions?

In a system composed of microservices, what happens if there are no distributed transactions? Take a common e-commerce trading scenario as an example:

[Figure: an inventory service and an order service, each maintaining its own database]

The figure above shows two independent microservices, inventory and order, each maintaining its own database. In the trading system's business logic, placing an order for a commodity first calls the inventory service to deduct stock, and then calls the order service to create an order record.

Under normal circumstances, both databases are updated successfully and the data on the two sides stays consistent.

[Figure: the normal case — both databases updated successfully, data consistent]

Under abnormal circumstances, however, the stock deduction may complete while the subsequent order insert fails for some reason. At that point the data on the two sides has lost the consistency it should have.
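A minimal sketch of this failure mode (the service classes are hypothetical stand-ins for the two microservices, not a real API): the stock deduction commits locally, the order insert then fails, and nothing rolls the deduction back.

```python
# Toy stand-ins for the two microservices; each "commit" is immediate
# and local, so there is no cross-service atomicity.

class InventoryService:
    def __init__(self, stock):
        self.stock = stock

    def deduct(self, qty):
        self.stock -= qty  # local DB transaction commits here

class OrderService:
    def create_order(self, item, qty):
        # Simulate a failure: DB down, constraint violation, etc.
        raise RuntimeError("order insert failed")

inventory = InventoryService(stock=10)
orders = OrderService()

def place_order(item, qty):
    inventory.deduct(qty)           # step 1 succeeds and is already committed
    orders.create_order(item, qty)  # step 2 fails -> data is now inconsistent

try:
    place_order("book", 1)
except RuntimeError:
    pass

print(inventory.stock)  # 9 -- stock was deducted, but no matching order exists
```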

[Figure: the abnormal case — stock deducted, order insert failed, data inconsistent]

What is a distributed transaction?

Distributed transactions are used to ensure data consistency across the nodes of a distributed system. There are many implementations, the most representative of which is the XA distributed transaction protocol, specified by X/Open and popularized by systems such as Oracle Tuxedo.

The XA family includes two commit protocols: two-phase commit (2PC) and three-phase commit (3PC). Here we focus on the concrete flow of two-phase commit.

XA two-phase commit

There are two roles in the XA protocol: the transaction coordinator and the transaction participants (also called cohorts). Let's look at how they interact.

Phase 1:

 

 

[Figure: phase 1 — the coordinator sends Prepare requests to all participants]

In the first phase of an XA distributed transaction, the node acting as transaction coordinator first sends a Prepare request to every participant node.

After receiving the Prepare request, each participant node performs the data updates involved in the transaction and writes them to its undo log and redo log. A participant that executes successfully does not commit the transaction yet; instead, it returns a "complete" message to the coordinator node.

When the transaction coordinator has received the return messages from all participants, the distributed transaction enters the second phase.

Phase 2:

 

 

[Figure: phase 2 — the coordinator sends Commit requests and participants commit their local transactions]

In the second phase, if the coordinator received an affirmative response from every participant in phase 1, it issues a Commit request to all participants.

After receiving the Commit request, each participant node commits its local transaction and releases its lock resources. Once the local commit completes, it returns a "complete" message to the coordinator.

When the coordinator has received the "complete" feedback from every participant, the whole distributed transaction is finished.

 

The above is the happy path of XA two-phase commit. Next, let's look at the failure path:

Phase 1:

 

[Figure: phase 1 (failure) — a participant returns a failure response to the Prepare request]

Phase 2:

 

[Figure: phase 2 (failure) — the coordinator sends Abort requests and participants roll back via the undo log]

In phase 1 of XA, if any participant reports failure, that node's local transaction did not succeed and must be rolled back.

So in phase 2, the coordinator sends Abort requests to all participants. On receiving the Abort request, each participant node rolls back its local transaction according to the undo log.
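The two phases above, including the abort path driven by the undo log, can be re-enacted as a toy in-memory simulation (all class and method names here are illustrative, not a real XA API):

```python
# Toy simulation of XA two-phase commit: prepare collects votes,
# then either commit everywhere or abort everywhere via the undo log.

class Participant:
    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.value = 0
        self.undo_log = []
        self.state = "init"

    def prepare(self, new_value):
        if self.fail_prepare:
            self.state = "failed"
            return False                      # vote "no"
        self.undo_log.append(self.value)      # record old value for rollback
        self.value = new_value                # do the work, but don't commit
        self.state = "prepared"
        return True                           # vote "yes"

    def commit(self):
        self.undo_log.clear()                 # the work becomes permanent
        self.state = "committed"

    def abort(self):
        if self.undo_log:
            self.value = self.undo_log.pop()  # roll back via the undo log
        self.state = "aborted"

def coordinator(participants, new_value):
    # Phase 1: send Prepare, collect votes
    if all(p.prepare(new_value) for p in participants):
        for p in participants:                # Phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:                    # Phase 2: abort everywhere
        p.abort()
    return "aborted"

ok = coordinator([Participant("stock"), Participant("order")], 42)
print(ok)                                     # committed

bad = [Participant("stock"), Participant("order", fail_prepare=True)]
res_bad = coordinator(bad, 42)
print(res_bad, bad[0].value)                  # aborted 0 -- rolled back
```

Note that `all(...)` short-circuits: once one participant votes "no", the remaining participants are not prepared at all, and aborting a never-prepared participant is a no-op on its data.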

 

In an asynchronous network model with no node crashes, 2PC satisfies agreement (all nodes reach the same decision), validity (the decided value is legal), and termination, and is therefore a protocol that solves the consistency problem. But if we allow nodes to crash and recover (fail-recover), can 2PC still solve it?

  • If the coordinator crashes after initiating the proposal, the participants block, waiting for the coordinator's decision. At this point another role is needed to bring the system out of this non-terminating state; call this new role the coordinator watchdog. After the coordinator has been down for some time, the watchdog takes over its work and decides whether to commit or abort in phase 2 by querying the status of each participant. This also requires the coordinator and the participants to log their historical state, both so the watchdog can query the participants while the coordinator is down, and so the coordinator can recover its state after it comes back up. From receiving the transaction request and initiating the proposal to completing the transaction, the 2PC protocol adds 2 RTTs (propose + commit), a relatively small increase in latency.

XA three-phase commit

3PC (three-phase commit) adds a third phase. Given that 2PC can already achieve consistency under the asynchronous-network-plus-crash-recovery model, what is left for 3PC to do? What exactly is 3PC?

In 2PC, a participant's status is known only to itself and the coordinator. If the coordinator crashes after issuing the proposal, and a participant also crashes before the watchdog takes over, the other participants enter a blocked state in which they can neither roll back nor force a commit until the crashed participant recovers. This raises two questions:

  • Can the blocking be removed, so that before commit/abort the system can roll back to the initial state it was in before the proposal was initiated?
  • During the current resolution, can the participants learn each other's status, or at least avoid depending on one another's status?

 

 

In phase 1 of 3PC, the coordinator sends the proposal and the participants vote on it without yet locking any resources. After collecting the participants' votes, the coordinator enters phase 2 and sends a prepare-to-commit instruction to each participant. On receiving this instruction, a participant locks its resources, but the related operations must remain rollbackable. After receiving the acknowledgments (ACKs), the coordinator enters phase 3 and performs the commit/abort; phase 3 of 3PC is the same as phase 2 of 2PC. The coordinator watchdog and status logging also apply to 3PC. Let's look at how 3PC responds when a participant crashes in each phase:

  • Phase 1: the coordinator (or the watchdog) does not receive the crashed participant's vote and simply aborts the transaction; after the crashed participant recovers, it reads its log, finds that it never voted, and aborts the transaction on its own.

  • Phase 2: the coordinator does not receive the crashed participant's precommit ACK, but since it has already received that participant's yes vote (otherwise it would not have entered phase 2), it commits; the watchdog can obtain this information by asking the other participants and proceeds the same way. After the crashed participant recovers and finds that it received a precommit or voted yes, it commits the transaction on its own.

  • Phase 3: even if the coordinator or the watchdog does not receive the crashed participant's commit ACK, the transaction is considered finished; after the crashed participant recovers and finds that it received a commit or precommit, it commits the transaction on its own.

Because of the added prepare-to-commit phase, the transaction latency of 3PC grows by 1 RTT, to 3 RTTs (propose + precommit + commit). In exchange, the system no longer blocks after a participant crashes, which improves availability; in many real business scenarios this trade-off is well worth it.
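The three phases and the recovery rules above can be sketched as follows (a toy simulation with illustrative names, not a real 3PC library; message loss and the watchdog are left out for brevity):

```python
# Toy sketch of 3PC: can-commit vote, pre-commit (resources locked,
# still rollbackable), then commit. The "recover" function encodes the
# recovery rules from the bullet list above.

class Participant3PC:
    def __init__(self, vote_yes=True):
        self.vote_yes = vote_yes
        self.state = "init"

    def can_commit(self):                 # phase 1: vote, no locks yet
        self.state = "voted" if self.vote_yes else "aborted"
        return self.vote_yes

    def pre_commit(self):                 # phase 2: lock, still rollbackable
        self.state = "precommitted"

    def do_commit(self):                  # phase 3: make it permanent
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def coordinator_3pc(participants):
    if not all(p.can_commit() for p in participants):   # phase 1
        for p in participants:
            p.abort()
        return "aborted"
    for p in participants:                              # phase 2
        p.pre_commit()
    for p in participants:                              # phase 3
        p.do_commit()
    return "committed"

def recover(p):
    # Recovery rule: having reached "precommitted" is proof the whole
    # group voted yes, so the participant commits on its own.
    if p.state in ("precommitted", "committed"):
        p.do_commit()
    else:
        p.abort()
    return p.state

print(coordinator_3pc([Participant3PC(), Participant3PC()]))  # committed

crashed = Participant3PC()
crashed.can_commit()
crashed.pre_commit()              # crashed right after precommit...
print(recover(crashed))           # committed -- commits on its own
```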

 

Two-phase commit in HBase snapshots

When HBase takes a snapshot of a table, it is really all the regions of that table that execute the snapshot. Because these regions are distributed across multiple RegionServers, a mechanism is needed to guarantee that the participating regions either all complete the snapshot or none of them start it; intermediate states, where some regions are done and others are not, must not occur.

 

HBase uses a two-phase commit protocol (2PC) to guarantee the distributed atomicity of snapshots. In the prepare phase, the coordinator sends a prepare command to all participants. Each participant acquires the resources it needs (such as locks) and performs the prepare operation to confirm that it can succeed; usually the core work is done here. It then returns a prepared response to the coordinator. Once the coordinator has received prepared responses from all participants (meaning everyone is ready to commit), it persists the commit state locally and enters the commit phase: it sends a commit command to all participants, and each participant, on receiving it, performs the commit operation and releases its resources. The commit operation is usually very simple.

Next, let's see how HBase uses the 2PC protocol to build its snapshot architecture. The basic steps are as follows:

  1. Prepare phase: HMaster creates an /acquired-snapshotname node in ZooKeeper and writes the snapshot information (which table to snapshot) to it. When a RegionServer observes this node, it checks, based on the table information carried by /acquired-snapshotname, whether the target table exists on that server. If not, it ignores the command; if so, it iterates over all regions of the target table on that server and performs the snapshot operation for each region. Note that the results of the snapshot operation are written not to the final folder but to a temporary folder. When a RegionServer finishes, it creates a child node /acquired-snapshotname/nodex under /acquired-snapshotname, indicating that node nodex has completed snapshot preparation for all of its relevant regions.

  2. Commit phase: once all RegionServers have finished preparing, i.e. have created their child nodes under /acquired-snapshotname, HMaster considers the snapshot preparation complete. It creates a new node /reached-snapshotname, which amounts to sending a commit command to the participating RegionServers. When a RegionServer observes /reached-snapshotname, it performs the snapshot commit operation, which is very simple: move the results produced in the prepare phase from the temporary folder to the final folder. When it finishes, it creates a child node /reached-snapshotname/nodex, indicating that node nodex has completed its snapshot work.

  3. Abort phase: if the expected child nodes of /acquired-snapshotname have not all appeared within a certain period (i.e. some RegionServer has not finished preparing), HMaster considers the snapshot preparation to have timed out. It then creates another node, /abort-snapshotname, and on observing this command every RegionServer cleans up the snapshot results in its temporary folder.
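The three steps above can be re-enacted with a toy, in-memory simulation. Here `zk` is just a dict standing in for znodes, and `tmp`/`final` are dicts standing in for the HDFS folders; the real flow uses ZooKeeper watches, so none of this reflects actual HBase or ZooKeeper APIs.

```python
# In-memory re-enactment of the ZooKeeper-driven snapshot flow:
# prepare writes to a temp area + creates an /acquired child;
# the master's commit node triggers the temp -> final move.

zk = {}
SNAP = "snap1"

def regionserver_prepare(zk, rs, tmp):
    tmp[rs] = f"snapshot-data-from-{rs}"       # write results to temp folder
    zk[f"/acquired-{SNAP}/{rs}"] = "done"      # child node = prepared

def master_commit_if_ready(zk, servers):
    acquired = [s for s in servers if f"/acquired-{SNAP}/{s}" in zk]
    if len(acquired) == len(servers):          # everyone prepared in time
        zk[f"/reached-{SNAP}"] = "commit"      # commit command for all RS
        return True
    zk[f"/abort-{SNAP}"] = "abort"             # timeout -> abort command
    return False

def regionserver_commit(zk, rs, tmp, final):
    if f"/reached-{SNAP}" in zk:
        final[rs] = tmp.pop(rs)                # move temp -> final folder
        zk[f"/reached-{SNAP}/{rs}"] = "done"   # child node = committed

servers, tmp, final = ["rs1", "rs2", "rs3"], {}, {}
for rs in servers:
    regionserver_prepare(zk, rs, tmp)
committed = master_commit_if_ready(zk, servers)
for rs in servers:
    regionserver_commit(zk, rs, tmp, final)

print(committed, sorted(final))  # True ['rs1', 'rs2', 'rs3'] -- all committed
```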

 

In this design, HMaster acts as the coordinator and the RegionServers as the participants, with all communication between them going through ZooKeeper, which also records the transaction state on its nodes. With HMaster high availability enabled, if the active HMaster goes down, the newly promoted HMaster can continue to commit or abort the transaction based on the state stored in ZooKeeper.

Standing on the shoulders of giants:

https://github.com/wangzhiwubigdata/God-Of-BigData/blob/master/

http://hbasefly.com/2017/09/17/hbase-snapshot/

https://mp.weixin.qq.com/s?__biz=MzIxMjE5MTE1Nw==&mid=2653193461&idx=1&sn=d69ccec780ae6d3b0c722cf09fa154d1&chksm=8c99f62fbbee7f39cd221bd0ecc9105a5c16e353d82d2407e7f295da9f9172cfd4889d3f12c8&scene=21#wechat_redirect

 


Origin www.cnblogs.com/zz-ksw/p/12727415.html