How TCC distributed transactions work (compensation mechanism)

The core idea is this: for each operation, a corresponding confirmation operation and compensation (cancellation) operation must be registered. The process is divided into three phases:

  • Try phase: mainly performs business checks (consistency) and reserves resources (quasi-isolation).
  • Confirm phase: mainly confirms and commits the business operation. Confirm starts only after the Try phase has succeeded, and the Confirm phase is assumed not to fail: as long as Try succeeds, Confirm must succeed. The Confirm operation must be idempotent, because a failed Confirm is retried.
  • Cancel phase: when business execution fails and must be rolled back, Cancel undoes the business that was executed and releases the reserved resources. The Cancel operation must also be idempotent.
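You can think of the three phases as a small contract that every participating service implements. The following Java sketch uses a hypothetical `TccParticipant` interface and an in-memory `CounterParticipant`. Neither is the API of any real TCC framework. The sketch shows one common way to make Confirm and Cancel idempotent. Once a transaction's reservation is confirmed or cancelled, it is removed, so a retried call finds nothing left to do.

```java
import java.util.HashMap;
import java.util.Map;

class TccParticipantDemo {
    public static void main(String[] args) {
        CounterParticipant p = new CounterParticipant();
        p.tryReserve("tx-1", 5);
        p.confirm("tx-1");
        p.confirm("tx-1"); // a retried Confirm has no further effect
        System.out.println("committed=" + p.committed + " reserved=" + p.reserved);
    }
}

// Hypothetical TCC contract, for illustration only.
interface TccParticipant {
    void tryReserve(String txId, int amount); // Try: check and reserve, throw if impossible
    void confirm(String txId);                // Confirm: commit the reservation (idempotent)
    void cancel(String txId);                 // Cancel: release the reservation (idempotent)
}

class CounterParticipant implements TccParticipant {
    int reserved = 0;  // held by Try, not yet committed
    int committed = 0; // committed by Confirm
    // Open reservations per transaction; an entry disappears after Confirm or Cancel.
    private final Map<String, Integer> pending = new HashMap<>();

    @Override
    public void tryReserve(String txId, int amount) {
        if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
        if (pending.containsKey(txId)) return; // repeated Try while still pending
        pending.put(txId, amount);
        reserved += amount;
    }

    @Override
    public void confirm(String txId) {
        Integer amount = pending.remove(txId);
        if (amount == null) return; // already finished: a retry changes nothing
        reserved -= amount;
        committed += amount;
    }

    @Override
    public void cancel(String txId) {
        Integer amount = pending.remove(txId);
        if (amount == null) return; // already finished: a retry changes nothing
        reserved -= amount;
    }
}
```

Running it prints `committed=5 reserved=0`. The second Confirm is ignored, so the reservation is not committed twice.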

How TCC distributed transactions address the weaknesses of two-phase commit:
  • Single point of failure of the coordinator: the business activity is initiated and completed by the main business service, and the business activity manager becomes multi-node by running as a cluster.
  • Synchronous blocking: a timeout is introduced, and compensation runs once it expires. Resources are not locked as a whole. Instead, they are turned into business-level states, which reduces the lock granularity.
  • Data consistency: with the compensation mechanism in place, the business activity manager controls consistency.

Disadvantages: Confirm and Cancel can themselves fail. TCC is a compensation approach at the application layer, so programmers must write a lot of compensation code to implement it. In some scenarios, a business process cannot easily be defined and handled with TCC.


The following is reposted from: "Please, please don't ask me about the implementation principle of TCC distributed transactions in interviews!" [The Architecture Notes of Shishan]

1. Preface

In this article, I will use plain language and hand-drawn diagrams, together with a practical case from an e-commerce system, to explain exactly what a TCC distributed transaction is.

Some Spring Cloud principles are involved. If you are not familiar with them, you can refer to the previous article: "Please, please don't ask me about the underlying principles of Spring Cloud in interviews!".

2. Business scenario

Let's look at the business scenario first. Suppose you have an e-commerce system with an order payment flow.

After an order is paid, the system must perform the following steps:

  • Change the order status to "paid"
  • Deduct the product inventory
  • Add points to the member's account
  • Create a sales outbound order to notify the warehouse to ship the goods

3. Further thinking

A TCC distributed transaction should coordinate the following operations:

  • Order service: update the order status
  • Inventory service: deduct the inventory
  • Points service: add points
  • Warehousing service: create the sales outbound order

These steps must either all succeed or all fail. Together they form a single transaction.

For example, suppose the order status has been changed to "paid", but the inventory service fails to deduct the inventory. The product originally had 100 units in stock. Two have just been sold, so the stock should now be 98.

What actually happens? The inventory service hit an error while writing to its database, so the stock is still 100. That is a serious bug, and it must not be allowed to happen!

Yet if you build such a microservice system with plain Spring Cloud and no TCC distributed transaction solution, this kind of inconsistency is very likely to occur.

The following figure illustrates this process.
So we need a TCC distributed transaction mechanism to ensure that the operations of all services form a single, holistic transaction.

Either all of the above steps succeed, or, if any service operation fails, everything is rolled back together and any completed operations are undone.

For example, if the inventory service fails to deduct the inventory, the order service must undo its change to the order status. The two remaining operations, adding points and notifying the warehouse to ship, must not be performed.

4. Implementing TCC distributed transactions in practice

How do you implement a TCC distributed transaction so that all services either succeed together or fail together?

Let's analyze it step by step, using a system built with Spring Cloud as the example.

4.1 TCC phase one: Try

First, the code in the order service looks roughly like this:

After completing its local database operation, the order service uses Spring Cloud's Feign to call the other services. But this code alone is not enough to implement a TCC distributed transaction!
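The following minimal Java simulation shows why this plain sequence of calls is not enough. It uses hypothetical in-process stand-ins (`NaiveOrderService` and its fields) in place of real Feign clients. It only illustrates what happens when a remote step fails after the local update has already been committed.

```java
class NaivePayDemo {
    public static void main(String[] args) {
        NaiveOrderService service = new NaiveOrderService();
        service.inventoryDown = true; // simulate a database error in the inventory service
        try {
            service.pay(2);
        } catch (IllegalStateException e) {
            System.out.println("pay failed: " + e.getMessage());
        }
        // The order is already PAID, but stock and points were never updated.
        System.out.println(service.orderStatus + " stock=" + service.stock + " credit=" + service.credit);
    }
}

// Hypothetical stand-in for the order service and the services it calls through Feign.
class NaiveOrderService {
    String orderStatus = "UNPAID";
    int stock = 100;
    int credit = 1190;
    boolean inventoryDown = false;

    void pay(int quantity) {
        orderStatus = "PAID";  // local database update, committed immediately
        reduceStock(quantity); // remote call 1: inventory service
        credit += 10;          // remote call 2: points service
        // remote call 3 (warehousing service) omitted for brevity
    }

    private void reduceStock(int quantity) {
        if (inventoryDown) throw new IllegalStateException("inventory database error");
        stock -= quantity;
    }
}
```

It prints `PAID stock=100 credit=1190`. Once the inventory call fails, nothing reverts the "paid" status, which is the inconsistency described earlier.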

First, the order service changes its own status to OrderStatus.UPDATING.
In the pay() method, instead of setting the order status directly to "paid", set it to UPDATING ("being modified"). This only indicates that someone is in the middle of modifying the status.
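Here is a minimal sketch of the order service's Try step. It uses a hypothetical in-memory `Order` class, an `OrderStatus` enum and a static `tryPay` helper in place of a real database row and service method.

```java
class OrderTryDemo {
    public static void main(String[] args) {
        Order order = new Order();
        OrderTccService.tryPay(order);
        System.out.println(order.status); // UPDATING
    }
}

enum OrderStatus { UNPAID, UPDATING, PAID, CANCELED }

// Hypothetical in-memory order; in practice this would be a database row.
class Order {
    OrderStatus status = OrderStatus.UNPAID;
}

class OrderTccService {
    // Try phase of pay(): mark the order as "being modified" rather than "paid".
    static void tryPay(Order order) {
        if (order.status == OrderStatus.UPDATING) return; // a repeated Try is harmless
        if (order.status != OrderStatus.UNPAID) {
            throw new IllegalStateException("cannot pay an order in state " + order.status);
        }
        order.status = OrderStatus.UPDATING;
    }
}
```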

In the reduceStock() interface of the inventory service, don't deduct the inventory directly. Freeze it instead.
For example, if the stock quantity is originally 100, don't simply compute 100 - 2 = 98 and deduct the stock!
Instead, set the saleable stock to 100 - 2 = 98, which is fine, and then record 2 in a separate frozen-stock field. In other words, 2 units of stock are frozen.
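The freeze-instead-of-deduct idea can be sketched as two counters. This is a minimal Java example. The `Inventory` class and its `saleable`/`frozen` fields are hypothetical names for the saleable-stock and frozen-stock columns described above.

```java
class InventoryTryDemo {
    public static void main(String[] args) {
        Inventory inventory = new Inventory(100);
        inventory.tryReduceStock(2);
        System.out.println("saleable=" + inventory.saleable + " frozen=" + inventory.frozen);
    }
}

// Hypothetical inventory row with a separate frozen-stock column.
class Inventory {
    int saleable; // stock that can still be sold
    int frozen;   // stock reserved by transactions that have not finished yet

    Inventory(int saleable) {
        this.saleable = saleable;
    }

    // Try phase of reduceStock(): move units into the frozen field instead of deducting them.
    void tryReduceStock(int quantity) {
        if (quantity <= 0) throw new IllegalArgumentException("quantity must be positive");
        if (quantity > saleable) throw new IllegalStateException("insufficient stock");
        saleable -= quantity;
        frozen += quantity;
    }
}
```

With a starting stock of 100 and a quantity of 2, it prints `saleable=98 frozen=2`. If a Try would oversell, it is rejected and nothing changes.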

The addCredit() interface of the points service works the same way. Don't add the member's points directly. Instead, first add them to a pre-added points field in the points table.
For example, if the user currently has 1190 points and is about to receive 10 more, don't compute 1190 + 10 = 1200 directly!
Keep the points at 1190 and set 10 in a pre-add field, for example prepare_add_credit. This means 10 points are ready to be added.
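Here is a minimal Java sketch of the points Try step. The `MemberCredit` class is hypothetical. Its `prepareAddCredit` field mirrors the `prepare_add_credit` column mentioned above.

```java
class CreditTryDemo {
    public static void main(String[] args) {
        MemberCredit member = new MemberCredit(1190);
        member.tryAddCredit(10);
        System.out.println("credit=" + member.credit + " prepareAddCredit=" + member.prepareAddCredit);
    }
}

// Hypothetical points row: credit is the usable balance, prepareAddCredit holds pending points.
class MemberCredit {
    int credit;
    int prepareAddCredit;

    MemberCredit(int credit) {
        this.credit = credit;
    }

    // Try phase of addCredit(): record the points as pending; the usable balance is untouched.
    void tryAddCredit(int points) {
        if (points <= 0) throw new IllegalArgumentException("points must be positive");
        prepareAddCredit += points;
    }
}
```

It prints `credit=1190 prepareAddCredit=10`. The member cannot use the 10 points until Confirm moves them over.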

The saleDelivery() interface of the warehousing service is the same. You can create the sales outbound order first, but with the status "UNKNOWN".
In other words, the sales outbound order has just been created, and its final status is not yet certain.
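Here is a minimal sketch of the warehousing Try step. The `WmsService`, `SaleDeliveryOrder` and `DeliveryStatus` names are hypothetical. Keying delivery orders by order id is one possible design, chosen here so that a repeated Try reuses the same record instead of creating a duplicate.

```java
import java.util.HashMap;
import java.util.Map;

class WmsTryDemo {
    public static void main(String[] args) {
        WmsService wms = new WmsService();
        SaleDeliveryOrder delivery = wms.trySaleDelivery("order-1");
        System.out.println(delivery.orderId + " " + delivery.status); // order-1 UNKNOWN
    }
}

enum DeliveryStatus { UNKNOWN, CREATED, CANCELED }

// Hypothetical sales outbound order.
class SaleDeliveryOrder {
    final String orderId;
    DeliveryStatus status;

    SaleDeliveryOrder(String orderId, DeliveryStatus status) {
        this.orderId = orderId;
        this.status = status;
    }
}

class WmsService {
    // Keyed by order id so that a repeated Try reuses the same delivery order.
    final Map<String, SaleDeliveryOrder> deliveries = new HashMap<>();

    // Try phase of saleDelivery(): create the delivery order in the intermediate UNKNOWN state.
    SaleDeliveryOrder trySaleDelivery(String orderId) {
        return deliveries.computeIfAbsent(orderId,
                id -> new SaleDeliveryOrder(id, DeliveryStatus.UNKNOWN));
    }
}
```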

Transforming the interfaces this way implements the phase represented by the first letter T in TCC: the Try phase.

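To summarize: if you want to implement a TCC distributed transaction, the main business flow and each interface should not complete the business operation directly. Instead, they perform a Try operation.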

A Try operation typically locks a resource, sets a prepared state, or freezes part of the data.
From here, there are two possible outcomes.

4.2 TCC phase two: Confirm

The first case is the ideal one: every service executes its Try operation successfully. At this point, you rely on a TCC distributed transaction framework to drive the subsequent execution.

A brief note: to use TCC distributed transactions, you must introduce a TCC distributed transaction framework, such as the domestic open-source projects ByteTCC, Hmily, or tcc-transaction. Without one, implementing every phase and advancing the transaction to the next phase by hand is far too complicated to be practical.

If every service includes a TCC distributed transaction framework, the framework embedded in the order service can detect that the Try operation of every service has succeeded.

The framework then moves the transaction into the next phase of TCC, the first C: the Confirm phase. To support this phase, you need to add some code to each service.

For example, in the order service you can add Confirm logic that formally sets the order status to "paid", roughly as follows:
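Here is a minimal Java sketch of this order-service Confirm step. It uses a hypothetical in-memory `Order` and `OrderStatus` enum. A real service would update a database row through its own persistence layer.

```java
class OrderConfirmDemo {
    public static void main(String[] args) {
        Order order = new Order();
        order.status = OrderStatus.UPDATING; // state left behind by a successful Try
        OrderTccService.confirmPay(order);
        OrderTccService.confirmPay(order); // a retried Confirm changes nothing
        System.out.println(order.status); // PAID
    }
}

enum OrderStatus { UNPAID, UPDATING, PAID, CANCELED }

// Hypothetical in-memory order.
class Order {
    OrderStatus status = OrderStatus.UNPAID;
}

class OrderTccService {
    // Confirm phase of pay(): formally mark the order as paid.
    static void confirmPay(Order order) {
        if (order.status == OrderStatus.PAID) return; // already confirmed: idempotent
        if (order.status != OrderStatus.UPDATING) {
            throw new IllegalStateException("cannot confirm an order in state " + order.status);
        }
        order.status = OrderStatus.PAID;
    }
}
```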
The inventory service is similar. An InventoryServiceConfirm class can provide the Confirm logic for the reduceStock() interface, which deducts the 2 units in the frozen-stock field and brings it to 0.
The saleable stock was already changed to 98 in the Try phase. Now the 2 frozen units are gone as well, and the inventory deduction is formally complete.
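Here is a minimal sketch of the inventory Confirm step. Without extra bookkeeping it is not idempotent, because a retried Confirm would deduct the frozen stock twice. This sketch therefore assumes the service remembers how much each transaction froze (the `frozenByTx` map). That design choice and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class InventoryConfirmDemo {
    public static void main(String[] args) {
        Inventory inventory = new Inventory(100);
        inventory.tryReduceStock("tx-1", 2);
        inventory.confirmReduceStock("tx-1");
        inventory.confirmReduceStock("tx-1"); // retry: no double deduction
        System.out.println("saleable=" + inventory.saleable + " frozen=" + inventory.frozen);
    }
}

// Hypothetical inventory record that remembers the quantity frozen by each transaction.
class Inventory {
    int saleable;
    int frozen;
    private final Map<String, Integer> frozenByTx = new HashMap<>();

    Inventory(int saleable) {
        this.saleable = saleable;
    }

    // Try: move units from saleable to frozen (100 -> 98 saleable, 2 frozen).
    void tryReduceStock(String txId, int quantity) {
        if (frozenByTx.containsKey(txId)) return;
        if (quantity <= 0 || quantity > saleable) throw new IllegalStateException("cannot freeze " + quantity);
        saleable -= quantity;
        frozen += quantity;
        frozenByTx.put(txId, quantity);
    }

    // Confirm: drop this transaction's frozen units; saleable stock already reflects the sale.
    void confirmReduceStock(String txId) {
        Integer quantity = frozenByTx.remove(txId);
        if (quantity == null) return; // already confirmed
        frozen -= quantity;
    }
}
```

It prints `saleable=98 frozen=0`.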

The points service is similar. A CreditServiceConfirm class can provide the Confirm logic for the addCredit() interface. It deducts the 10 points from the pre-added field and adds them to the actual member points field, which changes from 1190 to 1200.
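Here is a minimal sketch of the points Confirm step. As with inventory, it assumes the service records the pending points per transaction (the hypothetical `pendingByTx` map) so that a retried Confirm does not add the points twice.

```java
import java.util.HashMap;
import java.util.Map;

class CreditConfirmDemo {
    public static void main(String[] args) {
        MemberCredit member = new MemberCredit(1190);
        member.tryAddCredit("tx-1", 10);
        member.confirmAddCredit("tx-1");
        member.confirmAddCredit("tx-1"); // retry: points are not added twice
        System.out.println("credit=" + member.credit + " prepareAddCredit=" + member.prepareAddCredit);
    }
}

// Hypothetical points row with per-transaction bookkeeping of pending points.
class MemberCredit {
    int credit;
    int prepareAddCredit;
    private final Map<String, Integer> pendingByTx = new HashMap<>();

    MemberCredit(int credit) {
        this.credit = credit;
    }

    // Try: record the points in the pre-add field only.
    void tryAddCredit(String txId, int points) {
        if (pendingByTx.containsKey(txId)) return;
        if (points <= 0) throw new IllegalArgumentException("points must be positive");
        prepareAddCredit += points;
        pendingByTx.put(txId, points);
    }

    // Confirm: move the pending points into the usable balance (1190 + 10 = 1200).
    void confirmAddCredit(String txId) {
        Integer points = pendingByTx.remove(txId);
        if (points == null) return; // already confirmed
        prepareAddCredit -= points;
        credit += points;
    }
}
```

It prints `credit=1200 prepareAddCredit=0`.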

The warehousing service works the same way. A WmsServiceConfirm class can provide the Confirm logic for the saleDelivery() interface, which formally changes the status of the sales outbound order to "created". Warehouse staff can then see and use the order, and it no longer sits in the intermediate "UNKNOWN" state.
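Here is a minimal sketch of the warehousing Confirm step. It is a status transition from UNKNOWN to CREATED that tolerates being retried. The `SaleDeliveryOrder` class and `DeliveryStatus` enum are hypothetical.

```java
class WmsConfirmDemo {
    public static void main(String[] args) {
        SaleDeliveryOrder delivery = new SaleDeliveryOrder("order-1");
        delivery.confirm();
        delivery.confirm(); // retry: stays CREATED
        System.out.println(delivery.orderId + " " + delivery.status); // order-1 CREATED
    }
}

enum DeliveryStatus { UNKNOWN, CREATED, CANCELED }

// Hypothetical sales outbound order, starting in the state left by Try.
class SaleDeliveryOrder {
    final String orderId;
    DeliveryStatus status = DeliveryStatus.UNKNOWN;

    SaleDeliveryOrder(String orderId) {
        this.orderId = orderId;
    }

    // Confirm phase of saleDelivery(): make the order visible to warehouse staff.
    void confirm() {
        if (status == DeliveryStatus.CREATED) return; // already confirmed
        if (status != DeliveryStatus.UNKNOWN) {
            throw new IllegalStateException("cannot confirm a delivery order in state " + status);
        }
        status = DeliveryStatus.CREATED;
    }
}
```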

With that, the Confirm logic of each service is in place. Once the TCC distributed transaction framework in the order service detects that the Try phase of every service has succeeded, it executes the Confirm logic of each service.

The TCC transaction framework in the order service communicates with the TCC transaction framework in each service and calls each service's Confirm logic in turn. At that point, all business logic of every service has been formally completed.
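The happy path that the framework drives can be sketched in a few lines: run Try on every participant, then call Confirm on each one in turn. This is a minimal Java illustration. The `Participant` interface, `RecordingParticipant` and `SimpleCoordinator` are hypothetical and not part of any real framework. Remote communication between services is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConfirmPhaseDemo {
    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        List<Participant> participants = Arrays.asList(
                new RecordingParticipant("order", calls),
                new RecordingParticipant("inventory", calls),
                new RecordingParticipant("credit", calls),
                new RecordingParticipant("wms", calls));
        new SimpleCoordinator().tryThenConfirm(participants);
        System.out.println(calls);
    }
}

// Hypothetical participant contract; a real framework defines its own.
interface Participant {
    void tryPhase();
    void confirmPhase();
}

// Records which phase ran on which service, for illustration only.
class RecordingParticipant implements Participant {
    private final String name;
    private final List<String> calls;

    RecordingParticipant(String name, List<String> calls) {
        this.name = name;
        this.calls = calls;
    }

    public void tryPhase() { calls.add(name + ".try"); }
    public void confirmPhase() { calls.add(name + ".confirm"); }
}

class SimpleCoordinator {
    // Happy path only: every Try succeeds, then Confirm is called on each service in turn.
    void tryThenConfirm(List<Participant> participants) {
        for (Participant p : participants) p.tryPhase();
        for (Participant p : participants) p.confirmPhase();
    }
}
```

It prints the four Try calls followed by the four Confirm calls, in the order the participants were registered.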

4.3 TCC phase three: Cancel

What about the failure case? Suppose the points service hits an error during its Try phase. What happens then?
The TCC transaction framework in the order service detects the failure and decides to roll back the entire TCC distributed transaction.

In other words, it runs the second C phase of every service: the Cancel phase.

Likewise, to support the Cancel phase, each service needs some additional code.

First, the order service provides an OrderServiceCancel class containing the Cancel logic for the pay() interface, which sets the order status to "CANCELED".

The inventory service likewise provides Cancel logic for reduceStock(). It deducts the 2 frozen units and adds them back to the saleable stock: 98 + 2 = 100.
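Here is a minimal sketch of the inventory Cancel step. It assumes the same hypothetical per-transaction bookkeeping (`frozenByTx`) so that Cancel is idempotent and never returns the same units to saleable stock twice.

```java
import java.util.HashMap;
import java.util.Map;

class InventoryCancelDemo {
    public static void main(String[] args) {
        Inventory inventory = new Inventory(100);
        inventory.tryReduceStock("tx-1", 2);
        inventory.cancelReduceStock("tx-1");
        inventory.cancelReduceStock("tx-1"); // retry: stock is not returned twice
        System.out.println("saleable=" + inventory.saleable + " frozen=" + inventory.frozen);
    }
}

// Hypothetical inventory record that remembers the quantity frozen by each transaction.
class Inventory {
    int saleable;
    int frozen;
    private final Map<String, Integer> frozenByTx = new HashMap<>();

    Inventory(int saleable) {
        this.saleable = saleable;
    }

    // Try: move units from saleable to frozen.
    void tryReduceStock(String txId, int quantity) {
        if (frozenByTx.containsKey(txId)) return;
        if (quantity <= 0 || quantity > saleable) throw new IllegalStateException("cannot freeze " + quantity);
        saleable -= quantity;
        frozen += quantity;
        frozenByTx.put(txId, quantity);
    }

    // Cancel: give this transaction's frozen units back to saleable stock (98 + 2 = 100).
    void cancelReduceStock(String txId) {
        Integer quantity = frozenByTx.remove(txId);
        if (quantity == null) return; // nothing frozen for this transaction, or already cancelled
        frozen -= quantity;
        saleable += quantity;
    }
}
```

It prints `saleable=100 frozen=0`.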

The points service must also provide Cancel logic for the addCredit() interface, which deducts the 10 points from the pre-added points field.
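Here is a minimal sketch of the points Cancel step. The pending points are simply discarded and the usable balance never changes. The class and the `pendingByTx` map are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class CreditCancelDemo {
    public static void main(String[] args) {
        MemberCredit member = new MemberCredit(1190);
        member.tryAddCredit("tx-1", 10);
        member.cancelAddCredit("tx-1");
        member.cancelAddCredit("tx-1"); // retry: nothing left to remove
        System.out.println("credit=" + member.credit + " prepareAddCredit=" + member.prepareAddCredit);
    }
}

// Hypothetical points row with per-transaction bookkeeping of pending points.
class MemberCredit {
    int credit;
    int prepareAddCredit;
    private final Map<String, Integer> pendingByTx = new HashMap<>();

    MemberCredit(int credit) {
        this.credit = credit;
    }

    // Try: record the points in the pre-add field only.
    void tryAddCredit(String txId, int points) {
        if (pendingByTx.containsKey(txId)) return;
        if (points <= 0) throw new IllegalArgumentException("points must be positive");
        prepareAddCredit += points;
        pendingByTx.put(txId, points);
    }

    // Cancel: remove the pending points; the usable balance stays at 1190.
    void cancelAddCredit(String txId) {
        Integer points = pendingByTx.remove(txId);
        if (points == null) return; // already cancelled, or Try never ran
        prepareAddCredit -= points;
    }
}
```

It prints `credit=1190 prepareAddCredit=0`.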

The warehousing service must also provide Cancel logic for the saleDelivery() interface, which changes the status of the sales outbound order to "CANCELED".
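The order and warehousing Cancel steps are both simple status transitions, so they can be sketched together. In this minimal Java example, `Order`, `SaleDeliveryOrder` and their enums are hypothetical. Each `cancel()` tolerates being retried. It refuses to cancel something that has already been confirmed, which is one reasonable policy, not something the article specifies.

```java
class StatusCancelDemo {
    public static void main(String[] args) {
        Order order = new Order();
        SaleDeliveryOrder delivery = new SaleDeliveryOrder();
        order.cancel();
        delivery.cancel();
        order.cancel(); // retries are harmless
        System.out.println(order.status + " " + delivery.status); // CANCELED CANCELED
    }
}

enum OrderStatus { UNPAID, UPDATING, PAID, CANCELED }

enum DeliveryStatus { UNKNOWN, CREATED, CANCELED }

// Hypothetical order, starting in the UPDATING state left by Try.
class Order {
    OrderStatus status = OrderStatus.UPDATING;

    // Cancel phase of pay(): mark the order as cancelled.
    void cancel() {
        if (status == OrderStatus.CANCELED) return;
        if (status == OrderStatus.PAID) throw new IllegalStateException("order already confirmed");
        status = OrderStatus.CANCELED;
    }
}

// Hypothetical sales outbound order, starting in the UNKNOWN state left by Try.
class SaleDeliveryOrder {
    DeliveryStatus status = DeliveryStatus.UNKNOWN;

    // Cancel phase of saleDelivery(): mark the delivery order as cancelled.
    void cancel() {
        if (status == DeliveryStatus.CANCELED) return;
        if (status == DeliveryStatus.CREATED) throw new IllegalStateException("delivery already confirmed");
        status = DeliveryStatus.CANCELED;
    }
}
```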

As soon as the TCC distributed transaction framework of the order service detects that the Try logic of any service has failed, it communicates with the TCC distributed transaction framework in each service and calls each service's Cancel logic.
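Here is a minimal Java sketch of the full decision the coordinator makes: Confirm everything if every Try succeeds, otherwise Cancel. All names are hypothetical, and remote calls are replaced by in-process objects. The text says Cancel is sent to every service. This sketch only cancels participants whose Try was attempted, since the others reserved nothing. If Cancel were sent to every service instead, each Cancel would need to tolerate having nothing to release.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CancelPhaseDemo {
    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        List<Participant> participants = Arrays.asList(
                new ScriptedParticipant("order", false, calls),
                new ScriptedParticipant("inventory", false, calls),
                new ScriptedParticipant("credit", true, calls), // Try fails here
                new ScriptedParticipant("wms", false, calls));
        boolean committed = new SimpleCoordinator().execute(participants);
        System.out.println(committed + " " + calls);
    }
}

// Hypothetical participant contract; a real framework defines its own.
interface Participant {
    void tryPhase();
    void confirmPhase();
    void cancelPhase();
}

// Records each call and optionally fails during Try, for illustration only.
class ScriptedParticipant implements Participant {
    private final String name;
    private final boolean failOnTry;
    private final List<String> calls;

    ScriptedParticipant(String name, boolean failOnTry, List<String> calls) {
        this.name = name;
        this.failOnTry = failOnTry;
        this.calls = calls;
    }

    public void tryPhase() {
        calls.add(name + ".try");
        if (failOnTry) throw new IllegalStateException(name + " try failed");
    }

    public void confirmPhase() { calls.add(name + ".confirm"); }
    public void cancelPhase() { calls.add(name + ".cancel"); }
}

class SimpleCoordinator {
    // Returns true if the transaction was confirmed, false if it was cancelled.
    boolean execute(List<Participant> participants) {
        List<Participant> attempted = new ArrayList<>();
        try {
            for (Participant p : participants) {
                attempted.add(p); // the failing participant may have reserved something too
                p.tryPhase();
            }
        } catch (RuntimeException e) {
            for (Participant p : attempted) p.cancelPhase();
            return false;
        }
        for (Participant p : participants) p.confirmPhase();
        return true;
    }
}
```

Here the points service fails its Try, so the output is `false [order.try, inventory.try, credit.try, order.cancel, inventory.cancel, credit.cancel]`. The warehousing service is never called and no Confirm runs.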

5. Summary and reflections

If you want to use TCC distributed transactions:

First, choose a TCC distributed transaction framework and run it in every service.

Then each of your original interfaces must be transformed into three pieces of logic: Try, Confirm, and Cancel.

  1. First, the services along the call chain execute their Try logic in turn.
  2. If everything succeeds, the TCC distributed transaction framework advances to the Confirm logic and completes the entire transaction.
  3. If the Try logic of some service fails, the framework detects it, advances to the Cancel logic of each service, and undoes the operations performed earlier.

That is what a TCC distributed transaction is.

Put bluntly, the core idea of a TCC distributed transaction is to cope with situations like these:

  • The database of some service goes down
  • Some service itself crashes
  • Infrastructure that a service depends on, such as Redis, Elasticsearch, or MQ, malfunctions
  • Some resource is insufficient, such as inventory

1. Try first. Don't complete the business logic yet. First check that each service can basically operate normally and that the resources you need can be frozen.

2. If Try succeeds, the underlying database, Redis, Elasticsearch, and MQ can all accept writes, and you have reserved the resources you need (for example, by freezing part of the inventory).

3. Then the Confirm logic of each service is executed. With the resources already reserved, Confirm completes the distributed transaction with high probability.

4. If some service fails in the Try phase, for example because its database or Redis is down, the Cancel logic of each service is executed automatically and the earlier Try logic is rolled back.
None of the services completes its actual business logic. This ensures that all services either succeed together or fail together.

6. Other issues

What if something unexpected happens? For example, the order service might suddenly crash and then restart. How does the TCC distributed transaction framework make sure that distributed transactions that had not finished continue to execute?

For this, every TCC transaction framework records an activity log of its distributed transactions, either in a log file on disk or in a database. The log preserves each stage and state of every distributed transaction.
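Here is a minimal Java sketch of how such a log can drive recovery after a restart. It relies on several assumptions. `TxLog`, `TxState` and `Recovery` are hypothetical names. The log is an in-memory map, while a real framework persists it to disk or a database. The policy is also an assumption: a transaction still in the Try stage is cancelled, and one whose Confirm decision was already logged is driven forward to Confirm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecoveryDemo {
    public static void main(String[] args) {
        TxLog log = new TxLog();
        log.record("tx-1", TxState.TRYING);
        log.record("tx-2", TxState.CONFIRMING);
        log.record("tx-3", TxState.DONE);
        // After a restart, the coordinator reloads the log and resumes unfinished transactions.
        System.out.println(Recovery.plan(log)); // [tx-1 -> CANCEL, tx-2 -> CONFIRM]
    }
}

enum TxState { TRYING, CONFIRMING, CANCELLING, DONE }

// Hypothetical activity log; kept in memory here, persisted by a real framework.
class TxLog {
    final Map<String, TxState> states = new LinkedHashMap<>();

    void record(String txId, TxState state) {
        states.put(txId, state);
    }
}

class Recovery {
    // Decide what to resume for each unfinished transaction found in the log.
    static List<String> plan(TxLog log) {
        List<String> actions = new ArrayList<>();
        for (Map.Entry<String, TxState> e : log.states.entrySet()) {
            switch (e.getValue()) {
                case TRYING:     // crashed before every Try was known to succeed: roll back
                case CANCELLING: // rollback already decided: finish it
                    actions.add(e.getKey() + " -> CANCEL");
                    break;
                case CONFIRMING: // commit already decided: finish it
                    actions.add(e.getKey() + " -> CONFIRM");
                    break;
                default:         // DONE: nothing to resume
                    break;
            }
        }
        return actions;
    }
}
```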

That still leaves one problem: what if the Cancel or Confirm logic of some service keeps failing?

The answer is also simple: the TCC transaction framework tracks the status of each service through the activity log.

If it finds that the Cancel or Confirm of some service has not succeeded, it keeps retrying that Cancel or Confirm logic until it succeeds.
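Here is a minimal sketch of such a retry loop with exponential backoff. `FlakyConfirm` and `Retry.untilSuccess` are hypothetical. A real framework keeps retrying from its persisted log, whereas this sketch caps the number of attempts so that it always terminates. Retrying like this is only safe because Confirm and Cancel are idempotent.

```java
import java.util.function.BooleanSupplier;

class RetryDemo {
    public static void main(String[] args) throws InterruptedException {
        FlakyConfirm confirm = new FlakyConfirm(2); // fails twice, then succeeds
        int attempts = Retry.untilSuccess(confirm::call, 10, 1);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}

// Hypothetical Confirm call that fails a fixed number of times before succeeding.
class FlakyConfirm {
    private int remainingFailures;

    FlakyConfirm(int failures) {
        remainingFailures = failures;
    }

    boolean call() {
        if (remainingFailures > 0) {
            remainingFailures--;
            return false;
        }
        return true;
    }
}

class Retry {
    // Calls the action until it returns true, doubling the delay (capped at 1000 ms) between attempts.
    static int untilSuccess(BooleanSupplier action, int maxAttempts, long initialDelayMs)
            throws InterruptedException {
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (action.getAsBoolean()) return attempt;
            Thread.sleep(delay);
            delay = Math.min(delay * 2, 1000);
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```

It prints `succeeded after 3 attempts`.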

Of course, if your code has no bugs, has been tested thoroughly, and the Try phase has already verified what it can, Confirm and Cancel will succeed in the vast majority of cases.

Finally, here is a diagram of the entire execution flow of our business once distributed transactions are added:
Many large companies actually develop their own TCC distributed transaction frameworks for internal use.

If your company has not developed one, it will usually choose an open-source framework.

The author recommends several good frameworks, all open-sourced by Chinese developers: ByteTCC, tcc-transaction, and Hmily.

If you are interested, visit their GitHub pages to learn how to use them and how to integrate them with service frameworks such as Spring Cloud and Dubbo.

Once one of these frameworks is integrated into your system, achieving the TCC distributed transaction behavior described above is straightforward.

In the next article, we will discuss distributed transactions implemented with the reliable-message eventual consistency scheme. We will also cover the high-availability architecture that supports this scheme in production.


Source: blog.csdn.net/eluanshi12/article/details/84528393