Distributed transactions: understand them thoroughly this time!

Now that distributed systems and microservices are everywhere, I'm sure everyone is familiar with these terms. And when it comes to the benefits of going distributed or splitting a system into microservices, you can easily think of plenty.




For example, each person only maintains their own service, so the code conflicts of the past disappear. When you want to test, release, or upgrade, you only need to care about the code you wrote yourself, which is very convenient and worry-free!


However, everything has two sides, and this architecture brings problems of its own. Today's article covers one of the thorniest problems introduced by distributed system architecture: distributed transactions!


What is a transaction?


Let's start with a question: what is a transaction? Some people will say that a transaction is a series of operations that either all succeed or all fail together, and then go on to describe the ACID properties of a transaction (atomicity, consistency, isolation, and durability).


Indeed, a transaction exists to ensure that a series of operations executes correctly, and it must satisfy the ACID properties.


But today, let's think about it from a different perspective. We need to know not only the what (what is a transaction?) but also the why (why does the concept of a transaction exist, and what problem does it solve?).


Sometimes a different angle brings different insights.


Looking at transactions from another angle


Just as classic literary works come from life yet rise above it, the concept of a transaction also comes from life. Transactions must have been introduced to solve some problem; otherwise, who would bother with something so tedious?


The simplest and most classic example is a bank transfer: we want to transfer 1,000 yuan from account A to account B.


Under normal circumstances, when 1,000 yuan is transferred from account A to account B, the balance of account A decreases by 1,000 (call this operation Action1) and the balance of account B increases by 1,000 (call this operation Action2).


First of all, we must be clear that Action1 and Action2 are two separate operations. Since there are two operations, they must execute in some order.


A problem can then occur after Action1 has executed and just before Action2 is about to execute (for example, the database is overloaded and temporarily refuses access).


In real-life terms: I transfer 1,000 yuan to a friend, the balance on my card drops by 1,000, but my friend never receives the money.


The concept of a "transaction" was introduced to solve this "where did the money go" problem. In other words, since a transfer cannot be guaranteed to succeed 100% of the time (say the banking system can only guarantee 99.99% availability), then when the above problem occurs in the remaining 0.01% of cases, can't the banking system simply roll back Action1? (That is, add the 1,000 yuan back to the balance.)


From the banking system's point of view: in perhaps 0.01% of cases I cannot guarantee that Action1 and Action2 both succeed, so when something goes wrong, I guarantee that they both fail. (This is the atomicity of a transaction.)
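The all-or-nothing behaviour of the transfer can be sketched in a few lines of Java. This is only an in-memory illustration, not how a bank or a database implements transactions: the class name TransferSketch, the snapshot-based rollback, and the failOnCredit flag (which simulates the outage between Action1 and Action2) are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a bank ledger; hypothetical, not a real database. */
class TransferSketch {
    private final Map<String, Long> balances = new HashMap<>();

    TransferSketch(long balanceA, long balanceB) {
        balances.put("A", balanceA);
        balances.put("B", balanceB);
    }

    long balanceOf(String account) {
        return balances.get(account);
    }

    /**
     * Moves amount from A to B. If crediting B fails, the debit from A is undone,
     * so either both actions take effect or neither does.
     */
    boolean transfer(long amount, boolean failOnCredit) {
        Map<String, Long> snapshot = new HashMap<>(balances); // state to restore on rollback
        try {
            balances.put("A", balances.get("A") - amount);    // Action1: debit A
            if (failOnCredit) {
                // simulated failure between Action1 and Action2
                throw new IllegalStateException("database temporarily unavailable");
            }
            balances.put("B", balances.get("B") + amount);    // Action2: credit B
            return true;                                       // "commit"
        } catch (RuntimeException e) {
            balances.clear();
            balances.putAll(snapshot);                         // "rollback": A gets its money back
            return false;
        }
    }
}
```

Calling transfer(1000, true) on a ledger where A holds 5,000 leaves both balances unchanged, which is exactly the "both fail together" guarantee described above.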


This example answers the two questions raised at the beginning: why do transactions exist, and what problem do they solve?


To summarize: through its ACID properties, a transaction ensures that a series of operations executes safely and correctly even when failures occur.


Transactions in Java


Now that we understand transactions, let's look at something familiar: how do transactions work in Java?


In Java, the most common approach is to add the @Transactional annotation to the create, delete, and update methods of the Service layer, and let Spring manage the transactions for us.


Under the hood, Spring generates a dynamic proxy for our Service component, so that calls to the component's methods go through that proxy.


When the proxy receives a call to a business method such as add(), it follows the AOP approach: before invoking the real business method, it calls setAutoCommit(false) to open the transaction.


After the business method finishes, the proxy calls commit to commit the transaction; if an exception is thrown during the business method, it calls rollback to roll the transaction back (by default, Spring does this for unchecked exceptions).
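The begin, commit, and rollback steps around a business method can be illustrated with a plain JDK dynamic proxy. This is a minimal sketch of the idea and not Spring's actual implementation, which works through a transaction manager and is far more elaborate. EmployeeService, FakeConnection, and TxProxySketch are hypothetical names; FakeConnection merely records which calls were made instead of talking to a database.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical service interface; JDK dynamic proxies require an interface. */
interface EmployeeService {
    void add(String name);
}

/** Hypothetical stand-in for a JDBC connection that only records transaction calls. */
class FakeConnection {
    final List<String> log = new ArrayList<>();
    void setAutoCommit(boolean on) { log.add("setAutoCommit(" + on + ")"); }
    void commit() { log.add("commit"); }
    void rollback() { log.add("rollback"); }
}

class TxProxySketch {
    /** Wraps target so that every call runs between "open" and "commit or rollback". */
    static EmployeeService wrap(EmployeeService target, FakeConnection conn) {
        return (EmployeeService) Proxy.newProxyInstance(
            EmployeeService.class.getClassLoader(),
            new Class<?>[] { EmployeeService.class },
            (proxy, method, args) -> {
                conn.setAutoCommit(false);          // open the transaction
                try {
                    Object result = method.invoke(target, args); // real business method
                    conn.commit();                  // business method returned normally
                    return result;
                } catch (InvocationTargetException e) {
                    conn.rollback();                // business method threw
                    throw e.getCause();             // rethrow the original exception
                }
            });
    }
}
```

With this wrapper, a successful add() leaves the log as setAutoCommit(false) followed by commit, while an add() that throws leaves setAutoCommit(false) followed by rollback.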


Of course, we won't go into the implementation details of the @Transactional annotation here; it is not the focus of this article, whose topic is distributed transactions. If you are interested, set a breakpoint and step through the source code in the debugger yourself. The source code tells the truth.


What is a distributed transaction?


After all that, we finally reach the first focus of this article! First, have you ever wondered: since transactions already exist, and Spring's @Transactional annotation makes them so convenient to control, why do we need a separate concept of distributed transactions?


Furthermore, how do distributed transactions relate to ordinary transactions? How do they differ? What problem do distributed transactions solve?


The questions come one after another, but don't worry. With them in mind, let's talk about distributed transactions in detail.


Since it is called a distributed transaction, it must have something to do with distributed systems! Simply put, a distributed transaction is a transaction in a distributed system.


Okay, let's continue. First look at the following figure:

image.png

As shown in the figure above, a monolithic system has 3 modules: an employee module, a financial module, and a leave module. We have an operation that completes by calling interfaces in these 3 modules in sequence.


This operation is a single unit wrapped in one transaction: either everything succeeds, or everything fails and is rolled back. It is all or nothing, and this works without any problem.


But once we split the monolithic system into a distributed system or a microservice architecture, transactions are no longer this simple.


First, let's look at the architecture diagram after the split into a distributed system, as shown below:

image.png

The figure above shows the same operation executing in a distributed system. The employee module, financial module, and leave module have been split into an employee system, a financial system, and a leave system, respectively.


For example, a user performs an operation. The operation first calls the employee system for pre-processing, and then uses HTTP or RPC to call the interfaces of the financial system and the leave system for further processing. Each of these steps writes to a database.


The operations of these three systems actually need to be wrapped in the same distributed transaction, so that they either all succeed or all fail.


Completing an operation in a distributed system usually requires coordinated calls and communication among multiple systems, as in the example above.


The three subsystems (employee system, financial system, and leave system) communicate over HTTP or RPC, rather than through calls between modules inside a monolithic system. This is the biggest difference between a distributed system and a monolithic system.


Some readers who don't usually pay much attention to distributed architecture may say: I'll just add Spring's @Transactional annotation and be done with it. Why all the fuss?


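But here is an extremely important point: a monolithic system runs in a single JVM process, whereas each system in a distributed system runs in its own JVM process.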


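So simply adding the @Transactional annotation does not work: it can only control a local transaction within a single JVM process and is powerless over a transaction that spans multiple JVM processes.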


Several approaches to implementing distributed transactions


Now that we know what a distributed transaction is, how do distributed transactions actually work? Below are several implementation schemes.


Reliable-message eventual consistency scheme


The overall flow is shown below:

image.png

Let's walk through the general flow of this scheme:

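  • System A first sends a Prepared message to the MQ. If sending the Prepared message fails, the operation is cancelled and none of the subsequent steps are executed.

  • If the message is sent successfully, system A then executes its local transaction. If the local transaction fails, system A tells the MQ to roll back the message, and none of the subsequent steps are executed.

  • If system A's local transaction succeeds, system A sends a confirmation to the MQ.

  • What if system A never sends the confirmation? The MQ periodically polls all Prepared messages and calls an interface that system A provides in advance, using it to check whether system A's local transaction succeeded.

    If it succeeded, a confirmation is sent to the MQ; if it failed, the MQ is told to roll back the message (and none of the subsequent steps are executed).

  • System B then receives the confirmed message and executes its local transaction. If that local transaction succeeds, the whole transaction completes normally.

  • What if system B's local transaction fails? Retry through the MQ: the MQ keeps redelivering automatically until it succeeds. If it still cannot succeed, raise an alert so that someone can roll back and compensate manually.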
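The system A side of this flow (Prepared message, local transaction, confirm or roll back, plus the broker's check-back) can be sketched as follows. Everything here is hypothetical and in-memory: SketchBroker is not a real MQ client, SystemASketch is an invented helper, and redelivery to system B is not modelled. The sketch only shows that a message becomes visible to system B after system A's local transaction is known to have succeeded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;
import java.util.function.Predicate;

/** Hypothetical in-memory broker that supports Prepared messages. */
class SketchBroker {
    enum State { PREPARED, CONFIRMED, ROLLED_BACK }

    private final Map<String, State> messages = new HashMap<>();
    final List<String> delivered = new ArrayList<>();  // messages visible to system B

    void prepare(String id) { messages.put(id, State.PREPARED); }

    void confirm(String id) {
        if (messages.get(id) == State.PREPARED) {
            messages.put(id, State.CONFIRMED);
            delivered.add(id);                         // only now can system B consume it
        }
    }

    void rollback(String id) {
        if (messages.get(id) == State.PREPARED) {
            messages.put(id, State.ROLLED_BACK);
        }
    }

    State stateOf(String id) { return messages.get(id); }

    /** Periodic check-back: asks system A about every message that is still PREPARED. */
    void checkBack(Predicate<String> localTxSucceeded) {
        for (String id : new ArrayList<>(messages.keySet())) {
            if (messages.get(id) == State.PREPARED) {
                if (localTxSucceeded.test(id)) confirm(id); else rollback(id);
            }
        }
    }
}

class SystemASketch {
    /** localTx returns true when system A's local transaction committed. */
    static void run(SketchBroker mq, String id, BooleanSupplier localTx) {
        mq.prepare(id);                                // send the Prepared message first
        boolean ok;
        try {
            ok = localTx.getAsBoolean();               // execute the local transaction
        } catch (RuntimeException e) {
            ok = false;
        }
        if (ok) mq.confirm(id); else mq.rollback(id);  // tell the MQ the outcome
    }
}
```

If system A crashes between prepare and confirm, the message stays PREPARED until checkBack resolves it, which corresponds to the polling step in the list above.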


The key point of this scheme is that the MQ allows continuous retries, so the operation eventually succeeds.


This is because execution usually fails for transient reasons, such as network jitter or a momentary spike in database load.


With this scheme, eventual consistency of the data can be guaranteed in 99.9% of cases; for the remaining 0.1%, fix the data manually.


Applicable scenarios: this scheme is widely used; most Chinese internet companies currently build on this idea.


Best-effort notification scheme


The overall flow is shown below:

image.png

The general flow of this scheme:

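  • After system A's local transaction completes, it sends a message to the MQ.

  • A dedicated best-effort notification service consumes the MQ. It consumes the message, records it in a database or puts it in an in-memory queue, and then calls system B's interface.

  • If system B succeeds, all is well. But what if system B fails?

  • In that case the best-effort notification service periodically retries the call to system B, up to N times, and gives up if it still fails.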
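The bounded retry at the heart of the notification service can be sketched like this. BestEffortNotifier and its parameters are hypothetical names chosen for the example; a real service would also wait between attempts and persist the message before calling system B, which this sketch leaves out.

```java
import java.util.function.BooleanSupplier;

class BestEffortNotifier {
    /**
     * Calls system B up to maxAttempts times and reports whether any attempt succeeded.
     * callSystemB returns true on success; an exception counts as a failed attempt.
     */
    static boolean notifyWithRetry(BooleanSupplier callSystemB, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (callSystemB.getAsBoolean()) {
                    return true;                  // system B handled the notification
                }
            } catch (RuntimeException e) {
                // treat the exception as a failed call and move on to the next attempt
            }
        }
        return false;                             // gave up after maxAttempts
    }
}
```

The value passed as maxAttempts is the business-specific knob that decides how hard the service tries before it gives up.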


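The difference between this scheme and the reliable-message eventual consistency scheme above: the reliable-message scheme guarantees that once system A's transaction completes, system B's transaction will eventually complete, through continuous (unlimited) retries.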


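The best-effort scheme is different: if system B's local transaction fails, the service retries N times and then stops, so system B's local transaction may never complete. How "hard" it should try is something you configure according to your own business.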


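For example, consider an e-commerce system that sends an SMS to tell the user an order was placed successfully: the order completes normally, but the SMS service is temporarily having problems, so sending still fails after 3 retries.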


At that point we stop trying to send the SMS, because in this scenario we consider 3 attempts to be our "best effort".


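In short: if the call succeeds within the configured number of retries, everyone is happy; once the maximum number of retries is exceeded, give up and stop retrying.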


Applicable scenarios: generally used for less important business operations, where success is a nice bonus but failure does no real harm.


For example, some of the notification SMS messages in e-commerce mentioned above are a good fit for the best-effort notification scheme.


TCC strong consistency scheme


TCC stands for:

  • Try

  • Confirm (commit)

  • Cancel (roll back)


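This scheme is built on the idea of compensation and is divided into three phases:

  • Try phase: check the resources of each service and lock or reserve them.

  • Confirm phase: perform the actual operation in each service.

  • Cancel phase: if the business method of any service fails, compensation is required, that is, roll back the business logic that has already executed successfully.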


Let's look at an example:

image.png

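For example, a transfer between banks involves a distributed transaction across two banks. Implemented with TCC, the idea is as follows:

  • Try phase: freeze the funds in both bank accounts so that they cannot be operated on.

  • Confirm phase: perform the actual transfer; the funds in the account at bank A are deducted and the funds in the account at bank B are increased.

  • Cancel phase: if the operation at either bank fails, roll back to compensate. For example, if the account at bank A has already been deducted but the increase at bank B failed, the funds must be added back to the account at bank A.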
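A minimal coordinator for the three phases might look like the sketch below. The interface TccParticipant and the classes DebitSide, CreditSide, and TccCoordinatorSketch are hypothetical names, and the sketch makes one simplifying assumption that is common in TCC designs but goes slightly beyond the description above: only the Try phase can fail, and Confirm is treated as always succeeding once every Try has succeeded. A failure during Confirm, as in the bank A and bank B example, would need additional compensation logic that is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical participant contract; each bank would implement it. */
interface TccParticipant {
    boolean tryReserve();  // Try: check and freeze resources
    void confirm();        // Confirm: perform the real operation
    void cancel();         // Cancel: release whatever Try reserved
}

/** Account at bank A: funds are frozen in Try and deducted in Confirm. */
class DebitSide implements TccParticipant {
    long balance;
    long frozen;
    private final long amount;

    DebitSide(long balance, long amount) { this.balance = balance; this.amount = amount; }

    public boolean tryReserve() {
        if (balance - frozen < amount) return false;  // not enough available funds
        frozen += amount;
        return true;
    }
    public void confirm() { frozen -= amount; balance -= amount; }
    public void cancel()  { frozen -= amount; }       // unfreeze, balance untouched
}

/** Account at bank B: Try only checks that the account can receive funds. */
class CreditSide implements TccParticipant {
    long balance;
    private final long amount;
    private final boolean accountUsable;

    CreditSide(long balance, long amount, boolean accountUsable) {
        this.balance = balance; this.amount = amount; this.accountUsable = accountUsable;
    }

    public boolean tryReserve() { return accountUsable; }
    public void confirm() { balance += amount; }
    public void cancel()  { /* nothing was reserved in Try */ }
}

class TccCoordinatorSketch {
    /** Returns true if every participant confirmed, false if the transaction was cancelled. */
    static boolean execute(List<TccParticipant> participants) {
        List<TccParticipant> reserved = new ArrayList<>();
        for (TccParticipant p : participants) {
            if (p.tryReserve()) {
                reserved.add(p);
            } else {
                for (TccParticipant r : reserved) r.cancel();  // compensate earlier Try calls
                return false;
            }
        }
        for (TccParticipant p : participants) p.confirm();     // all Try calls succeeded
        return true;
    }
}
```

Even in this toy form, every participant needs three hand-written methods, which hints at how much compensation code a real TCC integration requires.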


Applicable scenarios: to be honest, very few people use this scheme, and we don't use it much either, but it does have its use cases.


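The reason is that rollback depends heavily on rollback and compensation code you write yourself, which leads to a huge amount of compensation code that is very unpleasant to work with.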


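For us, scenarios that deal with money, such as payments and trading, generally use TCC to strictly guarantee that the distributed transaction either succeeds entirely or is rolled back automatically in full, so that funds stay correct. No errors are acceptable where money is concerned.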


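Best-suited scenarios: use TCC only when your consistency requirements are truly very high and the operation is at the very core of your system, the typical case being anything that involves funds.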


You have to write a lot of business logic yourself: check whether each step in the transaction succeeded, and execute the compensation or rollback code if it did not.


It is also best if each of your business operations executes quickly. But to be honest, try to avoid this in general. Hand-writing rollback or compensation logic is really unpleasant, and the resulting business code is hard to maintain.


Summary


This article introduced what a distributed transaction is and then presented 3 commonly used distributed transaction schemes.

Besides the schemes above, there are also the two-phase commit scheme (XA scheme) and the local message table scheme.


But to be honest, very few companies use these schemes, and due to space limitations we won't cover them here. If I get the chance to write another article, I will discuss the ideas behind these two schemes in detail.



Origin blog.51cto.com/14410880/2550584