Microservices Distributed Transaction Processing

When migrating to a microservice architecture, how to handle distributed transactions is a problem that must be considered. This article introduces two approaches to distributed transaction processing; choose the one that suits your actual situation. Original text: Handling Distributed Transactions in the Microservice world ^[1]

These days everyone (including me) is thinking about and building microservices. Distributed systems are at the core of microservices: they are the context in which everything is implemented.

What is a distributed transaction?

Transactions that span multiple physical systems or computers on a network are simply called distributed transactions. In the world of microservices, a transaction is split into multiple services that need to be called sequentially to complete the entire transaction.

The following is an example of a monolithic e-commerce system using transactions:

Figure 1: Transactions in a Monolith

In the above system, if a user sends a Checkout request to the platform, the platform creates a local database transaction that operates on multiple database tables to process the order and reserve items from inventory. If any step fails, the whole transaction (both the order and the reserved items) can be rolled back. This behavior is part of ACID (Atomicity, Consistency, Isolation, Durability), which is guaranteed by the database system.
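As a rough illustration of the monolithic case, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names, the checkout function, and the sample data are assumptions made for this example and do not come from the original article. Both writes happen in one local transaction, so a failed reservation also undoes the order.

```python
import sqlite3

# Hypothetical schema for the example: one database holds both tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('book', 1)")
conn.commit()


def checkout(conn, item, qty):
    """Process the order and reserve the items in a single local transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE item = ? AND stock >= ?",
                (qty, item, qty),
            )
            if cur.rowcount == 0:
                # No row was updated, so there is not enough stock to reserve.
                raise ValueError("not enough stock")
        return True
    except ValueError:
        return False  # the INSERT into orders has been rolled back as well
```

Calling checkout(conn, "book", 1) twice against this sample data should succeed the first time and fail the second time, leaving exactly one row in the orders table.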

Here's how an e-commerce system breaks down into microservices:

Figure 2: Transactions in Microservices

When we decoupled this system, we created two microservices, OrderMicroservice and InventoryMicroservice, each with its own independent database. When a user initiates a Checkout request, both microservices are called to apply the changes to their respective databases. Because the transaction now spans multiple databases on multiple systems, it is a distributed transaction.

What's wrong with distributed transactions in microservices?

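With the advent of microservice architecture, a transaction can span multiple microservices and therefore multiple databases. We can no longer rely on the ACID properties of a single database, which leaves us with the following key problems:

How do we keep the transaction atomic?

Atomicity means that a transaction either completes all of its steps or completes none of them. In the example above, if "reserve items" in InventoryMicroservice fails, how do we roll back the "process order" change that OrderMicroservice has already applied?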
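The question above can be made concrete with a small sketch. It assumes two separate in-memory SQLite databases standing in for the two services' databases; the function names process_order, reserve_items, and naive_checkout are hypothetical and chosen only for this illustration. Each service commits its own local transaction, so when the reservation fails there is nothing that automatically undoes the order.

```python
import sqlite3

# Each microservice owns its own database (assumed here to be in-memory SQLite).
order_db = sqlite3.connect(":memory:")      # owned by OrderMicroservice
inventory_db = sqlite3.connect(":memory:")  # owned by InventoryMicroservice
order_db.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
inventory_db.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
inventory_db.execute("INSERT INTO inventory VALUES ('book', 0)")  # out of stock
inventory_db.commit()


def process_order(item, qty):
    with order_db:  # local transaction, committed when the block exits
        order_db.execute("INSERT INTO orders VALUES (?, ?)", (item, qty))


def reserve_items(item, qty):
    with inventory_db:  # a different local transaction in a different database
        cur = inventory_db.execute(
            "UPDATE inventory SET stock = stock - ? WHERE item = ? AND stock >= ?",
            (qty, item, qty),
        )
        if cur.rowcount == 0:
            raise RuntimeError("not enough stock")


def naive_checkout(item, qty):
    process_order(item, qty)
    try:
        reserve_items(item, qty)
        return True
    except RuntimeError:
        # The order was already committed in order_db and is left behind.
        return False
```

With the sample data, naive_checkout("book", 1) should return False while the order row remains in order_db, which is exactly the inconsistency the two approaches in this article try to prevent.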

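How do we handle concurrent requests?

Suppose an object from one microservice is being persisted to the database while another request reads the same object. Should the service return the old data or the new data? In the example above, once OrderMicroservice has finished and InventoryMicroservice is still performing its update, should a request for the customer's orders include the current order?

Modern systems should be designed for failure, and handling distributed transactions is one of the main problems in doing so. To quote Pat Helland:

"In general, application developers simply do not implement large scalable applications assuming distributed transactions." - Pat Helland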

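Possible solutions

The two problems above are critical when designing and building microservice-based applications. They can be addressed with the following two approaches:

Two-Phase Commit
Eventual Consistency and Compensation / SAGA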

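1. Two-phase commit

As the name suggests, this way of handling transactions has two phases, a prepare phase and a commit phase. A key role is played by the Transaction Coordinator, which maintains the life cycle of the transaction.

How it works:

In the prepare phase, all microservices involved prepare to commit and notify the coordinator that they are ready to complete the transaction. Then, in the commit phase, the transaction coordinator issues either a commit or a rollback command to all of the microservices.

Taking the e-commerce system as an example: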

Figure 3: Successful two-phase commit on a microservice

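In the example above (Figure 3), when the user sends a Checkout request, the TransactionCoordinator starts a global transaction that carries all the context information. First, it sends a prepare command to OrderMicroservice to create the order. Then it sends a prepare command to InventoryMicroservice to reserve the items. When both services are able to make the changes, they lock the affected objects, stop accepting other changes to them, and notify the TransactionCoordinator. Once the TransactionCoordinator has confirmed that all microservices are ready to apply their changes, it issues a commit request asking them to persist the changes, and only then are the objects unlocked.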

Figure 4: Failed two-phase commit on microservices

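In the failure scenario (Figure 4), if at any point a microservice fails to prepare, the TransactionCoordinator aborts the transaction and starts the rollback process. In the figure, OrderMicroservice failed to create the order for some reason, but InventoryMicroservice has already replied that it is prepared to reserve the items. The TransactionCoordinator asks InventoryMicroservice to abort the reservation, roll back any changes it made, and unlock the database objects.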
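The two scenarios can be sketched in a few lines of Python. This is a simplified, in-memory model rather than a real protocol implementation: the Participant class, its can_prepare flag, and the run method are names invented for this example, and the sketch ignores durability, timeouts, and coordinator crashes, all of which a production two-phase commit has to handle.

```python
class Participant:
    """Hypothetical stand-in for a microservice that takes part in 2PC."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # set to False to simulate a failed prepare
        self.locked = False
        self.pending = None             # change staged during the prepare phase
        self.committed = []             # changes that have been made permanent

    def prepare(self, change):
        # Vote "no" if the change cannot be applied or the object is already locked.
        if not self.can_prepare or self.locked:
            return False
        self.locked = True
        self.pending = change
        return True

    def commit(self):
        self.committed.append(self.pending)
        self.pending, self.locked = None, False

    def rollback(self):
        self.pending, self.locked = None, False


class TransactionCoordinator:
    def __init__(self, participants):
        self.participants = participants

    def run(self, change):
        # Phase 1: ask every participant to prepare and collect the votes.
        votes = [(p, p.prepare(change)) for p in self.participants]
        if all(ok for _, ok in votes):
            # Phase 2 (commit): everyone is ready, so tell all of them to commit.
            for p, _ in votes:
                p.commit()
            return True
        # Phase 2 (abort): at least one "no" vote, so roll back those that prepared.
        for p, ok in votes:
            if ok:
                p.rollback()
        return False
```

For instance, TransactionCoordinator([Participant("OrderMicroservice", can_prepare=False), Participant("InventoryMicroservice")]).run("checkout") should return False and leave the inventory participant unlocked with nothing committed, mirroring Figure 4.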

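Advantages

This approach guarantees that the transaction is atomic. When the transaction ends, either all microservices have succeeded or none of them has changed anything.
It also provides read-write isolation: changes to objects are not visible until the transaction coordinator commits them.
The approach is a synchronous call, so the client is notified of success or failure.

Disadvantages

Nothing is perfect. Two-phase commit is much slower than the processing time of a single microservice, and it depends heavily on the transaction coordinator, which can slow the system down during periods of high load.
The other major drawback is database row locking. The locks can become a performance bottleneck, and two transactions that lock each other out can end in a deadlock.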

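2. Eventual consistency and compensation / SAGA

One of the best definitions of eventual consistency is the one given by microservices.io ^[2]: each service publishes an event whenever it updates its data. Other services subscribe to events, and when an event is received, a service updates its own data.

In this approach, a distributed transaction is carried out by asynchronous local transactions on the microservices involved, and the microservices communicate with each other through an event bus.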
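To make the "publish an event, others subscribe" idea more tangible, here is a minimal in-memory sketch. The EventBus class, its deliver_pending method, and the "OrderCreated" event name are assumptions made for this example; a real system would use a message broker and deliver events asynchronously across processes. Events are queued and delivered later, so the subscriber's data lags behind the publisher's until delivery happens.

```python
from collections import defaultdict, deque


class EventBus:
    """Hypothetical in-memory bus: events are queued, then delivered later."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        self._queue.append((event_type, payload))  # not delivered yet

    def deliver_pending(self):
        while self._queue:
            event_type, payload = self._queue.popleft()
            for handler in self._subscribers[event_type]:
                handler(payload)


bus = EventBus()
orders = []    # data owned by OrderMicroservice
reserved = {}  # data owned by InventoryMicroservice


def create_order(item, qty):
    orders.append((item, qty))  # local update in the publishing service
    bus.publish("OrderCreated", {"item": item, "qty": qty})


def on_order_created(event):
    # The subscribing service updates its own data when the event arrives.
    reserved[event["item"]] = reserved.get(event["item"], 0) + event["qty"]


bus.subscribe("OrderCreated", on_order_created)
```

After create_order("book", 2) the order exists but reserved is still empty; only once bus.deliver_pending() runs do the two services agree, which is the "eventual" part of eventual consistency.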

How it works:

Again taking the e-commerce system as an example:

Figure 5: Eventual consistency/SAGA, a successful scenario

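In the example above (Figure 5), the client asks the system to process an order. For this request, the Choreographer emits a Create Order event, which marks the start of a transaction. OrderMicroservice listens for this event and creates an order; if that succeeds, it emits an Order Created event. The Choreographer listens for this event and moves on to reserving the items by emitting a Reserve items event. InventoryMicroservice listens for this event and reserves the items; if that succeeds, it emits an Items Reserved event. In this example, that marks the end of the transaction.

All event-based communication between the microservices goes through the event bus and is choreographed by another system to keep the complexity manageable.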

Figure 6: Eventual Consistency/SAGA, Failure Scenario

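If InventoryMicroservice fails to reserve the items for any reason (Figure 6), it emits a Failed to Reserve Items event. The Choreographer listens for this event and starts a compensating transaction by emitting a Delete Order event. OrderMicroservice listens for this event and deletes the order it created.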
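The success and failure flows can be sketched together as follows. The event names follow the article; everything else (the EventBus class, the build_system helper, the payload fields order_id, item, and qty) is hypothetical. For brevity the bus delivers events synchronously, whereas the approach described here relies on asynchronous delivery.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-memory bus that delivers events synchronously."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # names of all published events, in order

    def subscribe(self, name, handler):
        self.handlers[name].append(handler)

    def publish(self, name, data):
        self.log.append(name)
        for handler in list(self.handlers[name]):
            handler(data)


def build_system(stock):
    bus = EventBus()
    orders = {}              # data owned by OrderMicroservice
    inventory = dict(stock)  # data owned by InventoryMicroservice

    # OrderMicroservice: local transaction plus its compensating action.
    def on_create_order(d):
        orders[d["order_id"]] = d
        bus.publish("Order Created", d)

    def on_delete_order(d):
        orders.pop(d["order_id"], None)  # compensation: undo the created order

    # InventoryMicroservice: reserve the items or report the failure.
    def on_reserve_items(d):
        if inventory.get(d["item"], 0) >= d["qty"]:
            inventory[d["item"]] -= d["qty"]
            bus.publish("Items Reserved", d)
        else:
            bus.publish("Failed to Reserve Items", d)

    bus.subscribe("Create Order", on_create_order)
    bus.subscribe("Reserve items", on_reserve_items)
    bus.subscribe("Delete Order", on_delete_order)
    # Choreographer: reacts to outcomes and emits the next step.
    bus.subscribe("Order Created", lambda d: bus.publish("Reserve items", d))
    bus.subscribe("Failed to Reserve Items", lambda d: bus.publish("Delete Order", d))
    return bus, orders, inventory
```

A possible usage: build the system with build_system({"book": 1}) and publish two Create Order events for one book each. The first order should remain, and the second should be removed again by the Delete Order compensation.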

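Advantages

A big advantage of this approach is that each microservice only has to care about its own atomic transaction. If one service takes longer, the other microservices are not blocked, which also means that no database locks are needed. Because it is an asynchronous, event-based solution, this approach lets the system scale well under high load.

Disadvantages

The main drawback of this approach is that there is no read isolation. In the example above, this means the client can see an order that has been created, only for it to be deleted a moment later by a compensating transaction. In addition, as the number of microservices grows, debugging and maintenance become harder.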

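Conclusion

First of all, try to avoid distributed transactions. If you are building a new application, start with a monolith, as Martin Fowler describes in MonolithFirst ^[3]:

"A more common approach is to start with a monolith and gradually peel off microservices at the edges. Such an approach can leave a substantial monolith at the heart of the microservices architecture, but with most new development occurring in the microservices while the monolith is relatively quiescent." - Martin Fowler

When a single event needs to update data in two places, eventual consistency/SAGA is a better way to handle distributed transactions than two-phase commit, mainly because two-phase commit does not scale in a distributed environment. Eventual consistency introduces problems of its own, however, such as how to atomically update the database and emit an event, so adopting it requires a change of mindset in both development and testing teams.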
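The article does not say how to update the database and emit an event atomically. One commonly used technique, shown here only as a hedged sketch and not as the author's recommendation, is a transactional outbox: the event is written to an outbox table in the same local transaction as the business data, and a separate relay publishes it afterwards. The table names, create_order, and relay_outbox are all assumptions made for this example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, event TEXT, sent INTEGER DEFAULT 0)"
)


def create_order(item):
    with conn:  # the order row and the event row commit or roll back together
        cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        event = json.dumps({"type": "Order Created", "order_id": cur.lastrowid})
        conn.execute("INSERT INTO outbox (event) VALUES (?)", (event,))


def relay_outbox(publish):
    """Publish unsent events, then mark each one as sent."""
    rows = conn.execute(
        "SELECT id, event FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, event in rows:
        publish(json.loads(event))  # hypothetical hook into the event bus
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
```

If the relay crashes after publishing but before marking the row as sent, the event is published again on the next run, so consumers in such a design need to tolerate duplicate events.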

References:
[1] Handling Distributed Transactions in the Microservice world: medium.com/swlh/handli…
[2] Event Driven Architecture: microservices.io/patterns/da…
[3] MonolithFirst: martinfowler.com/bliki/Monol…

Hello, I am Yu Fan. I did R&D at Motorola and now do technical work at Mavenir. I have always had a strong interest in communications, networking, back-end architecture, cloud native, DevOps, CI/CD, blockchain, AI, and other technologies. I like to read and think, and I believe in continuous learning and lifelong growth. You are welcome to get in touch so we can learn together.
WeChat public account: DeepNoMind