Distributed Transaction Analysis

1. Problem Analysis
To design an architecture, you must first identify the problem: whose problem it is, and what the problem is.
A distributed architecture addresses high concurrency, high service availability, and data consistency under high concurrency. When the business is small, a single database with HA can handle the requests. As the business keeps growing, a single database can no longer keep up, and the mainstream industry practice is to split the data across multiple databases and tables. Some operations that used to run in one transaction then span two databases, and they must succeed or fail in both at the same time: if the operation succeeds in one database and fails in the other, either retry the failed operation until it succeeds, or roll back the successful one. The question that follows is how to keep business operations consistent once the database has been split. Understanding high-concurrency distributed architecture, and where data-consistency problems in distributed systems come from, is the first step.
A brief digression: once the database is split, each part can be exposed to the outside world by its own service, possibly written in a different language, in the now-popular microservice style. When business volume is small, however, microservices add complexity and technical cost. Understanding why a technology exists, choosing an architecture that fits, and serving each business volume in the most appropriate way are essential skills for an architect.
 
2. Common concepts explained:
a. Relational databases usually have the ACID properties: Atomicity, Consistency, Isolation, Durability.
b. BASE (Basically Available, Soft state, Eventually consistent): an alternative to ACID. BASE achieves availability by tolerating partial failures instead of letting them turn into system-wide failures. The name is a pun: in chemistry, an acid is the counterpart of a base.
c. CAP theorem: a distributed system cannot simultaneously satisfy all three of Consistency, Availability, and Partition tolerance. Because network partitions cannot be ruled out in practice, a partitioned system has to choose between consistency and availability.
d. Strong consistency: once an update completes, every subsequent access by any process or thread returns the updated value. This is the most user-friendly model: what the user wrote last time is guaranteed to be read next time. According to CAP, providing it in a distributed system requires sacrificing availability; it is the model RDBMSs typically offer.
e. Weak consistency: the system does not guarantee that subsequent accesses by processes or threads return the latest value. After a write succeeds, the system promises neither that the new value can be read immediately nor that reads will return the latest value.
f. Eventual consistency: a specific form of weak consistency. The system guarantees that, if no new updates are made, accesses will eventually return the last updated value. Absent failures, the length of the inconsistency window depends mainly on communication delay, system load, and the number of replicas. DNS is a typical eventually consistent system.
To preserve availability, Internet-scale distributed architectures relax strong consistency requirements to eventual consistency, and make operations idempotent so that the data reliably converges.
 
Idempotence: the cornerstone of distributed architecture; no matter how many times the same operation is requested, the result is the same.
A typical example is HTTP. The HTTP/1.1 specification (RFC 2616) states that methods "can also have the property of 'idempotence' in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request."
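To make the definition concrete, here is a minimal Python sketch using two hypothetical functions that are not from the article: assigning an absolute value behaves like HTTP PUT and is idempotent, while subtracting from the current value is not.

```python
def put_balance(account, value):
    """Idempotent: like HTTP PUT, N > 0 identical calls leave the same state as one."""
    account["balance"] = value


def deduct(account, amount):
    """Not idempotent: every repeated call changes the state again."""
    account["balance"] -= amount


put_account = {"balance": 100}
deduct_account = {"balance": 100}
for _ in range(3):  # the same request delivered three times
    put_balance(put_account, 90)
    deduct(deduct_account, 10)
print(put_account["balance"], deduct_account["balance"])  # 90 70
```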
 
Each concept exists to solve a specific problem people actually ran into; finding the problem behind each one is the second step in understanding high-concurrency distributed architecture and data consistency in distributed systems.
 
3. The simplest principle explained
Suppose there is a remote API for withdrawing money from an account (it may or may not be HTTP). For now, we write it as a function signature:

bool withdraw(account_id, amount)
withdraw deducts amount from the account identified by account_id. If the deduction succeeds, it returns true and the account balance decreases by amount; if it fails, it returns false and the balance stays unchanged.
Note that we cannot simply assume the environment is reliable.
A typical scenario: the server processes the withdraw request correctly, but its response is lost because of a network problem or similar, so the client cannot know the result. On a web page, a poorly designed flow may lead the user to believe the operation failed and refresh the page, which calls withdraw twice and debits the account one extra time. As shown in Figure 1:

[Figure 1: non-idempotent withdraw]
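The failure described above can be reproduced with a small in-memory simulation. Everything here (the Bank class, the LostResponse exception, the way a reply is dropped) is a hypothetical illustration in Python; the point is only that blindly retrying a non-idempotent call debits the account twice.

```python
class LostResponse(Exception):
    """Simulates a reply dropped by the network after the server has committed."""


class Bank:
    """Hypothetical server implementing the plain, non-idempotent withdraw."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def withdraw(self, account_id, amount):
        if self.balances.get(account_id, 0) < amount:
            return False  # insufficient funds: balance unchanged
        self.balances[account_id] -= amount
        return True


def call_and_lose_reply(bank, account_id, amount):
    bank.withdraw(account_id, amount)  # the server really does the work...
    raise LostResponse()               # ...but the client never sees the result


bank = Bank({"alice": 100})
try:
    call_and_lose_reply(bank, "alice", 30)
except LostResponse:
    # The client assumes failure and retries (e.g. the user refreshes the page).
    bank.withdraw("alice", 30)
print(bank.balances["alice"])  # 40: debited twice instead of once
```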

Compared with a distributed transaction, a more lightweight solution is idempotent design. With a small trick, we can make withdraw idempotent, for example:

int create_ticket() 
bool idempotent_withdraw(ticket_id, account_id, amount)
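create_ticket obtains a unique, server-generated processing number, ticket_id, which identifies the subsequent operation. idempotent_withdraw differs from withdraw in that it is associated with a ticket_id: the operation identified by a ticket_id is processed at most once, and every call returns the result of the first call. This makes idempotent_withdraw idempotent, so the client can safely call it as many times as needed.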

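In the idempotent design, a complete withdrawal is split into two steps: 1. call create_ticket() to obtain a ticket_id; 2. call idempotent_withdraw(ticket_id, account_id, amount). Although create_ticket is not idempotent, in this design its effect on system state is negligible, and since idempotent_withdraw is idempotent, the client can retry either step after a failure or timeout (caused by the network or anything else) until it gets a result. As shown in the figure: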

[Figure: idempotent withdraw]
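The two-step flow can be sketched in Python as follows. The in-memory ticket store, the FlakyChannel that drops replies, and the retry limits are assumptions for illustration; a real server would persist each ticket's result in the same database transaction as the balance update, so that checking the ticket, deducting, and recording the result happen atomically.

```python
import itertools


class LostResponse(Exception):
    """Simulates a reply dropped by the network after the server processed the call."""


class IdempotentBank:
    """Hypothetical server exposing create_ticket and idempotent_withdraw."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self._ticket_ids = itertools.count(1)
        self.results = {}  # ticket_id -> result of the first processing

    def create_ticket(self):
        # Not idempotent, but it only issues a fresh ID and touches no balance.
        return next(self._ticket_ids)

    def idempotent_withdraw(self, ticket_id, account_id, amount):
        # Each ticket is processed at most once; repeats replay the first result.
        if ticket_id in self.results:
            return self.results[ticket_id]
        ok = self.balances.get(account_id, 0) >= amount
        if ok:
            self.balances[account_id] -= amount
        self.results[ticket_id] = ok
        return ok


class FlakyChannel:
    """Delivers every call, but loses the first `drops` replies of `method`."""

    def __init__(self, server, method, drops):
        self.server, self.method, self.drops = server, method, drops

    def call(self, method, *args):
        result = getattr(self.server, method)(*args)
        if method == self.method and self.drops > 0:
            self.drops -= 1
            raise LostResponse()
        return result


def withdraw_with_retry(channel, account_id, amount, max_attempts=10):
    # Step 1: get a ticket; a lost reply just leaves an unused ticket behind.
    for _ in range(max_attempts):
        try:
            ticket_id = channel.call("create_ticket")
            break
        except LostResponse:
            continue
    else:
        raise RuntimeError("could not obtain a ticket")
    # Step 2: retry with the SAME ticket until a reply arrives.
    for _ in range(max_attempts):
        try:
            return channel.call("idempotent_withdraw", ticket_id, account_id, amount)
        except LostResponse:
            continue
    raise RuntimeError("no reply from server")


bank = IdempotentBank({"alice": 100})
channel = FlakyChannel(bank, "idempotent_withdraw", drops=2)
result = withdraw_with_retry(channel, "alice", 30)
print(result, bank.balances["alice"])  # True 70
```

Here the first two withdraw replies are lost, yet the balance drops by 30 only once, because every retry reuses the same ticket_id and receives the stored result.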

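Compared with distributed transactions, idempotent design has the advantages of being lightweight, adapting easily to heterogeneous environments, and offering better performance and availability. In applications with demanding performance requirements, idempotent design is often the only option.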
 
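Idempotence is the fundamental principle underlying high-concurrency distributed architecture and data consistency in distributed systems; understanding it is the key step on the road to mastery.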
 
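4. Case studies
a. eBay's classic BASE pattern
The most common scenario: when a trade occurs, a record must be added to the trade table and, at the same time, the balance in the user table must be updated. The two tables belong to different databases and remote services, so keeping them consistent is a distributed transaction problem.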
 
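The core idea is to use two transactions to guarantee consistency and asynchrony to guarantee availability. One transaction performs the main operation, "insert the trade record", and builds the asynchronous messages; the other processes those messages. The first transaction handles the main business and records the secondary business while still returning quickly, which keeps the system highly available; the second transaction guarantees data consistency. (In other words, the updates to the buyer and seller tables are moved "offline": the message log can be stored in a local file, a database, or a message queue, and retries are then triggered automatically by business rules or manually. Manual retries are mostly used in payment scenarios, where a reconciliation system handles problems after the fact, similar to Taobao refunding duplicate payments after Double 11.)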
 
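A classic solution is to perform the main modification and write the "asynchronous messages" for updating the user table in a single local transaction. To make repeated retries idempotent and avoid the problems of consuming a user-table message twice, an update-record table, updates_applied, is added to record which messages have already been processed.
In the first transaction, the local database's transaction guarantees that "insert the trade record" and "insert two asynchronous queue messages (one for the buyer table, one for the seller table)" succeed or fail together.
In the second transaction, the queued messages are read (but not deleted), and updates_applied is checked to see whether each message has already been executed. If not, the corresponding business logic is executed (idempotently, so that even if an exception occurs midway, re-running it causes no problems); after all messages have been executed, an operation record is added to updates_applied, and the transaction ends. A transaction guarantees that executing the two asynchronous messages and writing updates_applied happen consistently (this is also called a distributed transaction framework). Finally, the messages are deleted from the queue.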
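The two transactions can be sketched with Python's built-in sqlite3 module. To stay self-contained, the sketch keeps the trade table, the user table, the message queue, and updates_applied in one in-memory SQLite database, whereas in the scenario above they live in different databases; the table layouts and column names are assumptions. It also records each message in updates_applied in the same local transaction as that message's update, a per-message variant of the step described above.

```python
import sqlite3

# One in-memory database for brevity; in the real scenario, trade and users
# live in different databases (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trade (xid INTEGER PRIMARY KEY, buyer TEXT, seller TEXT, amount INTEGER);
CREATE TABLE users (id TEXT PRIMARY KEY, amt_bought INTEGER DEFAULT 0, amt_sold INTEGER DEFAULT 0);
CREATE TABLE message_queue (msg_id INTEGER PRIMARY KEY AUTOINCREMENT,
                            xid INTEGER, user_id TEXT, kind TEXT, amount INTEGER);
CREATE TABLE updates_applied (xid INTEGER, user_id TEXT, kind TEXT,
                              PRIMARY KEY (xid, user_id, kind));
INSERT INTO users (id) VALUES ('buyer1'), ('seller1');
""")

# Whitelisted column per message kind, so the f-string below cannot inject SQL.
COLUMN = {"bought": "amt_bought", "sold": "amt_sold"}


def record_trade(xid, buyer, seller, amount):
    """Transaction 1: the trade row and both queued messages commit or fail together."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO trade VALUES (?, ?, ?, ?)", (xid, buyer, seller, amount))
        conn.execute("INSERT INTO message_queue (xid, user_id, kind, amount) VALUES (?, ?, 'bought', ?)",
                     (xid, buyer, amount))
        conn.execute("INSERT INTO message_queue (xid, user_id, kind, amount) VALUES (?, ?, 'sold', ?)",
                     (xid, seller, amount))


def apply_messages():
    """Transaction 2: read queued messages without deleting them, apply each at most once."""
    rows = conn.execute("SELECT msg_id, xid, user_id, kind, amount FROM message_queue").fetchall()
    for msg_id, xid, user_id, kind, amount in rows:
        with conn:
            seen = conn.execute("SELECT 1 FROM updates_applied WHERE xid = ? AND user_id = ? AND kind = ?",
                                (xid, user_id, kind)).fetchone()
            if seen is None:
                col = COLUMN[kind]
                conn.execute(f"UPDATE users SET {col} = {col} + ? WHERE id = ?", (amount, user_id))
                conn.execute("INSERT INTO updates_applied VALUES (?, ?, ?)", (xid, user_id, kind))
        # Dequeue only after the update and its updates_applied row are committed.
        with conn:
            conn.execute("DELETE FROM message_queue WHERE msg_id = ?", (msg_id,))


record_trade(1, "buyer1", "seller1", 50)
apply_messages()
# Simulate a crash that left an already-applied message in the queue.
with conn:
    conn.execute("INSERT INTO message_queue (xid, user_id, kind, amount) VALUES (1, 'buyer1', 'bought', 50)")
apply_messages()
print(conn.execute("SELECT id, amt_bought, amt_sold FROM users ORDER BY id").fetchall())
# [('buyer1', 50, 0), ('seller1', 0, 50)]
```

The duplicate message stands in for a crash between applying an update and deleting its queue entry; the lookup in updates_applied turns the replay into a no-op.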
 
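b. Qunar's distributed transaction scheme
i. Prefer asynchronous schemes, on the same principle as "a. eBay's classic BASE pattern"; where the business logic cannot guarantee idempotence, add a deduplication table (the updates_applied table in a) to handle it.
ii. For business that does not suit asynchronous messaging, for example when A, B, and C must all process synchronously before a response can be returned: maintain a transaction record table in each of the three databases (recorda/recordb/recordc). When the business transactions in A, B, and C finish, each writes its result to its own recordx; a central service then compares the three parties' record tables and handles mismatches in one of two ways:
    Option 1: A and B succeeded but C failed; retry C until C succeeds.
    Option 2: C cannot possibly succeed; roll back A and B. For example, if C deducts inventory and the stock is 0, C cannot succeed (ignoring restocking).
In addition, the recordx tables are maintained by the RPC framework layer and are transparent to the business.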
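The central service's decision logic might look roughly like the following Python sketch. The Status enum, the callback parameters, and the bounded retry that reports "pending" (instead of retrying forever within one pass) are assumptions; the RPC-framework layer that maintains the recordx tables is not modeled.

```python
from enum import Enum


class Status(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def reconcile(records, retry, rollback, can_succeed, max_attempts=5):
    """Hypothetical central reconciler for the recorda/recordb/recordc tables.

    records:     party -> Status, as read from each party's record table
    retry:       re-runs a party's operation, returns True on success
    rollback:    undoes a party's committed operation
    can_succeed: returns False when a party can never succeed (e.g. stock is 0)
    """
    failed = [p for p, s in records.items() if s is Status.FAILED]
    if not failed:
        return "committed"
    # Option 2: a failed party can never succeed, so roll back those that did.
    if any(not can_succeed(p) for p in failed):
        for party, status in records.items():
            if status is Status.SUCCEEDED:
                rollback(party)
        return "rolled back"
    # Option 1: retry the failed parties. A real service keeps retrying across
    # reconciliation passes, so a bounded pass reports "pending" and stops.
    for party in failed:
        for _ in range(max_attempts):
            if retry(party):
                records[party] = Status.SUCCEEDED
                break
        else:
            return "pending"
    return "committed"


# C deducts inventory but the stock is 0, so A and B are rolled back.
rolled_back = []
outcome = reconcile({"A": Status.SUCCEEDED, "B": Status.SUCCEEDED, "C": Status.FAILED},
                    retry=lambda p: True, rollback=rolled_back.append,
                    can_succeed=lambda p: False)
print(outcome, rolled_back)  # rolled back ['A', 'B']

# C failed transiently and succeeds on the third retry.
calls = []


def flaky_retry(party):
    calls.append(party)
    return len(calls) >= 3


records = {"A": Status.SUCCEEDED, "B": Status.SUCCEEDED, "C": Status.FAILED}
outcome2 = reconcile(records, flaky_retry, rolled_back.append, lambda p: True)
print(outcome2, records["C"])  # committed Status.SUCCEEDED
```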
 
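c. Mogujie's order creation process
Scenario: order placement is split into 12 sub-businesses (see reference b). For related business that is neither real-time nor strongly consistent, the idea of "eBay's classic BASE pattern" is applied: once the first local transaction succeeds, the related transactions are executed asynchronously via message notifications, which avoids the intrusiveness and complexity that the "distributed transaction framework" of a's second transaction would bring to the business. Concretely, database change events are published to an MQ, and the MQ consumers use ACKs to guarantee that every message is eventually consumed successfully, so the related transactions are reliably completed (messages may be redelivered, so consumers must be idempotent).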
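A toy model of the at-least-once delivery described above, in Python. The ToyQueue stands in for a real MQ (a handler that returns normally counts as the ACK), publishing database change events is not modeled, and the in-memory processed set stands in for a deduplication table that would normally be updated in the same local transaction as the stock change.

```python
from collections import deque


class ToyQueue:
    """Hypothetical at-least-once queue: a message is redelivered until ACKed."""

    def __init__(self):
        self._pending = deque()

    def publish(self, msg):
        self._pending.append(msg)

    def deliver(self, handler):
        while self._pending:
            msg = self._pending.popleft()
            try:
                handler(msg)               # returning normally counts as the ACK
            except Exception:
                self._pending.append(msg)  # no ACK: requeue for redelivery


class StockConsumer:
    """Idempotent consumer: a message ID is applied at most once."""

    def __init__(self, stock):
        self.stock = stock
        self.processed = set()  # stands in for a dedup table in the consumer's DB
        self.crash_once = True  # simulate one crash after applying, before the ACK

    def handle(self, msg):
        if msg["id"] not in self.processed:
            # In a real system these two writes share one local transaction.
            self.stock[msg["sku"]] -= msg["qty"]
            self.processed.add(msg["id"])
        if self.crash_once:
            self.crash_once = False
            raise RuntimeError("crashed before ACK")


queue = ToyQueue()
queue.publish({"id": "order-1:stock", "sku": "sku-42", "qty": 2})
consumer = StockConsumer({"sku-42": 10})
queue.deliver(consumer.handle)
print(consumer.stock)  # {'sku-42': 8}: delivered twice, applied once
```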
 
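For scenarios that must be handled synchronously with strong consistency (the same situation as b.ii), such as coupons and inventory deduction, the second transaction of "a. eBay's classic BASE pattern" (the distributed transaction framework) could be introduced, but complexity would rise sharply.
Another approach is to create an invisible order and then, when synchronously calling coupon locking and inventory deduction, send a void-order message to the MQ whenever a call fails or times out. If sending that message fails, the local side retries asynchronously at stepped intervals. When the coupon and inventory systems receive the message, they decide whether a business rollback is needed, which guarantees near-real-time eventual consistency across multiple local transactions.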
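The void-order path can be sketched as follows, in Python. The stepped delays (1, 5, 30, 60 seconds), the CouponService class, and the message format are hypothetical; a production system would run the retries asynchronously and persist unsent messages instead of blocking in a loop.

```python
import time


def send_with_stepped_retry(send, message, delays=(1, 5, 30, 60), sleep=time.sleep):
    """Publish `message`; after each failure wait the next, longer delay and retry.

    `send` returns True once the MQ accepts the message. Returns False if every
    attempt failed (a real system would persist the message and keep trying).
    """
    if send(message):
        return True
    for delay in delays:
        sleep(delay)
        if send(message):
            return True
    return False


class CouponService:
    """Hypothetical coupon system holding locks for (invisible) orders."""

    def __init__(self):
        self.locks = {}  # coupon_id -> order_id that locked it

    def lock(self, coupon_id, order_id):
        self.locks[coupon_id] = order_id

    def on_void_order(self, message):
        # Roll back only locks held by this order, so duplicate messages are harmless.
        order_id = message["order_id"]
        for coupon_id, holder in list(self.locks.items()):
            if holder == order_id:
                del self.locks[coupon_id]


coupons = CouponService()
coupons.lock("c-1", "order-7")  # synchronous coupon lock succeeded
coupons.lock("c-2", "order-8")  # lock held by an unrelated order
# Suppose the inventory call for order-7 then timed out, so order-7 must be voided.

attempts = []
slept = []


def flaky_send(message):
    attempts.append(message)
    if len(attempts) <= 2:
        return False                # MQ unavailable for the first two attempts
    coupons.on_void_order(message)  # stands in for MQ delivery to the coupon system
    return True


ok = send_with_stepped_retry(flaky_send, {"order_id": "order-7"}, sleep=slept.append)
print(ok, slept, coupons.locks)  # True [1, 5] {'c-2': 'order-8'}
```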
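Choosing a different scheme for each kind of business in this way solves the data-consistency problems of high-concurrency distributed architecture and distributed systems.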
 


Origin http://43.154.161.224:23101/article/api/json?id=324884339&siteId=291194637