Analysis of distributed transaction consistency solutions

First, starting with data consistency

The "root of all evil" behind consistency problems is that data is redundant and distributed, the copies interact over a network, and network anomalies are the norm.

1, cases of data consistency

  • Consistency between the master database, the slave databases, and the cache. This is redundancy of the same data: to keep a relational database highly available and performant, a master-slave (or standby) architecture is usually adopted and a cache is introduced. Inconsistency exists during the time window in which the redundant copies are being synchronized. For common solutions, see material on database architecture.
  • Consistency between multiple replicas. These are multiple copies of the same data: in the big-data field, a piece of data has several replicas stored on different nodes, and a client may read or write through any node. Common solutions are open-source implementations based on Paxos, ZAB, Raft, Quorum, Gossip, and similar protocols. They are only mentioned here and not discussed further; interested readers can search Google or Baidu.
  • Consistency between distributed services. This is distributed data: in a distributed service architecture, different services operate on different databases (tables), and those databases (tables) must be kept consistent with each other. The common solution is a distributed transaction consistency solution, which is the subject of this article.

2, concepts of data consistency

  • Strong consistency
  • Weak consistency
  • Eventual consistency

3, data consistency principles

  • ACID
  • CAP
  • BASE

4, data consistency protocols

  • Two-phase commit protocol
  • Three-phase commit protocol
  • TCC protocol
  • Paxos protocol
  • ZAB protocol
  • Raft protocol
  • Quorum protocol
  • Gossip protocol

Second, data consistency between distributed services


A so-called distributed service architecture takes modules that previously interacted locally, splits them into independently deployed applications, and has them interact through remote interfaces and network messages. Whether this definition is rigorous or exactly right does not matter much; a working understanding is enough, since it is not the focus of this article. A simple comparison aids understanding.

In a centralized architecture, keeping the order table and the inventory table consistent only requires a local transaction (ACID), which guarantees strong consistency between the two.

In a distributed architecture, the order table is operated on by the order service and the inventory table by the inventory service. To keep the two tables consistent, the order service's operation on the order table and the inventory service's operation on the inventory table must succeed together. What used to be a local transaction becomes a distributed transaction. Because the services interact over a network, and network anomalies are the norm, data can become inconsistent between services. This is the problem of distributed transaction consistency.
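The centralized case can be sketched with a single local transaction. The following is a minimal illustration using Python's built-in `sqlite3` module; the table layout and the `place_order` helper are assumptions made for this example, not part of any system described here.

```python
import sqlite3


def place_order(conn: sqlite3.Connection, item: str, qty: int) -> bool:
    """Insert an order and deduct stock in one local transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty)
            )
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? "
                "WHERE item = ? AND stock >= ?",
                (qty, item, qty),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient stock")
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('book', 5)")
conn.commit()

ok_first = place_order(conn, "book", 3)    # enough stock: both writes commit
ok_second = place_order(conn, "book", 3)   # not enough: the order row is rolled back
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
stock_left = conn.execute("SELECT stock FROM inventory").fetchone()[0]
```

In the distributed case the two statements run in different services against different databases, so no single `with conn:` block can cover both of them.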

Third, distributed transaction consistency solutions

1, synchronous interface call mode and its consistency solutions

Pattern analysis: Service A calls Service B's interface synchronously and waits for the result; A's subsequent processing depends on the result B returns. In this interaction mode, the result A obtains falls into three cases.

  1. The network times out or fails during the request phase: B did not receive the request and did no processing;
  2. The network times out or fails during the response phase: B received the request and processed it;
  3. The result is returned normally (a definite success or failure).

Business scenarios: suited to large-scale, high-concurrency, short operations where the caller depends on the return value. Examples include the interaction between the transaction service and the inventory service (or the coupon service, or the red-envelope service), and the interactions involved in user login and service access.

Solutions: option one, query-and-retry by the service caller; option two, the TCC scheme.

Note: both schemes in fact still guarantee data consistency "asynchronously", but the correction must be fast, in near real time.

1. Query-and-retry by the service caller, suited to scenarios with a single subordinate business service.


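Note:

  1. If the query retry still fails (which is rare), raise an alert and handle it manually, or let a near-real-time reconciliation system correct it automatically;
  2. Keep the number of retries small, possibly just one;
  3. Service B must process requests idempotently.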
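The caller-side strategy can be sketched as follows. This is a minimal in-memory illustration, not a production client: `FlakyInventoryService`, its `deduct` and `query` methods, and `call_with_query_retry` are all hypothetical names, and the lost response is simulated with `TimeoutError`.

```python
class FlakyInventoryService:
    """In-memory stand-in for service B; it can lose one response."""

    def __init__(self, lose_first_response: bool):
        self.results = {}                    # request_id -> result
        self.lose_response = lose_first_response

    def deduct(self, request_id: str) -> str:
        # Idempotent: a repeated request_id is not processed twice.
        self.results.setdefault(request_id, "SUCCESS")
        if self.lose_response:
            self.lose_response = False
            raise TimeoutError("response lost on the network")
        return self.results[request_id]

    def query(self, request_id: str):
        """Return the recorded result, or None if B never saw the request."""
        return self.results.get(request_id)


def call_with_query_retry(service, request_id: str, max_retries: int = 1) -> str:
    try:
        return service.deduct(request_id)         # case 3: definite result
    except TimeoutError:
        pass                                      # case 1 or 2: outcome unknown
    for _ in range(max_retries):
        try:
            result = service.query(request_id)
            if result is not None:
                return result                     # case 2: B had processed it
            return service.deduct(request_id)     # case 1: safe to resend
        except TimeoutError:
            continue
    return "UNKNOWN"   # raise an alert; manual handling or reconciliation


service = FlakyInventoryService(lose_first_response=True)
outcome = call_with_query_retry(service, "req-1")
```

Here the first response is lost, the query finds that B already processed the request, and the caller ends with a definite result without B doing the work twice.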

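2. The TCC scheme, suited to scenarios with multiple subordinate business services. TCC (Try-Confirm-Cancel) is a protocol for distributed transaction consistency that Alibaba put into practice on the basis of the two-phase commit protocol. The corresponding product is DTX (formerly DTS). DTS includes a quick-start example; once you understand it, you basically understand TCC. Inside Ant Financial it is widely used on core funds paths such as trading, transfers, and red envelopes, serving the fund operations of hundreds of millions of users.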


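  1. As for TCC, I personally think understanding the principle matters most. When you meet a matching scenario at work, you can implement it yourself from the principle, as long as it satisfies the business need;
  2. An open-source implementation: tcc-transaction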
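To make the principle concrete, here is a minimal in-memory sketch of Try-Confirm-Cancel. The `Account` class, its method names, and the `transfer` coordinator are hypothetical and do not reflect the API of DTX, DTS, or tcc-transaction; a real implementation would also persist the transaction state and retry Confirm or Cancel until they succeed.

```python
class Account:
    """A TCC participant: Try reserves resources, Confirm/Cancel settle them."""

    def __init__(self, balance: int):
        self.balance = balance
        self.frozen = {}      # tx_id -> amount reserved for a debit
        self.pending = {}     # tx_id -> amount reserved for a credit

    def try_debit(self, tx_id: str, amount: int) -> bool:
        if self.balance < amount:
            return False
        self.balance -= amount
        self.frozen[tx_id] = amount
        return True

    def try_credit(self, tx_id: str, amount: int) -> bool:
        self.pending[tx_id] = amount
        return True

    def confirm(self, tx_id: str) -> None:
        self.frozen.pop(tx_id, None)                  # debit becomes final
        self.balance += self.pending.pop(tx_id, 0)    # credit becomes visible

    def cancel(self, tx_id: str) -> None:
        self.balance += self.frozen.pop(tx_id, 0)     # release reserved funds
        self.pending.pop(tx_id, None)                 # drop the pending credit


def transfer(tx_id: str, src: Account, dst: Account, amount: int) -> bool:
    # Phase 1: Try on every participant.
    tried = [dst.try_credit(tx_id, amount), src.try_debit(tx_id, amount)]
    # Phase 2: Confirm everywhere if all Try calls succeeded, else Cancel.
    if all(tried):
        for participant in (src, dst):
            participant.confirm(tx_id)
        return True
    for participant in (src, dst):
        participant.cancel(tx_id)   # safe even where Try failed or never ran
    return False


alice, bob = Account(100), Account(0)
first = transfer("tx-1", alice, bob, 60)    # every Try succeeds -> Confirm
second = transfer("tx-2", alice, bob, 60)   # alice's Try fails -> Cancel
```

Confirm and Cancel are written to be idempotent (repeating them changes nothing), which is what allows a coordinator to retry them after a timeout.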

2, asynchronous interface call mode and its consistency solution


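Pattern analysis: Service A calls Service B; B first accepts the request and persists it with a status of pending. B's processing is time-consuming or depends on other services. When B finishes, it notifies A, or A periodically queries B for the result. In this interaction mode, for CASE-1, steps 1 and 2 are the same as in the synchronous interface call mode and step 3 is the same as in the asynchronous message processing mode; for CASE-2, it amounts to two synchronous interface calls.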

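Business scenarios: suited to heavily loaded processing steps off the core path, which often take a long time and have loose timeliness requirements. For example, during a user withdrawal, the interaction between the account system and the withdrawal system (CASE-1), and the interaction between the withdrawal system and a third-party system such as a bank or a third-party custody system (CASE-2).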

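Solution: best-effort processing by the called service. Because B persists the request, a scheduled task can keep retrying to process the request to a result on a best-effort basis. After processing, B sets the request status to the corresponding result and persists it, then notifies A, or A actively queries asynchronously.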


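  1. B usually returns "accepted" to A only after it has received and persisted the request, so that the request is not lost if the service process is killed.
  2. Whether it is steps (1, 2) or steps (3, 4) in CASE-2, if the query retry fails, the request can be persisted and handled by a scheduled task until it succeeds. After all, unlike the synchronous interface call mode, A does not need the result in real time.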
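The callee-side best-effort approach can be sketched in memory as follows. The `WithdrawalService` class, the `flaky_bank` downstream step, and the status strings are hypothetical; a dictionary stands in for the request table, and the loop at the end stands in for a real scheduler.

```python
class WithdrawalService:
    """Stand-in for service B: accept and persist first, process later."""

    def __init__(self, processor):
        self.requests = {}          # request_id -> status; stands in for a table
        self.processor = processor  # the slow or unreliable downstream step

    def accept(self, request_id: str) -> str:
        # Persist before acknowledging so a crash cannot lose the request.
        self.requests.setdefault(request_id, "PENDING")
        return "ACCEPTED"

    def run_scheduled_task(self) -> None:
        """One pass of the retry job over all pending requests."""
        for request_id, status in list(self.requests.items()):
            if status != "PENDING":
                continue
            try:
                self.requests[request_id] = self.processor(request_id)
            except ConnectionError:
                pass  # stays PENDING; the next pass retries

    def query(self, request_id: str) -> str:
        return self.requests[request_id]


attempts = {"count": 0}


def flaky_bank(request_id: str) -> str:
    """Hypothetical third-party call that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("bank unavailable")
    return "SUCCESS"


service = WithdrawalService(flaky_bank)
ack = service.accept("w-1")
statuses = []
for _ in range(4):               # a real scheduler would run this on a timer
    service.run_scheduled_task()
    statuses.append(service.query("w-1"))
```

Service A only needs the `ACCEPTED` acknowledgement up front; it learns the final status later through `query` or a notification.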

3, asynchronous message processing mode and its consistency solution

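Pattern analysis: Service A passes the information B needs to B through message middleware, and A does not need to know B's processing result. In this interaction mode, the message producer must ensure the message is sent successfully, and the message consumer must ensure the message is consumed successfully.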

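Business scenarios: like the asynchronous interface call mode, asynchronous message processing is mostly used for heavily loaded processing steps off the core path, where the upstream service does not care about the downstream result and the downstream does not need to return one. For example, in an e-commerce system, after a user places an order, pays, and the transaction succeeds, a message is sent to the logistics system or the accounting system for follow-up processing.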

Solution: best-effort notification by the producer + best-effort processing by the consumer.

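1. Non-transactional messages. The producer first executes the local transaction and persists the message with a status of pending, then sends the message. If sending succeeds, the message is marked as sent. A scheduled task periodically fetches from the database the messages that have been pending for a certain time and sends them; this scheduled task is what guarantees that messages get sent. To ensure that a message is definitely consumed, consumers generally use manual ACK, so the message server will necessarily redeliver un-ACKed messages, which requires consumers to be idempotent.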
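The producer side of this scheme can be sketched with a local message table. The example below uses Python's built-in `sqlite3` module and a stub broker; the table names, the `pay_order` and `resend_pending` helpers, and the choice to write the business row and the message row in one transaction are assumptions made for illustration. The "pending for a certain time" filter is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, status TEXT)")

delivered = []                 # what the stub broker has received
broker = {"up": False}         # simulate a broker outage at first


def send_to_broker(payload: str) -> None:
    if not broker["up"]:
        raise ConnectionError("broker unreachable")
    delivered.append(payload)


def try_send(msg_id: int, payload: str) -> None:
    try:
        send_to_broker(payload)
    except ConnectionError:
        return                 # stays PENDING for the scheduled task
    with conn:
        conn.execute("UPDATE outbox SET status = 'SENT' WHERE id = ?", (msg_id,))


def pay_order(order_id: str) -> None:
    payload = f"order_paid:{order_id}"
    with conn:  # business change and message row commit together
        conn.execute("INSERT INTO orders VALUES (?, 'PAID')", (order_id,))
        cur = conn.execute(
            "INSERT INTO outbox (payload, status) VALUES (?, 'PENDING')",
            (payload,),
        )
    try_send(cur.lastrowid, payload)


def resend_pending() -> None:
    """Scheduled task: resend every message still marked PENDING."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE status = 'PENDING'"
    ).fetchall()
    for msg_id, payload in rows:
        try_send(msg_id, payload)


def count_pending() -> int:
    sql = "SELECT COUNT(*) FROM outbox WHERE status = 'PENDING'"
    return conn.execute(sql).fetchone()[0]


pay_order("o-1")               # broker is down: the message stays PENDING
pending_before = count_pending()
broker["up"] = True
resend_pending()               # the scheduled task delivers it later
pending_after = count_pending()
```

If the process crashes after the broker accepts a message but before the row is marked `SENT`, the message is sent again on the next pass, which is one more reason the consumer has to be idempotent.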


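2. Transactional messages. Take RocketMQ's transactional message flow as an example; the official site has sample code. Compared with message middleware that does not support transactions, the only difference is that sending the message is guaranteed to be consistent with the local transaction. The consumer implementation stays the same.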


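  1. Both the scheduled task's message resends and the message server's redelivery of un-ACKed messages generally use stepped intervals (2^n × time interval);
  2. Message middleware that supports transactional messages: RocketMQ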
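The stepped interval in the first note can be computed as below. This is a small sketch; the `retry_delays` function and its `cap_seconds` upper bound are assumptions added for illustration and are not taken from any particular middleware.

```python
def retry_delays(base_seconds: float, attempts: int, cap_seconds: float = 3600.0):
    """Delay before the n-th retry: 2**n * base_seconds, capped at cap_seconds."""
    return [min(cap_seconds, (2 ** n) * base_seconds) for n in range(attempts)]


delays = retry_delays(base_seconds=5, attempts=6)
capped = retry_delays(base_seconds=5, attempts=3, cap_seconds=8)
```

With a 5-second base, six attempts wait 5, 10, 20, 40, 80, and 160 seconds; the cap keeps later retries from drifting arbitrarily far apart.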

Fourth, common methods for guaranteeing idempotent operations

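  1. With business state, business logic guarantees idempotency. For example, on receiving a payment-success message the order status becomes "payment completed". If the current status is already "payment completed" and another message arrives for payment success, or for a state that precedes it, the message is a duplicate and need not be processed again.
  2. Without business state, a unique business ID guarantees idempotency. Add a deduplication table (or a distributed cache) that records operations by their unique business ID. For example, when a request arrives at a recharge interface, look up the recharge journal table by the unique recharge ID; if the record already exists, return directly; otherwise proceed with the recharge.

Note: there are many ways to guarantee idempotency; for a specific business scenario, a suitable one can always be found.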
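Both methods can be sketched briefly. The state names, the `recharge_log` table, and the helper functions below are hypothetical. One deliberate difference from the lookup described above: the sketch relies on a primary-key constraint instead of a separate "query then insert", so that two concurrent duplicates cannot both pass the check.

```python
import sqlite3

# Method 1: business state decides whether a message is a duplicate.
ORDER_STATES = ["CREATED", "PAYING", "PAID"]   # hypothetical, in order


def apply_order_event(current: str, incoming: str) -> str:
    """Advance the order only if the event moves it forward."""
    if ORDER_STATES.index(incoming) <= ORDER_STATES.index(current):
        return current            # duplicate or stale message: ignore it
    return incoming


# Method 2: a unique business ID recorded in a deduplication table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recharge_log (recharge_id TEXT PRIMARY KEY, amount INTEGER)")
balance = {"value": 0}            # stands in for the account balance


def recharge(recharge_id: str, amount: int) -> bool:
    """Return True if applied, False if this ID was already processed."""
    try:
        with conn:
            # The primary key rejects a second insert of the same ID.
            conn.execute(
                "INSERT INTO recharge_log VALUES (?, ?)", (recharge_id, amount)
            )
            balance["value"] += amount
        return True
    except sqlite3.IntegrityError:
        return False


state_after_duplicate = apply_order_event("PAID", "PAID")
state_after_stale = apply_order_event("PAID", "PAYING")
state_after_progress = apply_order_event("PAYING", "PAID")
results = [recharge("r-1", 50), recharge("r-1", 50), recharge("r-2", 20)]
```

The repeated `r-1` request is rejected by the table, so the balance reflects each recharge exactly once.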

Fifth, summary

  1. Synchronous interface call mode: query-and-retry by the service caller, or the TCC scheme.
  2. Asynchronous interface call mode: best-effort processing by the called service.
  3. Asynchronous message processing mode: best-effort notification by the producer + best-effort processing by the consumer.
  4. Every service operation should provide a query interface that exposes the execution status of the operation.
  5. Never call a remote service inside a local transaction. If the remote service has a problem, the transaction is prolonged, the application server holds too many database connections, server load rises rapidly, and in severe cases the database can be overwhelmed.
  6. The last line of defense: a reconciliation system.
  7. Choosing between synchronous and asynchronous:
  • Where something can be asynchronous, make it asynchronous. If the business logic allows, operations that are time-consuming and whose response time users have no special requirements for can be made asynchronous, to shorten the core path and relieve pressure on the system.
  • Where synchronous solves the problem, do not introduce asynchronous. If performance is not an issue, or the operation is lightweight, short-running logic, a synchronous call is preferable because it avoids introducing complex asynchronous flows.

Note: if the scenarios and solutions above do not cover a scenario you encounter at work, you are welcome to discuss solutions together.


Origin blog.51cto.com/13732225/2415133