How to Achieve Eventual Consistency in Distributed Microservice Applications

In distributed systems, achieving strong consistency is hard. Even two-phase commit (2PC) and three-phase commit (3PC) cannot guarantee strong consistency.

We should not sacrifice overall system performance and scalability, or make the architecture extremely complicated, just to rule out inconsistencies that occur with very small probability. Since 2PC/3PC have not seen large-scale adoption, eventual consistency is the better solution and is widely used in industry.

First, the retry mechanism

As shown below, the Service Consumer calls Service A and Service B at the same time. If the call to Service A succeeds but the call to Service B fails, the simplest way to ensure eventual consistency is to retry.

[Figure: Service Consumer calling Service A and Service B]
When retrying, set a timeout on the Service Consumer to avoid long waits or hangs that exhaust resources.

When the Consumer retries, pay attention to the following:

the timeout;
the number of retries;
the interval between retries;
the decay (backoff) of the retry interval;
for implementation details, see the article "An elegant retry strategy based on Spring-Retry".
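The retry concerns listed above (timeout, retry count, interval, backoff) can be combined in one small helper. This is a minimal Python sketch, not the Spring-based approach the article refers to; `call_with_retry` and its parameter names are hypothetical, and it assumes the callee honors the per-attempt timeout it is given and raises an exception on failure. The `sleep` parameter is injectable only so the waits can be observed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[float], T],
    max_retries: int = 3,
    initial_interval: float = 0.1,
    backoff_factor: float = 2.0,
    timeout: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call(timeout)` up to 1 + max_retries times with exponential backoff."""
    interval = initial_interval
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            # The callee is assumed to give up (and raise) after `timeout` seconds.
            return call(timeout)
        except Exception as exc:  # real code should catch only retryable errors
            last_error = exc
            if attempt == max_retries:
                break
            sleep(interval)            # wait before the next attempt
            interval *= backoff_factor  # grow the interval after each failure
    raise RuntimeError(f"giving up after {max_retries + 1} attempts") from last_error


# Usage: a hypothetical Service B that fails twice, then succeeds.
attempts = []

def flaky_service_b(timeout: float) -> str:
    attempts.append(timeout)
    if len(attempts) < 3:
        raise ConnectionError("Service B unavailable")
    return "B ok"

waits = []
result = call_with_retry(flaky_service_b, max_retries=3, initial_interval=0.1,
                         backoff_factor=2.0, timeout=0.5, sleep=waits.append)
print(result, waits)  # B ok [0.1, 0.2]
```

Capping the retry count and growing the interval keeps a struggling Service B from being hammered, while the per-attempt timeout bounds how long the Consumer can be stuck.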

Second, local logging

Write a local log, collect it into a distributed control system or another back-end system, and run a tool that checks it periodically. Depending on the actual situation, the check can also be done manually.

Log format: TransID-AB-Detail

TransID is the transaction ID, which can be generated as a random sequence number;
Detail is the detailed data content;
if the call to A succeeds, record "A Success";
if the call to B fails or does not complete, nothing is recorded; that is, whenever there is no "B Success" log entry, call B again;
periodically detect and process the log.
The log collection design is shown below.
[Figure: log collection design]
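The periodic check described above amounts to "find every TransID with A Success but no B Success, and call B again". The Python sketch below assumes a hypothetical line layout `<TransID>-<A|B>-<Detail>` with `Detail` equal to `Success` on success and no `-` inside the TransID; `find_pending_b`, `compensate` and `recall_b` are illustrative names, not part of any existing tool.

```python
from typing import Callable, Dict, Iterable, List, Set


def find_pending_b(log_lines: Iterable[str]) -> List[str]:
    """Return TransIDs that logged "A Success" but have no "B Success" entry."""
    successes: Dict[str, Set[str]] = {}  # TransID -> services that succeeded
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout "<TransID>-<A|B>-<Detail>", with no '-' inside TransID.
        trans_id, service, detail = line.split("-", 2)
        services = successes.setdefault(trans_id, set())
        if detail == "Success":
            services.add(service)
    return [t for t, s in successes.items() if "A" in s and "B" not in s]


def compensate(log_lines: Iterable[str], recall_b: Callable[[str], bool],
               append_log: Callable[[str], None]) -> None:
    """One periodic pass: call B again for each pending transaction."""
    for trans_id in find_pending_b(log_lines):
        if recall_b(trans_id):
            append_log(f"{trans_id}-B-Success")  # so the next pass skips it


# Usage: tx2 never logged B, tx3 logged a B failure.
log = ["tx1-A-Success", "tx1-B-Success", "tx2-A-Success",
       "tx3-A-Success", "tx3-B-Failure"]
recalled: List[str] = []

def recall_b(trans_id: str) -> bool:
    recalled.append(trans_id)  # a real recall would invoke Service B here
    return True

compensate(list(log), recall_b, log.append)
print(recalled)  # ['tx2', 'tx3']
```

In practice `compensate` would be triggered by a scheduler, and Service B should tolerate being called twice for the same TransID, since a crash between calling B and appending the log line leads to a second call.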
Third, the reliable message pattern

Since failures are relatively rare in real business scenarios, consider the following scheme.

When the Service Consumer's call to Service B fails, it first retries. If the call still fails after a certain number of retries, the message is sent directly to a message queue (MQ) and handled asynchronously.

A robust distributed MQ, such as the open-source distributed messaging systems Kafka or RocketMQ, can be used for the asynchronous processing.

Service B can include a dedicated error-handling component that keeps consuming compensation messages from the MQ.
Alternatively, a separate error-handling component or process can independently consume and process the compensation messages in the MQ, including failures from other Service components.
[Figure: retrying Service B, then falling back to asynchronous compensation through the MQ]
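A minimal Python sketch of "retry, then fall back to the MQ", with `queue.Queue` standing in for Kafka or RocketMQ. `call_b_or_enqueue`, `drain_compensation` and `service_b` are hypothetical names; backoff between attempts is omitted to keep the focus on the hand-off to the compensation component.

```python
import queue
from typing import Callable

CallB = Callable[[dict], None]  # stand-in for the Service B client; raises ConnectionError


def call_b_or_enqueue(call_b: CallB, request: dict, mq: queue.Queue,
                      max_retries: int = 3) -> str:
    """Try Service B synchronously; once the retries are used up, hand off to the MQ."""
    for _ in range(max_retries + 1):
        try:
            call_b(request)
            return "sync"
        except ConnectionError:
            continue  # a real client would also wait/back off between attempts
    mq.put(request)  # fall back to asynchronous processing
    return "queued"


def drain_compensation(mq: queue.Queue, call_b: CallB) -> int:
    """Compensation component: replay queued requests to B; returns how many succeeded."""
    done = 0
    while True:
        try:
            request = mq.get_nowait()
        except queue.Empty:
            return done
        try:
            call_b(request)
            done += 1
        except ConnectionError:
            mq.put(request)  # B still failing: requeue and try again on the next run
            return done


# Usage: B is down during the request and recovers later.
mq: queue.Queue = queue.Queue()
b_up = False
handled = []

def service_b(request: dict) -> None:
    if not b_up:
        raise ConnectionError("Service B down")
    handled.append(request["order_id"])

print(call_b_or_enqueue(service_b, {"order_id": 1}, mq))  # queued
b_up = True                                               # B recovers
print(drain_compensation(mq, service_b))                  # 1
```

With an in-process queue like this one, a crash of the Consumer also loses the queued request; a persistent broker removes that particular risk.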
This approach still risks losing messages: if the Service Consumer crashes before it sends the message, the message is lost, although this is a low-probability event.

Another approach is the reliable message pattern, shown in the figure below. The Service Consumer sends a message to a message queue broker, such as RocketMQ or Kafka, and Service A and Service B consume that message.

A distributed MQ can be used, and it can persist messages, so the reliability of the MQ itself ensures that messages are not lost.
[Figure: reliable message pattern, with Service A and Service B consuming from the MQ]
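To make "persistent, and both services consume the same message" concrete, here is a toy Python broker in the spirit of Kafka's consumer groups: messages stay in an append-only log, and each consumer group (here, Service A and Service B) keeps its own offset. `LogBroker` is a hypothetical in-memory stand-in; a real broker persists and replicates the log on disk. The offset is committed only after the handler succeeds, which gives at-least-once delivery.

```python
from typing import Callable, Dict, List


class LogBroker:
    """Toy stand-in for a persistent broker such as Kafka or RocketMQ."""

    def __init__(self) -> None:
        self.log: List[dict] = []         # messages are retained, not removed on consume
        self.offsets: Dict[str, int] = {}  # consumer group -> next offset to read

    def publish(self, message: dict) -> None:
        self.log.append(message)

    def poll(self, group: str, handler: Callable[[dict], None]) -> int:
        """Deliver this group's unconsumed messages; commit each offset after success."""
        processed = 0
        offset = self.offsets.get(group, 0)
        while offset < len(self.log):
            handler(self.log[offset])  # if this raises, the offset is not advanced
            offset += 1
            self.offsets[group] = offset  # commit after processing: at-least-once
            processed += 1
        return processed


# Usage: the Service Consumer publishes once; A and B each receive the message.
broker = LogBroker()
a_seen, b_seen = [], []
broker.publish({"order_id": 7})
broker.poll("service-a", lambda m: a_seen.append(m["order_id"]))
broker.poll("service-b", lambda m: b_seen.append(m["order_id"]))
print(a_seen, b_seen)  # [7] [7]
```

Because a consumer that crashes before committing sees the same message again, the consumers must be idempotent.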
Advantages of the reliable message pattern:

Higher throughput;
Lower response time in some scenarios;
Problems:

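There is a window of inconsistency (the business data has entered the MQ but not yet the DB, so some scenarios cannot read the business data);
It increases the complexity of the architecture;
The consumers (Service A/B) must be idempotent;
The inconsistency window above can be narrowed with a further optimization.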
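A common way to make a consumer idempotent is to remember which message IDs it has already applied and skip redeliveries. The Python sketch below is hypothetical (`IdempotentConsumer`, the `msg_id` field and the stock example are illustrative); it keeps the processed IDs in memory, whereas a real service would store them durably, ideally in the same database transaction as the business update so the two cannot diverge.

```python
from typing import Callable, Set


class IdempotentConsumer:
    """Wrap a handler so a redelivered message (same msg_id) is applied only once."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self.handler = handler
        self.processed_ids: Set[str] = set()  # in-memory here; durable in practice

    def on_message(self, message: dict) -> bool:
        msg_id = message["msg_id"]
        if msg_id in self.processed_ids:
            return False  # duplicate delivery: skip
        self.handler(message)
        self.processed_ids.add(msg_id)  # record only after the handler succeeds
        return True


# Usage: the MQ delivers the same stock-reservation message twice.
stock = {"sku-1": 10}

def reserve(message: dict) -> None:
    stock[message["sku"]] -= message["qty"]

consumer = IdempotentConsumer(reserve)
msg = {"msg_id": "order-42", "sku": "sku-1", "qty": 3}
consumer.on_message(msg)
consumer.on_message(msg)  # redelivered
print(stock)  # {'sku-1': 7}
```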

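Split the business into core business and subordinate business:
Core business services - called directly;
Subordinate business services - consume messages from the MQ;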
[Figure: core business called directly, subordinate business consuming from the MQ]
Call the order service (the core service) directly, so the order data is persisted to the DB; at the same time, send a message to the MQ.

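However, the Service Consumer (which creates the order) may crash before it sends the message to the MQ. Guarding against that would require calling the order service and sending the message in one transaction, but distributed transactions are cumbersome to handle and hurt performance.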

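Therefore, we create an additional table, the event table, in the same database as the order table. Both writes can then be protected by one transaction, turning the distributed transaction into a single-database transaction.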

The overall flow is as follows:

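(1) Create the order - persist the order data and insert an event record into the event table. Note that both are done in one transaction, which guarantees consistency. If it fails, there is no need to roll back any business service; if it succeeds, continue.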

(2) Send the message - send the order message to the message queue.

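If sending the message fails, retry it. If the process crashes before a retry succeeds, the compensation service resends the message (a low-probability event).
The compensation service continually polls the event table, finds abnormal (unsent) events and resends their messages; events that were already sent successfully are ignored.
Once the message has been sent successfully, either directly or by the compensation service, the event record can be deleted from the event table (a logical delete).
(3) Consume the message - the other subordinate business services consume the order messages from the MQ and run their own business logic.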
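The three steps can be sketched in Python with `sqlite3` standing in for the order database and a plain function standing in for the MQ producer. The table layouts, the `deleted` column used for the logical delete, and the names `create_order`, `try_send` and `compensate` are all hypothetical; the point is that the order row and its event row commit or roll back together, and anything not yet marked as sent is picked up by the compensation pass.

```python
import json
import sqlite3
from typing import Callable

Publish = Callable[[dict], None]  # stand-in for an MQ producer; raises ConnectionError on failure


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
    # The event table lives in the same database as the order table.
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL,"
                 " payload TEXT NOT NULL, deleted INTEGER NOT NULL DEFAULT 0)")


def create_order(conn: sqlite3.Connection, order_id: int, item: str) -> int:
    """Step (1): insert the order and its event in ONE local transaction."""
    payload = json.dumps({"order_id": order_id, "item": item})
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        cur = conn.execute("INSERT INTO events (order_id, payload) VALUES (?, ?)",
                           (order_id, payload))
    return cur.lastrowid


def try_send(conn: sqlite3.Connection, event_id: int, publish: Publish) -> bool:
    """Step (2): publish one event; logically delete it only after success."""
    (payload,) = conn.execute("SELECT payload FROM events WHERE id = ?",
                              (event_id,)).fetchone()
    try:
        publish(json.loads(payload))
    except ConnectionError:
        return False  # left in place for the compensation service
    with conn:
        conn.execute("UPDATE events SET deleted = 1 WHERE id = ?", (event_id,))
    return True


def compensate(conn: sqlite3.Connection, publish: Publish) -> int:
    """Compensation service (one polling pass): resend events not yet deleted."""
    ids = [row[0] for row in
           conn.execute("SELECT id FROM events WHERE deleted = 0 ORDER BY id")]
    return sum(try_send(conn, event_id, publish) for event_id in ids)


# Usage: the MQ is down when the order is created and recovers later.
conn = sqlite3.connect(":memory:")
init_db(conn)
sent = []
mq_up = False

def publish(message: dict) -> None:
    if not mq_up:
        raise ConnectionError("MQ unavailable")
    sent.append(message)

event_id = create_order(conn, 1, "book")
print(try_send(conn, event_id, publish))  # False: MQ down, event stays pending
mq_up = True                              # MQ recovers
print(compensate(conn, publish))          # 1: compensation resends the event
print(sent)                               # [{'order_id': 1, 'item': 'book'}]
```

If the process crashes after `publish` succeeds but before the logical delete commits, the compensation pass sends the message again, which is one more reason the subordinate consumers need to be idempotent.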

Three points about the design above need explanation:

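(1) The order service (core business) is called directly so that the order data is persisted as soon as possible, avoiding the inconsistency window and guaranteeing read-after-write consistency.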

(2) After the order is created, the message is sent directly to the MQ to improve real-time behavior; the compensation service is used only in abnormal cases. If real-time requirements are less demanding, consider removing the direct message-sending logic and letting the compensation service send all messages.

(3) The extra event table turns a distributed transaction into a single-database transaction, but to some extent it also increases the load on the database.

The above are some of my own thoughts, shared here; corrections are welcome. If you found this useful, feel free to follow me, and if you have ideas, leave a comment or send me a private message.

Source: blog.csdn.net/ZYQZXF/article/details/104595594