Several ways to achieve data consistency under microservices (reposted). Original link: https://www.jianshu.com/p/b264a196b177

I recently studied how data consistency works under microservices and have summarized the current approaches to guaranteeing it below for future reference. This article is only a general introduction to implementing data consistency in a microservice architecture and does not go into depth. I am still learning the specific implementations, so if you spot any errors, corrections are welcome.

 

Traditional application transaction management

 


Local transactions
Before introducing data consistency under microservices, let me briefly cover the background of transactions. A traditional stand-alone application uses a single RDBMS as its data source. The application starts a transaction, performs CRUD operations, and commits or rolls back, all within a local transaction whose support is provided directly by the resource manager (RM). Data consistency is guaranteed within that local transaction.


Distributed transaction
Two-phase commit (2PC)
As an application grows, it starts to use multiple data sources, and local transactions can no longer meet the data consistency requirements. Because several data sources are accessed at the same time, transactions have to be managed across them, and distributed transactions emerged to meet this need. The most popular approach is two-phase commit (2PC), in which distributed transactions are coordinated by a transaction manager (TM).
Two-phase commit is divided into a prepare phase and a commit phase.

Two-phase commit

Two-phase commit-rollback
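The two phases described above can be sketched in a few lines of Java. This is only a minimal in-memory illustration, not how a real TM or XA driver is implemented: the names `Participant`, `InMemoryParticipant` and `TwoPhaseCoordinator` are made up for this example, and timeouts, logging and crash recovery are left out entirely.

```java
import java.util.List;

/** Hypothetical participant (resource manager) interface, for illustration only. */
interface Participant {
    boolean prepare();  // phase 1: do the work, hold locks, vote yes (true) or no (false)
    void commit();      // phase 2 when every participant voted yes
    void rollback();    // phase 2 when any participant voted no
}

/** In-memory participant that only records which state it reached. */
class InMemoryParticipant implements Participant {
    private final boolean votesYes;
    String state = "INIT";

    InMemoryParticipant(boolean votesYes) { this.votesYes = votesYes; }

    @Override public boolean prepare() {
        state = votesYes ? "PREPARED" : "PREPARE_FAILED";
        return votesYes;
    }
    @Override public void commit() { state = "COMMITTED"; }
    @Override public void rollback() { state = "ROLLED_BACK"; }
}

/** Stand-in for the transaction manager (TM) that drives both phases. */
class TwoPhaseCoordinator {
    static boolean execute(List<? extends Participant> participants) {
        // Phase 1 (prepare): collect a vote from every participant.
        boolean allVotedYes = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allVotedYes = false;
            }
        }
        // Phase 2: commit only on a unanimous yes, otherwise roll everyone back.
        for (Participant p : participants) {
            if (allVotedYes) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allVotedYes;
    }
}
```

Note that every participant stays in the prepared state, holding its locks, until the coordinator tells it what to do in phase 2. That waiting is where the synchronous blocking problem comes from.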
However, two-phase commit cannot fully guarantee data consistency and suffers from synchronous blocking, so an optimized version, three-phase commit (3PC), was invented.
Three-phase commit (3PC)

Three-phase commit
However, 3PC also guarantees data consistency only in most cases, not all.

 

Transaction management under microservices

 

So, are the distributed transaction protocols 2PC and 3PC suitable for transaction management under microservices? The answer is no, for three reasons:

  1. Microservices do not access each other's data directly. They usually call each other through RPC (Dubbo) or an HTTP API (Spring Cloud), so a TM can no longer manage the RMs of all the microservices uniformly.

  2. Different microservices may use completely different kinds of data sources. If a microservice uses a database that does not support transactions, such as some NoSQL stores, transactions are out of the question.

  3. Even if every data source supports transactions, using one large transaction to cover many microservices means that this transaction stays open several orders of magnitude longer than a local transaction. Such long-lived, cross-service transactions hold many locks and make data unavailable, which seriously hurts system performance.

 

It can be seen that traditional distributed transactions can no longer meet the transaction management requirements of a microservice architecture. Since traditional ACID transactions cannot be satisfied, transaction management under microservices has to follow a new rule: BASE theory.
BASE theory was proposed by eBay architect Dan Pritchett as an extension of CAP theory. Its core idea is that even when strong consistency cannot be achieved, an application should reach eventual consistency in a way that suits it. BASE stands for Basically Available, Soft state, and Eventual consistency.

  • Basically available: when a failure occurs, the distributed system is allowed to lose part of its availability, as long as the core functions remain available.

  • Soft state: the system is allowed to have intermediate states, and these intermediate states do not affect its overall availability. Distributed storage generally keeps at least three replicas of each piece of data, and allowing a delay while replicas synchronize between nodes is an example of soft state.

  • Eventual consistency: all replicas in the system reach a consistent state after some period of time. Weak consistency is the opposite of strong consistency, and eventual consistency is a special case of weak consistency.

 

Eventual consistency in BASE is the fundamental requirement for transaction management under microservices. That is, microservice-based transaction management cannot achieve strong consistency, but it must guarantee eventual consistency. So which methods can guarantee eventual consistency under microservices? By implementation principle there are two main types, the event notification type and the compensation type. The event notification type can be divided into the reliable event notification mode and the best effort notification mode, and the compensation type can be divided into the TCC mode and the business compensation mode. All four modes can achieve eventual consistency of data under microservices.

 

Ways to achieve data consistency under microservices

 

Reliable event notification mode
Synchronous event
The design concept of the reliable event notification mode is relatively easy to understand: the master service passes its result to the slave service through an event (often a message queue), and the slave service consumes the message and completes its part of the business, so that the master service and the slave service become consistent. The first and simplest approach that comes to mind is synchronous event notification, in which business processing and message sending are executed synchronously. For the implementation logic, see the code and timing diagram below.

    public void trans() {
        try {
            // 1. Operate on the database
            boolean result = dao.update(data); // throws an exception if the database operation fails
            // 2. If the database operation succeeded, send the message
            if (result) {
                mq.send(data); // throws an exception if sending fails
            }
        } catch (Exception e) {
            rollback(); // roll back if any exception occurs
        }
    }

 


The above logic looks seamless. If the database operation fails, the method exits without sending a message; if sending the message fails, the database is rolled back; if both succeed, the business succeeds and the message reaches the downstream consumer. On closer inspection, however, synchronous message notification has two shortcomings.

  1. Under a microservice architecture, network IO problems or server downtime can occur. If they occur at step 7 of the timing diagram, then after the message is delivered either the master service cannot be notified (network problem) or it cannot go on to commit the transaction (downtime). The master service then assumes that message delivery failed and rolls back its own business. In fact the message has already been consumed by the slave service, so the data of the master service and the slave service become inconsistent. The specific scenarios can be seen in the following two timing diagrams.

  2. The event service (here, the message service) is too tightly coupled with the business. If the message service is unavailable, the business is unavailable too. The event service should be decoupled from the business and run independently and asynchronously. Alternatively, try to send the message once right after the business is executed, and fall back to asynchronous sending if that attempt fails.
Asynchronous event
Local event service:
The asynchronous event notification mode was developed to solve the problems of the synchronous event approach described above. The business service and the event service are decoupled, events are handled asynchronously, and a separate event service is responsible for delivering events reliably.

Asynchronous event notification-local event service

When the business executes, it writes the event to a local event table in the same local transaction and tries to deliver the event right away. If delivery succeeds, the event is deleted from the event table. If delivery fails, the event service periodically picks up the failed events and redelivers them until they are delivered correctly, then deletes them from the event table. This keeps event delivery as reliable as possible: even when the first attempt fails, the asynchronous event service ensures that the event is delivered at least once.
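To make the flow above concrete, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: the two maps stand in for a business table and an event table in the same database, a `synchronized` method stands in for the local transaction, and the `broker` predicate stands in for the message queue (it returns true when delivery succeeds). A real implementation would use actual database transactions and a scheduled job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Sketch of the local event table idea; all names here are hypothetical. */
class LocalEventTableSketch {
    final Map<String, String> businessTable = new LinkedHashMap<>();
    final Map<Long, String> eventTable = new LinkedHashMap<>();
    private final Predicate<String> broker;
    private long nextEventId = 1;

    LocalEventTableSketch(Predicate<String> broker) { this.broker = broker; }

    /** Business row and event row are written together, as in one local transaction. */
    private synchronized long writeBusinessAndEvent(String key, String value) {
        long eventId = nextEventId++;
        businessTable.put(key, value);
        eventTable.put(eventId, key + "=" + value);
        return eventId;
    }

    /** Business entry point: write both rows, then make the first delivery attempt. */
    void handle(String key, String value) {
        long eventId = writeBusinessAndEvent(key, value);
        deliver(eventId);
    }

    /** Called periodically by the event service to redeliver whatever is still pending. */
    synchronized int redeliverPending() {
        int delivered = 0;
        for (Long eventId : new ArrayList<>(eventTable.keySet())) {
            if (deliver(eventId)) {
                delivered++;
            }
        }
        return delivered;
    }

    /** Delivers one event and deletes it from the event table only on success. */
    private synchronized boolean deliver(long eventId) {
        String payload = eventTable.get(eventId);
        if (payload == null) {
            return false; // already delivered and deleted
        }
        boolean ok;
        try {
            ok = broker.test(payload);
        } catch (RuntimeException e) {
            ok = false; // a broker error counts as a failed delivery
        }
        if (ok) {
            eventTable.remove(eventId);
        }
        return ok;
    }
}
```

Because an event is removed only after the broker accepts it, a crash between sending and deleting leads to a second delivery. This is why the mode gives at-least-once delivery rather than exactly-once.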
However, using a local event service to guarantee reliable event notification has its own shortcomings. The business is still coupled to the event service (during the first, synchronous delivery attempt). More seriously, the local transaction is also responsible for operating on the additional event table, which puts pressure on the database. In high-concurrency scenarios every business operation generates a corresponding event table operation, so the usable throughput of the database is almost halved, which is clearly unacceptable. For this reason the reliable event notification mode was developed further, and the external event service appeared.
External event service:
The external event service takes the local event service a step further and separates the event service from the main business service, so that the main business service has no strong dependence on the event service.

Asynchronous event notification-external event service
The business service sends the event to the event service before committing. At that point the event service only records the event and does not send it. After committing or rolling back, the business service notifies the event service, which then sends the event or deletes it. There is no need to worry about the business system going down after commit or rollback and failing to send this confirmation, because the event service periodically fetches all unsent events and queries the business system, then decides whether to send or delete each event based on the status the business system returns.
Although the external event service decouples the business system from the event system, it also brings extra work: compared with the local event service, it incurs two additional network communications (one before commit, one after commit or rollback). The business system also has to provide a separate query interface so that the event system can determine the status of unsent events.
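The record, confirm or cancel, and periodic check steps described above might look roughly like this. It is a sketch under stated assumptions: `ExternalEventService`, `TxStatus` and the method names are invented for illustration, the `sent` list stands in for the message queue, and `businessQuery` stands in for the query interface that the business system has to expose.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Status the business system reports for the transaction behind an event (hypothetical). */
enum TxStatus { COMMITTED, ROLLED_BACK, UNKNOWN }

/** In-memory stand-in for an external event service. */
class ExternalEventService {
    private final Map<String, String> pending = new LinkedHashMap<>();
    final List<String> sent = new ArrayList<>(); // stand-in for the message queue

    /** Called before the business commits: record the event, do not send it yet. */
    void record(String eventId, String payload) {
        pending.put(eventId, payload);
    }

    /** Called after the business committed: send the recorded event. */
    void confirm(String eventId) {
        String payload = pending.remove(eventId);
        if (payload != null) {
            sent.add(payload);
        }
    }

    /** Called after the business rolled back: drop the recorded event. */
    void cancel(String eventId) {
        pending.remove(eventId);
    }

    /** Periodic check: ask the business system about every event that is still unsent. */
    void sweep(Function<String, TxStatus> businessQuery) {
        for (String eventId : new ArrayList<>(pending.keySet())) {
            TxStatus status = businessQuery.apply(eventId);
            if (status == TxStatus.COMMITTED) {
                confirm(eventId);
            } else if (status == TxStatus.ROLLED_BACK) {
                cancel(eventId);
            }
            // UNKNOWN: keep the event and ask again on the next sweep
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The `sweep` method is what makes a lost confirmation harmless: an event whose confirm or cancel call never arrived simply stays pending until the business system answers the query.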
Notes on the reliable event notification mode:
There are two points to watch in the reliable event mode: 1. events must be sent correctly; 2. events may be consumed repeatedly.
The asynchronous message service ensures that events are sent correctly, but events are then likely to be sent more than once, so the consumer has to make sure that the same event is not consumed repeatedly. In short, event consumption must be idempotent.
If the event itself is an idempotent status event, such as an order status notification (ordered, paid, shipped, and so on), you need to determine the order of the events, which is generally done by timestamp. Once a newer message has been consumed, an older message that arrives afterwards is discarded rather than consumed. If you cannot provide a global timestamp, consider using a globally unique sequence number instead.
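As a rough illustration of discarding stale status events, the consumer below remembers the highest sequence number it has applied per order and ignores anything that is not newer. The class and method names are hypothetical, and the `sequence` parameter stands for whatever global timestamp or sequence number the system provides.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a consumer for status events; names are made up for this example. */
class OrderStatusConsumer {
    private final Map<String, Long> lastSequence = new HashMap<>();
    private final Map<String, String> status = new HashMap<>();

    /** Applies the event only if it is newer than everything already seen for this order. */
    synchronized boolean onEvent(String orderId, long sequence, String newStatus) {
        Long seen = lastSequence.get(orderId);
        if (seen != null && sequence <= seen) {
            return false; // stale or duplicate event: discard it
        }
        lastSequence.put(orderId, sequence);
        status.put(orderId, newStatus);
        return true;
    }

    synchronized String statusOf(String orderId) {
        return status.get(orderId);
    }
}
```

For example, if the "paid" event with sequence 2 arrives before the "ordered" event with sequence 1, the second call returns false and the order stays in the paid state.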
Events that are not idempotent are generally action events, such as "deduct 100" or "deposit 200". For these you should persist the event ID together with the execution result and look up the event ID before consuming an event. If the event has already been consumed, return the stored result directly; if it is a new message, execute it and store the result.
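A minimal sketch of this deduplication, assuming every event carries a unique ID: the map below stands in for the persisted table of event IDs and results, and the class name `AccountEventConsumer` is made up for this example. In a real system the result record and the balance change would have to be written in the same local transaction.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of an idempotent consumer for action events such as "deduct 100". */
class AccountEventConsumer {
    private long balance;
    // Stand-in for a persisted table: event ID -> balance after the event was applied.
    private final Map<String, Long> results = new HashMap<>();

    AccountEventConsumer(long initialBalance) {
        this.balance = initialBalance;
    }

    /** Applies the delta once per event ID; a repeated event returns the stored result. */
    synchronized long apply(String eventId, long delta) {
        Long previous = results.get(eventId);
        if (previous != null) {
            return previous; // already consumed: do not execute again
        }
        balance += delta;
        results.put(eventId, balance);
        return balance;
    }

    synchronized long balance() {
        return balance;
    }
}
```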
Best effort notification mode
Compared with the reliable event notification mode, the best effort notification mode is much easier to understand. Its defining feature is that after the transaction is committed, the business service tries to send the message only a limited number of times (up to a configured maximum), for example three times. If all three attempts fail, the message is not sent again, so messages can be lost. For that reason the master business side has to provide a query interface through which the slave business service can recover lost messages. Best effort notification gives a poor guarantee of timeliness (the soft state may last a long time), so it is not suitable for systems that need data to become consistent quickly. This mode is usually used between services on different business platforms or for notifying third-party business services, such as bank notifications and merchant notifications, and is not covered further here.
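The bounded retry plus query interface can be sketched as follows. The names `BestEffortNotifier`, `publish` and `query` are hypothetical, the `receiver` predicate stands in for the call to the slave service (true means the notification was accepted), and the waiting interval that real systems put between attempts is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Sketch of best effort notification with a fixed maximum number of attempts. */
class BestEffortNotifier {
    private final int maxAttempts;
    // Results are kept so that the receiver can look them up after a lost notification.
    private final Map<String, String> results = new HashMap<>();

    BestEffortNotifier(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Tries to notify up to maxAttempts times, then gives up; returns whether it got through. */
    boolean publish(String businessId, String result, Predicate<String> receiver) {
        results.put(businessId, result);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (receiver.test(businessId + ":" + result)) {
                    return true;
                }
            } catch (RuntimeException e) {
                // A thrown exception counts as a failed attempt.
            }
        }
        return false; // message lost: the receiver has to use query() instead
    }

    /** Query interface offered to the receiving side. */
    String query(String businessId) {
        return results.get(businessId);
    }
}
```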
Business compensation mode
Next come the two compensation modes. The biggest difference between the compensation modes and the event notification modes is that in a compensation mode the upstream service depends on the result of the downstream service, while in an event notification mode it does not. Let us start with the business compensation mode, which is a pure compensation mode. Its design concept is that each service commits its business normally when called, and when one service fails, all the upstream services it depends on perform a business compensation operation. For example, Xiaoming is travelling on business from Hangzhou to New York, USA. He needs to book a train ticket from Hangzhou to Shanghai and a plane ticket from Shanghai to New York. If Xiaoming has bought the train ticket and then finds that the plane tickets are sold out, he would rather not wait in Shanghai for another day. Instead he decides to fly to Beijing and transfer to New York from there, so he cancels the train ticket to Shanghai. In this example, buying the train ticket from Hangzhou to Shanghai is service a, and buying the plane ticket from Shanghai to New York is service b. The business compensation mode compensates service a when service b fails, which in the example means cancelling the train ticket from Hangzhou to Shanghai.
The compensation mode requires every service to provide a compensation interface, and the compensation is generally incomplete. Even after the compensation operation has run, the cancelled train ticket record is still in the database and can be traced (it is generally marked with a status field such as "cancelled"). After all, online data that has already been committed generally cannot be physically deleted.
The biggest disadvantage of the business compensation mode is that the soft state lasts a long time. Data becomes consistent slowly, and several services may often hold inconsistent data.
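The ticket example above can be expressed as a list of steps, each paired with its compensation. This is only a sketch: `CompensableStep` and `BusinessCompensationSketch` are invented names, the steps run in-process rather than as remote calls, and compensation itself is assumed never to fail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

/** One business call plus the operation that undoes it (hypothetical type). */
class CompensableStep {
    final String name;
    final BooleanSupplier action;  // returns false when the business call fails
    final Runnable compensation;   // undoes a previously successful action

    CompensableStep(String name, BooleanSupplier action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

/** Runs steps in order and compensates the completed ones if a later step fails. */
class BusinessCompensationSketch {
    static boolean run(List<CompensableStep> steps) {
        Deque<CompensableStep> completed = new ArrayDeque<>();
        for (CompensableStep step : steps) {
            if (step.action.getAsBoolean()) {
                completed.push(step); // each step commits normally, right away
            } else {
                // Compensate the upstream steps, most recent first.
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

In the ticket example, the train step's compensation would set the ticket status to cancelled rather than delete the record, which matches the point that compensation is generally incomplete.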
TCC / Try Confirm Cancel mode
TCC mode is an optimized business compensation mode. It can compensate completely, that is, no record of the compensation is left behind afterwards. Its soft state also lasts only a very short time, because TCC is a two-phase model: the second-phase Confirm operation is performed only when the first phase (Try) has succeeded in all services; otherwise the compensating Cancel operation is performed. No real business processing takes place in the Try phase.

TCC mode
The specific process of TCC mode has two phases:

  1. Try: the business service completes all business checks and reserves the necessary business resources.

  2. If Try succeeds in all services, the Confirm operation is performed. Confirm does no business checks (they were already done in Try) and simply uses the resources reserved in the Try phase to carry out the business. Otherwise the Cancel operation is performed, which releases the business resources reserved in the Try phase.

 

That may sound vague, so here is a specific example. Xiaoming transfers 100 yuan online from China Merchants Bank to Guangfa Bank. This operation can be seen as two services: service a transfers 100 yuan out of Xiaoming's China Merchants Bank account, and service b transfers 100 yuan into Xiaoming's Guangfa Bank account.
Service a (Xiaoming transfers 100 yuan out of China Merchants Bank):
try: update cmb_account set balance = balance - 100, freeze = freeze + 100 where acc_id = 1 and balance >= 100;
confirm: update cmb_account set freeze = freeze - 100 where acc_id = 1;
cancel: update cmb_account set balance = balance + 100, freeze = freeze - 100 where acc_id = 1;
Service b (Xiaoming transfers 100 yuan into Guangfa Bank):
try: update cgb_account set freeze = freeze + 100 where acc_id = 1;
confirm: update cgb_account set balance = balance + 100, freeze = freeze - 100 where acc_id = 1;
cancel: update cgb_account set freeze = freeze - 100 where acc_id = 1;
Specific description:
In the try phase of a, the service does two things: 1. a business check, here checking that Xiaoming's account holds at least 100 yuan; 2. resource reservation, moving 100 yuan from the balance to the frozen funds.
In the confirm phase of a, no business check is performed because the try phase has already done it. Since the transfer has succeeded, the frozen funds are deducted.
In the cancel phase of a, the reserved resources are released, that is, the 100 yuan is deducted from the frozen funds and restored to the balance.
In the try phase of b, resources are reserved by freezing 100 yuan.
In the confirm phase of b, the resources reserved in the try phase are used: the 100 yuan of frozen funds is moved to the balance.
In the cancel phase of b, the resources reserved in the try phase are released: 100 yuan is subtracted from the frozen funds.
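The same transfer can be written as a small in-memory sketch that mirrors the SQL statements. The names `TccAccount` and `TccTransferCoordinator` are made up for this example, and two details are assumptions added for illustration: the balance check uses `>=` so that an account holding exactly the amount can still transfer, and the `open` flag gives service b a way to fail its Try so that the Cancel path is actually exercised.

```java
/** In-memory stand-in for one bank account row with a balance and frozen funds. */
class TccAccount {
    long balance;
    long frozen;
    boolean open = true; // assumption for this sketch: a closed account rejects credits

    TccAccount(long balance) {
        this.balance = balance;
    }

    // Service a: transfer out
    boolean tryDebit(long amount) {
        if (balance < amount) {
            return false;        // business check failed
        }
        balance -= amount;       // reserve: move the money into frozen funds
        frozen += amount;
        return true;
    }
    void confirmDebit(long amount) { frozen -= amount; }
    void cancelDebit(long amount)  { balance += amount; frozen -= amount; }

    // Service b: transfer in
    boolean tryCredit(long amount) {
        if (!open) {
            return false;        // business check failed
        }
        frozen += amount;        // reserve: the money is not spendable yet
        return true;
    }
    void confirmCredit(long amount) { balance += amount; frozen -= amount; }
    void cancelCredit(long amount)  { frozen -= amount; }
}

/** Drives Try on both services, then Confirm on both or Cancel on whatever was reserved. */
class TccTransferCoordinator {
    static boolean transfer(TccAccount from, TccAccount to, long amount) {
        boolean debitReserved = from.tryDebit(amount);
        boolean creditReserved = debitReserved && to.tryCredit(amount);
        if (debitReserved && creditReserved) {
            from.confirmDebit(amount);
            to.confirmCredit(amount);
            return true;
        }
        if (debitReserved) {
            from.cancelDebit(amount); // release the reservation made in Try
        }
        return false;
    }
}
```

After a cancelled transfer both accounts look exactly as they did before, with no leftover "cancelled" record, which is the sense in which TCC compensation is complete.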
As the simple example above shows, the TCC mode is more complex than the pure business compensation mode, because each service has to implement two additional interfaces, Confirm and Cancel.
The following table summarizes these four commonly used modes:

| Type         | Name                  | Real-time data consistency | Development cost | Upstream service depends on downstream result |
|--------------|-----------------------|----------------------------|------------------|-----------------------------------------------|
| Notification | Best effort           | Low                        | Low              | No                                            |
| Notification | Reliable event        | High                       | High             | No                                            |
| Compensation | Business compensation | Low                        | Low              | Yes                                           |
| Compensation | TCC                   | High                       | High             | Yes                                           |

 


Origin www.cnblogs.com/testzcy/p/12703314.html