CP architecture and AP architecture solutions for distributed transactions

1. What are CP architecture and AP architecture?

In the CAP theorem of distributed transactions, we learned that a distributed system cannot simultaneously guarantee data consistency (C), service availability (A), and partition tolerance (P).

In reality, we face unreliable networks and machines that go down with some probability. Both factors lead to partitions. Therefore, P is a necessity, not an option, when implementing a distributed system.

For distributed systems engineering practice, a more appropriate statement of the CAP theorem is: given that partition tolerance must be satisfied, no algorithm can guarantee both data consistency and service availability at the same time.

Therefore, we need to choose between C and A:

  • CP architecture (rigid transaction): To guarantee strong consistency, when one service locks its database resources, the data resources of the other services taking part in the distributed transaction must be locked as well, and the locks are released only after all services have finished their business processing. Any other request that touches a locked resource in the meantime is blocked. This satisfies CP: strong consistency at the cost of weak availability.
  • AP architecture (flexible transaction): To guarantee high availability, each service executes its local transaction independently without locking the resources of other services. While some of those transactions are still in progress, a read may see inconsistent data across nodes, so extra measures are needed to bring the data on every node to a consistent state after some time. This satisfies AP: weak consistency (eventual consistency) with strong availability.

2. CP architecture solution

2.1. DTP and XA

In 1994, the X/Open organization (now The Open Group) defined the DTP model for distributed transaction processing.

The model includes the following roles:

  • AP (Application Program): the application that uses the DTP model, which in our case is a microservice;
  • TM (Transaction Manager): coordinates and manages transactions, provides the programming interface to the AP, and manages the resource managers;
  • RM (Resource Manager): manages a resource such as a DBMS or a message server system. The application accesses resources through the resource manager, and the resource must implement the interface defined by XA;
  • CRM (Communication Resource Manager): the communication middleware between TM and RM;

In this model, a distributed transaction (global transaction) can be split into many local transactions that run on different APs and RMs. ACID is easy to guarantee for each local transaction, but the global transaction must ensure that all of its local transactions succeed together. If one local transaction fails, all the others must be rolled back. The problem is that a local transaction does not know the status of the other transactions while it is running. The execution status therefore has to be synchronized by notifying each local transaction through the CRM.

Therefore, communication between local transactions needs a unified standard; otherwise different databases cannot talk to each other. XA is the interface specification between the TM and the RMs in X/Open DTP. It defines interfaces for notifying transaction start, commit, termination, and rollback. Every database vendor that wants to take part in such transactions must implement these interfaces.

2.2. Two-Phase Commit (2PC)

2.2.1. Generation of agreement

To control distributed transactions, it is not enough for vendors to implement the XA specification, so a two-phase commit protocol is introduced.

2.2.2. What is the two-phase commit protocol?

It means that committing a transaction is divided into two phases: the preparation phase and the execution phase.

Two-phase commit coordinates the activities of multiple servers participating in an update, to prevent data inconsistency when part of the distributed system fails. For example, if an update operation has to change records on three different nodes and even one node fails, the other two nodes must detect the failure and undo their changes.

2.2.3. Contents of the two-phase commit protocol

The two-phase commit protocol splits the global transaction into two phases for execution:

  • Phase 1: preparation phase. Each local transaction completes its preparation.
  • Phase 2: execution phase. Each local transaction is committed or rolled back according to the result of the previous phase.

This process requires a coordinator and the transaction participants (voters).

  1. Normal case

Voting phase: The coordinator asks each transaction participant whether the transaction can be executed. Each participant executes the transaction, writes redo and undo logs, and then reports that execution succeeded (agree).

Commit phase: The coordinator finds that every participant can execute the transaction (agree), so it sends a commit command to each participant, and each participant commits the transaction.

  2. Abnormal case

Voting phase: The coordinator asks each transaction participant whether the transaction can be executed. Each participant executes the transaction, writes redo and undo logs, and then reports its result. As soon as one participant returns Disagree, the execution is considered failed.

Commit phase: The coordinator finds that one or more participants returned Disagree and treats the execution as a failure. It then sends an abort command to each participant, and each participant rolls back the transaction.
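The normal and abnormal cases above can be sketched in a few lines of Python. This is only a minimal in-memory illustration, not a real XA implementation: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names invented for this example, and failures, timeouts, and durable logging are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical in-memory voter; a real one would be a database or service."""
    name: str
    can_commit: bool = True          # assumption: stands in for "local work succeeded"
    state: str = "INIT"
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: do the work, write undo/redo logs, keep locks, then vote.
        self.log.append("prepare")
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self) -> None:
        self.log.append("commit")
        self.state = "COMMITTED"

    def rollback(self) -> None:
        self.log.append("rollback")
        self.state = "ABORTED"


def two_phase_commit(participants) -> bool:
    # Voting phase: every participant is asked, so collect all votes first.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Commit phase: one global decision is sent to every participant.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.rollback()
    return decision
```

With two agreeing participants the function returns `True` and both end up `COMMITTED`; if any participant is created with `can_commit=False`, every participant ends up `ABORTED`.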

2.2.4. Defects of the two-phase commit protocol

  1. Single point of failure

    2PC cannot handle fail-stop node failures well. Consider the following situation.

    Assume that the coordinator and voter3 both crash during the commit phase, and voter1 and voter2 have not received the commit message. voter1 and voter2 are now stuck, because they cannot tell which of two scenarios occurred:

    1. The vote in the previous round passed unanimously, and voter3 was the first to receive the commit message and crashed after committing;
    2. voter3 voted against in the previous round, so the transaction never passed at all.
  2. Blocking

    • In both the preparation phase and the commit phase, each participant locks local resources and waits for the results of the other participants. Resources stay locked and callers stay blocked for a long time, so execution efficiency is relatively low.

    • A deadlock may occur.

  3. To address these shortcomings, three-phase commit was later developed, but it still does not completely solve the blocking and resource locking problems and introduces some new problems of its own, so it is rarely used in practice.

2.2.5. 2PC usage scenarios

2PC suits scenarios that require strong transactional consistency, are not sensitive to transaction execution efficiency, and want as little code intrusion as possible.

3. AP architecture solution

3.1. TCC (compensated type)

  1. The TCC mode solves the resource locking and blocking problems of 2PC and reduces the time resources stay locked.

  2. It is essentially a compensation-based approach. A transaction involves three methods:

    • Try: check and reserve resources;
    • Confirm: commit the business operation. Once Try has succeeded, Confirm is required to succeed;
    • Cancel: release the reserved resources.

    Execution occurs in two phases:

    • Preparation phase (try): check and reserve resources;
    • Execution phase (confirm/cancel): decide what to execute based on the result of the previous phase. If every participant's try succeeded, execute confirm; otherwise, execute cancel;
  3. At first glance this looks no different from two-phase commit, but it is actually very different:

    • try, confirm, and cancel are each independent local transactions. They are not affected by other participants and do not block waiting for them;
    • try, confirm, and cancel are written by programmers in the business layer, so lock granularity is controlled by code.
  4. Advantages

    Each stage of TCC commits its local transaction and releases its locks without waiting for the results of other transactions. If another transaction fails, a compensation operation is executed at the end instead of a rollback. This avoids long resource locking and blocking waits, so execution efficiency is relatively high, which makes TCC one of the better-performing distributed transaction approaches.

  5. Disadvantages

    • Code intrusion: try, confirm, and cancel must be written by hand, so the business code is heavily affected;
    • High development cost: one business operation has to be split into three steps that are implemented separately, which makes the business code more complicated;
    • Reliability considerations: if cancel fails, the resources are not released, so a retry mechanism is needed. Retrying may cause repeated execution, so idempotence must also be handled.
  6. Usage scenarios

    • Transactions need some consistency guarantee (eventual consistency);
    • Performance requirements are high;
    • Developers have strong coding skills and experience with idempotent processing.
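To make the Try/Confirm/Cancel flow and the idempotence concern more concrete, here is a minimal sketch. It assumes a hypothetical account service that reserves money by freezing it; `AccountTcc`, `run_tcc`, and the transaction id `xid` are names made up for this illustration and do not come from any TCC framework.

```python
class AccountTcc:
    """Hypothetical TCC participant that reserves money by 'freezing' it."""

    def __init__(self, balance: int):
        self.available = balance
        self.frozen = {}       # xid -> reserved amount
        self.finished = set()  # xids already confirmed or cancelled

    def try_(self, xid: str, amount: int) -> bool:
        # Try: check and reserve resources in a short local transaction.
        if xid in self.finished:
            return False       # a late try after cancel/confirm is refused
        if xid in self.frozen:
            return True        # duplicate try: already reserved
        if self.available < amount:
            return False
        self.available -= amount
        self.frozen[xid] = amount
        return True

    def confirm(self, xid: str) -> None:
        # Confirm: consume the reservation. Safe to call more than once.
        self.frozen.pop(xid, None)
        self.finished.add(xid)

    def cancel(self, xid: str) -> None:
        # Cancel: compensate by releasing the reservation. Safe to retry.
        self.available += self.frozen.pop(xid, 0)
        self.finished.add(xid)


def run_tcc(xid: str, steps) -> bool:
    """steps is a list of (participant, amount) pairs."""
    tried = []
    for participant, amount in steps:
        if participant.try_(xid, amount):
            tried.append(participant)
        else:
            for p in tried:    # compensate everything reserved so far
                p.cancel(xid)
            return False
    for p in tried:
        p.confirm(xid)
    return True
```

Because `confirm` and `cancel` look the reservation up by `xid` and tolerate a missing entry, a retried call leaves the balances unchanged, which is the idempotence property mentioned in the disadvantages above.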

3.2. MQ transaction message scheme (message notification type)

  1. Its basic design idea is to split remote distributed transactions into a series of local transactions.
  2. It is generally divided into the initiator A of the transaction and the other participants B of the transaction:
    • Transaction initiator A executes local transactions;
    • Transaction initiator A sends transaction information to be executed to transaction participant B through MQ;
    • Transaction participant B executes the local transaction after receiving the message;
  3. A few notes:
    • Transaction initiator A must ensure that once its local transaction succeeds, the message is sent successfully;
    • MQ must ensure that messages are persisted and delivered correctly;
    • Transaction participant B must ensure that the message is eventually consumed, retrying multiple times on failure;
    • If transaction B fails it is retried, but transaction A is not rolled back;

3.3. Local message table scheme (message notification type)

In order to avoid message sending failure or loss in the MQ transaction message scheme, we can persist the message to the database. There are two ways of implementation, the simplified version and the decoupled version.

3.3.1. Simplified version

  1. Transaction initiator:

    1. Start a local transaction;
    2. Execute the transaction-related business;
    3. Send a message to MQ;
    4. Persist the message to the database and mark it as sent;
    5. Commit the local transaction;
  2. Transaction receiver:

    1. Receive the message;
    2. Start a local transaction;
    3. Handle the transaction-related business;
    4. Change the status of the message in the database to consumed;
    5. Commit the local transaction;
  3. Additional timed task:

    1. Periodically scan the table for unconsumed messages and resend them;
  4. Advantages:

    • Compared with TCC, the implementation is relatively simple and the development cost is low;

    Disadvantages:

    • Data consistency depends entirely on the message service, so the message service must be reliable;
    • The passive (receiving) side must handle idempotence;
    • A failure in the passive business does not roll back the active business; the passive business is retried instead;
    • The transaction business is coupled with message sending, and the business data and the message table must live in the same database;

3.3.2. Decoupled version

  1. To solve the above problems, we introduce an independent message service that takes over message persistence, sending, confirmation, and failure retry.

  2. Sequence diagram of sending a message: (figure not included)

  3. The basic execution steps of transaction initiator A:

    1. Start a local transaction;
    2. Notify the message service that a message is about to be sent (the message service persists the message and marks it as ready to send);
    3. Execute the local business:
      • If execution fails, stop, and notify the message service to cancel sending (the message service updates the message status);
      • If execution succeeds, continue, and notify the message service to confirm sending (the message service sends the message and updates the message status);
    4. Commit the local transaction.
  4. The message service itself provides the following interfaces:

    1. Ready to send: persist the message to the database and mark its status as ready to send;
    2. Cancel sending: change the message status in the database to cancelled;
    3. Confirm sending: change the message status in the database to confirm sending, try to send the message, and change the status to sent on success;
    4. Confirm consumption: the consumer has received and processed the message; change the message status in the database to consumed;
    5. Timed task: periodically scan the database for messages that are still waiting for confirmation, then ask the corresponding transaction initiator whether the business executed successfully. Depending on the answer:
      • Business succeeded: try to send the message, and change the status to sent on success;
      • Business failed: change the message status in the database to cancelled.
  5. Basic steps for transaction participant B:

    1. Receive the message;
    2. Start a local transaction;
    3. Execute the business;
    4. Notify the message service that the message has been received and processed;
    5. Commit the transaction.
  6. Advantages:

    • Decouples the transaction business from the message-related business.

    Disadvantages:

    • More complicated to implement.
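The interfaces of the independent message service map naturally onto a small state machine. The sketch below is an assumption-laden illustration: `MessageService`, `Status`, `initiator`, and the `business_succeeded` callback are invented names, a dict stands in for the database, a list stands in for MQ, and the choice to let the timed task look at both "ready to send" and "confirm sending" messages is one interpretation, not something the text specifies.

```python
from enum import Enum


class Status(Enum):
    READY = "ready to send"
    CANCELLED = "cancelled"
    CONFIRMED = "confirm sending"
    SENT = "sent"
    CONSUMED = "consumed"


class MessageService:
    def __init__(self, broker):
        self.broker = broker  # hypothetical: any list-like object standing in for MQ
        self.store = {}       # msg_id -> [body, Status]; stands in for the database

    def prepare(self, msg_id, body):        # "ready to send"
        self.store[msg_id] = [body, Status.READY]

    def cancel(self, msg_id):               # "cancel sending"
        self.store[msg_id][1] = Status.CANCELLED

    def confirm(self, msg_id):              # "confirm sending"
        self.store[msg_id][1] = Status.CONFIRMED
        self._send(msg_id)

    def mark_consumed(self, msg_id):        # "confirm consumption"
        self.store[msg_id][1] = Status.CONSUMED

    def _send(self, msg_id):
        self.broker.append((msg_id, self.store[msg_id][0]))
        self.store[msg_id][1] = Status.SENT

    def check_pending(self, business_succeeded):
        """Timed task: resolve messages that never reached 'sent' or 'cancelled'."""
        for msg_id, (_body, status) in list(self.store.items()):
            if status in (Status.READY, Status.CONFIRMED):
                if business_succeeded(msg_id):   # ask the transaction initiator
                    self._send(msg_id)
                else:
                    self.cancel(msg_id)


def initiator(service, msg_id, body, business) -> bool:
    """Transaction initiator A: prepare, run local business, then confirm or cancel."""
    service.prepare(msg_id, body)
    try:
        business()
    except Exception:
        service.cancel(msg_id)
        return False
    service.confirm(msg_id)
    return True
```

If the initiator crashes between `prepare` and `confirm`, the message stays in the store and `check_pending` later settles it by asking whether the business actually succeeded.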

3.4. RabbitMQ message confirmation (message notification type)

RabbitMQ takes a distinctive approach to ensuring that messages are not lost. Instead of a traditional local message table, it relies on its message confirmation mechanisms:

  1. Producer confirmation mechanism: ensures that messages travel from the producer to MQ without problems
    1. When the producer sends a message to RabbitMQ, it can register an asynchronous listener for the ACK from MQ;
    2. After MQ receives the message, it returns a receipt to the producer:
      • If the message reaches the exchange but routing fails, a failure receipt is returned;
      • If routing succeeds but persistence fails, a failure receipt is returned;
      • If routing and persistence both succeed, a success ACK is returned;
    3. The producer prepares in advance how to handle each kind of receipt:
      • Failure receipt: wait for a certain period of time and resend;
      • Success receipt: write a log entry or perform similar bookkeeping;
  2. Consumer confirmation mechanism: ensures that messages are correctly consumed by consumers
    1. The consumer specifies manual ACK mode when it listens to the queue;
    2. After RabbitMQ delivers a message to the consumer, it waits for the consumer's ACK and deletes the message once the ACK arrives. If no ACK is received, the message stays on the server. If the consumer disconnects or fails, the message is delivered to another consumer;
    3. The consumer ACKs manually after it has processed the message and committed its transaction. If an exception is thrown during processing, it does not ACK, the business processing counts as failed, and the message waits to be delivered again;

With these two confirmation mechanisms, messages are delivered safely from producer to consumer. Combined with the local transactions on the producer and consumer sides, this guarantees the eventual consistency of a distributed transaction.
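The two confirmation mechanisms can be imitated with a toy in-memory broker. Note that this is not the RabbitMQ client API: `ToyBroker`, `publish_with_confirm`, and `consume_one` are hypothetical names, the receipt is modelled as a plain boolean return value instead of an asynchronous callback, and no waiting happens between retries.

```python
from collections import deque


class ToyBroker:
    """In-memory stand-in for a broker; not the RabbitMQ client API."""

    def __init__(self, fail_first_publishes: int = 0):
        self.queue = deque()
        self.fail_first = fail_first_publishes  # simulate routing/persistence failures

    def publish(self, msg) -> bool:
        # The return value plays the role of the receipt: True = ACK, False = failure.
        if self.fail_first > 0:
            self.fail_first -= 1
            return False
        self.queue.append(msg)
        return True


def publish_with_confirm(broker, msg, max_attempts: int = 3) -> bool:
    """Producer side: resend until a success receipt arrives or attempts run out."""
    for _ in range(max_attempts):
        if broker.publish(msg):
            return True
        # A real producer would wait for a while before resending.
    return False


def consume_one(broker, handler) -> bool:
    """Consumer side with manual ACK: the message is removed only after success."""
    if not broker.queue:
        return False
    msg = broker.queue[0]      # delivered, but not yet acknowledged
    try:
        handler(msg)           # business processing plus local transaction commit
    except Exception:
        return False           # no ACK: the message stays queued for redelivery
    broker.queue.popleft()     # ACK: the broker may now delete the message
    return True
```

Since a message whose handler failed is delivered again, the handler itself has to be idempotent, just as in the local message table scheme.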

3.5. Advantages and disadvantages of message transactions

  1. Advantages:

    • The business logic is relatively simple, and there is no need to write three-stage business code;

    • It is a combination of multiple local transactions, so resources are locked only briefly and performance is good;

  2. Disadvantages:

    • Code intrusion;

    • It relies on the reliability of MQ;

    • The message initiator can roll back, but a message participant cannot cause the whole transaction to roll back;

    • Timeliness is poor, because it depends on whether the MQ message is sent promptly and on how quickly the message participant executes;

To address the fact that the transaction cannot be rolled back, some have proposed that a participant whose execution fails should notify the message service through MQ, and that the message service should then notify the other participants to roll back. Congratulations: in that case you have rebuilt the 2PC model out of MQ and a custom message service, and reinvented a very large wheel.

3.6. AT mode

  1. In January 2019, Seata open sourced the AT mode. AT mode is a non-intrusive distributed transaction solution. It can be regarded as an optimization of the TCC or two-phase commit model that removes the code intrusion and complicated coding of the TCC mode.

  2. In AT mode, users only need to care about their own "business SQL". The business SQL acts as the first phase, and the Seata framework automatically generates the second-phase commit and rollback operations. See Seata's official documentation for details.

  3. Fundamentals:

    1. Flow chart: (figure not included)

    2. It looks very similar to TCC execution, which is also divided into two stages:

      • Stage one: execute the local transactions and return the execution results;
      • Stage two: based on the results of stage one, decide what stage two does: commit or rollback.

      But underneath, the AT mode is completely different. We do not have to write the second stage; Seata implements all of it. In other words, the code we write is the same as the code for a local transaction, and there is no need to handle the distributed transaction manually.

    3. In the first stage, Seata intercepts the business SQL. It first parses the SQL to find the business data that the SQL will update and saves that data as the before image before the update. It then executes the business SQL to update the data, saves the updated data as the after image, and finally acquires the global row lock and commits the transaction. All of these operations complete within one database transaction, which ensures the atomicity of the first stage.

      The before image and after image here are similar to the database's undo and redo logs, but they are actually emulated on top of the database.

    4. If the second stage is a commit, the business SQL was already committed to the database in the first stage, so the Seata framework only needs to delete the snapshot data and row locks saved in the first stage to finish the cleanup.

    5. If the second stage is a rollback, Seata has to undo the business SQL executed in the first stage and restore the business data. It does so by restoring the data from the before image. Before restoring, it must check for dirty writes by comparing the current business data with the after image. If the two are identical, there has been no dirty write and the data can be restored. If they differ, a dirty write has occurred and must be handled manually.

    6. The global lock mechanism reduces the probability of dirty writes.

      Both phases of the AT mode, including commit and rollback, are generated automatically by the Seata framework. Users only need to write business SQL to take part in distributed transactions, so the AT mode is a distributed transaction solution with no intrusion into the business code.

  4. Detailed architecture and process

    1. Several basic concepts in Seata:

      • TC (Transaction Coordinator) - transaction coordinator

        Maintains the state of global and branch transactions and drives global transaction commit or rollback (the coordinator between TMs).

      • TM (Transaction Manager) - transaction manager

        Defines the scope of a global transaction: starts a global transaction, then commits or rolls back the global transaction.

      • RM (Resource Manager) - resource manager

        Manages the resources of branch transactions, talks to the TC to register branch transactions and report their status, and drives branch transactions to commit or roll back.

    2. Architecture diagram, step by step:

      1. TM: the part of the business module that starts the global transaction
        • Opens a global transaction with the TC
        • Calls other microservices
      2. RM: each business module that executes work contains an RM, which reports the transaction execution status to the TC
        • Executes the local transaction
        • Registers the branch transaction with the TC and submits the local transaction execution result
      3. TM: finishes calling the microservices and notifies the TC that execution of the global transaction is complete; the first phase of the transaction ends
      4. TC: collects the execution results of all branch transactions and decides whether to commit or roll back the distributed transaction;
      5. TC notifies all RMs to commit or roll back their resources, and the second phase of the transaction ends.
    3. Phase one:

      1. The TM starts the global transaction and declares it to the TC, including the global transaction XID;
      2. The service where the TM is located calls other microservices;
      3. In each microservice, the work is done mainly by the RM:
        1. Query the before_image;
        2. Execute the local transaction;
        3. Query the after_image;
        4. Generate the undo_log and write it to the database;
        5. Register the branch transaction with the TC and report the execution result;
        6. Acquire the global lock (to prevent other global transactions from concurrently modifying the same data);
        7. Release the local lock (so other business operations on the data are not affected);
      4. After all business has executed, the transaction initiator (TM) tries to commit the global transaction to the TC;
    4. Phase two:

      1. The TC tallies the results of the branch transactions and decides the next action:
        • All branches succeeded: notify the branch transactions to commit;
        • A branch failed: notify the branch transactions that executed successfully to roll back their data;
      2. The RM of each branch transaction:
        • Commit: simply clear the before_image and after_image information and release the global lock;
        • Rollback:
          • Check the after_image to determine whether a dirty write has occurred;
          • If there is no dirty write, roll the data back to the before_image and clear the before_image and after_image;
          • If there is a dirty write, request manual intervention;
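The before/after image bookkeeping and the dirty-write check can be illustrated with a small sketch. This is not Seata code: `AtBranch` is a hypothetical class, a dict stands in for a database table, another dict stands in for the TC's global lock table, and each global transaction is assumed to touch a single row.

```python
class AtBranch:
    """Hypothetical branch transaction over a dict that stands in for a table."""

    def __init__(self, table: dict, global_locks: dict):
        self.table = table
        self.global_locks = global_locks  # shared lock table: row key -> xid
        self.undo_log = {}                # xid -> (key, before_image, after_image)

    def phase_one(self, xid: str, key: str, new_value) -> None:
        before = self.table[key]          # query before_image
        self.table[key] = new_value       # execute the business update
        after = self.table[key]           # query after_image
        holder = self.global_locks.get(key)
        if holder not in (None, xid):
            self.table[key] = before      # cannot get the global lock: undo locally
            raise RuntimeError(f"row {key!r} is locked by {holder}")
        self.undo_log[xid] = (key, before, after)
        self.global_locks[key] = xid      # global lock is kept until phase two

    def phase_two_commit(self, xid: str) -> None:
        # Data is already committed; only clear the images and release the lock.
        key, _before, _after = self.undo_log.pop(xid)
        del self.global_locks[key]

    def phase_two_rollback(self, xid: str) -> None:
        key, before, after = self.undo_log[xid]
        if self.table[key] != after:      # someone changed the row: dirty write
            raise RuntimeError("dirty write detected; manual intervention needed")
        self.table[key] = before          # restore from before_image
        del self.undo_log[xid]
        del self.global_locks[key]
```

A rollback restores the row only when its current value still equals the after image; a write that bypassed the global lock makes `phase_two_rollback` raise instead of silently overwriting the newer value.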


Origin blog.csdn.net/itigoitie/article/details/127785659