Distributed transactions: Seata transaction modes and high availability

1. Transaction modes

1.1, XA mode

The XA specification is a distributed transaction processing (DTP) standard defined by the X/Open organization. It describes the interface between the global TM (transaction manager) and the local RM (resource manager). Almost all mainstream databases support the XA specification.

 

1.1.1, two-phase commit

XA is only a specification; mainstream databases provide the implementations, all of which are based on two-phase commit.

Normal case:

Failure case:

Phase one:

  • The transaction coordinator notifies each transaction participant to execute its local transaction

  • After executing the local transaction, each participant reports its execution status to the transaction coordinator. At this point the transaction is not committed and continues to hold its database locks

Phase two:

  • The transaction coordinator decides the next step based on the reports from phase one

    • If every participant succeeded in phase one, notify all participants to commit

    • If any participant failed in phase one, notify all participants to roll back
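The two phases can be sketched as a small in-memory coordinator. This is only an illustration of the protocol, not a database's XA implementation: the `Participant` interface, `RecordingParticipant`, and `run` are hypothetical names, and the sketch assumes the coordinator always asks every participant before deciding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal in-memory sketch of two-phase commit; all names are hypothetical. */
final class TwoPhaseCommitSketch {

    /** A participant that can execute without committing, then commit or roll back. */
    interface Participant {
        boolean prepare();   // phase one: run the local transaction, keep locks, report status
        void commit();       // phase two, success path
        void rollback();     // phase two, failure path
    }

    /** Test double that records the calls it receives. */
    static final class RecordingParticipant implements Participant {
        private final boolean prepareSucceeds;
        final List<String> calls = new ArrayList<>();

        RecordingParticipant(boolean prepareSucceeds) {
            this.prepareSucceeds = prepareSucceeds;
        }

        @Override public boolean prepare() { calls.add("prepare"); return prepareSucceeds; }
        @Override public void commit() { calls.add("commit"); }
        @Override public void rollback() { calls.add("rollback"); }
    }

    /** Coordinator: returns true if the global transaction committed. */
    static boolean run(List<? extends Participant> participants) {
        // Phase one: every participant executes and reports its status.
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allPrepared = false;
            }
        }
        // Phase two: commit everywhere only if every report was a success.
        for (Participant p : participants) {
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }

    public static void main(String[] args) {
        RecordingParticipant a = new RecordingParticipant(true);
        RecordingParticipant b = new RecordingParticipant(false);
        System.out.println("committed = " + run(Arrays.asList(a, b)));
        System.out.println("a: " + a.calls + ", b: " + b.calls);
    }
}
```

Because one participant fails in phase one, both participants receive a rollback in phase two, including the one whose local work succeeded.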

 

1.1.2, Seata's XA model

Seata wraps the native XA mode in a thin layer that adapts it to Seata's own transaction model. The basic architecture is shown in the figure:

Phase one, RM:

① Register the branch transaction with the TC

② Execute the branch's business SQL but do not commit

③ Report the execution status to the TC

Phase two, TC:

  • The TC checks the execution status of each branch transaction

    a. If all succeeded, notify all RMs to commit

    b. If any failed, notify all RMs to roll back

Phase two, RM:

  • Receive the TC's instruction and commit or roll back the transaction

 

1.1.3. Advantages and disadvantages

What are the advantages of XA mode?

  • Transactions are strongly consistent and satisfy the ACID properties.

  • Commonly used databases support it, so it is simple to implement and requires no changes to business code

What are the disadvantages of XA mode?

  • Database resources are locked in phase one and released only when phase two ends, so performance is poor

  • It relies on relational databases to implement transactions

 

1.1.4. Realize the XA mode

Seata's starter already auto-configures XA mode, so enabling it takes only two steps:

1) In the application.yml file of every microservice that participates in the transaction, enable XA mode:

seata:
  data-source-proxy-mode: XA

2) Add the @GlobalTransactional annotation to the entry method that initiates the global transaction.

 

1.2, AT mode

AT mode is also a two-phase commit model, but it avoids the long resource-locking period of XA mode.

 

1.2.1, Seata's AT model

Basic flowchart:

Phase one, RM:

  • Register the branch transaction

  • Record the undo log (data snapshot)

  • Execute the business SQL and commit

  • Report the transaction status

Phase two, RM on commit:

  • Delete the undo log

Phase two, RM on rollback:

  • Restore the data to its pre-update state from the undo log

 

1.2.2. Process review

Let's walk through how AT mode works with a concrete example.

Suppose a database table records user balances:

id money
1 100

The SQL to be executed by one of the branch services is:

update tb_account set money = money - 10 where id = 1

In AT mode, the current branch transaction execution flow is as follows:

Phase one:

  1. The TM starts a global transaction and registers it with the TC
  2. The TM calls the branch transaction
  3. The branch transaction prepares to execute the business SQL
  4. The RM intercepts the business SQL, queries the original data using the where condition, and saves it as a snapshot:
    {
        "id": 1, "money": 100
    }
  5. The RM executes the business SQL, commits the local transaction, and releases the database lock. At this point money = 90
  6. The RM reports the local transaction status to the TC

 

Phase two:

  1. The TM informs the TC that the transaction has ended
  2. The TC checks the status of the branch transactions

        a. If all succeeded, delete the snapshot immediately

        b. If any branch transaction failed, roll back: read the snapshot data ({"id": 1, "money": 100}) and write it back to the database, restoring money to 100
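The snapshot-and-restore flow above can be simulated in memory. This is a sketch under simplifying assumptions, not Seata's implementation: the class and method names are hypothetical, a `Map` stands in for tb_account and another for undo_log, and only the before-image is kept.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of the AT flow for a single-row update; names are hypothetical. */
final class AtModeSketch {
    // Stand-in for tb_account: id -> money.
    final Map<Integer, Integer> account = new HashMap<>();
    // Stand-in for undo_log: xid -> before-image (id -> money).
    final Map<String, Map<Integer, Integer>> undoLog = new HashMap<>();

    /** Phase one: snapshot the row, then apply and commit the business update. */
    void phaseOne(String xid, int id, int amount) {
        Map<Integer, Integer> beforeImage = new HashMap<>();
        beforeImage.put(id, account.get(id));
        undoLog.put(xid, beforeImage);
        account.put(id, account.get(id) - amount);
    }

    /** Phase two, commit: the data is already committed, so only the snapshot is deleted. */
    void phaseTwoCommit(String xid) {
        undoLog.remove(xid);
    }

    /** Phase two, rollback: restore the before-image, then delete the snapshot. */
    void phaseTwoRollback(String xid) {
        Map<Integer, Integer> beforeImage = undoLog.remove(xid);
        if (beforeImage != null) {
            account.putAll(beforeImage);
        }
    }

    public static void main(String[] args) {
        AtModeSketch db = new AtModeSketch();
        db.account.put(1, 100);
        db.phaseOne("xid-1", 1, 10);
        System.out.println("after phase one: " + db.account.get(1));
        db.phaseTwoRollback("xid-1");
        System.out.println("after rollback:  " + db.account.get(1));
    }
}
```

The commit path is cheap because phase one has already committed the change; only the rollback path has to touch business data again.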

flow chart:

  

1.2.3 The difference between AT and XA

What are the main differences between AT mode and XA mode?

  • XA mode does not commit in phase one and keeps resources locked; AT mode commits directly in phase one and does not keep resources locked.

  • XA mode relies on the database's own mechanism to roll back; AT mode rolls back using data snapshots.

  • XA mode is strongly consistent; AT mode is eventually consistent

 

1.2.4. Dirty write problem

When multiple threads run distributed transactions in AT mode concurrently, dirty writes can occur, as shown in the figure:

The solution is a global lock. Before a branch releases its DB lock, it must first acquire the global lock on the rows it changed, which prevents another transaction from modifying the same data at the same time.
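As a rough illustration of the idea, a global lock can be modeled as a table that maps each locked row to the global transaction that owns it. The names below (`GlobalLockSketch`, `tryAcquire`, `releaseAll`) and the row-key format `tb_account:1` are hypothetical; in Seata the lock records are kept by the TC.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory sketch of a global row lock table; names are hypothetical. */
final class GlobalLockSketch {
    // Row key -> xid of the global transaction that holds the lock.
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    /** Returns true if xid now holds (or already held) the lock on rowKey. */
    boolean tryAcquire(String rowKey, String xid) {
        String owner = owners.putIfAbsent(rowKey, xid);
        return owner == null || owner.equals(xid);
    }

    /** Called when the global transaction finishes phase two. */
    void releaseAll(String xid) {
        owners.values().removeIf(owner -> owner.equals(xid));
    }

    public static void main(String[] args) {
        GlobalLockSketch locks = new GlobalLockSketch();
        System.out.println("tx1 acquires: " + locks.tryAcquire("tb_account:1", "tx1"));
        // tx2 must not commit a write to the same row while tx1 is unfinished.
        System.out.println("tx2 acquires: " + locks.tryAcquire("tb_account:1", "tx2"));
        locks.releaseAll("tx1");
        System.out.println("tx2 retries:  " + locks.tryAcquire("tb_account:1", "tx2"));
    }
}
```

A transaction that fails to acquire the lock would retry for a while and then give up, which is what keeps it from overwriting data that another global transaction may still roll back.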

 

1.2.5. Advantages and disadvantages

Advantages of AT mode:

  • Transactions commit directly in phase one and release database resources, so performance is better

  • Global locks provide read-write isolation

  • No changes to business code are needed; the framework performs rollback and commit automatically

Disadvantages of AT mode:

  • There is a soft state between the two phases, so consistency is only eventual

  • The framework's snapshot mechanism costs some performance, although AT still performs much better than XA

 

1.2.6. Implement AT mode

Snapshot generation and rollback in AT mode are handled automatically by the framework without any changes to business code, so the implementation is very simple.

However, AT mode needs one table to record global locks (lock_table) and another to record data snapshots (undo_log).

1. Import the database tables for the global lock and the undo log

The lock_table is imported into the database used by the TC service, and the undo_log table is imported into the database used by each microservice:

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for undo_log
-- ----------------------------
DROP TABLE IF EXISTS `undo_log`;
CREATE TABLE `undo_log`  (
  `branch_id` bigint(20) NOT NULL COMMENT 'branch transaction id',
  `xid` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL COMMENT 'global transaction id',
  `context` varchar(128) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL COMMENT 'undo_log context,such as serialization',
  `rollback_info` longblob NOT NULL COMMENT 'rollback info',
  `log_status` int(11) NOT NULL COMMENT '0:normal status,1:defense status',
  `log_created` datetime(6) NOT NULL COMMENT 'create datetime',
  `log_modified` datetime(6) NOT NULL COMMENT 'modify datetime',
  UNIQUE INDEX `ux_undo_log`(`xid`, `branch_id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'AT transaction mode undo table' ROW_FORMAT = Compact;

-- ----------------------------
-- Records of undo_log
-- ----------------------------

-- ----------------------------
-- Table structure for lock_table
-- ----------------------------
DROP TABLE IF EXISTS `lock_table`;
CREATE TABLE `lock_table`  (
  `row_key` varchar(128) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `xid` varchar(96) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `transaction_id` bigint(20) NULL DEFAULT NULL,
  `branch_id` bigint(20) NOT NULL,
  `resource_id` varchar(256) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `table_name` varchar(32) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `pk` varchar(36) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `gmt_create` datetime NULL DEFAULT NULL,
  `gmt_modified` datetime NULL DEFAULT NULL,
  PRIMARY KEY (`row_key`) USING BTREE,
  INDEX `idx_branch_id`(`branch_id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;


SET FOREIGN_KEY_CHECKS = 1;

 2. Modify the application.yml file and change the transaction mode to AT mode:

seata:
  data-source-proxy-mode: AT # AT is the default

 

1.3, TCC mode

TCC mode is very similar to AT mode: each phase is an independent transaction. The difference is that TCC implements data recovery through hand-written code. Three methods need to be implemented:

  • Try: check and reserve resources;

  • Confirm: complete the business operation on the resources; if Try succeeded, Confirm must be able to succeed.

  • Cancel: release the reserved resources; it can be understood as the reverse of Try.

  

1.3.1. Process Analysis

Take a service that deducts a user's balance as an example. Assume account A starts with a balance of 100 yuan and 30 yuan must be deducted.

  • Phase 1 (Try): check whether the balance is sufficient; if it is, increase the frozen amount by 30 yuan and reduce the available balance by 30 yuan

Initial balance:

The balance is sufficient and can be frozen:

At this point, total = frozen amount + available amount = 30 + 70, so the total remains unchanged at 100. The transaction commits directly without waiting for other transactions.

  • Phase 2 (Confirm): to commit, reduce the frozen amount by 30

The available amount was already deducted in the Try phase, so Confirm only needs to clear the frozen amount:

At this point, total = frozen amount + available amount = 0 + 70 = 70 yuan

  • Phase 2 (Cancel): to roll back, reduce the frozen amount by 30 and increase the available balance by 30

On rollback, the frozen amount is released and the available amount is restored, so the total returns to 0 + 100 = 100 yuan.
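The balance arithmetic of the three operations can be checked with a few lines of code. This is a toy model with hypothetical names (`TccBalanceSketch`, `tryDeduct`), holding one account in memory; it only mirrors the numbers in the example above.

```java
/** Toy model of the Try/Confirm/Cancel arithmetic for one account; names are hypothetical. */
final class TccBalanceSketch {
    int available;
    int frozen;

    TccBalanceSketch(int available) {
        this.available = available;
    }

    /** Try: check the balance and reserve the amount by moving it to frozen. */
    boolean tryDeduct(int amount) {
        if (available < amount) {
            return false;
        }
        available -= amount;
        frozen += amount;
        return true;
    }

    /** Confirm: the reserved amount is consumed. */
    void confirm(int amount) {
        frozen -= amount;
    }

    /** Cancel: the reserved amount is returned to the available balance. */
    void cancel(int amount) {
        frozen -= amount;
        available += amount;
    }

    int total() {
        return available + frozen;
    }

    public static void main(String[] args) {
        TccBalanceSketch a = new TccBalanceSketch(100);
        a.tryDeduct(30);
        System.out.println("after try:     available=" + a.available + " frozen=" + a.frozen);
        a.confirm(30);
        System.out.println("after confirm: available=" + a.available + " frozen=" + a.frozen);
    }
}
```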

 

1.3.2, Seata's TCC model

Seata's TCC mode keeps the same transaction architecture as the previous modes, as shown in the figure:

 

1.3.3. Advantages and disadvantages

What does each stage of TCC mode do?

  • Try: resource checking and reservation

  • Confirm: business execution and submission

  • Cancel: release of reserved resources

What are the advantages of TCC?

  • Transactions commit directly in phase one and release database resources, so performance is good

  • Unlike AT mode, it needs neither snapshots nor global locks, so it performs best of the modes

  • It relies on compensation operations rather than database transactions, so it can be used with non-transactional databases

What are the disadvantages of TCC?

  • It intrudes on business code: the try, confirm, and cancel interfaces must be written by hand, which is tedious

  • There is a soft state, so transactions are only eventually consistent

  • Confirm and Cancel can fail and be retried, so they must be made idempotent

 

1.3.4, empty rollback and business suspension

Empty rollback


When a branch transaction's try phase is blocked, the global transaction may time out and trigger the phase-two cancel operation. Cancel then runs before try has executed, so there is nothing to roll back. This is an empty rollback.

As shown in the picture:

When executing cancel, check whether try has already run; if it has not, perform an empty rollback, that is, return success without undoing anything.


Business suspension


After an empty rollback, the previously blocked try may resume. If it is allowed to execute, no confirm or cancel will ever follow, and the transaction stays in an intermediate state forever. This is business suspension.

When executing try, check whether cancel has already run for this transaction; if it has, refuse to execute try so that the transaction is not left suspended.

 

1.3.5. Implement TCC mode

To solve empty rollback and business suspension, the current transaction state must be recorded: has try run, or has cancel run?

 1. Design analysis

Here we define a table:

CREATE TABLE `account_freeze_tbl` (
  `xid` varchar(128) NOT NULL,
  `user_id` varchar(255) DEFAULT NULL COMMENT 'user id',
  `freeze_money` int(11) unsigned DEFAULT '0' COMMENT 'frozen amount',
  `state` int(1) DEFAULT NULL COMMENT 'transaction state, 0:try, 1:confirm, 2:cancel',
  PRIMARY KEY (`xid`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;

Where:

  • xid: the global transaction id

  • freeze_money: the user's frozen amount

  • state: the transaction state

 

How should each piece of business logic work?

  • Try:

    • Record the frozen amount and the transaction state in the account_freeze table

    • Deduct the amount from the available balance in the account table

  • Confirm:

    • Delete the freeze record from the account_freeze table by xid

  • Cancel:

    • Update the account_freeze table: set the frozen amount to 0 and the state to 2

    • Update the account table to restore the available balance

  • How to detect an empty rollback?

    • In Cancel, query account_freeze by xid. If the result is null, Try has not run yet and an empty rollback is required; the empty rollback records a cancel entry for the xid so that a late Try can be detected

  • How to avoid business suspension?

    • In Try, query account_freeze by xid. If a record already exists, Cancel has already run, so refuse to execute Try
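The two checks can be exercised without a database. This is a minimal in-memory sketch with hypothetical names (`TccGuardSketch`, `Freeze`), where a `Map` stands in for the account_freeze table. It assumes that an empty rollback writes a CANCEL record for the xid, since that record is what allows a late Try to be detected.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of empty-rollback and anti-suspension checks; names are hypothetical. */
final class TccGuardSketch {
    enum State { TRY, CANCEL }

    /** One row of the freeze table. */
    static final class Freeze {
        int amount;
        State state;

        Freeze(int amount, State state) {
            this.amount = amount;
            this.state = state;
        }
    }

    int available = 100;
    // Stand-in for the account_freeze table: xid -> freeze record.
    final Map<String, Freeze> freezeTable = new HashMap<>();

    /** Try: refuse to run if a record for this xid already exists. */
    boolean tryDeduct(String xid, int amount) {
        if (freezeTable.containsKey(xid)) {
            return false; // cancel already ran: executing now would suspend the transaction
        }
        if (available < amount) {
            return false;
        }
        available -= amount;
        freezeTable.put(xid, new Freeze(amount, State.TRY));
        return true;
    }

    /** Confirm: delete the freeze record. */
    boolean confirm(String xid) {
        freezeTable.remove(xid);
        return true;
    }

    /** Cancel: handles empty rollback and repeated calls before restoring the balance. */
    boolean cancel(String xid) {
        Freeze freeze = freezeTable.get(xid);
        if (freeze == null) {
            // Empty rollback: try never ran, so only leave a CANCEL marker behind.
            freezeTable.put(xid, new Freeze(0, State.CANCEL));
            return true;
        }
        if (freeze.state == State.CANCEL) {
            return true; // idempotent: already cancelled
        }
        available += freeze.amount;
        freeze.amount = 0;
        freeze.state = State.CANCEL;
        return true;
    }

    public static void main(String[] args) {
        TccGuardSketch s = new TccGuardSketch();
        s.cancel("xid-1"); // cancel arrives before try
        System.out.println("late try accepted: " + s.tryDeduct("xid-1", 30));
        System.out.println("available: " + s.available);
    }
}
```

Calling `cancel` twice for the same xid restores the balance only once, which is the idempotency that TCC requires of Confirm and Cancel.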

Next, we modify account-service to implement the balance deduction with TCC.

 

2. Declare the TCC interface

TCC's Try, Confirm, and Cancel methods must all be declared in an interface using annotations.

In the account-service project, create a new interface in the cn.itcast.account.service package and declare the three TCC methods:

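package cn.itcast.account.service;

import io.seata.rm.tcc.api.BusinessActionContext;
import io.seata.rm.tcc.api.BusinessActionContextParameter;
import io.seata.rm.tcc.api.LocalTCC;
import io.seata.rm.tcc.api.TwoPhaseBusinessAction;

@LocalTCC
public interface AccountTCCService {

    // Try method; commitMethod and rollbackMethod name the Confirm and Cancel methods below
    @TwoPhaseBusinessAction(name = "deduct", commitMethod = "confirm", rollbackMethod = "cancel")
    void deduct(@BusinessActionContextParameter(paramName = "userId") String userId,
                @BusinessActionContextParameter(paramName = "money") int money);

    // Confirm method; returns true on success
    boolean confirm(BusinessActionContext ctx);

    // Cancel method; returns true on success
    boolean cancel(BusinessActionContext ctx);
}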

 

3. Write the implementation class

In the account-service project, create a new class in the cn.itcast.account.service.impl package to implement the TCC business logic:

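package cn.itcast.account.service.impl;

// Imports for the project's own AccountMapper, AccountFreezeMapper and AccountFreeze
// are omitted here; their packages depend on the project layout.
import cn.itcast.account.service.AccountTCCService;
import io.seata.core.context.RootContext;
import io.seata.rm.tcc.api.BusinessActionContext;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
@Slf4j
public class AccountTCCServiceImpl implements AccountTCCService {

    @Autowired
    private AccountMapper accountMapper;
    @Autowired
    private AccountFreezeMapper freezeMapper;

    @Override
    @Transactional
    public void deduct(String userId, int money) {
        // 0. Get the global transaction id
        String xid = RootContext.getXID();
        // 1. Anti-suspension: a record for this xid means cancel already ran, so refuse try
        if (freezeMapper.selectById(xid) != null) {
            return;
        }
        // 2. Deduct the available balance
        accountMapper.deduct(userId, money);
        // 3. Record the frozen amount and the transaction state
        AccountFreeze freeze = new AccountFreeze();
        freeze.setUserId(userId);
        freeze.setFreezeMoney(money);
        freeze.setState(AccountFreeze.State.TRY);
        freeze.setXid(xid);
        freezeMapper.insert(freeze);
    }

    @Override
    public boolean confirm(BusinessActionContext ctx) {
        // 1. Get the global transaction id
        String xid = ctx.getXid();
        // 2. Delete the freeze record by id
        int count = freezeMapper.deleteById(xid);
        return count == 1;
    }

    @Override
    public boolean cancel(BusinessActionContext ctx) {
        // 0. Look up the freeze record
        String xid = ctx.getXid();
        AccountFreeze freeze = freezeMapper.selectById(xid);
        // 1. Empty rollback: try never ran, so record CANCEL and undo nothing
        if (freeze == null) {
            freeze = new AccountFreeze();
            freeze.setUserId(ctx.getActionContext("userId").toString());
            freeze.setFreezeMoney(0);
            freeze.setState(AccountFreeze.State.CANCEL);
            freeze.setXid(xid);
            freezeMapper.insert(freeze);
            return true;
        }
        // 2. Idempotency: cancel has already been handled for this xid
        if (freeze.getState() == AccountFreeze.State.CANCEL) {
            return true;
        }
        // 3. Restore the available balance
        accountMapper.refund(freeze.getUserId(), freeze.getFreezeMoney());
        // 4. Clear the frozen amount and set the state to CANCEL
        freeze.setFreezeMoney(0);
        freeze.setState(AccountFreeze.State.CANCEL);
        int count = freezeMapper.updateById(freeze);
        return count == 1;
    }
}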

  

1.4, SAGA mode

Saga mode is the long-running transaction solution provided by Seata, contributed mainly by Ant Financial.

Its theoretical basis is the paper "Sagas", published by Hector Garcia-Molina and Kenneth Salem in 1987.

Seata official website guide for Saga: Seata Saga mode

 

1.4.1. Principle

In Saga mode, a distributed transaction has multiple participants. Each participant is a compensating service: the user implements both its forward operation and its reverse (compensating) operation according to the business scenario.

During the distributed transaction, the forward operations of the participants are executed in sequence. If all forward operations succeed, the distributed transaction commits. If any forward operation fails, the distributed transaction runs the reverse operations of the participants that already committed, undoing their work and returning the distributed transaction to its initial state.

Saga also has two phases:

  • Phase 1: commit local transactions directly

  • Phase 2: on success, do nothing; on failure, roll back by running hand-written compensation logic
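The forward-then-compensate flow can be sketched as follows. This is a generic illustration of the Saga pattern with hypothetical names (`SagaSketch`, `Step`, `run`), not Seata's Saga API; the step names "order", "account", and "storage" are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Minimal sketch of a Saga: forward steps in order, compensation in reverse. */
final class SagaSketch {

    /** A participant with a forward operation and a compensating operation. */
    interface Step {
        String name();
        boolean forward();
        void compensate();
    }

    /** Runs the steps; on the first failure, compensates the completed steps in reverse order. */
    static boolean run(List<Step> steps, List<String> log) {
        Deque<Step> done = new ArrayDeque<>();
        for (Step step : steps) {
            if (step.forward()) {
                log.add("forward:" + step.name());
                done.push(step);
            } else {
                log.add("failed:" + step.name());
                while (!done.isEmpty()) {
                    Step completed = done.pop();
                    completed.compensate();
                    log.add("compensate:" + completed.name());
                }
                return false;
            }
        }
        return true;
    }

    /** Helper that builds a step whose forward operation succeeds or fails as requested. */
    static Step step(final String name, final boolean succeeds) {
        return new Step() {
            @Override public String name() { return name; }
            @Override public boolean forward() { return succeeds; }
            @Override public void compensate() { /* undo the forward operation here */ }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = run(Arrays.asList(
                step("order", true), step("account", true), step("storage", false)), log);
        System.out.println("committed = " + ok);
        System.out.println(log);
    }
}
```

Between a forward operation and its compensation the intermediate result is visible to other transactions, which is why Saga offers no isolation.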

  

1.4.2. Advantages and disadvantages

Advantages:

  • Participants can call one another asynchronously in an event-driven way, giving high throughput

  • Transactions commit directly in phase one without locks, so performance is good

  • It is simpler to implement than TCC because the three TCC phases do not have to be written

Disadvantages:

  • The soft state lasts for an indeterminate time, so timeliness is poor

  • There are no locks and no transaction isolation, so dirty writes can occur

 

1.5. Comparison of four modes

We compare the four implementations in the following aspects:

  • Consistency: Can transaction consistency be guaranteed? Strong consistency or eventual consistency?

  • Isolation: How isolated are transactions?

  • Code intrusion: Do you need to modify the business code?

  • Performance: Is there any performance loss?

  • Scenarios: common business scenarios

As shown in the picture:

  

 

2. High availability

The TC service is the core of Seata's distributed transactions, so it must be deployed as a highly available cluster.

 

2.1, high availability architecture model

Building a TC service cluster is very simple: start multiple TC services and register them with nacos.

However, a single cluster cannot guarantee 100% availability: the data center that hosts it may fail. Systems with strict requirements therefore usually deploy clusters in several data centers in different locations for disaster recovery.

For example, one TC cluster is in Shanghai (SH) and another is in Hangzhou (HZ):

Each microservice decides which TC cluster to use from the mapping between its transaction group (tx-service-group) and a TC cluster. When the SH cluster fails, you only need to change the mapping in vgroup-mapping to HZ, and all microservices switch to the HZ TC cluster.
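The lookup described above amounts to two map lookups: transaction group to cluster name, then cluster name to TC nodes. The sketch below models that with hypothetical names (`VgroupMappingSketch`, `resolve`, `demo`) and is not Seata's client code; the group name seata-demo and the two addresses are the ones this article uses in its later configuration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of resolving a transaction group to TC nodes; names are hypothetical. */
final class VgroupMappingSketch {
    // Stand-in for service.vgroupMapping.<tx-service-group> = <cluster name>.
    final Map<String, String> vgroupMapping = new HashMap<>();
    // Stand-in for the registry: cluster name -> addresses of its TC nodes.
    final Map<String, List<String>> clusterNodes = new HashMap<>();

    /** Returns the TC nodes that the given transaction group should connect to. */
    List<String> resolve(String txServiceGroup) {
        String cluster = vgroupMapping.get(txServiceGroup);
        if (cluster == null) {
            throw new IllegalStateException("no cluster mapped for group " + txServiceGroup);
        }
        List<String> nodes = clusterNodes.get(cluster);
        return nodes == null ? Collections.<String>emptyList() : nodes;
    }

    /** Builds the two-cluster setup used in this article. */
    static VgroupMappingSketch demo() {
        VgroupMappingSketch s = new VgroupMappingSketch();
        s.clusterNodes.put("SH", Collections.singletonList("127.0.0.1:8091"));
        s.clusterNodes.put("HZ", Collections.singletonList("127.0.0.1:8092"));
        s.vgroupMapping.put("seata-demo", "SH");
        return s;
    }

    public static void main(String[] args) {
        VgroupMappingSketch s = demo();
        System.out.println("before failover: " + s.resolve("seata-demo"));
        // Failover: only the mapping changes; the microservices themselves are untouched.
        s.vgroupMapping.put("seata-demo", "HZ");
        System.out.println("after failover:  " + s.resolve("seata-demo"));
    }
}
```

Keeping the mapping in a shared configuration center is what lets a single change redirect every microservice at once.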

 

2.2, implementing high availability

1. Simulate a TC cluster with remote disaster recovery

We plan to start two seata TC service nodes:

node name   IP address   port   cluster name
seata       127.0.0.1    8091   SH
seata2      127.0.0.1    8092   HZ

We already started one seata service earlier, on port 8091 with cluster name SH.

Now copy the seata directory and name the copy seata2.

Modify seata2/conf/registry.conf as follows:

registry {
  # Registry type for the TC service; nacos here, but eureka, zookeeper, etc. also work
  type = "nacos"

  nacos {
    # Service name under which the seata TC service registers in nacos; customizable
    application = "seata-tc-server"
    serverAddr = "127.0.0.1:8848"
    group = "DEFAULT_GROUP"
    namespace = ""
    cluster = "HZ"
    username = "nacos"
    password = "nacos"
  }
}

config {
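  # How the TC server reads its configuration; here from the nacos config center, so that a TC cluster can share one configuration
  type = "nacos"
  # nacos address and related settings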
  nacos {
    serverAddr = "127.0.0.1:8848"
    namespace = ""
    group = "SEATA_GROUP"
    username = "nacos"
    password = "nacos"
    dataId = "seataServer.properties"
  }
}

Enter the seata2/bin directory, and then run the command:

seata-server.bat -p 8092

Open the nacos console to view the service list:

 

Click to view details:

 

2. Configure the transaction group mapping to nacos

Next, we need to configure the mapping relationship between tx-service-group and cluster to the nacos configuration center.

Create a new configuration:

 

The content of the configuration is as follows:

# Transaction group mapping
service.vgroupMapping.seata-demo=SH

service.enableDegrade=false
service.disableGlobalTransaction=false
# Communication settings for the TC service
transport.type=TCP
transport.server=NIO
transport.heartbeat=true
transport.enableClientBatchSendRequest=false
transport.threadFactory.bossThreadPrefix=NettyBoss
transport.threadFactory.workerThreadPrefix=NettyServerNIOWorker
transport.threadFactory.serverExecutorThreadPrefix=NettyServerBizHandler
transport.threadFactory.shareBossWorker=false
transport.threadFactory.clientSelectorThreadPrefix=NettyClientSelector
transport.threadFactory.clientSelectorThreadSize=1
transport.threadFactory.clientWorkerThreadPrefix=NettyClientWorkerThread
transport.threadFactory.bossThreadSize=1
transport.threadFactory.workerThreadSize=default
transport.shutdown.wait=3
# RM settings
client.rm.asyncCommitBufferLimit=10000
client.rm.lock.retryInterval=10
client.rm.lock.retryTimes=30
client.rm.lock.retryPolicyBranchRollbackOnConflict=true
client.rm.reportRetryCount=5
client.rm.tableMetaCheckEnable=false
client.rm.tableMetaCheckerInterval=60000
client.rm.sqlParserType=druid
client.rm.reportSuccessEnable=false
client.rm.sagaBranchRegisterEnable=false
# TM settings
client.tm.commitRetryCount=5
client.tm.rollbackRetryCount=5
client.tm.defaultGlobalTransactionTimeout=60000
client.tm.degradeCheck=false
client.tm.degradeCheckAllowTimes=10
client.tm.degradeCheckPeriod=2000

# Undo log settings
client.undo.dataValidation=true
client.undo.logSerialization=jackson
client.undo.onlyCareUpdateColumns=true
client.undo.logTable=undo_log
client.undo.compress.enable=true
client.undo.compress.type=zip
client.undo.compress.threshold=64k
client.log.exceptionRate=100

 

3. The microservice reads the nacos configuration

Next, you need to modify the application.yml file of each microservice to let the microservice read the client.properties file in nacos:

seata:
  config:
    type: nacos
    nacos:
      server-addr: 127.0.0.1:8848
      username: nacos
      password: nacos
      group: SEATA_GROUP
      data-id: client.properties

Restart the microservices. Whether a microservice connects to the TC's SH cluster or its HZ cluster is now determined by client.properties in nacos.

Origin blog.csdn.net/a1404359447/article/details/130488781