TCC Transaction Principles

Original: https://yq.aliyun.com/articles/682871

This article describes the principles of TCC and analyzes them from the perspective of how the code is implemented; it is not a guide to specific usage. The analysis is based on the open source tcc-transaction project on GitHub: https://github.com/changmingxie/tcc-transaction. There are other TCC projects on GitHub, but their principles are similar, so they are not covered here; interested readers can read their source code themselves.

I   TCC architecture

1   Architecture

 

[Figure: TCC architecture]

As shown in the figure:

 - A complete business activity consists of one master business service and several slave business services.

 - The master business service is responsible for initiating and completing the whole business activity.

 - The slave business services provide TCC-type business operations.

 - The business activity manager controls the consistency of the business activity: it registers the operations in the activity, invokes the confirm operation of every TCC-type operation when the activity is committed, and invokes the cancel operation of every one when the activity is cancelled.
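The roles listed above can be sketched in a few lines of Java. This is only an illustration under assumptions: the names TccOperation, BusinessActivityManager and ActivityManagerDemo are hypothetical and are not taken from tcc-transaction, and error handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Plays the master business service: registers operations, then commits or cancels.
class ActivityManagerDemo {
    static List<String> run(boolean succeed) {
        List<String> log = new ArrayList<>();
        BusinessActivityManager manager = new BusinessActivityManager();
        for (String name : new String[] {"debit", "credit"}) {
            // Each anonymous class stands in for one slave business service.
            manager.register(new TccOperation() {
                public void confirm() { log.add(name + ":confirm"); }
                public void cancel() { log.add(name + ":cancel"); }
            });
        }
        if (succeed) {
            manager.commit();
        } else {
            manager.cancel();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}

// Hypothetical contract for a TCC-type operation offered by a slave service.
interface TccOperation {
    void confirm();
    void cancel();
}

// Hypothetical business activity manager: remembers every registered operation
// and drives all of them to the same outcome.
class BusinessActivityManager {
    private final List<TccOperation> registered = new ArrayList<>();

    void register(TccOperation operation) {
        registered.add(operation);
    }

    void commit() {
        for (TccOperation operation : registered) {
            operation.confirm();
        }
    }

    void cancel() {
        for (TccOperation operation : registered) {
            operation.cancel();
        }
    }
}
```

The point of the sketch is that the manager, not the individual services, decides whether every registered operation is confirmed or cancelled.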

TCC is similar to 2PC/3PC, but TCC's transaction control is implemented entirely at the business-code level, whereas 2PC/3PC operates at the resource level.

2   Specification of each stage

A TCC transaction actually consists of two main stages: the Try stage and the Confirm/Cancel stage.

As the logical model of TCC shows, its core idea is that the Try stage checks and reserves resources so that they are guaranteed to be available in the Confirm stage, which maximizes the chance that the Confirm stage succeeds.

1)     Try: attempt to execute the business

Complete all business checks (consistency)

Reserve the required business resources (quasi-isolation)

2)     Confirm: confirm execution of the business

Actually execute the business

Perform no business checks

Use only the business resources reserved in the Try stage

The Confirm operation must be idempotent

3)     Cancel: cancel execution of the business

Release the business resources reserved in the Try stage

The Cancel operation must be idempotent

 

II   An example

The following example illustrates a TCC transaction. Tom needs to transfer 10 yuan to Tracy; how is this done with a TCC transaction?

1   The main problems

Consider the problems the transfer process has to deal with:

 - Tom's account balance must be at least 10 yuan.

 - The account balance must remain correct. For example, suppose Tom has only 10 yuan but transfers 10 yuan to both Tracy and Angle at the same time; or, while Tom is transferring money to someone else, he also receives money from others. In neither case may the account balance become inconsistent (Tracy's account faces similar problems).

 - When concurrency is high, performance must still be guaranteed.

 

2   How TCC solves the problem

[Figure: splitting one large transaction into smaller transactions]

TCC solves a distributed transaction by splitting one large transaction into several smaller ones.

 

3   TCC processing logic

When a TCC transaction is used, the pseudo-code is as follows:

@Compensable(confirmMethod = "transferConfirm", cancelMethod = "transferCancel")
@Transactional
public void transferTry(long fromAccountId, long toAccountId, int amount) {
    // check Tom's account
    // lock Tom's account
    // lock Tracy's account
}

@Transactional
public void transferConfirm(long fromAccountId, long toAccountId, int amount) {
    // Tom's account: -10 yuan
    // Tracy's account: +10 yuan
}

@Transactional
public void transferCancel(long fromAccountId, long toAccountId, int amount) {
    // unlock Tom's account
    // unlock Tracy's account
}

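The logic is shown in the figure below:

[Figure: Try/Confirm/Cancel processing logic]

The Try logic must make sure that Tom's balance is sufficient and lock the resources to be used (the accounts of Tom and Tracy). If this step succeeds (no exception is thrown), the Confirm method is executed; if it fails, the Cancel method is executed. Note that Confirm and Cancel must be idempotent.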
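To make the three methods concrete, here is a minimal in-memory sketch of the transfer. It rests on several assumptions that are not in the original: the class TransferService, the txId parameter and the per-transfer state are hypothetical, and the "lock" is modelled as freezing the amount on the payer's account rather than locking whole accounts. Idempotency is achieved by recording the state of each transfer and ignoring repeated calls.

```java
import java.util.HashMap;
import java.util.Map;

class TransferTccDemo {
    public static void main(String[] args) {
        TransferService service = new TransferService();
        service.open("tom", 10);
        service.open("tracy", 0);
        service.transferTry("tx-1", "tom", "tracy", 10);
        service.transferConfirm("tx-1");
        service.transferConfirm("tx-1"); // repeated confirm has no further effect
        System.out.println("tom=" + service.balance("tom") + ", tracy=" + service.balance("tracy"));
    }
}

// Hypothetical service; balances live in memory instead of a database.
class TransferService {
    private enum State { TRIED, CONFIRMED, CANCELLED }

    private static final class Transfer {
        String from;
        String to;
        int amount;
        State state;
    }

    private final Map<String, Integer> balances = new HashMap<>();
    private final Map<String, Integer> frozen = new HashMap<>();
    private final Map<String, Transfer> transfers = new HashMap<>();

    void open(String account, int balance) {
        balances.put(account, balance);
        frozen.put(account, 0);
    }

    int balance(String account) { return balances.get(account); }

    int frozen(String account) { return frozen.get(account); }

    // Try: run the business check and reserve (freeze) the amount.
    synchronized void transferTry(String txId, String from, String to, int amount) {
        if (transfers.containsKey(txId)) {
            return; // this transfer was already tried
        }
        int available = balances.get(from) - frozen.get(from);
        if (available < amount) {
            throw new IllegalStateException("insufficient balance");
        }
        frozen.merge(from, amount, Integer::sum);
        Transfer transfer = new Transfer();
        transfer.from = from;
        transfer.to = to;
        transfer.amount = amount;
        transfer.state = State.TRIED;
        transfers.put(txId, transfer);
    }

    // Confirm: no business checks, only the reserved amount is used.
    synchronized void transferConfirm(String txId) {
        Transfer transfer = transfers.get(txId);
        if (transfer == null || transfer.state != State.TRIED) {
            return; // already confirmed or cancelled: idempotent no-op
        }
        frozen.merge(transfer.from, -transfer.amount, Integer::sum);
        balances.merge(transfer.from, -transfer.amount, Integer::sum);
        balances.merge(transfer.to, transfer.amount, Integer::sum);
        transfer.state = State.CONFIRMED;
    }

    // Cancel: release the reservation made by Try.
    synchronized void transferCancel(String txId) {
        Transfer transfer = transfers.get(txId);
        if (transfer == null || transfer.state != State.TRIED) {
            return; // nothing left to release: idempotent no-op
        }
        frozen.merge(transfer.from, -transfer.amount, Integer::sum);
        transfer.state = State.CANCELLED;
    }
}
```

Because the available balance is computed as balance minus frozen amount, a second concurrent transfer of 10 yuan from Tom is rejected in the Try stage while the first one is still pending.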

 

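III   Principle analysis

In the TCC transaction above, the transfer actually involves six operations, and in a real project any of these steps may fail. When a step fails, how does the TCC framework keep the data consistent?

1   Overall flow chart

The following is the TCC processing flow chart. It ensures data consistency in both the try stage and the confirm/cancel stage.

[Figure: TCC processing flow chart]

As the flow chart shows, TCC relies on a transaction record. The record is created before the TCC transaction starts, and its status is updated at every step of the TCC, so that it is always known which step the transaction has reached. When an execution fails and is retried, the same record is used to determine the current stage and decide which operation to perform.

Because of this failure-and-retry logic, the cancel and confirm methods must be idempotent. In fact, in distributed development, every write operation should be idempotent.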

 

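2   Core processing logic of TCC

Because the @Compensable annotation is used, a call to transferTry first enters a proxy class. In TCC, two interceptors apply to methods annotated with @Compensable: CompensableTransactionInterceptor (where the main TCC logic is implemented) and ResourceCoordinatorInterceptor (which handles resource-related matters).

CompensableTransactionInterceptor#interceptCompensableMethod is the core processing logic of TCC. It wraps the request data and prepares for the TCC transaction. The source is as follows: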

public Object interceptCompensableMethod(ProceedingJoinPoint pjp) throws Throwable {
    Method method = CompensableMethodUtils.getCompensableMethod(pjp);
    Compensable compensable = method.getAnnotation(Compensable.class);
    Propagation propagation = compensable.propagation();
    TransactionContext transactionContext = FactoryBuilder.factoryOf(compensable.transactionContextEditor()).getInstance().get(pjp.getTarget(), method, pjp.getArgs());
    boolean asyncConfirm = compensable.asyncConfirm();
    boolean asyncCancel = compensable.asyncCancel();
    boolean isTransactionActive = transactionManager.isTransactionActive();
    if (!TransactionUtils.isLegalTransactionContext(isTransactionActive, propagation, transactionContext)) {
        throw new SystemException("no active compensable transaction while propagation is mandatory for method " + method.getName());
    }
    MethodType methodType = CompensableMethodUtils.calculateMethodType(propagation, isTransactionActive, transactionContext);
    switch (methodType) {
        case ROOT:
            return rootMethodProceed(pjp, asyncConfirm, asyncCancel);
        case PROVIDER:
            return providerMethodProceed(pjp, transactionContext, asyncConfirm, asyncCancel);
        default:
            return pjp.proceed();
    }
}

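rootMethodProceed is the core processing logic of TCC; it drives the execution of Try, Confirm and Cancel. The source is as follows; pay particular attention to the calls on transactionManager: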

private Object rootMethodProceed(ProceedingJoinPoint pjp, boolean asyncConfirm, boolean asyncCancel) throws Throwable {
    Object returnValue = null;
    Transaction transaction = null;
    try {
        // Create and register the transaction record before the try method runs.
        transaction = transactionManager.begin();
        try {
            // Execute the @Compensable method itself (the try stage).
            returnValue = pjp.proceed();
        } catch (Throwable tryingException) {
            if (isDelayCancelException(tryingException)) {
                transactionManager.syncTransaction();
            } else {
                logger.warn(String.format("compensable transaction trying failed. transaction content:%s", JSON.toJSONString(transaction)), tryingException);
                // The try stage failed: run the cancel stage.
                transactionManager.rollback(asyncCancel);
            }
            throw tryingException;
        }
        // The try stage succeeded: run the confirm stage.
        transactionManager.commit(asyncConfirm);
    } finally {
        transactionManager.cleanAfterCompletion(transaction);
    }
    return returnValue;
}

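In this method, the method annotated with @Compensable (try) is executed first. If it throws an exception, the rollback method (cancel) is executed; otherwise the commit method (confirm) is executed.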
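The order of calls in rootMethodProceed can be checked with a small stand-alone sketch. It is a simplification under assumptions: RecordingManager is a hypothetical stand-in that only records which calls were made, the try method is a plain Runnable, and the delay-cancel branch of the real code is left out.

```java
import java.util.ArrayList;
import java.util.List;

class RootFlowDemo {
    // Hypothetical stand-in for the transaction manager: it only logs the calls.
    static final class RecordingManager {
        final List<String> calls = new ArrayList<>();
        void begin() { calls.add("begin"); }
        void commit() { calls.add("commit"); }
        void rollback() { calls.add("rollback"); }
        void clean() { calls.add("clean"); }
    }

    // Same shape as rootMethodProceed: begin, try, then commit or rollback, always clean.
    static void proceed(RecordingManager manager, Runnable tryMethod) {
        try {
            manager.begin();
            try {
                tryMethod.run();
            } catch (RuntimeException tryingException) {
                manager.rollback();
                throw tryingException; // the caller still sees the original failure
            }
            manager.commit();
        } finally {
            manager.clean();
        }
    }

    public static void main(String[] args) {
        RecordingManager ok = new RecordingManager();
        proceed(ok, () -> { });
        System.out.println(ok.calls);

        RecordingManager failed = new RecordingManager();
        try {
            proceed(failed, () -> { throw new IllegalStateException("try failed"); });
        } catch (IllegalStateException expected) {
            // rethrown after rollback, as in the original method
        }
        System.out.println(failed.calls);
    }
}
```

A successful try leads to begin, commit, clean; a failing try leads to begin, rollback, clean, and the exception is still propagated to the caller.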

 

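3   Exception handling flow

Exceptions may occur during try, cancel or confirm, so when any step fails the system must be able to either return to the initial (not transferred) state or reach the final (transferred) state. The following discusses how TCC guarantees consistency at the code level.

1)  Begin

In the code above, before try is executed, TCC opens a transaction through transactionManager.begin(). The core of the begin method is:

 - Create a record that tracks which step the transaction has reached.

 - Register the current transaction with the TransactionManager, so that this Transaction can be used to commit or roll back during confirm and cancel.

The TransactionManager#begin method: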

public Transaction begin() {
    Transaction transaction = new Transaction(TransactionType.ROOT);
    transactionRepository.create(transaction);
    registerTransaction(transaction);
    return transaction;
}

CachableTransactionRepository#create creates the record that identifies the step the transaction has reached, then puts the transaction into the cache. The code is as follows:

@Override
public int create(Transaction transaction) {
    int result = doCreate(transaction);
    if (result > 0) {
        putToCache(transaction);
    }
    return result;
}

CachableTransactionRepository has several subclasses (FileSystemTransactionRepository, JdbcTransactionRepository, RedisTransactionRepository, ZooKeeperTransactionRepository), which store the record in a file, a database, Redis or ZooKeeper respectively.
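The relationship between create and doCreate is a template method: the base class owns the caching, and each subclass only decides where the record is stored. The sketch below illustrates that split with hypothetical classes (CachingRepository, InMemoryRepository) and plain strings in place of Transaction objects; it is not the library's code.

```java
import java.util.HashMap;
import java.util.Map;

class RepositoryDemo {
    public static void main(String[] args) {
        CachingRepository repository = new InMemoryRepository();
        System.out.println(repository.create("xid-1", "TRYING"));
        System.out.println(repository.isCached("xid-1"));
        System.out.println(repository.create("xid-1", "TRYING")); // duplicate id is not stored again
    }
}

// Hypothetical base class: caching lives here, storage is delegated to subclasses.
abstract class CachingRepository {
    private final Map<String, String> cache = new HashMap<>();

    // Returns the number of records written, mirroring the int result of doCreate.
    protected abstract int doCreate(String xid, String record);

    int create(String xid, String record) {
        int result = doCreate(xid, record);
        if (result > 0) {
            cache.put(xid, record); // cache only what was actually persisted
        }
        return result;
    }

    boolean isCached(String xid) {
        return cache.containsKey(xid);
    }
}

// Hypothetical storage backend; a file, JDBC, Redis or ZooKeeper backend would
// implement the same single method.
class InMemoryRepository extends CachingRepository {
    private final Map<String, String> store = new HashMap<>();

    @Override
    protected int doCreate(String xid, String record) {
        return store.putIfAbsent(xid, record) == null ? 1 : 0;
    }
}
```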

 

2)  Commit/rollback

Both commit and rollback contain the following line, which updates the transaction status:

transactionRepository.update(transaction);

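This line marks the status of the current transaction as commit/rollback. If the update fails, an exception is thrown and the subsequent confirm/cancel method is not executed; only if it succeeds is the confirm/cancel method executed.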
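The effect of updating the status first can be shown with a small sketch. All names here (TxRecordStore, SimpleTransaction, the failWrites switch) are hypothetical and only model the ordering described above: the status is persisted before the confirm logic runs, so a failed write means confirm is never called.

```java
class CommitOrderDemo {
    public static void main(String[] args) {
        TxRecordStore store = new TxRecordStore();
        int[] confirmCalls = {0};
        SimpleTransaction tx = new SimpleTransaction(store, () -> confirmCalls[0]++);

        store.failWrites = true;
        try {
            tx.commit();
        } catch (IllegalStateException e) {
            System.out.println("status update failed, confirm skipped");
        }
        System.out.println(store.status + " " + confirmCalls[0]);

        store.failWrites = false;
        tx.commit();
        System.out.println(store.status + " " + confirmCalls[0]);
    }
}

// Hypothetical record store; failWrites simulates an unavailable repository.
class TxRecordStore {
    String status = "TRYING";
    boolean failWrites = false;

    void update(String newStatus) {
        if (failWrites) {
            throw new IllegalStateException("store unavailable");
        }
        status = newStatus;
    }
}

// Hypothetical transaction wrapper around one confirm action.
class SimpleTransaction {
    private final TxRecordStore store;
    private final Runnable confirm;

    SimpleTransaction(TxRecordStore store, Runnable confirm) {
        this.store = store;
        this.confirm = confirm;
    }

    void commit() {
        store.update("CONFIRMING"); // if this throws, confirm is never reached
        confirm.run();
    }
}
```

If the confirm logic itself fails after the status was written, the record stays in the CONFIRMING state, which is what allows a later retry to know that confirm must be run again.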

 

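3)  Scheduler

If try/commit/rollback fails, the request (the transferTry method) returns immediately. TCC introduces a retry mechanism here: a scheduled job queries the failed tasks and then performs compensation. For details, see:

TransactionRecovery#startRecover queries all abnormal transactions and processes them one by one. Note that retries are limited by a maximum retry count; once it is exceeded, the transaction is ignored.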

public void startRecover() {
    List<Transaction> transactions = loadErrorTransactions();
    recoverErrorTransactions(transactions);
}

private List<Transaction> loadErrorTransactions() {
    long currentTimeInMillis = Calendar.getInstance().getTimeInMillis();
    TransactionRepository transactionRepository = transactionConfigurator.getTransactionRepository();
    RecoverConfig recoverConfig = transactionConfigurator.getRecoverConfig();
    return transactionRepository.findAllUnmodifiedSince(new Date(currentTimeInMillis - recoverConfig.getRecoverDuration() * 1000));
}

private void recoverErrorTransactions(List<Transaction> transactions) {
    for (Transaction transaction : transactions) {
        if (transaction.getRetriedCount() > transactionConfigurator.getRecoverConfig().getMaxRetryCount()) {
            logger.error(String.format("recover failed with max retry count,will not try again. txid:%s, status:%s,retried count:%d,transaction content:%s", transaction.getXid(), transaction.getStatus().getId(), transaction.getRetriedCount(), JSON.toJSONString(transaction)));
            continue;
        }
        if (transaction.getTransactionType().equals(TransactionType.BRANCH)
                && (transaction.getCreateTime().getTime() +
                transactionConfigurator.getRecoverConfig().getMaxRetryCount() *
                        transactionConfigurator.getRecoverConfig().getRecoverDuration() * 1000
                > System.currentTimeMillis())) {
            continue;
        }
        try {
            transaction.addRetriedCount();
            if (transaction.getStatus().equals(TransactionStatus.CONFIRMING)) {
                transaction.changeStatus(TransactionStatus.CONFIRMING);
                transactionConfigurator.getTransactionRepository().update(transaction);
                transaction.commit();
                transactionConfigurator.getTransactionRepository().delete(transaction);
            } else if (transaction.getStatus().equals(TransactionStatus.CANCELLING)
                    || transaction.getTransactionType().equals(TransactionType.ROOT)) {
                transaction.changeStatus(TransactionStatus.CANCELLING);
                transactionConfigurator.getTransactionRepository().update(transaction);
                transaction.rollback();
                transactionConfigurator.getTransactionRepository().delete(transaction);
            }
        } catch (Throwable throwable) {
            if (throwable instanceof OptimisticLockException
                    || ExceptionUtils.getRootCause(throwable) instanceof OptimisticLockException) {
                logger.warn(String.format("optimisticLockException happened while recover. txid:%s, status:%s,retried count:%d,transaction content:%s", transaction.getXid(), transaction.getStatus().getId(), transaction.getRetriedCount(), JSON.toJSONString(transaction)), throwable);
            } else {
                logger.error(String.format("recover failed, txid:%s, status:%s,retried count:%d,transaction content:%s", transaction.getXid(), transaction.getStatus().getId(), transaction.getRetriedCount(), JSON.toJSONString(transaction)), throwable);
            }
        }
    }
}
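Stripped of persistence and logging, the decision made for each transaction in the recovery loop can be sketched as follows. The classes PendingTx and RecoveryJob are hypothetical, statuses are plain strings, and the extra timing rule for BRANCH transactions is omitted, so this is an illustration of the retry limit and the status-based dispatch only.

```java
import java.util.ArrayList;
import java.util.List;

class RecoveryDemo {
    public static void main(String[] args) {
        List<PendingTx> pending = new ArrayList<>();
        pending.add(new PendingTx("a", "CONFIRMING", 0));
        pending.add(new PendingTx("b", "CANCELLING", 3));
        System.out.println(new RecoveryJob(2).recover(pending));
    }
}

// Hypothetical view of one stored transaction record.
class PendingTx {
    final String xid;
    final String status; // "CONFIRMING" or "CANCELLING" in this sketch
    int retriedCount;

    PendingTx(String xid, String status, int retriedCount) {
        this.xid = xid;
        this.status = status;
        this.retriedCount = retriedCount;
    }
}

// Hypothetical recovery job: returns the action chosen for each transaction.
class RecoveryJob {
    private final int maxRetryCount;

    RecoveryJob(int maxRetryCount) {
        this.maxRetryCount = maxRetryCount;
    }

    List<String> recover(List<PendingTx> pending) {
        List<String> actions = new ArrayList<>();
        for (PendingTx tx : pending) {
            if (tx.retriedCount > maxRetryCount) {
                actions.add(tx.xid + ":skipped"); // retry budget used up, leave for manual handling
                continue;
            }
            tx.retriedCount++;
            // The stored status decides which stage is replayed.
            actions.add(tx.xid + (tx.status.equals("CONFIRMING") ? ":confirm" : ":cancel"));
        }
        return actions;
    }
}
```

Since the same transaction may be replayed several times before it succeeds, this is the place where the idempotency of confirm and cancel is actually relied upon.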

 

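IV   Advantages and disadvantages of TCC

Among current solutions to distributed transactions, the most stable and reliable are TCC, 2PC/3PC and eventual consistency. Each of the three has its strengths and weaknesses and its own applicable scenarios. Below we briefly discuss the main advantages and disadvantages of TCC.

1   Main advantages of TCC

Because the Try stage checks and reserves resources, the confirm stage can generally succeed.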

Resources are locked in business code rather than in the database, so the DB is not blocked and its performance is not affected.

TCC has good real-time behavior: all DB write operations are concentrated in confirm, and the result of the write is returned in real time (on failure there is a slight delay, because the retry depends on when the scheduled job runs).

2   Main disadvantages of TCC

As the source analysis shows, managing the transaction state produces several additional DB operations, which costs some performance and lengthens the whole TCC transaction.

The more parties a transaction involves, the more complex the Try, Confirm and Cancel code becomes and the lower its reusability (this is mainly in comparison with eventual-consistency solutions). In addition, the more parties are involved, the longer these stages take and the higher the probability of failure.

 

V   Related documents

For the usage and source documentation of tcc-transaction, see: https://github.com/changmingxie/tcc-transaction

For eventual-consistency solutions, see "RocketMQ practice".

Origin www.cnblogs.com/smileIce/p/11221610.html