A Piece of Code Wrapped in try-catch Almost Cost Me My Job!


A piece of code wrapped in try-catch ran stably in production for 200 days, then suddenly threw an exception, and that exception rolled back a production transaction.




What happened in between? How can we avoid transaction problems in day-to-day project work? Just then, the boss walked over holding the "XX Company's Notice on Optimizing Thirty-Year-Old Employees"...


01


Part of our production data was lost because of a strange transaction rollback. The cause was a piece of code wrapped in try-catch, code that had been running in production for 200 days and was so stable that we had forgotten about it.


No one expected it to come back into view this way, announcing its existence!


Xiao Jiujiu is a 19-year-old programmer, as sunny and handsome as every programmer. (Believe it or not; I certainly don't. I made this up just to get today's article going, and "a programmer without hair" makes a good opening.)


When he told me that a piece of try-catch code had caused a production transaction to roll back, I said to him, gently and patiently: "Go away, can't you see I'm busy?" He then handed me a piece of code, and his shifty yet sincere eyes told me that what he said was true.


02


Let's take a look at the code that caused the rollback of the production line transaction, similar to the following:
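@Transactional
public void main() {
    // Assume several user operations that need transaction control
    methodA();

    try {
        orderService.methodB();
    } catch (Exception e) {
        // An order failure must not affect this method; do not roll back.
        // Exception handling omitted
    }
    userOtherProcess();
}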

methodA requires transaction control, while methodB must not affect methodA's transaction no matter what exception it hits, which is why the try-catch was added.


Some of you may have had the same first reaction as I did: was it an exception in userOtherProcess, the last method, that rolled back methodA's transaction?


Xiao Jiujiu told me that it really was methodB. This code had been rigorously tested, and no one had touched it for 200 days.


Some of you may already have guessed the cause, but let me keep the suspense a little longer, because the most important part of this story is how the pitfall was dug, step by step.


To describe this more vividly, I drew a picture. A red background means the method runs under transaction control; a white background means the method has no transaction:

image.png

At the beginning, as the code shows, methodA is transactional, while methodB has no transaction and is wrapped in try-catch. Everything runs perfectly.


After a while came stage two: a requirement change added methodC, whose business logic also relies on methodB, and it went live without a hitch.

image.png

After another while came stage three: the business that relies on methodC changed again, some logic was added to methodB, and methodB now required transaction control.


The change was evaluated as having no effect on methodA, so after thorough testing it went live perfectly once more. But this is when the hidden bomb was planted.


By now you have probably guessed the reason, and you guessed right. One day, when methodA called methodB, methodB threw an exception. Because methodB had joined (inherited) methodA's transaction, the exception still caused methodA's transaction to roll back, even though it was caught by the try-catch.


If that is still unclear, take a look at the picture below:

image.png

We can think of the transaction control mechanism as the long red room in the picture above. The room has a guard. He is responsible for beginning and committing transactions, and his other important job is to watch for exceptions.


As soon as he spots a RuntimeException, he rolls back the entire transaction. Let's give him a title and call him the "Supervisor".


Let's look at the code as it stood at stage three, from the top. There is an @Transactional annotation on the method, so the supervisor opened the door of the red room and let methodA in.


Then methodB came along. It also declared a transaction, one inherited from its caller, so the supervisor placed methodB in the same room.


Although methodB's exception was wrapped in try-catch, it could not escape the supervisor's notice, so he pressed the button to roll back the transaction.
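
The supervisor story can be approximated with a few lines of plain Java. This is only a toy model, not Spring's implementation: ToyTx, ToyTxManager, ToyRollbackDemo and their methods are hypothetical names made up for this sketch. It assumes that a transactional method joins the caller's transaction when one exists, and that an exception crossing a transactional boundary marks the shared transaction rollback-only.

```java
// Toy model only: every name below is hypothetical, none of it is Spring code.
final class ToyRollbackDemo {
    /** Simulates main(); the flag says whether methodB is itself transactional. */
    static String run(boolean methodBIsTransactional) {
        ToyTxManager tm = new ToyTxManager();
        return tm.runRequired(() -> {                 // main() is @Transactional
            try {
                Runnable methodB = () -> {
                    throw new IllegalStateException("order failed");
                };
                if (methodBIsTransactional) {
                    tm.runRequired(methodB);          // stage three: methodB joins the transaction
                } else {
                    methodB.run();                    // stage one: methodB has no transaction
                }
            } catch (RuntimeException ignored) {
                // Swallowed, like the try-catch in main().
            }
        });
    }

    public static void main(String[] args) {
        System.out.println("stage one:   " + run(false));
        System.out.println("stage three: " + run(true));
    }
}

final class ToyTx {
    boolean rollbackOnly;   // set when an exception crosses a transactional boundary
}

final class ToyTxManager {
    private ToyTx current;  // the transaction of the current call chain, if any

    /** Runs body in a transaction, joining the current one if it exists. */
    String runRequired(Runnable body) {
        boolean owner = (current == null);
        if (owner) {
            current = new ToyTx();
        }
        ToyTx tx = current;
        try {
            body.run();
        } catch (RuntimeException ex) {
            tx.rollbackOnly = true;     // the "supervisor" has seen the exception
            if (!owner) {
                throw ex;               // a joined method rethrows to its caller
            }
            // Simplification: the owner reports the rollback instead of rethrowing.
        }
        if (!owner) {
            return "JOINED";            // only the owner decides commit or rollback
        }
        current = null;
        return tx.rollbackOnly ? "ROLLED_BACK" : "COMMITTED";
    }
}
```

In this model, stage one reports COMMITTED and stage three reports ROLLED_BACK, even though main() swallows the exception in both cases: the flag was already set before the catch block ran.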


With that picture in mind, let's take a brief look at the source code:
org.springframework.transaction.UnexpectedRollbackException: Transaction rolled back because it has been marked as rollback-only
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.processRollback(AbstractPlatformTransactionManager.java:873)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:710)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:534)

The stack trace shows that the exception was thrown from the processRollback method, at line 873 of AbstractPlatformTransactionManager.


Find Usages leads to its caller, the commit method, which is clearly the transaction commit logic.
@Override
public final void commit(TransactionStatus status) throws TransactionException {
    // ... (some code removed for readability)
    if (!shouldCommitOnGlobalRollbackOnly() && defStatus.isGlobalRollbackOnly()) {
        // ... (some code removed for readability)
        processRollback(defStatus, true);
        return;
    }
    processCommit(defStatus);
}


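shouldCommitOnGlobalRollbackOnly: the default implementation returns false, meaning that if the transaction is found to be marked global rollback-only and should not be committed in that state, it is rolled back.
defStatus.isGlobalRollbackOnly(): reads the rollbackOnly flag of the ConnectionHolder held by the transaction object in DefaultTransactionStatus.

Tracing further up, we reach the TransactionAspectSupport.invokeWithinTransaction method: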

@Nullable
protected Object invokeWithinTransaction(Method method, @Nullable Class<?> targetClass,
        final InvocationCallback invocation) throws Throwable {
    // ... (some code removed for readability)
    // If this is a declarative transaction
    if (txAttr == null || !(tm instanceof CallbackPreferringPlatformTransactionManager)) {
        // Standard transaction demarcation with getTransaction and commit/rollback calls.
        TransactionInfo txInfo = createTransactionIfNecessary(tm, txAttr, joinpointIdentification);

        Object retVal;
        try {
            // This is an around advice: Invoke the next interceptor in the chain.
            // This will normally result in a target object being invoked.
            // Invoke the transactional method
            retVal = invocation.proceedWithInvocation();
        }
        catch (Throwable ex) {
            // Catch the exception; the transaction will be put into the rollback state.
            completeTransactionAfterThrowing(txInfo, ex);
            throw ex;
        }
        finally {
            cleanupTransactionInfo(txInfo);
        }
        // Commit the transaction
        commitTransactionAfterReturning(txInfo);
        return retVal;
    }

    else {
        // CallbackPreferringPlatformTransactionManager branch, omitted
    }
}

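The comments describe the whole flow, so I won't list the rest of the source. As we guessed, once Spring catches the exception, the transaction is marked for global rollback.


When the outermost transactional method then performs its commit, the transaction is already in the rollback state. Spring decides that the transaction should be rolled back rather than committed, so it throws the rollback-only exception.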


03


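Another fairly typical transaction problem: within the same class, methodA has no transaction, while methodB declares one (declaratively).


In that case, the transaction does not take effect when methodA calls methodB:

image.png

As the picture above shows, let's again imagine AOP as a long rectangular room. Because methodA has no transaction, the room has already been marked as transaction-free and unguarded. methodB is marked as transactional, but the marking clearly has no effect.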
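
A little background on why the room is unguarded: by default Spring applies declarative transactions through a proxy that wraps the bean, so a call that methodA makes to methodB on `this` never passes through that proxy. The plain-Java sketch below reproduces the effect with a JDK dynamic proxy. It is an illustration only; DemoService, DemoServiceImpl and the list standing in for "begin transaction" are hypothetical names, not Spring classes.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class SelfInvocationDemo {
    /** Hypothetical service; imagine that only methodB is marked transactional. */
    interface DemoService {
        void methodA();
        void methodB();
    }

    static final class DemoServiceImpl implements DemoService {
        @Override
        public void methodA() {
            methodB();   // a plain this.methodB() call: it never reaches the proxy
        }

        @Override
        public void methodB() {
            // business logic would go here
        }
    }

    /** Returns the methods for which the proxy "opened a transaction". */
    static List<String> intercepted(boolean callThroughA) {
        List<String> txStarted = new ArrayList<>();
        DemoService target = new DemoServiceImpl();
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("methodB")) {
                txStarted.add(method.getName());   // stands in for "begin transaction"
            }
            return method.invoke(target, args);
        };
        DemoService service = (DemoService) Proxy.newProxyInstance(
                DemoService.class.getClassLoader(),
                new Class<?>[] {DemoService.class},
                handler);
        if (callThroughA) {
            service.methodA();   // outside caller -> proxy -> methodA -> this.methodB()
        } else {
            service.methodB();   // outside caller -> proxy -> methodB
        }
        return txStarted;
    }

    public static void main(String[] args) {
        System.out.println("via methodA: " + intercepted(true));
        System.out.println("direct call: " + intercepted(false));
    }
}
```

Going through methodA, the list stays empty because the interceptor only ever sees methodA. Calling methodB directly on the proxy records it.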


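Next, let's review the available transaction propagation settings:

  • REQUIRED: supports the current transaction; if there is none, creates a new one. This is the most common choice.

  • REQUIRES_NEW: creates a new transaction; if a transaction already exists, suspends it.

  • SUPPORTS: supports the current transaction; if there is none, executes non-transactionally.

  • MANDATORY: supports the current transaction; if there is none, throws an exception.

  • NEVER: executes non-transactionally; if a transaction exists, throws an exception.

  • NOT_SUPPORTED: executes non-transactionally; if a transaction exists, suspends it.

  • NESTED: supports the current transaction; if one exists, executes in a nested transaction; if there is none, creates a new one.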
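
The list above can also be read as a small decision table. The sketch below simply restates it in plain Java; the Propagation enum here is a local stand-in that mirrors the setting names, and decide is a hypothetical helper for illustration, not part of Spring.

```java
final class PropagationDemo {
    /** Local stand-in that mirrors the names of the propagation settings. */
    enum Propagation { REQUIRED, REQUIRES_NEW, SUPPORTS, MANDATORY, NEVER, NOT_SUPPORTED, NESTED }

    /** Describes what happens for a setting, given whether a transaction already exists. */
    static String decide(Propagation p, boolean txExists) {
        switch (p) {
            case REQUIRED:
                return txExists ? "join existing transaction" : "start new transaction";
            case REQUIRES_NEW:
                return txExists ? "suspend existing, start new transaction" : "start new transaction";
            case SUPPORTS:
                return txExists ? "join existing transaction" : "run without transaction";
            case MANDATORY:
                return txExists ? "join existing transaction" : "throw exception";
            case NEVER:
                return txExists ? "throw exception" : "run without transaction";
            case NOT_SUPPORTED:
                return txExists ? "suspend existing, run without transaction" : "run without transaction";
            case NESTED:
                return txExists ? "run in nested transaction" : "start new transaction";
            default:
                throw new IllegalArgumentException("unknown setting: " + p);
        }
    }

    public static void main(String[] args) {
        for (Propagation p : Propagation.values()) {
            System.out.println(p + " | existing tx: " + decide(p, true)
                    + " | no tx: " + decide(p, false));
        }
    }
}
```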


There are plenty of articles on this topic, so I won't go into detail here.


04


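Transaction problems are inherently hard to catch through testing, so let's talk about how to prevent them over the course of a project.


For example, I used to be responsible for payment and fund-processing systems. Individual transactions were large: at least 10,000+ each, typically 100,000+, and quite often 3 million in a single payment, so not even one fund error was tolerable. Fortunately, we grew fund transactions from 0 to 300 billion and still had zero fund errors.


The measures we took against possible transaction problems were: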

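  • Use development standards, documents such as a collection of production pitfalls, and training to give developers enough understanding of, and sensitivity to, transactions.

  • During system design, state explicitly for each key business scenario whether transactions are enabled and which methods are wrapped in one transaction, and have this reviewed.

  • Code review includes many specialized reviews, such as fund review and multithreading review, plus a dedicated transaction review: Is a transaction needed? Is the transaction configured correctly? Are exceptions handled?

  • Developers construct transaction failure scenarios for self-testing and cross-validation.

  • The test team takes part in system design reviews and runs transaction-related tests, for example simulating possible transaction failures by blocking requests with a firewall or manually locking tables.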


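A previous company of mine had one more practice, enforced through the development standard: the name of every transactional method starts with tx.


For example, if methodB needs a transaction, add a new txMethodB method that calls methodB. This approach completely avoids the problem above, but it is clearly rather "ugly".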
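
Here is a rough sketch of what the tx convention might look like, plus one way a team could check it automatically. Everything in it is an assumption for illustration: DemoTransactional is a local stand-in for a transaction annotation so that the sketch compiles without Spring, and OrderService, LegacyService and violations are hypothetical names. The reflective check is my own addition, not something the development standard described.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class TxNamingDemo {
    /** Local stand-in for a transaction annotation (hypothetical, not Spring's). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface DemoTransactional {}

    /** Follows the convention: methodB stays plain, txMethodB adds the transaction. */
    static class OrderService {
        public void methodB() {
            // order logic with no transaction of its own
        }

        @DemoTransactional
        public void txMethodB() {
            methodB();   // callers that need a transaction use this wrapper
        }
    }

    /** Breaks the convention: a transactional method without the tx prefix. */
    static class LegacyService {
        @DemoTransactional
        public void methodC() {
        }
    }

    /** Returns the methods whose name and annotation disagree about being transactional. */
    static List<String> violations(Class<?> type) {
        List<String> bad = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue;   // skip compiler-generated methods
            }
            boolean annotated = m.isAnnotationPresent(DemoTransactional.class);
            boolean prefixed = m.getName().startsWith("tx");
            if (annotated != prefixed) {
                bad.add(m.getName());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("OrderService:  " + violations(OrderService.class));
        System.out.println("LegacyService: " + violations(LegacyService.class));
    }
}
```

With this layout, a caller wrapped in try-catch can keep using the plain methodB, while callers that need the transaction call txMethodB, and the name alone tells a reviewer which is which.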


05


While I was talking transactions with Xiao Jiujiu, the boss walked over with a few sheets of A4 paper in his hand.


As the only 30-year-old programmer in the company, I raised my voice and said to Xiao Jiujiu: "Have you noticed that @Transactional has a configuration item called readOnly? To use this parameter, you have to start a transaction.


But if you are only reading data, why would you need a transaction at all? Why does such a contradictory configuration item exist?" Xiao Jiujiu shook his head blankly.


The boss nodded at me, went back to his office, sat down and thought for a while. He put the A4 sheets in his hand, the "XX Company's Notice on Optimizing Thirty-Year-Old Employees", at the bottom of the stack of documents in his drawer, then pulled them out again and slid them into the middle of the stack.


It looks like my programming career can last a little longer!



Origin blog.51cto.com/14410880/2545888