Handling a Seata exception (source code analysis and resolution)


Preface

Much of the time, we simply use other people's open source services and open source frameworks.

If we never dig into the lower layers of a framework to analyze it and understand how it is implemented, then when we run into problems all we can do is babble helplessly! This is a very frustrating reality. =_=

So we have to dig into good open source frameworks, learn how their lower layers work, and use source code analysis to deal with the problems we encounter.

Today I will share a problem I ran into when using Seata for distributed transactions, and how I solved it.

Solving the problem

1. Seata exception

The exception:

io.seata.core.exception.GlobalTransactionException: Could not found global transaction xid = 192.168.2.121:8091:239779337648214016, may be has finished.

The official Seata documentation has an entry for this problem:

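An example:

@GlobalTransactional(timeout=60000)
public void A() {
    B(); // remote call to service B
    // local DB operation
}

public void B() {

}

Possible causes:
1. The total execution time of A exceeds 60000 ms, so the global transaction initiates a global rollback. When method A or B then goes on to perform a DB operation and checks the global transaction status, it finds that the global transaction has already been rolled back.
2. Service B runs longer than its configured readTimeout, so an exception is returned to A, and A throws it, which rolls back the global transaction. When service B then performs its DB operation and checks the global transaction status, it finds that the global transaction has already been rolled back.
3. The clocks of the TC cluster nodes are not synchronized.

Impact: when this happens, the data is rolled back as a whole to its initial state before method A executed, so from a data consistency point of view the data is consistent overall.

Besides the cases above, if you depend on `seata-spring-boot-starter`, this error may also be caused by a bug that was fixed in version 1.5. For details, see [issues4020](https://github.com/seata/seata/issues/4020),[PR4039](https://github.com/seata/seata/pull/4039).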
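To make cause 1 concrete, here is a minimal sketch of the timing logic, assuming a toy in-memory coordinator. The class `GlobalTimeoutSketch` and its methods are hypothetical and are not Seata classes; only the error text is copied from the exception above. It shows how a branch that checks in after the timeout finds that its xid is gone.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a transaction coordinator (TC). Hypothetical, not Seata's real code.
class GlobalTimeoutSketch {
    // xid -> deadline in milliseconds (begin time + timeout)
    private final Map<String, Long> deadlines = new HashMap<>();

    // Begin a global transaction at time nowMs with the given timeout.
    void begin(String xid, long nowMs, long timeoutMs) {
        deadlines.put(xid, nowMs + timeoutMs);
    }

    // Periodic timeout check: expired transactions are rolled back and forgotten.
    void sweep(long nowMs) {
        deadlines.values().removeIf(deadline -> nowMs > deadline);
    }

    // What a branch does before a DB operation: verify the xid is still known.
    void checkActive(String xid) {
        if (!deadlines.containsKey(xid)) {
            throw new IllegalStateException(
                "Could not found global transaction xid = " + xid + ", may be has finished.");
        }
    }

    public static void main(String[] args) {
        GlobalTimeoutSketch tc = new GlobalTimeoutSketch();
        tc.begin("xid-1", 0L, 60_000L);
        tc.sweep(30_000L);
        tc.checkActive("xid-1"); // still inside the timeout, so this passes
        tc.sweep(61_000L);       // A() has now run longer than 60000 ms
        try {
            tc.checkActive("xid-1");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the business code is not at fault: the late check fails only because the coordinator has already given up on the transaction.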

Attempts to fix it:

  • 1. Raise the timeout. I multiplied the timeout in @GlobalTransactional(timeout=60000) by 10, giving 600 s, which is already very long.

In the end, this did not solve the problem.

Searching further on GitHub, I found a similar report: 3438

This fellow developer ran into the same problem. The official answer is that the clocks of the TC nodes should be synchronized.

But I run a single node, so there are no multiple TC nodes, and the server time and the database time are synchronized as well.

So this still did not solve my problem.

Our symptom is this: when we submit several times, one or two of the submissions always succeed.

This shows that there is no problem with our business code.

The problem also appeared only recently. It had never occurred before.

  • 2. Debug the Seata source code.

This is a tedious and mentally taxing process, but we still have to be patient and debug step by step.

I will skip the details here, because debugging the source code did not reveal the problem.

  • 3. Analyze the problem from experience and practice.
1. Why did everything work fine before?
2. Why does it work sometimes and fail at other times?

These two questions suggest that the cause may be environmental, so let's try switching to a clean Seata service.

How to test it?

Our microservice uses service.vgroupMapping.xxxx_tx_group=default to find the Seata service registered with Nacos.

The value default means: look for the Seata service under the cluster named default.
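The lookup described above can be sketched as follows. This is only an illustration of the idea using `java.util.Properties`; the class `VgroupMappingSketch` and the method `resolveCluster` are hypothetical names, not Seata's client code.

```java
import java.util.Properties;

// Illustrative only: mimics the mapping lookup, not Seata's actual implementation.
class VgroupMappingSketch {
    static final String PREFIX = "service.vgroupMapping.";

    // Resolve the cluster name configured for a transaction service group.
    static String resolveCluster(Properties config, String txServiceGroup) {
        String cluster = config.getProperty(PREFIX + txServiceGroup);
        if (cluster == null) {
            throw new IllegalArgumentException(
                "no cluster mapped for group " + txServiceGroup);
        }
        return cluster;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("service.vgroupMapping.xxxx_tx_group", "default");
        // The client would then ask the registry for Seata instances in this cluster.
        System.out.println(resolveCluster(config, "xxxx_tx_group"));
    }
}
```

Because the cluster is just the value of one property, pointing a microservice at a different Seata cluster is a one-line configuration change.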

We can then change this value so that it points to a new Seata service of our own, for example one registered under a cluster named lls.

Next, we let our microservice connect to the Seata service under the lls cluster and test again.

The result of this test was striking: the exception basically no longer occurred.

From this we can draw the following conclusion:

When too many services register with one Seata server, its load rises and it comes under excessive pressure. That is when this exception occurs.

Why do we register so many services?

Because we use multiple data sources, and each data source is also registered with Seata, which leads to a high load on Seata.
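As a rough illustration of why multiple data sources add up, the sketch below counts registrations under the assumption that every data source of every service registers once with the same Seata server. The service and data source names are hypothetical, and the count says nothing about the actual cost of each registration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Back-of-the-envelope model; names and numbers are made up for illustration.
class ResourceRegistrationSketch {
    // Total registrations seen by one Seata server, assuming one per data source.
    static int totalRegistrations(Map<String, List<String>> dataSourcesByService) {
        int total = 0;
        for (List<String> dataSources : dataSourcesByService.values()) {
            total += dataSources.size();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> services = new LinkedHashMap<>();
        services.put("order-service", Arrays.asList("order_db", "report_db", "audit_db"));
        services.put("stock-service", Arrays.asList("stock_db", "audit_db"));
        // Two services, but five registrations on the same Seata server.
        System.out.println(totalRegistrations(services));
    }
}
```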

  • 4. The final solution

Give the server more resources and deploy Seata as a cluster, so that the pressure on any single Seata node is reduced.
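To show how a cluster spreads the pressure, here is a minimal round-robin sketch. It is a generic illustration, not Seata's own load balancer; the class `TcRoundRobinSketch` is hypothetical, and the node addresses other than the one in the exception above are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Generic round-robin selection over Seata server nodes; illustrative only.
class TcRoundRobinSketch {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    TcRoundRobinSketch(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = nodes;
    }

    // Return the next node in turn; floorMod keeps the index valid on overflow.
    String pick() {
        int index = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        TcRoundRobinSketch balancer = new TcRoundRobinSketch(Arrays.asList(
            "192.168.2.121:8091", "192.168.2.122:8091", "192.168.2.123:8091"));
        for (int i = 0; i < 6; i++) {
            System.out.println(balancer.pick()); // each node is chosen twice
        }
    }
}
```

With three nodes, each one handles roughly a third of the requests that a single node handled before.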

OK, that's all for today. I'm off! ^_^

If you found this useful, please like, comment, and bookmark!

Origin juejin.im/post/7103901225455714317