[Feichao] Can your interface really withstand high concurrency?

Foreword

This article explains a problem we ran into during a load test a while back. Let's get straight to the point.

Some readers may not know what forceTransactionTemplate is for, so here is a quick primer. In Java, there are generally three ways to open a transaction:

  • XML configuration that opens transactions through an aspect on service classes and method names (common in earlier years, rarely used now)

  • The @Transactional annotation (the most frequently used)

  • Spring's transaction template (the approach in the screenshot; almost nobody uses it)

Let's not dwell on why we used the third approach; I will cover that in a later article on the transaction propagation mechanism. For now, stay on topic: all you need to know is that this code opens a transaction. I circled the logging code in red and blue on purpose. A log line is printed on entering the method, then the transaction is opened, then another log line is printed. After a round of load testing we found frequent timeouts, and the throughput simply would not go up. The log looked like this:

We found that the interval between the two log lines was nearly five seconds! Why would opening a transaction take 5 seconds? When something is this abnormal, there must be a cause behind it!

How to approach the problem

High-concurrency problems in production are generally hard to reproduce, so Feichao usually analyzes them statically by reading the source code closely (see the earlier article "Runs locally, crashes in production? Panic!"). However, a small number of newer followers of Feichao's public account have not yet picked up these troubleshooting skills, so this article revisits some common ways to analyze this kind of problem, so that you do not panic when you run into one!

Fortunately, this particular concurrency problem is not very hard, which makes it a good case for beginners to practice on. We can reproduce the scenario with a local simulation, narrow the scope of the problem, and locate it step by step.

Reproducing locally

First, prepare a concurrency utility class that lets you simulate concurrent scenarios in a local environment. The code is not friendly to read on a phone, but that does not matter: the code below is meant to be copied and pasted into your project to reproduce the problem, not to be read on a phone. As for why this utility class can simulate a concurrent scenario: all of it is JDK code, and its core is the CountDownLatch class. For how CountDownLatch works, search for that keyword in your favorite search engine.

CountDownLatchUtil.java

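import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Runs a task on many threads that are all released at the same moment.
 */
public class CountDownLatchUtil {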

    private CountDownLatch start;
    private CountDownLatch end;
    private int pollSize = 10;

    public CountDownLatchUtil() {
        this(10);
    }

    public CountDownLatchUtil(int pollSize) {
        this.pollSize = pollSize;
        start = new CountDownLatch(1);
        end = new CountDownLatch(pollSize);
    }

    public void latch(MyFunctionalInterface functionalInterface) throws InterruptedException {
        ExecutorService executorService = Executors.newFixedThreadPool(pollSize);
        for (int i = 0; i < pollSize; i++) {
            Runnable run = new Runnable() {
                @Override
                public void run() {
                    try {
                        start.await();
                        functionalInterface.run();
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    } finally {
                        end.countDown();
                    }
                }
            };
            executorService.submit(run);
        }

        // Open the start gate so all workers begin together, then wait for all of them to finish.
        start.countDown();
        end.await();
        executorService.shutdown();
    }

    @FunctionalInterface
    public interface MyFunctionalInterface {
        void run();
    }
}

HelloService.java

public interface HelloService {

    void sayHello(long timeMillis);

}

HelloServiceImpl.java

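import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service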
public class HelloServiceImpl implements HelloService {

    private final Logger log = LoggerFactory.getLogger(HelloServiceImpl.class);

    @Transactional
    @Override
    public void sayHello(long timeMillis) {
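        long time = System.currentTimeMillis() - timeMillis;
        if (time > 5000) {
            // Log calls that took more than 5 seconds to reach the method body
            log.warn("time : {}", time);
        }
        try {
            // Simulate 1 s of business logic
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing the interruption
            Thread.currentThread().interrupt();
        }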
    }
}

HelloServiceTest.java

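import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)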
@SpringBootTest
public class HelloServiceTest {

    @Autowired
    private HelloService helloService;

    @Test
    public void testSayHello() throws Exception {
        long currentTimeMillis = System.currentTimeMillis();
        // Simulate 1000 concurrent threads
        CountDownLatchUtil countDownLatchUtil = new CountDownLatchUtil(1000);
        countDownLatchUtil.latch(() -> {
            helloService.sayHello(currentTimeMillis);
        });
    }

}

In the local debug log we found many calls that took more than 5 s, and they follow a pattern. Feichao has framed the groups in different colors for you:

Why do these times come in groups of five, with each group about 1 s apart from the next?

The truth

The core code behind @Transactional is shown below (I will analyze this part of the source in a follow-up series, so follow Feichao to avoid missing the core content). Put simply, the call TransactionInfo txInfo = createTransactionIfNecessary(tm, txAttr, joinpointIdentification); is where a database connection is acquired.

if (txAttr == null || !(tm instanceof CallbackPreferringPlatformTransactionManager)) {
	// Standard transaction demarcation with getTransaction and commit/rollback calls.
	TransactionInfo txInfo = createTransactionIfNecessary(tm, txAttr, joinpointIdentification);
	Object retVal = null;
	try {
		// This is an around advice: Invoke the next interceptor in the chain.
		// This will normally result in a target object being invoked.
		retVal = invocation.proceedWithInvocation();
	}
	catch (Throwable ex) {
		// target invocation exception
		completeTransactionAfterThrowing(txInfo, ex);
		throw ex;
	}
	finally {
		cleanupTransactionInfo(txInfo);
	}
	commitTransactionAfterReturning(txInfo);
	return retVal;
}

To demonstrate the problem more clearly, Feichao configured the database connection pool (this article uses Druid) as follows:

# Initial number of connections
spring.datasource.initialSize=1
# Maximum number of connections
spring.datasource.maxActive=5

Since the maximum number of connections is 5, when 1000 concurrent threads arrive you can picture a queue of 1000 people. The first five get a connection, and each spends 1 second on business logic. The remaining 995 have to wait outside. When those five finish and release their connections, the next five take their turn and spend another second on business logic. With primary-school arithmetic you can work out how long the last five have to wait. This also explains the log output above: the entries come in groups of five, and consecutive groups are 1 s apart.
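The queueing arithmetic can be written down in a few lines of Java. This is only a back-of-the-envelope sketch: it assumes every request holds its connection for exactly 1 second and that connections are handed out in arrival order. The class and method names are made up for illustration and are not part of the original project.

```java
// Hypothetical helper, not part of the original project.
final class PoolQueueEstimate {

    /**
     * Approximate time (ms) a request waits for a connection, given its
     * 0-based position in the queue, the pool size, and how long each
     * request holds a connection.
     */
    static long waitMillis(int position, int maxActive, long holdMillis) {
        // Requests are served in batches of maxActive; each batch takes holdMillis.
        return (long) (position / maxActive) * holdMillis;
    }

    public static void main(String[] args) {
        int maxActive = 5;       // spring.datasource.maxActive in the test setup
        long holdMillis = 1000;  // simulated business time in sayHello

        System.out.println(waitMillis(0, maxActive, holdMillis));    // 0
        System.out.println(waitMillis(5, maxActive, holdMillis));    // 1000
        System.out.println(waitMillis(999, maxActive, holdMillis));  // 199000
    }
}
```

Under these assumptions the last of the 1000 threads waits roughly 199 seconds, and requests in the same batch wait the same amount, which is consistent with the groups of five seen in the log.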

How to deal with it

Longtime readers of Feichao's source-code articles know that Feichao never leaves you hanging: whenever a problem is raised, one of the possible solutions is given as well. Of course, no solution is the best one; there are only better ones!

For example, some readers may say: your maximum number of connections is as small as the tips people usually give Feichao; make it larger and the problem naturally goes away. Of course, the small value here is only for ease of demonstration. In production, the maximum number of connections should be a reasonable value derived from the characteristics of the business and from repeated load testing. That said, Feichao has also learned that the machines at some readers' companies are no match even for a thousand-yuan phone!

In fact, during the load test the maximum number of database connections was set to 200, and the load applied was not large. So why did the problem still occur? Take a closer look at the earlier code:

The validation code in it is an RPC call, and the colleague's interface is not as rock-solid reliable as Feichao, so the call took a long time, which in turn made subsequent threads wait too long for a database connection. Apply the same primary-school arithmetic as before and it is easy to see why the load test ran into trouble.
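The same arithmetic suggests why 200 connections were not enough. As a rough sketch: if every transaction holds its connection for the whole duration of the method, a saturated pool can complete at most maxActive / holdSeconds transactions per second. The hold times below are hypothetical numbers chosen for illustration (the real RPC latency is not given here), and the class name is made up.

```java
// Hypothetical estimate, not a measurement from the real system.
final class PoolThroughputEstimate {

    /** Upper bound on completed transactions per second for a saturated pool. */
    static double maxTransactionsPerSecond(int maxActive, long holdMillis) {
        return maxActive * 1000.0 / holdMillis;
    }

    public static void main(String[] args) {
        int maxActive = 200; // value used in the real load test

        // Hypothetical: the transaction only does quick database work (20 ms).
        System.out.println(maxTransactionsPerSecond(maxActive, 20));   // 10000.0
        // Hypothetical: a slow RPC inside the transaction stretches it to 2 s.
        System.out.println(maxTransactionsPerSecond(maxActive, 2000)); // 100.0
    }
}
```

Once requests arrive faster than this ceiling, the surplus has to queue for a connection, just like the 995 threads in the local test.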

Knock on the blackboard: key points

Feichao has said many times that when you hit a problem, you should think it through in depth. So what broader lessons can we take from this one? Let's look at an interview experience a reader shared earlier:

The question he was asked in the interview is essentially the same problem as our load-test problem, although the interviewer's conclusion is not quite accurate. Let's look at Alibaba's development manual together:

So what counts as abuse? Feichao's view is that even if a method is called very often, as long as it only does single-table inserts and updates, its execution time is very short and handling higher concurrency is not a big problem. What matters is whether every call inside the transactional method really belongs there, that is, whether it actually needs the guarantees of the transaction. Some developers at more traditional companies mostly do "good enough" CRUD work, and it is easy to put the transaction annotation directly on a service method and then, inside that transaction, perform many time-consuming operations that have nothing to do with the transaction, such as file I/O or query-and-check operations. The business validation in this article, for example, has no need to be inside the transaction at all. If your daily work offers no such real-world scenarios and you do not follow Feichao's public account, you end up with neither the principles from the source code nor hands-on scenarios. Then the interviewer asks a few questions about principles, you struggle, and the interviewer has to change direction to keep digging!
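To make the point about transaction scope concrete, here is a minimal, framework-free sketch. Everything in it is a hypothetical stand-in: the Semaphore plays the role of the connection pool, inTransaction() plays the role of the transaction boundary, and the method names are invented. In a real Spring application the narrow version would typically be achieved by keeping the transactional part in a smaller method or by using a transaction template.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical stand-ins only; no real database or RPC is involved.
final class TransactionScopeSketch {

    private final Semaphore pool = new Semaphore(5); // pretend connection pool
    final List<String> events = new ArrayList<>();   // records the order of steps

    /** Holds one "connection" for as long as the work runs. */
    private void inTransaction(Runnable work) {
        pool.acquireUninterruptibly();
        events.add("connection acquired");
        try {
            work.run();
        } finally {
            events.add("connection released");
            pool.release();
        }
    }

    private void validateViaRpc() { events.add("rpc validation"); } // slow, needs no connection
    private void writeToDatabase() { events.add("db write"); }      // needs the transaction

    /** Wide scope: the RPC validation runs while a connection is held. */
    void placeOrderWide() {
        inTransaction(() -> {
            validateViaRpc();
            writeToDatabase();
        });
    }

    /** Narrow scope: validate first, then hold a connection only for the write. */
    void placeOrderNarrow() {
        validateViaRpc();
        inTransaction(this::writeToDatabase);
    }

    public static void main(String[] args) {
        TransactionScopeSketch wide = new TransactionScopeSketch();
        wide.placeOrderWide();
        System.out.println(wide.events);
        // [connection acquired, rpc validation, db write, connection released]

        TransactionScopeSketch narrow = new TransactionScopeSketch();
        narrow.placeOrderNarrow();
        System.out.println(narrow.events);
        // [rpc validation, connection acquired, db write, connection released]
    }
}
```

In the narrow version the slow validation no longer counts toward the time a connection is held, so the same pool can serve more requests.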

What else can we extend from this experience? Problems will never stop coming, but we can keep thinking and squeeze more value out of each one! Let's look at the Alibaba manual again:

In plain words: keep lock granularity as small as possible, and avoid calling RPC methods while holding a lock. An RPC call involves the network, so its duration is often beyond your control, and it can easily cause the lock to be held for too long.
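The lock-granularity advice can be illustrated with a small sketch. The PriceCache class and its methods are hypothetical, and the Supplier stands in for an RPC call; the point is only to show where the slow call sits relative to the lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical cache used only to illustrate lock scope.
final class PriceCache {

    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, Integer> prices = new HashMap<>();

    /** Discouraged: the remote call runs while the lock is held. */
    void refreshHoldingLock(String sku, Supplier<Integer> remoteCall) {
        lock.lock();
        try {
            prices.put(sku, remoteCall.get());
        } finally {
            lock.unlock();
        }
    }

    /** Preferred: make the slow remote call first, lock only for the map update. */
    void refreshNarrowLock(String sku, Supplier<Integer> remoteCall) {
        Integer price = remoteCall.get(); // no lock held here
        lock.lock();
        try {
            prices.put(sku, price);
        } finally {
            lock.unlock();
        }
    }

    boolean lockHeldByCurrentThread() {
        return lock.isHeldByCurrentThread();
    }

    Integer get(String sku) {
        lock.lock();
        try {
            return prices.get(sku);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        PriceCache cache = new PriceCache();
        // Each supplier reports whether the lock is held while the "RPC" runs.
        cache.refreshHoldingLock("a", () -> {
            System.out.println(cache.lockHeldByCurrentThread()); // true
            return 1;
        });
        cache.refreshNarrowLock("b", () -> {
            System.out.println(cache.lockHeldByCurrentThread()); // false
            return 2;
        });
    }
}
```

With the narrow version, a slow or hanging remote call delays only the caller itself, not every other thread waiting for the lock.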

This is the same as our load-test problem. First, calling an RPC inside a local transaction gives the RPC no transactional guarantee (that would require a distributed transaction). Second, the uncontrollable nature of the RPC keeps the database connection occupied for too long, which makes the interface time out. Of course, we can also use an APM tool to map out where time is spent across the interface's call topology, so that problems like this are exposed before the load test.


Origin juejin.im/post/5cf52ed3e51d454d1d6284c4