Why I suggest that logs in high-concurrency online services should not include code locations

Personal creation convention: I declare that every article I create is my own work. Any reference to other articles will be credited; if anything is missed, criticism is welcome. If you find this article plagiarized anywhere on the Internet, please report it, and feel free to open an issue in this GitHub repository. Thank you for your support~

This article is the second installment of the "Why I Suggest" series, in which I explain, analyze, and interpret some of the development suggestions and specification requirements for backend development in our team under high-concurrency scenarios, to help you avoid pitfalls when dealing with high-concurrency business. Previous installments:

When the business first went online, our production log level was INFO, and the code location was included in the log output, in a format like:

2022-03-02 19:57:59.425  INFO [service-apiGateway,,] [35800] [main][org.springframework.cloud.gateway.route.RouteDefinitionRouteLocator:88]: Loaded RoutePredicateFactory [Query]

The logging framework we use is Log4j2, in asynchronous-logging mode. The WaitStrategy of Log4j2's Disruptor is set to Sleep, which is fairly balanced and keeps CPU usage low, i.e. configured as AsyncLoggerConfig.WaitStrategy=Sleep. As the business grew, we found that the CPU usage of some instances was very high (especially those that emit a large volume of logs in a short period of time), so we dumped JFR recordings for further analysis.
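As an aside, for readers who want to reproduce the setup: that wait-strategy setting is usually supplied through log4j2.component.properties or an equivalent -D JVM option. A minimal sketch, assuming <AsyncLogger>/<AsyncRoot> loggers are declared in log4j2.xml (not our exact production configuration):

```properties
# log4j2.component.properties (or pass as -DAsyncLoggerConfig.WaitStrategy=Sleep)
# Wait strategy of the Disruptor behind <AsyncLogger>/<AsyncRoot> configurations
AsyncLoggerConfig.WaitStrategy=Sleep
```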

First, let's look at GC in the JFR recording. Our collector is G1, which we examine mainly through the G1 Garbage Collection event:

(screenshot: JFR G1 Garbage Collection events)

We found that all collections are Young GCs, pause times are normal, and there is no obvious abnormality in the frequency.

Next, let's look at CPU usage. Looking directly at the Thread CPU Load event to see the CPU usage of each thread, we found that the threads of the reactor-http-epoll thread pool have high CPU usage, and together they come close to 100%.

(screenshot: JFR Thread CPU Load, dominated by the reactor-http-epoll threads)

These are the threads reactor-netty uses to handle business logic. Observing other instances shows that under normal circumstances the CPU load is nowhere near this high. So why is the load so high here? Let's take a thread dump and see what the thread stacks reveal.

Looking through several thread stack dumps, we found that these threads are basically all in the Runnable state and are executing native methods related to StackWalker, for example the stack below (this also matches the method with the highest share among the Method Runnable events collected by JFR, so we can basically confirm that the CPU of these threads is mainly spent in the method at the top of this stack): (screenshot: thread stack of a reactor-http-epoll thread)

The time is mainly spent in these two native methods:

  • java.lang.StackStreamFactory$AbstractStackWalker.callStackWalk
  • java.lang.StackStreamFactory$AbstractStackWalker.fetchStackFrames

Note that thread stacks in microservices are very deep (around 150 frames), and even deeper for reactive code (possibly up to 300 frames). This is mainly because servlets and filters are designed around the chain-of-responsibility pattern: each filter keeps adding frames to the stack. Reactive code makes it even worse, nesting layer upon layer and stitching in all kinds of observation points. The stack shown above is a reactive stack.

Coming back to those two native methods: the code here is doing one thing, namely determining the code location of the call that emitted the log, including the class name, method name, and line number. In the thread stack example above, the code location of the logging call is this line: at com.xxx.apigateway.filter.AccessCheckFilter.filter(AccessCheckFilter.java:144), where we call log.info() to output some logs.

As you can see, Log4j2 obtains the code location of the logging call by capturing the current thread's stack. And it is not the top of the stack that is the caller's location; rather, the first stack element after the Log4j2 stack elements is the code location of the logging call.
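To make this mechanism concrete, here is a simplified sketch of that lookup. It is my own illustration, not Log4j2's actual implementation, and MyLogger/findCaller are made-up names:

```java
// Simplified sketch: locate the caller of a logging method by walking the
// current stack and returning the first frame *after* the logger's own frames.
public class CallerLocationSketch {

    static class MyLogger {
        void info(String msg) {
            StackTraceElement caller = findCaller();
            System.out.println(caller + ": " + msg);
        }

        private StackTraceElement findCaller() {
            StackTraceElement[] stack = new Throwable().getStackTrace();
            boolean insideLogger = false;
            for (StackTraceElement frame : stack) {
                if (frame.getClassName().equals(MyLogger.class.getName())) {
                    insideLogger = true;   // still inside the logging "framework"
                } else if (insideLogger) {
                    return frame;          // first frame after the logger = the caller
                }
            }
            return null;                   // not found (should not happen here)
        }
    }

    public static void main(String[] args) {
        // Prints something like: CallerLocationSketch.main(CallerLocationSketch.java:<line>): hello
        new MyLogger().info("hello");
    }
}
```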

How Log4j2 obtains the stack

Let's first think about how we would implement this ourselves. Before Java 9, the stack of the current thread (we never need another thread's stack here, only the current thread's) can be obtained via Thread.currentThread().getStackTrace() or new Exception().getStackTrace().

Under the hood, Thread.currentThread().getStackTrace(); is essentially new Exception().getStackTrace();, so the two are fundamentally the same.
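The original code screenshot is missing; a minimal sketch of the two pre-Java 9 ways to capture the current thread's stack:

```java
// Capturing the current thread's stack before Java 9: both calls return a
// StackTraceElement[] with class name, method name, file name and line number.
public class PreJava9StackCapture {
    public static void main(String[] args) {
        // Via the current Thread:
        StackTraceElement[] viaThread = Thread.currentThread().getStackTrace();
        // Via a throwaway exception:
        StackTraceElement[] viaException = new Exception().getStackTrace();

        System.out.println("frames via Thread:    " + viaThread.length);
        System.out.println("frames via Exception: " + viaException.length);
    }
}
```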

Since Java 9, there is the new StackWalker API, which, combined with the Stream API, reads the stack more elegantly.
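That screenshot is also missing; a minimal sketch of reading the stack with StackWalker and the Stream API (here limited to the first 10 frames):

```java
import java.util.List;
import java.util.stream.Collectors;

// Java 9+: read stack frames lazily through StackWalker and the Stream API.
public class StackWalkerExample {
    public static void main(String[] args) {
        List<StackWalker.StackFrame> frames = StackWalker.getInstance()
                // only the first 10 frames are materialized; the rest are never filled in
                .walk(stream -> stream.limit(10).collect(Collectors.toList()));

        frames.forEach(f -> System.out.println(
                f.getClassName() + "#" + f.getMethodName() + ":" + f.getLineNumber()));
    }
}
```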

Let's first look at how new Exception().getStackTrace(); obtains the stack under the hood:

(screenshot: the relevant code in the HotSpot source file javaClasses.cpp)

Then StackWalker, whose core underlying source is: (screenshot: the relevant JVM source behind StackWalker)

As you can see, in both cases the core work is filling in the detailed stack frame information; the difference is that one fills in everything directly, while the other can reduce the number of frames it fills in. Filling in stack frame information mainly means accessing the SymbolTable and the StringTable, because what we want to see are concrete class names and method names, not class addresses and method addresses, let alone the addresses of class names and method names. So it is obvious that obtaining the stack via an Exception accesses the SymbolTable and StringTable more often than StackWalker does, simply because it has more frames to fill in.

Next, let's run a test to simulate how much performance degradation capturing the code location adds to the original code at different stack depths.

Simulating the performance difference between the two ways of obtaining the caller's code location and not obtaining it at all

The following code is adapted from the unit tests in the official Log4j2 codebase. First, the code that simulates a call stack of a given depth:

(screenshot: the stack-depth simulation code) Then we write the test code to compare running this code by itself against running it with the stack-capturing code added.

(screenshot: the benchmark test code)
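Since both screenshots are missing, here is a rough JMH sketch of this kind of comparison; it is my own reconstruction, not the Log4j2 test code. It recurses to a configurable extra stack depth, then either just consumes a value (baseline), captures the full stack via an exception, or fetches only the first frame via StackWalker:

```java
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
@Fork(1)
@State(Scope.Benchmark)
public class StackDepthBenchmark {

    @Param({"10", "100", "300"})   // simulated extra stack depth
    int depth;

    // Recurse until the desired extra depth is reached, then run the payload.
    private void recurse(int remaining, Runnable payload) {
        if (remaining <= 0) {
            payload.run();
        } else {
            recurse(remaining - 1, payload);
        }
    }

    @Benchmark
    public void baseline(Blackhole bh) {
        recurse(depth, () -> bh.consume(System.currentTimeMillis()));
    }

    @Benchmark
    public void exceptionStackTrace(Blackhole bh) {
        // Fills in every frame of the (deep) stack.
        recurse(depth, () -> bh.consume(new Exception().getStackTrace()));
    }

    @Benchmark
    public void stackWalker(Blackhole bh) {
        // limit(1): only the first frame is materialized.
        recurse(depth, () -> bh.consume(
                StackWalker.getInstance().walk(s -> s.limit(1).collect(Collectors.toList()))));
    }
}
```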

Run it and look at the results:

(screenshot: benchmark results) From the results, we can see that obtaining the code location, i.e. capturing the stack, causes a considerable performance loss. Moreover, this loss is tied to how many stack frames are filled in: the more frames filled, the bigger the loss. This can also be seen from the fact that StackWalker outperforms the exception-based approach, and the gap widens as the stack depth increases.

Why is it slow? A performance degradation test of String::intern

From the earlier look at the underlying JVM source code, we can see that this performance degradation comes from accessing the StringTable and the SymbolTable. Let's simulate that access: the StringTable is accessed under the hood through String's intern method, so we can run the simulation with String::intern. The test code is as follows: (screenshot: the String::intern benchmark code)
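The screenshot with that test code is missing. Based on the benchmark names referenced in the results below (baseline, toString, intern, intern3) and the bh.consume(time) call, a plausible reconstruction could look like the following; this is my own sketch, not necessarily the author's original code:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
@Fork(1)
@State(Scope.Benchmark)
public class StackWalkBenchmark {

    @Benchmark
    public void baseline(Blackhole bh) {
        long time = System.currentTimeMillis();
        bh.consume(time);                                 // consume the raw long only
    }

    @Benchmark
    public void toString(Blackhole bh) {
        long time = System.currentTimeMillis();
        bh.consume(Long.toString(time));                  // build a String, no StringTable access
    }

    @Benchmark
    public void intern(Blackhole bh) {
        long time = System.currentTimeMillis();
        bh.consume(Long.toString(time).intern());         // one StringTable lookup/insert
    }

    @Benchmark
    public void intern3(Blackhole bh) {
        long time = System.currentTimeMillis();
        // three StringTable lookups/inserts per invocation
        bh.consume(Long.toString(time).intern());
        bh.consume(Long.toString(time + 1).intern());
        bh.consume(Long.toString(time + 2).intern());
    }
}
```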

Test results: (screenshot: JMH results for StackWalkBenchmark)

Comparing the results of StackWalkBenchmark.baseline and StackWalkBenchmark.toString, we see that bh.consume(time); itself carries no performance penalty. But comparing them with the results of StackWalkBenchmark.intern and StackWalkBenchmark.intern3, we find that this performance degradation is also very serious, and the more accesses there are, the more serious the degradation becomes (analogous to the stack depth earlier).

Conclusion and suggestion

From this, we can draw the following intuitive conclusions:

  1. Outputting the code location in logs requires capturing the stack: before Java 9 it is obtained via an exception, and since Java 9 via StackWalker.
  2. Both approaches need to access the SymbolTable and the StringTable; StackWalker can reduce the number of accesses by limiting how many frames it fills in.
  3. Both approaches cause serious performance degradation.

Therefore, I suggest: in microservice environments, and especially reactive microservice environments, stacks are very deep. If a large volume of logs is output, the logs should not include code locations; otherwise the performance degradation will be severe.

After we turned off code-location output, CPU usage under the same load was no longer as high, and overall throughput improved significantly.
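For reference, here is a minimal log4j2.xml sketch of what "turning off code locations" amounts to: keep the location converters (%C, %M, %L, %l, %location) out of the pattern and leave includeLocation off for async loggers (it defaults to false for them). The names and layout are illustrative, not our actual production configuration:

```xml
<!-- Minimal illustrative sketch, not our production config -->
<Configuration>
  <Appenders>
    <Console name="Console">
      <!-- No %C/%M/%L/%l/%location here, so no caller location is needed -->
      <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%t] %logger{36}: %msg%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <!-- includeLocation defaults to false for async loggers; set explicitly to document intent -->
    <AsyncRoot level="INFO" includeLocation="false">
      <AppenderRef ref="Console"/>
    </AsyncRoot>
  </Loggers>
</Configuration>
```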



Origin: juejin.im/post/7079198003940048926