Interviewer: How do you do JVM tuning? (with real cases)

Foreword

Interviewer: Have you ever done JVM tuning at work? Tell me about the JVM tuning you have done.

My project has a QPS of less than 10. Last time I was asked about cache penetration and cache avalanche; this time it is JVM tuning. It is really hard on me.

But don't panic. Being the enthusiastic type, I have prepared a few full-mark answers for you; pick whichever suits you.

Answer 1: Listen, this will be my first JVM tuning.

Answer 2: I usually only tune during the interview.

Answer 3: I usually add machines and memory directly.

Answer 4: I just throw ZGC at it directly; what is there to tune?

Main text

1. Does the JVM need tuning?

After so many years of development and verification, the JVM is very robust as a whole. Personally, 99% of the time, JVM tuning is basically unnecessary.

Generally speaking, most of our JVM parameter configurations still follow the official JVM recommendations, such as:

  • -XX:NewRatio=2, i.e. young generation : old generation = 1:2

  • -XX:SurvivorRatio=8, i.e. eden : survivor = 8:1

  • The heap memory is set to about 3/4 of the physical memory

  • etc

The default (recommended) JVM parameter values are reasonable values obtained through repeated testing by the JVM team and extensive verification in the field, so they are generally reliable and general-purpose and rarely cause major problems.
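
For illustration, here is what a launch command with these commonly recommended parameters spelled out might look like. This is only a sketch: the heap size assumes an 8 GB host (roughly 3/4 of physical memory, as mentioned above), and app.jar is a placeholder for your own service.

# Sketch only; the sizes are examples, not recommendations for any particular service
java -Xms6g -Xmx6g \
     -XX:NewRatio=2 \
     -XX:SurvivorRatio=8 \
     -jar app.jar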

Of course, what matters even more is that most applications have less than 10 QPS and at most tens of thousands of records of data. Under such low load, it is actually quite difficult to get the JVM into trouble.

What most developers run into far more often is their own code bugs causing OOM, high CPU load, or frequent GC. Those cases are fixed in the code and usually do not require touching the JVM at all.

Of course, as the saying goes, nothing is absolute, and there are still a small number of scenarios that may require JVM tuning. The specific scenarios are described below.

It is worth mentioning that the JVM tuning we are talking about here is more about optimizing and adjusting JVM parameters for our own business scenarios to make them more suitable for our business, rather than changing the JVM source code.

2. "There is no need for JVM tuning; using a better-performing garbage collector solves the problem"?

This is a claim I have seen online. Since quite a few people agree with it, I expect many readers have the same idea, so I will share my view here.

1) Practical perspective

Setting the interview aside, upgrading the garbage collector is indeed one of the most effective measures, for example upgrading from CMS to G1, or even to ZGC.

This is easy to understand: a newer garbage collector embodies optimization work done by the JVM developers themselves. After all, this is their specialty, so a newer collector generally brings a substantial performance improvement.

G1 is already being adopted gradually; many teams around me use G1 on JDK 8, and as far as I know there are still quite a few problems, with many developers constantly adjusting its parameters. How much better it behaves on JDK 11 remains to be verified.

At present ZGC adoption is still small, and it only looks good from the publicly released data: maximum pause times under 10ms, or even under 1ms, so expectations are high. But from what I have gathered so far, ZGC is not a silver bullet. The known notable issues are:

  • Throughput is lower than G1's; officially, the drop is at most 15%

  • If ZGC encounters a very high object allocation rate it cannot keep up; at present the only effective "tuning" is to enlarge the whole GC heap to give ZGC more breathing room - in the words of "R大", a well-known JVM expert, after communicating with the ZGC team lead

Moreover, as ZGC sees wider adoption, more problems are likely to surface over time.

Overall, I personally think that JVM tuning is still necessary in some scenarios. After all, there is a saying: there is no best, only the most suitable.

2) Interview perspective

If you directly answer "upgrade the garbage collector", the interviewer may agree, but the topic will probably end there. The interviewer most likely did not hear the answer he wanted; you will certainly not earn extra points on this question, and points may even be deducted.

So, in the interview, you can answer "upgrade the garbage collector", but you cannot only answer "upgrade the garbage collector".

3. When should the JVM be tuned?

Avoid premature optimization. Donald Knuth, author of The Art of Computer Programming, famously said:

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.

Avoiding premature optimization does not mean not caring at all. A better approach is to add monitoring and alerting on the important JVM metrics of core services, so that when a metric fluctuates or turns abnormal you can step in and investigate in time.

Interviewer: What are the core metrics of the JVM? What is a reasonable range for them?

There is no unified answer to this question, because each service has different requirements for performance indicators such as AVG/TP999/TP9999, so the reasonable range is also different.

Still, in case the interviewer presses for numbers, here are relatively reasonable ranges for an ordinary Java back-end application. The following metrics are for a single server:

  • jvm.gc.time: The GC time per minute is within 1s, preferably within 500ms

  • jvm.gc.meantime: Each YGC takes less than 100ms, preferably less than 50ms

  • jvm.fullgc.count: FGC once every few hours at most, preferably less than once a day

  • jvm.fullgc.time: Each FGC takes less than 1s, preferably less than 500ms

Generally speaking, as long as these indicators are normal, there will be no problems with other indicators. If there is a problem in other places, these indicators will generally be affected.
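
If a full monitoring system is not yet in place, a rough way to eyeball these numbers by hand is to sample GC statistics with jstat (the interval and sample count below are just examples; <pid> is a placeholder):

# Print GC utilization, counts and cumulative times (YGC/YGCT, FGC/FGCT) every 60 s, 10 samples
jstat -gcutil <pid> 60000 10
# The difference between consecutive YGCT/FGCT values gives the GC time spent in that minute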

4. JVM optimization steps?

4.1. Analyze and locate the bottleneck of the current system

For the core metrics of the JVM, our focus and common tools are as follows:

1) CPU metrics

  • View the processes that are using the most CPU

  • View the threads that are using the most CPU

  • View thread stack snapshot information

  • Analyze code execution hotspots

  • See which code takes the longest CPU execution time

  • View the percentage of CPU time used by each method

Common commands:

// Show resource usage of each process on the system
top
// Show CPU usage of the threads inside a given process
top -Hp pid
// Print the thread stack of the Java process
jstack pid

Common tools: JProfiler, JVM Profiler, Arthas, etc.
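
A typical hands-on flow with these commands for a high-CPU Java process looks roughly like this (a sketch; <pid> and <tid> are placeholders):

# 1. Find the busiest thread inside the process and note its thread id (PID column)
top -Hp <pid>
# 2. Convert the thread id to hex, because jstack prints thread ids as hexadecimal nid values
printf '%x\n' <tid>
# 3. Take a stack snapshot and locate that thread's stack
jstack <pid> | grep -A 20 "nid=0x<hex-tid>"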

2) JVM memory metrics

  • Check whether the current JVM heap memory parameter configuration is reasonable

  • View statistics for objects in the heap

  • View heap dump snapshots and analyze memory usage

  • Check whether the memory growth of each area of the heap is normal

  • See which area is causing the GC

  • Check whether the memory can be recovered normally after GC

Common commands:

// View the current JVM parameter configuration
ps -ef | grep java
// View the configuration of a Java process, including system properties and JVM command-line flags
jinfo pid
// Print the current GC statistics of the Java process
jstat -gc pid
// Print detailed Java heap information
jmap -heap pid
// Show statistics of the objects in the heap
jmap -histo:live pid
// Generate a Java heap dump file
jmap -F -dump:format=b,file=dumpFile.hprof pid

Common tools: Eclipse MAT, JConsole, etc.
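
Besides dumping the heap manually with jmap, it is usually worth letting the JVM dump automatically when an OOM occurs, so the scene is preserved. A commonly used flag combination is shown below; the path is only an example.

// Automatically write a heap dump when an OutOfMemoryError occurs
-XX:+HeapDumpOnOutOfMemoryError
// Directory (or file name) the dump is written to; adjust to your environment
-XX:HeapDumpPath=/tmp/dumps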


3) JVM GC metrics

  • Check whether the GC time per minute is normal

  • Check whether the number of YGC per minute is normal

  • Check if the number of FGC is normal

  • Check whether a single FGC time is normal

  • Look at the detailed time spent in each phase of a single GC and identify the phases that take unusually long

  • Check whether the dynamic promotion age of the object is normal

GC metrics are usually read from the GC log. The default GC log output can be rather sparse; we can add the following parameters to enrich it and make problems easier to locate.

Common JVM parameters for GC logs:

// Print detailed GC information
-XX:+PrintGCDetails
// Print date/time stamps for each GC
-XX:+PrintGCDateStamps
// Print heap information before and after each GC
-XX:+PrintHeapAtGC
// Print the age distribution of objects in the Survivor space
-XX:+PrintTenuringDistribution
// Print all flag values at JVM startup, handy for checking whether a flag has been overridden
-XX:+PrintFlagsFinal
// Print how long the application was stopped during GC
-XX:+PrintGCApplicationStoppedTime
// Print the time spent processing reference objects during GC (only effective together with PrintGCDetails)
-XX:+PrintReferenceGC
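
Note that these -XX:+Print* flags apply to JDK 8 and earlier (which matches the ParNew + CMS case later in this article); from JDK 9 onward they were removed in favor of unified logging. A roughly equivalent configuration, as a sketch to be checked against your JDK version, is:

// JDK 9+: detailed GC, heap, tenuring-age and safepoint logging written to a file with timestamps
-Xlog:gc*,gc+heap=debug,gc+age=trace,safepoint:file=gc.log:time,uptime,level,tags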

These are the common ways to locate a system bottleneck. Most problems can be narrowed down with the methods above, and then, combined with the code, the root cause can be found.

4.2. Determine optimization goals

After locating the system bottleneck, determine what the optimization goal is before starting, for example:

  • Reduce the number of FGCs from 1 per hour to 1 per day

  • Reduce GC time per minute from 3s to 500ms

  • Reduce each FGC time from 5s to less than 1s

  • ...

4.3. Develop an optimization plan

Develop a corresponding optimization plan for the identified bottleneck. Common ones are:

  • Code bugs: release a fix. Typical examples: infinite loops, unbounded queues.

  • Unreasonable JVM parameter configuration: adjust the JVM parameters. Typical examples: the young generation is too small, the heap is too small, or the metaspace is too small (a quick way to verify the live configuration is shown below).
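
The commands below are one way to do that check; <pid> is a placeholder:

# Flags that differ from the defaults, plus those set on the command line
jinfo -flags <pid>
# Compare against the defaults of the same JDK build
java -XX:+PrintFlagsFinal -version | grep -iE 'NewRatio|SurvivorRatio|MetaspaceSize'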

4.4. Compare the indicators before and after optimization, and calculate the optimization effect

4.5. Continuously observe and track the optimization effect

4.6. If necessary, repeat the above steps

5. Tuning case: frequent FGC caused by the metaspace

The following cases come from the Internet or from my own real experience, and each of them holds together as a coherent story. Once you understand them, you can use them to handle the interviewer.

Service environment: ParNew + CMS + JDK8

Problem: FGC occurs frequently in the service

Cause analysis:

1) First check the GC log; it shows that the FGCs are triggered because the metaspace does not have enough space

Corresponding GC log:

Full GC (Metadata GC Threshold)

2) Further inspection of the log shows that there is memory fragmentation in the metaspace

Corresponding GC log:

Metaspace       used 35337K, capacity 56242K, committed 56320K, reserved 1099776K

Here is a brief explanation of what these fields mean:

  • used: the amount of space actually used

  • capacity: the capacity of the chunks currently allocated to class loaders and not yet released

  • committed: the total committed space

  • reserved: the reserved space

used is easy to understand, and reserved is not important in this case and can be ignored for now; the two that are easy to confuse are capacity and committed.

Metaspace allocates memory in chunks. When a ClassLoader is garbage collected, all the chunks belonging to it are released and become free chunks, and committed is the sum of the capacity chunks (those currently in use) and the free chunks.

The judgment that the metaspace is fragmented is based on the used and capacity figures. As just mentioned, metaspace is allocated in chunks, and even a ClassLoader that loads only a single class occupies an entire chunk, so a large gap between used and capacity indicates memory fragmentation. In the log above, used is 35337K while capacity is 56242K, meaning only about 63% of the chunk space in use is actually utilized.

A sample GC log looks like this:

{Heap before GC invocations=0 (full 0):
 par new generation   total 314560K, used 141123K [0x00000000c0000000, 0x00000000d5550000, 0x00000000d5550000)
  eden space 279616K,  50% used [0x00000000c0000000, 0x00000000c89d0d00, 0x00000000d1110000)
  from space 34944K,   0% used [0x00000000d1110000, 0x00000000d1110000, 0x00000000d3330000)
  to   space 34944K,   0% used [0x00000000d3330000, 0x00000000d3330000, 0x00000000d5550000)
 concurrent mark-sweep generation total 699072K, used 0K [0x00000000d5550000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 35337K, capacity 56242K, committed 56320K, reserved 1099776K
  class space    used 4734K, capacity 8172K, committed 8172K, reserved 1048576K
1.448: [Full GC (Metadata GC Threshold) 1.448: [CMS: 0K->10221K(699072K), 0.0487207 secs] 141123K->10221K(1013632K), [Metaspace: 35337K->35337K(1099776K)], 0.0488547 secs] [Times: user=0.09 sys=0.00, real=0.05 secs] 
Heap after GC invocations=1 (full 1):
 par new generation   total 314560K, used 0K [0x00000000c0000000, 0x00000000d5550000, 0x00000000d5550000)
  eden space 279616K,   0% used [0x00000000c0000000, 0x00000000c0000000, 0x00000000d1110000)
  from space 34944K,   0% used [0x00000000d1110000, 0x00000000d1110000, 0x00000000d3330000)
  to   space 34944K,   0% used [0x00000000d3330000, 0x00000000d3330000, 0x00000000d5550000)
 concurrent mark-sweep generation total 699072K, used 10221K [0x00000000d5550000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 35337K, capacity 56242K, committed 56320K, reserved 1099776K
  class space    used 4734K, capacity 8172K, committed 8172K, reserved 1048576K
}
{Heap before GC invocations=1 (full 1):
 par new generation   total 314560K, used 0K [0x00000000c0000000, 0x00000000d5550000, 0x00000000d5550000)
  eden space 279616K,   0% used [0x00000000c0000000, 0x00000000c0000000, 0x00000000d1110000)
  from space 34944K,   0% used [0x00000000d1110000, 0x00000000d1110000, 0x00000000d3330000)
  to   space 34944K,   0% used [0x00000000d3330000, 0x00000000d3330000, 0x00000000d5550000)
 concurrent mark-sweep generation total 699072K, used 10221K [0x00000000d5550000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 35337K, capacity 56242K, committed 56320K, reserved 1099776K
  class space    used 4734K, capacity 8172K, committed 8172K, reserved 1048576K
1.497: [Full GC (Last ditch collection) 1.497: [CMS: 10221K->3565K(699072K), 0.0139783 secs] 10221K->3565K(1013632K), [Metaspace: 35337K->35337K(1099776K)], 0.0193983 secs] [Times: user=0.03 sys=0.00, real=0.02 secs] 
Heap after GC invocations=2 (full 2):
 par new generation   total 314560K, used 0K [0x00000000c0000000, 0x00000000d5550000, 0x00000000d5550000)
  eden space 279616K,   0% used [0x00000000c0000000, 0x00000000c0000000, 0x00000000d1110000)
  from space 34944K,   0% used [0x00000000d1110000, 0x00000000d1110000, 0x00000000d3330000)
  to   space 34944K,   0% used [0x00000000d3330000, 0x00000000d3330000, 0x00000000d5550000)
 concurrent mark-sweep generation total 699072K, used 3565K [0x00000000d5550000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 17065K, capacity 22618K, committed 35840K, reserved 1079296K
  class space    used 1624K, capacity 2552K, committed 8172K, reserved 1048576K
}

The metaspace mainly stores class-related metadata, so fragmentation there usually means that a large number of class loaders are being created while each one uses only a little space.

Therefore, when the metaspace shows memory fragmentation, we focus on whether a large number of class loaders are being created.

3) A heap dump reveals a large number of DelegatingClassLoader instances
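
A quick way to confirm this even without a full heap dump is to count instances of the generated reflection class loader in a class histogram (a sketch; <pid> is a placeholder):

# Count live instances of the reflection-generated class loader
jmap -histo:live <pid> | grep DelegatingClassLoader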

Further analysis showed that the large number of DelegatingClassLoaders was created because of reflection. The underlying mechanism is as follows:

On the JVM, a reflective method invocation is initially implemented through a JNI call. When the JVM notices that a method is invoked via reflection frequently, it generates bytecode that performs the same invocation, a mechanism known as inflation. Each generated bytecode accessor is loaded by its own DelegatingClassLoader, so if many methods are frequently called via reflection, many DelegatingClassLoaders are created.

After how many calls does a reflective invocation switch from JNI to generated bytecode?

The default is 15, controlled by the system property -Dsun.reflect.inflationThreshold: while the call count is below this threshold the method is invoked through JNI; once it exceeds the threshold, a bytecode accessor is generated and used for the call.
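
For reference, this threshold is an ordinary system property and can be set on the launch command. The value below is purely illustrative; raising it keeps more reflective calls on the (slower) JNI path and therefore generates fewer accessor classes and DelegatingClassLoaders, so any change should be measured.

// Illustrative value only: raise the inflation threshold so fewer bytecode accessors
// (and their DelegatingClassLoaders) are generated; calls below the threshold stay on JNI
-Dsun.reflect.inflationThreshold=100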

Analysis conclusion: reflection calls led to the creation of a large number of DelegatingClassLoaders, which occupied a large amount of metaspace and also fragmented it, so metaspace utilization was low and the GC threshold was reached quickly, triggering FGC.

Optimization Strategy:

1) Appropriately adjust the size of the metaspace (see the flag sketch after this list).

2) Optimize unreasonable reflection calls. For example, the common property-copy utility BeanUtils.copyProperties can be replaced with MapStruct, which generates plain getter/setter code at compile time instead of relying on reflection.
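
For strategy 1), the metaspace size is controlled by the flags below; the 256m value is purely illustrative and should be derived from the service's actual metaspace usage (the log above showed about 55 MB, i.e. 56320K, committed).

// Initial high-water mark that triggers a "Metadata GC Threshold" collection; raising it delays metaspace-induced FGC
-XX:MetaspaceSize=256m
// Hard upper limit for the metaspace (unlimited by default); optional, useful for catching leaks
-XX:MaxMetaspaceSize=256m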

Summary

When the interviewer asks about JVM tuning, you can answer along the lines of this article:

  • First, with a reasonable JVM parameter configuration, you should not need to tune at all in most cases - corresponding to section 1 of this article

  • Second, a small number of scenarios may still need tuning; we can configure monitoring and alerting on the core JVM metrics and step in to analyze and evaluate when they fluctuate - corresponding to section 3 of this article

  • Finally, give a real tuning case as an illustration - corresponding to section 5 of this article

If the interviewer digs further into how to analyze and troubleshoot, you can hold your ground with the common commands and tools from section 4 of this article.

After this process, I believe that most interviewers will have a good impression of you.

Finally

I am Jon Hui, a programmer who insists on sharing original, practical technical content. If you found this article helpful, remember to like and follow, and see you in the next issue.

Recommended reading

Java basic high-frequency interview questions (the latest version in 2021)

Java Collection Framework high-frequency interview questions (2021 latest version)

Must-ask Spring interview questions: do you know them?

Must-ask MySQL interview questions: do you know them?

Source: blog.csdn.net/v123411739/article/details/123778478