How to plan JVM performance tuning sensibly

This is the third article in the JVM optimization series.

JVM performance tuning involves many trade-offs, and a change in one place often affects the whole system, so the impact on every aspect has to be weighed together. There are, however, some basic theories and principles, and understanding and following them makes tuning much easier. To get the most out of this article, you should already meet the following prerequisites:

1. Understand the JVM garbage collectors

2. Be familiar with the common JVM performance monitoring tools

3. Be able to read GC logs

4. Do not tune for the sake of tuning; JVM tuning cannot solve every performance problem

 

These topics were covered in the previous two articles. If you are not familiar with them, review those articles first; reading this one without that background is not recommended.

 

Building on that background, this article uses the JVM's parameters to tune an application. The main contents are as follows:

1. The general process of JVM tuning

2. The performance metrics that JVM tuning is concerned with

3. The principles to keep in mind when tuning the JVM

4. Tuning strategies and examples

 

 

I. Levels of performance tuning

In order to improve system performance, we need to optimize all aspects and levels of the system. The following are several levels that need to be optimized.

From the above, we can see that JVM tuning is only one of several levels. Tuning a system therefore means tuning the system as a whole, not just the JVM. This article covers JVM tuning only; the other levels will be introduced later.

Before performing JVM tuning, we assume that the project's architecture and code have already been tuned, or are already optimal for the current project. These two are the foundation of JVM tuning, and architecture tuning has the greatest impact on the system. We cannot expect an application with a flawed architecture, or with code-level optimization still left undone, to make a qualitative leap through JVM tuning alone.

In addition, before tuning there must be a clear performance goal, and the performance bottleneck must be identified. After the bottleneck has been optimized, the application must be stress tested and benchmarked, and monitoring and statistics tools must be used to confirm that the tuned application actually meets the goal.

 

II. The JVM tuning process

The ultimate goal of tuning is to have the application deliver more throughput with the least hardware. JVM tuning is no exception. It mainly targets the collection performance of the garbage collector, so that applications running on the virtual machine achieve higher throughput with less memory and lower latency. Here "least" means the optimal amount, not simply as little as possible.

1. Performance Definition

To find and evaluate performance bottlenecks, we must first define performance. For JVM tuning, the following three attributes are the basis for evaluation:

  • Throughput: One of the key metrics. It is the peak application performance the garbage collector can sustain, measured without regard to the pause time or memory consumption caused by garbage collection.
  • Latency: Measured by how far the pauses caused by garbage collection are shortened or eliminated, so that the running application does not jitter.
  • Memory footprint: The amount of memory the garbage collector needs to run smoothly.

Improving any one of these three attributes almost always costs performance in one or both of the others. Which attribute matters most must be decided from the business needs of the application.

2. Performance tuning principles

During tuning, keep the following three principles in mind. They make it easier to tune garbage collection so that it meets the application's performance requirements.

1. The Minor GC collection principle: every Minor GC should collect as many garbage objects as possible, to reduce how often the application triggers a Full GC.

2. The GC memory maximization principle: when dealing with throughput and latency problems, the more memory the garbage collector can use, the more effective garbage collection is and the more smoothly the application runs.

3. The "two out of three" principle of GC tuning: of the three performance attributes (throughput, latency, and memory footprint), we can only tune for two, not all three.

3. Performance tuning process

 

The above is the basic process of JVM tuning for an application. JVM tuning is a process of repeatedly adjusting the configuration based on performance test results. Each step may take several iterations before every system requirement is met. Sometimes, to reach one particular target, earlier parameters have to be adjusted again, and then all the earlier steps have to be re-tested.

In addition, tuning generally starts with the program's memory footprint requirements, then the latency requirements, and finally the throughput requirements. Optimization proceeds in this order: each step is the basis for the next, and the order should not be reversed. Below we give detailed examples for each step.

For the JVM running mode, we choose server mode directly, which is also the officially recommended mode since JDK 1.6.

For the garbage collector, we use the default parallel collector of JDK 1.6-1.8 (ParallelGC for the new generation and ParallelOldGC for the old generation).

 

III. Determine the memory footprint

Before determining the memory footprint, we need to understand two things:

  1. The running phases of the application
  2. JVM memory allocation

 

1. Running phases

The running time of an application can be divided into the following three phases:

1. Initialization phase: the JVM loads the application and initializes its main modules and data.

2. Stable phase: the application spends most of its running time here. Under stress testing, the performance metrics have settled into a steady state, and the core functions have been executed and warmed up by JIT compilation.

3. Summary phase: in the final phase, some benchmarks are run and the corresponding reports are generated. We can ignore this phase.

The memory footprint and the size of the active data should be determined during the stable phase of the program, not during the initialization phase. To see how, let's look at JVM memory allocation next.

 

2. JVM memory allocation & parameters

The main memory spaces discussed here are the new generation, the old generation, and the permanent generation mentioned above. The heap size set with -Xms and -Xmx covers the new generation plus the old generation; the permanent generation is sized separately with its own parameters. How objects are promoted between generations is not covered in detail here. Let's look at the JVM command-line parameters that set these sizes. If you do not specify them, the virtual machine chooses suitable values automatically and also adjusts them based on system overhead.

 

Generation | Parameter | Description
---------- | --------- | -----------
Heap | -Xms | Initial heap size; defaults to 1/64 of physical memory (<1GB)
Heap | -Xmx | Maximum heap size. By default, when more than 70% of the heap is free (adjustable with the MaxHeapFreeRatio parameter), the JVM shrinks the heap, down to the -Xms minimum
New generation | -XX:NewSize | Initial size of the new generation space
New generation | -XX:MaxNewSize | Maximum size of the new generation space
New generation | -Xmn | Size of the new generation space (eden + 2 survivor spaces)
Permanent generation | -XX:PermSize | Initial and minimum size of the permanent generation space
Permanent generation | -XX:MaxPermSize | Maximum size of the permanent generation space
Old generation | (none) | Sized implicitly from the new generation size: initial value = -Xmx minus -XX:NewSize; minimum = -Xmx minus -XX:MaxNewSize

 

If performance overhead matters, set the initial and maximum sizes of the permanent generation to the same value, because resizing the permanent generation requires a Full GC.
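To check which heap limits a running JVM actually picked up from -Xms and -Xmx, the standard `Runtime` API can be queried from inside the application. The following is only a minimal sketch; the class name `HeapLimits` is made up for illustration, and the values reported are approximate views of the configured sizes.

```java
public class HeapLimits {
    /** Converts a byte count to whole mebibytes. */
    static long toMb(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will try to use (roughly -Xmx).
        System.out.println("max heap (MB):       " + toMb(rt.maxMemory()));
        // totalMemory(): heap currently committed (starts near -Xms).
        System.out.println("committed heap (MB): " + toMb(rt.totalMemory()));
        // freeMemory(): unused part of the committed heap.
        System.out.println("free heap (MB):      " + toMb(rt.freeMemory()));
    }
}
```

Running it once with default settings and once with explicit -Xms/-Xmx values shows how the parameters change what the JVM reports.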

 

3. Calculate the active data size

The active data size is calculated with the following process:

 

As mentioned earlier, the active data is the space occupied in the Java heap by long-lived objects while the application is in the stable phase.

When calculating the active data, make sure the following conditions hold:

1. During the test, the JVM is started with its default parameters; nothing is set manually.

2. The application is in the stable phase when the Full GC occurs.

The JVM is started with default parameters so that we can observe how much memory the application needs in the stable phase.

 

What is considered a stable stage?

Enough load must be generated to reproduce the state the application is in at its production peak. Once the peak has been reached and the application holds steady there, it can be considered to be in the stable phase. Stress testing is therefore essential for reaching the stable phase. How to stress test an application is not covered here; it will be introduced separately later.

 

Once the application is confirmed to be in the stable phase, watch its GC log, especially the Full GC entries.

GC log options: -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:<filename>

GC logs are the best way to collect information required for tuning. Even in a production environment, you can enable GC logs to locate problems. Enabling GC logs has minimal impact on performance, but can provide rich data.
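The GC log remains the primary source of data. As a rough in-process cross-check, the cumulative collection counts and times can also be read through the standard `java.lang.management` API. This is only a sketch: the class name `GcCounters` is made up for illustration, and the collector names printed depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCounters {
    /** Sums the collection counts of all collectors; undefined counts (-1) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector: cumulative number of collections and total time spent.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total collections so far: " + totalCollections());
    }
}
```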

 

A Full GC entry is required. If none has occurred, you can force one with a monitoring tool, or trigger one with the following command:

jmap -histo:live <pid>

 

When a Full GC is triggered in the stable phase, we generally get the following information:

From the GC log above, we can roughly determine the heap occupancy and the GC time of the whole application at the moment the Full GC occurs. For better accuracy, collect several Full GCs and take the average, or base the estimate on the longest Full GC.

 

In the figure above, the old generation occupies 93168 KB (about 93 MB) after the Full GC; we take this as the active data size of the old generation.
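The same "occupancy after the last collection" figure can be approximated from inside the JVM with the standard memory pool MXBeans. The sketch below is an illustration under assumptions, not a replacement for reading the GC log: the class name `LiveDataProbe` is hypothetical, pool names differ between collectors, and `System.gc()` is only a request that the JVM may ignore.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class LiveDataProbe {
    /** Bytes used in the pool right after its most recent collection, or -1 if the pool does not report it. */
    static long usedAfterLastGc(MemoryPoolMXBean pool) {
        MemoryUsage afterGc = pool.getCollectionUsage(); // null when unsupported
        return afterGc == null ? -1 : afterGc.getUsed();
    }

    public static void main(String[] args) {
        System.gc(); // request a full collection so the figures are fresh
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            long used = usedAfterLastGc(pool);
            System.out.println(pool.getName() + ": "
                    + (used < 0 ? "n/a" : (used / 1024) + " KB used after last GC"));
        }
    }
}
```

The value reported for the old generation pool, sampled in the stable phase, plays the role of the active data size used in the rest of this section.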

The rest of the heap is sized according to the following rules.

Space | Command parameters | Recommended multiple
----- | ------------------ | --------------------
Java heap | -Xms and -Xmx | 3-4 times the old generation occupancy after Full GC
Permanent generation | -XX:PermSize, -XX:MaxPermSize | 1.2-1.5 times the permanent generation occupancy after Full GC
New generation | -Xmn | 1-1.5 times the old generation occupancy after Full GC
Old generation | (none) | 2-3 times the old generation occupancy after Full GC

Based on the rules above and the Full GC information in the figure above, we can now plan the application's heap space as follows:

Java heap space: 373 MB (= old generation occupancy 93168 KB * 4)

New generation space: 140 MB (= old generation occupancy 93168 KB * 1.5)

Permanent generation space: 5 MB (= permanent generation occupancy 3135 KB * 1.5)

Old generation space: 233 MB (= heap space minus new generation space = 373 MB - 140 MB)

The corresponding application startup parameters should be:

java -Xms373m -Xmx373m -Xmn140m -XX:PermSize=5m -XX:MaxPermSize=5m
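The sizing arithmetic above can be sketched in a few lines of Java. The class and method names are hypothetical; the factors are the upper ends of the recommended ranges, and 1 MB is taken as 1000 KB with rounding up, which is an assumption chosen to match the rounding used in the figures above.

```java
public class HeapPlan {
    /** Scales an occupancy in KB by a factor and rounds up to whole MB (1 MB = 1000 KB here). */
    static long scaledMb(long occupancyKb, double factor) {
        return (long) Math.ceil(occupancyKb * factor / 1000.0);
    }

    /** Builds the sizing flags from the old and permanent generation occupancy after a Full GC. */
    static String flags(long oldGenKb, long permGenKb) {
        long heapMb = scaledMb(oldGenKb, 4.0);   // Java heap: 3-4x the active data
        long youngMb = scaledMb(oldGenKb, 1.5);  // new generation: 1-1.5x the active data
        long permMb = scaledMb(permGenKb, 1.5);  // permanent generation: 1.2-1.5x its occupancy
        return "-Xms" + heapMb + "m -Xmx" + heapMb + "m -Xmn" + youngMb + "m"
                + " -XX:PermSize=" + permMb + "m -XX:MaxPermSize=" + permMb + "m";
    }

    public static void main(String[] args) {
        // Occupancy values taken from the Full GC example in the text.
        System.out.println(flags(93168, 3135));
    }
}
```

The old generation is not set directly; it is whatever remains of the heap after the new generation is subtracted.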

 

IV. Latency tuning

After determining the application's active data size, we move on to latency tuning. The heap size chosen so far may not meet the application's latency requirements, so it has to be adjusted to the application's actual behavior.

In this step we may adjust the heap size configuration again, evaluate the duration and frequency of GC, and decide whether to switch to a different garbage collector.

 

1. System latency requirements

Before tuning, we need to know the system's latency requirements and the corresponding metrics to tune against.

  • Acceptable average pause time: compared against the measured Minor GC duration.
  • Acceptable Minor GC frequency: compared against the measured Minor GC frequency.
  • Acceptable maximum pause time: compared against the duration of the worst-case Full GC.
  • Acceptable maximum pause frequency: essentially the frequency of Full GC.

Of these, the average pause time and the maximum pause time matter most to the user experience and deserve the most attention.

Based on the requirements above, we need to collect the following data:

  • Duration of Minor GC;
  • Number of Minor GCs;
  • Worst-case duration of Full GC;
  • Worst-case frequency of Full GC.

 

2. Optimize the size of the new generation

 

For example, in the GC log above, the average Minor GC duration is 0.069 seconds, and a Minor GC occurs once every 0.389 seconds.

If the average pause time our system allows is 50 ms, the current 69 ms is clearly too long and needs to be adjusted.

The larger the new generation space, the longer each Minor GC takes and the less often Minor GC occurs.

To reduce the duration, shrink the new generation.

To reduce the frequency, enlarge the new generation.
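The two averages used in this comparison can be computed from the start time and pause length of each Minor GC entry in the log. The sketch below uses made-up sample values, chosen only so that the averages come out near the 0.069 s and 0.389 s quoted above; the class name `MinorGcStats` and the 50 ms target variable are likewise illustrative.

```java
public class MinorGcStats {
    /** Arithmetic mean of the given values. */
    static double average(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Average gap between consecutive GC start times in seconds; needs at least two entries. */
    static double averageInterval(double[] startTimes) {
        return (startTimes[startTimes.length - 1] - startTimes[0]) / (startTimes.length - 1);
    }

    public static void main(String[] args) {
        // Hypothetical samples: seconds since JVM start and pause length of each Minor GC.
        double[] startTimes = {1.000, 1.389, 1.778, 2.167};
        double[] durations = {0.066, 0.071, 0.068, 0.071};
        double targetPause = 0.050; // acceptable average pause in seconds

        double avgPause = average(durations);
        System.out.println("average Minor GC pause (s):    " + avgPause);
        System.out.println("average Minor GC interval (s): " + averageInterval(startTimes));
        System.out.println("meets pause target: " + (avgPause <= targetPause));
    }
}
```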

 

To minimize the impact of resizing the new generation on the other regions, keep the size of the old generation unchanged as far as possible when changing the size of the new generation.

For example, suppose the new generation is reduced by 10% this time, while the old generation and the permanent generation stay the same. After this first tuning step, the parameters become:

java -Xms359m -Xmx359m -Xmn126m -XX:PermSize=5m -XX:MaxPermSize=5m

The new generation shrinks from 140 MB to 126 MB and the heap size changes accordingly; the old generation is unchanged.

 

3. Optimize the size of the old generation

As in the previous step, GC log data must be collected before optimizing. This time we focus on the duration and frequency of Full GC.

 

In the figure above, we can see:

Average Full GC interval = 5.8 s

Average Full GC duration = 0.14 s

(These values come from a test; in a real project Full GC does not occur this often.)

 

If there is no Full GC log, is there another way to estimate this?

We can calculate it from the object promotion rate.

 

Object promotion rate

For example, with the startup parameters above, the old generation size is 233 MB.

How long it takes to fill those 233 MB depends on the rate at which objects are promoted from the new generation to the old generation.

Old generation occupancy after each Minor GC = Java heap occupancy after that Minor GC minus new generation occupancy after that Minor GC

Object promotion rate = average increase in old generation occupancy per Minor GC divided by the average interval between Minor GCs

With the object promotion rate, we can estimate how long (and therefore how many Minor GCs) it takes to fill the old generation, which is approximately the interval between Full GCs.

 

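For example:

In the figure above:

After the 1st Minor GC, old generation occupancy: 13740 KB - 13732 KB = 8 KB

After the 2nd Minor GC, old generation occupancy: 22394 KB - 17905 KB = 4489 KB

After the 3rd Minor GC, old generation occupancy: 34739 KB - 17917 KB = 16822 KB

After the 4th Minor GC, old generation occupancy: 48143 KB - 17913 KB = 30230 KB

After the 5th Minor GC, old generation occupancy: 62112 KB - 17917 KB = 44195 KB

Increase in old generation occupancy between consecutive Minor GCs:

4481 KB between the 1st and 2nd Minor GC

12333 KB between the 2nd and 3rd Minor GC

13408 KB between the 3rd and 4th Minor GC

13965 KB between the 4th and 5th Minor GC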

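From this we can calculate:

The average promotion per Minor GC is (4481 + 12333 + 13408 + 13965) / 4 ≈ 11047 KB, about 11 MB

In the figure above, the average Minor GC interval is 213 ms

Promotion rate = 11047 KB / 213 ms ≈ 52 KB/ms

Filling the 233 MB old generation takes about 233 * 1024 / 52 ≈ 4588 ms, roughly 4.6 s.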
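The promotion-rate estimate can be written out as a short Java sketch. It takes the old generation occupancy after each Minor GC listed above, the 213 ms average Minor GC interval, and the 233 MB old generation as inputs; the class and method names are hypothetical, and the result is only as good as the assumption that promotion continues at the same average rate.

```java
public class PromotionRate {
    /** Average growth in KB between consecutive old generation occupancy samples; needs at least two samples. */
    static double averagePromotionKb(long[] oldGenKb) {
        return (double) (oldGenKb[oldGenKb.length - 1] - oldGenKb[0]) / (oldGenKb.length - 1);
    }

    /** Milliseconds needed to fill the old generation at a steady promotion rate. */
    static double millisToFill(long oldGenCapacityKb, double promotedKbPerGc, double gcIntervalMs) {
        double rateKbPerMs = promotedKbPerGc / gcIntervalMs;
        return oldGenCapacityKb / rateKbPerMs;
    }

    public static void main(String[] args) {
        // Old generation occupancy after each of the five Minor GCs in the example.
        long[] oldGenKb = {8, 4489, 16822, 30230, 44195};
        double promotedPerGc = averagePromotionKb(oldGenKb);
        double fillMs = millisToFill(233L * 1024, promotedPerGc, 213.0);

        System.out.println("average promotion per Minor GC (KB): " + promotedPerGc);
        System.out.println("estimated time to fill old generation (ms): " + Math.round(fillMs));
    }
}
```

Dividing the estimated fill time by the Minor GC interval gives the approximate number of Minor GCs between two Full GCs.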

 

The expected worst-case frequency and duration of Full GC can be estimated with the two methods above. You can change the size of the old generation to adjust the frequency of Full GC. If the Full GC duration is too long to meet the application's worst-case latency requirement, you need to switch garbage collectors. How to switch will be covered in the next article; for example, after switching to CMS, the tuning approach is slightly different.

 

V. Throughput Tuning

 

After the long tuning process above, we finally reach the last step. This step tests the throughput of the results so far and makes fine adjustments.

Throughput tuning is driven mainly by the application's throughput requirements. The application should have an overall throughput target derived from the requirements and from testing of the real application. When the application's throughput reaches or exceeds the target, the whole tuning process can be concluded successfully.

If the throughput target still cannot be reached after tuning, review the throughput requirement and assess how large the gap between the current throughput and the target is. Consider the application as a whole, check whether the design is consistent with the goals, and re-evaluate the throughput target.

For the garbage collector, tuning for throughput means avoiding, or making rare, Full GCs and stop-the-world compacting collections (with CMS), because both reduce application throughput. Collect as many objects as possible during Minor GC so that objects are not promoted to the old generation too quickly.

 

VI. Finally

According to a study by Plumbr on which garbage collectors are actually used, based on 84,936 cases, only about 13% of the cases specified a garbage collector explicitly, and among those the concurrent collector (CMS) was used most often. In most cases, about 87%, no garbage collector was chosen explicitly.

 

JVM tuning is systematic and complicated work. The automatic tuning built into the JVM is already quite good, and a few basic initial parameters are enough to keep a typical application running stably. For some teams, program performance may not be a high priority, and the default garbage collector is enough. Tune according to your own situation.
