JVM performance tuning methods and templates

Taking a system with 1 million login requests per day and 8 GB of memory per machine as an example, setting the JVM parameters can be broken down roughly into the following steps.

The first step: how to plan capacity when a new system goes online

1. The general approach

Before any new business system goes online, its server configuration and JVM memory parameters need to be estimated. This capacity and resource planning is not something the system architect simply guesses at; it has to be derived from the business scenario the system serves, by building an operating model of the system and using it to evaluate indicators such as JVM memory usage and GC frequency. Below are the modeling steps I have summarized from the experience of seasoned engineers and my own practice:

  • Calculate how much memory the objects created by the business system occupy per second, and from that the per-second memory footprint of each instance in the cluster (the object creation rate);

  • Pick a candidate machine configuration, estimate the size of the young generation, and compare how often MinorGC would be triggered under different young-generation sizes;

  • To avoid frequent GC, re-estimate what machine configuration is needed, how many machines to deploy, how much memory to give the JVM, and how much of it to give to the young generation;

  • With this configuration, the operating model of the whole system can be worked out: how many objects are created per second, how long they take to become garbage (here, about 1 second), how long the system runs before the young generation triggers a GC, and how frequently that happens.

2. The approach in practice: the login system as an example

Some readers may still be puzzled after seeing these steps: they sound reasonable, but how do you apply them in a real project? Talk without practice is an empty show, so let's use the login system to walk through the deduction:

  • Assume 1 million login requests per day, with the login peak in the morning; at peak this works out to roughly 100 login requests per second;

  • Assume 3 servers are deployed, so each machine handles about 30 login requests per second. If a login request takes about 1 second to process, roughly 30 login objects are created per second in the JVM young generation, and they become garbage about 1 second later once the request completes;

  • Assume a login request object has about 20 fields and is roughly 500 bytes, so 30 logins occupy about 15 KB per second. Taking into account RPC and DB calls, network communication, database writes and cache writes, one request can amplify this by 20-50x, so roughly a few hundred KB to 1 MB of objects are generated per second;

  • If 2C4G machines are deployed with a 2 GB heap, the young generation is only a few hundred MB; at a garbage-generation rate of about 1 MB/s, a MinorGC would be triggered every few hundred seconds;

  • If 4C8G machines are deployed with a 4 GB heap and a 2 GB young generation, it can take anywhere from tens of minutes to a few hours to trigger a MinorGC, depending on where in the estimated range the allocation rate falls.

Therefore, it can be roughly inferred that for a login system with 1 million requests per day, a cluster of three 4C8G instances, each with a 4 GB heap and a 2 GB young generation, can comfortably carry the normal load.
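To make the arithmetic above concrete, here is a minimal back-of-the-envelope sketch in Java. All the inputs (100 requests/s at peak, 3 instances, 500 bytes per login object, a 30x amplification factor, a 2 GB young generation with SurvivorRatio=8) are the assumptions from the example above, not measured values:

// Back-of-the-envelope capacity model for the login example above.
// All inputs are assumptions from the text, not measured values.
public class CapacityEstimate {
    public static void main(String[] args) {
        int peakRequestsPerSecond = 100; // ~1 million requests/day, concentrated at the peak
        int instances = 3;               // application servers in the cluster
        int bytesPerLoginObject = 500;   // a login object with ~20 fields
        int amplification = 30;          // RPC/DB/cache amplification, assumed to be 20-50x

        // Object creation rate per instance, in bytes per second (~33 requests/s per instance)
        long allocPerSecond = (long) (peakRequestsPerSecond / instances)
                * bytesPerLoginObject * amplification;

        // A 2 GB young generation with -XX:SurvivorRatio=8 gives Eden 8/10 of it
        long edenBytes = 2L * 1024 * 1024 * 1024 * 8 / 10;

        long secondsUntilMinorGC = edenBytes / allocPerSecond;
        System.out.printf("Allocation rate: %d KB/s%n", allocPerSecond / 1024);
        System.out.printf("Eden fills in  : ~%d seconds (~%d minutes)%n",
                secondsUntilMinorGC, secondsUntilMinorGC / 60);
    }
}

With these assumptions a single instance fills Eden roughly once an hour, which matches the conclusion that the 4C8G configuration with a 2 GB young generation triggers MinorGC only occasionally.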

This is, in essence, how the resources of a new system are evaluated: how much capacity and what configuration each instance needs, and how many instances the cluster should have, cannot be decided by gut feeling; it has to be deduced as above.

The second step: how to choose a garbage collector

Throughput or response time?

First, let's introduce two concepts: throughput and response time (low latency).

Throughput = time the CPU spends running application code / (time the CPU spends running application code + time the CPU spends on garbage collection)

Response time = the average time taken by each GC pause
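To illustrate the two formulas with numbers, here is a minimal sketch; the figures (a 10-minute window with 594 s of application time and 6 s of GC time over 30 collections) are made up purely for illustration:

// Illustrates the throughput and average-pause formulas with made-up numbers.
public class GcMetrics {
    public static void main(String[] args) {
        double appSeconds = 594.0;   // time the CPU spent running application code
        double gcSeconds  = 6.0;     // time the CPU spent in garbage collection
        int    gcCount    = 30;      // number of collections in the window

        double throughput = appSeconds / (appSeconds + gcSeconds);
        double avgPauseMs = gcSeconds / gcCount * 1000;

        System.out.printf("Throughput  : %.1f%%%n", throughput * 100);  // 99.0%
        System.out.printf("Avg GC pause: %.0f ms%n", avgPauseMs);       // 200 ms
    }
}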

Throughput priority and response-time priority are usually at odds with each other in JVM tuning.

As heap memory grows, each GC can reclaim more objects in one pass, so throughput rises; but a single GC also takes longer, so threads paused behind it wait longer. Conversely, with a smaller heap a single GC is short and paused threads wait less, so latency drops, but collections happen more often and less work is done per collection, so throughput falls (the relationship is not strictly linear).

The two cannot both be maximized at the same time; choosing throughput priority or response-time priority is a trade-off.

CMS and G1

The mainstream garbage collector configuration today is ParNew for the young generation combined with CMS for the old generation, or G1 used for the entire heap.

Looking at future trends, G1 is the collector that is officially maintained and recommended (CMS has been deprecated since JDK 9).

For business systems:

  • If latency-sensitive, CMS is recommended;

  • For large-memory services that need high throughput, use the G1 collector.

 How the CMS garbage collector works

CMS is mainly an old-generation collector. It uses a mark-sweep algorithm, and by default it compacts the old generation after a Full GC to clean up memory fragmentation.

 

  • Advantages: concurrent collection, focused on low latency. The two most time-consuming phases (concurrent mark and concurrent sweep) run alongside the application without stop-the-world pauses, and the phases that do require STW (initial mark and remark) finish very quickly.

  • Disadvantages: 1. CPU consumption; 2. Floating garbage; 3. Memory fragmentation

  • Applicable scenarios: services that value response speed and require the shortest possible pause times.

In short:

For latency-sensitive business systems, CMS is recommended;

for large-memory services that need high throughput, use the G1 collector.

The third step: how to plan the size and proportion of each heap region

First of all, the most important and core JVM parameters are the ones that size memory. The first thing to specify is the heap size:

  • -Xms initial heap size
  • -Xmx maximum heap size

Next, specify -Xmn, the size of the young generation. This parameter is critical and the most flexible one.

  • -Xmn size of the young generation

Sun's official recommendation is 3/8 of the heap, but the value should be decided by the business scenario. For stateless or lightly stateful services (the most common kind of business system today, such as web applications), the young generation can even be given as much as 3/4 of the heap;

For stateful services (such as IM services, gateway access layers and similar systems), the young generation can be set to the default ratio of about 1/3. A stateful service keeps more local cache and session state resident in memory, so the old generation should be given a larger share to hold those objects.

Finally, set -Xss, the stack size of each individual thread.

The default value depends on the JDK version and operating system, and is generally 512 KB to 1 MB. If a background service has hundreds of resident threads, their stacks alone can occupy hundreds of MB.
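One way to verify how the JVM actually carved up the heap after these flags are set is the standard java.lang.management API. A minimal sketch follows; the pool names (such as "Par Eden Space" with ParNew) vary by collector and JDK version:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Prints the maximum size of each memory pool (Eden, Survivor, Old Gen, ...)
// so the effect of -Xmx / -Xmn / -XX:SurvivorRatio can be checked at runtime.
public class ShowMemoryPools {
    public static void main(String[] args) {
        long heapMax = Runtime.getRuntime().maxMemory();
        System.out.printf("Heap max: %d MB%n", heapMax / 1024 / 1024);

        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getUsage() == null) continue;   // skip invalid pools
            long max = pool.getUsage().getMax();     // -1 if undefined
            System.out.printf("%-20s max=%d MB%n", pool.getName(),
                    max > 0 ? max / 1024 / 1024 : -1);
        }
    }
}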

A generic JVM parameter template

A ParNew + CMS collector template for a 4C8G machine (response-time priority); the young-generation size can be adjusted flexibly according to the business:

-Xms4g
-Xmx4g
-Xmn2g
-Xss1m
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=10
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+AlwaysPreTouch
-XX:+HeapDumpOnOutOfMemoryError
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-Xloggc:gc.log
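One thing worth watching with this CMS template is the floating-garbage problem mentioned earlier: if the old generation fills up before a concurrent cycle finishes, CMS degrades into a long stop-the-world Full GC, and the JDK 8 GC log records a "concurrent mode failure". A minimal sketch that scans the gc.log written by the template above for such events (the exact log wording depends on the JDK version):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Counts "concurrent mode failure" / "promotion failed" occurrences in gc.log.
// Frequent hits usually mean the old generation is too small or
// CMSInitiatingOccupancyFraction is set too high.
public class GcLogScan {
    public static void main(String[] args) throws IOException {
        // gc.log is the file configured with -Xloggc:gc.log in the template above
        try (Stream<String> lines = Files.lines(Paths.get("gc.log"))) {
            long failures = lines
                    .filter(line -> line.contains("concurrent mode failure")
                            || line.contains("promotion failed"))
                    .count();
            System.out.println("CMS degradation events: " + failures);
        }
    }
}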

If GC throughput is the priority, G1 is recommended. A G1 collector template for an 8C16G machine:

The G1 collector already has its own prediction and adjustment mechanism, so the first choice is to trust it: tune only the -XX:MaxGCPauseMillis=N parameter. This also matches G1's design goal of making GC tuning as simple as possible.

At the same time, do not explicitly set the young-generation size yourself (with -Xmn or -XX:NewRatio); if the young-generation size is fixed by hand, the pause-time target parameter is effectively invalidated.

-Xms8g
-Xmx8g
-Xss1m
-XX:+UseG1GC
-XX:MaxGCPauseMillis=150
-XX:InitiatingHeapOccupancyPercent=40
-XX:+HeapDumpOnOutOfMemoryError
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-Xloggc:gc.log

The effect of -XX:MaxGCPauseMillis has a clear tendency. Lowering it ↓: lower latency, but MinorGC becomes more frequent, each MixedGC reclaims fewer old regions, and the risk of Full GC rises. Raising it ↑: more objects are reclaimed per collection, but the overall response time of the system also grows.

The effect of -XX:InitiatingHeapOccupancyPercent also differs by direction. Lowering it ↓: the concurrent marking cycle and MixedGC are triggered earlier, which can waste CPU. Raising it ↑: more garbage accumulates across regions before collection, increasing the risk of Full GC.
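Whether a given -XX:MaxGCPauseMillis / -XX:InitiatingHeapOccupancyPercent combination is actually being met can be observed from inside the application with the standard GarbageCollectorMXBean API. A minimal sketch (with G1 the beans are named "G1 Young Generation" and "G1 Old Generation"):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints cumulative GC count and time per collector, plus the average pause,
// so the effect of the pause-time target can be observed at runtime.
public class GcPauseMonitor {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count  = gc.getCollectionCount();  // -1 if unavailable
            long timeMs = gc.getCollectionTime();   // cumulative, in milliseconds
            double avg  = count > 0 ? (double) timeMs / count : 0.0;
            System.out.printf("%-22s count=%d totalTime=%dms avgPause=%.1fms%n",
                    gc.getName(), count, timeMs, avg);
        }
    }
}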

Tuning summary

Comprehensive tuning ideas before the system goes online:

1. Business estimation: based on the expected concurrency and the average memory requirement of each task, evaluate how many machines are needed to carry the load and what configuration each machine needs.

2. Capacity estimation: based on the system's task processing speed, allocate reasonable sizes for the Eden and Survivor areas and for the old generation.

3. Collector selection: for response-time-priority systems, the ParNew + CMS combination is recommended; for throughput-priority, multi-core, large-memory services (heap ≥ 8 GB), the G1 collector is recommended.

4. Optimization ideas: let short-lived objects be reclaimed during MinorGC (and keep the objects surviving each collection below 50% of a Survivor space, so they stay in the young generation); let long-lived objects enter the old generation as early as possible instead of being copied back and forth within the young generation; and minimize the frequency of Full GC to avoid its impact on the system.

5. The tuning process summarized so far applies mainly to the testing and verification stage before going online, so try to get each machine's JVM parameters as close to optimal as possible before launch.

JVM tuning is only a means to an end, and not every problem can be solved by it. Most Java applications do not need JVM-level optimization at all. The following principles can be followed:

  • Before going online, you should consider setting the JVM parameters of the machine to the optimum;

  • Reduce the number of objects created (code level);

  • Reduce the use of global variables and large objects (code level);

  • Prioritize architecture and code tuning; JVM optimization is a last resort (code, architecture level);

  • Analyzing the GC situation to optimize the code is better than optimizing the JVM parameters (code level);

 
