JVM Performance Tuning: Memory Optimization

JVM tuning is a systematic and complex process, but in most cases we barely need to adjust the JVM's memory allocation: the default initialization parameters are usually enough to keep an application service running stably.

In real application scenarios, performance loss caused by unreasonable JVM memory allocation is far less visible than an outright memory overflow. Unless you dig into the performance metrics, this hidden loss is hard to spot.

The stress-testing tool ab

ab (ApacheBench) is a benchmarking tool provided by Apache. It is simple to use and very handy for testing web services.

ab is generally used on Linux.

Installation is simple: run yum -y install httpd-tools (or sudo apt-get install apache2-utils) on your Linux system.

After installation, running the ab command with no arguments prints its usage information.


ab makes it convenient to test GET and POST endpoints; you can specify the number of requests, the concurrency level, and the request parameters via command-line options.

Testing a GET endpoint

ab -c 10 -n 100 'http://www.test.api.com/test/login?userName=test&password=test'

Testing a POST endpoint

ab -c 10 -n 100 -p 'post.txt' -T 'application/x-www-form-urlencoded' 'http://test.api.com/test/register'


post.txt is the file holding the POST parameters, in this format:

userName=test&password=test&sex=1

Meaning of the options:

-n: total number of requests (default and minimum 1);

-c: concurrency level (default and minimum 1; it cannot exceed the total number of requests; for example, 10 requests at concurrency 10 means each simulated user sends exactly one request);

-p: path of the file holding the POST body (-p and -T should be used together);

-T: Content-Type header for the request body (note the uppercase T);

Key performance metrics in the output:


Requests per second: the throughput rate, i.e. the number of requests handled per unit time at a given concurrency level;

Time per request (first occurrence): the average wait time per user request, i.e. total test time / (total requests / concurrency);

Time per request, across all concurrent requests (second occurrence): the average server-side processing time per request, i.e. total test time / total requests;

Percentage of the requests served within a certain time: the latency distribution of the requests. For example, if 50% of requests are served within 8 ms and 66% within 10 ms, then 16% of requests took between 8 ms and 10 ms.
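As a check on the two Time-per-request formulas above, here is a worked example with assumed inputs (60 s total test time, 100,000 requests, concurrency 10; the numbers are illustrative, not from a real run):

```java
// Worked example of ab's two "Time per request" lines; all inputs are assumed.
public class AbMath {
    public static void main(String[] args) {
        double totalTimeMs = 60_000;   // total test time in ms (assumed)
        double requests = 100_000;     // -n
        double concurrency = 10;       // -c

        // First line: mean time per user request
        double perUser = totalTimeMs / (requests / concurrency);   // 6.0 ms
        // Second line: mean across all concurrent requests (server-side)
        double perRequest = totalTimeMs / requests;                // 0.6 ms

        System.out.println("Time per request (mean): " + perUser + " ms");
        System.out.println("Time per request (across all concurrent requests): " + perRequest + " ms");
    }
}
```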

JVM heap memory allocation

A tuning case for JVM memory allocation


Consider a flash-sale endpoint in a high-concurrency system: 50,000 concurrent requests at peak, with each request creating about 20 KB of objects (orders, users, coupons, and other object data).

We can simulate this heavy object churn with an endpoint that allocates a 1 MB object per request, driven by 10,000 concurrent requests. The code is as follows:
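A minimal sketch of such an endpoint, using the JDK's built-in com.sun.net.httpserver rather than a web framework (the class name is invented; the port and path follow the test URL http://127.0.0.1:8080/jvm/heap used later; the real project was presumably packaged as jvm-1.0-SNAPSHOT.jar):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// Hypothetical stand-in for the test endpoint: every request to /jvm/heap
// allocates a 1 MB byte array, simulating per-request object churn.
public class HeapPressureServer {

    static final int ONE_MB = 1024 * 1024;

    // Allocate the 1 MB payload a single request would create.
    public static byte[] allocate() {
        return new byte[ONE_MB];
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/jvm/heap", exchange -> {
            byte[] garbage = allocate();   // becomes unreachable once the request ends
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        System.out.println("listening on http://127.0.0.1:8080/jvm/heap");
    }
}
```

Each request's 1 MB array becomes garbage as soon as the response is written, so Eden fills in proportion to the request rate.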


AB pressure test

Stress-test the application service to observe how it responds under different concurrency levels:

1. 10 concurrent users/100,000 requests (total)

2. 100 concurrent users/100,000 requests (total)

3. 1,000 concurrent users/100,000 requests (total)

ab -c 10 -n 100000 http://127.0.0.1:8080/jvm/heap

ab -c 100 -n 100000 http://127.0.0.1:8080/jvm/heap

ab -c 1000 -n 100000 http://127.0.0.1:8080/jvm/heap

Server information

The test server is a Linux virtual machine with 2 GB of allocated memory and 2 processors.


GC monitoring

As the saying goes: no monitoring, no tuning. So we monitor first. In the JVM, the jstat command reports the GC activity of a running process.

To collect GC statistics:

jstat -gc 8404 5000 20 | awk '{print $13,$14,$15,$16,$17}'

This samples process 8404 every 5000 ms, 20 times, and keeps only the columns we need (YGC, YGCT, FGC, FGCT, GCT).
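The same counters are also exposed inside the JVM through the standard management API; a small sketch (class name invented):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Programmatic alternative to jstat: read the collection count and the
// accumulated collection time for each garbage collector in this JVM.
public class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```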

Heap space monitoring

If the heap size is not configured explicitly, the JVM derives the current heap sizes from built-in defaults.

We can use the following command to view the default value of the heap memory configuration:

java -XX:+PrintFlagsFinal -version | grep HeapSize
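The effective limits can also be read from inside a running application via Runtime; a sketch (class name invented; note that maxMemory() only roughly corresponds to the MaxHeapSize that -XX:+PrintFlagsFinal reports):

```java
// Query the current JVM's heap figures from inside the application.
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap   : " + rt.maxMemory() / (1024 * 1024) + " MB");
        System.out.println("total heap : " + rt.totalMemory() / (1024 * 1024) + " MB");
        System.out.println("free heap  : " + rt.freeMemory() / (1024 * 1024) + " MB");
    }
}
```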


Starting the test project


Using jmap -heap <pid>, we can inspect the heap space used by this JVM application.

Pressure test results

10 concurrent users / 100,000 requests (total)

Use AB for stress test:

ab -c 10 -n 100000 http://127.0.0.1:8080/jvm/heap

Collect GC statistics:

jstat -gc 9656 5000 20 | awk '{print $13,$14,$15,$16,$17}'

Test Results

  1. Throughput: about 1426 requests/second
  2. Average server-side processing time: about 0.7 ms
  3. The JVM performed more than 2700 Young GCs taking 15 seconds and 45 Full GCs taking about 2.3 seconds; total GC time was about 17 seconds.

100 concurrent users / 100,000 requests (total)

Use AB for stress test:

ab -c 100 -n 100000 http://127.0.0.1:8080/jvm/heap

Test Results

  1. Throughput: about 1262 requests/second
  2. Average server-side processing time: about 0.8 ms
  3. More than 2,700 Young GCs taking 30 seconds and 56 Full GCs taking about 3 seconds; total GC time was about 33 seconds.


1,000 concurrent users / 100,000 requests (total)

Use AB for stress test:

ab -c 1000 -n 100000 http://127.0.0.1:8080/jvm/heap

Test Results

  1. Throughput: about 1145 requests/second
  2. Average server-side processing time: about 0.8 ms
  3. More than 2700 Young GCs taking 38 seconds and 47 Full GCs taking about 3 seconds; total GC time was about 42 seconds.

Result analysis

GC frequency

Frequent Full GC imposes a very large performance cost on the system. Minor GC is much cheaper than Full GC, but too many Minor GCs still put pressure on the system.

Heap memory size

Heap memory is divided into the young generation and the old generation. An undersized heap increases Minor GC frequency and hurts system performance.

Throughput

Frequent GC causes thread context switches, adding performance overhead and reducing the number of requests the threads can process, which ultimately lowers system throughput. (A stop-the-world pause suspends application threads, forcing them to yield the CPU so other threads can run; that hand-off is a context switch.)


Delay

The JVM's GC pause time also adds to the response time of each request.

Tuning plan

Option one (increase the heap size)

10 concurrent users / 100,000 requests (total)

The heap was essentially exhausted, and large numbers of Minor GCs and Full GCs occurred, which means the heap is seriously undersized. So the first step is to enlarge the maximum heap to reduce the number of GCs.

Increase the heap to 1.5 GB:

java -Xms1500m -Xmx1500m -jar jvm-1.0-SNAPSHOT.jar

Use AB for stress test:  

ab -c 10 -n 100000 http://127.0.0.1:8080/jvm/heap

  1. Throughput: about 1205 requests/second
  2. Average server-side processing time: about 0.83 ms
  3. 800 Young GCs taking 33 seconds and 1 Full GC taking about 1 second; total GC time was about 34 seconds.

100 concurrent users / 100,000 requests (total)

Use AB for stress test:

ab -c 100 -n 100000 http://127.0.0.1:8080/jvm/heap

  1. Throughput: about 989 requests/second
  2. Average server-side processing time: about 1.01 ms
  3. 800 Young GCs taking 46 seconds and 8 Full GCs taking about 6 seconds; total GC time was about 52 seconds.

1,000 concurrent users / 100,000 requests (total)

Use AB for stress test:

ab -c 1000 -n 100000 http://127.0.0.1:8080/jvm/heap

 

 

  1. Throughput: about 749 requests/second
  2. Average server-side processing time: about 1.3 ms
  3. 800 Young GCs taking 66 seconds and 8 Full GCs taking about 9 seconds; total GC time was about 75 seconds.

These results may come as a surprise: after this "optimization", throughput dropped and total GC time grew longer. Why? Hold that question; we keep testing.

Option two (adjust the young generation size and the Eden/Survivor ratio)

Keep the heap at 1.5 GB, allocate 1 GB to the young generation, and fix the Eden-to-Survivor ratio at 8:1:1. Continue testing.
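With -Xmn1000m and -XX:SurvivorRatio=8, the young generation divides 8:1:1 between Eden and the two Survivor spaces; the resulting sizes follow from a little arithmetic (an illustrative sketch, ignoring the JVM's alignment of region sizes):

```java
// Illustrative arithmetic: how -Xmn1000m -XX:SurvivorRatio=8 splits
// the young generation into Eden and two Survivor spaces (8:1:1).
public class YoungGenSplit {
    public static void main(String[] args) {
        int youngMb = 1000;      // -Xmn1000m
        int survivorRatio = 8;   // -XX:SurvivorRatio=8 -> Eden : S0 : S1 = 8 : 1 : 1
        int survivorMb = youngMb / (survivorRatio + 2); // 100 MB each
        int edenMb = survivorMb * survivorRatio;        // 800 MB
        System.out.println("Eden=" + edenMb + "MB, S0=S1=" + survivorMb + "MB");
    }
}
```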

10 concurrent users / 100,000 requests (total)

java -Xms1500m -Xmx1500m -Xmn1000m -XX:SurvivorRatio=8 -jar jvm-1.0-SNAPSHOT.jar

Use AB for stress test:

 ab -c 10 -n 100000 http://127.0.0.1:8080/jvm/heap

  1. Throughput: about 1780 requests/second
  2. Average server-side processing time: about 0.56 ms
  3. 400 Young GCs taking 5.8 seconds and 2 Full GCs taking about 0.1 seconds; total GC time was about 6 seconds.


100 concurrent users / 100,000 requests (total)

java -Xms1500m -Xmx1500m -Xmn1000m -XX:SurvivorRatio=8 -jar jvm-1.0-SNAPSHOT.jar

Use AB for stress test:

 ab -c 100 -n 100000 http://127.0.0.1:8080/jvm/heap

  1. Throughput: about 1927 requests/second
  2. Average server-side processing time: about 0.51 ms
  3. More than 400 Young GCs taking 11 seconds and no Full GC; total GC time was about 11 seconds.


1,000 concurrent users / 100,000 requests (total)

java -Xms1500m -Xmx1500m -Xmn1000m -XX:SurvivorRatio=8 -jar jvm-1.0-SNAPSHOT.jar

Use AB for stress test:

 ab -c 1000 -n 100000 http://127.0.0.1:8080/jvm/heap

  1. Throughput: about 1657 requests/second
  2. Average server-side processing time: about 0.6 ms
  3. More than 400 Young GCs taking 14 seconds and 1 Full GC taking about 3 seconds; total GC time was about 17 seconds.

After this second round of tuning, performance improved significantly. What is the principle behind it?

Memory optimization summary

In general, a high-concurrency business scenario needs a relatively large heap, but the default parameters will not give you one, so adjustment is needed.

But don't just enlarge the total heap: also adjust the ratio of young generation to old generation, and of the Eden, From, and To regions.

That is why option two produced the best result in our tests: all three scenarios showed good performance indicators, and GC time stayed well under control.

Option one simply enlarged the heap. Its internal ratios are unsuitable for a high-concurrency scenario: the total heap grew severalfold, but Eden grew by only a fraction. Since the main trigger for a young-generation collection is Eden filling up, this change did not noticeably reduce the number of GCs, yet each GC now had a larger heap to examine, so each GC took longer. (On first hearing this, you might wonder: if Eden barely grew, why did scanning get slower? If you have that doubt, the pieces of knowledge have not been linked up yet. Think about how object liveness is judged, and why cross-generational references are a problem worth solving: precisely to keep scanning and marking cheap. Liveness is determined by reachability analysis, and an object in Eden may be referenced not only from GC roots but also from Survivor objects or old-generation objects. The scan cost is therefore tied to the size of the whole heap.)

Option two moves toward a large young generation and a small old generation. The rationale: collect as many short-lived objects as possible in the young generation, reduce the number of medium-lived objects promoted, and let the old generation hold only long-lived objects.

When the young generation is small, Eden fills up quickly and Minor GCs become frequent. Enlarging the young generation therefore lowers the Minor GC frequency.

A single Minor GC consists of two parts: T1 (scanning the young generation) and T2 (copying the surviving objects).

Default: suppose an object lives in Eden for 500 ms and the Minor GC interval is 300 ms. Since survival time > interval, the object is still alive when GC runs, so the Minor GC time is T1 + T2.

Option one: the whole heap grew, but the young generation barely did. The object still lives 500 ms in Eden, and the Minor GC interval may stretch to 400 ms. Survival time > interval, so the Minor GC time becomes roughly T1 * 1.5 (more space to scan) + T2.

Option two: with a much larger young generation, the Minor GC interval may stretch to 600 ms. An object that lives 500 ms is already dead by the time GC runs, so there is nothing to copy. The Minor GC time becomes T1 * 2 (more space to scan) + T2 * 0.

In other words, after the expansion T1 grows, but the whole of T2 is saved.
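Plugging assumed numbers into this T1/T2 model makes the trade-off concrete (the millisecond values are illustrative, not measurements):

```java
// Toy arithmetic for the Minor GC cost model above; t1/t2 are assumed values.
public class MinorGcModel {
    public static void main(String[] args) {
        double t1 = 10.0; // ms to scan the young generation (assumed)
        double t2 = 40.0; // ms to copy live objects (assumed; copying >> scanning)

        double byDefault = t1 + t2;         // object still live at GC: scan + copy
        double optionOne = t1 * 1.5 + t2;   // larger heap to scan, object still copied
        double optionTwo = t1 * 2 + t2 * 0; // larger young gen: object already dead

        System.out.printf("default=%.0fms, option1=%.0fms, option2=%.0fms%n",
                byDefault, optionOne, optionTwo);
    }
}
```

With these numbers, option one is slower than the default (55 ms vs 50 ms) while option two is far faster (20 ms), matching the measurements above.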

In the JVM, copying objects costs far more than scanning them. If the heap holds many long-lived objects, enlarging the young generation will lengthen Minor GC; if it holds mostly short-lived objects, enlarging the young generation barely increases the time of a single Minor GC. The duration of a single Minor GC therefore depends more on the number of objects that survive the GC than on the size of Eden.

This explains why in the previous memory adjustment scheme, the performance of scheme one was worse, but the performance of scheme two increased significantly.

Recommended strategy

1. Young generation sizing

  • Response-time-priority applications: set the young generation as large as possible, up to the system's minimum response-time requirement (chosen case by case). This minimizes the frequency of young-generation collections and reduces the number of objects promoted to the old generation. (Best for workloads with few large objects and few long-lived objects.)
  • Throughput-priority applications: set it as large as possible, potentially several gigabytes. Since there is no strict response-time requirement, garbage collection can run in parallel; this generally suits machines with 8 or more CPUs.
  • Avoid setting it too small. An undersized young generation causes: 1. more frequent Minor GCs; 2. objects entering the old generation directly at Minor GC time, and if the old generation is full at that point, a Full GC is triggered.

2. Old generation sizing

Response-time-priority applications: the old generation typically uses a concurrent collector, so its size must be set carefully, taking into account parameters such as the concurrent collection rate and session duration. If the heap is too small, it can cause memory fragmentation, frequent collections, and application pauses in which the collector falls back to traditional mark-sweep; if the heap is too large, each collection takes longer. The optimal setting is usually derived from data such as: concurrent garbage collection statistics, the number of concurrent collections in the permanent generation, traditional GC statistics, and the proportion of time spent collecting the young versus the old generation.

Throughput-priority applications: usually a large young generation and a small old generation. The rationale is the same as above: reclaim as many short-lived objects as possible, reduce medium-lived promotions, and let the old generation store only long-lived objects.



Origin blog.csdn.net/weixin_47184173/article/details/110413840