How to reduce long GC pauses?

Garbage collection is essential, but handled poorly it becomes a performance killer. The steps below help keep GC pauses as few and as short as possible, generally under 50 ms.

Long GC pauses are hostile to an application: they hurt service level agreements (SLAs), degrade the user experience, and can do serious damage to core services.

The sections below analyze the common causes of long GC pauses and the solution for each.

1. High object creation rate

If the application creates objects at a high rate, the garbage collection rate has to rise to keep up, and a high garbage collection rate also increases GC pause time. Optimizing the application to create fewer objects is therefore an effective way to reduce GC pauses. It is time-consuming and requires code changes, so it is not always feasible, but it is worth doing when time allows. You can inspect the object creation rate with JVisualVM, or upload the GC log to GCeasy for online analysis. Look for the following information:

  • What objects were created?
  • What is the rate at which these objects are created?
  • How much space do they occupy in memory?
  • Who is creating them?

Always start by optimizing the objects that take up the most memory.

2. The young generation is too small

When the young generation is too small, objects are promoted to the old generation prematurely. Collecting garbage from the old generation takes longer than collecting it from the young generation, so increasing the size of the young generation may reduce long GC pauses. The following JVM parameters control the size of the young generation.

**-Xmn**: specifies the size of the young generation.

**-XX:NewRatio**: specifies the size of the young generation relative to the old generation. For example, -XX:NewRatio=2 means the ratio of young generation to old generation is 1:2, so the young generation is 1/3 of the whole heap. With a 2 GB heap, the young generation is roughly 2048 MB / 3 ≈ 683 MB.
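The NewRatio arithmetic can be checked with a few lines of Java. This is a minimal sketch, not JVM code: the class and method names are hypothetical, and the real JVM also applies alignment and minimum-size rules, so actual sizes can differ slightly.

```java
// Sketch: estimate the young generation size implied by -XX:NewRatio.
// YoungGenEstimate and youngGenBytes are hypothetical names for illustration.
class YoungGenEstimate {

    // NewRatio=N means young:old = 1:N, so the young generation
    // is 1/(N+1) of the total heap.
    static long youngGenBytes(long heapBytes, int newRatio) {
        if (heapBytes <= 0 || newRatio < 1) {
            throw new IllegalArgumentException("heapBytes must be > 0 and newRatio >= 1");
        }
        return heapBytes / (newRatio + 1);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 2048L * mb;              // 2 GB heap, as in the example above
        long young = youngGenBytes(heap, 2); // -XX:NewRatio=2
        System.out.printf("young ~ %d MB, old ~ %d MB%n", young / mb, (heap - young) / mb);
    }
}
```

If a fixed size is preferred over a ratio, -Xmn sets the young generation size directly.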

3. Choose the right GC algorithm

The GC algorithm also has a large impact on GC pause time. If you are a GC expert, plan to become one, or have one on your team, you can tune the GC parameters to get the best pause times. If you do not have much GC expertise, I recommend the G1 GC algorithm because it tunes itself automatically. In G1, you can use the JVM option -XX:MaxGCPauseMillis to set the target maximum GC pause time. For example:

-XX:MaxGCPauseMillis=200

In the example above, the maximum GC pause time is set to 200 ms. This is a soft goal: the JVM will do its best to meet it, but it is not guaranteed.
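To confirm which collectors the running JVM actually uses and how much time it has spent collecting, one option is the standard java.lang.management API. The sketch below is only an illustration: GcReport is a hypothetical class name, and getCollectionTime() returns approximate accumulated collection time in milliseconds, not individual pause durations, so per-pause analysis still needs the GC log.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the garbage collectors of the current JVM with their
// cumulative collection counts and times. GcReport is a hypothetical name.
class GcReport {

    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time may be -1 if the collector does not report them.
            lines.add(String.format("%s: count=%d, totalTimeMs=%d",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

When the JVM runs with G1, the reported collector names are expected to start with "G1"; other names suggest a different collector is in effect.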

4. The process uses swap

When physical memory (RAM) runs short, the operating system may swap out data that the application is not currently using. Swapping is very expensive because it requires disk access, which is much slower than access to physical memory.

In a production environment, no important application should be swapping. When a process is swapped, GC takes a long time to complete. The following script is from StackOverflow (thanks to the authors). When run, it lists every process that has memory swapped out. Make sure your application process is not among them.

#!/bin/bash 
# Get current swap usage for all running processes
# Erik Ljungstrom 27/05/2011
# Modified by Mikko Rantalainen 2012-08-09
# Pipe the output to "sort -nk3" to get sorted output
# Modified by Marc Methot 2014-09-18
# removed the need for sudo

SUM=0
OVERALL=0
for DIR in `find /proc/ -maxdepth 1 -type d -regex "^/proc/[0-9]+"`
do
    PID=`echo $DIR | cut -d / -f 3`
    PROGNAME=`ps -p $PID -o comm --no-headers`
    for SWAP in `grep VmSwap $DIR/status 2>/dev/null | awk '{ print $2 }'`
    do
        let SUM=$SUM+$SWAP
    done
    if (( $SUM > 0 )); then
        echo "PID=$PID swapped $SUM KB ($PROGNAME)"
    fi
    let OVERALL=$OVERALL+$SUM
    SUM=0
done
echo "Overall swap used: $OVERALL KB"

If you find that your process is using swap, you can do one of the following:

  • Allocate more physical memory
  • Reduce the number of processes running on the server to free up memory (RAM)
  • Reduce the heap size of the application (I don't recommend this because it may cause other side effects, but it may solve the problem)
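The same check can be made from inside the application. The sketch below assumes Linux, where /proc/self/status contains a VmSwap line in the same format the script above greps for. SwapCheck and parseVmSwapKb are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Sketch (Linux only): report how much of the current JVM process is swapped out.
class SwapCheck {

    // Returns the VmSwap value in kB, or -1 if no VmSwap line is present.
    static long parseVmSwapKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmSwap:")) {
                // Line looks like "VmSwap:      1234 kB"
                String[] parts = line.substring("VmSwap:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            System.out.println("/proc/self/status is not available on this system");
            return;
        }
        // ISO-8859-1 accepts any byte, so an unusual process name cannot break decoding.
        long kb = parseVmSwapKb(Files.readAllLines(status, StandardCharsets.ISO_8859_1));
        System.out.println(kb > 0 ? "JVM has " + kb + " kB swapped out" : "JVM is not using swap");
    }
}
```

A value above zero at startup or under load is a signal to revisit the options listed above.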

5. Adjust the number of GC threads

For each GC event reported in the GC log, the user, system (sys), and real (wall-clock) times are printed. For example:

[Times: user=25.56 sys=0.35, real=20.48 secs]

If you consistently see GC events where the real time is not significantly less than the user time, there may not be enough GC threads, so consider increasing their number. The user time is the CPU time summed across all GC threads, so with enough threads working in parallel the real time should be a fraction of it. For example, if the user time is 25 s and the GC thread count is set to 5, the real time should be close to 5 s (because 25 s / 5 = 5 s).

Warning: Adding too many GC threads consumes a lot of CPU and takes resources away from the application. Test thoroughly before increasing the number of GC threads.
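The comparison between user and real time can be automated. The following is a minimal sketch that parses a "[Times: ...]" fragment like the one shown above and reports the user/real ratio, which roughly approximates how many threads were working in parallel. GcTimes and its methods are hypothetical names, and the regular expression only covers this log format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: parse "[Times: user=25.56 sys=0.35, real=20.48 secs]" from a GC log line.
class GcTimes {
    private static final Pattern TIMES = Pattern.compile(
            "user=([0-9.]+)\\s+sys=([0-9.]+),\\s*real=([0-9.]+)\\s+secs");

    final double user;
    final double sys;
    final double real;

    GcTimes(double user, double sys, double real) {
        this.user = user;
        this.sys = sys;
        this.real = real;
    }

    static GcTimes parse(String line) {
        Matcher m = TIMES.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("No Times entry in: " + line);
        }
        return new GcTimes(Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2)),
                Double.parseDouble(m.group(3)));
    }

    // User CPU time divided by wall-clock time; 0 when real is reported as 0.00.
    double userToRealRatio() {
        return real > 0 ? user / real : 0.0;
    }

    // Ideal wall-clock time if the user time were spread evenly over gcThreads.
    static double idealRealSeconds(double userSeconds, int gcThreads) {
        return userSeconds / gcThreads;
    }

    public static void main(String[] args) {
        GcTimes t = parse("[Times: user=25.56 sys=0.35, real=20.48 secs]");
        System.out.printf("user/real ratio: %.2f%n", t.userToRealRatio());
        System.out.printf("ideal real time with 5 threads: %.2f s%n", idealRealSeconds(t.user, 5));
    }
}
```

A ratio that stays close to 1 across many GC events suggests the collection is effectively single-threaded.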

6. Background I/O activity

Heavy file system I/O activity (a large number of read and write operations) can also cause long GC pauses. The I/O does not have to come from the application itself. It may come from another process running on the same server, and the application will still suffer long GC pauses.

Under heavy I/O activity, you will notice that the real time is significantly higher than the user time. For example:

[Times: user=0.20 sys=0.01, real=18.45 secs]

When this happens, here are some possible solutions:

  • If high I/O activity is caused by the application, then optimize it.
  • Eliminate processes that cause high I/O activity on the server.
  • Move the application to another server with less I/O activity.

Tip: How to monitor I/O activity

On Unix-like systems, you can use the sar (System Activity Report) command to monitor I/O activity. For example:

sar -d -p 1

The command above reports per-device read and write rates once every second. See the sar documentation for more details.

7. System.gc() call

Calling System.gc() or Runtime.getRuntime().gc() typically triggers a stop-the-world Full GC. During a Full GC the entire JVM is frozen, so no application work is done in that period. System.gc() calls generally come from the following sources:

  • Your own developers may call System.gc() explicitly.
  • Third-party libraries, frameworks, and sometimes even the application server may call System.gc().
  • External tools (such as VisualVM) can trigger it through JMX.
  • If your application uses RMI, RMI calls System.gc() periodically. The interval can be configured with the following system properties:

-Dsun.rmi.dgc.server.gcInterval=n
-Dsun.rmi.dgc.client.gcInterval=n

Evaluate whether each explicit call to System.gc() is really necessary. If it is not, remove it. Alternatively, you can make the JVM ignore System.gc() calls by passing the JVM option -XX:+DisableExplicitGC.
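To verify at runtime whether -XX:+DisableExplicitGC was passed, the JVM arguments can be inspected through the standard RuntimeMXBean. This is a minimal sketch: ExplicitGcCheck is a hypothetical name, and the check only sees arguments reported by getInputArguments(), so a flag supplied through some other mechanism may not be detected.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: detect whether explicit GC was disabled on the JVM command line.
class ExplicitGcCheck {

    // The last occurrence of the flag wins, mirroring how repeated -XX flags behave.
    static boolean isExplicitGcDisabled(List<String> jvmArgs) {
        boolean disabled = false; // the flag is off by default
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+DisableExplicitGC")) {
                disabled = true;
            } else if (arg.equals("-XX:-DisableExplicitGC")) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(isExplicitGcDisabled(jvmArgs)
                ? "System.gc() calls are ignored"
                : "System.gc() calls can trigger a collection");
    }
}
```

Logging this once at startup makes it easy to confirm that the setting actually reached production.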

Tip: How to know whether System.gc() is being called

Upload the GC log to GCeasy, a general-purpose GC log analysis tool. Its report has a section called GC Causes. If GC activity was triggered by System.gc() calls, they are reported in this section. In the GCeasy report this example was taken from, the section showed that System.gc() was called four times over the lifetime of the application.

Warning: Roll out any of the strategies above to production only after thorough testing and analysis. Not every strategy applies to every application, and a strategy used improperly can make things worse.

Reference: GCeasy original article

Origin blog.csdn.net/qq_40093255/article/details/115380670