Notes on troubleshooting an off-heap memory leak

Symptoms

Recently, services A and B were merged into one service. Two days after the merged service went live, the server started raising alarms because its memory was full.
Running top on the server showed that the JVM process was occupying 11 GB of memory (RSS, shown as RES in top).
Here RSS is the memory the process actually occupies, and VSZ is its virtual memory:
RSS (Resident Set Size) is how much of the process's memory is resident in RAM. It does not include memory that has been swapped out. It does include the resident pages of shared libraries the process uses, as well as all resident stack and heap memory.
VSZ (Virtual Memory Size) is the size of the process's virtual address space, i.e. all memory the process can access, including swapped-out memory and shared library memory.
However, checking the heap with jstat -gc and jmap -heap, together with the startup parameters, showed that the heap was capped at 4 GB. So where was the extra memory coming from?
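One quick way to put numbers on that gap from inside the process is sketched below. This is a minimal sketch that assumes Linux, where /proc/self/status exposes VmRSS (RSS) and VmSize (VSZ) in kB. The class and method names are illustrative and do not come from the original project.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Compares process RSS/VSZ (Linux only) with the heap figures the JVM reports. */
class RssVsHeap {

    /** Returns the kB value of a field such as "VmRSS" from /proc/<pid>/status text, or -1 if absent. */
    static long parseStatusKb(String statusText, String field) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith(field + ":")) {
                // e.g. "VmRSS:\t  5263488 kB" -> "5263488"
                String value = line.substring(field.length() + 1).trim().split("\\s+")[0];
                return Long.parseLong(value);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status");
        if (!Files.exists(status)) {
            System.out.println("/proc/self/status not found; this sketch only works on Linux");
            return;
        }
        String text = new String(Files.readAllBytes(status), StandardCharsets.UTF_8);
        long rssKb = parseStatusKb(text, "VmRSS");  // resident set size
        long vszKb = parseStatusKb(text, "VmSize"); // virtual memory size

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long heapMaxKb = heap.getMax() / 1024;             // reflects -Xmx
        long heapCommittedKb = heap.getCommitted() / 1024;

        System.out.printf("RSS=%d kB, VSZ=%d kB%n", rssKb, vszKb);
        System.out.printf("heap max=%d kB, heap committed=%d kB%n", heapMaxKb, heapCommittedKb);
        // Rough estimate of resident memory outside the Java heap
        // (committed heap pages are not necessarily all resident)
        System.out.printf("RSS minus committed heap = %d kB%n", rssKb - heapCommittedKb);
    }
}
```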

Troubleshooting

Checking with tools

Although I suspected off-heap memory from the start, I still checked the heap first. I dumped the heap and analyzed it with MAT, and I connected to the process with JConsole. The 4 GB heap was not full, and the heap itself was completely fine.
Viewing memory with Arthas

curl -O https://arthas.aliyun.com/arthas-boot.jar

The heap and non-heap figures reported by Arthas were much smaller than the memory used by the JVM process. However, the non-heap figures Arthas shows only cover code_cache, metaspace, and NIO's direct and mapped buffers. Could the problem be in some other part of non-heap memory?
I didn't keep a screenshot at the time. Arthas's memory view shows, among other things, the maximum non-heap size and the total memory currently in use.
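The figures Arthas shows here can also be read through the standard java.lang.management API. The sketch below lists every memory pool and the NIO buffer pools; the class name is hypothetical. It also shows what this view cannot cover: thread stacks, GC data structures, and other memory the JVM allocates with malloc belong to no pool. That is why these totals can stay far below the process RSS.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Lists heap pools, non-heap pools and NIO buffer pools, roughly the categories Arthas reports. */
class JvmMemoryView {

    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            // Non-heap pools are e.g. "Metaspace", "Compressed Class Space", "Code Cache"
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "nonheap";
            long maxKb = u.getMax() < 0 ? -1 : u.getMax() / 1024; // -1 = no limit defined
            lines.add(String.format("%-8s %-28s used=%dK committed=%dK max=%dK",
                    kind, pool.getName(), u.getUsed() / 1024, u.getCommitted() / 1024, maxKb));
        }
        for (BufferPoolMXBean buf : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            // "direct": ByteBuffer.allocateDirect; "mapped": FileChannel.map
            lines.add(String.format("buffer   %-28s count=%d used=%dK",
                    buf.getName(), buf.getCount(), buf.getMemoryUsed() / 1024));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```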

Native Memory Tracking

Native Memory Tracking has to be enabled with a JVM startup parameter. Enabling it costs roughly 5%-10% in JVM performance, so remove the parameter once troubleshooting is done.

-XX:NativeMemoryTracking=detail
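To confirm that the flag actually took effect on a running JVM, you can read it back through HotSpotDiagnosticMXBean. This sketch assumes a HotSpot JVM, where the NativeMemoryTracking option reports "off", "summary" or "detail". NmtCheck is a made-up name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Reports whether this JVM was started with Native Memory Tracking enabled. */
class NmtCheck {

    static String nmtMode() {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption opt = hs.getVMOption("NativeMemoryTracking"); // "off", "summary" or "detail"
        return opt.getValue();
    }

    public static void main(String[] args) {
        String mode = nmtMode();
        if ("off".equals(mode)) {
            System.out.println("NMT is off; restart with -XX:NativeMemoryTracking=summary or =detail");
        } else {
            System.out.println("NMT mode: " + mode + "; query it with: jcmd <pid> VM.native_memory");
        }
    }
}
```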

heap: The Java heap, whose maximum size is limited by -Xmx.
class: Loaded class and method information, which is in fact metaspace. Its maximum is limited by -XX:MaxMetaspaceSize, which has no limit by default, so it can keep requesting memory.
Metaspace is divided into two areas: the class part and the non-class part.

  • class part: stores Klass structures and needs a contiguous region of no more than 4 GB.
  • non-class part: contains all other metadata.

thread: Memory used by threads and their stacks. Each thread's stack size is limited by -Xss, but the total is not limited. A thread typically takes about 1 MB.
code: Code generated by the JIT compilers (C1 and C2), limited by -XX:ReservedCodeCacheSize.
gc: Memory used by the garbage collector itself, such as card tables, mark bitmaps, region bookkeeping and GC root marking. It is not limited and is usually not very large. Parallel GC needs little, G1 uses up to about 10% of the heap size, and ZGC up to about 15-20% of the heap size, though these overheads keep being optimized. (This memory is not part of the heap, but its size scales with the heap.)
compiler: Memory used by the C1 and C2 compilers themselves while compiling. Not limited, and usually not very large.
internal: Memory used for command-line parsing, JVMTI and similar purposes. Not limited, and usually not very large. On JDK 8, direct memory is also counted here. Its maximum can be set with -XX:MaxDirectMemorySize and by default equals -Xmx. This is one reason the maximum heap is usually set to about half of the machine's memory.
symbol: Memory used by symbols, such as the symbol table and the string constant pool. The number of buckets in the string table is set by -XX:StringTableSize, but the total memory is not limited.
Native Memory Tracking: Memory used by NMT itself. (Not shown when tracking is off.)
Arena Chunk: All memory allocated through arenas. Not limited, and usually not large.
Then keep observing it with:

jcmd pid VM.native_memory
24927:

Native Memory Tracking:

Total: reserved=6484560KB, committed=5263488KB
-                 Java Heap (reserved=4194304KB, committed=4194304KB)
                            (mmap: reserved=4194304KB, committed=4194304KB)

-                     Class (reserved=1181067KB, committed=149859KB)
                            (classes #25039)
                            (malloc=3467KB #51179)
                            (mmap: reserved=1177600KB, committed=146392KB)

-                    Thread (reserved=378725KB, committed=378725KB)
                            (thread #368)
                            (stack: reserved=377092KB, committed=377092KB)
                            (malloc=1203KB #1853)
                            (arena=430KB #734)

-                      Code (reserved=262843KB, committed=81171KB)
                            (malloc=13243KB #18122)
                            (mmap: reserved=249600KB, committed=67928KB)

-                        GC (reserved=227905KB, committed=227905KB)
                            (malloc=39489KB #32009)
                            (mmap: reserved=188416KB, committed=188416KB)

-                  Compiler (reserved=401KB, committed=401KB)
                            (malloc=271KB #619)
                            (arena=131KB #3)

-                  Internal (reserved=190870KB, committed=190870KB)
                            (malloc=190838KB #47664)
                            (mmap: reserved=32KB, committed=32KB)

-                    Symbol (reserved=30255KB, committed=30255KB)
                            (malloc=27114KB #288969)
                            (arena=3141KB #1)

-    Native Memory Tracking (reserved=7206KB, committed=7206KB)
                            (malloc=251KB #3796)
                            (tracking overhead=6955KB)

-               Arena Chunk (reserved=2792KB, committed=2792KB)
                            (malloc=2792KB)

-                   Unknown (reserved=8192KB, committed=0KB)
                            (mmap: reserved=8192KB, committed=0KB)
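The category lines above are regular enough to parse, which helps when collecting snapshots over time. Below is a minimal sketch that assumes the JDK 8 summary format shown here; other JDK versions may format the lines differently. NmtSummaryParser is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses the category lines of `jcmd <pid> VM.native_memory` summary output. */
class NmtSummaryParser {

    // Matches e.g. "-   Java Heap (reserved=4194304KB, committed=4194304KB)"
    private static final Pattern CATEGORY = Pattern.compile(
            "^-\\s+(.+?)\\s+\\(reserved=(\\d+)KB, committed=(\\d+)KB\\)");

    /** Returns category name -> committed KB, in output order. Sub-lines like "(mmap: ...)" are skipped. */
    static Map<String, Long> committedKb(String nmtOutput) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : nmtOutput.split("\n")) {
            Matcher m = CATEGORY.matcher(line.trim());
            if (m.find()) {
                result.put(m.group(1), Long.parseLong(m.group(3)));
            }
        }
        return result;
    }

    /** Sums the committed KB of all parsed categories. */
    static long sum(Map<String, Long> kb) {
        return kb.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

Applied to the output above, the committed values of all categories sum to 5263488 KB, which matches the Total committed figure exactly. Java Heap (4194304 KB) accounts for most of it.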

The committed values above were broadly consistent with the memory usage reported by top.
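Finding the abnormal categories comes down to comparing snapshots taken some time apart. The sketch below diffs two maps of category to committed KB, for example built from two jcmd outputs taken hours apart. The class name and threshold are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Compares two NMT snapshots (category -> committed KB) and lists the categories that grew. */
class NmtGrowth {

    static List<String> growing(Map<String, Long> before, Map<String, Long> after, long minGrowthKb) {
        List<String> grown = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long old = before.getOrDefault(e.getKey(), 0L); // new categories count from zero
            long delta = e.getValue() - old;
            if (delta >= minGrowthKb) {
                grown.add(e.getKey() + " +" + delta + "KB");
            }
        }
        return grown;
    }
}
```

NMT can also compute this diff itself. Run `jcmd <pid> VM.native_memory baseline` once, then later run `jcmd <pid> VM.native_memory summary.diff` to see the change in each category since the baseline.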
Two categories looked abnormal: Class and Internal.
Class grew the most. After several hours of running it went from 100 MB to 500 MB and was still climbing.
Internal was growing too. My first instinct was to cap both with parameters, so I added two flags and kept observing:

-XX:MaxMetaspaceSize=512m -XX:MaxDirectMemorySize=512m
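Once -XX:MaxMetaspaceSize is set, the limit shows up as the max of the Metaspace memory pool, so you can check it from inside the application. This sketch assumes HotSpot's pool name "Metaspace". When the flag is not given on the command line, HotSpot typically reports the max as -1 (undefined). MetaspaceLimit is a made-up name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

/** Reads the configured metaspace limit back from the "Metaspace" memory pool. */
class MetaspaceLimit {

    /** Max of the Metaspace pool in bytes; -1 if undefined (no limit) or the pool is not found. */
    static long metaspaceMaxBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getMax();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long max = metaspaceMaxBytes();
        System.out.println(max < 0
                ? "Metaspace has no limit (-XX:MaxMetaspaceSize not set)"
                : "Metaspace limit: " + (max / (1024 * 1024)) + " MB");
    }
}
```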

Fixing the excessive metaspace usage

Since the Class area was so large, my first thought was that something was loading classes dynamically; cglib and dynamic proxies are common culprits. The first suspect was Aviator, which the project uses to compile scripts dynamically and execute them.
So I wrote a quick local test:

AviatorEvaluatorInstance instance = AviatorEvaluator.newInstance();
for (int i = 0; i <= 10000; i++) {
    // cached = false: the same script is compiled again on every iteration
    Expression compile = instance.compile("textHandler.textLabelStyleAddAttribute(text,'max-width:100%', seq.array(java.lang.String, 'img', 'video'))", false);
}

Sure enough, while this code ran, the Class memory kept growing. The metaspace growth was also visible in Arthas.
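You can reproduce the same effect without Aviator. The sketch below is a stand-in and does not use Aviator's mechanism. On each iteration it creates a JDK dynamic proxy in a fresh class loader, which defines one new class. It then measures the growth in the JVM's loaded-class count through ClassLoadingMXBean. Every one of those classes needs metaspace, and none of them can be unloaded while it is still reachable. ClassGrowthDemo is a hypothetical name.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.reflect.Proxy;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Defines n new classes (dynamic proxies, one per fresh class loader) and returns the class-count growth. */
class ClassGrowthDemo {

    static long generateClasses(int n, List<Object> keepAlive) {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        long before = classes.getLoadedClassCount();
        for (int i = 0; i < n; i++) {
            // Proxy classes are cached per class loader, so a fresh loader forces a brand-new class
            ClassLoader loader = new URLClassLoader(new URL[0], ClassGrowthDemo.class.getClassLoader());
            Object proxy = Proxy.newProxyInstance(loader, new Class<?>[] {Runnable.class},
                    (p, method, methodArgs) -> null);
            keepAlive.add(proxy); // reachable instances keep their classes from being unloaded
        }
        return classes.getLoadedClassCount() - before;
    }

    public static void main(String[] args) {
        List<Object> keepAlive = new ArrayList<>();
        System.out.println("loaded classes grew by " + generateClasses(1000, keepAlive));
    }
}
```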
Looking into it, I found that this is a known memory problem with Aviator:

"Each expression generates an anonymous class (the anonymous class corresponding to a Java lambda), so if your expressions are generated dynamically, a large number of anonymous classes may be created, and a full GC will be triggered once the metaspace fills up." So this was where the pitfall was.
Note: metaspace vs. the permanent generation
Since JDK 1.8 the permanent generation no longer exists. The class metadata it used to store has moved to metaspace, which is not part of the heap but is allocated directly from native memory.
Metaspace and the permanent generation are essentially the same thing: both implement the method area defined in the JVM specification. The biggest difference is that metaspace does not live in the JVM-managed heap but uses native memory.


Our project did not set a maximum for metaspace, so its usage kept climbing the longer the service ran. A full GC does clean up direct memory, metaspace and so on. However, we use the G1 collector, which works best when it keeps adjusting its regions to avoid full GCs altogether, and that greatly improves throughput. Under normal circumstances G1 did not trigger a full GC for us, so as far as we could tell this memory was never released and kept growing.
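You can check per collector whether full GCs are actually happening. Below is a sketch using GarbageCollectorMXBean; GcCounts is a made-up name. Under G1 on JDK 8 the collectors are typically named "G1 Young Generation" and "G1 Old Generation". A count that stays at 0 for the latter suggests no full GC has run.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Lists each collector's collection count, e.g. to see whether old/full collections ever ran. */
class GcCounts {

    static Map<String, Long> counts() {
        Map<String, Long> m = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            m.put(gc.getName(), gc.getCollectionCount()); // -1 if the count is undefined
        }
        return m;
    }

    public static void main(String[] args) {
        counts().forEach((name, n) -> System.out.println(name + ": " + n));
    }
}
```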
The fix is to compile with the cache parameter turned on, so that a compiled script is reused instead of being compiled again every time.
[Solution]
The usual way to use Aviator is to execute expressions directly, and behind the scenes Aviator compiles and then executes them for you. You can also compile an expression yourself first and get back a compiled result. You can then pass in different envs to reuse that result, which improves performance. This is the recommended usage. Example:

for (int i = 0; i <= 10000; i++) {
    // cached = true: the script is compiled once and the compiled Expression is reused
    Expression compile = instance.compile("textHandler.textLabelStyleAddAttribute(text,'max-width:100%', seq.array(java.lang.String, 'img', 'video'))", true);
}
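Conceptually, passing true asks Aviator to cache the compiled Expression by script text, so repeated calls with the same script reuse it. The sketch below shows that compile-once idea with a plain ConcurrentHashMap. The compiler function is a placeholder, not Aviator's API, and Aviator's actual cache implementation may differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Compile-once cache keyed by script text; the compiler function is a hypothetical placeholder. */
class CompiledScriptCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;

    CompiledScriptCache(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    /** computeIfAbsent runs the compiler at most once per distinct script. */
    T get(String script) {
        return cache.computeIfAbsent(script, compiler);
    }

    int size() {
        return cache.size();
    }
}
```

One design caveat applies. Such a cache only bounds class generation if the set of distinct scripts is bounded. If the script text itself is built dynamically, for example with values concatenated into it, every variant is a new key, and both the cache and metaspace keep growing. Passing values through the env map instead keeps the script text stable.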

Internal memory that could not be capped

After deploying these changes, I observed that the Class area did stabilize, at a little over 100 MB.
Just when I thought the problem was solved, I noticed that the JVM's memory usage was still climbing over time.
Analyzing with VM.native_memory, I found that Internal was still growing steadily. I didn't panic at this point, because I had already added a parameter capping direct memory. At worst, it would hit the cap and trigger a full GC, and the growth would be contained.
After watching for a while, I was dumbfounded. Internal was not bounded by the parameter at all. It gradually reached 1 GB and was still growing.
(In hindsight, I should have noticed this when viewing memory in Arthas. The direct memory Arthas reported was very small, while Internal in native_memory was very large, which had made me doubt Arthas's memory figures.)
That meant the growth in Internal was not direct memory.
I then added every JVM parameter I could find that limits memory, but none of them could cap it.
I was completely out of ideas.

A dumb but effective method

Fortunately, while testing a commonly used business flow online, we reproduced the scenario in which Internal keeps growing.
The next step was to narrow down the code in that flow.
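You can make that narrowing somewhat more repeatable. Run each suspect code path many times in isolation and record how much a chosen metric moves. Below is a minimal sketch. The metric is passed in as a LongSupplier because, in this case, the real signal (NMT's Internal figure) has to be read externally with jcmd. SuspectBisector and any suspect names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Runs each suspect in isolation many times and records how much a metric grew during its runs. */
class SuspectBisector {

    static Map<String, Long> measure(Map<String, Runnable> suspects, int iterations, LongSupplier metric) {
        Map<String, Long> growth = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> s : suspects.entrySet()) {
            long before = metric.getAsLong();
            for (int i = 0; i < iterations; i++) {
                s.getValue().run();
            }
            // A suspect whose growth scales with iterations is a leak candidate
            growth.put(s.getKey(), metric.getAsLong() - before);
        }
        return growth;
    }
}
```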
The method itself was crude. I tested each suspect piece of code on its own, or commented out the other code, and watched for whatever made Internal grow. After half a day of trial and error (very painful, because many things make Internal rise), I finally found it:

private static final AviatorEvaluatorInstance instance = AviatorEvaluator.newInstance();

instance.addStaticFunctions("authorHandler", AuthorHandler.class);

It was Aviator again. Checking its source code, the culprit is the constructor of ClassMethodFunction:

public ClassMethodFunction(Class<?> clazz, boolean isStatic, String name, String methodName, List<Method> methods) throws IllegalAccessException, NoSuchMethodException {
    this.name = name;
    this.clazz = clazz;
    this.isStatic = isStatic;
    this.methodName = methodName;
    if (methods.size() == 1) {
        // A single overload: unreflect it into a MethodHandle (this is the call that grows Internal)
        this.handle = MethodHandles.lookup().unreflect((Method) methods.get(0)).asFixedArity();
        this.pTypes = ((Method) methods.get(0)).getParameterTypes();
        if (!isStatic) {
            // Instance methods take the receiver as an extra first parameter
            Class<?>[] newTypes = new Class[this.pTypes.length + 1];
            newTypes[0] = this.clazz;
            System.arraycopy(this.pTypes, 0, newTypes, 1, this.pTypes.length);
            this.pTypes = newTypes;
        }

        if (this.handle == null) {
            throw new NoSuchMethodException("Method handle for " + methodName + " not found");
        }
    } else {
        this.methods = methods;
    }
}

In the end, testing showed that this single piece of reflection code makes Internal grow:

   MethodHandle methodHandle = MethodHandles.lookup().unreflect(AuthorHandler.class.getMethods()[0]).asFixedArity();

I didn't understand why at first, but searching online turned up reports of the same memory leak, caused by this JVM bug:
https://bugs.openjdk.org/browse/JDK-8152271
Frequent use of MethodHandles-based reflection prevents stale entries from being reclaimed. It also makes young GC take longer to scan them, which degrades performance.

Solving the problem

The bug will not be fixed in JDK 1.8; it is to be addressed by a refactoring in a later Java version. We could not upgrade Java in the short term, so a JVM upgrade was not an option. Since the trigger is frequent reflection, we decided to add caching to reduce its frequency, which addresses both the performance degradation and the memory leak.
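Here is a generic illustration of the caching idea: unreflect each Method once and reuse the resulting MethodHandle, instead of calling MethodHandles.lookup().unreflect on every use. This is a sketch of the technique, not the change made in this project, which was to register the static functions only once, as described next. MethodHandleCache is a made-up name.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Unreflects each Method once and reuses the MethodHandle instead of unreflecting on every call. */
class MethodHandleCache {
    private static final Map<Method, MethodHandle> CACHE = new ConcurrentHashMap<>();

    static MethodHandle handleFor(Method method) {
        // Method.equals compares declaring class, name and signature, so equal Methods share one handle
        return CACHE.computeIfAbsent(method, m -> {
            try {
                return MethodHandles.lookup().unreflect(m).asFixedArity();
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot access " + m, e);
            }
        });
    }
}
```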
Our project was also calling addStaticFunctions in the wrong place:

instance.addStaticFunctions("authorHandler", AuthorHandler.class);

Before the change, addStaticFunctions was called every time a script was executed, so memory leaked continuously. This was unnecessary: registering the functions once at initialization is enough. The code was adjusted accordingly.
Before:

public static Object execute(String script, Object parameters) {
    log.info("aviator script :{} parameters:{}", script, parameters);
    try {
        // Problem: the static functions are re-registered on every call,
        // and each addStaticFunctions call unreflects the handler methods again
        instance.addStaticFunctions("authorHandler", AuthorHandler.class);
        instance.addStaticFunctions("categoryHandler", CategoryHandler.class);
        instance.addStaticFunctions("contentTypeHandler", ContentTypeHandler.class);
        instance.addStaticFunctions("coverHandler", CoverHandler.class);
        instance.addStaticFunctions("listHandler", ListHandler.class);
        instance.addStaticFunctions("localDateTimeHandler", LocalDateTimeHandler.class);
        instance.addStaticFunctions("objectHandler", ObjectHandler.class);
        instance.addStaticFunctions("tagHandler", TagHandler.class);
        instance.addStaticFunctions("textHandler", TextHandler.class);
        instance.addStaticFunctions("videoHandler", VideoHandler.class);
        instance.addStaticFunctions("keywordHandler", KeywordHandler.class);
        instance.addStaticFunctions("captionHandler", CaptionHandler.class);

        Expression compile = instance.compile(script, true);
        Map<String, Object> parameterMaps = new HashMap<>();
        if (parameters instanceof Map) {
            parameterMaps = (Map<String, Object>) parameters;
        } else {
            List<String> variables = compile.getVariableNames();
            for (String v : variables) {
                parameterMaps.put(v, parameters);
            }
        }
        Object execute = compile.execute(parameterMaps);
        log.info("aviator execute: {}", execute);
        return execute;
    } catch (IllegalAccessException | NoSuchMethodException e) {
        log.error("aviator script execution failed", e);
        throw new AviatorScriptBuildException(String.format("aviator execute error script:%s parameters:%s", script, parameters));
    } catch (ExpressionSyntaxErrorException e) {
        log.error(e.getMessage());
        throw new AviatorScriptBuildException("script compilation failed");
    } catch (Exception e) {
        log.error(e.getMessage());
        throw new AviatorScriptExecuteException(e.getMessage());
    }
}

After the adjustment:

@Slf4j
public class AviatorUtil {

    public static final AviatorEvaluatorInstance instance = AviatorEvaluator.newInstance();

    // Register the static functions once, when the class is initialized, instead of on every execution
    static {
        try {
            instance.addStaticFunctions("authorHandler", AuthorHandler.class);
            instance.addStaticFunctions("categoryHandler", CategoryHandler.class);
            instance.addStaticFunctions("contentTypeHandler", ContentTypeHandler.class);
            instance.addStaticFunctions("coverHandler", CoverHandler.class);
            instance.addStaticFunctions("listHandler", ListHandler.class);
            instance.addStaticFunctions("localDateTimeHandler", LocalDateTimeHandler.class);
            instance.addStaticFunctions("objectHandler", ObjectHandler.class);
            instance.addStaticFunctions("tagHandler", TagHandler.class);
            instance.addStaticFunctions("textHandler", TextHandler.class);
            instance.addStaticFunctions("videoHandler", VideoHandler.class);
            instance.addStaticFunctions("keywordHandler", KeywordHandler.class);
            instance.addStaticFunctions("captionHandler", CaptionHandler.class);
        } catch (IllegalAccessException | NoSuchMethodException e) {
            log.error("aviator script execution failed", e);
            throw new AviatorScriptBuildException("script initialization failed");
        }
    }

    /**
     * Execute an Aviator script
     *
     * @param script     script
     * @param parameters parameters
     * @return execution result
     * @see <a href="https://www.yuque.com/boyan-avfmj/aviatorscript/fycwgt">Compile and execute</a>
     * @see <a href="https://www.yuque.com/boyan-avfmj/aviatorscript/ugbmqm#Pikc4">Uninitialized global variables</a>
     */
    @SuppressWarnings("unchecked")
    public static Object execute(String script, Object parameters) {
        log.info("aviator script :{} parameters:{}", script, parameters);
        try {
            // cached = true: each distinct script is compiled once and then reused
            Expression compile = instance.compile(script, true);
            Map<String, Object> parameterMaps = new HashMap<>();
            if (parameters instanceof Map) {
                parameterMaps = (Map<String, Object>) parameters;
            } else {
                // Bind the single parameter object to every variable the script references
                List<String> variables = compile.getVariableNames();
                for (String v : variables) {
                    parameterMaps.put(v, parameters);
                }
            }
            Object execute = compile.execute(parameterMaps);
            log.info("aviator execute: {}", execute);
            return execute;
        } catch (ExpressionSyntaxErrorException e) {
            log.error(e.getMessage());
            throw new AviatorScriptBuildException("script compilation failed");
        } catch (Exception e) {
            log.error(e.getMessage());
            throw new AviatorScriptExecuteException(e.getMessage());
        }
    }
}

At this point, the off-heap memory leak was completely solved. It was one pitfall after another.

Source: blog.csdn.net/qq_37436172/article/details/132006622