In Java development, investigating memory leaks is a routine task.

What is a memory leak

A memory leak occurs when objects are no longer used by the application, but the garbage collector cannot remove them because they are still referenced.

In Java, a memory leak consists of allocated objects that have two characteristics. First, they are reachable: in the directed graph of references, there is a path that leads to them. Second, they are useless: the program will never use them again. Objects that meet both conditions count as memory leaks in Java. The GC will not collect them, yet they continue to occupy memory.

In C++, memory leaks cover a wider range. Some objects are allocated memory and then become unreachable. Because C++ has no GC (garbage collection), that memory is never reclaimed. In Java, such unreachable objects are collected by the GC, so programmers do not need to worry about this kind of leak.

Put another way, treat objects as vertices and references as edges. C++ programmers must manage both the edges and the vertices themselves. Java programmers only need to manage the edges, because they never release vertices explicitly. This makes Java programming more efficient.

Therefore, memory leaks also exist in Java, but their scope is smaller than in C++. The Java language guarantees that every unreachable object is handled by the GC, so only objects that are still reachable can leak.

For programmers, the GC is largely transparent. Only a few functions give access to it, such as System.gc(), which requests a garbage collection. According to its specification, however, calling it does not guarantee that the JVM's garbage collector will actually run. Different JVM implementers may use different GC algorithms, and GC threads typically run at a lower priority. JVMs also use many strategies for deciding when to collect:

- some start collecting when memory usage reaches a certain level;
- some run periodically;
- some collect gradually;
- some run intermittently.

Generally, we do not need to care about these details. The exception is when GC execution affects application performance. In web-based real-time systems such as online games, for example, users do not want the GC to suddenly pause the application to collect garbage. In that case we need to tune the GC parameters so that memory is released gently, for example by splitting garbage collection into a series of small steps. Sun's HotSpot JVM supports this approach.
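System.gc() is a request, not a command. The following minimal sketch shows this by measuring heap usage with Runtime.totalMemory() and Runtime.freeMemory() before and after calling it. The class name GcHint is hypothetical. The numbers depend entirely on the JVM and its GC settings, and the second reading is not guaranteed to be lower than the first.

```java
public class GcHint {
    /** Heap memory currently in use, in bytes, as reported by the Runtime. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        // Only a request: the JVM may run a collection now, later, or not at all
        System.gc();
        long after = usedHeapBytes();
        System.out.println("used before: " + before + " bytes, after: " + after + " bytes");
    }
}
```

On HotSpot, starting the JVM with -XX:+DisableExplicitGC turns System.gc() into a no-op. That is one concrete reason not to rely on it for freeing memory.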

A typical example of a Java memory leak is given below:

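```java
import java.util.Vector;

Vector<Object> v = new Vector<>(10);
for (int i = 0; i < 100; i++) {
    Object o = new Object();
    v.add(o);
    o = null; // clears only the local reference; v still refers to the object
}
```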

In this example, we allocate Object instances in a loop and add each one to a Vector. Setting o to null releases only the local reference. The Vector still refers to each object, so the GC cannot reclaim them. Therefore, if the objects must be released after they have been added to the Vector, the simplest way is to set the Vector itself to null:

```java
v = null;
```
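The following minimal sketch confirms that clearing the local variable does not release anything. It uses a hypothetical class, VectorLeakDemo. All 100 objects remain reachable through the Vector until the entries are removed (here with clear()) or the Vector itself becomes unreachable.

```java
import java.util.Vector;

public class VectorLeakDemo {
    /** Fills a vector the same way as the example above and returns it. */
    static Vector<Object> fill() {
        Vector<Object> v = new Vector<>(10);
        for (int i = 0; i < 100; i++) {
            Object o = new Object();
            v.add(o);
            o = null; // only the local variable is cleared
        }
        return v;
    }

    public static void main(String[] args) {
        Vector<Object> v = fill();
        // All 100 objects are still reachable through v
        System.out.println("still referenced: " + v.size()); // still referenced: 100
        // Removing the entries (or dropping v itself) makes them unreachable
        v.clear();
        System.out.println("after clear(): " + v.size());    // after clear(): 0
    }
}
```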

To understand this definition, we first need to understand the possible states of an object in memory. The following diagram illustrates what a useless object is and what an unreferenced object is.

As the figure shows, memory contains both referenced and unreferenced objects. Unreferenced objects are collected by the garbage collector, but referenced objects are not. An unreferenced object is certainly no longer used, because nothing references it anymore. However, not all useless objects are unreferenced: some of them are still referenced. It is exactly this situation that causes memory leaks.


Detailed interpretation

1. Java garbage collection mechanism

In any language, memory allocation must return the actual address of the allocated memory, that is, a pointer to the start of the memory block. In Java, objects are created with new or through reflection. They are allocated on the heap (Heap), and the Java virtual machine reclaims all of them through garbage collection.

To release objects correctly, the GC tracks the state of every object: its allocation, the references it holds, the references to it, assignments, and so on. Java manages memory as a directed graph and checks whether each object is reachable. Unreachable objects are reclaimed, which also takes care of reference cycles.

In Java, an object typically becomes eligible for garbage collection in one of two ways:

- its reference is set to null and the object is not used again;
- its reference is assigned a new object, so the old object is no longer referenced.

In both cases this holds only if no other reference to the old object remains.

2. Causes of Java memory leaks

What is the root cause of Java memory leaks? When a long-lived object holds a reference to a short-lived object, a memory leak is likely. The short-lived object is no longer needed, but it cannot be reclaimed because the long-lived object still references it. This is the typical scenario in which memory leaks occur in Java.

Consider why a memory leak occurs in the following example. Object A references object B, and A's lifetime (t1-t4) is much longer than B's (t2-t3). When the application no longer uses B, A still references it. The garbage collector therefore cannot remove B from memory, which causes memory problems. If A references more such objects, more unused but still referenced objects will remain and consume memory.

The B object may also hold many other objects, which will also not be collected by the garbage collector. All these unused objects will continue to consume previously allocated memory space.
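The A/B relationship can be sketched in code. The classes LongLivedA and ShortLivedB below are hypothetical stand-ins for A and B. The only point is that A's collection keeps every B reachable until A explicitly drops the reference.

```java
import java.util.ArrayList;
import java.util.List;

public class LifetimeDemo {
    /** Hypothetical short-lived object B: needed only between t2 and t3. */
    static class ShortLivedB {
        final byte[] payload = new byte[1024]; // anything B holds is retained along with it
    }

    /** Hypothetical long-lived object A: alive from t1 to t4. */
    static class LongLivedA {
        private final List<ShortLivedB> held = new ArrayList<>();

        void use(ShortLivedB b) {
            held.add(b); // A now references B
        }

        void done(ShortLivedB b) {
            held.remove(b); // drop the reference at the end of B's lifetime
        }

        int heldCount() {
            return held.size();
        }
    }

    public static void main(String[] args) {
        LongLivedA leaky = new LongLivedA();
        for (int i = 0; i < 3; i++) {
            leaky.use(new ShortLivedB()); // done() is never called
        }
        System.out.println("retained by leaky A: " + leaky.heldCount()); // 3

        LongLivedA fixed = new LongLivedA();
        for (int i = 0; i < 3; i++) {
            ShortLivedB b = new ShortLivedB();
            fixed.use(b);
            fixed.done(b);
        }
        System.out.println("retained by fixed A: " + fixed.heldCount()); // 0
    }
}
```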

There are mainly the following categories:

2.1 Static collection classes cause memory leaks

Static collections such as HashMap and Vector are the most common source of memory leaks. When such a collection is held in a static variable, its lifetime matches that of the application. None of the objects it refers to can be released, because the collection always references them.

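For example:

```java
import java.util.Vector;

public class StaticLeak {
    // A static collection lives as long as its class is loaded
    static Vector<Object> v = new Vector<>(10);

    static void fill() {
        for (int i = 0; i < 100; i++) {
            Object o = new Object();
            v.add(o);
            o = null; // releases only the local reference; v still holds the object
        }
    }
}
```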

In this example, Object instances are allocated in a loop and added to the Vector. Releasing only the local reference (o = null) does not help, because the Vector still refers to each object, so the GC cannot reclaim it. Therefore, if the objects must be released after they have been added to the Vector, the simplest way is to set the Vector itself to null.
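Setting a static collection to null is not always possible when the application keeps using it. One common mitigation is to bound the static collection (this technique is not described in the original). The sketch below assumes a hypothetical limit of three entries. It uses LinkedHashMap's removeEldestEntry hook to evict the least recently accessed entry once the limit is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCache {
    // Hypothetical upper bound on the number of cached entries
    private static final int MAX_ENTRIES = 3;

    // A static map lives as long as the class, so it must not grow without limit
    static final Map<String, Object> CACHE = new LinkedHashMap<String, Object>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
            // Evict the least recently accessed entry once the bound is exceeded
            return size() > MAX_ENTRIES;
        }
    };

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            CACHE.put("key" + i, new Object());
        }
        System.out.println(CACHE.keySet()); // [key2, key3, key4]
    }
}
```

This map is not thread-safe. A static cache shared across threads would need external synchronization, for example through Collections.synchronizedMap.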

2.2 Listeners

In Java programming we deal with listeners constantly, and an application usually uses many of them. We call methods such as addXXXListener() on a control to register listeners. When the objects are later released, however, we often forget to remove these listeners, which increases the chance of memory leaks.
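The usual remedy is to pair every addXXXListener() call with the matching removal. In the sketch below, ClickListener and Button are hypothetical types standing in for a real control. Like most listener lists, Button keeps strong references to its listeners.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListenerDemo {
    /** Hypothetical listener interface, standing in for any addXXXListener() API. */
    interface ClickListener {
        void onClick(String source);
    }

    /** Hypothetical long-lived control that keeps strong references to its listeners. */
    static class Button {
        private final List<ClickListener> listeners = new CopyOnWriteArrayList<>();

        void addClickListener(ClickListener l) { listeners.add(l); }
        void removeClickListener(ClickListener l) { listeners.remove(l); }
        int listenerCount() { return listeners.size(); }

        void click() {
            for (ClickListener l : listeners) {
                l.onClick("button");
            }
        }
    }

    public static void main(String[] args) {
        Button button = new Button();
        ClickListener listener = src -> System.out.println("clicked: " + src);

        button.addClickListener(listener);
        button.click(); // prints "clicked: button"
        // When the listener's owner is discarded, unregister it explicitly;
        // otherwise the button keeps it (and everything it references) reachable
        button.removeClickListener(listener);
        System.out.println("listeners left: " + button.listenerCount()); // 0
    }
}
```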

2.3 Various connections

Database connections (dataSource.getConnection()), network connections (sockets), and I/O connections are not reclaimed automatically by the GC unless their close() method is called explicitly. ResultSet and Statement objects may not need to be closed explicitly, but the Connection must be, because it is never closed automatically. Once the Connection is closed, its ResultSet and Statement objects are released as well.

With a connection pool, however, the situation is different. Calling close() on a pooled connection typically returns it to the pool rather than closing it. So in addition to closing the connection, you must also explicitly close the ResultSet and Statement objects (closing a Statement also closes its current ResultSet). Otherwise, a large number of Statement objects may never be released, causing a memory leak. In this case, the connection is usually obtained in a try block and released in a finally block.
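Since Java 7, try-with-resources has been a more concise alternative to try/finally for anything that implements AutoCloseable, including JDBC's Connection, Statement, and ResultSet. To keep the sketch runnable without a database, FakeResource is a hypothetical stand-in that only records when it is closed.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {
    // Records the order in which resources are closed
    static final List<String> CLOSED = new ArrayList<>();

    /** Hypothetical stand-in for a Connection, Statement, or ResultSet. */
    static class FakeResource implements AutoCloseable {
        private final String name;

        FakeResource(String name) { this.name = name; }

        @Override
        public void close() {
            CLOSED.add(name);
        }
    }

    static void query() {
        // Resources declared here are closed automatically, in reverse order
        // of declaration, even if the body throws an exception
        try (FakeResource conn = new FakeResource("Connection");
             FakeResource stmt = new FakeResource("Statement");
             FakeResource rs = new FakeResource("ResultSet")) {
            // ... read rows from rs ...
        }
    }

    public static void main(String[] args) {
        query();
        System.out.println(CLOSED); // [ResultSet, Statement, Connection]
    }
}
```

With real JDBC the shape is the same. For example, try (Connection conn = dataSource.getConnection(); PreparedStatement ps = conn.prepareStatement(sql); ResultSet rs = ps.executeQuery()) { ... }, where dataSource and sql are assumed to be defined elsewhere.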

2.4 References to Inner Classes and External Modules

References held by inner classes are easy to overlook. An instance of a non-static inner class can hold an implicit reference to its enclosing instance, and if such a reference is not released, a whole chain of objects may remain unreleased. Programmers should also be careful about inadvertent references held by external modules. For example, suppose programmer A is responsible for module A and calls a method in module B such as:

public void registerMsg(Object b);

Such calls require care. If an object is passed in, module B is likely to keep a reference to it. In that case, check whether module B provides a corresponding operation to remove the reference.
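The inner-class case can be illustrated with a minimal sketch. InnerClassDemo, its bigBuffer field, and the Nested class are hypothetical. Inner reads a field of its enclosing object, so every Inner instance keeps that object and its 1,000,000-byte buffer reachable. A static nested class has no enclosing instance and cannot do this.

```java
public class InnerClassDemo {
    // Hypothetical large field, so that retaining the outer object is costly
    private final byte[] bigBuffer = new byte[1_000_000];

    /** Non-static inner class that reads an outer field, so it holds a reference to InnerClassDemo.this. */
    class Inner {
        int bufferSize() {
            return bigBuffer.length;
        }
    }

    /** Static nested class: created without any InnerClassDemo instance. */
    static class Nested {
    }

    public static void main(String[] args) {
        // The outer object is not stored in any variable, yet `inner` keeps it (and bigBuffer) reachable
        Inner inner = new InnerClassDemo().new Inner();
        Nested nested = new Nested(); // holds no reference to an InnerClassDemo
        System.out.println(inner.bufferSize()); // 1000000
    }
}
```

When an inner class does not need the outer instance, declaring it static avoids this hidden reference. The rule for listeners also applies to module B: pair every registerMsg() call with whatever removal operation module B provides.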

2.5 Singleton Pattern

Improper use of the singleton pattern is a common cause of memory leaks. After initialization, a singleton is held in a static variable and exists for the entire lifetime of the JVM. If the singleton holds a reference to an external object, that object cannot be reclaimed by the JVM, resulting in a memory leak. Consider the following example:

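```java
public class A {
    public A() {
        // Every new A registers itself with the singleton B
        B.getInstance().setA(this);
    }
    // ...
}

// Class B uses the singleton pattern
class B {
    private A a; // strong reference kept for the life of the singleton
    private static B instance = new B();

    private B() {} // private, so getInstance() is the only way to obtain B

    public static B getInstance() {
        return instance;
    }

    public void setA(A a) {
        this.a = a;
    }

    public A getA() {
        return a;
    }
}
```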
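There are two common fixes, neither of which comes from the original. One is to clear the reference explicitly when A is no longer needed. The other is to have the singleton hold A through a java.lang.ref.WeakReference, which does not by itself keep A alive. The sketch below shows both on simplified copies of A and B nested in a hypothetical WeakSingletonDemo class.

```java
import java.lang.ref.WeakReference;

public class WeakSingletonDemo {
    /** Simplified stand-in for class A from the example above. */
    static class A { }

    /** Hypothetical variant of singleton B that holds A only through a weak reference. */
    static final class B {
        private static final B INSTANCE = new B();
        private WeakReference<A> a = new WeakReference<>(null);

        private B() { }

        static B getInstance() {
            return INSTANCE;
        }

        void setA(A a) {
            this.a = new WeakReference<>(a); // does not keep A reachable on its own
        }

        /** Returns A while something else still holds it; may return null afterwards. */
        A getA() {
            return a.get();
        }

        /** Explicit alternative: drop the reference once A is no longer needed. */
        void clearA() {
            a = new WeakReference<>(null);
        }
    }

    public static void main(String[] args) {
        A a = new A();
        B.getInstance().setA(a);
        System.out.println(B.getInstance().getA() == a); // true: `a` is still strongly reachable
        B.getInstance().clearA();
        System.out.println(B.getInstance().getA());      // null
    }
}
```

Once nothing else refers to a weakly held A, it can disappear at any garbage collection. This variant therefore only suits cases where B can tolerate getA() returning null.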

3. Java memory allocation strategy

A running Java program uses three memory allocation strategies: static allocation, stack allocation, and heap allocation. Correspondingly, the memory used by these strategies consists mainly of the static storage area (also called the method area), the stack area, and the heap area.

Static storage area (method area): mainly stores static data and constants. This memory is allocated when the class is loaded and generally exists for the entire run of the program.

Stack area: when a method executes, the local variables in its body (including primitive values and object references) are created on the stack. The memory they occupy is released automatically when the method returns. Stack allocation is very efficient, since it amounts to little more than moving the stack pointer, but the capacity of the stack is limited.

Heap area: used for dynamic memory allocation. It holds the memory obtained with new while the program runs, that is, object instances. The Java garbage collector reclaims this memory once it is no longer in use.
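The limited stack capacity is easy to observe. The following sketch uses a hypothetical class, StackDepthDemo. It recurses until the thread's stack is exhausted and reports how many frames fit. The exact number varies with the JVM, the frame size, and the thread stack size.

```java
public class StackDepthDemo {
    private static int depth = 0;

    /** Recurses until the thread's stack is exhausted. */
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Returns how many frames fit on the current thread's stack before StackOverflowError. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Each call adds a frame on the stack; the stack has a fixed maximum size
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measureDepth());
    }
}
```

On HotSpot, the thread stack size can be changed with the -Xss option (for example -Xss2m). The heap is sized separately with -Xmx.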

3.1 The difference between stack and heap

Primitive variables and object reference variables defined in a method body (local variables) are allocated in the method's stack memory. When a variable is defined in a method block, Java allocates space for it on the stack. When the variable goes out of scope, it becomes invalid, and the space allocated to it is released and can be reused.

Heap memory stores all objects created by new (including all of their member variables) and arrays. Memory allocated on the heap is managed automatically by the Java garbage collector. After an array or object is created on the heap, a special variable can be defined on the stack whose value refers to the array's or object's location in heap memory. This special variable is the reference variable mentioned above. We use it to access objects and arrays on the heap.

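For example:

```java
public class Sample {
    int s1 = 0;        // member variable: stored inside the object on the heap
    Sample mSample1;   // member reference: also part of the heap object
                       // (initializing it with new Sample() here would recurse forever)

    public void method() {
        int s2 = 1;                        // local primitive: on the stack
        Sample mSample2 = new Sample();    // reference on the stack, object on the heap
    }

    public static void main(String[] args) {
        Sample mSample3 = new Sample();    // reference on the stack, object on the heap
        mSample3.mSample1 = new Sample();  // member reference now points to another heap object
        mSample3.method();
    }
}
```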

Both the local variable s2 and the reference variable mSample2 of the Sample class exist on the stack, but the object pointed to by mSample2 exists on the heap.

The object that mSample3 points to is stored on the heap, including all of its member variables (s1 and mSample1), while mSample3 itself is stored on the stack.

In conclusion:

Local variables of primitive types and local references are stored on the stack, while the objects they refer to are stored on the heap. Because these variables belong to the method, their lifetime ends when the method returns.

Member variables are all stored on the heap (including primitive values, references, and the objects they refer to). They belong to an object, and objects are ultimately created with new.

After understanding Java's memory allocation, let's take a look at how Java manages memory.

3.2 How Java Manages Memory

Java's memory management consists of allocating and releasing objects. In Java, the programmer requests memory for each object (other than primitive values) with the new keyword, and all objects are allocated on the heap (Heap). The GC decides when objects are released and performs the release. In other words, the program allocates memory and the GC releases it. This division of labor greatly simplifies the programmer's work, but it also adds work for the JVM, which is one reason Java programs can run slower. To release objects correctly, the GC must track the state of every object: its allocation, the references it holds, the references to it, and assignments.

The GC tracks object state so that it can release objects accurately and promptly. The fundamental rule for releasing an object is that it is no longer referenced.

To better understand how the GC works, we can treat objects as the vertices of a directed graph and reference relationships as its directed edges, pointing from the referrer to the referenced object. Each thread object can serve as a starting vertex (root) of the graph. For example, most programs start executing from the main thread, so the graph is rooted at the main thread's vertex. In this directed graph, every object reachable from a root vertex is live, and the GC will not reclaim it. If an object (or a connected subgraph) is unreachable from the root vertex (note that the graph is directed), we consider it no longer referenced, and the GC can reclaim it.
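The directed-graph model can be sketched as a simple mark phase: start from the roots, follow edges, and mark everything reached. This is a toy model with string vertex names chosen for illustration, not how the JVM's collector is implemented. The hypothetical graph also contains a cycle (C and D refer to each other). Neither is reachable from the root, which shows why reference cycles do not prevent collection.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReachabilityDemo {
    /** Returns every vertex reachable from the roots by following directed edges. */
    static Set<String> reachable(Map<String, List<String>> edges, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String v = pending.pop();
            if (marked.add(v)) { // add() returns false if v was already marked
                for (String w : edges.getOrDefault(v, List.of())) {
                    pending.push(w);
                }
            }
        }
        return marked;
    }

    /** Hypothetical object graph: vertices are objects, edges are references. */
    static Map<String, List<String>> demoGraph() {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("main", List.of("A")); // a local variable in main references A
        edges.put("A", List.of("B"));    // A references B
        edges.put("C", List.of("D"));    // C and D reference each other,
        edges.put("D", List.of("C"));    // but nothing reachable references them
        return edges;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(demoGraph(), List.of("main"));

        Set<String> garbage = new TreeSet<>(List.of("main", "A", "B", "C", "D"));
        garbage.removeAll(live);
        System.out.println("live: " + new TreeSet<>(live)); // live: [A, B, main]
        System.out.println("garbage: " + garbage);          // garbage: [C, D]
    }
}
```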

The following example shows how memory management can be represented as a directed graph. At every moment of a program's execution, a directed graph represents the JVM's memory allocation. The figure on the right below shows this graph when the program on the left has run to line 6.

```java
public class Test {
    public static void main(String[] args) {
        // Two objects are allocated on the heap
        Object o1 = new Object();
        Object o2 = new Object();
        o2 = o1; // this is line 6: the object o2 referenced before is now unreachable
    }
}
```