Learning Java: garbage collection

Before talking about garbage collection, let me explain which parts of the JVM's memory need the garbage collector and which do not.
There are generally three ways to allocate memory:

  1. Allocated from the static storage area. Static variables, for example, are assigned here when their class is loaded.
  2. Created on the stack. Local variables of the basic data types are created on the stack. When execution leaves the variable's scope, its memory is released automatically, so the garbage collector is not involved.
  3. Created in the heap. Objects are allocated memory in the heap when they are created. The heap is managed by the garbage collector, which runs as a separate thread.

It can be seen that the garbage collector is mainly concerned with the heap.

Before talking about the garbage collection mechanism, let's first understand the life cycles of objects and variables.

Life cycle of local variables and reference variables: a local variable stays alive as long as the method that declares it has not finished executing.
Life cycle of instance variables (member variables): an instance variable lives inside an object, so it stays alive as long as that object is alive.
Life cycle of an object: once no reference to the object remains, the object becomes eligible to be removed from the heap.

When will the object be reclaimed by the garbage collector?

  1. The object's reference permanently leaves its scope

    class Duck{
    
    }
    public class StackRef{
        public void foof(){
            barf();
        }   
        public void barf(){
            Duck d=new Duck();
        }
    }

    In the above code, when barf() finishes executing, the reference variable d leaves its scope, so the garbage collector can reclaim the heap space allocated by new Duck().

  2. The reference is reassigned to another object

    public class ReRef{
        Duck d=new Duck();
        public void go(){
            d=new Duck();
        }
    }

    In the above example, the reference is reassigned and now stores the address of a new object. The object it originally pointed to no longer has any reference to it, so it can be reclaimed by the garbage collector.

  3. Directly set the reference to null

    public class ReRef{
        Duck d=new Duck();
        public void go(){
            d=null;
        }
    }

    null means the reference points to no object. This case is similar to the second one, except that the reference variable is set to null, so I won't describe it further.
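
The cases above can be observed, with some caution, through a WeakReference, which tracks an object without keeping it alive. This is only a minimal sketch: the class name EligibilityDemo and the helper track() are made up for illustration, and System.gc() is only a request, so the final line may print either value.

```java
import java.lang.ref.WeakReference;

class Duck {}

class EligibilityDemo {
    // Returns a weak reference that tracks a Duck without keeping it alive.
    static WeakReference<Duck> track(Duck duck) {
        return new WeakReference<>(duck);
    }

    public static void main(String[] args) {
        Duck d = new Duck();
        WeakReference<Duck> tracker = track(d);
        // The Duck is still strongly referenced by d at this point.
        System.out.println("alive while referenced: " + (tracker.get() == d));

        d = null;      // case 3: drop the only strong reference
        System.gc();   // only a request; the JVM may ignore it
        // Often true on HotSpot after the request, but not guaranteed.
        System.out.println("collected: " + (tracker.get() == null));
    }
}
```

Reassigning d to a new Duck (case 2) or letting the method return (case 1) would make the first Duck eligible in the same way.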

The following content is reproduced from: http://blog.csdn.net/zhangerqing

One, garbage collection

There is a saying: between Java and C++ stands a wall built of memory allocation and garbage collection technology; people outside the wall want to get in, and people inside want to get out. I leave the meaning of this sentence for you to consider. In general, C and C++ programmers sometimes suffer from memory leaks and find memory management a headache, while Java programmers envy C++ programmers for being able to control everything themselves, so that they are not left helpless when it comes to memory management. And helpless we are: as Java programmers, it is difficult for us to control how the JVM reclaims memory. We can only adapt to its principles and improve program performance as much as possible. Let's start to explain Java garbage collection (Garbage Collection, GC) from the following four aspects:

1. Why do we need to carry out garbage collection?

As the program runs, instance objects, variables, and other data occupy more and more memory. If garbage is not collected in time, program performance will inevitably decline, and the system may even fail because too little memory is available.

2. Which "garbage" needs to be recycled?

Of the five memory areas introduced earlier, three do not require garbage collection: the program counter, the JVM stack, and the native method stack. Their life cycles are tied to the thread: when the thread is destroyed, the memory they occupy is released automatically. So only the method area and the heap need GC. As for which objects, in brief: if an object no longer has any references, it can be recycled. In layman's terms, an object that is no longer of any use can be recycled as waste.

3. When will garbage collection be carried out?

A classic approach is the reference counting algorithm: each object has a reference counter. Every time the object is referenced, the counter is incremented by 1, and every time a reference is lost, the counter is decremented by 1. When the counter stays at 0, the object is considered recyclable. However, this algorithm has an obvious flaw: when two objects refer to each other but are otherwise unused, they ought to be garbage collected, yet their mutual references keep both counters above 0, so they never meet the condition for collection. Because reference counting cannot handle this case, Sun's JVM does not use it for garbage collection. Instead, it uses the so-called root search algorithm, as shown in the figure below:

[Figure: object graph searched from GC Roots]

The basic idea is: start from objects called GC Roots and search downwards. If an object cannot be reached from any GC Root, it is no longer referenced and can be garbage collected. (This is a simplification. An object that is no longer referenced is not yet completely "dead". If its class overrides the finalize() method and the system has not yet called it, the system calls finalize() once to finish any final work. If during this call the object re-associates itself with any object referenced from GC Roots, it is "reborn"; if not, it can be recycled completely.) In the figure above, Object5, Object6, and Object7 may still refer to each other, but none of them can be reached from GC Roots, so they can be recycled. This solves the problem that the reference counting algorithm cannot.
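
To make the contrast concrete, here is a toy sketch that looks at the same object graph in both ways. The graph layout is my own assumption (the original figure is not available), the names RootSearchDemo, sampleGraph, referenceCounts, and reachableFrom are made up for illustration, and a real JVM does not represent objects as strings in a map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RootSearchDemo {
    // Hypothetical object graph: each key holds references to the listed objects.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("GCRoot", List.of("Object1"));
        refs.put("Object1", List.of("Object2", "Object3"));
        refs.put("Object2", List.of("Object4"));
        refs.put("Object5", List.of("Object6"));   // 5 -> 6 -> 7 -> 5 is a cycle
        refs.put("Object6", List.of("Object7"));
        refs.put("Object7", List.of("Object5"));
        return refs;
    }

    // Reference counting view: how many references point at each object.
    static Map<String, Integer> referenceCounts(Map<String, List<String>> refs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> targets : refs.values()) {
            for (String target : targets) {
                counts.merge(target, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Root search view: every object reachable by following references from the root.
    static Set<String> reachableFrom(String root, Map<String, List<String>> refs) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (marked.add(current)) {
                for (String target : refs.getOrDefault(current, List.of())) {
                    pending.push(target);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = sampleGraph();
        Map<String, Integer> counts = referenceCounts(refs);
        Set<String> live = reachableFrom("GCRoot", refs);
        for (String name : List.of("Object5", "Object6", "Object7")) {
            System.out.println(name + ": count=" + counts.get(name)
                    + ", reachable=" + live.contains(name));
        }
    }
}
```

In this graph each object in the cycle keeps a count of 1, so reference counting would never free it, while the search from the root never marks it.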

Supplementary concept, references: since JDK 1.2 the notion of a reference has been extended into four types: strong, soft, weak, and phantom. Objects referenced in these four ways are treated differently by GC:

a> Strong Reference. This is the ordinary reference you get when you create an object with new. As long as a strong reference to an object exists, the object is never recycled.

b> Soft Reference. An object held only by a soft reference may be recycled. If JVM memory is not tight, such an object may be left alone; if memory is tight, it is recycled. This raises a question: since a softly referenced object can be recycled, why not recycle it right away? The answer is caching. Take a literal cache as an example: a cached object may be dispensable at the moment, but keeping it in memory means that if it is needed again it can be used without reallocating memory. Holding such objects through soft references is convenient and improves program performance.

c> Weak Reference. An object held only by weak references is always garbage collected: regardless of whether memory is tight, it is cleaned up the next time GC runs.

d> Phantom Reference. A phantom reference is the weakest kind and has no influence on an object's lifetime; the JVM ignores it when deciding what to collect. Its only role is to keep some tracking records, as an aid to finalize-style cleanup.
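
The reference types correspond to classes in java.lang.ref. The sketch below only shows what can be observed deterministically, namely the behavior of get() while the object is still strongly referenced; the class name ReferenceKindsDemo and the helper observe() are made up for illustration.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    // Wraps 'data' in each reference type and reports what get() returns
    // while 'data' is still strongly referenced by the caller.
    static boolean[] observe(Object data) {
        SoftReference<Object> soft = new SoftReference<>(data);
        WeakReference<Object> weak = new WeakReference<>(data);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(data, queue);
        return new boolean[] {
            soft.get() == data,     // soft reference still returns the object
            weak.get() == data,     // weak reference still returns the object
            phantom.get() == null   // a phantom reference never returns its referent
        };
    }

    public static void main(String[] args) {
        Object data = new Object();   // strong reference
        boolean[] seen = observe(data);
        System.out.println("soft returns object:  " + seen[0]);
        System.out.println("weak returns object:  " + seen[1]);
        System.out.println("phantom returns null: " + seen[2]);
    }
}
```

What happens after the strong reference is dropped depends on the collector and on memory pressure, so the sketch does not try to show it.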

Finally, which classes need to be recycled? Useless classes. A class counts as useless when it meets all of the following requirements:

1> All instances of this class have been recycled.

2> The ClassLoader that loaded this class has been recycled.

3> The java.lang.Class object corresponding to this class is not referenced anywhere.

4. How to carry out garbage collection?

This part mainly introduces garbage collection algorithms. As introduced earlier, memory is mainly divided into three blocks: the new generation, the old generation, and the permanent generation. The three generations have different characteristics, which leads to different GC algorithms. The new generation suits objects with short life cycles that are frequently created and destroyed. The old generation suits objects with relatively long life cycles. The permanent generation is the method area in Sun HotSpot (some JVMs have no permanent generation at all). First, the concepts and characteristics of the new, old, and permanent generations:


New generation: New Generation or Young Generation. It is roughly divided into the Eden area and the Survivor area, and the Survivor area is further divided into two parts of equal size: FromSpace and ToSpace. Newly created objects are allocated in the new generation. When the Eden space is insufficient, surviving objects are moved to Survivor. The size of the new generation can be set with -Xmn, and -XX:SurvivorRatio controls the ratio of Eden to Survivor.

Old generation: Old Generation. It stores objects from the new generation that are still alive after multiple garbage collections, such as cached objects. The size of the old generation is the value of -Xmx minus the value of -Xmn.

Permanent generation: Permanent Generation. In Sun's JVM it is the method area; most other JVMs do not have this generation. It mainly stores constants and class information. By default its minimum size is 16MB and its maximum is 64MB; both can be set with -XX:PermSize and -XX:MaxPermSize.
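
The sizing rules above are simple arithmetic, which can be sketched as follows. The values -Xmx1024m, -Xmn300m, and -XX:SurvivorRatio=8 are assumed examples, the class and method names are made up for illustration, and a real JVM may round these sizes for alignment, so treat the numbers as approximations.

```java
class GenerationSizes {
    // Old generation size as described above: -Xmx minus -Xmn.
    static long oldGenMb(long xmxMb, long xmnMb) {
        return xmxMb - xmnMb;
    }

    // With -XX:SurvivorRatio=r, Eden : one Survivor space = r : 1.
    // The new generation holds Eden plus two Survivor spaces, i.e. r + 2 parts.
    static long edenMb(long xmnMb, int survivorRatio) {
        return xmnMb * survivorRatio / (survivorRatio + 2);
    }

    // Size of a single Survivor space (FromSpace or ToSpace).
    static long survivorMb(long xmnMb, int survivorRatio) {
        return xmnMb / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long xmx = 1024;   // assumed -Xmx1024m
        long xmn = 300;    // assumed -Xmn300m
        int ratio = 8;     // assumed -XX:SurvivorRatio=8
        System.out.println("old generation: " + oldGenMb(xmx, xmn) + " MB");
        System.out.println("eden:           " + edenMb(xmn, ratio) + " MB");
        System.out.println("each survivor:  " + survivorMb(xmn, ratio) + " MB");
    }
}
```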

Common GC algorithms:

Mark-Sweep Algorithm (Mark-Sweep)

This is the most basic GC algorithm. It marks the objects that need to be reclaimed, then scans the heap and reclaims the marked ones, hence the two steps: mark and sweep. The algorithm is not efficient, and it leaves memory fragmented after the sweep, so a large object that needs contiguous memory may not fit until the heap is defragmented. The algorithm therefore needs improvement.
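
The fragmentation problem is easy to see in a toy model. In the sketch below the heap is just an array of slots and null stands for a free slot; the class MarkSweepDemo and its methods are made up for illustration and say nothing about how HotSpot actually lays out memory.

```java
import java.util.Arrays;
import java.util.Set;

class MarkSweepDemo {
    // Sweep step: free every slot whose object was marked for reclamation.
    static String[] sweep(String[] heap, Set<String> markedGarbage) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && markedGarbage.contains(result[i])) {
                result[i] = null;   // null stands for a free slot
            }
        }
        return result;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        String[] swept = sweep(heap, Set.of("B", "D", "F"));
        System.out.println(Arrays.toString(swept));
        System.out.println("largest free run: " + largestFreeRun(swept));
    }
}
```

Three slots are free after the sweep, but no two of them are adjacent, so an object needing two contiguous slots still cannot be placed.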

Copying algorithm (Copying)

As mentioned earlier, the new generation is divided into three parts: the Eden area and two Survivor areas. Sun's JVM generally sets the ratio of the Eden area to each Survivor area to 8:1 and keeps one Survivor area free. During garbage collection, objects that do not need to be recycled are copied into the free Survivor area, and then the Eden area and the first Survivor area are cleared completely. One problem remains: what if the second Survivor area does not have enough space? In that case memory has to be borrowed temporarily from the old generation. This algorithm is suitable for the new generation.
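
A toy version of one such collection might look like the sketch below. Objects are plain strings, the Survivor capacity is counted in objects rather than bytes, and surviving objects that do not fit go to a separate overflow list that stands in for the borrowed memory mentioned above. All names here (CopyingDemo, CopyResult, minorGc) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class CopyResult {
    final List<String> toSpace = new ArrayList<>();    // the previously free Survivor area
    final List<String> overflow = new ArrayList<>();   // survivors that did not fit
}

class CopyingDemo {
    // Copies live objects out of Eden and the occupied Survivor area.
    // Garbage is never copied; it disappears when the source areas are cleared.
    static CopyResult minorGc(List<String> eden, List<String> fromSpace,
                              Set<String> live, int survivorCapacity) {
        CopyResult result = new CopyResult();
        List<String> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSpace);
        for (String obj : candidates) {
            if (!live.contains(obj)) {
                continue;                               // dead object: skip it
            }
            if (result.toSpace.size() < survivorCapacity) {
                result.toSpace.add(obj);
            } else {
                result.overflow.add(obj);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CopyResult r = minorGc(List.of("a", "b", "c", "d"), List.of("e", "f"),
                               Set.of("a", "c", "e", "f"), 3);
        System.out.println("to-space: " + r.toSpace);
        System.out.println("overflow: " + r.overflow);
    }
}
```

The cost of the copy depends on the number of survivors rather than on the amount of garbage, which is why the approach fits an area where most objects die young.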

Mark-Compact algorithm (Mark-Compact)

The first half is the same as the mark-sweep algorithm. The difference is that, after the objects that do not need to be recycled are marked, they are moved together so that the occupied memory is contiguous, and then only the memory beyond the boundary of the marked objects is cleaned up. This algorithm is suitable for the old generation and the permanent generation.
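
The compaction step can be sketched in the same toy style: the heap is an array of slots, null is a free slot, and the marked objects are the ones to keep. The class MarkCompactDemo and its method are made up for illustration only.

```java
import java.util.Arrays;
import java.util.Set;

class MarkCompactDemo {
    // Slides every marked (live) object toward the start of the heap, keeping
    // their order, so that all free slots end up contiguous at the end.
    static String[] compact(String[] heap, Set<String> markedLive) {
        String[] result = new String[heap.length];
        int next = 0;
        for (String obj : heap) {
            if (obj != null && markedLive.contains(obj)) {
                result[next++] = obj;
            }
        }
        return result;   // slots from index 'next' onward stay null (free)
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        String[] compacted = compact(heap, Set.of("A", "C", "E"));
        System.out.println(Arrays.toString(compacted));
    }
}
```

After compaction the three free slots are adjacent, so a large object can be placed without any further defragmentation.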

Common garbage collectors:

Based on the algorithms above, each JVM provides its own implementations. Let's take a look at some common garbage collectors:


First, three concrete garbage collectors: the serial GC (Serial GC), the parallel collection GC (Parallel Scavenge), and the parallel GC (ParNew).

1. Serial GC. It is the most basic and oldest collector, but it is still widely used. It is a single-threaded collector, and beyond that, its most notable feature is that all executing threads must be suspended during garbage collection (Stop The World). For some applications this is unacceptable, but look at it this way: as long as the pause can be kept within N milliseconds, most applications can accept it, and in practice the collector has not disappointed. A pause of a few tens of milliseconds is entirely acceptable on a client. This collector suits applications with a single CPU, a small new generation, and modest requirements on pause time. It is the default GC in client mode and can be selected explicitly with -XX:+UseSerialGC.

2. ParNew GC. It is basically the same as Serial GC; the essential difference is that it collects with multiple threads to improve efficiency, which makes it usable on the server side (Server). It can also work together with CMS GC, which is all the more reason to use it on the server side.

3. Parallel Scavenge GC. The entire scanning and copying process runs on multiple threads. It suits applications that run on multiple CPUs and require short pause times. It is the default GC in server mode. It can be selected explicitly with -XX:+UseParallelGC, and -XX:ParallelGCThreads=4 specifies the number of threads.

4. CMS (Concurrent Mark Sweep) collector. The goal of this collector is to fix the pause problem of Serial GC and achieve the shortest collection pause time. Common B/S architecture applications suit this collector because they need high concurrency and fast response. The CMS collector is based on the "mark-sweep" algorithm, and the collection process is roughly divided into 4 steps:

CMS initial mark, CMS concurrent mark, CMS remark, and CMS concurrent sweep.

The initial mark and remark steps still need to suspend user threads. The initial mark only marks the objects directly associated with GC Roots and is very fast. The concurrent mark phase is the GC Roots root search phase, which determines whether objects are alive. The remark phase corrects the marking records of objects whose marks changed because the user program kept running during concurrent marking. Its pause is slightly longer than that of the initial mark phase but shorter than the concurrent mark phase. Because the collector threads work alongside user threads during the two longest phases, concurrent mark and concurrent sweep, the CMS collection process as a whole runs concurrently with user threads.

The advantages of the CMS collector are concurrent collection and low pauses, but CMS is far from perfect.

The CMS collector has three major shortcomings:

a>. The CMS collector is very sensitive to CPU resources. In the concurrent phases it does not pause user threads, but it does take up CPU resources, which slows the application down and lowers total throughput. The default number of collection threads CMS starts is (number of CPUs + 3) / 4.

b>. The CMS collector cannot handle floating garbage, so a "Concurrent Mode Failure" may occur and lead to another Full GC. Because user threads keep running during the CMS concurrent sweep phase, the program naturally keeps producing new garbage. This garbage appears after the marking process, so CMS cannot handle it in the current collection and has to leave it for the next GC. It is called "floating garbage". For the same reason, enough memory must be reserved for user threads to use while collection is in progress, so the CMS collector cannot wait until the old generation is almost full before collecting, as other collectors do. By default, the CMS collector is activated when 68% of the old generation is in use. The trigger percentage can be raised with the parameter -XX:CMSInitiatingOccupancyFraction to reduce the number of collections and improve performance. If the memory reserved while CMS is running cannot satisfy the program's other threads, a "Concurrent Mode Failure" occurs. The virtual machine then falls back on its backup plan: it temporarily enables the Serial Old collector to redo the old generation collection, so the pause is very long. Setting -XX:CMSInitiatingOccupancyFraction too high therefore easily leads to "Concurrent Mode Failure", and performance drops instead.

c>. The last shortcoming: CMS is based on the "mark-sweep" algorithm, which leaves a large number of fragments after collection. Too much fragmentation makes object allocation difficult. For example, when no contiguous space can be found for a large object, a Full GC has to be triggered early. To solve this problem, the CMS collector provides the switch -XX:+UseCMSCompactAtFullCollection, which adds a defragmentation pass after a Full GC. The parameter -XX:CMSFullGCsBeforeCompaction sets how many uncompacted Full GCs are executed before one with compaction follows.
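
The two numbers quoted above, the default thread count and the occupancy trigger, are plain arithmetic. The sketch below only restates those formulas; the class CmsDefaults and its methods are made up for illustration and are not part of any JVM API.

```java
class CmsDefaults {
    // Default number of CMS collection threads as stated above: (CPUs + 3) / 4,
    // using integer division.
    static int defaultThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    // True when old generation usage has reached the trigger percentage,
    // i.e. the value given to -XX:CMSInitiatingOccupancyFraction.
    static boolean shouldStartCycle(long usedMb, long capacityMb, int occupancyFraction) {
        return usedMb * 100 >= capacityMb * occupancyFraction;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 8}) {
            System.out.println(cpus + " CPUs -> " + defaultThreads(cpus) + " CMS thread(s)");
        }
        // Assumed example: a 1000 MB old generation with the 68% default.
        System.out.println("start at 679 MB used: " + shouldStartCycle(679, 1000, 68));
        System.out.println("start at 680 MB used: " + shouldStartCycle(680, 1000, 68));
    }
}
```

With 4 or fewer CPUs the formula yields a single thread, which on a small machine is a large share of the available CPU time and illustrates shortcoming a>.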

5. G1 collector. It improves on the CMS collector in many ways. First, it is based on the mark-compact algorithm, so it does not cause memory fragmentation. Second, it can control pauses more precisely. It is not described in detail here.

6. Serial Old. Serial Old is the old generation version of the Serial collector. It also collects on a single thread, using the "mark-compact" algorithm. It is mainly used by virtual machines in Client mode.

7. Parallel Old. Parallel Old is the old generation version of the Parallel Scavenge collector, using multiple threads and the "mark-compact" algorithm.

8. The RTSJ garbage collector is used for Java real-time programming, which will be introduced later.
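
To check which collectors a running JVM has actually selected, the standard java.lang.management API can be queried. This is a minimal sketch; the class name CollectorNames is made up, and the names printed depend on the JVM and its flags (on HotSpot, for instance, the serial collector typically reports "Copy" and "MarkSweepCompact").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorNames {
    // Names of the garbage collectors registered in this JVM.
    static List<String> activeCollectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 if the count is undefined for this collector.
            System.out.println(bean.getName() + ": " + bean.getCollectionCount() + " collections");
        }
    }
}
```

Running it once with -XX:+UseSerialGC and once with -XX:+UseParallelGC is a quick way to see the difference.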

Two, Java program performance optimization

gc() call

Calling the gc method suggests that the Java virtual machine make an effort to reclaim unused objects so that the memory they currently occupy can be quickly reused. When control returns from the method call, the virtual machine has done its best to reclaim space from all discarded objects. Calling System.gc() is equivalent to calling Runtime.getRuntime().gc().
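
A rough way to watch the effect is to compare used heap memory before and after the call. This is only a sketch: the class GcRequestDemo is made up for illustration, the 16 MB figure is arbitrary, and the JVM is free to reclaim more, less, or nothing at that moment.

```java
class GcRequestDemo {
    // Heap memory currently in use, according to the Runtime API.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        byte[][] blocks = new byte[16][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024];   // 16 MB that will soon be garbage
        }
        blocks = null;                            // make the arrays unreachable

        long before = usedBytes();
        System.gc();                              // same as Runtime.getRuntime().gc()
        long after = usedBytes();
        // A drop is typical but not guaranteed.
        System.out.println("used before: " + before + " bytes, after: " + after + " bytes");
    }
}
```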

Calling and overriding finalize()

gc can only reclaim memory allocated on the Java heap (in pure Java, all objects are allocated on the heap with new). It cannot reclaim memory allocated outside the Java heap, which can happen when using JNI, for example when Java calls a C program and the C program allocates memory with malloc. The GC has no control over such memory, so reclaiming it depends on finalize(). For example, when Java calls a non-Java method (possibly written in C or C++), the non-Java code may call C's malloc() to allocate memory, and that memory is not released unless free() is called (free() being a C function). The GC cannot do this work, so free() has to be called from a native method invoked inside finalize().

Excellent programming habits

(1) Avoid creating an object in the loop body, even if the object does not occupy much memory space.
(2) Try to make the object meet the garbage collection standard in time.
(3) Don't adopt too deep inheritance hierarchy.
(4) Accessing local variables is better than accessing variables in the class.

This section will be updated continuously!

Three, common problems

1. Memory overflow

That is, the memory you ask the Java virtual machine to allocate exceeds what the system can provide. The system cannot meet the demand, so an overflow occurs.

2. Memory leak

It means that you ask the system to allocate memory (new) but do not return it (delete) after use. As a result, you can no longer access the memory you requested, and the allocated memory cannot be used again. As server memory keeps being consumed and more and more of it becomes unusable, the system cannot allocate it to programs that need it, which is a leak. If this continues, the program gradually runs out of memory and overflows.
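
In Java there is no delete, so a leak usually takes a different shape: objects that are no longer needed stay reachable from some long-lived reference, and the garbage collector therefore cannot reclaim them. Here is a minimal sketch of that pattern; the class LeakyCache and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LeakyCache {
    // A long-lived collection: everything added here stays strongly reachable.
    private static final List<byte[]> RETAINED = new ArrayList<>();

    // Simulates handling a request that forgets to release its buffer.
    static void handleRequest() {
        byte[] buffer = new byte[1024];
        RETAINED.add(buffer);   // the "leak": the buffer is never removed
    }

    static int retainedCount() {
        return RETAINED.size();
    }

    // Dropping the references makes the buffers eligible for collection again.
    static void clear() {
        RETAINED.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequest();
        }
        System.out.println("buffers still reachable: " + retainedCount());
    }
}
```

If handleRequest() kept running without clear() ever being called, the heap would eventually fill up and the JVM would throw java.lang.OutOfMemoryError, which is the overflow described in point 1.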

The content of this chapter is based on theory. I will continue to add practical exercises, such as verifying the effect of garbage collection or monitoring memory. I also hope readers will keep offering guidance and suggestions. If you have any questions, please contact:

Email: [email protected]

Weibo: weibo.com/xtfggef

If reprinted, please indicate the source ( http://blog.csdn.net/zhangerqing ), thank you!

The End


Origin blog.csdn.net/dypnlw/article/details/82687447