In-Depth Understanding of Java Garbage Collection and the Memory Management Mechanisms of the Java Garbage Collector

Overview

We all know that Java memory management is highly "automated", which spares Java engineers most of the pain of managing memory by hand. It still pays to learn how memory allocation and GC work: when you have to troubleshoot memory overflows and memory leaks, or when garbage collection becomes the bottleneck that keeps a system from reaching higher concurrency, only an understanding of the underlying principles lets us monitor and tune these issues properly. Most people regard garbage collection (Garbage Collection, GC) as a companion of the Java language. Anyone who wants to understand GC should think about the three questions it has to answer:

1. Which memory needs to be reclaimed?
2. When should it be reclaimed?
3. How should it be reclaimed?

Thinking through these three questions will, I believe, give us a much better understanding of Java's garbage collection mechanism.


Let us start with the first question: which memory needs to be reclaimed? Almost all object instances in Java are stored in the heap. Before the garbage collector reclaims the heap, the first thing it has to do is determine which objects are still "alive" and which are already "dead" (that is, can no longer be referenced by any path). The "dead" objects are the ones we need to reclaim. Below are the algorithms used to decide whether an object is still alive.

First, algorithms for determining whether an object is alive

1. The reference counting algorithm (Reference Counting)

How the algorithm works

Reference counting is an early garbage collection strategy. Each object carries a reference counter: whenever something starts referring to the object, the counter is incremented by 1; whenever such a reference goes away, the counter is decremented by 1. An object whose counter is 0 at any moment can no longer be used.

Advantages and disadvantages

Advantages: a reference-counting collector does its work quickly and interleaves it with program execution, which suits real-time environments where the program must not be interrupted for long.

Disadvantages: it cannot detect circular references. If a parent object references a child object and the child references the parent in return, neither counter can ever drop to zero. Reference counting cannot solve the circular-reference problem, for example:

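public class Main {
    // Minimal helper class: each instance holds one reference to another MyObject
    static class MyObject {
        MyObject object;
    }

    public static void main(String[] args) {
        MyObject object1 = new MyObject();
        MyObject object2 = new MyObject();

        // Create a reference cycle: object1 -> object2 -> object1
        object1.object = object2;
        object2.object = object1;

        // Drop the only external references to both objects
        object1 = null;
        object2 = null;

        // Suppose GC happens at this line: can object1 and object2 be reclaimed?
        System.gc();
    }
}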

The last two assignments set object1 and object2 to null, so the two objects they pointed to can no longer be accessed. However, because the objects refer to each other, neither reference counter is 0, and a pure reference-counting collector would never reclaim them.

It is precisely because of this problem that mainstream Java virtual machines do not use reference counting to decide whether objects are alive.
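To make the counting rule and the cycle problem concrete, here is a minimal simulation. `RcObject`, `retain`, `release` and `point` are hypothetical names invented for this sketch; the JVM exposes nothing like them, and the class only models what a reference-counting collector would do by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference-counted object; the JVM has no such class, this only models the algorithm
class RcObject {
    static final List<String> freed = new ArrayList<>(); // names of objects reclaimed so far

    final String name;
    int refCount = 0;
    private final List<RcObject> fields = new ArrayList<>(); // references this object holds

    RcObject(String name) { this.name = name; }

    // A new reference to this object appears: counter + 1
    void retain() { refCount++; }

    // A reference to this object disappears: counter - 1; at 0 the object is reclaimed
    void release() {
        refCount--;
        if (refCount == 0) {
            freed.add(name);
            for (RcObject target : fields) {
                target.release(); // reclaiming an object drops the references it holds
            }
            fields.clear();
        }
    }

    // Store a reference from this object to another (like object1.object = object2)
    void point(RcObject target) {
        target.retain();
        fields.add(target);
    }

    public static void main(String[] args) {
        RcObject object1 = new RcObject("object1");
        object1.retain();            // local variable object1
        RcObject object2 = new RcObject("object2");
        object2.retain();            // local variable object2
        object1.point(object2);      // object2: count 2
        object2.point(object1);      // object1: count 2
        object1.release();           // object1 = null -> count 1
        object2.release();           // object2 = null -> count 1
        System.out.println("freed: " + freed); // prints: freed: []
    }
}
```

Without the cycle, releasing the last outside reference frees the whole chain; with the cycle, both counters stay at 1 forever, exactly as described for the `Main` example.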


2. The reachability analysis algorithm (Reachability Analysis)

2.1 How the algorithm works

The mainstream implementations of mainstream commercial languages (Java, C#) decide whether an object is alive with reachability analysis (also called the root search algorithm). The algorithm borrows from graph theory in discrete mathematics. Its basic idea: starting from a set of objects called "GC Roots", search downward from these nodes; the path traversed by the search is called a reference chain. When an object is not connected to GC Roots by any reference chain (in graph-theory terms, the object is unreachable from GC Roots), the object is proven to be unusable. In the figure, objects object5, object6 and object7 are related to one another, but none of them is reachable from GC Roots, so all three are judged to be reclaimable.

In Java, the objects that can serve as GC Roots include:

1. Objects referenced from the VM stack (the local variable table in stack frames)

2. Objects referenced by static class fields in the method area

3. Objects referenced by constants in the method area

4. Objects referenced from the native method stack (JNI references from native methods)
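The search described above is an ordinary graph traversal. The sketch below models the heap as a hypothetical map from object name to the names it references; the edges between object1..object7 are assumed for illustration, since the original figure is not reproduced. Everything the traversal never reaches from the roots is reported as garbage, cycle or not.

```java
import java.util.*;

// Minimal sketch of reachability analysis on a hypothetical object graph:
// search from the GC Roots along references; objects never reached are garbage.
class Reachability {
    static Set<String> garbage(Map<String, List<String>> refs, List<String> roots) {
        Set<String> reached = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (reached.add(obj)) { // first visit: follow this object's references
                work.addAll(refs.getOrDefault(obj, List.of()));
            }
        }
        Set<String> unreachable = new TreeSet<>(refs.keySet());
        unreachable.removeAll(reached);
        return unreachable;
    }

    public static void main(String[] args) {
        // Assumed edges: object5..7 reference each other, but no root reaches them
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("object1", List.of("object2", "object4"));
        refs.put("object2", List.of("object3"));
        refs.put("object3", List.of());
        refs.put("object4", List.of());
        refs.put("object5", List.of("object6", "object7"));
        refs.put("object6", List.of("object5"));
        refs.put("object7", List.of());
        System.out.println(garbage(refs, List.of("object1"))); // [object5, object6, object7]
    }
}
```

Note that the object5/object6 cycle, which defeats reference counting, is no obstacle here.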


Next comes the second question: when should memory be reclaimed? The answer can be found in how HotSpot implements these algorithms.

Second, the HotSpot algorithm implementation

1. Root enumeration

Reachability analysis has to find reference chains starting from the GC Roots nodes. The nodes that can serve as GC Roots are mainly global references (such as constants or static class fields) and execution contexts (such as the local variable tables of stack frames). Many applications today have a method area of hundreds of megabytes on its own, so checking every reference in it one by one would inevitably take a lot of time.

Reachability analysis is also time-sensitive because of GC pauses: the analysis must run on a consistent snapshot. Consistency here means that for the whole duration of the analysis, the system looks as if it were frozen at a single point in time; object reference relationships must not keep changing while the analysis is running, otherwise the accuracy of the result cannot be guaranteed. This is one of the key reasons why GC must stop all Java execution threads (Sun calls this "Stop The World"). Even with the CMS collector, which claims to (almost) never pause, root enumeration still requires a pause.

Mainstream Java virtual machines all use exact GC, so when the execution system stops, the VM does not need to check every execution context and global reference location exhaustively; it has a way to know directly where object references are stored. HotSpot achieves this with a set of data structures called OopMap. When a class is loaded, HotSpot computes what type of data sits at which offset inside its objects; during JIT compilation, it also records, at specific positions, which stack slots and registers hold references. This way the GC can read this information directly while scanning.
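The idea behind an OopMap can be modelled as "for this code position, these slots hold references". The sketch below is a hypothetical, heavily simplified model, not HotSpot's real data layout: `OopMapDemo`, `record` and `enumerateRoots` are invented names, a frame is a plain `long[]`, and a code offset without a recorded map is treated as "not a safepoint".

```java
import java.util.*;

// Simplified, hypothetical model of an OopMap (not HotSpot's real data layout):
// for each safepoint, identified here by a code offset, the JIT records which
// stack slots hold object references. Root enumeration then reads only those slots.
class OopMapDemo {
    private static final Map<Integer, BitSet> oopMaps = new HashMap<>();

    // "JIT compile time": remember which slots are references at this code offset
    static void record(int codeOffset, int... referenceSlots) {
        BitSet bits = new BitSet();
        for (int slot : referenceSlots) bits.set(slot);
        oopMaps.put(codeOffset, bits);
    }

    // "GC time": collect the root values of a frame stopped at codeOffset
    static List<Long> enumerateRoots(int codeOffset, long[] frameSlots) {
        BitSet bits = oopMaps.get(codeOffset);
        if (bits == null) {
            throw new IllegalStateException("no OopMap here: not a safepoint");
        }
        List<Long> roots = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            roots.add(frameSlots[i]); // slot i is known to hold a reference
        }
        return roots;
    }
}
```

The GC never has to guess whether a slot such as `7L` is an integer or a pointer; it only reads the slots the map marks. The exception path also hints at why such maps exist only at selected positions.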

2. Safepoints

With the help of OopMap, HotSpot can complete GC Roots enumeration quickly and accurately. But a very practical problem arises: many instructions can change reference relationships, that is, change the contents of the OopMap. If an OopMap were generated for every instruction, it would take a lot of extra space, and the space cost of GC would become very high.

In fact, HotSpot does not generate an OopMap for every instruction. As mentioned above, it records this information only at "specific positions", called safepoints (Safepoint). In other words, a running program cannot stop for GC just anywhere; it can pause only when it reaches a safepoint. Safepoints must be neither so sparse that the GC waits too long nor so frequent that they excessively increase the run-time load. They are therefore chosen by the criterion "does this point let the program run for a long time?" Each instruction executes very quickly, so a program is unlikely to run for a long time merely because its instruction stream is long; the most obvious sign of "long-running" execution is the reuse of instruction sequences, such as method calls, loop jumps and exception jumps. Instructions with these properties are where safepoints are placed.

Another issue to consider for safepoints is how to make all threads (not counting threads executing JNI calls) run to the nearest safepoint and stop there when GC occurs. There are two options:

1. Preemptive suspension (Preemptive Suspension): when GC occurs, interrupt all threads first; if a thread turns out to have stopped somewhere other than a safepoint, resume it and let it run to a safepoint. (Almost no virtual machine uses this approach.)

2. Voluntary suspension (Voluntary Suspension): when the GC needs to interrupt threads, it does not act on the threads directly; it simply sets a flag. Each thread actively polls this flag while executing and, when it finds the flag set, suspends itself. The polling points coincide with the safepoints, plus the places where objects are created and memory has to be allocated.

Mainstream virtual machines today use voluntary suspension to bring threads to a stop for GC events.
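Voluntary suspension is essentially a cooperative stop protocol. The following is a simplified model built from plain Java threads, not HotSpot's real mechanism: `SafepointDemo`, `poll`, `runAtSafepoint` and `demo` are hypothetical names. The "GC" only sets a volatile flag; each worker checks the flag at its own poll point and parks itself, and the simulated GC runs only once every worker has reported in.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of voluntary suspension, not HotSpot's real mechanism:
// the "GC" only sets a flag; each worker polls it at its own safepoint and suspends itself.
class SafepointDemo {
    private static volatile boolean safepointRequested = false;
    private static volatile boolean stop = false;
    private static final Object lock = new Object();
    private static int parked = 0; // number of suspended workers, guarded by lock

    // Safepoint poll executed by worker threads
    static void poll() throws InterruptedException {
        if (!safepointRequested) return; // fast path: no GC pending
        synchronized (lock) {
            parked++;
            lock.notifyAll();                       // report "I have stopped"
            while (safepointRequested) lock.wait(); // stay suspended until GC finishes
            parked--;
        }
    }

    // GC side: set the flag, wait until every worker has suspended itself, run gc, resume all
    static void runAtSafepoint(int workers, Runnable gc) throws InterruptedException {
        synchronized (lock) {
            safepointRequested = true;
            while (parked < workers) lock.wait();
            gc.run();                   // every worker is parked: the world is stopped
            safepointRequested = false;
            lock.notifyAll();
        }
    }

    // Returns true if no worker made progress while the simulated GC was running
    static boolean demo(int workers) {
        AtomicLong progress = new AtomicLong();
        long[] seen = new long[2];
        Thread[] threads = new Thread[workers];
        stop = false;
        try {
            for (int i = 0; i < workers; i++) {
                threads[i] = new Thread(() -> {
                    try {
                        while (!stop) {
                            progress.incrementAndGet(); // "user code"
                            poll();                     // e.g. at a loop back-edge
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                threads[i].start();
            }
            runAtSafepoint(workers, () -> {
                seen[0] = progress.get();
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                seen[1] = progress.get();
            });
            stop = true;
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0] == seen[1];
    }
}
```

The design point is that the fast path of `poll()` is a single volatile read, which is why polls can be sprinkled at every safepoint without much overhead.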

3. Safe regions

The safepoint mechanism seems to solve the problem of how to enter GC perfectly, but in practice that is not always so. Safepoints guarantee that a running program will reach a safepoint that can enter GC within a short time. But what about a program that is not executing? "Not executing" means the thread is not being given any CPU time; a typical example is a thread in the Sleep or Blocked state. Such a thread cannot respond to the JVM's interrupt request and "walk" to a safe place to suspend itself, and the JVM obviously cannot wait until the thread is given CPU time again. This situation calls for safe regions (Safe Region).

A safe region is a code fragment in which reference relationships do not change. It is safe to start GC anywhere within the region. A safe region can be seen as an extension of a safepoint.

When a thread executes code in a safe region, it first marks itself as having entered the safe region. If the JVM initiates GC during this time, it does not need to care about threads that have marked themselves as being in a safe region. When the thread is about to leave the safe region, it checks whether the system has finished root enumeration (or the entire GC process). If so, the thread continues; otherwise it must wait until it receives a signal that it may safely leave the safe region.
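The enter/leave protocol above can be sketched as a small state object. This is a hypothetical model, not HotSpot's code: `SafeRegionDemo` and its methods are invented names, and the GC is reduced to a flag plus a count of threads that are either parked at a safepoint or inside a safe region.

```java
// Simplified, hypothetical model of a safe region (not HotSpot's real code).
class SafeRegionDemo {
    private final Object lock = new Object();
    private int threadsInSafeRegion = 0;  // guarded by lock
    private boolean gcInProgress = false; // guarded by lock

    // Called before the thread sleeps or blocks: its references will not change meanwhile
    void enterSafeRegion() {
        synchronized (lock) { threadsInSafeRegion++; }
    }

    // Called when the thread wakes up: it may only leave once GC (root enumeration) is done
    void leaveSafeRegion() {
        synchronized (lock) {
            boolean interrupted = false;
            while (gcInProgress) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    interrupted = true; // keep waiting; restore the flag afterwards
                }
            }
            threadsInSafeRegion--;
            if (interrupted) Thread.currentThread().interrupt();
        }
    }

    // GC may start once every thread is either parked at a safepoint or inside a safe region
    boolean tryStartGc(int totalThreads, int threadsAtSafepoint) {
        synchronized (lock) {
            if (threadsAtSafepoint + threadsInSafeRegion < totalThreads) return false;
            gcInProgress = true;
            return true;
        }
    }

    void finishGc() {
        synchronized (lock) {
            gcInProgress = false;
            lock.notifyAll(); // wake threads waiting to leave their safe region
        }
    }
}
```

A sleeping thread therefore never delays the GC, and a thread that wakes up while GC is still running blocks in `leaveSafeRegion()` until `finishGc()` is called.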


Now for the most important part: how memory is actually reclaimed. Interviewers ask about this topic again and again, so it deserves careful study!

Third, garbage collection algorithms

1. Mark-sweep algorithm

The most basic garbage collection algorithm. It has two phases, "mark" and "sweep": first mark all objects that need to be reclaimed, then, once marking is complete, reclaim all marked objects in one pass.

Advantage: simple principle.

Disadvantages: 1. Efficiency: neither marking nor sweeping is very efficient. 2. Space: sweeping leaves behind a large number of discontiguous memory fragments. With too much fragmentation, when the program later needs to allocate a large object it may not find enough contiguous memory and has to trigger another garbage collection early.
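Fragmentation is easiest to see on a toy heap. In this sketch (hypothetical class `MarkSweepDemo`, made-up object ids and sizes), the heap is an array of cells and allocation is first-fit. The sketch marks the live objects and sweeps everything unmarked, which is the usual way the two phases are implemented; after the sweep, half the heap is free but no 3-cell object fits.

```java
import java.util.*;

// Toy heap for illustrating mark-sweep: each cell is free (null) or holds the id of the
// object occupying it. Object ids, sizes and references are made-up example data.
class MarkSweepDemo {
    final String[] heap;

    MarkSweepDemo(int cells) { heap = new String[cells]; }

    // First-fit allocation of `size` contiguous cells; returns the start index or -1
    int allocate(String id, int size) {
        int run = 0;
        for (int i = 0; i < heap.length; i++) {
            run = (heap[i] == null) ? run + 1 : 0;
            if (run == size) {
                int start = i - size + 1;
                Arrays.fill(heap, start, i + 1, id);
                return start;
            }
        }
        return -1;
    }

    void collect(Map<String, List<String>> refs, List<String> roots) {
        // Mark: everything reachable from the roots
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (marked.add(obj)) work.addAll(refs.getOrDefault(obj, List.of()));
        }
        // Sweep: free the cells of unmarked objects in place, without moving anything
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !marked.contains(heap[i])) heap[i] = null;
        }
    }

    int freeCells() {
        int free = 0;
        for (String cell : heap) if (cell == null) free++;
        return free;
    }

    int largestFreeRun() {
        int best = 0, run = 0;
        for (String cell : heap) {
            run = (cell == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }
}
```

For example, fill an 8-cell heap with A, B, C, D (2 cells each), keep only A and C alive, and collect: 4 cells are free, yet the largest free run is 2, so `allocate("E", 3)` returns -1.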

2. Copying algorithm

Divides the available memory into two halves of equal size and uses only one half at a time. When that half is used up, the objects that are still alive are copied to the other half, and the used half is then cleared in one go.

Advantage: it solves the memory fragmentation problem.

Disadvantage: heap space utilization is very low (the memory is split in half and only one half is usable at a time).
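A minimal sketch of the copy step, assuming objects are plain ids and a to-space "address" is a list index (the class name `CopyingDemo` and the data are illustrative). It uses a Cheney-style breadth-first scan, a common way to implement semispace copying: the scan index walks over to-space and every newly reached object is appended behind it, so no separate mark stack is needed.

```java
import java.util.*;

// Toy semispace copying collector using a Cheney-style breadth-first scan.
// Objects are plain ids and a to-space "address" is a list index; all data is illustrative.
class CopyingDemo {
    // Returns the to-space contents: only live objects, packed contiguously from address 0
    static List<String> collect(Map<String, List<String>> refs, List<String> roots) {
        List<String> toSpace = new ArrayList<>();
        Set<String> copied = new HashSet<>();
        for (String root : roots) {
            if (copied.add(root)) toSpace.add(root); // copy objects referenced by roots first
        }
        // The scan index walks over to-space; each newly reached object is appended (copied) behind it
        for (int scan = 0; scan < toSpace.size(); scan++) {
            for (String child : refs.getOrDefault(toSpace.get(scan), List.of())) {
                if (copied.add(child)) toSpace.add(child);
            }
        }
        return toSpace; // the whole from-space can now be reused as the next to-space
    }
}
```

Only live objects are touched, and they end up packed from address 0, which is why copying leaves no fragments; the price is that the to-space half must always be kept in reserve.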

3. Mark-compact algorithm

Mark-compact is an improvement on mark-sweep. The marking phase is the same: mark all objects that need to be reclaimed. After marking, however, it does not clean up the reclaimable objects directly; instead it moves all surviving objects toward one end of the memory and then clears the memory beyond that end boundary. This process is called compaction.

Advantage: after compaction, memory no longer suffers from a large number of discontiguous fragments.

Comparison: when the survival rate of objects is high, the copying algorithm has to copy many objects and becomes inefficient; in that case mark-compact is much more efficient. Its own cost is that moving live objects, and updating every reference to them, takes time.
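Compaction can be sketched as "compute new addresses, fix references, then slide". The toy below (hypothetical `MarkCompactDemo`, illustrative data) follows the shape of the classic LISP2-style sliding algorithm, with the reference-updating pass only described in a comment. The `live` set stands in for the result of the mark phase.

```java
import java.util.*;

// Toy sliding compaction over a heap of cells (same idea as the LISP2-style algorithm).
// heap[i] holds an object id or null; `live` is the result of the mark phase. Data is illustrative.
class MarkCompactDemo {
    // Slides live cells toward index 0 in address order; returns the first free index.
    static int compact(String[] heap, Set<String> live, Map<Integer, Integer> forwarding) {
        // Pass 1: compute each live cell's new address
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live.contains(heap[i])) forwarding.put(i, free++);
        }
        // Pass 2 (omitted here): rewrite every reference to old address i as forwarding.get(i)
        // Pass 3: move cells; safe in place because the new address is never above the old one
        for (int i = 0; i < heap.length; i++) {
            Integer to = forwarding.get(i);
            if (to != null) heap[to] = heap[i];
        }
        Arrays.fill(heap, free, heap.length, null); // everything past the boundary is one free block
        return free;
    }
}
```

For the heap `{"A", null, "B", "C", null, "D"}` with live objects A, C and D, the result is `{"A", "C", "D", null, null, null}` and the free boundary is 3. Objects keep their relative order, which is the usual choice for sliding compaction.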

4. Generational collection algorithm

Memory is divided into several blocks according to how long objects live. The Java virtual machine generally divides the heap into a young generation and an old generation. A newly created object is normally allocated in the young generation, and objects that are still alive after several young-generation collections are moved to the old generation. A large object that cannot find enough contiguous memory in the young generation is also created directly in the old generation.

Generational garbage collection is based on the fact that different objects have different life cycles. Objects with different life cycles can therefore be collected with different algorithms to improve collection efficiency.

Young Generation (Young Generation)

1. All newly created objects are placed in the young generation first. The goal of the young generation is to collect short-lived objects as quickly as possible.

2. The young generation is divided in an 8:1:1 ratio into one Eden space and two Survivor spaces (survivor0 and survivor1). Most objects are created in Eden. During a collection, the surviving objects in Eden are copied to survivor0 and Eden is cleared. When survivor0 also fills up, the surviving objects in Eden and survivor0 are copied to survivor1, and Eden and survivor0 are cleared. Now survivor0 is empty; the roles of survivor0 and survivor1 are then swapped so that survivor1 is the empty one, and so on.

3. When survivor1 does not have enough room for the surviving objects from Eden and survivor0, those objects are placed directly in the old generation. If the old generation is also full, a Full GC is triggered, which collects both the young generation and the old generation.

4. GC in the young generation is also called Minor GC. Minor GC happens relatively often (it is not necessarily triggered only when Eden is full).
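The Eden/Survivor dance above can be simulated with three lists. In this toy model (hypothetical `YoungGenDemo`), the survivor capacity, the tenuring threshold (how many Minor GCs an object survives before promotion) and the object ids are all made-up parameters. Live objects from Eden and the occupied survivor space are copied into the empty one, objects that are old enough or do not fit are promoted, and the two survivor spaces swap roles.

```java
import java.util.*;

// Toy model of Minor GC with one Eden and two Survivor spaces. The survivor capacity,
// the tenuring threshold and the object ids are made-up parameters for illustration.
class YoungGenDemo {
    static final class Obj {
        final String id;
        int age = 0; // number of minor GCs survived
        Obj(String id) { this.id = id; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // survivor space currently holding objects
    List<Obj> to = new ArrayList<>();   // survivor space kept empty for the next copy
    final List<String> old = new ArrayList<>();
    private final int survivorCapacity;
    private final int tenuringThreshold;

    YoungGenDemo(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    void allocate(String id) { eden.add(new Obj(id)); } // new objects start in Eden

    // Copy live objects of Eden + `from` into `to`; promote old-enough objects or overflow
    void minorGc(Set<String> live) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!live.contains(o.id)) continue; // dead objects are simply not copied
            o.age++;
            if (o.age >= tenuringThreshold || to.size() >= survivorCapacity) {
                old.add(o.id); // promoted to the old generation
            } else {
                to.add(o);
            }
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from; // swap the roles of the two survivor spaces
        from = to;
        to = tmp;
    }

    List<String> survivorIds() {
        List<String> ids = new ArrayList<>();
        for (Obj o : from) ids.add(o.id);
        return ids;
    }
}
```

Dead objects cost nothing here: they are never copied, and Eden is simply emptied. That is the reason this copying scheme suits a region where most objects die young.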

The old generation (Old Generation)

1. Objects that are still alive in the young generation after surviving N garbage collections are moved to the old generation. The old generation can therefore be regarded as holding objects with long life cycles.

2. The old generation is also much larger than the young generation (the young-to-old ratio is roughly 1:2). When the old generation is full, a Major GC, that is, a Full GC, is triggered. Full GC happens relatively rarely, because objects in the old generation live a long time and have a high survival rate.

Permanent generation (Permanent Generation)

Stores static data such as Java classes and methods. The permanent generation has no significant effect on garbage collection, but some applications dynamically generate or call classes (Hibernate, for example); such applications need a relatively large permanent generation to hold the classes added at run time. (Since JDK 8, HotSpot has replaced the permanent generation with Metaspace.)


Fourth, garbage collectors

Young-generation collectors: Serial, ParNew, Parallel Scavenge

Old-generation collectors: Serial Old, Parallel Old, CMS

Serial collector (copying algorithm)

A single-threaded young-generation collector: all collection work runs in one thread while user threads are stopped. Its advantage is that it is simple and efficient.

ParNew collector (stop-the-world, copying algorithm)

A young-generation collector that can be considered the multi-threaded version of the Serial collector. On multi-core CPUs it performs better than Serial.

Parallel Scavenge collector (stop-the-world, copying algorithm)

A parallel collector that aims for high throughput and efficient CPU use. Throughput = user thread time / (user thread time + GC thread time); a typical target is around 99%, meaning about 1% of the time is spent in GC. It suits background computation that needs little interaction.
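The throughput formula is simple arithmetic. The sketch also shows the GC-time budget implied by HotSpot's `-XX:GCTimeRatio=N` option for the Parallel collector, under which GC should take at most 1 / (1 + N) of total time (the Parallel collector's default N is 99, i.e. about 1%). The class and method names are illustrative.

```java
// Throughput as defined above, plus the GC-time budget implied by -XX:GCTimeRatio=N
// (GC should take at most 1 / (1 + N) of the total time). Names are illustrative.
class Throughput {
    static double throughput(double userTimeMs, double gcTimeMs) {
        return userTimeMs / (userTimeMs + gcTimeMs);
    }

    static double maxGcShare(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println(throughput(99_000, 1_000)); // 99 s of user code, 1 s of GC
        System.out.println(maxGcShare(99));            // a ratio of 99 allows about 1% GC time
    }
}
```

So a program that ran for 100 seconds and spent 1 of them in GC has a throughput of 99%.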

Serial Old collector (mark-compact algorithm)

The old-generation version of the Serial collector; single-threaded.

Parallel Old collector (stop-the-world, mark-compact algorithm)

The old-generation version of the Parallel Scavenge collector; a parallel collector that gives priority to throughput.

CMS (Concurrent Mark Sweep) collector (mark-sweep algorithm)

Highly concurrent with short pauses: it aims for the shortest possible GC pause time and therefore fast response times. The cost is relatively high CPU usage, so it benefits from multi-core CPUs.

G1 (Garbage-First) collector (the most advanced of these collectors)

A region-based garbage collector that splits the heap into many Regions. Its biggest advantage is this partitioning: it avoids scanning the whole heap and only needs to scan the regions it chooses to collect.
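The "garbage first" idea can be illustrated with a greedy choice: among the Regions, collect those with the most reclaimable space first, as long as the estimated pause stays within a budget (the role played in real G1 by the pause-time goal `-XX:MaxGCPauseMillis`). Everything else here is a simplifying assumption: `G1Demo`, the per-region estimates and the greedy rule are invented for illustration, and real G1 heuristics are considerably more involved.

```java
import java.util.*;

// Sketch of the "garbage first" idea behind G1: among the heap's Regions, reclaim the ones
// with the most garbage first, as long as the estimated pause stays within a time budget.
// The Region fields, the cost model and the greedy selection are simplifying assumptions.
class G1Demo {
    static final class Region {
        final int id;
        final int garbageKb; // estimated reclaimable space
        final int costMs;    // estimated time to evacuate this region
        Region(int id, int garbageKb, int costMs) {
            this.id = id;
            this.garbageKb = garbageKb;
            this.costMs = costMs;
        }
    }

    static List<Integer> chooseCollectionSet(List<Region> regions, int pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Integer.compare(b.garbageKb, a.garbageKb)); // most garbage first
        List<Integer> chosen = new ArrayList<>();
        int spentMs = 0;
        for (Region r : sorted) {
            if (spentMs + r.costMs > pauseBudgetMs) continue; // would exceed the pause target
            spentMs += r.costMs;
            chosen.add(r.id);
        }
        return chosen; // only these regions are scanned and evacuated in this pause
    }
}
```

With regions (id, garbage KB, cost ms) = (1, 900, 10), (2, 100, 5), (3, 800, 30), (4, 500, 10) and a 25 ms budget, region 3 is skipped because it alone would blow the budget, and the collection set becomes regions 1, 4 and 2.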

Fifth, how GC is executed

Because objects are managed by generation, garbage collection covers different regions and happens at different times. There are two types of GC: Scavenge GC and Full GC.

Scavenge GC

Normally, when a new object is created and allocation in Eden fails, a Scavenge GC is triggered. It collects Eden, clears out dead objects, and moves the objects that are still alive into a Survivor space, after which the two Survivor spaces are sorted out. This kind of GC targets the young generation's Eden and does not affect the old generation. Because most objects start out in Eden and Eden is not very large, GC in Eden happens frequently. A fast, efficient algorithm is therefore needed here so that Eden is freed up as soon as possible.

Full GC

Full GC collects and compacts the entire heap, including Young, Tenured and Perm. Because it has to collect the whole heap, it is slower than Scavenge GC, so the number of Full GCs should be kept as low as possible. Much of the work in JVM tuning is about tuning Full GC. The following can cause a Full GC:

1. The old generation (Tenured) is full

2. The permanent generation (Perm) is full

3. System.gc() is called explicitly

4. The heap allocation policy of each region changes dynamically after the previous GC
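When tuning Full GC it helps to watch how often each collector actually runs. The standard `java.lang.management` API exposes this through `GarbageCollectorMXBean`; the wrapper class `GcStats` is just an illustrative name. Collector names, and how many beans exist, depend on which collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reads the JVM's own GC counters through the standard java.lang.management API.
// Collector names (and how many beans exist) depend on which collector the JVM runs.
class GcStats {
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if this collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    static void print() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }

    public static void main(String[] args) {
        print();
        System.gc(); // explicit call (cause 3 above); only a request, the JVM may ignore it
        print();
    }
}
```

Comparing the old-generation collector's count before and after a workload is a quick way to tell whether Full GCs are happening at all.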


Sixth, memory leaks can still occur in Java despite GC

1. Static collections are the most likely source of memory leaks. Collections such as HashMap and Vector held in static variables live as long as the application, so none of the objects they contain can be released: they stay referenced by the Vector or other container.

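class LeakDemo {
    // Static field: reachable for as long as the class is loaded
    static java.util.Vector<Object> v = new java.util.Vector<>();
    static void fill() {
        for (int i = 1; i < 100; i++) {
            Object o = new Object();
            v.add(o);
            o = null; // v still refers to the object, so GC cannot reclaim it
        }
    }
}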

In this code there are two kinds of references: v, to the Vector object, and o, to each Object. In the for loop we keep creating new objects, adding each one to the Vector, and then setting o to null. The question is: once o is null, can the Objects we created be reclaimed if GC occurs? The answer is no. When the GC traces references from the GC Roots, it finds v; following it further, it finds that the memory v points to still holds references to the Object instances. So even though o has been set to null, the objects remain reachable through other references and GC cannot release them. If these objects are of no further use to the program after the loop, we say this Java program has a memory leak.

2. Connections of all kinds, such as database connections, network connections and IO connections, that are never closed explicitly with close() cannot be reclaimed by GC and leak memory.

3. Registering listeners and not removing them when the owning object is released can also cause memory leaks.
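Both fixes come down to breaking the reference explicitly. In this sketch, `EventSource`, `TrackedResource` and `LeakFixDemo` are hypothetical stand-ins, not a real library: the event source holds strong references to its listeners, so a listener stays reachable until it is removed, and the resource counts how many instances are still open. Try-with-resources guarantees `close()` is called even if the body throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event source: it keeps strong references to its listeners, so a listener
// (and everything the listener references) stays reachable until it is removed.
class EventSource {
    private final List<Runnable> listeners = new ArrayList<>();

    void addListener(Runnable listener) { listeners.add(listener); }
    void removeListener(Runnable listener) { listeners.remove(listener); }
    int listenerCount() { return listeners.size(); }
}

// Hypothetical stand-in for a connection; counts how many instances are still open.
class TrackedResource implements AutoCloseable {
    static int open = 0;
    TrackedResource() { open++; }
    @Override
    public void close() { open--; } // release the underlying resource
}

class LeakFixDemo {
    public static void main(String[] args) {
        EventSource source = new EventSource();
        Runnable listener = () -> System.out.println("event");
        source.addListener(listener);
        // When the owning object is done, unregister so the listener can be collected
        source.removeListener(listener);

        // try-with-resources calls close() even if the body throws
        try (TrackedResource conn = new TrackedResource()) {
            System.out.println("open resources: " + TrackedResource.open);
        }
        System.out.println("open resources: " + TrackedResource.open);
    }
}
```

The same rule applies to the static Vector example: once the objects are no longer needed, remove them from the collection (or call `clear()`) so the GC can reclaim them.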





Origin juejin.im/post/5d75162ee51d453c12504e78