Learning the underlying principles of JVM (4) garbage collection

Previous post: Learning the underlying principles of JVM (3) JVM memory area
Unlike C++ programmers, Java programmers almost never write memory-reclamation code, so they can focus on business logic. Does that mean a running Java program produces no garbage? It does not. Garbage is produced all the time, and it is the JVM's garbage collection mechanism that cleans it up.

Garbage collection

Object reference type

  • Strong references (never reclaimed, even if the JVM runs out of memory)
    A strongly referenced object is not reclaimed even when the alternative is an OutOfMemoryError (OOM).
    Strong references are the ordinary object references we use every day. As long as a strong reference points to an object, the object is considered alive and the garbage collector will not reclaim it.
  • Weak references (reclaimed at the next GC)
    Weak references are implemented with the java.lang.ref.WeakReference class. An object that is only weakly referenced is reclaimed at the next garbage collection, regardless of whether the JVM has enough free memory.
  • Soft references (reclaimed when memory runs short)
    A soft reference is weaker than a strong reference but stronger than a weak one. It is implemented with the java.lang.ref.SoftReference class.
    • When system memory is sufficient, the object is not reclaimed
    • When system memory is insufficient, the object is reclaimed. Caches are a typical use: entries are kept while memory is plentiful and reclaimed when it runs short
  • Phantom references (a way to track when objects are reclaimed)
    Phantom references are implemented with the java.lang.ref.PhantomReference class. They are mainly used to track the state of an object as it is garbage collected.
    The sole purpose of associating a phantom reference with an object is to receive a notification when the object is reclaimed, so that further processing can be done at that point.
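
The four reference kinds can be tried out with a small sketch like the one below. The class and method names (ReferenceKindsDemo, whileStronglyHeld) are made up for illustration. The first method only checks behaviour that the java.lang.ref API defines. The main method additionally requests a GC, and whether the weak referent has been cleared by then is up to the JVM, so no particular output is promised for that part.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {

    /** Reports what each reference kind returns while the referent is still strongly held. */
    static boolean[] whileStronglyHeld() {
        Object strong = new Object();                              // ordinary strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();     // where the JVM enqueues the phantom reference
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        return new boolean[] {
            soft.get() == strong,     // soft reference still sees the referent
            weak.get() == strong,     // weak reference still sees the referent
            phantom.get() == null     // a phantom reference never hands out its referent
        };
    }

    public static void main(String[] args) {
        boolean[] held = whileStronglyHeld();
        System.out.println("soft sees referent: " + held[0]);
        System.out.println("weak sees referent: " + held[1]);
        System.out.println("phantom get() is null: " + held[2]);

        // No strong reference is kept here, so the object is only weakly reachable.
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();  // only a request; the referent is usually cleared afterwards, but this is not guaranteed
        System.out.println("weak referent after GC request: " + weak.get());
    }
}
```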

How to judge whether it is garbage

  • Reference counting method
    Each object keeps a count of the references that point to it. As long as something holds a reference to the object, it is not garbage. An object that nothing references is garbage.
    Principle: put plainly, it is counting.
    Disadvantages: if object A and object B hold references to each other but nothing else refers to them, each count stays at 1 forever and neither object is ever reclaimed

  • Reachability analysis
    Starting from the GC Root objects, the collector follows references downward to see which objects are reachable. Unreachable objects are garbage. This avoids the mutual-reference problem of reference counting. It is also the marking approach that current garbage collectors use by default.
    Which objects in Java can serve as GC Roots
    • Objects referenced from the virtual machine stack (the local variable table in a stack frame)
    • Objects referenced by static class fields
    • Objects referenced by constants in the method area
    • Live Thread objects
    • The Bootstrap ClassLoader
    • Class objects loaded by the Bootstrap ClassLoader or the Ext ClassLoader
    • Objects currently locked by synchronized
    • Objects referenced from the native method stack
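
The difference between the two approaches can be shown on a toy object graph. This is a minimal sketch, not how the JVM represents objects: Node, pointTo and reachableFrom are hypothetical names, and a single "root" node stands in for a GC Root. A and B reference only each other, so their counts stay at 1 while a traversal from the root never reaches them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilityDemo {

    /** A toy heap object: a name, its outgoing references, and a reference count. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        int refCount = 0;

        Node(String name) { this.name = name; }

        /** Adds a reference to target and bumps its count, as reference counting would. */
        void pointTo(Node target) {
            refs.add(target);
            target.refCount++;
        }
    }

    /** Reachability analysis: walk the graph from the roots and return every node visited. */
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node root = new Node("root");     // stands in for a GC Root, e.g. a local variable
        Node live = new Node("live");
        Node a = new Node("A");
        Node b = new Node("B");
        root.pointTo(live);
        a.pointTo(b);                     // A and B reference only each other
        b.pointTo(a);

        Set<Node> marked = reachableFrom(List.of(root));
        for (Node n : List.of(live, a, b)) {
            System.out.println(n.name + ": refCount=" + n.refCount
                    + ", reachable=" + marked.contains(n));
        }
    }
}
```

Running it shows refCount=1 for all three objects, but only "live" is reachable. Reference counting would keep A and B forever, while reachability analysis identifies them as garbage.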

Classification of GC

Given how the shared heap is divided into generations, GC can be classified by the area it collects

  • Minor GC: young-generation GC, the garbage collection process in the Young area
  • Major GC: old-generation GC, the garbage collection process in the Old area
  • Full GC: young generation + old generation, where the Young area and the Old area are collected together.
    In practice, with most collector combinations a Major GC rarely runs on its own. An old-generation collection is usually accompanied by a Minor GC, which makes it a Full GC.

When to perform GC

GC is performed automatically by the JVM and depends on the state of the JVM, so the exact moment it is triggered is not deterministic. A program can call System.gc() to tell the JVM that a garbage collection is wanted, but it cannot specify or control when the collection actually runs. In other words, System.gc() is only a request, and the JVM decides when the collection is triggered.

When to trigger GC

  • The Eden area or a Survivor (S) area runs out of space
  • The old generation runs out of space
  • The method area runs out of space
  • System.gc()
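
One way to observe that System.gc() is only a request is to read the collection counters that the JVM exposes through java.lang.management before and after the call. The sketch below uses the standard GarbageCollectorMXBean API. The class name GcRequestDemo and the helper totalCollections are made up for illustration, and whether the counters change depends on the JVM and its flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcRequestDemo {

    /** Sums the collection counts reported by all collectors of the running JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if the collector does not report a count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // The bean names show which collectors this JVM is running with.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName());
        }

        long before = totalCollections();
        System.gc();                                // a request; the JVM decides whether and when to collect
        long after = totalCollections();
        System.out.println("collections before=" + before + ", after=" + after);
    }
}
```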

How to perform GC (algorithm)

Garbage collection algorithm


  • Mark-sweep
    Live objects are marked, then every unmarked object is reclaimed in place.

Mark-sweep leaves a large number of discontinuous memory fragments behind. With too much fragmentation, the program may later need to allocate a large object, fail to find a contiguous block big enough, and trigger another GC. This is a vicious circle.

  • Mark-compact
    Live objects are marked, then moved toward one end of the memory so that the free space forms one contiguous block.

Problem analysis: after compaction there is no space fragmentation and no wasted space, but the cost of moving the objects should not be underestimated

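  • Mark-copy
    Memory is split into two areas and only one is used at a time. Live objects are copied to the empty area, then the used area is cleared as a whole.

One memory area is always left unused, which wastes space. The algorithm suits areas where large numbers of objects are created and die soon afterwards.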
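
The fragmentation trade-off between mark-sweep and mark-compact can be illustrated with a toy heap. In this sketch the heap is just an int array in which each slot holds an object id and 0 means free. The names SweepVsCompactDemo, sweep, compact and largestFreeRun are hypothetical, and the set of live ids is assumed to come from a marking phase that has already run. Mark-copy is left out to keep the example short.

```java
import java.util.Arrays;
import java.util.Set;

class SweepVsCompactDemo {
    static final int FREE = 0;

    /** Mark-sweep: free every slot whose object is not live; survivors stay where they are. */
    static int[] sweep(int[] heap, Set<Integer> live) {
        int[] out = heap.clone();
        for (int i = 0; i < out.length; i++) {
            if (!live.contains(out[i])) {
                out[i] = FREE;
            }
        }
        return out;
    }

    /** Mark-compact: slide survivors to the front so the free space becomes one block. */
    static int[] compact(int[] heap, Set<Integer> live) {
        int[] out = new int[heap.length];   // every slot starts as FREE
        int next = 0;
        for (int id : heap) {
            if (live.contains(id)) {
                out[next++] = id;           // this copy is the extra cost of compaction
            }
        }
        return out;
    }

    /** Length of the longest run of free slots, i.e. the largest object that still fits. */
    static int largestFreeRun(int[] heap) {
        int best = 0;
        int run = 0;
        for (int slot : heap) {
            run = (slot == FREE) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6, 7, 8};
        Set<Integer> live = Set.of(1, 3, 5, 7);   // assumed result of the marking phase

        int[] swept = sweep(heap, live);
        int[] compacted = compact(heap, live);
        System.out.println("after sweep:   " + Arrays.toString(swept)
                + " largest free run=" + largestFreeRun(swept));
        System.out.println("after compact: " + Arrays.toString(compacted)
                + " largest free run=" + largestFreeRun(compacted));
    }
}
```

Both results have four free slots, but after the sweep the largest contiguous run is 1 slot, so a 2-slot object cannot be allocated. After compaction the run is 4 slots.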

Garbage collectors

A garbage collector is a concrete implementation of the garbage collection algorithms above

Garbage collection principle

Across JVM releases, most garbage collectors have been based on the principle of generational collection:
different regions (old generation, young generation) are collected by different garbage collectors that use different garbage collection algorithms

Young area: mostly the mark-copy algorithm (most objects have a short life cycle after being allocated, so copying is relatively efficient in the Young area)
Old area: mark-sweep or mark-compact (objects in the Old area survive for a long time)

  • Serial & Serial Old
    The Serial & Serial Old collectors are the most basic and the oldest garbage collectors. They were the only choice before JDK 1.3.
    Serial is the young-generation collector and uses the mark-copy algorithm. Serial Old is the old-generation collector and uses the mark-compact algorithm.
    The combination is single-threaded: it uses only one collection thread to do the garbage collection work. More importantly, it must pause all other threads while it collects (Stop The World, STW).

Summary:
Advantages: simple and efficient, with high collection efficiency for a single thread
Disadvantages: the collection requires Stop The World, and a single collection thread cannot use multi-core CPUs (the JVM was originally designed with embedded development in mind, and nobody expected web development to become as popular as it is today, so the designers did not see STW as a big problem)
Algorithm: mark-copy (Serial), mark-compact (Serial Old)

  • ParNew
    The ParNew collector is a multi-threaded young-generation garbage collector.
    It is essentially a multi-threaded version of the Serial collector. In other respects it behaves almost exactly like Serial, with no other innovations.
    ParNew is the preferred young-generation collector for JVMs running in Server mode, because it is the one that pairs with CMS.
    JDK 1.5 introduced CMS (Concurrent Mark Sweep), a landmark collector for the Old area. Apart from the single-threaded Serial collector, ParNew is the only young-generation collector that can work with CMS. If you choose CMS as the old-generation collector, you can only pair it with Serial or ParNew, so ParNew is the preferred choice for multi-CPU Server scenarios.
    ParNew settings:
    -XX:+UseParNewGC forces the use of the ParNew collector
    -XX:+UseConcMarkSweepGC makes the Old area use the CMS collector, with ParNew as the default young-generation collector
    -XX:ParallelGCThreads sets the number of parallel GC threads used by the ParNew collector
  • Parallel Scavenge & Parallel Old
    The Parallel Scavenge collector is a young-generation collector. It also uses the copying algorithm and is also a parallel multi-threaded collector, with the same working mechanism as ParNew. The difference is that Parallel Scavenge pays more attention to the throughput of the system.
    System throughput = user working time / (user working time + GC time)

Example: user working time 99 s, GC time 1 s
throughput = 99 / (99 + 1) = 99%

GC pause time:
-XX:MaxGCPauseMillis

Sets the maximum time the GC threads may work, that is, the time user threads are suspended while GC runs. Forcing this value very low leads to more frequent GC operations

Parallel Old
Parallel Old is the old-generation counterpart of Parallel Scavenge. It uses the mark-compact algorithm
and is a parallel collector for the Old area, officially released in JDK 1.6.
Before Parallel Old came out, Parallel Scavenge was in an awkward position, because
the combination of Parallel Scavenge + Serial Old was underwhelming: the young generation used a parallel collector, but the Old area could not keep up.

Parallel Scavenge + Parallel Old schematic diagram

CMS (Concurrent Mark Sweep)

  • The CMS garbage collector is a garbage collector for the Old area
  • It was the first collector to implement concurrent collection
  • Its design goal is the shortest possible pause time. CMS uses the mark-sweep algorithm

    The four collection steps of CMS:
  1. Initial mark (requires STW)

    Marks only the objects that GC Roots point to directly.
    The STW pause is very short, because only objects directly associated with GC Roots are marked and there is no need to trace any further.

  2. Concurrent marking (GC marking threads work alongside user threads)

    Scans the whole Old area. An object that can be reached from a GC Root does not need to be cleaned up. An object that
    cannot be reached from any GC Root needs to be cleaned up. This is the GC Roots tracing and marking process.
    However, while concurrent marking runs, the user program may produce new garbage or change the references it holds

  3. Remark (requires STW)

    Corrects the marks for the new garbage and the changes produced during concurrent marking.
    This takes a short time, because the concurrent marking phase has already marked all objects from the GC Roots and only a small number of objects need correcting.

  4. Concurrent cleanup (GC cleanup threads work alongside user threads)

    The cleanup runs alongside the user threads.
    It uses the mark-sweep algorithm, which only has to reclaim the garbage objects and does not need to compact memory

Advantages:
concurrent collection makes full use of CPU resources, and pauses are short
Disadvantages:
1. The mark-sweep algorithm causes space fragmentation, which can lead to promotion failure or concurrent mode failure, after which CMS falls back to Serial Old
2. In the concurrent marking and concurrent cleanup phases the GC threads share the CPU with user threads. Because GC does not get the CPU to itself, the total GC time is longer and throughput suffers.
3. Cleanup is incomplete and leaves floating garbage, which can only be reclaimed in the next garbage collection.

  • G1 (Garbage First)
    The G1 collector is a server-oriented garbage collector, suited to server systems with multi-core processors and large memory (4G, 6G and above). It pursues short GC pauses while still achieving high throughput, and it addresses the promotion failure and concurrent mode failure problems of the CMS collector.
    G1 was introduced in JDK 7 and became the default garbage collector in JDK 9
    G1 memory layout
    With G1, the heap is no longer laid out as contiguous Old, Eden, S1 and S2 areas. Instead, memory is divided into Regions.
    The whole heap is divided into equally sized Regions, 2048 of them by default.
    The Region size is a power of 2 between 1M and 32M (1, 2, 4, 8, 16, 32).
    It is set with the parameter -XX:G1HeapRegionSize
    Each Region can hold one of the following four kinds of space, and the role of a Region can change according to the actual situation
    • Eden
    • Survivor (S1, S2)
    • Old
    • Humongous
      When the size of a newly instantiated object is >= 0.5 Region, the object gets a new Region of its own, and that Region is called an H area. If the object is larger than 1 Region, several contiguous Regions are allocated to hold it, and those contiguous Regions are marked as an H area.

    G1 garbage collection process
    • Young GC
      Uses a Region-based copying algorithm. Eden and S Regions are collected in parallel by multiple threads (this process requires Stop The World)

    • Mixed GC
      Before explaining the Mixed GC process of G1, two concepts need introducing.
      Remembered Set: extra space set aside in each Region that records which other Regions hold references into this Region.
      Root Region: a Region in which a GC Root object is located.
      • Initial Mark (Stop The World)
        Piggybacks on the Stop The World pause triggered by a Young GC to scan for GC Roots and find all Root Regions. This phase does not do
        GC Root tracing, and because it shares the Young GC pause it is very efficient
      • Root Region Scanning (Concurrent Phase)
        Starting from the Root Regions, uses the Remembered Set of each Region to find the set of Regions that need to be scanned.
      • Concurrent Marking (Concurrent Phase)
        In the Regions found by the Root Region Scanning phase, distinguishes live objects from garbage objects and marks them.
      • Remark (Stop The World)
        In the Concurrent Marking phase the user threads and the marking threads run concurrently, so the marks need to be corrected in this phase
      • Cleanup (Stop The World)
        Cleanup uses the copying algorithm, so it needs a Stop The World pause.
        G1 collects garbage in priority order: Regions that take little time to collect and have a high proportion of garbage are collected first
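
The Humongous rule and the "garbage first" priority described above can be sketched in a few lines. This is a deliberately simplified model, not G1's real implementation: the Region class, its garbageBytes and estimatedMillis fields, and the greedy selection in chooseCollectionSet are all assumptions made for illustration, and the actual collector relies on more elaborate predictions. The humongous check follows the ">= 0.5 Region" rule as the text states it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class G1SketchDemo {

    /** Hypothetical per-Region summary: reclaimable bytes and an estimated collection cost. */
    static final class Region {
        final int id;
        final long garbageBytes;
        final double estimatedMillis;

        Region(int id, long garbageBytes, double estimatedMillis) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.estimatedMillis = estimatedMillis;
        }

        /** Garbage reclaimed per millisecond of pause; higher means collect sooner. */
        double efficiency() {
            return garbageBytes / estimatedMillis;
        }
    }

    /** Humongous rule as stated in the text: the object is at least half a Region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes * 2 >= regionBytes;
    }

    /** Number of contiguous Regions needed to hold an object (ceiling division). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    /** Greedy sketch: take the most efficient Regions first while the pause budget allows. */
    static List<Region> chooseCollectionSet(List<Region> candidates, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(candidates);
        Comparator<Region> byEfficiency = Comparator.comparingDouble(Region::efficiency);
        sorted.sort(byEfficiency.reversed());

        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.estimatedMillis <= pauseBudgetMillis) {
                chosen.add(r);
                spent += r.estimatedMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long regionBytes = 2 * mb;   // e.g. -XX:G1HeapRegionSize=2m
        System.out.println("1 MB object humongous: " + isHumongous(mb, regionBytes));
        System.out.println("5 MB object needs regions: " + regionsNeeded(5 * mb, regionBytes));

        List<Region> candidates = List.of(
                new Region(1, 1_800_000, 40),
                new Region(2, 500_000, 30),
                new Region(3, 1_500_000, 20),
                new Region(4, 200_000, 50));
        for (Region r : chooseCollectionSet(candidates, 70)) {
            System.out.println("collect region " + r.id);
        }
    }
}
```

With a 70 ms budget the sketch picks Region 3 and then Region 1, because they free the most garbage per millisecond, and skips Regions 2 and 4, which would exceed the budget.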

Why use G1GC

  • With G1 there is no need to tune the ratios of Old, Young, Eden, S1 and S2. The program uses Regions adaptively

  • Pursue controllable response time
    -XX:MaxGCPauseMillis sets the pause time target (200 ms by default) that G1 tries to finish each collection within.
    G1 collects garbage in priority order: Regions that take little time to collect and have a high proportion of garbage are collected first
    -XX:+UseG1GC

  • ZGC (experimental since JDK 11, production-ready since JDK 15; Meituan reportedly uses it)

    • Pause time within 10ms
    • Support TB level memory
    • After the heap memory becomes larger, the pause time is still within 10ms

Classification of garbage collectors


  • Serial collectors: Serial and Serial Old

Only one garbage collection thread runs, and user threads are suspended while it does.

  • Parallel collectors [throughput first]: Parallel Scavenge, Parallel Old

Multiple garbage collection threads work in parallel, and user threads pause and wait.

  • Concurrent collectors [pause time first]: CMS, G1

User threads and garbage collection threads run concurrently. For most of the collection the garbage collection threads do not suspend the user threads; only short phases such as initial mark and remark require Stop The World.

Next: Tuning tools for learning the underlying principles of JVM (5)

Origin blog.csdn.net/nonage_bread/article/details/108170036