Garbage collection mechanism and garbage collector in JVM

1. What is garbage collection?

One advantage of Java over C and C++ is that it comes with a garbage collector. Garbage collection means periodically reclaiming unreachable objects in heap memory. An object is not necessarily reclaimed the moment it becomes unreachable. The garbage collector runs automatically and cannot be forced; the most a programmer can do is suggest a collection by calling System.gc(), but whether and when it actually runs is not guaranteed. This unpredictability is the main drawback of garbage collection, though it is outweighed by the convenience it brings to programmers.

2. Why is garbage collection needed?

Without garbage collection, memory would sooner or later run out, because a program keeps allocating space and never gives it back. Allocating without ever reclaiming would only work if memory were infinite, which it is not. Garbage collection is therefore necessary.

3. Four reference types in Java

  • Strong references
Object obj = new Object(); // as long as obj still points to the Object, it will not be collected
obj = null;  // once obj is set to null, the object becomes eligible for collection (for example, after a System.gc() request)

The references we normally declare are strong references. While an object is reachable through a strong reference, the garbage collector treats it as live and will not reclaim it.

  • Soft references
    Soft references are generally used for caches. Unlike a strongly referenced object, a softly referenced object may be reclaimed, and the virtual machine decides whether to do so based on how much memory remains. If memory is tight, the virtual machine reclaims the space held through soft references; if memory is plentiful, it leaves them alone. In other words, by the time the virtual machine throws an OutOfMemoryError, all softly referenced objects have already been cleared.

  • Weak references
    Weak references also describe non-essential objects, but they are weaker than soft references. An object that is only weakly referenced survives only until the next GC: when the garbage collector runs, such objects are reclaimed regardless of whether memory is sufficient. If the weak reference was registered with a reference queue (ReferenceQueue), it is placed on that queue when its referent is reclaimed.

  • Phantom references
    A phantom reference (also called a ghost or virtual reference) is the weakest kind of reference. Whether an object has a phantom reference has no effect on its lifetime, and the object cannot be obtained through a phantom reference. The object may be reclaimed at any time. Phantom references are generally used as a sentinel to track when an object is reclaimed by the garbage collector. They must be used together with a ReferenceQueue.

These concepts may seem abstract, but the key point is that the four reference types are reclaimed at different times. The strength of the reference decreases in the order strong, soft, weak, phantom, and the weaker the reference, the more likely the referenced object is to be garbage collected.
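The differences between the reference types can be seen in a minimal sketch using the classes in java.lang.ref. The class and method names here (ReferenceKindsDemo, observeWhileStronglyHeld) are made up for illustration. The first method only checks behaviour that is guaranteed while a strong reference still exists; the second half of main depends on the collector, so its output may differ between JVMs and runs.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceKindsDemo {

    /** What each reference type reports while the referent is still strongly reachable. */
    static String observeWhileStronglyHeld() {
        Object payload = new Object();                         // strong reference
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(payload);
        WeakReference<Object> weak = new WeakReference<>(payload, queue);
        PhantomReference<Object> phantom = new PhantomReference<>(payload, queue);

        boolean queueEmpty = queue.poll() == null;             // referent is alive, nothing is queued
        boolean phantomGetIsNull = phantom.get() == null;      // PhantomReference.get() always returns null
        // payload is still used below, so it stays strongly reachable during all checks.
        boolean softSeesIt = soft.get() == payload;
        boolean weakSeesIt = weak.get() == payload;
        return "soft=" + softSeesIt + " weak=" + weakSeesIt
                + " phantomGetIsNull=" + phantomGetIsNull + " queueEmpty=" + queueEmpty;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observeWhileStronglyHeld());

        // GC-dependent part: no strong reference to the referent is kept.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> weak = new WeakReference<>(new Object(), queue);
        System.gc();                                           // a request, not a command
        Thread.sleep(100);
        System.out.println("weak referent after GC request: " + weak.get());
        System.out.println("reference found on queue: " + queue.poll());
    }
}
```

On a typical HotSpot JVM the weak referent is usually cleared after the GC request, but that is not guaranteed.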

4. Garbage identification mechanism

4.1 Reference counting algorithm

The reference counting algorithm is one way to determine whether an object is alive. It adds a reference counter to each object: whenever a new reference to the object is created, the counter is incremented by 1; whenever a reference expires, the counter is decremented by 1. An object whose counter is 0 can no longer be used and will be reclaimed by the garbage collector.

Shortcomings:

  • It requires a separate field to store the counter, which increases storage cost.
  • It cannot handle objects that reference each other in a cycle. When two objects reference each other, each counter stays at 1 even after nothing else refers to them. They should be reclaimed but never are, which causes a memory leak.
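public class GcTest {

    // Minimal holder class so the example compiles; each instance can reference another object.
    static class MyObject {
        Object instance;
    }

    public static void main(String[] args) {
        MyObject myObject_1 = new MyObject();
        MyObject myObject_2 = new MyObject();
        // The two objects reference each other, forming a cycle.
        myObject_1.instance = myObject_2;
        myObject_2.instance = myObject_1;
        // Drop the only external references; the cycle is now unreachable.
        myObject_1 = null;
        myObject_2 = null;
        System.gc(); // only a request; the JVM decides whether and when to collect
    }
    // The objects reference each other in a cycle; with the reference counting
    // algorithm, these two objects could never be reclaimed.
}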
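To make the cycle problem concrete, here is a toy reference counter written in plain Java. It is only a simulation under simple assumptions (each object holds at most one outgoing reference, and the names Counted, retain, release and link are invented for this sketch); it is not how the JVM manages memory.

```java
import java.util.ArrayList;
import java.util.List;

public class RefCountDemo {

    /** Toy object with an explicit reference counter and one outgoing reference. */
    static class Counted {
        final String name;
        int refCount;
        Counted field;
        Counted(String name) { this.name = name; }
    }

    /** Names of objects whose counter reached zero, in the order they were reclaimed. */
    private final List<String> reclaimed = new ArrayList<>();

    /** A new reference to c is created: counter + 1. */
    Counted retain(Counted c) { c.refCount++; return c; }

    /** A reference to c expires: counter - 1; at zero the object is reclaimed. */
    void release(Counted c) {
        if (--c.refCount == 0) {
            reclaimed.add(c.name);
            Counted child = c.field;
            c.field = null;
            if (child != null) release(child);   // the reference held by c expires too
        }
    }

    void link(Counted from, Counted to) { from.field = retain(to); }

    /** a -> b without a cycle: dropping the external references reclaims both. */
    static List<String> chain() {
        RefCountDemo rc = new RefCountDemo();
        Counted a = rc.retain(new Counted("a"));
        Counted b = rc.retain(new Counted("b"));
        rc.link(a, b);
        rc.release(a);
        rc.release(b);
        return rc.reclaimed;
    }

    /** a <-> b: after dropping the external references both counters stay at 1. */
    static List<String> cycle() {
        RefCountDemo rc = new RefCountDemo();
        Counted a = rc.retain(new Counted("a"));
        Counted b = rc.retain(new Counted("b"));
        rc.link(a, b);
        rc.link(b, a);
        rc.release(a);
        rc.release(b);
        return rc.reclaimed;
    }

    public static void main(String[] args) {
        System.out.println("chain reclaimed: " + chain());   // chain reclaimed: [a, b]
        System.out.println("cycle reclaimed: " + cycle());   // cycle reclaimed: []
    }
}
```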

4.2 Reachability analysis algorithm

Reachability analysis is currently the mainstream way to determine whether an object is alive. The basic idea: take a set of objects called "GC Roots" as starting points and search downward from them; the paths traversed by the search are called reference chains. When no reference chain connects an object to GC Roots, the object is proven to be unusable.

  • Which objects can serve as GC Roots?

Objects referenced from the virtual machine stack (the local variable table in a stack frame);
objects referenced by static fields of classes in the method area;
objects referenced by constants in the method area;
objects referenced by JNI (native methods) in the native method stack;
objects referenced by active threads.
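The search from GC Roots can be sketched as an ordinary graph traversal. In this illustration the heap is modelled as a map from an object name to the names it references; the map, the object names and the single root "A" are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReachabilityDemo {

    /** Returns every object reachable from the roots by following reference edges. */
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) {                       // first visit: follow its references
                pending.addAll(refs.getOrDefault(obj, List.of()));
            }
        }
        return marked;
    }

    /** A -> B -> C hangs off a root; X and Y only reference each other. */
    static Map<String, List<String>> sampleHeap() {
        return Map.of(
                "A", List.of("B"),
                "B", List.of("C"),
                "C", List.of(),
                "X", List.of("Y"),
                "Y", List.of("X"));
    }

    public static void main(String[] args) {
        Set<String> live = reachable(sampleHeap(), List.of("A"));
        Set<String> garbage = new TreeSet<>(sampleHeap().keySet());
        garbage.removeAll(live);
        System.out.println("live: " + new TreeSet<>(live));   // live: [A, B, C]
        System.out.println("garbage: " + garbage);            // garbage: [X, Y]
    }
}
```

Unlike reference counting, the cycle between X and Y is not a problem here: neither is connected to a root, so both are identified as garbage.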

5. finalize() gives an object a chance to be reborn

finalize() is a protected method of Object. Subclasses can override it to clean up resources, and the GC calls it before reclaiming the object. (finalize() has been deprecated since JDK 9.)

An object marked unreachable by reachability analysis is not necessarily reclaimed straight away; it has one chance to be reborn. An object is marked twice before it is truly reclaimed. It is marked the first time when no reference chain connects it to GC Roots. It is then checked to see whether finalize() needs to be executed: if the object does not override finalize(), or finalize() has already been called on it, the object is condemned. Otherwise finalize() is run, and if the object re-establishes a link to a reference chain inside that method, it escapes collection at the second marking.

Code

public class FinalizeTest {
    public static FinalizeTest ft;
    /**
     * Reports whether the object is dead or alive.
     */
    public static void judge(){
        if(ft == null){
            System.out.println("i am dead");
        }else{
            System.out.println("i am alive");
        }
    }
    public static void main(String[] args) throws InterruptedException {
        ft = new FinalizeTest();
 
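        // Set the reference to null so that nothing refers to the object any more
        ft = null;
        // Request a GC
        System.gc();
        // The Finalizer thread has low priority, so sleep for 1 second before checking the result
        Thread.sleep(1000);
        // FinalizeTest overrides finalize() and re-establishes the reference there, so the object is resurrected
        judge();
        // The code below is identical to the code above, but the object is not resurrected again, because finalize() runs at most once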
        ft = null;
        System.gc();
        Thread.sleep(1000);
        judge();
    }
 
    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println("finalize() executed");
        // The key to resurrection: re-establish a reference to this object
        ft = this;
    }
}

6. Four Garbage Collection Algorithms

Once garbage has been identified, it has to be reclaimed. There are four main algorithms for doing so: mark-sweep, mark-compact, copying, and generational collection.

6.1 Mark-sweep algorithm

Algorithm idea: the algorithm has two steps, "mark" and "sweep". First, all objects that need to be reclaimed are marked. Once marking is complete, all marked objects are reclaimed together.

Drawbacks:

  • Both the marking and the sweeping phases are inefficient;
  • Memory fragmentation occurs easily, and too much fragmented space may make it impossible to allocate large objects.

It is suited to situations where most objects survive.
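The fragmentation drawback is easy to see in a toy model. In this sketch the heap is an array of slots, and the set passed to sweep holds the surviving objects (the mirror image of marking the objects to be reclaimed); marking is assumed to have already been done. All names are invented for the example.

```java
import java.util.Arrays;
import java.util.Set;

public class MarkSweepDemo {

    /** Sweep phase: free every slot whose object did not survive; survivors do not move. */
    static String[] sweep(String[] heap, Set<String> live) {
        String[] after = heap.clone();
        for (int i = 0; i < after.length; i++) {
            if (after[i] != null && !live.contains(after[i])) {
                after[i] = null;
            }
        }
        return after;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        String[] after = sweep(heap, Set.of("A", "C", "E"));
        System.out.println(Arrays.toString(after));                        // [A, null, C, null, E, null]
        System.out.println("largest free run: " + largestFreeRun(after));  // largest free run: 1
    }
}
```

Three slots are free after the sweep, yet an object needing two adjacent slots still cannot be placed.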

6.2 Mark-compact algorithm

Algorithm idea: the marking phase is the same as in the mark-sweep algorithm, but the second step differs. Instead of sweeping in place, it moves all surviving objects toward one end of the memory and then clears everything beyond the boundary.
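A toy version of the compaction step, again modelling the heap as an array of slots and assuming marking has already produced the set of surviving objects. A real collector must also update every reference to a moved object, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Set;

public class MarkCompactDemo {

    /**
     * Slides every surviving object toward the start of the heap, then clears
     * everything past the boundary. Returns the boundary index.
     */
    static int compact(String[] heap, Set<String> live) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live.contains(heap[i])) {
                heap[boundary++] = heap[i];          // boundary <= i, so nothing live is overwritten
            }
        }
        Arrays.fill(heap, boundary, heap.length, null);
        return boundary;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E", "F"};
        int boundary = compact(heap, Set.of("A", "C", "E"));
        System.out.println(Arrays.toString(heap));       // [A, C, E, null, null, null]
        System.out.println("boundary: " + boundary);     // boundary: 3
    }
}
```

The free space ends up as one contiguous block, at the cost of moving objects.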

6.3 Copying algorithm

Algorithm idea: Divide the available memory into two blocks of equal size, and only use one of them at a time. When this block of memory is used up, copy the surviving objects to another block, and then clean up the used memory space at once.

Disadvantage: the available memory is reduced to half of its original size.

The algorithm executes efficiently and is suited to situations where only a small number of objects survive.
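The two-halves idea can be sketched as follows. The class and its fields (active, reserve, top) are hypothetical names for this example, and the set of surviving objects is assumed to be known when collect is called.

```java
import java.util.Arrays;
import java.util.Set;

public class CopyingDemo {
    String[] active;    // the half currently used for allocation
    String[] reserve;   // the idle half
    int top;            // next free slot in the active half

    CopyingDemo(int halfSize) {
        active = new String[halfSize];
        reserve = new String[halfSize];
    }

    /** Allocates by bumping a pointer; fails when the active half is full. */
    boolean allocate(String obj) {
        if (top == active.length) return false;
        active[top++] = obj;
        return true;
    }

    /** Copies survivors into the idle half, wipes the used half, then swaps the roles. */
    void collect(Set<String> live) {
        int next = 0;
        for (int i = 0; i < top; i++) {
            if (live.contains(active[i])) reserve[next++] = active[i];
        }
        Arrays.fill(active, null);
        String[] tmp = active;
        active = reserve;
        reserve = tmp;
        top = next;
    }

    public static void main(String[] args) {
        CopyingDemo heap = new CopyingDemo(4);
        for (String s : new String[]{"A", "B", "C", "D"}) heap.allocate(s);
        System.out.println("room for E before GC: " + heap.allocate("E"));  // false
        heap.collect(Set.of("B"));
        System.out.println("room for E after GC: " + heap.allocate("E"));   // true
        System.out.println(Arrays.toString(heap.active));                   // [B, E, null, null]
    }
}
```

Only the single survivor is copied, which is why the approach is cheap when few objects survive, and allocation after the collection is a simple pointer bump.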

6.4 Generational collection algorithm

Most garbage collectors today use the generational collection algorithm. It introduces no new idea of its own; it simply divides memory into several areas according to object lifetimes and collects each area with whichever of the algorithms above suits it. Before JDK 8 there were three generations: the young generation, the old generation, and the permanent generation. From JDK 8 onward, the permanent generation was removed and replaced by Metaspace.

Young generation: in each collection the garbage collector finds that most objects have died and only a few survive. With the copying algorithm, collection costs only the copying of that small number of survivors.
Old generation: because the object survival rate is high and there is no extra space to act as an allocation guarantee, the mark-sweep or mark-compact algorithm must be used.

7. Garbage collector

If the collection algorithms are the methodology of memory reclamation, the garbage collectors are its concrete implementation. The collectors are compared below, but not in order to pick a single best one. So far there is no best garbage collector, let alone a universal one; all you can do is choose the collector that suits your specific application scenario. If a perfect collector existed that worked in every scenario, the Java virtual machine would not need to implement so many different ones.
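Before choosing a collector it helps to see which ones a JVM is actually using. A minimal sketch with the standard java.lang.management API follows; the class name ListCollectors is made up, and the names printed depend on the JVM version and on the options it was started with (on HotSpot, for example, -XX:+UseSerialGC, -XX:+UseParallelGC or -XX:+UseG1GC).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListCollectors {

    /** Names of the collectors registered in the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime()
                    + " pools=" + Arrays.toString(gc.getMemoryPoolNames()));
        }
    }
}
```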

Some classic collectors (Serial, ParNew, CMS, G1)

7.1 Young-generation collectors

7.1.1 Serial collector
  • Single-threaded
  • Stop The World: all user threads are paused while it collects
  • Default collector for HotSpot in client mode
  • Simple and efficient
  • Mark-copy algorithm
  • Suited to environments with limited memory, since it has the smallest additional memory footprint of all the collectors
  • Seldom used today
7.1.2 ParNew collector: a multi-threaded, parallel version of the Serial collector
  • Collects with multiple GC threads in parallel
  • Relatively dependent on CPU resources
  • Copying algorithm
  • In a single-core environment, ParNew will not perform better than the Serial collector
  • Parallel versus concurrent
    Parallel: describes the relationship between multiple garbage collector threads; several such threads work together at the same time. User threads are normally waiting during this time.
    Concurrent: describes the relationship between garbage collector threads and user threads; both run at the same time. Because user threads are not frozen, the program can still respond to requests, but since the collector threads take up some system resources, application throughput is reduced to some extent.
7.1.3 Parallel Scavenge collector
  • Focuses on throughput; also known as the throughput-first collector
    • Throughput is the ratio of the time the processor spends running user code to the total processor time consumed, that is, throughput = time running user code / (time running user code + garbage collection time)
  • Multi-threaded collector that collects in parallel
  • Reclaims garbage with the copying algorithm
  • Adaptive adjustment strategy
    • The virtual machine gathers performance monitoring information from the running system and dynamically adjusts its tuning parameters to provide the most suitable pause time or the maximum throughput.
  • Suited to background analysis tasks that need little interaction; in interactive programs, long pauses hurt the user experience.
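The throughput formula above is simple arithmetic. As a worked example with made-up numbers: if user code runs for 99 minutes and garbage collection takes 1 minute, throughput is 99 / (99 + 1) = 99%.

```java
import java.util.Locale;

public class ThroughputDemo {

    /** throughput = time running user code / (time running user code + garbage collection time) */
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    public static void main(String[] args) {
        double t = throughput(99, 1);   // both values in minutes; any common unit works
        System.out.println(String.format(Locale.ROOT, "%.2f%%", 100 * t));   // 99.00%
    }
}
```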

7.2 Old-generation collectors

7.2.1 Serial Old
  • The old-generation version of the Serial collector
  • Single-threaded
  • Mark-compact algorithm
  • Mainly used by HotSpot virtual machines in client mode
  • In server mode it has two further uses:
    • with the Parallel Scavenge collector in JDK 5 and earlier versions
    • as the fallback when the CMS collector fails, that is, when a Concurrent Mode Failure occurs during concurrent collection
7.2.2 Parallel Old collector
  • Mark-compact algorithm
  • For situations where throughput matters or processor resources are scarce
  • Parallel, multi-threaded
7.2.3 CMS collector
  • Goal: the shortest possible collection pause time
  • Mark-sweep algorithm
  • Four steps
    • Initial mark
      Stop The World.
      Marks only the objects that GC Roots reference directly, so it is very fast.
    • Concurrent mark
      Traverses the entire object graph starting from the objects directly referenced by GC Roots. This takes a long time but does not pause user threads; the garbage collection threads run concurrently with them.
    • Remark
      Stop The World.
      Corrects the mark records of the objects whose marking changed because the user program kept running during concurrent marking. This pause is usually slightly longer than the initial mark but much shorter than the concurrent mark phase.
    • Concurrent sweep
      Clears the objects judged dead during marking. Because surviving objects do not need to be moved, this phase also runs concurrently with user threads.
  • Concurrent collection, low pauses
  • Relatively demanding of processor resources
  • Cannot handle floating garbage, so a "Concurrent Mode Failure" may occur, which leads to another fully "Stop The World" Full GC.
  • Because it is based on mark-sweep, it produces a lot of memory fragmentation, which eventually has to be compacted.
7.2.4 Garbage First (G1) collector
  • Mainly aimed at server-side applications
    Server-side workloads differ in what they need. For a job whose purpose is to crunch data, waiting 30 seconds before the computation continues does no harm, so prioritizing throughput is more appropriate. Applications that people interact with directly naturally benefit from low latency.
    G1 is based on the mark-compact algorithm as a whole and on the mark-copy algorithm locally, so it does not produce memory fragmentation.

  • A garbage collector with predictable pause times

  • Region-based memory layout
    G1 no longer insists on a fixed size and a fixed number of generational areas. Instead, it divides the contiguous Java heap into many independent Regions of equal size. Each Region can play the role of a young-generation Eden space, a Survivor space, or old-generation space as needed.
    Regions playing different roles are collected with different strategies.

  • Still follows generational theory, but the young and old generations are no longer fixed, contiguous areas

  • Can manage the entire heap

  • Works in four steps, with shorter Stop The World pauses

    • Initial mark
      Stop The World: a short pause of the user threads.
      Marks the objects that GC Roots reference directly and updates the TAMS pointer so that, when user threads run concurrently in the next phase, they can allocate new objects correctly in the available Regions.
      This phase needs to pause threads, but it is done together with a young-generation collection, so it adds no real extra pause.
    • Concurrent mark
      Starting from the GC Roots, performs reachability analysis on the objects in the heap, recursively scanning the whole object graph to find the objects to reclaim. This phase takes a long time, but it runs concurrently with the user program. After the scan is complete, the objects recorded by SATB whose references changed during the concurrent phase must be reprocessed.
    • Final mark
      Stop The World.
      Processes the few SATB records that remain after the concurrent phase ends.
    • Selection and reclamation
      Stop The World.
      Updates the statistics of the Regions, sorts the Regions by reclamation value and cost, and draws up a reclamation plan based on the pause time the user expects. Any number of Regions may be chosen to form the collection set. The surviving objects of the chosen Regions are then copied into empty Regions, and the whole of each old Region is cleared.
      Because this moves surviving objects, user threads must be paused; the work is done in parallel by multiple collector threads.

  • The biggest advantage of G1 over PS/PS Old is that its pause time is controllable and predictable.
  • Highest return first
    G1's pause prediction model is based on the decaying average, which it uses to estimate which Regions offer the highest return. The Regions that hold the most garbage, and so give the greatest reclamation benefit, are reclaimed first.
  • Humongous Regions are dedicated to storing large objects and are treated as part of the old generation by default.
    A large object that exceeds the size of one Region is stored in several contiguous Humongous Regions.
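The "highest return first" idea can be illustrated with a greatly simplified greedy selection. This is not G1's actual algorithm: the Region class, the cost figures and the rule "garbage reclaimed per millisecond of predicted cost" are assumptions made for the sketch, and real G1 derives its predictions from measured statistics.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionSelectionDemo {

    /** Hypothetical per-Region statistics. */
    static class Region {
        final String name;
        final long garbageMb;          // memory that collecting this Region would free
        final double predictedCostMs;  // predicted pause contributed by this Region

        Region(String name, long garbageMb, double predictedCostMs) {
            this.name = name;
            this.garbageMb = garbageMb;
            this.predictedCostMs = predictedCostMs;
        }

        double efficiency() { return garbageMb / predictedCostMs; }
    }

    /** Picks the most profitable Regions that still fit into the pause budget. */
    static List<String> chooseCollectionSet(List<Region> regions, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));  // best return first
        List<String> chosen = new ArrayList<>();
        double used = 0;
        for (Region r : sorted) {
            if (used + r.predictedCostMs <= pauseBudgetMs) {
                chosen.add(r.name);
                used += r.predictedCostMs;
            }
        }
        return chosen;
    }

    static List<Region> sampleRegions() {
        return List.of(
                new Region("R1", 8, 4.0),
                new Region("R2", 6, 2.0),
                new Region("R3", 1, 3.0),
                new Region("R4", 5, 5.0));
    }

    public static void main(String[] args) {
        // With a 7 ms budget, R2 and R1 fit; adding R4 or R3 would exceed it.
        System.out.println(chooseCollectionSet(sampleRegions(), 7.0));   // [R2, R1]
    }
}
```

A larger pause budget lets more Regions into the collection set, which is the trade-off a pause-time target expresses.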


https://blog.csdn.net/m0_59879385/article/details/127516655
