JVM in action: three-color marking

Key steps in garbage collection

Which objects are garbage?

Before collecting garbage, we first need to determine which objects are still alive.

There are two commonly used methods:

  1. Reference counting
  2. Reachability analysis

Python determines object survival with reference counting, while Java uses reachability analysis.

Reachability analysis starts from a set of root references called GC Roots and follows references outward. Objects reachable from a GC Root must not be reclaimed; unreachable objects can be. The path traversed during the search is called a reference chain.
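The search from the roots can be sketched in a few lines. This is only an illustration, not JVM code: `Obj`, `reachable` and `demo` are hypothetical names, and the heap is modeled as plain Java objects holding a list of references. Objects c and d reference each other, so a pure reference count would never drop to zero for them, yet no root reaches them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ReachabilitySketch {
    // Hypothetical stand-in for a heap object: a name plus outgoing references.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Returns the names of all objects reachable from the given roots.
    static Set<String> reachable(List<Obj> roots) {
        Set<Obj> seen = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (seen.add(o)) {            // first visit: follow its references
                pending.addAll(o.refs);
            }
        }
        Set<String> names = new TreeSet<>();
        for (Obj o : seen) {
            names.add(o.name);
        }
        return names;
    }

    // a is a root and references b; c and d only reference each other.
    static Set<String> demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        List<Obj> roots = new ArrayList<>();
        roots.add(a);
        return reachable(roots);
    }
}
```

Calling `ReachabilitySketch.demo()` returns only a and b, so the c/d cycle counts as garbage.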

An unreachable object is not reclaimed immediately; it goes through up to two marking passes. When it is first found to be unreachable from the GC Roots, it is marked for the first time. If its finalize() method needs to run, the object is placed in a queue so that finalize() can be executed. If finalize() re-attaches the object to any object on a reference chain, the object is removed from the set of objects to be reclaimed; otherwise it is reclaimed. (Using finalize() is generally discouraged, and the method has been deprecated since Java 9.)

Common GC Roots include:

  1. Objects referenced in the virtual machine stack (the local variable table in a stack frame)
  2. Objects referenced by static fields of classes in the method area
  3. Objects referenced by constants in the method area
  4. Objects referenced by JNI (native methods) in the native method stack

Seen this way, a program has a great many GC Roots. Every garbage collection has to trace the reference chains from all of them, which takes a long time. Can the amount of scanning per collection be reduced?

Generational and cross-generational references

In fact, most current virtual machines are designed around the theory of generational collection, which rests on two generational hypotheses:

  1. Most objects die young
  2. The more garbage collections an object has survived, the less likely it is to die

Therefore, the heap is generally divided into a young generation and an old generation. A collection of the young generation is called Minor GC, and a collection of the old generation is called Old GC (also known as Major GC). Generations introduce a problem, however: objects in one generation may be referenced from the other, so to find the live objects of the young generation we would also have to traverse the old generation, and vice versa.

Consider a heap in which the young-generation objects A, B, C and D are reachable from GC Roots without leaving the young generation, while the young-generation object E is referenced only by an object in the old generation. If a Minor GC traverses only the young generation, it finds A, B, C and D alive but misses E, and E would be reclaimed by mistake.

The most naive way to handle such cross-generational references is to traverse every object in the old generation looking for them, but that costs far too much performance.

This is where the third hypothesis comes in:

Cross-generational references are rare compared with references within the same generation.

Under this hypothesis, there is no reason to scan the whole old generation for the sake of a few cross-generational references. To avoid that overhead, garbage collectors introduce a remembered set: a table that records cross-generational references.

For example, the young generation's remembered set records which parts of the old generation hold references into the young generation.

During a Minor GC, only the memory areas that contain cross-generational references need to be added to the GC Roots and scanned along with them.
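As a rough sketch of that idea (the names `Obj`, `minorGcLive` and `demo` are made up, and the remembered set is modeled as a simple list of old objects rather than memory areas), the Minor GC below never walks the old generation. It only treats the old objects listed in the remembered set as extra roots:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RememberedSetSketch {
    // Hypothetical heap object; 'old' says which generation it lives in.
    static final class Obj {
        final String name;
        final boolean old;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name, boolean old) { this.name = name; this.old = old; }
    }

    // Returns the names of live young objects. Old objects are never traversed;
    // only those recorded in the remembered set contribute extra roots.
    static Set<String> minorGcLive(List<Obj> youngRoots, Collection<Obj> rememberedSet) {
        Deque<Obj> pending = new ArrayDeque<>(youngRoots);
        for (Obj oldObj : rememberedSet) {
            for (Obj target : oldObj.refs) {
                if (!target.old) pending.push(target);
            }
        }
        Set<Obj> seen = new HashSet<>();
        Set<String> live = new TreeSet<>();
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.old || !seen.add(o)) continue;   // skip old or already visited
            live.add(o.name);
            pending.addAll(o.refs);
        }
        return live;
    }

    // A, B, C, D hang off a young root; E is referenced only from old object O.
    static Set<String> demo(boolean useRememberedSet) {
        Obj a = new Obj("A", false), b = new Obj("B", false), c = new Obj("C", false);
        Obj d = new Obj("D", false), e = new Obj("E", false);
        Obj oldHolder = new Obj("O", true);
        a.refs.add(b);
        a.refs.add(c);
        c.refs.add(d);
        oldHolder.refs.add(e);
        List<Obj> youngRoots = new ArrayList<>();
        youngRoots.add(a);
        List<Obj> rememberedSet = new ArrayList<>();
        if (useRememberedSet) rememberedSet.add(oldHolder);
        return minorGcLive(youngRoots, rememberedSet);
    }
}
```

With an empty remembered set, `demo(false)` misses E; with the old holder recorded, `demo(true)` keeps it alive.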

Card table

As mentioned earlier, the garbage collector uses a remembered set to record cross-generational references. You can think of the remembered set as an interface and the card table as one implementation of it, much like Map and HashMap.

In its simplest form, the card table can be just a byte array, and the HotSpot virtual machine does exactly that. The following line of code is HotSpot's default card table marking logic:

CARD_TABLE[this_address >> 9] = 0;

Each array element records whether the corresponding block of memory, called a card page, contains objects with cross-generational references. The 9-bit right shift of the address shows that each element maps to 2^9 = 512 bytes of memory.

Following the marking line above, an element is set to 0 when its card page contains at least one object with a cross-generational reference; the element is then said to be dirty. Elements whose card pages contain no such reference keep a different, clean value.
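The index arithmetic can be tried out with a small byte array. This is a sketch under assumptions, not HotSpot code: the class and method names are invented, addresses are plain ints, the dirty value 0 follows the marking line above, and the clean value 0xFF is chosen for the sketch.

```java
import java.util.Arrays;

class CardTableSketch {
    static final int CARD_SHIFT = 9;          // 2^9 = 512 bytes per card page
    static final byte DIRTY = 0;              // value written by the marking line above
    static final byte CLEAN = (byte) 0xFF;    // assumed clean value for this sketch

    final byte[] cards;

    CardTableSketch(int heapBytes) {
        // One element per 512-byte card page, rounding up.
        cards = new byte[(heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT];
        Arrays.fill(cards, CLEAN);
    }

    // Maps an address to the index of the element covering it.
    static int cardIndex(int address) {
        return address >>> CARD_SHIFT;
    }

    void markDirty(int address) {
        cards[cardIndex(address)] = DIRTY;
    }

    boolean isDirty(int address) {
        return cards[cardIndex(address)] == DIRTY;
    }
}
```

Because a card covers 512 bytes, dirtying address 1000 also makes address 600 report dirty: both fall into card 1. The collector then has to scan the whole card page to find the actual cross-generational references.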

How to update the card table?

HotSpot dirties card table elements through a write barrier: when an object in another generation is assigned a reference to an object in the current generation, the card table is updated as part of the reference assignment. The mechanism is similar to AOP (aspect-oriented programming), in that extra logic is attached to every reference-field assignment.

void oop_field_store(oop* field, oop new_value) {
    // Reference field assignment
    *field = new_value;
    // Post-write barrier: the card table state is updated here
    post_write_barrier(field, new_value);
}
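The same shape can be mimicked in Java to see the barrier at work. Everything here is an assumption made for illustration: the heap is an int array whose slots hold "references" (slot indices, -1 for null), one card covers 512 slots, and the barrier dirties the card on every reference store without checking generations.

```java
import java.util.Arrays;

class WriteBarrierSketch {
    static final int CARD_SHIFT = 9;
    static final byte DIRTY = 0, CLEAN = (byte) 0xFF;   // CLEAN is an assumed value

    final int[] heap;         // simulated heap: each slot holds a slot index or -1
    final byte[] cardTable;

    WriteBarrierSketch(int heapSlots) {
        heap = new int[heapSlots];
        Arrays.fill(heap, -1);
        cardTable = new byte[(heapSlots + 511) >>> CARD_SHIFT];
        Arrays.fill(cardTable, CLEAN);
    }

    // Counterpart of oop_field_store: the store itself, then the barrier.
    void storeField(int fieldAddress, int newValue) {
        heap[fieldAddress] = newValue;       // reference field assignment
        postWriteBarrier(fieldAddress);      // card table update
    }

    // Dirties the card that contains the field that was just written.
    private void postWriteBarrier(int fieldAddress) {
        cardTable[fieldAddress >>> CARD_SHIFT] = DIRTY;
    }

    boolean isCardDirty(int cardIndex) {
        return cardTable[cardIndex] == DIRTY;
    }
}
```

Application code only ever calls `storeField`; the card table bookkeeping rides along with the assignment, which is what makes the comparison with AOP apt.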

Three-color marking

How it works

How do we decide that an object is reachable? This is where three-color marking comes in.

White: not yet visited by the garbage collector. All objects are white at the start of the traversal
Gray: visited by the garbage collector, but at least one of its references has not been scanned yet
Black: visited by the garbage collector, and all of its references have been scanned. A black object is safely alive (GC Roots are marked black)

Take an object graph in which GC ROOT references B and E, which in turn reference A, C and F. Three-color marking proceeds as follows:

  1. First, mark the objects B and E referenced by GC ROOT as gray
  2. Next, mark the objects A, C and F referenced by B and E as gray, then mark B and E as black
  3. Continue in the same way until no gray objects remain. Objects that are still white at the end are reclaimed
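The steps above can be sketched with a worklist of gray objects. The edges are partly assumed: the text only says that B and E reference A, C and F, so the sketch picks B -> A, B -> C and E -> F, and adds a hypothetical object D that nothing references. All class and method names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class TriColorSketch {
    enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        final String name;
        Color color = Color.WHITE;               // every object starts white
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Marks everything reachable from root; whatever stays white is garbage.
    static void mark(Obj root) {
        Deque<Obj> gray = new ArrayDeque<>();
        root.color = Color.BLACK;                // the GC root is marked black
        for (Obj child : root.refs) shade(child, gray);
        while (!gray.isEmpty()) {
            Obj o = gray.poll();
            for (Obj child : o.refs) shade(child, gray);
            o.color = Color.BLACK;               // all references of o are scanned
        }
    }

    // White -> gray: the object is discovered but not scanned yet.
    private static void shade(Obj o, Deque<Obj> gray) {
        if (o.color == Color.WHITE) {
            o.color = Color.GRAY;
            gray.add(o);
        }
    }

    // Returns the names of the objects that are still white after marking.
    static String demo() {
        Obj root = new Obj("GC ROOT");
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C");
        Obj d = new Obj("D"), e = new Obj("E"), f = new Obj("F");
        root.refs.add(b);
        root.refs.add(e);
        b.refs.add(a);
        b.refs.add(c);
        e.refs.add(f);
        d.refs.add(a);   // D points into the live graph, but nothing points to D
        mark(root);
        StringBuilder white = new StringBuilder();
        for (Obj o : Arrays.asList(a, b, c, d, e, f)) {
            if (o.color == Color.WHITE) white.append(o.name);
        }
        return white.toString();
    }
}
```

In this graph only D stays white, so only D would be reclaimed.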

Problems with three-color marking

Root enumeration in reachability analysis must run against a consistent snapshot, so user threads have to be suspended (Stop The World, STW). Thanks to various optimization techniques, this pause is now very short.

Traversing the object graph from the roots does not require STW and can run concurrently with the user threads. But because the user threads keep running, reference relationships can change during marking, which can lead to two kinds of errors:

  1. Objects that should be reclaimed are marked as live
  2. Objects that should stay alive are marked as reclaimable

The first case does little harm: the object survives this cycle as floating garbage and can be reclaimed in a later one. The second case is fatal, because an object that is still in use gets reclaimed.

It has been shown that the second case can occur only when both of the following conditions hold at the same time:

  1. One or more new references from black objects to a white object are inserted
  2. All references from gray objects to that white object, direct or indirect, are removed

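For example, suppose A is already black, and B is gray and still references the white object C. If a user thread stores a reference to C into A and then removes the reference from B to C, the collector never rescans A and can no longer reach C through B, so C stays white and is reclaimed even though it is still in use.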
To solve this problem, it is enough to break either of the two conditions. This yields two solutions: incremental update and snapshot at the beginning (SATB, sometimes translated as "original snapshot"). CMS uses incremental update; G1 uses SATB.

Incremental update breaks the first condition. When a reference to a white object is inserted into a black object, the newly inserted reference is recorded. After the concurrent scan finishes, the black objects in the recorded references are treated as roots and scanned again. Put simply, a black object that gains a new reference to a white object turns back into a gray object.
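A minimal sketch of the simplified form ("black turns back to gray") might look like this. It is not how CMS is implemented: marking is driven step by step from a single thread so that the interleaving is deterministic, and the names `step`, `insertRef` and `demo` are invented. A and B are assumed to be referenced directly by GC roots, and B references C.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IncrementalUpdateSketch {
    enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        Color color = Color.WHITE;
        final List<Obj> refs = new ArrayList<>();
    }

    final Deque<Obj> gray = new ArrayDeque<>();
    final boolean barrierEnabled;

    IncrementalUpdateSketch(boolean barrierEnabled) {
        this.barrierEnabled = barrierEnabled;
    }

    void shade(Obj o) {
        if (o.color == Color.WHITE) {
            o.color = Color.GRAY;
            gray.add(o);
        }
    }

    // Scans one gray object; returns false once no gray objects remain.
    boolean step() {
        Obj o = gray.poll();
        if (o == null) return false;
        for (Obj child : o.refs) shade(child);
        o.color = Color.BLACK;
        return true;
    }

    // Mutator store with an incremental-update style write barrier.
    void insertRef(Obj from, Obj to) {
        from.refs.add(to);
        if (barrierEnabled && from.color == Color.BLACK && to.color == Color.WHITE) {
            from.color = Color.GRAY;   // black reverts to gray and is rescanned
            gray.add(from);
        }
    }

    // Returns the final color of C after the mutator interferes with marking.
    static Color demo(boolean barrierEnabled) {
        IncrementalUpdateSketch m = new IncrementalUpdateSketch(barrierEnabled);
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        b.refs.add(c);
        m.shade(a);
        m.shade(b);
        m.step();             // A is scanned and turns black; B is gray, C is white
        m.insertRef(a, c);    // condition 1: black -> white reference inserted
        b.refs.remove(c);     // condition 2: gray -> white reference removed
        while (m.step()) { }  // finish marking
        return c.color;
    }
}
```

Without the barrier, `demo(false)` leaves C white even though A references it. With the barrier, `demo(true)` rescans A and C ends up black.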

The original snapshot approach (snapshot at the beginning, SATB) breaks the second condition. When a reference from a gray object to a white object is about to be deleted, the reference being deleted is recorded. After the concurrent scan finishes, the gray objects in the recorded references are treated as roots and scanned again. Put simply, regardless of whether a reference is deleted afterwards, the search follows the snapshot of the object graph taken at the moment scanning started.
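The deletion side can be sketched the same way. Again this is an illustration under assumptions, not G1's implementation: the barrier shades the target of the deleted reference immediately instead of logging it for a later rescan, marking is single-threaded and stepwise, and all names are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SatbSketch {
    enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        Color color = Color.WHITE;
        final List<Obj> refs = new ArrayList<>();
    }

    final Deque<Obj> gray = new ArrayDeque<>();
    final boolean barrierEnabled;

    SatbSketch(boolean barrierEnabled) {
        this.barrierEnabled = barrierEnabled;
    }

    void shade(Obj o) {
        if (o.color == Color.WHITE) {
            o.color = Color.GRAY;
            gray.add(o);
        }
    }

    // Scans one gray object; returns false once no gray objects remain.
    boolean step() {
        Obj o = gray.poll();
        if (o == null) return false;
        for (Obj child : o.refs) shade(child);
        o.color = Color.BLACK;
        return true;
    }

    // Mutator deletion with a SATB-style barrier: the reference that is about
    // to disappear is recorded (here: its target is shaded) before removal.
    void removeRef(Obj from, Obj target) {
        if (barrierEnabled) shade(target);
        from.refs.remove(target);
    }

    // Returns the final color of C after the mutator interferes with marking.
    static Color demo(boolean barrierEnabled) {
        SatbSketch m = new SatbSketch(barrierEnabled);
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        b.refs.add(c);
        m.shade(a);
        m.shade(b);
        m.step();             // A is scanned and turns black; B is gray, C is white
        a.refs.add(c);        // black -> white insert, no insertion barrier here
        m.removeRef(b, c);    // gray -> white delete, guarded by the barrier
        while (m.step()) { }  // finish marking
        return c.color;
    }
}
```

`demo(true)` marks C black because it was reachable when marking started. The same barrier would also keep C alive if A had never gained the new reference, which is how this approach produces floating garbage.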

Origin blog.csdn.net/zzti_erlie/article/details/123383605