This article explains how three-color marking works and how it handles the missed-mark problem.

Mark-sweep algorithm

Before three-color marking there was the mark-sweep algorithm. It keeps a flag on each object to record whether the object is in use. Initially every object is marked 0; when reachability analysis from the roots finds that an object is alive, its flag is set to 1. Step by step, the marked objects form a tree-like structure. After the marking phase completes, the unmarked objects are cleaned up, the flags of the surviving objects are reset to 0, and the cycle repeats.
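The mark and sweep phases described above can be sketched roughly as follows. This is only a toy model: the `Obj` class, the `marked` flag, and the `collect` method are names made up for illustration, and a real collector works on raw heap memory rather than Java lists.

```java
import java.util.ArrayList;
import java.util.List;

class MarkSweepSketch {
    // Hypothetical heap object: a mark flag plus outgoing references.
    static class Obj {
        final String name;
        boolean marked;                          // false = 0, true = 1
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Mark phase: set the flag on everything reachable from obj.
    static void mark(Obj obj) {
        if (obj.marked) return;
        obj.marked = true;
        for (Obj child : obj.refs) mark(child);
    }

    // Sweep phase: drop unmarked objects and reset the flag on survivors.
    static List<Obj> sweep(List<Obj> heap) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : heap) {
            if (o.marked) {
                o.marked = false;
                survivors.add(o);
            }
        }
        return survivors;
    }

    // One full cycle. The program must be paused while this runs.
    static List<Obj> collect(List<Obj> roots, List<Obj> heap) {
        for (Obj r : roots) mark(r);
        return sweep(heap);
    }

    // Builds root -> a -> b plus an unreachable c; returns the survivor names.
    static List<String> demo() {
        Obj root = new Obj("root"), a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        root.refs.add(a);
        a.refs.add(b);
        List<Obj> heap = List.of(root, a, b, c);
        List<String> names = new ArrayList<>();
        for (Obj o : collect(List.of(root), heap)) names.add(o.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the unreachable object `c` is the only one dropped by the sweep.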

The biggest problem with this algorithm is that the GC must suspend the entire program while it runs; it cannot collect concurrently. The flag values 0 and 1 mean different things at different stages, so if marking ran concurrently with the program, a newly created live object could be deleted by mistake (every object starts at 0).

For systems with strict real-time requirements, the long pause that mark-sweep needs is unacceptable. An algorithm is needed that avoids this stop-the-world (STW) pause, and that algorithm is three-color marking.

Three-color marking algorithm

The biggest advantage of three-color marking is that it can run concurrently with the program, so GC can be performed at the cost of only very short pauses.

As the name suggests, three-color marking is an algorithm that uses three colors to mark objects.

*Black*: Root objects, and objects whose own references (child objects) have all been scanned.

*Gray*: The object itself has been reached, but its child objects have not all been scanned yet.

*White*: Objects that have not been scanned. If an object is still white after the scan has finished, it is unreachable from the roots and is garbage.
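The three colors map naturally onto a worklist loop. The following is a minimal single-threaded sketch; the `Node` class and the `mark` method are illustrative names, not part of any JVM.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TriColorSketch {
    enum Color { WHITE, GRAY, BLACK }

    // Hypothetical object model: every object starts out white.
    static class Node {
        final String name;
        Color color = Color.WHITE;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static void mark(List<Node> roots) {
        Deque<Node> gray = new ArrayDeque<>();
        for (Node r : roots) {
            r.color = Color.GRAY;
            gray.add(r);
        }
        while (!gray.isEmpty()) {
            Node n = gray.poll();
            for (Node child : n.refs) {
                if (child.color == Color.WHITE) {   // first time this object is reached
                    child.color = Color.GRAY;
                    gray.add(child);
                }
            }
            n.color = Color.BLACK;                  // all children have been visited
        }
    }

    // root -> a, while b is unreachable; returns the colors of a and b.
    static Color[] demo() {
        Node root = new Node("root"), a = new Node("a"), b = new Node("b");
        root.refs.add(a);
        mark(List.of(root));
        return new Color[] { a.color, b.color };
    }

    public static void main(String[] args) {
        Color[] colors = demo();
        System.out.println("a=" + colors[0] + ", b=" + colors[1]);
    }
}
```

When the gray worklist is empty, every reachable object is black and anything still white can be reclaimed.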

Missed-mark problem

During three-color marking, the marking thread and the user threads run concurrently, so a user thread may change the reference graph while marking is in progress. One outcome is that an object that should be collected is marked as alive. (Put simply, the GC has already marked an object black, then a user thread breaks the reference chain to it, so an object that is actually garbage stays black. This is floating garbage.) What happens to garbage produced this way? Nothing: it is left for the next garbage collection cycle.

The missed-mark problem is the opposite: an object that should have survived is treated as dead. This can cause very serious errors. So how does it come about?

image.png

In the figure, object A has been marked black, and the two objects it references, B and C, are gray. At this point the user thread deletes the reference B->D and creates a reference A->D. Object B has not been scanned yet, so when it is scanned it no longer leads to D, and object A has already been scanned, so it will not be scanned again. As a result, D stays white and is treated as garbage.

Summary: the missed-mark problem occurs only when both of the following conditions hold:

1: A black object gains a reference to a white object after it has been marked.

2: All direct or indirect references from gray objects to that white object are deleted.
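The scenario with A, B, C, and D can be reproduced in a few lines. The sketch below interleaves the marker and the "user thread" by hand in a single thread, which is an assumption made to keep the example deterministic; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class MissedMarkDemo {
    enum Color { WHITE, GRAY, BLACK }

    static class Node {
        final String name;
        Color color = Color.WHITE;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Scan one gray object: gray its white children, then blacken it.
    static void scan(Node n, Deque<Node> gray) {
        for (Node child : n.refs) {
            if (child.color == Color.WHITE) {
                child.color = Color.GRAY;
                gray.add(child);
            }
        }
        n.color = Color.BLACK;
    }

    // Returns the final color of D, which is still reachable through A.
    static Color run() {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        a.refs.add(b);
        a.refs.add(c);
        b.refs.add(d);

        Deque<Node> gray = new ArrayDeque<>();
        a.color = Color.GRAY;
        gray.add(a);
        scan(gray.poll(), gray);      // A is now black, B and C are gray

        // The user thread runs here, with no barrier in place:
        b.refs.remove(d);             // condition 2: gray -> white reference deleted
        a.refs.add(d);                // condition 1: black -> white reference added

        while (!gray.isEmpty()) scan(gray.poll(), gray);
        return d.color;
    }

    public static void main(String[] args) {
        System.out.println("D is " + run());
    }
}
```

D ends up white even though A still references it, so a sweep at this point would free a live object.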

Does a newly created object count as a missed-mark problem?

image.png

The answer is: it doesn't count.

You may have wondered about this when learning three-color marking: why does the second condition require that the direct or indirect references from gray objects to the white object be deleted? And why is a newly created object, added directly like this, not treated as a missed mark?

TAMS

For the collector to run concurrently with user threads, it must handle objects allocated during collection (otherwise, once marking from the GC Roots has finished, the new objects would be collected as garbage). G1 therefore keeps two pointers named TAMS (Top at Mark Start) in each Region and sets aside part of the Region for objects allocated during concurrent collection. Objects allocated above these pointers are considered alive by default and are excluded from the current collection. So when we analyze the missed-mark problem, we can stick strictly to the two conditions above.
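The idea behind TAMS can be illustrated with a bump-pointer region. This is a heavily simplified model with a single TAMS pointer and slot indexes instead of addresses; the `Region` class and its fields are assumptions made for illustration and do not reflect HotSpot's actual data structures.

```java
class TamsSketch {
    // Hypothetical region: objects are slots, allocated by bumping `top`.
    static class Region {
        int top = 0;                   // next free slot
        int tams = 0;                  // value of `top` when concurrent marking started
        final boolean[] markBitmap;    // filled in by the marker for slots below tams

        Region(int capacity) { markBitmap = new boolean[capacity]; }

        // No capacity check, to keep the sketch short.
        int allocate() { return top++; }

        void startConcurrentMark() { tams = top; }

        // Slots at or above TAMS were allocated during marking: implicitly live.
        boolean isLive(int slot) {
            return slot >= tams || markBitmap[slot];
        }
    }

    // Returns { liveness of an old unmarked object, liveness of a new object }.
    static boolean[] demo() {
        Region r = new Region(8);
        int old = r.allocate();        // allocated before marking, never marked
        r.startConcurrentMark();
        int fresh = r.allocate();      // allocated while marking is running
        return new boolean[] { r.isLive(old), r.isLive(fresh) };
    }

    public static void main(String[] args) {
        boolean[] live = demo();
        System.out.println("old=" + live[0] + ", fresh=" + live[1]);
    }
}
```

Because the new object is live by position rather than by color, it never needs to be discovered by the marker, which is why new allocations fall outside the two conditions.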

Solving the missed-mark problem

Breaking either one of these conditions is enough to solve the missed-mark problem. CMS and G1 each adopt a different solution.

CMS---incremental update

image.png

Incremental update breaks the first condition. When a reference to D is inserted into the black object A, the insertion is recorded together with object A. In the remark phase, after the concurrent scan is over, the recorded objects that gained a new reference (A in the figure above) are turned gray again, so A is scanned again.

image.png

image.png

image.png

The figures show that rescanning in this way picks up the missed object D. *But objects B and C are scanned a second time.*
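A rough sketch of incremental update follows, using a write barrier on reference insertion. The barrier method `addRef`, the `recorded` list, and the single-threaded interleaving are all assumptions for illustration; CMS itself records dirty regions of the heap rather than individual Java objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IncrementalUpdateSketch {
    enum Color { WHITE, GRAY, BLACK }

    static class Node {
        final String name;
        Color color = Color.WHITE;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    final Deque<Node> gray = new ArrayDeque<>();
    final List<Node> recorded = new ArrayList<>();   // black objects that gained a reference

    // Hypothetical write barrier: runs whenever a reference is inserted.
    void addRef(Node holder, Node target) {
        holder.refs.add(target);
        if (holder.color == Color.BLACK) recorded.add(holder);
    }

    void scan(Node n) {
        for (Node child : n.refs) {
            if (child.color == Color.WHITE) {
                child.color = Color.GRAY;
                gray.add(child);
            }
        }
        n.color = Color.BLACK;
    }

    void drain() {
        while (!gray.isEmpty()) scan(gray.poll());
    }

    // Remark: turn the recorded objects gray again and rescan them.
    void remark() {
        for (Node n : recorded) {
            n.color = Color.GRAY;
            gray.add(n);
        }
        recorded.clear();
        drain();
    }

    // Same A/B/C/D scenario as in the text; returns the final color of D.
    static Color demo() {
        IncrementalUpdateSketch gc = new IncrementalUpdateSketch();
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        a.refs.add(b);
        a.refs.add(c);
        b.refs.add(d);
        a.color = Color.GRAY;
        gc.gray.add(a);
        gc.scan(gc.gray.poll());   // A black, B and C gray
        b.refs.remove(d);          // user thread deletes B->D
        gc.addRef(a, d);           // user thread adds A->D; the barrier records A
        gc.drain();                // concurrent marking ends with D still white
        gc.remark();               // A is rescanned and D is found
        return d.color;
    }

    public static void main(String[] args) {
        System.out.println("D is " + demo());
    }
}
```

The cost shows up in `remark`: every recorded object has all of its references walked again, not just the one that was added.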

 

G1---SATB

Anyone learning the JVM has probably heard of incremental update and found it easy to understand. SATB (snapshot at the beginning), however, is harder to grasp. (The blogger only got a feel for it after going through the JVM material many times.)

SATB breaks condition 2: it deals with the deletion of direct or indirect references from gray objects to white objects.

Case 1: The direct reference B->D from the gray object is broken.

image.png

Case 2: The indirect reference D->E from the gray object is broken.

image.png

In either case, the white objects on B's reference chain would otherwise become garbage. SATB is called a snapshot because it records the reference as it was before the chain was broken, here the old reference held by object B.

In the final marking stage, the GC thread treats what was recorded as additional GC Roots. Whether or not the objects behind them are still alive, they are all scanned according to their state in the snapshot, so the entire reference chain is preserved.

It is easy to see that SATB is conservative. If a reference behind a gray object is deleted during concurrent marking, the objects it led to are not assumed to be dead *(because an object that appears to have become garbage may have been rescued by a black object already confirmed alive)*. Instead, every object on that reference chain is judged to be alive.

image.png

Here scanning resumes from the recorded gray object as if it were a GC Root, which avoids the cost of rescanning from the original GC Roots. SATB's strategy is therefore somewhat like the handling of floating garbage: objects that have really died are simply kept until the next collection.
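SATB can be sketched with a barrier on reference deletion instead of insertion. In this sketch the barrier logs the old target of the deleted reference, which is one way to keep the snapshot edge alive; the `removeRef` method, the `satbBuffer` list, and the always-on barrier are simplifying assumptions, not G1's real implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SatbSketch {
    enum Color { WHITE, GRAY, BLACK }

    static class Node {
        final String name;
        Color color = Color.WHITE;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    final Deque<Node> gray = new ArrayDeque<>();
    final List<Node> satbBuffer = new ArrayList<>();   // old targets of deleted references

    // Hypothetical pre-write barrier: remember what the reference pointed to.
    void removeRef(Node holder, Node target) {
        if (holder.refs.remove(target)) satbBuffer.add(target);
    }

    void scan(Node n) {
        for (Node child : n.refs) {
            if (child.color == Color.WHITE) {
                child.color = Color.GRAY;
                gray.add(child);
            }
        }
        n.color = Color.BLACK;
    }

    void drain() {
        while (!gray.isEmpty()) scan(gray.poll());
    }

    // Final mark: everything in the buffer is scanned as if it were a root.
    void finalMark() {
        for (Node n : satbBuffer) {
            if (n.color == Color.WHITE) {
                n.color = Color.GRAY;
                gray.add(n);
            }
        }
        satbBuffer.clear();
        drain();
    }

    // A -> B -> D -> E and A -> C; returns the final colors of D and E.
    static Color[] demo() {
        SatbSketch gc = new SatbSketch();
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        Node d = new Node("D"), e = new Node("E");
        a.refs.add(b);
        a.refs.add(c);
        b.refs.add(d);
        d.refs.add(e);
        a.color = Color.GRAY;
        gc.gray.add(a);
        gc.scan(gc.gray.poll());   // A black, B and C gray
        gc.removeRef(b, d);        // user thread deletes B->D; the barrier logs D
        a.refs.add(d);             // user thread adds A->D; no barrier needed here
        gc.drain();                // concurrent marking ends with D and E still white
        gc.finalMark();            // D and its chain are marked from the buffer
        return new Color[] { d.color, e.color };
    }

    public static void main(String[] args) {
        Color[] colors = demo();
        System.out.println("D=" + colors[0] + ", E=" + colors[1]);
    }
}
```

Note that D and E would be kept alive even if the `a.refs.add(d)` line were removed, which is the floating garbage that the conservative strategy accepts.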

Summary

CMS tracks newly added references: it turns the object that gained the new reference gray again and re-traverses all of its child objects, which costs some performance. G1 tracks broken references and does not need to re-traverse the object, so it is more efficient. But maintaining the snapshot and following its conservative strategy adds a certain memory cost.

Origin blog.csdn.net/weixin_47184173/article/details/113622421