It’s outrageous. Last night, I dreamed that the interviewer asked me about the three-color marking algorithm.


One day, a secret interview was being held in an ordinary room on the planet Java:

Interviewer: Let’s start with the basics of the JVM. Do you understand the three-color marking algorithm?

Me: Uh... I don't know.

Interviewer: Remember to close the door when you go out.


Java interviews keep getting harder these days. If the interviewer opens directly with questions about underlying principles, it is easy to get flustered.

In the last article, we talked about memory sets. In this article, we will talk about the "three-color marking algorithm", which also comes up often in Java interviews. If you can discuss it well, the interviewer will think you really know your stuff.

Three-color marking algorithm

Since it is called the three-color marking algorithm, we first need to figure out which three colors it uses: black, white, and gray.

While traversing the object graph during reachability analysis, the garbage collector marks each object it encounters with one of the following three colors, based on whether it has been visited:

  • White: Indicates that the object has not yet been visited by the garbage collector. At the start of reachability analysis, all objects are white; any object that is still white at the end of the analysis is unreachable.
  • Black: Indicates that the object has been visited by the garbage collector and all references held by this object have been scanned. A black object has been fully scanned and is known to be alive. If another object's reference leads to a black object, there is no need to scan it again. A black object can never point directly (without passing through a gray object) to a white object.
  • Gray: Indicates that the object has been visited by the garbage collector, but at least one reference held by this object has not been scanned yet.
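To make the three colors concrete, here is a minimal sketch of stop-the-world three-color marking over a toy object graph. Everything in it (`Color`, `Obj`, `TriColorMarker`) is a hypothetical model for illustration, not how HotSpot represents objects. A stack of gray objects stands in for "visited but not fully scanned", and the GC Roots themselves are not modeled, only the objects they reference directly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy heap model for illustration; not how HotSpot represents objects.
enum Color { WHITE, GRAY, BLACK }

class Obj {
    final String name;
    final List<Obj> refs = new ArrayList<>(); // references held by this object
    Color color = Color.WHITE;                // every object starts out white

    Obj(String name) { this.name = name; }
}

class TriColorMarker {
    // Gray objects: visited, but with at least one reference not yet scanned.
    private final Deque<Obj> grayStack = new ArrayDeque<>();

    // Shade a white object gray and queue it for scanning.
    private void shade(Obj o) {
        if (o.color == Color.WHITE) {
            o.color = Color.GRAY;
            grayStack.push(o);
        }
    }

    // Stop-the-world marking: rootRefs are the objects directly referenced by GC Roots.
    void mark(List<Obj> rootRefs) {
        for (Obj o : rootRefs) shade(o);
        while (!grayStack.isEmpty()) {
            Obj o = grayStack.pop();
            for (Obj child : o.refs) shade(child); // scan every reference it holds
            o.color = Color.BLACK;                  // all its references are scanned
        }
        // Anything still WHITE at this point is unreachable.
    }
}
```

For example, with `a -> b -> c` plus an isolated `d`, calling `mark(List.of(a))` leaves `a`, `b` and `c` black and `d` white.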

The book "In-depth Understanding of the Java Virtual Machine" has a very good figure for this; it is clear at a glance. Here is the original figure:

From the definitions above, let us extract the key points:

  • In the initial stage, the GC Roots are black and all other objects are white. If an object is still white at the end of the analysis, it is unreachable.

  • If there are other object references pointing to the black object, there is no need to scan it again.

  • It is impossible for a black object to point directly to a white object.

Let me explain the second and third points in a bit more detail.

In the schematic diagram I drew above, the first and second drawings are correct, but the third is wrong.

Let’s analyze the second point first. If another object references a black object, that black object has already been visited and every reference it holds has already been scanned, so naturally there is no need to scan it again.

Then let’s talk about the third point. It is impossible for a black object to point directly to a white object.

We can see from the definitions above that a black object is one whose "references have all been scanned", while a white object is one that "has not been visited by the garbage collector".

So here is the key point: if a black object pointed directly to a white object, that would contradict the definition of black.

Since the white object has not been visited yet, the reference to it cannot have been scanned. That means not all of the referencing object's references have been scanned, so the referencing object cannot be black.
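The third key point can be written as a check over a toy model: go through every black object and confirm that none of its references leads directly to a white object. `InvariantCheck.blackNeverPointsToWhite` is a hypothetical helper for illustration, not a JVM API. When marking runs with user threads stopped, the check always passes; when user threads run concurrently with marking, they can break it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model for illustration.
enum Color { WHITE, GRAY, BLACK }

class Obj {
    final List<Obj> refs = new ArrayList<>(); // references held by this object
    Color color = Color.WHITE;
}

class InvariantCheck {
    // True if no black object holds a direct reference to a white object.
    static boolean blackNeverPointsToWhite(List<Obj> heap) {
        for (Obj o : heap) {
            if (o.color != Color.BLACK) continue;
            for (Obj target : o.refs) {
                // A black object referencing a white one would mean that
                // not all of its references were really scanned.
                if (target.color == Color.WHITE) return false;
            }
        }
        return true;
    }
}
```

Black -> gray -> white passes the check, because the path to the white object goes through a gray object; adding a direct black -> white reference makes it fail.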

The relationship between black and white objects described above is very important. Once you understand it thoroughly, let's look at some problems with the three-color marking algorithm:

Some garbage collectors run their garbage collection threads concurrently with user threads (for example, during the concurrent marking phase of CMS). In that case, three-color marking can go wrong in two ways:

  • Mistakenly marking an object that is actually dead as alive. This is not good, but it is tolerable: it only produces a bit of floating garbage that escapes this collection and will be cleaned up next time. It is not a big problem.
  • Mistakenly marking an object that is actually alive as dead. This is fatal: the live object gets reclaimed, and the program will certainly fail.

The first problem is harmless, so we focus on solving the second.

It was proven theoretically in 1994 that the "object disappearance" problem, in which an object that should be black is mistakenly left white, occurs if and only if both of the following conditions hold at the same time:

  • The mutator (the user program) inserts one or more new references from a black object to a white object.
  • The mutator removes all direct or indirect references from gray objects to that white object.

Put simply: the white object is cut off from the gray object and attached to a black object instead.
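The two conditions can be reproduced in a few lines. This sketch uses a small hypothetical object model with no write barrier of any kind: the marker scans A, then the "user thread" performs both steps before marking resumes, and C ends up white even though A still references it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model: no write barrier of any kind.
enum Color { WHITE, GRAY, BLACK }

class Obj {
    final List<Obj> refs = new ArrayList<>();
    Color color = Color.WHITE;
}

class LostObjectDemo {
    private final Deque<Obj> gray = new ArrayDeque<>();

    void shade(Obj o) {
        if (o.color == Color.WHITE) { o.color = Color.GRAY; gray.push(o); }
    }

    // One unit of marker work: scan a single gray object, then blacken it.
    boolean step() {
        if (gray.isEmpty()) return false;
        Obj o = gray.pop();
        for (Obj child : o.refs) shade(child);
        o.color = Color.BLACK;
        return true;
    }

    static Color run() {
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        a.refs.add(b);                 // A -> B
        b.refs.add(c);                 // B -> C
        LostObjectDemo gc = new LostObjectDemo();
        gc.shade(a);                   // A is referenced by a GC Root
        gc.step();                     // A is black, B gray, C still white

        // The user thread runs between marker steps:
        a.refs.add(c);                 // condition 1: black A -> white C
        b.refs.remove(c);              // condition 2: gray B drops its path to C

        while (gc.step()) { }          // the marker finishes its work
        return c.color;                // WHITE, although A still references C
    }
}
```

`LostObjectDemo.run()` returns `WHITE`: C is live but would be reclaimed, which is exactly the fatal case.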

Therefore, to solve the object disappearance problem during concurrent scanning, we only need to break either one of these two conditions.

This gives rise to two solutions: "Incremental Update" and "Original Snapshot" (Snapshot At The Beginning, SATB).

Each solution breaks one of the two conditions.

Incremental update

Incremental update breaks the first condition.

When a black object gains a new reference to a white object, the newly inserted reference is recorded. After the concurrent scan finishes, the black objects in these recorded references are used as roots and scanned again.

Put simply: once a black object gains a new reference to a white object, it turns back into a gray object.

This is a bit like the idea behind the OopMap discussed earlier. The essence is to keep a record of the relevant changes, so that once the scan finishes, only the recorded entries need to be rescanned instead of the whole heap.

As shown in the figure, the newly inserted reference is recorded. After the scan finishes, black object 1 from the recorded reference is used as a root and scanned again, and the problem is solved.
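Here is a minimal sketch of incremental update on a hypothetical toy model. The made-up `addRef` method plays the role of the write barrier: it records any black object that gains a reference to a white object, and `remark` turns those objects gray again and rescans them. This is not CMS's actual code; as far as I know, CMS tracks such changes at a coarser granularity (dirty cards) and runs its remark in a stop-the-world pause, which this sketch simplifies away.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model for illustration; not CMS's real data structures.
enum Color { WHITE, GRAY, BLACK }

class Obj {
    final List<Obj> refs = new ArrayList<>();
    Color color = Color.WHITE;
}

class IncrementalUpdateGc {
    private final Deque<Obj> gray = new ArrayDeque<>();
    private final List<Obj> recorded = new ArrayList<>(); // black sources of new references

    void shade(Obj o) {
        if (o.color == Color.WHITE) { o.color = Color.GRAY; gray.push(o); }
    }

    boolean step() {
        if (gray.isEmpty()) return false;
        Obj o = gray.pop();
        for (Obj child : o.refs) shade(child);
        o.color = Color.BLACK;
        return true;
    }

    // Write barrier wrapped around "src.refs.add(target)":
    // record a black object that gains a reference to a white object.
    void addRef(Obj src, Obj target) {
        src.refs.add(target);
        if (src.color == Color.BLACK && target.color == Color.WHITE) recorded.add(src);
    }

    // Remark: turn each recorded black object gray again and rescan from it.
    void remark() {
        for (Obj o : recorded) { o.color = Color.GRAY; gray.push(o); }
        recorded.clear();
        while (step()) { }
    }

    static Color demo() {
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        a.refs.add(b);
        b.refs.add(c);
        IncrementalUpdateGc gc = new IncrementalUpdateGc();
        gc.shade(a);
        gc.step();            // A black, B gray, C white
        gc.addRef(a, c);      // barrier records A
        b.refs.remove(c);     // the deletion alone is harmless now
        while (gc.step()) { } // concurrent marking ends with C still white
        gc.remark();          // rescanning A reaches C
        return c.color;       // BLACK
    }
}
```

The demo replays the "disappearing object" scenario; because the barrier recorded A, the remark finds C and it survives.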

Original snapshot

The original snapshot breaks the second condition.

When a gray object is about to delete a reference pointing to a white object, the reference being deleted is recorded. After the concurrent scan finishes, the recorded references are scanned again, starting from the gray objects they belonged to, so the white objects they pointed to are still reached.

Put another way: whether or not references get deleted, the search follows the "object graph snapshot" taken at the moment the scan started, hence the name "original snapshot".

As shown in the figure, the deleted reference is recorded. After the scan finishes, gray object 2 from the recorded reference is used as a root, the recorded reference is scanned again, and the problem is solved.
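A matching sketch of the original snapshot, again on a hypothetical toy model. Here the barrier hooks deletions: before a reference disappears, the made-up `removeRef` method records the object it pointed to if that object is still white, and `remark` marks from the recorded objects. Rather than rescanning the gray object itself (whose reference is already gone), the sketch remembers the old target, which has the same effect as rescanning the recorded reference. This is a simplification, not G1's actual SATB queue code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model for illustration; not G1's real data structures.
enum Color { WHITE, GRAY, BLACK }

class Obj {
    final List<Obj> refs = new ArrayList<>();
    Color color = Color.WHITE;
}

class SnapshotGc {
    private final Deque<Obj> gray = new ArrayDeque<>();
    private final List<Obj> recorded = new ArrayList<>(); // old targets of deleted references

    void shade(Obj o) {
        if (o.color == Color.WHITE) { o.color = Color.GRAY; gray.push(o); }
    }

    boolean step() {
        if (gray.isEmpty()) return false;
        Obj o = gray.pop();
        for (Obj child : o.refs) shade(child);
        o.color = Color.BLACK;
        return true;
    }

    // Write barrier wrapped around "src.refs.remove(target)":
    // before the reference disappears, remember the object it pointed to.
    void removeRef(Obj src, Obj target) {
        if (target.color == Color.WHITE) recorded.add(target);
        src.refs.remove(target);
    }

    // Remark: mark from every recorded target, as if the deleted references still existed.
    void remark() {
        for (Obj o : recorded) shade(o);
        recorded.clear();
        while (step()) { }
    }

    static Color demo() {
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        a.refs.add(b);
        b.refs.add(c);
        SnapshotGc gc = new SnapshotGc();
        gc.shade(a);
        gc.step();            // A black, B gray, C white
        a.refs.add(c);        // the insertion alone is harmless now
        gc.removeRef(b, c);   // barrier records C before B -> C is deleted
        while (gc.step()) { } // concurrent marking ends with C still white
        gc.remark();          // marking from the recorded C keeps it alive
        return c.color;       // BLACK
    }
}
```

Note the trade-off: if the user thread had also dropped A -> C, C would still survive this cycle, which is the tolerable floating garbage described earlier.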

This raises a question: both incremental update and original snapshot need to record reference changes, so at what point does the recording happen?

You may still remember the "write barrier" mentioned earlier. Yes, that’s exactly it.

Whether it is incremental update or original snapshot, the virtual machine records these changes through write barriers.

Write barriers, which we introduced earlier when discussing memory sets and card tables, can be thought of as something like AOP in Spring: extra logic woven around every assignment to a reference field. So far, card table maintenance, incremental update, and original snapshot are all built on write barriers.
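The AOP analogy can be made concrete: every reference store goes through one wrapper that runs hooks before and after the assignment, much like "around" advice. In the hypothetical `BarrieredField` sketch below, a pre-write hook sees the old value (what an original-snapshot barrier needs) and a post-write hook sees the new value (what card-table maintenance and incremental update need). JVMs typically generate this barrier logic inline in interpreted and compiled code rather than calling listener objects like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of a reference field whose stores are wrapped by barrier hooks.
class BarrieredField<T> {
    private T value;
    private final List<Consumer<T>> preWrite = new ArrayList<>();  // sees the old value
    private final List<Consumer<T>> postWrite = new ArrayList<>(); // sees the new value

    void onPreWrite(Consumer<T> hook) { preWrite.add(hook); }
    void onPostWrite(Consumer<T> hook) { postWrite.add(hook); }

    T get() { return value; }

    // Every store goes through here, like "around" advice wrapping the assignment.
    void set(T newValue) {
        for (Consumer<T> hook : preWrite) hook.accept(value);      // e.g. record the old target
        value = newValue;
        for (Consumer<T> hook : postWrite) hook.accept(newValue);  // e.g. dirty a card
    }
}
```

Registering one hook of each kind and storing `"a"` then `"b"` runs them in the order pre(null), post(a), pre(a), post(b).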

As a bit of extra background: CMS uses incremental update, while G1 uses original snapshot.

That’s it for this article. It is a bit short, but I hope it explains things clearly. Next time, we’ll talk about garbage collectors, so stay tuned.

I put a lot of care into writing this article. If you got something out of it, I hope you’ll give it a like to encourage me. Thank you.


Origin blog.csdn.net/wdj_yyds/article/details/132542369