Detailed explanation of the three-color marking algorithm for reachability analysis

This article reflects the author's own understanding and may differ in places from what a specific virtual machine actually does. Three-color marking is a basic algorithm, and different products may implement it differently.

1. Three-color marking algorithm

  As mentioned in the discussion of the CMS garbage collector, garbage produced during CMS's concurrent sweep phase is treated as floating garbage and left for the next GC. The concurrent marking phase has a similar but more serious issue: user threads run alongside the collector and may change reference relationships, which can make the marking result inaccurate. These changes are handled in the remark phase. What exactly goes wrong, and how is it handled?
  CMS is built on reachability analysis: it finds the live objects and marks them, and at sweep time any object that was not marked is unreachable and gets reclaimed. The marking algorithm it uses is three-color marking. The concurrent marking phase is the traversal that starts from the objects directly referenced by the GC Roots.
  Three-color marking assigns each object one of three colors, white, gray, or black, according to whether it has been visited (that is, checked during reachability analysis):

  • White: The object has not been visited yet. At the start every object is white, and any object that is still white when the traversal ends is reclaimed as garbage.
  • Gray: The object has been visited, but at least one object it directly references has not. The object is still being traversed.
  • Black: The object and every object it directly references have been visited. Only direct references count. For example, if A references only B, and B references C and D, then A is black as soon as A and B have both been visited, even if C or D has not been visited yet; at that point B is gray.

  From these definitions it follows that:

  • At the start of reachability analysis every object is white. An object turns gray once it is visited, and turns black once all the objects it directly references have been visited (or if it references nothing).
  • After the initial mark, the GC Root nodes are black (a GC Root is never garbage) and the objects they directly reference are gray.
  • Under normal circumstances, every object directly referenced by a black object is black or gray, never white. A black object directly referencing a white object means a mark was missed, and the white object would be reclaimed by mistake (how to prevent this is covered later). This property is a precondition for the correctness of three-color marking.
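
  The last point can be written as a small check. The following is a minimal Java sketch under assumed names (Color, Obj, and TriColorInvariant are hypothetical and are not JVM internals); it only tests whether any black object directly references a white one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a heap object, for illustration only.
enum Color { WHITE, GRAY, BLACK }

final class Obj {
    final String name;
    Color color = Color.WHITE;                 // every object starts white
    final List<Obj> refs = new ArrayList<>();  // objects this one directly references
    Obj(String name) { this.name = name; }
}

final class TriColorInvariant {
    // Returns true if no black object directly references a white object.
    static boolean holds(List<Obj> heap) {
        for (Obj o : heap) {
            if (o.color != Color.BLACK) {
                continue;                      // only black objects are constrained
            }
            for (Obj child : o.refs) {
                if (child.color == Color.WHITE) {
                    return false;              // a mark was missed
                }
            }
        }
        return true;
    }
}
```

  A black A that references a gray B passes the check; if B were white instead, holds would return false.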

  The general flow of the algorithm is as follows (all objects start out white):

  1. Start the traversal from the GC Roots: every object they directly reference turns gray, and the GC Roots themselves turn black. Imagine a queue that holds gray objects; each of these gray objects is put into it.
  2. Take one gray object from the queue and analyze it: turn every object it directly references gray and put those objects in the queue, then turn the object itself black. If the gray object has no direct references, it turns black right away.
  3. Keep taking gray objects from the queue and analyzing them as in step 2 until the gray queue is empty.
  4. Objects that are still white once the analysis is complete are unreachable and can be reclaimed as garbage.
  5. Finally, reset the mark state.
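
  The steps above can be sketched in Java as follows. This is an illustrative model, not how any JVM stores colors: the Obj class, the Color enum, and the explicit gray queue are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative model only: Obj, Color and the explicit gray queue are assumptions.
final class TriColorMark {
    enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        final String name;
        Color color = Color.WHITE;                 // all objects start white
        final List<Obj> refs = new ArrayList<>();  // direct references
        Obj(String name) { this.name = name; }
    }

    // Turns a white object gray and queues it; gray and black objects are left alone.
    static void shade(Obj o, Deque<Obj> grayQueue) {
        if (o.color == Color.WHITE) {
            o.color = Color.GRAY;
            grayQueue.add(o);
        }
    }

    // Runs steps 1 to 4 and returns the objects that are still white (garbage).
    static List<Obj> mark(List<Obj> roots, List<Obj> heap) {
        Deque<Obj> grayQueue = new ArrayDeque<>();
        for (Obj root : roots) {                   // step 1
            for (Obj child : root.refs) {
                shade(child, grayQueue);
            }
            root.color = Color.BLACK;
        }
        while (!grayQueue.isEmpty()) {             // steps 2 and 3
            Obj current = grayQueue.poll();
            for (Obj child : current.refs) {
                shade(child, grayQueue);
            }
            current.color = Color.BLACK;
        }
        List<Obj> garbage = new ArrayList<>();     // step 4
        for (Obj o : heap) {
            if (o.color == Color.WHITE) {
                garbage.add(o);
            }
        }
        return garbage;
    }

    // Step 5: reset the mark state so the next cycle starts from all white.
    static void reset(List<Obj> heap) {
        for (Obj o : heap) {
            o.color = Color.WHITE;
        }
    }
}
```

  For instance, with one root that references A, A referencing B, and an object G that nothing references, mark returns a list containing only G.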

  The description so far is fairly abstract, so here is an example. Assume the following reference graph: the GC Roots directly reference A, B, and E; A references C and D; B references F; and G is not reachable from any GC Root.
  First, the GC Roots' direct references (A, B, E) turn gray and go into the queue, and the GC Roots turn black.
  Next, take a gray object from the queue, say A. Its direct references C and D turn gray and are queued, and A turns black.
  Take the next gray object, B. Its direct reference F turns gray and is queued, and B turns black.
  Take the next gray object, E. E references nothing, so it turns black right away.
  In the same way, C, D, and F are taken out in turn. None of them references anything, so each turns black.
  The analysis is now finished. Only G is still white, which shows that it is unreachable garbage and can be reclaimed.

2. Problems caused by concurrent marking

  If the whole marking process ran under STW (stop-the-world), there would be no problem. During concurrent marking, however, user threads keep running and can change object references, which leads to two problems.

2.1 Non-garbage becomes garbage

  For example, go back to the point in the walkthrough above where E has just been marked black.
  Being black means E is considered live and will not be reclaimed. Now a user thread breaks the link between GC Root2 and E (for example, xx.e = null;).
  E is now garbage, but because it is already black it will not be reclaimed in this cycle. Objects like this are also called floating garbage.

2.2 Garbage becomes non-garbage

  Floating garbage like this does little harm: if it is not reclaimed this time, the next GC will reclaim it, just like the floating garbage produced during the concurrent sweep phase. Garbage turning into non-garbage is far more serious. For example, go back to the point in the walkthrough above where A has just turned black: A is black, B, E, C, and D are gray and waiting in the queue, and F is still white.
  The next marking step would take B from the queue, but at this moment the GC thread's time slice runs out and the operating system schedules a user thread. The user thread first executes A.f = F; so A now references F as well.
  It then executes B.f = null; so B no longer references F, and F is referenced only by A.
  The user thread is done and the GC thread resumes where it left off. It takes B from the queue, finds that B has no direct references, and turns B black.
  It then takes the three remaining gray objects E, C, and D from the queue. None of them references anything, so they all turn black.
  All gray objects have now been processed, and the problem is plain: a black object (A) directly references a white object (F). F is white, but is it garbage? Clearly not. If F is reclaimed as garbage, it is game over for the program.
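
  The interleaving described in this section can be replayed in a small single-threaded Java simulation. It is only a sketch under assumed names (MissedMarkDemo, Obj, scanNext); the "user thread" is just code that runs between two marking steps, and the field f is modeled as an entry in a list of references.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Single-threaded simulation; the "user thread" is code run between two GC steps.
final class MissedMarkDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        final String name;
        Color color = Color.WHITE;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    static void shade(Obj o, Deque<Obj> gray) {
        if (o.color == Color.WHITE) {
            o.color = Color.GRAY;
            gray.add(o);
        }
    }

    // One marking step: scan the next gray object and turn it black.
    static void scanNext(Deque<Obj> gray) {
        Obj current = gray.poll();
        for (Obj child : current.refs) {
            shade(child, gray);
        }
        current.color = Color.BLACK;
    }

    // Returns true if marking ends with black A referencing white F.
    static boolean run() {
        Obj root = new Obj("GC Root");
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C");
        Obj d = new Obj("D"), e = new Obj("E"), f = new Obj("F");
        root.refs.addAll(List.of(a, b, e));
        a.refs.addAll(List.of(c, d));
        b.refs.add(f);

        Deque<Obj> gray = new ArrayDeque<>();
        for (Obj child : root.refs) {
            shade(child, gray);      // A, B, E turn gray
        }
        root.color = Color.BLACK;
        scanNext(gray);              // A turns black; C and D turn gray

        // The user thread runs here, before B is scanned.
        a.refs.add(f);               // A.f = F
        b.refs.remove(f);            // B.f = null

        while (!gray.isEmpty()) {
            scanNext(gray);          // B, E, C, D turn black
        }
        return a.color == Color.BLACK && a.refs.contains(f) && f.color == Color.WHITE;
    }
}
```

  Calling MissedMarkDemo.run() returns true: F is still referenced by A but was never visited, so a sweep based on this marking would reclaim a live object.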

3. Incremental update and original snapshot (SATB)

  The two problems above can be summarized by their results:

  • An object that should be garbage is considered non-garbage
  • An object that shouldn't be garbage is considered garbage

  As noted earlier, the first problem is harmless even if nothing is done about it: at worst the object is reclaimed in the next GC. The second problem is the important one. Reclaiming an object that is still in use is a real bug. So how is it solved?
  The root cause is that an object that was referenced by B is now referenced by A instead. A has gained a direct reference and B has lost one, and the problem can be attacked from either side. The two corresponding solutions are incremental update and the original snapshot (SATB, Snapshot At The Beginning).

3.1 Read and write barriers

  Before describing the solutions, two terms need introducing: read barrier and write barrier. Note that these barriers are not the same thing as the barriers used in concurrent programming. The idea here is simple: a piece of code is inserted before or after a read or write operation to record some information or save some data. The concept is similar to AOP.
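
  To make the idea concrete, here is a minimal Java sketch of what a reference write conceptually becomes once barriers are attached. All names (BarrierDemo, Holder, preWriteBarrier, postWriteBarrier) are hypothetical; a real virtual machine emits such code itself rather than exposing it as methods, and here the barriers only append to a log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: the barriers here only log what they see.
final class BarrierDemo {
    static final List<String> log = new ArrayList<>();

    static final class Holder {
        Object field;
    }

    // Runs before the write and can still see the value that is about to be overwritten.
    static void preWriteBarrier(Holder holder, Object oldValue) {
        log.add("pre: old=" + oldValue);
    }

    // Runs after the write and sees the value that was just stored.
    static void postWriteBarrier(Holder holder, Object newValue) {
        log.add("post: new=" + newValue);
    }

    // What "holder.field = newValue" conceptually turns into.
    static void writeField(Holder holder, Object newValue) {
        preWriteBarrier(holder, holder.field);
        holder.field = newValue;
        postWriteBarrier(holder, newValue);
    }
}
```

  The pre-write hook is the natural place to record a reference that is about to disappear, and the post-write hook is the natural place to record a reference that was just added.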

3.2 Incremental update

  Incremental update attacks the problem from the side of the object that gained a reference (A in the example). A write barrier is attached to the assignment, and the barrier records the new reference. For example, when a user thread executes A.f = F; the write barrier records the newly added reference. Put more formally: when a black object gains a reference to a white object, the write barrier records that reference. In the remark phase, the black objects in those records are used as roots and scanned again, so that no mark is missed.
  In our example, during concurrent marking A is black, F is white, and A gains a reference to F, so this reference is recorded. In the remark phase the traversal is run again starting from A. If A still references F, F is marked correctly. If the reference from A to F was removed again during concurrent marking, the traversal will not reach F, and F deserves to be reclaimed.
  The implementation is also simple. In the remark phase, A (and any other object in the same situation) is turned gray again and put in the queue, and the traversal is repeated. If user threads kept running during the remark phase, the GC might never finish, so remarking requires STW, although the pause is usually not excessive. If the remark phase takes too long, you can try running a Minor GC before remarking. This was covered in the article on the CMS garbage collector, so I won't repeat it here.
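
  The following Java sketch replays the A/B/F scenario with an incremental-update style write barrier. It is a simplified model under assumed names (IncrementalUpdateDemo, writeRef, remark, and the dirty list are all hypothetical); a real collector records such writes in a far more compact form.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified model: the "dirty" list stands in for whatever a real collector records.
final class IncrementalUpdateDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        Color color = Color.WHITE;
        final List<Obj> refs = new ArrayList<>();
    }

    final Deque<Obj> gray = new ArrayDeque<>();
    final List<Obj> dirty = new ArrayList<>();   // black objects that gained a white reference

    void shade(Obj o) {
        if (o.color == Color.WHITE) {
            o.color = Color.GRAY;
            gray.add(o);
        }
    }

    void drain() {
        while (!gray.isEmpty()) {
            Obj current = gray.poll();
            for (Obj child : current.refs) {
                shade(child);
            }
            current.color = Color.BLACK;
        }
    }

    // "holder.f = target" with an incremental-update write barrier.
    void writeRef(Obj holder, Obj target) {
        holder.refs.add(target);
        if (holder.color == Color.BLACK && target.color == Color.WHITE) {
            dirty.add(holder);                   // remember to rescan this black object
        }
    }

    // Remark phase (STW): recorded black objects turn gray again and are rescanned.
    void remark() {
        for (Obj o : dirty) {
            o.color = Color.GRAY;
            gray.add(o);
        }
        dirty.clear();
        drain();
    }

    // Replays the scenario and returns F's final color.
    static Color run(boolean aDropsFBeforeRemark) {
        IncrementalUpdateDemo gc = new IncrementalUpdateDemo();
        Obj a = new Obj(), b = new Obj(), f = new Obj();
        b.refs.add(f);
        a.color = Color.BLACK;       // A has already been scanned
        gc.shade(b);                 // B is gray and waiting in the queue

        gc.writeRef(a, f);           // A.f = F, the barrier records A
        b.refs.remove(f);            // B.f = null
        gc.drain();                  // concurrent marking ends; F is still white
        if (aDropsFBeforeRemark) {
            a.refs.remove(f);        // A.f = null before the remark phase
        }
        gc.remark();
        return f.color;
    }
}
```

  With run(false), A still references F at remark time, so F ends up black. With run(true), the reference has been removed again, the rescan of A does not reach F, and F stays white and is reclaimed.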

3.3 Original Snapshot (SATB)

  The original snapshot attacks the problem from the side of the object that lost a reference (B in the example). Put simply, a write barrier runs before the assignment (here, setting the field to null) and records the reference that is about to be removed. For example, when a user thread executes B.f = null; the write barrier first records the current value of B.f and only then is the field set to null. The recorded objects can be thought of as the original snapshot.
  What happens after recording? The recorded object is simply treated as live: it is marked black afterwards, so by default it is not considered garbage and is not reclaimed. (Strictly speaking, it still has to be scanned like a gray object before it ends up black, so that whatever it references is kept alive as well.) Two cases follow. In one, F really is not garbage: up to the moment of sweeping there is still at least one reference chain that reaches it, and everything is fine. In the other, F becomes garbage after all. In the example above, if the reference from A to F is also removed, or A itself becomes garbage, then F is floating garbage. As mentioned more than once, floating garbage can be ignored: if it is still garbage at the next GC, it will be reclaimed then.
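
  Here is the same scenario as a Java sketch with a SATB-style pre-write barrier. The names (SatbDemo, clearRef) are hypothetical, and the sketch makes two simplifying assumptions: the barrier shades the old value gray immediately instead of buffering it, and normal traversal then turns it black, so that anything it references would be scanned too.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified model: the barrier shades the old value directly instead of buffering it.
final class SatbDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        Color color = Color.WHITE;
        final List<Obj> refs = new ArrayList<>();
    }

    final Deque<Obj> gray = new ArrayDeque<>();

    void shade(Obj o) {
        if (o.color == Color.WHITE) {
            o.color = Color.GRAY;
            gray.add(o);
        }
    }

    void drain() {
        while (!gray.isEmpty()) {
            Obj current = gray.poll();
            for (Obj child : current.refs) {
                shade(child);
            }
            current.color = Color.BLACK;
        }
    }

    // "holder.f = null" with a SATB pre-write barrier: the old value is recorded
    // (here: shaded gray) before the reference is removed.
    void clearRef(Obj holder, Obj oldTarget) {
        if (holder.refs.contains(oldTarget)) {
            shade(oldTarget);                 // keep the object from the snapshot alive
            holder.refs.remove(oldTarget);
        }
    }

    // Replays the scenario and returns F's final color.
    static Color run(boolean aAlsoDropsF) {
        SatbDemo gc = new SatbDemo();
        Obj a = new Obj(), b = new Obj(), f = new Obj();
        b.refs.add(f);
        a.color = Color.BLACK;        // A has already been scanned
        gc.shade(b);                  // B is gray and waiting in the queue

        a.refs.add(f);                // A.f = F, there is no old value to record
        gc.clearRef(b, f);            // B.f = null, the barrier records F first
        if (aAlsoDropsF) {
            gc.clearRef(a, f);        // F is now unreachable: floating garbage
        }
        gc.drain();
        return f.color;
    }
}
```

  Both run(false) and run(true) return BLACK. In the first case F is genuinely live; in the second it is unreachable but survives this cycle as floating garbage.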

3.4 Scheme selection

  Comparing the two approaches (in theory), the original snapshot is more efficient than incremental update because it does not need to traverse the object graph again in the remark phase, but it may produce more floating garbage. G1 uses the original snapshot, and CMS uses incremental update.
  If the original snapshot can leave more floating garbage, why does G1 not use incremental update? The reason is probably just that the original snapshot is simpler and cheaper. G1 is still a generational collector with young and old generations, but the generations are only logical: the memory it manages is divided into many regions, so cross-generation references are a bigger concern for G1 than for a traditional generational collector. The Remembered Set helps, but traversing again in the remark phase would still be comparatively expensive. Most importantly, the remark (final marking) phase is STW. If it spent too long on reachability analysis, that would go against G1's low-latency goal. This is only the author's guess; readers with better explanations are welcome to share them.

4. Summary

  One point to note: the remark phase runs under STW to guarantee that the marking result is correct (mainly, that no marks are missed). By now you can probably see what the garbage collector article meant by saying that garbage produced during the concurrent sweep phase is treated as floating garbage and left for the next GC. What is actually going on is simple: objects created during the concurrent sweep phase are treated as black, so none of them is garbage. If such an object later becomes garbage, it is floating garbage; if it does not, marking it black was correct anyway. Marking has already finished by the time sweeping starts, so there is no good way to handle these objects more precisely, and trying to do so could mean a GC that never ends.
  You may still have one question about missed marks. During concurrent marking, besides existing references changing, a user thread can create a new object. The new object is white by default; what if it is referenced directly by a black object? In other words, the white object may have been "transferred" from another object's reference chain, or it may be brand new. For a new object attached to a black node the original snapshot cannot help, but incremental update can. Alternatively, it can be handled as in the concurrent sweep phase: objects created during this period are considered live (for example, marked black). If one of them becomes garbage, it is floating garbage and is left for the next GC. In short, the guiding principle of marking is: "Better to let some garbage go than to kill a live object by mistake."
  The colors black, white, and gray are an abstraction. Garbage collectors based on reachability analysis generally follow the three-color idea, but they may differ in implementation, such as how colors are represented and how the gray queue is implemented. Three-color marking is not specific to Java either; Go's GC also implements it. I personally think that for an ordinary developer it is enough to understand the idea. To see a specific implementation, you need to read the source code of the collector in question.
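
  The "treat new objects as live" option can be sketched in a few lines of Java. This is a hypothetical allocator model (AllocateBlackDemo and its methods are made-up names), not how any particular collector tracks new objects.

```java
// Hypothetical allocator model; real collectors track new objects differently.
final class AllocateBlackDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        final Color color;
        Obj(Color color) { this.color = color; }
    }

    private boolean markingInProgress = false;

    void startConcurrentMarking() { markingInProgress = true; }

    void finishMarking() { markingInProgress = false; }

    // Objects created while marking is running are treated as live for this cycle.
    Obj allocate() {
        return new Obj(markingInProgress ? Color.BLACK : Color.WHITE);
    }
}
```

  An object allocated between startConcurrentMarking() and finishMarking() is born black, so it can never be the white target of a black object; if it dies quickly, it is floating garbage until the next cycle.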

If you find an error, please point it out. Thanks!

Origin blog.csdn.net/huangzhilin2015/article/details/115282572