A thorough understanding of JVM garbage collection: understanding the important concepts (9)

Root node enumeration

The nodes that can serve as GC Roots are found mainly in global references (such as constants or static fields) and in execution contexts (such as the local variable tables in stack frames). Although the target is clear, finding these nodes efficiently is not easy. To date, every collector has to pause user threads during root node enumeration, so this step inevitably faces the "Stop The World" problem. The most time-consuming part of reachability analysis, tracing the reference chains, can now run concurrently with user threads (as in CMS), but root node enumeration must still be carried out on a snapshot that guarantees consistency. Consistency here means that the execution subsystem appears frozen at a single point in time for the whole enumeration, so that neither the root set nor the reference relationships of its nodes change during the analysis. If this cannot be guaranteed, the accuracy of the analysis results cannot be guaranteed either. This is one of the important reasons why garbage collection must pause all user threads. Even collectors such as CMS, G1 and ZGC, which claim controllable or almost no pause time, must pause for root node enumeration.
At present, mainstream Java virtual machines use exact garbage collection, so once user threads are stopped the virtual machine does not need to inspect every execution context and global reference location: it should have a way to know directly where object references are stored. HotSpot's solution uses a set of data structures called OopMap (OOP stands for Ordinary Object Pointer). Once class loading completes, HotSpot computes what type of data sits at each offset within an object, and during just-in-time compilation it also records which locations in the stack and in registers hold references. The collector can then read this information directly while scanning, without actually searching from GC Roots such as the method area.
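
The idea behind OopMap can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not HotSpot's data structure: the class name OopMapSketch, the long[] frame and the BitSet layout are all hypothetical. It shows the point made above, that the collector visits only the slots already recorded as references instead of inspecting every slot.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class OopMapSketch {
    // Hypothetical stack frame: raw slot values, some of which are references.
    final long[] slots;
    // Hypothetical map recorded ahead of time: bit i is set if slot i holds a reference.
    final BitSet referenceSlots = new BitSet();

    OopMapSketch(long[] slots, int... referenceSlotIndexes) {
        this.slots = slots;
        for (int i : referenceSlotIndexes) {
            referenceSlots.set(i);
        }
    }

    // Enumerates roots by reading only the slots the map marks as references.
    List<Long> enumerateRoots() {
        List<Long> roots = new ArrayList<>();
        for (int i = referenceSlots.nextSetBit(0); i >= 0; i = referenceSlots.nextSetBit(i + 1)) {
            roots.add(slots[i]);
        }
        return roots;
    }
}
```

With slots {42, 0x1000, 7, 0x2000} and a map marking slots 1 and 3, only 0x1000 and 0x2000 are reported as roots; the plain integers 42 and 7 are never examined.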

Safepoint

With the help of OopMap, HotSpot can complete GC Roots enumeration quickly and accurately, but a very real problem follows: a great many instructions can change reference relationships, that is, change the content of the OopMap. If an OopMap were generated for every such instruction, a large amount of extra memory would be needed to store them.
In fact, HotSpot does not generate an OopMap for every instruction; it records this information only at specific locations, which are called safepoints. Because of safepoints, a user program cannot stop for garbage collection at an arbitrary position in its instruction stream: it must run to a safepoint before it can be paused. Safepoints must therefore be neither so few that the collector waits too long, nor so many that they add too much runtime memory load. Safepoint locations are chosen by asking whether an instruction "lets the program run for a long time". Each individual instruction executes very quickly, so a program is unlikely to run for a long time merely because its instruction stream is long. The most obvious characteristic of "long-running execution" is the reuse of instruction sequences, such as method calls, loop jumps and exception jumps, so only instructions with these functions produce safepoints.
When garbage collection occurs, how do we make all threads (excluding those executing JNI [Java Native Interface] calls) run to the nearest safepoint and then pause? There are two options:
(1) Preemptive suspension: this needs no cooperation from the code the thread is executing. When garbage collection occurs, the system first interrupts all user threads. If a thread turns out to have been interrupted somewhere other than a safepoint, it is resumed and allowed to run to the nearest safepoint. Almost no virtual machine now uses preemptive suspension to pause threads in response to GC events.
(2) Voluntary suspension: when garbage collection needs to interrupt user threads, it does not operate on the threads directly. It simply sets a flag, and each thread actively polls this flag while it runs. Once a thread finds that the flag is true, it suspends itself at the nearest safepoint. The polling locations coincide with the safepoints. In addition, polling is added wherever objects are created or other memory must be allocated on the Java heap, so that the thread can check whether a collection is about to occur and avoid running out of memory for new objects.
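
Voluntary suspension can be sketched as a loop that polls a flag at every back-edge. This is a single-threaded illustration, not how HotSpot implements its poll: the class SafepointPollSketch, the suspendRequested flag and the requestAt parameter (which simulates the moment the VM raises the flag) are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SafepointPollSketch {
    // Hypothetical flag that the VM sets when it wants user threads to stop.
    static final AtomicBoolean suspendRequested = new AtomicBoolean(false);

    // Runs up to maxIterations loop iterations and polls the flag at every
    // loop back-edge. requestAt simulates the iteration during which the VM
    // raises the flag (use -1 for "never"). Returns the number of iterations
    // completed before the thread would suspend itself.
    static int runLoop(int maxIterations, int requestAt) {
        suspendRequested.set(false);
        int completed = 0;
        for (int i = 0; i < maxIterations; i++) {
            if (i == requestAt) {
                suspendRequested.set(true);   // the "VM" asks for a pause
            }
            completed++;                      // stand-in for the loop body's real work
            if (suspendRequested.get()) {     // safepoint poll at the back-edge
                break;                        // a real thread would block here until GC ends
            }
        }
        return completed;
    }
}
```

The thread never stops in the middle of an iteration: it finishes the work in progress and only then reaches the poll, which is why the poll location and the safepoint are the same place.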

Safe region

Safepoints seem to solve perfectly the problem of how to pause user threads and let the virtual machine enter the garbage collection state, but this is not necessarily so. The safepoint mechanism guarantees that an executing program will, within a fairly short time, reach a safepoint where it can enter the garbage collection process. But what about a program that is "not executing"? Not executing means that no processor time is allocated to it; the typical case is a user thread in the Sleep or Blocked state. Such a thread cannot respond to the virtual machine's interrupt request and cannot walk to a safe place to suspend itself, and the virtual machine obviously cannot wait for the thread to be reactivated and given processor time. This case is handled with safe regions.
While a user thread is executing code inside a safe region, reference relationships do not change, so it is safe to start garbage collection anywhere in that region. A safe region can be regarded as a stretched-out safepoint.
When a user thread enters a safe region, it first marks itself as being in the safe region, so that if the virtual machine starts a garbage collection during this time it does not have to deal with threads that have declared themselves to be in a safe region. When the thread is about to leave the safe region, it checks whether the virtual machine has finished root node enumeration (or any other phase of garbage collection that requires pausing user threads). If so, the thread simply continues executing; otherwise it must wait until it receives a signal that it may leave the safe region.
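
The enter and leave protocol described above can be written down as a small state machine. This is a single-threaded sketch with hypothetical names (SafeRegionSketch, beginPause, tryLeaveSafeRegion); a real virtual machine would block the thread rather than return false, and would use proper synchronization.

```java
final class SafeRegionSketch {
    private boolean inSafeRegion = false;
    // Hypothetical: true while root enumeration (or another pausing phase) is running.
    private boolean pauseInProgress = false;

    // Thread side: declare that this thread is now inside a safe region.
    void enterSafeRegion() { inSafeRegion = true; }

    // VM side: start and finish a phase that requires user threads to be paused.
    void beginPause() { pauseInProgress = true; }
    void endPause() { pauseInProgress = false; }

    // VM side: a thread that has declared itself in a safe region need not be waited for.
    boolean vmMustWaitForThread() { return !inSafeRegion; }

    // Thread side: returns true if the thread may leave now,
    // false if it has to keep waiting for the pause to end.
    boolean tryLeaveSafeRegion() {
        if (pauseInProgress) {
            return false;
        }
        inSafeRegion = false;
        return true;
    }
}
```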

Remembered set and card table

To solve the problem of cross-generational references in generational collection, the garbage collector builds a data structure called the remembered set in the young generation, to avoid adding the entire old generation to the GC Roots scanning range. In fact, cross-generational references are not only a problem between the young and old generations. Every collector that performs partial (region-based) collection, such as G1, ZGC and Shenandoah, faces the same problem.
A remembered set is an abstract data structure that records the set of pointers from the non-collected area into the collected area. If efficiency and cost are ignored, the simplest implementation is an array of all objects in the non-collected area that contain cross-generational references. Recording every such object in this way is expensive in both space and maintenance cost. In a garbage collection scenario, the collector only needs the remembered set to tell it whether some non-collected area contains pointers into the collected area; it does not need every detail of those cross-generational pointers. The designer can therefore choose a coarser recording granularity to save on storage and maintenance. Some of the available recording precisions are listed below (others outside this range are possible too):
(1) Word precision: each record is precise to one machine word (the processor's addressing width, commonly 32 or 64 bits, which determines the length of the pointers the machine uses to access physical memory addresses); the word contains a cross-generational pointer.
(2) Object precision: each record is precise to one object; some field in the object contains a cross-generational pointer.
(3) Card precision: each record is precise to one memory region; some object in the region contains a cross-generational pointer.

Of these, card precision implements the remembered set with a card table, which is also the most common form of remembered set implementation. The remembered set is an abstract data structure, and the card table is one concrete implementation of it, which defines the recording precision, the mapping to heap memory, and so on. In its simplest form a card table can be just a byte array, and this is exactly what the HotSpot virtual machine does. The following code is HotSpot's default card-marking logic:

CARD_TABLE[this address >> 9] = 0;

Each element of the byte array CARD_TABLE corresponds to a memory block of a specific size in the memory area it covers. This block is called a card page. In general the card page size is a power of two bytes (2^N), and the code above shows that HotSpot uses 2^9, that is, 512 bytes (shifting the address right by 9 bits is equivalent to dividing it by 512). If the memory covered by the card table starts at address 0x0000, then elements 0, 1 and 2 of the CARD_TABLE array correspond to the card pages at 0x0000 ~ 0x01FF, 0x0200 ~ 0x03FF and 0x0400 ~ 0x05FF, as shown in the figure:
Card Table and Card Page
A card page usually contains more than one object. As long as a field of one (or more) objects in the card page holds a cross-generational pointer, the corresponding card table element is set to 1 and the element is called dirty; otherwise it is 0. (This description uses 1 for dirty, whereas the HotSpot snippet above writes 0 as its dirty value; only the distinction between dirty and clean matters.) When garbage collection occurs, filtering out the dirty elements of the card table is enough to find which card pages contain cross-generational pointers, and those pages are added to the GC Roots to be scanned.
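
The address arithmetic and the dirty-card scan can be checked with a minimal sketch. The class CardTableSketch is hypothetical, the heap is assumed to start at address 0, the dirty value 0 follows the snippet above, and the clean value -1 is an assumption chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CardTableSketch {
    static final int CARD_SHIFT = 9;   // 2^9 = 512-byte card pages, as in the text
    static final byte DIRTY = 0;       // dirty value written by the snippet above
    static final byte CLEAN = -1;      // assumed clean value for this sketch

    final byte[] cards;

    // heapBytes is the size of a hypothetical heap that starts at address 0.
    CardTableSketch(int heapBytes) {
        cards = new byte[(heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT];
        Arrays.fill(cards, CLEAN);
    }

    // Maps a heap address to the index of the card that covers it.
    static int cardIndex(long address) {
        return (int) (address >>> CARD_SHIFT);
    }

    // Marks the card covering the given field address as dirty.
    void markDirty(long fieldAddress) {
        cards[cardIndex(fieldAddress)] = DIRTY;
    }

    // Returns the indexes of all dirty cards, i.e. the pages the collector must scan.
    List<Integer> dirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                dirty.add(i);
            }
        }
        return dirty;
    }
}
```

Addresses 0x01FF, 0x0200 and 0x05FF map to cards 0, 1 and 2, matching the ranges given above, and after markDirty(0x0410) the only dirty card in a 2048-byte heap is card 2.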

Write barrier

The remembered set shrinks the GC Roots scanning range, but we have not yet solved the problem of maintaining the card table elements: when do they become dirty, and who makes them dirty?
When a card table element becomes dirty is clear: when an object in another generation's area references an object in this area, the corresponding card table element should become dirty, and in principle this should happen at the moment the reference-type field is assigned. The question is how to make it dirty, that is, how to update the card table at the moment of assignment. If bytecode is being interpreted, this is relatively easy to handle: the virtual machine executes every bytecode instruction and has plenty of room to intervene. In compiled execution, however, the just-in-time compiled code is already a pure stream of machine instructions, so a mechanism at the machine-code level is needed to place card-table maintenance into every assignment.
In the HotSpot virtual machine, the state of the card table is maintained by means of write barriers. A write barrier can be seen as an AOP aspect, at the virtual machine level, around the action "reference-type field assignment". When a reference is assigned, an around advice is generated so that the program can perform extra actions; in other words, both before and after the assignment fall within the write barrier's coverage. The part of the write barrier before the assignment is called the pre-write barrier, and the part after it is called the post-write barrier. Before the G1 collector appeared, the other collectors in HotSpot used only post-write barriers. The following code shows a post-write barrier that updates the card table:

void oop_field_store(oop* field, oop new_value) {
    // Reference-type field assignment
    *field = new_value;
    // Post-write barrier: update the card table here
    post_write_barrier(field, new_value);
}

Once write barriers are applied, the virtual machine generates the corresponding instructions for every assignment. Once the collector adds a card-table update to the write barrier, every reference update incurs extra overhead, whether or not it stores an old-generation reference to a young-generation object. Even so, this overhead is much lower than the cost of scanning the whole old generation during a Minor GC.
In addition to the overhead of write barriers, the card table also faces the "false sharing" problem under high concurrency. False sharing is a problem that often has to be considered when dealing with low-level concurrency details. The caches of modern CPUs store data in units of cache lines. When multiple threads modify mutually independent variables that happen to share one cache line, they interfere with each other (through write-back, invalidation or synchronization) and performance drops. This is the false sharing problem.
Assume the processor's cache line is 64 bytes. Since one card table element occupies one byte, 64 card table elements share one cache line. The card pages corresponding to these 64 elements cover 32 KB in total (64 * 512 bytes), which means that if the objects updated by different threads lie within the same 32 KB, updating the card table writes to the same cache line and hurts performance. A simple way to avoid false sharing is to make the write barrier conditional: check the card table mark first, and mark the element dirty only when it has not been marked already. The card table update then becomes:
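
To make the shape of the post-write barrier concrete, here is a minimal Java model of oop_field_store. It is a sketch under assumptions, not HotSpot code: the heap is modeled as a long[] with one 8-byte reference slot per element, and the names WriteBarrierSketch, storeReference and postWriteBarrier are hypothetical. The dirty value 0 follows the card-marking snippet earlier, and the clean value -1 is assumed.

```java
import java.util.Arrays;

final class WriteBarrierSketch {
    static final int CARD_SHIFT = 9;   // 512-byte card pages
    static final byte DIRTY = 0;       // dirty value, as in the card-marking snippet
    static final byte CLEAN = -1;      // assumed clean value for this sketch

    final long[] heap;       // hypothetical heap: one 8-byte reference slot per element
    final byte[] cardTable;

    WriteBarrierSketch(int slots) {
        heap = new long[slots];
        cardTable = new byte[((slots * 8) >>> CARD_SHIFT) + 1];
        Arrays.fill(cardTable, CLEAN);
    }

    // Mirrors oop_field_store: perform the store, then run the post-write barrier.
    void storeReference(int slot, long newValue) {
        heap[slot] = newValue;
        postWriteBarrier((long) slot * 8);
    }

    // Unconditionally dirties the card that covers the updated field.
    private void postWriteBarrier(long fieldAddress) {
        cardTable[(int) (fieldAddress >>> CARD_SHIFT)] = DIRTY;
    }
}
```

Every call to storeReference pays for the extra card-table write, even when the stored reference does not cross generations; that is the overhead discussed above.
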

if (CARD_TABLE[this address >> 9] != 0) {
    CARD_TABLE[this address >> 9] = 0;
}

Since JDK 7, the HotSpot virtual machine has provided the parameter -XX:+UseCondCardMark to decide whether this check on card table updates is enabled. Enabling it adds the cost of one extra check but avoids the false sharing problem. Each choice has its own performance cost, so whether to enable it should be decided by testing under the actual workload.
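
The effect of the conditional check, and the 32 KB figure, can be reproduced with a short sketch. The class CondCardMarkSketch and its cardWrites counter are hypothetical and exist only to make the number of real stores visible; the 64-byte cache line is the assumption used in the text, and the clean value -1 is assumed.

```java
import java.util.Arrays;

final class CondCardMarkSketch {
    static final int CARD_SHIFT = 9;          // 512-byte card pages
    static final byte DIRTY = 0;              // dirty value, as in the snippet above
    static final byte CLEAN = -1;             // assumed clean value for this sketch
    static final int CACHE_LINE_BYTES = 64;   // assumed cache line size, as in the text

    final byte[] cardTable = new byte[128];
    int cardWrites = 0;                       // counts actual stores into the card table

    CondCardMarkSketch() {
        Arrays.fill(cardTable, CLEAN);
    }

    // Conditional card mark: read first, write only if the card is not dirty yet.
    void markConditionally(long address) {
        int index = (int) (address >>> CARD_SHIFT);
        if (cardTable[index] != DIRTY) {
            cardTable[index] = DIRTY;
            cardWrites++;
        }
    }

    // Heap bytes whose card table elements share one cache line.
    static long heapBytesPerCacheLine() {
        return (long) CACHE_LINE_BYTES << CARD_SHIFT;
    }
}
```

Marking three addresses in the same card page performs a single store; the other two calls only read. Reads do not invalidate the cache line in other cores, which is why the check helps against false sharing at the price of a branch.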

Concurrent reachability analysis

The garbage collectors of today's mainstream programming languages basically all rely on reachability analysis to decide whether an object is alive. In theory, reachability analysis requires the whole process to run on a snapshot that guarantees consistency, which means user threads must be frozen throughout. In the root node enumeration step, GC Roots are few compared with all the objects in the heap, and with optimizations such as OopMap the pause it causes is already very short and relatively fixed (it does not grow with heap capacity). Traversing the object graph downward from the GC Roots is different: the pause time of this step is necessarily proportional to the capacity of the Java heap. The larger the heap, the more objects it stores and the more complex the object graph, so marking more objects naturally takes longer.
The "marking" phase is common to all tracing garbage collection algorithms. If its pause grows in proportion to the heap, almost every garbage collector is affected; conversely, if the pause in this part can be reduced, the benefit is system-wide.
To eliminate or reduce user thread pauses, we first need to understand why the object graph must be traversed on a consistent snapshot. To help with the derivation we introduce tri-color marking as a tool. During traversal, the objects in the object graph are given one of the following three colors according to whether they have been visited:
(1) White: the object has not yet been visited by the garbage collector. Obviously, all objects are white at the start of reachability analysis. An object that is still white at the end of the analysis is unreachable.
(2) Black: the object has been visited by the garbage collector, and all of the references it holds have been scanned. A black object has been fully scanned and is safely alive. If other object references point to a black object, it does not need to be rescanned. A black object cannot point directly (without going through a gray object) to a white object.
(3) Gray: the object has been visited by the garbage collector, but at least one of the references it holds has not yet been scanned.
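
The three colors map directly onto a worklist traversal. The following sketch marks a small object graph with user threads assumed to be frozen; the class TriColorMarkSketch and the int[][] adjacency representation are illustrative assumptions rather than any collector's real data structures.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class TriColorMarkSketch {
    enum Color { WHITE, GRAY, BLACK }

    // graph[o] lists the objects that object o references; roots are the GC Roots.
    static Color[] mark(int[][] graph, int[] roots) {
        Color[] color = new Color[graph.length];
        Arrays.fill(color, Color.WHITE);            // every object starts white
        Deque<Integer> gray = new ArrayDeque<>();
        for (int r : roots) {
            if (color[r] == Color.WHITE) {
                color[r] = Color.GRAY;              // visited, references not yet scanned
                gray.push(r);
            }
        }
        while (!gray.isEmpty()) {
            int o = gray.pop();
            for (int target : graph[o]) {
                if (color[target] == Color.WHITE) {
                    color[target] = Color.GRAY;
                    gray.push(target);
                }
            }
            color[o] = Color.BLACK;                 // all of o's references are scanned
        }
        return color;                               // objects still WHITE are unreachable
    }
}
```

For the graph 0 -> 1 -> 2 with an extra object 3 that points to 2 but is not reachable from the root 0, objects 0, 1 and 2 end up black and object 3 stays white.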

During reachability analysis, if user threads are frozen and only collector threads are working, there is no problem. But if user threads and the collector work concurrently, the collector is coloring objects while user threads are modifying reference relationships, that is, the structure of the object graph. This can have two consequences. One is that objects that have already died are wrongly marked as alive. This is undesirable but tolerable: it only produces some floating garbage that escapes this collection and is cleaned up next time. The other is that objects that are alive are wrongly marked as dead. This is fatal, and the program will certainly fail. The following figure is a schematic of how such an error arises during concurrent marking:
Figure: how a live object is lost during concurrent marking
The "object disappearance" problem, in which an object that should have been black is wrongly left white, occurs only when both of the following conditions hold at the same time:
(1) The mutator inserts one or more new references from black objects to the white object.
(2) The mutator deletes all direct or indirect references from gray objects to that white object.
So to solve object disappearance during concurrent scanning, we only need to break either one of these two conditions. This gives two solutions: incremental update and snapshot at the beginning (SATB).
Incremental update breaks the first condition. When a black object gains a new reference to a white object, the newly inserted reference is recorded; after the concurrent scan ends, the black objects in these recorded references are used as roots and scanned again. Put simply, once a black object gains a new reference to a white object, it turns back into a gray object.
Snapshot at the beginning breaks the second condition. When a gray object is about to delete a reference to a white object, the reference to be deleted is recorded; after the concurrent scan ends, the gray objects in these recorded references are used as roots and scanned again. Put simply, whether or not the reference is deleted, the search proceeds according to the snapshot of the object graph taken at the moment scanning began.
Whether the record is of an inserted or a deleted reference, the virtual machine performs the recording through write barriers. Both incremental update and SATB are used in practice in the HotSpot virtual machine: CMS does concurrent marking based on incremental update, while G1 and Shenandoah are implemented with SATB.
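
The disappearance scenario and both fixes can be replayed in a small simulation. This is a deliberately simplified, single-threaded sketch with hypothetical names (ConcurrentMarkSketch, insertRef, deleteRef): instead of logging references and rescanning them after the concurrent phase, the barriers push the affected object straight back onto the gray stack, which corresponds to the "simplified understanding" given above. Real collectors keep separate logs and process them in a later pause.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class ConcurrentMarkSketch {
    enum Color { WHITE, GRAY, BLACK }
    enum Barrier { NONE, INCREMENTAL_UPDATE, SATB }

    final List<List<Integer>> refs = new ArrayList<>();  // refs.get(o): objects referenced by o
    final Color[] color;
    final Deque<Integer> grayStack = new ArrayDeque<>();
    final Barrier barrier;

    ConcurrentMarkSketch(int objects, Barrier barrier) {
        this.barrier = barrier;
        color = new Color[objects];
        Arrays.fill(color, Color.WHITE);
        for (int i = 0; i < objects; i++) refs.add(new ArrayList<>());
    }

    void shade(int o) {                        // white -> gray
        if (color[o] == Color.WHITE) { color[o] = Color.GRAY; grayStack.push(o); }
    }

    boolean step() {                           // collector: scan one gray object
        if (grayStack.isEmpty()) return false;
        int o = grayStack.pop();
        for (int target : refs.get(o)) shade(target);
        color[o] = Color.BLACK;
        return true;
    }

    void insertRef(int from, int to) {         // mutator: from.field = to
        refs.get(from).add(to);
        if (barrier == Barrier.INCREMENTAL_UPDATE && color[from] == Color.BLACK) {
            color[from] = Color.GRAY;          // black turns back to gray
            grayStack.push(from);
        }
    }

    void deleteRef(int from, int to) {         // mutator: from.field = null
        if (barrier == Barrier.SATB) shade(to);  // keep the snapshot's target alive
        refs.get(from).remove(Integer.valueOf(to));
    }

    // Replays the scenario A -> B -> C and returns the final color of C.
    static Color run(Barrier barrier) {
        final int A = 0, B = 1, C = 2;
        ConcurrentMarkSketch heap = new ConcurrentMarkSketch(3, barrier);
        heap.refs.get(A).add(B);               // graph built before marking starts
        heap.refs.get(B).add(C);
        heap.shade(A);                         // A is the only root
        heap.step();                           // A black, B gray, C white
        heap.insertRef(A, C);                  // condition (1): black -> white inserted
        heap.deleteRef(B, C);                  // condition (2): gray -> white deleted
        while (heap.step()) { }                // collector finishes marking
        return heap.color[C];
    }
}
```

Without a barrier, C is still reachable through A but ends up white, so it would be reclaimed by mistake. With either barrier C ends up black: incremental update rescans A, while SATB keeps C alive because B still pointed to it when marking began.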

Origin blog.csdn.net/Yunwei_Zheng/article/details/105296375