Concepts and Algorithms of Garbage Collection (Chapter 4)

"Practical Java Virtual Machine: JVM Troubleshooting and Performance Optimization (2nd Edition)"

Chapter 4 Concepts and Algorithms of Garbage Collection

Objectives:

  1. Understand what garbage collection is
  2. Learn several commonly used garbage collection algorithms
  3. Master the concept of reachability
  4. Understand Stop-The-World (STW)

4.1. Understanding Garbage Collection - Memory Management Cleaner

In garbage collection (GC for short), "garbage" refers to objects that exist in memory but will never be used again. If a large number of such objects continue to occupy space, problems arise when memory is needed, and the program may run out of memory.
Garbage collection is not unique to the JVM. It was already used in Lisp in the 1960s. Today, besides Java, languages such as C# and Python also use garbage collection.

4.2. Commonly used garbage collection algorithms - cleaning tools compared

Commonly used garbage collection algorithms include reference counting, mark-sweep, copying, mark-compact, generational collection and the partition (region) algorithm.

4.2.1. Reference Counting

1. The most classic and oldest garbage collection algorithm.
2. Implementation: For an object A, whenever any object references A, A's reference counter is incremented by 1. When a reference becomes invalid, the counter is decremented by 1. Once the reference counter of A reaches 0, A can no longer be used.
3. Implementation of the reference counter: each object is simply given an integer counter.
4. Problems: there are two serious ones:
(1) It cannot handle circular references. For this reason, the algorithm is not used in Java's garbage collectors.
(2) Every time a reference is created or removed, the reference counter requires an addition or a subtraction, which has some impact on system performance.

A simple circular reference looks like this: there are two objects, A and B. A holds a reference to B, and B holds a reference to A, so neither reference counter is 0. However, no third object in the system references A or B. In other words, A and B are garbage and should be reclaimed, but because they refer to each other, a reference-counting collector cannot identify them, which causes a memory leak.
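The circular reference problem can be reproduced with a toy model. The sketch below is an illustration only: the classes `RcObject` and `RefCountDemo` and their methods are hypothetical names, and the JVM itself does not count references this way.

```java
import java.util.ArrayList;
import java.util.List;

class RefCountDemo {
    // Returns the counters of A and B after every external reference is dropped.
    static int[] cycleCounts() {
        RcObject root = new RcObject("root");
        RcObject a = new RcObject("A");
        RcObject b = new RcObject("B");
        root.addRef(a);     // A: 1
        root.addRef(b);     // B: 1
        a.addRef(b);        // B: 2
        b.addRef(a);        // A: 2
        root.removeRef(a);  // A: 1
        root.removeRef(b);  // B: 1
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        // Neither counter reaches 0, although nothing outside the cycle references A or B.
        System.out.println("A=" + counts[0] + ", B=" + counts[1]);
    }
}

// Toy object with the per-object integer counter described above (hypothetical class).
class RcObject {
    final String name;
    int refCount = 0;
    final List<RcObject> fields = new ArrayList<>();

    RcObject(String name) { this.name = name; }

    // Storing a reference to target increments target's counter.
    void addRef(RcObject target) {
        fields.add(target);
        target.refCount++;
    }

    // Dropping a reference to target decrements target's counter.
    void removeRef(RcObject target) {
        if (fields.remove(target)) {
            target.refCount--;
        }
    }
}
```

Both counters stay at 1, so a collector that reclaims only objects whose counter is 0 would never free A or B.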


4.2.2. Mark-Sweep

1. Mark-sweep is the conceptual basis of modern garbage collection algorithms;
2. Mark-sweep divides garbage collection into two phases: the mark phase and the sweep phase;
3. Implementation: In the mark phase, all objects reachable from the root nodes are marked, so unmarked objects are unreferenced garbage. Then, in the sweep phase, all unmarked objects are cleared;
4. Problem: Space fragmentation may occur;
5. Note: Mark-sweep first marks all reachable objects starting from the root nodes, and then clears all unreachable objects to complete garbage collection.

As shown in Figure 4.2, a contiguous memory space is reclaimed using mark-sweep. Starting from the root nodes (two are shown here), all objects connected by references are marked as live (arrows indicate references). Objects that cannot be reached from the root nodes are garbage. After marking is complete, the system reclaims the space of all unreachable objects.
As Figure 4.2 also shows, the reclaimed space is not contiguous. When allocating heap space for objects, especially large objects, non-contiguous memory is less efficient to work with than contiguous memory. This is the biggest shortcoming of the algorithm.
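The two phases can be sketched over a small object graph. This is a minimal model, assuming a heap represented as a list of `Node` objects (a hypothetical class); it does not model memory addresses, so it cannot show fragmentation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class MarkSweepDemo {
    static class Node {
        final String name;
        boolean marked;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Mark phase: flag every node reachable from the roots.
    static void mark(List<Node> roots) {
        Deque<Node> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.marked) continue;
            n.marked = true;
            for (Node r : n.refs) stack.push(r);
        }
    }

    // Sweep phase: remove unmarked nodes and reset the marks; returns the number freed.
    static int sweep(List<Node> heap) {
        int before = heap.size();
        heap.removeIf(n -> !n.marked);
        for (Node n : heap) n.marked = false;
        return before - heap.size();
    }

    // Builds A -> B (reachable) and C <-> D (an unreachable cycle), then collects.
    static int[] demo() {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        List<Node> heap = new ArrayList<>(List.of(a, b, c, d));
        mark(List.of(a));
        int freed = sweep(heap);
        return new int[] { freed, heap.size() };
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("freed=" + result[0] + ", live=" + result[1]);
    }
}
```

Because liveness is decided by reachability from the roots, the C and D cycle is reclaimed here, unlike under reference counting.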

4.2.3. Copying algorithm (Copying)

1. Core idea: Divide the memory space into two blocks and use only one of them at a time. During garbage collection, copy the surviving objects in the block in use to the unused block, then clear all objects in the block that was in use and swap the roles of the two blocks to complete the collection;
2. Advantages:
(1) High efficiency: If the system contains many garbage objects, the number of surviving objects that the copying algorithm has to copy is relatively small;
(2) No fragmentation: objects are copied together into the new memory space during collection, so the reclaimed memory has no fragments;
3. Disadvantage: the cost of the copying algorithm is that the usable system memory is halved;
4. New generation: heap space for storing young objects. Young objects are objects that have just been created or have been through only a few garbage collections;
5. Old generation: heap space for storing old objects. Old objects are objects that are still alive after multiple garbage collections;
6. Note: The copying algorithm suits the new generation well, because the new generation usually contains more garbage objects than surviving ones, which is when copying works best;

As shown in Figure 4.3, A and B are two memory spaces of the same size. During garbage collection, the surviving objects in A are copied to B, where they are laid out contiguously. After copying is complete, space A is cleared and space B becomes the space currently in use.
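The copy-then-swap step can be sketched with two arrays standing in for spaces A and B. This is a simplified model under stated assumptions: `SemiSpaceDemo` is a hypothetical class, objects are plain strings, and liveness is supplied by a predicate instead of being derived from root nodes.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.function.Predicate;

class SemiSpaceDemo {
    private String[] from;  // block currently in use
    private String[] to;    // idle block of the same size
    private int top = 0;    // next free slot in the block in use

    SemiSpaceDemo(int blockSize) {
        from = new String[blockSize];
        to = new String[blockSize];
    }

    boolean allocate(String obj) {
        if (top == from.length) return false; // the block in use is full
        from[top++] = obj;
        return true;
    }

    // Copies survivors contiguously into the idle block, clears the old block, swaps roles.
    void collect(Predicate<String> isLive) {
        int newTop = 0;
        for (int i = 0; i < top; i++) {
            if (isLive.test(from[i])) {
                to[newTop++] = from[i];
            }
        }
        Arrays.fill(from, null);
        String[] tmp = from;
        from = to;
        to = tmp;
        top = newTop;
    }

    String contents() {
        return String.join(",", Arrays.copyOf(from, top));
    }

    static String demo() {
        SemiSpaceDemo heap = new SemiSpaceDemo(4);
        for (String s : new String[] {"a", "b", "c", "d"}) heap.allocate(s);
        Set<String> live = Set.of("b", "d");
        heap.collect(live::contains);
        return heap.contents();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints b,d
    }
}
```

Only the two survivors are touched during the collection, which is why the approach pays off when most objects are garbage.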
Java's new generation serial garbage collector uses the idea of the copying algorithm. The new generation is divided into three parts: the eden area, the from area and the to area. The from and to areas can be regarded as two spaces of the same size and equal status whose roles are interchangeable, and they are used for copying. The from and to areas are also called survivor areas (survivor spaces) and hold objects that have not been reclaimed.

During garbage collection, the surviving objects in the eden area are copied to the unused survivor area (assume this is the to area), and the young objects in the survivor area in use (assume this is the from area) are also copied to the to area. Large objects and old objects go directly into the old generation, and if the to area is full, objects also go directly into the old generation. At this point, the objects remaining in the eden area and the from area are garbage and can be cleared directly, while the to area holds the objects that survived this collection. This improved copying algorithm ensures that space stays contiguous and also avoids wasting a large amount of memory. Figure 4.4 shows the actual collection process of the copying algorithm: after all surviving objects have been copied to the survivor area (to in the figure), the eden area and the other survivor area (from in the figure) are simply cleared.
Figure 4.4 Improved copying algorithm

4.2.4. Mark-Compact

The efficiency of the copying algorithm rests on the premise that there are few surviving objects and many garbage objects. This is often the case in the new generation, but in the old generation it is more common for most objects to be alive. If the copying algorithm were used there, the cost of copying would be very high because of the large number of surviving objects. Therefore, given the characteristics of old generation garbage collection, a different algorithm is needed.

1. Mark-compact is a collection algorithm for the old generation, derived from mark-sweep with some optimizations;
2. Implementation: First, all reachable objects are marked starting from the root nodes. After that, all live objects are compacted toward one end of the memory. Then all space beyond the boundary is cleared;
3. Advantages:
(1) It avoids fragmentation;
(2) It does not need two identical memory spaces, so it is cost-effective;
4. The final effect is equivalent to defragmenting memory after running mark-sweep, so it is also called mark-sweep-compact;

As shown in Figure 4.5, after all reachable objects have been marked from the root nodes, the objects are moved along the dotted lines so that all reachable objects end up at one end, with the reference relationships between them preserved. Finally, the space beyond the boundary is cleaned up to complete the garbage collection.
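The compaction step can be sketched on an array-based heap. The sketch assumes the mark phase has already filled in the `marked` array, and `MarkCompactDemo` is a hypothetical class. A real collector would also have to update every reference to a moved object, which this model leaves out.

```java
import java.util.Arrays;

class MarkCompactDemo {
    // Slides marked slots toward index 0, preserving their order, and clears the rest.
    // Returns the boundary index: every slot at or beyond it is free.
    static int compact(String[] heap, boolean[] marked) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[boundary] = heap[i];
                boundary++;
            }
        }
        Arrays.fill(heap, boundary, heap.length, null); // clear space beyond the boundary
        Arrays.fill(marked, false);                     // reset marks for the next cycle
        return boundary;
    }

    static String demo() {
        String[] heap = {"A", "x", "B", null, "C", "y"};
        boolean[] marked = {true, false, true, false, true, false};
        int boundary = compact(heap, marked);
        return boundary + ":" + Arrays.toString(heap);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3:[A, B, C, null, null, null]
    }
}
```

After compaction the free space is a single contiguous run starting at the boundary, so no second memory block is needed.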

4.2.5. Generational Collecting

None of the algorithms introduced so far can completely replace the others. Each has its own advantages and characteristics, so the wise choice is to pick an algorithm that suits the characteristics of the objects being collected.

1. The generational algorithm divides memory into several areas according to the characteristics of the objects, and uses a different collection algorithm for each area according to its characteristics, to improve the efficiency of garbage collection;
2. The new generation is better suited to the copying algorithm, because objects in the new generation are short-lived: most die soon after they are created;

Generally speaking, the JVM puts all new objects into a memory area called the new generation. The new generation is characterized by objects that come and go quickly: about 90% of new objects are reclaimed soon after creation, so the copying algorithm suits the new generation well. When an object is still alive after several collections, it is moved into the memory area called the old generation. In the old generation, almost all objects have survived several garbage collections, so they can be expected to stay in memory for some time, possibly for the whole life of the application.

3. The old generation uses mark-compact or mark-sweep:

In extreme cases, the survival rate of old generation objects can reach 100%. Collecting the old generation with the copying algorithm would mean copying a large number of objects. In addition, collecting the old generation reclaims less memory for the effort than collecting the new generation, so this approach is not advisable. Following the generational idea, the old generation can be collected with an algorithm different from the new generation's, namely mark-compact or mark-sweep, to improve the efficiency of garbage collection. Figure 4.6 illustrates this idea of generational collection.

4. Collections of the new generation are very frequent, but each one is short; collections of the old generation are relatively infrequent, but each one takes longer;

To support high-frequency new generation collection, the virtual machine may use a data structure called a card table. The card table is a set of bits, and each bit indicates whether any object in a certain area of the old generation holds a reference to a new generation object. This way, a new generation GC does not need to spend a lot of time scanning every old generation object to work out its references. It scans the card table first, and only when a card table bit is 1 does it scan the old generation objects in the corresponding area; old generation areas whose bit is 0 are guaranteed not to contain references to new generation objects. As shown in Figure 4.7, each bit in the card table represents a 4KB area of the old generation. Areas whose bit is 0 contain no objects pointing to the new generation, and only areas whose bit is 1 contain objects with new generation references. Therefore, a new generation GC only needs to scan the old generation areas whose card bit is 1, which greatly speeds up new generation collection.
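The bookkeeping described above can be sketched with a `java.util.BitSet`. This is an illustrative model only: `CardTableDemo` and its methods are hypothetical, the 4KB card size is the value from Figure 4.7, and a real virtual machine maintains its card table inside the write barrier with its own layout.

```java
import java.util.Arrays;
import java.util.BitSet;

class CardTableDemo {
    static final int CARD_SIZE = 4096; // bytes of old generation covered by one bit (from Figure 4.7)
    private final BitSet cards;

    CardTableDemo(int oldGenBytes) {
        int cardCount = (oldGenBytes + CARD_SIZE - 1) / CARD_SIZE;
        cards = new BitSet(cardCount);
    }

    // Called when an old generation object at this offset stores a reference to a young object.
    void recordYoungReference(int oldGenOffset) {
        cards.set(oldGenOffset / CARD_SIZE);
    }

    // Indexes of the cards whose bit is 1; only these areas are scanned in a new generation GC.
    int[] cardsToScan() {
        return cards.stream().toArray();
    }

    static int[] demo() {
        CardTableDemo table = new CardTableDemo(64 * 1024); // 16 cards
        table.recordYoungReference(100);    // card 0
        table.recordYoungReference(5000);   // card 1
        table.recordYoungReference(5100);   // card 1 again
        table.recordYoungReference(40000);  // card 9
        return table.cardsToScan();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [0, 1, 9]
    }
}
```

In this example a new generation GC would scan 3 of the 16 old generation areas instead of all of them.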

4.2.6. Partition algorithm (Region)

1. The partition algorithm divides the entire heap into contiguous small regions, and each region is used and collected independently;
2. Advantage: The number of regions collected at one time can be controlled;

Generally speaking, under the same conditions, the larger the heap, the longer a GC takes and the longer the resulting pause. To control GC pause times better, a large memory area is divided into many small regions. Based on the target pause time, a reasonable number of regions is collected each time instead of the whole heap, which reduces the pause caused by a single GC.
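One way to pick regions for a given target pause time is a greedy selection. The policy below is an assumption made for illustration, not the policy of any particular collector: `RegionSelectDemo`, `Region`, and the per-region `garbageKb` and `costMs` estimates are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RegionSelectDemo {
    static class Region {
        final int id;
        final int garbageKb; // estimated reclaimable space
        final int costMs;    // estimated time to collect this region, must be > 0
        Region(int id, int garbageKb, int costMs) {
            this.id = id;
            this.garbageKb = garbageKb;
            this.costMs = costMs;
        }
    }

    // Picks regions with the most garbage per millisecond until the pause budget is used up.
    static List<Integer> choose(List<Region> regions, int pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingDouble(
                (Region r) -> (double) r.garbageKb / r.costMs).reversed());
        List<Integer> chosen = new ArrayList<>();
        int spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMs <= pauseBudgetMs) {
                chosen.add(r.id);
                spent += r.costMs;
            }
        }
        return chosen;
    }

    static List<Integer> demo() {
        List<Region> regions = List.of(
                new Region(0, 900, 3),
                new Region(1, 100, 2),
                new Region(2, 800, 4),
                new Region(3, 500, 5));
        return choose(regions, 8); // target pause: 8 ms
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0, 2]
    }
}
```

With an 8 ms budget only regions 0 and 2 are collected (7 ms in total); the remaining regions wait for a later cycle.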

4.3. Determining reachability - which objects are really garbage

4.3.1. Resurrection of objects - finalize() function

1. Reachability has the following 3 states:

  • Reachable: Starting from the root nodes, this object can be reached;
  • Resurrectable: All references to the object have been released, but the object may be resurrected in the finalize() function;
  • Unreachable (can be reclaimed): If the object's finalize() function has been called and the object was not resurrected, it enters the unreachable state. An unreachable object cannot be resurrected, because the finalize() function is called only once;

The basic idea of garbage collection is to examine the reachability of each object, that is, whether the object can be reached from the root nodes. If it can, the object is in use. If it cannot be reached from any root node, the object is no longer used and, generally speaking, should be reclaimed. In fact, however, an unreachable object may "resurrect" itself under certain conditions, and reclaiming it in that case would be wrong. For this reason, the reachability states of an object need to be defined, together with the state in which an object can be safely reclaimed.

2. The finalize() function is a very bad pattern, and using it to release resources is not recommended:
(1) finalize() may leak a reference and inadvertently resurrect the object;
(2) finalize() is called by the system at an unspecified time, so it is not a good way to release resources. It is recommended to release resources in a try-catch-finally statement instead.
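The recommended alternative can be sketched as follows. `Resource` is a hypothetical stand-in for a file handle or socket. The first method uses try-finally as suggested above; the second uses try-with-resources, a standard Java form that is not mentioned in the text but has the same effect for `AutoCloseable` types.

```java
class ResourceReleaseDemo {
    // Hypothetical resource; stands in for a file handle, socket, and so on.
    static class Resource implements AutoCloseable {
        boolean closed = false;

        void use() {
            if (closed) throw new IllegalStateException("already closed");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Release in finally: close() runs on both normal and exceptional exit.
    static boolean releasedWithFinally() {
        Resource r = new Resource();
        try {
            r.use();
        } finally {
            r.close();
        }
        return r.closed;
    }

    // Same guarantee with try-with-resources: close() is called automatically.
    static boolean releasedWithTryWithResources() {
        Resource r = new Resource();
        try (Resource managed = r) {
            managed.use();
        }
        return r.closed;
    }

    public static void main(String[] args) {
        System.out.println(releasedWithFinally());          // prints true
        System.out.println(releasedWithTryWithResources()); // prints true
    }
}
```

In both forms the release happens at a known point in the program, instead of whenever the system decides to run finalize().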

4.3.2. Reference strength and reachability

1. Java provides four levels of references: strong references, soft references, weak references and phantom references (sometimes translated as "virtual references").

Except for strong references, the other three types of references can be found in the java.lang.ref package. Figure 4.9 shows the classes corresponding to these three reference types, which developers can use directly in applications. FinalReference is the "final" reference, which is used to implement the object's finalize() function.

2. A strong reference is the reference type normally used in programs. Objects held by strong references are reachable and will not be reclaimed. Objects held only by soft, weak or phantom references are softly reachable, weakly reachable and phantom reachable, and can be reclaimed under certain conditions.
3. Characteristics of strong references:
(1) A strong reference gives direct access to the target object;
(2) The object a strong reference points to is never reclaimed by the system. The JVM would rather throw an OutOfMemoryError (OOM) than reclaim an object pointed to by a strong reference;
(3) Strong references may cause memory leaks;

4.3.3. Soft references - references that can be reclaimed

1. A soft reference is a weaker reference type than a strong reference. If an object is held only by soft references, it is reclaimed when heap space is insufficient. Soft references are implemented by the java.lang.ref.SoftReference class. A GC does not necessarily reclaim softly referenced objects, but when memory is tight they are reclaimed, so softly referenced objects do not cause an out-of-memory error.

4.3.4. Weak references - reclaimed as soon as found

1. A weak reference is a reference type weaker than a soft reference. During a GC, as soon as an object held only by weak references is found, it is reclaimed regardless of how much heap space is available. However, since the garbage collector's thread usually has a very low priority, it may not find such objects quickly, in which case they can exist for a longer time. Once a weakly referenced object has been collected, the weak reference is added to the reference queue it was registered with (this is very similar to soft references). Weak references are implemented by the java.lang.ref.WeakReference class.
2. Note: Soft references and weak references are well suited to holding cached data that the program can do without. With this approach, the cached data is reclaimed when system memory is insufficient, so it does not cause an out-of-memory error. When memory is plentiful, the cached data can stay around for a long time and speed up the system.
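A cache along these lines can be sketched with `java.lang.ref.SoftReference`. The `SoftCache` class is a hypothetical, minimal design (no eviction of cleared entries, not thread-safe); `WeakReference` could be substituted for shorter-lived entries. When the collector clears a reference is up to the JVM, so the only behavior the sketch relies on is that a referent is kept while it is also strongly reachable.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

class SoftCacheDemo {
    // Hypothetical cache whose values are only softly reachable through the map.
    static class SoftCache<K, V> {
        private final Map<K, SoftReference<V>> map = new HashMap<>();

        void put(K key, V value) {
            map.put(key, new SoftReference<>(value));
        }

        // Returns null if the key is absent or the collector has already cleared the value.
        V get(K key) {
            SoftReference<V> ref = map.get(key);
            return ref == null ? null : ref.get();
        }
    }

    static boolean strongHoldKeepsReferents() {
        byte[] data = new byte[1024];
        SoftCache<String, byte[]> cache = new SoftCache<>();
        cache.put("k", data);
        WeakReference<byte[]> weak = new WeakReference<>(data);
        System.gc(); // only a hint to the JVM
        // data is still strongly reachable here, so neither reference may have been cleared.
        return cache.get("k") == data && weak.get() == data;
    }

    public static void main(String[] args) {
        System.out.println(strongHoldKeepsReferents()); // prints true

        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        // Often prints null after a collection, but the timing is not guaranteed.
        System.out.println(weak.get());
    }
}
```

Callers of `get` must always be prepared for `null` and recompute or reload the value, since the entry may have been reclaimed at any time.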

4.3.5. Phantom references - tracking object reclamation

1. The phantom reference (java.lang.ref.PhantomReference, sometimes translated as "virtual reference") is the weakest of all reference types. An object held only by a phantom reference is almost the same as an object with no references, and may be reclaimed by the garbage collector at any time. Trying to obtain a strong reference through the get() method of a phantom reference always fails. Moreover, a phantom reference must be used together with a reference queue; its purpose is to track the garbage collection process;
2. When the garbage collector is about to reclaim an object and finds that it still has a phantom reference, it adds the phantom reference to the reference queue after reclaiming the object, to notify the application that the object has been reclaimed;
3. Since phantom references can track when objects are reclaimed, some resource release operations can also be performed and logged through phantom references.
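The behavior of get() and the role of the reference queue can be sketched with the `java.lang.ref` classes. `PhantomDemo` is a hypothetical class name. Whether and when the reference is enqueued after `System.gc()` depends on the JVM, so `main` treats both outcomes as possible.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomDemo {
    // get() on a phantom reference returns null even while the referent is alive.
    static boolean getIsAlwaysNull() {
        Object target = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);
        return phantom.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(new Object(), queue);
        System.out.println(phantom.get()); // prints null

        System.gc(); // only a hint to the JVM
        // Waits up to one second for the collector to enqueue the reference; may return null.
        Reference<?> enqueued = queue.remove(1000);
        if (enqueued == phantom) {
            System.out.println("referent was reclaimed; release related resources here");
        } else {
            System.out.println("not reclaimed yet");
        }
    }
}
```

An application would typically poll or block on the queue from a dedicated thread and run its cleanup when a reference arrives.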

4.4. Stop-The-World during garbage collection

The garbage collector's task is to identify and reclaim garbage objects in order to clean up memory. For the collector to work correctly and efficiently, in most cases the system is required to enter a paused state. The purpose of the pause is to suspend all application threads so that no new garbage is produced in the meantime. The pause also guarantees that the system state is consistent at a given moment, which helps the collector mark garbage objects accurately. Application pauses therefore occur during garbage collection. While a pause lasts, the entire application is frozen and does not respond at all, which is why the pause is called "Stop-The-World" (STW).

Origin blog.csdn.net/weixin_39651041/article/details/129278809