Operation and Maintenance Series (15) -- Operation and Maintenance Series (1) -- Garbage Collection Mechanism of Java Technology

Garbage Collection Mechanism of Java Technology

For comparison, here is a companion article on Python's garbage collection mechanism:
Detailed explanation of Python garbage collection mechanism (GC)


Garbage collection is one of Java's most important features and a staple of interview questions. Developers do not have to manage the allocation and release of memory by hand: the JVM reclaims garbage automatically in the background on daemon threads. This improves development efficiency and also helps keep memory usage under control.

Today, this article will explain the garbage collection mechanism, mainly involving the following issues:

  • What is heap memory?
  • What is garbage?
  • What are the ways to reclaim this garbage?
  • What is the generational collection mechanism?

1. What is Java heap memory?

The heap is created when the JVM starts and holds runtime data: the objects and arrays created while the program runs are allocated in this memory space. The heap is finite. If the objects we create dynamically are not reclaimed in time and keep accumulating, the heap eventually fills up and the JVM throws an OutOfMemoryError.
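To get a feel for the heap as a finite resource, the following minimal sketch reads the heap figures that the running JVM reports through the standard `Runtime` API. The class name `HeapInfo` and the helper `snapshot` are made up for this illustration, and the numbers depend entirely on the JVM and its settings.

```java
class HeapInfo {
    // Returns {max, total, free} heap sizes in bytes as reported by the running JVM.
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        long mb = 1024 * 1024;
        System.out.println("max heap:  " + s[0] / mb + " MB");          // upper limit the heap may grow to
        System.out.println("committed: " + s[1] / mb + " MB");          // memory currently held by the JVM
        System.out.println("used:      " + (s[1] - s[2]) / mb + " MB"); // committed minus free
    }
}
```

When the used figure approaches the maximum and the collector cannot free enough space, allocation fails with an OutOfMemoryError.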

Java therefore provides a garbage collection mechanism that runs on daemon threads in the background. When memory runs low, the collector kicks in automatically and reclaims the garbage in the heap so that the program can keep running normally.

2. What is garbage?

The so-called "garbage" is every object that is no longer alive. There are two common ways to decide whether an object is alive: reference counting and reachability analysis.

2.1 Reference counting method

Each object is given a reference counter that stores the number of references to it. When the count drops to zero, nothing uses the object any more and it can be considered dead. However, this scheme has a serious problem: it cannot detect "circular references". When two objects refer to each other, neither count ever reaches zero, even if nothing outside refers to either of them, so they are never reclaimed. For the developer, though, these two objects are completely useless.

For this reason, Java does not use reference counting to decide whether an object is alive.
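The circular reference problem can be sketched with a toy model. This is not how the JVM manages objects; `Node`, `retain`, `release` and `addRef` are hypothetical helpers that only imitate what a reference-counting runtime would do behind the scenes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of reference counting, for illustration only.
class RefCountDemo {
    static final class Node {
        final String name;
        int refCount = 0;                             // number of incoming references
        final List<Node> fields = new ArrayList<>();  // outgoing references
        Node(String name) { this.name = name; }
    }

    // Simulates "from.field = to".
    static void addRef(Node from, Node to) {
        from.fields.add(to);
        to.refCount++;
    }

    // Simulate an external reference (such as a local variable) appearing and disappearing.
    static void retain(Node n)  { n.refCount++; }
    static void release(Node n) { n.refCount--; }

    // Returns the counts of a and b after every external reference is gone.
    static int[] cycleCounts() {
        Node a = new Node("a");
        Node b = new Node("b");
        retain(a);
        retain(b);      // two local variables point at a and b
        addRef(a, b);   // a refers to b
        addRef(b, a);   // b refers to a
        release(a);
        release(b);     // the local variables go out of scope
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        // Prints "a=1, b=1": neither count reaches zero, so the pair is never freed.
        System.out.println("a=" + counts[0] + ", b=" + counts[1]);
    }
}
```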

2.2 Reachability analysis

This is the liveness test used by Java and by many other garbage-collected languages. The basic idea is to view all objects and the references between them as a graph, start from a set of root nodes called GC Roots, and keep following references to find every object connected to them. These objects are called "reachable", or "alive". All remaining objects are "unreachable", that is "dead", and are treated as garbage.

In the figure below, object5, object6 and object7 are unreachable, so they are considered dead and should be reclaimed by the garbage collector.
[Figure: reachability analysis starting from GC Roots]
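Reachability analysis is essentially a graph traversal, which can be sketched as follows. The edges in `sampleGraph` are an assumption made for this example (the original figure is not available); only the outcome, that object5, object6 and object7 are unreachable, comes from the text.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachabilityDemo {
    // Hypothetical object graph: each key references the objects in its list.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> g = new HashMap<>();
        g.put("GC Roots", List.of("object1"));
        g.put("object1", List.of("object2", "object3"));
        g.put("object2", List.of());
        g.put("object3", List.of("object4"));
        g.put("object4", List.of());
        g.put("object5", List.of("object6", "object7")); // referenced by nothing reachable
        g.put("object6", List.of());
        g.put("object7", List.of());
        return g;
    }

    // Follows references from the root and returns every node that can be reached.
    static Set<String> reachable(Map<String, List<String>> graph, String root) {
        Set<String> marked = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!marked.add(current)) {
                continue; // already visited, which also makes cycles harmless
            }
            for (String next : graph.getOrDefault(current, List.of())) {
                pending.push(next);
            }
        }
        return marked;
    }

    // Everything in the graph that was not reached is garbage.
    static Set<String> garbage(Map<String, List<String>> graph, String root) {
        Set<String> dead = new TreeSet<>(graph.keySet());
        dead.removeAll(reachable(graph, root));
        return dead;
    }

    public static void main(String[] args) {
        // Prints [object5, object6, object7]
        System.out.println(garbage(sampleGraph(), "GC Roots"));
    }
}
```

Note that object5 still holds references to object6 and object7, yet all three are garbage, because none of them can be reached from the roots. This is exactly the case that reference counting cannot handle.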

2.3 What exactly are the GC Roots?

The GC Roots themselves must always be reachable, so that every object found by traversing from them is guaranteed to be reachable too. So which objects in Java are always reachable? There are four main kinds:

  • Objects referenced from the virtual machine stack (the local variable table in a stack frame).
  • Objects referenced by static fields in the method area.
  • Objects referenced by constants in the method area.
  • Objects referenced by JNI in the native method stack.

These GC Roots may be unfamiliar because they depend on the memory structure of the JVM itself, which future articles will explain in depth. For now it is enough to know that there are several kinds of GC Roots and that, on every collection, the garbage collector traverses from these root nodes to find all reachable objects.
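As a rough illustration, the first three kinds of GC Roots correspond to everyday Java code like the sketch below. The class and field names are made up for the example, and the mapping of each declaration to a root kind is a simplification of what the JVM does internally.

```java
class GcRootsDemo {
    // Kind 2: this object is referenced by a static field of a loaded class.
    static Object staticRef = new Object();

    // Kind 3: this string literal is referenced by a constant.
    static final String CONSTANT_REF = "referenced by a constant";

    static boolean allReachable() {
        // Kind 1: this object is referenced by a local variable in the current stack frame.
        Object localRef = new Object();

        // Kind 4 (references held by native code through JNI) cannot be shown in pure Java.

        // While this frame is active, all three objects are reachable from a root,
        // so the collector must not reclaim them.
        return localRef != null && staticRef != null && CONSTANT_REF != null;
    }

    public static void main(String[] args) {
        System.out.println(allReachable());
    }
}
```

Once `allReachable` returns, its stack frame is gone, so the object behind `localRef` loses its only root and becomes eligible for collection, while the other two remain reachable.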

3. What are the ways to reclaim this garbage?

As established above, every object that cannot be reached from the GC Roots is garbage. In the figure below, black represents garbage, gray represents surviving objects, and green represents free space.
[Figure: heap layout before collection]

So, how do we recycle this garbage?

3.1 Mark-sweep

The first step, "mark", traverses the heap using reachability analysis and marks which objects are alive and which are garbage; the result is the figure above. The second step, "sweep", traverses the heap again and frees the space occupied by every garbage object in place.

The result is as follows:
[Figure: heap after mark-sweep]

This is the mark-sweep scheme, which is simple and convenient, but prone to memory fragmentation.
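The fragmentation problem is easy to see in a toy model where the heap is an array of slots. In this sketch the mark phase is assumed to have already produced the set of live objects; `sweep` and `largestFreeRun` are hypothetical helpers, not JVM APIs.

```java
import java.util.Arrays;
import java.util.Set;

class MarkSweepDemo {
    // Sweep phase: every occupied slot that was not marked live becomes free (null).
    static String[] sweep(String[] heap, Set<String> marked) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !marked.contains(result[i])) {
                result[i] = null; // reclaimed in place; survivors never move
            }
        }
        return result;
    }

    // Length of the longest run of consecutive free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "x", "B", "y", "C", null };
        String[] after = sweep(heap, Set.of("A", "B", "C"));
        System.out.println(Arrays.toString(after)); // [A, null, B, null, C, null]
        System.out.println(largestFreeRun(after));  // 1
    }
}
```

Three slots are free after the sweep, but no two of them are adjacent, so an object that needs two contiguous slots cannot be allocated.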

3.2 Mark-compact

Since mark-sweep produces memory fragmentation, mark-compact adds one step: during cleanup, all surviving objects are moved next to each other at one end of the memory, so no fragmentation remains.

The result is as follows:

[Figure: heap after mark-compact]

These two schemes, mark-sweep and mark-compact, suit situations with many surviving objects and little garbage: only a small amount of garbage has to be cleaned up and, in the case of mark-compact, the surviving objects are then moved together.
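The compaction step can be sketched on a toy heap modeled as an array of slots. As before, the set of live objects is assumed to come from a mark phase, and the method names are invented for this illustration.

```java
import java.util.Arrays;
import java.util.Set;

class MarkCompactDemo {
    // Slides every live object toward the start of the heap and frees the rest.
    static String[] compact(String[] heap, Set<String> marked) {
        String[] result = heap.clone();
        int next = 0; // next slot a survivor will be moved into
        for (int i = 0; i < result.length; i++) {
            String obj = result[i];
            if (obj != null && marked.contains(obj)) {
                result[next++] = obj; // keeps the original order of survivors
            }
        }
        Arrays.fill(result, next, result.length, null); // everything after the survivors is free
        return result;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "x", "B", "y", "C", null };
        String[] after = compact(heap, Set.of("A", "B", "C"));
        System.out.println(Arrays.toString(after)); // [A, B, C, null, null, null]
    }
}
```

All free slots now form a single contiguous block. The price is that every survivor may have to be moved, which is why the scheme works best when little garbage sits between the survivors.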

3.3 Copying

This method is more brute-force. It splits the heap memory into two halves and allocates from only one half at a time. When that half is used up, garbage collection runs: all surviving objects are copied to the other half, and the current half is then cleared in one go.

Refer to the figure below:
[Figure: copying collection between two halves of the heap]

At first only the upper half is used. When it is used up, garbage collection runs: all surviving objects are moved to the lower half and the upper half is emptied.

This approach does not produce fragmentation and is simple and direct. However, only part of the memory is usable at any time, and whenever that part fills up, the surviving objects must be copied and the space cleared again, which can happen frequently.

This scheme suits situations with few surviving objects and a lot of garbage: few objects need to be copied, and most of the garbage is discarded in one step.
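A minimal sketch of the copying scheme, with the two halves modeled as arrays. The class `CopyingDemo` and its members are hypothetical, and the set of live objects is assumed to come from reachability analysis.

```java
import java.util.Arrays;
import java.util.Set;

class CopyingDemo {
    String[] from; // the half currently used for allocation
    String[] to;   // the half kept empty until the next collection
    int top = 0;   // index of the next free slot in "from"

    CopyingDemo(int halfSize) {
        from = new String[halfSize];
        to = new String[halfSize];
    }

    // Bump allocation: returns false when the active half is full.
    boolean allocate(String obj) {
        if (top == from.length) {
            return false;
        }
        from[top++] = obj;
        return true;
    }

    // Copies the survivors to the other half, clears the old half, then swaps roles.
    void collect(Set<String> live) {
        int next = 0;
        for (int i = 0; i < top; i++) {
            if (live.contains(from[i])) {
                to[next++] = from[i];
            }
        }
        Arrays.fill(from, null); // garbage is discarded wholesale, never visited one by one
        String[] tmp = from;
        from = to;
        to = tmp;
        top = next;
    }

    public static void main(String[] args) {
        CopyingDemo heap = new CopyingDemo(4);
        for (String name : new String[] { "A", "x", "B", "y" }) {
            heap.allocate(name);
        }
        heap.collect(Set.of("A", "B"));
        System.out.println(Arrays.toString(heap.from)); // [A, B, null, null]
    }
}
```

The work done by `collect` is proportional to the number of survivors copied, not to the amount of garbage, which is why the scheme pays off when most objects are dead.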

4. Java's generational collection mechanism

We saw above that there are at least three ways to reclaim memory. Which of these algorithms does Java use: just one, or all three?

4.1 Java heap structure

Before choosing a recycling algorithm, let's take a look at the structure of the Java heap.

A piece of Java heap space is generally divided into three parts, which are used to store three types of data:

  • Objects that have just been created. New objects are created continuously while the code runs, and they are all placed together. Many of them, such as objects referenced only by local variables, become unreachable and die soon after creation, so this area has few surviving objects and a lot of garbage. It is called the young generation (or new generation);
  • Objects that have lived for a while. These objects were created earlier and have survived ever since. Long-lived objects are placed together in an area with many surviving objects and little garbage. It is called the old generation;
  • Permanent objects, for example class metadata and static data, which rarely need garbage collection and usually live as long as the program. This area is called the permanent generation. (The permanent generation was removed in Java 8 and replaced by Metaspace, which will be explained in subsequent articles.)

In other words, a conventional Java heap includes at least two memory areas, the young generation and the old generation, and each has a distinct characteristic:

  • Young generation: few surviving objects, a lot of garbage

  • Old generation: many surviving objects, little garbage

Combining the characteristics of surviving objects in the new generation/old generation and the several garbage collection algorithms mentioned above, the following recycling schemes can be obtained:

4.2 Young generation - copying collection

In the young generation, most new objects die at each GC and only a few survive. The copying algorithm is therefore used: each GC only has to copy the few surviving objects.

How should the copying algorithm divide the memory? There are several options:

Idea 1. Divide the memory into 1:1 equal parts

Split the memory as shown below.
[Figure: memory divided 1:1]

Only half of the memory is used each time. When this half is full, garbage collection is performed, the surviving objects are copied directly to the other half of the memory, and the current half of the memory is emptied.

The disadvantage of this split is that only half of the memory is usable. New objects are created continuously in the young generation, so with only half of the memory available, garbage collection has to run very often, which interferes with the normal operation of the program. The cost outweighs the benefit.

Idea 2. Divide the memory 9:1

Since the split above leaves only half of the memory usable, adjust the ratio from 1:1 to 9:1.
[Figure: memory divided 9:1]

Allocation starts in the 9 area. When the 9 area is almost full, a copying collection runs: the objects still alive in the 9 area are copied to the 1 area, and the 9 area is emptied.

This seems to be better than the above method, but it has serious problems.

When the surviving objects in the 9 area are copied to the 1 area, the large difference in size makes it quite likely that the 1 area cannot hold them all, and the overflow has to be moved to the old generation. This means that objects in the 9 area that are not actually old may end up in the old generation simply because the 1 area is full. That breaks the rule of the old generation: to some extent, it no longer contains only long-lived objects.

So how can we make sure that only genuinely old objects are moved to the old generation?

Idea 3. Divide the memory 8:1:1
[Figure: memory divided 8:1:1 into Eden, Survivor A and Survivor B]

Since 9:1 is likely to put young objects into the old generation, replace it with 8:1:1 and name the three areas Eden, Survivor A and Survivor B. Eden refers to the Garden of Eden, because this is where new objects are created; a Survivor area holds survivors, that is, objects that are still alive after a GC.
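The arithmetic behind the three ideas can be checked with a small calculation. This is a sketch under simplifying assumptions: the helper names are invented, and a real JVM also rounds and aligns region sizes. With an 8:1:1 split, Eden plus one Survivor area is usable between collections, which is 90% of the young generation, compared with 50% for the 1:1 split.

```java
import java.util.Arrays;

class YoungGenSplit {
    // Returns {eden, survivorA, survivorB} for a young generation split edenParts:1:1.
    static long[] split(long youngSize, int edenParts) {
        long survivor = youngSize / (edenParts + 2);
        long eden = youngSize - 2 * survivor;
        return new long[] { eden, survivor, survivor };
    }

    // Fraction usable between collections: Eden plus one Survivor area.
    static double usableFraction(long[] parts) {
        long total = parts[0] + parts[1] + parts[2];
        return (double) (parts[0] + parts[1]) / total;
    }

    public static void main(String[] args) {
        long[] parts = split(100, 8); // a hypothetical 100 MB young generation
        System.out.println(Arrays.toString(parts)); // [80, 10, 10]
        System.out.println(usableFraction(parts));  // 0.9
    }
}
```

The second Survivor area costs only 10% of the space, yet it gives surviving objects somewhere to stay young for several collections instead of spilling straight into the old generation.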

It works as follows:

  1. At first, Eden, the largest area, serves allocation requests. When Eden is almost full, a Minor GC runs: the surviving objects are put into Survivor A and Eden is cleared;
  2. After Eden is cleared, it continues to serve allocation requests;
  3. When Eden fills up again, a Minor GC runs on Eden and Survivor A together: the surviving objects are put into Survivor B, and both Eden and Survivor A are emptied;
  4. Eden continues to serve allocation requests and the process repeats: each time Eden fills up, the surviving objects in Eden and in the Survivor area currently in use are copied to the other Survivor area;
  5. When the target Survivor area fills up while there are still objects to copy, or when an object has survived repeated Minor GCs (about 15 of them), those objects are moved to the Old area;
  6. When the Old area also fills up, a Major GC runs and garbage-collects the Old area.
    [Note: in a real JVM, the ratio of the Eden area to a single Survivor area can be configured with the parameter SurvivorRatio, which defaults to 8.]
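The six steps above can be sketched as a toy simulation. Everything here is a simplification for illustration: the class and field names are invented, capacities are counted in objects rather than bytes, liveness is passed in as a set of names, and the exact promotion rules of a real collector differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class GenerationalDemo {
    static final class Obj {
        final String name;
        int age = 0; // number of Minor GCs survived
        Obj(String name) { this.name = name; }
    }

    final int survivorCapacity;  // how many objects one Survivor area can hold
    final int tenuringThreshold; // age at which an object is promoted
    final List<Obj> eden = new ArrayList<>();
    List<Obj> survivorFrom = new ArrayList<>(); // Survivor area currently holding objects
    List<Obj> survivorTo = new ArrayList<>();   // empty Survivor area, target of the next GC
    final List<Obj> old = new ArrayList<>();

    GenerationalDemo(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    void allocate(String name) {
        eden.add(new Obj(name));
    }

    // One Minor GC over Eden and the Survivor area in use.
    void minorGc(Set<String> liveNames) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(survivorFrom);
        for (Obj o : candidates) {
            if (!liveNames.contains(o.name)) {
                continue; // garbage is simply not copied
            }
            o.age++;
            if (o.age >= tenuringThreshold || survivorTo.size() >= survivorCapacity) {
                old.add(o);        // old enough, or no room left: promote
            } else {
                survivorTo.add(o); // stays in the young generation
            }
        }
        eden.clear();
        survivorFrom.clear();
        List<Obj> tmp = survivorFrom; // the two Survivor areas swap roles
        survivorFrom = survivorTo;
        survivorTo = tmp;
    }

    public static void main(String[] args) {
        GenerationalDemo heap = new GenerationalDemo(2, 3);
        for (String n : new String[] { "a", "b", "c", "d" }) heap.allocate(n);
        heap.minorGc(Set.of("a", "b", "c")); // c overflows the Survivor area and is promoted early
        heap.allocate("e");
        heap.minorGc(Set.of("a"));           // a survives a second time
        heap.minorGc(Set.of("a"));           // a reaches the threshold and is promoted
        for (Obj o : heap.old) System.out.println(o.name + " age=" + o.age);
    }
}
```

In this run, c is promoted at age 1 only because the Survivor area is full, which is the premature promotion the 8:1:1 layout tries to keep rare, while a is promoted at age 3 after reaching the threshold.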

So how should garbage collection of the Old area, the Major GC, be performed?

4.3 Old generation - mark-compact collection

As described above, the old generation mostly stores long-lived objects, so at each GC many objects survive and only a few are reclaimed.

Given the characteristics of the different algorithms, the mark-compact algorithm, which suits many survivors and little garbage, is chosen here. Garbage is cleaned up by moving only a small number of objects, and no memory fragmentation is left behind.

We have now covered how the Java heap is divided into generations and how each generation uses the collection algorithm that suits its characteristics: the young generation uses the copying algorithm, and the old generation uses the mark-compact algorithm.
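To see the generational split on a running JVM, the standard `java.lang.management` API lists the active collectors and the memory pools each one manages. The sketch below is only a way to inspect the JVM; the class name `CollectorInfo` is made up, and the collector and pool names printed vary with the JVM version and the collector selected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorInfo {
    // One line per collector: its name, the pools it manages, and how often it has run.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": pools=" + String.join(",", gc.getMemoryPoolNames())
                    + ", collections=" + gc.getCollectionCount());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

On many JVMs the output shows one collector for the young generation and another covering the old generation as well, with pool names that include Eden and Survivor spaces.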

5. Summary

Garbage collection is a very important feature of Java and essential knowledge for senior Java engineers.
