JVM principle | GC mechanism

GC mechanism

1. Where does GC happen?

GC generally occurs in the heap and the method area. The heap is structured as follows:
Heap:

  1. Old generation: stores large objects (those needing a lot of contiguous memory) and long-lived objects.
  2. Young generation (divided into three areas):
    1. Eden area (the largest share of the memory); at any time Eden and one of the Survivor areas are in use
    2. From Survivor area
    3. To Survivor area

Method area: stores the permanent generation (its objects are generally not recycled; garbage collection of the permanent generation mainly reclaims discarded constants and useless classes).

Extension:

When the Eden area runs out of memory, the virtual machine triggers a Minor GC.
When the old generation runs out of memory, it triggers a Major GC / Full GC.


2. How to determine whether an object needs to be recycled?

2.1 Determine whether a class is "useless"

  1. All instances of this class have been recycled, that is, there is no instance of this class in the Java heap;
  2. The ClassLoader that loaded the class has been recycled;
  3. The java.lang.Class object corresponding to this class is not referenced anywhere, and the methods of this class cannot be accessed through reflection anywhere.

Here you also need to understand the object age counter:

  1. The virtual machine gives each object an age counter and uses it to decide which objects stay in the young generation and which move to the old generation.
  2. Each time an object survives a Minor GC in a Survivor area, its age increases by 1. When its age reaches the threshold (MaxTenuringThreshold, 15 by default), it is promoted to the old generation.
  3. If the total size of all objects of the same age in the Survivor space is greater than half of the Survivor space, objects of that age or older can enter the old generation directly, without waiting to reach MaxTenuringThreshold.
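
The dynamic age rule in item 3 can be sketched as follows. This is only an illustration, not HotSpot's code: the class `TenuringSketch`, the method `tenuringThreshold` and the `bytesByAge` array are hypothetical names, and the sketch applies the rule exactly as worded above. A real collector may compute the threshold differently.

```java
final class TenuringSketch {
    /**
     * bytesByAge[a] is the total size of Survivor objects whose age is a (index 0 is unused).
     * Returns the age at or above which objects would be promoted.
     */
    static int tenuringThreshold(long[] bytesByAge, long survivorCapacity, int maxTenuringThreshold) {
        long half = survivorCapacity / 2;
        for (int age = 1; age < bytesByAge.length && age < maxTenuringThreshold; age++) {
            if (bytesByAge[age] > half) {
                return age; // this age group alone exceeds half of the Survivor space
            }
        }
        return maxTenuringThreshold; // no age group is large enough: use the configured maximum
    }

    public static void main(String[] args) {
        long[] bytesByAge = {0, 100, 600, 50}; // hypothetical sizes for ages 1..3
        System.out.println(tenuringThreshold(bytesByAge, 1000, 15));
    }
}
```

In the example, the age-2 group occupies 600 of 1000 units, so objects of age 2 or older would be promoted without waiting for age 15.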

2.2 Principle of Judgment

Deciding whether an ordinary object needs to be recycled generally relies on one of the following two algorithms:

2.2.1 Reference counter algorithm

Reference counting: add a reference counter to each object. Whenever something references the object, the counter is increased by 1; when a reference becomes invalid, the counter is decreased by 1. An object whose counter is 0 can no longer be used. This is the core of the reference counting algorithm.

Advantages: reference counting is simple to implement and the check is efficient.

Disadvantages: it is difficult to handle circular references between objects.
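
To see why circular references are a problem, here is a toy simulation of reference counting. The JVM does not work this way; `RefCountSketch`, `Node` and `cycleCounts` are hypothetical names used only for illustration.

```java
final class RefCountSketch {
    /** Toy object that tracks how many references point at it. */
    static final class Node {
        int refCount = 0;
        Node next;

        void pointTo(Node target) {
            if (next != null) next.refCount--;      // the old reference becomes invalid
            next = target;
            if (target != null) target.refCount++;  // a new reference is created
        }
    }

    /** Builds a cycle a <-> b, drops both external references, and returns the leftover counts. */
    static int[] cycleCounts() {
        Node a = new Node();
        Node b = new Node();
        a.refCount++;   // external reference held by a local variable
        b.refCount++;   // external reference held by a local variable
        a.pointTo(b);
        b.pointTo(a);
        a.refCount--;   // simulate setting the first variable to null
        b.refCount--;   // simulate setting the second variable to null
        return new int[] {a.refCount, b.refCount};
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

Both counters stay at 1 even though nothing outside the cycle can reach either object, so a pure reference counter would never reclaim them.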

2.2.2 Reachability analysis algorithm

Reachability analysis: this is the algorithm the Java virtual machine uses to determine whether an object is alive. Starting from a set of objects called "GC Roots", the search proceeds downward through references; the paths traversed are called reference chains. When no reference chain connects an object to any GC Root, the object is proven to be unusable.
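
The idea can be sketched as a graph traversal. This is a simplified model, assuming objects are identified by name and references are stored in a map; `ReachabilitySketch` and `reachable` are hypothetical names, not JVM APIs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReachabilitySketch {
    /** Returns every object reachable from the roots by following reference edges. */
    static Set<String> reachable(List<String> gcRoots, Map<String, List<String>> references) {
        Set<String> live = new HashSet<>(gcRoots);
        Deque<String> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            for (String target : references.getOrDefault(current, List.of())) {
                if (live.add(target)) {   // first visit: mark it live and explore it later
                    pending.add(target);
                }
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
                "root", List.of("A"),
                "A", List.of("B"),
                "C", List.of("D"),    // C and D reference each other,
                "D", List.of("C"));   // but no root reaches them
        System.out.println(reachable(List.of("root"), refs));
    }
}
```

Unlike reference counting, the cycle between C and D does not keep them alive here, because neither is connected to a root.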

The specific steps of reachability analysis:

  1. During the analysis, the object reference relationships must not change, otherwise the result will be inaccurate. Therefore all other Java execution threads must be stopped while the GC does this (Sun calls this behavior "Stop the World"). Even the CMS collector, which is said to almost never pause, has to stop threads while enumerating the root nodes.

  2. Once the system has stopped, the JVM does not need to check every location for references one by one; it uses a data structure called OopMap (HotSpot's name for it) that records where object references are.

  3. The virtual machine knows in advance where object references are stored: when a class is loaded, HotSpot computes which type of data sits at which offset in an object, and during JIT compilation it also records, at specific positions, which stack slots and registers hold references. The GC can read this information during scanning. [Mainstream JVMs currently use exact GC.]

  4. OopMap helps HotSpot complete GC Roots enumeration quickly and accurately. One problem remains: many instructions can change the reference relationships, and generating an OopMap for every one of them would be too costly.

  5. This is where the concept of the safepoint comes in:
    1. HotSpot cannot start a GC at an arbitrary point in the program; it can only do so when threads are at a safepoint. During a GC, a Java thread is either at a safepoint or not yet at one.
    2. Safepoints cannot be too few, otherwise the GC has to wait too long.
    3. Safepoints cannot be too many, otherwise they increase the runtime overhead.

    Safepoints are mainly placed at:
    1. The end of a loop
    2. Before a method returns / after a method's call instruction
    3. Locations where an exception may be thrown


In the figure below, the objects on the left are connected to GC Roots by a reference chain, so they are not dead. The scattered objects on the right have no reference chain to GC Roots, so the Java virtual machine judges them to be dead and reclaims them.

[Figure: objects reachable and unreachable from GC Roots]

Extension: Which objects can be GC Roots?

① Objects referenced in the virtual machine stack (the local variable table in a stack frame).
② Objects referenced by static fields of classes in the method area, that is, objects referenced through fields declared static, which are loaded into memory when the class is loaded.
③ Objects referenced by constants in the method area.
④ Objects referenced by JNI (native methods) in the native method stack.

Even an object that is unreachable in the reachability analysis is not necessarily reclaimed immediately and may still be rescued. There are many examples online, basically the same as the discussion of object survival and death in the book "In-depth Understanding of the Java Virtual Machine":


2.3 What is a dead object?

Before the Java virtual machine declares an object dead, the object goes through two marking passes:

  1. First mark: the object is placed in a queue called the F-Queue (processed at low priority), where its finalize method is run. If, in finalize, the object re-establishes an association with any object on a reference chain, for example by assigning itself (the this keyword) to a field of a reachable object, it escapes this garbage collection.
  2. Second mark: if the object does not associate itself with any object in finalize, the virtual machine marks it a second time and the garbage collection system reclaims it. Note that the JVM automatically calls an object's finalize method only once; if the object faces collection again, finalize is not executed again.

3. Garbage Collection Algorithm

3.1 Mark-Sweep Algorithm (Mark-Sweep)

Advantages: the algorithm runs in two simple stages, marking and sweeping. Most other collection algorithms are refinements of mark-sweep.

Disadvantages: the two stages are not very efficient, and sweeping leaves memory fragmented (free space is not contiguous). This is the biggest drawback of mark-sweep: with too much fragmentation, a later allocation of a large object may fail to find enough contiguous space and trigger another garbage collection early.
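
The fragmentation problem can be shown with a toy heap of fixed-size slots. This is a simplified model, not how a real heap is laid out; `MarkSweepSketch`, `sweep` and `largestFreeRun` are hypothetical names.

```java
final class MarkSweepSketch {
    /** Sweep: frees every slot that was not marked live; surviving objects stay where they are. */
    static boolean[] sweep(boolean[] occupied, boolean[] marked) {
        boolean[] after = new boolean[occupied.length];
        for (int i = 0; i < occupied.length; i++) {
            after[i] = occupied[i] && marked[i];
        }
        return after;
    }

    /** Length of the longest run of free slots, i.e. the largest object that still fits. */
    static int largestFreeRun(boolean[] occupied) {
        int best = 0;
        int run = 0;
        for (boolean slot : occupied) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] occupied = {true, true, true, true, true, true, true, true};
        boolean[] marked   = {true, false, true, false, true, false, true, false};
        boolean[] after = sweep(occupied, marked);
        // Half of the slots are now free, but no two free slots are adjacent.
        System.out.println(largestFreeRun(after));
    }
}
```

Half of the heap is free after the sweep, yet an object needing two adjacent slots cannot be allocated.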

3.2 Copying Algorithm (Copying)

To address the defects of the Mark-Sweep algorithm, the Copying algorithm was proposed. It divides the available memory into two halves of equal size and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half and the used half is cleared in one step, so memory fragmentation does not build up.

[Figure: the copying algorithm before and after collection]

Advantages: compared with mark-sweep, it avoids the memory fragmentation that collection would otherwise cause.

Disadvantages: although the algorithm is simple to implement, efficient to run and not prone to fragmentation, it pays a high price in memory: usable memory is cut to half of the original. In practice the young generation is not split 1:1; with the default 8:1 Eden-to-Survivor ratio, only a small part of the space is held in reserve and the waste is relatively small.

Copying also has a cost in efficiency. Its efficiency depends heavily on the number of surviving objects: if many objects survive, copying becomes much slower, so the Copying algorithm is generally used where few objects survive (e.g. the young generation).
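
A minimal sketch of the copying step, assuming the set of live objects has already been determined. `CopyingSketch` and `copyLive` are hypothetical names, and objects are represented by strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class CopyingSketch {
    /** Copies live objects from the active half into the empty half, packed with no gaps. */
    static List<String> copyLive(List<String> fromSpace, Set<String> live) {
        List<String> toSpace = new ArrayList<>();
        for (String obj : fromSpace) {
            if (live.contains(obj)) {
                toSpace.add(obj);   // the work done grows with the number of survivors
            }
        }
        return toSpace;             // the old half can now be cleared in one step
    }

    public static void main(String[] args) {
        List<String> fromSpace = List.of("A", "B", "C", "D");
        System.out.println(copyLive(fromSpace, Set.of("A", "D")));
    }
}
```

The loop body runs once per survivor that is copied, which is why the algorithm suits areas where most objects die young.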

3.3 Mark-Compact

To address the shortcomings of the Copying algorithm and make full use of memory, the Mark-Compact algorithm was proposed. Its marking phase is the same as in Mark-Sweep, but after marking it does not reclaim dead objects in place; it moves all surviving objects to one end and then clears the memory beyond the end boundary.

[Figure: the mark-compact algorithm before and after collection]

Advantages: it avoids both wasted space and memory fragmentation.

Disadvantages: moving objects during compaction has an efficiency cost.
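
The compaction step can be sketched on a toy heap array, assuming marking has already produced the set of live objects. `MarkCompactSketch` and `compact` are hypothetical names; a null entry stands for a free slot.

```java
import java.util.Arrays;
import java.util.Set;

final class MarkCompactSketch {
    /** Slides live objects toward index 0 and returns the new heap; null marks a free slot. */
    static String[] compact(String[] heap, Set<String> live) {
        String[] result = new String[heap.length];
        int free = 0;                      // next free slot at the compacted end
        for (String obj : heap) {
            if (obj != null && live.contains(obj)) {
                result[free++] = obj;      // move the survivor; relative order is preserved
            }
        }
        return result;                     // every slot from index free onward is reclaimed
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", null, "C", "D"};
        System.out.println(Arrays.toString(compact(heap, Set.of("A", "C"))));
    }
}
```

All free space ends up in one contiguous block, at the price of moving every survivor.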

3.4 Generational Collection Algorithm (Generational Collection)

The generational collection algorithm is what most JVM garbage collectors use today.
Its core idea is to divide memory into several areas according to object lifetime. Typically the heap is divided into the old generation (Tenured Generation) and the young generation (Young Generation).

  1. Old generation: only a small number of objects need to be reclaimed in each garbage collection.
  2. Young generation: a large number of objects need to be reclaimed in each garbage collection. The most suitable collection algorithm can therefore be chosen for each generation.
  • Most garbage collectors currently use the Copying algorithm for the young generation, because each collection there reclaims most objects, so few objects need to be copied. In practice the young generation is not split 1:1. It is generally divided into a larger Eden space and two smaller Survivor spaces. Eden and one Survivor space are in use at a time; during collection, the surviving objects in Eden and that Survivor space are copied to the other Survivor space, and then Eden and the Survivor space just used are cleared.
  • Because only a small number of objects are reclaimed from the old generation each time, it generally uses the Mark-Compact algorithm.

Note that outside the heap there is one more generation, the permanent generation (Permanent Generation), which stores classes, constants, method descriptions and so on. Collection of the permanent generation mainly reclaims two things: discarded constants and useless classes.

4. Garbage Collector

Young generation collectors
Serial, ParNew, Parallel Scavenge
Old generation collectors
Serial Old, Parallel Old, CMS collectors
Special collector
G1 collector [New, not in the category of young and old] — Commercial garbage collector released by JDK1.7

Combination of different collectors:
[Figure: collector combinations; a line connecting two collectors means they can be used together]
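
To check which collectors a running JVM actually uses, the standard `java.lang.management` API can be queried. The class name `ActiveCollectors` is hypothetical; the names printed depend on the JVM version and the GC flags it was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {
    /** Names of the collectors reported by the running JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated collection time (ms) since JVM start
            System.out.println(bean.getName() + ": " + bean.getCollectionCount()
                    + " collections, " + bean.getCollectionTime() + " ms");
        }
    }
}
```

Running it with different GC flags is a quick way to confirm which young and old generation collectors are paired.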

4.1 Young generation

4.1.1 Serial / Serial Old

The Serial collector is the most basic and oldest collector. It is single-threaded, and all user threads must be suspended while it performs garbage collection. Serial is a young generation collector and uses the Copying algorithm; Serial Old is an old generation collector and uses the Mark-Compact algorithm.

Advantages: simple and efficient

Disadvantages: pauses user threads

4.1.2 ParNew

The ParNew collector is a multi-threaded version of the Serial collector, which uses multiple threads for garbage collection.

4.1.3 Parallel Scavenge

The Parallel Scavenge collector is a multi-threaded young generation collector (a parallel collector) that uses the Copying algorithm. Like the previous two collectors, it suspends user threads during collection. It differs from them in its goal: it mainly aims for a controllable throughput.

Throughput:
This collector's main concern is throughput: throughput = code running time / (code running time + garbage collection time). If the code runs for 100 minutes and garbage collection takes 1 minute, throughput is 100 / 101, about 99%.
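
The formula is simple enough to check directly. `ThroughputSketch` and `throughput` are hypothetical names used only for this worked example.

```java
final class ThroughputSketch {
    /** throughput = application time / (application time + GC time) */
    static double throughput(double appMinutes, double gcMinutes) {
        return appMinutes / (appMinutes + gcMinutes);
    }

    public static void main(String[] args) {
        // The example from the text: 100 minutes of code, 1 minute of GC
        System.out.println(String.format("%.2f%%", throughput(100, 1) * 100));
    }
}
```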

4.2 Old generation

4.2.1 Serial Old

Introduced above

4.2.2 Parallel Old

Parallel Old is the old generation version of the Parallel Scavenge collector (a parallel collector); it uses multiple threads and the Mark-Compact algorithm.

4.2.3 CMS

The CMS (Concurrent Mark Sweep) collector aims for the shortest collection pause time. It is a concurrent collector and uses the Mark-Sweep algorithm. [It emphasizes responsiveness, which gives a good user experience; Sun calls it a concurrent low-pause collector.]

Enable CMS: -XX:+UseConcMarkSweepGC

As the name suggests, CMS uses the mark-sweep algorithm and runs concurrently.
Its operation is divided into 4 stages:

  1. Initial mark: marks only the objects that GC Roots can reach directly; it is very fast.
  2. Concurrent mark: the GC Roots tracing process, i.e. reachability analysis.
  3. Remark: corrects the marks of objects whose references changed because the user program kept running during concurrent marking; it causes a short pause. In terms of duration, generally: initial mark < remark < concurrent mark.
  4. Concurrent sweep

Of these, initial mark and remark require stop the world (stopping the other running Java threads).

The reason why the user experience of CMS is good is that the memory recovery work of the CMS collector can be executed concurrently with the user thread.

In general, CMS is an excellent collector, but it also has some disadvantages.

  1. CMS is particularly sensitive to CPU resources. Running CMS threads concurrently with the application needs multiple cores; with many cores, concurrent execution pays off. By default, however, CMS starts (number of CPUs + 3) / 4 collection threads, so its impact depends heavily on the core count. With few CPUs, for example 2 cores, half of the computing power is given to CMS, and if the CPU is already under heavy load this can hurt application performance considerably.

  2. CMS cannot handle floating garbage, which may cause a Concurrent Mode Failure that triggers a full GC.

  3. Since CMS uses the mark-sweep algorithm, it leaves memory fragmentation. To deal with this, CMS provides the -XX:+UseCMSCompactAtFullCollection option, a switch [enabled by default] that merges memory fragments when CMS has to perform a full GC. Defragmentation cannot run concurrently, so turning this option on affects performance (for example, pauses become longer).

Floating garbage: because user threads keep running while CMS runs, the program produces new garbage in the meantime. This garbage is floating garbage; CMS cannot handle it in the current collection and has to wait for the next one.

Summary of shortcomings: 1. CPU sensitive 2. Cannot handle floating garbage 3. Leaves memory fragmentation
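
The CPU sensitivity in the first disadvantage follows from the thread-count formula quoted there. The sketch below simply evaluates that formula with integer division; `CmsThreadsSketch`, `defaultCmsThreads` and `cpuShare` are hypothetical names, and the formula is taken from the text rather than from the JVM source.

```java
final class CmsThreadsSketch {
    /** Default number of CMS threads according to the formula (cpuCount + 3) / 4. */
    static int defaultCmsThreads(int cpuCount) {
        return (cpuCount + 3) / 4;   // integer division
    }

    /** Fraction of the machine's cores that the CMS threads occupy. */
    static double cpuShare(int cpuCount) {
        return (double) defaultCmsThreads(cpuCount) / cpuCount;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " CPUs -> " + defaultCmsThreads(cpus)
                + " CMS thread(s), share " + cpuShare(cpus));
    }
}
```

With 2 cores the single CMS thread takes half of the machine, while with 8 cores two threads take a quarter, which matches the observation that small machines suffer most.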

4.3 G1

G1 (garbage first: collect as much garbage as possible and avoid full GC) is among the most advanced results of collector technology. It is a server-oriented collector that makes full use of multi-CPU, multi-core environments, so it is both parallel and concurrent, and it can build a predictable pause time model.

What is special about G1 is that it strengthens partitioning and weakens the notion of generations. It is a region-based, incremental collector that belongs to neither the young generation nor the old generation collectors. It uses the mark-compact and copying algorithms.

Like CMS, it focuses on reducing latency. It is a more capable collector intended to replace CMS, because it addresses defects of CMS such as memory fragmentation.

G1 workflow:

  1. G1 finds surviving objects in the old generation through a concurrent (and parallel) marking phase, and compacts surviving objects by parallel copying [this keeps contiguous space available for large objects].
  2. G1 copies the surviving objects in one or more sets of regions to other regions, incrementally and in parallel, compacting them and reducing heap fragmentation. The goal is to reclaim as much heap space as possible [garbage first] while trying not to exceed the pause target, in order to achieve low latency.
  3. G1 provides three garbage collection modes: young GC, mixed GC and full GC. Unlike other collectors, it collects objects in the young and old generations by region rather than by generation.

4.3.1 Minor GC, Major GC, Full GC, Mixed GC

4.3.1.1 Minor GC

Garbage collection in the young generation (Eden and the Survivor areas) is called Minor GC. A Minor GC only cleans up the young generation.

4.3.1.2 Major GC

Major GC cleans up the old generation (old GC), but the term is also often used as a synonym for Full GC, because collecting the old generation is usually accompanied by a young generation collection, so the entire Java heap is collected. When someone uses the term, ask whether they mean full GC or old GC.

4.3.1.3 Full GC

Full GC collects the young generation, the old generation and the permanent generation together [the permanent generation no longer exists from JDK 1.8 on].

[From R's answer on Zhihu: it collects the entire heap, including young gen, old gen, perm gen (if it exists), metaspace (1.8 and above) and all other parts.]

4.3.1.4 Mixed GC [G1 only]

A GC that collects the entire young gen and part of the old gen. Only G1 has this mode.


5. Expansion

5.1 When is GC triggered?

Simply put, the trigger condition is that the area a GC algorithm manages is full or almost full.

Minor GC (young GC): triggered when the Eden area in the young generation is full. [Note that after a young GC some surviving objects are promoted to the old generation (for example, objects that have survived 15 rounds), so old gen occupancy usually rises afterwards.]
Full GC:
① System.gc() is called manually. [This increases full GC frequency; it is not recommended, let the JVM manage memory by itself. You can set -XX:+DisableExplicitGC to stop explicit calls such as those made by RMI.]
② Space needs to be allocated in perm gen (if there is a permanent generation) but there is not enough.
③ The old generation has insufficient space, for example when large objects or large arrays are promoted from the young generation.
④ A promotion failure (Promotion Failed) occurs during a CMS GC.
⑤ The average size promoted to the old generation by Minor GCs is larger than the remaining space in the old generation.
This one is harder to understand. It is a Full GC that HotSpot triggers to avoid running out of old generation space when young generation objects are promoted.
For example, after the program's first Minor GC, 5 MB of objects are promoted to the old generation, so the average is now 5 MB. Before the next Minor GC, HotSpot first checks whether the remaining old generation space exceeds 5 MB. If it is less than 5 MB, HotSpot triggers a full GC.

5.2 Will the CMS collector scan the young generation?

Yes, the new generation will be scanned during the initial marking.

Although CMS is an old generation collector, objects in the young generation may reference objects in the old generation, so the young generation still has to be scanned as part of the roots.

5.3 What is the space allocation guarantee?

Before a minor GC, the JVM first checks whether the maximum available contiguous space in the old generation is greater than the total size of all objects in the young generation. If so, the minor GC is guaranteed to be safe.

If not, a setting (HandlePromotionFailure) is checked, that is, whether the guarantee is allowed to fail.

If allowed: the JVM goes on to check whether the maximum available contiguous space in the old generation > the average size of previous promotions. For example, if 10 MB are left and about 9 MB are promoted from the young generation each time, a minor GC is attempted (the "greater than" case), although it is risky.

If failure is not allowed, or the space is smaller (the "less than" case), a full GC is triggered. [To avoid frequent full GCs, it is recommended to turn this parameter on.]

Why is it risky? The young generation uses the copying algorithm; if a large volume of objects survives the minor GC and the Survivor area cannot hold them, they go to the old generation. The JVM therefore uses the previous average as the guarantee condition and compares it with the remaining old generation space. This is the allocation guarantee.

This guarantee is a probabilistic estimate. The previous average may have been low, and then one minor GC suddenly has far more surviving objects than the average. The guarantee then fails [Handle Promotion Failure], and a Full GC has to be triggered after the failure.
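
The decision flow described in this section can be written as a small function. This is a sketch of the logic as the text states it, not HotSpot's implementation; `AllocationGuaranteeSketch`, `Decision` and `beforeMinorGc` are hypothetical names, and sizes are in arbitrary units.

```java
final class AllocationGuaranteeSketch {
    enum Decision { MINOR_GC, RISKY_MINOR_GC, FULL_GC }

    static Decision beforeMinorGc(long oldGenMaxContiguousFree, long youngGenUsed,
                                  long avgPromoted, boolean handlePromotionFailure) {
        if (oldGenMaxContiguousFree > youngGenUsed) {
            return Decision.MINOR_GC;        // even if everything survives, it fits
        }
        if (handlePromotionFailure && oldGenMaxContiguousFree > avgPromoted) {
            return Decision.RISKY_MINOR_GC;  // fits on average, but may still fail
        }
        return Decision.FULL_GC;             // guarantee not allowed or not plausible
    }

    public static void main(String[] args) {
        // The example from the text: 10 MB free, about 9 MB promoted each time
        System.out.println(beforeMinorGc(10, 50, 9, true));
    }
}
```

If a risky minor GC turns out to promote more than the old generation can take, the text's final case applies and a Full GC follows anyway.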

5.4 Why does the copying algorithm use two Survivor areas instead of moving objects directly to the old generation?

Moving directly might be more efficient, but objects in the old generation have generally survived several rounds of reachability analysis. The bar for entry is high and the space is limited, so objects cannot simply be moved there; doing so would cause problems (for example, the old generation would fill up quickly).

The two Survivor areas (from/to) exist so that the copying algorithm can run efficiently.

5.5 Commonly used collector parameter settings:

[Figure: table of commonly used collector parameters]

5.6 What is stop the world and is there any way to avoid it?

Simply put, stop the world means that during a GC, all Java threads other than the GC threads are stopped.

Whatever the collector, pauses are hard to avoid completely; even G1 pauses in the initial marking stage. Stop the world is not something to fear; the aim is to keep pause times as short as possible.

5.6 When are young generation objects promoted to the old generation?

Objects are allocated in the Eden area first, and a minor GC is triggered when Eden is full.

Object promotion rules

  1. Long-lived objects enter the old generation. Each time an object survives a GC, its age increases by 1 (the default age threshold is 15 and is configurable).
  2. An object too large for the young generation is allocated directly in the old generation.
  3. Eden is full and, after the minor GC, the objects still alive in Eden and one Survivor area do not fit in the To Survivor area; they are placed in the old generation through the allocation guarantee mechanism. This happens when too many young generation objects survive the minor GC.
  4. Dynamic age determination: to make memory allocation more flexible, the JVM does not always require an object's age to reach MaxTenuringThreshold (15) before promotion. If the total size of objects of the same age in the Survivor area is greater than half of the Survivor space, objects of that age or older are moved to the old generation in the minor GC.

5.6 How to understand G1 and what scenarios it suits

G1 core keywords: Adaptive
G1 GC is a regionalized, parallel-concurrent, incremental garbage collector. Compared with other HotSpot garbage collectors, it can provide more predictable pauses. The incremental feature makes G1 GC suitable for larger heaps, and can still provide a good response in the worst case. The adaptive feature of G1 GC makes the JVM command line only need the maximum value of the soft real-time pause time target and the maximum and minimum Java heap size to start working.

G1 no longer physically separates the memory of the old and young generations, which is a big difference from earlier collectors. The whole heap is divided into regions of 1 MB to 32 MB each; at the largest region size, a target of about 2048 regions corresponds to a heap of about 64 GB. Because of this design, G1 suits machines with large memory.
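
The relationship between the 1 MB to 32 MB region sizes and the figure of about 64 GB can be illustrated with a rough calculation. This is a sketch under an assumption, not G1's actual sizing code: it assumes a target of 2048 regions (64 GB / 32 MB) and a power-of-two region size. `G1RegionSketch` and `regionSize` are hypothetical names.

```java
final class G1RegionSketch {
    static final long MB = 1024L * 1024L;
    static final long TARGET_REGIONS = 2048;   // assumption: 64 GB / 32 MB

    /** Estimated region size in bytes for a given heap size. */
    static long regionSize(long heapBytes) {
        long raw = heapBytes / TARGET_REGIONS;
        long clamped = Math.max(MB, Math.min(32 * MB, raw));   // keep within 1 MB..32 MB
        return Long.highestOneBit(clamped);                     // round down to a power of two
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * MB;   // a 4 GB heap
        System.out.println(regionSize(heap) / MB + " MB regions");
    }
}
```

Under this assumption a 4 GB heap gets 2 MB regions, and any heap of 64 GB or more gets the maximum 32 MB regions. The region size can also be set explicitly with the -XX:G1HeapRegionSize option.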


The heap layout during G1 collection is shown in the following figure:
[Figure: G1 heap regions during collection]

Applicable scenarios:

  1. Like CMS, it runs concurrently with the application; GC pauses are short [short and controllable] and the user experience is good.
  2. Server-side machines with large memory and many CPUs. [Online sources commonly suggest about 6 GB or more.]
  3. Applications that generate a lot of memory fragmentation while running and need compaction [one place where it is better than CMS: G1 compacts].

5.7 Memory allocation

  Broadly speaking, objects are allocated on the heap, mainly in the young generation's Eden Space and From Space; in a few cases they are allocated directly in the old generation. If Eden Space and From Space do not have enough room, a GC is started. If they can hold the object after the GC, it is placed there. During the GC, the surviving objects in Eden Space and From Space are moved to To Space, and then Eden Space and From Space are cleared. If To Space cannot hold an object during this process, the object is moved to the old generation. After the GC, Eden Space and To Space are in use; in the next GC the surviving objects are copied back to From Space, and the cycle repeats. Each time an object survives a GC in a Survivor area, its age increases by 1. By default, an object that reaches age 15 moves to the old generation.

Generally speaking, large objects are allocated directly in the old generation. A large object is one that needs a large amount of contiguous storage space; the most common kind is a large array, such as:

byte[] data = new byte[4 * 1024 * 1024];

This usually allocates storage directly in the old generation.

Of course, the allocation rules are not 100% fixed, it depends on which garbage collector combination and JVM related parameters are currently used.

5.8 Transfer steps in Eden / From / To area

The default ratio of Eden, From Survivor and To Survivor is 8:1:1.
The Eden area is used first. When Eden is full, surviving objects are copied to a Survivor area. There is no guarantee that no more than 10% of objects survive each collection, so if the Survivor area is not large enough, the old generation is relied on to take the overflow (the guarantee mechanism).

When a GC starts, objects exist only in Eden and From Survivor; To Survivor [the reserved space] is empty.

During the GC, all surviving objects in Eden are copied to To Survivor. Surviving objects in From Survivor go to a destination chosen by age: those whose age has reached the threshold (15 by default; age increases by 1 for each garbage collection survived in the young generation) move to the old generation, and the rest are copied to To Survivor.
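
The transfer steps can be simulated with a toy model. This is a simplified sketch: it only handles objects already known to be alive, measures the To Survivor capacity in number of objects rather than bytes, and uses hypothetical names (`MinorGcSketch`, `Obj`, `Result`, `minorGc`).

```java
import java.util.ArrayList;
import java.util.List;

final class MinorGcSketch {
    static final class Obj {
        final String name;
        int age;
        Obj(String name, int age) { this.name = name; this.age = age; }
    }

    /** Outcome of one simulated Minor GC. */
    static final class Result {
        final List<Obj> toSurvivor = new ArrayList<>();
        final List<Obj> promoted = new ArrayList<>();
    }

    static Result minorGc(List<Obj> liveInEden, List<Obj> liveInFrom, int threshold, int toCapacity) {
        Result result = new Result();
        List<Obj> survivors = new ArrayList<>(liveInEden);
        survivors.addAll(liveInFrom);
        for (Obj o : survivors) {
            o.age++;   // the object survived one more collection
            if (o.age >= threshold || result.toSurvivor.size() >= toCapacity) {
                result.promoted.add(o);     // old enough, or To Survivor is full (guarantee)
            } else {
                result.toSurvivor.add(o);   // copied to To Survivor
            }
        }
        return result;   // Eden and From Survivor are now empty
    }

    public static void main(String[] args) {
        Result r = minorGc(List.of(new Obj("a", 0)),
                List.of(new Obj("b", 14), new Obj("c", 3)), 15, 10);
        System.out.println("to survivor: " + r.toSurvivor.size() + ", promoted: " + r.promoted.size());
    }
}
```

In the example, the object that reaches age 15 is promoted while the other two move to To Survivor; after the collection the From and To roles swap for the next cycle.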

Origin blog.csdn.net/weixin_40597409/article/details/115244344