Super-detailed explanation! High-frequency interview points on the Java Virtual Machine (JVM) and garbage collection

Introduction

In the previous article, we covered the JVM structure and common JVM tuning and diagnostic tools. In this article, we focus on JVM garbage collection.

How to determine whether an object is garbage?

Reference Counting Algorithm

  • The reference counting algorithm decides whether an object can be reclaimed by counting the references to it. A counter is maintained for each object: it is incremented by 1 whenever a new reference to the object is created and decremented by 1 whenever a reference is released. When the counter drops to 0, the object is considered garbage.
  • Advantages: simple and efficient
  • Disadvantages:
    • An extra counter must be maintained for every object, which costs memory
    • Circular references cause memory leaks: if two objects reference each other, neither counter ever reaches 0, even after both have become unreachable
  • For these reasons, mainstream Java virtual machines do not use this algorithm.
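To make the circular-reference problem concrete, here is a minimal sketch of a toy reference counter. The classes `RcObject` and `RefCountDemo` are hypothetical and only illustrate the idea; this is not how HotSpot manages objects. Two objects that point at each other keep each other's count at 1 even after every outside reference is gone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy object with a manually maintained reference count.
// Real HotSpot does NOT work this way; this only illustrates the idea.
class RcObject {
    final String name;
    int refCount = 0;                          // number of references pointing at this object
    final List<RcObject> fields = new ArrayList<>();

    RcObject(String name) { this.name = name; }

    // Store a reference to target in one of this object's fields: target's count goes up.
    void addField(RcObject target) {
        fields.add(target);
        target.retain();
    }

    void retain() { refCount++; }

    // Drop one reference. When the count reaches 0 the object is garbage,
    // so it releases the references it holds in turn.
    void release() {
        refCount--;
        if (refCount == 0) {
            for (RcObject f : fields) f.release();
            fields.clear();
        }
    }

    boolean isGarbage() { return refCount == 0; }
}

class RefCountDemo {
    public static void main(String[] args) {
        RcObject a = new RcObject("a");
        RcObject b = new RcObject("b");
        a.retain();            // a local variable references a
        b.retain();            // a local variable references b
        a.addField(b);         // a.field -> b
        b.addField(a);         // b.field -> a (a cycle)

        a.release();           // the local variables go out of scope
        b.release();

        // Both objects are now unreachable, yet each count is still 1 because of the cycle.
        System.out.println(a.refCount + " " + b.refCount); // 1 1
    }
}
```

Without a cycle, a single `retain()` followed by `release()` brings the count to 0 and the object is treated as garbage; the cycle is what defeats the scheme.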

Reachability Analysis Algorithm

  • The basic idea of this algorithm is to start from a set of root objects called "GC Roots" and search downward along references. The path followed during the search is called a reference chain. When no reference chain connects an object to any GC Root, the object is unreachable and can no longer be used. As shown below.

    [Figure: objects linked to GC Roots by reference chains, and unreachable objects]

    In the Java language, objects that can be used as GC Roots include the following:

  • Objects referenced from the virtual machine stack (the local variable table in each stack frame).

  • Objects referenced by static class fields in the method area.

  • Objects referenced by constants in the method area.

  • Objects referenced by JNI (that is, native methods) in the native method stack.
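The search from GC Roots can be sketched as a plain graph traversal. This is a minimal sketch, assuming the heap is modeled as hypothetical `Node` objects whose `refs` list stands in for their reference fields, and that the caller passes in the objects the GC Roots point at. Note how an unreachable cycle, which defeats reference counting, is simply never visited.

```java
import java.util.*;

// Hypothetical heap node; "refs" are the references held in its fields.
class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class Reachability {
    // Everything reachable from the GC Roots along reference chains is live;
    // everything else is garbage.
    static Set<Node> markFrom(Collection<Node> gcRoots) {
        Set<Node> live = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>(gcRoots);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (live.add(n)) {                 // first visit: mark it and follow its references
                for (Node r : n.refs) stack.push(r);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Node root = new Node("root");          // e.g. referenced from a stack frame
        Node o1 = new Node("o1"), o2 = new Node("o2");
        Node o3 = new Node("o3"), o4 = new Node("o4");
        root.refs.add(o1);
        o1.refs.add(o2);
        o3.refs.add(o4);                       // o3 <-> o4 form a cycle,
        o4.refs.add(o3);                       // but no GC Root reaches them

        Set<Node> live = markFrom(List.of(root));
        for (Node n : List.of(o1, o2, o3, o4)) {
            System.out.println(n.name + (live.contains(n) ? " live" : " garbage"));
        }
    }
}
```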

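The JDK (HotSpot) uses the reachability analysis algorithm. An object found to be unreachable is not reclaimed right away: it must be marked at least twice. After the first marking, the JVM checks whether the object's finalize() method still needs to run; only if the object is still unreachable when it is marked the second time is it actually reclaimed.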

Java's four reference types

Strong reference:

References like Object obj = new Object();. As long as a strong reference still exists, the garbage collector will never reclaim the referenced object.

Soft reference:

The SoftReference class implements soft references. Before the system would otherwise throw an OutOfMemoryError, objects reachable only through soft references are added to the collection scope and a second round of collection is performed. If that collection still does not free enough memory, the OutOfMemoryError is thrown. Soft references can be used to implement memory-sensitive caches.
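A memory-sensitive cache of the kind mentioned above can be sketched with `java.lang.ref.SoftReference`. The class `SoftCache` and its `get(key, loader)` method are hypothetical names for this sketch, not a JDK API; the idea is only that cached values are held through soft references, so the GC may clear them under memory pressure and the cache then reloads them.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical memory-sensitive cache: values are held only through SoftReferences,
// so the GC may clear them before it would throw OutOfMemoryError.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    V get(K key, Function<K, V> loader) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();   // null if never cached or already cleared
        if (value == null) {
            value = loader.apply(key);                 // (re)load on a miss
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}

class SoftCacheDemo {
    public static void main(String[] args) {
        SoftCache<String, byte[]> cache = new SoftCache<>();
        byte[] page = cache.get("page-1", k -> new byte[1024 * 1024]);   // loaded
        byte[] again = cache.get("page-1", k -> new byte[1024 * 1024]);  // served from the cache
        System.out.println(page == again); // true: page is still strongly reachable
    }
}
```

A production cache would also remove map entries whose references have been cleared; this sketch leaves them in place for brevity.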

Weak reference:

The WeakReference class implements weak references. Weakly referenced objects survive only until the next garbage collection: when the collector runs, objects reachable only through weak references are reclaimed regardless of whether memory is sufficient.
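The behavior of `java.lang.ref.WeakReference` can be sketched as follows. `WeakDemo` is a hypothetical class name. Because `System.gc()` is only a request, the sketch does not promise what the second `get()` prints; the first comparison is certain, because the object is still strongly reachable at that point.

```java
import java.lang.ref.WeakReference;

class WeakDemo {
    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);

        // While a strong reference exists, the weak reference still returns the object.
        System.out.println(weak.get() == strong); // true

        strong = null;    // now the object is only weakly reachable
        System.gc();      // a request only; the JVM may ignore it
        // Usually null once a collection has run, but not guaranteed at this exact point.
        System.out.println(weak.get());
    }
}
```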

Phantom reference:

The PhantomReference class implements phantom references. A phantom reference cannot be used to obtain the object instance (its get() method always returns null). The only purpose of associating a phantom reference with an object is to receive a notification when the collector reclaims that object.
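The "notification" works through a `java.lang.ref.ReferenceQueue`: after the object is collected, the GC enqueues the phantom reference. A minimal sketch follows (`PhantomDemo` is a hypothetical class name). Since `System.gc()` is only a request, the sketch waits at most one second and reports whichever outcome occurred rather than assuming one.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object obj = new Object();
        PhantomReference<Object> phantom = new PhantomReference<>(obj, queue);

        // get() on a phantom reference always returns null, even while obj is alive.
        System.out.println(phantom.get()); // null

        obj = null;
        System.gc();  // a request; collection is not guaranteed
        // When the object is reclaimed, the GC enqueues the phantom reference.
        // Wait up to 1 second for that notification.
        Reference<?> notified = queue.remove(1000);
        System.out.println(notified == phantom ? "reclaimed" : "not yet reclaimed");
    }
}
```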

Garbage Collection Algorithms

Copying algorithm (Copying)

  • Divide the memory into two halves and use only one at a time. When the half in use is full, the surviving objects are copied to the other half, and the full half is then cleared in one go. As shown in the figure below:

[Figure: the copying algorithm]

  • Advantages:
    • Simple and efficient when the object survival rate is low
    • No memory fragmentation
  • Disadvantages:
    • Half of the memory is wasted. To avoid wasting half the space, extra memory is needed as an allocation guarantee for the extreme case where 100% of the objects in the used half survive. For this reason, the algorithm cannot be used directly in the old generation.
    • When the object survival rate is relatively high, a large number of objects must be copied and their references updated, which is inefficient.
  • The HotSpot JVM divides the young generation into three parts: one Eden space and two Survivor spaces (called from and to). The default ratio is 8:1:1. Normally, newly created objects are allocated in Eden. Because most objects in the young generation die young (typically more than 90%), the young generation is collected with the copying algorithm. To sum up:
    • The object survival rate is low
    • Collections are frequent, so efficiency matters
    • The young generation also refines the copying algorithm:
      • eden:from:to = 8:1:1
      • Only 10% of the young generation's memory is left unused
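The copy-and-swap cycle described above can be sketched as a toy semi-space collector. This is a simplified model, assuming objects are hypothetical `Obj` instances and the two "halves" are plain lists; it uses two equal halves rather than HotSpot's Eden plus two Survivor spaces. It shows the key points: only reachable objects are copied, they land contiguously in the empty half, references are rewritten to the new copies, and the used half is freed all at once.

```java
import java.util.*;

// Hypothetical object model: a name plus the references held in its fields.
class Obj {
    final String name;
    final List<Obj> refs = new ArrayList<>();
    Obj(String name) { this.name = name; }
}

class CopyingHeap {
    List<Obj> from = new ArrayList<>();   // the half currently in use
    List<Obj> to = new ArrayList<>();     // the empty half

    Obj allocate(String name) {
        Obj o = new Obj(name);
        from.add(o);
        return o;
    }

    // Copy every object reachable from the roots into "to", then swap the halves.
    void collect(List<Obj> roots) {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>();   // old copy -> new copy
        Deque<Obj> pending = new ArrayDeque<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), forwarding, pending));
        }
        while (!pending.isEmpty()) {
            Obj copied = pending.pop();
            // Rewrite each field so it points at the new copy of its target.
            for (int i = 0; i < copied.refs.size(); i++) {
                copied.refs.set(i, copy(copied.refs.get(i), forwarding, pending));
            }
        }
        from.clear();                            // the whole used half is freed at once
        List<Obj> tmp = from; from = to; to = tmp;
    }

    private Obj copy(Obj old, Map<Obj, Obj> forwarding, Deque<Obj> pending) {
        Obj moved = forwarding.get(old);
        if (moved == null) {                     // first time we see it: copy it
            moved = new Obj(old.name);
            moved.refs.addAll(old.refs);         // still points at old copies; fixed later
            forwarding.put(old, moved);
            to.add(moved);                       // survivors land contiguously in "to"
            pending.push(moved);
        }
        return moved;
    }
}

class CopyingDemo {
    public static void main(String[] args) {
        CopyingHeap heap = new CopyingHeap();
        Obj a = heap.allocate("a");
        Obj b = heap.allocate("b");
        heap.allocate("temp1");                  // garbage: nothing references it
        heap.allocate("temp2");
        a.refs.add(b);
        List<Obj> roots = new ArrayList<>(List.of(a));
        heap.collect(roots);
        System.out.println(heap.from.size());    // 2: only a and b were copied
    }
}
```

The cost of `collect` grows with the number of survivors, not with the size of the half being freed, which is why this approach suits the young generation's low survival rate.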

Mark-Sweep

The mark-sweep algorithm is the most basic of the GC algorithms, because the later collection algorithms all build on its idea and improve on its shortcomings. As the name suggests, the algorithm has two phases:

  • Mark: traverse the objects and mark the garbage objects.
  • Sweep: iterate over all objects again and reclaim the garbage objects.

    [Figure: the mark-sweep algorithm]

  • Advantages:
    • Simple; objects never need to be moved
    • Does not waste half of the memory
  • Disadvantages:
    • Produces a lot of memory fragmentation
    • Two traversals are inefficient
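A toy heap makes the fragmentation problem visible. This is a minimal sketch, assuming a hypothetical `MarkSweepHeap` whose memory is an array of slots and whose references are an index map. It marks the live objects and sweeps everything unmarked, which is equivalent to marking the garbage. Afterwards there is enough free memory in total, but it is scattered in one-slot holes.

```java
import java.util.*;

// Hypothetical toy heap: each slot holds an object name, or null if free.
// References between objects are modeled as a map from slot index to slot indexes.
class MarkSweepHeap {
    final String[] slots;
    final Map<Integer, List<Integer>> refs = new HashMap<>();

    MarkSweepHeap(int size) { slots = new String[size]; }

    void collect(List<Integer> roots) {
        // Phase 1 (mark): traverse from the roots and mark every reachable slot.
        boolean[] marked = new boolean[slots.length];
        Deque<Integer> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            int i = stack.pop();
            if (!marked[i]) {
                marked[i] = true;
                for (int j : refs.getOrDefault(i, List.of())) stack.push(j);
            }
        }
        // Phase 2 (sweep): walk the whole heap and free every unmarked object in place.
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null && !marked[i]) {
                slots[i] = null;
                refs.remove(i);
            }
        }
    }

    // Length of the longest run of contiguous free slots.
    int largestFreeBlock() {
        int best = 0, run = 0;
        for (String s : slots) {
            run = (s == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        MarkSweepHeap heap = new MarkSweepHeap(6);
        for (int i = 0; i < 6; i++) heap.slots[i] = "obj" + i;
        heap.refs.put(0, List.of(2));      // obj0 -> obj2
        heap.refs.put(2, List.of(4));      // obj2 -> obj4
        heap.collect(List.of(0));          // obj0 is referenced by a GC Root
        // 3 slots are free, but the largest contiguous free block is only 1 slot.
        System.out.println(Arrays.toString(heap.slots) + " largest free block = " + heap.largestFreeBlock());
    }
}
```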

Mark-Compact Algorithm (Mark-Compact)

The mark-compact algorithm is an improved version of mark-sweep. Its marking phase is the same: it marks every object as live or dead. The difference is in the second phase: instead of cleaning up dead objects in place, it moves all live objects toward one end of the memory region and then clears everything beyond that boundary in one go.

  • Advantages: mark-compact avoids both the scattered free memory left by mark-sweep and the high cost of halving the memory in the copying algorithm.
    • Does not waste half of the memory
    • No memory fragmentation
  • Disadvantages: if many objects survive, the compaction phase has to move many of them, which reduces efficiency.
    • Two traversals are inefficient
    • When the object survival rate is relatively high, many objects must be moved and their references updated
  • Old-generation collectors generally use mark-sweep, or a mix of mark-sweep and mark-compact.
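The compaction phase can be sketched on a toy array heap. This is a simplified model, assuming a hypothetical `MarkCompactHeap` where each object holds at most one reference and the `live` array comes from a mark phase (not shown). It follows the usual three steps: compute forwarding addresses, update references, then slide live objects toward the start so all free space forms one block at the end.

```java
import java.util.Arrays;

// Hypothetical toy heap: slots[i] is an object name (null = free) and
// ref[i] is the slot that object points to (-1 = no reference).
class MarkCompactHeap {
    final String[] slots;
    final int[] ref;

    MarkCompactHeap(int size) {
        slots = new String[size];
        ref = new int[size];
        Arrays.fill(ref, -1);
    }

    // live[] is assumed to come from a mark phase; returns where free space starts.
    int compact(boolean[] live) {
        int n = slots.length;
        int[] forward = new int[n];          // forwarding address: old slot -> new slot
        int dest = 0;
        // Pass 1: compute where each live object will move to.
        for (int i = 0; i < n; i++) {
            if (live[i]) forward[i] = dest++;
        }
        // Pass 2: rewrite references so they point at the new addresses.
        for (int i = 0; i < n; i++) {
            if (live[i] && ref[i] >= 0) ref[i] = forward[ref[i]];
        }
        // Pass 3: slide live objects toward the start, preserving their order.
        for (int i = 0; i < n; i++) {
            if (live[i]) {
                slots[forward[i]] = slots[i];
                ref[forward[i]] = ref[i];
            }
        }
        // Everything past the boundary is now one contiguous free block.
        Arrays.fill(slots, dest, n, null);
        Arrays.fill(ref, dest, n, -1);
        return dest;
    }

    public static void main(String[] args) {
        MarkCompactHeap heap = new MarkCompactHeap(6);
        String[] names = {"a", "b", "c", "d", "e", "f"};
        System.arraycopy(names, 0, heap.slots, 0, 6);
        heap.ref[1] = 5;                                       // b -> f
        boolean[] live = {false, true, false, true, false, true};
        int freeStart = heap.compact(live);
        System.out.println(Arrays.toString(heap.slots) + " free from slot " + freeStart);
    }
}
```

Passes 2 and 3 are the "update references" and "move objects" work that the disadvantages above refer to: both grow with the number of survivors.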

Generational Collection Algorithm (Generational-Collection)

The generational collection algorithm is really a combination of the copying algorithm and the mark-based algorithms rather than a new algorithm. The heap is generally divided into the old generation (Old Generation) and the young generation (Young Generation). In the old generation, only a small amount of garbage needs to be reclaimed at each collection, while in the young generation most of the space can be reclaimed, so each generation uses the collection algorithm that suits it best.

  • Time efficiency: copying > mark-sweep > mark-compact (this is only a rough comparison of time complexity; real-world performance may differ).
  • Memory tidiness: copying = mark-compact > mark-sweep.
  • Memory utilization: mark-compact = mark-sweep > copying.

Garbage Collectors

STW (Stop The World) is the pause during which all application threads are stopped while the collector works. Optimizations of garbage collection algorithms and collectors all pursue low pause times and high throughput, two terms that come up repeatedly below.

Serial/Serial Old Collector

The Serial collector is the oldest collector, and a stable and efficient one, but it can cause long pauses because it collects with only a single thread. Both the young and old generations are collected serially: the young generation (Serial) uses the copying algorithm and the old generation (Serial Old) uses mark-compact. Application threads are stopped (STW) during garbage collection.

  • single thread
  • serial


ParNew/Parallel Scavenge Collectors

The Parallel Scavenge collector is similar to the ParNew collector, but it focuses more on system throughput. An adaptive sizing policy can be enabled with a parameter (-XX:+UseAdaptiveSizePolicy); the virtual machine then collects performance monitoring data based on how the system is currently running and dynamically adjusts the relevant parameters to achieve the most suitable pause time or the maximum throughput. Parameters can also cap each GC pause at a given number of milliseconds (-XX:MaxGCPauseMillis) or limit the share of time spent in GC (-XX:GCTimeRatio). The young generation uses the copying algorithm, and the old generation (Parallel Old) uses mark-compact.


  • ParNew is a multi-threaded version of Serial
  • ParNew is generally used together with CMS (explained below)
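"Throughput" here means the share of time spent running user code: user time / (user time + GC time). As a worked example, 99 seconds of application work plus 1 second of GC gives 99 / 100 = 99% throughput. The sketch below (the class `Throughput` is a hypothetical name) computes that ratio and the GC-time target implied by -XX:GCTimeRatio=n, which the HotSpot tuning documentation describes as 1 / (1 + n) of total time.

```java
class Throughput {
    // Throughput = time running user code / (time running user code + GC time).
    static double throughput(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    // With -XX:GCTimeRatio=n, the target share of total time spent in GC is 1 / (1 + n).
    static double maxGcShare(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println(throughput(99_000, 1_000)); // 0.99: 99 s of work, 1 s of GC
        System.out.println(maxGcShare(99));            // 0.01: at most 1% of time in GC
    }
}
```

A throughput-oriented collector such as Parallel Scavenge is tuned to maximize the first number, while pause-oriented collectors (CMS, G1) accept a lower value in exchange for shorter individual pauses.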

CMS collector

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause. Today a large share of Java applications run on the server side of Internet websites or B/S systems. Such applications care especially about service response time and want pauses to be as short as possible, to give users a better experience.

  • An old-generation collector
  • Works in four phases:
    • Initial mark:
      • Causes a (short) STW pause
      • Marks only the objects that GC Roots can reach directly
    • Concurrent mark:
      • Runs at the same time as user threads; no STW
      • Traces the rest of the object graph, starting from the objects marked in the previous phase
    • Remark:
      • Causes an STW pause
      • Corrects the marks of objects whose references changed while concurrent marking was running
    • Concurrent sweep:
      • No STW
      • Clears the garbage objects
  • Uses the mark-sweep algorithm
  • Advantages:
    • Low pause times
    • The longest phases (concurrent mark and concurrent sweep) run alongside user threads
  • Disadvantages:
    • Prone to memory fragmentation
      • Solution: a JVM parameter can make CMS perform a compacting (mark-compact) collection after a given number of garbage collections
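Why remark is needed is easiest to see in a toy model. This is a simplified sketch, not CMS's real implementation: `ToyCms` and `CmsObj` are hypothetical, objects allocated during marking are not modeled, and it assumes an incremental-update style write barrier (the approach CMS takes) that records objects whose fields change during concurrent marking. In the scenario, a user thread moves a reference to `c` into an object that was already scanned, so concurrent marking misses `c`; the remark pause rescans the recorded objects and saves it.

```java
import java.util.*;

// Hypothetical heap object.
class CmsObj {
    final String name;
    final List<CmsObj> refs = new ArrayList<>();
    CmsObj(String name) { this.name = name; }
}

class ToyCms {
    final Set<CmsObj> heap = new HashSet<>();
    final Set<CmsObj> marked = new HashSet<>();
    final Deque<CmsObj> worklist = new ArrayDeque<>();
    final Set<CmsObj> dirty = new HashSet<>();   // objects modified during concurrent marking
    boolean concurrentMarking = false;

    CmsObj allocate(String name) { CmsObj o = new CmsObj(name); heap.add(o); return o; }

    // User-thread reference writes go through this "write barrier".
    void addRef(CmsObj holder, CmsObj target) {
        holder.refs.add(target);
        if (concurrentMarking) dirty.add(holder);
    }
    void removeRef(CmsObj holder, CmsObj target) {
        holder.refs.remove(target);
        if (concurrentMarking) dirty.add(holder);
    }

    // Phase 1, initial mark (STW): mark only objects referenced directly by GC Roots.
    void initialMark(List<CmsObj> rootReferents) {
        for (CmsObj o : rootReferents) if (marked.add(o)) worklist.push(o);
        concurrentMarking = true;
    }

    // Phase 2, concurrent mark: scan one object per call so user-thread writes
    // can be interleaved between steps. Returns false when nothing is left.
    boolean concurrentMarkStep() {
        if (worklist.isEmpty()) return false;
        scan(worklist.pop());
        return true;
    }

    // Phase 3, remark (STW): rescan every object modified while marking ran.
    void remark() {
        concurrentMarking = false;
        for (CmsObj o : dirty) worklist.push(o);
        dirty.clear();
        while (!worklist.isEmpty()) scan(worklist.pop());
    }

    // Phase 4, concurrent sweep: free all unmarked objects in place (no compaction).
    void concurrentSweep() {
        heap.retainAll(marked);
        marked.clear();
    }

    private void scan(CmsObj o) {
        for (CmsObj child : o.refs) if (marked.add(child)) worklist.push(child);
    }
}

class CmsDemo {
    public static void main(String[] args) {
        ToyCms cms = new ToyCms();
        CmsObj a = cms.allocate("a"), b = cms.allocate("b");
        CmsObj c = cms.allocate("c"), junk = cms.allocate("junk");
        a.refs.add(b);
        b.refs.add(c);                         // GC Root -> a -> b -> c; junk is unreachable
        cms.initialMark(List.of(a));           // STW: mark a only
        cms.concurrentMarkStep();              // scan a, mark b
        cms.addRef(a, c);                      // user thread: a -> c, but a was already scanned
        cms.removeRef(b, c);                   // user thread: drop b -> c
        while (cms.concurrentMarkStep()) { }   // finish concurrent marking
        System.out.println("c marked before remark: " + cms.marked.contains(c)); // false
        cms.remark();                          // STW: rescan dirty objects, c gets marked
        cms.concurrentSweep();
        System.out.println("c survives: " + cms.heap.contains(c)
                + ", junk survives: " + cms.heap.contains(junk)); // true, false
    }
}
```

Note that the sweep only removes objects from the set; nothing is moved, which is the source of CMS's fragmentation problem.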


G1 collector

G1 is one of the most advanced results of collector technology to date. The HotSpot development team intended it to eventually replace the CMS collector released in JDK 1.5. Compared with CMS, the G1 collector has the following characteristics:

  • Space integration: overall, G1 is based on the mark-compact algorithm (and locally, between two Regions, on copying), so it does not produce memory fragmentation. As a result, allocating a large object will not trigger an early GC just because no contiguous free space can be found.
  • Predictable pauses are another major advantage of G1. Reducing pause time is a goal shared by G1 and CMS, but beyond low pauses, G1 can also build a predictable pause-time model that lets users explicitly specify that, within a time slice of M milliseconds, no more than N milliseconds should be spent on garbage collection. This is almost a characteristic of real-time Java (RTSJ) garbage collectors.

The collectors described above collect the entire young generation or the entire old generation, but G1 does not. With G1, the Java heap layout is very different from other collectors: the whole heap is divided into multiple independent regions (Regions) of equal size. G1 still keeps the concepts of the young and old generations, but they are no longer physically separated; each is a collection of Regions (which need not be contiguous).

To summarize the features:

  • A single collector for both the young and old generations
  • There is no contiguous young or old generation; the heap is divided into separate, equal-sized Regions
  • A priority list records how much garbage each Region holds, and the Regions with the most garbage are collected first
  • Mark-compact overall (copying between Regions)
  • Predictable pause times
  • No memory fragmentation
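The "garbage first" idea can be sketched as a greedy choice under a pause budget. This is a deliberately simplified model: the `Region` class, its per-region cost estimates, and `chooseCollectionSet` are hypothetical, and real G1 uses much richer prediction. The pause target plays the role of the user-specified limit (in HotSpot, -XX:MaxGCPauseMillis). Regions are taken in order of most garbage, skipping any that would exceed the remaining budget.

```java
import java.util.*;

// Hypothetical region descriptor: reclaimable garbage and an estimated cost to collect it.
final class Region {
    private final int id;
    private final int garbageKb;
    private final double estCostMs;

    Region(int id, int garbageKb, double estCostMs) {
        this.id = id;
        this.garbageKb = garbageKb;
        this.estCostMs = estCostMs;
    }

    int id() { return id; }
    int garbageKb() { return garbageKb; }
    double estCostMs() { return estCostMs; }
}

class GarbageFirst {
    // Pick the regions with the most garbage first, while the estimated total
    // cost stays within the pause-time target.
    static List<Region> chooseCollectionSet(List<Region> regions, double pauseTargetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingInt(Region::garbageKb).reversed());
        List<Region> chosen = new ArrayList<>();
        double budget = pauseTargetMs;
        for (Region r : sorted) {
            if (r.estCostMs() <= budget) {     // fits in the remaining pause budget
                chosen.add(r);
                budget -= r.estCostMs();
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region(0, 100, 4.0), new Region(1, 900, 6.0),
                new Region(2, 500, 3.0), new Region(3, 50, 1.0));
        for (Region r : chooseCollectionSet(regions, 10.0)) {
            System.out.println("collect region " + r.id());   // regions 1, 2, 3
        }
    }
}
```

With a 10 ms target, region 0 is skipped because after regions 1 and 2 only 1 ms of budget remains, which is exactly enough for the cheaper region 3.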

Comparison of Garbage Collectors

[Figure: which young-generation and old-generation collectors can be combined]

If two collectors are connected by a line in the figure, they can be used together. The area a collector sits in shows whether it is a young-generation or an old-generation collector. Collector selection strategy:

  • Client program: Serial + Serial Old;
  • Server programs with throughput priority (for example: computationally intensive): Parallel Scavenge + Parallel Old;
  • Server programs where response time comes first: ParNew + CMS.
  • The G1 collector is based on the mark-compact algorithm, so it does not produce space fragmentation, and it can control pauses precisely. It divides the heap into multiple independent, fixed-size regions, tracks how much garbage accumulates in each, and maintains a priority list in the background; within the allowed collection time, each collection reclaims the regions with the most garbage first (hence the name Garbage First).
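To check which collectors a running JVM actually uses, the standard `java.lang.management` API reports one `GarbageCollectorMXBean` per collector. The combinations above are normally selected with HotSpot flags such as -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC (ParNew + CMS; deprecated in JDK 9 and removed in JDK 14) and -XX:+UseG1GC. The reported bean names vary by collector and JDK version, so this sketch (the class `WhichGc` is a hypothetical name) simply prints whatever the JVM reports.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class WhichGc {
    // Names of the collectors the running JVM reports, e.g. "G1 Young Generation".
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the accumulated collection time in milliseconds (-1 if unavailable).
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Running it, for example, as `java -XX:+UseSerialGC WhichGc` and then with `-XX:+UseG1GC` should show different collector names for the two runs.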

Image from:



Origin blog.csdn.net/wdj_yyds/article/details/132323857