Understanding the Java Virtual Machine - JVM

Table of contents

1. Getting to know JVM first

2. JVM execution process

3. Memory area division (JVM runtime data area)

3.1 Local method stack (thread private)

3.2 Program counter (thread private, no concurrency issues)

3.3 JVM virtual machine stack (thread private)

3.4 Heap (thread sharing)

3.5 Metadata area

4. Class loading

4.1 The process of class loading

4.2 Timing of class loading

4.2.1 How to understand that class loading is a recursive process

4.2.2 How does the JVM limit the circular dependencies generated during recursive loading?

4.3 Parental delegation model

4.3.1 What is the Parental Delegation Model?

4.3.2 How do the above class loaders work together?

5. Garbage collection mechanism

5.1 What is GC?

5.2 STW problem (stop the world)

5.3 Introduce ZGC to solve STW problem

5.4 What is recovered by GC?

5.5 GC workflow

5.5.1 Finding Garbage/Determining Garbage

5.5.2 How to clean up garbage and release objects


1. Getting to know JVM first

JVM stands for Java Virtual Machine. It is a virtual computer that can run Java bytecode files on different platforms.

2. JVM execution process


Before a program is executed, the Java source code must be compiled into bytecode (a .class file). The JVM first loads that bytecode into the runtime data area (Runtime Data Area) through a class loader (ClassLoader). Because bytecode is only the JVM's own instruction-set specification, it cannot be handed directly to the underlying operating system for execution; an execution engine (Execution Engine) is needed to translate the bytecode into native system instructions, which are then executed by the CPU. During this process the JVM may also call functions written in other languages through the native interface (Native Interface) to complete the functionality of the whole program. These are the responsibilities of the four main components: the class loader, the runtime data area, the execution engine, and the native interface.

3. Memory area division (JVM runtime data area)

The JVM runtime data area is also called the memory layout. Note that it is completely different from the Java Memory Model (JMM); they are two separate concepts. The JVM itself can be understood as an ordinary application: when it starts, it requests memory from the operating system and then divides that space into several regions according to its needs:

3.1 Local method stack (thread private)

Native here refers to code implemented inside the JVM itself (typically in C++), so the native method stack is the stack space prepared for calls to native methods (methods implemented inside the JVM rather than in Java bytecode).

3.2 Program counter (thread private, no concurrency issues)

In the JVM, the program counter (Program Counter) is a small memory area. It can be regarded as the line-number indicator of the bytecode being executed by the current thread, or as a slot holding the address of the next instruction; it records which instruction the current thread is executing.

The program counter is saved when a thread is switched out, so that its execution can be resumed later. When a thread executes a Java method, the program counter records the address of the bytecode instruction being executed; if a native method is being executed, the counter value is empty (undefined).

The program counter is the only memory area in the JVM where OOM (OutOfMemoryError, insufficient memory) does not occur.

3.3 JVM virtual machine stack (thread private)

The virtual machine stack is the memory area in the JVM used to store stack frames during method execution: each thread creates its own virtual machine stack to hold the methods it executes. When a method is called, the virtual machine creates a new stack frame for that method and pushes it onto the virtual machine stack.

Since the virtual machine stack is thread-private, there is usually more than one virtual machine stack in a running JVM. We can inspect the threads inside a Java process with jconsole:

It should be noted that the stack here also follows the principle of "first in, last out".

When a method is called, the virtual machine creates a new stack frame for it and pushes it onto the top of the current thread's virtual machine stack. The frame on top of the stack is the one that was entered last and will finish first, so pushing new frames onto the top guarantees the correct calling order: the method entered last returns first, and the method entered first returns last. If a new stack frame were pushed to the bottom of the stack, it would break this first-in-last-out rule and cause program errors.

Each stack frame is used to store the following information: local variable table, operand stack, dynamic link, method exit and other information.

  • Local variable table: The local variable table in the JVM is a tabular data structure used to store local variables during method execution. It is an important part of the JVM virtual machine stack and is used to temporarily store method parameters and local variables defined inside the method, including basic data types (such as int, long, float, double, etc.), object references, and returnAddress types.
  • Operand stack: The operand stack (Operand Stack) is a last-in-first-out (LIFO) data structure used to hold temporary values and intermediate results while a method executes (a bytecode sketch after this list shows the local variable table and the operand stack in action).
  • Dynamic linking: Dynamic linking defers the linking of program code and library functions until run time, so that libraries can be loaded and linked into the program on demand while it is running, enabling code sharing and dynamic loading.
  • Return address: indicates which instruction execution should return to after the method finishes.
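As an illustration of the local variable table and operand stack, here is a trivial method together with the bytecode a typical compiler produces for it (a sketch; the exact output of javap -c can vary by compiler and JDK version):

public class FrameDemo {
    // The frame for add() contains a local variable table (slots for this, a, b, c)
    // and a small operand stack used to perform the addition.
    int add(int a, int b) {
        int c = a + b;
        return c;
    }
    /* Typical javap -c output for add(int, int):
       iload_1   // push local variable 'a' onto the operand stack
       iload_2   // push local variable 'b' onto the operand stack
       iadd      // pop two ints, push their sum
       istore_3  // pop the sum into local variable 'c'
       iload_3   // push 'c' back onto the operand stack
       ireturn   // return the int on top of the operand stack
    */
}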

What are the advantages of dynamic linking?

  1. Smaller executable programs: with static linking, all library code is linked into the executable, which makes it very large; with dynamic linking, the library is not copied into the executable but is loaded dynamically at run time, which keeps the executable small.

  2. Lower memory usage: a dynamic link library can be shared among multiple processes, whereas a static link library is loaded separately by each process, increasing memory usage. Dynamic linking therefore reduces memory usage.

  3. Easier upgrades and maintenance: with dynamic linking, a library can be upgraded and maintained independently of the application, without recompiling the entire program.

Dynamic linking is also used in the JVM. Java programs dynamically load class files and link them at runtime, instead of linking class files into executable programs at compile time. This reduces the size of the application and facilitates upgrades and maintenance.

The following is a simplified diagram of the virtual machine stack:

Explanation: stack memory follows the method: calling a method creates a stack frame, and when the method finishes, that stack frame is destroyed.

It should be noted that the stack frames in the virtual machine stack are laid out one after another, and the stack space has an upper limit. Parameters can be set when the JVM starts, and one of them controls the size of the stack space.

The size of a thread's stack space can be set with the JVM parameter -Xss. Since tuning the stack size is uncommon in everyday business scenarios, it is not covered in detail here.
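As a minimal sketch of how the limited stack behaves, the following program recurses until the thread's stack is exhausted and a StackOverflowError is thrown. Running it with a smaller stack, for example java -Xss256k StackDepthDemo, makes the overflow happen sooner (the class name is illustrative, and the exact depth varies by JVM and platform):

public class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes one more stack frame until the thread's stack space runs out.
    private static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Reached when no more stack frames can be pushed onto the virtual machine stack.
            System.out.println("Stack overflowed at depth: " + depth);
        }
    }
}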

3.4 Heap (thread sharing)

The role of the heap: it stores object instances and arrays and is the largest memory area in the JVM. Heap memory is divided into the new (young) generation and the old generation; by default the new generation occupies 1/3 of the heap and the old generation 2/3.

ps: all objects created by the program are stored on the heap, so an object's member variables are stored on the heap as well (inside the object).

The heap is divided into the new generation and the old generation. The new generation holds newly created objects; objects that survive a certain number of GC cycles are promoted to the old generation. The new generation itself contains three areas: one Eden area plus two Survivor areas, S0 and S1.
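As a small sketch, heap sizing can be observed from inside a program with the standard Runtime API (the 1/3 : 2/3 split mentioned above is only the common default; it can be adjusted with flags such as -Xms, -Xmx, -Xmn and -XX:NewRatio):

public class HeapInfoDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the limit the heap may grow to (roughly what -Xmx sets)
        // totalMemory(): memory currently reserved for the heap
        // freeMemory(): unused portion of the currently reserved heap
        System.out.println("max   = " + rt.maxMemory() / (1024 * 1024) + " MB");
        System.out.println("total = " + rt.totalMemory() / (1024 * 1024) + " MB");
        System.out.println("free  = " + rt.freeMemory() / (1024 * 1024) + " MB");
    }
}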

3.5 Metadata area

In the JVM, the metadata area is the memory region used to store class information. The metadata of a Java class includes the class name, method names, field names, access modifiers, static member variables, and so on; the JVM treats all of this information as metadata.

What information will be stored in the metadata area?

The following data is stored in the JVM's metadata area:

  1. Type information: JVM will store the type information of each class in metadata, including class name, access flag, parent class information, interface information, etc.

  2. Field information: JVM will store field information of each class in metadata, including field name, access flag, type information, etc.

  3. Method information: JVM will store method information of each class in metadata, including method name, access flag, return value type, parameter type, etc.

  4. String constants: In Java programs, string constants are automatically stored in the constant pool, and the data in the constant pool is stored in the metadata of the JVM.

In addition, Java also has some built-in types and constants, such as numbers 0, 1, etc., which will also be stored in the metadata of the JVM.

Note that :

Primitive constants and string constants declared final are not necessarily stored in the JVM's metadata area; it depends on the specific JVM implementation and its optimization strategy. In some cases, if a final variable's value is a compile-time constant that the program uses frequently, the JVM stores it in the metadata area for quick access, which is a common optimization.

However, this is not the behavior mandated by the Java Language Specification.

Don't confuse constants modified by final with string constants.

A string instance declared final is immutable: its value cannot be modified after initialization, so it can be regarded as a string constant. Whether it is stored in the JVM's metadata area, however, depends on the implementation and its optimization strategy. Generally, if the string is used frequently, the JVM puts it into the constant pool or metadata area to improve access efficiency; if it is not used frequently, the JVM may simply create a string object on the heap instead.

extension:

Before JDK 8, the metadata area was allocated in the permanent generation (Permanent Generation). Because the permanent generation has a fixed size, storing too much metadata could overflow it. Starting with JDK 8, the metadata area was therefore moved out of the heap into native memory, in an area called the Metaspace.

The Metaspace no longer has a fixed size limit like the permanent generation; by default it stores metadata in native memory, so it can grow and shrink dynamically. Its maximum size can also be configured through JVM parameters.

Note that when using the Metaspace we need to watch out for metadata leaks: because the Metaspace has no built-in size limit, a leak can exhaust system memory. It is therefore necessary to monitor metadata usage, for example with the jstat tool bundled with the JDK, or with third-party memory analysis tools to locate and fix problems.

summary:

Local variables are on the stack, ordinary member variables are on the heap, and static member variables are in the method area/metadata area.
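A small Java sketch of this summary (the class and field names are illustrative; the comments describe where each piece of data conceptually lives according to the summary above):

public class WhereVariablesLive {
    // Static member variable: kept with the class's data in the method area / metadata area.
    static int staticCounter = 10;

    // Ordinary member variable: stored inside each object instance on the heap.
    int instanceField = 32;

    void work() {
        // Local variable: lives in this method's stack frame on the virtual machine stack.
        int local = instanceField + staticCounter;

        // The StringBuilder object itself is allocated on the heap;
        // the reference 'buffer' is a local variable held in the stack frame.
        StringBuilder buffer = new StringBuilder(local);
        System.out.println("initial capacity: " + buffer.capacity());
    }

    public static void main(String[] args) {
        new WhereVariablesLive().work();
    }
}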

4. Class loading

4.1 The process of class loading

Before explaining class loading, let's take a look at the life cycle of a class:

To put it simply: Class loading is the process of loading a .class file from a file (hard disk) into memory (metadata area).

Someone may ask: where does the .class (bytecode) file come from?

  1. Usually we will first create a Java source file (.java file), which contains the definition of one or more classes.
  2. Call the Java compiler (javac) through the command line or the integrated development environment (IDE)
  3. The Java compiler compiles Java source files into Java bytecode files (.class files)
  4. Java bytecode files can run on a Java virtual machine that interprets Java bytecode into machine code and executes it.

Getting to the point: the class loading process is divided into the following five phases: loading, verification, preparation, resolution (parsing), and initialization.

We do not analyze the underlying implementation of the JVM here, because these steps are mainly completed based on C++ code.

  1. Loading: Find the .class file and read the contents of the file.
  2. Verification: According to the JVM virtual machine specification, check whether the format of the .class file meets the requirements.
  3. Preparation: Allocate memory for the class's data (reserve space in the metadata area). At this point the memory is zero-filled, and static members are set to their default (zero) values.
  4. Resolution (parsing): Initialize string constants and convert symbolic references into direct references.
  5. Initialization: Actually initialize the contents of the class object: assign static members their real values, execute static code blocks, and make sure the parent class has been loaded and initialized first (instance constructors and instance code blocks run later, when objects are created; see the sketch after this list).
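A small sketch of the initialization order just described: static initialization runs once, when the class is first used, while instance initializer blocks and constructors run every time an object is created:

public class InitOrderDemo {
    static {
        System.out.println("1. static block: runs once, during class initialization");
    }

    {
        System.out.println("2. instance block: runs each time an object is created");
    }

    public InitOrderDemo() {
        System.out.println("3. constructor: runs after the instance block");
    }

    public static void main(String[] args) {
        System.out.println("main starts (the class has already been initialized)");
        new InitOrderDemo();
        new InitOrderDemo();
    }
}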

You can consult the JVM specification in the official document Java SE Specifications (oracle.com):

Some people may ask: What exactly is a class object?

All the information about the class written in the Java code is included in it, reorganized in binary form.

Many people may be confused about what is a symbolic reference and what is a direct reference.

We said above that the resolution step of class loading is: initialize string constants and convert symbolic references into direct references.

Analysis: before the class is loaded, a string constant needs a chunk of memory to hold its actual content, plus a reference that records the starting address of that memory. While the constant is still inside the .class file, however, that memory has not been allocated yet, so the "reference" does not record a real address; it only records an offset within the file, which you can think of as a placeholder.

After the class is loaded, the string constant is actually placed in memory and only then has a real memory address; at that point the reference is filled in with that address.

During the above process:

  • When the reference records an offset, it is a symbolic reference.
  • When the reference records a real memory address, it is a direct reference.

Some readers may not follow this offset/placeholder idea, so here is an analogy:

It is like a primary-school class going to the cinema: the students line up and enter one by one. While still in line, Zhang San does not yet know which seat he will sit in (before the class is loaded), but he does know who is in front of and behind him (equivalent to knowing the offset). Once he enters the cinema and the teacher seats everyone (after the class is loaded), Zhang San finally knows exactly where he is sitting (the reference is finally assigned a memory address).

4.2 Timing of class loading

A class is loaded the first time it is used in a Java program, generally in the following situations:

  1. Creating an instance of the class
  2. Accessing a static member variable of the class
  3. Calling a static method of the class
  4. Loading the class dynamically with the Class.forName() method
  5. Loading a subclass, which also triggers loading of its parent class
  6. Using JDK facilities such as reflection and dynamic proxies, which also triggers class loading

It should be noted that class loading is a recursive process: when a class is loaded, the other classes it depends on are loaded as well. Class loading also uses caching: classes that have already been loaded are cached in memory to improve loading efficiency.
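A small sketch of this "loaded on first use" behavior (the class names Lazy and LoadTriggerDemo are illustrative): the class below is only initialized when a non-constant static member is first touched, and a later Class.forName() call does not initialize it again:

class Lazy {
    static {
        System.out.println("Lazy has been loaded and initialized");
    }
    static final String GREETING = "hello"; // compile-time constant, inlined by the compiler
    static int counter = 0;                 // non-constant static field
}

public class LoadTriggerDemo {
    public static void main(String[] args) throws Exception {
        System.out.println("main started, Lazy not yet initialized");

        // Reading a compile-time constant does NOT trigger class initialization.
        System.out.println(Lazy.GREETING);

        // Reading a non-constant static field DOES trigger loading and initialization.
        System.out.println("counter = " + Lazy.counter);

        // The class is already initialized, so the static block does not run again.
        Class.forName("Lazy");
    }
}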

4.2.1 How to understand that class loading is a recursive process

In Java, class loading is a recursive process, which means: when a class is loaded, if other classes it depends on have not been loaded, then the JVM will load these dependent classes first, and then load the class itself. These dependent classes may depend on other classes, so a dependency tree is formed, and the JVM needs to recursively load all the classes on this tree.

This recursive process can be represented by the following diagram:

        +----------------+
        |     A.class    |
        +----------------+
                |
                |
        +----------------+
        |     B.class    |
        +----------------+
                |
                |
        +----------------+
        |     C.class    |
        +----------------+

In the diagram above, class A depends on class B, which in turn depends on class C. When loading class A, the JVM will first load class B, then load class C, and finally load class A itself. In actual development, the dependency tree may be very complex, and the recursive loading process will be very deep, which requires the JVM to have good recursive processing capabilities.

The recursive loading process also has to guard against circular dependencies, where A depends on B and B depends on A. Left unchecked, this would lead to infinite recursion and trap the JVM in an endless loop, so the JVM controls and limits the class-loading process to prevent it.

4.2.2 How does the JVM limit the circular dependencies generated during recursive loading?

The JVM uses a parental delegation model to limit the circular dependency problems that may arise during recursive loading.

The parent delegation model can solve the circular dependency problem in class loading, because its loading order starts from the parent class loader and loads down step by step until the required class is found. In this process, if a class has been loaded by the parent class loader, the child class loader will not load the class again, thus avoiding the problem of circular dependency.

Specifically, when a class needs to be loaded, its class loader will first delegate this request to its parent class loader to complete. If the parent class loader cannot complete the loading request, the child class loader will try to load the class itself. This process will continue recursively until the required class is found or all parent class loaders cannot complete the load request.

In this way, the parent delegation model ensures that each class is loaded only once, and that in the class loader hierarchy the parent loader's search path always takes precedence over the child loader's. This avoids circular dependencies: if one exists, the child loader delegates to its parent first, so each class still ends up being loaded only once, which breaks the cycle.

4.3 Parental delegation model

4.3.1 What is the Parental Delegation Model?

If a class loader receives a class-loading request, it does not try to load the class itself first; instead it delegates the request to its parent class loader. This is true at every level of the hierarchy, so all loading requests eventually reach the top-level bootstrap class loader. Only when the parent loader reports that it cannot complete the request (it did not find the required class within its search scope) does the child loader attempt to load the class itself.

The parental delegation model describes the loading step (finding the .class file) of the class-loading process.

 Analysis: The JVM provides three class loaders by default, each with a division of labor:

  • BootstrapClassLoader: responsible for loading the classes of the standard library (the standard library is the set of classes the Java specification requires; whatever the JVM implementation, these classes are always provided)
  • ExtensionClassLoader: responsible for loading the classes of the JVM's extension libraries (extension libraries are extra functionality, beyond the specification, provided by the vendor/organization that implements the JVM)
  • ApplicationClassLoader: responsible for loading the classes of third-party libraries and of the user's own project

The three loaders above form a "parent-child relationship", which is not an inheritance (parent class / subclass) relationship; rather, each class loader has a parent field that points to its parent class loader.
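A small sketch that prints this parent chain at run time. Note that getClassLoader() returns null for classes loaded by the BootstrapClassLoader, and that on JDK 9+ the extension loader has been replaced by a platform class loader, so the printed names vary by JDK version:

public class LoaderChainDemo {
    public static void main(String[] args) {
        // The application (system) class loader loads classes from the user's classpath.
        ClassLoader app = LoaderChainDemo.class.getClassLoader();
        System.out.println("application loader : " + app);

        // Its parent: the extension loader (JDK 8) or the platform loader (JDK 9+).
        ClassLoader parent = app.getParent();
        System.out.println("parent loader      : " + parent);

        // The bootstrap loader is implemented inside the JVM and shows up as null.
        System.out.println("grandparent loader : " + parent.getParent());
        System.out.println("String's loader    : " + String.class.getClassLoader());
    }
}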

4.3.2 How do the above class loaders work together?

When a class needs to be loaded, the process starts with the ApplicationClassLoader, but the ApplicationClassLoader hands the loading task to its parent. The ExtensionClassLoader is next in line, yet instead of actually loading it also delegates to its own parent. When the BootstrapClassLoader's turn comes, it would delegate upward as well, but it finds that its parent is null.

A loader only loads a class itself when it has no parent, or when its parent could not find the class.

At that point the BootstrapClassLoader searches the standard-library directories it is responsible for; if the class is found it is loaded, and if not, the request falls back to the child loader.

The ExtensionClassLoader then actually searches the extension-library directories; if the class is found it is loaded, otherwise the request falls back to its child loader.

Finally the ApplicationClassLoader actually searches the directories of the user's project and third-party libraries; if it finds the class it loads it, and if not, since it has no child loader, it can only throw a "class not found" exception.

Why is there such a recursive process as above? Isn't it okay to load directly from the top BootstrapClassLoader?

First, understand that the order above simply follows the logic of the JVM's implementation, which is written in a roughly "recursive" style.

The main purpose is to ensure that the BootstrapClassLoader gets to load classes first and the ApplicationClassLoader last, so that users cannot accidentally shadow core classes with strange classes of their own and cause unnecessary bugs:

For example, if a user writes a java.lang.String class in their own code, the loading process above means the JVM will still load the standard-library class rather than the user-defined one.

It should be noted that the three class loaders above ship with the JVM; user-defined class loaders can also be added to this process so that they cooperate with the existing loaders.

On the other hand, a user-defined class loader could be inserted anywhere in the parent delegation model, but it is usually added as a child of an existing loader, which keeps it consistent with the conventions of the model.
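A minimal sketch of a user-defined class loader that stays within the parent delegation model. The directory path and the way bytes are read are illustrative assumptions; real loaders often read from jars, the network, or generated bytecode. Because only findClass() is overridden, the inherited loadClass() still delegates to the parent loaders first:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Loads classes from a directory of .class files, e.g. /tmp/classes/com/example/Foo.class
public class DirClassLoader extends ClassLoader {
    private final Path root;

    public DirClassLoader(Path root, ClassLoader parent) {
        super(parent);  // keep the normal parent-delegation chain
        this.root = root;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Called only after the parent loaders have failed to find the class.
        Path file = root.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] bytes = Files.readAllBytes(file);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    public static void main(String[] args) throws Exception {
        DirClassLoader loader = new DirClassLoader(
                Paths.get("/tmp/classes"), DirClassLoader.class.getClassLoader());
        // java.lang.String is delegated upward and found by the bootstrap loader, not by this one.
        System.out.println(loader.loadClass("java.lang.String").getClassLoader());
    }
}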

5. Garbage collection mechanism

5.1 What is GC?

GC (Garbage Collection) is a mechanism in memory management that automatically reclaims memory that is no longer in use. In Java, GC is performed automatically by the JVM, which tracks object lifecycles while the program runs and reclaims memory that is no longer used. GC helps avoid memory leaks and memory-related exceptions such as OutOfMemoryError.

What is the difference between a memory leak and a memory overflow?

Memory leak (Memory Leak): dynamically allocated memory is never released while the program runs, so the memory available to the system keeps shrinking until it is exhausted and the program can no longer run. Memory leaks are usually caused by poor design or coding mistakes, such as forgetting to release dynamically allocated memory or creating circular references.

Memory overflow (Out Of Memory, OOM): the memory the program needs exceeds what the system can allocate, so the program fails. It is usually caused by the program genuinely needing a lot of memory, improper system configuration, memory leaks, and so on.

So although both memory leaks and memory overflows cause the program to fail, their causes and symptoms differ, and both need to be checked for and avoided in practice.

5.2 STW problem (stop the world)

Although the GC mechanism introduced in Java lets programmers write code more easily and with fewer errors, GC has one critical problem that cannot be ignored: the STW problem (stop the world).

The STW (Stop-The-World) problem refers to that when the garbage collector is performing garbage collection, it will suspend the execution of the entire application until the garbage collection is completed. During this process, all application threads will be suspended and will not continue until the garbage collection is complete. This pause time can be very long, and even affect the performance and availability of the application.

Although memory is sometimes not released in a timely way, which may cause the program to stutter, GC itself does not introduce bugs or similar problems.

The STW problem occurs because of Java's memory management mechanism. The garbage collector needs to scan the entire Java heap to find and recycle objects that are no longer used. As the application continues to run, new objects may be generated, causing the garbage collector to be unable to accurately determine which objects can be recycled.

In order to alleviate the impact of STW problems, JVM continuously optimizes GC algorithms, such as incremental GC and concurrent GC. At the same time, the application can also use some optimization methods to avoid generating a large number of garbage objects as much as possible, thereby reducing the frequency of GC and the time of STW.

5.3 Introduce ZGC to solve STW problem

ZGC (Z Garbage Collector) is a low-latency garbage collector introduced in JDK 11. Compared with traditional garbage collectors, ZGC reclaims memory on very large heaps with extremely low pause times.

ZGC has the following characteristics:

  1. Concurrent processing: ZGC divides the heap into many regions, similar to G1, and uses barrier techniques (load/read barriers together with colored pointers) so that most garbage collection work runs concurrently with the application.

  2. Very large heaps: ZGC can handle heaps of several terabytes, and it provides extremely low pause times without sacrificing too much throughput.

  3. Software and hardware optimization: ZGC is implemented in C++ and optimized for the x86-64 platform (ports to other architectures exist), and it uses runtime-level software optimizations to further reduce application pauses during collection.

Overall, the main goal of ZGC is to provide low-latency garbage collection in very large heap scenarios. Of course, this is not a perfect solution. Depending on the application scenario, you may need to choose a different garbage collector.
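A small sketch for checking which collector(s) the running JVM is actually using, via the standard java.lang.management API. For example, starting the JVM with -XX:+UseZGC (experimental in JDK 11-14, production-ready since JDK 15) makes ZGC-related bean names appear, while recent JDKs default to G1:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class WhichGcDemo {
    public static void main(String[] args) {
        // Each registered MXBean represents one collector
        // (for example, separate young- and old-generation collectors).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName()
                    + ", collections so far: " + gc.getCollectionCount()
                    + ", accumulated collection time (ms): " + gc.getCollectionTime());
        }
    }
}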

5.4 What is recovered by GC?

There are several areas in the JVM: mainly the heap, the stacks, the program counter, and the metadata area. The first thing to understand is that GC mainly reclaims memory on the heap.

GC uses "object" as the basic unit to recycle (rather than bytes).

What GC reclaims are whole objects that are no longer used at all; an object that is partly used and partly unused is not reclaimed for the time being.

What is an object that is partly used and partly no longer used?

For example, an object may have many attributes: perhaps 10 of them will still be used later, while another 10 will never be used again.

Summary: reclamation applies to whole objects; there is no such thing as reclaiming "half" an object.

5.5 GC workflow

5.5.1 Finding Garbage/Determining Garbage

How do we judge whether an object is garbage? The key idea is to look at the object and check whether any "reference" still points to it.

This is because, in Java, the only way to use an object is through a reference. If an object still has a reference pointing to it, it may still be used; if no reference points to it, it can never be used again.

Reference counting (not used by Java, but used by Python/PHP)

The general idea: give each object a counter (an integer). Every time a new reference is made to point at the object, the counter is incremented by 1; every time such a reference is destroyed, the counter is decremented by 1.

As follows:

{
    Test t = new Test();  // the Test object's reference count is 1
    Test t2 = t;          // t2 also points to it, reference count is 2
    Test t3 = t;          // reference count is 3
}
// When the block ends, the three references above go out of scope and become invalid,
// so the reference count drops back to 0.

Although this method is simple and effective, Java does not use it because it has the following two problems:

  1. It wastes memory (low space utilization): every object needs its own counter, say 4 bytes. With only a few objects this does not matter, but the extra space adds up, especially when objects are small: an object of 1 KB barely notices 4 extra bytes, but for an object that is itself only 4 bytes, the counter doubles its size.
  2. There is a circular reference problem:
    class Test {
        Test t = null;
    }
    
    {
        Test a = new Test(); // object 1, reference count 1
        Test b = new Test(); // object 2, reference count 1
        a.t = b;             // a.t also points to object 2, so object 2's count becomes 2
        b.t = a;             // b.t also points to object 1, so object 1's count becomes 2
    }
    
    When the references a and b are destroyed at the end of the block, the reference counts of object 1 and object 2 each drop by 1, but both are still 1 rather than 0. The memory should be released at this point, yet because the counts are not 0, it never can be.

Therefore, when Python/PHP uses reference counting, other mechanisms need to be used to avoid circular references.

Reachability analysis (Java approach)

The core idea of this algorithm: start from a set of objects called "GC Roots" and search downward from these nodes; the path traversed is called a "reference chain". When an object cannot be reached from the GC Roots by any reference chain (it is unreachable from the GC Roots), the object is proven to be unusable. Take the following picture as an example:

The starting points of the reachability traversal are called GC Roots:

Local variables on the stack, objects referenced from the constant pool, static member variables, and so on. (There are many such starting points in a program; traversing downward from each of them completes one scan.)

Here is an example:


class TreeNode {
    int value;
    TreeNode left;
    TreeNode right;
}
public class Demo27 {
    // Builds a small tree of TreeNode objects; every node is reachable from the returned root 'a'.
    public static TreeNode build() {
        TreeNode a = new TreeNode();
        TreeNode b = new TreeNode();
        TreeNode c = new TreeNode();
        TreeNode d = new TreeNode();
        TreeNode e = new TreeNode();
        TreeNode f = new TreeNode();
        TreeNode g = new TreeNode();
        a.value = 1;
        b.value = 1;
        c.value = 1;
        d.value = 1;
        e.value = 1;
        f.value = 1;
        g.value = 1;

        a.left = b;
        a.right = c;
        b.left = d;
        b.right = e;
        e.left = g;
        c.right = f;
        return a;
    }
    public static void main(String[] args) {
        // 'root' is a local variable on the stack and acts as a GC root;
        // all seven TreeNode objects are reachable from it, so none of them is garbage.
        TreeNode root = build();
    }
}

The structure is as follows:

As shown in the figure above, although there is only a root reference, the above seven objects are all reachable:

  • root -> a
  • root.left -> b
  • root.right -> c
  • root.left.left -> d
  • root.left.right -> e
  • root.left.right.left -> g
  • root.right.right -> f

Setting root.right.right = null would make f unreachable: f would then be garbage collected.

Setting root.right = null would make c unreachable, and if c is unreachable, then f must also be unreachable.

Some students may wonder: what if a leaf node refers back to the root node? Would the traversal loop forever?

The answer is no. Reachability analysis marks each object it visits as reachable; if an object has already been reached through another path, it is already marked, so there is no need to keep traversing past it.

The four kinds of references

From the above we can see the role of "references": besides using them to find objects, we can now also use them to judge whether an object is dead. In JDK 1.2, Java therefore expanded the concept of references and divided them into four kinds: strong references, soft references, weak references, and phantom references. In order of decreasing strength:

  1. Strong references: Strong references refer to references that commonly exist in program code, similar to "Object obj = new Object()". As long as strong references still exist, the garbage collector will never recycle the referenced object instance.
  2. Soft references: Soft references describe objects that are useful but not strictly necessary. Before a memory overflow would occur, objects reachable only through soft references are included in a second round of collection; if memory is still insufficient after that collection, a memory overflow exception is thrown. Since JDK 1.2, the SoftReference class implements soft references.
  3. Weak references: Weak references also describe non-essential objects, but they are weaker than soft references: an object reachable only through weak references survives only until the next garbage collection. When the garbage collector runs, objects reachable only through weak references are reclaimed regardless of whether memory is currently sufficient. Since JDK 1.2, the WeakReference class implements weak references (see the sketch after this list).
  4. Phantom references: Phantom references, also called ghost references, are the weakest kind of reference. Whether an object has a phantom reference has no effect on its lifetime at all, and an object instance cannot be obtained through a phantom reference. The only purpose of attaching a phantom reference to an object is to receive a notification when the object is reclaimed by the collector. Since JDK 1.2, the PhantomReference class implements phantom references.
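A small sketch contrasting a strong reference with a weak one, using the standard java.lang.ref.WeakReference class. System.gc() is only a hint, so the output is not strictly guaranteed, but on typical JVMs the weakly referenced object is collected:

import java.lang.ref.WeakReference;

public class ReferenceDemo {
    public static void main(String[] args) {
        // Strong reference: the object cannot be collected while 'strong' still points to it.
        Object strong = new Object();

        // Weak reference: the referent survives only until the next garbage collection.
        WeakReference<Object> weak = new WeakReference<>(new Object());

        System.out.println("before GC: " + weak.get());   // usually non-null

        System.gc();   // request a collection (only a hint to the JVM)

        System.out.println("after GC : " + weak.get());   // usually null
        System.out.println("strong still alive: " + strong);
    }
}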

Summary:

Reachability analysis requires something like a tree traversal, which is certainly slower than reference counting.

But the slower speed does not matter much: the reachability traversal does not need to run constantly; it only needs to be performed once in a while.

5.5.2 How to clean up garbage and release objects

Mark-and-sweep

The "mark-and-sweep" algorithm is the most basic collection algorithm. The algorithm is divided into two stages of "marking" and "clearing": first mark all the objects that need to be recycled, and after the marking is completed, all marked objects are collected uniformly.

There are two main disadvantages of the "mark-clear" algorithm:

  1. Efficiency issues: Marking and clearing are both inefficient processes
  2. Space problem: Since the disadvantage of the mark-and-sweep algorithm is that it generates a large number of memory fragments, which are difficult to reuse, the efficiency of the mark-and-sweep algorithm is very low for large objects or long-lived objects

Description of memory fragmentation: when the JVM collects with mark-and-sweep, the memory occupied by an object marked as garbage is released, but the released pieces are usually not contiguous; they are scattered across different locations of the heap. This scattered, non-contiguous free memory is called memory fragmentation.

For example: there are 10 KB of free space in total, but they are split into ten separate 1 KB pieces; a request for 2 KB of contiguous memory will then fail.

Copying algorithm

The copying algorithm divides the memory it manages into two areas, generally called the "from" space and the "to" space. During GC, all surviving objects are copied from the "from" space to the "to" space, and then the entire "from" space is cleared.

The copying algorithm is a very simple garbage collection algorithm and does not need to solve the fragmentation problem, because after each collection the surviving objects sit compactly in the "to" space while the "from" space is completely empty.

Disadvantage:

Since all surviving objects have to be copied into the "to" space, copying costs time and space. If there is little garbage and many live objects, the copying cost grows sharply; and because only one half of the space is usable at a time, space utilization is low.

Mark-and-compact

The mark-compact algorithm is another memory reclamation algorithm, an improvement on mark-and-sweep. Unlike mark-and-sweep, after marking the reachable objects it moves them toward one end of the memory region and then frees all memory beyond that boundary. This keeps memory contiguous and avoids fragmentation.

ps: This is a bit like shifting the elements of an array.

Based on the strategies above, a further approach is introduced: generational collection.

The generational algorithm differs from the three algorithms above: it divides the heap into regions and applies a different garbage collection strategy to each region, so as to collect more effectively. This is like the "one country, two systems" policy: different regions are governed by rules suited to their own circumstances, which leads to better management. That is the design idea of the generational algorithm.

Analysis: newly created objects, that is, objects of age 0, are placed in the Eden area. Objects that survive one round of GC are moved into a Survivor area. The Survivor areas look small compared with Eden, but in general they are big enough, because most Java objects "die young" and have very short lifetimes.

Moving from Eden to a Survivor area uses the copying algorithm. Once in a Survivor area, objects are still checked periodically by GC: an object that has become garbage is released, while an object that is still alive is copied to the other Survivor area (only one of the two Survivor areas is in use at any given time), and the two areas keep copying surviving objects back and forth between each other (again via the copying algorithm).

If an object has been copied back and forth between the two Survivor areas many times, it is promoted to the old generation.

The old generation holds long-lived objects. Objects in the old generation are also scanned by GC, but far less frequently; when an old-generation object does become garbage, it is released with a marking-based algorithm (mark-and-sweep / mark-and-compact).

Summary:

The typical garbage collection algorithms described above roughly answer two questions:

how to identify garbage, and how to clean it up.

In fact, when the JVM is implemented, there will be certain differences, because there are many different garbage collectors (garbage collection implementations).

The specific implementation of the collector will be carried out based on the above algorithm ideas, but there will be some changes/improvements.

Different garbage collectors have different priorities: some pursue the fastest possible scan, some the most thorough scan, and some the least disturbance to the application (STW pauses as short as possible).
 
