Java Virtual Machine (JVM) Interview Questions

 JVM

Tell me about the main components of the JVM and their functions?

In general, the method area and the heap are memory areas shared by all threads, while the virtual machine stack, the native method stack, and the program counter are thread-private. Together these make up the runtime data area, which is what we usually mean by JVM memory.

  • Class loading subsystem: loads a class file into the method area of the runtime data area according to the given fully qualified class name (such as java.lang.Object). There are three main class loaders: the bootstrap class loader, the extension class loader, and the application class loader.
  • Java heap: the memory from which object instances are allocated.
  • Method area: stores data that has been loaded by the virtual machine, such as class information, constants, static variables, and code compiled by the just-in-time compiler (note that local variables are not stored here; they are stored on the stack).
  • Program counter: used to record the bytecode instruction address executed by the current thread
  • Virtual machine stack [Java method]: Each time a Java method is executed, a stack frame (Stack Frame) is created on the virtual machine stack to store the local variable table, the operand stack, the dynamic link, the method return address, and similar information. The life of a method call, from invocation to completion, corresponds to one stack frame being pushed onto and popped off the virtual machine stack. The Java virtual machine stack is also private to the thread, and its life cycle is the same as that of the thread.
  • Native method stack: Similar to the virtual machine stack, it is used to support Java to call native methods (Native Method). Native methods are methods written in C, C++, etc., which can directly call the underlying interface and hardware resources of the operating system to realize functions that Java programs cannot complete.
  • Execution engine: responsible for translating compiled bytecode instructions into machine code and executing them. It has two main parts: the interpreter and the just-in-time compiler.
    • Extension: The interpreter is the most basic execution engine. It translates bytecode instructions into machine code one at a time and executes them. The interpreter is simple, easy to implement, and can run on any platform, but its execution efficiency is low, because the bytecode has to be translated again every time it is executed. To improve efficiency, the JVM also provides a just-in-time compiler (Just-In-Time Compiler, JIT Compiler), which compiles the bytecode of an entire method into native machine code and then executes that. The compiled code runs fast, but compilation takes time and the compiled code occupies extra memory.
    • In practice, the JVM chooses between the interpreter and the just-in-time compiler automatically according to how the program is running. Hot code (code that is executed frequently) is compiled into native machine code and cached to improve execution efficiency. This technique is called just-in-time compilation (Just-In-Time Compilation, JIT Compilation).
  • Native library interface: the Java Native Interface (JNI) is the mechanism Java provides for calling native methods (Native Method) from Java programs. Native methods are methods written in C, C++, etc., which can directly call the underlying interface and hardware resources of the operating system to realize functions that Java programs cannot complete.
    • Extension: JNI lets Java programs call functions in a native library through native methods, so that Java code and native code can interact. In a Java program, you declare a native method with the native keyword and then call it through JNI. In the native code, you can access Java objects and call Java methods through the JNIEnv structure and the JNI functions. Using JNI means following certain conventions for naming, data type mapping, and exception handling, and it means writing native code in C or C++, so you need to be familiar with that language. JNI may also cost some performance, because native calls and data conversions are required. Finally, use JNI with caution: native methods can access the underlying interfaces and hardware resources of the operating system, which poses security risks, so you need to make sure native methods are safe and reliable and follow the relevant security practices.
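The "hot code" idea from the execution engine description can be sketched with a small method that is called many times. This is only an illustration: the class name HotLoop and the iteration count are assumptions, and whether and when the method is actually compiled depends on the JVM and its settings.

```java
public class HotLoop {
    // A small method that becomes "hot" when it is called many times.
    static long square(int x) {
        return (long) x * x;
    }

    // Sums the squares of 0..n-1 by calling square() n times.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = sumOfSquares(1_000_000);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("result=" + result + ", elapsed(us)=" + elapsedMicros);
    }
}
```

On HotSpot, running it with java -Xint HotLoop forces interpreter-only execution, and java -XX:+PrintCompilation HotLoop prints methods as they are compiled, which makes the two modes easy to compare.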

Sao Dai's understanding: The virtual machine stack stores the information of Java methods, while the native method stack stores the information of native methods. The native method stack is also thread-private: each thread has its own, and they do not interfere with each other. The two stacks differ only in the methods they serve; their functions and structures are very similar. For this reason a JVM implementation may combine them into one stack, which is why the HotSpot JVM has only a virtual machine stack and no separate native method stack.

The role of each component of the JVM

First, the compiler converts the Java code into bytecode, and the class loader (ClassLoader) loads the bytecode into memory, placing it in the method area of the runtime data area (Runtime data area). A bytecode file is only a set of instructions for the JVM and cannot be handed directly to the underlying operating system for execution. An execution engine (Execution Engine) is therefore required to interpret the bytecode at run time, convert it into machine code, and execute it, while also providing the resources and services a Java program needs to run. In this process the native library interface (Native Interface) is called to reach code written in other languages: the lower layers of the various operating systems are written in C or C++, so the virtual machine goes through the native library interface whenever it needs native library methods or has to interact with other languages.

Java program operation mechanism detailed description

Simply put, the operating mechanism of a Java program is divided into three steps: writing, compiling, and running.

1. Write

Writing refers to editing program code in the Java development environment

2. Compile

Compilation refers to the process of using a Java compiler (javac) to translate source files into bytecode, checking them for errors along the way. Compilation produces bytecode files with the suffix .class, which the Java Virtual Machine (JVM) can read and execute.

3. Run

Running refers to using a Java interpreter to translate a bytecode file into machine code, execute it, and display the result. A bytecode file is intermediate code that is independent of any specific machine or operating system. It is a binary file, the object code generated when a Java compiler compiles a Java source file. Bytecode is not meant to be read by programmers, and the hardware cannot execute it directly; it must be interpreted and executed by the JVM. Java is therefore a language that is compiled first and then interpreted.

When a Java program runs, the JVM is started first and is then responsible for interpreting and executing the Java bytecode; Java bytecode can only run on a JVM. In this way the JVM separates the bytecode program from the specific hardware platform and operating system. As long as a JVM for the platform in question is installed on a computer, the Java program can run there without regard to the current hardware and operating system, or to the platform on which the bytecode file was generated. The JVM hides the differences between software and hardware platforms, which gives true cross-platform portability at the binary level. The JVM is the foundation of Java's platform independence: Java is cross-platform because Java programs run inside the JVM. Next, let's take a look at the running process of Java, as shown in the figure.

Java running process

The figure follows a Java source file as it is compiled into a bytecode file and then translated by the JVM interpreter into the machine language of each platform. That machine code is executed on the platform in question, and the results of the program are displayed to the user. This is the complete Java running process.
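The three steps can be tried with a minimal program. The class below is only an illustrative sketch; the greeting() helper is an assumption added so that the behaviour is easy to check.

```java
// HelloWorld.java (step 1: write the source file)
public class HelloWorld {
    // Returns the text that main() prints.
    static String greeting() {
        return "Hello, JVM";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

Step 2 is javac HelloWorld.java, which produces HelloWorld.class. Step 3 is java HelloWorld, which starts the JVM and runs the bytecode. Running javap -c HelloWorld prints the bytecode instructions contained in the class file.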

Talk about the JVM runtime data area

During the execution of a Java program, the Java virtual machine divides the memory it manages into several data areas. Each area has its own purpose and its own time of creation and destruction: some exist from the moment the virtual machine process starts, while others are created and destroyed along with a thread.

The runtime data areas of different virtual machines may differ slightly, but they all comply with the Java Virtual Machine Specification, which defines the following five areas:

  • Program counter (Program Counter Register): the line number indicator of the bytecode being executed by the current thread. The bytecode interpreter works by changing the value of this counter to select the next bytecode instruction to execute; branches, loops, jumps, exception handling, and thread resumption all rely on this counter;
  • Java virtual machine stack (Java Virtual Machine Stacks): stores information such as local variable tables, operand stacks, dynamic links, and method exits;
  • Native method stack (Native Method Stack): has the same function as the virtual machine stack, except that the virtual machine stack serves Java methods, while the native method stack serves the native methods called by the virtual machine;
  • Java heap (Java Heap): the largest piece of memory in the Java virtual machine. It is shared by all threads, and almost all object instances are allocated here;
  • Method area (Method Area): stores data that has been loaded by the virtual machine, such as class information, constants, static variables, and just-in-time compiled code.
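The thread-private virtual machine stack can be observed with unbounded recursion: every call pushes one more stack frame until the stack is exhausted. The following is a minimal sketch; the class and method names are assumptions, and the depth reached varies with the JVM, the platform, and the stack size.

```java
public class StackDepthDemo {
    private static int depth;

    // Each call pushes a new stack frame onto the current thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the thread's stack is exhausted and returns the depth reached.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack, not the heap, ran out of space.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + measureDepth());
    }
}
```

On HotSpot the thread stack size can be changed with the -Xss option, for example java -Xss256k StackDepthDemo, which typically lowers the depth reached.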

deep copy and shallow copy

Shallow copy: The copy's reference-type fields only point to the same memory addresses as the original's. If the data at those addresses changes, the change is visible through the shallow copy as well.

Deep copy: New memory is allocated to store the copied object and everything it references, so the copy is fully independent of the original.

shallow copy

Shallow copy (shallowCopy) creates a new object, but for each reference-type field it only adds another reference to the existing memory address: the field of the new object points to the same object as the field of the original.

Sao Dai's understanding: the reference-type fields of the new object and of the original object refer to the same underlying objects. Modifying such a shared object through the new object therefore also changes what the original object sees.

another angle of description

Shallow copy: Create a new object, then copy the non-static fields of the current object to the new object. If a field is a value type, the value itself is copied; if a field is a reference type, the reference is copied but the referenced object is not. Therefore the original object and its copy refer to the same object.

shallow copy demo

package com.ys.test;

public class Person implements Cloneable{
    public String pname;
    public int page;
    public Address address;
    public Person() {}
    
    public Person(String pname,int page){
        this.pname = pname;
        this.page = page;
        this.address = new Address();
    }
    
    @Override
    protected Object clone() throws CloneNotSupportedException {
        return super.clone();
    }
    
    public void setAddress(String provices,String city ){
        address.setAddress(provices, city);
    }
    public void display(String name){
        System.out.println(name+":"+"pname=" + pname + ", page=" + page +","+ address);
    }

    public String getPname() {
        return pname;
    }

    public void setPname(String pname) {
        this.pname = pname;
    }

    public int getPage() {
        return page;
    }

    public void setPage(int page) {
        this.page = page;
    }
    
}

package com.ys.test;

public class Address {
    private String provices;
    private String city;
    public void setAddress(String provices,String city){
        this.provices = provices;
        this.city = city;
    }
    @Override
    public String toString() {
        return "Address [provices=" + provices + ", city=" + city + "]";
    }
    
}

Below we generate a Person object and call its clone method to copy a new object.

Note: The clone provided by the Object class can only implement shallow copy. To call the clone method of the object, the class must implement the Cloneable interface and override the clone method.

@Test
public void testShallowClone() throws Exception{
    Person p1 = new Person("zhangsan",21);
    p1.setAddress("Hubei Province", "Wuhan City");
    // Object.clone() copies fields one by one, so p2 shares p1's Address object
    Person p2 = (Person) p1.clone();
    System.out.println("p1:"+p1);
    System.out.println("p1.getPname:"+p1.getPname().hashCode());
    
    System.out.println("p2:"+p2);
    System.out.println("p2.getPname:"+p2.getPname().hashCode());
    
    p1.display("p1");
    p2.display("p2");
    // Change the address through the copy only
    p2.setAddress("Hubei Province", "Jingzhou City");
    System.out.println("After modifying the address of the copied object:");
    p1.display("p1");
    p2.display("p2");
}

operation result

First look at the original class Person, which implements the Cloneable interface and overrides the clone method. It has three attributes: pname, of the reference type String; page, of the primitive type int; and address, of the reference type Address. Address is a custom class that itself contains two attributes, provices and city.

Next, look at the test content. First, we create an object p1 of the Person class, whose pname is zhangsan, page is 21, and the two attributes of the address class Address are Hubei Province and Wuhan City. Then we call the clone() method to copy another object p2, and then print the contents of these two objects.

Lines 1 and 3 of the printed result:

p1:com.ys.test.Person@349319f9

p2:com.ys.test.Person@258e4566

It can be seen that these are two different objects.

From the object contents printed in lines 5 and 6, the contents of the original object p1 and the cloned object p2 are exactly the same.

In the code, we changed only the Address of the cloned object p2, to Jingzhou City, Hubei Province (the original object p1 had Wuhan City, Hubei Province). Yet the last two lines of the output show that the Address of both the original object p1 and the cloned object p2 was modified.

That is to say, the attribute Address of the object Person, after being cloned, actually just copies its reference, and they still point to the same heap memory space. When the attribute Address of one object is modified, the other will also change accordingly.

deep copy

Deep copy (deepCopy) adds a reference and, at the same time, allocates new memory and points the new reference at that memory. Because nothing is shared, a deep copy avoids the shallow-copy error of releasing the same memory twice. Creating a new object through deep copy means opening up a new piece of memory in the heap, copying the contents of the original object into it, and pointing the reference of the new object at the newly allocated address. Modifying any content of one object does not affect the content of the other.

another angle description

Deep copy: Create a new object, and then copy the non-static fields of the current object to the new object, regardless of whether the field is a value type or a reference type, an independent copy is copied. When you modify any content of one object, it will not affect the content of the other object
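A deep copy can be sketched by overriding clone() so that the referenced object is cloned as well. This is one possible approach, not the only one; the classes below are simplified stand-ins for the Person and Address classes of the shallow copy demo, and the wrapper class name DeepCopyDemo is an assumption.

```java
public class DeepCopyDemo {
    static class Address implements Cloneable {
        String province;
        String city;

        Address(String province, String city) {
            this.province = province;
            this.city = city;
        }

        @Override
        protected Address clone() {
            try {
                // Both fields are immutable Strings, so a field-by-field copy is enough here.
                return (Address) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: the class implements Cloneable
            }
        }
    }

    static class Person implements Cloneable {
        String pname;
        int page;
        Address address;

        Person(String pname, int page, Address address) {
            this.pname = pname;
            this.page = page;
            this.address = address;
        }

        @Override
        protected Person clone() {
            try {
                Person copy = (Person) super.clone(); // shallow copy first
                copy.address = address.clone();       // then copy the referenced object too
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: the class implements Cloneable
            }
        }
    }

    public static void main(String[] args) {
        Person p1 = new Person("zhangsan", 21, new Address("Hubei Province", "Wuhan City"));
        Person p2 = p1.clone();
        p2.address.city = "Jingzhou City";
        System.out.println(p1.address.city); // Wuhan City
        System.out.println(p2.address.city); // Jingzhou City
    }
}
```

Because each Person now owns its own Address, changing the city through p2 leaves p1 untouched, which is exactly the difference from the shallow copy demo.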

class loading method

The Java class loading methods include the command line method, the Class.forName() method and the class loader method. The three methods and their cases are described below.

1. Command line mode

The command line method is the simplest class loading method, using the java command to explicitly load the class. For example, we can use the following command to load the HelloWorld class:

java HelloWorld

2. Class.forName() method

The Class.forName() method uses the static method forName() of the Class class to load the class. For example, we can use the following code to load the HelloWorld class:

Class.forName("HelloWorld");

3. Class loader method

The class loader method is the most commonly used class loading method, and the class loader ClassLoader is used to load classes. For example, we can use the following code to load the HelloWorld class:

ClassLoader.getSystemClassLoader().loadClass("HelloWorld");

Sao Dai's understanding: All classes in Java must be loaded into the JVM by a class loader before they can run. The class loader is itself a class, and its job is to read class files from the hard disk into memory. When writing programs we rarely need to care about class loading, because classes are loaded implicitly; only special uses such as reflection require loading a class explicitly. Class loading in Java is dynamic. The JVM does not load every class up front and then run the program. Instead, it makes sure the basic classes the program needs (such as base classes) are fully loaded, and loads other classes only when they are needed. This, of course, saves memory.
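One practical difference between the second and third methods is worth a sketch: Class.forName(name) also initializes the class (its static initializer runs), while ClassLoader.loadClass(name) only loads it. The class and field names below are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadVsInitDemo {
    // Filled in by the static initializers below, so we can see which classes were initialized.
    static final List<String> initialized = new ArrayList<>();

    static class ViaLoadClass {
        static { initialized.add("ViaLoadClass"); }
    }

    static class ViaForName {
        static { initialized.add("ViaForName"); }
    }

    // Loads both nested classes explicitly and returns the names of the initialized ones.
    static List<String> run() {
        String outer = LoadVsInitDemo.class.getName();
        ClassLoader loader = LoadVsInitDemo.class.getClassLoader();
        try {
            // loadClass() loads the class but does not initialize it.
            loader.loadClass(outer + "$ViaLoadClass");
            // Class.forName(name) loads the class and runs its static initializer.
            Class.forName(outer + "$ViaForName");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return initialized;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [ViaForName]
    }
}
```

ViaLoadClass would only be initialized later, on its first active use, such as creating an instance or reading a non-constant static field.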

What is a class loader? What are class loaders?

For any class, the uniqueness in the JVM needs to be established by the class loader that loads it and the class itself. Each class loader has an independent class namespace. The class loader loads the class file into the JVM memory according to the specified fully qualified name, and then converts it into a class object.

There are mainly four class loaders:

  1. Bootstrap class loader (Bootstrap ClassLoader): loads the Java core class library and cannot be referenced directly by Java programs. It is part of the virtual machine itself and loads the libraries in the <JAVA_HOME>\lib directory, or in the path specified by the -Xbootclasspath parameter, that are recognized by the virtual machine.
  2. Extension class loader (Extension ClassLoader): loads the extension libraries of Java. Implementations of the Java virtual machine provide an extension library directory, and this class loader looks for and loads Java classes there (it is responsible for all class libraries in the <JAVA_HOME>\lib\ext directory or in the path specified by the java.ext.dirs system property).
  3. Application class loader (system class loader): loads Java classes from the class path (CLASSPATH) of the Java application. Generally speaking, the classes of a Java application are loaded by it. It can be obtained through ClassLoader.getSystemClassLoader() and used directly. If the application defines no custom class loader, this is the default class loader.
  4. Custom class loader: implemented by inheriting the java.lang.ClassLoader class.
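The loader chain can be inspected at run time by walking getParent() from the system class loader. This is a small sketch with assumed class and method names; the loader names printed depend on the JDK version (for example, newer JDKs report a platform class loader where older ones report the extension class loader).

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderHierarchyDemo {
    // Collects the class names of the loaders from the system loader up to the top of the chain.
    static List<String> chain() {
        List<String> names = new ArrayList<>();
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        while (loader != null) {
            names.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        return names;
    }

    public static void main(String[] args) {
        // Core classes are loaded by the bootstrap loader, which Java code sees as null.
        System.out.println("String loader: " + String.class.getClassLoader());
        for (String name : chain()) {
            System.out.println(name);
        }
    }
}
```

The bootstrap class loader never appears in the chain as an object, which matches the statement that it cannot be referenced directly by Java programs.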

Sao Dai extension: The Java core class library contains the basic functions of the Java SE platform, which is the foundation and core of Java development. The Java core class library includes the following aspects:

1. Basic data types and wrapper classes: The Java core class library provides 8 basic data types and their corresponding wrapper classes, including byte, short, int, long, float, double, char, and boolean.

2. String and text processing: The Java core class library provides a wealth of string and text processing classes, including String, StringBuffer, StringBuilder, regular expressions, character encoding conversion, etc.

3. Collection framework: The Java core class library provides a wealth of collection framework classes, including List, Set, Map, Queue, Stack, etc., and corresponding implementation classes, such as ArrayList, LinkedList, HashSet, TreeSet, HashMap, TreeMap, etc.

4. IO and file processing: The Java core class library provides a wealth of IO and file processing classes, including File, InputStream, OutputStream, Reader, Writer, FileReader, FileWriter, etc.

5. Network programming: The Java core class library provides classes related to network programming, including Socket, ServerSocket, URL, URLConnection, etc.

6. Multi-threaded programming: The Java core class library provides classes related to multi-threaded programming, including Thread, Runnable, ThreadLocal, Lock, Semaphore, CountDownLatch, etc.

7. Reflection and dynamic proxy: The Java core class library provides classes related to reflection and dynamic proxy, including Class, Method, Constructor, Field, Proxy, etc.

In addition to the above aspects, the Java core class library also includes many other classes and interfaces, such as date and time processing, digital and mathematical calculations, exception handling, internationalization and localization, etc.

The Java core class library is the foundation and core of Java development. It is very important for Java programmers to master the use of the Java core class library.

What is the parent delegation model? Why use the parent delegation model?

1. What is the parent delegation model?

Parent delegation mechanism: when a class loader is asked to load a class, it first checks whether the class has already been loaded. If so, it returns the class directly; otherwise it delegates the request to its parent loader. The delegation is recursive: if the parent has a parent of its own, the request is passed further up, so every request eventually reaches the top-level class loader (the bootstrap class loader). If the top-level loader can load the class, it returns successfully. If it cannot, the request falls back down layer by layer, and each child loader in turn tries to load the class itself.

2. Why use the parent delegation model?

Parent delegation ensures that a class is the same class in every class loader. An obvious purpose is to protect the official Java class library in <JAVA_HOME>\lib and the extension class library in <JAVA_HOME>\lib\ext from being overwritten by developers. To put it bluntly, it guarantees the uniqueness of each class.

Without the parent delegation model, each class loader would load classes on its own. If a user then wrote a class with the same name as java.lang.Object and put it on the class path, several different Object classes would exist in the system.

Take the class java.lang.Object, which is stored in rt.jar. No matter which class loader is asked to load it, the request is eventually delegated to the bootstrap class loader. (Under the parent delegation model each class ends up being loaded by exactly one class loader, because the request is delegated upwards first and then falls back down until it reaches the loader that can load the class.) The Object class is therefore the same class in every class loader environment of the program.

Developers who build their own open source framework can also write a custom class loader and use the parent delegation model to protect the classes the framework loads from being overwritten by the application. The parent delegation model is not a mandatory constraint but a class loader implementation that the Java designers recommend to developers; both the delegation and the loading order can be broken. To write a custom class loader, extend ClassLoader and override findClass. To depart from the parent delegation loading order, override loadClass as well.
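The delegation order can be observed with a custom loader that overrides only findClass and counts how often it is reached. This is a minimal sketch: CountingLoader and com.example.NoSuchClass are hypothetical names, and a real loader would read class bytes and call defineClass() instead of failing.

```java
public class DelegationDemo {
    // A loader that only counts how often delegation falls through to findClass().
    static class CountingLoader extends ClassLoader {
        int findClassCalls = 0;

        CountingLoader(ClassLoader parent) {
            super(parent);
        }

        // The inherited loadClass() calls this only after the parent chain has failed.
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            findClassCalls++;
            // This sketch has no class source of its own, so it always gives up.
            throw new ClassNotFoundException(name);
        }
    }

    // Tries to load className with a fresh loader and reports how often findClass() ran.
    static int findClassCallsFor(String className) {
        CountingLoader loader = new CountingLoader(DelegationDemo.class.getClassLoader());
        try {
            loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            // Expected for names that no loader in the chain can supply.
        }
        return loader.findClassCalls;
    }

    public static void main(String[] args) {
        System.out.println(findClassCallsFor("java.lang.String"));        // 0: the parent chain supplied it
        System.out.println(findClassCallsFor("com.example.NoSuchClass")); // 1: every parent failed first
    }
}
```

Because loadClass is left untouched, the custom loader never gets a chance to supply its own java.lang.String, which is the protection the parent delegation model provides.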

Extension: The uniqueness of a class is determined by the class loader that loads it and the class itself (the fully qualified name of the class + the instance ID of the class loader as the unique identifier). Comparing whether two classes are equal (including equals(), isAssignableFrom(), isInstance(), and instanceof keywords of Class objects, etc.), only makes sense if the two classes are loaded by the same class loader. Otherwise, even if these two classes come from the same Class file and are loaded by the same virtual machine, as long as the class loaders that load them are different, the two classes must not be equal.

Talk about JVM tuning tools?

The JDK ships with many monitoring tools in its bin directory. The two most commonly used graphical monitoring tools are jconsole and jvisualvm. (jvisualvm was bundled with JDK 6 through JDK 8; for later JDKs, VisualVM is a separate download.)

jconsole: A graphical tool that comes with Java. It monitors the running status of the JVM, memory usage, thread status, and so on, and offers analysis and diagnostic functions such as thread analysis, heap dumps, and MBean operations. jconsole connects to Java programs through JMX and can be extended with plug-ins.

jvisualvm: A visualization tool that comes with Java. It monitors the running status of the JVM, memory usage, thread status, and so on, and offers analysis and diagnostic functions such as CPU analysis, memory analysis, and thread analysis. jvisualvm can easily interact with Java programs and supports a variety of plug-ins, such as Visual GC and Java Flight Recorder.

Tell me about the role of the JVM?

When developing a Java program, we usually use a Java compiler to compile the Java source code into bytecode, and then hand the bytecode to the JVM for execution. During execution, the JVM converts bytecode into machine code, and allocates and manages various resources required by Java programs in memory, such as objects, threads, stacks, and so on. At the same time, JVM also provides a garbage collection mechanism to automatically reclaim unused memory to ensure the memory usage efficiency and security of Java programs.

Therefore, the compilation of Java source code into bytecode is done by the Java compiler, and the main function of the JVM is to interpret and execute the bytecode at runtime, and provide various resources and services required for the operation of Java programs.

Talk about the difference between the heap and the stack?

Heap and stack are two different memory allocation areas, and they are very different in memory management and usage.

1. Heap (Heap): The heap is a memory area allocated dynamically while the Java program is running, and is used to store Java objects and arrays. The size of the heap can be set through JVM parameters, and it directly affects the performance and stability of the Java program. The heap is shared: all threads can access the same heap, so access to shared objects must be synchronized.

2. Stack (Stack): The stack is a memory area used while a Java program is running to store method-call information and local variables. Each thread has its own stack, which the JVM allocates and manages at run time; its maximum size is fixed when the thread is created (on HotSpot it can be set with the -Xss parameter). The stack is thread-private and cannot be accessed by other threads, so no synchronization is required.

The difference between the heap and the stack is mainly reflected in the following aspects:

1. Allocation method: heap memory is allocated dynamically by the JVM at run time as objects are created; stack memory is allocated as stack frames, which the JVM pushes when a method is called and pops when it returns. The size of each frame is determined at compile time.

2. Memory management: the heap is managed by the JVM, including allocation, release, and garbage collection; the stack needs no garbage collection, because a frame is released automatically when its method returns.

3. Allocation size: the size of the heap can be set through JVM parameters and can grow or shrink at run time within those limits; the size of a thread's stack is typically fixed once the thread has been created and is not adjusted afterwards.

4. Access method: the heap is shared, so all threads can access the same heap and access to shared objects needs to be synchronized; the stack is private to the thread, cannot be accessed by other threads, and does not need to be synchronized.
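The access difference can be sketched with two threads: each has its own local variable on its own stack, while both update one shared object on the heap. The names below are assumptions, and AtomicInteger is just one of several ways to make the shared update safe.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HeapVsStackDemo {
    // The AtomicInteger object lives on the heap and is visible to every thread.
    static final AtomicInteger shared = new AtomicInteger();

    // Runs two threads that each increment the shared counter perThread times.
    static int run(int perThread) {
        shared.set(0);
        Runnable task = () -> {
            // 'local' lives in this thread's own stack frame; no other thread can see it.
            int local = 0;
            for (int i = 0; i < perThread; i++) {
                local++;
                shared.incrementAndGet(); // shared heap state needs an atomic or synchronized update
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(run(1000)); // 2000
    }
}
```

Replacing the AtomicInteger with a plain int field and shared++ would make lost updates possible, which is why shared heap data needs synchronization and stack-local data does not.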

Sao Dai's understanding: Static variables are placed in the method area, while the objects they refer to are still placed on the heap.

JAVA class loading process/execution process of class loading/java class loading mechanism/principle mechanism of JVM loading Class files?

The process of class loading is mainly divided into three parts (a mnemonic: load, link, initialize; verify, prepare, resolve):

  • Loading
  • Linking
  • Initialization

Linking can be subdivided into three phases:

  • Verification
  • Preparation
  • Resolution (sometimes translated as "analysis" or "parsing")

Sao Dai's understanding: first the class loader loads the class bytecode file from one of various sources into memory (loading). The virtual machine then verifies that the loaded bytecode conforms to the virtual machine specification (verification), allocates memory for class variables and assigns their default initial values (preparation), and replaces all symbolic references such as class names, method names, and field names with specific memory addresses, that is, direct references (resolution). Finally it executes the class constructor to perform the custom initialization of class variables (initialization).

1. Load

Simply put, loading refers to loading class bytecode files from various sources into memory through a class loader.

There are two important points here:

  • Source of the bytecode
    • .class files compiled to a local path
    • .class files inside a jar package
    • Retrieved from a remote network location
    • Generated at run time, for example by dynamic proxies
  • Class loader
    • bootstrap class loader
    • extension class loader
    • application class loader
    • custom class loader

Note: Why is there a custom class loader?

  • On the one hand, Java code is easy to decompile. If you need to protect your code, you can encrypt the compiled classes, then implement a custom class loader that decrypts them before loading.
  • On the other hand, code may need to be loaded from a non-standard source, such as the network. In that case you have to implement a class loader yourself that loads from the specified source.

2. Verification

Mainly to ensure that the loaded bytecode conforms to the virtual machine specification

  • File format verification: for example, do the constants include any unsupported constant types? Does the file contain malformed or extra information?
  • Metadata verification: for example, does the class inherit from a class marked final? Do the fields and methods of the class conflict with those of the parent class? Are there any invalid overloads?
  • Bytecode verification: ensures that the program semantics are reasonable, for example that type conversions are valid.
  • Symbolic reference verification: for example, can the corresponding class be found through the fully qualified name in the symbolic reference? Does the accessibility (private, public, etc.) in the symbolic reference allow access by the current class?

3. Preparation (default initialization)

Preparation allocates memory for class variables and assigns their default initial values.

Note that this initial value is not the value written in the code, but the default value that the Java virtual machine defines for each variable type.

For example, the numeric primitive types default to 0 (0.0 for float and double), char defaults to '\u0000', and boolean defaults to false; reference types default to null. A constant is the exception: it receives the value set in the code. Given static final int tmp = 456, the value of tmp at this stage is already 456.

4. Resolution

The process of replacing symbolic references in the constant pool with direct references.

Two important points:

  • Symbolic reference. A string that carries enough information to uniquely identify a class, a method, or a variable.
  • Direct reference. It can be understood as a memory address or an offset. For a class method or class variable, the direct reference points into the method area; for an instance method or instance variable, the direct reference is an offset, such as the offset from the start of the instance to the location of the instance variable.

For example, the method hello() is called now, and the address of this method is 1234567, then hello is a symbolic reference, and 1234567 is a direct reference.

In the resolution phase, the virtual machine replaces symbolic references such as class names, method names, and field names with direct references, that is, concrete memory addresses or offsets.

5. Initialization (custom initialization)

This stage performs the programmer-defined initialization of class variables by executing the class constructor <clinit>. In other words, class variables are assigned the values set by the programmer.

static int a = 5;

In the preparation stage, memory is allocated for a and its default value is assigned, so a = 0 because the default value of int is 0. In the initialization stage, the value set by the programmer is assigned to the class variable, so a = 5.

Note: The stages of class loading deal only with class (static) variables, so assignments to other variables, such as instance fields, are not part of the class loading process.
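
The difference between the default value from preparation and the assigned value from initialization can be observed with a small experiment. This is a minimal sketch; the class name PrepareVsInit and the helper peek() are made up for the illustration. It relies on the fact that static initializers run in textual order, so an initializer placed before a can read a while it still holds its default value.

```java
class PrepareVsInit {
    // Static initializers run in textual order during initialization.
    static int seenEarly = peek(); // runs first: 'a' still holds the default value 0
    static int a = 5;              // runs second: the programmer's value is assigned
    static final int TMP = 456;    // compile-time constant, already 456 after preparation

    // Reading 'a' through a method avoids the "illegal forward reference" compile error.
    static int peek() {
        return a;
    }

    public static void main(String[] args) {
        System.out.println("a seen before its initializer ran: " + seenEarly); // prints 0
        System.out.println("a after initialization: " + a);                    // prints 5
    }
}
```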

What are the commonly used JVM tuning parameters?

The meaning of the three major performance tuning parameters of the JVM -Xms -Xmx -Xss

-Xss: Specifies the size of each thread virtual machine stack

-Xms: the initial value of the heap

-Xmx: The maximum value that the heap can reach

For example

-Xms2g: initialize the heap size to 2g;

-Xmx2g: The maximum heap memory is 2g;
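
The effect of these settings can be checked from inside a program with the standard Runtime API. The sketch below is only a quick way to look at the values; the class name HeapSettings is made up, and the reported numbers are approximate because the JVM may round them or reserve part of the heap for its own use.

```java
class HeapSettings {
    // Upper bound the heap may grow to; corresponds roughly to -Xmx.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently reserved by the JVM; starts near -Xms and may grow up to -Xmx.
    static long currentHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max heap:     " + maxHeapBytes() / mb + " MB");
        System.out.println("current heap: " + currentHeapBytes() / mb + " MB");
    }
}
```

Running it as java -Xms2g -Xmx2g HeapSettings should report values close to 2048 MB for both lines.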

Commonly used JVM tuning parameters:

-XX:NewRatio=4: Set the memory ratio of young and old generation to 1:4;

-XX:SurvivorRatio=8: Set the ratio of Eden to the two Survivor spaces in the young generation to 8:2 (Eden:S0:S1 = 8:1:1);

-XX:+UseParNewGC: Specifies the ParNew + Serial Old garbage collector combination;

-XX:+UseParallelOldGC: Specifies the Parallel Scavenge + Parallel Old garbage collector combination;

-XX:+UseConcMarkSweepGC: Specifies the ParNew + CMS garbage collector combination, with Serial Old as the fallback when CMS hits a concurrent mode failure;

-XX:+PrintGC: Enable printing gc information;

-XX:+PrintGCDetails: Print gc details.

object

What are the ways to create objects?

via the new keyword

This is the most common way to create an object by calling the constructor with or without parameters of the class through the new keyword. For example Object obj = new Object();

Through the newInstance() method of the Class class

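This uses reflection and by default calls the no-argument constructor of the class to create the object. For example, Person p2 = (Person) Class.forName("com.ys.test.Person").newInstance(); (Class.newInstance() has been deprecated since Java 9 in favor of getDeclaredConstructor().newInstance().)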

Through the newInstance method of the Constructor class

Like the second method, this one is implemented through reflection. The object is created by calling the newInstance() method of the java.lang.reflect.Constructor class on a specific constructor. For example, the following uses the first constructor returned for the Person class to create an object (getConstructors()[0] is the first element of the returned array)

Person p3 = (Person) Person.class.getConstructors()[0].newInstance();

In fact, the second method uses the newInstance() method of Class to create an object, and its internal call is still the newInstance() method of Constructor.

Using the Clone method

clone() is a method of the Object class. Calling A.clone() creates an object B with the same content as object A. As the name implies, cloning creates a copy of the object. The default implementation is a shallow copy, and the class must implement the Cloneable interface.

Person p4 = (Person) p3.clone();

deserialization

Serialization is to store the Java object data in the heap memory in a certain way into a disk file or pass it to other network nodes (transmission on the network). Deserialization is the process of restoring object data in disk files or object data on network nodes into a Java object model.
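
The five ways can be put side by side in one small program. This is a sketch under assumptions: the Person class below is a made-up stand-in with a single field, and serialization goes through an in-memory byte array instead of a file or the network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative class: Cloneable for clone(), Serializable for deserialization.
class Person implements Cloneable, Serializable {
    private static final long serialVersionUID = 1L;
    String name = "default";

    public Person() { }

    @Override
    public Person clone() throws CloneNotSupportedException {
        return (Person) super.clone(); // shallow copy
    }
}

class CreationDemo {
    @SuppressWarnings("deprecation") // Class.newInstance() is deprecated since Java 9
    static Person[] createFiveWays() throws Exception {
        Person p1 = new Person();                                        // 1. new keyword
        Person p2 = Person.class.newInstance();                          // 2. Class.newInstance()
        Person p3 = Person.class.getDeclaredConstructor().newInstance(); // 3. Constructor.newInstance()
        Person p4 = p3.clone();                                          // 4. clone()

        // 5. Deserialization: write p4 to a byte array, then read a new object back.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p4);
        }
        Person p5;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            p5 = (Person) in.readObject();
        }
        return new Person[] { p1, p2, p3, p4, p5 };
    }
}
```

Each call yields a distinct object; the cloned and deserialized objects carry the same field values as their originals.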

The process of Java object creation

Only the creation of ordinary Java objects is discussed here; arrays and Class objects are not included.

Simply put, the process of creating an object in Java is as follows:

When the virtual machine encounters a new instruction, it first checks whether the operand of the instruction can locate a symbolic reference to a class in the constant pool, and whether the class represented by that symbolic reference has been loaded, resolved, and initialized. If not, the class loading process is performed first. The virtual machine then allocates memory for the new object. After the allocation, it initializes the allocated memory to zero values (excluding the object header) and then sets up the object header with information such as which class the object is an instance of, how to find the class metadata, the object's hash code, and its GC generational age. Finally, the <init> method is executed to initialize the object according to the programmer's wishes.

The process of Java object creation in detail

  • new instruction

When the virtual machine encounters a new instruction, it first checks whether the operand of the instruction can locate a symbolic reference to a class in the constant pool, and whether the class represented by that symbolic reference has been loaded, resolved, and initialized. If not, the class loading process must be executed first.

  • Allocate memory

The virtual machine will then allocate memory for the newly created object. The size of memory required by an object is fully determined after class loading is complete. There are two methods of allocation: "Bump the Pointer" and "Free List", which are determined by whether the garbage collector used has the function of compaction or not.

  • initialization (default initialization)

After the memory allocation is completed, the virtual machine initializes all of the allocated memory to zero values (excluding the object header). This step ensures that the instance fields of the object can be used in Java code without being assigned an initial value: the program reads the zero value for each field's data type.

  • Initial settings of the object

Next, the virtual machine sets up the object header with information such as which class the object is an instance of, how to find the class metadata, the object's hash code, and its GC generational age. This information is stored in the Object Header of the object. The header is set up differently depending on the current state of the virtual machine, for example whether biased locking is enabled.

  • <init> method (custom initialization)

After the above work is completed, a new object has been created from the virtual machine's point of view, but from the Java program's point of view, object creation has only just begun. The <init> method is then executed to initialize the object according to the programmer's wishes (for example, assigning the constructor's parameters to the object's fields), after which a truly usable object is complete.
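
The order of default initialization and the <init> method can be made visible with a superclass constructor that calls an overridden method. This is a minimal sketch with made-up class names (InitBase, InitDerived): the overridden method runs before the subclass field initializer, so it still sees the zero value.

```java
class InitBase {
    InitBase() {
        report(); // runs before the subclass field initializers
    }

    void report() { }
}

class InitDerived extends InitBase {
    static int valueSeenBySuperConstructor = -1;

    int x = 7; // assigned inside InitDerived's <init>, after the superclass constructor returns

    @Override
    void report() {
        // At this point the memory has been zeroed but 'x = 7' has not run yet.
        valueSeenBySuperConstructor = x;
    }

    public static void main(String[] args) {
        InitDerived d = new InitDerived();
        System.out.println(valueSeenBySuperConstructor); // prints 0
        System.out.println(d.x);                         // prints 7
    }
}
```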

Java object memory layout

In the HotSpot virtual machine, the layout of an object in memory can be divided into three areas: object header (Header), instance data (Instance Data), and alignment padding (Padding).

object header

The object header of the HotSpot virtual machine contains two parts. The first part stores the runtime data of the object itself, such as its hash code (HashCode), GC generational age, lock status flags, the lock held by a thread, the biased thread ID, and the bias timestamp. The other part is the type pointer, that is, the pointer from the object to its class metadata; the virtual machine uses this pointer to determine which class the object is an instance of (not every virtual machine implementation has to keep the type pointer in the object data; in other words, finding an object's metadata does not necessarily go through the object itself). If the object is a Java array, the object header must also contain a field that records the length of the array.

Sao Dai extension:

Metadata: In Java, metadata refers to data describing various elements in a Java program, including classes, methods, fields, etc. Metadata provides information about these elements, such as their names, types, access modifiers, annotations, etc.

The metadata in Java mainly has the following types:

1. Annotation: Annotation is a kind of metadata used to describe program elements, which can be used to describe classes, methods, fields, etc. Annotations use the @ symbol as a prefix, can contain elements and default values, and can obtain annotation information at runtime through reflection.

2. Reflection: Reflection is a mechanism that can obtain class information at runtime, including class names, methods, fields, constructors, etc. Through reflection, you can get the metadata information of the class and perform dynamic operations.

3. JavaBeans: JavaBeans is a standard component model in the Java platform, used to describe reusable Java components. JavaBeans contains a set of properties, methods and events, and the metadata information of components can be obtained through JavaBeans API.

4. XML description file: Some frameworks and tools in Java, such as Spring, Hibernate, etc., use XML files to describe the information of program elements, including classes, methods, fields, etc. These XML files can be parsed by a parser to obtain metadata information of program elements.

In general, metadata in Java is data that describes program elements, including annotations, reflections, JavaBeans, and XML description files. These metadata provide information about program elements and can be obtained and manipulated through mechanisms such as reflection.

instance data

The instance data part is the effective information actually stored by the object, and it is also the content of various types of fields defined in the program code. Whether it is inherited from the parent class or defined in the subclass, it needs to be recorded. The storage order of this part will be affected by the default allocation strategy parameters of the virtual machine and the order in which the fields are defined in the Java source code (fields of the same width are always allocated together).

Alignment padding

The alignment padding part does not necessarily exist and has no special meaning, it just acts as a placeholder. Because the automatic memory management system of HotSpot VM requires that the starting address of the object must be an integer multiple of 8 bytes, that is, the size of the object must be an integer multiple of 8 bytes. The object header part is exactly a multiple of 8 bytes (1 or 2 times), therefore, when the object instance data part is not aligned, it needs to be completed by alignment padding.
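
The padding rule is plain arithmetic: round the object size up to the next multiple of 8. The sketch below illustrates this; the class name ObjectAlignment is made up, and the 16-byte header is an assumption (a 64-bit VM without compressed pointers), not something the JVM exposes to Java code.

```java
class ObjectAlignment {
    static final int ALIGNMENT = 8;     // HotSpot aligns objects to 8 bytes
    static final int HEADER_BYTES = 16; // assumption: 64-bit VM, no compressed pointers

    // Round 'size' up to the next multiple of ALIGNMENT.
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
    }

    // Padding needed for an object with the given amount of instance data.
    static long paddingFor(long instanceDataBytes) {
        long unaligned = HEADER_BYTES + instanceDataBytes;
        return alignUp(unaligned) - unaligned;
    }

    public static void main(String[] args) {
        // Header (16) + one int field (4) = 20 bytes, padded to 24.
        System.out.println(alignUp(HEADER_BYTES + 4)); // prints 24
        System.out.println(paddingFor(4));             // prints 4
        // Header (16) + one long field (8) = 24 bytes, already aligned.
        System.out.println(paddingFor(8));             // prints 0
    }
}
```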

Two ways to allocate memory

After class loading is complete, a block of memory is allocated for the object in the Java heap. How memory is allocated depends on whether the Java heap is regular, and there are two methods: pointer collision (bump the pointer) and free list

pointer collision

If the memory of the Java heap is regular, that is, all used memory is on one side and all free memory is on the other, with a pointer at the boundary between them, then allocating memory simply moves that pointer toward the free side by a distance equal to the size of the object.

free list

If the memory of the Java heap is not regular, the virtual machine must maintain a list that records which blocks of memory are available. At allocation time, it finds a block in the list that is large enough, assigns it to the object, and updates the records in the list.

Which allocation method to choose is determined by whether the Java heap is regular, and whether the Java heap is regular is determined by whether the garbage collector used has the function of compaction.
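
Both strategies can be modeled in a few lines. The following is a toy simulation, not how HotSpot is implemented: the heap is just a range of integer offsets, the class names BumpPointerHeap and FreeListHeap are made up, and the free list uses a simple first-fit search.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Regular heap: everything below 'top' is used, everything above is free.
class BumpPointerHeap {
    private final int capacity;
    private int top = 0;

    BumpPointerHeap(int capacity) {
        this.capacity = capacity;
    }

    // Returns the start offset of the new object, or -1 if the heap is full.
    int allocate(int size) {
        if (top + size > capacity) {
            return -1;
        }
        int start = top;
        top += size; // "bump" the pointer by the object size
        return start;
    }
}

// Irregular heap: free blocks are tracked as {start, size} records.
class FreeListHeap {
    private final List<int[]> freeBlocks = new ArrayList<>();

    void addFreeBlock(int start, int size) {
        freeBlocks.add(new int[] { start, size });
    }

    // First fit: take the first block that is large enough, then update its record.
    int allocate(int size) {
        Iterator<int[]> it = freeBlocks.iterator();
        while (it.hasNext()) {
            int[] block = it.next();
            if (block[1] >= size) {
                int start = block[0];
                block[0] += size;
                block[1] -= size;
                if (block[1] == 0) {
                    it.remove(); // block fully used
                }
                return start;
            }
        }
        return -1; // no block is large enough
    }
}
```

The bump-pointer version does constant work per allocation, while the free-list version has to search, which is one reason collectors that compact the heap can allocate faster.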

Deal with concurrency security issues when allocating memory

In actual development, objects are created very frequently, and creating objects concurrently raises thread-safety issues. For example, when multiple concurrently executing threads create objects, two threads may request the same location in the Java heap at the same time, so that two different objects would be given the same block of memory. To prevent this, the virtual machine must either lock that part of the memory or use operations such as CAS, so that each area is allocated to only one thread. A virtual machine must guarantee thread safety here, and generally does so in one of two ways.

CAS + retry on failure

CAS (Compare and Swap) is a commonly used synchronization primitive in concurrent programming, which is used to implement synchronization operations between multiple threads. A CAS operation contains three operands: memory location V, expected value A, and new value B. If and only if the value of V is equal to A, CAS will atomically update the value of V with the new value B, otherwise no operation will be performed. The CAS operation is an optimistic locking strategy. It assumes that the thread performing the operation will not be interfered by other threads during the operation, so it can avoid the performance problems caused by using traditional locks. If the operation fails due to conflicts, retry until it succeeds.
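
Applied to allocation, the idea looks roughly like this. It is a sketch of the technique, not the JVM's internal code: the class name CasBumpAllocator is made up, the heap is modeled as integer offsets, and java.util.concurrent.atomic.AtomicInteger supplies the CAS operation.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasBumpAllocator {
    private final int capacity;
    private final AtomicInteger top = new AtomicInteger(0); // memory location V

    CasBumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    // Returns the start offset of the allocated block, or -1 if the heap is full.
    int allocate(int size) {
        while (true) {
            int current = top.get();   // expected value A
            int next = current + size; // new value B
            if (next > capacity) {
                return -1;
            }
            if (top.compareAndSet(current, next)) {
                return current;        // CAS succeeded: the block is ours
            }
            // CAS failed: another thread moved the pointer first, so retry.
        }
    }

    int used() {
        return top.get();
    }
}
```

No lock is taken at any point. A thread that loses the race simply reads the new pointer value and tries again, so no two threads can receive the same offset.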

TLAB

TLAB (Thread-Local Allocation Buffer) is an optimization in the Java virtual machine that improves the efficiency of object allocation. A TLAB is a region of the heap reserved for allocation by a single thread. When a thread needs to allocate an object, it first checks whether its own TLAB has enough space; if so, the object is allocated directly in the TLAB, otherwise it is allocated in the shared part of the heap.

The advantage of using TLAB is that it can avoid competition between threads and improve the efficiency of object allocation. Since each thread has its own TLAB, multiple threads do not interfere with each other when allocating objects at the same time. In addition, since TLAB is thread-private, unnecessary memory competition and lock competition can also be avoided.

If TLAB is not used and multiple concurrently executing threads create objects, they may request the same location in the Java heap when allocating memory. This requires locking that part of the memory or using operations such as CAS to ensure thread safety, that is, to guarantee that the area is allocated to only one thread.

Sao Dai's understanding: When the object created by the thread is larger than the remaining memory in TLAB or the memory of TLAB is exhausted, then use CAS + failure retry to allocate memory.
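
The interplay between the two mechanisms can be sketched as follows. This is a heavily simplified model with made-up names (TlabSketch, TLAB_SIZE): each thread allocates from its own buffer without synchronization and only falls back to CAS on the shared pointer when the buffer is exhausted or the object is too large. A real TLAB implementation has sizing and refill policies that are not modeled here; for example, this sketch simply abandons the unused tail of an old buffer.

```java
import java.util.concurrent.atomic.AtomicInteger;

class TlabSketch {
    static final int TLAB_SIZE = 64; // hypothetical buffer size in bytes

    private final int capacity;
    private final AtomicInteger sharedTop = new AtomicInteger(0);
    // Per thread: {next free offset, end offset} of the private buffer.
    private final ThreadLocal<int[]> tlab = ThreadLocal.withInitial(() -> new int[] { 0, 0 });

    TlabSketch(int capacity) {
        this.capacity = capacity;
    }

    // Slow path: CAS + retry on the shared heap pointer.
    private int allocateShared(int size) {
        while (true) {
            int current = sharedTop.get();
            if (current + size > capacity) {
                return -1;
            }
            if (sharedTop.compareAndSet(current, current + size)) {
                return current;
            }
        }
    }

    int allocate(int size) {
        if (size > TLAB_SIZE) {
            return allocateShared(size); // too large for any TLAB
        }
        int[] buf = tlab.get();
        if (buf[0] + size > buf[1]) {    // TLAB exhausted (or not yet created)
            int start = allocateShared(TLAB_SIZE); // one CAS buys a whole new buffer
            if (start < 0) {
                return allocateShared(size);
            }
            buf[0] = start;
            buf[1] = start + TLAB_SIZE;
        }
        int result = buf[0];             // fast path: no synchronization needed
        buf[0] += size;
        return result;
    }
}
```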

Object access mode (object positioning)

Java programs access specific objects in the heap through references on the JVM stack. How an object is accessed depends on the virtual machine implementation. At present, the mainstream access methods are handle access and direct pointer access.

Pointer access : The reference stores the address of the object directly. Each object has a unique address in memory, and its instance data can be reached through the pointer in a single step.

Handle access : The reference stores the address of a handle, and the handle contains the address of the object's instance data and the address of its type information. To access an object, the virtual machine first obtains the object's address from the handle and then accesses the instance data through that address.

The access method is determined by the specific implementation. Some older Java virtual machines used handle access, while newer ones such as HotSpot generally use direct pointer access. Pointer access saves one level of indirection on each access; handle access keeps the reference itself stable, because only the pointer inside the handle changes when the garbage collector moves the object.

garbage collection mechanism

Briefly describe the Java garbage collection mechanism

In Java, programmers do not need to release the memory of an object manually; the virtual machine does it. The JVM has a garbage collection thread that runs at low priority and does not execute under normal circumstances. It is triggered only when the virtual machine is idle or the heap memory is insufficient; it then scans for objects that are no longer referenced and reclaims them.

What is GC? Why GC?

What is GC?

GC means garbage collection (Garbage Collection)

Why GC (purpose of Garbage Collection)?

Garbage Collection (GC) is an automatic memory management mechanism. Its main purpose is to automatically identify and reclaim unused memory space when the program is running, thereby preventing problems such as memory leaks and memory overflows. Specifically, the main purposes of garbage collection are as follows:

1. Avoid memory leaks: During the running of the program, if some objects are no longer referenced but still occupy memory space, this is a memory leak. Garbage collection can automatically detect and recycle these useless objects, thereby avoiding memory leaks.

2. Avoid memory overflow: Memory overflow means that the memory space required by the program at runtime exceeds the memory space that the system can provide, causing the program to crash. Garbage collection can reclaim unused memory space in time to avoid memory overflow.

3. Improve program performance: Manually managing memory space requires programmers to be responsible for memory allocation and recycling, which is prone to errors and takes up a lot of time and energy for programmers. Garbage collection can automatically manage memory space, reduce the burden on programmers, and improve program development efficiency and performance.

In short, the main purpose of garbage collection is to make the program more robust, stable and efficient.

When does garbage collection happen?

Garbage collection is automatic. It periodically scans the objects in the heap memory when the program is running, identifies objects that are no longer referenced, and reclaims the memory space they occupy. Specifically, the timing and method of garbage collection depend on the adopted garbage collection algorithm and implementation.

Generally speaking, the timing of garbage collection is as follows:

1. Timing recycling: After the program runs for a period of time, garbage collection is triggered periodically to reclaim unused memory space.

2. Idle collection: When the program is idle, garbage collection is triggered to reclaim unused memory space.

3. Incremental collection: Divide the garbage collection process into multiple stages, gradually reclaim unused memory space when the program is running, and reduce the impact of garbage collection on program performance.

4. Generational recycling: Divide the heap memory into multiple generations, each generation adopts a different recycling strategy, and recycles according to the survival time and size of the object to improve the efficiency of garbage collection.

In short, the timing and method of garbage collection are determined by the specific garbage collection algorithm and implementation, and developers do not need to manually control the timing of garbage collection.

What are the advantages and principles of garbage collection?

Advantages of garbage collection?

1. Convenience: Compared with manual memory management, programming languages ​​using garbage collection allow programmers to focus more on the logic and functional design of the application, without having to pay attention to the details of memory allocation and release. This greatly simplifies the development process and increases productivity.

2. Security: Manual memory management is prone to problems such as memory leaks or dangling pointers, resulting in program crashes or abnormal operation. Use garbage collection to avoid these problems, because the garbage collector can automatically handle the allocation and deallocation of memory, and ensure that objects are only released when they are no longer referenced.

3. Efficient: Garbage collection usually has better performance than manually managing memory. Although the implementation of garbage collection may consume some additional CPU and memory resources, it prevents serious performance problems caused by incorrect memory management, making the code more robust and maintainable.

4. Scalability: Since the garbage collector is responsible for handling memory management, it is very effective for very large applications and high-concurrency systems, reducing the difficulty of manually adjusting memory usage.

In summary, garbage collection is a very important technology that allows developers to focus their energy and attention on writing stable, efficient and reliable code.

The principle of garbage collection?

The basic principle of garbage collection is to determine which objects can be marked as garbage by tracking object references, and then the garbage collector automatically clears these garbage objects from memory. The following are some of the main algorithms for garbage collection:

  1. Mark-sweep algorithm: This algorithm has two phases. The first phase marks all objects that are still in use, and the second phase clears all unmarked objects. This method can produce memory fragmentation and makes memory management more complex.

  2. Copying algorithm: This algorithm divides the available memory into two blocks and uses only one of them at a time. During collection, the surviving objects in the block in use are copied to the other block, and the first block is then cleared in one pass. The algorithm is simple to implement and leaves no fragmentation, but only half of the memory is usable at any time and copying becomes expensive when many objects survive.

  3. Mark-compact algorithm: Similar to mark-sweep, but after marking it moves the surviving objects toward one end of the heap and then clears the memory beyond them, which eliminates fragmentation. This algorithm requires more work per collection, but handles memory fragmentation better.

The above algorithms have their advantages and disadvantages. Therefore, multiple garbage collection algorithms may be used in practical applications, and the most suitable method can be selected according to different situations.
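
To make the mark and sweep phases concrete, here is a toy version that works on a small object graph. It is only a model: ToyObject and ToyMarkSweep are made-up names, the "heap" is a plain list, and a real collector works on raw memory rather than on Java objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class ToyObject {
    final String name;
    final List<ToyObject> references = new ArrayList<>();
    boolean marked;

    ToyObject(String name) {
        this.name = name;
    }
}

class ToyMarkSweep {
    // Mark phase: everything reachable from the roots gets marked.
    static void mark(List<ToyObject> roots) {
        Deque<ToyObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            ToyObject current = pending.pop();
            if (current.marked) {
                continue; // already visited, which also handles cycles
            }
            current.marked = true;
            pending.addAll(current.references);
        }
    }

    // Sweep phase: unmarked objects are removed from the heap and returned;
    // marks on survivors are cleared for the next cycle.
    static List<ToyObject> sweep(List<ToyObject> heap) {
        List<ToyObject> reclaimed = new ArrayList<>();
        Iterator<ToyObject> it = heap.iterator();
        while (it.hasNext()) {
            ToyObject obj = it.next();
            if (obj.marked) {
                obj.marked = false;
            } else {
                reclaimed.add(obj);
                it.remove();
            }
        }
        return reclaimed;
    }
}
```

Because reachability is traced from the roots, two objects that only reference each other are still reclaimed, which is something simple reference counting cannot do.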

Can the garbage collector reclaim memory immediately? Is there any way to actively notify the virtual machine to perform garbage collection?

Can the garbage collector reclaim memory immediately?

The garbage collector may not always be able to reclaim memory immediately, depending on the specific garbage collection algorithm and virtual machine implementation. Under normal circumstances, the garbage collector will perform memory recovery according to certain rules when managing heap memory, instead of waiting until the memory is exhausted to trigger garbage collection.

Is there any way to actively notify the virtual machine to perform garbage collection?

In some cases, an application can suggest a garbage collection to the virtual machine by explicitly calling System.gc() or Runtime.getRuntime().gc(). However, this call is advisory only, so there is no guarantee that garbage collection is actually triggered.

In addition to manually triggering garbage collection, the garbage collector also has some other trigger conditions, such as the heap size exceeding a certain threshold, the end of the life cycle of the object, etc., and the garbage collector will automatically start. Virtual machines usually control the timing of garbage collection according to the system's idle time, CPU usage, memory allocation speed, etc., so as to minimize the impact on applications.

What are the reference types in Java?

strong reference

Strong references are the most common kind in Java: when an object is assigned to a reference variable, that variable is a strong reference. While an object is referenced by a strong reference variable, it is reachable and cannot be reclaimed by the garbage collection mechanism. Therefore, strong references are one of the main causes of Java memory leaks.

Object obj = new Object(); //As long as obj still points to the Object object, the Object object will not be recycled 
obj = null; //Manually set null

As long as there are strong references, the garbage collector will never reclaim the referenced object. Even if the memory is insufficient, the JVM will directly throw OutOfMemoryError and will not recycle. If you want to break the connection between the strong reference and the object, you can explicitly assign the strong reference to null, so that the JVM can recycle the object in a timely manner

soft reference

Soft references are used to describe some non-essential but still useful objects. When there is enough memory, the soft reference object will not be recycled. Only when the memory is insufficient, the system will recycle the soft reference object. If there is still not enough memory after the soft reference object is recycled, a memory overflow exception will be thrown. This feature is often used to implement caching technologies, such as web page caching, image caching, etc.
Since JDK 1.2, the java.lang.ref.SoftReference class is used to represent soft references.

Before running the following Java code, you need to configure the parameters -Xms2M -Xmx3M to set the initial memory of the JVM to 2M and the maximum available memory to 3M.

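import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

public class TestOOM {
	// Holds the reference objects themselves; only the byte arrays they refer to are softly reachable
	private static List<Object> list = new ArrayList<>();

	public static void main(String[] args) {
		testSoftReference();
	}

	private static void testSoftReference() {
		for (int i = 0; i < 10; i++) {
			byte[] buff = new byte[1024 * 1024]; // 1 MB per iteration
			SoftReference<byte[]> sr = new SoftReference<>(buff);
			list.add(sr);
		}

		System.gc(); // Request a garbage collection (advisory only)

		for (int i = 0; i < list.size(); i++) {
			// get() returns null once the referent has been reclaimed
			Object obj = ((SoftReference<?>) list.get(i)).get();
			System.out.println(obj);
		}
	}
}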

Printed result:

With the settings above, no matter how many soft reference objects the loop creates, only the last object is still present in the printed result; all the others print null because they have been reclaimed. This shows that soft references are reclaimed automatically when memory is insufficient.

weak reference

Weak references are implemented with the WeakReference class and have a shorter lifetime than soft references. For objects that are only weakly referenced, the memory they occupy is reclaimed whenever the garbage collector runs, regardless of whether the JVM has enough memory.

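// Requires: import java.lang.ref.WeakReference;
private static void testWeakReference() {
	for (int i = 0; i < 10; i++) {
		byte[] buff = new byte[1024 * 1024]; // 1 MB per iteration
		WeakReference<byte[]> wr = new WeakReference<>(buff);
		list.add(wr); // 'list' is the static List<Object> field of TestOOM above
	}

	System.gc(); // Request a garbage collection (advisory only)

	for (int i = 0; i < list.size(); i++) {
		// get() returns null once the referent has been reclaimed
		Object obj = ((WeakReference<?>) list.get(i)).get();
		System.out.println(obj);
	}
}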

phantom reference

Phantom reference (PhantomReference) is one of the four reference types in Java, and it is the weakest one. Its role is to let the program be notified when an object is reclaimed by the garbage collector, so that cleanup can be performed. A phantom reference cannot be used to reach the object itself or any of its fields or methods, because its get() method always returns null. Phantom references are mainly used to manage off-heap memory, such as NIO direct memory. When the object referred to by a phantom reference is reclaimed, the phantom reference is put into a ReferenceQueue. Phantom references are therefore always used together with a ReferenceQueue: finding the reference in the queue tells you that the object has been reclaimed, so the related cleanup can be done. Phantom references are implemented by the java.lang.ref.PhantomReference class.

public class PhantomReference<T> extends Reference<T> {
    /**
     * Returns this reference object's referent.  Because the referent of a
     * phantom reference is always inaccessible, this method always returns
     * <code>null</code>.
     *
     * @return  <code>null</code>
     */
    public T get() {
        return null;
    }
    public PhantomReference(T referent, ReferenceQueue<? super T> q) {
        super(referent, q);
    }
}

Reference Queue (ReferenceQueue)

Reference queue (ReferenceQueue) is a queue used to manage reference objects in Java. If a reference object was registered with a reference queue when it was created, the garbage collector puts that reference object into the queue once the object it refers to has been reclaimed. By polling the queue, you can determine that the referent has been reclaimed and then perform the related cleanup work.

Reference queues are often used with weak, soft, and phantom references. When the object referenced by a weak reference, soft reference, or phantom reference is reclaimed by the garbage collector, the reference object will be placed in the reference queue associated with it. By checking the reference objects in the reference queue, it can be determined that the objects have been recycled, so that necessary cleaning work can be performed.

The use of reference queues is very flexible, and different reference types and cleaning tasks can be selected according to different needs. For example, for some objects that need to be released, phantom references and reference queues can be used to clean up resources.

Sao Dai's understanding: Note that reference queues cannot be used together with strong references. Unlike soft references and weak references, phantom references must be used together with reference queues.
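
A reference queue can be tried out with the standard java.lang.ref classes. The sketch below uses a made-up class name (ReferenceQueueDemo) and an arbitrary retry budget. Since System.gc() is only a request, the first method may return false on some JVMs or settings; the second method is deterministic because PhantomReference.get() always returns null.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class ReferenceQueueDemo {
    // Returns true if the weak reference showed up in the queue within the retry budget.
    static boolean weakReferenceEnqueued() throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> weak = new WeakReference<>(referent, queue);
        referent = null; // drop the only strong reference

        for (int attempt = 0; attempt < 10; attempt++) {
            System.gc();                               // advisory only
            Reference<?> enqueued = queue.remove(100); // wait up to 100 ms
            if (enqueued == weak) {
                return true; // the referent was reclaimed; cleanup could run here
            }
        }
        return false;
    }

    // A phantom reference never gives access to its referent.
    static boolean phantomGetIsAlwaysNull() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);
        return phantom.get() == null; // true even though 'referent' is still strongly reachable
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("weak reference enqueued: " + weakReferenceEnqueued());
        System.out.println("phantom get() is null:   " + phantomGetIsAlwaysNull());
    }
}
```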

Does garbage collection happen in the permanent generation in the JVM?

Before JDK 1.8

Before JDK 8, the permanent generation (PermGen) was the area HotSpot used to store class information, the runtime constant pool, and similar data. It was managed by the garbage collector together with the heap, but it was sized separately (for example with -XX:MaxPermSize) and collected under different conditions than the rest of the heap.

Garbage collection therefore did occur in the permanent generation before JDK 8. It targets data that is no longer used, mainly unused constants and classes that can be unloaded. The name suggests that this data lives forever, but it can in fact be reclaimed. Collection of the permanent generation is triggered under different conditions than collection of the young and old generations, and the class unloading it performs can affect the performance and stability of the program. In versions prior to JDK 8 it is therefore necessary to size the permanent generation according to the actual workload.

Sao Dai extension: In versions prior to JDK 8, collection of the permanent generation differs from ordinary heap collection in two main respects:

1. Mark-sweep algorithm

The permanent generation is collected with a mechanism based on the mark-sweep algorithm. The algorithm has two phases: a marking phase and a sweep phase. During the marking phase, the garbage collector traverses the permanent generation and marks everything that is still in use. During the sweep phase, it removes everything that was not marked, freeing the memory it occupied.

2. Full GC

Full GC is not a separate algorithm but a collection of the entire heap. The permanent generation is not collected by Minor GC; it is collected as part of a Full GC. A Full GC stops all application threads, traverses the entire heap and the permanent generation, and reclaims everything that is unreachable, so it is very time-consuming. For the permanent generation it is usually triggered when the permanent generation is full, and a class can be unloaded only once its class loader is no longer reachable.

Since the permanent generation is collected under different conditions than the rest of the heap, in versions prior to JDK 8 its size needs to be adjusted according to the actual situation to improve the performance and stability of the program.

JDK 1.8 and later

In JDK 8 and later versions, the permanent generation is removed and replaced by Metaspace. Unlike the permanent generation (PermGen) before JDK 8, Metaspace stores class metadata, such as class information and runtime constant pools, in native memory rather than in the Java heap. By default the metaspace is limited only by the native memory available from the operating system. Its size can be controlled with JVM parameters, for example "-XX:MaxMetaspaceSize" to specify the maximum size. Reclamation also differs from the permanent generation: metadata is released when the class loader that owns it is found to be unreachable and its classes are unloaded.

Sao Dai extension: In JDK 8 and later versions, the permanent generation (PermGen) is removed and replaced by Metaspace, an area that uses native memory to store class metadata. Two points are worth knowing about how the metaspace is reclaimed and organized:

1. Garbage collection based on marking and class unloading

The metaspace is not swept object by object. During the marking phase of a collection, the garbage collector determines which class loaders are still reachable. When a class loader is no longer reachable, its classes are unloaded and the metaspace memory that belonged to that loader is released as a whole. A collection for this purpose is triggered when metaspace usage reaches a high-water mark, whose initial value is set with -XX:MetaspaceSize.

2. Compressed Class Space

Compressed Class Space is the part of the metaspace that holds the class metadata referenced through compressed class pointers (-XX:+UseCompressedClassPointers on 64-bit JVMs). It does not compress the metadata itself; it allows object headers to use a 32-bit class pointer instead of a 64-bit one, which reduces the memory footprint of every object. Its size can be set with -XX:CompressedClassSpaceSize. Metadata in the Compressed Class Space is released in the same way as the rest of the metaspace: when the class loader that owns it is unloaded.

Since the metaspace lives in native memory and grows on demand, in JDK 8 and later versions it needs tuning far less often than the permanent generation did, which simplifies program management and maintenance. Setting -XX:MaxMetaspaceSize is still advisable as a guard against class loader leaks that would otherwise consume native memory without limit.

Sao Dai's understanding: Both the permanent generation and the metaspace are garbage collected, but the mechanisms used by the two are not exactly the same.
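To see these areas on a running JVM, the standard java.lang.management API can list the non-heap memory pools. The sketch below is illustrative only: the class name NonHeapPools is made up, and the pool names it prints (on HotSpot typically "Metaspace" and "Compressed Class Space") depend on the JVM and its version.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class NonHeapPools {
    /** Returns one line per non-heap memory pool reported by the running JVM. */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue;                       // skip Eden, Survivor, old generation, ...
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue;                       // pool is no longer valid
            }
            long max = usage.getMax();          // -1 means "no limit defined"
            lines.add(pool.getName() + ": used=" + usage.getUsed() / 1024 + " KB, max="
                    + (max < 0 ? "unlimited" : max / 1024 + " KB"));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Running it once without and once with -XX:MaxMetaspaceSize=64m should show the metaspace limit change from unlimited to a fixed value.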

What are the new generation garbage collector and the old generation garbage collector? What's the difference?

What are the new generation garbage collector and the old generation garbage collector?

New generation collector: Serial, ParNew, Parallel Scavenge

Old generation collector: Serial Old, Parallel Old, CMS

Whole-heap collector: G1

What is the difference between the new generation garbage collector and the old generation collector?

The new generation and the old generation are two important memory areas in the Java virtual machine. Their garbage collectors have the following differences:

1. Different garbage collection algorithms

The new generation mainly uses the copy algorithm. In its basic form the algorithm divides the memory into two equal areas and uses only one of them at a time. When that area is full, the surviving objects are copied to the other area and the original area is cleared in one step, so a contiguous block of free memory is available after every collection. HotSpot refines this layout into one Eden area and two smaller Survivor areas (8:1:1 by default), so that only one Survivor area has to be kept empty. The old generation mainly uses the mark-sweep or the mark-compact algorithm: the surviving objects are marked, and then either the unmarked objects are cleared in place or the surviving objects are moved together to free a contiguous block.

2. Different memory allocation strategies

Objects in the new generation generally survive for a short time, and the free space is contiguous after each copy, so allocation is fast: the JVM simply advances a pointer (bump-the-pointer allocation). Objects reach the old generation mainly by promotion from the new generation. How the old generation allocates depends on its collector: a mark-compact collector keeps the free space contiguous and can also bump a pointer, while a mark-sweep collector such as CMS leaves fragmented free space and has to maintain a free list.

3. Garbage collection timing is different

Garbage collection in the new generation (Minor GC) is generally triggered when Eden has no room for a new object, while garbage collection in the old generation is generally triggered when the old generation is full or nearly full. Depending on the collector and its configuration, an old generation collection may also be started earlier, for example when occupancy crosses a threshold.

4. The efficiency of garbage collection is different

Garbage collection in the new generation is generally more efficient than garbage collection in the old generation, because most objects in the new generation die young and the collector only has to process the few survivors. Objects in the old generation are generally long-lived, so the collector has to process a large number of live objects and a collection takes longer.

Does Java have memory leaks? Please briefly describe

Yes, Java can also have memory leak issues. A memory leak means that the memory space requested by the program is not released correctly, causing these memory spaces to occupy the system memory all the time, eventually resulting in insufficient system memory or slow program operation. In Java, memory leaks are usually caused by several reasons:

1. The object was not released properly

In Java, programmers create objects with the new operator, and the garbage collector reclaims an object once it is no longer reachable. Memory cannot be released manually: setting a reference to null only removes one path to the object, and calling finalize() by hand does not free anything. If the program keeps references to objects it no longer needs, for example in a static collection that is never cleared, those objects stay reachable, keep occupying memory, and eventually cause a memory leak.

2. Long-lived objects hold references to short-lived objects

In Java, if a long-lived object holds a reference to a short-lived object, the short-lived object may not be properly released after use, resulting in a memory leak. This usually happens when using techniques such as caches, thread pools, etc.

3. The resource is not properly closed

In Java, resources such as files, network connections, and database connections are normally released in a finally block or, since Java 7, with a try-with-resources statement. If the programmer does not close these resources properly, the resources and the memory associated with them (including native memory and file handles) remain occupied and eventually lead to leaks.
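The second cause, a long-lived object holding references to short-lived ones, can be sketched as below. The names LeakDemo, LEAKY, handleRequestLeaky, and boundedCache are hypothetical and exist only for this illustration; the bounded variant is one possible remedy (an LRU cache built on LinkedHashMap), not the only one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LeakDemo {
    // Long-lived (static) collection: every element stays strongly reachable
    // for as long as the class is loaded, unless it is removed explicitly.
    static final List<byte[]> LEAKY = new ArrayList<>();

    /** Simulates a request handler that "remembers" a buffer and never forgets it. */
    static void handleRequestLeaky() {
        LEAKY.add(new byte[1024]); // never removed, so the GC can never reclaim it
    }

    /** One possible fix: a cache that evicts its least recently used entry beyond maxEntries. */
    static <K, V> Map<K, V> boundedCache(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evicted entries become collectable
            }
        };
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequestLeaky();
        }
        System.out.println("leaky list holds " + LEAKY.size() + " buffers");

        Map<Integer, byte[]> cache = boundedCache(100);
        for (int i = 0; i < 1000; i++) {
            cache.put(i, new byte[1024]);
        }
        System.out.println("bounded cache holds " + cache.size() + " buffers");
    }
}
```

The leaky list grows with every call, while the bounded cache never holds more than its configured maximum, so older buffers become unreachable and can be collected.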

Why does the new generation use the "copy algorithm" and the old generation use the "mark-compact algorithm"?

The new generation uses the "copy algorithm" because objects in the new generation have a short life cycle and most of them are reclaimed quickly, so copying the few survivors reclaims memory faster than any other approach. The copy algorithm divides the new generation into two spaces and uses only one of them at a time. When that space is used up, the surviving objects are copied to the other space and the used space is cleared in one step, so that the space in use always contains only surviving objects. The advantage of this method is that it is simple to implement and efficient to run; the disadvantage is that part of the memory space is always left unused.

The "mark-compact algorithm" is used in the old generation because objects in the old generation have a long life cycle and are rarely reclaimed, so an algorithm that makes better use of the memory space is preferable. Mark-compact marks all surviving objects, moves them to one end of the space, and then clears all memory beyond the boundary. The advantage of this method is that no space is wasted and no fragmentation remains; the disadvantage is that it is more complicated to implement and relatively slow.

Sao Dai's understanding

Why the new generation uses the "copy algorithm"

Most objects in the new generation are garbage, so the copy algorithm can copy the small number of surviving objects to another area, after which everything left behind is garbage and can be wiped out at once. Because there are few surviving objects, the amount of copying is small; and although many garbage objects are left, they are cleared in a single step, so the amount of deleting is small as well. Suppose you have a folder with 1000 pictures in it, only 3 of which are useful, and you want to delete the remaining 997 junk pictures. You would not delete the 997 junk pictures one by one, which takes 997 deletions. It is obviously faster to copy the 3 useful pictures elsewhere and then delete the entire folder. This is why the copy algorithm suits the new generation and the mark-sweep algorithm does not.

Why does the old generation use the "mark-compact algorithm"

The old generation contains many long-lived objects. Suppose you have a folder with 1000 pictures in it, 997 of which are useful. How would you delete the 3 junk pictures? You would not copy the 997 useful pictures and then delete the entire folder, because the copying costs too much and takes too long, and the positions of all 997 pictures change. For Java objects this means that every reference to each of the 997 objects has to be updated. Instead you would delete the 3 junk pictures one by one, because there are few deletions and hardly any files need to move. The old generation is therefore suited to the mark-sweep and mark-compact algorithms, not to the copy algorithm (moving 997 pictures just to delete 3 junk pictures would be silly).
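The cost argument above can be made concrete with a toy model. This is only a sketch under strong simplifications: the "heap" is an int array of object ids, 0 stands for a free slot, liveness is handed in as a set instead of being computed by marking, and the names ToyCollectors, copy, and compact are invented for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ToyCollectors {
    static final int FREE = 0; // slot value 0 means "no object here"

    /** Copy algorithm: copy the live objects into an empty to-space; from-space is then dropped wholesale. */
    static int[] copy(int[] fromSpace, Set<Integer> live) {
        int[] toSpace = new int[fromSpace.length];
        int next = 0;
        for (int id : fromSpace) {
            if (id != FREE && live.contains(id)) {
                toSpace[next++] = id;          // work is proportional to the number of survivors
            }
        }
        return toSpace;
    }

    /** Mark-compact: slide live objects toward the start of the same space. Returns how many had to move. */
    static int compact(int[] space, Set<Integer> live) {
        int next = 0;
        int moved = 0;
        for (int i = 0; i < space.length; i++) {
            int id = space[i];
            if (id != FREE && live.contains(id)) {
                if (i != next) {
                    space[next] = id;          // object changes address
                    moved++;
                }
                next++;
            }
        }
        Arrays.fill(space, next, space.length, FREE); // everything past the boundary is free
        return moved;
    }

    public static void main(String[] args) {
        int[] young = {1, 2, 3, 4, 5, 6};
        Set<Integer> youngLive = new HashSet<>(Arrays.asList(2, 5));
        System.out.println("to-space:  " + Arrays.toString(copy(young, youngLive)));

        int[] old = {1, 2, 3, 4, 5, 6};
        Set<Integer> oldLive = new HashSet<>(Arrays.asList(1, 2, 4, 6));
        int moved = compact(old, oldLive);
        System.out.println("compacted: " + Arrays.toString(old) + ", moved " + moved);
    }
}
```

With two survivors out of six, the copy touches only two objects and leaves a clean to-space; with four survivors out of six, compaction moves only the two objects that sit behind a gap and needs no second space.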

What are the three types of GC triggers?

Conditions triggered by Minor GC

Minor GC (Young GC) refers to the process of garbage collection for the new generation. The conditions triggered by Minor GC usually include the following aspects:

1. The Eden area is full: When the Eden area does not have enough space to allocate a new object, Minor GC is triggered. In Minor GC, the Eden area and the Survivor area currently in use (From) are collected, and the surviving objects are copied to the other Survivor area (To).

2. The Survivor area cannot hold the survivors: A full Survivor area does not start a Minor GC by itself. It matters during a Minor GC: if the To Survivor area does not have enough space for the surviving objects copied from Eden and the From area, the objects that do not fit are promoted directly to the old generation.

It should be noted that Minor GC only cleans up objects in the young generation, not objects in the old generation. Objects that survive several Minor GCs are moved to the old generation, where they are reclaimed only by an old generation collection or a Full GC. The frequency of Minor GC is therefore usually much higher than that of Full GC, but the memory recovered each time is relatively small.

Conditions that trigger Major GC

Major GC usually refers to garbage collection of the old generation. The term is used loosely, and many sources, including this one, treat it as a synonym for Full GC, which collects the entire heap (the new generation and the old generation). The conditions that trigger Major GC usually include the following:

1. Insufficient space in the old generation: When there is not enough space in the old generation to accommodate promoted or newly allocated objects, Major GC is triggered. In Major GC, the entire heap is garbage collected, including the young generation and the old generation.

2. Insufficient permanent generation space (for virtual machines using a permanent generation): The permanent generation stores data such as class information and constant pools. When its space is insufficient, Major GC is triggered, and the new generation, the old generation, and the permanent generation are all collected.

3. The System.gc() method is explicitly called: Calling System.gc() suggests that the virtual machine perform garbage collection, but it does not guarantee that Major GC is triggered immediately. The virtual machine decides whether to perform it according to the current heap status and garbage collection strategy.

4. Space allocation guarantee failure: Before a Minor GC, the virtual machine checks whether the old generation could absorb the objects that may be promoted. Surviving objects that do not fit in the Survivor area are moved directly to the old generation; if the old generation does not have enough space to accommodate them, a Major GC is triggered.

It should be noted that Major GC occurs relatively rarely, but it processes a large amount of memory each time and therefore causes long pauses. In practical applications you should try to avoid triggering Major GC by adjusting the heap size and the garbage collection strategy.

Conditions that trigger FullGC

Full GC (Full Garbage Collection) refers to garbage collection of the entire heap together with the class metadata area (the new generation, the old generation, and the permanent generation or, since JDK 8, the metaspace). The trigger conditions of Full GC are more complex and usually include the following:

1. Insufficient space in the old generation: When there is not enough space in the old generation to accommodate promoted or newly allocated objects, Full GC is triggered.

2. Insufficient permanent generation space (for virtual machines using a permanent generation): The permanent generation stores data such as class information and constant pools. When its space is insufficient, Full GC is triggered.

3. The System.gc() method is explicitly called: Calling System.gc() suggests that the virtual machine perform garbage collection, but it does not guarantee that Full GC is triggered immediately. The virtual machine decides whether to perform it according to the current state of the heap and the garbage collection strategy.

4. Space allocation guarantee failure: Before a Minor GC, the virtual machine checks whether the old generation could absorb the objects that may be promoted. Surviving objects that do not fit in the Survivor area are moved directly to the old generation; if the old generation does not have enough space to accommodate them, Full GC is triggered.

5. Insufficient system space: If the system space is insufficient, the virtual machine cannot allocate enough space for the heap, thus triggering Full GC.

6. Promoted objects do not fit: Each time an object survives a Minor GC it is copied to the other Survivor area and its age increases by 1. When its age reaches the threshold (15 by default, set with -XX:MaxTenuringThreshold), it is promoted to the old generation. If there is not enough space in the old generation to accommodate the promoted objects, a Full GC is triggered. Adjusting the promotion threshold changes how quickly objects reach the old generation and can relieve this pressure.

It should be noted that Full GC occurs relatively rarely, but it processes a large amount of memory each time and therefore causes long pauses. In practical applications Full GC should be avoided as much as possible by adjusting the heap size and the garbage collection strategy.
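How often each kind of collection actually runs can be observed from inside the program with the standard GarbageCollectorMXBean API. The following is a rough sketch: the class name GcCounts is invented, the collector names in the output depend on the collector in use, and the number of collections caused by the allocation loop depends on the heap size.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

class GcCounts {
    /** Collector name -> number of collections so far (-1 if the JVM does not report it). */
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("before: " + snapshot());

        long allocated = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] shortLived = new byte[4 * 1024]; // garbage as soon as the iteration ends
            allocated += shortLived.length;
        }

        System.out.println("allocated " + allocated / (1024 * 1024) + " MB of short-lived objects");
        System.out.println("after:  " + snapshot());
    }
}
```

On a typical generational setup the young generation counter is expected to rise while the old generation counter stays unchanged, which matches the description of Minor GC as the frequent, cheap collection.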

How does the JVM judge whether an object is a garbage object?

There are two methods, namely the reference counting method and the reachability analysis algorithm. HotSpot uses reachability analysis.

1. Reference counting method

A reference counter is added to each object. The counter is incremented by 1 whenever a new reference to the object is created and decremented by 1 whenever a reference becomes invalid. When the counter reaches 0, the object is marked as garbage.

Disadvantage: If object A holds a reference to object B and object B also holds a reference to object A (the two objects refer to each other), a circular reference arises, loosely comparable to a deadlock in an operating system. In this case the counters of A and B never drop below 1, even when nothing else refers to them, so the GC can never reclaim the two objects.

Analysis: A counter is maintained in the object header. It is incremented every time a reference to the object is added and decremented every time a reference is lost. When the counter is 0, the object has been discarded and is no longer alive. On the one hand, this method cannot distinguish between strong, soft, weak, and phantom references. On the other hand, it cannot handle circular references: if two objects refer to each other, their counters never reach 0 and they can never be collected.
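The circular reference problem can be reproduced with a toy counter. This sketch does not reflect how any real JVM works (HotSpot does not count references); the classes RefCounted and RefCountCycleDemo and their methods are hypothetical and only model the bookkeeping described above.

```java
class RefCounted {
    final String name;
    int refCount;       // toy counter standing in for the one described above
    RefCounted field;   // a single outgoing reference

    RefCounted(String name) {
        this.name = name;
    }

    /** Simulates "holder.field = target" while keeping both counters up to date. */
    static void setField(RefCounted holder, RefCounted target) {
        if (target != null) {
            target.refCount++;
        }
        if (holder.field != null) {
            holder.field.refCount--;
        }
        holder.field = target;
    }
}

class RefCountCycleDemo {
    /** Builds A <-> B, drops both external references, and returns the two counters. */
    static int[] countersAfterDroppingRoots() {
        RefCounted a = new RefCounted("A");
        RefCounted b = new RefCounted("B");
        a.refCount++;               // external reference to A (e.g. a local variable)
        b.refCount++;               // external reference to B

        RefCounted.setField(a, b);  // A -> B
        RefCounted.setField(b, a);  // B -> A

        a.refCount--;               // the program forgets A ...
        b.refCount--;               // ... and B; nothing outside the cycle refers to them now

        return new int[] {a.refCount, b.refCount};
    }

    public static void main(String[] args) {
        int[] counters = countersAfterDroppingRoots();
        System.out.println("A=" + counters[0] + ", B=" + counters[1]); // prints A=1, B=1
    }
}
```

Both counters stay at 1 although neither object can be reached by the program any more, so a pure reference-counting collector would never free them.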

2. Reachability analysis algorithm

Starting from a set of objects called GC Roots, the collector searches downward along references; the path traveled by the search is called a reference chain. When an object is not connected to any GC Root by a reference chain, the object is unreachable and is marked as a candidate for collection.

An unreachable object is not necessarily reclaimed at once. If its class overrides finalize() and the method has not yet been called for this object, the JVM puts the object into the F-Queue, and a low-priority Finalizer thread executes finalize() at some later time, so the execution time is uncertain. Inside finalize() the object can "rescue" itself by reconnecting to the reference chain, for example by assigning this to a static field. After finalize() has run, the collector checks the object again: if it is still unreachable it is reclaimed, and if it has been reconnected to the reference chain it is not. Because finalize() is executed at most once per object, the rescue works only once.

What objects can be used as GC Roots?

  1. Objects referenced in the virtual machine stack: objects referenced from the local variable tables of the stack frames of every thread, such as method parameters and local variables.

  2. Objects referenced by class static properties in the method area: objects referenced by the static fields of loaded classes.

  3. Objects referenced by constants in the method area: objects referenced by constants, such as string objects in the string constant pool.

  4. Objects referenced by JNI in the native method stack: objects referenced by native methods through JNI (Java Native Interface).
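Reachability analysis from a set of GC Roots can be sketched as a graph traversal. This is a simplified model, not the JVM's implementation: objects are plain strings, references are a map from an object to the objects it points to, and the names ReachabilityDemo, reachable, and garbage are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilityDemo {
    /** Marks every object that can be reached from the roots by following references. */
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (marked.add(obj)) {   // first visit: follow its outgoing references
                work.addAll(refs.getOrDefault(obj, Collections.<String>emptyList()));
            }
        }
        return marked;
    }

    /** Everything that is known but not marked is garbage. */
    static Set<String> garbage(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> dead = new HashSet<>(refs.keySet());
        dead.removeAll(reachable(refs, roots));
        return dead;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("stackLocal", Arrays.asList("order"));   // a GC Root: local variable in a stack frame
        refs.put("order", Arrays.asList("customer"));
        refs.put("customer", Collections.<String>emptyList());
        refs.put("A", Arrays.asList("B"));                // A and B refer to each other,
        refs.put("B", Arrays.asList("A"));                // but no root leads to them

        System.out.println("garbage: " + garbage(refs, Arrays.asList("stackLocal")));
    }
}
```

Unlike reference counting, the traversal never visits A or B, so the mutually referencing pair is correctly reported as garbage.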

Sao Dai extension: The finalize() method is defined in the Object class and is called by the garbage collector before an unreachable object is reclaimed, so that the object can release resources such as open files. Its execution time is uncertain, and in some cases it is never executed at all, for example when the program is forcibly terminated. It should therefore not be used as the only means of resource release, and complex operations should not be performed in it.

The finalize() method has been deprecated since Java 9. It is recommended to use try-with-resources, java.lang.ref.Cleaner, or PhantomReference instead to release resources and perform cleanup.

If the object's reference is set to null, will the garbage collector immediately free the memory occupied by the object?

The memory occupied by the object is not necessarily freed immediately. When all references to an object are set to null, the object becomes unreachable, but the garbage collector does not immediately reclaim the object's memory. Instead, the garbage collector will perform a garbage collection operation at some point, reclaiming unreachable objects and freeing the memory they occupy.

Specifically, the timing of garbage collection is uncertain and is affected by various factors, such as memory usage, the type of garbage collector, and its configuration. In some cases the garbage collector may run soon afterwards and release the memory occupied by the object; in other cases it may delay the collection until memory usage reaches a certain threshold.

Therefore, programmers should not rely on memory being freed immediately after object references are set to null. Memory usage is better optimized by avoiding the creation of unnecessary objects and by not retaining references longer than needed. Calling System.gc() manually is not a reliable remedy either, since it is only a suggestion to the virtual machine.

When is the finalize() method called? What is the purpose of a destructor (finalization)?

When is the finalize() method called?

The finalize() method is defined in the Object class. The garbage collector calls it after it has determined that the object is unreachable and before the object's memory is reclaimed, so that the object can release resources and clean up, for example by closing files. It is called at most once per object. Since its execution time is uncertain and it can delay garbage collection, complex operations should not be performed in finalize().

The execution timing of finalize() is uncertain because it depends on the garbage collector, whose implementation is determined by the virtual machine vendor. In some cases finalize() is never executed, for example when the program is forcibly terminated or exits before the object is collected. Therefore finalize() should not be used as the only means of resource release; mechanisms such as try-finally blocks should be used to guarantee that resources are released.

What is the purpose of a destructor (finalization)?

The purpose of a destructor (finalization) is to perform cleanup work before an object is destroyed, such as releasing resources and closing files. Unlike the finalize() method in Java, the destructor is a concept of the C++ language. Its execution timing is deterministic: it is called automatically when the object is destroyed and has nothing to do with a garbage collector.

In Java, because of the garbage collector, the timing of object destruction is uncertain, so Java has no destructors. Instead, Java provides the finalize() method for a similar purpose, but because the execution timing of finalize() is uncertain, it should not be used as the only means of resource release.
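The closest Java equivalent of deterministic, destructor-like cleanup is try-with-resources on an AutoCloseable. The sketch below uses a made-up TrackedResource class (not a JDK type) that simply records whether close() was called, to show that cleanup happens at a known point even when the body throws.

```java
class TrackedResource implements AutoCloseable {
    private boolean closed;

    /** Pretends to do work; throws when asked to, to simulate a failure. */
    void use(boolean fail) {
        if (fail) {
            throw new IllegalStateException("simulated failure while using the resource");
        }
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {   // invoked automatically when the try block is left
        closed = true;
    }
}

class CleanupDemo {
    /** Uses a resource inside try-with-resources and returns it for inspection. */
    static TrackedResource useAndClose(boolean fail) {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            r.use(fail);
        } catch (IllegalStateException e) {
            // close() has already run by the time the catch block executes
            System.out.println("caught: " + e.getMessage() + ", closed=" + resource.isClosed());
        }
        return resource;
    }

    public static void main(String[] args) {
        System.out.println("normal path closed:  " + useAndClose(false).isClosed()); // true
        System.out.println("failure path closed: " + useAndClose(true).isClosed());  // true
    }
}
```

In contrast to finalize(), the moment of cleanup does not depend on the garbage collector at all: it is fixed by the end of the try block.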

Tell me which garbage collectors the JVM has?

The figure shows 7 collectors that act on different generations. A line between two collectors means that they can be used together. The region a collector is placed in indicates whether it is a young generation or an old generation collector.

New generation collectors (all of which use the copy algorithm): Serial, ParNew, Parallel Scavenge

Old generation collectors: CMS (mark-sweep), Serial Old (mark-compact), Parallel Old (mark-compact)

Whole-heap collector: G1 (mark-compact when viewed as a whole, and a copy algorithm between two Regions)

First, a few terms:

  • Parallel: Multiple garbage collection threads work in parallel, and the user threads are waiting during that time
  • Concurrent: User threads and garbage collection threads execute at the same time
  • Throughput: running user code time / (running user code time + garbage collection time)

1. Serial collector

The Serial collector is a young generation collector in the Java virtual machine. It is single-threaded: it suspends all application threads (stop-the-world), then uses the copy algorithm to move the surviving young generation objects and reclaims the rest.

The advantage of the Serial collector is that it is simple and, having no thread-interaction overhead, efficient on single-core processors and in small-memory environments. Its drawback is that a single thread does all the work while the application is paused, so pause times grow with the size of the young generation.

In the Java virtual machine, you can choose to use the Serial collector by setting parameters. For example, the Serial collector can be enabled with the following command line argument:

-XX:+UseSerialGC

Additionally, the behavior of the Serial collector can be tuned by setting the following parameters:

-XX:NewRatio: Sets the ratio of the old generation to the young generation. The default value is 2, which means that the young generation occupies 1/3 of the heap memory.

-XX:SurvivorRatio: Sets the ratio of the Eden area to one Survivor area. The default value is 8, which means that the Eden area accounts for 8/10 of the young generation memory, and each of the two Survivor areas accounts for 1/10.

-XX:MaxTenuringThreshold: Sets the age threshold for objects to enter the old generation. The default value is 15, which means that an object is moved to the old generation once it has survived 15 Minor GCs in the young generation (it may be promoted earlier if the Survivor area cannot hold it).
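The arithmetic behind -XX:NewRatio and -XX:SurvivorRatio can be worked through with a small helper. This is only a back-of-the-envelope sketch: the class GenerationSizes and its layout method are invented here, and a real JVM additionally rounds the sizes to alignment boundaries and may resize the generations adaptively.

```java
class GenerationSizes {
    /**
     * Returns {young, old, eden, eachSurvivor} in the same unit as heap.
     * NewRatio      = old : young
     * SurvivorRatio = eden : one survivor area
     */
    static long[] layout(long heap, int newRatio, int survivorRatio) {
        long young = heap / (newRatio + 1);           // young is 1 part out of (newRatio + 1)
        long old = heap - young;
        long survivor = young / (survivorRatio + 2);  // eden : s0 : s1 = survivorRatio : 1 : 1
        long eden = young - 2 * survivor;
        return new long[] {young, old, eden, survivor};
    }

    public static void main(String[] args) {
        long[] mb = layout(600, 2, 8);                // 600 MB heap with the default ratios
        System.out.println("young=" + mb[0] + " MB, old=" + mb[1] + " MB, eden=" + mb[2]
                + " MB, each survivor=" + mb[3] + " MB");
        // prints young=200 MB, old=400 MB, eden=160 MB, each survivor=20 MB
    }
}
```

For a 600 MB heap the defaults thus give a 200 MB young generation, of which only 20 MB (one Survivor area) is kept empty at any time, compared with half of the space in the basic copy algorithm.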

2. ParNew collector

The ParNew collector is a young generation garbage collector and is essentially a multi-threaded version of the Serial collector. Like the Serial collector, it uses the copy algorithm to reclaim garbage objects in the young generation, but it uses multiple threads to speed up the collection.

The advantage of the ParNew collector is that it can exploit multi-core processors to improve garbage collection efficiency, and the number of threads can be set by parameter to suit different hardware environments. It is also the young generation collector that works together with the CMS collector. On current JDKs neither CMS nor ParNew is available any more (CMS was removed in JDK 14).

On JDK versions that still include it, the ParNew collector can be enabled with the following command line parameter:

-XX:+UseParNewGC

Additionally, the behavior of the ParNew collector can be tuned by setting the following parameters:

-XX:ParallelGCThreads: Sets the number of threads for garbage collection. The default is derived from the number of CPU cores (equal to it on machines with up to 8 cores, a fraction of it beyond that).

-XX:MaxTenuringThreshold: Sets the age threshold for objects entering the old generation.

-XX:SurvivorRatio: Sets the ratio of the Eden area to one Survivor area. The default value is 8, which means that the Eden area accounts for 8/10 of the young generation memory, and each of the two Survivor areas accounts for 1/10.

In short, the ParNew collector is a multi-threaded young generation collector that can take advantage of multi-core processors and shorten young generation pauses.

3. Parallel Scavenge collector

The Parallel Scavenge collector is a young generation garbage collector. It is a multi-threaded collector. Similar to the ParNew collector, it also uses the mark-copy algorithm to reclaim garbage objects in the young generation.

The main feature of the Parallel Scavenge collector is its focus on throughput, that is, the share of total run time that is spent in application code rather than in garbage collection. It pursues this goal by tuning itself against two targets, a maximum pause time and a GC time ratio, both of which can be set with the parameters listed below.

Unlike other collectors, the goal of the Parallel Scavenge collector is to achieve a controllable throughput, not to minimize pause times. Therefore, an individual pause may be longer than with other collectors, but the total time spent in garbage collection is smaller, which improves the throughput of the application.

In the Java virtual machine, the Parallel Scavenge collector can be enabled with the following command line parameters:

-XX:+UseParallelGC

Additionally, the behavior of the Parallel Scavenge collector can be tuned by setting the following parameters:

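-XX:ParallelGCThreads: Sets the number of threads for garbage collection. By default it equals the number of CPU cores when there are 8 or fewer; on larger machines the JVM uses roughly 8 plus 5/8 of the cores beyond 8.

-XX:MaxGCPauseMillis: Sets the target for the maximum garbage collection pause time, in milliseconds. It is a goal, not a guarantee. The parallel collector has no pause time goal by default (the 200 millisecond default belongs to G1).

-XX:GCTimeRatio: Sets the ratio of application time to garbage collection time. With a value of n, the target share of total time spent in garbage collection is 1/(1+n). The default value is 99, which means that garbage collection should account for at most 1% of the total time.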

In summary, the Parallel Scavenge collector is a throughput-focused garbage collector that can take advantage of multi-core processors to provide higher application throughput.
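
The relationship between -XX:GCTimeRatio and the throughput target is simple arithmetic, and a few lines of Java make it concrete. This is only an illustrative sketch: the class and method names are made up for this example, and the values passed in are sample settings, not recommendations.

```java
final class GcTimeRatioSketch {

    /** Target share of total time spent in GC for -XX:GCTimeRatio=n, i.e. 1 / (1 + n). */
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 99 is the default mentioned above; 19 and 9 are arbitrary sample values.
        for (int n : new int[] {99, 19, 9}) {
            double gc = gcTimeFraction(n);
            System.out.printf("GCTimeRatio=%d -> GC time target %.1f%%, throughput target %.1f%%%n",
                    n, 100 * gc, 100 * (1 - gc));
        }
    }
}
```

With the default of 99 the GC share is 1/100, so the throughput target is 99%; a value of 19 relaxes the target to 5% GC time.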

4. Serial Old collector

In the Java virtual machine, the Serial Old collector is a traditional, single-threaded garbage collector that collects garbage objects in the old generation. It uses a mark-compact algorithm: it first marks the live objects, then slides them toward one end of the old generation and releases the memory beyond the boundary.

The advantage of the Serial Old collector is that it is simple and efficient, which suits small applications and client applications. Its disadvantage is that it cannot exploit multi-core CPUs, because it is single-threaded and cannot process garbage collection tasks in parallel. It also pauses the application: during garbage collection, the application has to wait for the collection to complete before continuing.

In older client-mode JVMs, Serial Old was the default old generation collector. As the Java virtual machine has developed, most applications now use the parallel collectors, a concurrent collector, or the G1 collector instead.

5. Parallel Old collector

The Parallel Old collector is a parallel garbage collector in the Java virtual machine that collects garbage objects in the old generation. Unlike the Serial Old collector, it can take advantage of multi-core CPUs and use multiple threads to process garbage collection tasks in parallel, which improves the efficiency of garbage collection.

The Parallel Old collector uses a mark-compact algorithm: it first marks the live objects, then moves them to one end of the old generation, and finally releases the memory beyond the boundary. Because the work is done in parallel, it completes a collection faster than Serial Old and so shortens the application pause.

The Parallel Old collector is suitable for large applications and server applications that can take full advantage of parallel processing on multi-core CPUs. Its disadvantage is that it consumes a lot of CPU resources during garbage collection. It is also a stop-the-world collector: during garbage collection, the application must wait for the collection to complete before proceeding.

6. CMS collector

The CMS collector is a concurrent garbage collector in the Java virtual machine that collects garbage objects in the old generation. Unlike the Parallel Old collector, the CMS collector does most of its work while the application is running, which reduces the pause time of the application.

The CMS collector uses a mark-sweep algorithm: it first marks the live objects and then sweeps the unmarked ones to release memory space. Only two short marking steps stop the application; the rest of the marking and the whole sweep run concurrently with it.

The CMS collector is suitable for large-scale applications and server applications that need short pauses. Its disadvantage is that the concurrent phases occupy CPU resources while the application is running, which may affect the performance of the application. Also, because mark-sweep never moves objects, the CMS collector can cause memory fragmentation that requires additional processing to resolve.

7. G1 collector

The G1 collector is a garbage collector in the Java virtual machine that manages the whole heap, both the young and the old generation. Like the CMS collector, it does much of its marking work concurrently while the application is running, which reduces the application pause time.

The G1 collector keeps the idea of generational collection but divides the heap memory into many small blocks of equal size, each called a region. Viewed as a whole it behaves like a mark-compact algorithm; between two regions it works like a copying algorithm, because the surviving objects of the regions being collected are copied into other regions and the collected regions are then freed as a whole.

The G1 collector is suitable for large-scale applications and server applications. It can adjust when it collects and which regions it collects (those with the most garbage first) in order to meet the pause time goal. Its disadvantage is that garbage collection occupies a certain amount of CPU resources, which may affect the performance of the application. Because surviving objects are copied into fresh regions, the G1 collector does not suffer from the memory fragmentation of a mark-sweep collector.

About the choice of GC

Unless the application has very strict pause time requirements, run the application first and let the VM choose the collector (that is, if there is no special requirement, just use the default GC provided by the JVM). If necessary, adjust the heap size to improve performance. If performance still does not meet the goals, use the following guidelines as a starting point for selecting a collector:

  1. If the data set of the application is small (up to about 100 MB), select the serial collector with the option -XX:+UseSerialGC.
  2. If the application will run on a single processor and has no pause time requirements, select the serial collector with the option -XX:+UseSerialGC.
  3. If (a) peak application performance is the first priority, and (b) there are no pause time requirements or pauses of a second or more are acceptable, let the VM choose the collector or use -XX:+UseParallelGC to select the parallel collector.
  4. If response time is more important than overall throughput and garbage collection pauses must be kept within about a second, select G1 with -XX:+UseG1GC. (Note that CMS was deprecated in JDK 9 and removed in JDK 14, so it is no longer listed as an option here.)
  5. If JDK 8 is used and the heap memory reaches 16 GB, the G1 collector is recommended in order to control the time of each garbage collection.
  6. If response time is a high priority, or if the used heap is very large, use -XX:+UseZGC to select a fully concurrent collector. (ZGC is available from JDK 11, where it is experimental and must be unlocked with -XX:+UnlockExperimentalVMOptions. The experimental label was removed in JDK 15, released in September 2020, so it can be enabled directly, but the default GC of JDK 15 is still G1.)
  7. These guidelines provide only a starting point for selecting a collector, since performance depends on the size of the heap, the amount of real-time data maintained by the application, and the number and speed of available processors.
  8. If the recommended collectors do not achieve the desired performance, first try tuning the heap and young generation sizes to achieve the desired goals.
  9. If performance is still insufficient, try using other collectors

General principles: reduce stop-the-world (STW) time. Use concurrent collectors (such as CMS+ParNew or G1) to reduce pause time and improve response time, and use parallel collectors to increase overall throughput on multiprocessor hardware.
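
Before changing any flag it helps to confirm which collectors the running JVM actually selected. The sketch below uses the standard java.lang.management API; the class name ActiveCollectors and the helper names() are made up for this example. The reported bean names depend on the JVM and the collector in use (for instance, HotSpot with G1 typically reports "G1 Young Generation" and "G1 Old Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {

    /** Names of the collectors registered in this JVM, as reported by the platform MXBeans. */
    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(bean.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        // Run with different -XX:+Use...GC options to see how the list changes.
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(bean.getName()
                    + ": collections=" + bean.getCollectionCount()
                    + ", timeMs=" + bean.getCollectionTime());
        }
    }
}
```

The collection count and accumulated time printed here are also a rough way to compare throughput before and after a tuning change.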

What garbage collection algorithms (recycling mechanisms) does the JVM have?

mark-sweep algorithm

In the Java virtual machine, the mark-sweep algorithm is an algorithm used for garbage collection. It is divided into two phases: the mark phase and the sweep phase.

In the mark phase, the garbage collector starts from the root objects (GC Roots), follows the reference chains to every reachable object, and marks each of them as alive.

In the sweep phase, the garbage collector traverses the entire heap memory and clears all unmarked objects, that is, garbage objects.

The advantage of the mark-sweep algorithm is that it is simple and easy to implement, and because it never moves objects it suits areas that hold large or long-lived objects. Its disadvantage is memory fragmentation: the freed memory is not contiguous, so a later allocation of a large object may fail even though enough total memory is free. In its basic form it also pauses the application, because the application has to wait for the garbage collection to complete before continuing.
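
The two phases can be sketched on a toy object graph. This is a minimal model, not how HotSpot is implemented: the Obj class, the list standing in for the heap, and all method names are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

final class MarkSweepSketch {

    /** Toy heap object: a name, outgoing references, and a mark bit. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Mark phase: flag every object reachable from the roots. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;      // already visited, also guards against cycles
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Sweep phase: drop unmarked objects and reset the mark bits; returns the number reclaimed. */
    static int sweep(List<Obj> heap) {
        int before = heap.size();
        heap.removeIf(o -> !o.marked);
        for (Obj o : heap) o.marked = false;
        return before - heap.size();
    }

    static int collect(List<Obj> heap, List<Obj> roots) {
        mark(roots);
        return sweep(heap);
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);                   // a -> b, reachable from the root a
        c.refs.add(d);                   // c and d reference each other but no root reaches them
        d.refs.add(c);
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        int reclaimed = collect(heap, Collections.singletonList(a));
        System.out.println("reclaimed=" + reclaimed + ", live=" + heap.size()); // reclaimed=2, live=2
    }
}
```

Note that the unreachable cycle c/d is reclaimed, which is the reason tracing from roots is used rather than reference counting.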

copy algorithm

In the Java virtual machine, the copy algorithm is an algorithm used for garbage collection. It divides the memory into two areas of equal size and uses only one of them at a time: the one in use is called the "active area", and the other is called the "idle area". When the active area is full, the garbage collector copies the surviving objects to the idle area, clears the whole active area in one step, and swaps the roles of the two areas for the next round of allocation.

The copy algorithm has the advantage of being simple and efficient when few objects survive. Since the survivors are copied next to each other, there is no memory fragmentation problem, and new objects can be allocated by simply advancing a pointer. The copy algorithm is also commonly combined with generational collection, where the heap memory is divided into generations and each generation uses the garbage collection algorithm that fits it best.

However, the copy algorithm has the disadvantage that only half of the memory is usable at any time, because the other half must be kept free as the copy target. Also, since its cost grows with the number of surviving objects, it is not suitable for areas where most objects are large or long-lived.
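
A small model of the two areas shows why allocation is cheap and why fragmentation disappears. The sketch is deliberately simplified: slots hold strings instead of real objects, liveness is passed in as a set rather than discovered by tracing, and every name in it is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class CopyingSketch {
    private String[] from;   // active area
    private String[] to;     // idle area, always empty between collections
    private int top;         // bump pointer: next free slot in the active area

    CopyingSketch(int slotsPerArea) {
        from = new String[slotsPerArea];
        to = new String[slotsPerArea];
    }

    /** Bump-pointer allocation; returns false when the active area is full. */
    boolean allocate(String obj) {
        if (top == from.length) return false;
        from[top++] = obj;
        return true;
    }

    /** Copies the live objects into the idle area, clears the active area, and swaps the two. */
    int collect(Set<String> live) {
        int copied = 0;
        for (int i = 0; i < top; i++) {
            if (live.contains(from[i])) to[copied++] = from[i];
        }
        Arrays.fill(from, null);              // the whole active area is released at once
        String[] tmp = from; from = to; to = tmp;
        top = copied;                         // survivors are contiguous, free space follows them
        return copied;
    }

    int used() { return top; }

    public static void main(String[] args) {
        CopyingSketch heap = new CopyingSketch(4);
        for (String s : new String[] {"a", "b", "c", "d"}) heap.allocate(s);
        System.out.println("allocate e before GC: " + heap.allocate("e"));   // false, area is full
        heap.collect(new HashSet<>(Arrays.asList("b", "d")));
        System.out.println("used after GC: " + heap.used());                 // 2
        System.out.println("allocate e after GC: " + heap.allocate("e"));    // true
    }
}
```

The work done by collect is proportional to the number of survivors, not to the size of the area, which is why the approach pays off in the new generation.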

mark-compact algorithm

The marking step is the same as in mark-sweep: all surviving objects are marked. After marking, however, the garbage objects are not cleaned up in place. Instead, all surviving objects are moved to one end of the memory, and the memory outside the end boundary is then cleared directly.

Advantages : Solves the memory fragmentation problem of the mark-sweep algorithm.

Disadvantages : Surviving objects have to be moved and the references to them updated, which reduces efficiency to a certain extent.
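
The compaction step can be sketched on an array that stands in for the heap. This is an illustrative model only: the mark bits are supplied by the caller, reference updating is left out, and the class and method names are assumptions.

```java
import java.util.Arrays;

final class MarkCompactSketch {

    /**
     * Slides the marked entries to the start of the array, clears everything
     * past the new boundary, and returns that boundary (the first free slot).
     */
    static int compact(String[] heap, boolean[] marked) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                heap[boundary++] = heap[i];   // move the survivor toward one end
            }
        }
        Arrays.fill(heap, boundary, heap.length, null);  // clear memory outside the boundary
        Arrays.fill(marked, false);                      // reset mark bits for the next cycle
        return boundary;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "b", "c", "d", "e"};
        boolean[] marked = {true, false, true, false, true};
        int boundary = compact(heap, marked);
        System.out.println(Arrays.toString(heap) + ", boundary=" + boundary);
        // prints [a, c, e, null, null], boundary=3
    }
}
```

After compaction the free space is one contiguous block starting at the boundary, so a large object can be allocated where mark-sweep would have left three separate holes.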

Generational Algorithm

The memory is divided into several parts according to the different life cycles of objects, usually the new generation and the old generation. The new generation basically adopts the copy algorithm, and the old generation adopts the mark-sweep or mark-compact algorithm. Current commercial virtual machines use generational garbage collection.

See the following question for details of the generational algorithm

Introduce the generational recycling mechanism

JVM Generational Recycling Strategy

The JVM generational collection strategy means that the Java virtual machine divides the heap memory into several parts according to the life cycle of objects. Traditionally these are the new generation, the old generation, and the permanent generation. The permanent generation, however, was removed in JDK 1.8 and replaced by Metaspace.

new generation

The new generation is mainly used to store new objects and by default occupies about 1/3 of the heap space. Because objects are created frequently, the new generation frequently triggers Minor GC, and the GC algorithm commonly used there is the copy algorithm. The new generation is further subdivided into three parts: Eden, Survivor0 (S0 for short), and Survivor1 (S1 for short), which split the new generation in the ratio 8:1:1. The Minor GC trigger is simple: when the JVM cannot allocate memory for a new object because the Eden area is full, it triggers a Minor GC. Therefore, the smaller the new generation, the more frequent the Minor GCs.
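
The 1/3 share and the 8:1:1 split translate into sizes as follows. The sketch assumes the HotSpot flag -XX:NewRatio (old:young, default 2), which is not mentioned in the text above but is where the 1/3 comes from, together with -XX:SurvivorRatio=8. The class name, the method names, and the 3000 MB heap are sample choices, and a real JVM rounds these sizes to its own alignment.

```java
final class YoungGenSizing {

    /** Young generation size when old:young = newRatio:1. */
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    /** Size of ONE survivor space when eden:survivor = survivorRatio:1 and there are two survivors. */
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Eden gets whatever the two survivor spaces leave over. */
    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        long heap = 3000;                       // sample heap size in MB
        long young = youngMb(heap, 2);          // 1000 MB, one third of the heap
        System.out.println("young=" + young
                + " eden=" + edenMb(young, 8)           // 800 MB
                + " s0=s1=" + survivorMb(young, 8)      // 100 MB each
                + " old=" + (heap - young));            // 2000 MB
    }
}
```

Only Eden plus one survivor space (90% of the young generation with these defaults) is available for objects at any time; the other survivor space is the empty copy target.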

detailed process

  • Most newly created objects are stored in the Eden area (a newly created object that occupies a large amount of memory may be allocated directly in the old generation).
  • When the Eden area is full for the first time, a Minor GC is triggered. The garbage objects in the Eden area are reclaimed, and the surviving objects are copied to S0; S1 is empty at this time.
  • The next time the Eden area is full, another Minor GC is performed. This time all garbage objects in the Eden and S0 areas are cleared, and the surviving objects are copied to S1; S0 becomes empty.
  • The next time the Eden area is full, another Minor GC is performed. This time all garbage objects in the Eden and S1 areas are cleared, and the surviving objects are copied to S0; S1 becomes empty.
  • If an object is still alive after being switched between S0 and S1 several times (15 times by default), it is considered to have a long life cycle and is transferred to the old generation.

Sao Dai extension: The virtual machine keeps an age counter for each object. If an object born in Eden is still alive after its first Minor GC and fits into a Survivor space, it is moved there and its age is set to 1. Each time the object survives another Minor GC in the Survivor area, its age increases by 1. When its age reaches a certain threshold (15 by default), it is promoted to the old generation. The threshold can be set with the parameter -XX:MaxTenuringThreshold.
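
The path of a single object through Eden, S0, S1 and the old generation can be traced with a few lines of Java. This is a simplified model of the description above, with made-up names: it assumes the survivor space never overflows, and it does not try to reproduce HotSpot's exact promotion boundary or its dynamic age adjustment.

```java
import java.util.ArrayList;
import java.util.List;

final class SurvivorAgingSketch {

    /**
     * Traces one object that stays reachable for the given number of Minor GCs.
     * The age grows by 1 per survived GC; reaching the threshold means promotion.
     */
    static List<String> trace(int reachableForGcs, int maxTenuringThreshold) {
        List<String> steps = new ArrayList<>();
        String where = "Eden";
        int age = 0;
        for (int gc = 1; gc <= reachableForGcs; gc++) {
            age++;
            if (age >= maxTenuringThreshold) {
                steps.add("GC " + gc + ": " + where + " -> Old (age " + age + ")");
                return steps;
            }
            String target = where.equals("S0") ? "S1" : "S0";   // survivors alternate
            steps.add("GC " + gc + ": " + where + " -> " + target + " (age " + age + ")");
            where = target;
        }
        steps.add("unreachable: reclaimed in " + where);
        return steps;
    }

    public static void main(String[] args) {
        // A short-lived object dies in the young generation.
        trace(2, 15).forEach(System.out::println);
        // A low threshold (sample value 3) promotes a long-lived object quickly.
        trace(20, 3).forEach(System.out::println);
    }
}
```

Lowering the threshold moves long-lived objects out of the survivor spaces sooner, at the cost of filling the old generation faster.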

old generation

An object is copied to the old generation if it has survived long enough in the young generation without being reclaimed. The old generation is generally larger than the new generation and can store more objects. If an object is relatively large (such as a long string or a large array) and the remaining space in the new generation is insufficient, the large object is allocated directly in the old generation.

We can use -XX:PretenureSizeThreshold to control the size above which objects are allocated directly in the old generation (this parameter is honored only by the Serial and ParNew collectors). Because objects in the old generation have a long life cycle, repeated copying would be wasteful, so the mark-compact or mark-sweep algorithm is generally used. Objects in the old generation are relatively stable, so Major GC is not executed frequently. A Major GC takes a long time because the whole generation has to be scanned before anything is reclaimed, and with mark-sweep it also leaves memory fragmentation. To limit the loss, the free blocks are either merged or recorded so that they can be used directly by later allocations. When the old generation still cannot hold the objects after collection, an OutOfMemoryError is thrown.

permanent generation

The permanent generation (Permanent Generation) is a memory area in the HotSpot virtual machine for storing information such as classes, methods, and constants. It exists only in versions before Java 8. Its maximum size is fixed when the JVM starts (-XX:MaxPermSize), and once it is full an OutOfMemoryError is thrown.

Information stored in the permanent generation includes:

- Metadata information of the class, such as class name, access modifiers, fields, methods, etc.
- String constant pool (up to Java 6; moved to the heap in Java 7).
- Static variables (up to Java 6; moved to the heap in Java 7).

Since the size of the permanent generation is fixed, and the stored information will continue to increase as the application runs, it is easy to cause an OutOfMemoryError exception. In order to solve this problem, in Java 8 and later versions, the permanent generation is removed and replaced by Metaspace.

Metaspace is a memory area in the Java virtual machine for storing class metadata such as class and method information. Unlike the permanent generation, the size of the metaspace is not fixed by default; it grows dynamically as needed and can be capped with -XX:MaxMetaspaceSize. It lives in native memory rather than in the Java heap.

The class metadata stored in the metaspace is similar to what the permanent generation held. The string constant pool and static variables, however, are not in the metaspace: they have been stored in the Java heap since Java 7. Keeping class metadata in native memory removes the fixed permanent generation limit, which was a frequent source of OutOfMemoryError.

Sao Dai's understanding: The method area is a logical area defined by the Java Virtual Machine Specification for class metadata, constants, and similar data; the permanent generation and the metaspace are HotSpot's two implementations of it. Before Java 8, the method area was implemented by the permanent generation, whose capped size led to an OutOfMemoryError once it was full. In Java 8 and later versions, the permanent generation is removed and the method area is implemented by the metaspace (Metaspace) in native memory.

CMS Garbage Collector

1. What is CMS?

Concurrent Mark Sweep. As the name shows, CMS is a concurrent GC that uses the mark-sweep algorithm. It collects the old generation.

2. What is the use of CMS?

The purpose of CMS is to obtain the minimum pause time. In some applications or websites that have high requirements on response time, user programs cannot have long pauses, and CMS can be used in this scenario.

3. The process of CMS garbage cleaning

Generally speaking, the execution process of CMS can be divided into the following stages:

  1. Initial mark (STW initial mark)
  2. Concurrent marking
  3. Concurrent precleaning
  4. Remark (STW remark)
  5. Concurrent sweeping
  6. Concurrent reset

Initial marking : At this stage, all worker threads in the program are temporarily suspended by the "stop-the-world" (STW) mechanism. The only task of this stage is to mark the objects that GC Roots reference directly. All suspended application threads are resumed once the marking is complete. Since there are relatively few directly referenced objects, this stage is very fast.

Concurrent marking : Starting from the objects found by the initial marking, the collector traces the rest of the object graph (the process of traversing the entire object graph from the objects directly associated with GC Roots). In this phase the application threads and the marking threads run concurrently, so the user does not experience a stall.

Concurrent precleaning : The concurrent precleaning phase is still concurrent. At this stage, the virtual machine looks for objects that newly entered the old generation during the concurrent marking phase (objects promoted from the new generation, or objects allocated directly in the old generation). Rescanning them now reduces the work of the following "remark" stage, which stops the world.

Concurrent abortable precleaning : This stage does the same work as the previous one and also exists to reduce the workload of the STW remark stage. It was added so that the end of precleaning can be controlled, for example by how long to scan (5 seconds by default) or by the Eden usage ratio at which to stop (50% by default).

Remark : This phase pauses the virtual machine while the collector threads scan the remaining objects in the CMS heap. Scanning starts at the root objects and works downwards, marking objects that were promoted from the young generation, objects that were newly allocated in the old generation, and objects that were modified during the concurrent phases. It is needed because the application threads ran alongside the garbage collection threads during concurrent marking, so the marking records of objects whose references changed in that period have to be corrected.

Concurrent sweeping : Clean up garbage objects. At this stage, the collector thread and the application threads execute concurrently.

Concurrent reset : At this stage, reset the data structure of the CMS collector and wait for the next garbage collection.
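
Which of these phases pause the application is easy to lose track of, so the list can be written down as a small enum. This is only a summary of the six phases listed above in code form (abortable precleaning is folded into precleaning, as in the numbered list); the type and method names are invented for the example and have nothing to do with HotSpot's internal names.

```java
import java.util.ArrayList;
import java.util.List;

final class CmsPhasesSketch {

    /** The CMS phases in execution order, with a flag for stop-the-world phases. */
    enum Phase {
        INITIAL_MARK(true),
        CONCURRENT_MARK(false),
        CONCURRENT_PRECLEAN(false),
        REMARK(true),
        CONCURRENT_SWEEP(false),
        CONCURRENT_RESET(false);

        final boolean stopTheWorld;

        Phase(boolean stopTheWorld) {
            this.stopTheWorld = stopTheWorld;
        }
    }

    /** Returns the phases during which application threads are suspended. */
    static List<Phase> pausingPhases() {
        List<Phase> result = new ArrayList<>();
        for (Phase p : Phase.values()) {
            if (p.stopTheWorld) result.add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        for (Phase p : Phase.values()) {
            System.out.println(p + (p.stopTheWorld ? "  (STW)" : "  (concurrent)"));
        }
        System.out.println("pauses: " + pausingPhases()); // pauses: [INITIAL_MARK, REMARK]
    }
}
```

Only two of the six phases stop the application, which is the source of the short pause times CMS is known for.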

4. Disadvantages of CMS

  1. The basic algorithm used by the CMS collector is mark-sweep, so CMS does not compact the heap space. The consequence is that the heap collected by CMS becomes fragmented. Skipping compaction saves pause time, but it wastes heap space. To cope with this, the CMS collector no longer uses a simple pointer to the next available heap space for object allocation. Instead, the unallocated spaces are recorded in a list, and when the JVM allocates space for an object it searches this list for a space large enough to hold the object.
  2. It requires more CPU resources. To keep the application from stopping, the CMS threads and the application threads execute concurrently, which requires more CPUs; purely relying on thread switching is unreliable. Moreover, in the remark phase, in order to ensure that the STW finishes quickly, more or even all CPU resources are used. Of course, multi-core and multi-CPU machines are also the future trend!
  3. Another disadvantage of CMS is that it requires a larger heap space. Because the application threads are still running during the CMS marking phase, heap space continues to be allocated. To make sure that the running application still has space available before CMS finishes reclaiming the heap, part of the space must be reserved. In other words, CMS does not wait until the old generation is full. It starts collecting earlier, to avoid running out of heap space before the collection is complete. By default CMS starts when old generation usage reaches a threshold, which was 68% in JDK 5 and about 92% from JDK 6 on. Use -XX:CMSInitiatingOccupancyFraction=n to set this threshold.

In general, the CMS collector reduces the pause time of collection, but reduces the utilization of heap space.
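
The free-list allocation described in the first disadvantage, and the way fragmentation wastes space, can be shown with a toy allocator. This sketch uses a plain first-fit search, which is an assumption made for simplicity (the real CMS free-list structures are more elaborate), and all names and sizes are illustrative.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class FreeListSketch {

    /** One free chunk of the heap: start address and size, in arbitrary units. */
    private static final class Block {
        int start;
        int size;
        Block(int start, int size) { this.start = start; this.size = size; }
    }

    private final List<Block> free = new ArrayList<>();

    void addFree(int start, int size) {
        free.add(new Block(start, size));
    }

    /** First-fit search: returns the address of the allocation, or -1 if no block is large enough. */
    int allocate(int size) {
        for (Iterator<Block> it = free.iterator(); it.hasNext();) {
            Block b = it.next();
            if (b.size >= size) {
                int address = b.start;
                b.start += size;             // carve the request off the front of the block
                b.size -= size;
                if (b.size == 0) it.remove();
                return address;
            }
        }
        return -1;
    }

    int totalFree() {
        int sum = 0;
        for (Block b : free) sum += b.size;
        return sum;
    }

    public static void main(String[] args) {
        FreeListSketch heap = new FreeListSketch();
        heap.addFree(0, 4);                  // three holes left behind by a sweep
        heap.addFree(10, 4);
        heap.addFree(20, 8);
        System.out.println(heap.allocate(6));   // 20: the first block that fits
        System.out.println(heap.totalFree());   // 10 units are still free in total...
        System.out.println(heap.allocate(6));   // -1: ...but no single hole holds 6
    }
}
```

The last call fails although 10 units are free, which is exactly the wasted space that fragmentation causes in a non-compacting collector.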

5. When to use CMS

If your application is sensitive to pauses, and you can give it more memory and more CPU at run time (that is, the hardware is awesome), then collecting with CMS will benefit you. CMS is also a better fit when the JVM holds relatively many long-lived objects (the old generation is relatively large).

Good article reference: Detailed explanation of CMS garbage collection mechanism - Jianshu

Origin blog.csdn.net/qq_50954361/article/details/131375126