JVM Execution Process, Class Loading, and Garbage Collection

1. Introduction to JVM

1、JVM

JVM is short for Java Virtual Machine.

A virtual machine is a complete computer system simulated in software: it provides full hardware functionality and runs in a completely isolated environment.

Common virtual machines: JVM, VMware, VirtualBox.

The difference between the JVM and the other two virtual machines:

  1. VMware and VirtualBox use software to simulate the instruction set of a physical CPU, including the many registers of the physical system;
  2. The JVM uses software to simulate the Java bytecode instruction set. Of the hardware registers, it essentially keeps only the PC register; the others are trimmed away.

In other words, the JVM is a customized computer that does not physically exist.

Each running JVM is one Java process: if there are two Java processes, there are two JVMs!

Why put a JVM between the program and the operating system? Java is a language with a particularly high level of abstraction that provides features such as automatic memory management. The operating system cannot provide these features directly, so the JVM sits in between to implement them.

As the figure shows, the JVM's abstraction layer lets Java run across platforms: as long as the JVM executes .class files correctly, the same program runs on Linux, Windows, macOS, and other platforms.

How Java achieves cross-platform support: there is a Windows version of the JVM for Windows, a Linux version for Linux, and a macOS version for macOS.

Every supported system has its own JVM build, so cross-platform support actually rests on many different JVM implementations. Each one internally wraps its own operating system's APIs, while all of them execute bytecode that follows the same rules.


C, C++, Go, and Rust all compile code into native code, that is, machine instructions the CPU can execute directly, so they do not need a virtual machine. The machine instructions generated for different systems/CPUs differ, so the compiled executables differ as well.

Java, Python, and PHP are instead translated into a specified bytecode for cross-platform support, and the corresponding virtual machine then converts that bytecode into machine instructions. Because the bytecode is the same everywhere, Tomcat can be copied from Windows to Linux and run directly, whereas a C++ program compiled on Windows will not run when copied to Linux (short of special tools such as Wine).


2. History of JVM development

2.1、Sun Classic VM

As early as 1996, when Java 1.0 was released, Sun released a Java virtual machine called Sun Classic VM. It was the world's first commercial Java virtual machine, and it was completely retired when JDK 1.4 was released.

This virtual machine provides only an interpreter.

A JIT compiler can be used only as a plug-in, and once it is in use it takes over the virtual machine's execution system, so the interpreter no longer runs: the interpreter and the compiler cannot work together.

Today's HotSpot, by contrast, has both an interpreter and a JIT compiler built in, and they work together.

2.2、Exact VM

Sun provided this virtual machine with JDK 1.2 to solve the problems of the Classic VM.

Exact VM already had the makings of a modern high-performance virtual machine, including the following features:

  1. Hot spot detection (compiling frequently executed "hot" code into native machine code to speed up execution);

  2. A mixed mode in which the compiler and the interpreter work together.

It was used only briefly on Solaris, while other platforms still used the Classic VM. Its career was short, and it was eventually replaced by the HotSpot VM.

2.3、HotSpot VM

History of HotSpot

  1. Originally designed by a small company called "Longview Technologies";

  2. In 1997, the company was acquired by Sun; in 2009, Sun was acquired by Oracle;

  3. In JDK 1.3, HotSpot VM became the default virtual machine.

HotSpot currently holds an absolutely dominant market position.

Whether in JDK 6, still widely used, or in the even more widely used JDK 8, the default virtual machine is HotSpot.
It is the most mainstream JVM: both Oracle's official JDK and the open-source OpenJDK use it, and it is applied everywhere from servers and desktops to mobile and embedded devices.

The "HotSpot" in the name refers to its hot spot code detection technology. Using counters, it finds the code most worth compiling and triggers just-in-time (JIT) compilation or on-stack replacement (OSR). Because the compiler and the interpreter cooperate, it can balance program response time against peak execution performance.
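A minimal sketch for watching hot spot detection, assuming a HotSpot JVM: the hypothetical class below calls one small method a million times. Running it as "java -XX:+PrintCompilation HotLoop" should print log lines mentioning HotLoop::square once the invocation and loop counters mark it as hot; the exact log format and thresholds vary between JDK versions.

```java
// Run with: java -XX:+PrintCompilation HotLoop
// Assumes a HotSpot JVM; the log format and JIT thresholds differ between JDK versions.
public class HotLoop {

    // A tiny method that becomes "hot" after many invocations.
    static long square(long x) {
        return x * x;
    }

    // Calls square() over and over so the invocation and loop (back-edge)
    // counters cross the thresholds that trigger JIT compilation.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += square(i);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000));
    }
}
```

The program's result does not depend on whether square() is interpreted or compiled; only its speed and the compilation log change.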

2.4、JRockit

JRockit focuses on server-side applications. Its best features have since been ported onto HotSpot.

Because it cares less about program startup speed, JRockit contains no interpreter at all: all code is compiled by a just-in-time compiler and then executed.

At the time, numerous industry benchmarks showed JRockit to be the fastest JVM in the world.

With JRockit, customers saw significant performance improvements (some exceeding 70%) and hardware cost reductions (up to 50%).

Strength: a comprehensive portfolio of Java runtime solutions, for example:

JRockit Real Time, JRockit's solution for latency-sensitive applications, provides JVM response times in milliseconds or microseconds and suits finance, military command, and telecommunications networks;

The Mission Control suite, a set of tools for monitoring, managing, and analyzing applications in production with extremely low overhead. In 2008, BEA, the company behind JRockit, was acquired by Oracle.

Oracle then set out to integrate the two excellent virtual machines, work that was roughly completed in JDK 8. The integration consisted of porting JRockit's best features onto HotSpot.

2.5、J9 JVM

Full name: IBM Technology for Java Virtual Machine, referred to as IT4J, internal code name: J9.

Its market positioning is close to HotSpot's: a multi-purpose JVM for servers, desktop applications, embedded systems, and more, widely used in IBM's various Java products.

It is one of the three most influential commercial virtual machines and has also been billed as the world's fastest Java virtual machine (on IBM's own products);

Around 2017, IBM open-sourced J9 as OpenJ9 and handed it over to the Eclipse Foundation, so it is also known as Eclipse OpenJ9.

2.6、Taobao JVM (developed in China)

Released by the AliJVM team. Alibaba, the company with the most extensive Java usage in China, works in many fields such as cloud computing, finance, logistics, and e-commerce, and has to solve complex problems of high concurrency, high availability, and distributed systems. It also maintains a large number of open-source projects.

Based on OpenJDK, Alibaba developed its own customized version, AlibabaJDK (AJDK), which is the cornerstone of Alibaba's entire Java stack;

Taobao JVM, based on the OpenJDK HotSpot JVM, was China's first optimized, deeply customized, open-source, high-performance server-side Java virtual machine. Its characteristics (for awareness only):

  1. The innovative GCIH (GC invisible heap) technology implements off-heap storage: long-lived Java objects are moved from the heap into GCIH, where the GC does not manage them. This reduces how often the GC has to scan and collect them and improves GC efficiency.

  2. Objects in GCIH can also be shared among multiple Java virtual machine processes.

  3. The crc32 instruction is used to implement JVM intrinsics, reducing JNI call overhead;

  4. PMU hardware-assisted Java profiling and diagnostics;

  5. ZenGC for big data scenarios.

Taobao JVM performs very well on Alibaba's products, but it depends heavily on Intel CPUs, trading compatibility for performance. It has been deployed on Taobao and Tmall, replacing all official Oracle JVM versions there.


3. The JVM and the "Java Virtual Machine Specification"

The JVMs above, such as HotSpot and J9, are concrete JVM implementations from different vendors, and every implementation must comply with the "Java Virtual Machine Specification". Published by Oracle, this specification is the most important and authoritative work in the Java field and describes every component of the JVM completely and in detail.

PS: Unless stated otherwise, the rest of this article uses HotSpot, the default virtual machine of Oracle Java.
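To check which virtual machine you are actually running, you can read a few standard system properties. The class name WhichVm is made up for this sketch. On Oracle JDK and OpenJDK builds, java.vm.name is typically something like "Java HotSpot(TM) 64-Bit Server VM" or "OpenJDK 64-Bit Server VM", while other implementations such as OpenJ9 report their own name.

```java
// Reports which JVM implementation is running, using standard system properties.
public class WhichVm {

    // Builds a short report; the values depend on the JDK that runs this code.
    static String describeVm() {
        return "java.vm.name    = " + System.getProperty("java.vm.name") + "\n"
             + "java.vm.vendor  = " + System.getProperty("java.vm.vendor") + "\n"
             + "java.vm.version = " + System.getProperty("java.vm.version") + "\n"
             + "java.version    = " + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println(describeVm());
    }
}
```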


2. JVM execution process

The JVM is the foundation on which Java runs and the key to "compile once, run anywhere". So how does the JVM actually execute a program?

1. JVM execution process

Before a program runs, the Java source code must be compiled into bytecode (.class files). The JVM first uses a class loader (ClassLoader) to load the bytecode into the runtime data area (Runtime Data Area). Bytecode is a set of instructions defined by the JVM specification and cannot be handed to the operating system directly, so a specific command interpreter, the execution engine (Execution Engine), translates it into instructions of the underlying system and hands them to the CPU. Along the way, the JVM calls libraries written in other languages through the native library interface (Native Interface) to implement the whole program's functionality. These are the responsibilities of the four main components.


In summary, the JVM executes Java programs with the following four main components:

  1. Class Loader (ClassLoader)

  2. Runtime Data Area

  3. Execution Engine

  4. Native Interface


2. JVM runtime data area

2.1、Memory area division

When renovating an unfinished house, you divide it into several rooms with different functions: living room, bedroom, kitchen, and so on.

Memory is divided in the same way. The JVM is itself an application program: when it starts, it requests a large block of memory from the operating system (like renting an office building),
and it must then divide that space into several parts according to its needs, each with a different function.

The JVM runtime data area is also called the memory layout. Note that it is completely different from the Java Memory Model (JMM): the two are entirely separate concepts. The runtime data area consists of the following five parts:



① Native method stack

The native method stack is similar to the Java virtual machine stack, except that the Java virtual machine stack serves Java methods, while the native method stack serves native methods.

"Native" here means code such as the C++ inside the JVM; this stack is the space prepared for calling native methods.


② Java virtual machine stack (thread private)

The "stack" here is not the same thing as the stack data structure!!

The stack data structure is a general, broader concept; the stack here is a specific memory space in the JVM.

The JVM virtual machine stack stores the calling relationships between methods:

  • The whole stack space can be thought of as containing many elements, each representing one method call.
    Each such element is called a "stack frame".
    A stack frame contains the method's entry address, its parameters, its return address, its local variables, and so on.

The native method stack, correspondingly, stores the calling relationships between native methods.

Role of the Java virtual machine stack: its life cycle is the same as its thread's, and it describes how Java methods execute.

Memory model: whenever a method executes, a stack frame (Stack Frame) is created to store its local variable table, operand stack, dynamic link, method exit, and other information. When people talk about "heap memory" and "stack memory", the stack memory means the virtual machine stack.

A stack frame consists of the following 4 parts:


  1. Local variable table: stores the primitive types (the 8 primitive data types) and object references known at compile time. Its size is determined at compile time: when a method is entered, the amount of local variable space it needs in the frame is fully known, and the size of the table does not change during execution. Simply put, it stores method parameters and local variables.

  2. Operand stack: each method has its own last-in, first-out operand stack.

  3. Dynamic link: a reference to this frame's method in the runtime constant pool.

  4. Method return address: the PC register value at which the caller continues after this method returns.
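Each method call pushes one stack frame and each return pops it. The sketch below makes this visible with StackWalker (Java 9+); the class and method names are hypothetical. The frames come back innermost first, mirroring the order in which they sit on the thread's Java virtual machine stack.

```java
import java.util.List;
import java.util.stream.Collectors;

// Each call pushes a frame onto the current thread's Java virtual machine stack;
// StackWalker (Java 9+) lets us list those frames.
public class FrameDemo {

    // Returns the method names of the frames on this thread's stack,
    // innermost (most recently called) first, starting with currentFrames itself.
    static List<String> currentFrames() {
        return StackWalker.getInstance()
                .walk(frames -> frames.map(StackWalker.StackFrame::getMethodName)
                                      .collect(Collectors.toList()));
    }

    static List<String> outer()  { return middle(); }
    static List<String> middle() { return inner(); }
    static List<String> inner()  { return currentFrames(); }

    public static void main(String[] args) {
        // When launched as "java FrameDemo", prints:
        // currentFrames <- inner <- middle <- outer <- main
        System.out.println(String.join(" <- ", outer()));
    }
}
```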


What does "thread-private" mean?

The JVM implements multithreading by switching between threads and allocating processor time to each in turn, so at any given moment a processor (for a multi-core processor, one core) executes instructions from only one thread. To resume at the correct position after a thread switch, each thread needs its own independent program counter; the counters of different threads do not affect each other and are stored separately. Memory areas of this kind are called "thread-private".


There is not just one such stack but many: each thread has its own!
jconsole can inspect the internals of a Java process and list all of its threads.
Clicking a thread shows its call stack (the information in that thread's stack).
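If jconsole is not at hand, a similar view can be obtained in code: Thread.getAllStackTraces() returns a snapshot of every live thread's call stack. This is a minimal sketch; the class name and the demo-worker thread are made up, and the threads listed will vary by JVM, typically including system threads such as Reference Handler.

```java
import java.util.Map;

// A programmatic cousin of jconsole's Threads tab: every live thread has its own
// Java virtual machine stack, and getAllStackTraces() snapshots each of them.
public class ThreadStacks {

    static Map<Thread, StackTraceElement[]> snapshot() {
        return Thread.getAllStackTraces();
    }

    public static void main(String[] args) throws InterruptedException {
        // Start one extra worker so there is more than one application thread.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(1_000);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        }, "demo-worker");
        worker.start();

        for (Map.Entry<Thread, StackTraceElement[]> e : snapshot().entrySet()) {
            System.out.println(e.getKey().getName() + " (" + e.getValue().length + " frames)");
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("    at " + frame);
            }
        }
        worker.join();
    }
}
```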


The stack is usually described as thread-private (a very common statement), but strictly speaking it is not private
in the sense of "you can't use mine":
the content on one thread's stack can in fact be used by another thread.


③ Program counter (thread private)

Role of the program counter: records the position (the bytecode line number) that the current thread is executing.

The program counter is a small memory space. Like the Java virtual machine stack, each thread has its own copy. It can be regarded as the line number indicator of the bytecode the current thread is executing.

If the current thread is executing a Java method, the counter records the address of the bytecode instruction being executed;
if it is executing a native method, the counter value is empty (undefined).

The program counter is the only memory region for which the JVM specification defines no OutOfMemoryError conditions!


④ Heap (thread sharing)

Role of the heap: all objects created in the program are stored on the heap, and so are the objects' member variables.

The heap is the largest area in the JVM, and a process has only one heap!!
Stacks, by contrast, exist one per thread, so a process with N threads has N stacks!!
Heap: all threads share the same heap. Stack: each thread uses its own stack.

The common JVM parameters -Xms10m (initial heap size at startup) and -Xmx10m (maximum heap size) both apply to the heap.
"ms" stands for memory start and "mx" for memory max.
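The effect of -Xms and -Xmx can be observed through the Runtime class. A minimal sketch with a hypothetical class name; note that maxMemory() only approximates -Xmx, since some collectors may leave part of the young generation out of the reported value.

```java
// Reports the heap limits the JVM is using. Try:
//   java -Xms10m -Xmx10m HeapSizes
// and compare the output with a plain "java HeapSizes".
public class HeapSizes {

    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): upper bound the heap may grow to (roughly -Xmx)
        System.out.println("max   heap: " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): heap currently reserved (starts near -Xms)
        System.out.println("total heap: " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory(): unused part of totalMemory()
        System.out.println("free  heap: " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```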

The heap is divided into two areas: the young generation and the old generation. Newly created objects go into the young generation, and objects that survive a certain number of GCs are moved to the old generation. The young generation has three areas: one Eden and two Survivor spaces (S0/S1).


During garbage collection, the surviving objects in Eden and in the Survivor space currently in use are copied into the unused Survivor space,
and then Eden and the previously used Survivor space are cleared.
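The generations described above appear as memory pools in the JVM's management API. The sketch below (hypothetical class name) just prints each pool's type and name. On HotSpot the names depend on the garbage collector, e.g. "G1 Eden Space", "G1 Survivor Space", "G1 Old Gen" under G1, or "PS Eden Space", "PS Old Gen" under the Parallel collector. The metaspace covered in the method area part also shows up here, as a non-heap pool named "Metaspace".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.List;

// Prints every memory pool the running JVM exposes, with its type.
// Under a generational collector, the Eden, Survivor and Old Gen pools are HEAP;
// Metaspace (and, for example, the code cache) are NON_HEAP.
public class MemoryPools {

    static List<MemoryPoolMXBean> pools() {
        return ManagementFactory.getMemoryPoolMXBeans();
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : pools()) {
            // getType().name() is "HEAP" or "NON_HEAP"
            System.out.printf("%-9s %s%n", pool.getType().name(), pool.getName());
        }
    }
}
```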


⑤ Method area/metadata area (thread sharing)


Meta(data) area: stores class information, the constant pool, and static members.

One per process, shared by all of its threads.

Role of the method area: it stores class information loaded by the virtual machine, constants, static variables, code compiled by the just-in-time compiler, and other data.

Naming: the "Java Virtual Machine Specification" calls this area the "method area". In the HotSpot implementation it was called the permanent generation (PermGen) in JDK 7 and has been called the metaspace (Metaspace) since Java 8.

PS: The permanent generation (PermGen) and the metaspace (Metaspace) are both HotSpot's implementations of the method area defined in the "Java Virtual Machine Specification". As an analogy, a car standard might define a part called the "power unit", while different vehicles implement it with different technology: a fuel car uses a gasoline engine and an electric car uses an electric motor. The engine and the motor correspond to PermGen and metaspace: both are implementations of the same defined part, that is, of the method area.

Changes with the JDK 8 metaspace:

  1. In HotSpot, JDK 8's metaspace lives in native (local) memory, so its size is no longer bounded by a fixed JVM setting (as PermGen was by -XX:MaxPermSize) but by the available native memory, unless capped with -XX:MaxMetaspaceSize.
  2. The string constant pool is on the heap rather than in the metaspace (HotSpot had already moved it out of PermGen in JDK 7).

Runtime constant pool:

  • The runtime constant pool is part of the method area and stores literals and symbolic references.

    • Literals: strings (HotSpot has kept the string pool itself on the heap since JDK 7), final constants, values of primitive data types.

    • Symbolic references: fully qualified names of classes and interfaces, names and descriptors of fields, names and descriptors of methods.
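String literals are a handy way to see the constant pool at work: equal literals resolve to the same pooled String object. The class below is a hypothetical illustration; the results follow from the Java Language Specification, which requires string literals and compile-time constant strings to be interned.

```java
// Equal string literals resolve to the same pooled String object.
public class StringPoolDemo {

    // Returns {a == b, a == c, a == d} for the four strings below.
    static boolean[] compare() {
        String a = "hello";
        String b = "hel" + "lo";          // constant expression: folded to "hello" by javac
        String c = new String("hello");   // explicitly allocates a new String object on the heap
        String d = c.intern();            // returns the pooled instance for "hello"
        return new boolean[] { a == b, a == c, a == d };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("literal == folded constant: " + r[0]); // true
        System.out.println("literal == new String:      " + r[1]); // false
        System.out.println("literal == intern():        " + r[2]); // true
    }
}
```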


3. Summary


The most common exam question gives a piece of code and asks which area a certain variable is in. The principles:

  1. Local variables are on the stack
  2. Ordinary member variables are on the heap
  3. Static member variables are in the method area/metadata area
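Applied to a concrete (made-up) class, the three rules look like this. One HotSpot-specific caveat, noted in the comments: the specification places static fields in the method area, but HotSpot since JDK 7 physically stores them in the class's java.lang.Class object on the heap.

```java
// Where does each variable live? Comments follow the three rules above.
public class WhereDoesItLive {

    // Static field: belongs to the class, conceptually in the method area.
    // (HotSpot 7+ physically keeps static fields in the java.lang.Class mirror on the heap.)
    static int counter = 0;

    // Ordinary member (instance) field: stored inside each object, on the heap.
    int id;

    int compute(int x) {                 // parameter x: in compute()'s stack frame
        int doubled = x * 2;             // local variable: in the stack frame
        // The reference 'other' is a local variable on the stack;
        // the object it points to is allocated on the heap.
        WhereDoesItLive other = new WhereDoesItLive();
        other.id = doubled;
        counter++;
        return other.id + counter;
    }

    public static void main(String[] args) {
        System.out.println(new WhereDoesItLive().compute(21)); // 43 on the first call
    }
}
```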

4. Exceptions related to the memory layout

① Java heap overflow

The Java heap stores object instances. If objects keep being created and remain reachable from GC Roots, so the GC cannot clear them, an out-of-memory exception occurs once the objects fill the maximum heap capacity.
The JVM parameter -Xms sets the minimum (initial) heap size, and -Xmx sets the maximum heap size.

Java heap OOM is the most common kind of memory overflow in practice. When the heap overflows, the error "java.lang.OutOfMemoryError" is followed by "Java heap space", which tells us plainly that the OOM happened on the heap.

At this point you need to analyze the heap dump file, for example with MAT (Eclipse Memory Analyzer), to determine whether the problem is a memory leak (Memory Leak) or a memory overflow (Memory Overflow):

Memory leak: the leaked objects cannot be reclaimed by the GC even though they are no longer needed.
Memory overflow: the objects in memory really should stay alive. In that case, compare the JVM heap parameters with the physical memory to see whether the heap should be enlarged, or check whether some objects live longer than necessary.


② Virtual machine stack and native method stack overflow

HotSpot does not distinguish between the virtual machine stack and the native method stack, so in HotSpot the stack capacity is set only by the -Xss parameter.

The virtual machine stack can produce two kinds of exceptions:

  • If a thread requests a stack depth greater than the maximum the virtual machine allows, a StackOverflowError is thrown

  • If the virtual machine cannot obtain enough memory when expanding the stack, an OutOfMemoryError is thrown

Example: observing StackOverflowError (in a single-threaded environment):

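/**
 * JVM parameter: -Xss128k
 * (Some JDKs reject a stack this small at startup; if so, use a slightly larger -Xss value.)
 * @author 38134
 */
public class Test {

    private int stackLength = 1;

    // Recurses with no base case: every call pushes a new stack frame
    // until the thread's stack is exhausted.
    public void stackLeak() {
        stackLength++;
        stackLeak();
    }

    public static void main(String[] args) {
        Test test = new Test();
        try {
            test.stackLeak();
        } catch (Throwable e) {
            // Report how deep the recursion got before the StackOverflowError.
            System.out.println("Stack Length: " + test.stackLength);
            throw e;
        }
    }
}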

When a StackOverflowError occurs, there is a readable error stack, so the problem is relatively easy to locate. With the virtual machine's default parameters, a stack depth of 1000-2000 is fine in most cases, which is completely sufficient for normal method calls (including recursion).
If the memory overflow is caused by creating many threads and the number of threads cannot be reduced, the only option is to reduce the maximum heap and the stack capacity in exchange for more threads: thread stacks are allocated outside the heap, so a smaller heap and smaller per-thread stacks leave room for more threads.
Example: observing memory overflow caused by creating threads

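/**
 * JVM parameter: -Xss2M
 * @author 38134
 */
public class Test {

    // Busy-loops forever so that each started thread stays alive.
    private void dontStop() {
        while (true) {
        }
    }

    // Keeps starting threads; each one reserves its own stack (-Xss2M)
    // until the JVM cannot create another native thread.
    public void stackLeakByThread() {
        while (true) {
            Thread thread = new Thread(new Runnable() {
                @Override
                public void run() {
                    dontStop();
                }
            });
            thread.start();
        }
    }

    public static void main(String[] args) {
        Test test = new Test();
        test.stackLeakByThread();
    }
}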

Run the code above with caution and save all your work first: it keeps creating threads until the system runs out of resources, which can make the whole machine unresponsive.


3. JVM class loading

1. Class loading process

As the figure above shows, the part of the JVM execution process most closely related to programmers is class loading, so let's look at how class loading works.
A class's life cycle is as follows:

Loading → Verification → Preparation → Resolution → Initialization → Using → Unloading

Of these, the first five steps make up class loading and generally begin in this fixed order (resolution may sometimes be deferred until after initialization); the middle three steps together are called linking. So class loading is divided into the following steps:

  1. Loading

  2. Linking

    1. Verification
    2. Preparation
    3. Resolution
  3. Initialization


① Loading

Loading: find the .class file (the search process), open it, read it, and load its contents into memory.

The "Loading" stage is only one stage of the whole "Class Loading" process. Loading and class loading are different things, so don't confuse the two.

During the loading stage, the Java virtual machine needs to do the following three things:

  • Obtain the binary byte stream that defines the class, using its fully qualified name.
  • Convert the static storage structure represented by this byte stream into the runtime data structure of the method area.
  • Generate a java.lang.Class object representing the class in memory, serving as the access entry point to the class's data in the method area.
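From Java code, the loading stage is usually triggered with Class.forName, which returns the java.lang.Class object described in the last bullet. A minimal sketch; the class name LoadByName is hypothetical, and note that Class.forName(String) also initializes the class, not just loads it.

```java
// Class.forName() asks the JVM to load (and, by default, initialize) a class by
// its fully qualified name; the returned java.lang.Class object is the entry
// point to that class's metadata.
public class LoadByName {

    static Class<?> load(String fullyQualifiedName) throws ClassNotFoundException {
        return Class.forName(fullyQualifiedName);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> c = load("java.util.ArrayList");
        System.out.println(c.getName());                    // java.util.ArrayList
        System.out.println(c.getSuperclass().getName());    // java.util.AbstractList
        System.out.println(c == java.util.ArrayList.class); // true: same Class object
    }
}
```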

② Verification

Verification is the first step of the linking phase: it checks that the .class file format is correct. Its purpose is to ensure that the information in the class file's byte stream satisfies all the constraints of the official "Java Virtual Machine Specification", so that running it as code will not endanger the security of the virtual machine itself.

(The end result of loading is the class object.)

Reference: The Java Language and Virtual Machine Specifications


Everything about the class written in the Java code is contained in the class file, reorganized in a binary format.


Verification includes:
file format verification, bytecode verification, symbolic reference verification, and so on.


③ Preparation

The preparation stage formally allocates memory for the variables defined in the class object (static variables, i.e. those modified with static), first reserving their space in the metadata area, and sets the initial values of these class variables: the memory is initialized to all zeros, so static members start out with zero values.

For example, given this line of code:

public static int value = 123;

Preparation sets value to 0, not 123; the assignment value = 123 only happens later, during initialization.
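The zero written during preparation can actually be observed, because static initializers run in textual order. In the hypothetical class below, peek() reads value before its "= 123" initializer has run:

```java
// Observing the preparation-phase default: static initializers run in textual
// order during initialization, so peek() reads 'value' before "= 123" executes
// and sees the zero that preparation stored there.
public class PrepareDemo {

    static class Holder {
        static int observed = peek();   // runs first in Holder's static initialization
        static int value = 123;         // assigned only when this initializer runs

        static int peek() {
            return value;               // still the preparation-phase default: 0
        }
    }

    public static void main(String[] args) {
        System.out.println(Holder.observed); // 0
        System.out.println(Holder.value);    // 123
    }
}
```

If value were declared static final int value = 123, it would be a compile-time constant: its value comes from the class file's ConstantValue attribute and is set before any static initializer code runs (and javac inlines it anyway), so observed would be 123 instead.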


④ Resolution

The resolution phase is the process in which the Java virtual machine initializes string constants and replaces symbolic references in the constant pool with direct references.

How to understand replacing symbolic references with direct references:

  • A string constant needs a block of memory to store its characters, plus a reference that stores the starting address of that memory
  • Before the class is loaded, the string constant only exists in the .class file. At this point the "reference" does not record the string's real address but its offset in the file (or a placeholder): this is a symbolic reference
  • After the class is loaded, the string constant is actually placed in memory, and only then can the reference be filled in with its real memory address: this is a direct reference

Analogy: on my way to the cinema, I know that A sits in front of me and B behind me. I only know my relative position, not my actual seat => symbolic reference

Once we arrive at the cinema and everyone is seated, I know my actual seat => direct reference


⑤ Initialization

Typical work here: initializing static members and executing static code blocks, with the parent class initialized first (constructors and instance code blocks run later, each time an object is created).

This is where the content of the class object is really initialized: the Java virtual machine actually starts executing the Java code written in the class and hands control over to the application. The initialization phase is the process of executing the class constructor method <clinit>().
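The order in which these pieces run can be recorded in a list. In this hypothetical Parent/Child pair, static blocks belong to class initialization (<clinit>) and run once per class, parent first; instance blocks and constructors run for every object created.

```java
import java.util.ArrayList;
import java.util.List;

// Records the order in which static blocks, instance blocks and constructors run.
public class InitOrder {

    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent static block"); }   // class initialization, once
        { LOG.add("Parent instance block"); }        // every new object
        Parent() { LOG.add("Parent constructor"); }
    }

    static class Child extends Parent {
        static { LOG.add("Child static block"); }
        { LOG.add("Child instance block"); }
        Child() { LOG.add("Child constructor"); }
    }

    public static void main(String[] args) {
        new Child();
        new Child(); // second object: the static blocks do not run again
        LOG.forEach(System.out::println);
    }
}
```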


2. Timing of class loading

When is a class loaded?
Not all classes are loaded as soon as a Java program starts; a class is loaded only when it is actually used (lazy loading), for example when:

  1. Creating an instance of the class

  2. Calling a static method or using a static field of the class, because the class object must exist first

  3. Loading a subclass, which requires loading its parent class first

A class is loaded only when it is used, and once loaded it does not need to be loaded again for later use.
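A sketch of lazy loading in practice (hypothetical class names). Strictly speaking, the JVM specification gives implementations some freedom about when a class is "loaded", but it pins down exactly when it is "initialized", so the observable signal here is the static block: it runs at the first active use, not when an array of the type is created, and only once.

```java
import java.util.ArrayList;
import java.util.List;

// Static initialization is triggered by first active use, happens only once,
// and is not triggered by merely creating an array of the type.
public class LazyInit {

    static final List<String> LOG = new ArrayList<>();

    static class Lazy {
        static { LOG.add("Lazy initialized"); }
        static void hello() { LOG.add("hello"); }
    }

    static List<String> run() {
        LOG.add("start");
        Lazy[] array = new Lazy[3];   // creating an array type does not initialize Lazy
        LOG.add("array of length " + array.length + " created");
        Lazy.hello();                 // first active use: Lazy's static block runs now
        Lazy.hello();                 // already initialized: the static block does not run again
        return LOG;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```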


3. Parental delegation model

Class loader: responsible for turning bytecode into classes the JVM can execute. While the JVM runs, classes are created dynamically as needed: the bootstrap class loader first loads the program's core class library, and dependent classes are then loaded layer by layer through the parent delegation model. Class loaders can be divided into three types: the bootstrap class loader, the extension class loader, and the application class loader.


3.1、Class loader types

From the Java virtual machine's perspective there are only two kinds of class loaders: one is the bootstrap class loader (Bootstrap ClassLoader), implemented in C++ as part of the virtual machine itself; the other is all other class loaders, which are implemented in Java, exist independently outside the virtual machine, and all inherit from the abstract class java.lang.ClassLoader.

From a Java developer's perspective, class loaders are divided more finely: since JDK 1.2, Java has kept a three-tier, parent-delegating class loading architecture.

The three class loaders the JVM provides by default:

  • BootstrapClassLoader (bootstrap class loader): loads the classes of the standard library, i.e. the classes the Java specification requires every JVM implementation to provide. It loads Java's core class library from the JDK's lib directory, $JAVA_HOME/lib.
  • ExtensionClassLoader (extension class loader): loads the classes of the JVM's extension library (extra functionality provided by the vendor/organization implementing the JVM, beyond the specification) from the lib/ext directory. Since JDK 9 it has been replaced by the platform class loader, and lib/ext no longer exists.
  • ApplicationClassLoader (application class loader): loads classes from user-provided third-party libraries and the user's own project code.

These three class loaders have a "parent-child relationship"
(not superclass and subclass: each class loader simply has a parent attribute pointing to its parent class loader).

In addition, users can define their own class loaders. The three class loaders above come with the JVM;
a user-defined class loader (User Defined ClassLoader), customized to your own needs, can be added to the same process and used alongside the existing ones.
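To see the real loaders and their parent links, you can walk getParent() from the application class loader. A minimal sketch with a hypothetical class name; the printed class names depend on the JDK version (on JDK 9+ the middle loader is the platform class loader rather than the extension class loader), and the bootstrap loader always appears as null because it is not a Java object.

```java
// Prints the loader of a core class and the parent chain of the application
// (system) class loader. The bootstrap loader lives inside the JVM and is
// represented by null.
public class LoaderChain {

    static String describe(ClassLoader loader) {
        return loader == null ? "bootstrap class loader (null)" : loader.toString();
    }

    public static void main(String[] args) {
        System.out.println("String.class -> " + describe(String.class.getClassLoader()));

        // Application loader -> extension/platform loader -> bootstrap (null)
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        while (loader != null) {
            System.out.println("  " + describe(loader));
            loader = loader.getParent();
        }
        System.out.println("  " + describe(null));
    }
}
```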



3.2、Parent delegation model

The usual name "parental (two-parent) delegation model" is an awkward translation; it is
more accurate to call it the "single-parent delegation model" or "parent delegation model",
since "parent" here means just one parent.


The parent delegation model describes the basic process of finding a .class file.
When a class loader receives a class loading request, it does not first try to load the class itself; it delegates the request to its parent class loader. Every level of class loader does the same, so every loading request eventually reaches the top-level bootstrap class loader. Only when the parent loader reports that it cannot complete the request (it did not find the required class) does the child loader try to load the class itself.

How the class loaders above work together:

  • Loading a class starts with ApplicationClassLoader

  • But ApplicationClassLoader first hands the loading task to its parent

  • So ExtensionClassLoader is asked to load it... but it does not actually load it yet; it delegates to its own parent

  • BootstrapClassLoader is asked to load it and would also delegate to its parent, but its parent is null.
    A loader with no parent, or whose parent has searched without finding the class, then searches by itself.
    So BootstrapClassLoader searches the standard library directory it is responsible for. If the class is found, it is loaded; if not, the task goes back down to its child loader

  • ExtensionClassLoader then searches the extension library directory. If the class is found, it is loaded; if not, the task goes to its child loader

  • ApplicationClassLoader then searches the user project's directories. If the class is found, it is loaded; if not, since it has no child loader here, it can only throw an exception such as ClassNotFoundException
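The steps above can be modeled in a few lines. This is a toy model, not the real java.lang.ClassLoader: each hypothetical ToyLoader holds a set of class names it can "find" in its own directory, and load asks the parent first (recursively) before searching itself. The application loader's set deliberately contains a fake java.lang.String, yet the bootstrap loader still wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// A toy model of parent delegation (hypothetical names, not the JDK's classes).
public class ToyDelegation {

    static final class ToyLoader {
        final String name;
        final ToyLoader parent;      // null: no parent, top of the chain
        final Set<String> findable;  // class names this loader can find in its own directory

        ToyLoader(String name, ToyLoader parent, Set<String> findable) {
            this.name = name;
            this.parent = parent;
            this.findable = findable;
        }

        // Returns the name of the loader that finally loads the class,
        // appending each step to 'trace'.
        String load(String className, List<String> trace) throws ClassNotFoundException {
            if (parent != null) {
                trace.add(name + " delegates to " + parent.name);
                try {
                    return parent.load(className, trace);   // ask the parent first (recursion)
                } catch (ClassNotFoundException notFoundAbove) {
                    // the parent chain could not find it: search our own directory below
                }
            } else {
                trace.add(name + " has no parent");
            }
            trace.add(name + " searches its own directory");
            if (findable.contains(className)) {
                return name;
            }
            throw new ClassNotFoundException(className);
        }
    }

    static final ToyLoader BOOTSTRAP =
            new ToyLoader("Bootstrap", null, Set.of("java.lang.String"));
    static final ToyLoader EXTENSION =
            new ToyLoader("Extension", BOOTSTRAP, Set.of("javax.example.Ext"));
    // The application "directory" also contains a user-written java.lang.String.
    static final ToyLoader APPLICATION =
            new ToyLoader("Application", EXTENSION, Set.of("com.example.App", "java.lang.String"));

    public static void main(String[] args) throws ClassNotFoundException {
        List<String> trace = new ArrayList<>();
        String winner = APPLICATION.load("java.lang.String", trace);
        trace.forEach(step -> System.out.println("  " + step));
        System.out.println("java.lang.String loaded by " + winner);   // Bootstrap
    }
}
```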


3.2. Advantages of the parent delegation model

Why this order?
The order comes from how the JVM's loading code is implemented, which is written in a roughly "recursive" way: in OpenJDK, ClassLoader.loadClass first calls the parent's loadClass and only searches its own location (findClass) if the parent fails.

Starting directly from the top would not be impossible in principle,
but because the JVM code currently uses this recursive approach, the process first goes from the bottom up and then from the top down.

The main effect of this order is that Bootstrap gets the first chance to load a class and Application the last, which avoids bugs caused by users creating strangely named classes (for example, ones that clash with standard library classes).

Advantages:

  1. Avoids loading classes repeatedly: for example, if classes A and B both have parent class C, C is loaded when A is loaded. When B is loaded later, the request for C reaches the same loader, which already has C, so C does not need to be loaded again.

  2. Security: the parent delegation model also ensures that Java's core API cannot be tampered with. Without it, if each class loader simply loaded classes by itself, problems would occur.

    Suppose a user writes a java.lang.String class in their own code.
    Under the loading process above, the JVM loads the standard library's class and never loads the class the user wrote,
    so even if this happens, the JVM's existing code is not confused. At worst, the user's class simply does not take effect.
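The hierarchy can be observed with real APIs: Class.getClassLoader() returns null for classes defined by the bootstrap loader, and ClassLoader.getParent() walks upward. A minimal sketch follows (the class name LoaderChain is illustrative). The printed loader class names depend on the JDK version: JDK 8 shows an extension class loader, while JDK 9 and later replaced it with the platform class loader.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Walks from a class's defining loader up through getParent() until reaching null,
    // which is how the bootstrap loader is represented in Java code.
    static List<String> chain(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = c.getClassLoader(); cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        names.add("bootstrap (null)");
        return names;
    }

    public static void main(String[] args) {
        // A core class: defined by the bootstrap loader, reported as null.
        System.out.println(String.class.getClassLoader()); // null
        System.out.println(chain(String.class));           // [bootstrap (null)]
        // A user class: application loader, then extension/platform loader, then bootstrap.
        System.out.println(chain(LoaderChain.class));
    }
}
```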


4. Breaking the parent delegation model

A class loader you write yourself may or may not follow the model; whether it does depends on your requirements.
For example, Tomcat uses a separate class loader to load each webapp. That loader does not follow the parent delegation model; instead it loads classes from the webapp's specified directory.
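To break the model, a custom loader overrides loadClass (not just findClass) and looks in its own location before asking its parent. The sketch below is a hypothetical, much-simplified child-first loader, not Tomcat's actual webapp loader: ChildFirstLoader, Plugin, and the prefix rule are illustrative assumptions, while loadClass(String, boolean), findLoadedClass, getClassLoadingLock, defineClass, and resolveClass are real protected methods of java.lang.ClassLoader. To stay self-contained it reads the .class bytes through the parent's getResourceAsStream as a stand-in for a webapp directory, and InputStream.readAllBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical child-first loader in the spirit of a webapp loader (NOT Tomcat's real code).
class ChildFirstLoader extends ClassLoader {
    private final String prefix; // classes whose names start with this are loaded child-first

    ChildFirstLoader(ClassLoader parent, String prefix) {
        super(parent);
        this.prefix = prefix;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);        // already defined by this loader?
            if (c == null && name.startsWith(prefix)) {
                c = defineOwn(name);                   // look in "our own directory" first
            }
            if (c == null) {
                return super.loadClass(name, resolve); // everything else: normal parent-first
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    // Stand-in for reading from a webapp directory: fetch the .class bytes as a resource
    // but define the class with THIS loader. Returns null if the bytes are not found.
    private Class<?> defineOwn(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(path)) {
            if (in == null) {
                return null;
            }
            byte[] bytes = in.readAllBytes();
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

class Plugin {
}

class ChildFirstDemo {
    public static void main(String[] args) throws Exception {
        ClassLoader loader = new ChildFirstLoader(ChildFirstDemo.class.getClassLoader(), "Plugin");
        Class<?> mine = loader.loadClass("Plugin");
        System.out.println(mine.getClassLoader());   // which loader defined it
        System.out.println(mine == Plugin.class);    // same name, but is it the same class?
        System.out.println(loader.loadClass("java.lang.String") == String.class); // true: still delegated
    }
}
```

When Plugin.class is readable as a resource (for example, compiled into a directory on the class path), the loaded class is a different Class object from Plugin.class, because the JVM identifies a class by its name together with its defining loader. This is what lets Tomcat run two webapps that bundle different versions of the same library.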

Although the parent delegation model has its advantages, it also causes problems in some cases, for example with the JDBC implementation under Java's SPI (Service Provider Interface) mechanism.

A bit of background: SPI stands for Service Provider Interface, a set of interfaces that Java provides for third parties to implement or extend. It can be used to enable framework extensions and to swap out components. The role of SPI is to find service implementations for these extension APIs.
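In Java code, SPI lookup is done with java.util.ServiceLoader. ServiceLoader.load(Class) searches with the current thread's context class loader and reads META-INF/services/<interface name> files; JDBC 4 drivers register themselves for java.sql.Driver this way. A minimal sketch, assuming only the JDK (the class name SpiLookup is illustrative): with no driver jar on the class path, the list is empty.

```java
import java.sql.Driver;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class SpiLookup {
    // ServiceLoader.load(Class) uses the thread context class loader to find every
    // META-INF/services/java.sql.Driver entry and instantiates the listed classes.
    static List<String> driverClassNames() {
        List<String> names = new ArrayList<>();
        for (Driver d : ServiceLoader.load(Driver.class)) {
            names.add(d.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(driverClassNames());
    }
}
```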

JDBC's Driver interface is defined in the JDK, while its implementations are provided by the various database vendors, such as the MySQL driver package.
Let's first look at the core JDBC usage code:

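import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class JdbcTest {
    public static void main(String[] args) {
        Connection connection = null;
        try {
            connection =
                    DriverManager.getConnection("jdbc:mysql://127.0.0.1:3306/test", "root",
                            "awakeyo");
        } catch (SQLException e) {
            e.printStackTrace();
        }
        if (connection != null) {
            // Loader of the driver's Connection implementation (e.g. the MySQL driver class)
            System.out.println(connection.getClass().getClassLoader());
        }
        // The thread context class loader
        System.out.println(Thread.currentThread().getContextClassLoader());
        // java.sql.Connection is a JDK interface: null (bootstrap) on JDK 8, the platform loader on JDK 9+
        System.out.println(Connection.class.getClassLoader());
    }
}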

Next, opening the source of DriverManager, we find that it lives in the system's rt.jar, as shown in the following figure:

[Figure: DriverManager inside rt.jar]

From the loading process of the parent delegation model, rt.jar is loaded by the top-level Bootstrap ClassLoader, as shown in the following figure:

[Figure: rt.jar is loaded by the Bootstrap ClassLoader]

But in the source of getConnection we find that, to load a concrete implementation class, it uses a child loader (the caller's class loader, falling back to the thread
context class loader Thread.currentThread().getContextClassLoader()) to load the specific database package
(such as the MySQL jar). The source code is as follows:

@CallerSensitive
public static Connection getConnection(String url,
                                       java.util.Properties info) throws SQLException {
    return (getConnection(url, info, Reflection.getCallerClass()));
}

private static Connection getConnection(String url, java.util.Properties info,
                                        Class<?> caller) throws SQLException {
    ClassLoader callerCL = caller != null ? caller.getClassLoader() : null;
    synchronized(DriverManager.class) {
        // synchronize loading of the correct classloader.
        if (callerCL == null) {
            // Get the thread context class loader
            callerCL = Thread.currentThread().getContextClassLoader();
        }
    }
    if(url == null) {
        throw new SQLException("The url cannot be null", "08001");
    }
    println("DriverManager.getConnection(\"" + url + "\")");
    SQLException reason = null;

    for(DriverInfo aDriver : registeredDrivers) {
        // isDriverAllowed loads the driver class (e.g. from the MySQL connector jar) through callerCL
        if(isDriverAllowed(aDriver.driver, callerCL)) {
            try {
                println(" trying " + aDriver.driver.getClass().getName());
                Connection con = aDriver.driver.connect(url, info);
                if (con != null) {
                    // Success!
                    println("getConnection returning " + aDriver.driver.getClass().getName());
                    return (con);
                }
            } catch (SQLException ex) {
                if (reason == null) {
                    reason = ex;
                }
            }
        } else {
            println(" skipping: " + aDriver.getClass().getName());
        }
    }
    if (reason != null) {
        println("getConnection failed: " + reason);
        throw reason;
    }

    println("getConnection: no suitable driver found for "+ url);
    throw new SQLException("No suitable driver found for "+ url, "08001");
}

In this way, the parent delegation model is broken: DriverManager lives in rt.jar and is loaded by the Bootstrap ClassLoader, yet it has to load the database driver through a child loader (the thread context class loader, Thread.currentThread().getContextClassLoader()). The parent delegation model says that loading should always be delegated to the parent loader first, but JDBC obviously cannot work that way, because the Bootstrap ClassLoader cannot see the driver jar. Its interaction flow chart is as follows:

[Figure: JDBC class-loading interaction flow]
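The essence of this workaround can be sketched as follows. The helper loadWithContext is hypothetical, modeled on what DriverManager does: code that is itself loaded high in the hierarchy cannot see classes on the application class path through its own loader, so it fetches the thread's context class loader and passes it explicitly to Class.forName(String, boolean, ClassLoader). Thread.setContextClassLoader shows that the context loader can be chosen per thread.

```java
class ContextLoading {
    // Hypothetical helper: load an implementation class through the current thread's
    // context class loader instead of the loader that defined the framework class itself.
    static Class<?> loadWithContext(String className) throws ClassNotFoundException {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader(); // fall back to the application loader
        }
        return Class.forName(className, true, cl);
    }

    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            try {
                System.out.println(loadWithContext("ContextLoading").getClassLoader());
            } catch (ClassNotFoundException e) {
                throw new RuntimeException(e);
            }
        });
        // The context loader is inherited from the creating thread but can be replaced per thread.
        worker.setContextClassLoader(ContextLoading.class.getClassLoader());
        worker.start();
        worker.join();
    }
}
```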


4. Garbage collection

1、GC (Garbage Collection)

Garbage means memory that is no longer used;
garbage collection automatically releases that unused memory for us.

C has malloc and C++ has new. Both are dynamic memory allocation (they request a block of memory on the heap).
That memory must be released manually with free or delete.

In C/C++,
if you do not release it manually, this memory stays allocated until the process ends. (Heap memory has a long lifetime. Stack memory is different: when a method finishes, its stack frame is destroyed and its space is released automatically. Heap memory is not released automatically by default.)

This can lead to a serious problem: memory leaks.
If memory stays occupied and is never released, the remaining space keeps shrinking, and eventually later allocations fail! Servers
especially fear this, because they run 24/7. A client program such as QQ is closed after use; the process exits and all its memory is released.

GC is the most mainstream way to prevent memory leaks.
Java, Go, Python, PHP, JS: most mainstream languages use GC to solve the problem above.

Why does C++ not have GC? Because GC has both advantages and disadvantages.

  • GC advantages: it is very worry-free, makes code easier to write, and makes errors less likely.
    GC disadvantages: it consumes extra system resources and adds performance overhead.
  • In addition, GC has a more critical problem: STW (stop the world).
    If a lot of garbage has already accumulated in memory when a GC is triggered, the overhead may be very large, large enough to eat up system resources. On the
    other hand, GC may involve locking while it collects, which keeps business code from running normally. In extreme cases, these pauses can last tens or even hundreds of milliseconds.

Newer Java versions include the ZGC garbage collector (experimental in JDK 11, production-ready since JDK 15). Its very careful design aims to keep STW pauses below 1 ms.


The JVM has several memory areas: heap, stack, program counter, metadata area...

Stack space consists of stack frames, which are allocated when a method is called and destroyed automatically when it returns. Each thread has its own program counter, and that memory is reclaimed naturally when the thread ends. The metadata area holds class objects; generally only class loading matters there, and class unloading is rarely a concern.

Therefore, GC mainly frees memory on the heap.

GC reclaims memory in units of objects (rather than bytes).


GC reclaims an object only when the entire object is no longer used.
If part of an object is still in use and part is not (for example, an object has 20 attributes, 10 of which will be used later and 10 of which will not), the object is not reclaimed for now.
Reclaiming always means reclaiming the whole object, never "reclaiming half an object".

The actual working process of GC:

  1. Find/identify garbage. Almost all object instances are stored in the Java heap. Before the garbage collector collects the heap, it must first determine which objects are still alive and which are "dead".

    • Finding/identifying garbage
      The key idea is to take an object and check whether any "reference" points to it. In Java there is only
      one way to use an object: through a reference! If some reference points to an object, the object may still be used.
      If no reference points to an object, it can never be used again.
  2. Release the objects.


2. Algorithms for determining dead objects

Memory vs Objects

In Java, all objects are stored in memory, so reclaiming memory can also be described as reclaiming dead objects.

How do we know whether any reference points to an object? There are two typical implementations:

  1. Reference counting [not Java's approach; used by Python / PHP]
  2. Reachability analysis [Java's approach]

Watch out for this interview question!
The question: explain how garbage collection determines whether an object is garbage.
You can describe both approaches.


2.1. Reference counting algorithm

Reference counting works as follows:
each object is given a reference counter (an integer). Whenever a reference to the object is created, the counter goes up by 1; whenever such a reference is destroyed, it goes down by 1. An object whose counter is 0 can no longer be used, that is, the object is "dead".

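{
    Test t = new Test(); // reference count of the Test object: 1
    Test t2 = t;         // t2 also points to it: count 2
    Test t3 = t;         // count 3
} // The block ends and all three references go out of scope, so the count drops to 0
  // and the new Test() object becomes garbage (under a reference-counting scheme)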

Reference counting is simple to implement and efficient to evaluate, and it is a good algorithm in most cases. For example, Python uses reference counting for memory management.

However, mainstream JVMs do not use reference counting to manage memory, for the following reasons:

  1. It wastes memory (low utilization)

    • Every object needs its own counter. With a 4-byte counter, this does not matter when there are few objects, but with many objects it takes up a lot of extra space, especially when each object is small.
      If an object is 1 KB, 4 extra bytes hardly matter.
      If an object is 4 bytes, 4 extra bytes double its size.
  2. It cannot handle circular references between objects (the main reason)

Example:

class Test {
    Test t = null;
}

Test a = new Test(); // object #1, reference count 1
Test b = new Test(); // object #2, reference count 1

a.t = b; // a.t also points to object #2, whose count is now 2
b.t = a; // b.t also points to object #1, whose count is now 2

Next, if the references a and b are destroyed, the counts of objects #1 and #2 each drop by 1, but the result is 1, not 0. Because the counts are not 0, the memory cannot be released, yet in fact neither object can be accessed any more!

Python/PHP use reference counting, so they need additional mechanisms to deal with circular references.
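To see why cycles defeat reference counting, here is a hypothetical reference-counting model written in Java (RcObject, retain, release, and RcDemo are illustrative names; the real JVM does not count references). It mirrors the Test example above: two objects, each with one field that can point at the other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference-counted object, for illustration only.
class RcObject {
    static final List<String> freed = new ArrayList<>(); // names of objects released so far
    final String name;
    int refCount = 0;
    RcObject field; // a single outgoing reference, like Test.t above

    RcObject(String name) { this.name = name; }

    static RcObject retain(RcObject o) {
        if (o != null) o.refCount++; // a new reference now points at o
        return o;
    }

    static void release(RcObject o) {
        if (o == null) return;
        if (--o.refCount == 0) {     // counter hit 0: the object is dead
            freed.add(o.name);
            release(o.field);        // its own outgoing reference disappears too
            o.field = null;
        }
    }

    void setField(RcObject target) {
        retain(target);              // retain first so self-assignment is safe
        release(field);
        field = target;
    }
}

class RcDemo {
    // Returns the names freed after both local references are dropped.
    static List<String> run(boolean makeCycle) {
        RcObject.freed.clear();
        RcObject a = RcObject.retain(new RcObject("obj1")); // count 1
        RcObject b = RcObject.retain(new RcObject("obj2")); // count 1
        if (makeCycle) {
            a.setField(b); // obj2 count 2
            b.setField(a); // obj1 count 2
        }
        RcObject.release(a); // local reference a destroyed
        RcObject.release(b); // local reference b destroyed
        return new ArrayList<>(RcObject.freed);
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [obj1, obj2]
        System.out.println(run(true));  // [] : the cycle keeps both counts at 1
    }
}
```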


2.2. Reachability Analysis Algorithm

In Java, objects are always pointed to and accessed through references.
Often a reference points to an object, and members of that object point to other objects.

class Node {
    public int val;
    public Node left;
    public Node right;
}

public class TestDemo {
    public static Node build() {
        Node a = new Node();
        Node b = new Node();
        Node c = new Node();
        Node d = new Node();
        Node e = new Node();
        Node f = new Node();
        Node g = new Node();
        a.val = 1;
        b.val = 2;
        c.val = 3;
        d.val = 4;
        e.val = 5;
        f.val = 6;
        g.val = 7;
        a.left = b;
        a.right = c;
        b.left = d;
        b.right = e;
        e.left = g;
        c.right = f;
        return a;
    }

    public static void main(String[] args) {
        Node root = build();
        // root now acts as the root node of the tree
        // The code holds only one reference, root, yet through it it manages 7 objects
    }
}

[Figure: the tree of Node objects built by build()]

Although there is only the root reference here, all 7 objects above are reachable!
root => a
root.left => b
root.left.left => d
root.left.right => e
root.left.right.left => g
root.right => c
root.right.right => f

Reachability analysis starts from the root and traverses as far as possible; every object that can be visited is reachable!

root.right.right = null;

This makes f unreachable, so f becomes garbage.

root.right = null;

This makes c unreachable.
Once c is unreachable, f is necessarily unreachable too.

All objects in a Java program are connected into a whole through chain/tree structures like the one above.

Reachability analysis treats the structure formed by all these objects as a tree and starts from several root nodes: a set of objects called "GC Roots" serve as the starting points for traversing the tree. The path followed during the search is called a "reference chain", and every object that can be visited is marked "reachable". When no reference chain connects an object to any GC Root (it cannot be visited), the object is unreachable. (C# and Lisp, the language that pioneered garbage collection, also use this approach.)

The JVM itself keeps track of all objects (every time an object is created with new, the JVM records it, so it knows every object and each object's address). The traversal above marks the reachable objects, and the remaining unreachable ones can be reclaimed as garbage.
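The marking step described above can be sketched as a graph traversal. This is a hypothetical model (Reachability, sampleRefs, and integer object ids are illustrative): ids 0 to 6 stand for a to g from the build() example, and ids 7 and 8 are two objects that reference each other but hang off no root, exactly the case that defeats reference counting.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: objects are ids 0..n-1 and refs maps each id to the ids it references.
class Reachability {
    // Returns the ids that are NOT reachable from any root, i.e. the garbage.
    static Set<Integer> garbage(int objectCount, Map<Integer, List<Integer>> refs, List<Integer> roots) {
        boolean[] marked = new boolean[objectCount];
        Deque<Integer> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {                  // mark phase: depth-first traversal
            int id = stack.pop();
            if (marked[id]) continue;               // already visited (this also handles cycles)
            marked[id] = true;
            for (int next : refs.getOrDefault(id, List.of())) {
                stack.push(next);
            }
        }
        Set<Integer> unreachable = new TreeSet<>(); // everything left unmarked is garbage
        for (int id = 0; id < objectCount; id++) {
            if (!marked[id]) unreachable.add(id);
        }
        return unreachable;
    }

    static Map<Integer, List<Integer>> sampleRefs() {
        Map<Integer, List<Integer>> refs = new HashMap<>();
        refs.put(0, List.of(1, 2)); // a.left = b, a.right = c
        refs.put(1, List.of(3, 4)); // b.left = d, b.right = e
        refs.put(4, List.of(6));    // e.left = g
        refs.put(2, List.of(5));    // c.right = f
        refs.put(7, List.of(8));    // two objects that point at each other...
        refs.put(8, List.of(7));    // ...but are not reachable from the root
        return refs;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> refs = sampleRefs();
        System.out.println(garbage(9, refs, List.of(0))); // [7, 8]
        refs.put(0, List.of(1));                          // simulate root.right = null
        System.out.println(garbage(9, refs, List.of(0))); // [2, 5, 7, 8]
    }
}
```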

Reachability analysis requires something like a tree traversal, so it is certainly slower than reference counting.

But the slower speed does not matter much: the traversal does not need to run all the time; it only needs to run once in a while.

[Figure: GC Roots and reference chains; Object5 to Object7 are not connected to any GC Root]

Objects Object5 to Object7 still reference each other, but they are unreachable from the GC Roots, so they are judged to be reclaimable.

In the Java language, objects that can serve as GC Roots include the following:

  1. Objects referenced by local variables on the stack
  2. Objects referenced from the constant pool
  3. Objects referenced by static member variables

A program has many such starting points; traversing downward from each of them completes one scanning pass.


This shows another use of "references": besides locating objects, as we first used them, they also let us identify dead objects. Therefore, in JDK 1.2, Java extended the concept of references and divided them into four kinds: strong references, soft references, weak references, and phantom references. Their strength decreases in that order.

  1. Strong references: the references that are everywhere in program code, such as "Object obj = new Object()". As long as a strong reference still exists, the garbage collector never reclaims the referenced object.

  2. Soft references: describe objects that are useful but not essential. Objects reachable only through soft references are added to the reclamation scope for a second collection when the system is about to run out of memory. If there is still not enough memory after that collection, an out-of-memory error is thrown. Since JDK 1.2, the SoftReference class implements soft references.

  3. Weak references: also describe non-essential objects, but they are weaker than soft references. Objects reachable only through weak references survive only until the next garbage collection. When the garbage collector runs, it reclaims such objects whether or not memory is currently sufficient. Since JDK 1.2, the WeakReference class implements weak references.

  4. Phantom references: also called ghost references, they are the weakest kind of reference. Whether an object has a phantom reference does not affect its lifetime at all, and an object cannot be obtained through a phantom reference. The only purpose of giving an object a phantom reference is to receive a system notification when the collector reclaims it. Since JDK 1.2, the PhantomReference class implements phantom references.
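The four kinds map to classes in java.lang.ref. The sketch below (the class name ReferenceKinds is illustrative) shows the one fully guaranteed behavior, that PhantomReference.get() always returns null, and then calls System.gc() to observe the others. System.gc() is only a request, so what the last lines print depends on the JVM and its settings; they are deliberately left without expected output.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKinds {
    public static void main(String[] args) throws InterruptedException {
        // 1. Strong reference: the object stays alive while this variable refers to it.
        Object strong = new Object();

        // 2. Soft reference: normally cleared only when memory is about to run out.
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024]);

        // 3. Weak reference: cleared at a GC once no strong or soft reference remains.
        WeakReference<Object> weak = new WeakReference<>(new Object());

        // 4. Phantom reference: get() always returns null; the reference is enqueued
        //    on the queue after its object has been found unreachable.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(new Object(), queue);
        System.out.println(phantom.get()); // null

        System.gc(); // only a request: the JVM is free to ignore it
        System.out.println("strong: " + strong);
        System.out.println("soft still set: " + (soft.get() != null));
        System.out.println("weak cleared: " + (weak.get() == null));
        System.out.println("phantom enqueued: " + (queue.remove(100) == phantom));
    }
}
```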


3. Garbage collection algorithm

Once dead objects can be identified, garbage collection can be carried out. Before introducing specific garbage collectors, let's look at several algorithms they use (these algorithms are the guiding ideas behind garbage collectors).

3.1. Mark-Sweep algorithm

The "mark-sweep" algorithm is the most basic collection algorithm. It has two phases, "mark" and "sweep": first all objects that need to be reclaimed are marked, and after marking is complete, all marked objects are reclaimed together. Later collection algorithms build on this idea and improve on its shortcomings.

[Figure: mark-sweep algorithm]

The "mark-sweep" algorithm has two main drawbacks:

  1. Efficiency: neither the marking phase nor the sweeping phase is very efficient.

  2. Space: memory fragmentation. The freed space is scattered rather than contiguous. Too much fragmentation means that when the program later needs to allocate a large object, it may not find enough contiguous memory and has to trigger another garbage collection early.

Memory allocations require contiguous space. The total free space may be large while each individual free block is small, so a larger allocation can fail! For example, if 10 KB is free in total but split into ten 1 KB blocks, a request for 2 KB fails!
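The 10 KB / 1 KB example can be checked with a toy model. It is hypothetical (Fragmentation and its methods are illustrative): the heap is an array of 1 KB slots, the mark-sweep pass is assumed to have left every other slot occupied by a live object, and allocation is first-fit over contiguous free slots.

```java
// Hypothetical heap of 1 KB slots after a mark-sweep pass: true = live object, false = free.
class Fragmentation {
    // 20 slots where the even ones still hold live objects: 10 KB free in ten 1 KB holes.
    static boolean[] fragmentedHeap() {
        boolean[] heap = new boolean[20];
        for (int i = 0; i < heap.length; i += 2) {
            heap[i] = true;
        }
        return heap;
    }

    static int totalFree(boolean[] heap) {
        int free = 0;
        for (boolean used : heap) {
            if (!used) free++;
        }
        return free;
    }

    static int largestFreeRun(boolean[] heap) {
        int best = 0, run = 0;
        for (boolean used : heap) {
            run = used ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // First-fit: returns the start slot of `size` contiguous free slots and marks them used,
    // or -1 when no hole is big enough, even if the total free space is larger.
    static int allocate(boolean[] heap, int size) {
        int run = 0;
        for (int i = 0; i < heap.length; i++) {
            run = heap[i] ? 0 : run + 1;
            if (run == size) {
                int start = i - size + 1;
                for (int j = start; j <= i; j++) heap[j] = true;
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] heap = fragmentedHeap();
        System.out.println("free KB: " + totalFree(heap));              // free KB: 10
        System.out.println("largest hole KB: " + largestFreeRun(heap)); // largest hole KB: 1
        System.out.println("allocate 2 KB at: " + allocate(heap, 2));   // allocate 2 KB at: -1
    }
}
```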


3.2. Copying Algorithm

The "copying" algorithm was introduced to solve the efficiency problem of "mark-sweep".

It divides the available memory into two halves of equal size and uses only one half at a time: half is used and half is wasted. The copying algorithm copies the objects that are "not garbage" into the other half and then clears the entire current half. Every time the algorithm runs, the live data in memory is copied over to the other side.

The advantage is that a whole half is reclaimed each time, so allocation never has to deal with complications such as memory fragmentation: it only needs to move the heap-top pointer and allocate sequentially. The algorithm is simple to implement and runs efficiently.

Drawbacks:

  1. Low space utilization (only half of the memory is usable at a time)

  2. If there is little garbage, many live objects must be copied, which becomes expensive

[Figure: copying algorithm]
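The copying idea can be sketched as a semispace collector in the style of Cheney's algorithm (a well-known way to implement it; the text above does not name a specific algorithm). It is a hypothetical model: an object is a list of the addresses it references, its address is its index in the space, and CopyingGC, sampleHeap, and the sample layout are illustrative. Only reachable objects are copied, the size of the to-space doubles as the bump pointer mentioned above, and references are rewritten through a forwarding table so that each object is copied once even when there are cycles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical semispace copying collector: an object is the list of addresses it references.
class CopyingGC {
    // Copies everything reachable from roots out of fromSpace into a fresh toSpace
    // (breadth-first, Cheney-style) and returns toSpace; roots are updated in place.
    static List<List<Integer>> collect(List<List<Integer>> fromSpace, int[] roots) {
        List<List<Integer>> toSpace = new ArrayList<>();
        Map<Integer, Integer> forwarding = new HashMap<>(); // old address -> new address
        for (int i = 0; i < roots.length; i++) {
            roots[i] = copy(roots[i], fromSpace, toSpace, forwarding);
        }
        // "scan" walks the to-space; toSpace.size() plays the role of the bump pointer.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            List<Integer> fields = toSpace.get(scan);
            for (int f = 0; f < fields.size(); f++) {
                fields.set(f, copy(fields.get(f), fromSpace, toSpace, forwarding));
            }
        }
        return toSpace; // the old fromSpace can now be discarded as a whole
    }

    private static int copy(int oldAddr, List<List<Integer>> from, List<List<Integer>> to,
                            Map<Integer, Integer> forwarding) {
        Integer done = forwarding.get(oldAddr);
        if (done != null) return done;              // already copied: reuse its new address
        int newAddr = to.size();                    // allocate by bumping the pointer
        to.add(new ArrayList<>(from.get(oldAddr))); // copy fields (still old addresses for now)
        forwarding.put(oldAddr, newAddr);
        return newAddr;
    }

    static List<List<Integer>> sampleHeap() {
        List<List<Integer>> heap = new ArrayList<>();
        heap.add(List.of(2)); // 0 -> 2
        heap.add(List.of());  // 1: garbage
        heap.add(List.of(0)); // 2 -> 0 (a cycle with 0)
        heap.add(List.of(1)); // 3 -> 1: garbage pointing at garbage
        heap.add(List.of());  // 4: referenced directly by a root
        return heap;
    }

    public static void main(String[] args) {
        int[] roots = {4, 0};
        List<List<Integer>> toSpace = collect(sampleHeap(), roots);
        System.out.println(toSpace);                // [[], [2], [1]]
        System.out.println(Arrays.toString(roots)); // [0, 1]
    }
}
```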


3.3. Mark-Compact Algorithm

When the object survival rate is high, the copying algorithm performs many copy operations and becomes less efficient. Therefore, the copying algorithm is generally not used in the old generation.

For the characteristics of the old generation, the "mark-compact" algorithm was proposed. Its marking phase is the same as in "mark-sweep", but instead of reclaiming the dead objects in place, it moves all live objects toward one end (much like deleting an element from the middle of an array-based list, which shifts the following elements), and then clears all memory beyond that end boundary.

It solves the copying algorithm's first drawback, low space utilization.
Its own drawback, however, is low efficiency: if a lot of data has to be moved, the overhead is high.

[Figure: mark-compact algorithm]
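Sliding compaction can be sketched with a slot array. This is a hypothetical model (MarkCompact and the sample data are illustrative): live[] is assumed to come from the mark phase, the first pass computes each survivor's new address (a forwarding table), and the second pass moves survivors toward index 0 in their original order, just like shifting elements after deleting from the middle of an array. A real collector would also rewrite every reference using the forwarding table; that step is omitted here.

```java
import java.util.Arrays;

// Hypothetical mark-compact over a slot array: heap[i] is an object's name or null (free);
// live[i] is the result of the mark phase.
class MarkCompact {
    // Slides every live object toward index 0, keeping their order, and returns
    // forwarding[i] = new address of the object that was at i (-1 if it was reclaimed).
    static int[] compact(String[] heap, boolean[] live) {
        int[] forwarding = new int[heap.length];
        int free = 0;                                 // next destination slot
        for (int i = 0; i < heap.length; i++) {       // pass 1: compute new addresses
            forwarding[i] = (heap[i] != null && live[i]) ? free++ : -1;
        }
        for (int i = 0; i < heap.length; i++) {       // pass 2: move live objects down
            if (forwarding[i] >= 0) heap[forwarding[i]] = heap[i]; // destination <= i, so safe
        }
        Arrays.fill(heap, free, heap.length, null);   // everything past the boundary is free
        return forwarding;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "x", "B", "y", "C", null};
        boolean[] live = {true, false, true, false, true, false};
        int[] forwarding = compact(heap, live);
        System.out.println(Arrays.toString(heap));       // [A, B, C, null, null, null]
        System.out.println(Arrays.toString(forwarding)); // [0, -1, 1, -1, 2, -1]
    }
}
```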


3.4. Generational algorithm

Building on the basic strategies above, a compound strategy called "generational collection" was developed.

It divides memory into regions and applies a different garbage collection strategy to each region, which makes collection more effective overall. Using different algorithms for different scenarios is the design idea of the generational algorithm.

It rests on an empirical rule: if something has existed for a long time, it will very likely continue to exist for a long time.

This rule also holds for Java objects (it is backed by a series of experiments and analyses). Java objects tend to have either a very short or a very long lifetime, so different algorithms are used depending on lifetime length.

Each object is given an "age". Its unit is not years but GC rounds: each time a reachability traversal finds that the object is still not garbage, its age goes up by one.
The older the object, the longer it has existed.

Current JVM garbage collection uses the "generational collection" algorithm. The algorithm brings no new idea; it simply divides memory into several blocks according to how long objects live. Generally, the Java heap is divided into the young (new) generation and the old generation.

Which objects will enter the new generation? Which objects will enter the old generation?

  • Young generation: newly created objects generally go into the young generation;
  • Old generation: large objects, and objects that have survived N garbage collections (15 by default) in the young generation, go into the old generation.

According to the empirical rule above, most Java objects "live fast and die young" and have very short lifetimes, so the survivor spaces can be small while the Eden space is large; this is generally enough.

Therefore, memory does not need to be split 1:1. Instead, the young-generation memory is divided into one larger Eden space and two smaller Survivor spaces, and each time Eden plus one of the Survivor spaces is used (one Survivor space is called From, the other To).

[Figure: young generation (Eden, Survivor From, Survivor To) and old generation]

Process:

  • Newly created objects have age 0 and are placed in Eden. An object that survives a round of GC is moved into a survivor space.

  • In the young generation, most objects die at each collection and only a few survive, so the copying algorithm is used: the surviving objects are moved into a survivor space and the entire Eden space is freed.

  • Objects in a survivor space are still checked by periodic GCs. If an object has become garbage, it is freed; if not, it is copied into the other survivor space (only one of the two is in use at any time). Objects are copied back and forth between the two survivor spaces (the copying algorithm). Since the survivor spaces are small, the wasted space here is acceptable.

    If an object has been copied back and forth between the two survivor spaces many times, it is moved into the old generation.

  • Objects in the old generation generally live longer. They are also scanned by periodic GCs, but less frequently, using the "mark-sweep" or "mark-compact" algorithm.

When a Survivor space is not large enough, the JVM relies on other memory (the old generation) as an allocation guarantee.

HotSpot's default Eden-to-Survivor size ratio is 8:1, that is, Eden : Survivor From : Survivor To = 8:1:1. So the usable space in the young generation at any time is 90% of its capacity (Eden plus one Survivor space), and the remaining 10% holds the objects that survive a collection.
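The 8:1:1 arithmetic can be written out for a hypothetical 100 MB young generation. HotSpot exposes this ratio as -XX:SurvivorRatio (default 8); the split() helper below only reproduces the arithmetic, and the real JVM may round or adapt these sizes.

```java
// Splits a young generation as Eden : From : To = ratio : 1 : 1 (ratio 8 gives 8:1:1).
class YoungGenSizing {
    static long[] split(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2); // the young gen has ratio + 2 equal parts
        long eden = youngBytes - 2 * survivor;             // Eden gets everything else
        return new long[] {eden, survivor, survivor};
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long young = 100 * mb;                             // hypothetical 100 MB young generation
        long[] parts = split(young, 8);
        long usable = parts[0] + parts[1];                 // Eden + one Survivor in use at a time
        System.out.println("Eden=" + parts[0] / mb + " MB, From=" + parts[1] / mb
                + " MB, To=" + parts[2] / mb + " MB");     // Eden=80 MB, From=10 MB, To=10 MB
        System.out.println("usable=" + usable * 100 / young + "%"); // usable=90%
    }
}
```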

The copying flow implemented by HotSpot is as follows:

  1. When Eden fills up, the first Minor GC is triggered and the surviving objects are copied into Survivor From. When Eden triggers a Minor GC again, the collector scans Eden and From, collects garbage in both, copies the survivors of this round directly into To, and clears Eden and From.

  2. At the next Minor GC, Eden and To are collected, the survivors are copied into From, and Eden and To are cleared.

  3. Some objects are copied back and forth between From and To. After 15 such exchanges (set by the JVM parameter MaxTenuringThreshold, which defaults to 15), objects that are still alive are moved into the old generation.

[Figure: Minor GC copying between Eden, From, and To]
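The three steps above, together with the age counter, can be simulated. This is a hypothetical model (MinorGcSim, Obj, and the lifetime field are illustrative): each object is told up front how many Minor GCs it stays reachable for, whereas a real collector discovers that by marking. The threshold is set to 3 instead of the default 15 to keep the run short, and survivor overflow and HotSpot's dynamic tenuring-threshold adjustment are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of Minor GC aging with Eden, two survivor spaces, and promotion.
class MinorGcSim {
    static final class Obj {
        final String name;
        final int lifetime; // number of Minor GCs this object stays reachable for
        int age = 0;        // number of Minor GCs it has survived so far
        Obj(String name, int lifetime) { this.name = name; this.lifetime = lifetime; }
    }

    final int maxTenuringThreshold;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // survivor space currently in use
    List<Obj> to = new ArrayList<>();   // empty survivor space
    final List<Obj> old = new ArrayList<>();

    MinorGcSim(int maxTenuringThreshold) { this.maxTenuringThreshold = maxTenuringThreshold; }

    void allocate(String name, int lifetime) { eden.add(new Obj(name, lifetime)); }

    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden); // Eden plus the survivor space in use
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (o.age >= o.lifetime) {
                continue;                             // unreachable now: simply not copied
            }
            o.age++;                                  // survived one more Minor GC
            if (o.age >= maxTenuringThreshold) {
                old.add(o);                           // promoted to the old generation
            } else {
                to.add(o);                            // copied into the empty survivor space
            }
        }
        eden.clear();                                 // Eden and the old survivor space are wiped
        from.clear();
        List<Obj> tmp = from;                         // From and To swap roles
        from = to;
        to = tmp;
    }

    static List<String> names(List<Obj> objs) {
        List<String> out = new ArrayList<>();
        for (Obj o : objs) out.add(o.name);
        return out;
    }

    public static void main(String[] args) {
        MinorGcSim sim = new MinorGcSim(3);
        sim.allocate("temp", 0);    // garbage by the first Minor GC
        sim.allocate("session", 2); // survives two Minor GCs, then dies
        sim.allocate("cache", 100); // long-lived
        for (int i = 1; i <= 3; i++) {
            sim.minorGc();
            System.out.println("after GC " + i + ": survivor=" + names(sim.from) + " old=" + names(sim.old));
        }
        // after GC 1: survivor=[session, cache] old=[]
        // after GC 2: survivor=[session, cache] old=[]
        // after GC 3: survivor=[] old=[cache]
    }
}
```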

Interview question: Do you know Minor GC and Full GC? What is the difference between them?

  1. Minor GC, also called young-generation GC, is garbage collection that happens in the young generation. Because most Java objects are short-lived ("live fast, die young"), Minor GC (using the copying algorithm) runs very frequently and is generally fast.

  2. Full GC, also called old-generation GC or Major GC, is garbage collection that happens in the old generation. A Major GC is often accompanied by at least one Minor GC (not always: the Parallel Scavenge collector has a policy for going straight to a Full GC). Major GC is generally more than 10 times slower than Minor GC.



Origin: blog.csdn.net/qq_56884023/article/details/131755536