The Story of JVM - Virtual Machine Class Loading Mechanism

Virtual machine class loading mechanism


1. Overview

This chapter explains how class files enter the virtual machine and how the virtual machine processes them. The Java virtual machine loads the data describing a class from the class file into memory, then verifies, converts, resolves, and initializes that data, and finally forms a Java type that the virtual machine can use directly. This process is called the class loading mechanism of the virtual machine. All of these steps happen while the program is running, which gives Java a high degree of extensibility and flexibility.

2. Timing of class loading

From the moment it is loaded into virtual machine memory until it is unloaded, a class goes through seven life cycle stages: loading, verification, preparation, resolution, initialization, use, and unloading. The verification, preparation, and resolution stages are collectively called linking.
[Figure: class life cycle stages in order: loading, verification, preparation, resolution, initialization, use, unloading]
The loading, verification, preparation, initialization, and unloading stages must begin in the order listed above, but the resolution stage need not: in some cases it begins only after initialization, in order to support Java's runtime binding (also called dynamic binding). The word "begin" is used here rather than "finish" because the stages are interleaved with one another; a stage does not wait for the previous stage to end before it starts.
The Java Virtual Machine Specification does not restrict when a class is loaded, but it strictly defines when a class must be initialized. A class is initialized only in the following six situations:
(1) When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered and the type has not been initialized. The typical scenarios are: instantiating an object with the new keyword; reading or setting a static field of a type (except a static field modified by final whose value was placed in the constant pool at compile time); and calling a static method of a type.
(2) When a method of the java.lang.reflect package makes a reflective call on a type that has not been initialized.
(3) When a class is initialized and its parent class has not been initialized, the parent class is initialized first.
(4) When the virtual machine starts, it first initializes the main class specified by the user (the class to be executed).
(5) When the final resolution result of a java.lang.invoke.MethodHandle instance is a method handle of type REF_getStatic, REF_putStatic, REF_invokeStatic, or REF_newInvokeSpecial and the class corresponding to that method handle has not been initialized.
(6) When an interface defines a default method (an interface method modified by the default keyword, new in JDK 8) and an implementation class of that interface is initialized, the interface is initialized before the implementation class.
Regarding situation (3), an interface differs from a class: initializing an interface does not require all of its parent interfaces to be initialized. A parent interface is initialized only when it is actually used.
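The difference between references that trigger initialization and those that do not can be observed with a small experiment. The following is a minimal sketch; the class names (InitTriggerDemo, SuperClass, SubClass) and the shared LOG list are hypothetical names chosen for illustration, and the behavior described in the comments is what the rules above predict.

```java
import java.util.ArrayList;
import java.util.List;

class InitTriggerDemo {
    // Shared log: each static initializer below appends its class name when it runs.
    static final List<String> LOG = new ArrayList<>();

    static class SuperClass {
        static int value = 123;               // ordinary static field
        static final String HELLO = "hello";  // compile-time constant
        static { LOG.add("SuperClass"); }
    }

    static class SubClass extends SuperClass {
        static { LOG.add("SubClass"); }
    }

    // References that should not initialize SubClass.
    static List<String> passiveReferences() {
        String hello = SuperClass.HELLO;      // constant is inlined by javac: no class is initialized
        int v = SubClass.value;               // field is declared in SuperClass: only SuperClass is initialized
        SubClass[] array = new SubClass[1];   // creating an array does not initialize the element class
        return new ArrayList<>(LOG);
    }

    // A reference that must initialize SubClass (situation 1, the new instruction).
    static List<String> activeReference() {
        new SubClass();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(passiveReferences());
        System.out.println(activeReference());
    }
}
```

Reading SubClass.value initializes only the class that declares the field, which is why the subclass stays uninitialized until an instance is created.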

3. Class loading process

(1) Loading
Loading is one stage of the overall class loading process. During the loading stage, the Java virtual machine must do three things:
1. Obtain the binary byte stream that defines the class, using the fully qualified name of the class.
2. Convert the static storage structure represented by this byte stream into a runtime data structure in the method area.
3. Generate a java.lang.Class object in memory that represents the class and serves as the access point to the class's data in the method area.
These requirements are not very specific, which leaves a lot of room for virtual machine implementations and Java applications. For example, the binary byte stream does not have to come from a class file: it can be read from a ZIP archive (JAR, WAR, EAR), fetched from the network (Web Applet), read from a database, computed at runtime (dynamic proxies, Proxy), or generated from other files (a JSP file generating the corresponding class).
For non-array types, the loading stage (more precisely, the action of obtaining the binary byte stream of the class) is the stage that developers can control most. It can be carried out by the virtual machine's built-in bootstrap class loader, or by a user-defined class loader that decides how the byte stream is obtained.
An array class itself is not created by a class loader; the Java virtual machine creates it dynamically in memory. Array classes are still closely tied to class loaders, however, because the element type of the array must ultimately be loaded by a class loader.
If the component type of the array is a reference type, the component type is loaded recursively, and the array is marked in the namespace of the class loader that loads the component type. If the component type is not a reference type (for example, int[]), the Java virtual machine marks the array as associated with the bootstrap class loader.
After the loading stage is complete, the binary byte stream from outside the Java virtual machine is stored in the method area in the format the virtual machine defines. Once the type data is in place in the method area, an object of the java.lang.Class class is instantiated in the Java heap. This object serves as the external interface through which the program accesses the type data in the method area.
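The association between array classes and class loaders described above can be checked through Class.getClassLoader(), which reports the bootstrap class loader as null. This is a small sketch; ArrayLoaderDemo and Element are hypothetical names used only for illustration.

```java
class ArrayLoaderDemo {
    // Hypothetical user-defined element type.
    static class Element { }

    // An array of a reference type reports the same loader as its element type.
    static boolean arraySharesElementLoader() {
        return Element[][].class.getClassLoader() == Element.class.getClassLoader();
    }

    // An array of a primitive type is associated with the bootstrap loader (reported as null).
    static boolean primitiveArrayUsesBootstrap() {
        return int[].class.getClassLoader() == null;
    }

    public static void main(String[] args) {
        System.out.println("Element[][] shares Element's loader: " + arraySharesElementLoader());
        System.out.println("int[] uses the bootstrap loader: " + primitiveArrayUsesBootstrap());
    }
}
```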

(2) Verification
Verification is the first step of the linking phase. Its purpose is to ensure that the byte stream of the class file meets the constraints of the specification and will not harm the JVM when it runs as code. The Java language itself is relatively safe: code that accesses data outside the bounds of an array or jumps to a nonexistent line of code either throws an exception at run time or is rejected by the compiler. But not all class files are compiled from Java source, and a class file can be produced by any means, so the Java virtual machine must check the input byte stream. Verification consists of roughly four stages: file format verification, metadata verification, bytecode verification, and symbolic reference verification.
1. File format verification: checks whether the byte stream conforms to the class file format, for example:
whether it starts with the magic number 0xCAFEBABE;
whether the major and minor version numbers are within the range the virtual machine accepts;
whether the constants in the constant pool are of supported types;
whether any index value that points to a constant refers to a nonexistent constant or to a constant of an inappropriate type.
There are many more checks at this stage. Its main purpose is to ensure that the input byte stream can be parsed correctly and stored in the method area. This stage works on the binary byte stream. Only after it passes is the byte stream allowed into the method area of virtual machine memory; the later verification stages work on the storage structure in the method area and no longer read or operate on the byte stream directly.
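The first two checks in the list above (magic number and version) can be illustrated in a few lines. This is only a sketch of the idea, not how a virtual machine implements file format verification; ClassFileHeader and majorVersionOrMinusOne are hypothetical names, and reading Object.class as a resource is an assumption about the runtime environment.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    // Returns the major version if the bytes start with a valid class file header, or -1 otherwise.
    static int majorVersionOrMinusOne(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != MAGIC) {
                return -1;                    // wrong magic number
            }
            in.readUnsignedShort();           // minor version, ignored here
            return in.readUnsignedShort();    // major version
        } catch (IOException e) {
            return -1;                        // input shorter than the 8-byte header
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the class file of java.lang.Object is readable as a resource.
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                System.out.println("class file not available as a resource");
                return;
            }
            byte[] header = new byte[8];
            new DataInputStream(in).readFully(header);
            System.out.println("major version: " + majorVersionOrMinusOne(header));
        }
    }
}
```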
2. Metadata verification: performs semantic analysis of the information described by the bytecode. The main checks are:
whether the class has a parent class (every class except java.lang.Object should have one);
whether the parent class is a class that may not be inherited (a class modified by final);
if the class is not abstract, whether it implements all the methods required by its parent class or interfaces.
3. Bytecode verification: this stage determines whether the program semantics are legal and logical. It mainly verifies the method bodies of the class (the Code attribute in the class file) to ensure that the methods will not do anything that endangers the virtual machine at run time, for example:
Ensure that the data types on the operand stack and the instruction sequence work together at every point.
Ensure that no jump instruction jumps to a bytecode instruction outside the method body.
Ensure that type conversions in the method body are legal. For example, assigning a subclass object to a variable of the parent class type is safe, but assigning a parent class object to a variable of a subclass type is not.
If the bytecode of a method body fails bytecode verification, there is certainly something wrong with it. If it passes, however, that does not prove it is safe, because no program can decide in general whether another program's logic is correct.
To avoid spending too much time on bytecode verification, JDK 6 added a new attribute, StackMapTable, to the attribute table of the Code attribute of the method body. Much of the type inference work is done by the javac compiler at compile time and recorded in StackMapTable, so the virtual machine only needs to check that the records are legal. A security risk remains in theory, because the StackMapTable could be tampered with before the class file enters the virtual machine.
4. Symbolic reference verification: this check takes place when the virtual machine converts symbolic references into direct references, a conversion that happens in the resolution stage. Symbolic reference verification can be seen as a matching check on information outside the class itself. It usually verifies:
whether the class named by a fully qualified name in a symbolic reference can be found;
whether the specified class contains a field or method that matches the simple name and descriptor in the symbolic reference.
The main purpose of this check is to ensure that resolution can proceed normally. If it fails, an exception is thrown, typically java.lang.IllegalAccessError, java.lang.NoSuchFieldError, or java.lang.NoSuchMethodError.
The verification stage is important but not mandatory. If the code a program runs has already been used and verified repeatedly, -Xverify:none can be used to turn off verification and shorten the time the virtual machine spends loading classes.

(3) Preparation
The preparation stage allocates memory for the variables defined in the class (class variables, that is, variables modified by static) and sets their initial values. Conceptually, the memory for these variables is allocated in the method area, but the method area is only a logical concept. In JDK 8 and later, class variables are stored in the Java heap together with the Class object.
Memory allocation at this stage covers only class variables (static variables), not instance variables; instance variables are allocated in the Java heap together with the object when the object is instantiated. The initial value assigned to a class variable is normally the zero value of its data type.
[Figure: class variable declaration, e.g. public static int value = 123;]
For a declaration like this, the value of value after the preparation stage is 0, not 123, because no Java method has been executed yet. The putstatic instruction that assigns 123 to value is compiled into the class constructor <clinit>() method, so the assignment happens only in the class initialization stage.
[Figure: constant declaration, e.g. public static final int value = 123;]
If the variable is also modified by final (a constant), as in the declaration above, it is assigned 123 during the preparation stage. At compile time javac generates a ConstantValue attribute for value, and in the preparation stage the virtual machine assigns value according to that attribute.
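The zero value set during preparation can be observed by reading a class variable before its own assignment has run inside <clinit>(). The sketch below uses hypothetical names (PreparationDemo, peekValue, peekConstant). Note the limitation on the constant: javac inlines compile-time constants, so the second read does not touch the field and only shows that the constant never depends on <clinit>().

```java
class PreparationDemo {
    // Runs inside <clinit>() before the assignment "value = 123" below has executed,
    // so it sees the zero value that the preparation stage stored.
    static int seenBeforeAssignment = peekValue();
    static int value = 123;

    // CONSTANT is a compile-time constant and carries a ConstantValue attribute.
    // javac inlines it, so peekConstant() never actually reads the field.
    static int seenConstant = peekConstant();
    static final int CONSTANT = 123;

    static int peekValue() {
        return value;
    }

    static int peekConstant() {
        return CONSTANT;
    }

    public static void main(String[] args) {
        System.out.println("value seen during initialization: " + seenBeforeAssignment);
        System.out.println("value after initialization: " + value);
        System.out.println("constant seen during initialization: " + seenConstant);
    }
}
```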

(4) Resolution
The resolution stage is the process in which the Java virtual machine replaces the symbolic references in the constant pool with direct references.
Symbolic reference: a set of symbols that describes the referenced target. The symbols can be any form of literal, as long as they locate the target unambiguously. The referenced target does not have to be loaded into the virtual machine yet.
Direct reference: a pointer that points directly to the target, a relative offset, or a handle that locates the target indirectly. The referenced target must already exist in the virtual machine's memory.
Resolution is mainly performed on seven kinds of symbolic references: class or interface, field, class method, interface method, method type, method handle, and call site specifier.
1. Class or interface resolution: suppose the current code is in class D and an unresolved symbolic reference N is to be resolved into a direct reference to a class or interface C. The process has three steps:
(1) If C is not an array type, the virtual machine passes the fully qualified name represented by N to the class loader of D to load class C. Loading C may trigger the loading of other classes because of metadata verification, bytecode verification, and so on. A failure anywhere in this process means resolution fails.
(2) If C is an array type and its element type is an object type, the element type is loaded according to step (1), after which the virtual machine creates the array type itself.
(3) After loading completes, it is still necessary to verify that D has permission to access C. If it does not, java.lang.IllegalAccessError is thrown.
2. Field resolution: to resolve a field, the symbolic reference to the class or interface that the field belongs to must be resolved first. If that succeeds, the following steps are carried out (the class or interface the field belongs to is denoted by c):
(1) If c itself contains a field whose simple name and field descriptor both match the target, a direct reference to that field is returned and the lookup ends.
(2) Otherwise, if c implements interfaces, each interface and its parent interfaces are searched recursively according to the inheritance relationship. If an interface contains a matching field, the lookup ends.
(3) Otherwise, if c is not java.lang.Object, the parent classes of c are searched recursively according to the inheritance relationship. If a parent class contains a matching field, the lookup ends.
(4) Otherwise, the lookup fails and java.lang.NoSuchFieldError is thrown.
If the lookup succeeds, access permission for the field is verified. If access is not permitted, java.lang.IllegalAccessError is thrown.
If a field with the same name appears in both the parent class and an implemented interface of a class, these rules can still determine a unique field, but the javac compiler may refuse to compile such code into a class file.
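The field lookup order can be tried out with a class that sees the same field name along several paths. This is a small sketch with hypothetical names; it demonstrates step (1) of the lookup, and the comment notes what javac does when that step cannot apply.

```java
class FieldResolutionDemo {
    interface Interface0 { int A = 0; }
    interface Interface1 extends Interface0 { int A = 1; }
    interface Interface2 { int A = 2; }

    static class Parent implements Interface1 {
        public static int A = 3;
    }

    static class Sub extends Parent implements Interface2 {
        // Step (1): Sub itself declares A, so the lookup ends here.
        // If this declaration is removed, javac rejects the reference to Sub.A as ambiguous,
        // because both Parent and Interface2 supply a field named A.
        public static int A = 4;
    }

    static int resolved() {
        return Sub.A;
    }

    public static void main(String[] args) {
        System.out.println(resolved());   // prints 4
    }
}
```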
3. Method resolution: the first step of method resolution is the same as for field resolution. The symbolic reference to the class or interface that the method belongs to, indexed by the class_index item of the method table, must be resolved first. With that class or interface denoted by c, resolution proceeds as follows:
(1) Symbolic references to class methods and to interface methods use separate constant types in the class file. If the c indexed by class_index in a class method reference turns out to be an interface, java.lang.IncompatibleClassChangeError is thrown.
(2) Otherwise, if class c contains a method whose simple name and descriptor both match the target, a direct reference to that method is returned and the lookup ends.
(3) Otherwise, the parent classes of c are searched recursively. If a matching method is found, the lookup ends.
(4) Otherwise, the interfaces implemented by c and their parent interfaces are searched recursively. If a matching method is found there, c is an abstract class; the lookup ends and java.lang.AbstractMethodError is thrown.
(5) Otherwise, the lookup fails and java.lang.NoSuchMethodError is thrown.

4. Interface method resolution: interface method resolution also begins by resolving the symbolic reference to the class or interface that the method belongs to, indexed by the class_index item of the interface method table. With that interface denoted by c, resolution proceeds as follows:
(1) In contrast to class method resolution, if c turns out to be a class rather than an interface, java.lang.IncompatibleClassChangeError is thrown.
(2) Otherwise, if interface c contains a method whose simple name and descriptor both match the target, a direct reference to that method is returned and the lookup ends.
(3) Otherwise, the parent interfaces of c are searched recursively, up to and including the java.lang.Object class. If a matching method is found, a direct reference to it is returned and the lookup ends.
(4) Because Java interfaces allow multiple inheritance, if matching methods exist in several different parent interfaces of c, one of them is returned and the lookup ends.
(5) Otherwise, the lookup fails and java.lang.NoSuchMethodError is thrown.

(5) Initialization
Class initialization is the last step of the class loading process. Only in the initialization stage does the Java virtual machine actually begin executing the Java code written in the class. In this stage, class variables and other resources are initialized according to the plan the programmer expressed in code. Put more directly, the initialization stage is the process of executing the class constructor <clinit>() method, which the javac compiler generates automatically. The compiler produces <clinit>() by collecting all the class variable assignments and the statements in static blocks (static {} blocks) and merging them in the order in which they appear in the source file. A static block can read only the variables defined before it; a variable defined after the block can be assigned in the block but cannot be read there.
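The rule about variables defined after a static block can be seen in a few lines. This is a minimal sketch; ForwardReferenceDemo is a hypothetical name.

```java
class ForwardReferenceDemo {
    static {
        i = 0;                        // assigning to a variable declared later compiles
        // System.out.println(i);     // reading it here is rejected by javac: illegal forward reference
    }

    // This assignment comes later in source order, so it runs after the static block.
    static int i = 1;

    public static void main(String[] args) {
        System.out.println(i);        // prints 1
    }
}
```

Because <clinit>() follows source order, the declaration's own assignment overwrites the value written by the earlier static block.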
The Java virtual machine guarantees that the <clinit>() method of the parent class has finished before the <clinit>() method of a subclass runs, so the first <clinit>() method executed in the virtual machine is always that of java.lang.Object.
Because the <clinit>() method of the parent class runs first, static blocks defined in the parent class execute before the class variable assignments of the subclass.
[Figure: code example of static initialization order in a parent class and its subclass]
In the example from the figure, the value of B is 2 rather than 1.
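Since the original figure is not available, the following is a hedged reconstruction of the kind of code the text describes; the class names Parent and Sub and the fields A and B are assumptions based on the surrounding sentences.

```java
class ClinitOrderDemo {
    static class Parent {
        public static int A = 1;
        static {
            A = 2;                    // runs as part of Parent's <clinit>()
        }
    }

    static class Sub extends Parent {
        public static int B = A;      // Parent's <clinit>() has already finished, so A is 2
    }

    public static void main(String[] args) {
        System.out.println(Sub.B);    // prints 2
    }
}
```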
The <clinit>() method is not mandatory. If a class or interface has no static blocks and no assignments to class variables, the compiler does not need to generate a <clinit>() method for it.
Unlike a class, executing the <clinit>() method of an interface does not require first executing the <clinit>() method of its parent interface. The parent interface is initialized only when a variable defined in it is used.
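The lazy initialization of parent interfaces can be observed by giving interface fields initializers with a visible side effect. The sketch below uses hypothetical names (InterfaceInitDemo, mark, ParentInterface, ChildInterface); the fields are deliberately not compile-time constants, so reading them really triggers initialization.

```java
import java.util.ArrayList;
import java.util.List;

class InterfaceInitDemo {
    // Records the order in which the interfaces below are initialized.
    static final List<String> LOG = new ArrayList<>();

    static int mark(String name) {
        LOG.add(name);
        return LOG.size();
    }

    interface ParentInterface {
        int P = mark("ParentInterface");   // initializer runs in ParentInterface's <clinit>()
    }

    interface ChildInterface extends ParentInterface {
        int C = mark("ChildInterface");    // initializer runs in ChildInterface's <clinit>()
    }

    // Uses only a field of the child interface; the parent interface stays uninitialized.
    static List<String> useChildOnly() {
        int c = ChildInterface.C;
        return new ArrayList<>(LOG);
    }

    // Uses a field defined in the parent interface, which initializes it.
    static List<String> thenUseParent() {
        int p = ParentInterface.P;
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(useChildOnly());
        System.out.println(thenUseParent());
    }
}
```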

4. Class loader

The Java virtual machine design team deliberately placed the action of "obtaining the binary byte stream that describes a class through its fully qualified name" outside the Java virtual machine, so that applications can decide for themselves how to obtain the classes they need. The code that implements this action is called a "class loader".
(1) Classes and class loaders
A class and the class loader that loads it together determine the uniqueness of that class in the Java virtual machine. Even if two classes originate from the same class file and are loaded by the same Java virtual machine, they are not equal if their class loaders differ. Equality here covers the results of the equals(), isAssignableFrom(), and isInstance() methods of the Class object that represents the class.
[Figure: code that builds a simple class loader, loads a class with it, instantiates an object, and performs a type check on that object]
The code in the figure builds a simple class loader, uses it to load a class, and instantiates an object of that class. The first line of output shows that the object really is an instance of that class, but the second line shows that the type check between the object and the class returns false. The reason is that two ClassLoaderTest classes exist in the Java virtual machine: one loaded by the virtual machine's application class loader and one loaded by the custom class loader. Although they come from the same class file, they are two independent classes.
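Since the original figure is not available, here is a sketch of such an experiment. The names LoaderIdentityDemo, IsolatingLoader, and loadIsolatedCopy are hypothetical, and the code assumes the compiled class file of LoaderIdentityDemo can be read as a resource next to the class; if it cannot, the loader falls back to normal delegation and the two Class objects are the same.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class LoaderIdentityDemo {
    // Defines classes itself from class files found next to this class,
    // instead of asking its parent loader first.
    static class IsolatingLoader extends ClassLoader {
        @Override
        public Class<?> loadClass(String name) throws ClassNotFoundException {
            Class<?> loaded = findLoadedClass(name);
            if (loaded != null) {
                return loaded;                      // already defined by this loader
            }
            String fileName = name.substring(name.lastIndexOf('.') + 1) + ".class";
            try (InputStream in = LoaderIdentityDemo.class.getResourceAsStream(fileName)) {
                if (in == null) {
                    return super.loadClass(name);   // e.g. java.lang.Object: normal delegation
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
                byte[] bytes = out.toByteArray();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Loads a second copy of this class through a fresh IsolatingLoader.
    static Class<?> loadIsolatedCopy() {
        try {
            return new IsolatingLoader().loadClass(LoaderIdentityDemo.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> copy = loadIsolatedCopy();
        System.out.println("name: " + copy.getName());
        System.out.println("same Class object: " + (copy == LoaderIdentityDemo.class));
        System.out.println("isInstance: " + copy.isInstance(new LoaderIdentityDemo()));
    }
}
```

When the class file is found, the two Class objects share a name but have different defining loaders, so the identity and isInstance checks both report false.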

(2) Parent delegation model
From the perspective of the Java virtual machine, there are only two kinds of class loaders: the bootstrap class loader, and all other class loaders.
From the perspective of a Java developer, class loaders should be divided more finely. Since JDK 1.2, Java has maintained a three-tier class loader architecture based on parent delegation. Most Java programs use the following three system-provided class loaders:
Bootstrap class loader: responsible for loading the classes stored in the <JAVA_HOME>\lib directory or in the path specified by the -Xbootclasspath parameter.
Extension class loader: responsible for loading all class libraries in the <JAVA_HOME>\lib\ext directory or in the path specified by the java.ext.dirs system variable.
Application class loader: responsible for loading all class libraries on the user class path (ClassPath). Developers can also use this class loader directly in code. If the program does not define its own class loader, this is generally the default class loader of the program.
[Figure: class loader hierarchy: bootstrap class loader at the top, then extension class loader, application class loader, and user-defined class loaders]
This hierarchy of class loaders is called the "parent delegation model". It is not a mandatory model, but a best practice for implementing class loaders that the Java designers recommend to developers.
The workflow of the parent delegation model is as follows: when a class loader receives a request to load a class, it does not try to load the class itself first but delegates the request to its parent class loader. This happens at every level, so all requests eventually reach the top-level bootstrap class loader. Only when the parent loader reports that it cannot complete the request does the child loader try to load the class itself.
An obvious benefit of the parent delegation model is that Java classes acquire a priority hierarchy along with their class loaders. For example, the class java.lang.Object is always loaded by the loader at the top of the model (the bootstrap class loader), whichever loader receives the request. Even if a user writes their own java.lang.Object class, it can be compiled normally but can never be loaded and run.
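The loader hierarchy of a running program can be inspected by following getParent() from the application class loader. This is a small sketch with hypothetical names (LoaderChainDemo, chainFrom). The concrete loader class names depend on the JDK version; on JDK 9 and later the middle tier is the platform class loader rather than the extension class loader.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChainDemo {
    // Collects the loader class names from the given loader up to the bootstrap loader,
    // which getParent() reports as null.
    static List<String> chainFrom(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap");
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(chainFrom(ClassLoader.getSystemClassLoader()));
        // java.lang.Object is loaded by the bootstrap loader, so its loader is reported as null.
        System.out.println(Object.class.getClassLoader());
    }
}
```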
[Figure: source code of the loadClass() method of java.lang.ClassLoader]
The logic of this code is: first check whether the requested class has already been loaded. If not, call the loadClass() method of the parent loader; if the parent loader is null, use the bootstrap class loader as the parent. If the parent loader fails to load the class and throws ClassNotFoundException, call the loader's own findClass() method to load it.
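The delegation logic just described can be sketched in a custom loader. This is a simplified paraphrase under stated assumptions, not the JDK source: DelegatingLoader and tryLoad are hypothetical names, a non-null parent is required (the real method falls back to the bootstrap loader through an internal call that subclasses cannot use), and findClass() is left at its default, which always throws ClassNotFoundException.

```java
import java.util.Objects;

class DelegatingLoader extends ClassLoader {
    DelegatingLoader(ClassLoader parent) {
        super(Objects.requireNonNull(parent, "this sketch requires a non-null parent"));
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Check whether this loader has already loaded the class.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // 2. Delegate to the parent loader first.
                    c = getParent().loadClass(name);
                } catch (ClassNotFoundException e) {
                    // The parent could not load the class; fall through to findClass().
                }
                if (c == null) {
                    // 3. Only now try to find the class ourselves.
                    c = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    // Convenience helper for experiments: returns null instead of throwing.
    static Class<?> tryLoad(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = new DelegatingLoader(ClassLoader.getSystemClassLoader());
        System.out.println(tryLoad(loader, "java.lang.String") == String.class);
        System.out.println(tryLoad(loader, "no.such.Type"));
    }
}
```

Putting custom loading logic in findClass() and leaving loadClass() alone keeps a loader inside the parent delegation model.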

(3) Breaking the parent delegation model
Up to the advent of Java modularity, the parent delegation model had been "broken" on a large scale three times.
The first time: this happened before the parent delegation model appeared, that is, before JDK 1.2. The concept of class loaders and the abstract class java.lang.ClassLoader already existed in the first version of Java, so when the parent delegation model was introduced, compromises had to be made for the user-defined class loader code that already existed. Users could only be guided to put their loading logic in the findClass() method rather than overriding loadClass().
The second time: in the parent delegation model, the more fundamental a class is, the higher the class loader that loads it. But when a fundamental type needs to call back into user code (as with the JNDI service), the bootstrap class loader cannot recognize that user code. To resolve this dilemma, a less elegant design was introduced: the thread context class loader (Thread Context ClassLoader). Code related to a service provider interface can be loaded through the thread context class loader. In effect, the parent class loader asks a child class loader to complete the class loading, which violates the general principle of the parent delegation model.
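The thread context class loader is read and set through Thread.getContextClassLoader() and Thread.setContextClassLoader(). The sketch below shows one common usage pattern, installing a loader for the duration of a call and restoring the previous one afterwards; ContextLoaderDemo and withContextLoader are hypothetical names. java.util.ServiceLoader.load(Class) is one standard API that uses the current thread's context class loader to find providers.

```java
import java.util.function.Supplier;

class ContextLoaderDemo {
    // Runs the action with the given loader installed as the thread context class loader,
    // then restores the previous loader even if the action throws.
    static <T> T withContextLoader(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        // A stand-in for an application-level loader (assumption for illustration).
        ClassLoader custom = new ClassLoader(ClassLoader.getSystemClassLoader()) { };
        ClassLoader seen = withContextLoader(custom,
                () -> Thread.currentThread().getContextClassLoader());
        System.out.println("action saw the custom loader: " + (seen == custom));
    }
}
```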
The third time: this was caused by users' pursuit of program dynamism, such as hot deployment of modules and hot replacement of code.
One of the standard proposals for hot deployment is OSGi. The key to modular hot deployment in OSGi is its custom class loader mechanism. In an OSGi environment, class loaders no longer follow the tree structure of the parent delegation model; class lookup proceeds through a more complex network structure.


Origin blog.csdn.net/weixin_45841848/article/details/132594533