JVM - class loading, linking and initialization

Please indicate the source for reprinting: https://blog.csdn.net/l1028386804/article/details/80068986

In this article we discuss the loading, linking and initialization of Java classes. Java byte code is represented as a byte array (byte[]), while a Java class inside the JVM is represented by an object of the java.lang.Class class. To get from byte code to something usable in the JVM, a Java class has to go through three steps: loading, linking and initialization. Of these three steps, the one directly visible to developers is loading. With a Java class loader, a Java class can be loaded dynamically at runtime, while linking and initialization are actions that take place before the class is used. This article describes the loading, linking and initialization of Java classes in detail.

1. Loading of Java classes

The loading of Java classes is done by class loaders. Generally speaking, class loaders fall into two categories: the bootstrap class loader and user-defined class loaders. The difference is that the bootstrap class loader is implemented in the JVM's native code, while a user-defined class loader extends the java.lang.ClassLoader class. The JVM generally ships with some basic implementations of user-defined class loaders, and application developers can also write their own as needed. The one used most often is the system class loader, which starts the loading of a Java application's classes. It can be obtained through the getSystemClassLoader() method of java.lang.ClassLoader.

The final job of a class loader is to define a Java class, that is, to convert Java byte code into an object of the java.lang.Class class in the JVM. But the process of class loading is not that simple. Java class loaders have two more important features: hierarchical organization and the delegation model. Hierarchical organization means that each class loader has a parent class loader, which can be obtained through the getParent() method. Class loaders are organized in this parent-child fashion, forming a tree-like hierarchy. The delegation model means that a class loader can either define a Java class itself or delegate the work to another class loader. Because of delegation, the class loader that starts the loading of a class and the class loader that finally defines it may not be the same. The former is called the initiating class loader, the latter the defining class loader. The two are related as follows: the defining class loader of a Java class is the initiating class loader for the other Java classes that class references.
For example, if class A references class B, the defining class loader of class A is responsible for starting the loading of class B. A normal class loader first delegates to its parent class loader before attempting to load a Java class itself, and only tries to load the class itself when the parent cannot find it. This logic is encapsulated in the loadClass() method of the java.lang.ClassLoader class. In general, the parent-first strategy is good enough. In some cases it may be necessary to take the opposite strategy, that is, to try to load the class yourself first and delegate to the parent class loader only if it cannot be found. This practice is common in Java web containers and is recommended by the Servlet specification. For example, Apache Tomcat gives each web application its own class loader, which uses a self-first loading strategy. IBM WebSphere Application Server lets a web application choose the strategy used by its class loader.
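
The parent-child hierarchy described above can be inspected directly with getSystemClassLoader() and getParent(). The following is a minimal sketch; the class name LoaderChain and the helper chain() are made up for this illustration, and the concrete loaders it prints depend on the JDK version in use.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Walks from the given loader up through getParent() until the parent is null,
    // which is how the bootstrap class loader is represented in the API.
    static List<ClassLoader> chain(ClassLoader start) {
        List<ClassLoader> result = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            result.add(cl);
        }
        return result;
    }

    public static void main(String[] args) {
        // Print the system class loader and each of its ancestors.
        for (ClassLoader cl : chain(ClassLoader.getSystemClassLoader())) {
            System.out.println(cl);
        }
        // Core classes are defined by the bootstrap loader, so this prints "null".
        System.out.println(String.class.getClassLoader());
    }
}
```

The last line shows delegation at work: the application asks for java.lang.String through the system class loader (the initiating loader), but the class is defined by the bootstrap loader.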

An important use of class loaders is to create isolated spaces in the JVM for Java classes with the same name. When the JVM decides whether two classes are the same, it looks not only at the binary name of the class but also at its defining class loader. Two classes are considered the same only if both match. Therefore, even when the same Java byte code is defined by two different class loaders, the resulting Java classes are different. Attempting to assign an object of one of these classes to a variable of the other throws a java.lang.ClassCastException. This feature allows Java classes with the same name to coexist in the JVM. In practice, an application may need different versions of a Java class with the same name to exist in the JVM at the same time, and class loaders make this possible. The technique is widely used in OSGi.
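
The isolation described above can be reproduced in a few lines. The sketch below is only an illustration: IsolationDemo, Payload and CopyingLoader are hypothetical names, and the code assumes Java 9 or later (for readAllBytes) and that the compiled class files are reachable as resources on the class path. Each CopyingLoader has the bootstrap loader as its parent, so the parent can never find Payload and every loader defines its own copy from the same bytes.

```java
import java.io.IOException;
import java.io.InputStream;

class IsolationDemo {
    // Hypothetical class that will be defined several times from the same byte code.
    public static class Payload {}

    // Hypothetical loader whose parent is the bootstrap loader (null), so that
    // application classes always reach findClass() and are defined here.
    static class CopyingLoader extends ClassLoader {
        CopyingLoader() {
            super(null);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = IsolationDemo.class.getClassLoader().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] b = in.readAllBytes();
                return defineClass(name, b, 0, b.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Defines a fresh copy of Payload in a brand-new loader.
    static Class<?> freshCopy() {
        try {
            return new CopyingLoader().loadClass(Payload.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if casting an instance of a fresh copy to Payload fails.
    static boolean castFails() {
        try {
            Object obj = freshCopy().getDeclaredConstructor().newInstance();
            Payload p = (Payload) obj; // same name, different defining loader
            return false;
        } catch (ClassCastException e) {
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> a = freshCopy();
        Class<?> b = freshCopy();
        System.out.println(a.getName().equals(b.getName())); // true
        System.out.println(a == b);                          // false
        System.out.println(a == Payload.class);              // false
        System.out.println(castFails());                     // true
    }
}
```
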
2. Linking of Java classes

Linking a Java class is the process of merging the binary code of the class into the running state of the JVM. The class must have been loaded successfully before it can be linked. Linking consists of three steps: verification, preparation and resolution. Verification ensures that the binary representation of the Java class is structurally correct; if verification fails, a java.lang.VerifyError is thrown. Preparation creates the static fields of the Java class and sets them to their default values; it does not execute any code. A Java class contains symbolic references to other classes and interfaces, including its superclass, the interfaces it implements, and the types of its method parameters and return values. Resolution ensures that these referenced classes can be found correctly, and it may cause other Java classes to be loaded. Different JVM implementations may choose different resolution strategies. One approach is to resolve all symbolic references recursively at link time. Another is to resolve a symbolic reference only when it is actually needed. In other words, if a Java class is only referenced but never actually used, it may never be resolved. Consider the following code:

public class LinkTest {
	public static void main(String[] args) {
		ToBeLinked toBeLinked = null;
		System.out.println("Test link.");
	}
}

The class LinkTest references the class ToBeLinked but does not actually use it: it only declares a variable, and neither creates an instance of the class nor accesses its static fields. In Oracle's JDK 6, if you delete the compiled byte code of ToBeLinked and then run LinkTest, the program runs without error. Because ToBeLinked is never actually used, the linking strategy adopted by Oracle's JDK 6 never loads it, and so the JVM never discovers that the byte code for ToBeLinked is missing. If you change the code to ToBeLinked toBeLinked = new ToBeLinked(); and run it in the same way, a java.lang.NoClassDefFoundError is thrown, because ToBeLinked is now actually used and has to be loaded.
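
The experiment of deleting ToBeLinked's class file can be approximated in code with a class loader that pretends one class is missing. This is a sketch under stated assumptions, not the article's original test: LazyLinkDemo, Holder, Missing and HidingLoader are hypothetical names, Java 9 or later is assumed (for readAllBytes), and the compiled class files must be reachable as resources on the class path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.InvocationTargetException;

class LazyLinkDemo {
    // Stand-in for ToBeLinked; HidingLoader below behaves as if its class file were deleted.
    public static class Missing {}

    public static class Holder {
        public static void declareOnly() {
            Missing m = null; // declared but never used
        }

        public static void instantiate() {
            new Missing();    // forces the reference to Missing to be resolved
        }
    }

    // Defines Holder itself, refuses to find Missing, delegates everything else to the parent.
    static class HidingLoader extends ClassLoader {
        HidingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (name.equals(Missing.class.getName())) {
                throw new ClassNotFoundException(name);
            }
            if (!name.equals(Holder.class.getName())) {
                return super.loadClass(name, resolve);
            }
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                String path = name.replace('.', '/') + ".class";
                try (InputStream in = getParent().getResourceAsStream(path)) {
                    if (in == null) {
                        throw new ClassNotFoundException(name);
                    }
                    byte[] bytes = in.readAllBytes();
                    c = defineClass(name, bytes, 0, bytes.length);
                } catch (IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
            return c;
        }
    }

    // Runs the named Holder method inside a HidingLoader and returns what it threw, or null.
    static Throwable run(String method) {
        try {
            ClassLoader loader = new HidingLoader(LazyLinkDemo.class.getClassLoader());
            Class<?> holder = loader.loadClass(Holder.class.getName());
            holder.getMethod(method).invoke(null);
            return null;
        } catch (InvocationTargetException e) {
            return e.getCause();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("declareOnly") == null);                       // true
        System.out.println(run("instantiate") instanceof NoClassDefFoundError); // true
    }
}
```

Only the method that actually uses Missing fails; the method that merely declares a variable of that type completes normally, which matches the behavior described for LinkTest.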

3. Initialization of Java classes

When a Java class is actually used for the first time, the JVM initializes it. Initialization mainly consists of running static initializer blocks and initializing static fields. Before a class is initialized, its direct superclass must be initialized as well. Initializing an interface, however, does not initialize its superinterfaces. During initialization, static initializer blocks and static field initializers are executed in the order in which they appear in the source code, from top to bottom. Consider the following code:

public class StaticTest {
	public static int X = 10;
	public static void main(String[] args) {
		System.out.println(Y); //output 60
	}
	static {
		X = 30;
	}
	public static int Y = X * 2;
}

In the above code, the static field initializers and the static initializer block are executed in source order, from top to bottom. The variable X is therefore first initialized to 10 and then assigned 30, and the variable Y is initialized to 60.

Initialization of Java classes and interfaces occurs only at specific times, including:

Creating an instance of the Java class. For example:

MyClass obj = new MyClass();

Calling a static method of the Java class. For example:

MyClass.sayHello();

Assigning a value to a static field declared in the Java class or interface. For example:

MyClass.value = 10;

Accessing a static field declared in the Java class or interface, where the field is not a constant variable. For example:

int value = MyClass.value;

Executing an assert statement lexically nested within a top-level Java class.

Classes and interfaces may also be initialized through the Java reflection API. Note that when a static field of a Java class or interface is accessed, only the class or interface that actually declares the field is initialized. Consider the following code:

package com.lyz.test;

class B {
	static int value = 100;
	static {
		System.out.println("Class B is initialized."); // will output
	}
}

class A extends B {
	static {
		System.out.println("Class A is initialized."); // will not output
	}
}

public class InitTest {
	public static void main(String[] args) {
		System.out.println(A.value); // output 100
	}
}

In the above code, the class InitTest refers to the static field value declared in class B through A.value. Since value is declared in class B, only class B will be initialized, and class A will not be initialized.
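
Two of the rules above are easy to miss: reading a constant variable does not initialize its class, and reflection initializes a class only when asked to. The sketch below illustrates both; InitTriggerDemo, Constants, Lazy and trace() are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

class InitTriggerDemo {
    // Records the order in which the static initializers below run.
    static final List<String> LOG = new ArrayList<>();

    static class Constants {
        static final int MAX = 10; // constant variable: the compiler inlines its value
        static int counter = 0;    // ordinary static field
        static {
            LOG.add("Constants");
        }
    }

    static class Lazy {
        static {
            LOG.add("Lazy");
        }
    }

    // Performs four accesses and returns a snapshot of LOG taken after each one.
    static List<String> trace() {
        List<String> snapshots = new ArrayList<>();
        try {
            int max = Constants.MAX;             // constant variable: no initialization
            snapshots.add(LOG.toString());
            Class.forName(Lazy.class.getName(), false,
                    InitTriggerDemo.class.getClassLoader()); // load only, initialize = false
            snapshots.add(LOG.toString());
            int n = Constants.counter;           // non-constant static field: initializes Constants
            snapshots.add(LOG.toString());
            Class.forName(Lazy.class.getName()); // reflection with initialization
            snapshots.add(LOG.toString());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return snapshots;
    }

    public static void main(String[] args) {
        // Prints [[], [], [Constants], [Constants, Lazy]]
        System.out.println(trace());
    }
}
```

A class is initialized at most once, so the snapshots only look like this the first time trace() runs in a given JVM.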

4. Create your own class loader

During Java application development it may be necessary to create a class loader of the application's own. Typical scenarios include implementing a specific way of locating Java byte code, encrypting or decrypting byte code, and isolating Java classes with the same name. Creating your own class loader is not complicated: you only need to extend the java.lang.ClassLoader class and override the appropriate methods. java.lang.ClassLoader provides many methods. Here are the ones to consider when creating a class loader:

defineClass(): This method is used to complete the conversion from the byte array of Java byte code to java.lang.Class. This method cannot be overridden and is generally implemented in native code.
findLoadedClass(): This method is used to find an already loaded Java class by name. A class loader does not load a class with the same name more than once.
findClass(): This method is used to find and load Java classes by name.
loadClass(): This method is used to load a Java class by name.
resolveClass(): This method is used to link a Java class.

The part that tends to cause confusion is the difference between the findClass() and loadClass() methods. As mentioned earlier, linking a Java class involves resolution, and resolution may cause other Java classes referenced by the current class to be loaded. When that happens, the JVM loads those classes by calling the loadClass() method of the current class's defining class loader. The findClass() method is the extension point for class loaders created by applications. An application with its own class loader should override findClass() to add its custom class loading logic; the default implementation of loadClass() takes care of calling findClass(). As mentioned earlier, class loader delegation uses the parent-first strategy by default. That strategy is implemented in the loadClass() method, so if you want to change it you need to override loadClass().

The following code shows a common implementation pattern for custom class loading:

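public class MyClassLoader extends ClassLoader {
	@Override
	protected Class<?> findClass(String name) throws ClassNotFoundException {
		byte[] b = null; // find or generate the byte code of the Java class
		if (b == null) {
			// No byte code available for this name: report the class as not found
			throw new ClassNotFoundException(name);
		}
		return defineClass(name, b, 0, b.length);
	}
}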
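
As noted above, changing the parent-first policy means overriding loadClass(). The following is a minimal sketch of a self-first loader, not how any particular container implements it. ChildFirstLoaderDemo, Plugin, ChildFirstClassLoader and the ownedPrefix parameter are hypothetical; the code assumes Java 9 or later (for readAllBytes) and that the compiled class files are reachable as resources through the parent loader.

```java
import java.io.IOException;
import java.io.InputStream;

class ChildFirstLoaderDemo {
    // Hypothetical application class that the custom loader should define itself.
    public static class Plugin {}

    static class ChildFirstClassLoader extends ClassLoader {
        private final String ownedPrefix; // only names with this prefix are defined here

        ChildFirstClassLoader(String ownedPrefix, ClassLoader parent) {
            super(parent);
            this.ownedPrefix = ownedPrefix;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    try {
                        c = findClass(name);              // try to load it ourselves first
                    } catch (ClassNotFoundException e) {
                        c = super.loadClass(name, false); // then fall back to the parent
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.startsWith(ownedPrefix)) {
                throw new ClassNotFoundException(name);
            }
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] b = in.readAllBytes();
                return defineClass(name, b, 0, b.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Loads the named class through a new self-first loader that owns only Plugin.
    static Class<?> load(String name) {
        ClassLoader loader = new ChildFirstClassLoader(
                Plugin.class.getName(), ChildFirstLoaderDemo.class.getClassLoader());
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> plugin = load(Plugin.class.getName());
        System.out.println(plugin == Plugin.class);                 // false: defined by the child
        System.out.println(load("java.lang.String") == String.class); // true: delegated to the parent
    }
}
```

Restricting findClass() to an owned prefix keeps core classes such as java.lang.Object on the normal delegation path, which defineClass() would enforce anyway for names starting with java.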