[In-depth Java Virtual Machine] Part 4: Class Loading Mechanism

Please indicate the source for reprint: http://blog.csdn.net/ns_code/article/details/17881581

 

Class loading process

    A class's life cycle runs from the moment it is loaded into virtual machine memory until it is unloaded from memory, and consists of seven stages: loading, verification, preparation, resolution, initialization, use, and unloading. The order in which they start is shown in the following diagram:

    The class loading process covers five stages: loading, verification, preparation, resolution (rendered as "parsing" in some translations), and initialization. Of these, loading, verification, preparation, and initialization always start in that fixed order. Resolution does not: in some cases it starts after initialization, which is what allows Java to support runtime binding (also known as dynamic binding or late binding). Note also that the stages only start in this order; they do not necessarily run or finish one after another, because one stage usually invokes or activates another while it is still executing.

    Here is a brief description of binding in Java: binding means associating a method call with the class (method body) that contains the method. In Java, binding is either static or dynamic:

 

  • Static binding: that is, early binding. The method is bound before the program runs, by the compiler or a linker. For Java, it can be understood simply as binding at compile time. In Java, only final, static, and private methods and constructors are bound early.
  • Dynamic binding: that is, late binding, also known as runtime binding. The method is bound at run time according to the type of the concrete object. In Java, almost all methods are late bound.
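
    The difference between the two kinds of binding can be seen in a small sketch. The class and method names below (BindingDemo, Animal, Dog, kind, sound) are made up for illustration and do not come from the article. A static method is chosen from the declared type of the variable, while an overridden instance method is chosen from the actual object:

```java
class BindingDemo {
    static class Animal {
        static String kind() { return "animal"; }   // static: bound at compile time
        String sound() { return "..."; }            // instance method: bound at run time
    }

    static class Dog extends Animal {
        static String kind() { return "dog"; }      // hides Animal.kind(), does not override it
        @Override
        String sound() { return "woof"; }           // overrides Animal.sound()
    }

    static String staticCall() {
        Animal a = new Dog();
        return a.kind();    // resolved from the declared type Animal (javac may warn about this style)
    }

    static String virtualCall() {
        Animal a = new Dog();
        return a.sound();   // resolved from the runtime type Dog
    }

    public static void main(String[] args) {
        System.out.println(staticCall());   // animal
        System.out.println(virtualCall());  // woof
    }
}
```

    Calling a static method through an instance reference is used here only to make the contrast visible; in ordinary code it would be written as Animal.kind().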
    The following describes in detail what each stage of the class loading process does.
 

   Loading

 

    Loading is the first stage of the class loading process. During this stage, the virtual machine needs to do three things:

    1. Obtain the binary byte stream that defines a class, using the class's fully qualified name.

    2. Convert the static storage structure represented by this byte stream into the runtime data structure of the method area.

    3. Generate a java.lang.Class object representing this class in the Java heap as the access entry to the data in the method area.

    Note that the binary byte stream in item 1 does not have to come from a Class file. It can also be read from a JAR package, fetched over the network (the most typical application being applets), or generated from other files (as in JSP applications), among other sources.

    Compared with the other stages of class loading, the loading stage (more precisely, the action of obtaining the class's binary byte stream during loading) is the one developers can control most: loading can be done either with a class loader provided by the system or with a class loader you define yourself.

    When the loading stage is complete, the binary byte stream from outside the virtual machine is stored in the method area in the format the virtual machine requires, and a java.lang.Class object is created in the Java heap, through which the data in the method area can be accessed.

    When it comes to loading, we have to mention the class loader, and the class loader will be described in detail below.

    Although the class loader only implements the loading action, its role in a Java program goes well beyond the class loading stage. A class's identity within the Java virtual machine is established by the class itself together with the class loader that loaded it. In other words, even if two classes come from the same Class file, they are not equal as long as they were loaded by different class loaders. "Equal" here covers the results of the equals(), isAssignableFrom(), isInstance() and similar methods of the Class object that represents the class, as well as the result of an instanceof check on an object's type.
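
    The following is a minimal sketch of that rule, not code from the article. LoaderIdentityDemo and SiblingFirstLoader are hypothetical names, and the sketch assumes Java 9 or later (for readAllBytes) and that the compiled .class file can be read as a resource next to the demo class. The custom loader deliberately loads the demo class itself instead of delegating, so the same Class file ends up defined twice:

```java
import java.io.IOException;
import java.io.InputStream;

class LoaderIdentityDemo {
    /** Defines classes found next to this demo's own .class file itself; delegates everything else. */
    static class SiblingFirstLoader extends ClassLoader {
        @Override
        public Class<?> loadClass(String name) throws ClassNotFoundException {
            String file = name.substring(name.lastIndexOf('.') + 1) + ".class";
            try (InputStream in = LoaderIdentityDemo.class.getResourceAsStream(file)) {
                if (in == null) {
                    return super.loadClass(name);      // e.g. java.lang.Object: normal delegation
                }
                byte[] bytes = in.readAllBytes();      // assumption: Java 9+
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads this demo class a second time through a fresh SiblingFirstLoader. */
    static Class<?> reload() {
        try {
            return new SiblingFirstLoader().loadClass(LoaderIdentityDemo.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> other = reload();
        System.out.println("same name:       " + other.getName().equals(LoaderIdentityDemo.class.getName()));
        System.out.println("same Class:      " + (other == LoaderIdentityDemo.class));
        System.out.println("assignable from: " + LoaderIdentityDemo.class.isAssignableFrom(other));
    }
}
```

    When the class file is found, the two Class objects share a name but are expected to compare as different classes, because they were defined by different loaders.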

    From the perspective of the Java virtual machine, there are only two different class loaders:

 

  • Bootstrap (startup) class loader: implemented in C++ and part of the virtual machine itself. This applies to HotSpot, the default virtual machine since JDK 1.5; a number of other virtual machines are implemented in Java.
  • All other class loaders: implemented in Java, independent of the virtual machine, and all derived from the abstract class java.lang.ClassLoader. These class loaders must themselves be loaded into memory by the bootstrap class loader before they can load other classes.

 

    From a Java developer's point of view, class loaders can be roughly divided into the following three categories:

 

  • Bootstrap class loader (Bootstrap ClassLoader): the same loader as described above. It loads the class libraries that are stored in JDK\jre\lib (JDK stands for the JDK installation directory, here and below) or on the path given by the -Xbootclasspath parameter and that the virtual machine recognizes, such as rt.jar; all classes whose names begin with java.* are loaded by the bootstrap class loader. A Java program cannot reference the bootstrap class loader directly.
  • Extension class loader (Extension ClassLoader): implemented by sun.misc.Launcher$ExtClassLoader. It loads all class libraries in the JDK\jre\lib\ext directory or on the path given by the java.ext.dirs system property (for example, optional packages under javax.*). Developers can use the extension class loader directly.
  • Application class loader (Application ClassLoader): implemented by sun.misc.Launcher$AppClassLoader. It loads the classes found on the user class path (ClassPath). Developers can use this class loader directly, and if the application has not defined its own class loader, it is normally the program's default class loader.
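
    The loaders actually present in a running JVM can be listed with a few lines of code. This is a small sketch with a hypothetical class name (LoaderHierarchyDemo); the names it prints depend on the JDK. On JDK 8 and earlier they should be the sun.misc.Launcher classes named above, while on JDK 9 and later the extension loader has been replaced by a platform class loader, so different implementation class names appear.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderHierarchyDemo {
    /** Walks getParent() upward from the given loader; the bootstrap loader is represented by null. */
    static List<String> chain(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        names.add("bootstrap (null)");
        return names;
    }

    public static void main(String[] args) {
        // Starts at the application class loader and prints each parent in turn.
        chain(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        // Core classes report null: the bootstrap loader has no Java object that could be referenced.
        System.out.println(String.class.getClassLoader());
    }
}
```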

 

     The application is loaded by these three class loaders working together, and custom class loaders can be added where needed. The class loaders that ship with the JVM only know how to load standard Java class files from the local file system; by writing your own ClassLoader you can also do the following:

 1) Automatically verify digital signatures before executing untrusted code.

 2) Dynamically generate classes built to meet user-specific needs.

 3) Obtain Java classes from a specific source, such as a database or the network.

In fact, applets use a dedicated ClassLoader, because they have to load Java classes from the network and check the relevant security information. Most application servers also rely on custom ClassLoader techniques.
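
    As an illustration of the third case, here is a minimal sketch of a custom loader that reads class files from a directory of its own. DirectoryClassLoader and its constructor parameters are hypothetical names, not an API from the JDK or from the article. It overrides only findClass, so the standard delegation to the parent loader stays in place and the directory is consulted only for classes the parent cannot find.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class DirectoryClassLoader extends ClassLoader {
    private final Path root;   // directory that holds compiled .class files

    DirectoryClassLoader(Path root, ClassLoader parent) {
        super(parent);
        this.root = root;
    }

    /** Called by loadClass() only after the parent loader has failed to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        Path file = root.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] bytes = Files.readAllBytes(file);            // step 1: obtain the byte stream
            return defineClass(name, bytes, 0, bytes.length);   // steps 2 and 3: hand it to the VM
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java DirectoryClassLoader <directory> <fully.qualified.ClassName>");
            return;
        }
        ClassLoader loader = new DirectoryClassLoader(Paths.get(args[0]), ClassLoader.getSystemClassLoader());
        Class<?> c = loader.loadClass(args[1]);
        System.out.println(c + " loaded by " + c.getClassLoader());
    }
}
```

    The same shape works for a database or network source: only the code that produces the byte array changes.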

 

    The hierarchical relationship of these class loaders is shown in the following figure:

    This hierarchy is called the parent delegation model of class loaders. The class loader one layer above a given loader is called its parent loader. The parent-child relationship is not implemented through inheritance; instead, composition is used to reuse the parent loader's code. The model was introduced in JDK 1.2 and has since been used in almost all Java programs, but it is not a mandatory constraint, only a way of implementing class loaders that the Java designers recommend to developers.

    The parent delegation model works as follows: when a class loader receives a class loading request, it does not try to load the class itself first, but delegates the request to its parent loader, and the same happens at every level. All class loading requests therefore eventually reach the top-level bootstrap (startup) class loader. Only when the parent loader cannot find the requested class within its search scope, that is, when it cannot complete the load, does the child loader try to load the class itself.
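
    The workflow can be made visible by logging each step. The sketch below uses hypothetical names (DelegationTrace, TracingLoader) and relies on the standard behavior of java.lang.ClassLoader.loadClass, which asks the parent first and calls findClass only if the parent fails. The tracing loaders have no classes of their own, so findClass always gives up.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationTrace {
    /** Records when it is asked for a class and when it has to search on its own. */
    static class TracingLoader extends ClassLoader {
        private final String label;
        private final List<String> trace;

        TracingLoader(String label, ClassLoader parent, List<String> trace) {
            super(parent);
            this.label = label;
            this.trace = trace;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            trace.add(label + ": asked");
            return super.loadClass(name, resolve);    // standard logic: parent first, then findClass
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            trace.add(label + ": searching itself");
            throw new ClassNotFoundException(name);   // this sketch has nothing to offer
        }
    }

    static List<String> traceFor(String className) {
        List<String> trace = new ArrayList<>();
        ClassLoader parent = new TracingLoader("parent", ClassLoader.getSystemClassLoader(), trace);
        ClassLoader child = new TracingLoader("child", parent, trace);
        try {
            child.loadClass(className);
        } catch (ClassNotFoundException e) {
            trace.add("not found");
        }
        return trace;
    }

    public static void main(String[] args) {
        // [child: asked, parent: asked]
        System.out.println(traceFor("java.lang.String"));
        // [child: asked, parent: asked, parent: searching itself, child: searching itself, not found]
        System.out.println(traceFor("no.such.Type"));
    }
}
```

    For java.lang.String neither tracing loader ever searches on its own, because the request is satisfied further up the chain.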

    Organizing class loaders with the parent delegation model has an obvious advantage: a Java class acquires a priority hierarchy together with its class loader (put plainly, together with the directory it lives in). This hierarchy is important for the stable operation of Java programs. For example, the class java.lang.Object is stored in rt.jar under JDK\jre\lib, so whichever class loader is asked to load it, the request is ultimately delegated to the bootstrap (startup) class loader. This guarantees that Object is the same class in every class loader environment of the program.

 

 

   Verification

    The purpose of verification is to ensure that the information in the byte stream of the Class file meets the requirements of the current virtual machine and does not endanger the security of the virtual machine itself. Different virtual machines may implement class verification differently, but they generally complete the following four stages: file format verification, metadata verification, bytecode verification, and symbolic reference verification.

 

 

  • File format verification: checks that the byte stream conforms to the Class file format specification and can be processed by the current version of the virtual machine. Its main purpose is to ensure that the input byte stream can be parsed correctly and stored in the method area. After this stage, the byte stream is stored in the method area in memory, and the following three stages all work on the storage structure in the method area.
  • Metadata verification: performs semantic checks on the class's metadata (in effect, checks each data type in the class against the language rules) to ensure that no metadata violates the Java language specification.
  • Bytecode verification: the main work at this stage is data-flow and control-flow analysis of the class's method bodies, to ensure that the methods of the verified class will not do anything at run time that endangers the security of the virtual machine.
  • Symbolic reference verification: the last stage of verification. It takes place when the virtual machine converts symbolic references into direct references (this conversion happens in the resolution phase, explained later) and mainly performs matching checks on information outside the class itself (the various symbolic references in the constant pool).
 

 

 

   Preparation

    The preparation phase formally allocates memory for class variables and sets their initial values; all of this memory is allocated in the method area. A few points about this stage are worth noting:

 

    1. At this time, memory allocation only includes class variables (static), not instance variables. Instance variables will be allocated in the Java heap along with the object when the object is instantiated.

    2. The initial value set here is usually the default zero value of the data type (such as 0, 0L, null, false, etc.), rather than the value explicitly assigned in the Java code.

   Suppose a class variable is defined as:

public static int value = 3;

    After the preparation phase, the initial value of value is 0, not 3. No Java method has been executed yet, and the putstatic instruction that assigns 3 to value is placed in the class constructor method <clinit>() when the program is compiled, so the assignment of 3 is not performed until the initialization phase.
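
    The zero value left by the preparation phase can be observed from Java code, as in the following sketch (PreparationDemo, seenEarly, and peek are illustrative names). Static initializers run from top to bottom inside <clinit>(), so a method called by an earlier initializer reads value before the assignment of 3 has run:

```java
class PreparationDemo {
    // Runs first inside <clinit>(): at this point "value = 3" has not been executed yet.
    static int seenEarly = peek();
    static int value = 3;

    static int peek() {
        return value;   // reads the zero value set during preparation
    }

    public static void main(String[] args) {
        System.out.println(seenEarly); // 0
        System.out.println(value);     // 3
    }
}
```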

    The following table lists all the basic data types in Java and the default zero value for reference types:

   Here are a few more things to note:

 

  • For primitive types: class variables (static) and instance fields that are used without an explicit assignment are given the default zero value by the system, whereas a local variable must be explicitly assigned before use, otherwise the code does not compile.
  • A constant declared both static and final must be explicitly assigned, either in its declaration or in a static initializer block, otherwise the code does not compile. A field declared only final can be assigned either in its declaration or when an instance is constructed (in an instance initializer or a constructor). In short, it must be explicitly assigned before use; the system does not assign it a default zero value.
  • For reference types, such as array references and object references, a variable that is used without an explicit assignment is given the default zero value by the system, that is, null.
  • If the elements of an array are not assigned when the array is created, each element is given the default zero value of the corresponding data type.
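
    These defaults are easy to check. The sketch below (ZeroValuesDemo and its members are illustrative names) declares fields of several types without assigning them and reads them back, together with the elements of freshly created arrays:

```java
class ZeroValuesDemo {
    static byte b; static short s; static int i; static long l;
    static float f; static double d; static char c; static boolean flag;
    static Object ref;      // reference type: defaults to null
    int instanceField;      // instance fields get zero values as well

    static String fieldDefaults() {
        // char is printed as its numeric code, because '\u0000' is not a visible character
        return b + " " + s + " " + i + " " + l + " " + f + " " + d + " " + (int) c + " " + flag + " " + ref;
    }

    static String arrayDefaults() {
        int[] numbers = new int[2];
        String[] names = new String[2];
        return numbers[0] + " " + names[0];
    }

    public static void main(String[] args) {
        System.out.println(fieldDefaults());                      // 0 0 0 0 0.0 0.0 0 false null
        System.out.println(new ZeroValuesDemo().instanceField);   // 0
        System.out.println(arrayDefaults());                      // 0 null
        // int local; System.out.println(local);  // would not compile: local might not have been initialized
    }
}
```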

 

    3. If the field's attribute table contains a ConstantValue attribute, which is the case for a field declared both final and static whose value is a compile-time constant of primitive or String type, then in the preparation phase the variable is initialized to the value given by the ConstantValue attribute.

   Suppose the above class variable value is defined as: 

public static final int value = 3;

    At compile time, javac generates a ConstantValue attribute for value, and in the preparation phase the virtual machine sets value to 3 according to that attribute. This is the situation in the second example of passive references in the previous blog post: a static final constant's value is placed, at compile time, into the constant pool of the class that uses it.
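
    A consequence of this is that reading such a constant does not initialize the class that declares it. The following sketch illustrates the effect; ConstantDemo, Holder, VALUE, BOXED, and LOG are hypothetical names. VALUE is a compile-time constant, while BOXED is final and static but not a constant expression, so it is assigned in <clinit>():

```java
class ConstantDemo {
    static final StringBuilder LOG = new StringBuilder();

    static class Holder {
        static final int VALUE = 3;                        // compile-time constant
        static final Integer BOXED = Integer.valueOf(3);   // not a constant: assigned in <clinit>()
        static {
            LOG.append("Holder initialized");
        }
    }

    static String logAfterReadingConstant() {
        int v = Holder.VALUE;       // compiled as the literal 3; Holder is not touched
        return LOG + "|" + v;
    }

    static String logAfterReadingBoxed() {
        Integer v = Holder.BOXED;   // a real static field read: triggers Holder's initialization
        return LOG + "|" + v;
    }

    public static void main(String[] args) {
        System.out.println(logAfterReadingConstant());   // |3
        System.out.println(logAfterReadingBoxed());      // Holder initialized|3
    }
}
```

    The first line of output shows an empty log, because at that point Holder has not been initialized.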

 

 

   Resolution (parsing)

   The resolution phase is the process in which the virtual machine converts symbolic references in the constant pool into direct references. The differences and connections between symbolic references and direct references were covered in the article on the class file structure, so they are not repeated here. As mentioned earlier, resolution may start before or after initialization. The virtual machine decides as needed whether to resolve the symbolic references in the constant pool when the class is loaded (before initialization) or to wait until a symbolic reference is about to be used (after initialization).
    It is common for the same symbolic reference to be resolved more than once. A virtual machine implementation may cache the result of the first resolution (recording the direct reference in the runtime constant pool and marking the constant as resolved) to avoid repeating the work.
    Resolution mainly applies to four kinds of symbolic references: classes or interfaces, fields, class methods, and interface methods. They correspond to the constant pool types CONSTANT_Class_info, CONSTANT_Fieldref_info, CONSTANT_Methodref_info, and CONSTANT_InterfaceMethodref_info, respectively.
 
    1. Class or interface resolution: determine whether the reference being converted refers to an array type or an ordinary object type, and resolve it accordingly.
    2. Field resolution: the virtual machine first checks whether the class itself contains a field whose simple name and field descriptor both match the target. If so, the search ends. If not, it searches each interface the class implements and their parent interfaces, recursively, moving up the inheritance hierarchy. If that also fails, it searches the parent classes recursively, moving up the inheritance hierarchy, until the search ends. The search process is shown in the following figure:
 
   It is easy to see the search order of field resolution from the execution result of the following piece of code:
    class Super{
        public static int m = 11;
        static{
            System.out.println( "The super class static statement block was executed");
        }
    }

    class Father extends Super{
        public static int m = 33;
        static{
            System.out.println( "Executed the parent class static statement block");
        }
    }

    class Child extends Father{
        static{
            System.out.println( "Executed the subclass static statement block");
        }
    }

    public class StaticTest{
        public static void main(String[] args){
            System.out.println(Child.m);
        }
    }
    The execution result is as follows:
    The super class static statement block was executed
    Executed the parent class static statement block
    33
    If you comment out the definition of m in the Father class, the output is as follows:
    The super class static statement block was executed
    11
   This is also the situation in the first example of the previous blog post. It can be analyzed as follows: a reference to a static variable is resolved statically, before initialization. By that point the symbolic reference to the field has been converted into a direct (memory) reference and associated with the class that actually declares the field. Since no field matching m is found in the subclass, m is not associated with the subclass, and accessing it does not trigger the subclass's initialization.
    Finally, note that although the lookup in theory follows the order above, in practice a compiler may be stricter than the specification requires: it may refuse to compile the code if a field with the same name appears both in one of the class's interfaces and in its parent class, or in several interfaces of the class itself or of its parent class. For example, if Super in the code above is changed to an interface and the Child class both extends Father and implements Super, compilation fails with the following error:
StaticTest.java:24: ambiguous reference to m, variable m in Father and variable m in Super
both match
                System.out.println(Child.m);
                                        ^
1 error
    3. Class method resolution: the search steps are similar to those for field resolution, with an extra step that checks whether the method belongs to a class or an interface. The matching search also runs in a different order: the parent classes are searched first, then the interfaces.
    4. Interface method resolution: similar to class method resolution, except that an interface has no parent class, so only the parent interfaces are searched recursively upward.
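
    The lookup order for fields (item 2 above) can be sketched with reflection. This is only an approximation written for illustration, not how the virtual machine is implemented: FieldLookupSketch and the nested types are hypothetical names, and the sketch matches on the field name only, whereas real resolution also compares the field descriptor.

```java
import java.lang.reflect.Field;

class FieldLookupSketch {
    interface Marker { int m = 11; }                       // interface fields are implicitly static final
    static class Father { public static int m = 33; }
    static class Child extends Father { }
    static class Both extends Father implements Marker { }

    /** Searches the class itself, then its interfaces (recursively), then its superclass (recursively). */
    static Field resolve(Class<?> c, String name) {
        if (c == null) {
            return null;                                    // walked past java.lang.Object: not found
        }
        for (Field f : c.getDeclaredFields()) {             // 1. the class itself
            if (f.getName().equals(name)) return f;
        }
        for (Class<?> iface : c.getInterfaces()) {          // 2. direct interfaces and their parents
            Field f = resolve(iface, name);
            if (f != null) return f;
        }
        return resolve(c.getSuperclass(), name);            // 3. the parent class chain
    }

    public static void main(String[] args) {
        System.out.println(resolve(Child.class, "m").getDeclaringClass().getSimpleName());  // Father
        System.out.println(resolve(Both.class, "m").getDeclaringClass().getSimpleName());   // Marker
        System.out.println(resolve(Child.class, "missing"));                                // null
    }
}
```

    The Both case is the ambiguous situation described above: javac would reject a source-level reference to Both.m, while the specified search order would find the interface's field first.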

    Initialization

    Initialization is the last step of the class loading process, and it is the stage at which the Java program code defined in the class actually starts to execute. In the preparation phase, class variables were given the initial values required by the system; in the initialization phase, class variables and other resources are initialized according to the plan the programmer laid out in the code. Put another way, the initialization phase is the process of executing the class constructor method <clinit>().
   Here is a brief description of the execution rules of the <clinit>() method:
    1. The <clinit>() method is generated by the compiler, which automatically collects the assignments to all class variables in the class and the statements in static statement blocks and merges them. The order of collection is determined by the order in which the statements appear in the source file. A static statement block can only read variables defined before it; variables defined after it can be assigned in the block, but not read.
    2. The <clinit>() method differs from the instance constructor <init>() (the class's constructor): it does not need to call the parent class's <clinit>() explicitly, because the virtual machine guarantees that the parent class's <clinit>() method has finished before the subclass's <clinit>() method runs. Therefore, the first class whose <clinit>() method is executed in the virtual machine must be java.lang.Object.
    3. The <clinit>() method is not mandatory for a class or interface. If a class has no static statement block and no assignments to class variables, the compiler need not generate a <clinit>() method for it.
    4. Interfaces cannot contain static statement blocks, but they can still have assignments that initialize variables (which are final static), so an interface, like a class, may get a <clinit>() method. Unlike a class, though, executing an interface's <clinit>() method does not require executing the parent interface's <clinit>() method first; the parent interface is initialized only when a variable it defines is used. Likewise, initializing a class that implements the interface does not execute the interface's <clinit>() method.
    5. The virtual machine ensures that a class's <clinit>() method is correctly locked and synchronized in a multi-threaded environment. If several threads try to initialize a class at the same time, only one of them executes the class's <clinit>() method; the others block and wait until that thread has finished. If a class's <clinit>() method contains a time-consuming operation, it can therefore block several threads, a problem that is often hard to spot in real applications.
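
    Rule 5 can be observed with two threads that race to use the same class. In this sketch (ClinitOnceDemo, Slow, and RUNS are illustrative names) the static block sleeps briefly to stand in for a time-consuming initializer; whichever thread arrives second has to wait, and the block still runs only once:

```java
import java.util.concurrent.atomic.AtomicInteger;

class ClinitOnceDemo {
    static final AtomicInteger RUNS = new AtomicInteger();

    static class Slow {
        static {
            RUNS.incrementAndGet();
            try {
                Thread.sleep(200);   // stands in for a time-consuming initializer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        static void touch() { }      // invoking a static method triggers initialization
    }

    /** Starts two threads that both use Slow, waits for them, and reports how often <clinit>() ran. */
    static int runsAfterRace() {
        Thread t1 = new Thread(() -> Slow.touch());
        Thread t2 = new Thread(() -> Slow.touch());
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return RUNS.get();
    }

    public static void main(String[] args) {
        System.out.println(runsAfterRace());   // 1
    }
}
```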
 
    A simple example is given below to illustrate the above rules more clearly:
    class Father{
        public static int a = 1;
        static{
            a = 2;
        }
    }

    class Child extends Father{
        public static int b = a;
    }

    public class ClinitTest{
        public static void main(String[] args){
            System.out.println(Child.b);
        }
    }
   Executing the above code will print 2, which means that the value of b is assigned to 2.
    Let's walk through how this result comes about. In the preparation phase, memory is allocated for the class variables and their initial values are set, so a and b both get the default value 0; the values specified in the program are assigned later, when the <clinit>() method is called. Accessing Child.b triggers Child's <clinit>() method. By rule 2, the <clinit>() method of its parent class Father must run first, and by rule 1, the static statements in a <clinit>() method (static statement blocks and static variable assignments) run in the order in which they appear in the code. So when Father's <clinit>() method runs, a is first assigned 1, then the static statement block assigns it 2. Child's <clinit>() method then runs and assigns b the value 2.
    If we swap the order of the "public static int a = 1;" statement and the static statement block in the Father class, the program prints 1. By rule 1, when Father's <clinit>() method runs, the static statement block is executed first and the "public static int a = 1;" statement second.
    In addition, once the order is swapped, reading a inside the static block (for example, assigning a to another variable) is a compile-time error, because by rule 1 the block may only assign to a, not read it.
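
    The swapped version can be written out as follows. It is a sketch that wraps the article's Father and Child classes in a hypothetical outer class, ReversedOrderDemo, so that it stays in one file:

```java
class ReversedOrderDemo {
    static class Father {
        static {
            a = 2;             // allowed: assigning to a field declared further down
            // int copy = a;   // not allowed here: reading a would be an illegal forward reference
        }
        public static int a = 1;   // runs after the static block, so this assignment wins
    }

    static class Child extends Father {
        public static int b = a;
    }

    public static void main(String[] args) {
        System.out.println(Child.b);   // 1
    }
}
```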


Summary

     Across the whole class loading process, the loading stage is the only one in which a user application can take part, by means of a custom class loader; all other actions are directed and controlled entirely by the virtual machine. The Java program code (bytecode) defined in the class starts executing only at initialization, and even then the code executed is limited to the <clinit>() method. Class loading is mainly about bringing the Class file (more precisely, the class's binary byte stream) into virtual machine memory; real execution of the bytecode begins only after loading is complete.

 
