JVM Notes: The Java Virtual Machine Class Loading Mechanism

Foreword

The virtual machine loads the data that describes a class from the Class file into memory, verifies it, converts and resolves it, and initializes it, finally producing a Java type that the virtual machine can use directly. This is the virtual machine's class loading mechanism.

  • Class loading process

    From the moment a class is loaded into virtual machine memory until it is unloaded, its life cycle comprises seven stages: loading, verification, preparation, resolution, initialization, using, and unloading. Verification, preparation, and resolution are collectively called linking. The order in which these seven stages occur is shown in Figure 1-1.
    Figure 1-1: Class loading flow chart

In the figure above, the order of the five stages loading, verification, preparation, initialization, and unloading is fixed, and class loading must start them step by step in this order. Resolution is the exception: in some cases it can start after initialization, which supports Java's run-time binding. Note also that the stages only start in this order; they are usually interleaved, with one stage invoking or activating another while it is still executing (for example, initializing another class while a class is being processed).

  • When class loading happens

    Under what circumstances must the first stage of class loading, the loading stage, begin? The Java Virtual Machine specification does not mandate this; it is left to each virtual machine implementation. For the initialization stage, however, the specification is strict: there are exactly five situations in which a class must be initialized immediately (loading, verification, and preparation naturally have to start before that).

    • When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered and the class has not been initialized, its initialization must be triggered. The most common Java code that generates these four instructions: instantiating an object with the new keyword, reading or setting a static field of a class (except a static field that is final and whose value was already placed in the constant pool at compile time), and calling a static method of a class.
    • When a reflective call is made on a class through the java.lang.reflect package and the class has not been initialized, its initialization must be triggered.
    • When a class is initialized and its parent class has not been initialized, initialization of the parent class must be triggered first.
    • When the virtual machine starts, the user specifies a main class to execute (the class containing the main() method), and the virtual machine initializes this main class first.
    • When using the dynamic language support of JDK 1.7, if the final resolution result of a java.lang.invoke.MethodHandle instance is a method handle of kind REF_getStatic, REF_putStatic, or REF_invokeStatic, and the class corresponding to this handle has not been initialized, its initialization must be triggered.
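The reflection trigger in the list above can be observed with a small sketch. The class names ReflectionTriggerDemo and Target and the LOG list are made up for illustration, and the exact point at which initialization happens is what HotSpot is assumed to do here: loading a class with Class.forName(name, false, loader) and looking up a method leave the class uninitialized, while the reflective call to a static method initializes it.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class ReflectionTriggerDemo {
    // Records events in the order they happen.
    static final List<String> LOG = new ArrayList<>();
    private static List<String> cached;

    // Hypothetical class whose static block reveals when it is initialized.
    static class Target {
        static { LOG.add("Target init"); }
        static void touch() { LOG.add("touch called"); }
    }

    // A class is initialized only once per class loader, so the trace is computed once and cached.
    static synchronized List<String> trace() {
        if (cached == null) {
            try {
                ClassLoader loader = ReflectionTriggerDemo.class.getClassLoader();
                // initialize = false: load the class but do not run its static initializer
                Class<?> target = Class.forName(Target.class.getName(), false, loader);
                LOG.add("loaded");
                Method touch = target.getDeclaredMethod("touch");
                LOG.add("method found");
                // Reflective invocation of a static method is an active reference
                touch.invoke(null);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
            cached = new ArrayList<>(LOG);
        }
        return cached;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

In this sketch "Target init" is logged only after "method found", immediately before "touch called".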

    For these five scenarios that trigger initialization, the virtual machine specification uses a very strong qualifier: "if and only if". The behavior in these five scenarios is called an active reference to a class. All other ways of referencing a class do not trigger initialization and are called passive references, as in the following example:

public class Parent {
    public static int a = 1;
    static {
        System.out.println("Parent init");
    }
}
public class Son extends Parent{
    static {
        System.out.println("Son init");
    }
}
public class Main {
    public static void main(String[] args) {
        // Son.a refers to the static field declared in Parent
        System.out.println("args = [" + Son.a + "]");
    }
}
Output:
Parent init
args = [1]

For a static field, only the class that directly defines the field is initialized. Referencing a static field defined in the parent class through a subclass therefore triggers initialization of the parent class but not of the subclass. Whether loading and verification of the subclass are triggered is not specified by the virtual machine specification and depends on the implementation. For the Sun HotSpot virtual machine, the -XX:+TraceClassLoading parameter shows that this operation does cause the subclass to be loaded.

In addition, referencing a class through an array definition does not trigger initialization of that class.

public class Main {
    public static void main(String[] args) {
        // Creates an array of Parent references; Parent itself is not initialized
        Parent[] parentArray = new Parent[10];
    }
}

Running the code above prints nothing, which shows that initialization of the Parent class was not triggered. The code does, however, trigger initialization of another class named [Lxxx.xxx.Parent (where xxx.xxx stands for the package name of the class). This may look familiar: as covered in the earlier bytecode article, [L denotes an array of objects. This class is generated automatically by the virtual machine and inherits directly from java.lang.Object; its creation is triggered by the array-creation bytecode instruction (anewarray for an array of references such as this one). The class represents a one-dimensional array whose element type is Parent, and the properties and methods an array should have (the only ones user code can use directly are the length property and the clone() method) are implemented in it. In Java, an out-of-bounds array access throws ArrayIndexOutOfBoundsException, but the bounds check is not encapsulated in the class of the array's element type; it is encapsulated in the array access bytecode instructions xaload and xastore.

Referencing a static final constant of a class does not trigger initialization of that class either.

public class Parent {
    public static final int a = 1;
    static {
        System.out.println("Parent init");
    }
}
public class Main {
    public static void main(String[] args) {
        // The constant 1 is inlined at compile time, so Parent is never touched
        System.out.println("args = [" + Son.a + "]");
    }
}
Output:
args = [1]

Because a is declared final, its value is an immutable constant. At compile time, constant propagation optimization stores the value 1 in the constant pool of the main class (the class containing the main method), and every later reference to the constant from the main class becomes a reference to the main class's own constant pool. In other words, the main class's Class file contains no symbolic reference to Parent, and once compiled into Class files the two classes have nothing to do with each other.
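The three passive references above and one active reference can be put side by side in a single sketch. The names InitTriggerDemo, LOG, and trace() are illustrative only; the nested Parent and Son classes mirror the examples but record their initialization in a list instead of printing.

```java
import java.util.ArrayList;
import java.util.List;

class InitTriggerDemo {
    // Records events in the order they happen.
    static final List<String> LOG = new ArrayList<>();
    private static List<String> cached;

    static class Parent {
        static int value = 1;               // ordinary static field
        static final int CONSTANT = 42;     // compile-time constant, inlined by javac
        static { LOG.add("Parent init"); }
    }

    static class Son extends Parent {
        static { LOG.add("Son init"); }
    }

    // Classes are initialized only once, so the trace is computed once and cached.
    static synchronized List<String> trace() {
        if (cached == null) {
            int c = Parent.CONSTANT;            // passive: the constant was inlined
            LOG.add("constant read");
            Parent[] array = new Parent[2];     // passive: only the array class is created
            LOG.add("array created");
            int v = Son.value;                  // initializes Parent (declares the field), not Son
            LOG.add("static field read");
            new Son();                          // active: the new instruction initializes Son
            LOG.add("instance created");
            cached = new ArrayList<>(LOG);
        }
        return cached;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

Neither the constant read nor the array creation adds an "init" entry; "Parent init" appears only at the static field read, and "Son init" only at the new expression.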

The loading process of an interface differs slightly from that of a class, so interfaces need some special explanation. Interfaces also have an initialization process, just like classes. An interface cannot use a static {} block, but the compiler still generates a <clinit>() class constructor for the interface to initialize the member variables it defines. The real difference between interfaces and classes lies in the third of the five scenarios above: when a class is initialized, all of its parent classes must already have been initialized, but when an interface is initialized, its parent interfaces do not all have to be initialized. A parent interface is initialized only when it is actually used (for example, when a constant defined in it is referenced).
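A minimal sketch of this difference follows. All names (InterfaceInitDemo, Base, Derived, Impl, record) are hypothetical. The interface fields are deliberately initialized by a method call so that they are not compile-time constants, and the interfaces declare no default methods, since interfaces with default methods are treated differently from Java 8 on.

```java
import java.util.ArrayList;
import java.util.List;

class InterfaceInitDemo {
    // Records events in the order they happen.
    static final List<String> LOG = new ArrayList<>();
    private static List<String> cached;

    static int record(String event) {
        LOG.add(event);
        return LOG.size();
    }

    // Fields are initialized by a method call, so each interface needs a <clinit>().
    interface Base { int BASE_ID = record("Base init"); }
    interface Derived extends Base { int DERIVED_ID = record("Derived init"); }

    static class Impl implements Derived {
        static { record("Impl init"); }
    }

    // Types are initialized only once, so the trace is computed once and cached.
    static synchronized List<String> trace() {
        if (cached == null) {
            new Impl();                   // initializes Impl, but not its interfaces
            record("Impl created");
            int d = Derived.DERIVED_ID;   // initializes Derived, but not its parent interface
            record("Derived field read");
            int b = Base.BASE_ID;         // only now is Base initialized
            record("Base field read");
            cached = new ArrayList<>(LOG);
        }
        return cached;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```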

  • Steps of class loading

Next, we look at the class loading process in detail, that is, the specific actions performed in the five stages of loading, verification, preparation, resolution, and initialization.

  • Loading

Loading is one stage of the class loading process. During the loading stage, the virtual machine does three things:

  • 1. Obtain the binary byte stream that defines the class, using the class's fully qualified name.
  • 2. Convert the static storage structure represented by the byte stream into the run-time data structure of the method area.
  • 3. Generate in memory a java.lang.Class object that represents the class and serves as the access entry to the class's data in the method area.

Compared with the other stages of class loading, the loading stage of a non-array class (more precisely, the action of obtaining the class's binary byte stream during loading) is the one developers can control most. It can be completed by the bootstrap class loader provided by the system or by a user-defined class loader (for example, when the bytecode is encrypted, a custom class loader decrypts it and then loads the class). By defining their own class loader, developers control how the byte stream is obtained.
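As a rough sketch of a user-defined loader that controls where the byte stream comes from, the class below overrides findClass() and passes the bytes to defineClass(). ByteStreamLoaderDemo, Payload, and ByteStreamLoader are made-up names, the sketch assumes the compiled .class files are reachable as resources on the class path, and it needs Java 9 or later for readAllBytes().

```java
import java.io.IOException;
import java.io.InputStream;

class ByteStreamLoaderDemo {
    // Hypothetical class whose bytes are read again from the class path.
    static class Payload { }

    // Minimal loader: findClass() obtains the byte stream itself and hands it to defineClass().
    static class ByteStreamLoader extends ClassLoader {
        ByteStreamLoader() {
            super(null); // bootstrap loader as parent, so application classes end up in findClass()
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            ClassLoader source = ByteStreamLoaderDemo.class.getClassLoader();
            try (InputStream in = source.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes(); // a real loader could decrypt the bytes here
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Loads a second copy of Payload through the custom loader.
    static Class<?> loadCopy() {
        try {
            return new ByteStreamLoader().loadClass(Payload.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> copy = loadCopy();
        System.out.println(copy.getName() + " loaded by " + copy.getClassLoader());
        System.out.println("same Class object: " + (copy == Payload.class));
    }
}
```

The copy has the same name as Payload but is a different Class object, because it was defined by a different class loader.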

Array classes are different: they are not created by a class loader but directly by the Java virtual machine. Array classes are still closely related to class loaders, however, because the element type of an array class is ultimately created by a class loader. Creating an array class follows these rules:

  • 1. If the component type of the array is a reference type, the component type is loaded recursively using the loading process described here, and the array class is marked in the namespace of the class loader that loaded the component type. Class loaders are discussed later.
  • 2. If the component type of the array is not a reference type (for example, an int[] array), the Java virtual machine marks the array class as associated with the bootstrap class loader.
  • 3. The visibility of an array class is the same as that of its component type; if the component type is not a reference type, the visibility of the array class defaults to public.
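These rules can be checked from Java code through the java.lang.Class API. The sketch below uses a hypothetical ArrayClassLoaderDemo class with a nested Element type; getClassLoader() returning null is how the API reports the bootstrap class loader.

```java
import java.lang.reflect.Modifier;

class ArrayClassLoaderDemo {
    // Hypothetical element type loaded by the application class loader.
    static class Element { }

    // Rule 1: a reference-type array is associated with the loader of its element type.
    static boolean referenceArraySharesElementLoader() {
        return Element[].class.getClassLoader() == Element.class.getClassLoader();
    }

    // Rule 2: a primitive array is associated with the bootstrap loader (reported as null).
    static boolean primitiveArrayUsesBootstrapLoader() {
        return int[].class.getClassLoader() == null;
    }

    // Rule 3: an array of primitives is public.
    static boolean primitiveArrayIsPublic() {
        return Modifier.isPublic(int[].class.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(int[].class.getName());      // internal name of the int array class
        System.out.println(String[].class.getName());   // internal name of the String array class
        System.out.println(referenceArraySharesElementLoader());
        System.out.println(primitiveArrayUsesBootstrapLoader());
        System.out.println(primitiveArrayIsPublic());
    }
}
```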

After the loading stage completes, the binary byte stream from outside the virtual machine is stored in the method area in the format the virtual machine requires; the data storage format of the method area is defined by the virtual machine implementation itself. A java.lang.Class object is then instantiated in memory (the specification does not say this must be in the Java heap; in the HotSpot virtual machine the Class object is rather special: although it is an object, it is stored in the method area). This object serves as the external interface through which the program accesses the type data in the method area.

Parts of the loading stage and the linking stage are interleaved: linking may begin before loading has finished. Even so, the actions interleaved into the loading stage still belong to the linking stage.

  • Verification

    Verification is the first step of linking. Its purpose is to ensure that the information in the Class file's byte stream meets the requirements of the current virtual machine and does not endanger the virtual machine's own security. If the virtual machine fully trusted the incoming byte stream without checking it, a harmful byte stream could crash the system, so verification is important work by which the virtual machine protects itself. Judging from the "Java Virtual Machine Specification (Java SE 7 Edition)" released in 2011, the verification stage as a whole consists of roughly four checks: file format verification, metadata verification, bytecode verification, and symbolic reference verification.

  • 1. File format verification: verify that the byte stream conforms to the Class file format and can be processed by the current version of the virtual machine. This may include the following checks:

    • Whether the file begins with the magic number 0xCAFEBABE.
    • Whether the major and minor version numbers are within the range the current virtual machine can handle.
    • Whether the constant pool contains constants of an unsupported type (checked through the constant's tag).
    • Whether any index value that points to a constant refers to a constant that does not exist or has the wrong type.
    • Whether any CONSTANT_Utf8_info constant contains data that is not valid UTF-8.
    • Whether any part of the Class file, or the file itself, has had information deleted or appended.
    ......

These are only a small part of the checks. Their purpose is to ensure that the incoming byte stream can be parsed correctly and that its format meets the requirements for describing a Java type. Only after passing this stage of verification is the byte stream stored in the method area in memory; the three later verification stages all work on the storage structure in the method area and no longer operate on the byte stream directly.
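The first two checks in the list above can be sketched in a few lines. This is not how any particular virtual machine implements them; ClassHeaderCheck, checkHeader, and the maxSupportedMajor parameter are hypothetical. The only facts relied on are the Class file layout (a 4-byte magic number followed by a 2-byte minor and a 2-byte major version, all big-endian) and that major version 52 corresponds to Java 8.

```java
import java.nio.ByteBuffer;

class ClassHeaderCheck {
    static final int MAGIC = 0xCAFEBABE;

    // Returns the major version if the first 8 bytes look like a supported Class file header.
    static int checkHeader(byte[] classBytes, int maxSupportedMajor) {
        if (classBytes.length < 8) {
            throw new ClassFormatError("truncated class file");
        }
        ByteBuffer in = ByteBuffer.wrap(classBytes);    // ByteBuffer is big-endian by default
        int magic = in.getInt();
        if (magic != MAGIC) {
            throw new ClassFormatError("bad magic number 0x" + Integer.toHexString(magic));
        }
        int minor = in.getShort() & 0xFFFF;             // u2 minor_version
        int major = in.getShort() & 0xFFFF;             // u2 major_version
        if (major > maxSupportedMajor) {
            throw new UnsupportedClassVersionError(major + "." + minor);
        }
        return major;
    }

    public static void main(String[] args) {
        byte[] java8Header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println("major version: " + checkHeader(java8Header, 52));
    }
}
```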

  • 2. Metadata verification: the second stage performs semantic analysis of the information described by the bytecode to ensure that it meets the requirements of the Java language specification. Checks at this stage include:

    • Whether the class has a parent class (every class except java.lang.Object must have one).
    • Whether the parent class is a class that may not be inherited (a class declared final).
    • If the class is not abstract, whether it implements all the methods required by its parent class or interfaces.
    • Whether fields and methods of the class conflict with the parent class (for example, overriding a final field of the parent class).
    ......

  • 3. Bytecode verification: this is the most complex stage of verification. Its main purpose is to determine, through data flow and control flow analysis, that the program semantics are legal and logical. After the metadata has been checked in the second stage, this stage verifies and analyzes the method bodies of the class to ensure that its methods will not do anything at run time that endangers the security of the virtual machine, for example:

    • Ensure that the data types on the operand stack and the instruction sequence always work together; for example, an int placed on the operand stack must not be loaded into the local variable table as a long.
    • Ensure that jump instructions do not jump to bytecode instructions outside the method body.
    • Ensure that type conversions in the method body are valid.
    ......

    If a class's method body fails bytecode verification, it definitely has a problem. Passing verification, however, does not mean it is completely safe, because a program cannot check program logic with absolute accuracy.

    To avoid spending too much time in bytecode verification, the virtual machine design team introduced an optimization in Javac and the virtual machine after JDK 1.6: an attribute named StackMapTable was added to the attribute table of a method's Code attribute. This attribute describes the state the local variable table and operand stack should have at the start of every basic block of the method body. During bytecode verification, the virtual machine no longer needs to derive these states from the program; it only needs to check whether the records in the StackMapTable attribute are legal. This turns bytecode verification from type inference into type checking, which saves some time.

  • 4. Symbolic reference verification

    The last stage of verification takes place when the virtual machine converts symbolic references into direct references; this conversion happens in resolution, the third stage of linking. Symbolic reference verification can be seen as a matching check on information outside the class itself (the various symbolic references in the constant pool). It usually checks the following:

    • Whether the class described by the fully qualified name string in the symbolic reference can be found.
    • Whether the specified class contains a method or field that matches the descriptor and simple name given in the symbolic reference.
    • Whether the class, field, or method in the symbolic reference is accessible to the current class.
    ......

    The purpose of symbolic reference verification is to ensure that resolution can be performed properly. If symbolic reference verification fails, a subclass of java.lang.IncompatibleClassChangeError is thrown, for example NoSuchFieldError or NoSuchMethodError.

    For the virtual machine, verification is an important but not strictly necessary stage. If the code being run has been used and verified repeatedly, you can consider using the -Xverify:none parameter at deployment time to turn off most class verification and shorten class loading time.

  • Preparation

    The preparation stage formally allocates memory for class variables and sets their initial values; the memory these variables use is allocated in the method area. Two easily confused points need emphasis. First, memory is allocated here only for class variables (static variables), not for instance variables; instance variables are allocated in the Java heap together with the object when the object is instantiated. Second, the initial value set here is normally the zero value of the data type.

  public static int number = 1;
  public static final int numberFinal = 123;

In the example above, the value of number after the preparation stage is 0, not 1, because no Java method has started executing yet. The putstatic instruction that assigns 1 to number is stored in the class constructor <clinit>() method after compilation, so the assignment of 1 to number is executed only in the initialization stage.

There is one special case: if the field's attribute table contains a ConstantValue attribute (the field is declared final), the variable is initialized to the value specified by that attribute during the preparation stage. Javac generates a ConstantValue attribute for numberFinal at compile time, and in the preparation stage the virtual machine sets numberFinal to 123 according to the ConstantValue setting.
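The zero value set in the preparation stage can be made visible by reading a static field before its assignment in <clinit>() has run. In the sketch below (PreparationDemo and its members are hypothetical names) the fields are read through static methods, because a direct forward reference would not compile. COMPUTED is final but not a compile-time constant, so it is assumed to get no ConstantValue attribute and behaves like an ordinary class variable here.

```java
class PreparationDemo {
    // These two initializers run first in <clinit>(), before the assignments below.
    static final int seenNumber = readNumber();
    static final int seenComputed = readComputed();

    static int number = 1;                                  // assigned by putstatic in <clinit>()
    static final int COMPUTED = Integer.parseInt("123");    // final, but not a compile-time constant

    static int readNumber() {
        return number;
    }

    static int readComputed() {
        return COMPUTED;
    }

    public static void main(String[] args) {
        System.out.println("number seen early: " + seenNumber + ", after init: " + number);
        System.out.println("COMPUTED seen early: " + seenComputed + ", after init: " + COMPUTED);
    }
}
```

Both early reads see 0, the zero value from the preparation stage; after initialization the fields hold 1 and 123.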

  • Resolution

    The resolution stage is the process in which the virtual machine replaces symbolic references in the constant pool with direct references. Symbolic references came up many times in the earlier JVM note on the Java Virtual Machine constant pool: in the Class file they appear as constants of type CONSTANT_Class_info, CONSTANT_Fieldref_info, CONSTANT_Methodref_info, and so on. How, then, are the direct references and symbolic references of the resolution stage related?

    Symbolic references (Symbolic References): a symbolic reference describes the referenced target with a set of symbols. The symbols can be any form of literal, as long as they locate the target unambiguously. The referenced target need not have been loaded into memory; the symbolic reference works like a placeholder for something that will be needed later and is replaced with a direct reference at a later stage. The symbolic references accepted by different virtual machine implementations must be the same, because their literal form is clearly defined in the Class file format of the Java Virtual Machine specification.

    Direct references (Direct References): a direct reference can be a pointer directly to the target, a relative offset, or a handle that locates the target indirectly. Direct references depend on the memory layout of the virtual machine implementation, so the direct reference translated from the same symbolic reference generally differs between virtual machine instances. If a direct reference exists, the referenced target must already exist in memory.

    The virtual machine specification does not prescribe when resolution takes place. It only requires that symbolic references be resolved before executing any of the 16 bytecode instructions that operate on them: anewarray, multianewarray, checkcast, getfield, getstatic, instanceof, invokedynamic, invokeinterface, invokespecial, invokestatic, invokevirtual, ldc, ldc_w, new, putfield, and putstatic. A virtual machine implementation can therefore decide for itself whether to resolve the symbolic references in the constant pool when the class is loaded by the class loader, or to wait until just before a symbolic reference is about to be used.

    Except for the invokedynamic instruction, a virtual machine implementation may cache the result of the first resolution (recording the direct reference in the run-time constant pool and marking the constant as resolved) to avoid repeating the resolution. If a symbolic reference has been resolved successfully once, later resolution requests for it must also succeed; likewise, if the first resolution fails, later requests must receive the same exception.

    This rule does not hold for the invokedynamic instruction. When a symbolic reference has already been resolved by one invokedynamic instruction, that does not mean the result also applies to other invokedynamic instructions. The invokedynamic instruction exists to support dynamic languages, and the reference it uses is called a dynamic call site specifier. "Dynamic" here means that resolution can take place only when the program actually runs to this instruction. By contrast, the other instructions that trigger resolution are static: resolution can be done right after loading completes, before any code is executed.

    Resolution mainly applies to seven kinds of symbolic references: classes or interfaces, fields, class methods, interface methods, method types, method handles, and call site specifiers. This note covers the first four; the last three are closely tied to the dynamic language support newly added to the JDK and are not discussed here. The first four correspond to the constant pool types CONSTANT_Class_info, CONSTANT_Fieldref_info, CONSTANT_Methodref_info, and CONSTANT_InterfaceMethodref_info.

    1. Class or interface resolution

    Suppose the current code belongs to class W, and a symbolic reference N that has never been resolved is to be resolved into a direct reference to a class or interface O. The virtual machine completes the process in the following three steps.

    • If O is not an array type, the virtual machine passes the fully qualified name represented by N to the class loader of W to load class O. During loading, metadata verification and bytecode verification may in turn trigger the loading of other classes. If any exception occurs during loading, resolution fails.

    • If O is an array type whose element type is an object type, that is, the descriptor of N has a form like [Lxxx/xxx, the array element type is loaded according to the rule in the first step. With a descriptor of the form assumed above, the element type to load is xxx.xxx; the virtual machine then generates an array object that represents the dimensions and elements of this array.

    • If the two steps above complete without exception, O has in effect become a valid class or interface in the virtual machine. Before resolution is finished, symbolic reference verification must still confirm that W has access to O. If it does not, IllegalAccessError is thrown.

    2. Field resolution

    To resolve an unresolved field symbolic reference, the CONSTANT_Class_info symbolic reference indexed by the class_index item of the field table entry is resolved first, that is, the symbolic reference to the class or interface the field belongs to. In other words, to resolve a field, the class it belongs to must be resolved first.

    • Once the class is resolved, if the class itself contains a field whose simple name and field descriptor both match the target, a direct reference to that field is returned.

    • Otherwise, if the class implements interfaces, each interface and its parent interfaces are searched recursively according to the inheritance relationship; if an interface contains a field whose simple name and field descriptor both match the target, a direct reference to that field is returned.

    • Otherwise, if the class is not java.lang.Object, its parent classes are searched recursively according to the inheritance relationship; if a parent class contains a field whose simple name and field descriptor both match the target, a direct reference to that field is returned.

    • If all the steps above fail, NoSuchFieldError is thrown.

    • If a reference was found but the field is not accessible to the current class, IllegalAccessError is thrown.

    • If a field with the same name appears in both an interface of the class and its parent class, or in several interfaces of the class or its parent class, the compiler may refuse to compile the code.
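The search order (the class itself, then its interfaces, then its parent class) can be illustrated from Java code with Class.getField(), whose documented lookup algorithm follows the same order for public fields. This is only a reflection-level analogy, not the virtual machine's resolution code, and all names in the sketch are hypothetical. Note that writing Child.TAG directly in source code would be rejected by the compiler as ambiguous, which matches the last point above.

```java
import java.lang.reflect.Field;

class FieldLookupDemo {
    public interface Marker {
        String TAG = "from interface";
    }

    public static class Base {
        public static String TAG = "from parent class";
        public static int ONLY_IN_BASE = 7;
    }

    // Inherits a TAG field from both Marker and Base.
    public static class Child extends Base implements Marker { }

    // Declares its own TAG, which is found before any inherited one.
    public static class Own extends Base implements Marker {
        public static String TAG = "from the class itself";
    }

    // Returns the class or interface in which getField() finds the named public field.
    static Class<?> declaringClassOf(Class<?> start, String fieldName) {
        try {
            Field field = start.getField(fieldName);
            return field.getDeclaringClass();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declaringClassOf(Own.class, "TAG"));
        System.out.println(declaringClassOf(Child.class, "TAG"));
        System.out.println(declaringClassOf(Child.class, "ONLY_IN_BASE"));
    }
}
```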

    3. Class method resolution

    The first step of class method resolution is the same as for field resolution: the symbolic reference to the class or interface the method belongs to, indexed by the class_index item of the method table entry, is resolved first. The class method is then searched for in the following steps.

    • 1) Class method and interface method symbolic references are defined as separate constant types (Methodref and InterfaceMethodref). If the class indexed by class_index in the class method table turns out to be an interface, IncompatibleClassChangeError is thrown.

    • 2) If the first step passes, the class is searched for a method whose simple name and descriptor both match the target; if there is one, a direct reference to that method is returned.

    • 3) Otherwise, the parent classes are searched recursively for a method whose simple name and descriptor both match the target; if there is one, a direct reference to that method is returned.

    • 4) Otherwise, the list of interfaces the class implements and their parent interfaces are searched recursively for a method whose simple name and descriptor both match the target. If one is found, the class must be abstract (a non-abstract class would have implemented the method, and it would have been found in the class itself), and AbstractMethodError is thrown.

    • 5) If all the steps above fail, NoSuchMethodError is thrown.

    • 6) If a reference was found but the method is not accessible to the current class, IllegalAccessError is thrown.

    4. Interface method resolution

    As before, interface method resolution first resolves the symbolic reference to the class or interface the method belongs to, indexed by the class_index item of the interface method table entry. The interface method is then searched for in the following steps.

    • 1) In contrast to class method resolution, if the class_index in the interface method table turns out to refer to a class rather than an interface, IncompatibleClassChangeError is thrown.

    • 2) If the first step passes, the interface is searched for a method whose simple name and descriptor both match the target; if there is one, a direct reference to that method is returned.

    • 3) Otherwise, the parent interfaces are searched recursively, up to and including the Object class, for a method whose simple name and descriptor both match the target; if there is one, a direct reference to that method is returned.

    • 4) If all the steps above fail, NoSuchMethodError is thrown.

    • 5) Because interface methods are public by default, there are no access restrictions, so interface method resolution does not throw IllegalAccessError.

  • Initialization

    Initialization is the last step of the class loading process. In the preceding stages, apart from the loading stage, where the user application can take part through a custom class loader, all actions are driven and controlled entirely by the virtual machine. Only in the initialization stage does the Java code (or bytecode) defined in the class actually start to execute.

    In the preparation stage, variables have already been assigned the initial values the system requires. In the initialization stage, class variables and other resources are initialized according to the plan the programmer wrote into the code. Put another way: the initialization stage is the process of executing the class constructor <clinit>() method.

    The <clinit>() method is generated by the compiler, which automatically collects the assignments to all class variables (static variables) and the statements in static blocks and merges them. The order in which the compiler collects them is determined by the order in which the statements appear in the source file. A static block can only read static variables defined before it; variables defined after it can be assigned in the static block but not read.

public class Parent {
    static {
        a = 2;                                    // allowed: assigns a variable defined later
        System.out.println("Parent init" + a);    // compile error: illegal forward reference
    }
    public static int a = 1;
}

In the code above, a can be assigned in the static block, but the assignment has no lasting effect because a is later reassigned to 1. The static block cannot read a class variable defined after it; doing so makes the compiler report an "illegal forward reference" error.

The <clinit>() method differs from a class's constructor, that is, the instance constructor <init>(): it does not need to call the parent class's constructor explicitly. The virtual machine guarantees that the parent class's <clinit>() method has finished before the subclass's <clinit>() method runs, so the first <clinit>() method executed in the virtual machine is always that of java.lang.Object. Because the parent class's <clinit>() method runs first, the static blocks defined in the parent class execute before the variable assignments of the subclass.

The following example prints 2, because the static assignments of the parent class run before those of the subclass.

public class Parent {
    public static int a = 1;
    static {
        a = 2;
    }
}
public class Son extends Parent {
    public static int b = a;
}
public class Main {
    public static void main(String[] args) {
        // Parent's <clinit>() has finished before Son.b is assigned
        System.out.println("args = [" + Son.b + "]");
    }
}
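Both ordering rules described above (statements are collected in source order, and the parent class's <clinit>() runs before the subclass's) can be combined in one compilable sketch. ClinitOrderDemo and its nested classes are illustrative names; unlike the earlier forward-reference example, the static block here only assigns the later-defined variable and does not read it.

```java
class ClinitOrderDemo {
    static class Parent {
        static { a = 3; }        // legal: assigns a variable declared further down
        static int a = 1;        // runs next and overwrites the 3
        static { a = a + 1; }    // runs last, so a ends up as 2
    }

    static class Son extends Parent {
        static int b = a;        // Parent's <clinit>() has already finished, so b is 2
    }

    public static void main(String[] args) {
        System.out.println("Son.b = " + Son.b);
        System.out.println("Parent.a = " + Parent.a);
    }
}
```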

The <clinit>() method is not mandatory: if a class has no static block and no assignment to class variables, the compiler need not generate a <clinit>() method for it.

Interfaces cannot use static blocks, but they can still have variable initialization assignments, so interfaces get a <clinit>() method just as classes do. Unlike classes, executing an interface's <clinit>() method does not require executing the parent interface's <clinit>() method first; the parent interface is initialized only when a variable defined in it is used. In addition, initializing a class that implements an interface does not execute the interface's <clinit>() method either.

The virtual machine guarantees that a class's <clinit>() method is properly locked and synchronized in a multithreaded environment. If several threads try to initialize a class at the same time, only one thread executes the class's <clinit>() method; the other threads block and wait until it finishes. This is the principle behind singletons implemented with static initialization.
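One common way to build a singleton on this guarantee is the holder idiom, sketched below. HolderSingleton, Holder, and the CREATED counter are illustrative names, and the counter exists only to show that the constructor runs once even when several threads request the instance at the same time.

```java
import java.util.concurrent.atomic.AtomicInteger;

class HolderSingleton {
    // Counts constructor calls so the single initialization can be observed.
    static final AtomicInteger CREATED = new AtomicInteger();

    private HolderSingleton() {
        CREATED.incrementAndGet();
    }

    // Holder is initialized on first access; the virtual machine runs its <clinit>() under a lock.
    private static class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    static HolderSingleton getInstance() {
        return Holder.INSTANCE;
    }

    // Starts several threads that all request the instance, then reports how many were created.
    static int createdAfterConcurrentAccess(int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(HolderSingleton::getInstance);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("instances created: " + createdAfterConcurrentAccess(8));
    }
}
```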

  • Summary

    This article is drawn from the book "In-Depth Understanding of the Java Virtual Machine"; interested readers may want to read the book.

Origin juejin.im/post/5dc0dc986fb9a04a680f5183