Principles of the Java Virtual Machine (JVM)

[Reposted] From http://blog.csdn.net/witsmakemen/article/details/28600127/

1. The life cycle of the Java virtual machine:

A running Java virtual machine has one clear task: executing a Java program. It starts when the program starts and stops when the program ends. If you run three programs on the same machine, three Java virtual machines are running. The Java virtual machine always begins execution at a main() method, which must be public and static, return void, and accept an array of strings as its only parameter. When starting a program, you must give the Java virtual machine the name of the class that contains the main() method. The main() method is the starting point of the program, and the thread that executes it becomes the program's initial thread. All other threads in the program are started from it. There are two kinds of threads in Java: daemon and non-daemon. Daemon threads are typically threads used by the Java virtual machine itself; for example, the thread responsible for garbage collection is a daemon thread. You can also mark threads of your own program as daemon threads. The initial thread that runs main() is not a daemon thread. As long as any non-daemon thread is still running, the Java virtual machine will not stop. If you have sufficient permissions, you can also call the exit() method of Runtime or System to terminate the program.
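The difference between daemon and non-daemon threads can be illustrated with a minimal sketch. The class name DaemonDemo and the thread name are made up for this example; only the Thread methods are standard API.

```java
class DaemonDemo {
    // Starts a background thread marked as a daemon and reports its status.
    static boolean startDaemonWorker() {
        Thread worker = new Thread(() -> {
            while (true) {                       // never finishes on its own
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "background-worker");
        worker.setDaemon(true);                  // must be called before start()
        worker.start();
        return worker.isDaemon();
    }

    public static void main(String[] args) {
        System.out.println("main is daemon:   " + Thread.currentThread().isDaemon());
        System.out.println("worker is daemon: " + startDaemonWorker());
        // main() returns here. The only thread left is a daemon,
        // so the JVM exits without waiting for the endless loop.
    }
}
```

If the setDaemon(true) call is removed, the worker becomes an ordinary thread and the virtual machine keeps running after main() returns.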

2. The architecture of the Java virtual machine:

A series of subsystems, memory areas, data types, and instructions are defined in the Java virtual machine specification. These components make up the internal architecture of the Java virtual machine. They not only give implementations a clear internal structure, but also strictly define the external behavior that every implementation must exhibit.
     Every Java virtual machine has a class loader subsystem, which is responsible for loading the program's types (classes and interfaces) by their fully qualified names. Every Java virtual machine also has an execution engine, which is responsible for executing the instructions contained in the methods of loaded classes.
     Running a program requires memory for many things: bytecode and other information from loaded classes, the objects the program creates, method parameters, return values, local variables, and intermediate results of computations. The Java virtual machine organizes all of this into runtime data areas. Although every implementation has these data areas, the Java virtual machine specification describes them only abstractly, and many architectural details are left to the implementer. The memory layout of different implementations therefore varies widely. Some implementations may use a lot of memory, while others use very little; some may use virtual memory while others do not. This abstract specification of memory is what allows the Java virtual machine to be implemented on a wide range of platforms.
     Some data areas are shared by the entire program, while others belong to individual threads. Each Java virtual machine contains one method area and one heap, both shared by all threads. After the Java virtual machine loads a class file, it parses the type information from it and stores that information in the method area. Objects created while the program runs are stored in the heap.
     When a thread is created, it gets its own pc register (program counter) and its own Java stack. While the thread is executing a Java method (not a native method), the pc register indicates the next instruction to execute. The Java stack holds the state of the thread's Java method invocations, including local variables, the parameters the method was called with, its return value, and intermediate results. The state of native method invocations is kept in native method stacks, and possibly in registers or other implementation-dependent memory areas.
     The Java stack consists of stack frames (or simply frames). A stack frame holds the state of one Java method invocation. When a thread calls a method, the Java virtual machine pushes a new frame onto that thread's Java stack, and when the method finishes, the Java virtual machine pops and discards the frame.
     The Java virtual machine has no registers for holding intermediate results; its instructions use the Java stack to store them instead. This keeps the instruction set compact and makes the virtual machine easier to implement on devices with few or no general-purpose registers.
     In the figure, the Java stacks grow downward, and the pc register of thread three is shown in gray because that thread is executing a native method, so its next instruction is not held in the pc register.
3. The class loader subsystem:

The Java virtual machine has two kinds of class loaders: the primordial class loader (also called the bootstrap class loader) and class loader objects. The primordial class loader is part of the Java virtual machine implementation, while class loader objects are part of the running program. Classes loaded by different class loaders are placed in separate namespaces.
     The class loader subsystem involves many other parts of the Java virtual machine and several classes in the java.lang package. For example, a class loader object is an instance of a subclass of java.lang.ClassLoader, and the methods of ClassLoader give access to the class loading mechanism of the virtual machine. Every type loaded by the Java virtual machine is represented by an instance of java.lang.Class. Like other objects, class loader objects and Class objects are stored in the heap, while the data for loaded types is stored in the method area.
     1. Loading, Linking, and Initialization
     The class loader subsystem is not only responsible for locating and loading class files; it must also perform the following steps in strict order:
          1), Loading: find and import the binary data of the specified type (class or interface)
          2), Linking: verification, preparation, and resolution
               ① Verification: ensure the correctness of the imported type
               ② Preparation: allocate memory for the type's class variables and initialize them to default values
               ③ Resolution: replace symbolic references with direct references
          3), Initialization: run the Java code that initializes the class variables to their proper starting values
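The separation between loading and initialization listed above can be observed from ordinary code. The following is a minimal sketch; InitOrderDemo and Lazy are hypothetical names, and the three-argument Class.forName overload is used to request loading without initialization.

```java
class InitOrderDemo {
    static boolean lazyInitialized = false;

    static class Lazy {
        static int value = 7;                     // assigned during initialization
        static { lazyInitialized = true; }        // class initializer
    }

    // Reports whether Lazy was initialized after loading and after first use.
    static String run() {
        try {
            // Load the class but explicitly skip the initialization step.
            Class.forName(InitOrderDemo.class.getName() + "$Lazy", false,
                    InitOrderDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        boolean afterLoad = lazyInitialized;
        int v = Lazy.value;                       // first active use triggers initialization
        boolean afterUse = lazyInitialized;
        return "afterLoad=" + afterLoad + ", afterUse=" + afterUse + ", value=" + v;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On the first call in a fresh virtual machine, afterLoad is expected to be false and afterUse true, because the class initializer only runs at the first active use of the class.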
     2. The Primordial Class Loader
     Every Java virtual machine must implement a primordial class loader that can load trusted classes conforming to the class file format. However, the Java virtual machine specification does not define how classes are located; that is left to the implementer. Given a type name, the primordial class loader must in some way find the corresponding class file (typically the type name plus ".class") and load it into the virtual machine.
     3. Class Loader Objects
     Although class loader objects are part of the Java program, three methods of the ClassLoader class give access to the class loader subsystem of the Java virtual machine:
          1), protected final Class defineClass(…): pass in a byte array containing class file data to define a new type.
          2), protected final Class findSystemClass(String name): load the specified class through the system class loader; if it has already been loaded, it is returned directly.
          3), protected final void resolveClass(Class c): the defineClass() method only loads a class; this method is responsible for the subsequent linking.
     For details, see Chapter 8, "The Linking Model."
     4. Namespaces
     Each class loader maintains its own namespace. When several class loaders load a class with the same name, the virtual machine keeps the resulting types apart by identifying each one by its fully qualified name together with the class loader that loaded it. For details, see Chapter 8, "The Linking Model."
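As a rough illustration of defineClass() and of namespaces, the sketch below assembles a minimal class file by hand and defines it in two separate loaders. NamespaceDemo, ByteLoader, and the class name "Lava" are assumptions made for this example; the byte layout follows the class file format (magic number, version, constant pool, access flags, and empty member tables).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class NamespaceDemo {
    // A loader that exposes the protected defineClass() method.
    static class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Hand-assembles a minimal class file for an empty class in the default package.
    static byte[] minimalClassFile(String name) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);   // writes big-endian
            out.writeInt(0xCAFEBABE);                           // magic number
            out.writeShort(0);                                  // minor version
            out.writeShort(52);                                 // major version (Java 8)
            out.writeShort(5);                                  // constant pool count (entries + 1)
            out.writeByte(7); out.writeShort(2);                // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name);               // #2 Utf8, this class name
            out.writeByte(7); out.writeShort(4);                // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8, superclass name
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class
            out.writeShort(3);                                  // super_class
            out.writeShort(0);                                  // interfaces count
            out.writeShort(0);                                  // fields count
            out.writeShort(0);                                  // methods count
            out.writeShort(0);                                  // attributes count
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = minimalClassFile("Lava");
        Class<?> a = new ByteLoader().define("Lava", bytes);
        Class<?> b = new ByteLoader().define("Lava", bytes);
        System.out.println(a.getName() + " / " + b.getName());
        System.out.println("same Class object: " + (a == b));
        System.out.println("same loader:       " + (a.getClassLoader() == b.getClassLoader()));
    }
}
```

Both Class objects carry the same name, yet they are distinct types because each lives in the namespace of its own loader.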
4. Method area:

In the Java virtual machine, information about loaded types is stored in the method area. How this information is organized in memory is decided by the implementer of the virtual machine. For example, a virtual machine running on a little-endian processor can store the information in little-endian format, even though Java class files always store it in big-endian format. Designers can choose the representation best suited to the machine so that programs run as fast as possible. On a device with only a small amount of memory, however, the implementer may prefer a representation that takes up less memory.
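To make the byte-order remark concrete, the sketch below encodes the class file magic number 0xCAFEBABE in both byte orders using java.nio.ByteBuffer. EndianDemo and hexBytes are illustrative names only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    // Encodes a 32-bit value in the given byte order and renders it as hex.
    static String hexBytes(int value, ByteOrder order) {
        byte[] raw = ByteBuffer.allocate(4).order(order).putInt(value).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int magic = 0xCAFEBABE;   // first four bytes of every class file
        System.out.println("big-endian:    " + hexBytes(magic, ByteOrder.BIG_ENDIAN));
        System.out.println("little-endian: " + hexBytes(magic, ByteOrder.LITTLE_ENDIAN));
    }
}
```

The big-endian form (CA FE BA BE) is what appears on disk in a class file; a virtual machine is free to keep the little-endian form (BE BA FE CA) in its method area.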
     All threads in a program share the same method area, so access to the method area's data structures must be thread-safe. If two threads both try to load a class called Lava at the same time, only one of them is allowed to load it while the other waits.
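A related guarantee that is easy to observe from Java code is that a class initializer runs only once, even when two threads race to use the class first. The sketch below shows this; SingleInitDemo is a hypothetical name and the nested Lava class merely borrows the name from the text.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SingleInitDemo {
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    static class Lava {
        static { INIT_COUNT.incrementAndGet(); }   // class initializer
        static void touch() { }                    // calling this forces initialization
    }

    // Races two threads to use Lava; returns how often its initializer ran.
    static int raceToInitialize() {
        Thread t1 = new Thread(() -> Lava.touch());
        Thread t2 = new Thread(() -> Lava.touch());
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return INIT_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println("initializer runs: " + raceToInitialize());
    }
}
```

Whichever thread arrives second blocks until the first has finished initializing the class, so the counter never exceeds one.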
     The size of the method area does not have to be fixed: it can grow while the program runs. Some Java virtual machine implementations also let you configure the initial, minimum, and maximum size of the method area through parameters.
     The method area can also be garbage collected. Because classes are loaded dynamically by class loaders, any class may eventually become unreferenced, and a class in that state may be garbage collected. An unloaded class is therefore in one of two states: it was never loaded, or it was loaded and later became unreferenced. See Chapter 7, "The Lifetime of a Class," for details.
     1. Type Information
          For each type it loads, the Java virtual machine stores the following information in the method area:
          1), the fully qualified name of the type
          2), the fully qualified name of the type's direct superclass (unless the type is an interface or java.lang.Object, neither of which has a superclass)
          3), whether the type is a class or an interface
          4), the type's modifiers (some subset of public, abstract, and final)
          5), an ordered list of the fully qualified names of any direct superinterfaces
          How fully qualified names are represented in memory is defined by the virtual machine implementer. In addition, the Java virtual machine stores the following information for each type:
          1), the constant pool for the type
          2), field information
          3), method information
          4), all class (static) variables declared in the type, except constants
          5), a reference to class ClassLoader
          6), a reference to class Class

          1), the constant pool for the type
          The constant pool is an ordered set of the constants used by the type, including literals (strings, integers, and floating-point constants) and symbolic references to types, fields, and methods. Each entry in the constant pool is referred to by an index, just like an element of an array. Because it holds the symbolic references to all types, fields, and methods used by the type, the constant pool plays a central role in dynamic linking. See Chapter 6, "The Java Class File," for details.
          2), field information
          For each field: the field name, the field type, the field modifiers (some subset of public, private, protected, static, final, volatile, transient), and the order in which the fields are declared in the class.
          3), method information
          For each method: the method name, the return type (or void), the number and types of the parameters (in order), the method modifiers (some subset of public, private, protected, static, final, synchronized, native, abstract), and the order in which the methods are declared in the class.
          If the method is neither abstract nor native, the following is also stored: the bytecode of the method, the sizes of the method's operand stack and local variable area (described later), and the exception table (for details, see Chapter 17, "Exceptions").
          4), class (static) variables
          Class variables are shared by all instances of a class and can be accessed even when no instance exists. They are bound to the class, not to its instances, so they are logically part of the class data. Before the Java virtual machine can use a class, it must allocate memory for the class's non-final class variables.
          Constants (class variables declared final) are handled differently from non-final class variables. Every type that uses a constant gets its own copy of it in its own constant pool. Like class variables, constants are stored in the method area, but inside the constant pool. (In other words, a class variable is shared through the type that declares it, while each type that uses a constant holds its own copy.) Non-final class variables are stored as part of the data of the type that declares them, while final constants are stored as part of the data of every type that uses them. For details, see Chapter 6, "The Java Class File."
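The different treatment of constants and ordinary class variables can be seen in a small sketch. ConstantDemo and Holder are hypothetical names. The compiler copies the value of a compile-time constant into the code that uses it, so reading it does not initialize the declaring class, whereas reading a non-final class variable does.

```java
class ConstantDemo {
    static boolean holderInitialized = false;

    static class Holder {
        static final int CONSTANT = 42;    // compile-time constant, copied into users
        static int variable = 7;           // ordinary class variable, stored with Holder
        static { holderInitialized = true; }
    }

    // Reads the constant first, then the class variable, noting Holder's state each time.
    static String run() {
        int c = Holder.CONSTANT;                   // no access to Holder at run time
        boolean afterConstant = holderInitialized;
        int v = Holder.variable;                   // real field access, initializes Holder
        boolean afterVariable = holderInitialized;
        return c + ":" + afterConstant + ", " + v + ":" + afterVariable;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On the first call in a fresh virtual machine, the flag is expected to be false after reading the constant and true after reading the variable.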
          5), a reference to class ClassLoader
          For each type it loads, the Java virtual machine must record whether the type was loaded by the primordial class loader or by a class loader object. For types loaded by a class loader object, it must store a reference to that class loader. This information is used during dynamic linking: when one type refers to another, the virtual machine requests the referenced type from the same class loader that loaded the referencing type. This is also how the virtual machine maintains separate namespaces. For details, see Chapter 8, "The Linking Model."
          6), a reference to class Class
          The Java virtual machine creates an instance of java.lang.Class for each loaded type. You can also find or load a class with the static method public static Class forName(String className) of class Class, which returns the corresponding Class instance. Through this Class instance you can access the information stored in the method area of the Java virtual machine. For details, refer to the Javadoc of class Class.
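Much of the type information listed in this section can be read back through a Class instance. The following sketch uses only standard reflection calls; TypeInfoDemo and describe are names chosen for the example.

```java
import java.lang.reflect.Modifier;

class TypeInfoDemo {
    // Summarizes the type information reachable from a Class instance.
    static String describe(String className) {
        try {
            Class<?> c = Class.forName(className);
            Class<?> sup = c.getSuperclass();      // null for interfaces and Object
            return c.getName()
                    + " extends " + (sup == null ? "(none)" : sup.getName())
                    + ", " + (c.isInterface() ? "interface" : "class")
                    + ", modifiers: " + Modifier.toString(c.getModifiers())
                    + ", superinterfaces: " + c.getInterfaces().length
                    + ", loader: " + c.getClassLoader();
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("no such class: " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("java.lang.String"));
        System.out.println(describe("java.lang.Runnable"));
    }
}
```

For types loaded by the primordial class loader, getClassLoader() returns null, which is how the distinction described in item 5) shows up in the API.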
     2. Method Tables
     In order to access the data stored in the method area efficiently, its storage structures must be carefully designed. In addition to the raw type information described above, a method area may contain data structures designed to speed up access, such as method tables. For each loaded non-abstract class, the Java virtual machine may generate a method table: an array of references to all the instance methods that can be invoked on an instance of the class, including those inherited from superclasses. For details, see Chapter 8, "The Linking Model."
5. Heap:

When a Java program creates an instance of a class or an array, it allocates memory for the new object in the heap. There is only one heap in the virtual machine, and all threads share it.
     1. Garbage Collection
     Garbage collection is the main method of releasing objects that are not referenced. It may also move objects to reduce heap fragmentation. Garbage collection is not strictly defined in the specification of the Java Virtual Machine, only that an implementation of a Java Virtual Machine must manage its own heap in some way. See Chapter 9, "Garbage Collection" for details.
     2. Object Representation
     The Java virtual machine specification does not define how objects are represented in the heap. The main data of each object are the instance variables declared in its class and in all of its superclasses. Given an object reference, the virtual machine must be able to locate the object's instance data quickly. In addition, there must be a way to get from an object reference to the object's class data in the method area, so the memory allocated for an object usually includes some form of pointer into the method area.
     One possible heap design divides the heap into two parts: a handle pool and an object pool. An object reference is a native pointer to an entry in the handle pool. Each handle pool entry has two parts: a pointer to the object's data in the object pool and a pointer to the object's class data in the method area. This design makes it easier to defragment the heap: when the virtual machine moves an object in the object pool, it only needs to update the pointer in the corresponding handle pool entry. The cost is that every access to the object's data requires dereferencing two pointers. The diagram below demonstrates the design of such a heap. The HeapOfFish applet in Chapter 9, "Garbage Collection," demonstrates this design.
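The handle-based design can be sketched with two arrays standing in for the two pools. This is only a toy model under simplifying assumptions (each "object" is a single int, and all names are invented for this example), not how any particular virtual machine lays out its heap.

```java
class HandleHeapDemo {
    private final int[] objectPool = new int[16];    // object data, one int per "object"
    private final int[] handleToSlot = new int[8];   // handle pool: handle -> slot in objectPool
    private int nextHandle = 0;

    // Stores a value at the given slot and returns a handle that refers to it.
    int allocate(int slot, int value) {
        objectPool[slot] = value;
        handleToSlot[nextHandle] = slot;
        return nextHandle++;
    }

    // Reading through a handle costs two lookups: handle -> slot, then slot -> data.
    int read(int handle) {
        return objectPool[handleToSlot[handle]];
    }

    // Moving an object, as a compacting collector would, only updates its handle entry.
    void move(int handle, int newSlot) {
        int oldSlot = handleToSlot[handle];
        objectPool[newSlot] = objectPool[oldSlot];
        objectPool[oldSlot] = 0;
        handleToSlot[handle] = newSlot;
    }

    int slotOf(int handle) {
        return handleToSlot[handle];
    }

    public static void main(String[] args) {
        HandleHeapDemo heap = new HandleHeapDemo();
        int h = heap.allocate(10, 99);
        heap.move(h, 0);                             // compact toward the start of the pool
        System.out.println("slot " + heap.slotOf(h) + " holds " + heap.read(h));
    }
}
```

Code that holds the handle h never notices the move, which is exactly the property that makes compaction cheap in this design.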
     In another heap design, an object reference is a native pointer directly to a block of data that contains the object's instance data and a pointer to the object's class data. This design makes object access faster, because only one pointer has to be dereferenced, but it makes moving objects much more complicated. The image below demonstrates this design.
     When a program attempts to cast an object to another type, the virtual machine must check whether the target type is the object's own class or one of its supertypes. It performs a similar check when the program uses the instanceof operator. When a program calls an instance method, the virtual machine must perform dynamic binding: it has to choose the method implementation based on the object's actual class. This also requires access to the object's class data.
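The three operations mentioned above (checked casts, instanceof, and dynamic binding) all depend on the runtime class of the object, as this small sketch shows. The Animal and Dog classes are hypothetical.

```java
class DispatchDemo {
    static class Animal {
        String sound() { return "..."; }
    }

    static class Dog extends Animal {
        @Override
        String sound() { return "Woof"; }
    }

    static String describe(Object o) {
        if (o instanceof Animal) {
            Animal a = (Animal) o;     // cast is checked against the runtime class
            return a.sound();          // method chosen by the runtime class, not the static type
        }
        return "not an animal";
    }

    // A cast to a type the object does not have fails at run time.
    static boolean castFails(Object o) {
        try {
            Dog d = (Dog) o;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new Dog()));
        System.out.println(describe("text"));
        System.out.println("Animal cast to Dog fails: " + castFails(new Animal()));
    }
}
```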
     Whichever design virtual machine implementers choose, they will probably keep something like a method table for each object, because it speeds up method invocation and is therefore important for the performance of the virtual machine. The virtual machine specification, however, does not require such a data structure. The figure below depicts this structure. It shows all the data structures associated with an object reference, including:
          1), a pointer to the type data
          2), the method table of the object. The method table is an array of pointers to the data of every instance method that may be invoked on the object. The method data consists of three parts: the sizes of the operand stack and of the local variable area of the method's stack frame; the bytecode of the method; and the exception table.
          Every object in the Java virtual machine is associated with a lock (mutex) used to synchronize multiple threads. At any time, only one thread can own an object's lock. A thread that owns the lock may acquire it again any number of times, but it must release the lock the same number of times before the lock actually becomes free. Many objects are never locked during their lifetime, so the lock data only needs to exist when required; many Java virtual machine implementations therefore do not include lock data in the object itself and create it only on demand. Besides the lock, every object is also logically associated with a wait set. Locks help threads work on shared data independently without interfering with one another, while wait sets help threads cooperate toward a common goal. Wait sets are used through the wait() and notify() methods of class Object.
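Both properties described above, the reentrant lock and the wait set, can be exercised with a few lines of code. The sketch below is illustrative; MonitorDemo and its method names are made up, while synchronized, wait(), notifyAll(), and Thread.holdsLock() are standard.

```java
class MonitorDemo {
    private final Object lock = new Object();
    private boolean ready = false;

    // The same thread may acquire a lock it already owns (reentrancy).
    boolean reentrant() {
        synchronized (lock) {
            synchronized (lock) {
                return Thread.holdsLock(lock);
            }
        }
    }

    // One thread waits in the lock's wait set until another thread notifies it.
    boolean handOff() {
        Thread notifier = new Thread(() -> {
            synchronized (lock) {
                ready = true;
                lock.notifyAll();
            }
        });
        synchronized (lock) {
            notifier.start();
            try {
                while (!ready) {       // loop guards against spurious wakeups
                    lock.wait();       // releases the lock while waiting
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return ready;
        }
    }

    public static void main(String[] args) {
        MonitorDemo demo = new MonitorDemo();
        System.out.println("reentrant: " + demo.reentrant());
        System.out.println("hand-off:  " + demo.handOff());
    }
}
```

The notifier thread cannot enter its synchronized block until the waiting thread calls wait(), because wait() is what releases the lock.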
     Garbage collection also needs to know whether each object in the heap is still referenced. The Java virtual machine specification states that the garbage collector runs an object's finalizer at most once, but the finalizer is allowed to make the object referenced again. When the object later becomes unreferenced a second time, the finalizer is not run again. The virtual machine therefore also has to record whether the finalizer of each object has already been run. For more information, see Chapter 9, "Garbage Collection."
     3. Array Representation
In Java, an array is a full-fledged object. Like any other object, it is stored in the heap and holds a reference to an instance of class Class. All arrays of the same dimension and element type share the same Class, regardless of their length. The name of the Class encodes the dimension and the element type. For example, the Class name of an array of ints is "[I", the Class name of a three-dimensional array of bytes is "[[[B", and the Class name of a two-dimensional array of Objects is "[[Ljava.lang.Object;".
     For each array, the heap must hold the length of the array, the array data, and a reference to the array's class data. Through an array reference, the virtual machine must be able to obtain the length of the array, access an element by index, and invoke the methods declared in class Object, which is the direct superclass of all array classes. See Chapter 6, "The Java Class File," for more information.
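The array class names quoted above can be checked directly with getClass().getName(). ArrayClassDemo and nameOf are illustrative names.

```java
class ArrayClassDemo {
    static String nameOf(Object array) {
        return array.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(nameOf(new int[3]));           // [I
        System.out.println(nameOf(new byte[1][1][1]));    // [[[B
        System.out.println(nameOf(new Object[2][2]));     // [[Ljava.lang.Object;

        // Arrays of the same element type and dimension share one Class, whatever their length.
        System.out.println(new int[3].getClass() == new int[100].getClass());   // true

        // Object is the direct superclass of every array class.
        System.out.println(new int[0].getClass().getSuperclass());              // class java.lang.Object
    }
}
```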
6. Basic structure:

From the logical structure of the Java platform, we can understand the JVM from the following figure:

From the above figure, we can clearly see the various logical modules contained in the Java platform, and we can also understand the difference between JDK and JRE.

The physical structure of the JVM itself

This figure shows the JVM memory structure

The JVM mainly consists of two subsystems and two components. The two subsystems are the Class Loader subsystem and the Execution Engine subsystem; the two components are the Runtime Data Area component and the Native Interface component.

The role of the Class Loader subsystem:

Given a fully qualified class name (such as java.lang.Object), it loads the contents of the class file into the method area of the Runtime Data Area. Java programmers can extend the java.lang.ClassLoader class to write their own class loaders.

The role of the Execution Engine subsystem:

It executes the instructions in the loaded classes. The core of any implementation of the JVM specification (JDK) is its Execution Engine. The relative quality of different JDKs, such as Sun's JDK and IBM's JDK, depends mainly on the quality of their Execution Engines.

Native Interface component:

It interacts with native libraries and serves as the interface to other programming languages. When you call a native method, you enter a world that is no longer restricted by the virtual machine, so it is easy to run into native heap OutOfMemory errors that the JVM cannot control.

Runtime Data Area component:

This is what we usually call the memory of the JVM. It is mainly divided into five parts:

1. Heap: there is only one heap in a Java virtual machine instance.

2. Method Area: the information of loaded classes is stored in the method area. When the virtual machine loads a type, it uses a class loader to locate the corresponding class file, reads the contents of the class file, and passes them to the virtual machine.

3. Java Stack: the virtual machine performs only two operations directly on the Java stack: pushing and popping frames.

4. Program Counter: each thread has its own pc register, which is created when the thread starts. The pc register always points to the next instruction to be executed; its content can be a native pointer or an offset from the first instruction of the method in the method area.

5. Native Method Stack: holds the entry addresses of the native methods being called.

In my opinion, these are the most important topics when learning about the JVM:

The entire process of Java code compilation and execution
JVM memory management and the garbage collection mechanism

The entire process of Java code compilation and execution
Java code compilation is carried out by the Java source code compiler. The flow chart is as follows:

The execution of Java bytecode is carried out by the JVM execution engine. The flow chart is as follows:

The entire process of Java code compilation and execution involves the following three important mechanisms:

Java source code compilation mechanism
Class loading mechanism
Class execution mechanism

Java source code compilation mechanism
Java source code compilation consists of the following three steps (javac -verbose prints messages about what the compiler is doing):

Parsing and entering symbols into the symbol table
Annotation processing
Semantic analysis and class file generation

The generated class file consists of the following parts:

Structural information. Contains the class file format version number and the number and size of each part.
Metadata. Corresponds to the declarations and constants in the Java source code. Contains the declarations of the class, its superclass, and its implemented interfaces, the field and method declarations, and the constant pool.
Method information. Corresponds to the statements and expressions in the Java source code. Contains the bytecode, the exception handler table, the sizes of the operand stack and local variable area, the type records of the operand stack, and debugging symbol information.

Class loading mechanism
JVM class loading is carried out by ClassLoader and its subclasses. The hierarchy and loading order are shown in the following figure:

1) Bootstrap ClassLoader / bootstrap class loader

Loads all classes in jre/lib/rt.jar under $JAVA_HOME. It is implemented in C++ and is not a subclass of ClassLoader.

2) Extension ClassLoader / extension class loader

Loads the jar packages that extend the Java platform, including jre/lib/ext/*.jar under $JAVA_HOME and the jar packages in the directories specified by -Djava.ext.dirs.

3) App ClassLoader / system class loader

Loads the jar packages and class directories specified in the classpath.

4) Custom ClassLoader / user-defined class loader (a subclass of java.lang.ClassLoader)

A ClassLoader that an application customizes to its own needs. For example, Tomcat and JBoss implement their own ClassLoaders in accordance with the J2EE specification.

When a class is requested, the loaders first check whether it has already been loaded. The check proceeds bottom-up, layer by layer, from the Custom ClassLoader to the Bootstrap ClassLoader. As soon as one ClassLoader has loaded the class, it is considered loaded, which ensures that the class is loaded only once across all of these ClassLoaders. The loading itself proceeds top-down: the upper layers try to load the class first, layer by layer.

Introduction and analysis of the parent delegation mechanism
It should be emphasized here that the JVM uses the parent delegation mechanism by default when loading classes. Put simply, when a class loader receives a request to load a class, it first delegates the task to its parent class loader, which does the same recursively. If the parent class loader can load the class, the request returns successfully; only when the parent cannot load the class does the loader try to load it itself.
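Parent delegation can be observed with a loader that records which requests actually reach it. In the sketch below, RecordingLoader is a hypothetical subclass that overrides findClass(), the hook that the default ClassLoader.loadClass() calls only after its parent has failed to find the class.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {
    // Records every class name that reaches findClass(), i.e. that no parent could load.
    static class RecordingLoader extends ClassLoader {
        final List<String> askedToFind = new ArrayList<>();

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            askedToFind.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    // Returns true if a parent loader satisfied the request before our loader was asked.
    static boolean parentHandles(String className) {
        RecordingLoader loader = new RecordingLoader();
        try {
            loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            // neither the parents nor this loader could find the class
        }
        return !loader.askedToFind.contains(className);
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String handled by a parent: " + parentHandles("java.lang.String"));
        System.out.println("no.such.Clazz handled by a parent:    " + parentHandles("no.such.Clazz"));

        // Walk up the delegation chain; the bootstrap loader is represented by null.
        for (ClassLoader cl = ClassLoader.getSystemClassLoader(); cl != null; cl = cl.getParent()) {
            System.out.println(cl);
        }
    }
}
```

A request for java.lang.String never reaches findClass(), because a loader higher in the chain answers it first. The loaders printed by the final loop depend on the Java version in use.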

Class execution mechanism
The JVM executes class bytecode on a stack-based architecture. When a thread is created, it gets a program counter (PC) and a stack. The program counter stores the offset, within the current method, of the next instruction to execute. The stack stores stack frames, one for each method invocation. Each stack frame consists of two parts: the local variable area and the operand stack. The local variable area stores the method's local variables and parameters, and the operand stack stores the intermediate results produced while the method executes.
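The interplay of program counter, local variable area, and operand stack can be sketched with a toy interpreter. The three opcodes below are invented for this example (they are only loosely modeled on iload, iadd, and imul) and do not correspond to real bytecode values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackMachineDemo {
    // Hypothetical opcodes for this sketch only.
    static final int LOAD = 0, ADD = 1, MUL = 2;

    // Executes one "method": code is a flat array of opcodes and operands.
    static int execute(int[] code, int[] locals) {
        Deque<Integer> operands = new ArrayDeque<>();   // operand stack of this frame
        int pc = 0;                                     // offset of the next instruction
        while (pc < code.length) {
            switch (code[pc++]) {
                case LOAD:
                    operands.push(locals[code[pc++]]);  // operand is a local variable index
                    break;
                case ADD:
                    operands.push(operands.pop() + operands.pop());
                    break;
                case MUL:
                    operands.push(operands.pop() * operands.pop());
                    break;
                default:
                    throw new IllegalStateException("unknown opcode at offset " + (pc - 1));
            }
        }
        return operands.pop();                          // result is left on top of the stack
    }

    public static void main(String[] args) {
        int[] locals = {2, 3, 4};
        // Computes (locals[0] + locals[1]) * locals[2]
        int[] code = {LOAD, 0, LOAD, 1, ADD, LOAD, 2, MUL};
        System.out.println(execute(code, locals));      // prints 20
    }
}
```

No instruction names a register: every intermediate value (here 2, 3, 5, and 4) lives on the operand stack until another instruction consumes it.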

Memory management and garbage collection
JVM memory structure
JVM memory consists of the heap, the stack, the native method stack, the method area, and other parts. The structure diagram is as follows:

JVM memory reclamation

Sun's JVM uses generational collection. The principle is as follows: objects are divided into the Young, Tenured, and Perm generations, and different algorithms are applied to objects with different life cycles (based on an analysis of object lifetimes).

1. Young (young generation)

The young generation is divided into three areas: one Eden area and two Survivor areas. Most objects are created in the Eden area. When the Eden area is full, the surviving objects are copied to one of the two Survivor areas. When that Survivor area is full, its surviving objects are copied to the other Survivor area. When the second Survivor area is also full, the objects that were copied from the first Survivor area and are still alive are copied to the Tenured area. Note that the two Survivor areas are symmetrical and have no fixed order, so the same Survivor area may simultaneously hold objects copied from Eden and objects copied from the other Survivor area, whereas only objects that came from a Survivor area are copied to the old generation. In addition, one of the Survivor areas is always empty.

2. Tenured (old generation)

The old generation holds objects that have survived the young generation. Generally speaking, it holds objects with longer lifetimes.

3. Perm (permanent generation)

Used to store static data, such as Java classes and methods. The permanent generation has no significant impact on garbage collection, but some applications, such as Hibernate, dynamically generate or load classes. In that case a relatively large permanent generation is needed to hold the classes added at run time. The permanent generation size is set with -XX:MaxPermSize=.

For example: when a program creates objects, ordinary objects are allocated in the young generation, while very large objects may be allocated directly in the old generation. (In one observed program, a 10-megabyte buffer was created for each message sent or received, and that memory was allocated directly in the old generation.) When the young generation runs out of space, a collection is triggered: part of the memory is reclaimed, and the surviving objects are copied to the from area of the Survivor space. After several collections, when the from area is also full, it is collected as well and the remaining objects are copied to the to area. When the to area is full too, another collection takes place and the surviving objects are copied to the old generation.

When we talk about JVM memory reclamation, we usually mean reclamation of heap memory, since only the contents of the heap are allocated dynamically. The young and old generations described above therefore belong to the JVM's heap space, while the permanent generation is the method area mentioned earlier and is not part of the heap.
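The split between heap and non-heap memory can be inspected at run time through the standard java.lang.management API. The sketch below lists the memory pools of the running virtual machine; MemoryPoolsDemo is an illustrative name, and the pool names it prints depend on the JVM version and the garbage collector in use, so they will not necessarily match the Young, Tenured, and Perm names used above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolsDemo {
    // Returns the names of all memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```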

Some suggestions on JVM memory management
1. Manually set references to objects that are no longer needed, including intermediate objects, to null so that their memory can be reclaimed sooner.

2. Use object pooling. If the objects being created are reusable and differ only in their attributes, consider an object pool to reduce object creation. When a free object is available, it is taken from the pool instead of creating a new one, which greatly improves object reuse.

3. Tune the JVM. Configuring JVM parameters can speed up garbage collection. If there is no memory leak and the two methods above still cannot keep memory reclamation under control, consider JVM tuning, for example with parameters such as -Xnoclassgc. This must be backed by long-running tests on the physical machine, because different parameters can have very different effects.
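The object pooling suggestion from item 2 can be sketched as follows. This is a minimal, hypothetical pool (ObjectPoolDemo and Pool are names invented here), not a production design: it does not limit the pool size or reset the state of returned objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

class ObjectPoolDemo {
    static class Pool<T> {
        private final Deque<T> free = new ArrayDeque<>();
        private final Supplier<T> factory;
        private int created = 0;

        Pool(Supplier<T> factory) {
            this.factory = factory;
        }

        // Hands out a free object if one exists; otherwise creates a new one.
        synchronized T acquire() {
            T obj = free.poll();
            if (obj == null) {
                obj = factory.get();
                created++;
            }
            return obj;
        }

        // Returns an object to the pool so that a later acquire() can reuse it.
        synchronized void release(T obj) {
            free.push(obj);
        }

        synchronized int createdCount() {
            return created;
        }
    }

    public static void main(String[] args) {
        Pool<StringBuilder> pool = new Pool<>(StringBuilder::new);
        StringBuilder first = pool.acquire();
        first.setLength(0);                 // callers reset attributes before reuse
        pool.release(first);
        StringBuilder second = pool.acquire();
        System.out.println("reused: " + (first == second) + ", created: " + pool.createdCount());
    }
}
```

Pooling trades allocation and collection work for the bookkeeping of the pool itself, so it tends to pay off mainly for objects that are expensive to create.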
