Virtual machine bytecode execution engine (Part 1)

Compiling code into bytecode instead of native machine code is a small step in the development of storage formats, but a giant leap in the development of programming languages.

Overview

The execution engine is one of the core components of the Java Virtual Machine. "Virtual machine" is a concept defined relative to "physical machine": both kinds of machine can execute code. The difference is that a physical machine's execution engine is built directly on the processor, the hardware, the instruction set and the operating system, whereas a virtual machine's execution engine is implemented in software. A virtual machine can therefore define its own instruction set and execution engine architecture, and it can execute instruction set formats that the hardware does not support directly.

The Java Virtual Machine Specification defines a conceptual model of the bytecode execution engine, and this model serves as a uniform facade over the execution engines of all virtual machine implementations. Inside a given virtual machine, the execution engine may run Java code by interpretation (through an interpreter), by compilation (generating native code with a just-in-time compiler), or by a combination of both, and it may even contain several compilers at different optimization levels. From the outside, however, every Java virtual machine execution engine looks the same: the input is a bytecode file, the processing is equivalent to parsing and executing that bytecode, and the output is the execution result.

Runtime stack frame structure

A stack frame is the data structure the virtual machine uses to support method invocation and method execution. It is an element of the virtual machine stack in the runtime data area. A stack frame stores the method's local variable table, operand stack, dynamic link, return address and other information. The process from a method's invocation to its completion corresponds to its stack frame being pushed onto, and later popped off, the virtual machine stack.

Each stack frame contains a local variable table, an operand stack, a dynamic link, a method return address and some additional information. When the program is compiled, the size of the local variable table and the maximum depth of the operand stack are fully determined and written into the Code attribute in the method table. The amount of memory a stack frame needs is therefore not affected by runtime variable data; it depends only on the specific virtual machine implementation.

A thread's method call chain can be long, with many methods in the middle of execution at the same time. For the execution engine, only the frame at the top of the active thread's stack is valid. It is called the current stack frame, and the method associated with it is called the current method. Every bytecode instruction the execution engine runs operates only on the current stack frame.
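The call chain and the notion of a current method can be observed from Java code. The sketch below assumes Java 9 or later, where `StackWalker` reports the calling thread's frames starting from the method that calls `walk`, so the first entry is the current method. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: list the method names on the current thread's stack, top frame first.
class FrameChainDemo {

    // The first element is frameNames itself, i.e. the current method.
    static List<String> frameNames() {
        return StackWalker.getInstance().walk(frames ->
                frames.map(StackWalker.StackFrame::getMethodName)
                      .collect(Collectors.toList()));
    }

    // A short call chain: a -> b -> c -> frameNames
    static List<String> a() { return b(); }
    static List<String> b() { return c(); }
    static List<String> c() { return frameNames(); }

    public static void main(String[] args) {
        // The list starts with [frameNames, c, b, a, main, ...]
        System.out.println(a());
    }
}
```

StackWalker reports logical Java frames, so the order stays the same even if the JIT has inlined some of these methods.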

The following sections describe the role and data structure of each part of the stack frame: the local variable table, the operand stack, the dynamic link and the method return address.

Local variable table

The local variable table is a storage area for variable values that holds the method's parameters and the local variables defined inside the method. When a Java program is compiled into a class file, the max_locals item of the method's Code attribute records the maximum capacity of the local variable table the method needs.

The capacity of the local variable table is measured in variable slots (Slot), its smallest unit. The virtual machine specification does not state how much memory a slot occupies; it only says, rather suggestively, that each slot should be able to hold a value of type boolean, byte, char, short, int, float, reference or returnAddress. All eight of these types can be stored in 32 bits of physical memory or less, but this wording differs from stating outright that "each slot occupies 32 bits of memory": it allows the slot size to vary with the processor, operating system or virtual machine. Even if a 64-bit virtual machine uses 64 bits of physical memory to implement a slot, it must still use alignment and padding so that its slots look the same as those of a 32-bit virtual machine.

For 64-bit data types, the virtual machine allocates two consecutive slots, aligned on the high-order word. The only data types in the Java language that are explicitly 64 bits wide are long and double (a reference may be either 32 or 64 bits). Splitting a long or double across two slots resembles the "non-atomic treatment of double and long" in the Java memory model, which allows a single 64-bit read or write to be split into two 32-bit operations. However, because the local variable table lives on the thread's stack and is private to that thread, it does not matter whether reading or writing two consecutive slots is atomic: no data safety problem can arise.

The virtual machine accesses the local variable table by index, with indices ranging from 0 to the number of slots minus one. For a 32-bit variable, index n refers to slot n; for a 64-bit variable, index n means the variable uses both slot n and slot n + 1. Neither of the two adjacent slots that together hold one 64-bit value may be accessed on its own in any way. The Java Virtual Machine Specification explicitly requires that if a bytecode sequence performs such an access, the virtual machine throw an exception during the verification phase of class loading.
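To make the two-slot rule concrete, here is a toy model of a local variable table made of 32-bit slots. It is a hypothetical illustration, not how any JVM stores locals: it assumes the high-order word goes in the lower-numbered slot, and it throws at run time where a real JVM would reject the bytecode during verification.

```java
// Hypothetical model: a local variable table of 32-bit slots where a long
// occupies slots n and n + 1, and neither half may be read on its own.
class LocalVariableTable {
    private final int[] slots;          // each slot holds 32 bits
    private final boolean[] startsWide; // slot n holds the first half of a 64-bit value
    private final boolean[] endsWide;   // slot n holds the second half of a 64-bit value

    LocalVariableTable(int maxLocals) {
        slots = new int[maxLocals];
        startsWide = new boolean[maxLocals];
        endsWide = new boolean[maxLocals];
    }

    // Overwriting either half of a 64-bit pair invalidates the whole pair.
    private void release(int n) {
        if (startsWide[n]) { startsWide[n] = false; endsWide[n + 1] = false; }
        if (endsWide[n]) { endsWide[n] = false; startsWide[n - 1] = false; }
    }

    void storeInt(int n, int value) {
        release(n);
        slots[n] = value;
    }

    int loadInt(int n) {
        if (startsWide[n] || endsWide[n]) {
            // A real JVM rejects this at verification time; the model fails at run time.
            throw new IllegalStateException("slot " + n + " is half of a 64-bit value");
        }
        return slots[n];
    }

    void storeLong(int n, long value) {
        release(n);
        release(n + 1);
        slots[n] = (int) (value >>> 32); // assumed: high-order word first
        slots[n + 1] = (int) value;      // low-order word
        startsWide[n] = true;
        endsWide[n + 1] = true;
    }

    long loadLong(int n) {
        if (!startsWide[n]) throw new IllegalStateException("no 64-bit value starts at slot " + n);
        return ((long) slots[n] << 32) | (slots[n + 1] & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        LocalVariableTable locals = new LocalVariableTable(3);
        locals.storeInt(0, 7);                    // int in slot 0
        locals.storeLong(1, 0x123456789ABCDEF0L); // long in slots 1 and 2
        System.out.println(locals.loadInt(0));    // 7
        System.out.println(Long.toHexString(locals.loadLong(1))); // 123456789abcdef0
        try {
            locals.loadInt(2); // the second half of the long
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```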

When a method executes, the virtual machine uses the local variable table to pass argument values to the parameter variables. For an instance method (a non-static method), the slot at index 0 holds, by default, a reference to the object instance the method belongs to, and the method accesses this implicit parameter through the keyword this. The remaining parameters are laid out in the order of the parameter list, starting at slot 1. Once the parameters have been allocated, the remaining slots are assigned according to the order and scope of the variables defined in the method body.
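The layout rule for parameters can be written as a small function. This is a sketch under the rules just described; it identifies parameter types by JVM descriptor letters (I for int, J for long, D for double, L for a reference), and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: compute the starting slot of each parameter from its descriptor letter.
class ParameterSlots {

    static List<Integer> startSlots(boolean isStatic, char... paramTypes) {
        List<Integer> starts = new ArrayList<>();
        int next = isStatic ? 0 : 1; // slot 0 holds `this` in an instance method
        for (char type : paramTypes) {
            starts.add(next);
            next += (type == 'J' || type == 'D') ? 2 : 1; // long and double take two slots
        }
        return starts;
    }

    public static void main(String[] args) {
        // Instance method m(int a, long b, Object c): this -> 0, a -> 1, b -> 2 and 3, c -> 4
        System.out.println(startSlots(false, 'I', 'J', 'L')); // [1, 2, 4]
    }
}
```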

To save stack frame space, slots in the local variable table can be reused. The scope of a variable defined in the method body does not necessarily cover the whole method body; if the current value of the PC counter has moved beyond a variable's scope, that variable's slot can be handed over to other variables. Besides saving stack frame space, however, this design has side effects: in some cases, slot reuse directly affects the system's garbage collection behavior. Consider the following three examples:

Example 1:

    public static void main(String[] args) {
        byte[] placeholder = new byte[64 * 1024 * 1024]; // 64 MB
        System.gc();
    }

The code is very simple: it fills 64 MB of memory with data and then asks the virtual machine to run garbage collection. Adding "-verbose:gc" to the virtual machine's startup parameters lets us watch the collection process, and it shows that the 64 MB was not reclaimed after System.gc() ran. That is understandable: when System.gc() executes, the variable placeholder is still in scope, so the virtual machine naturally does not reclaim the memory it refers to. Next, we modify the code into a second version:

    public static void main(String[] args) {
        {
            byte[] placeholder = new byte[64 * 1024 * 1024];
        }
        System.gc();
    }

With the braces added, the scope of placeholder is limited to the block inside them. Logically, placeholder can no longer be accessed by the time System.gc() runs, yet running this program shows that the 64 MB is still not reclaimed. Why?

Before explaining why, let's modify the code a second time by adding the line int a = 0; before the call to System.gc(), which gives a third version:

    public static void main(String[] args) {
        {
            byte[] placeholder = new byte[64 * 1024 * 1024];
        }
        int a = 0; // takes over the slot placeholder used to occupy
        System.gc();
    }

This change looks baffling, but when we run the program, the memory really is reclaimed this time. In all three examples, what decides whether placeholder can be reclaimed is whether a slot in the local variable table still holds a reference to the placeholder array object. In the first modification, the code had left the scope of placeholder, but nothing read or wrote the local variable table afterwards, so the slot placeholder had occupied was not yet reused by another variable, and the local variable table, as part of the GC Roots, still referenced the array. In the third version, int a = 0 takes over that slot and overwrites the reference. Failing to break such a reference promptly usually has only a slight effect. However, if a method defines a variable that occupies a large amount of memory and will not actually be used again, and the code after it performs some very time-consuming operation, then manually setting that variable to null (which does the same job as int a = 0: it clears the variable's slot in the local variable table) is not necessarily pointless. It can serve as a "clever trick" in very special circumstances: the object occupies a lot of memory, the method's stack frame cannot be reclaimed for a long time, and the method is not invoked often enough to meet the JIT compilation threshold.
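The slot reuse that makes the third version work can be modeled with a tiny allocator. This is a sketch assuming a compiler that releases a block's slots when the block ends (javac behaves roughly this way); the class and its methods are hypothetical and only track slot indices, not values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of block-scoped slot allocation: when a block ends its
// slots become free, so a variable declared afterwards reuses them.
class SlotAllocator {
    private final Deque<Integer> blockStarts = new ArrayDeque<>();
    private int next;      // next free slot index
    private int maxLocals; // high-water mark, what max_locals would record

    SlotAllocator(int firstFreeSlot) {
        next = firstFreeSlot;
        maxLocals = firstFreeSlot;
    }

    void enterBlock() { blockStarts.push(next); }

    void exitBlock() { next = blockStarts.pop(); } // release the block's slots

    // width: 1 for 32-bit types and references, 2 for long and double
    int declare(int width) {
        int slot = next;
        next += width;
        maxLocals = Math.max(maxLocals, next);
        return slot;
    }

    int maxLocals() { return maxLocals; }

    public static void main(String[] args) {
        // Third version of the example: static main, so args occupies slot 0.
        SlotAllocator alloc = new SlotAllocator(1);
        alloc.enterBlock();
        int placeholder = alloc.declare(1);
        alloc.exitBlock();
        int a = alloc.declare(1); // lands in placeholder's old slot
        System.out.println(placeholder + " " + a + " " + alloc.maxLocals()); // 1 1 2
    }
}
```

In the second version no variable is declared after the block, so slot 1 is never overwritten and keeps pointing at the array.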

Although these examples show that assigning null is indeed useful in some cases, the author's view is that one should not rely heavily on it, let alone promote it as a universal coding rule. There are two reasons. From a coding perspective, controlling when a variable can be reclaimed by giving it an appropriate scope is the most elegant solution, and scenarios like the third example are rare. More importantly, from an execution perspective, using null assignment to optimize memory reclamation relies on the conceptual model of the bytecode execution engine, and the conceptual model and the actual execution process are equivalent only when viewed from the outside; internally they may be completely different. When the virtual machine executes code with the interpreter, its behavior is usually fairly close to the conceptual model, but JIT-compiled code is the virtual machine's main way of executing code, and a null assignment is eliminated by JIT optimization, at which point setting the variable to null means nothing. Once bytecode has been compiled to native code, GC Roots enumeration also differs greatly from enumeration during interpretation. In the examples above, after JIT compilation the second version already lets System.gc() reclaim the memory correctly, with no need to rewrite it as the third version.
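For the rare case where the idiom is still justified, it looks like the sketch below. It is illustrative only: System.gc() is merely a request, and whether the array is actually freed depends on the JVM, the collector and whether the method has been JIT-compiled. Running with -verbose:gc (or -Xlog:gc on JDK 9 and later) shows what happens on a given JVM. The class name and the size parameter are hypothetical.

```java
// Sketch of the "set it to null" idiom for a large, no-longer-used local.
class NullOutDemo {

    static long run(int megabytes) {
        byte[] placeholder = new byte[megabytes * 1024 * 1024];
        long size = placeholder.length; // last real use of the array
        placeholder = null;             // clear the slot so the frame no longer references the array
        System.gc();                    // only a request; collection is not guaranteed
        // ... a long-running operation would follow here ...
        return size;
    }

    public static void main(String[] args) {
        System.out.println(run(64));
    }
}
```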

There is one more property of the local variable table that can affect everyday development: local variables do not have a "preparation phase" the way class variables do. Class variables are assigned initial values twice: once in the preparation phase, when the system gives them their zero value, and once in the initialization phase, when they receive the initial value defined by the program. So even if the programmer never assigns a class variable during initialization, it still has a well-defined initial value. Local variables are different: a local variable that is defined but not assigned an initial value cannot be used. Do not assume that Java provides default values in every situation, such as 0 for an integer variable or false for a boolean. The compiler detects a read of an unassigned local variable at compile time and reports it; and even if a class file with that effect is produced some other way, for example by generating the bytecode by hand, bytecode verification will catch it and the class will fail to load.
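The contrast between class variables and local variables fits in a few lines. In the sketch below (class name hypothetical), the static fields receive their zero values during preparation, while the commented-out line is rejected by javac with a "variable a might not have been initialized" error.

```java
// Class variables have zero values from the preparation phase; locals do not.
class DefaultValues {
    static int counter;  // 0, set during preparation
    static boolean flag; // false, set during preparation

    public static void main(String[] args) {
        System.out.println(counter); // 0
        System.out.println(flag);    // false

        int a;
        // System.out.println(a);    // compile error: variable a might not have been initialized
        a = 1;
        System.out.println(a);       // 1
    }
}
```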

Operand stack

The operand stack, also called the operation stack, is a last-in, first-out (LIFO) stack. As with the local variable table, its maximum depth is written at compile time into the max_stack item of the Code attribute. Each element of the operand stack can be of any Java data type, including long and double: a 32-bit type occupies one unit of stack depth and a 64-bit type occupies two. At no point during the method's execution does the depth of the operand stack exceed the value set in max_stack.

When a method starts executing, its operand stack is empty. As the method runs, various bytecode instructions write values to and read values from the operand stack, that is, they push and pop. For example, arithmetic is carried out on the operand stack, and arguments are passed through the operand stack when calling other methods.

For example, when the integer addition instruction iadd runs, the two elements nearest the top of the operand stack must already hold two int values. Executing the instruction pops both int values, adds them, and pushes the sum back onto the stack.
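The iadd walk-through can be replayed with a toy operand stack. The sketch mirrors the instructions javac typically emits for an instance method `int add(int a, int b) { return a + b; }` (iload_1, iload_2, iadd, ireturn) and records the peak depth that max_stack must cover; the class and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy operand stack that executes iadd and tracks its peak depth.
class OperandStackDemo {
    private final Deque<Integer> stack = new ArrayDeque<>();
    private int maxDepth; // peak depth, which max_stack must cover

    void push(int value) {
        stack.push(value);
        maxDepth = Math.max(maxDepth, stack.size());
    }

    int pop() { return stack.pop(); }

    // iadd: pop two ints, push their sum (wraps on overflow, like the JVM)
    void iadd() {
        int right = pop(); // the element nearest the top
        int left = pop();
        push(left + right);
    }

    int depth() { return stack.size(); }

    int maxDepth() { return maxDepth; }

    public static void main(String[] args) {
        int a = 2, b = 3;         // stand-ins for local variable slots 1 and 2
        OperandStackDemo frame = new OperandStackDemo();
        frame.push(a);            // iload_1
        frame.push(b);            // iload_2
        frame.iadd();             // iadd
        int result = frame.pop(); // ireturn takes the value from the top
        System.out.println(result + " " + frame.maxDepth()); // 5 2
    }
}
```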

The data types of the operand stack elements must match the bytecode instruction sequence exactly. The compiler must guarantee this strictly when it compiles the program, and it is checked again by data-flow analysis in the verification phase of class loading. Taking iadd again: it performs integer addition, so when it executes, the two elements nearest the top of the stack must both be of type int; iadd cannot be used to add, for example, a long and a float.
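The verifier's check works on types rather than values. Below is a deliberately simplified sketch of that idea for iadd: it tracks an abstract stack of types and rejects a sequence whose two topmost entries are not both int. The instruction names are real JVM mnemonics, but the checker is hypothetical and ignores details such as long taking two units of stack depth.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Simplified type check in the spirit of the verifier's data-flow analysis.
class IaddTypeCheck {
    enum Type { INT, LONG, FLOAT }

    // Returns true if every iadd in the sequence finds two ints on top of the stack.
    static boolean verify(List<String> code) {
        Deque<Type> types = new ArrayDeque<>(); // abstract stack: types only, no values
        for (String insn : code) {
            switch (insn) {
                case "iconst_1": types.push(Type.INT); break;
                case "lconst_1": types.push(Type.LONG); break;
                case "fconst_1": types.push(Type.FLOAT); break;
                case "iadd": {
                    if (types.size() < 2) return false; // stack underflow
                    Type top = types.pop();
                    Type below = types.pop();
                    if (top != Type.INT || below != Type.INT) return false;
                    types.push(Type.INT);
                    break;
                }
                default:
                    throw new IllegalArgumentException("not modeled: " + insn);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(verify(List.of("iconst_1", "iconst_1", "iadd"))); // true
        System.out.println(verify(List.of("lconst_1", "fconst_1", "iadd"))); // false
    }
}
```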

In addition, in the conceptual model, two stack frames, as elements of the virtual machine stack, are completely independent of each other. Most virtual machine implementations, however, optimize this so that two frames partially overlap: part of the caller's operand stack overlaps part of the callee's local variable table, so that a method call can share this data without copying the arguments. The interpreter of the Java virtual machine is called a "stack-based execution engine", and the "stack" in that name is the operand stack.

Dynamic linking

Each stack frame contains a reference, into the runtime constant pool, to the method the frame belongs to; this reference exists to support dynamic linking during method invocation. The class file's constant pool contains many symbolic references, and the method invocation instructions in the bytecode take a symbolic reference to a method in the constant pool as their operand. Some of these symbolic references are converted into direct references during class loading or on first use; this conversion is called static resolution. The rest are converted into direct references during each execution, and this part is called dynamic linking.
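One visible consequence of choosing the target at run time is virtual dispatch. In the sketch below (all class names hypothetical), javac compiles `a.sound()` to an invokevirtual whose operand is the symbolic reference `Animal.sound:()Ljava/lang/String;`, yet the method that actually runs depends on the receiver passed in. In HotSpot the symbolic reference is typically resolved once and the override is then selected through a virtual method table, so treat this as an illustration of run-time target selection rather than of the exact resolution mechanism.

```java
// The same call site runs different methods depending on the receiver's class.
abstract class Animal {
    abstract String sound();
}

class Dog extends Animal {
    String sound() { return "woof"; }
}

class Cat extends Animal {
    String sound() { return "meow"; }
}

class DispatchDemo {
    // Compiled to invokevirtual Animal.sound:()Ljava/lang/String;
    static String speak(Animal a) {
        return a.sound();
    }

    public static void main(String[] args) {
        System.out.println(speak(new Dog())); // woof
        System.out.println(speak(new Cat())); // meow
    }
}
```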

Method return address

Once a method starts executing, there are only two ways to exit it. The first is that the execution engine encounters one of the method-return bytecode instructions. A return value may then be passed to the calling method (the method that invoked the current method is called the caller); whether there is a return value, and what its type is, depends on which return instruction is encountered. This way of exiting is called normal method invocation completion.

The other way is that an exception occurs during execution and is not handled within the method body. Whether the exception is generated internally by the Java virtual machine or thrown by an athrow bytecode instruction in the code, if no matching handler is found in the method's exception table, the method exits. This way of exiting is called abrupt method invocation completion. A method that exits through abrupt completion produces no return value for its caller.
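The two exits can be seen side by side from the caller's point of view. In this sketch (names hypothetical), `parse` completes normally and hands back a value, or, on bad input, lets the NumberFormatException from Integer.parseInt propagate; in that case it completes abruptly, returns nothing, and the caller's assignment never happens.

```java
// Normal completion returns a value; abrupt completion returns nothing.
class CompletionDemo {

    static int parse(String s) {
        return Integer.parseInt(s); // NumberFormatException is not handled here
    }

    static String tryParse(String s) {
        int value = -1;
        try {
            value = parse(s); // assigned only if parse completes normally
        } catch (NumberFormatException e) {
            // parse exited abruptly: no return value, so value is unchanged
            return "abrupt, value still " + value;
        }
        return "normal, value " + value;
    }

    public static void main(String[] args) {
        System.out.println(tryParse("42"));  // normal, value 42
        System.out.println(tryParse("x42")); // abrupt, value still -1
    }
}
```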

Whichever way a method exits, control must return to the place where the method was called before the program can continue, and the method may need to keep some information in its stack frame to help restore the execution state of the calling method. In general, when a method exits normally, the value of the caller's PC counter can serve as the return address, and the stack frame is likely to save this counter value. When a method exits abruptly, the return address is determined by the exception handler table, and this information is generally not saved in the stack frame.

Exiting a method is essentially equivalent to popping the current stack frame, so the operations performed on exit may include: restoring the caller's local variable table and operand stack, pushing the return value (if any) onto the operand stack of the caller's frame, and adjusting the PC counter to point to the instruction after the method invocation instruction.
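The normal-exit steps can be sketched with a toy frame stack. This is a hypothetical model, not how HotSpot lays out frames: each frame stores the caller's return address, counted as an instruction index (real invoke instructions are several bytes long, so real addresses are byte offsets). Returning pops the frame, pushes the result onto the caller's operand stack and resumes the caller after the invoke.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of invoking a method and returning normally with an int.
class ReturnAddressDemo {
    static final class Frame {
        final String method;
        final int returnPc; // where the caller resumes (instruction index)
        final Deque<Integer> operands = new ArrayDeque<>();
        int pc;             // index of the instruction being executed

        Frame(String method, int returnPc) {
            this.method = method;
            this.returnPc = returnPc;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>(); // the thread's VM stack

    Frame current() { return stack.peek(); }

    // Push a new frame that remembers the caller's next instruction.
    void invoke(String method) {
        Frame caller = current();
        int returnPc = (caller == null) ? -1 : caller.pc + 1;
        stack.push(new Frame(method, returnPc));
    }

    // Normal completion with an int result (like ireturn); assumes a caller exists.
    void returnInt(int value) {
        Frame finished = stack.pop();  // exiting the method pops its frame
        Frame caller = current();
        caller.operands.push(value);   // result onto the caller's operand stack
        caller.pc = finished.returnPc; // resume after the invoke instruction
    }

    public static void main(String[] args) {
        ReturnAddressDemo vm = new ReturnAddressDemo();
        vm.invoke("main");
        vm.current().pc = 4; // main is executing an invoke at index 4
        vm.invoke("add");
        vm.returnInt(5);
        Frame main = vm.current();
        System.out.println(main.method + " resumes at " + main.pc + " with " + main.operands.peek());
        // main resumes at 5 with 5
    }
}
```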

Additional information

The virtual machine specification allows a particular implementation to add information not described in the specification to the stack frame, for example debugging-related information. This information depends entirely on the specific virtual machine implementation. In practice, the dynamic link, the method return address and other additional information are generally grouped together and called stack frame information.
 


Origin blog.csdn.net/qq_37113604/article/details/90572841