Execution engine, escape analysis, and JIT (just-in-time compilation)

What is the execution engine?

The execution engine is a set of JVM subsystems that does the work while a program runs. The Java Virtual Machine Specification defines a conceptual model of the bytecode execution engine, and that model serves as the uniform facade of every virtual machine's execution engine. In a concrete implementation, the engine may execute Java code by interpreted execution (an interpreter walks the bytecode), by compiled execution (it runs native code produced by a just-in-time compiler), by both, or even with several tiers of compilers. Seen from the outside, however, all Java execution engines look the same: the input is a bytecode file, the processing is the parsing and execution of that bytecode, and the output is the execution result.

Two interpreters in Java

Why does Java have two kinds of interpreters? In the beginning, Java had only a bytecode interpreter, implemented as a C++ program: the bytecode we write had to go through that C++ interpreter before it became hard code (machine code) the CPU could run. This indirection is inefficient compared with generating machine code directly, and in the bytecode-interpreter era many C++ programmers looked down on Java programmers, since code written in Java needed C++ to interpret and execute it (a sentiment reportedly more prominent abroad). Eventually some clever engineers asked: can the JVM compile bytecode into hard code directly, without the detour through C++? That is how Java's template interpreter was born. So which does today's Java use by default, the bytecode interpreter or the template interpreter? In fact, Java uses the bytecode interpreter plus the template interpreter by default, as discussed below.

Just-in-time compilation

Bytecode interpreter

Java bytecode -> C++ code -> hard code (machine code, e.g. bytes like 05 06 07)
For example:

public class T0806 {

    public static void main(String[] args) {
        T0806_1 t = new T0806_1();
        System.out.println(t);
    }
}

class T0806_1 {

    public T0806_1() {
        System.out.println("init ...");
    }
}

The bytecode generated by the main method is:

0 new #2 <com/bml/t0816/T0806_1>
 3 dup
 4 invokespecial #3 <com/bml/t0816/T0806_1.<init>>
 7 astore_1
 8 getstatic #4 <java/lang/System.out>
11 aload_1
12 invokevirtual #5 <java/io/PrintStream.println>
15 return

So how does the bytecode interpreter work in C++?
It interprets the bytecode for us instruction by instruction, inside a for or while loop:

while (condition) {
    char code = xxx;   // fetch the next bytecode instruction
    switch (code) {
    case new:
        ....
        break;
    case dup:
        ....
        break;
    }
}
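The same dispatch-loop idea can be sketched in runnable Java with a made-up instruction set (the opcode names and the stack machine here are hypothetical illustrations, not real JVM opcodes or HotSpot code):

```java
public class MiniInterpreter {
    // Hypothetical opcodes for illustration only
    static final int PUSH = 1, ADD = 2, HALT = 0;

    // The dispatch loop: read one instruction at a time and switch on it,
    // just like the C++ bytecode interpreter's while/switch above.
    static int run(int[] program) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            int code = program[pc++];
            switch (code) {
                case PUSH:
                    stack[sp++] = program[pc++]; // operand follows the opcode
                    break;
                case ADD:
                    stack[sp - 2] = stack[sp - 2] + stack[sp - 1];
                    sp--;
                    break;
                case HALT:
                    return stack[sp - 1]; // top of stack is the result
                default:
                    throw new IllegalStateException("unknown opcode " + code);
            }
        }
    }

    public static void main(String[] args) {
        // PUSH 2, PUSH 3, ADD, HALT
        System.out.println(run(new int[]{PUSH, 2, PUSH, 3, ADD, HALT})); // 5
    }
}
```

Every instruction pays the cost of the fetch and the switch, which is exactly the overhead the template interpreter avoids by emitting machine code directly.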

In short, the loop reads each bytecode instruction of our code and performs the corresponding operation. Here is a fragment of the OpenJDK C++ source for the bytecode interpreter:

CASE(_new): {
    u2 index = Bytes::get_Java_u2(pc+1);
    ConstantPool* constants = istate->method()->constants();
    if (!constants->tag_at(index).is_unresolved_klass()) {
        // Make sure klass is initialized and doesn't have a finalizer
        Klass* entry = constants->slot_at(index).get_klass();
        assert(entry->is_klass(), "Should be resolved klass");
        Klass* k_entry = (Klass*) entry;
        assert(k_entry->oop_is_instance(), "Should be InstanceKlass");
        InstanceKlass* ik = (InstanceKlass*) k_entry;
        if ( ik->is_initialized() && ik->can_be_fastpath_allocated() ) {
 ……

The principle is the one described in the example above; we are not studying the OpenJDK source here — the snippet is posted only to clarify the principle.

Template interpreter

What needs to be done is: Java bytecode -> hard code (machine code)
1. Apply for a block of memory that is readable, writable, and executable (although macOS is Unix-based, its memory protection can forbid applying for executable memory, in which case this style of JIT cannot run there);
2. When a bytecode instruction such as new or dup is encountered, the template interpreter directly takes the pre-built hard code corresponding to that instruction;
3. Write the hard-coded instructions for new (or any other bytecode instruction) directly into the allocated memory;
4. Create a function pointer that points at that memory (function pointer -> pointer to the executable function);
5. When the program runs, the code is invoked directly through that function pointer.

Execution flow of the two interpreters

(figure: execution flow of the two interpreters)
The figure expresses graphically the same distinction described earlier: the bytecode interpreter produces hard code by going through C++ code, while the template interpreter generates hard code directly.

Three operating modes of JIT (just-in-time compilation)

-Xint: pure bytecode-interpreter mode
-Xcomp: pure template-interpreter (compiled) mode
-Xmixed: bytecode interpreter + template interpreter mode
So which mode does the JDK use by default, and which is most efficient?
The JDK defaults to -Xmixed, the mixed mode: both the bytecode interpreter and the template interpreter are present. The bytecode interpreter is used early in a program's life, and the template interpreter takes over later; the switchover is controlled by parameters described below.
Run this in a terminal:

$ java -version
java version "1.8.0_11"
Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)

You can see that our JDK runs the Server VM in mixed mode, i.e. bytecode interpreter + template interpreter. Of the three modes, the pure template interpreter executes fastest, but mixed mode is not far behind: when a program is small the bytecode interpreter gets going faster, and when a program is large the template interpreter wins, so the mixed mode gives the best overall performance.
If you need a non-default mode, just add the corresponding flag to the startup parameters.

Two just-in-time compilers in java

1. C1 compiler

Started in -client mode, where C1 is the default compiler. Its characteristics:
1. Less profiling data needs to be collected, i.e. the conditions that trigger just-in-time compilation are looser
2. It has fewer built-in compiler optimizations
3. Compilation consumes less CPU than C2, but the generated code runs slower than C2's

2. C2 compiler

Started in -server mode. Its characteristics:
1. More profiling data must be collected
2. Compilation is very CPU-intensive
3. It applies many more compiler optimizations
4. The generated code is highly efficient

3. Hybrid compilation

The current -server mode no longer uses C2 alone. Early in a program's run, little profiling data has been generated, so C1 compiles first; after the program has run for a while and enough data has been collected, the C2 compiler takes over.

JIT may be unavailable on a Mac when the process cannot apply for a memory block that is readable, writable, and executable at once; the hard code produced by just-in-time compilation is emitted through the template interpreter mechanism.

Trigger conditions for just-in-time compilation

The unit of just-in-time compilation is not always a whole function or method: a hot code block such as a for/while loop body can be compiled on its own (on-stack replacement).
We can view the invocation-count thresholds that trigger the just-in-time compiler with:

$ java -XX:+PrintFlagsFinal -version | grep CompileThreshold
     intx CompileThreshold                          = 10000           {pd product}
    uintx IncreaseFirstTierCompileThresholdAt      = 50              {product}
     intx Tier2CompileThreshold                     = 0               {product}
     intx Tier3CompileThreshold                     = 2000            {product}
     intx Tier4CompileThreshold                     = 15000           {product}
java version "1.8.0_11"
Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)

In client compiler mode, the default value of CompileThreshold is 1500; in server compiler mode it is 10000.
These are the JDK defaults. We can adjust this parameter when tuning the JVM, but be careful: unless you understand the consequences clearly, don't modify the default value.

There is a further wrinkle: suppose a code block is approaching the just-in-time compilation threshold but then is not executed for a long time — it drops back toward interpreted execution. This is the concept of heat decay.
A simple analogy is an app membership: if you don't renew after it expires, your accumulated growth value decays. Likewise, if a block has reached, say, 3000 invocations and then goes idle, its invocation counter decays (for example by halving each decay period), so the code must be executed correspondingly more times to reach the threshold again.
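The decay behavior can be illustrated with a toy model (this is not HotSpot's actual implementation; the halving rule, the threshold value, and the names are made up for illustration):

```java
public class CounterDecay {
    static final int THRESHOLD = 3000; // hypothetical compile threshold

    // Toy model of heat decay: each idle decay period halves the invocation
    // counter, so a method that stops being called "cools down" and must
    // re-earn its heat before it can be compiled.
    static int decay(int counter, int idlePeriods) {
        for (int i = 0; i < idlePeriods; i++) {
            counter /= 2;
        }
        return counter;
    }

    public static void main(String[] args) {
        int counter = 2000; // invocations seen so far, below THRESHOLD
        System.out.println(decay(counter, 1)); // 1000 after one idle period
        System.out.println(decay(counter, 3)); // 250 after three idle periods
    }
}
```

The effect is that only code which stays hot over time gets compiled, not code that merely accumulates calls slowly over the life of the process.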

Hot code buffer

What is the hot code? The hot code is our hard code, that is, the hard code generated by the template interpreter. After the template interpreter is enabled, it is impossible to generate a hard code for us every time, then it will cache it, so where is the cache? ?
Hot code is cached in our method area; then in the method area, will it overflow? Is oom abnormal?
In fact, it will not appear. It has an elimination mechanism inside it, which is similar to the elimination mechanism of redis, LRU, so there will be no OOM exceptions.
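The eviction idea is analogous to an LRU cache. A minimal Java sketch using LinkedHashMap (this is only an analogy for the eviction policy, not the JVM's actual code-cache flushing logic):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A tiny LRU cache: when capacity is exceeded, the least recently
// accessed entry is evicted, similar in spirit to code-cache flushing.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true -> LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict when over capacity
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("methodA", "compiled code A");
        cache.put("methodB", "compiled code B");
        cache.get("methodA");                    // touch A so B becomes eldest
        cache.put("methodC", "compiled code C"); // evicts B, least recently used
        System.out.println(cache.keySet());      // [methodA, methodC]
    }
}
```

Code that stays hot keeps its compiled form resident, while cold compiled code can be discarded and recompiled later if it heats up again.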
So how big is the hot-code cache? The JVM defaults are: the code cache starts at 2496KB in server compiler mode and at 160KB in client compiler mode.
Note: on 64-bit machines the JDK defaults to server mode and has no client mode; client mode exists only on 32-bit machines, enabled with java -client.

$ java -XX:+PrintFlagsFinal -version | grep CodeCache
    uintx CodeCacheExpansionSize                    = 65536           {pd product}
    uintx CodeCacheMinimumFreeSpace                 = 512000          {product}
    uintx InitialCodeCacheSize                      = 2555904         {pd product}
     bool PrintCodeCache                            = false           {product}
     bool PrintCodeCacheOnCompilation               = false           {product}
    uintx ReservedCodeCacheSize                     = 251658240       {pd product}
     bool UseCodeCacheFlushing                      = true            {product}
java version "1.8.0_11"
Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)

The CodeCache parameters are therefore also tuning knobs, but tune them only when the system architecture calls for it; otherwise the defaults are fine. The two parameters to adjust are:
InitialCodeCacheSize
ReservedCodeCacheSize
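You can also observe the code cache at runtime through the standard management API (the pool name varies by JDK version: JDK 8 exposes a single "Code Cache" pool, while JDK 9+ splits it into segments, so the snippet matches on a substring):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class CodeCacheInfo {
    public static void main(String[] args) {
        // Non-heap memory pools include the JIT code cache
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Code")) {
                System.out.println(pool.getName()
                        + " used=" + pool.getUsage().getUsed()
                        + " max=" + pool.getUsage().getMax());
            }
        }
    }
}
```

Watching the used size grow as a program warms up is a simple way to see hot-code caching in action.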

How does just-in-time compilation work?

Just-in-time compilation, like GC, relies on the JVM's internal threads. A compile queue is maintained internally; when a request is enqueued, a compiler thread picks it up, and compilation proceeds asynchronously with respect to the running program. So how many compiler threads are there, and how can we tune them?

$ java -XX:+PrintFlagsFinal -version | grep CICompilerCount
     intx CICompilerCount                           = 2               {product}
     bool CICompilerCountPerCPU                     = true            {product}
java version "1.8.0_11"
Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)

Look at CICompilerCount: there are only 2 just-in-time compilation threads here, and this parameter is tunable.
Here is a case we encountered years ago. Our banking system used a hot/cold standby setup: the system ran on the hot machine, with the cold machine standing by for disaster recovery. When the hot machine failed and traffic was switched to the cold machine room, a large number of requests became slow, the system froze, the transaction failure rate shot up, and in the end the system went down outright.
I investigated for a long time without finding the cause: the two machines had identical hardware, identical program configuration, and optimally tuned system parameters. Then, by chance, I looked at the JVM's hot-code cache and it suddenly became clear: the hot machine had accumulated a large amount of cached hot code and could withstand high concurrency, while the cold machine had served no requests. When a flood of concurrent transactions suddenly arrived, they executed slowly; once concurrency passed a certain level, requests could not be processed in time and the machine went down.

Why is Java a semi-interpreted and semi-compiled language

1. javac compiles, java runs;
2. The bytecode interpreter interprets and executes; the template interpreter compiles and executes.

Escape analysis

The term escape analysis can be understood as two words: escape, and analysis (escape analysis is enabled by default).

Under what circumstances does an escape occur, and why does it need to be analyzed?
What is escape?
Put bluntly: shared variables, return values, parameters, and so on — in one sentence, the variable is not purely local. For example, a method's return value escapes the moment the method finishes, and we have no way to control who uses it afterwards; the same reasoning applies to shared variables. To sum up, there are two situations:
escaping the method, and
escaping the thread.
Both of these are escapes.
What is non-escape?
An object whose scope stays local, so it has no way to escape.
Analysis:
Analysis is the technique itself. Why analyze whether objects escape?
Because escape information enables optimization of how the code executes.
Based on escape analysis, the JVM implements three optimization techniques.

Scalar substitution

Scalar: cannot be subdivided; Java's eight primitive data types are scalars.
Aggregate: can be subdivided; an object is an aggregate.

class T0806_1 {

    public static int x = 2;
    public static int y = 3;

    public T0806_1() {
        System.out.println("init ...");
    }
}

class T2 {

    public static void main(String[] args) {
        System.out.println(T0806_1.x);
        System.out.println(T0806_1.y);
    }
}

Pay attention to these two lines:

System.out.println(T0806_1.x);
System.out.println(T0806_1.y);

These two lines of code are optimized. At run time, what effectively executes is

System.out.println(2);
System.out.println(3);

This is the effect of scalar substitution (strictly speaking, folding compile-time-known values like this is constant propagation; scalar substitution proper decomposes a non-escaping object into its primitive fields).
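A closer illustration of scalar substitution proper: a non-escaping object can be decomposed into its fields, so no object needs to be allocated at all. The class and method names below are made up for illustration; the second method shows what the JIT conceptually turns the first into:

```java
public class ScalarReplacementDemo {
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Before optimization: allocates a Point that never escapes the method
    static int sumWithObject() {
        Point p = new Point(1, 2);
        return p.x + p.y;
    }

    // What scalar substitution conceptually produces: the aggregate is
    // split into two scalars, and no object is allocated
    static int sumWithScalars() {
        int px = 1;
        int py = 2;
        return px + py;
    }

    public static void main(String[] args) {
        System.out.println(sumWithObject());  // 3
        System.out.println(sumWithScalars()); // 3
    }
}
```

Both methods compute the same result; the optimized form simply avoids the allocation and the field indirection.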

Lock elimination

What is lock elimination? It means the lock is useless: when the JVM performs escape analysis, it eliminates such locks.
For example, look at the following code:

synchronized (new Object()) {
    System.out.println(T0806_1.x);
    System.out.println(T0806_1.y);
}

Inside the method, this synchronized has no effect at all: the lock object is brand new and invisible to any other thread, so it is a useless lock. The lock is therefore eliminated when the code executes — effectively there is no synchronized at all, and the code actually runs as:

System.out.println(T0806_1.x);
System.out.println(T0806_1.y);

This is lock elimination. JVM eliminates it during escape analysis.
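A runnable sketch of the same idea (the class and method names are made up): the monitor is a local object that never escapes, so the JIT can prove no other thread can ever contend on it, and the second method shows the conceptual result after the lock is removed:

```java
public class LockElisionDemo {
    // The monitor is a brand-new local object no other thread can see,
    // so escape analysis lets the JIT remove the synchronization entirely.
    static int withUselessLock() {
        int sum = 0;
        synchronized (new Object()) {
            sum += 2;
            sum += 3;
        }
        return sum;
    }

    // What the code effectively becomes after lock elimination
    static int withoutLock() {
        int sum = 0;
        sum += 2;
        sum += 3;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(withUselessLock()); // 5
        System.out.println(withoutLock());     // 5
    }
}
```

The two methods are semantically identical; eliminating the lock just removes the monitorenter/monitorexit overhead.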

Allocation on the stack

If escape analysis is enabled, allocation on the stack is possible.
We can verify this with a code test, observing whether

objects are allocated in the heap, or
objects are allocated on the virtual machine stack.

-XX:+/-DoEscapeAnalysis (this flag turns escape analysis, and with it stack allocation, on and off)

To test stack allocation, create 100,000 objects and, with the flag toggled on and off, use the HSDB tool to check whether the objects were actually allocated in the heap.
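A minimal test harness along those lines (the object count and the class are arbitrary; run it once with -XX:+DoEscapeAnalysis and once with -XX:-DoEscapeAnalysis, then inspect the heap with a tool such as HSDB or jmap):

```java
public class StackAllocDemo {
    static class Obj {
        int value;
        Obj(int value) { this.value = value; }
    }

    // Each Obj is local to alloc() and never escapes, so with escape
    // analysis enabled the JIT may avoid the heap allocation entirely
    // (via stack allocation or scalar substitution).
    static int alloc(int i) {
        Obj o = new Obj(i);
        return o.value;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += alloc(i);
        }
        System.out.println(sum); // use the result so the loop isn't dead code
        // Optionally pause here (e.g. with Thread.sleep) to attach HSDB/jmap
        // and count how many Obj instances actually live in the heap.
    }
}
```

With escape analysis off you should see on the order of 100,000 Obj instances in the heap; with it on, far fewer (often none), because the allocations never reach the heap.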


Origin blog.csdn.net/scjava/article/details/108603633