JVM Detailed Explanation - Execution Engine

If you are interested in learning more about it, please visit my personal website: Yetong Space

1: Introduction to Execution Engine

A "virtual machine" is a concept defined relative to a "physical machine". Both can execute code, but the execution engine of a physical machine is built directly on the processor, cache, instruction set, and operating system, while the execution engine of a virtual machine is implemented in software. Because of this, the structure of its instruction set and execution engine can be customized without being restricted by physical hardware, and it can execute instruction-set formats that the hardware does not directly support.

The main task of the JVM is to load bytecode, but bytecode cannot run directly on the operating system, because bytecode instructions are not equivalent to native machine instructions; a class file contains only bytecode instructions, symbol tables, and other auxiliary information that the JVM can recognize. To run a Java program, the execution engine must therefore interpret or compile the bytecode instructions into the native machine instructions of the target platform. In simple terms, the execution engine in the JVM acts as a translator from a high-level language to machine language.

Viewed from the outside, the execution engines of all Java virtual machines are consistent: the input is a binary bytecode stream, the processing is the parsing and execution of that bytecode, and the output is the execution result.


2: Understanding Compilation and Interpretation

Before discussing compilation and interpretation in the JVM, it helps to first look at compilation and interpretation at the language level.

| Type | Principle | Advantages | Shortcomings |
|---|---|---|---|
| Compiled language | Through a dedicated compiler, all source code is converted at once into machine code for a specific platform, existing as an executable file | Once compiled, it runs without the compiler, and execution is efficient | Poor portability, not flexible |
| Interpreted language | Through a dedicated interpreter, some or all of the source code is converted into machine code for a specific platform as needed | Good cross-platform support: with different interpreters, the same source code is interpreted into machine code on different platforms | Translating while executing, so efficiency is low |

Compiled languages are often misunderstood as not being cross-platform. C is a compiled language, yet C programs can run on both Windows and Linux: on Windows they are compiled into .exe files, and on Linux into the corresponding executable format. So why are they said not to be cross-platform?

In fact, compiled languages fail to be cross-platform in two ways:

  • Executable programs cannot be cross-platform: different operating systems have completely different requirements for the internal structure of executable files, and the formats are not compatible with each other.
  • The source code cannot be cross-platform: the functions, types, variables, and so on supported by different platforms may differ, and source code written for one platform generally cannot be compiled directly on another. Take C as an example:
    • In C, to pause a program we can use a "sleep" function. On Windows it is Sleep() and the time unit is milliseconds, while on Linux it is sleep() and the unit is seconds. The two functions differ both in the case of the first letter and in the unit of the parameter.
    • Although C on every platform supports the long type, the number of bytes it occupies differs between platforms. For example, long on 64-bit Windows occupies 4 bytes, but long on 64-bit Linux occupies 8 bytes. If you assign a value that needs 8 bytes to a long variable while writing code on 64-bit Linux, there is no problem at all, but on Windows the value will overflow and the program will produce a wrong result.

It can be seen that for a compiled language to achieve cross-platform support, platform compatibility must be handled at the source-code level, which is very troublesome.
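By contrast, Java sidesteps this particular pitfall by fixing primitive sizes in the language specification rather than leaving them to the platform. A minimal sketch illustrating the point (the class name is ours):

```java
public class PrimitiveSizes {
    public static void main(String[] args) {
        // Unlike C's long, Java's primitive widths are fixed by the language
        // specification, so the same .class file behaves identically everywhere.
        System.out.println(Long.SIZE);    // always 64 bits
        System.out.println(Integer.SIZE); // always 32 bits
    }
}
```

This is one half of "write once, run anywhere": the bytecode's meaning is the same on every platform, and the JVM absorbs the platform differences instead of the source code.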

Back to the JVM: the figure below shows the execution flow of a Java program. As it shows, there are two compilations in the process. The first compiles the .java file into a .class file; the second is performed on the .class file by the JIT compiler. These two steps are also called front-end compilation and back-end compilation respectively.

  • Front-end compilation: related to the source language, independent of the target machine (.java -> .class).
  • Back-end compilation: independent of the source language, but related to the target machine (.class -> machine instructions).
As for whether the Java interpreter or the JIT compiler is used after the bytecode passes the bytecode verifier, we will introduce that below.
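Front-end compilation can be observed programmatically through the standard javax.tools API, which exposes the same compiler that the javac command wraps. A minimal sketch, assuming a full JDK is installed (a bare JRE has no system compiler); the class and file names are ours:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class FrontEndDemo {
    // Front-end-compile a tiny source file: .java -> .class (bytecode, not machine code)
    static boolean compileHello() throws Exception {
        Path dir = Files.createTempDirectory("fe-demo");
        Path src = dir.resolve("Hello.java");
        Files.writeString(src, "public class Hello { static int answer() { return 42; } }");

        // The same front-end compiler that the javac command wraps; null on a bare JRE
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        int rc = javac.run(null, null, null, src.toString());
        return rc == 0 && Files.exists(dir.resolve("Hello.class"));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compileHello() ? "Hello.class produced" : "compilation failed");
    }
}
```

The resulting Hello.class contains platform-independent bytecode; turning it into machine instructions is the back-end compiler's job at run time.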

3: JIT Compiler

JIT (Just-In-Time) compilation can accelerate the execution of Java programs. So, how? We all know Java is described as an interpreted language (or a half-compiled, half-interpreted language): the source program is first compiled by javac into a platform-independent bytecode file (.class), which the JVM then interprets and executes, thereby achieving platform independence. But there are trade-offs. Interpreted execution essentially means the JVM translates each bytecode instruction into the corresponding machine instructions and then executes them. Obviously, executing this way cannot match the speed of directly executing pre-compiled native machine code.

To improve execution speed, JIT technology was introduced. When the JVM finds that a method or code block runs particularly frequently, it treats it as "hot code". The JIT then compiles this hot code into machine code for the local platform, optimizes it, and caches the compiled machine code for subsequent use.
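The warm-up effect can be made visible with a tiny experiment (the class and method names are ours): call a small, loop-heavy method far more times than HotSpot's default invocation threshold, so it gets promoted from interpreted to compiled execution.

```java
public class JitWarmup {
    // A small, loop-heavy method that quickly becomes "hot"
    static long sum(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += i;
        return s;
    }

    public static void main(String[] args) {
        // Call far more often than the default C2 invocation threshold (10000),
        // so HotSpot promotes sum() from interpretation to JIT-compiled code.
        long last = 0;
        for (int i = 0; i < 20_000; i++) last = sum(1_000);
        System.out.println(last); // 499500, i.e. 0 + 1 + ... + 999
    }
}
```

Running this class with the diagnostic flag -XX:+PrintCompilation shows a log line when sum is queued for and finishes JIT compilation.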

Some developers may be surprised: since the HotSpot VM has a built-in JIT compiler, why does it still use an interpreter that "drags down" execution performance? The JRockit VM, for example, contains no interpreter at all; all of its bytecode is compiled and executed by a just-in-time compiler. There are several reasons:

  • When a program starts, the interpreter can take effect immediately, executing right away with no compilation delay. For the compiler to function, it needs a certain amount of time to compile code into native code, although once compiled, the native code executes efficiently. So although programs in the JRockit VM eventually execute very efficiently, they inevitably take longer to start. For server-side applications, startup time is not the focus, but for scenarios that do care about startup time, an architecture where an interpreter and a just-in-time compiler coexist may be needed to strike a balance. In this mode, when the Java virtual machine starts, the interpreter takes effect first, without waiting for the JIT compiler to finish compiling everything, which saves a lot of unnecessary compilation time. As time goes on, the compiler comes into play, compiling more and more code into native code for higher execution efficiency.
  • When memory resources in the running environment are limited (such as in some embedded systems), the interpreter can be used to save memory; otherwise the compiler can be used to improve efficiency. In addition, if an "uncommon trap" occurs in compiled code, the VM can fall back to interpreted execution through deoptimization.
  • Saying that JIT is faster than interpretation really means that "executing compiled code" is faster than "the interpreter interpreting and executing"; it does not mean the act of "compiling" is faster than the act of "interpreting". No matter how fast JIT compilation is, it is at least slightly slower than interpreting the code once, and to get the final result you still have to execute the compiled code afterwards. Therefore, for code that is "executed only once", interpreted execution is always faster than JIT-compiled execution. What counts as "code executed only once"? Roughly speaking, code is strictly "executed only once" when two conditions hold at the same time: it is called only once (such as a class constructor), and it contains no loops. JIT-compiling such code costs more than it gains. For code executed only a few times, the speedup from JIT compilation may not offset the initial compilation overhead; only for frequently executed code is JIT compilation guaranteed to pay off.

Note the subtle dialectical relationship between interpreted and compiled execution in a production environment. A machine in the warmed-up (hot) state can withstand more load than one in the cold state. If traffic is shifted based on hot-state capacity, cold-state servers may die because they cannot carry it. During a production release, machines are therefore released in batches, with each batch accounting for at most 1/8 of the whole cluster. There was once a failure case like this: a programmer doing a batched release on the publishing platform mistakenly entered the number of batches as two. In the hot state, half of the machines could barely have carried the traffic. However, since the newly started JVMs were running in interpreted mode, with hot-code statistics and JIT dynamic compilation not yet performed, the half of the cluster that had just been released crashed at once under the traffic after startup, which vividly demonstrates the presence of the JIT. — Ali Team

To trigger JIT compilation, hot code must first be identified. At present, the main approach is hot spot detection (Hot Spot Detection), of which there are two kinds:

  • Sample-based hot spot detection: periodically sample the top of each thread's stack; if a method frequently appears at the top of the stack, it is considered a hot method. The advantage is simplicity; the disadvantage is that the hotness of a method cannot be confirmed precisely, and detection is easily disturbed by thread blocking and other factors.
  • Counter-based hot spot detection: the virtual machine creates a counter for each method (or even each code block) and counts how many times it executes. If a method's count exceeds a threshold, it is considered a hot method and JIT compilation is triggered.

The HotSpot virtual machine uses the second approach, counter-based hot spot detection, and prepares two counters for each method: a method invocation counter (recording how many times the method is called) and a back edge counter (recording how many times loop bodies run).
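Whether a JIT compiler is present at all can be checked at run time through the standard java.lang.management API. A minimal sketch (the class name is ours; the printed compiler name varies by JDK build):

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitInfo {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            // A VM with no compilation system at all
            System.out.println("no JIT compiler present");
        } else {
            System.out.println(jit.getName()); // e.g. "HotSpot 64-Bit Tiered Compilers"
            if (jit.isCompilationTimeMonitoringSupported()) {
                System.out.println("time spent in JIT so far: "
                        + jit.getTotalCompilationTime() + " ms");
            }
        }
    }
}
```

On a stock HotSpot JDK the bean is present and the reported compilation time grows as more hot methods cross the counters' thresholds.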

4: AOT Compiler

JDK 9 introduced an AOT compiler (Ahead-of-Time Compiler), the opposite concept to just-in-time compilation. Just-in-time compilation converts bytecode into machine code that can run directly on the hardware while the program is running, then deploys it to the hosting environment. AOT compilation converts bytecode into machine code before the program runs, so that native code can be used directly at run time.

The advantages of AOT are obvious: the Java virtual machine loads the precompiled binary library and can execute it directly, without waiting for the just-in-time compiler to warm up, which reduces the bad experience of Java applications "running slowly the first time".

But the shortcomings are also obvious. The dynamic nature of the Java language adds extra complexity and affects the quality of statically compiled code. Take dynamic class loading as an example: because AOT compilation happens before the program runs, this information is not available at compile time, which causes problems.

In general, an AOT compiler cannot match a JIT compiler in compilation quality. Its purpose is to avoid the JIT compiler's runtime performance and memory overhead, and the interpreter's poor early performance.

In terms of running speed, code produced by the AOT compiler is slower than JIT-compiled code but faster than interpreted execution; in terms of compilation time, AOT adds no cost at run time because the work is done ahead of time. The AOT compiler is therefore a strategy by which the JVM sacrifices code quality for startup performance, just as the JVM's Mixed mode combines the C1 compiler, which performs only simple optimizations, with the C2 compiler, which performs more aggressive optimizations, making full use of the advantages of both to achieve optimal running efficiency.
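The coexistence of interpreter, C1, and C2 described above is what the default "mixed mode" banner refers to, and it can be inspected via standard system properties (a sketch; the exact property values vary by JDK build):

```java
public class VmMode {
    public static void main(String[] args) {
        // On a stock HotSpot JDK, java.vm.info typically contains "mixed mode";
        // -Xint forces pure interpretation and -Xcomp forces compile-everything.
        System.out.println(System.getProperty("java.vm.name"));
        System.out.println(System.getProperty("java.vm.info"));
    }
}
```

Running the same class with -Xint or -Xcomp changes the reported mode accordingly, which is an easy way to compare the three execution strategies on your own workload.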

Finally, Spring 6, officially released in November 2022, introduced AOT support, which means the Spring ecosystem has officially adopted ahead-of-time compilation technology.


Origin blog.csdn.net/tongkongyu/article/details/129327219