Java JVM(1)-Into the JVM

Into the JVM

The JVM is harder to learn than application-level Java. Recommended prerequisites: C/C++ (essential), microcomputer principles and interface technology, computer organization, operating systems, data structures and algorithms, and compiler principles. This course is not recommended for students who have only just finished JavaSE; if you have covered less than half of the prerequisites above, you may find it hard going.

Development tools needed in this course: CLion, IDEA, JetBrains Gateway

At this stage we dig into how Java executes under the hood and what really happens when a Java program runs. Before starting, I recommend buying a copy of "In-depth Understanding of the Java Virtual Machine, Third Edition", which describes the JVM in great detail.


As we saw at the start of the JavaSE stage, Java programs are cross-platform because they run on a virtual machine. Each platform only needs the Java virtual machine for that platform installed (it is included in the JRE). All Java programs follow one unified specification, so the bytecode files (.class) produced by compiling on any platform are the same, and that bytecode is ultimately handed to the JVM to execute.


Thanks to this unified specification, there are many JVM languages besides Java, such as Kotlin and Groovy. Their syntax differs from Java's, but they compile to bytecode files that follow the same specification as Java's, so they too can be handed to the JVM for execution.


The JVM is therefore something we need to pay attention to: understanding how Java works under the hood will take our skills to a new level.

Technical overview

First we need to be clear about what a virtual machine is. The virtual machines we have met so far include those that host a whole operating system, and the Java virtual machine. They target different things: the Java virtual machine serves a single application only. Like a system-level virtual machine, though, it can be given real hardware resources, such as a maximum memory size.
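
As a small illustration of the point about resource limits, the sketch below asks the running JVM how much heap it is allowed to use. `Runtime.maxMemory()` is a standard API; the class name `HeapLimit` is only an illustrative choice and does not come from this tutorial.

```java
// Illustrative class name; not part of the original tutorial.
class HeapLimit {
    /** Returns the maximum amount of heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Starting the program with a different `-Xmx` value, for example `java -Xmx256m HeapLimit`, should change the number it reports.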

The Java virtual machine also does not follow the traditional PC architecture. The current HotSpot virtual machine, for example, uses a stack-based instruction set architecture, whereas the machines we usually program for use a register-based instruction set architecture. Here we need to review the CPU structure covered in computer organization:

image-20230306164318560

Among them, AX, BX, CX, and DX are called data registers:

  • AX (Accumulator): Accumulation register, also called accumulator;
  • BX (Base): base address register;
  • CX (Count): counter register;
  • DX (Data): data register;

These registers can be used to transfer and temporarily store data, and each can also be split into an 8-bit high register and an 8-bit low register. Besides these general functions, each has its own special role; AX, for example, is the register dedicated to accumulation and is used relatively often.

SP and BP are also called pointer registers:

  • SP (Stack Pointer): Stack pointer register, used in conjunction with SS, used to access the top of the stack;
  • BP (Base Pointer): Base pointer register, which can be used as a relative base address position of SS. It can be used to directly access data in the stack;

SI and DI are also called index registers:

  • SI (Source Index): source index register;
  • DI (Destination Index): destination index register;

They are mainly used to store the offset of the storage unit within the segment. They can be used to implement a variety of addressing modes for memory operands, providing convenience for accessing the storage unit in different address forms.

Control register:

  • IP (Instruction Pointer): Instruction pointer register;
  • FLAG: flag register;

Segment register:

  • CS (Code Segment): code segment register;
  • DS (Data Segment): Data segment register;
  • SS (Stack Segment): stack segment register;
  • ES (Extra Segment): additional segment register;

Here we compare the assembly that the same C code compiles to on the x86 architecture and on the ARM architecture:

int main() {
    // A minimal a + b, with the result stored in c
    int a = 10;
    int b = 20;
    int c = a + b;
    return c;
}
gcc -S main.c
	.file	"main.c"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc  # rbp is the base pointer register on 64-bit CPUs, the counterpart of the 16-bit bp on the 8086
	pushq	%rbp     # this function uses rbp, so its old value is first pushed onto the stack to preserve it
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp    # rsp is the 64-bit stack pointer; copy it into rbp, because locals live on the stack and are accessed through rbp below
	.cfi_def_cfa_register 6
	movl	$10, -12(%rbp)    # store 10 at rbp-12  ->  int a = 10;
	movl	$20, -8(%rbp)     # store 20 at rbp-8   ->  int b = 20;
	movl	-12(%rbp), %edx   # load a into DX (called edx in 32-bit form; an int only needs 32 bits)
	movl	-8(%rbp), %eax    # likewise, load b into AX
	addl	%edx, %eax        # add DX and AX, leaving the result in AX  ->  tmp = a + b
	movl	%eax, -4(%rbp)    # store the sum from AX at rbp-4  ->  int c = tmp; together with the above: int c = a + b;
	movl	-4(%rbp), %eax    # by convention the return value goes in AX  ->  return c;
	popq	%rbp     # function finished: pop the saved rbp
	.cfi_def_cfa 7, 8
	ret      # return from the function
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 7.5.0-6ubuntu2) 7.5.0"
	.section	.note.GNU-stack,"",@progbits

The result compiled under the ARM architecture (Apple M1 Pro chip) is:

    .section   __TEXT,__text,regular,pure_instructions
   .build_version macos, 12, 0    sdk_version 12, 1
   .globl _main                           ; -- Begin function main
   .p2align   2
_main:                                  ; @main
   .cfi_startproc
; %bb.0:
   sub    sp, sp, #16                     ; =16
   .cfi_def_cfa_offset 16
   str    wzr, [sp, #12]
   mov    w8, #10
   str    w8, [sp, #8]
   mov    w8, #20
   str    w8, [sp, #4]
   ldr    w8, [sp, #8]
   ldr    w9, [sp, #4]
   add    w8, w8, w9
   str    w8, [sp]
   ldr    w0, [sp]
   add    sp, sp, #16                     ; =16
   ret
   .cfi_endproc
                                        ; -- End function
.subsections_via_symbols

As we can see, different CPU architectures yield different assembly code. ARM does not have the same register set as x86, so the same operations have to be expressed with different assembly instructions. This is why a compiled C program is not cross-platform: the same source has to be compiled separately on each platform before it can run there. Java instead relies on the JVM, which gives it good platform independence (the JVM itself, of course, is not cross-platform). A compiled Java program is not something the platform can run directly; it is run by the JVM. And as mentioned earlier, a JVM such as HotSpot uses a stack-based instruction set architecture: rather than relying on registers, it does most of its work on an operand stack. This is simpler to design and implement, and it makes cross-platform support easier because it depends little on the hardware.

Here we disassemble a class and look at the result:

public class Main {
    public int test() {
        // Same as the example above
        int a = 10;
        int b = 20;
        int c = a + b;
        return c;
    }
}
javap -v target/classes/com/test/Main.class # use the javap command to disassemble the class file

Get the following results:

...
public int test();
    descriptor: ()I
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=4, args_size=1
         0: bipush        10
         2: istore_1
         3: bipush        20
         5: istore_2
         6: iload_1
         7: iload_2
         8: iadd
         9: istore_3
        10: iload_3
        11: ireturn
      LineNumberTable:
        line 5: 0
        line 6: 3
        line 7: 6
        line 8: 10
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      12     0  this   Lcom/test/Main;
            3       9     1     a   I
            6       6     2     b   I
           10       2     3     c   I

We can see that compiling a Java file also produces instructions that resemble the assembly generated from C, except that these instructions are all executed by the JVM (in effect, the virtual machine provides a computer-like runtime environment, complete with things like a program counter). At the bottom is the local variable table (LocalVariableTable), which lists the local variables that appear in this method. Note that this is among them, which is why the this keyword can be used in non-static methods. At the top are the method's return type, access flags, and so on. First, here is what the instructions in the example mean:

  • bipush pushes a single-byte constant onto the top of the stack
  • istore_1 pops the int on top of the stack and stores it in the second local variable (slot 1)
  • istore_2 pops the int on top of the stack and stores it in the third local variable (slot 2)
  • istore_3 pops the int on top of the stack and stores it in the fourth local variable (slot 3)
  • iload_1 pushes the second local variable onto the top of the stack
  • iload_2 pushes the third local variable onto the top of the stack
  • iload_3 pushes the fourth local variable onto the top of the stack
  • iadd pops the two int values on top of the stack, adds them, and pushes the result
  • ireturn returns the int on top of the stack from the method

For a detailed list of instructions, see Appendix C of "In-depth Understanding of the Java Virtual Machine, Third Edition".
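
bipush is only one of several constant-pushing instructions. The sketch below, with the illustrative class name `PushForms`, returns constants of different sizes; the comments note the instruction javac typically picks for each, which you can check yourself with `javap -c PushForms`. Treat the comments as the usual behavior rather than a guarantee for every compiler.

```java
// Illustrative class; compile it and run `javap -c PushForms` to inspect the bytecode.
class PushForms {
    static int small()    { return 5; }      // typically iconst_5 (dedicated opcodes cover -1..5)
    static int oneByte()  { return 100; }    // typically bipush 100 (fits in a signed byte)
    static int twoBytes() { return 1000; }   // typically sipush 1000 (fits in a signed short)
    static int large()    { return 100000; } // typically ldc (loaded from the constant pool)

    public static void main(String[] args) {
        System.out.println(small() + oneByte() + twoBytes() + large());
    }
}
```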

When the JVM runs bytecode, almost everything revolves around two data structures: the operand stack (a plain stack) and the local variable table (a list of numbered slots). If an instruction needs data to operate on, that data must be pushed onto the stack before the instruction executes, and the JVM automatically takes the values on top of the stack as its operands. If a value on the stack needs to be kept for later, it is stored in the local variable table.

Let's read the output from the top, starting with the method's attributes:

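descriptor: ()I     // parameter and return types; ()I means no parameters and a return value of primitive type int
flags: ACC_PUBLIC   // public access
Code:
  stack=2, locals=4, args_size=1    // stack is the maximum operand stack depth, locals is the number of local variable slots, args_size is the number of arguments (here just this)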

We will cover descriptors in detail later, when we look at the class file structure.
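
To get a feel for descriptor strings before then, the standard `java.lang.invoke.MethodType` class can print them. This is only a small sketch; the class name `DescriptorDemo` and the helper `descriptor` are illustrative names, not part of the tutorial.

```java
import java.lang.invoke.MethodType;

// Illustrative helper around MethodType.toMethodDescriptorString().
class DescriptorDemo {
    /** Builds the JVM descriptor for a method with the given return and parameter types. */
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor(int.class));                           // ()I
        System.out.println(descriptor(void.class, String[].class));          // ([Ljava/lang/String;)V
        System.out.println(descriptor(long.class, int.class, double.class)); // (ID)J
    }
}
```

The first line matches the `()I` shown for test() above: no parameters, and an int return value.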

Next let's look at the instructions:

0: bipush        10     // 0 is the bytecode offset, followed by the instruction and then its operand
2: istore_1

This step uses bipush to push 10 onto the top of the stack, then istore_1 to store the value on top of the stack into the second local variable, which is a. Together they perform int a = 10.

3: bipush        20
5: istore_2

As above, the operation performed here is int b = 20.

6: iload_1
7: iload_2
8: iadd

Here the second and third local variables, that is, the values of a and b, are pushed onto the stack. Then iadd adds the two values on the stack and leaves the result on top of the stack.

9: istore_3
10: iload_3
11: ireturn

istore_3 stores the value on top of the stack into the fourth local variable, c, which performs int c = a + b (30). Then iload_3 pushes the value of c back onto the top of the stack, and ireturn returns that top-of-stack value as the method's return value.

At this point, the method execution is completed.
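
The walkthrough above can be replayed by hand in plain Java. The sketch below is a simulation for illustration only, not how HotSpot's interpreter is implemented: a `Deque` stands in for the operand stack and an `int[]` for the local variable table. The class name `MiniStackMachine` is an illustrative choice.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hand-written simulation of the bytecode of test(); illustrative only.
class MiniStackMachine {
    /** Replays the instruction sequence of test() and returns its result. */
    static int run() {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        int[] locals = new int[4];                 // local variable table; slot 0 would hold `this`

        stack.push(10);                            // bipush 10
        locals[1] = stack.pop();                   // istore_1 -> a = 10
        stack.push(20);                            // bipush 20
        locals[2] = stack.pop();                   // istore_2 -> b = 20
        stack.push(locals[1]);                     // iload_1
        stack.push(locals[2]);                     // iload_2
        stack.push(stack.pop() + stack.pop());     // iadd
        locals[3] = stack.pop();                   // istore_3 -> c = 30
        stack.push(locals[3]);                     // iload_3
        return stack.pop();                        // ireturn
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 30
    }
}
```

Note that the stack never holds more than two values and the table needs four slots, which matches stack=2 and locals=4 in the javap output.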

We can see that the instructions the JVM executes are mostly pushes and pops, and most of them take no operands, whereas traditional assembly instructions take one, two, or even three operands. The trade-off is that a stack-based instruction set generally needs more instructions than a register-based one to do the same work, and the bytecode still has to be processed by the virtual machine before the hardware executes anything. As a result, Java generally does not run as efficiently as C/C++: cross-platform support comes easily, but at a cost in performance. Android, where performance demands are tighter, therefore uses its own customized virtual machine with a register-based instruction set architecture. In some cases we can also use the JNI mechanism to call code written in C/C++ from Java to improve performance (these are native methods, declared with the native keyword).
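
Native methods are already all around us in the standard library. As a rough sketch, the reflection check below reports whether a method is declared with the native keyword. It assumes an OpenJDK-based JDK, where `System.currentTimeMillis` is declared native; the class name `NativeCheck` is illustrative.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Illustrative check; results assume an OpenJDK-based class library.
class NativeCheck {
    /** Reports whether the named public no-argument method is declared native. */
    static boolean isNative(Class<?> type, String methodName) throws NoSuchMethodException {
        Method m = type.getMethod(methodName);
        return Modifier.isNative(m.getModifiers());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("System.currentTimeMillis native? " + isNative(System.class, "currentTimeMillis"));
        System.out.println("String.length native? " + isNative(String.class, "length"));
    }
}
```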

Present and future

As times have changed, many different JVM implementations have appeared. Let's start from the very first virtual machine.

The development history of virtual machines

In 1996, when Java 1.0 was released, the first commercial virtual machine, Sun Classic VM, began its mission. It provided a Java interpreter that read our class files, obtained instructions like the ones above, and had the JVM execute them one at a time. This approach is simple and easy to understand, but it is very inefficient. It is like a CET-6 listening test playing in your headphones: you have to translate in your head as you listen, and only then can you pick an answer. Worse, the same code has to be translated again every time it is executed.

We therefore need a more efficient way to run Java programs. Most mainstream JVMs today include a just-in-time compiler. The JVM monitors the running code, and when it finds that a method or code block runs especially often, it marks that code as "hot code". To make hot code run faster, the virtual machine compiles it at run time into machine code for the local platform and applies several levels of optimization. The compiler that does this is called a just-in-time compiler (JIT compiler).
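
To watch hot code detection happen, one option is a small program with an obviously hot method. The sketch below uses the illustrative names `HotLoop`, `square`, and `sumOfSquares`; running it on HotSpot with the diagnostic flag `-XX:+PrintCompilation` should list methods as they get compiled, though the exact output format varies between JVM versions.

```java
// Illustrative program with a method that is called often enough to become "hot".
class HotLoop {
    /** A small method that is invoked many times. */
    static int square(int x) {
        return x * x;
    }

    /** Calls square() n times so the JIT compiler has a reason to compile it. */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i % 1000);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10_000_000));
    }
}
```

For example: `java -XX:+PrintCompilation HotLoop`.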

img

In JDK 1.4, Sun Classic VM left the stage for good and was replaced by HotSpot VM, which is still in use today. HotSpot is currently the most widely used virtual machine. It has the hot code detection described above, exact memory management (the virtual machine knows the specific type of the data at any given location in memory), and other techniques. The following chapters are all based on the HotSpot virtual machine.

The future of virtual machine development

In April 2018, Oracle Labs released GraalVM, a brand-new virtual machine designed to let all languages run together in one virtual machine.

img

Graal VM is officially described as a "Universal VM" and a "Polyglot VM". It is a cross-language, full-stack virtual machine built on top of the HotSpot virtual machine that can serve as a runtime platform for "any language". Here "any language" covers Java, Scala, Groovy, Kotlin, and other languages based on the Java virtual machine; C, C++, Rust, and other LLVM-based languages; and also JavaScript, Ruby, Python, R, and others. Graal VM can mix these languages without extra overhead, lets code in different languages use each other's interfaces and objects, and can use native libraries already written in these languages.

The basic idea is that an interpreter converts the source code of a language (such as JavaScript), or the intermediate format it compiles to (such as LLVM bytecode), into an intermediate representation (IR) that Graal VM accepts. For example, an interpreter that converts the bytecode output by LLVM is what provides support for C and C++. This process is called specialization (also commonly called partial evaluation). Graal VM provides the Truffle toolkit for quickly building an interpreter for a new language, and used it to build Sulong, a high-performance LLVM bytecode interpreter.

The latest Spring Boot currently offers a way to run applications natively: https://docs.spring.io/spring-native/docs/current/reference/htmlsingle/

Spring Native supports compiling Spring applications into native executables using the GraalVM native image compiler.

Compared with the Java virtual machine, native images allow cheaper and more sustainable hosting for many types of workloads, including microservices, function workloads, and workloads well suited to containers and Kubernetes.

Using native images provides key benefits such as instant startup, instant peak performance, and reduced memory consumption.

There are also drawbacks and trade-offs that the GraalVM native project expects to improve over time. Building a native image is a heavy process that is slower than building a regular application. A native image has fewer runtime optimizations after warm-up. Finally, it is less mature than the JVM and behaves differently in some ways.

The main differences between the regular JVM and this native image platform are:

  • A static analysis of the application, starting from the main entry point, is performed at build time.
  • Unused parts will be removed at build time.
  • Reflection, resources and dynamic proxies require configuration.
  • Classpath is fixed at build time.
  • There is no lazy loading of classes: everything shipped in the executable will be loaded into memory on startup.
  • Some code will be run at build time.
  • Some aspects of Java applications are subject to limitations and are not fully supported.
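
The reflection item in the list above is the one most likely to bite in practice. In the sketch below the class to instantiate is known only by name at run time, so build-time static analysis cannot discover it. On a regular JVM this simply works; under a native image a class loaded this way would generally have to be declared in reflection configuration, the exact format of which is not shown here. The class name `ReflectiveLoad` is illustrative.

```java
// Illustrative example of code that build-time static analysis cannot fully see through.
class ReflectiveLoad {
    /** Instantiates a class that is known only by name at run time. */
    static Object create(String className) throws ReflectiveOperationException {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The name may come from the command line, so it is not visible at build time.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println(create(name).getClass().getName());
    }
}
```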

The goal of this project is to incubate support for Spring Native, an alternative to Spring on the JVM, and to provide a native deployment option designed to be packaged in lightweight containers. In practice, the aim is to support your Spring applications on this new platform with almost no modifications.

Advantages:

  1. Instant startup, typically in under 100 ms
  2. Lower memory consumption
  3. Standalone deployment, with no JVM required
  4. Less memory used than on the JVM at the same peak performance

Disadvantages:

  1. Long build times
  2. Only newer Spring Boot versions are supported (2.4.4+)

Manually compile JDK8

The most important part of learning the JVM is studying the underlying C/C++ source code. We first need to set up a test environment so that we can debug that source later. The build has many pitfalls, though. Please make sure your environment matches the one in this tutorial, the build environment in particular: because JDK8 is an old release, the tool versions must not be too new, or you will run into all kinds of strange problems.

Environment configuration

  • Operating system: Ubuntu 20.04 Server
  • Hardware configuration: i7-4790 4C8T / 16G memory / 128G hard drive (a Raspberry Pi or a virtual machine on an ARM-chip Mac will not work; the higher the spec the better, otherwise the build may fail)
  • Debugging tool: JetBrains Gateway (the server runs the CLion Backend program and the interface is displayed on the Mac)
  • OpenJDK source code: https://codeload.github.com/openjdk/jdk/zip/refs/tags/jdk8-b120
  • Compiler Environment:
    • gcc-4.8
    • g++-4.8
    • make-3.81
    • openjdk-8

Getting started

First, install Ubuntu 20.04 Server on the test server and log in to it over ssh:

Welcome to Ubuntu 20.04.3 LTS (GNU/Linux 5.4.0-96-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Sat 29 Jan 2022 10:33:03 AM UTC

  System load:  0.08               Processes:               156
  Usage of /:   5.5% of 108.05GB   Users logged in:         0
  Memory usage: 5%                 IPv4 address for enp2s0: 192.168.10.66
  Swap usage:   0%                 IPv4 address for enp2s0: 192.168.10.75
  Temperature:  32.0 C


37 updates can be applied immediately.
To see these additional updates run: apt list --upgradable


Last login: Sat Jan 29 10:27:06 2022
nagocoler@ubuntu-server:~$ 

First install some basic dependencies:

sudo apt install build-essential libxrender-dev xorg-dev libasound2-dev libcups2-dev gawk zip libxtst-dev libxi-dev libxt-dev gobjc

Next we set up the JDK build environment, starting with version 4.8 of gcc and g++. The current package sources no longer carry this version, so we first add the sources of an older release:

sudo vim /etc/apt/sources.list

Add the old version source address at the bottom and save:

deb http://archive.ubuntu.com/ubuntu xenial main
deb http://archive.ubuntu.com/ubuntu xenial universe

Then update the apt source information and install gcc and g++:

sudo apt update
sudo apt install gcc-4.8 g++-4.8

Then register them as the default gcc and g++:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 100

Finally check whether the version is version 4.8:

nagocoler@ubuntu-server:~$ gcc --version
gcc (Ubuntu 4.8.5-4ubuntu2) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nagocoler@ubuntu-server:~$ g++ --version
g++ (Ubuntu 4.8.5-4ubuntu2) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Then install make 3.81 version, which needs to be downloaded from the official website:

wget https://ftp.gnu.org/gnu/make/make-3.81.tar.gz

After downloading, unzip it and enter the directory:

tar -zxvf make-3.81.tar.gz 
cd make-3.81/

Then modify the source: open the glob/glob.c file and add one line:

...
#ifdef  HAVE_CONFIG_H
# include <config.h>
#endif

#define __alloca alloca   <- add this line
/* Enable GNU extensions 
...

Then configure and complete compilation and installation:

bash configure
sudo make install

After the installation is complete, check that make is now version 3.81:

nagocoler@ubuntu-server:~/make-3.81$ make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

Since some of the JDK is itself written in Java, we also need to install a boot JDK. The boot JDK can be the same version or one version lower: to compile the JDK8 source, for example, we can use JDK7 or JDK8 as the boot JDK to compile the Java files in the source tree. Here we install OpenJDK8 as the boot JDK:

sudo apt install openjdk-8-jdk

Our system environment is now ready. Next, download the OpenJDK8 source code (already placed on the network disk) and unzip it:

unzip jdk-jdk8-b120.zip

Next, install JetBrains Gateway and use it to open the project on the server. We use the CLion backend here; wait for the remote backend to finish downloading. This way, even though the Linux server has no graphical interface, we can still use tools such as IDEA and CLion: only the backend runs on the server, while the interface is provided by the frontend on our own computer (this feature is currently still in Beta and does not support ARM Linux servers). The whole process may take 5-20 minutes depending on the server's specs.

Once that is done, working is very convenient, and the interface is much like IDEA's. Open the terminal and start configuring:

bash configure --with-debug-level=slowdebug --enable-debug-symbols ZIP_DEBUGINFO_FILES=0

After the configuration is completed, confirm again whether it is consistent with the configuration information in the tutorial:

Configuration summary:
* Debug level:    slowdebug
* JDK variant:    normal
* JVM variants:   server
* OpenJDK target: OS: linux, CPU architecture: x86, address length: 64

Tools summary:
* Boot JDK:       openjdk version "1.8.0_312" OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~20.04-b07) OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)  (at /usr/lib/jvm/java-8-openjdk-amd64)
* C Compiler:     gcc-4.8 (Ubuntu 4.8.5-4ubuntu2) version 4.8.5 (at /usr/bin/gcc-4.8)
* C++ Compiler:   g++-4.8 (Ubuntu 4.8.5-4ubuntu2) version 4.8.5 (at /usr/bin/g++-4.8)

Build performance summary:
* Cores to use:   3
* Memory limit:   3824 MB
* ccache status:  not installed (consider installing)

WARNING: The result of this configuration has overridden an older
configuration. You *should* run 'make clean' to make sure you get a
proper build. Failure to do so might result in strange build problems.

Next we need to modify a few files, otherwise the build will fail partway through. The first is the hotspot/make/linux/Makefile file:

Original:  SUPPORTED_OS_VERSION = 2.4% 2.5% 2.6% 3%
Change to: SUPPORTED_OS_VERSION = 2.4% 2.5% 2.6% 3% 4% 5%

Next is the hotspot/make/linux/makefiles/gcc.make file:

Original:  WARNINGS_ARE_ERRORS = -Werror
Change to: #WARNINGS_ARE_ERRORS = -Werror

Next is the nashorn/make/BuildNashorn.gmk file:

  $(CP) -R -p $(NASHORN_OUTPUTDIR)/nashorn_classes/* $(@D)/
  $(FIXPATH) $(JAVA) \
Original:  -cp "$(NASHORN_OUTPUTDIR)/nasgen_classes$(PATH_SEP)$(NASHORN_OUTPUTDIR)/nashorn_classes" \
Change to: -Xbootclasspath/p:"$(NASHORN_OUTPUTDIR)/nasgen_classes$(PATH_SEP)$(NASHORN_OUTPUTDIR)/nashorn_classes" \
   jdk.nashorn.internal.tools.nasgen.Main $(@D) jdk.nashorn.internal.objects $(@D)

OK, the modification is completed, and then we can start compiling:

make all

The whole build takes roughly 10-20 minutes, so please be patient. When it finishes you will see:

----- Build times -------
Start 2022-01-29 11:36:35
End   2022-01-29 11:48:20
00:00:30 corba
00:00:25 demos
00:02:39 docs
00:03:05 hotspot
00:00:27 images
00:00:17 jaxp
00:00:31 jaxws
00:03:02 jdk
00:00:38 langtools
00:00:11 nashorn
00:11:45 TOTAL
-------------------------
Finished building OpenJDK for target 'all'

As long as you follow the tutorial step by step without skipping anything, the build should succeed on the first try. Some students will inevitably hit odd problems, of course. Keep at it, take your time, and you will get there~

Next we can create a test configuration. First open the settings page and find Custom Build Targets:

image-20230306164504510

Click Apply, then open the run configurations and add a new custom configuration:

image-20230306164521873

Select the java executable we just compiled, set the program arguments to -version so that it prints the version information, and remove the Build entry below.

Then run it directly:

/home/nagocoler/jdk-jdk8-b120/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java -version
openjdk version "1.8.0-internal-debug"
OpenJDK Runtime Environment (build 1.8.0-internal-debug-nagocoler_2022_01_29_11_36-b00)
OpenJDK 64-Bit Server VM (build 25.0-b62-debug, mixed mode)

Process finished with exit code 0

We can change the working directory to somewhere else, create a Java file there and compile it, and then test whether it runs on the JDK we built:

image-20230306164535518

Write a Java program in this directory and compile it:

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
nagocoler@ubuntu-server:~$ cd JavaHelloWorld/
nagocoler@ubuntu-server:~/JavaHelloWorld$ vim Main.java
nagocoler@ubuntu-server:~/JavaHelloWorld$ javac Main.java 
nagocoler@ubuntu-server:~/JavaHelloWorld$ ls
Main.class  Main.java

Click Run and get the result successfully:

/home/nagocoler/jdk-jdk8-b120/build/linux-x86_64-normal-server-slowdebug/jdk/bin/java Main
Hello World!

Process finished with exit code 0
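
To double-check that a program really runs on the self-built JDK rather than the system one, a sketch like the following prints a few standard system properties. The class name `WhichJvm` is illustrative. With the build above, java.home would be expected to point into the build directory and the VM version to match the debug build string shown by -version, but the exact values depend on your environment.

```java
// Illustrative helper that reports which JVM is executing the program.
class WhichJvm {
    /** Describes the running JVM using standard system properties. */
    static String describe() {
        return System.getProperty("java.vm.name") + " "
                + System.getProperty("java.vm.version")
                + " (java.home=" + System.getProperty("java.home") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```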

We can also set breakpoints and debug from the CLion frontend. For example, to test one of the entry points, set a breakpoint in the JavaMain function in jdk/src/share/bin/java.c:

image-20230306164549328

Click the debug button in the upper right corner to successfully debug:

image-20230306164602205

At this point, manual compilation of OpenJDK8 on the Ubuntu system is completed.

Research on JVM startup process

We have compiled JDK8 and learned how to debug with breakpoints, so we can now study how the JVM starts up. First, be clear that the virtual machine's startup entry point is the JLI_Launch function in jdk/src/share/bin/java.c. The whole process is divided into the following steps:

  1. Configure JVM loading environment
  2. Parse virtual machine parameters
  3. Set thread stack size
  4. Execute JavaMain method

First, let's look at how the JLI_Launch function is declared:

int
JLI_Launch(int argc, char ** argv,              /* main argc, argc */
        int jargc, const char** jargv,          /* java args */
        int appclassc, const char** appclassv,  /* app classpath */
        const char* fullversion,                /* full version defined */
        const char* dotversion,                 /* dot version defined */
        const char* pname,                      /* program name */
        const char* lname,                      /* launcher name */
        jboolean javaargs,                      /* JAVA_ARGS */
        jboolean cpwildcard,                    /* classpath wildcard */
        jboolean javaw,                         /* windows-only javaw */
        jint     ergo_class                     /* ergnomics policy */
);

You can see that there are many parameters at the entry point, including the current full version name, short version name, running parameters, program name, launcher name, etc.

First, some initialization operations and Debug information printing configuration will be performed:

InitLauncher(javaw);
DumpState();
if (JLI_IsTraceLauncher()) {
    int i;
    printf("Command line args:\n");
    for (i = 0; i < argc ; i++) {
        printf("argv[%d] = %s\n", i, argv[i]);
    }
    AddOption("-Dsun.java.launcher.diag=true", NULL);
}

The next step is to choose a suitable JRE version:

/*
 * Make sure the specified version of the JRE is running.
 *
 * There are three things to note about the SelectVersion() routine:
 *  1) If the version running isn't correct, this routine doesn't
 *     return (either the correct version has been exec'd or an error
 *     was issued).
 *  2) Argc and Argv in this scope are *not* altered by this routine.
 *     It is the responsibility of subsequent code to ignore the
 *     arguments handled by this routine.
 *  3) As a side-effect, the variable "main_class" is guaranteed to
 *     be set (if it should ever be set).  This isn't exactly the
 *     poster child for structured programming, but it is a small
 *     price to pay for not processing a jar file operand twice.
 *     (Note: This side effect has been disabled.  See comment on
 *     bugid 5030265 below.)
 */
SelectVersion(argc, argv, &main_class);

The next step is to create the JVM execution environment. For example, it determines whether the data model is 32-bit or 64-bit, and reads and parses some of the JVM's own configuration from the jvm.cfg file:

CreateExecutionEnvironment(&argc, &argv,
                               jrepath, sizeof(jrepath),
                               jvmpath, sizeof(jvmpath),
                               jvmcfg,  sizeof(jvmcfg));

This function is only declared in the header file; its implementation is platform-specific. Next, the JVM shared library (libjvm.so on Linux) is loaded dynamically, and the functions the launcher needs are looked up in it and stored, including the function that creates the JVM:

if (!LoadJavaVM(jvmpath, &ifn)) {
    return(6);
}

For example, the implementation on macOS:

jboolean
LoadJavaVM(const char *jvmpath, InvocationFunctions *ifn)
{
    Dl_info dlinfo;
    void *libjvm;

    JLI_TraceLauncher("JVM path is %s\n", jvmpath);

    libjvm = dlopen(jvmpath, RTLD_NOW + RTLD_GLOBAL);
    if (libjvm == NULL) {
        JLI_ReportErrorMessage(DLL_ERROR1, __LINE__);
        JLI_ReportErrorMessage(DLL_ERROR2, jvmpath, dlerror());
        return JNI_FALSE;
    }

    ifn->CreateJavaVM = (CreateJavaVM_t)
        dlsym(libjvm, "JNI_CreateJavaVM");
    if (ifn->CreateJavaVM == NULL) {
        JLI_ReportErrorMessage(DLL_ERROR2, jvmpath, dlerror());
        return JNI_FALSE;
    }

    ifn->GetDefaultJavaVMInitArgs = (GetDefaultJavaVMInitArgs_t)
        dlsym(libjvm, "JNI_GetDefaultJavaVMInitArgs");
    if (ifn->GetDefaultJavaVMInitArgs == NULL) {
        JLI_ReportErrorMessage(DLL_ERROR2, jvmpath, dlerror());
        return JNI_FALSE;
    }

    ifn->GetCreatedJavaVMs = (GetCreatedJavaVMs_t)
        dlsym(libjvm, "JNI_GetCreatedJavaVMs");
    if (ifn->GetCreatedJavaVMs == NULL) {
        JLI_ReportErrorMessage(DLL_ERROR2, jvmpath, dlerror());
        return JNI_FALSE;
    }

    return JNI_TRUE;
}

The last step is to initialize the JVM:

return JVMInit(&ifn, threadStackSize, argc, argv, mode, what, ret);

This is also platform-specific. For example, the implementation on macOS is:

int
JVMInit(InvocationFunctions* ifn, jlong threadStackSize,
                 int argc, char **argv,
                 int mode, char *what, int ret) {
    if (sameThread) {
        // Not relevant here....
    } else {
        // This is the path taken in the normal case
        return ContinueInNewThread(ifn, threadStackSize, argc, argv, mode, what, ret);
    }
}

You can see that it ends up in the ContinueInNewThread function (implemented in java.c, the file we were just reading), which creates a new thread to continue execution:

int
ContinueInNewThread(InvocationFunctions* ifn, jlong threadStackSize,
                    int argc, char **argv,
                    int mode, char *what, int ret)
{
    ...

      rslt = ContinueInNewThread0(JavaMain, threadStackSize, (void*)&args);
      /* If the caller has deemed there is an error we
       * simply return that, otherwise we return the value of
       * the callee
       */
      return (ret != 0) ? ret : rslt;
    }
}

It then calls a function named ContinueInNewThread0, passing the JavaMain function as an argument. As the definition shows, the first parameter of this function is a function pointer:

int
ContinueInNewThread0(int (JNICALL *continuation)(void *), jlong stack_size, void * args) {
    int rslt;
    pthread_t tid;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    if (stack_size > 0) {
      pthread_attr_setstacksize(&attr, stack_size);
    }

    if (pthread_create(&tid, &attr, (void *(*)(void*))continuation, (void*)args) == 0) {
      void * tmp;
      pthread_join(tid, &tmp);
      rslt = (int)tmp;
    } else {
     /*
      * Continue execution in current thread if for some reason (e.g. out of
      * memory/LWP)  a new thread can't be created. This will likely fail
      * later in continuation as JNI_CreateJavaVM needs to create quite a
      * few new threads, anyway, just give it a try..
      */
      rslt = continuation(args);
    }

    pthread_attr_destroy(&attr);
    return rslt;
}
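The same pattern (run a continuation on a fresh thread with a requested stack size, wait for it, and fall back to the current thread if the thread cannot be created) can be sketched in Java. This is only an analogy to the C code above, not what the launcher does; the class and method names are made up for illustration.

```java
import java.util.function.IntSupplier;

class ContinueInNewThreadSketch {
    // Runs `continuation` on a new thread and returns its result.
    // Falls back to the current thread if the new thread cannot be started.
    static int continueInNewThread(IntSupplier continuation, long stackSize) {
        int[] result = new int[1];
        // A stackSize of 0 lets the JVM pick its default, similar to
        // skipping pthread_attr_setstacksize in the C code.
        Thread t = new Thread(null,
                () -> result[0] = continuation.getAsInt(),
                "continuation-sketch", stackSize);
        try {
            t.start();
        } catch (OutOfMemoryError e) {
            // Could not create a native thread: just try on this one.
            return continuation.getAsInt();
        }
        try {
            t.join(); // like pthread_join: wait and then read the result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 1;
        }
        return result[0];
    }

    public static void main(String[] args) {
        int rslt = continueInNewThread(() -> 0, 0);
        System.out.println("continuation returned " + rslt);
    }
}
```

Passing the continuation as an IntSupplier plays the role of the function pointer parameter in ContinueInNewThread0.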

So the JavaMain function is ultimately executed in a new thread. Let's take a look at what this function does:

/* Initialize the virtual machine */
start = CounterGet();
if (!InitializeJVM(&vm, &env, &ifn)) {
    JLI_ReportErrorMessage(JVM_ERROR1);
    exit(1);
}

The first step is to initialize the virtual machine, exiting immediately if that fails. The next step is to load the main class (we will explain how classes are loaded later), since the main class is the entry point of our Java program:

/*
 * Get the application's main class.
 *
 * See bugid 5030265.  The Main-Class name has already been parsed
 * from the manifest, but not parsed properly for UTF-8 support.
 * Hence the code here ignores the value previously extracted and
 * uses the pre-existing code to reextract the value.  This is
 * possibly an end of release cycle expedient.  However, it has
 * also been discovered that passing some character sets through
 * the environment has "strange" behavior on some variants of
 * Windows.  Hence, maybe the manifest parsing code local to the
 * launcher should never be enhanced.
 *
 * Hence, future work should either:
 *     1)   Correct the local parsing code and verify that the
 *          Main-Class attribute gets properly passed through
 *          all environments,
 *     2)   Remove the vestages of maintaining main_class through
 *          the environment (and remove these comments).
 *
 * This method also correctly handles launching existing JavaFX
 * applications that may or may not have a Main-Class manifest entry.
 */
mainClass = LoadMainClass(env, mode, what);

Some Java programs, such as JavaFX applications, have no main method. In that case the loaded main class is a helper class, so the launcher also obtains the actual application class:

/*
 * In some cases when launching an application that needs a helper, e.g., a
 * JavaFX application with no main method, the mainClass will not be the
 * applications own main class but rather a helper class. To keep things
 * consistent in the UI we need to track and report the application main class.
 */
appClass = GetApplicationClass(env);

Once loading has finished, PostJVMInit is called:

/*
 * PostJVMInit uses the class name as the application name for GUI purposes,
 * for example, on OSX this sets the application name in the menu bar for
 * both SWT and JavaFX. So we'll pass the actual application class here
 * instead of mainClass as that may be a launcher or helper class instead
 * of the application class.
 */
PostJVMInit(env, appClass, vm);

The next step is to get the main method in the main class:

/*
 * The LoadMainClass not only loads the main class, it will also ensure
 * that the main method's signature is correct, therefore further checking
 * is not required. The main method is invoked here so that extraneous java
 * stacks are not in the application stack trace.
 */
mainID = (*env)->GetStaticMethodID(env, mainClass, "main",
                                   "([Ljava/lang/String;)V");

Yes, in bytecode, void main(String[] args) is represented by the descriptor ([Ljava/lang/String;)V. We will introduce this in detail later. The next step is to call the main method:

/* Invoke main method. */
(*env)->CallStaticVoidMethod(env, mainClass, mainID, mainArgs);
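The descriptor string passed to GetStaticMethodID can be reproduced from Java itself. The following is a minimal sketch (the class DescriptorSketch and its descriptor helper are made up for illustration) that uses java.lang.invoke.MethodType to build the descriptor for a given return type and parameter list:

```java
import java.lang.invoke.MethodType;

class DescriptorSketch {
    // Builds the JVM method descriptor for the given return and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void main(String[] args)
        System.out.println(descriptor(void.class, String[].class)); // ([Ljava/lang/String;)V
        // int sum(int a, int b)
        System.out.println(descriptor(int.class, int.class, int.class)); // (II)I
    }
}
```

Reading the first result: the parentheses enclose the parameter types, [ marks an array, Ljava/lang/String; is the class String, and the trailing V is the void return type.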

After this call, our Java program runs until the main method returns or throws an exception:

/*
 * The launcher's exit code (in the absence of calls to
 * System.exit) will be non-zero if main threw an exception.
 */
ret = (*env)->ExceptionOccurred(env) == NULL ? 0 : 1;
LEAVE();
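The last few steps (look up main, invoke it, and turn an uncaught exception into a non-zero exit code) can be imitated in plain Java with reflection. This is only a rough sketch of the same idea, not the launcher's code; LauncherSketch, Hello and Broken are hypothetical classes used for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class LauncherSketch {
    // Hypothetical application class whose main returns normally.
    static class Hello {
        public static void main(String[] args) {
            System.out.println("Hello World!");
        }
    }

    // Hypothetical application class whose main throws.
    static class Broken {
        public static void main(String[] args) {
            throw new IllegalStateException("boom");
        }
    }

    // Returns 0 if main completed normally, 1 otherwise.
    static int launch(Class<?> mainClass, String[] mainArgs) {
        try {
            // Roughly what GetStaticMethodID does with "main" and ([Ljava/lang/String;)V
            Method mainMethod = mainClass.getMethod("main", String[].class);
            // Roughly what CallStaticVoidMethod does
            mainMethod.invoke(null, (Object) mainArgs);
            return 0;
        } catch (InvocationTargetException e) {
            // main itself threw: the analogue of ExceptionOccurred != NULL
            return 1;
        } catch (ReflectiveOperationException e) {
            // no suitable main method, or it is not accessible
            return 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(launch(Hello.class, new String[0]));
        System.out.println(launch(Broken.class, new String[0]));
    }
}
```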

At this point, the Java program has finished running, and the JVM is destroyed in the final LEAVE step. We can set breakpoints and debug to check whether this matches the conclusion we derived:

image-20230306164622940

We are still using the test class we wrote earlier. Paused just before the call, we can see that the console has not printed anything yet, because the main method has not run. We then step over this call and watch the console:

image-20230306164639620

It can be seen that after the execution of the main method is completed, the console also successfully outputs Hello World!

Continue to the next step. The entire Java program has finished, and the exit status code is 0:

image-20230306164706976

Verification succeeded. Finally, here is a summary of the entire execution process:

image-20230306164716949

JNI calls native methods

Java also has a mechanism called JNI (Java Native Interface). It allows Java code running in the Java virtual machine to interact with programs and libraries written in other programming languages, such as C/C++ and assembly (it is used more often in Android development). For example, suppose we want a C program to implement the a+b operation for our Java program. First we need to declare a native method:

package com.test;

public class Main {
    public static void main(String[] args) {
        System.out.println(sum(1, 2));
    }

    // A native method is marked with the native keyword and has no body; the implementation is provided in C
    public static native int sum(int a, int b);
}

After creating it, click the build button, and an out folder will appear, which is where the generated class file is. Then we generate the corresponding C header file with javah (shipped with JDK 8, which is used here; it was removed in JDK 10 in favor of javac -h):

javah -classpath out/production/SimpleHelloWorld -d ./jni com.test.Main

The generated header files are located in the jni folder:

/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class com_test_Main */

#ifndef _Included_com_test_Main
#define _Included_com_test_Main
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     com_test_Main
 * Method:    sum
 * Signature: (II)I
 */
JNIEXPORT jint JNICALL Java_com_test_Main_sum
  (JNIEnv *, jclass, jint, jint);

#ifdef __cplusplus
}
#endif
#endif
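The function name in the header follows the JNI naming convention: the prefix Java_, the fully qualified class name with its separators replaced by underscores, another underscore, and then the method name. A minimal sketch of that rule is shown below. It is a simplification that only handles the package separator and the _1 escape for a literal underscore; the real rules also escape characters such as ; and [ and non-ASCII characters, and append an argument signature for overloaded native methods. JniNameSketch and its methods are made-up names.

```java
class JniNameSketch {
    // Escapes one name component: separators become '_', a literal '_' becomes "_1".
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');
            } else if (c == '_') {
                sb.append("_1");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the short (non-overloaded) native function name.
    static String nativeName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(nativeName("com.test.Main", "sum")); // Java_com_test_Main_sum
    }
}
```

This is why the native function has to be named exactly as in the generated header: the JVM looks the symbol up by this derived name when sum is first called.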

Then we create a new C++ project in CLion, add the header file we just generated, and make the JNI headers (in the JDK folder) available. First modify the CMake file:

cmake_minimum_required(VERSION 3.21)
project(JNITest)

include_directories(/Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/include)
include_directories(/Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/include/darwin)
set(CMAKE_CXX_STANDARD 14)

add_executable(JNITest com_test_Main.cpp com_test_Main.h)

Then we can write the implementation. First, get familiar with the mapping between Java types and JNI native types:

image-20230306164733817

So we can just return a+b here:

#include "com_test_Main.h"

JNIEXPORT jint JNICALL Java_com_test_Main_sum
        (JNIEnv * env, jclass clazz, jint a, jint b){
    return a + b;
}

Then we can compile the .cpp file into a dynamic link library, which is a .dylib file on macOS and a .dll file on Windows. Here we only take macOS as an example. The command is a bit long because it also needs to include the header files from the JDK directory:

gcc com_test_Main.cpp -I /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/include -I /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/include/darwin -fPIC -shared -o test.dylib -lstdc++

After compilation completes, we get the test.dylib file, which is the dynamic link library.

Finally, we put it on the desktop and load it in the Java program:

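package com.test;

public class Main {
    static {
        // Load the native library from its absolute path before any native method is called
        System.load("/Users/nagocoler/Desktop/test.dylib");
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 2));
    }

    public static native int sum(int a, int b);
}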

Run and get the result successfully:

image-20230306164747347
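System.load takes an absolute file path, which is why the example above hard-codes the desktop location. The alternative, System.loadLibrary, takes a bare library name and searches the directories in the java.library.path system property for a platform-specific file name. A small sketch (LibraryNameSketch is a made-up name) prints what that file name and search path look like on the current machine:

```java
class LibraryNameSketch {
    // Returns the file name System.loadLibrary(libName) would look for on this platform.
    static String platformFileName(String libName) {
        return System.mapLibraryName(libName);
    }

    public static void main(String[] args) {
        // Typically libtest.dylib on macOS, libtest.so on Linux, test.dll on Windows.
        System.out.println(platformFileName("test"));
        // Directories searched by System.loadLibrary.
        System.out.println(System.getProperty("java.library.path"));
    }
}
```

Note that on macOS the expected name carries a lib prefix, so the test.dylib built in this article would have to be renamed before System.loadLibrary("test") could find it; loading it by absolute path with System.load avoids that.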

By understanding some basic knowledge of JVM, we have a rough model of JVM in our mind. In the next chapter, we will continue to study in depth the memory management mechanism and garbage collector mechanism of JVM, as well as some practical tools.


Origin blog.csdn.net/qq_56098191/article/details/133211028