The Story of JVM - Class File Structure

Class file structure


1. Overview

Computers can only execute binary machine code made of 0s and 1s, but native machine code is no longer the only option: the programs we write can also be compiled into a storage format that is independent of any particular operating system and machine instruction set.

2. The cornerstone of platform and language independence

Java virtual machines from every vendor and on every platform all support the same program storage format: bytecode. The Java virtual machine is not bound to the Java language; it is bound only to the class file format. A class file produced by a compiler for any language can run on the Java virtual machine, which is how languages such as Kotlin, Scala, and Groovy target the JVM.

3. Structure of the Class file

The Java language has always maintained good backward compatibility, and the stability of the Class file structure has been indispensable to that. Through many language improvements and releases, the structure of the Class file has remained almost unchanged; later versions only add new, supplementary content to the original structure.
Each class file holds the definition of exactly one class or interface, but not every class or interface has a corresponding file on disk: a class can also be generated dynamically and handed directly to a class loader.
A class file is a binary stream organized in units of 8-bit bytes, with all data items packed tightly in order and no separators between them. A data item that needs more than 8 bits is split into several 8-bit bytes and stored with the high-order byte first (big-endian). The class file format stores data using only two kinds of structure: unsigned numbers and tables. u1, u2, u4, and u8 denote unsigned numbers of 1, 2, 4, and 8 bytes. A table is a composite structure made up of unsigned numbers or other tables, and table names conventionally end with "_info". The entire class file can itself be regarded as one table whose items appear in a fixed order: magic, minor_version, major_version, constant_pool_count, constant_pool, access_flags, this_class, super_class, and then the interface, field, method, and attribute collections, each preceded by its own count.
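The high-order-first layout can be sketched with a few shift operations. This is a minimal illustration, not code from the original article; the class name ByteReader and the method names u1, u2, and u4 are hypothetical and simply mirror the type names used in the text.

```java
// Hypothetical helper: ByteReader, u1, u2 and u4 are illustrative names only.
final class ByteReader {
    private ByteReader() {}

    // u1: a single unsigned byte (Java bytes are signed, so mask with 0xFF).
    static int u1(byte[] data, int offset) {
        return data[offset] & 0xFF;
    }

    // u2: two bytes, high-order byte first.
    static int u2(byte[] data, int offset) {
        return (u1(data, offset) << 8) | u1(data, offset + 1);
    }

    // u4: four bytes, high-order byte first; returned as long so the value stays unsigned.
    static long u4(byte[] data, int offset) {
        return ((long) u2(data, offset) << 16) | u2(data, offset + 2);
    }
}
```

Returning u4 as a long is a design choice for this sketch: a Java int would show large values such as the magic number as negative.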
1. Magic number and Class file version
The first four bytes of every class file are the magic number, whose only purpose is to identify whether the file is a class file that the virtual machine can accept. Many file formats use magic numbers, for example GIF and JPEG. A magic number is a safer way to identify a file format than a file extension, because an extension can be changed freely. The designer of a file format is free to choose its magic number, as long as the value is not already in wide use. The magic number of the class file is 0xCAFEBABE.
The four bytes immediately following the magic number hold the class file version. Bytes 5 and 6 are the minor version number (minor_version), and bytes 7 and 8 are the major version number (major_version). Class file major versions start at 45, and since JDK 1.2 (major version 46) each major JDK release has increased the major version by one. A newer JDK can run class files of older versions, but a virtual machine must refuse to execute a class file whose version is higher than it supports.
In the hex dump of the sample class file, bytes 5 and 6 are 0x0000, so the minor version number is 0, and bytes 7 and 8 are 0x0032, so the major version number is 50. The class file version is therefore 50.0, which corresponds to JDK 1.6. JDK 1.6 can support class files with version numbers 45.0-50.65535.
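The check of the first eight bytes can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name ClassFileHeader and its members are hypothetical, and the sketch relies on java.nio.ByteBuffer reading big-endian values by default, which matches the class file byte order.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: ClassFileHeader and its members are illustrative names.
final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    final int minorVersion;
    final int majorVersion;

    private ClassFileHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Reads magic (u4), minor_version (u2) and major_version (u2) from the start of the file.
    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF; // mask to treat the u2 as unsigned
        int major = buf.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major);
    }

    @Override
    public String toString() {
        return majorVersion + "." + minorVersion;
    }
}
```

Feeding it the eight bytes CA FE BA BE 00 00 00 32 from the walkthrough yields a major version of 50 and a minor version of 0.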

2. Constant pool
Immediately after the major and minor version numbers comes the constant pool. It can be thought of as the resource warehouse of the class file: it is usually one of the largest parts of the file, and it is the first table-type data item to appear.
Because the number of constants is not fixed, the constant pool begins with a u2 value, the constant pool count (constant_pool_count). This count starts from 1 rather than 0.
In the sample class file, the u2 at offset 0x00000008 is 0x0016, which is 22 in decimal, so the constant pool holds 21 constants with indexes 1 to 21. Index 0 is deliberately left unused so that an index field elsewhere in the file can use 0 to mean "references no constant pool entry". Only the constant pool counts this way; the other collections in the class file, such as the interface index collection, the field table, and the method table, still start from 0.
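The off-by-one between the stored count and the highest usable index can be sketched like this. The class and method names are hypothetical. Note one caveat that is not part of the walkthrough: long and double constants each occupy two index slots, so the number of entries actually present can be smaller than the count minus one.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: ConstantPoolCount and its methods are illustrative names.
final class ConstantPoolCount {
    // Offset 8 follows the 4-byte magic number and the two u2 version fields.
    static final int COUNT_OFFSET = 8;

    private ConstantPoolCount() {}

    // The raw u2 value stored in the file (constant_pool_count).
    static int rawCount(byte[] classBytes) {
        return ByteBuffer.wrap(classBytes).getShort(COUNT_OFFSET) & 0xFFFF;
    }

    // Valid constant pool indexes run from 1 to rawCount - 1; index 0 is reserved.
    static int lastValidIndex(byte[] classBytes) {
        return rawCount(classBytes) - 1;
    }
}
```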
Each constant in the constant pool is itself a table. As of JDK 13, the constant pool has 17 different types of constant. All of these tables share one feature: their first item is a u1 tag that identifies which type of constant the entry is.
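For reference, the 17 tag values can be written down as a small lookup table. The tag numbers below are the ones defined by the Java Virtual Machine Specification; the class name ConstantTags and the method nameOf are hypothetical names chosen for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: ConstantTags and nameOf are illustrative names.
final class ConstantTags {
    static final Map<Integer, String> NAMES = new LinkedHashMap<>();

    static {
        NAMES.put(1, "CONSTANT_Utf8_info");
        NAMES.put(3, "CONSTANT_Integer_info");
        NAMES.put(4, "CONSTANT_Float_info");
        NAMES.put(5, "CONSTANT_Long_info");
        NAMES.put(6, "CONSTANT_Double_info");
        NAMES.put(7, "CONSTANT_Class_info");
        NAMES.put(8, "CONSTANT_String_info");
        NAMES.put(9, "CONSTANT_Fieldref_info");
        NAMES.put(10, "CONSTANT_Methodref_info");
        NAMES.put(11, "CONSTANT_InterfaceMethodref_info");
        NAMES.put(12, "CONSTANT_NameAndType_info");
        NAMES.put(15, "CONSTANT_MethodHandle_info");
        NAMES.put(16, "CONSTANT_MethodType_info");
        NAMES.put(17, "CONSTANT_Dynamic_info");
        NAMES.put(18, "CONSTANT_InvokeDynamic_info");
        NAMES.put(19, "CONSTANT_Module_info");
        NAMES.put(20, "CONSTANT_Package_info");
    }

    private ConstantTags() {}

    // Maps a u1 tag to the name of its table type; unknown tags are rejected.
    static String nameOf(int tag) {
        String name = NAMES.get(tag);
        if (name == null) {
            throw new IllegalArgumentException("unknown constant pool tag: " + tag);
        }
        return name;
    }
}
```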
Let's start the analysis of constants.
The u2 at offset 0x00000008 told us there are 21 constants. The first constant starts at offset 0x0000000A, where the tag byte is 0x07. According to the constant pool item types in Table 6-3, a tag of 7 means this constant is a CONSTANT_Class_info, a symbolic reference to a class or interface.

According to Table 6-4, the two bytes after the tag are the name_index of this constant, an index into the constant pool. It points to a CONSTANT_Utf8_info constant that holds the fully qualified name of the class. The u2 at offset 0x0000000B is 0x0002, so the name is stored in the second constant.
The second constant starts at offset 0x0000000D, and its tag is 0x01. From Table 6-3, this is a CONSTANT_Utf8_info, a string stored in an abbreviated (modified) UTF-8 encoding. Its rules are: characters from '\u0001' to '\u007f' (ASCII codes 1 to 127) are encoded in one byte, characters from '\u0080' to '\u07ff' are encoded in two bytes, and characters from '\u0800' to '\uffff' are encoded in three bytes following the ordinary UTF-8 rules. (In effect, leading zero bits are dropped so that small code points need fewer bytes.)
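The byte counts per character range can be checked with the standard library, since java.io.DataOutputStream.writeUTF writes a two-byte length followed by the same modified UTF-8 encoding. The sketch below is an illustration; the class name ModifiedUtf8 and the method encodedLength are hypothetical. One detail worth knowing is that '\u0000' is written in two bytes rather than one, which is where this encoding departs from standard UTF-8 for a single char.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: ModifiedUtf8 and encodedLength are illustrative names.
final class ModifiedUtf8 {
    private ModifiedUtf8() {}

    // Returns how many bytes the string occupies in modified UTF-8,
    // excluding the two-byte length prefix that writeUTF emits first.
    static int encodedLength(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size() - 2;
    }
}
```

For example, an ASCII letter takes one byte, a character such as U+00E9 takes two, and a character such as U+4E2D takes three.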
According to Table 6-5, the u2 at offset 0x0000000E is the length of the string: 0x001D, which is 29. The next 29 bytes all fall within the ASCII range 1 to 127, and they spell "org/fenixsoft/clazz/TestClass".
You can use the javap tool with the -verbose option to print the constant pool. Comparing its output with the two constants analyzed by hand confirms that both were decoded correctly.
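The manual decoding of these two constants can be reproduced in a few lines of Java. This is a deliberately narrow sketch, not a general constant pool parser: the class name FirstTwoConstants and its methods are hypothetical, it only accepts a CONSTANT_Class_info whose name_index is 2 followed directly by a CONSTANT_Utf8_info, and sampleBytes rebuilds the bytes described in the text rather than reading a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: FirstTwoConstants and its methods are illustrative names.
final class FirstTwoConstants {
    private FirstTwoConstants() {}

    // Decodes constant #1 (CONSTANT_Class_info) and constant #2 (CONSTANT_Utf8_info)
    // and returns the class name stored in constant #2.
    static String parseClassName(byte[] poolBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(poolBytes))) {
            int firstTag = in.readUnsignedByte();
            if (firstTag != 7) {
                throw new IllegalArgumentException("expected tag 7, got " + firstTag);
            }
            int nameIndex = in.readUnsignedShort(); // u2 name_index
            if (nameIndex != 2) {
                throw new IllegalArgumentException("this sketch only handles name_index 2");
            }
            int secondTag = in.readUnsignedByte();
            if (secondTag != 1) {
                throw new IllegalArgumentException("expected tag 1, got " + secondTag);
            }
            // readUTF consumes the u2 length and then that many modified UTF-8 bytes.
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds the bytes from the walkthrough: 07 00 02 01 00 1D, then the 29 name bytes.
    static byte[] sampleBytes() {
        byte[] name = "org/fenixsoft/clazz/TestClass".getBytes(StandardCharsets.US_ASCII);
        byte[] bytes = new byte[6 + name.length];
        bytes[0] = 0x07;
        bytes[1] = 0x00;
        bytes[2] = 0x02;
        bytes[3] = 0x01;
        bytes[4] = 0x00;
        bytes[5] = (byte) name.length;
        System.arraycopy(name, 0, bytes, 6, name.length);
        return bytes;
    }
}
```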
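Constants such as "I", "V", and "<init>" in the constant pool never appear in the source program. They are generated automatically by the compiler and are referenced later from the field table, method table, and attribute table to describe things that are inconvenient to express with fixed bytes, such as a method's return type.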

3. Access flag
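After the constant pool ends, the next two bytes are the access flags (access_flags). They identify class-level or interface-level access information: whether this is a class or an interface, whether it is public, whether it is abstract, and, for a class, whether it is declared final.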

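For example, the TestClass in the code of Listing 6-1 is an ordinary Java class, not an interface, enumeration, or annotation. It is declared public, is neither final nor abstract, and was compiled with a compiler later than JDK 1.2, so only its ACC_PUBLIC (0x0001) and ACC_SUPER (0x0020) flags are set. Its access_flags value is therefore 0x0001 | 0x0020 = 0x0021.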
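The same bit arithmetic can be run in reverse to list which flags a given access_flags value contains. The flag values below are the class-level ones defined by the Java Virtual Machine Specification; the class name ClassAccessFlags and the method decode are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: ClassAccessFlags and decode are illustrative names.
final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000, 0x8000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE", "ACC_ABSTRACT",
        "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM", "ACC_MODULE"
    };

    private ClassAccessFlags() {}

    // Returns the names of all flags whose bit is set in the u2 access_flags value.
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }
}
```

Calling decode(0x0021) returns the two flags from the example, ACC_PUBLIC and ACC_SUPER.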


Origin blog.csdn.net/weixin_45841848/article/details/132594529