Class file structure and bytecode instructions

1. The structure of the Class file

A Class file is a binary stream built from 8-bit bytes. The data items are laid out in strict order and packed tightly, with no separators between them, so almost everything stored in a Class file is data the program needs in order to run, and there are no gaps. A data item that needs more than 8 bits is split into several 8-bit bytes and stored high-order byte first (big-endian).


The Class file format stores data in a pseudo-structure similar to a C struct. This pseudo-structure has only two data types: unsigned numbers and tables.

  • Unsigned number: a basic data type. u1, u2, u4, and u8 represent unsigned numbers of 1, 2, 4, and 8 bytes respectively. Unsigned numbers can describe numbers, index references, quantity values, or string values encoded in UTF-8.
  • Table: a composite data type made up of multiple unsigned numbers or other tables as data items. By convention, table names end with "_info". A table describes hierarchically structured composite data, and the entire Class file is essentially one table.
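
The high-order-first storage of unsigned numbers can be sketched as follows. This is a minimal illustration, not a real parser: BigEndianReader and its method names are hypothetical, and the sample bytes are made up for the example.

```java
// Sketch: reading u1/u2/u4 values stored high-order byte first (big-endian).
// BigEndianReader is a hypothetical helper, not part of the JDK.
class BigEndianReader {
    // u1: one unsigned byte, widened to int so values 128..255 stay positive.
    static int u1(byte[] data, int offset) {
        return data[offset] & 0xFF;
    }

    // u2: two bytes, high-order byte first.
    static int u2(byte[] data, int offset) {
        return (u1(data, offset) << 8) | u1(data, offset + 1);
    }

    // u4: four bytes; returned as long because Java's int is signed.
    static long u4(byte[] data, int offset) {
        return ((long) u2(data, offset) << 16) | u2(data, offset + 2);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x16};
        System.out.println(Long.toHexString(u4(data, 0))); // prints "cafebabe"
        System.out.println(u2(data, 4));                   // prints "22"
    }
}
```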

Class file format:

Type            Name                 Quantity
u4              magic                1
u2              minor_version        1
u2              major_version        1
u2              constant_pool_count  1
cp_info         constant_pool        constant_pool_count-1
u2              access_flags         1
u2              this_class           1
u2              super_class          1
u2              interfaces_count     1
u2              interfaces           interfaces_count
u2              fields_count         1
field_info      fields               fields_count
u2              methods_count        1
method_info     methods              methods_count
u2              attributes_count     1
attribute_info  attributes           attributes_count

Unlike XML and other description languages, the Class file has no separators, so the order and number of its data items are strictly fixed. Which byte means what, how long each item is, and the order in which items appear must not be changed.


1.1. The magic number and the Class file version

The first 4 bytes of every Class file (0xCAFEBABE) are called the magic number. Its only purpose is to identify whether the file is a Class file that the virtual machine can accept. Identifying files by magic number rather than by extension is mainly a security consideration, because a file extension can be changed at will. The author of a file format is free to choose the magic value, as long as it is not already widely used and does not cause confusion.

Here we write a simple piece of Java code, compile it to a Class file, and open the file in the Sublime editor to view it:

public class Client {
    public int calc() {
        int a = 100;
        int b = 200;
        int c = 300;
        return (a + b) * c;
    }
}


The 4 bytes following the magic number store the version number of the Class file: bytes 5 and 6 are the minor version number (Minor Version), and bytes 7 and 8 are the major version number (Major Version). Java major version numbers start at 45, and each major JDK release after JDK 1.1 increases the major version number by 1. A newer JDK can run Class files produced for earlier versions, but it cannot run Class files from later versions: even if the file format has not changed at all, the virtual machine must refuse to execute a Class file whose version number exceeds its own.

JDK version  Class version  Hexadecimal
1.1          45.0           00 00 00 2D
1.2          46.0           00 00 00 2E
1.3          47.0           00 00 00 2F
1.4          48.0           00 00 00 30
1.5          49.0           00 00 00 31
1.6          50.0           00 00 00 32
1.7          51.0           00 00 00 33
1.8          52.0           00 00 00 34
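
As a rough sketch of how the first 8 bytes could be checked, the following reads the magic number and the two version fields from a byte array. ClassHeader is a hypothetical helper, the 8 sample bytes are hand-written rather than taken from a real file, and the mapping to a JDK name only follows the table above.

```java
import java.nio.ByteBuffer;

// Sketch: reading magic, minor_version and major_version from the start of a Class file.
// ClassHeader is a hypothetical helper, not a JDK class.
class ClassHeader {
    final int minor;
    final int major;

    ClassHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    static ClassHeader parse(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the Class file layout.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a Class file");
        }
        int minor = buf.getShort() & 0xFFFF; // bytes 5 and 6
        int major = buf.getShort() & 0xFFFF; // bytes 7 and 8
        return new ClassHeader(minor, major);
    }

    // Follows the table: major 45 is JDK 1.1, and each later release adds 1.
    String jdkName() {
        return "1." + (major - 44);
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        ClassHeader h = parse(header);
        System.out.println(h.major + "." + h.minor + " -> JDK " + h.jdkName()); // prints "52.0 -> JDK 1.8"
    }
}
```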

1.2. Constant pool

Immediately after the major version number comes the constant pool. The constant pool can be thought of as the resource warehouse of the Class file. It is the data structure most closely associated with the other items in the Class file, it is one of the data items that occupies the most space, and it is also the first table-type data item to appear in the Class file.


The constant pool mainly stores two kinds of constants: literals (Literal) and symbolic references (Symbolic References).

  • Literal: close to the notion of a constant at the Java language level, such as text strings and constant values declared final.
  • Symbolic reference: a concept from compiler theory, covering the following three kinds of constants:
    • Fully qualified names of classes and interfaces
    • Names and descriptors (Descriptor) of fields
    • Names and descriptors of methods

The number of constants in the constant pool is not fixed, so a u2 value, the constant pool capacity count (constant_pool_count), is placed at the entry of the constant pool. Contrary to the usual convention in Java, this count starts from 1 rather than 0. Item 0 is deliberately left empty: when data that holds a constant pool index needs to express "refers to no constant pool item", the index can be set to 0.
In our Class file the constant pool count is 0x0016, which is 22 in decimal, so the file contains 21 constants, stored immediately after the count. Reading them directly from the raw bytes is rather obscure. The JDK provides a tool for this, javap: switch to the directory containing the Class file in cmd and run the javap -verbose command to view its contents.
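
To make the count-from-1 rule and index references concrete, here is a small sketch that reads a hand-built constant pool. Several things are assumptions beyond this article: TinyConstantPool is hypothetical, the tag values 1 (CONSTANT_Utf8) and 7 (CONSTANT_Class) and their layouts come from the Java Virtual Machine Specification, and every other tag is simply rejected, so this is not a usable parser for real Class files.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: a constant pool reader that only understands two entry kinds.
// TinyConstantPool is hypothetical; a real parser must handle every tag.
class TinyConstantPool {
    final Object[] entries; // index 0 stays null on purpose: it means "no constant"

    TinyConstantPool(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getShort() & 0xFFFF;   // constant_pool_count
        entries = new Object[count];           // valid indices are 1 .. count-1
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;
            if (tag == 1) {                    // CONSTANT_Utf8: u2 length, then the bytes
                byte[] text = new byte[buf.getShort() & 0xFFFF];
                buf.get(text);
                // Plain UTF-8 decoding is enough for the ASCII name used here.
                entries[i] = new String(text, StandardCharsets.UTF_8);
            } else if (tag == 7) {             // CONSTANT_Class: u2 index of a Utf8 name
                entries[i] = Integer.valueOf(buf.getShort() & 0xFFFF);
            } else {
                throw new IllegalArgumentException("tag not handled in this sketch: " + tag);
            }
        }
    }

    // Follows a CONSTANT_Class entry to the name string it refers to.
    String className(int classIndex) {
        return (String) entries[(Integer) entries[classIndex]];
    }

    public static void main(String[] args) {
        byte[] pool = {
            0x00, 0x03,                                      // count = 3, so two entries
            0x01, 0x00, 0x06, 'C', 'l', 'i', 'e', 'n', 't',  // #1 Utf8 "Client"
            0x07, 0x00, 0x01                                 // #2 Class -> #1
        };
        System.out.println(new TinyConstantPool(pool).className(2)); // prints "Client"
    }
}
```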


1.3. Access flag

After the constant pool, the next two bytes are the access flags (access_flags), which identify class-level or interface-level access information, including: whether this Class is a class or an interface; whether it is defined as public; whether it is defined as abstract; and, if it is a class, whether it is declared final.

Flag name       Flag value  Meaning
ACC_PUBLIC      0x0001      Whether the type is public
ACC_FINAL       0x0010      Whether it is declared final; only classes can set this
ACC_SUPER       0x0020      Whether to use the new semantics of the invokespecial bytecode instruction. The semantics of invokespecial changed in JDK 1.0.2; to distinguish which semantics apply, this flag must be true for classes compiled after JDK 1.0.2
ACC_INTERFACE   0x0200      Identifies an interface
ACC_ABSTRACT    0x0400      Whether the type is abstract; true for interfaces and abstract classes, false for other classes
ACC_SYNTHETIC   0x1000      Indicates that this class was not generated from user code
ACC_ANNOTATION  0x2000      Identifies an annotation
ACC_ENUM        0x4000      Identifies an enumeration
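
Because access_flags is a bit mask, several flags can be set at once. A minimal sketch of decoding it, using only the flag values from the table above (AccessFlags is a hypothetical helper):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: turning a class-level access_flags value into flag names.
// AccessFlags is a hypothetical helper; the masks are the ones in the table.
class AccessFlags {
    static final int[] MASKS = {0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000};
    static final String[] NAMES = {"ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
            "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"};

    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            // A flag is set when its bit is present in the value.
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // prints "[ACC_PUBLIC, ACC_SUPER]"
        System.out.println(decode(0x0601)); // prints "[ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]"
    }
}
```

The value 0x0021 is what one would expect for an ordinary public class such as Client: ACC_PUBLIC combined with ACC_SUPER.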

1.4. Class index, parent class index, and interface index collection

Both the class index (this_class) and the parent class index (super_class) are data of type u2, and the interface index collection (interfaces) is a collection of data of type u2.

These three items determine the inheritance relationship of the class. The class index determines the fully qualified name of this class, and the parent class index determines the fully qualified name of its parent class.

Since the Java language does not allow multiple inheritance, there is only one parent class index. Every Java class except java.lang.Object has a parent class, so the parent class index of every class except java.lang.Object is non-zero. The interface index collection describes which interfaces this class implements; the implemented interfaces are listed in the collection from left to right in the order they appear after the implements keyword (or after the extends keyword if the class itself is an interface).
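
The same relationships can be observed from Java code through reflection, which gives a rough view of what these three items record. The class and interface names below (InheritanceInfo, Shape, Base, Circle) are made up for the example.

```java
import java.io.Serializable;

// Sketch: observing parent class and interface order through reflection.
// All type names here are hypothetical examples.
class InheritanceInfo {
    interface Shape {}
    static class Base {}
    static class Circle extends Base implements Shape, Serializable {}

    public static void main(String[] args) {
        // java.lang.Object is the only class without a parent class.
        System.out.println(Object.class.getSuperclass());                  // prints "null"
        System.out.println(Circle.class.getSuperclass().getSimpleName());  // prints "Base"
        // getInterfaces() returns the interfaces in the order of the implements clause.
        for (Class<?> itf : Circle.class.getInterfaces()) {
            System.out.println(itf.getSimpleName());                       // prints "Shape", then "Serializable"
        }
    }
}
```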


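1.5. Field table collection

The field table (field_info) describes the variables declared in an interface or class. Fields include class-level variables and instance-level variables.

What a field is called and what data type it is defined as cannot be fixed in advance; they can only be described by referencing constants in the constant pool.

The field table collection does not list fields inherited from superclasses or superinterfaces, but it may list fields that do not exist in the original Java code. For example, to preserve access to the outer class, the compiler automatically adds to an inner class a field that points to the outer class instance.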
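
The compiler-added outer-instance field can usually be seen through reflection. This is a sketch under assumptions: SyntheticFieldDemo is a made-up class, the field name (commonly this$0 with javac) is a compiler convention rather than something this article specifies, and some newer javac versions may omit the field when the inner class never uses its outer instance, which is why Inner reads an outer field here.

```java
import java.lang.reflect.Field;

// Sketch: listing compiler-generated (synthetic) fields of an inner class.
// SyntheticFieldDemo is a hypothetical example class.
class SyntheticFieldDemo {
    int value = 42;

    class Inner {
        // Reading the outer field means Inner needs a reference
        // to its enclosing SyntheticFieldDemo instance.
        int read() {
            return value;
        }
    }

    // Returns the names of the synthetic fields declared by Inner, separated by spaces.
    static String syntheticFields() {
        StringBuilder names = new StringBuilder();
        for (Field f : Inner.class.getDeclaredFields()) {
            if (f.isSynthetic()) {
                names.append(f.getName()).append(' ');
            }
        }
        return names.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(syntheticFields());
    }
}
```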


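1.6. Method table collection

The method table describes method definitions. The Java code inside a method, however, is compiled into bytecode instructions and stored in an attribute named "Code" in the method's attribute table collection.

As with the field table collection, if a parent class method is not overridden (Override) in the subclass, no method information from the parent class appears in the method table collection. Likewise, methods added automatically by the compiler may appear; the most typical are the class constructor "<clinit>" and the instance constructor "<init>".


1.7. Attribute table collection

Class files, field tables, and method tables each carry their own attribute table collection, used to describe information specific to certain scenarios. For example, a method's code is stored in the Code attribute, and the value of a constant defined with the final keyword is stored in the ConstantValue attribute. Beyond these there are many other attributes, such as Signature, which records generic signature information; SourceFile, which records the source file name; and InnerClasses, which lists inner classes.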



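2. Bytecode instructions

A Java virtual machine instruction consists of a one-byte number that represents a specific operation (the opcode, Opcode), followed by zero or more parameters that the operation requires (the operands, Operands).

Because the opcode is limited to one byte (0 to 255), the instruction set can contain at most 256 opcodes.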
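
The opcode-plus-operands layout can be illustrated with a tiny disassembler. This is only a sketch: MiniDisassembler is hypothetical, it knows just the handful of opcodes listed in its switch (values taken from the Java Virtual Machine Specification), and the CALC byte array is an assumption about what javac typically emits for the calc() method shown earlier, not bytes copied from the article's Class file.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: splitting a bytecode stream into "opcode + operands".
// MiniDisassembler is hypothetical and covers only a few opcodes.
class MiniDisassembler {
    // Assumed bytecode of calc(): int a = 100; int b = 200; int c = 300; return (a + b) * c;
    static final byte[] CALC = {0x10, 0x64, 0x3C, 0x11, 0x00, (byte) 0xC8, 0x3D,
            0x11, 0x01, 0x2C, 0x3E, 0x1B, 0x1C, 0x60, 0x1D, 0x68, (byte) 0xAC};

    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc++] & 0xFF; // every opcode is exactly one byte
            switch (opcode) {
                case 0x10: // bipush: one signed operand byte
                    out.add("bipush " + code[pc++]);
                    break;
                case 0x11: // sipush: two operand bytes, high-order byte first
                    out.add("sipush " + (short) (((code[pc] & 0xFF) << 8) | (code[pc + 1] & 0xFF)));
                    pc += 2;
                    break;
                case 0x1B: case 0x1C: case 0x1D: // iload_1 .. iload_3: no operands
                    out.add("iload_" + (opcode - 0x1A));
                    break;
                case 0x3C: case 0x3D: case 0x3E: // istore_1 .. istore_3: no operands
                    out.add("istore_" + (opcode - 0x3B));
                    break;
                case 0x60: out.add("iadd"); break;
                case 0x68: out.add("imul"); break;
                case 0xAC: out.add("ireturn"); break;
                default:
                    throw new IllegalArgumentException("opcode not in this sketch: " + opcode);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : disassemble(CALC)) {
            System.out.println(line);
        }
    }
}
```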


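Most instructions encode the data type they operate on. For example, the iload instruction loads int data from the local variable table onto the operand stack, while the fload instruction loads float data.

Most instructions do not support the integer types byte, char, and short, and no instruction supports boolean. Most operations on boolean, byte, short, and char data actually use int as the computational type.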
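
The effect is visible at the Java language level: arithmetic on two byte values produces an int. A small sketch (IntPromotion is a made-up class name):

```java
// Sketch: byte arithmetic is carried out as int arithmetic.
// IntPromotion is a hypothetical example class.
class IntPromotion {
    // Boxing the result reveals its static type: a + b is an int, so it boxes to Integer.
    static Object add(byte a, byte b) {
        return a + b;
    }

    // Storing the sum back into a byte needs an explicit narrowing cast.
    static byte addAsByte(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        System.out.println(add((byte) 10, (byte) 20) instanceof Integer); // prints "true"
        System.out.println(addAsByte((byte) 10, (byte) 20));              // prints "30"
    }
}
```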


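2.1. Load and store instructions

These instructions transfer data back and forth between the local variable table and the operand stack of a stack frame. They include:

  • Load a local variable onto the operand stack: iload, iload_<n>, lload, lload_<n>, fload, fload_<n>, dload, dload_<n>, aload, aload_<n>
  • Store a value from the operand stack into the local variable table: istore, istore_<n>, lstore, lstore_<n>, fstore, fstore_<n>, dstore, dstore_<n>, astore, astore_<n>
  • Load a constant onto the operand stack: bipush, sipush, ldc, ldc_w, ldc2_w, aconst_null, iconst_m1, iconst_<i>, lconst_<l>, fconst_<f>, dconst_<d>
  • Extend the access index of the local variable table: wide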
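
How values move between the local variable table and the operand stack can be sketched with a toy interpreter. Everything here is illustrative: MiniInterpreter is hypothetical, it supports only the opcodes in its switch, and the CALC bytes are an assumption about what javac typically emits for calc(). Slot 0 of the locals is unused in the sketch; in a real instance method it would hold this.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: a toy interpreter with a local variable table and an operand stack.
// MiniInterpreter is hypothetical and handles only the opcodes listed below.
class MiniInterpreter {
    // Assumed bytecode of calc(): int a = 100; int b = 200; int c = 300; return (a + b) * c;
    static final byte[] CALC = {0x10, 0x64, 0x3C, 0x11, 0x00, (byte) 0xC8, 0x3D,
            0x11, 0x01, 0x2C, 0x3E, 0x1B, 0x1C, 0x60, 0x1D, 0x68, (byte) 0xAC};

    static int run(byte[] code, int maxLocals) {
        int[] locals = new int[maxLocals];          // local variable table
        Deque<Integer> stack = new ArrayDeque<>();  // operand stack
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xFF;
            switch (opcode) {
                case 0x10: // bipush: push a constant taken from one operand byte
                    stack.push((int) code[pc++]);
                    break;
                case 0x11: // sipush: push a constant taken from two operand bytes
                    stack.push((int) (short) (((code[pc] & 0xFF) << 8) | (code[pc + 1] & 0xFF)));
                    pc += 2;
                    break;
                case 0x1B: case 0x1C: case 0x1D: // iload_1..3: local variable -> stack
                    stack.push(locals[opcode - 0x1A]);
                    break;
                case 0x3C: case 0x3D: case 0x3E: // istore_1..3: stack -> local variable
                    locals[opcode - 0x3B] = stack.pop();
                    break;
                case 0x60: // iadd
                    stack.push(stack.pop() + stack.pop());
                    break;
                case 0x68: // imul
                    stack.push(stack.pop() * stack.pop());
                    break;
                case 0xAC: // ireturn: the value on top of the stack is the result
                    return stack.pop();
                default:
                    throw new IllegalArgumentException("opcode not in this sketch: " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(CALC, 4)); // prints "90000"
    }
}
```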

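2.2. Arithmetic instructions

These instructions perform a specific operation on values on the operand stack (typically two) and push the result back onto the top of the stack.

  • Addition: iadd, ladd, fadd, dadd
  • Subtraction: isub, lsub, fsub, dsub
  • Multiplication: imul, lmul, fmul, dmul
  • Division: idiv, ldiv, fdiv, ddiv
  • Remainder: irem, lrem, frem, drem
  • Negation: ineg, lneg, fneg, dneg
  • Shift: ishl, ishr, iushr, lshl, lshr, lushr
  • Bitwise OR: ior, lor
  • Bitwise AND: iand, land
  • Bitwise XOR: ixor, lxor
  • Local variable increment: iinc
  • Comparison: dcmpg, dcmpl, fcmpg, fcmpl, lcmp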
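
A few of these operations have behavior worth seeing in practice. The sketch below uses ordinary Java operators, which javac is expected to compile to the instructions named in the comments; ArithmeticDemo and its method names are made up for the example.

```java
// Sketch: Java operators and the arithmetic instructions they usually compile to.
// ArithmeticDemo is a hypothetical example class.
class ArithmeticDemo {
    static int add(int a, int b) { return a + b; }            // iadd
    static int div(int a, int b) { return a / b; }            // idiv
    static int rem(int a, int b) { return a % b; }            // irem
    static int shr(int a, int n) { return a >> n; }           // ishr (keeps the sign)
    static int ushr(int a, int n) { return a >>> n; }         // iushr (fills with zeros)
    static double ddiv(double a, double b) { return a / b; }  // ddiv

    public static void main(String[] args) {
        System.out.println(add(Integer.MAX_VALUE, 1)); // prints "-2147483648" (overflow wraps around)
        System.out.println(div(-7, 2));                // prints "-3" (rounds toward zero)
        System.out.println(rem(-7, 2));                // prints "-1"
        System.out.println(shr(-8, 1));                // prints "-4"
        System.out.println(ushr(-8, 28));              // prints "15"
        System.out.println(ddiv(1.0, 0.0));            // prints "Infinity"
        try {
            div(1, 0);
        } catch (ArithmeticException e) {
            System.out.println("integer division by zero throws"); // this line is printed
        }
    }
}
```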

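2.3. Type conversion instructions

These instructions convert between two different numeric types.

The Java virtual machine directly supports the following widening numeric conversions (safe conversions from a smaller-range type to a larger-range type):

  • int to long, float, or double
  • long to float or double
  • float to double

Narrowing numeric conversions (Narrowing Numeric Conversions) must be performed explicitly with conversion instructions. These instructions are: i2b, i2c, i2s, l2i, f2i, f2l, d2i, d2l, d2f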
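
Narrowing conversions can lose information, which is why they have to be written as explicit casts. A short sketch of the results (NarrowingDemo is a hypothetical class; the method names simply mirror the instruction each cast is expected to compile to):

```java
// Sketch: explicit casts that correspond to narrowing conversion instructions.
// NarrowingDemo is a hypothetical example class.
class NarrowingDemo {
    static byte i2b(int v) { return (byte) v; }   // keeps the low 8 bits
    static char i2c(int v) { return (char) v; }   // keeps the low 16 bits, unsigned
    static int l2i(long v) { return (int) v; }    // keeps the low 32 bits
    static int d2i(double v) { return (int) v; }  // rounds toward zero, saturates at the int range

    public static void main(String[] args) {
        System.out.println(i2b(200));             // prints "-56"
        System.out.println(i2c(65));              // prints "A"
        System.out.println(l2i(0x100000001L));    // prints "1"
        System.out.println(d2i(-3.9));            // prints "-3"
        System.out.println(d2i(1e20));            // prints "2147483647"
        System.out.println(d2i(Double.NaN));      // prints "0"
    }
}
```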


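2.4. Object creation and access instructions

  • Create a class instance: new
  • Create an array: newarray, anewarray, multianewarray
  • Access a field: getfield, putfield, getstatic, putstatic
  • Load an array element onto the operand stack: baload, caload, saload, iaload, laload, faload, daload, aaload
  • Store a value from the operand stack into an array element: bastore, castore, sastore, iastore, lastore, fastore, dastore, aastore
  • Get the length of an array: arraylength
  • Check the type of a class instance: instanceof, checkcast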

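2.5. Operand stack management instructions

Just as with a stack in an ordinary data structure, the Java virtual machine provides instructions that operate directly on the operand stack, including:

  • Pop one or two elements off the top of the operand stack: pop, pop2
  • Duplicate one or two values on top of the stack and push the copy, or two copies, back onto the stack: dup, dup2, dup_x1, dup2_x1, dup_x2, dup2_x2
  • Swap the two values at the top of the stack: swap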
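
The effect of three of these instructions can be modeled on an ordinary Deque. This is a simplified sketch: StackOps is hypothetical, every value occupies one slot (so long and double, which take two slots in the real operand stack, are not modeled), and the dup_x1 behavior follows the Java Virtual Machine Specification rather than this article.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: modeling dup, swap and dup_x1 on a plain stack of ints.
// StackOps is a hypothetical helper.
class StackOps {
    // dup: ..., v  ->  ..., v, v
    static void dup(Deque<Integer> s) {
        s.push(s.peek());
    }

    // swap: ..., v2, v1  ->  ..., v1, v2
    static void swap(Deque<Integer> s) {
        int v1 = s.pop();
        int v2 = s.pop();
        s.push(v1);
        s.push(v2);
    }

    // dup_x1: ..., v2, v1  ->  ..., v1, v2, v1
    static void dupX1(Deque<Integer> s) {
        int v1 = s.pop();
        int v2 = s.pop();
        s.push(v1);
        s.push(v2);
        s.push(v1);
    }

    public static void main(String[] args) {
        Deque<Integer> s = new ArrayDeque<>();
        s.push(1);
        s.push(2);             // the top of the stack is 2
        swap(s);
        System.out.println(s); // prints "[1, 2]" (top of the stack first)
    }
}
```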


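2.6. Control transfer instructions

Control transfer instructions let the Java virtual machine conditionally or unconditionally continue execution from a specified instruction rather than from the instruction that follows the control transfer instruction. Conceptually, a control transfer instruction conditionally or unconditionally modifies the value of the PC register. The control transfer instructions are as follows.

  • Conditional branch: ifeq, iflt, ifle, ifne, ifgt, ifge, ifnull, ifnonnull, if_icmpeq, if_icmpne, if_icmplt, if_icmpgt, if_icmple, if_icmpge, if_acmpeq, if_acmpne
  • Compound conditional branch: tableswitch, lookupswitch
  • Unconditional branch: goto, goto_w, jsr, jsr_w, ret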


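2.7. Method invocation instructions

The invokevirtual instruction invokes an instance method of an object and dispatches on the actual type of the object (virtual method dispatch). This is the most common form of method dispatch in the Java language.

The invokeinterface instruction invokes an interface method. At run time it searches for an object that implements the interface method and finds the appropriate method to call.

The invokespecial instruction invokes instance methods that need special handling, including instance initialization methods, private methods, and parent class methods.

The invokestatic instruction invokes class methods (static methods).

The invokedynamic instruction resolves, at run time, the method referenced by a call site specifier and then executes it. The dispatch logic of the first four invocation instructions is fixed inside the Java virtual machine, whereas the dispatch logic of invokedynamic is determined by a bootstrap method supplied by the user.


Method invocation instructions are independent of data type.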
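
The sketch below annotates ordinary Java calls with the invocation instruction that javac is commonly expected to emit for each. These annotations are assumptions about typical compiler output, not something verified against this article's Class file, and they can be checked with javap -c. InvokeDemo, Greeter, and English are made-up names.

```java
import java.util.function.IntSupplier;

// Sketch: which invocation instruction each kind of call usually compiles to.
// All type names here are hypothetical examples.
class InvokeDemo {
    interface Greeter {
        String greet();
    }

    static class English implements Greeter {
        public String greet() {
            return "hello";
        }
    }

    static int twice(int x) {
        return 2 * x;
    }

    static String run() {
        English e = new English();      // new, dup, then invokespecial English.<init>
        Greeter g = e;
        String a = e.greet();           // invokevirtual: receiver's static type is a class
        String b = g.greet();           // invokeinterface: receiver's static type is an interface
        int c = twice(21);              // invokestatic
        IntSupplier s = () -> c + 1;    // invokedynamic: lambda creation (javac for Java 8 and later)
        return a + " " + b + " " + c + " " + s.getAsInt();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "hello hello 42 43"
    }
}
```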


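2.8. Method return instructions

Method return instructions are distinguished by the type of the return value. They include ireturn (used when the return value is boolean, byte, char, short, or int), lreturn, freturn, dreturn, and areturn. There is also a return instruction, used by methods declared void, by instance initialization methods, and by the class initialization methods of classes and interfaces.


2.9. Exception handling instructions

In a Java program, explicitly throwing an exception (the throw statement) is implemented by the athrow instruction.


2.10. Synchronization instructions

Two instructions, monitorenter and monitorexit, support the semantics of the synchronized keyword.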

Origin blog.csdn.net/rockvine/article/details/124802743