Inside the Java Virtual Machine (2): The Java Class File

What is a Java class file?

The Java class file is a precisely defined binary file format for Java programs. Each class file fully describes exactly one Java class or interface; a single class file cannot contain more than one. Class files are not tied to the Java language: a program written in another language can also be compiled to class files.

A Java class file is a binary stream of 8-bit bytes. Data items are stored in order with no padding between adjacent items, which keeps the file compact. An item that occupies more than one byte is split into consecutive bytes and stored high-order byte first (big-endian).

In a class file, the size or length of a variable-length item always precedes its data. This lets a parser read the stream sequentially from start to end: first the item's size, then the item's data.
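
The "size first, then data" layout can be sketched in a few lines of Java. This is only an illustration: the class name LengthPrefixedItems and the sample bytes are made up for the example, and the stream here holds nothing but 2-byte lengths followed by ASCII data, which is a simplification of a real class file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LengthPrefixedItems {
    // Reads items laid out as: 2-byte big-endian length, then that many bytes of data.
    static List<String> readAll(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream);      // ByteBuffer is big-endian by default
        List<String> items = new ArrayList<>();
        while (buf.hasRemaining()) {
            int length = buf.getShort() & 0xFFFF;      // the size comes first
            byte[] data = new byte[length];
            buf.get(data);                             // then the data itself
            items.add(new String(data, StandardCharsets.US_ASCII));
        }
        return items;
    }

    public static void main(String[] args) {
        // Hypothetical stream: length 4 + "Code", then length 2 + "Ok"
        byte[] stream = {0, 4, 'C', 'o', 'd', 'e', 0, 2, 'O', 'k'};
        System.out.println(readAll(stream));           // prints [Code, Ok]
    }
}
```

Because every length is known before the data is read, the parser never has to look ahead or backtrack.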

Class file content

The Java class file contains all the information about the class or interface that the Java virtual machine needs to know.

Basic types in the class file

Values stored in items of type u2, u4, and u8 appear in the class file high-order byte first.

Type    Description
u1      1 byte, unsigned
u2      2 bytes, unsigned
u4      4 bytes, unsigned
u8      8 bytes, unsigned
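
Java has no unsigned primitive types, so reading these items takes a little masking. The sketch below is one possible way to do it with java.nio.ByteBuffer; the class and method names (UnsignedReads, u1, u2, u4, u8) are chosen for this example only.

```java
import java.nio.ByteBuffer;

class UnsignedReads {
    // Each helper reads one item at the buffer's current position, high-order byte first.
    static int u1(ByteBuffer buf)  { return buf.get() & 0xFF; }
    static int u2(ByteBuffer buf)  { return buf.getShort() & 0xFFFF; }
    static long u4(ByteBuffer buf) { return buf.getInt() & 0xFFFFFFFFL; }
    // There is no wider primitive to widen into, so u8 returns the raw 64 bits in a long.
    static long u8(ByteBuffer buf) { return buf.getLong(); }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, (byte) 0xCA, (byte) 0xFE,
                        (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        System.out.println(u1(buf));   // 255
        System.out.println(u2(buf));   // 51966 (0xCAFE)
        System.out.println(u4(buf));   // 3405691582 (0xCAFEBABE)
    }
}
```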

The format of the ClassFile table

The ClassFile table is variable-length. The table below lists its main items in the order in which they appear in the class file.

Type              Name                   Count
u4                magic                  1
u2                minor_version          1
u2                major_version          1
u2                constant_pool_count    1
cp_info           constant_pool          constant_pool_count - 1
u2                access_flags           1
u2                this_class             1
u2                super_class            1
u2                interfaces_count       1
u2                interfaces             interfaces_count
u2                fields_count           1
field_info        fields                 fields_count
u2                methods_count          1
method_info       methods                methods_count
u2                attributes_count       1
attribute_info    attributes             attributes_count

Each item is introduced briefly below.

magic (magic number)

The first four bytes of every Java class file are its magic number, 0xCAFEBABE. The magic number makes it easy to recognize a Java class file.

The magic number was chosen when Java was still called "Oak". That 0xCAFEBABE seems to hint at the later name "Java" is just a coincidence.
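
As a small illustration, a tool could use the magic number to tell class files apart from other data. The sketch below assumes the candidate bytes are already in memory; MagicCheck and looksLikeClassFile are names invented for the example.

```java
import java.nio.ByteBuffer;

class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;

    // Returns true if the data starts with the four bytes CA FE BA BE.
    static boolean looksLikeClassFile(byte[] data) {
        return data.length >= 4 && ByteBuffer.wrap(data).getInt() == MAGIC;
    }

    public static void main(String[] args) {
        byte[] classLike = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        byte[] other = {'P', 'K', 3, 4};
        System.out.println(looksLikeClassFile(classLike));   // true
        System.out.println(looksLikeClassFile(other));       // false
    }
}
```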

minor_version and major_version

The four bytes after the magic number hold the minor and major version numbers, in that order. As Java technology evolves, new features may be added to the class file format, and whenever the format changes the version number changes too.

For the Java virtual machine, the version number identifies the exact class file format. In general, a given virtual machine can read only class files whose major version number, and range of minor version numbers, it supports. If a class file's version number lies outside the range the virtual machine can handle, the virtual machine refuses to process the file. This follows the principle of backward compatibility: a newer virtual machine can read class files written for older versions, but not the other way round.

Major version    JDK
52               8
51               7
50               6
49               5
48               1.4
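
Reading the version and looking it up in the table above might look like the following sketch. The names ClassVersion, readVersion, and jdkFor are invented for the example, and the lookup deliberately covers only the five versions listed in the table.

```java
import java.nio.ByteBuffer;

class ClassVersion {
    // Returns {major, minor}; data must start at the magic number.
    static int[] readVersion(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        buf.getInt();                            // skip the 4-byte magic number
        int minor = buf.getShort() & 0xFFFF;     // minor_version is stored first
        int major = buf.getShort() & 0xFFFF;     // major_version follows it
        return new int[] {major, minor};
    }

    // Maps a major version to the JDK release from the table above.
    static String jdkFor(int major) {
        switch (major) {
            case 48: return "1.4";
            case 49: return "5";
            case 50: return "6";
            case 51: return "7";
            case 52: return "8";
            default: return "not in table";
        }
    }

    public static void main(String[] args) {
        // First eight bytes of a hypothetical class file compiled for JDK 8
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        int[] v = readVersion(header);
        System.out.println("major=" + v[0] + " minor=" + v[1] + " jdk=" + jdkFor(v[0]));
        // prints major=52 minor=0 jdk=8
    }
}
```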

constant_pool_count and constant_pool

The constant pool follows the version number. As mentioned in the previous article, the constant pool holds the constants associated with the class or interface defined in the file: text strings, values of final variables, class names, method names, and so on. The Java virtual machine organizes the constant pool as a list of entries, and the constant_pool_count that precedes the actual constant_pool list gives the size of that list.

Many constant pool entries refer to other constant pool entries, and many items that follow the constant pool in the class file refer back to entries in it as well. The first entry in the list has index 1, the second has index 2, and so on. Although the constant_pool list has no entry with index 0, that missing entry is still included in constant_pool_count. For example, when constant_pool holds 14 entries, constant_pool_count is 15.

Each constant pool entry begins with a one-byte tag that indicates the type of constant stored at that position in the list. Once the Java virtual machine has read the tag, it knows what kind of constant follows.

Constant pool tags

Entry type                     Tag value    Description
CONSTANT_Utf8                  1            UTF-8 encoded Unicode string
CONSTANT_Integer               3            int literal
CONSTANT_Float                 4            float literal
CONSTANT_Long                  5            long literal
CONSTANT_Double                6            double literal
CONSTANT_Class                 7            Symbolic reference to a class or interface
CONSTANT_String                8            String literal
CONSTANT_Fieldref              9            Symbolic reference to a field
CONSTANT_Methodref             10           Symbolic reference to a method declared in a class
CONSTANT_InterfaceMethodref    11           Symbolic reference to a method declared in an interface
CONSTANT_NameAndType           12           Partial symbolic reference to a field or method

Each tag in the table has a corresponding table whose name is formed by adding the suffix "_info" to the tag name; for example, a CONSTANT_Class entry is stored as a CONSTANT_Class_info table.
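
To make the tag-driven layout concrete, here is a rough sketch that walks a constant pool and records the tag found at each index. The body sizes per tag are not given in this article; they are taken from the class file format in the Java Virtual Machine Specification, as is the rule that a CONSTANT_Long or CONSTANT_Double entry uses up two index slots. ConstantPoolWalker and readTags are names made up for the example, and only the eleven tags in the table above are handled.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class ConstantPoolWalker {
    // pool must start at constant_pool_count.
    // Returns tags[i] = tag of entry i, with 0 where no entry exists (index 0, second slot of long/double).
    static int[] readTags(byte[] pool) {
        ByteBuffer buf = ByteBuffer.wrap(pool);
        int count = buf.getShort() & 0xFFFF;          // number of entries + 1
        int[] tags = new int[count];
        for (int i = 1; i < count; i++) {             // valid indexes run from 1 to count - 1
            int tag = buf.get() & 0xFF;
            tags[i] = tag;
            int bodyLength;
            switch (tag) {
                case 1:                               // Utf8: u2 length, then that many bytes
                    bodyLength = buf.getShort() & 0xFFFF;
                    break;
                case 3: case 4:                       // Integer, Float: 4 bytes
                    bodyLength = 4;
                    break;
                case 5: case 6:                       // Long, Double: 8 bytes, two index slots
                    bodyLength = 8;
                    i++;
                    break;
                case 7: case 8:                       // Class, String: one u2 index
                    bodyLength = 2;
                    break;
                case 9: case 10: case 11: case 12:    // Fieldref, Methodref, InterfaceMethodref,
                    bodyLength = 4;                   // NameAndType: two u2 indexes
                    break;
                default:
                    throw new IllegalArgumentException("unsupported tag " + tag + " at index " + i);
            }
            buf.position(buf.position() + bodyLength); // skip the body of this entry
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hypothetical pool with count 4: #1 Class(name_index=2), #2 Utf8 "Hi", #3 Integer 42
        byte[] pool = {0, 4, 7, 0, 2, 1, 0, 2, 'H', 'i', 3, 0, 0, 0, 42};
        System.out.println(Arrays.toString(readTags(pool)));   // prints [0, 7, 1, 3]
    }
}
```

Class files produced by newer compilers may contain tags that are not in the table above; this sketch simply rejects them.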

The constant pool plays a central role in the dynamic linking of Java programs. Besides literal constant values, the constant pool can hold the following kinds of symbolic references:

  • Fully qualified names of classes and interfaces
  • Field names and descriptors
  • Method names and descriptors

A field is an instance variable or class variable of a class or interface. A field descriptor is a string that indicates the field's type.
A method descriptor is likewise a string; it indicates the method's return type and the number, order, and types of its parameters.
At run time, the Java virtual machine uses the fully qualified names and the method and field descriptors in the constant pool to link the code of the current class or interface with the code of other classes and interfaces. Because a class file contains no information about the final memory layout of its components (different virtual machine implementations may lay out memory differently), bytecode in the class file cannot refer to classes, fields, or methods directly. Instead, the Java virtual machine takes the symbolic reference from the constant pool and resolves it to the actual address of the referenced item at run time.
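
Descriptors are easier to grasp with an example. In descriptor syntax, I stands for int, V for void, and Ljava/lang/String; for the class java.lang.String. The sketch below uses the standard java.lang.invoke.MethodType class to decode and encode method descriptors; the wrapper class DescriptorDemo and its parse method exist only for this illustration.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Parses a method descriptor string into a MethodType, resolving class names
    // with this class's own class loader.
    static MethodType parse(String descriptor) {
        return MethodType.fromMethodDescriptorString(descriptor, DescriptorDemo.class.getClassLoader());
    }

    public static void main(String[] args) {
        // Descriptor of a method such as: void log(int level, String message)
        MethodType type = parse("(ILjava/lang/String;)V");
        System.out.println(type.parameterCount());   // 2
        System.out.println(type.parameterType(0));   // int
        System.out.println(type.parameterType(1));   // class java.lang.String
        System.out.println(type.returnType());       // void

        // The reverse direction: from Java types to a descriptor string
        String descriptor = MethodType.methodType(String.class, int.class).toMethodDescriptorString();
        System.out.println(descriptor);              // (I)Ljava/lang/String;
    }
}
```

Note that the descriptor carries only types: the method name is stored separately in the constant pool.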

access_flags

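The two bytes immediately after the constant pool are the access_flags. They convey several pieces of information about the class or interface defined in the file.
For example, the access flags indicate whether the file defines a class or an interface, and which modifiers were used in its declaration, such as abstract or public. A class may be final, but a final class cannot be abstract, and an interface cannot be final.

access_flags bits

Flag name        Value     Meaning when set                      Set by
ACC_PUBLIC       0x0001    public type                           Classes and interfaces
ACC_FINAL        0x0010    Class is final                        Classes only
ACC_SUPER        0x0020    Uses the new invokespecial semantics  Classes and interfaces
ACC_INTERFACE    0x0200    An interface, not a class             All interfaces, no classes
ACC_ABSTRACT     0x0400    abstract type                         All interfaces, some classes

All unused bits in access_flags must be set to 0 by the compiler, and the Java virtual machine must ignore them.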
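
Since access_flags is a bit mask, decoding it is a matter of testing each bit. The following sketch uses the five flag values from the table above; the class AccessFlags and its decode method are names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

class AccessFlags {
    static final int ACC_PUBLIC    = 0x0001;
    static final int ACC_FINAL     = 0x0010;
    static final int ACC_SUPER     = 0x0020;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT  = 0x0400;

    // Returns the names of the flags set in the given access_flags value.
    // Bits that are not in the table are ignored.
    static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & ACC_PUBLIC) != 0)    names.add("ACC_PUBLIC");
        if ((flags & ACC_FINAL) != 0)     names.add("ACC_FINAL");
        if ((flags & ACC_SUPER) != 0)     names.add("ACC_SUPER");
        if ((flags & ACC_INTERFACE) != 0) names.add("ACC_INTERFACE");
        if ((flags & ACC_ABSTRACT) != 0)  names.add("ACC_ABSTRACT");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021));   // [ACC_PUBLIC, ACC_SUPER]
        System.out.println(decode(0x0601));   // [ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]
    }
}
```

The value 0x0021 is what one would expect for an ordinary public class, and 0x0601 for a public interface.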

this_class

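The two bytes after access_flags are the this_class item, which is an index into the constant pool. The constant pool entry at the this_class position must be a CONSTANT_Class_info table. That table has two parts: a tag and a name_index. The tag is a constant with the value CONSTANT_Class; the constant pool entry at the name_index position is a CONSTANT_Utf8_info table containing the fully qualified name of the class or interface.

The this_class item is a good example of how the constant pool is used.
this_class itself is only an index into the constant pool. When the Java virtual machine looks up the constant pool entry at the this_class position, it finds an entry that identifies itself through a tag set to CONSTANT_Class. The virtual machine knows that in a CONSTANT_Class_info entry the tag is always followed by a constant pool index called name_index, so it looks up the constant pool entry at name_index, where it should find a CONSTANT_Utf8_info entry holding the fully qualified name of the class or interface.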
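
The two-step lookup can be sketched with a tiny in-memory model of the constant pool. Everything here is a stand-in for illustration: ClassInfo and Utf8Info are simplified versions of CONSTANT_Class_info and CONSTANT_Utf8_info, the pool is a plain Object array, and the class name com/example/Hello is made up. In class files, fully qualified names are written with slashes instead of dots.

```java
class ThisClassLookup {
    // Simplified stand-ins for two constant pool tables (not a real API).
    static final class ClassInfo {
        final int nameIndex;
        ClassInfo(int nameIndex) { this.nameIndex = nameIndex; }
    }
    static final class Utf8Info {
        final String value;
        Utf8Info(String value) { this.value = value; }
    }

    // Follows this_class -> CONSTANT_Class_info -> name_index -> CONSTANT_Utf8_info.
    static String resolveClassName(Object[] constantPool, int classIndex) {
        Object entry = constantPool[classIndex];
        if (!(entry instanceof ClassInfo)) {
            throw new IllegalArgumentException("entry " + classIndex + " is not a CONSTANT_Class_info");
        }
        Object name = constantPool[((ClassInfo) entry).nameIndex];
        if (!(name instanceof Utf8Info)) {
            throw new IllegalArgumentException("name_index does not point to a CONSTANT_Utf8_info");
        }
        return ((Utf8Info) name).value;
    }

    public static void main(String[] args) {
        // Index 0 is unused; entry 1 is the class, entry 2 is its name.
        Object[] pool = {null, new ClassInfo(2), new Utf8Info("com/example/Hello")};
        int thisClass = 1;
        System.out.println(resolveClassName(pool, thisClass));   // prints com/example/Hello
    }
}
```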

super_class

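In the class file, the super_class item follows immediately after this_class. It is a two-byte index into the constant pool.

The constant pool entry at the super_class position is a CONSTANT_Class_info entry that refers to the fully qualified name of the class's superclass. Because java.lang.Object is the base class of all objects in a Java program, super_class is a valid constant pool index for every class except Object. For the Object class, super_class is 0. For an interface, the constant pool entry at the super_class position is java.lang.Object.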

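interfaces_count and interfaces

After super_class comes interfaces_count. It is the number of superinterfaces that the class in this file implements directly, or that the interface in this file extends directly. The count is followed by an array called interfaces, which contains one constant pool index for each direct superinterface.

Each superinterface is described by a CONSTANT_Class_info entry in the constant pool that refers to the interface's fully qualified name.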
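
The word "directly" matters here. The sketch below does not parse a class file; it uses reflection, where Class.getInterfaces() likewise reports only the interfaces named in a type's own declaration. All type names in it (Named, Sized, NamedAndSized, Box) are invented for the example.

```java
import java.util.Arrays;

class DirectInterfaces {
    interface Named {}
    interface Sized {}
    interface NamedAndSized extends Named, Sized {}   // an interface extending two superinterfaces
    static class Box implements NamedAndSized {}      // a class implementing one interface directly

    // Simple names of the interfaces a type implements or extends directly.
    static String[] names(Class<?> type) {
        return Arrays.stream(type.getInterfaces())
                     .map(Class::getSimpleName)
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(names(Box.class)));            // [NamedAndSized]
        System.out.println(Arrays.toString(names(NamedAndSized.class)));  // [Named, Sized]
    }
}
```

Box indirectly implements Named and Sized as well, but only NamedAndSized appears in its own list.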

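fields_count and fields

After interfaces comes the description of the fields declared in the class or interface. First is a count called fields_count, the total number of fields, counting both class variables and instance variables. The count is followed by a sequence of variable-length field_info tables (fields_count tells how many field_info tables the sequence contains).

Only fields declared by the class or interface in this class file appear in the fields list. Fields inherited from superclasses or superinterfaces are not listed.

In addition, the fields list may contain fields that do not appear in the corresponding Java source file, because the Java compiler may add fields to a class or interface at compile time. For example, to keep a reference to the instance of the enclosing class, the Java compiler adds an instance variable to the fields list of an inner class. The source code contains no declaration of this variable; the compiler adds it to the fields list at compile time, and such fields are marked with the Synthetic attribute.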
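
Compiler-added fields can be observed through reflection, as in the sketch below. It rests on two assumptions: the inner class must actually use its enclosing instance (some recent compilers drop the hidden reference when it is unused), and the exact name of the generated field depends on the compiler, so none is asserted here. SyntheticFieldDemo and describe are names made up for the example.

```java
import java.lang.reflect.Field;

class SyntheticFieldDemo {
    private int counter = 0;

    class Inner {
        // Uses a field of the enclosing instance, so a reference to that instance must be kept.
        int next() { return ++counter; }
    }

    // Lists the fields declared in the given class and marks the compiler-generated ones.
    static String describe(Class<?> type) {
        StringBuilder sb = new StringBuilder();
        for (Field f : type.getDeclaredFields()) {
            sb.append(f.getName())
              .append(f.isSynthetic() ? " (synthetic)" : " (declared in source)")
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Inner declares no fields in source, yet one field is typically reported here.
        System.out.print(describe(Inner.class));
    }
}
```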

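Each field_info table describes one field: its name, descriptor, and modifiers. If the field is declared final, its constant value is described as well. Some of this information is stored directly in the field_info table, and some is stored in the constant pool entries that the field_info table points to.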

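methods_count and methods

After fields comes the description of the methods declared in the class or interface. First is methods_count, a two-byte count of all the methods declared in the class or interface. The count covers only methods explicitly defined in this class or interface; methods inherited from superclasses or superinterfaces are not counted. methods_count is followed by the methods themselves, described by a list of method_info tables (methods_count tells how many method_info tables the list contains).

A method_info table contains information about one method, including its name and descriptor (the return type and parameter types). If the method is neither abstract nor native, the method_info table also contains the amount of stack space needed for the method's local variables, the table of exceptions the method catches, the bytecode sequence, and optional line number and local variable tables.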

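attributes_count and attributes

The last part of the class file is the attributes, which give general information about the class or interface defined in the file. The attributes section begins with attributes_count, the number of attribute_info tables in the attributes list that follows. The first item of every attribute_info is an index to a CONSTANT_Utf8_info table in the constant pool that gives the attribute's name.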
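
A rough sketch of reading the attributes list follows. One detail comes from the Java Virtual Machine Specification rather than from this article: after the name index, each attribute_info stores a four-byte attribute_length, followed by that many bytes of attribute data. AttributeReader, Attribute, and readAttributes are names invented for the example, and the sample bytes are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class AttributeReader {
    static final class Attribute {
        final int nameIndex;   // index of the CONSTANT_Utf8_info entry holding the attribute name
        final byte[] info;     // raw attribute data, interpreted according to the name
        Attribute(int nameIndex, byte[] info) {
            this.nameIndex = nameIndex;
            this.info = info;
        }
    }

    // data must start at attributes_count.
    static List<Attribute> readAttributes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getShort() & 0xFFFF;              // attributes_count
        List<Attribute> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int nameIndex = buf.getShort() & 0xFFFF;      // attribute name (constant pool index)
            int length = buf.getInt();                    // attribute_length (u4, per the JVM specification)
            if (length < 0 || length > buf.remaining()) {
                throw new IllegalArgumentException("bad attribute length " + length);
            }
            byte[] info = new byte[length];
            buf.get(info);                                // attribute data
            result.add(new Attribute(nameIndex, info));
        }
        return result;
    }

    public static void main(String[] args) {
        // One attribute: name at constant pool index 5, two bytes of data (0x0006)
        byte[] data = {0, 1, 0, 5, 0, 0, 0, 2, 0, 6};
        for (Attribute a : readAttributes(data)) {
            System.out.println("name_index=" + a.nameIndex + ", length=" + a.info.length);
            // prints name_index=5, length=2
        }
    }
}
```

Because the length precedes the data, a reader can skip over an attribute it does not recognize without understanding its contents.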

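To interpret Java class files correctly, nine attributes are defined:

Name                  Used by                   Description
Code                  method_info               The method's bytecode and other data
ConstantValue         field_info                Value of a final variable
Deprecated            field_info, method_info   Marks a field or method as deprecated
Exceptions            method_info               Checked exceptions the method may throw
InnerClasses          ClassFile                 List of inner and outer classes
LineNumberTable       Code_attribute            Mapping between the method's line numbers and bytecode
LocalVariableTable    Code_attribute            Description of the method's local variables
SourceFile            ClassFile                 Name of the source file
Synthetic             field_info, method_info   Marks a field or method generated by the compiler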

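Conclusion

That concludes the tour of the class file structure. This article is only a brief summary of, and set of notes on, the class file structure as described in the book Inside the Java Virtual Machine; studying the format in depth takes considerably more effort.

Onward we go. Building on this article, the next installment will cover the lifecycle of Java types.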

Origin blog.csdn.net/lijie2664989/article/details/106749327