In-Depth Understanding of the Java Virtual Machine: Class File Structure

Follow my WeChat official account, BaronTalk, for more articles!

When I first read the ASM documentation, many of the concepts it relies on confused me: the structure of compiled classes, method descriptors, access flags such as ACC_PUBLIC and ACC_PRIVATE, the various bytecode instructions, and so on. I understood them only vaguely, because I did not understand the class file structure or the class loading mechanism. Only after carefully reading the chapters on the virtual machine execution subsystem in "In-Depth Understanding of the Java Virtual Machine" did I form a clear picture. If, like me, you don't yet understand class structure and class loading but your work involves bytecode, I believe this article and the next one will help.

Every line of code we write must eventually be compiled into binary machine code that the CPU can recognize before it can run. However, a virtual machine hides the differences between operating systems and CPU instruction sets, so languages built on top of a virtual machine such as the JVM are usually compiled into an intermediate file format instead. The bytecode file we discuss today is exactly such a format.

I. Language independence

From the start, the designers of the Java virtual machine considered the possibility of running other languages on it. As a result, Java is not the only language that runs on the JVM: today many JVM languages, such as Kotlin, Groovy, Jython, and JRuby, do too. Like Java, they are compiled into bytecode files, which the virtual machine then executes. This is what makes class files (bytecode files) language-independent.

II. Class file structure

A class file is a binary stream made up of 8-bit bytes. Its data items are laid out compactly in a strictly defined order with no separators between them, so almost everything stored in the file is data the program needs to run, with no padding. When a data item needs more than 8 bits, it is split into several 8-bit bytes stored in big-endian order, that is, most significant byte first.
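To make the big-endian rule concrete, here is a minimal Java sketch that splits a 4-byte value into bytes, most significant byte first, and joins them back together. The class `BigEndian` and its methods are illustrative names, not part of any JVM API.

```java
import java.util.Arrays;

// Illustrative helper: shows how a 4-byte (u4) item is laid out in a class file.
class BigEndian {
    // Split a 32-bit value into 4 bytes, most significant byte first.
    static byte[] toBytes(int value) {
        return new byte[] {
            (byte) (value >>> 24),
            (byte) (value >>> 16),
            (byte) (value >>> 8),
            (byte) value
        };
    }

    // Reassemble 4 big-endian bytes into an unsigned value held in a long.
    static long fromBytes(byte[] b) {
        return ((b[0] & 0xFFL) << 24)
             | ((b[1] & 0xFFL) << 16)
             | ((b[2] & 0xFFL) << 8)
             |  (b[3] & 0xFFL);
    }

    public static void main(String[] args) {
        byte[] magic = toBytes(0xCAFEBABE);
        // Prints [-54, -2, -70, -66]: CA FE BA BE shown as signed Java bytes
        System.out.println(Arrays.toString(magic));
        System.out.printf("%X%n", fromBytes(magic)); // CAFEBABE
    }
}
```

Java's `byte` is signed, which is why the value is rebuilt with `& 0xFF` masks and kept in a `long`.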

The Java Virtual Machine Specification describes the class file format with a pseudo-structure similar to a C struct. This pseudo-structure has only two data types: unsigned numbers and tables.

  • Unsigned numbers are the basic data types, written u1, u2, u4, and u8 for 1-byte, 2-byte, 4-byte, and 8-byte unsigned numbers. They can describe numbers, index references, quantity values, or string values encoded in UTF-8.

  • A table is a composite data type whose data items are unsigned numbers or other tables; by convention, all table names end in "_info". Tables describe hierarchical, composite data structures. The entire class file is essentially one table, made up of the data items shown below.
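The unsigned types map naturally onto `java.nio.ByteBuffer`, which reads data in big-endian order by default. The sketch below is a hypothetical helper (`UnsignedReader` is an assumed name) that widens each value so that Java's signed integer types do not turn large unsigned values negative.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for the class file's unsigned types.
// ByteBuffer reads in big-endian order by default, matching the class file.
class UnsignedReader {
    private final ByteBuffer buf;

    UnsignedReader(byte[] data) {
        this.buf = ByteBuffer.wrap(data);
    }

    // u1: one unsigned byte (0..255), widened to int
    int u1() {
        return Byte.toUnsignedInt(buf.get());
    }

    // u2: two unsigned bytes (0..65535), widened to int
    int u2() {
        return Short.toUnsignedInt(buf.getShort());
    }

    // u4: four unsigned bytes, widened to long so values above 2^31 - 1 stay positive
    long u4() {
        return Integer.toUnsignedLong(buf.getInt());
    }

    // u8: Java has no unsigned 64-bit type, so this returns the raw bits in a long
    long u8() {
        return buf.getLong();
    }

    public static void main(String[] args) {
        UnsignedReader r = new UnsignedReader(
            new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52});
        System.out.printf("magic=%X minor=%d major=%d%n", r.u4(), r.u2(), r.u2());
    }
}
```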

Type            Name                 Quantity
u4              magic                1
u2              minor_version        1
u2              major_version        1
u2              constant_pool_count  1
cp_info         constant_pool        constant_pool_count - 1
u2              access_flags         1
u2              this_class           1
u2              super_class          1
u2              interfaces_count     1
u2              interfaces           interfaces_count
u2              fields_count         1
field_info      fields               fields_count
u2              methods_count        1
method_info     methods              methods_count
u2              attributes_count     1
attribute_info  attributes           attributes_count

The bytes of a class file are stored strictly in the order of this table, packed tightly together. Which byte means what, how long each item is, and in what order the items appear are all strictly specified and may not be changed.

2.1 Magic number and class file version

The first four bytes of every class file are called the magic number. Its only purpose is to determine whether the file is a class file the virtual machine can accept. A magic number is used instead of the file extension mainly for security reasons, since a file extension can be changed freely. The magic number of a class file is 0xCAFEBABE.
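A minimal sketch of such a check, assuming the bytes arrive through any `InputStream`: it looks only at the content, so a file renamed to `.class` is still rejected. The class name `MagicCheck` is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative check: does a stream start with the class file magic number?
// Only the content is inspected; the file name or extension plays no part.
class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;

    static boolean looksLikeClassFile(InputStream in) {
        try {
            // readInt consumes 4 bytes in big-endian order, like a u4 item
            return new DataInputStream(in).readInt() == MAGIC;
        } catch (IOException e) {
            // includes EOFException when the stream holds fewer than 4 bytes
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] classHeader = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04}; // e.g. a ZIP file renamed to .class
        System.out.println(looksLikeClassFile(new ByteArrayInputStream(classHeader))); // true
        System.out.println(looksLikeClassFile(new ByteArrayInputStream(zipHeader)));   // false
    }
}
```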

The four bytes after the magic number hold the class file's version number: bytes 5 and 6 are the minor version, and bytes 7 and 8 are the major version. Newer JDKs are backward compatible and can run class files produced by older versions, but the virtual machine must refuse to execute class files whose version number is higher than the one it supports.
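The sketch below reads the 8-byte header and applies a simplified version check. `VersionInfo`, `javaRelease` and `runnableOn` are hypothetical names; the "major version minus 44" rule holds for Java SE 5 (major 49) and later, and the check ignores the minor version for brevity.

```java
import java.nio.ByteBuffer;

// Illustrative parser for the 8-byte class file header: magic, minor, major.
class VersionInfo {
    final int minor;
    final int major;

    VersionInfo(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        this.minor = Short.toUnsignedInt(buf.getShort()); // bytes 5-6
        this.major = Short.toUnsignedInt(buf.getShort()); // bytes 7-8
    }

    // Valid for Java SE 5 and later: 49 = Java 5, 52 = Java 8, 61 = Java 17.
    int javaRelease() {
        return major - 44;
    }

    // Simplified: a VM supporting class files up to maxMajor refuses newer ones.
    boolean runnableOn(int maxMajor) {
        return major <= maxMajor;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        VersionInfo v = new VersionInfo(header);
        System.out.println(v.major + "." + v.minor + " -> Java " + v.javaRelease()); // 52.0 -> Java 8
        System.out.println(v.runnableOn(61)); // true: a Java 17 VM runs Java 8 classes
        System.out.println(v.runnableOn(51)); // false: a Java 7 VM rejects them
    }
}
```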

2.2 Constant pool

Immediately after the major version number comes the constant pool. The constant pool can be thought of as the resource repository of the class file: it is the data type most often referenced by other items in the class file structure, one of the data items that take up the most space in the file, and the first table-type data item to appear in the file.

Because the number of constants is not fixed, the constant pool is preceded by a u2 value, constant_pool_count, that gives its capacity. Unlike the usual convention in computer science, this count starts from 1 rather than 0. Index 0 is deliberately left empty so that, in certain cases, a data item that points into the constant pool can express "refers to no constant pool item" by using the index value 0.

In the class file structure, only the constant pool count starts from 1; the counts of all other collections, including the interface index collection, the field table collection, and the method table collection, start from 0 as usual.
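The two counting rules can be captured in a few lines. `PoolIndexing` and its methods are hypothetical helpers for illustration, not a JVM API; the comment about CONSTANT_Long and CONSTANT_Double reflects the JVM specification's rule that those entries take two slots.

```java
// Hypothetical helpers illustrating the counting rules; not a real JVM API.
class PoolIndexing {
    // constant_pool_count is one more than the number of slots in use,
    // so valid constant pool indices run from 1 to constant_pool_count - 1.
    // (CONSTANT_Long and CONSTANT_Double entries occupy two slots each, so not
    // every index in that range starts a new entry.)
    static boolean isValidPoolIndex(int index, int constantPoolCount) {
        return index >= 1 && index < constantPoolCount;
    }

    // Index 0 means "no constant pool item". For example, super_class is 0
    // only in the class file of java.lang.Object, which has no superclass.
    static boolean isNoReference(int index) {
        return index == 0;
    }

    // Every other collection counts from 0: a count of n means entries 0..n-1,
    // and a count of 0 means the collection is empty.
    static boolean isValidIndex(int index, int count) {
        return index >= 0 && index < count;
    }

    public static void main(String[] args) {
        int constantPoolCount = 22; // example value, as if read from a class file
        System.out.println(isValidPoolIndex(0, constantPoolCount));  // false
        System.out.println(isValidPoolIndex(21, constantPoolCount)); // true
        System.out.println(isValidPoolIndex(22, constantPoolCount)); // false
        System.out.println(isValidIndex(0, 0));                      // false: empty collection
    }
}
```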

The constant pool mainly holds two kinds of constants: literals and symbolic references.

  • Literals are close to the Java-language notion of a constant, such as text strings and constant values declared final.

  • Symbolic references are a compiler-level concept and include the following three kinds of constants:

    • Fully qualified names of classes and interfaces
    • Field names and descriptors
    • Method names and descriptors
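To contrast the two kinds of constants, the sketch below decodes a tiny hand-built constant pool. It handles only three entry tags from the JVM specification: 1 (CONSTANT_Utf8), 3 (CONSTANT_Integer, whose literal value is stored inline) and 7 (CONSTANT_Class, which holds only an index to a name, i.e. a symbolic reference). `MiniConstantPool` is an illustrative name; a real parser must support every tag.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: decodes only Utf8, Integer and Class entries.
class MiniConstantPool {
    static List<String> describe(byte[] pool, int constantPoolCount) {
        ByteBuffer buf = ByteBuffer.wrap(pool);        // big-endian
        int[] tags = new int[constantPoolCount];
        String[] text = new String[constantPoolCount]; // Utf8 contents
        int[] values = new int[constantPoolCount];     // Integer values or Class name indices
        for (int i = 1; i < constantPoolCount; i++) {  // slot 0 is unused
            tags[i] = Byte.toUnsignedInt(buf.get());
            switch (tags[i]) {
                case 1: { // CONSTANT_Utf8: u2 length, then the bytes
                    byte[] b = new byte[Short.toUnsignedInt(buf.getShort())];
                    buf.get(b);
                    // class files use modified UTF-8; for ASCII names the two agree
                    text[i] = new String(b, StandardCharsets.UTF_8);
                    break;
                }
                case 3: // CONSTANT_Integer: the literal value itself, as 4 bytes
                    values[i] = buf.getInt();
                    break;
                case 7: // CONSTANT_Class: u2 index of the Utf8 entry holding the name
                    values[i] = Short.toUnsignedInt(buf.getShort());
                    break;
                default:
                    throw new IllegalArgumentException("tag not handled in this sketch: " + tags[i]);
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 1; i < constantPoolCount; i++) {
            if (tags[i] == 1) {
                out.add("#" + i + " Utf8 " + text[i]);
            } else if (tags[i] == 3) {
                out.add("#" + i + " Integer " + values[i] + " (literal)");
            } else {
                out.add("#" + i + " Class #" + values[i] + " = " + text[values[i]] + " (symbolic reference)");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] pool = {1, 0, 3, 'F', 'o', 'o', 7, 0, 1, 3, 0, 0, 0, 42};
        describe(pool, 4).forEach(System.out::println);
    }
}
```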

2.3 Access flags

After the constant pool come two bytes of access flags (access_flags), which identify class- or interface-level access information: whether this class file describes a class or an interface, whether the type is declared public, whether it is declared abstract, and, for a class, whether it is declared final, and so on. The flags and their meanings are listed in the table below:

Flag name       Value   Meaning
ACC_PUBLIC      0x0001  Whether the type is public
ACC_FINAL       0x0010  Whether the class is declared final; only classes can set this flag
ACC_SUPER       0x0020  Whether to use the newer semantics of the invokespecial instruction, which changed in JDK 1.0.2; this flag must be true for classes compiled after JDK 1.0.2
ACC_INTERFACE   0x0200  Identifies an interface
ACC_ABSTRACT    0x0400  Whether the type is abstract; true for interfaces and abstract classes, false for all other classes
ACC_SYNTHETIC   0x1000  Indicates the class was not generated from user code
ACC_ANNOTATION  0x2000  Identifies an annotation type
ACC_ENUM        0x4000  Identifies an enum

access_flags has 16 flag bits available, of which only the eight above are currently defined; every unused bit must be 0.
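Because each flag occupies its own bit, decoding access_flags is a matter of testing masks. The sketch below uses the values from the table above; `ClassAccessFlags` and `decode` are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative decoder for class-level access_flags, using the table's values.
class ClassAccessFlags {
    private static final Map<Integer, String> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put(0x0001, "ACC_PUBLIC");
        FLAGS.put(0x0010, "ACC_FINAL");
        FLAGS.put(0x0020, "ACC_SUPER");
        FLAGS.put(0x0200, "ACC_INTERFACE");
        FLAGS.put(0x0400, "ACC_ABSTRACT");
        FLAGS.put(0x1000, "ACC_SYNTHETIC");
        FLAGS.put(0x2000, "ACC_ANNOTATION");
        FLAGS.put(0x4000, "ACC_ENUM");
    }

    // A flag is set when its bit is set: (accessFlags & mask) != 0.
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Integer, String> e : FLAGS.entrySet()) {
            if ((accessFlags & e.getKey()) != 0) {
                names.add(e.getValue());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]: a typical public class
        System.out.println(decode(0x0601)); // [ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]
    }
}
```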

2.4 Class index, superclass index, and interface index collection

The class index (this_class) and the superclass index (super_class) are each a single u2 value, while the interface index collection (interfaces) is a set of u2 values; each value is an index into the constant pool. Together, these three items determine the class's inheritance relationships in the class file.

  • The class index determines the fully qualified name of this class
  • The superclass index determines the fully qualified name of this class's parent class
  • The interface index collection describes which interfaces this class implements
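For ordinary classes, the same three pieces of information surface through reflection as `getName()`, `getSuperclass()` and `getInterfaces()`. The sketch below uses made-up nested classes (`Base`, `Child`) and converts names to the slash-separated internal form that class files use; note that for interfaces, reflection reports no superclass even though the class file names java/lang/Object.

```java
import java.io.Serializable;

// The three inheritance items of a class file, viewed through reflection.
// Class files store internal names with '/', e.g. "java/lang/Object".
class InheritanceDemo {
    static class Base { }

    static class Child extends Base implements Runnable, Serializable {
        public void run() { }
    }

    static String internalName(Class<?> c) {
        return c.getName().replace('.', '/');
    }

    public static void main(String[] args) {
        Class<?> c = Child.class;
        System.out.println("this_class  = " + internalName(c));
        System.out.println("super_class = " + internalName(c.getSuperclass()));
        for (Class<?> i : c.getInterfaces()) { // declaration order
            System.out.println("interface   = " + internalName(i));
        }
        // java.lang.Object is the only class whose super_class index is 0
        System.out.println(Object.class.getSuperclass()); // null
    }
}
```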

2.5 Field table collection

The field table collection (field_info) describes the variables declared in a class or interface. Fields include class variables and instance variables, but not local variables declared inside methods. Here is the structure of a field table:

Type            Name              Quantity
u2              access_flags      1
u2              name_index        1
u2              descriptor_index  1
u2              attributes_count  1
attribute_info  attributes        attributes_count

A field's access_flags item holds its modifiers. It is very similar to the class-level access_flags and is also a u2 value.

Flag name       Value   Meaning
ACC_PUBLIC      0x0001  Whether the field is public
ACC_PRIVATE     0x0002  Whether the field is private
ACC_PROTECTED   0x0004  Whether the field is protected
ACC_STATIC      0x0008  Whether the field is static
ACC_FINAL       0x0010  Whether the field is final
ACC_VOLATILE    0x0040  Whether the field is volatile
ACC_TRANSIENT   0x0080  Whether the field is transient
ACC_SYNTHETIC   0x1000  Whether the field was generated automatically by the compiler
ACC_ENUM        0x4000  Whether the field is an enum constant
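The constants in `java.lang.reflect.Modifier` use the same bit values as this table, so `Field.getModifiers()` reflects the flags javac wrote into each field_info. The nested class `Sample` and its fields below are made up for illustration.

```java
import java.lang.reflect.Modifier;

// Field access flags seen through reflection.
class FieldFlagsDemo {
    static class Sample {
        private static final int LIMIT = 10;
        protected volatile long counter;
        public transient String cache;
    }

    // Wraps the checked NoSuchFieldException to keep the example short.
    static int flagsOf(String fieldName) {
        try {
            return Sample.class.getDeclaredField(fieldName).getModifiers();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"LIMIT", "counter", "cache"}) {
            int flags = flagsOf(name);
            System.out.printf("%-8s 0x%04X %s%n", name, flags, Modifier.toString(flags));
        }
        // LIMIT    0x001A private static final
        // counter  0x0044 protected volatile
        // cache    0x0081 public transient
    }
}
```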

2.6 Method table collection

The class file describes methods in almost exactly the same way it describes fields: a method table has the same structure as a field table.

Because the volatile and transient keywords cannot be applied to methods, the method access flags do not include ACC_VOLATILE or ACC_TRANSIENT. Conversely, because synchronized, native, strictfp, and abstract can modify methods, the method access flags add ACC_SYNCHRONIZED, ACC_NATIVE, ACC_STRICT (for strictfp), and ACC_ABSTRACT.

The code in a method body, once compiled into bytecode instructions, is stored in an attribute named "Code" in the method's attribute table.
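Method flags can be inspected the same way through `Method.getModifiers()`. In this sketch the classes `Sample` and `Shape` are made up; the native method is only declared, never called, so no native library is needed.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Method access flags seen through reflection. ACC_SYNCHRONIZED (0x0020) and
// ACC_NATIVE (0x0100) can appear on methods but not on fields.
class MethodFlagsDemo {
    static class Sample {
        public synchronized void update() { }
        protected static native long nativeClock(); // declared only, never called
    }

    abstract static class Shape {
        abstract double area();
    }

    // Returns the modifiers of the first declared method with the given name.
    static int flagsOf(Class<?> owner, String methodName) {
        for (Method m : owner.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                return m.getModifiers();
            }
        }
        throw new IllegalArgumentException("no method " + methodName);
    }

    static void print(Class<?> owner, String name) {
        int flags = flagsOf(owner, name);
        System.out.printf("0x%04X %s%n", flags, Modifier.toString(flags));
    }

    public static void main(String[] args) {
        print(Sample.class, "update");      // 0x0021 public synchronized
        print(Sample.class, "nativeClock"); // 0x010C protected static native
        print(Shape.class, "area");         // 0x0400 abstract
    }
}
```

Of these three methods, only `update()` gets a Code attribute: abstract and native methods have no body, so their method_info carries no bytecode.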

2.7 Attribute table collection

Class files, field tables, and method tables can each carry their own collection of attribute tables (attribute_info) to describe information specific to certain scenarios.

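The attribute table collection is less strict than the other data items in the class file: the order of the attributes is not enforced, and as long as a name does not clash with an existing attribute name, any compiler implementation may write its own custom attributes into the attribute table. At run time, the Java virtual machine ignores attributes it does not recognize.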
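Ignoring unknown attributes is possible because every attribute_info begins with a u2 attribute_name_index and a u4 attribute_length, so a reader can always jump over the body. In this sketch, `AttributeSkipper` is an illustrative name, and, as a simplifying assumption, attribute names come from a map of constant pool index to name rather than from a real constant pool.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: read an attribute collection, keeping known attributes and
// skipping unknown ones by their declared length.
class AttributeSkipper {
    static final Set<String> KNOWN = Set.of("Code", "ConstantValue", "SourceFile");

    static List<String> readAttributes(ByteBuffer buf, Map<Integer, String> poolNames) {
        int count = Short.toUnsignedInt(buf.getShort());          // attributes_count
        List<String> recognised = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String name = poolNames.get(Short.toUnsignedInt(buf.getShort())); // attribute_name_index
            long length = Integer.toUnsignedLong(buf.getInt());   // attribute_length
            if (KNOWN.contains(name)) {
                recognised.add(name); // a real reader would parse the body here
            }
            // Known or not, move past the body using attribute_length.
            buf.position(buf.position() + (int) length);
        }
        return recognised;
    }

    public static void main(String[] args) {
        byte[] data = {0, 2,  0, 5, 0, 0, 0, 3, 1, 2, 3,  0, 6, 0, 0, 0, 2, 0, 7};
        Map<Integer, String> names = Map.of(5, "MyVendorAttr", 6, "SourceFile");
        System.out.println(readAttributes(ByteBuffer.wrap(data), names)); // [SourceFile]
    }
}
```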

Closing words

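To keep this article to a manageable length, I have left out many details, such as the constant pool item types and the specific contents of the method and attribute tables. If you want to dig deeper, try compiling a Java class into a binary class file yourself and comparing it, byte by byte, against the class file structure described here; doing so will deepen your understanding.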
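For the byte-by-byte comparison suggested above, a small hex dump helps; the JDK's `javap -v` tool also prints a decoded view of the constant pool and other structures. `HexDump` below is a hypothetical helper, not a JDK tool. The first line of any class file's dump should begin with `ca fe ba be`, followed by the minor and major versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: print a class file as hex, 16 bytes per line,
// each line prefixed with its offset, for comparison with the structure table.
class HexDump {
    static String dump(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i % 16 == 0) {
                if (i > 0) {
                    sb.append('\n');
                }
                sb.append(String.format("%08x:", i)); // offset of this line
            }
            sb.append(String.format(" %02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: javac Foo.java && java HexDump Foo.class
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(dump(bytes));
    }
}
```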

That's all for the class file structure. In the next article, we'll look at the virtual machine's class loading mechanism.

References:

  • In-Depth Understanding of the Java Virtual Machine: Advanced JVM Features and Best Practices (2nd Edition)

If you like my articles, follow my WeChat official account BaronTalk or my Zhihu column, or give me a Star on GitHub!


Source: juejin.im/post/5d062bba6fb9a07ee742ddcc