Class file structure

This article describes the class file structure in terms of:

 

(1) The structure of the class file

1. Magic number and class file version

2. Constant pool

3. Access flags

4. Class index, parent class index, interface index collection

5. Field table collection

6. Method table collection

7. Attribute table collection

 

(1) The structure of the class file

 

1. Overview

      The class file is a binary stream whose basic unit is the 8-bit byte. Each data item is arranged in strict order with no separators in between. When a data item needs more than 8 bits, it is split into several bytes and stored high-order byte first (big-endian).

 

2. Data types in the class file

 According to the Java virtual machine specification, the Class file format stores data in a pseudo-structure similar to a C language struct.

 This pseudo-structure has only two data types:

 (1) Unsigned numbers

 (2) Tables

 

2.1. Unsigned numbers

      Unsigned numbers are the basic data type. u1, u2, u4, and u8 represent unsigned numbers of 1 byte, 2 bytes, 4 bytes, and 8 bytes respectively.

      Unsigned numbers are used to describe numbers, index references, quantity values, or string values encoded in UTF-8.
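As a minimal sketch of how such values could be decoded, the following Java helper reads u1, u2, and u4 from a byte array, high-order byte first. The class name UnsignedReader and its method names are made up for this illustration and are not part of any JDK API.

```java
final class UnsignedReader {
    // u1: one byte, masked so that the result is never negative.
    static int u1(byte[] b, int off) {
        return b[off] & 0xFF;
    }

    // u2: two bytes, high-order byte first.
    static int u2(byte[] b, int off) {
        return (u1(b, off) << 8) | u1(b, off + 1);
    }

    // u4: four bytes, high-order byte first; widened to long so the value stays unsigned.
    static long u4(byte[] b, int off) {
        return ((long) u2(b, off) << 16) | u2(b, off + 2);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, 0x01, 0x02, (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println(u1(bytes, 0));                    // 255
        System.out.println(u2(bytes, 1));                    // 258 (0x0102)
        System.out.println(Long.toHexString(u4(bytes, 3)));  // cafebabe
    }
}
```

The JDK class java.io.DataInputStream reads multi-byte values in the same high-order-first order, so it is a common alternative to hand-written shifts.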

 

2.2. Tables

      A table is a composite data type made up of multiple unsigned numbers or other tables as data items. By convention, all table names end with "_info".

 

2.3. Graphical class file format


 

 1. Magic number and class file version

 

1. Overview

      The first 4 bytes of every class file are the magic number, whose value is 0xCAFEBABE.

2. Function

            Used to determine whether the file is a Class file that can be loaded by the virtual machine.

3. Minor version, major version

      The u2 immediately after the magic number is the minor version (minor_version), and the u2 after that is the major version (major_version).
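The first 8 bytes described above could be checked with a sketch like the following. ClassHeader is a hypothetical helper written for this article, not a JDK class; it only validates the magic number and extracts the two version fields.

```java
final class ClassHeader {
    final int minorVersion;
    final int majorVersion;

    private ClassHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Parses magic (u4), minor_version (u2), and major_version (u2) from the start of a class file.
    static ClassHeader parse(byte[] b) {
        if (b.length < 8) {
            throw new IllegalArgumentException("need at least 8 bytes");
        }
        long magic = ((b[0] & 0xFFL) << 24) | ((b[1] & 0xFFL) << 16)
                   | ((b[2] & 0xFFL) << 8) | (b[3] & 0xFFL);
        if (magic != 0xCAFEBABEL) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = ((b[4] & 0xFF) << 8) | (b[5] & 0xFF);
        int major = ((b[6] & 0xFF) << 8) | (b[7] & 0xFF);
        return new ClassHeader(minor, major);
    }

    public static void main(String[] args) {
        // Header bytes of a class compiled for Java 8 (major version 52 = 0x34).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        ClassHeader h = parse(header);
        System.out.println(h.majorVersion + "." + h.minorVersion); // 52.0
    }
}
```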



 

4. The version number of the class file



 

2. Constant pool

 

 1. Overview

      The constant pool follows the minor and major version numbers. It is the data item most closely tied to the other items in the class file, it is usually one of the largest data items in the file, and it is the first table-structured data item to appear.

 

2. Constant pool capacity count value (constant_pool_count)

      The constant pool starts with a u2 value, constant_pool_count, that gives the capacity of the pool. Counting starts from 1 rather than 0. Index 0 is reserved for a special purpose: data that points into the constant pool can use index 0 to express "does not refer to any constant pool item" where that is needed. Valid constant indices therefore run from 1 to constant_pool_count - 1.

 

2.1. Note

      Only the constant pool in the class file is counted from 1. Other collections, such as the interface index collection, the field table collection, and the method table collection, are counted from 0.

 

3. Constant types stored in the constant pool

(1) Literals

(2) Symbolic references

 

3.1. Literals

      Literals are similar to constants in Java, such as text strings and constant values declared as final.

 

3.2. Symbolic references

      Symbolic references are a concept from compiler theory. They include the following three types of constants:

      (1) Fully qualified names of classes and interfaces

      (2) Field names and descriptors

      (3) Method names and descriptors

 

3.2.1. Fully qualified name, simple name, descriptor

(1) Fully qualified name: a string consisting of the package name and the class name, with "/" in place of "." inside the class file, such as java/lang/String

(2) Simple name: the name of a method or field without its type or parameter information

 

(3) Descriptor:

      ① Used to describe the data type of a field, or the parameter list of a method (parameter types, count, and order) together with its return value.

      ② Descriptor rules: the 8 primitive data types and void (no return value) are each represented by one capital letter: B (byte), C (char), D (double), F (float), I (int), J (long), S (short), Z (boolean), V (void). An object type is represented by the capital letter L followed by the fully qualified name of the type and a semicolon, such as Ljava/lang/String;

      ③ Array representation: each dimension adds a leading "[". The one-dimensional array int[] is represented as [I, and the two-dimensional array java.lang.String[][] is represented as [[Ljava/lang/String;

      ④ A method descriptor lists the parameters first, in order and inside parentheses, followed by the return value. For example, the method void fun() is expressed as ()V, and java.lang.String toString() is expressed as ()Ljava/lang/String;
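The descriptor rules above can be sketched as a small Java helper that derives a descriptor from a Class object. The Descriptors class and its method names are assumptions made for this illustration.

```java
final class Descriptors {
    // Builds the descriptor of a single type according to the rules above.
    static String of(Class<?> type) {
        if (type.isArray()) return "[" + of(type.getComponentType());
        if (type == void.class) return "V";
        if (type == boolean.class) return "Z";
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == short.class) return "S";
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == float.class) return "F";
        if (type == double.class) return "D";
        // Object type: L + fully qualified name with '/' separators + ';'
        return "L" + type.getName().replace('.', '/') + ";";
    }

    // Builds a method descriptor: parameters in order inside parentheses, then the return type.
    static String ofMethod(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) {
            sb.append(of(p));
        }
        return sb.append(')').append(of(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(of(int[].class));                           // [I
        System.out.println(of(String[][].class));                      // [[Ljava/lang/String;
        System.out.println(ofMethod(void.class));                      // ()V
        System.out.println(ofMethod(String.class));                    // ()Ljava/lang/String;
        System.out.println(ofMethod(int.class, char[].class, int.class)); // ([CI)I
    }
}
```

For method descriptors, the JDK's java.lang.invoke.MethodType.toMethodDescriptorString() produces the same kind of string; the hand-written version is shown only because it makes each rule visible.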

   

3.2.2. Descriptor representation meaning diagram



 

 4. Item types in the constant pool

      Each constant in the constant pool is a table. There are 11 table structures, each with its own layout (the class file versions described here define 11; newer versions add more). What all of them have in common is that the table starts with a u1 flag (tag, with values from 1 to 12; no type uses the value 2). The tag indicates which constant type the current constant belongs to. The meaning of the 11 constant types is shown in the following figure:


 

 4.1. Summary table of the 11 constant type structures
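Because every constant starts with its tag, a reader can walk the pool without understanding each entry in full. The following is a rough sketch, assuming the input array starts at constant_pool_count and contains only the 11 constant types discussed here. The payload sizes follow the Java virtual machine specification; the class ConstantPoolTags itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ConstantPoolTags {
    // Returns the tag of each constant, given bytes that begin at constant_pool_count.
    static List<Integer> tags(byte[] b) {
        int count = u2(b, 0);
        int pos = 2;
        List<Integer> tags = new ArrayList<>();
        for (int i = 1; i < count; i++) {       // indices run from 1 to count - 1
            int tag = b[pos] & 0xFF;
            tags.add(tag);
            pos += 1 + payloadLength(tag, b, pos + 1);
            if (tag == 5 || tag == 6) {
                i++;                            // Long and Double occupy two indices
            }
        }
        return tags;
    }

    // Number of bytes that follow the tag for each constant type.
    private static int payloadLength(int tag, byte[] b, int pos) {
        switch (tag) {
            case 1:  return 2 + u2(b, pos);     // Utf8: u2 length + that many bytes
            case 3:                             // Integer
            case 4:  return 4;                  // Float
            case 5:                             // Long
            case 6:  return 8;                  // Double
            case 7:                             // Class
            case 8:  return 2;                  // String
            case 9:                             // Fieldref
            case 10:                            // Methodref
            case 11:                            // InterfaceMethodref
            case 12: return 4;                  // NameAndType
            default: throw new IllegalArgumentException("unsupported tag " + tag);
        }
    }

    private static int u2(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] pool = {
            0, 4,                    // constant_pool_count = 4, so indices 1..3 exist
            1, 0, 2, 0x48, 0x69,     // #1 Utf8 "Hi"
            7, 0, 1,                 // #2 Class -> #1
            3, 0, 0, 0, 0x2A         // #3 Integer 42
        };
        System.out.println(tags(pool)); // [1, 7, 3]
    }
}
```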

 

 

 

 3. Access flags (access_flags)

       

1. Overview:

      The u2 value immediately following the constant pool holds the access flags (access_flags).

       

2. Function:

      Identifies access information at the class or interface level: whether this Class is a class or an interface, whether it is declared public, whether it is declared abstract, and, if it is a class, whether it is declared final.

 

3. Table of access flag meanings



 
4. Description

      access_flags is a u2, so it has 16 flag bits available. Only 8 of them are defined in the versions described here, and all undefined flag bits must be 0.

      For example, if only ACC_PUBLIC (0x0001) and ACC_SUPER (0x0020) are set, then access_flags = 0x0001 | 0x0020 = 0x0021, which is the hexadecimal value that appears in the class file. All other flag bits are 0.
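The bitwise reading of access_flags can be sketched as follows. The mask values are the class-level flags from the Java virtual machine specification (the 8 flags referred to above); the ClassAccessFlags helper itself is a made-up name for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    // Lists the names of all flags whose bit is set in accessFlags.
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0001 | 0x0020)); // [ACC_PUBLIC, ACC_SUPER]
    }
}
```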

   

4. Class index, parent class index, interface index collection

 

1. Overview

      The class index (this_class) and the parent class index (super_class) are each a single u2 value, and the interface index collection (interfaces) is a group of u2 values. Together these three items determine the inheritance relationships of the class. They follow the access flags (access_flags) in that order.

 

1.1. Class index (this_class)

      Used to determine the fully qualified name of this class.

1.2. Parent class index (super_class)

      Used to determine the fully qualified name of the parent class of this class. Because Java does not support multiple inheritance of classes, there is only one parent class index. Every class except java.lang.Object has a parent class, so the parent class index is 0 only for java.lang.Object.

1.3. Interface index collection (interfaces)

      Describes which interfaces the class implements. The implemented interfaces are arranged in the collection from left to right in the order they appear after the implements keyword (or after the extends keyword if the class file itself represents an interface).

 

2. Description

 

(1) The class index (this_class) and the parent class index (super_class) are each a u2 index value pointing to a class descriptor constant of type CONSTANT_Class_info. The index stored in that CONSTANT_Class_info constant in turn leads to a CONSTANT_Utf8_info constant that holds the fully qualified name string.

 

(2) The first item of the interface index collection is a u2 interface counter (interfaces_count), which gives the number of entries in the index table.

 

5. Field table collection (fields)

 

1. Overview

      The field table (field_info) is used to describe the variables declared in a class or interface.

      Fields include class variables and instance variables, but not local variables declared inside methods.

     

1.1. Information used to describe a field:

      ① The scope of the field (public, private, protected);

      ② Whether it is a class variable or an instance variable (static);

      ③ Whether it is mutable (final);

      ④ Concurrency visibility (volatile: whether reads and writes are forced to go through main memory);

      ⑤ Whether it can be serialized (transient);

      ⑥ The data type of the field (primitive type, array, object).

 

1.2. Each modifier is a boolean value (either present or absent), so modifiers are well suited to flag bits. The name and the data type of a field cannot be fixed in advance, so they can only be described by referencing constants in the constant pool.

 

2. Field table structure

 

 

Field access flags (access_flags)

 

 3. Description

 (1) Following access_flags are two index values: name_index and descriptor_index. Both are references into the constant pool, representing the simple name of the field and the descriptor of the field, respectively (descriptors are explained in the constant pool section above). The attribute_info data that follows descriptor_index is introduced in the attribute table section.

 (2) Fields inherited from a parent class are not listed in the field table collection. The collection may, however, contain fields that do not exist in the source code. For example, to keep access to its outer class, an inner class automatically gets an instance field that points to the outer class instance.

(3) In Java, fields cannot be overloaded: two fields in one class cannot share a name, whatever their types and modifiers. For bytecode, however, two fields with the same name are legal as long as their descriptors (the field data types) differ.
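As an illustration of the fixed part of a field table entry, the sketch below reads the four u2 items that open every field_info (access flags, name index, descriptor index, attribute count) and renders the flags with java.lang.reflect.Modifier, whose constants use the same bit values as the field flags for public, private, protected, static, final, volatile, and transient. FieldInfoHeader is a hypothetical class for this example, and flags such as ACC_SYNTHETIC or ACC_ENUM are not rendered by Modifier.toString.

```java
import java.lang.reflect.Modifier;

final class FieldInfoHeader {
    final int accessFlags;
    final int nameIndex;        // constant pool index of the simple name
    final int descriptorIndex;  // constant pool index of the field descriptor
    final int attributesCount;  // number of attribute_info entries that follow

    private FieldInfoHeader(int accessFlags, int nameIndex, int descriptorIndex, int attributesCount) {
        this.accessFlags = accessFlags;
        this.nameIndex = nameIndex;
        this.descriptorIndex = descriptorIndex;
        this.attributesCount = attributesCount;
    }

    // Reads the four leading u2 values of a field_info starting at offset off.
    static FieldInfoHeader parse(byte[] b, int off) {
        return new FieldInfoHeader(u2(b, off), u2(b, off + 2), u2(b, off + 4), u2(b, off + 6));
    }

    // Renders the modifier bits as Java keywords.
    String modifiers() {
        return Modifier.toString(accessFlags);
    }

    private static int u2(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    public static void main(String[] args) {
        // 0x0019 = ACC_PUBLIC | ACC_STATIC | ACC_FINAL; name at #5, descriptor at #6, one attribute
        byte[] field = {0x00, 0x19, 0x00, 0x05, 0x00, 0x06, 0x00, 0x01};
        FieldInfoHeader h = parse(field, 0);
        System.out.println(h.modifiers());  // public static final
    }
}
```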

 

 6. Method table collection (method_info)

 

1. Method table structure



 Method access flags (access_flags)


 

2. Signature

      In Java source code, the signature of a method includes only the method name and the order and types of its parameters. The signature in the bytecode file additionally includes the return value of the method and the checked exception table.

 

3. Description

(1) The volatile and transient keywords cannot modify methods, so the method access flags have no flags for them.

(2) The code in a Java method body is stored in an attribute named "Code" in the method's attribute table collection. The attribute table is introduced later.

(3) If a child class does not override a method of its parent class, the method information from the parent class does not appear in the child's method table collection: the method table lists only the methods actually compiled into this class file. Conversely, the compiler may add methods automatically, typically the class constructor "<clinit>" and the instance constructor "<init>".

(4) To overload a method in Java, the new method must have the same simple name as the original method and a different signature. The return value is not part of the signature, so a method cannot be overloaded by return value alone.

(5) In a class file, however, two methods with the same name and the same signature but different return values can legally coexist.
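One place where this coexistence shows up in practice is covariant return types. The sketch below assumes compilation with javac, which emits a synthetic bridge method for the overridden return type, so the compiled Derived class contains two methods named value with identical parameters and different return types. The class names are invented for the demonstration.

```java
import java.lang.reflect.Method;

final class ReturnTypeDemo {
    static class Base {
        Object value() { return "base"; }
    }

    static class Derived extends Base {
        @Override
        String value() { return "derived"; }  // covariant return type
    }

    // Counts the methods with the given name that are declared in the class file of type.
    static int countDeclared(Class<?> type, String name) {
        int n = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        for (Method m : Derived.class.getDeclaredMethods()) {
            // Prints the return type and whether the method is a compiler-generated bridge.
            System.out.println(m.getReturnType().getSimpleName() + " " + m.getName()
                    + "() bridge=" + m.isBridge());
        }
    }
}
```

With javac, countDeclared(Derived.class, "value") is expected to report 2 even though the source declares only one such method in Derived.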

 

 7. Attribute table collection

      Attribute table (attribute_info): the class file itself, the field table, and the method table can each carry their own attribute table collection, which describes information specific to certain scenarios.

 

Attributes predefined by the virtual machine specification:



 

 The name of each attribute above must be given by referencing a constant of type CONSTANT_Utf8_info from the constant pool, while the structure of the attribute value is entirely defined by the attribute itself.

 

7.1 Code attribute

      After javac compiles the code in a Java method body, the resulting bytecode instructions are stored in the Code attribute. The Code attribute appears in the attribute collection of the method table, but not every method has one: abstract methods, whether in interfaces or in abstract classes, have no Code attribute.

 

1. Code attribute

 

The structure of the Code attribute:

 

 (1) attribute_name_index is an index pointing to a constant of type CONSTANT_Utf8_info whose value is fixed as "Code";

      attribute_length gives the length of the attribute value. The attribute name index (u2) and the attribute length (u4) together occupy 6 bytes, so the length of the attribute value is the length of the whole attribute minus 6 bytes;

 

(2) max_stack: the maximum depth of the operand stack. At no point during method execution does the operand stack exceed this depth. At run time the virtual machine uses this value to allocate the operand stack in the stack frame.

 

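(3) max_locals: the storage space required by the local variable table, measured in slots.

      A slot is the smallest unit the virtual machine uses when allocating memory for local variables. Data types of 32 bits or less (byte, char, short, int, float, boolean, and references) occupy 1 slot, while the 64-bit types long and double occupy 2 slots.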
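As a rough illustration of slot counting, the sketch below computes how many slots the parameters of a method occupy. It relies on two facts from the Java virtual machine specification: long and double values take 2 slots each, and an instance method receives "this" in slot 0. max_locals is at least this number, plus whatever the local variables in the method body need (a compiler may reuse slots). The LocalSlots helper is hypothetical.

```java
final class LocalSlots {
    // Slots taken by one value of the given type: 2 for long and double, 1 otherwise.
    static int slotsOf(Class<?> type) {
        return (type == long.class || type == double.class) ? 2 : 1;
    }

    // Slots occupied by the receiver (instance methods only) plus all parameters.
    static int parameterSlots(boolean isStatic, Class<?>... params) {
        int slots = isStatic ? 0 : 1;  // slot 0 holds "this" for instance methods
        for (Class<?> p : params) {
            slots += slotsOf(p);
        }
        return slots;
    }

    public static void main(String[] args) {
        // static void f(int a, long b): 1 + 2 = 3 slots
        System.out.println(parameterSlots(true, int.class, long.class));       // 3
        // void g(double d, String s): this + 2 + 1 = 4 slots
        System.out.println(parameterSlots(false, double.class, String.class)); // 4
    }
}
```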

 

(4) code_length, code: used to store the bytecode instructions generated by compiling the Java source program.

      code_length is a u4 value, so in theory it can reach 2^32 - 1. However, the virtual machine specification limits the bytecode of a single method to 65535 bytes. If the limit is exceeded, the javac compiler refuses to compile the method.

      The Code attribute is an important attribute in the class file. If the information in a Java program is divided into two parts, code (the Java code in method bodies) and metadata (classes, fields, method definitions, and other information), then the Code attribute describes the code and all other data items describe the metadata.

 

(5) exception_table: exception table information

      Each entry in the exception table has four fields: start_pc, end_pc, handler_pc, and catch_type. If an exception of type catch_type or one of its subtypes occurs between bytecode offset start_pc (inclusive) and end_pc (exclusive), execution jumps to offset handler_pc to continue processing.

      If the value of catch_type is 0, any exception in that range is routed to handler_pc.
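The lookup rule for the exception table can be modeled in a few lines. This is a simplified sketch: in a real class file catch_type is a constant pool index to a CONSTANT_Class_info entry, while here it is modeled directly as a Class object, with null standing in for catch_type == 0. The names ExceptionTableDemo and Entry are made up for this example.

```java
final class ExceptionTableDemo {
    static final class Entry {
        final int startPc;    // inclusive
        final int endPc;      // exclusive
        final int handlerPc;
        final Class<? extends Throwable> catchType;  // null models catch_type == 0 (catch anything)

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns handler_pc of the first entry covering pc whose type matches, or -1 if none does.
    static int findHandler(Entry[] table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean inRange = pc >= e.startPc && pc < e.endPc;
            boolean typeMatches = e.catchType == null || e.catchType.isInstance(thrown);
            if (inRange && typeMatches) {
                return e.handlerPc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Entry[] table = {
            new Entry(0, 10, 20, java.io.IOException.class),
            new Entry(0, 10, 30, null)
        };
        System.out.println(findHandler(table, 5, new java.io.FileNotFoundException())); // 20
        System.out.println(findHandler(table, 5, new RuntimeException()));              // 30
        System.out.println(findHandler(table, 10, new RuntimeException()));             // -1
    }
}
```

Entries are searched in table order, which is why the more specific handler is listed before the catch-all entry in the example.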

 

 2. Exceptions attribute

      The Exceptions attribute sits in the method table at the same level as the Code attribute. It lists the checked exceptions (compile-time exceptions) that the method may throw, that is, the exceptions listed after the throws keyword in the method declaration.

 

3. LineNumberTable attribute

      This attribute describes the correspondence between Java source code line numbers and bytecode line numbers (bytecode offsets). It is not required at run time.

 

4. LocalVariableTable attribute

      This attribute describes the relationship between the variables in the local variable table of the stack frame and the variables defined in the Java source code. It is not required at run time.

 

5. SourceFile attribute

      Records the name of the source file from which the class file was generated. This attribute is optional.

 

6. ConstantValue attribute

      This attribute tells the virtual machine to assign a value to a static variable automatically.

      Instance variables are assigned in the instance constructor <init> method.

      Class variables can be assigned in two ways:

              ① In the class constructor <clinit> method.

              ② Through the ConstantValue attribute.

      If a variable is modified by both final and static (that is, it is a constant) and its type is a primitive type or String, a ConstantValue attribute is generated to initialize it. If the variable is not final, or its type is neither a primitive type nor String, it is initialized in the <clinit> method.
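The cases above can be made concrete with a small source example. The comments describe where a javac-style compiler would be expected to place each initialization according to the rules in this section; they are not verified by the program itself, and the class name ConstantValueDemo is invented for the example.

```java
final class ConstantValueDemo {
    // static + final, primitive type, constant initializer: candidate for a ConstantValue attribute
    static final int MAX = 100;

    // static + final, String type, constant initializer: candidate for a ConstantValue attribute
    static final String NAME = "demo";

    // static but not final: assigned in the class constructor <clinit>
    static int counter = 5;

    // static + final, but neither primitive nor String: assigned in <clinit>
    static final Object LOCK = new Object();

    // instance variable: assigned in the instance constructor <init>
    int instanceField = 7;

    public static void main(String[] args) {
        System.out.println(MAX + " " + NAME + " " + counter + " " + new ConstantValueDemo().instanceField);
        // prints: 100 demo 5 7
    }
}
```

Running javap -v on the compiled class is one way to check which fields actually carry a ConstantValue attribute.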

 

7. InnerClasses attribute

      Describes the association between inner classes and their outer classes.

 

 

The above content draws on the book "In-depth Understanding of Virtual Machines" by Zhou Zhiming.

