A simple understanding of Class File Format

Overview

A class file is a binary stream built from 8-bit bytes. It can be opened with a hex editor such as Hex Fiend.
Class structure:

  • Unsigned number: the basic data type. u1, u2, u4, and u8 represent unsigned numbers of one, two, four, and eight bytes respectively.
  • Table: a composite data type made up of unsigned numbers or other tables as data items. By convention, table names end with _info. The entire class file is essentially one table.
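The unsigned types can be read with java.nio.ByteBuffer, which is big-endian by default, the same byte order the class file uses. The following is only a minimal sketch; the class name UnsignedReads and its helper methods are illustrative names, not part of any standard API.

```java
import java.nio.ByteBuffer;

final class UnsignedReads {
    // u1: one byte, masked so the result is 0..255 instead of -128..127
    static int u1(ByteBuffer buf) {
        return buf.get() & 0xFF;
    }

    // u2: two bytes, big-endian, masked to 0..65535
    static int u2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;
    }

    // u4: four bytes, big-endian, widened to long so the value stays non-negative
    static long u4(ByteBuffer buf) {
        return buf.getInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // First eight bytes of a JDK 8 class file: magic, minor version, major version
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        ByteBuffer buf = ByteBuffer.wrap(header);
        System.out.printf("u4 = 0x%X%n", u4(buf));
        System.out.println("u2 = " + u2(buf));
        System.out.println("u2 = " + u2(buf));
    }
}
```

The masking matters because Java has no unsigned byte or short type; without it, a byte such as 0xCA would be read as a negative number.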

 

 

Detailed class structure

Magic number

The first 4 bytes of every class file are the magic number. Its only purpose is to identify the file as a class file that the virtual machine can accept. Many file formats are identified by a magic number rather than by the file extension, because an extension can be changed at will. For class files, the magic number is 0xCAFEBABE.
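As a rough illustration of the check described above, the sketch below compares the first four bytes of a byte array against 0xCAFEBABE. MagicCheck and looksLikeClassFile are hypothetical names chosen for this example, and the virtual machine of course verifies much more than this.

```java
import java.nio.ByteBuffer;

final class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;

    // True when the data starts with the four bytes CA FE BA BE.
    static boolean looksLikeClassFile(byte[] data) {
        return data.length >= 4 && ByteBuffer.wrap(data).getInt() == MAGIC;
    }

    public static void main(String[] args) {
        byte[] classHeader = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04}; // a zip archive renamed to .class would start like this
        System.out.println(looksLikeClassFile(classHeader));
        System.out.println(looksLikeClassFile(zipHeader));
    }
}
```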

 

Version number

The 4 bytes after the magic number store the version of the class file: the first two bytes are the minor version and the last two are the major version. Java major versions start at 45. For example, JDK 1.1 supports class files with versions 45.0 to 45.65535, and JDK 1.2 supports 45.0 to 46.65535. The class file in this example was compiled with JDK 8, so its major version is 52 (0x0034).
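The version fields can be pulled out of the 8-byte header as sketched here. ClassVersion, read, and javaRelease are illustrative names. The major - 44 mapping is a rule of thumb that follows from major versions starting at 45 and growing by one per release, with 1.2 counted as release 2.

```java
import java.nio.ByteBuffer;

final class ClassVersion {
    // Returns {major, minor} from a header laid out as magic (u4), minor_version (u2), major_version (u2).
    static int[] read(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header);
        buf.getInt();                          // skip the magic number
        int minor = buf.getShort() & 0xFFFF;   // the minor version comes first
        int major = buf.getShort() & 0xFFFF;
        return new int[] {major, minor};
    }

    // Rule of thumb: major version 45 is release 1, so 52 is Java 8.
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        int[] version = read(header);
        System.out.println("major=" + version[0] + " minor=" + version[1]
                + " -> Java " + javaRelease(version[0]));
    }
}
```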

Constant pool

The constant pool can be thought of as the resource warehouse of the class file. It is the data structure that the other items in the class file refer to most often.

Because the number of constants varies from class to class, the constant pool is preceded by a u2 value that holds the constant pool capacity count. This count starts from 1, not 0: if the value is 16, there are 15 constants, with indexes 1 to 15. Index 0 is deliberately left unused so that an item which needs to say "refers to no constant" can use 0. The capacity count in the figure is 16, so this class has 15 constants in total.

 

The constant pool mainly stores two kinds of constants: literals and symbolic references.

  • Literals: close to the Java-language notion of a constant, such as text strings and values declared final.
  • Symbolic references:
    • Fully qualified names of classes and interfaces
    • Field name and descriptor
    • Method name and descriptor

Each constant in the constant pool is itself a table, and the tables have different layouts: some have three items, some have two. To tell them apart, the first byte of every table is a tag of type u1 that identifies which constant type the table represents. For example, a tag of 1 means a constant of type utf8, and a tag of 10 means a symbolic reference to a method.
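The tag-driven layout can be sketched as a small loop that reads the capacity count and then switches on each tag. This is only a partial sketch: ConstantPoolWalker is a made-up name, only tags 1 (Utf8), 7 (Class), 10 (Methodref), and 12 (NameAndType) are handled, and the Utf8 bytes are decoded as standard UTF-8, which matches the class file's modified UTF-8 only for ordinary text such as ASCII names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ConstantPoolWalker {
    // Reads constant_pool_count and the entries that follow; returns one label per index.
    // Slot 0 stays null because valid indexes run from 1 to count - 1.
    static String[] walk(ByteBuffer buf) {
        int count = buf.getShort() & 0xFFFF;
        String[] entries = new String[count];
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;            // the u1 tag selects the table layout
            switch (tag) {
                case 1: {                          // Utf8: u2 length, then that many bytes
                    byte[] raw = new byte[buf.getShort() & 0xFFFF];
                    buf.get(raw);
                    entries[i] = "Utf8 " + new String(raw, StandardCharsets.UTF_8);
                    break;
                }
                case 7:                            // Class: u2 name_index
                    entries[i] = "Class #" + (buf.getShort() & 0xFFFF);
                    break;
                case 10:                           // Methodref: u2 class_index, u2 name_and_type_index
                    entries[i] = "Methodref #" + (buf.getShort() & 0xFFFF)
                            + ".#" + (buf.getShort() & 0xFFFF);
                    break;
                case 12:                           // NameAndType: u2 name_index, u2 descriptor_index
                    entries[i] = "NameAndType #" + (buf.getShort() & 0xFFFF)
                            + ":#" + (buf.getShort() & 0xFFFF);
                    break;
                default:
                    throw new IllegalArgumentException("tag not handled in this sketch: " + tag);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // A hand-made pool with count = 3: one Class entry and the Utf8 entry it points to
        byte[] pool = {0x00, 0x03, 0x07, 0x00, 0x02, 0x01, 0x00, 0x01, 0x41};
        String[] entries = walk(ByteBuffer.wrap(pool));
        for (int i = 1; i < entries.length; i++) {
            System.out.println("#" + i + " = " + entries[i]);
        }
    }
}
```

A complete reader would also need the remaining tags, and would have to account for long and double constants, which each occupy two index slots.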

 

 

Take the first constant as an example. Its tag byte is 0a, which is 10, and tag 10 corresponds to the constant pool item type CONSTANT_Methodref_info. This type has three items: the first is the tag, which is 10, and the second and third are both of type u2. As shown in the figure, they are 00 03 (3) and 00 0d (13), which are the index of the class that declares the method and the index of the method's name and type.

You can confirm this with javap -verbose. The first constant is printed like this, which matches the values read from the hex dump.

  #1 = Methodref   #3.#13

 

Access flags

After the constant pool comes access_flags. This item records access information at the class or interface level, such as whether the type is a class or an interface, whether it is public, whether it is abstract, and so on. The flags can also be seen in the javap output.

 

flags: (0x0021) ACC_PUBLIC, ACC_SUPER
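Each flag is a single bit in the u2 value, so 0x0021 is 0x0001 (ACC_PUBLIC) combined with 0x0020 (ACC_SUPER). The sketch below decodes a class-level access_flags value using the mask values defined for classes in the JVM specification; ClassAccessFlags and decode are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000, 0x8000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE", "ACC_ABSTRACT",
        "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM", "ACC_MODULE"
    };

    // Returns the names of all class-level flags whose bit is set in the given value.
    static List<String> decode(int flags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // prints [ACC_PUBLIC, ACC_SUPER]
    }
}
```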

 

Class index, superclass index, and interface index collection

The class index and the superclass index are each a single u2 value, and the interface index collection is a set of u2 values. Together, these three items determine the inheritance relationships of the class.

The class index and the superclass index each point to a class descriptor constant of type CONSTANT_Class_info. The index stored in that constant in turn leads to a constant of type CONSTANT_Utf8_info, which holds the fully qualified name as a string.

For example, in the figure the this_class index is 2. Constant #2 is a class constant whose name index points to #14, and constant #14 holds the class name.
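The two-step lookup can be sketched with a simplified in-memory pool. Everything here is an assumption made for illustration: ClassNameLookup is a made-up name, a Class entry is modeled as the Integer name index it stores, a Utf8 entry as its String text, and com/example/Demo is a placeholder class name rather than the one in the original example.

```java
import java.util.HashMap;
import java.util.Map;

final class ClassNameLookup {
    // Follows this_class -> CONSTANT_Class_info -> CONSTANT_Utf8_info and returns the name.
    static String className(Map<Integer, Object> pool, int classIndex) {
        Object classEntry = pool.get(classIndex);
        if (!(classEntry instanceof Integer)) {
            throw new IllegalArgumentException("#" + classIndex + " is not a Class entry");
        }
        int nameIndex = (Integer) classEntry;
        Object utf8Entry = pool.get(nameIndex);
        if (!(utf8Entry instanceof String)) {
            throw new IllegalArgumentException("#" + nameIndex + " is not a Utf8 entry");
        }
        return (String) utf8Entry;
    }

    public static void main(String[] args) {
        Map<Integer, Object> pool = new HashMap<>();
        pool.put(2, 14);                   // #2 = Class, name index 14
        pool.put(14, "com/example/Demo");  // #14 = Utf8 (placeholder name)
        System.out.println(className(pool, 2));
    }
}
```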

 

 

Field table collection

field_info describes the variables declared in an interface or class. Fields include class-level (static) variables and instance-level variables, but not local variables declared inside methods.

The field table uses access_flags to record the field's scope, whether it is final, whether it is static, and so on. Two indexes, name_index and descriptor_index, then give the name and the type of the variable. Types are written as descriptor characters, for example I for int. The last two items, attributes_count and attributes, store additional information. For example, if a field is given a constant initial value, that value is recorded here.
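The descriptor characters follow a small fixed mapping, which the following sketch turns back into Java type names. FieldDescriptors and toJavaType are illustrative names, and the method performs only minimal validation of its input.

```java
final class FieldDescriptors {
    // Converts a field descriptor such as "I" or "[Ljava/lang/String;" to a readable type name.
    static String toJavaType(String descriptor) {
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {
            dims++;                            // each leading '[' adds one array dimension
        }
        String base;
        switch (descriptor.charAt(dims)) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'L':                          // object type: L<internal name>;
                base = descriptor.substring(dims + 1, descriptor.length() - 1).replace('/', '.');
                break;
            default:
                throw new IllegalArgumentException("bad descriptor: " + descriptor);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaType("I"));                   // prints int
        System.out.println(toJavaType("[Ljava/lang/String;")); // prints java.lang.String[]
    }
}
```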

Method table collection

The method table collection is similar to the field table collection. Because methods cannot be declared volatile or transient, its access_flags has no ACC_VOLATILE or ACC_TRANSIENT flag. Instead, flags are added for modifiers that apply to methods, such as synchronized (ACC_SYNCHRONIZED) and native (ACC_NATIVE). Since the rest is so similar to the field table collection, I won't go into details.

Note, however, that the code of a method is stored separately, in the Code attribute of the method table. The figure below shows the main method as printed by javap: the access_flags are ACC_PUBLIC and ACC_STATIC, the descriptor gives the parameter types and the return type, and the code of the method follows under Code.
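For a standard main method the descriptor is ([Ljava/lang/String;)V: the parameter types sit inside the parentheses and the return type follows them. One way to inspect such a string, shown here only as a sketch, is the JDK's java.lang.invoke.MethodType.fromMethodDescriptorString; the wrapper class MainDescriptor and its parse method are illustrative names.

```java
import java.lang.invoke.MethodType;

final class MainDescriptor {
    // Parses a method descriptor; a null loader means the system class loader resolves the types.
    static MethodType parse(String descriptor) {
        return MethodType.fromMethodDescriptorString(descriptor, null);
    }

    public static void main(String[] args) {
        MethodType type = parse("([Ljava/lang/String;)V");
        System.out.println("parameters: " + type.parameterCount());
        System.out.println("first parameter: " + type.parameterType(0).getSimpleName());
        System.out.println("return type: " + type.returnType());
    }
}
```

This approach only works when the types named in the descriptor can be loaded, so it suits quick experiments better than analysis of arbitrary class files.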

 

You can also install a plug-in in IntelliJ IDEA: jclasslib Bytecode Viewer

 

Find and open it:

 


Origin blog.csdn.net/huzhiliayanghao/article/details/106841210