Analysis of Java Bytecode File Structure: Reading Bytecode from the JVM's Perspective (4)

1. The overall structure of bytecode

Description:

Type            Name                  Explanation                                Length
u4              magic                 Magic number, identifies the class format  4 bytes
u2              minor_version         Minor version number                       2 bytes
u2              major_version         Major version number                       2 bytes
u2              constant_pool_count   Constant pool counter                      2 bytes
cp_info         constant_pool         Constant pool                              n bytes
u2              access_flags          Access flags                               2 bytes
u2              this_class            Index of this class                        2 bytes
u2              super_class           Index of the superclass                    2 bytes
u2              interfaces_count      Interface counter                          2 bytes
u2              interfaces            Interface index table                      2 x interfaces_count bytes
u2              fields_count          Field counter                              2 bytes
field_info      fields                Field table                                n bytes
u2              methods_count         Method counter                             2 bytes
method_info     methods               Method table                               n bytes
u2              attributes_count      Attribute counter                          2 bytes
attribute_info  attributes            Attribute table                            n bytes

Class file structure description:

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

2. The two data types in Java bytecode

  • Unsigned numbers: the basic data types, written u1, u2, u4 and u8, which denote unsigned values occupying 1, 2, 4 and 8 consecutive bytes. Multi-byte values are stored in big-endian order (high byte first).
  • Tables: composite data made up of basic data and/or other tables arranged in a predefined order. A table is structured: the position and order of its components are strictly defined. By convention, table names end in _info (cp_info, field_info, method_info, attribute_info).
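As a minimal sketch of the basic types (assuming the whole class file is already in a byte array), the hypothetical helper ClassBytes below reads u1, u2 and u4 values with java.nio.ByteBuffer, which is big-endian by default. Java has no unsigned integer types, so each value is widened to a larger type to keep it non-negative.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: reads the class file's unsigned big-endian basic types. */
final class ClassBytes {
    private final ByteBuffer buf;

    ClassBytes(byte[] data) {
        this.buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default, like class files
    }

    /** u1: one unsigned byte (0..255). */
    int u1() {
        return Byte.toUnsignedInt(buf.get());
    }

    /** u2: two bytes, high byte first (0..65535). */
    int u2() {
        return Short.toUnsignedInt(buf.getShort());
    }

    /** u4: four bytes; returned as long so values such as 0xCAFEBABE stay positive. */
    long u4() {
        return Integer.toUnsignedLong(buf.getInt());
    }

    /** Current offset from the start of the data. */
    int position() {
        return buf.position();
    }
}
```

Reading the first eight bytes of a class file with u4(), u2(), u2() yields the magic number, the minor version and the major version in that order.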

3. Code example:

public class MyTest1 {
    private int a = 1;

    public MyTest1() {
    }

    public int getA() {
        return this.a;
    }

    public void setA(int a) {
        this.a = a;
    }
}

Run javap -c MyTest1.class to disassemble the class:

Compiled from "MyTest1.java"
public class com.jvm.test.byteclass.MyTest1 {
  public com.jvm.test.byteclass.MyTest1(); // constructor
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: aload_0
       5: iconst_1
       6: putfield      #2                  // Field a:I
       9: return

  public int getA();  // getA method
    Code:
       0: aload_0
       1: getfield      #2                  // Field a:I
       4: ireturn

  public void setA(int);
    Code:
       0: aload_0
       1: iload_1
       2: putfield      #2                  // Field a:I
       5: return
}

Run javap -verbose MyTest1.class to get more detailed information:

Classfile /Users/zhengyunwei/Documents/code/PingPong/selleros/selleros-demo/jvm-demo/target/classes/com/jvm/test/byteclass/MyTest1.class // location of the file
  Last modified 2020-3-1; size 491 bytes // last modification time and size in bytes
  MD5 checksum 26e935bab4c89b0f4ee9ba147467c653 // MD5 checksum
  Compiled from "MyTest1.java" // source file it was compiled from

The actual compiled content of the class:
 
public class com.jvm.test.byteclass.MyTest1
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #4.#20         // java/lang/Object."<init>":()V
   #2 = Fieldref           #3.#21         // com/jvm/test/byteclass/MyTest1.a:I
   #3 = Class              #22            // com/jvm/test/byteclass/MyTest1
   #4 = Class              #23            // java/lang/Object
   #5 = Utf8               a
   #6 = Utf8               I
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               LocalVariableTable
  #12 = Utf8               this
  #13 = Utf8               Lcom/jvm/test/byteclass/MyTest1;
  #14 = Utf8               getA
  #15 = Utf8               ()I
  #16 = Utf8               setA
  #17 = Utf8               (I)V
  #18 = Utf8               SourceFile
  #19 = Utf8               MyTest1.java
  #20 = NameAndType        #7:#8          // "<init>":()V
  #21 = NameAndType        #5:#6          // a:I
  #22 = Utf8               com/jvm/test/byteclass/MyTest1
  #23 = Utf8               java/lang/Object
{
  public com.jvm.test.byteclass.MyTest1();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: aload_0
         5: iconst_1
         6: putfield      #2                  // Field a:I
         9: return
      LineNumberTable:
        line 10: 0
        line 13: 4
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      10     0  this   Lcom/jvm/test/byteclass/MyTest1;

  public int getA();
    descriptor: ()I
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: getfield      #2                  // Field a:I
         4: ireturn
      LineNumberTable:
        line 16: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Lcom/jvm/test/byteclass/MyTest1;

  public void setA(int);
    descriptor: (I)V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=2, args_size=2
         0: aload_0
         1: iload_1
         2: putfield      #2                  // Field a:I
         5: return
      LineNumberTable:
        line 20: 0
        line 21: 5
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       6     0  this   Lcom/jvm/test/byteclass/MyTest1;
            0       6     1     a   I
}
SourceFile: "MyTest1.java"

The class file in hexadecimal (a hex editor such as Hex Fiend can open the class file directly):

 CA FE BA BE  00 00 00 34 00 18 0A 00 04 00 14 09 00 03 00 15 07 00 16 07 00 17 01 00 01 61 01 00 01 49
 01 00 06 3C 69 6E 69 74 3E 01 00 03 28 29 56 01 00 04 43 6F 64 65 01 00 0F 4C 69 6E 65 4E 75 6D 62 65 
 72 54 61 62 6C 65 01 00 12 4C 6F 63 61 6C 56 61 72 69 61 62 6C 65 54 61 62 6C 65 01 00 04 74 68 69 73 
 01 00 20 4C 63 6F 6D 2F 6A 76 6D 2F 74 65 73 74 2F 62 79 74 65 63 6C 61 73 73 2F 4D 79 54 65 73 74 31 
 3B 01 00 04 67 65 74 41 01 00 03 28 29 49 01 00 04 73 65 74 41 01 00 04 28 49 29 56 01 00 0A 53 6F 75 72 
 63 65 46 69 6C 65 01 00 0C 4D 79 54 65 73 74 31 2E 6A 61 76 61 0C 00 07 00 08 0C 00 05 00 06 01 00 1E 
 63 6F 6D 2F 6A 76 6D 2F 74 65 73 74 2F 62 79 74 65 63 6C 61 73 73 2F 4D 79 54 65 73 74 31 01 00 10 6A 
 61 76 61 2F 6C 61 6E 67 2F 4F 62 6A 65 63 74 00 21 00 03 00 04 00 00 00 01 00 02 00 05 00 06 00 00 00 
 03 00 01 00 07 00 08 00 01 00 09 00 00 00 38 00 02 00 01 00 00 00 0A 2A B7 00 01 2A 04 B5 00 02 B1 00 
 00 00 02 00 0A 00 00 00 0A 00 02 00 00 00 0A 00 04 00 0D 00 0B 00 00 00 0C 00 01 00 00 00 0A 00 0C 00 
 0D 00 00 00 01 00 0E 00 0F 00 01 00 09 00 00 00 2F 00 01 00 01 00 00 00 05 2A B4 00 02 AC 00 00 00 02 
 00 0A 00 00 00 06 00 01 00 00 00 10 00 0B 00 00 00 0C 00 01 00 00 00 05 00 0C 00 0D 00 00 00 01 00 10 
 00 11 00 01 00 09 00 00 00 3E 00 02 00 02 00 00 00 06 2A 1B B5 00 02 B1 00 00 00 02 00 0A 00 00 00 0A 
 00 02 00 00 00 14 00 05 00 15 00 0B 00 00 00 16 00 02 00 00 00 06 00 0C 00 0D 00 00 00 00 00 06 00 05 
 00 06 00 01 00 01 00 12 00 00 00 02 00 13
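The first ten bytes of this dump can be checked with a short program. The sketch below (ClassHeader is a hypothetical name, not a JDK class) uses java.io.DataInputStream, which also reads big-endian values, to verify the magic number and extract the two version numbers and the constant pool count. The release mapping major - 44 is an assumption that holds for Java 5 (major 49) and later.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical holder for the fixed-size fields at the start of a class file. */
final class ClassHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassHeader(int minor, int major, int cpCount) {
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = cpCount;
    }

    static ClassHeader read(byte[] classFile) throws IOException {
        // DataInputStream reads big-endian values, matching the class file format.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
        int magic = in.readInt();
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a class file, magic = 0x" + Integer.toHexString(magic));
        }
        int minor = in.readUnsignedShort();
        int major = in.readUnsignedShort();
        int cpCount = in.readUnsignedShort(); // entries are numbered 1..cpCount-1
        return new ClassHeader(minor, major, cpCount);
    }

    /** Java release for a major version; assumed valid for major >= 49 (Java 5 and later). */
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }
}
```

For CA FE BA BE 00 00 00 34 00 18 this reports minor version 0, major version 52 (Java 8) and a constant pool count of 24.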

4. The general structure of the data types in the constant pool:

[Figure: table of the constant pool item types, their tag values and structures]
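Since the table image is not reproduced here, the sketch below lists the constant pool tags defined in the JVM specification as a Java enum (CpTag is a hypothetical name). fixedBodySize() gives the number of bytes that follow the tag byte, which is what a parser needs to step over an entry; CONSTANT_Utf8 is the exception because its body is length-prefixed. The newer tags (15 to 20) were added in Java 7, 9 and 11.

```java
/** Hypothetical enum of constant pool tags, with the body size that follows each tag byte. */
enum CpTag {
    UTF8(1), INTEGER(3), FLOAT(4), LONG(5), DOUBLE(6), CLASS(7), STRING(8),
    FIELDREF(9), METHODREF(10), INTERFACE_METHODREF(11), NAME_AND_TYPE(12),
    METHOD_HANDLE(15), METHOD_TYPE(16), DYNAMIC(17), INVOKE_DYNAMIC(18),
    MODULE(19), PACKAGE(20);

    final int tag;

    CpTag(int tag) {
        this.tag = tag;
    }

    /** Looks up the enum constant for the u1 tag byte that starts every cp_info entry. */
    static CpTag of(int tag) {
        for (CpTag t : values()) {
            if (t.tag == tag) return t;
        }
        throw new IllegalArgumentException("unknown constant pool tag " + tag);
    }

    /** Bytes after the tag byte, or -1 for Utf8 (u2 length followed by that many bytes). */
    int fixedBodySize() {
        switch (this) {
            case UTF8:
                return -1;
            case CLASS: case STRING: case METHOD_TYPE: case MODULE: case PACKAGE:
                return 2; // one u2 index
            case METHOD_HANDLE:
                return 3; // u1 reference_kind + u2 reference_index
            case LONG: case DOUBLE:
                return 8; // two u4 halves
            default:
                return 4; // Integer, Float, the three *ref kinds, NameAndType, Dynamic, InvokeDynamic
        }
    }

    /** Long and Double entries occupy two constant pool slots. */
    int slots() {
        return (this == LONG || this == DOUBLE) ? 2 : 1;
    }
}
```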


5. Key points for bytecode analysis:
  • The first four bytes of every class file, CA FE BA BE, are the magic number; they are fixed.
  • The four bytes after the magic number, 00 00 00 34, are the version information: the first two bytes are the minor version (minor version: 0) and the last two are the major version, 0x34 = 3 * 16 + 4 = 52 (major version: 52), which corresponds to Java 1.8.0 (Java 8).
  • Immediately after the major version comes the constant pool, whose length varies from class to class. Much of the information in a Java class is described through the constant pool, which usually takes up a large part of the file; it can be regarded as the class file's resource repository, holding for example the information about the methods and variables defined in the class. It mainly stores two kinds of constants: literals and symbolic references. Literals include text strings and the values of constants declared final; symbolic references include fully qualified names of classes and interfaces, field names and descriptors, and method names and descriptors.
  • Overall structure of the constant pool: it consists of the constant pool count followed by the constant pool array (the constant table). The count immediately follows the major version and occupies 2 bytes; the array immediately follows the count. Unlike an ordinary array, the elements of the constant pool array differ in type and structure, but the first item of every element is of type u1. This byte is a tag occupying 1 byte; when the JVM parses the constant pool, it uses this u1 tag to determine the element's specific type.
    Note: the number of entries in the constant pool array = constant_pool_count - 1, because index 0 is not used. (Strictly, this counts index slots: each long or double constant occupies two.) Reserving index 0 lets certain structures express "no constant pool item is referenced" by using the index value 0; for example, the super_class of java.lang.Object is 0. You can think of index 0 as a reserved constant that is not in the constant table and corresponds to null, which is why constant pool indices start at 1 rather than 0.
  • In the JVM, every field and method has a descriptor. A field descriptor describes the field's data type; a method descriptor describes the parameter list (number, types and order) and the return value. To keep class files compact, primitive types and void (no return value) are represented by a single uppercase letter: B-byte, C-char, D-double, F-float, I-int, J-long, S-short, Z-boolean, V-void. An object type is represented by L followed by the fully qualified name (with / as separator) and a terminating semicolon, for example Ljava/lang/String;
  • Array types use one leading [ per dimension: int[] is recorded as [I, and String[][] as [[Ljava/lang/String;
  • A method descriptor lists the parameters first and then the return value. The parameter descriptors are placed, in strict parameter order, inside a pair of parentheses. For example, the method String getRealNameByIdAndName(int id, String name) is described as (ILjava/lang/String;)Ljava/lang/String;
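The descriptor rules above are mechanical enough to express in a few lines. The sketch below (Descriptors is a hypothetical helper) builds field and method descriptors from Class objects, recursing on the component type for arrays. It is for illustration only; on Java 12 and later, Class.descriptorString() returns the field descriptor directly.

```java
/** Hypothetical helper that builds JVM descriptors from Class objects. */
final class Descriptors {
    /** Field descriptor: one letter for primitives and void, [ per array dimension, L...; for objects. */
    static String of(Class<?> c) {
        if (c.isArray()) return "[" + of(c.getComponentType());
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == double.class) return "D";
        if (c == float.class) return "F";
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == short.class) return "S";
        if (c == boolean.class) return "Z";
        if (c == void.class) return "V";
        // Binary name with '/' separators, e.g. java.lang.String -> Ljava/lang/String;
        return "L" + c.getName().replace('.', '/') + ";";
    }

    /** Method descriptor: parameters in order inside (), followed by the return type. */
    static String method(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) {
            sb.append(of(p));
        }
        return sb.append(')').append(of(returnType)).toString();
    }
}
```

For instance, Descriptors.method(String.class, int.class, String.class) yields (ILjava/lang/String;)Ljava/lang/String;, the descriptor of getRealNameByIdAndName above.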

6. Constant pool bytecode analysis:


  • CA FE BA BE: the magic number, which is fixed

  • 00 00: the minor version number, shown by javap as minor version: 0

  • 00 34: the major version number, shown by javap as major version: 52

  • 00 18: the constant pool count. 0x18 = 16 + 8 = 24, so the pool has 24 - 1 = 23 entries. Index 0 is left vacant so that, where some data structure needs to express "no constant pool item is referenced", it can use the index value 0. javap lists the constant pool as:

       #1 = Methodref          #4.#20         // java/lang/Object."<init>":()V 
                              .
                              .
                              .
       #23 = Utf8               java/lang/Object
    

    So the constant pool has 23 entries, numbered #1 to #23.

  • The first constant pool entry is 0A 00 04 00 14. 0A is the tag (10), which the constant pool type table maps to CONSTANT_Methodref_info; 00 04 is the index (#4) of the CONSTANT_Class_info entry for the class that declares the method; 00 14 is the index (#20) of the CONSTANT_NameAndType_info entry holding the method's name and descriptor. javap shows it as #1 = Methodref #4.#20 // java/lang/Object."<init>":()V. Resolving the references: #4 (#4 = Class #23 // java/lang/Object) points to #23 (#23 = Utf8 java/lang/Object), and #20 (#20 = NameAndType #7:#8 // "<init>":()V) points to #7 (#7 = Utf8 <init>) and #8 (#8 = Utf8 ()V). Put together, the entry describes java/lang/Object."<init>":()V

  • The second entry is 09 00 03 00 15. The tag 09 maps to CONSTANT_Fieldref_info; the next two bytes, 00 03, are the index (#3) of the CONSTANT_Class_info entry for the class or interface that declares the field, and 00 15 is the index (#21) of the CONSTANT_NameAndType_info entry giving the field's name and descriptor. Together: #2 = Fieldref #3.#21 // com/jvm/test/byteclass/MyTest1.a:I

  • The remaining entries are decoded the same way: look up each tag byte in the constant pool type table from section 4 and read the structure it defines.
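The tag-driven walk described above can be sketched as follows. ConstantPoolDump is a hypothetical name; it handles only the tags that commonly appear in simple classes like MyTest1, plus the numeric literals, and formats entries in a javap-like style. It relies on DataInputStream.readUTF(), whose u2-length-prefixed modified UTF-8 format matches CONSTANT_Utf8_info.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical sketch: walks the constant pool and renders each entry roughly like javap. */
final class ConstantPoolDump {
    /** Reads constant_pool_count and the entries after it; result[i] describes entry #i. */
    static String[] read(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();      // constant_pool_count
        String[] pool = new String[count];       // pool[0] stays null: index 0 is reserved
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();     // every cp_info starts with a u1 tag
            switch (tag) {
                case 1:  pool[i] = "Utf8 " + in.readUTF(); break; // u2 length + modified UTF-8
                case 3:  pool[i] = "Integer " + in.readInt(); break;
                case 4:  pool[i] = "Float " + in.readFloat(); break;
                case 5:  pool[i] = "Long " + in.readLong(); i++; break;     // takes two slots
                case 6:  pool[i] = "Double " + in.readDouble(); i++; break; // takes two slots
                case 7:  pool[i] = "Class #" + in.readUnsignedShort(); break;
                case 8:  pool[i] = "String #" + in.readUnsignedShort(); break;
                case 9:  pool[i] = ref("Fieldref", in); break;
                case 10: pool[i] = ref("Methodref", in); break;
                case 11: pool[i] = ref("InterfaceMethodref", in); break;
                case 12: pool[i] = "NameAndType #" + in.readUnsignedShort() + ":#" + in.readUnsignedShort(); break;
                default: throw new IOException("tag " + tag + " at #" + i + " is not handled by this sketch");
            }
        }
        return pool;
    }

    /** Renders class_index.name_and_type_index, read in that order. */
    private static String ref(String kind, DataInputStream in) throws IOException {
        int classIndex = in.readUnsignedShort();
        int nameAndTypeIndex = in.readUnsignedShort();
        return kind + " #" + classIndex + ".#" + nameAndTypeIndex;
    }
}
```

Fed the bytes starting at 00 18 in the dump, entry #1 comes out as Methodref #4.#20 and entry #2 as Fieldref #3.#21, and the stream is left positioned at the access flags 00 21.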


7. Class or interface access flags (access_flags)

Flag name       Value    Meaning                                          Applies to
ACC_PUBLIC      0x0001   Declared public                                  All types
ACC_PRIVATE     0x0002   Declared private                                 Nested classes only (via the InnerClasses attribute)
ACC_FINAL       0x0010   Declared final                                   Classes
ACC_SUPER       0x0020   Use the newer invokespecial semantics            Classes
ACC_INTERFACE   0x0200   Is an interface                                  Interfaces
ACC_ABSTRACT    0x0400   Declared abstract                                Classes and interfaces
ACC_SYNTHETIC   0x1000   Generated by the compiler, not in source code    All types
ACC_ANNOTATION  0x2000   Is an annotation type                            Annotation types
ACC_ENUM        0x4000   Is an enum                                       Enums

Description:
ACC_SUPER: invokespecial is the bytecode instruction used to call instance initialization methods (constructors), private methods, and superclass methods invoked explicitly with the super keyword; this is the origin of the name ACC_SUPER. Compilers before JDK 1.0.2 did not set this flag, and for such classes invokespecial simply called the resolved method. When ACC_SUPER is set, an invokespecial call to a superclass method is looked up starting from the direct superclass of the current class. Since Java SE 8, the JVM treats ACC_SUPER as set in every class file.

Example bytecode:
access_flags: the two bytes immediately after the constant pool are the class or interface access flags. In the sample they are 00 21, which is 0x0020 | 0x0001, i.e. ACC_SUPER plus ACC_PUBLIC: a public class compiled with the newer invokespecial semantics. javap reports them as flags: ACC_PUBLIC, ACC_SUPER.
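Decoding access_flags is a bitmask test against each value in the table. A minimal sketch, assuming a top-level class (so ACC_PRIVATE, which only applies to nested classes, is left out); ClassAccessFlags is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that turns a class's access_flags value into flag names. */
final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Returns the names of all flags whose bit is set, in table order. */
    static List<String> decode(int flags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }
}
```

For the sample value 0x0021, decode returns [ACC_PUBLIC, ACC_SUPER], the same flags javap prints.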


8. Class name and superclass name

  • this_class: the 2 bytes after access_flags, 00 03 in the sample, are a constant pool index, #3: #3 = Class #22 // com/jvm/test/byteclass/MyTest1
  • super_class: the next 2 bytes, 00 04, point to constant pool #4: #4 = Class #23 // java/lang/Object
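Resolving this_class or super_class always takes two hops: the u2 index leads to a CONSTANT_Class_info entry, whose name_index leads to a CONSTANT_Utf8_info entry. The sketch below models only those two entry kinds, with a hypothetical ClassRef type and a Map keyed by constant pool index (both are illustration-only assumptions, not a real API).

```java
import java.util.Map;

/** Hypothetical sketch: resolves a CONSTANT_Class index to a dotted class name. */
final class ClassNameResolver {
    /** Minimal model of CONSTANT_Class_info: only its name_index. Utf8 entries are plain Strings. */
    static final class ClassRef {
        final int nameIndex;

        ClassRef(int nameIndex) {
            this.nameIndex = nameIndex;
        }
    }

    static String resolveClassName(Map<Integer, Object> pool, int classIndex) {
        if (classIndex == 0) {
            return null; // "no class": only java.lang.Object has super_class = 0
        }
        Object entry = pool.get(classIndex);
        if (!(entry instanceof ClassRef)) {
            throw new IllegalArgumentException("#" + classIndex + " is not a CONSTANT_Class");
        }
        Object name = pool.get(((ClassRef) entry).nameIndex);
        if (!(name instanceof String)) {
            throw new IllegalArgumentException("name_index does not point to a Utf8 entry");
        }
        return ((String) name).replace('/', '.'); // internal form -> dotted form
    }
}
```

With #3 -> #22 and #4 -> #23 filled in from the constant pool above, index 3 resolves to com.jvm.test.byteclass.MyTest1 and index 4 to java.lang.Object.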



9. Interface count and interface information

interfaces_count (2 bytes) + interfaces (2 bytes per implemented interface)

  • Bytecode analysis: interfaces_count occupies 2 bytes. Its value here is 00 00, meaning the class implements no interfaces, so no interface index bytes follow.
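Reading the interface table is a u2 count followed by that many u2 constant pool indices, each pointing to a CONSTANT_Class_info entry. A minimal sketch (InterfacesReader is a hypothetical name); for MyTest1 the count is 00 00, so it returns an empty array.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical sketch: reads interfaces_count and the interface index table. */
final class InterfacesReader {
    static int[] read(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();       // interfaces_count
        int[] indices = new int[count];
        for (int i = 0; i < count; i++) {
            indices[i] = in.readUnsignedShort();  // constant pool index of a CONSTANT_Class
        }
        return indices;
    }
}
```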

10. Field (member variable) information

  • fields_count (2 bytes) + fields (n bytes). In the example fields_count is 00 01: the class has one field, the int a that we declared.

  • Structure of the field table:

    Type            Name              Explanation              Quantity
    u2              access_flags      Access flags             1
    u2              name_index        Name index               1
    u2              descriptor_index  Descriptor index         1
    u2              attributes_count  Number of attributes     1
    attribute_info  attributes        Attribute table          attributes_count


  • Field table bytecode analysis (bytes 00 01 00 02 00 05 00 06 00 00):
    00 01: fields_count, so there is one field
    00 02: access_flags = 0x0002, i.e. ACC_PRIVATE: the field is private
    00 05: name_index, pointing to constant pool #5: #5 = Utf8 a
    00 06: descriptor_index, pointing to constant pool #6: #6 = Utf8 I (int)
    00 00: attributes_count = 0, so the field has no attributes and no attribute_info follows
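The field_info layout maps directly onto four u2 reads plus the attribute loop. In this sketch (FieldInfo is a hypothetical class), attribute bodies such as ConstantValue are read and discarded rather than interpreted, and attribute_length is assumed to fit in a non-negative int.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical sketch of one field_info entry; attribute bodies are skipped. */
final class FieldInfo {
    static final int ACC_PRIVATE = 0x0002;

    final int accessFlags;
    final int nameIndex;
    final int descriptorIndex;
    final int attributesCount;

    private FieldInfo(int accessFlags, int nameIndex, int descriptorIndex, int attributesCount) {
        this.accessFlags = accessFlags;
        this.nameIndex = nameIndex;
        this.descriptorIndex = descriptorIndex;
        this.attributesCount = attributesCount;
    }

    static FieldInfo read(DataInputStream in) throws IOException {
        int access = in.readUnsignedShort();
        int name = in.readUnsignedShort();        // -> CONSTANT_Utf8, e.g. #5 "a"
        int descriptor = in.readUnsignedShort();  // -> CONSTANT_Utf8, e.g. #6 "I"
        int attrCount = in.readUnsignedShort();
        for (int i = 0; i < attrCount; i++) {
            in.readUnsignedShort();                // attribute_name_index (ignored here)
            byte[] body = new byte[in.readInt()];  // attribute_length, assumed < 2^31
            in.readFully(body);                    // consume and discard the body
        }
        return new FieldInfo(access, name, descriptor, attrCount);
    }

    boolean isPrivate() {
        return (accessFlags & ACC_PRIVATE) != 0;
    }
}
```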


11. Method information

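methods_count (2 bytes) comes first; in the example it is 00 03, for the three methods <init> (the constructor), getA and setA. Each method is then described by a method_info structure: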

Name              Type            Quantity
access_flags      u2              1
name_index        u2              1
descriptor_index  u2              1
attributes_count  u2              1
attributes        attribute_info  attributes_count

General attribute structure (attribute_info):

Name                  Type  Quantity
attribute_name_index  u2    1
attribute_length      u4    1
info                  u1    attribute_length
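Because every attribute starts with the same two header fields, a parser can read or skip any attribute without understanding it and interpret the info bytes only for names it knows. A minimal sketch (AttributeInfo is a hypothetical name), assuming attribute_length fits in a non-negative int.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical sketch of a generic attribute_info: header plus uninterpreted info bytes. */
final class AttributeInfo {
    final int nameIndex; // attribute_name_index -> CONSTANT_Utf8 naming the attribute
    final byte[] info;   // the attribute_length bytes that follow

    private AttributeInfo(int nameIndex, byte[] info) {
        this.nameIndex = nameIndex;
        this.info = info;
    }

    static AttributeInfo read(DataInputStream in) throws IOException {
        int nameIndex = in.readUnsignedShort();
        int length = in.readInt(); // u4 attribute_length
        if (length < 0) {
            throw new IOException("attribute too large for this sketch");
        }
        byte[] info = new byte[length];
        in.readFully(info);
        return new AttributeInfo(nameIndex, info);
    }

    /** Reads a big-endian u2 from the info bytes, e.g. SourceFile's sourcefile_index at offset 0. */
    int u2At(int offset) {
        return ((info[offset] & 0xFF) << 8) | (info[offset + 1] & 0xFF);
    }
}
```

Applied to the last eight bytes of the dump, 00 12 00 00 00 02 00 13, this gives name index #18 (SourceFile) and two info bytes whose u2 value is #19 (MyTest1.java).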

The structure of the Code attribute table is as follows:

Type            Name                    Quantity                Meaning
u2              attribute_name_index    1                       Attribute name index
u4              attribute_length        1                       Attribute length
u2              max_stack               1                       Maximum depth of the operand stack
u2              max_locals              1                       Number of local variable slots required
u4              code_length             1                       Length of the bytecode in bytes
u1              code                    code_length             The bytecode instructions
u2              exception_table_length  1                       Exception table length
exception_info  exception_table         exception_table_length  Exception table
u2              attributes_count        1                       Attribute counter
attribute_info  attributes              attributes_count        Attributes

The Code attribute (Code_attribute) holds a method's executable body: the operand stack and local variable limits, the bytecode instructions, the exception table, and nested attributes such as LineNumberTable and LocalVariableTable.

  • attribute_length: indicates the number of bytes included in the attribute, excluding the attribute_name_index and attribute_length fields
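  • max_stack: the maximum depth the operand stack reaches at any point while this method runs
  • max_locals: the size of the local variable array allocated while the method runs, including the slots that hold the incoming parameters (and this, for instance methods); long and double values take two slots each
  • code_length: the number of bytes of bytecode in the method; code holds the actual instruction bytes
  • exception_table: holds the exception-handling information; each exception_table entry consists of start_pc, end_pc, handler_pc and catch_type
  • start_pc and end_pc: exceptions thrown by instructions in the code array from start_pc (inclusive) to end_pc (exclusive) are handled by this entry
  • handler_pc: the start of the code that handles the exception. catch_type: the type of exception handled, given as an index to an exception class in the constant pool. When catch_type is 0, the entry handles all exceptions (javac uses this for finally blocks).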
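The Code attribute layout and the exception-table rule can be sketched together. CodeAttribute is a hypothetical class that parses the attribute body (everything after attribute_length) and stops before the nested attributes. As a simplification, handlerFor() applies only the half-open [start_pc, end_pc) range check and ignores catch_type; a real JVM also checks that the thrown exception is an instance of the catch_type class, with 0 matching anything.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical sketch: parses the body of a Code attribute. */
final class CodeAttribute {
    int maxStack;
    int maxLocals;
    byte[] code;
    int[][] exceptionTable; // rows of {start_pc, end_pc, handler_pc, catch_type}
    int attributesCount;    // nested attributes (LineNumberTable, ...) are not parsed here

    static CodeAttribute parse(byte[] info) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(info));
        CodeAttribute c = new CodeAttribute();
        c.maxStack = in.readUnsignedShort();
        c.maxLocals = in.readUnsignedShort();
        c.code = new byte[in.readInt()];   // code_length (u4), assumed < 2^31
        in.readFully(c.code);
        int n = in.readUnsignedShort();    // exception_table_length
        c.exceptionTable = new int[n][4];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < 4; j++) {
                c.exceptionTable[i][j] = in.readUnsignedShort();
            }
        }
        c.attributesCount = in.readUnsignedShort();
        return c;
    }

    /** handler_pc of the first entry whose [start_pc, end_pc) covers pc, or -1 (catch_type ignored). */
    int handlerFor(int pc) {
        for (int[] e : exceptionTable) {
            if (pc >= e[0] && pc < e[1]) {
                return e[2];
            }
        }
        return -1;
    }
}
```

For the constructor's Code body in the dump (the 56 bytes after 00 09 00 00 00 38), this yields max_stack 2, max_locals 1, a 10-byte code array, an empty exception table and 2 nested attributes, matching stack=2, locals=1 in the javap output.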

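Bytecode example, the constructor method:
Let's analyze the bytecode instructions that follow code_length in detail. The IntelliJ IDEA plugin jclasslib is recommended here; it shows the bytecode information and how each byte corresponds to the structures:
2A B7 00 01 2A 04 B5 00 02 B1 (instructions)
Reading the constructor's code instructions byte by byte, we can see that: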

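  • 2A -> aload_0: pushes local variable 0 (this) onto the operand stack; see the official specification: aload_0 = 42 (0x2a)
  • B7 -> invokespecial: calls the corresponding superclass constructor; see the official specification: invokespecial = 183 (0xb7)
  • 00 01 -> the operand of B7, constant pool index #1: #1 = Methodref #4.#20 // java/lang/Object."<init>":()V
  • 2A -> aload_0
  • 04 -> iconst_1: pushes the int constant 1; see the official specification: iconst_1 = 4 (0x4)
  • B5 -> putfield: assigns a value to the corresponding field; see the official specification: putfield = 181 (0xb5)
  • 00 02 -> the operand of B5, pointing to constant pool #2: #2 = Fieldref #3.#21 // com/jvm/test/byteclass/MyTest1.a:I
  • B1 -> return: returns void from the method; see the official specification: return = 177 (0xb1)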
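The byte-by-byte reading above is what a disassembler does. The sketch below (MiniDisassembler is a hypothetical name) knows only the eight opcodes used by MyTest1's three methods and their operand sizes, and imitates the javap -c output format; any other opcode raises an exception rather than being guessed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: disassembles the few opcodes that appear in MyTest1. */
final class MiniDisassembler {
    static List<String> decode(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc] & 0xFF;
            String text;
            int size; // opcode byte plus operand bytes
            switch (op) {
                case 0x2A: text = "aload_0"; size = 1; break;
                case 0x1B: text = "iload_1"; size = 1; break;
                case 0x04: text = "iconst_1"; size = 1; break;
                case 0xAC: text = "ireturn"; size = 1; break;
                case 0xB1: text = "return"; size = 1; break;
                case 0xB4: text = "getfield #" + u2(code, pc + 1); size = 3; break;
                case 0xB5: text = "putfield #" + u2(code, pc + 1); size = 3; break;
                case 0xB7: text = "invokespecial #" + u2(code, pc + 1); size = 3; break;
                default:
                    throw new IllegalArgumentException("opcode 0x" + Integer.toHexString(op) + " not in this sketch");
            }
            out.add(pc + ": " + text); // pc is the instruction's offset in the code array
            pc += size;
        }
        return out;
    }

    /** Big-endian u2 operand, e.g. a constant pool index. */
    private static int u2(byte[] code, int i) {
        return ((code[i] & 0xFF) << 8) | (code[i + 1] & 0xFF);
    }
}
```

Applied to 2A B7 00 01 2A 04 B5 00 02 B1, it produces 0: aload_0, 1: invokespecial #1, 4: aload_0, 5: iconst_1, 6: putfield #2, 9: return, the same offsets as the javap listing.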

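The overall flow is roughly as follows: aload_0 pushes this onto the operand stack; invokespecial #1 pops it and calls Object.<init>; aload_0 pushes this again; iconst_1 pushes the int 1; putfield #2 pops both values and stores 1 into this.a; return ends the constructor. At most two values are on the stack at once, which matches stack=2 (max_stack) in the javap output.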
LineNumberTable:
Its attribute body is 10 bytes long: 00 02 00 00 00 0A 00 04 00 0D

  • 00 02: there are two mapping entries, 00 00 00 0A and 00 04 00 0D
  • 00 00 00 0A: bytecode offset (start_pc) 0 maps to source line 10
  • 00 04 00 0D: bytecode offset 4 maps to source line 13
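A LineNumberTable body is a u2 count followed by (start_pc, line_number) pairs. The sketch below (LineNumbers is a hypothetical name) parses it and looks up the source line for a bytecode offset by taking the entry with the largest start_pc not exceeding that offset; that lookup rule is an assumption for illustration, since the specification does not require entries to be sorted or unique.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: parses a LineNumberTable body into {start_pc, line_number} rows. */
final class LineNumbers {
    static int[][] parse(byte[] info) {
        ByteBuffer buf = ByteBuffer.wrap(info);              // big-endian by default
        int length = Short.toUnsignedInt(buf.getShort());    // line_number_table_length
        int[][] rows = new int[length][2];
        for (int i = 0; i < length; i++) {
            rows[i][0] = Short.toUnsignedInt(buf.getShort()); // start_pc: bytecode offset
            rows[i][1] = Short.toUnsignedInt(buf.getShort()); // line_number: source line
        }
        return rows;
    }

    /** Source line of the entry with the largest start_pc <= pc, or -1 if none applies. */
    static int lineFor(int[][] rows, int pc) {
        int bestStart = -1;
        int line = -1;
        for (int[] r : rows) {
            if (r[0] <= pc && r[0] > bestStart) {
                bestStart = r[0];
                line = r[1];
            }
        }
        return line;
    }
}
```

With the constructor's table, offsets 0 to 3 map to line 10 and offsets 4 to 9 map to line 13.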

LocalVariableTable:
Its attribute body is 12 bytes long: 00 01 00 00 00 0A 00 0C 00 0D 00 00

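  • 00 01: there is one local variable, this (the current object), which the compiler passes in implicitly
  • 00 00 and 00 0A: start_pc = 0 and length = 10, so the variable is valid over the whole 10-byte code array (offsets 0 to 9)
  • 00 0C: name_index, pointing to constant pool #12 (this), the current object
  • 00 0D: descriptor_index, pointing to constant pool #13 (Lcom/jvm/test/byteclass/MyTest1;)
  • 00 00: index, the variable's slot in the local variable array (slot 0). This is not a StackMapTable; that attribute, added in Java 6 for bytecode verification, does not appear here because the method has no branch targets.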
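Each LocalVariableTable entry is five u2 values: start_pc, length, name_index, descriptor_index and index. A minimal parser (LocalVariables is a hypothetical name) is sketched below; for setA the same code yields two rows, this in slot 0 and a (name #5, descriptor #6 I) in slot 1.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: parses a LocalVariableTable body. */
final class LocalVariables {
    /** Rows of {start_pc, length, name_index, descriptor_index, index}. */
    static int[][] parse(byte[] info) {
        ByteBuffer buf = ByteBuffer.wrap(info);        // big-endian by default
        int n = Short.toUnsignedInt(buf.getShort());   // local_variable_table_length
        int[][] rows = new int[n][5];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < 5; j++) {
                rows[i][j] = Short.toUnsignedInt(buf.getShort());
            }
        }
        return rows;
    }

    /** A variable is live for code offsets in [start_pc, start_pc + length). */
    static boolean isLiveAt(int[] row, int pc) {
        return pc >= row[0] && pc < row[0] + row[1];
    }
}
```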

The other two methods, getA and setA, can be analyzed the same way as the constructor.


Origin blog.csdn.net/Yunwei_Zheng/article/details/104595572