Manually parsing Java bytecode files

Preface

The Java source files we write (.java) are compiled into .class files, that is, bytecode files, which the JVM can load and execute. HotSpot's execution engine currently has two ways of interpreting bytecode: the bytecode interpreter and the template interpreter. The bytecode interpreter is written in C++ and handles each bytecode instruction by running C++ code, which has itself been compiled to machine code. The template interpreter skips that C++ layer: for each bytecode instruction it generates a piece of machine code (a template) directly and executes that. Generated machine code, including compiled hot code, lives in a code cache. The size of that cache is one of the JVM tuning parameters, but the JVM ships with a default size, and unless you know exactly what you are doing it is best left alone. The interpreters of the execution engine will be covered in later notes. This note records how to parse a bytecode file by hand, in order to understand how a class file is laid out.

What a bytecode file looks like

A bytecode file is the .class file that javac produces from a .java file. When we build a Java class in an IDE such as Eclipse or IDEA, the tool runs the compiler for us and produces the same .class file.
Let's look at what a bytecode file actually looks like. For example, take the following Java file:

package com.bml.jvm;

public class ByteCode {

    // static field; its initializer is compiled into <clinit>
    private static int count = 1;

    public static void main(String[] args) {
        System.out.println(count);
    }
}

After compiling it with javac or an IDE on Windows, the resulting class, viewed as text in a bytecode viewer, looks like this:

// class version 52.0 (52)
// access flags 0x21
public class com/bml/jvm/ByteCode {

  // compiled from: ByteCode.java

  // access flags 0xA
  private static I count

  // access flags 0x1
  public <init>()V
   L0
    LINENUMBER 3 L0
    ALOAD 0
    INVOKESPECIAL java/lang/Object.<init> ()V
    RETURN
   L1
    LOCALVARIABLE this Lcom/bml/jvm/ByteCode; L0 L1 0
    MAXSTACK = 1
    MAXLOCALS = 1

  // access flags 0x9
  public static main([Ljava/lang/String;)V
   L0
    LINENUMBER 9 L0
    GETSTATIC java/lang/System.out : Ljava/io/PrintStream;
    GETSTATIC com/bml/jvm/ByteCode.count : I
    INVOKEVIRTUAL java/io/PrintStream.println (I)V
   L1
    LINENUMBER 10 L1
    RETURN
   L2
    LOCALVARIABLE args [Ljava/lang/String; L0 L2 0
    MAXSTACK = 2
    MAXLOCALS = 1

  // access flags 0x8
  static <clinit>()V
   L0
    LINENUMBER 6 L0
    ICONST_1
    PUTSTATIC com/bml/jvm/ByteCode.count : I
    RETURN
    MAXSTACK = 1
    MAXLOCALS = 0
}

Pay attention to the first two lines of the bytecode listing above:
// class version 52.0 (52)
// access flags 0x21

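This says that the class file version is 52. Which JDK version does 52 correspond to? The major version numbers map to JDK releases as follows:
[Figure: table of class file major versions and the JDK releases they correspond to, for example 49 = JDK 5, 50 = JDK 6, 51 = JDK 7, 52 = JDK 8]

So the major version of our class is 52, which is JDK 1.8.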
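As a rough illustration, the version mapping can be written as a small Java helper. The class name ClassVersions and the method javaRelease are made up for this sketch; the rule "major minus 44" is an observation that holds for the releases published so far, not something the class file itself states.

```java
// Hypothetical helper (not part of the JDK): maps a class file major version to a Java release.
final class ClassVersions {

    static String javaRelease(int major) {
        if (major < 45) {
            throw new IllegalArgumentException("not a valid major version: " + major);
        }
        if (major <= 48) {
            return "1." + (major - 44); // 45..48 -> 1.1..1.4
        }
        return String.valueOf(major - 44); // 49 -> 5, 50 -> 6, ..., 52 -> 8
    }

    public static void main(String[] args) {
        int major = 0x0034; // the two major version bytes from the class file
        System.out.println(major + " -> Java " + javaRelease(major)); // prints "52 -> Java 8"
    }
}
```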
access flags 0x21 describes the access permissions and properties of our class. ByteCode is declared public in the Java source, so the public flag must be set. But what exactly does 0x21 mean? The class access flags are defined as follows:
[Figure: table of class access flags, including ACC_PUBLIC = 0x0001 and ACC_SUPER = 0x0020]
So 0x21 is the combination 0x0001 | 0x0020, that is, ACC_PUBLIC plus ACC_SUPER, which matches the "flags: ACC_PUBLIC, ACC_SUPER" line that javap prints for this class.
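The access flags are a bit mask, so 0x21 can be decoded by testing each bit. The following is a minimal sketch; the class name ClassAccessFlags and the method decode are hypothetical, and the mask values are the class-level flags defined in the JVM specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decodes the access_flags value of a class into flag names.
final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    static List<String> decode(int flags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) { // the bit for this flag is set
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x21)); // prints [ACC_PUBLIC, ACC_SUPER]
    }
}
```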

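Next, look at private static I count: what does the I mean? I is the descriptor for the int type. Each primitive type has a one-letter descriptor:
[Figure: table of field descriptors]
byte -> B
char -> C
double -> D
float -> F
int -> I
long -> J
short -> S
boolean -> Z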
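A reference type is written as L, then the class name with slashes, then a closing semicolon, so Ljava/lang/String; stands for String. Each array dimension adds a leading [, so String[] is [Ljava/lang/String;. In a method descriptor the parameter descriptors are written back to back inside the parentheses, with no separators, followed by the return type. For example, the method

String XXX(String[][] strArrs, int a, Test test)

has the following descriptor (assuming Test lives in the package java.bml):

([[Ljava/lang/String;ILjava/bml/Test;)Ljava/lang/String;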
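To make the descriptor rules concrete, here is a small sketch that turns a method descriptor back into readable Java types. DescriptorReader and describe are hypothetical names invented for this example; it only handles well-formed descriptors and is not how the JVM itself parses them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a JVM method descriptor into a readable form.
final class DescriptorReader {
    private final String text;
    private int pos;

    private DescriptorReader(String text) {
        this.text = text;
    }

    // Reads one type descriptor starting at pos and returns it in Java source notation.
    private String nextType() {
        int dims = 0;
        while (text.charAt(pos) == '[') { // each '[' is one array dimension
            dims++;
            pos++;
        }
        char c = text.charAt(pos++);
        String base;
        switch (c) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'V': base = "void"; break;
            case 'L': {
                int semi = text.indexOf(';', pos); // a class name ends at ';'
                if (semi < 0) {
                    throw new IllegalArgumentException("missing ';' in " + text);
                }
                base = text.substring(pos, semi).replace('/', '.');
                pos = semi + 1;
                break;
            }
            default:
                throw new IllegalArgumentException("unexpected character '" + c + "' in " + text);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    // "(params)ret" becomes "ret (param, param, ...)".
    static String describe(String descriptor) {
        DescriptorReader r = new DescriptorReader(descriptor);
        if (r.text.charAt(r.pos++) != '(') {
            throw new IllegalArgumentException("descriptor must start with '('");
        }
        List<String> params = new ArrayList<>();
        while (r.text.charAt(r.pos) != ')') {
            params.add(r.nextType());
        }
        r.pos++; // skip ')'
        return r.nextType() + " (" + String.join(", ", params) + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("([[Ljava/lang/String;ILjava/bml/Test;)Ljava/lang/String;"));
        System.out.println(describe("([Ljava/lang/String;)V"));
    }
}
```

The first call prints java.lang.String (java.lang.String[][], int, java.bml.Test) and the second prints void (java.lang.String[]), the descriptor of main.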

Now let's look at the class file itself as a hexadecimal dump:
[Figure: hexadecimal dump of ByteCode.class]

Manually parse bytecode files

Let's see which parts a bytecode file contains and how to parse them:
[Figure: table of the class file structure, listing each item with its type (u1, u2, u4) from top to bottom]
Read the structure from top to bottom. The u2 and u4 on the left give the number of bytes each item occupies: u2 is 2 bytes and u4 is 4 bytes. One byte is 8 bits and is shown as two hexadecimal digits in the dump.
So the first four bytes of the file are the u4 magic number, cafebabe. Multi-byte values in a class file are stored in big-endian order (most significant byte first), so the bytes appear in the file as ca fe ba be.
In little-endian order the same value would be stored as be ba fe ca.

When the JVM loads a class file, it first checks that the file starts with the magic number cafebabe. If it does not, the file is rejected as an invalid class file.
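The byte order point can be checked with java.nio.ByteBuffer, which reads big-endian by default. This is only a sketch: MagicNumber, readHex and hasClassMagic are hypothetical names, and the check shown is a simplified stand-in for what the JVM does during loading.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: reads the four magic bytes in either byte order.
final class MagicNumber {
    // The first four bytes of every class file, in file order.
    static final byte[] MAGIC_BYTES = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};

    // Interprets four bytes as an int in the given byte order and returns it as hex.
    static String readHex(byte[] fourBytes, ByteOrder order) {
        int value = ByteBuffer.wrap(fourBytes).order(order).getInt();
        return Integer.toHexString(value);
    }

    // Simplified magic number check; ByteBuffer defaults to big-endian.
    static boolean hasClassMagic(byte[] data) {
        return data.length >= 4 && ByteBuffer.wrap(data).getInt() == 0xCAFEBABE;
    }

    public static void main(String[] args) {
        System.out.println(readHex(MAGIC_BYTES, ByteOrder.BIG_ENDIAN));    // prints cafebabe
        System.out.println(readHex(MAGIC_BYTES, ByteOrder.LITTLE_ENDIAN)); // prints bebafeca
        System.out.println(hasClassMagic(MAGIC_BYTES));                    // prints true
    }
}
```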

In the figure above, "!" marks an item whose length is not fixed. The list of implemented interfaces is an example: if the class implements no interfaces, the interfaces table is simply absent from the file.
Let's parse the beginning of the file:
magic number (u4): cafebabe
minor version (u2): 0000 (minor version number is 0)
major version (u2): 0034 (decimal 52)
constant pool count (u2): 0024 (decimal 36)
The constant pool count can be checked with the jclasslib plugin in IDEA:
[Figure: jclasslib view showing the constant pool count]
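The header fields parsed above can be read programmatically in the same order. The sketch below uses a hand-written byte array holding the ten header bytes from the walkthrough; ClassHeader and parse are hypothetical names, not an existing API.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: reads magic, minor version, major version and constant pool count.
final class ClassHeader {
    final int minor;
    final int major;
    final int constantPoolCount;

    private ClassHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    // Parses the first 10 bytes of a class file; ByteBuffer reads big-endian by default.
    static ClassHeader parse(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        if (buf.getInt() != 0xCAFEBABE) { // u4 magic
            throw new IllegalArgumentException("magic number is not cafebabe");
        }
        int minor = buf.getShort() & 0xFFFF;   // u2 minor_version
        int major = buf.getShort() & 0xFFFF;   // u2 major_version
        int cpCount = buf.getShort() & 0xFFFF; // u2 constant_pool_count
        return new ClassHeader(minor, major, cpCount);
    }

    public static void main(String[] args) {
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0x00, 0x00,                                         // minor version
            0x00, 0x34,                                         // major version
            0x00, 0x24                                          // constant pool count
        };
        ClassHeader h = parse(header);
        // prints "minor=0 major=52 constant_pool_count=36"
        System.out.println("minor=" + h.minor + " major=" + h.major
                + " constant_pool_count=" + h.constantPoolCount);
    }
}
```

To try it on a real file, the byte array could instead come from java.nio.file.Files.readAllBytes on the path of ByteCode.class.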

Constant pool analysis

Constant pool list:
[Figure: jclasslib view of the constant pool entries]
The entries are numbered from 01 to 35: the number of actual constant pool entries is the constant pool count stored in the file minus 1, because index 0 is reserved. Note that the javap listing at the end of this post shows entries #1 to #36, which corresponds to a count of 0025 rather than 0024, so the two were apparently not taken from exactly the same class file.
The constant pool has to be parsed with the help of a table of constant types:
[Figure: table of constant pool entry structures by tag]
We will only parse a few entries here; 36 is too many.
Rules:
1. Every entry in the table starts with a tag of type u1, so each entry is parsed by first reading 1 byte.
2. Look at the value of that byte and find the matching constant type in the table.
3. Read the remaining items of that constant type, then move on to the next entry.
The first byte after 0024 in the hex dump is 0a.
The first constant:
tag (u1): 0a (decimal 10, which is CONSTANT_Methodref_info in the constant pool table)
class_index (u2): 0006 (an index into the constant pool pointing to entry 6, usually written #6 in bytecode tools)
name_and_type_index (u2): 0017 (decimal 23, pointing to entry 23, usually written #23)
Let's verify it against the screenshot:
[Figure: jclasslib view of constant pool entry #1, a Methodref pointing to #6 and #23]
While parsing, we can compare our manual results with the view that IDEA generates to check whether we parsed correctly.
In IDEA, clicking #6 or #23 jumps straight to the corresponding entry; #6 and #23 are references to other constant pool entries.

The second constant:
tag (u1): 09 (9 is CONSTANT_Fieldref_info)
class_index (u2): 0018 (decimal 24, pointing to #24)
name_and_type_index (u2): 0019 (decimal 25, pointing to #25)
Let's verify again:
[Figure: jclasslib view of constant pool entry #2, a Fieldref pointing to #24 and #25]

This matches exactly. Of course, the manual parsing here is only meant to help understand the structure of a class file. In practice the parsing is done by a program, for example a bytecode library such as ASM.
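The three rules for constant pool entries (read the u1 tag, look up the constant type, read its remaining items) can be sketched as a loop. ConstantPoolReader is a hypothetical name; the sketch covers the tags a Java 8 class file can contain, and it decodes Utf8 entries as standard UTF-8, which is an approximation because class files use a modified UTF-8 encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads constant pool entries #1 .. #(count - 1) as text.
final class ConstantPoolReader {
    private static int u2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;
    }

    // buf must be positioned right after constant_pool_count.
    static List<String> read(ByteBuffer buf, int count) {
        List<String> pool = new ArrayList<>();
        pool.add(null); // index 0 is reserved
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF; // rule 1: every entry starts with a u1 tag
            switch (tag) {              // rule 2: the tag selects the constant type
                case 1: {
                    byte[] utf = new byte[u2(buf)];
                    buf.get(utf);
                    // Approximation: class files use modified UTF-8.
                    pool.add("Utf8 " + new String(utf, StandardCharsets.UTF_8));
                    break;
                }
                case 3: pool.add("Integer " + buf.getInt()); break;
                case 4: pool.add("Float " + buf.getFloat()); break;
                // Long and Double occupy two constant pool slots.
                case 5: pool.add("Long " + buf.getLong()); pool.add(null); i++; break;
                case 6: pool.add("Double " + buf.getDouble()); pool.add(null); i++; break;
                case 7: pool.add("Class #" + u2(buf)); break;
                case 8: pool.add("String #" + u2(buf)); break;
                case 9: pool.add("Fieldref #" + u2(buf) + ".#" + u2(buf)); break;
                case 10: pool.add("Methodref #" + u2(buf) + ".#" + u2(buf)); break;
                case 11: pool.add("InterfaceMethodref #" + u2(buf) + ".#" + u2(buf)); break;
                case 12: pool.add("NameAndType #" + u2(buf) + ":#" + u2(buf)); break;
                case 15: pool.add("MethodHandle kind " + (buf.get() & 0xFF) + " #" + u2(buf)); break;
                case 16: pool.add("MethodType #" + u2(buf)); break;
                case 18: pool.add("InvokeDynamic #" + u2(buf) + ":#" + u2(buf)); break;
                default:
                    throw new IllegalArgumentException("unsupported constant pool tag " + tag);
            }
        }
        return pool;
    }

    public static void main(String[] args) {
        // The first two entries from the walkthrough: 0a 0006 0017 and 09 0018 0019.
        byte[] data = {0x0a, 0x00, 0x06, 0x00, 0x17, 0x09, 0x00, 0x18, 0x00, 0x19};
        List<String> pool = read(ByteBuffer.wrap(data), 3);
        System.out.println("#1 = " + pool.get(1)); // prints "#1 = Methodref #6.#23"
        System.out.println("#2 = " + pool.get(2)); // prints "#2 = Fieldref #24.#25"
    }
}
```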

After the constant pool come the following items:
access_flags (u2): 0021 (public, as mentioned above), as shown in the figure:
[Figure: jclasslib general information view showing the access flags]
this_class (u2): 0005 (decimal 5, that is, #5); see the IDEA screenshot:
[Figure: jclasslib general information view showing this_class and super_class]
super_class (u2): 0006 (#6, also visible in the figure above)
interfaces_count (u2): 0000 (the class implements 0 interfaces)
interfaces[]: because interfaces_count is 0, this table is absent
fields_count (u2): 00 01 (the class has one field)

Parse field attributes

Next we parse the individual fields. The structure of a field entry is as follows:
[Figure: table of the field_info structure]
fields[0] (the first field):
[Figure: jclasslib view of the field]

access_flags (u2): 00 0a (decimal 10, private static: ACC_PRIVATE 0x0002 | ACC_STATIC 0x0008)
name_index (u2): 00 07 (#7, count)
descriptor_index (u2): 00 08 (#8, I)
attributes_count (u2): 0000 (the field has 0 attributes, so there is no ConstantValue attribute)
attributes[]: because attributes_count is 0, this table is absent from the file.
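The field entry above can be read with a few lines of code. This is a minimal sketch under the assumption that the buffer is positioned at the start of a field entry; FieldInfoReader is a hypothetical name, and any attributes are skipped rather than decoded.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: reads one field entry and summarizes it as text.
final class FieldInfoReader {
    private static int u2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;
    }

    static String read(ByteBuffer buf) {
        int accessFlags = u2(buf);
        int nameIndex = u2(buf);
        int descriptorIndex = u2(buf);
        int attributesCount = u2(buf);
        for (int i = 0; i < attributesCount; i++) {
            u2(buf);                               // attribute_name_index
            int length = buf.getInt();             // attribute_length (u4)
            buf.position(buf.position() + length); // skip the attribute body
        }
        return String.format(
                "access_flags=0x%04x name_index=#%d descriptor_index=#%d attributes_count=%d",
                accessFlags, nameIndex, descriptorIndex, attributesCount);
    }

    public static void main(String[] args) {
        // The bytes of the count field from the walkthrough: 000a 0007 0008 0000.
        byte[] field = {0x00, 0x0a, 0x00, 0x07, 0x00, 0x08, 0x00, 0x00};
        // prints "access_flags=0x000a name_index=#7 descriptor_index=#8 attributes_count=0"
        System.out.println(read(ByteBuffer.wrap(field)));
    }
}
```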

Method analysis

Before parsing the methods, let's look at the method information:
[Figure: jclasslib view of the methods of ByteCode]
The figure shows that our class ByteCode has three methods:
main: the method we wrote
<init>: the constructor that the compiler generates for our Java class
<clinit>: the static initializer, generated when the class has static fields with initializers or static initializer blocks
Because ByteCode has a static field with an initializer, three methods are generated: the default constructor, the main method, and the static initialization method.
Next comes the number of methods:
methods_count (u2): 00 03 (we have 3 methods)
Here I will demonstrate parsing one of the methods.
[Figure: table of the method_info structure]
The figure above gives the layout of a method entry.
The first method, <init>:
access_flags (u2): 00 01 (public)
name_index (u2): 00 09 (refers to constant #9, <init>)
descriptor_index (u2): 00 0A (refers to constant #10, ()V)
attributes_count (u2): 00 01 (one attribute)
attributes[] (absent from the file if attributes_count is 0):
The attribute structures are as follows:
[Figures: tables of the attribute structures, including the Code attribute]
attribute_name_index (u2): 00 0B (refers to constant #11, Code)
attribute_length (u4): 00 00 00 2F (decimal 47, the attribute body is 47 bytes long)
max_stack (u2): 00 01
max_locals (u2): 00 01
code_length (u4): 00 00 00 05
code[code_length]: 2a b7 00 01 b1 (read code_length bytes: aload_0, invokespecial #1, return)
exception_table_length (u2): 00 00 (no exception handlers)
attributes_count (u2): 00 02 (the Code attribute has two sub-attributes, the LineNumberTable and the LocalVariableTable)
attributes[attributes_count]:
Sub-attributes of the Code attribute:
[Figure: table of the LineNumberTable attribute structure]
attribute_name_index (u2): 00 0C (refers to constant #12, LineNumberTable)
attribute_length (u4): 00 00 00 06 (length is 6)
line_number_table_length (u2): 00 01 (one entry)
[
start_pc: 00 00
line_number: 00 03 (source line 3)
]
This matches the LineNumberTable of ByteCode as shown in IDEA:
[Figure: jclasslib view of the LineNumberTable of <init>]
The LineNumberTable is what lets the JVM report the exact source line numbers in stack traces.

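          attribute_name_index: 00 0D (refers to constant #13, LocalVariableTable, the method's local variable table)
          attribute_length: 00 00 00 0C (the table is 12 bytes long)
          local_variable_table_length: 00 01 (one entry)
               [
                  start_pc: 00 00
                  length: 00 05 (length is 5)
                  name_index: 00 0E (refers to constant #14, this)
                  descriptor_index: 00 0F (refers to constant #15, Lcom/bml/jvm/ByteCode;)
                  index: 00 00
               ]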

This matches the local variable table of ByteCode as shown in IDEA:
[Figure: jclasslib view of the LocalVariableTable of <init>]
The manual parse lines up with the tool's view entry by entry.
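The whole Code attribute of <init> can be replayed in code. The sketch below rebuilds the 53 bytes from the walkthrough (6 bytes of attribute header plus the 47-byte body) and reads them in the order described; CodeAttributeReader and its bytes helper are hypothetical names, and the sub-attributes are only summarized as name index and length, not decoded.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads one Code attribute and summarizes it as text.
final class CodeAttributeReader {
    private static int u2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;
    }

    // Convenience for writing byte arrays as hex ints without casts.
    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    static String read(ByteBuffer buf) {
        int nameIndex = u2(buf);
        int length = buf.getInt();            // attribute_length (u4)
        int maxStack = u2(buf);
        int maxLocals = u2(buf);
        byte[] code = new byte[buf.getInt()]; // code_length (u4)
        buf.get(code);
        int handlers = u2(buf);               // exception_table_length
        buf.position(buf.position() + handlers * 8); // each handler entry is four u2 values
        int subCount = u2(buf);
        List<String> subs = new ArrayList<>();
        for (int i = 0; i < subCount; i++) {
            int subName = u2(buf);
            int subLength = buf.getInt();
            buf.position(buf.position() + subLength); // skip the sub-attribute body
            subs.add("#" + subName + ":" + subLength);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : code) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", b & 0xFF));
        }
        return "name=#" + nameIndex + " length=" + length + " max_stack=" + maxStack
                + " max_locals=" + maxLocals + " code=[" + hex + "] handlers=" + handlers
                + " sub_attributes=" + subs;
    }

    public static void main(String[] args) {
        byte[] attr = bytes(
                0x00, 0x0B, 0x00, 0x00, 0x00, 0x2F, // name index #11 (Code), length 47
                0x00, 0x01, 0x00, 0x01,             // max_stack, max_locals
                0x00, 0x00, 0x00, 0x05,             // code_length
                0x2a, 0xb7, 0x00, 0x01, 0xb1,       // aload_0, invokespecial #1, return
                0x00, 0x00,                         // exception_table_length
                0x00, 0x02,                         // attributes_count
                0x00, 0x0C, 0x00, 0x00, 0x00, 0x06, 0x00, 0x01, 0x00, 0x00, 0x00, 0x03,
                0x00, 0x0D, 0x00, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x00, 0x00, 0x00, 0x05,
                0x00, 0x0E, 0x00, 0x0F, 0x00, 0x00);
        System.out.println(read(ByteBuffer.wrap(attr)));
    }
}
```

Running it prints name=#11 length=47 max_stack=1 max_locals=1 code=[2a b7 00 01 b1] handlers=0 sub_attributes=[#12:6, #13:12], which lines up with the values parsed by hand above.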

That leaves the last attribute, which is very simple, so I will not parse it here. Its structure is as follows:
[Figure: table of the structure of the last attribute]
Finally, to inspect the class file on your own computer: to view the raw bytes, open the class file directly in a hex editor such as UltraEdit (UE); to view the bytecode, run the following in the directory that contains the class:
javap -verbose ByteCode.class

Last modified 2020-8-7; size 602 bytes
  MD5 checksum 367be1125b0a815faf617b48e78c0d20
  Compiled from "ByteCode.java"
public class com.bml.jvm.ByteCode
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #6.#23         // java/lang/Object."<init>":()V
   #2 = Fieldref           #24.#25        // java/lang/System.out:Ljava/io/PrintStream;
   #3 = Fieldref           #5.#26         // com/bml/jvm/ByteCode.count:I
   #4 = Methodref          #27.#28        // java/io/PrintStream.println:(I)V
   #5 = Class              #29            // com/bml/jvm/ByteCode
   #6 = Class              #30            // java/lang/Object
   #7 = Utf8               count
   #8 = Utf8               I
   #9 = Utf8               <init>
  #10 = Utf8               ()V
  #11 = Utf8               Code
  #12 = Utf8               LineNumberTable
  #13 = Utf8               LocalVariableTable
  #14 = Utf8               this
  #15 = Utf8               Lcom/bml/jvm/ByteCode;
  #16 = Utf8               main
  #17 = Utf8               ([Ljava/lang/String;)V
  #18 = Utf8               args
  #19 = Utf8               [Ljava/lang/String;
  #20 = Utf8               <clinit>
  #21 = Utf8               SourceFile
  #22 = Utf8               ByteCode.java
  #23 = NameAndType        #9:#10         // "<init>":()V
  #24 = Class              #31            // java/lang/System
  #25 = NameAndType        #32:#33        // out:Ljava/io/PrintStream;
  #26 = NameAndType        #7:#8          // count:I
  #27 = Class              #34            // java/io/PrintStream
  #28 = NameAndType        #35:#36        // println:(I)V
  #29 = Utf8               com/bml/jvm/ByteCode
  #30 = Utf8               java/lang/Object
  #31 = Utf8               java/lang/System
  #32 = Utf8               out
  #33 = Utf8               Ljava/io/PrintStream;
  #34 = Utf8               java/io/PrintStream
  #35 = Utf8               println
  #36 = Utf8               (I)V
{
  public com.bml.jvm.ByteCode();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 3: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Lcom/bml/jvm/ByteCode;

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
         3: getstatic     #3                  // Field count:I
         6: invokevirtual #4                  // Method java/io/PrintStream.println:(I)V
         9: return
      LineNumberTable:
        line 9: 0
        line 10: 9
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      10     0  args   [Ljava/lang/String;

  static {};
    descriptor: ()V
    flags: ACC_STATIC
    Code:
      stack=1, locals=0, args_size=0
         0: iconst_1
         1: putstatic     #3                  // Field count:I
         4: return
      LineNumberTable:
        line 6: 0
}


Concluding remarks

The above walks through almost the entire manual parse of one class. There is still plenty that was not covered, such as how superclass and interface information is parsed, but the principle is the same. Manual parsing is tedious; the point is to understand the parsing process.
If you really need to parse class files, you would write a program that follows these rules. Here we only analyzed the bytecode structure and the underlying principles so that the same approach can be applied to other class files. Not understanding bytecode files will not stop you from doing your job or writing code, but isn't building up knowledge a way to enrich your life? Don't keep living off the little you already know; keep learning and improving, not just to get by, but to make life more interesting.
Understanding the JVM is also genuinely useful. Why? Because Java is no longer the only language that runs on the JVM: many languages have their own compilers that target it. The JVM is relatively permanent, while Java may be temporary, so there is nothing to lose by learning some of the underlying details; it only costs some spare time. The source files of
java
groovy
kotlin
scala
are each compiled by their own compiler into .class files.
Any class file that conforms to the JVM specification can run on the JVM, not only those compiled from Java.
[Figure: different languages compiling to class files that run on the JVM]
The figure above shows different languages running on the JVM, so learning the JVM well really is worthwhile.


Origin blog.csdn.net/scjava/article/details/108277662